Kubecost can run on clusters with thousands of nodes when resource consumption is properly tuned. Here are some of the steps you can take to tune Kubecost, along with a description of each.
On Secondaries: Disable Cloud Assets and Run Kubecost in Agent Mode (with ETL and Caching Disabled)
Cloud Assets for all accounts can be pulled in on the primaries alone by pointing Kubecost to one or more management accounts. You can disable Cloud Assets on secondaries by setting .Values.kubecostModel.etlCloudAsset: false.
Secondaries can be configured strictly as metric emitters to save memory.
Learn more about how to best configure secondaries here: https://guide.kubecost.com/hc/en-us/articles/4423256582551-Kubecost-Secondary-Clusters
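As a minimal sketch, the Cloud Assets setting above could be placed in the Helm values file used for a secondary cluster. Only kubecostModel.etlCloudAsset comes from this article; any further settings needed for agent mode are covered in the linked guide and are not shown here.

```yaml
# Values override for a secondary cluster (sketch).
kubecostModel:
  # Do not pull Cloud Assets on this cluster; the primary handles them.
  etlCloudAsset: false
```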
Exclude Provider IDs in Cloud Assets
Kubecost is capable of tracking each individual cloud billing line item; however, on certain accounts the number of line items can be quite large.
(AWS only; GCP and Azure support is coming soon.) If provider IDs are excluded, Kubecost caches aggregate data instead of granular data and makes an ad-hoc query to the Cost and Usage Report whenever granular data is needed. This results in slower load times but lower memory consumption.
Lower Query Resolution
Lowering the query resolution reduces memory consumption, but short-running pods will be sampled and their runtimes rounded to the nearest interval.
The default is 300s.
This can be tuned with the Helm flag “--set kubecostModel.etlResolutionSeconds=600”.
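The same setting can be kept in a Helm values file rather than passed on the command line. A minimal sketch, assuming the resolution is raised from the 300s default to 600s:

```yaml
kubecostModel:
  # Coarser ETL query resolution: 600s instead of the default 300s.
  etlResolutionSeconds: 600
```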
Longer Scrape Interval
A longer scrape interval means fewer data points in Prometheus to collect and store, at the cost of Kubecost making estimates that may miss usage spikes or short-running pods.
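As an illustration only, a longer scrape interval might be set as follows when Kubecost is installed with its bundled Prometheus. The key path prometheus.server.global.scrape_interval and the 2m value are assumptions that this article does not state; check them against the values.yaml of your chart version before use.

```yaml
prometheus:
  server:
    global:
      # Assumed key for the bundled Prometheus; 2m is an example value only.
      scrape_interval: 2m
```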
Disable node-exporter
node-exporter is optional. Some health alerts will be disabled if node-exporter is disabled, but savings recommendations and core cost allocation will function as normal.
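A sketch of disabling node-exporter in a values file, assuming the bundled Prometheus subchart exposes it under prometheus.nodeExporter.enabled. That key is not given in this article and should be verified against your chart version.

```yaml
prometheus:
  nodeExporter:
    # Assumed key; turns off the node-exporter DaemonSet shipped with the chart.
    enabled: false
```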
Enable Filestore
Filestore is an improvement over our legacy in-memory ETL store of Prometheus data. It is optional in versions up to v1.94.0 and becomes the default afterward.
This can be enabled on older versions with the Helm flag “--set kubecostModel.etlFileStoreEnabled=true”.
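For installs managed through a values file, the equivalent of the flag above would look roughly like this (a sketch, relevant only to versions where Filestore is still optional):

```yaml
kubecostModel:
  # Use the file-backed ETL store instead of the legacy in-memory store.
  etlFileStoreEnabled: true
```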