Kubecost Aggregator

Aggregator is the primary query backend for Kubecost and is enabled in all Kubecost configurations. In a default installation, it runs within the cost-analyzer pod. A multi-cluster installation requires the additional settings described below. Multi-cluster Kubecost uses the Federated ETL configuration without Thanos, and Aggregator replaces the Federator component.

Existing documentation for Kubecost APIs will use endpoints for non-Aggregator environments unless otherwise specified, but will still be compatible after configuring Aggregator.

Configuring Aggregator

Prerequisites

  • Multi-cluster Aggregator can only be configured in a Federated ETL environment.

  • All clusters in your Federated ETL environment must be configured to build & push ETL files to the object store via .Values.federatedETL.federatedCluster and .Values.kubecostModel.federatedStorageConfigSecret. See our Federated ETL doc for more details.

  • If you've enabled Cloud Integration, it must be configured via the cloud integration secret. Other methods are now deprecated. See our Multi-Cloud Integrations doc for more details.

  • This documentation is for Kubecost v2.0 and higher.

If you are upgrading to Kubecost v2.0 from an existing environment, see our specialized migration guides instead.

Basic configuration

This configuration is estimated to be sufficient for environments monitoring < 20k unique containers per day. You can check this metric on the /diagnostics page.

kubecostAggregator:
  replicas: 1
  deployMethod: statefulset
  cloudCost:
    enabled: true
federatedETL:
  federatedCluster: true
kubecostModel:
  containerStatsEnabled: true
  cloudCost:
    enabled: false
  federatedStorageConfigSecret: federated-store
kubecostProductConfigs:
  clusterName: YOUR_CLUSTER_NAME
  cloudIntegrationSecret: cloud-integration
  productKey:
    enabled: true
    key: YOUR_KEY
prometheus:
  server:
    global:
      external_labels:
        cluster_id: YOUR_CLUSTER_NAME
# when using managed identity/irsa, set the service account accordingly:
serviceAccount:
  create: false
  name: kubecost-irsa-sa

Aggregator Optimizations

For larger deployments of Kubecost, Aggregator can be tuned. The settings below are in addition to the basic configuration above.

This configuration is estimated to be sufficient for environments monitoring < 60k unique containers per day. You can check this metric on the /diagnostics page.

Aggregator is a memory and disk-intensive process. Ensure that your cluster has enough resources to support the configuration below.

Because the Aggregator PV is relatively small, the least expensive performance gain is usually to move it to a storage class backed by a faster SSD. Storage class names vary by provider; common terms include gp3, extreme, and premium.

kubecostAggregator:
  env:
    # This interval defines how long the Aggregator spends ingesting ETL data
    # from the federated store bucket into SQL tables, before exiting its job to
    # enter the derivation step. If set too low for large scale users, the
    # Aggregator may not have enough time to ingest all new data that exists in
    # the federated store bucket. If set too high, there will be a delay in data
    # between the Kubecost Agents and the Aggregator.
    #
    # Note that the default value is set to 60m to optimize for the
    # first-install experience of Kubecost (i.e. it prioritizes small data
    # becoming available more quickly).
    #
    # default: 60m
    DB_BUCKET_REFRESH_INTERVAL: 2h

    # How much data to ingest from the federated store bucket, and how much data
    # to keep in the DB before rolling the data off.
    # 
    # Note: If increasing this value to backfill historical data, it will take
    # time to gradually ingest & process those historical ETL files. Consider
    # also increasing the resources available to the aggregator as well as the
    # refresh & concurrency env vars.
    # 
    # default: 91
    ETL_DAILY_STORE_DURATION_DAYS: "365"
    
    # How many threads to use when ingesting Asset/Allocation/CloudCost data
    # from the federated store bucket. In most cases the default is sufficient,
    # but can be increased if trying to backfill historical data.
    # default: 3
    DB_CONCURRENT_INGESTION_COUNT: "5"

    # Memory limit applied to read database connections.
    # default: n/a
    DB_MEMORY_LIMIT: 8GB

    # Memory limit applied to write database connections.
    # default: n/a
    DB_WRITE_MEMORY_LIMIT: 8GB

    # How many threads the read database is configured with (i.e. Kubecost API /
    # UI queries). If increasing this value, it is recommended to increase the
    # aggregator's memory requests & limits.
    # default: 1
    DB_READ_THREADS: 3

    # How many threads the write database is configured with (i.e. ingestion of
    # new data from S3). If increasing this value, it is recommended to increase
    # the aggregator's memory requests & limits.
    # default: 1
    DB_WRITE_THREADS: 3

    # log level
    # default: info
    LOG_LEVEL: info
  aggregatorDbStorage:
    # governs storage size of aggregator DB storage
    # !!NOTE!! disk performance is _critically important_ to aggregator performance
    # ensure the disk is specced high enough, and check for bottlenecks
    # default: 128Gi
    storageRequest: 512Gi
  resources:
    requests:
      cpu: 4
      memory: 12Gi
    limits:
      cpu: 6
      memory: 16Gi
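
Before changing the storage class, it can help to see which classes the cluster offers and which one the Aggregator volume currently uses. The following is a minimal sketch using standard kubectl commands. The function name show_aggregator_storage is hypothetical, and the kubecost namespace is assumed from the install commands in this document.

```shell
# Hypothetical helper: list the storage classes available in the cluster,
# then show the storage class and requested size of each PVC in a namespace.
show_aggregator_storage() {
  local namespace="${1:-kubecost}"   # assumption: Kubecost runs in "kubecost"
  kubectl get storageclass
  kubectl get pvc -n "$namespace" \
    -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName,SIZE:.spec.resources.requests.storage
}
```

Calling show_aggregator_storage kubecost prints both listings, which should make it easier to tell whether the Aggregator PVC already sits on a fast SSD class.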

Running the upgrade

If you have not already done so, create the required Kubernetes secrets. Refer to the Federated ETL doc and Cloud Integration doc for more details.

kubectl create secret generic federated-store -n kubecost --from-file=federated-store.yaml
kubectl create secret generic cloud-integration -n kubecost --from-file=cloud-integration.json

Finally, upgrade your existing Kubecost installation. This command will install Kubecost if it does not already exist.

If you are upgrading from an existing installation, make sure to append your existing values.yaml configurations to the ones described above.

helm upgrade --install "kubecost" \
  --repo https://kubecost.github.io/cost-analyzer/ cost-analyzer \
  --namespace kubecost \
  -f aggregator.yaml
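
If the release already exists, one way to carry over its current settings is to export the user-supplied values and pass them ahead of the Aggregator file. This is a sketch rather than the only approach. The function name and the file name existing-values.yaml are hypothetical. Helm gives precedence to the last -f file, so aggregator.yaml overrides any overlapping keys.

```shell
# Hypothetical helper: export the values of an existing release, then
# upgrade with those values plus the Aggregator settings layered on top.
upgrade_with_existing_values() {
  local release="${1:-kubecost}"
  local namespace="${2:-kubecost}"
  # Fails (and stops) if the release does not exist yet.
  helm get values "$release" -n "$namespace" -o yaml > existing-values.yaml || return 1
  # Later -f files win, so aggregator.yaml overrides overlapping keys.
  helm upgrade --install "$release" \
    --repo https://kubecost.github.io/cost-analyzer/ cost-analyzer \
    --namespace "$namespace" \
    -f existing-values.yaml \
    -f aggregator.yaml
}
```

Review existing-values.yaml before upgrading, since it may contain settings that conflict with the prerequisites listed earlier.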

Validating Aggregator pod is running successfully

When first enabled, the Aggregator pod will ingest the last 90 days (if applicable) of ETL data from the federated-store. Because the combined folder is ignored, the legacy Federator pod is not used here, but it can still run if needed. Increasing ETL_DAILY_STORE_DURATION_DAYS increases the time Aggregator needs before data becomes available. You can run kubectl get pods to confirm that the Aggregator pod is running, but you should still wait for all data to be ingested.
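
As a rough illustration, the pod status and recent log output can be checked together. The function name check_aggregator_pod is hypothetical, the kubecost namespace is assumed from the install commands above, and the pod name must be taken from the kubectl get pods output.

```shell
# Hypothetical helper: list pods in the namespace, then print recent log
# lines from the Aggregator pod to follow ingestion progress.
check_aggregator_pod() {
  local namespace="${1:-kubecost}"
  local pod="$2"   # the Aggregator pod name, as shown by "kubectl get pods"
  kubectl get pods -n "$namespace"
  kubectl logs "$pod" -n "$namespace" --all-containers=true --tail=50
}
```

For example: check_aggregator_pod kubecost KUBECOST-AGGREGATOR-POD-NAME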

Troubleshooting Aggregator

Resetting Aggregator StatefulSet data

When the Aggregator is deployed as a StatefulSet, it is possible to reset the Aggregator data. The Aggregator itself doesn't store any data, and relies on object storage. As such, a reset involves removing the Aggregator's local storage and allowing it to re-ingest data from the object store. The procedure is as follows:

  1. Scale down the Aggregator StatefulSet to 0

  2. When the Aggregator pod is gone, delete the aggregator-db-storage-xxx-0 PVC

  3. Scale the Aggregator StatefulSet back to 1. This will re-create the PVC, empty.

  4. Wait for Kubecost to re-ingest data from the object store. This could take from several minutes to several hours, depending on your data size and retention settings.
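
The steps above can be sketched with standard kubectl commands. This is an illustration under assumptions, not an official script: the function name reset_aggregator is hypothetical, and the StatefulSet and PVC names depend on your release, so look them up with kubectl get statefulset and kubectl get pvc first.

```shell
# Hypothetical helper: reset Aggregator local storage.
# Arguments: namespace, StatefulSet name, PVC name (all release-specific).
reset_aggregator() {
  local namespace="$1" statefulset="$2" pvc="$3"
  # 1. Scale the StatefulSet down to 0.
  kubectl scale statefulset "$statefulset" -n "$namespace" --replicas=0 || return 1
  # 2. Wait for the pod (StatefulSet pods are named <name>-0) to go away.
  #    A failure here is not fatal: the PVC delete below blocks until the
  #    pod that uses the volume is gone.
  kubectl wait --for=delete "pod/${statefulset}-0" -n "$namespace" --timeout=300s
  #    Then delete the PVC.
  kubectl delete pvc "$pvc" -n "$namespace" || return 1
  # 3. Scale back to 1; the PVC is re-created empty.
  kubectl scale statefulset "$statefulset" -n "$namespace" --replicas=1
}
```

For example: reset_aggregator kubecost YOUR_STATEFULSET_NAME aggregator-db-storage-xxx-0, with the placeholders replaced by the names from your cluster. Step 4, waiting for re-ingestion, still applies afterwards.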

Aggregator not displaying any data to frontend after several hours

One reason you may not see data in the frontend yet is that the Aggregator is still processing the ETL files in the federated store bucket into SQL tables.

If you see many of the following log lines, .Values.kubecostAggregator.env.DB_BUCKET_REFRESH_INTERVAL may be set too low, causing the Aggregator to continuously restart its data ingestion process:

INF asset worker context cancelled: context canceled
INF allocation worker context cancelled: context canceled

To fix this, increase the environment variable's value step by step until these log lines no longer appear. We recommend starting with 2h. See the Aggregator Optimizations section above for more details about this environment variable.

kubecostAggregator:
  env:
    DB_BUCKET_REFRESH_INTERVAL: 2h
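
To judge whether a new interval helps, it may be useful to count how often the cancellation message appears in recent logs. The sketch below reads log text on standard input and counts matching lines. The function name count_cancelled_workers is hypothetical, and the pattern is taken from the log lines shown above.

```shell
# Hypothetical helper: count log lines on stdin that report a cancelled
# ingestion worker (asset, allocation, or any other worker type).
count_cancelled_workers() {
  grep -c 'worker context cancelled: context canceled'
}
```

For example: kubectl logs KUBECOST-AGGREGATOR-POD-NAME -n kubecost --since=1h | count_cancelled_workers. A count that stays at 0 after raising the interval suggests that ingestion is no longer being interrupted.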

Checking the database for node metadata

Confirming whether node metadata exists in your database can be useful when troubleshooting missing data. Run the following command to open a shell in the Aggregator pod:

kubectl exec -it KUBECOST-AGGREGATOR-POD-NAME -n kubecost -- sh

Change to the directory where your database is stored:

cd /var/configs/waterfowl/duckdb/v0_9_2
ls -lah

Copy the database to a new file for testing, to avoid modifying the original data:

cp kubecost-example.duckdb.read kubecost-example.duckdb.read.kubecost.copy

Open a DuckDB REPL pointed at the copied database:

duckdb kubecost-example.duckdb.read.kubecost.copy

Run the following debugging queries to check if node data is available:

show tables;
describe node_1d;
select * from node_1d;
select providerid,windowstart,windowend,* from node_1d;

.maxrows 100;
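
The copy and query steps can also be run without the interactive REPL, from the shell inside the Aggregator pod. This is a sketch under assumptions: the function name inspect_node_table is hypothetical, the database file name differs per installation, and it relies on the DuckDB CLI -c option, which runs one command and exits.

```shell
# Hypothetical helper: copy the read database, then query node_1d in the
# copy so the original file is never opened for modification.
inspect_node_table() {
  local db_file="$1"                       # e.g. kubecost-example.duckdb.read
  local copy="${db_file}.kubecost.copy"
  cp "$db_file" "$copy" || return 1
  duckdb "$copy" -c "select providerid, windowstart, windowend from node_1d limit 20;"
}
```

For example, from the database directory: inspect_node_table kubecost-example.duckdb.read. An empty result suggests that no node data has been ingested yet.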
