Export metrics to external observability tools

Akamai Distribution of the OpenTelemetry Collector for Akamai Cloud Pulse is Akamai’s version of the OpenTelemetry Collector, used under the Apache License, Version 2.0.

This collector (aclp-collector) facilitates the export of telemetry data from Akamai Cloud services to external observability tools that support the OpenTelemetry standard. This guide provides instructions for installing and using the collector, which is available as a precompiled binary, a Docker image, or via a Helm chart for Kubernetes environments.

📘

Currently, the collector supports Managed Databases (service_type: "dbaas"). Support for additional services will be added in future releases.

Supported platforms

Build artifacts for the collector are specifically compiled for AMD64 Linux platforms and should only be deployed to compatible Linux environments.
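You can confirm that a target host is compatible before deploying, for example:

```shell
# Print the kernel name and machine architecture; an AMD64 Linux host
# reports "Linux" and "x86_64" respectively.
uname -s
uname -m
```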

Akamai Distribution of the OpenTelemetry Collector for Akamai Cloud Pulse has been verified on the following Linux distributions:

  • Ubuntu 22.04+
  • Debian 11+
  • AlmaLinux 8+
  • Alpine 3.19+
  • CentOS Stream 9
  • Fedora 41
  • Rocky Linux 8

Step 1: Create your config.yaml file

To export your telemetry data, you'll need to create a configuration file. To get started, you can download a sample config.yaml containing a basic set of supported components. Modify the components and their associated parameters to suit your specific environment, and add your personal access token (PAT).

Supported components

The following components are available in the collector. You can customize them and their settings to suit your specific use case.
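For orientation, the overall shape of a collector config is sketched below: receivers collect metrics, exporters send them on, and a service pipeline ties the two together. This is a minimal sketch, not a complete configuration; the receiver parameters shown are explained in the Receiver configuration section, and the values are placeholders for your environment.

```yaml
receivers:
  akamaicloudpulsereceiver:
    PAT: "<your-PAT-token>"      # placeholder; see the PAT parameter below
    polling_interval: "1m"
    services:
      - service_type: "dbaas"
        entity_ids: ["*"]
        metric_names: ["cpu_usage"]
        agg_function: "avg"
        group_by: ["entity_id"]

exporters:
  debug:                          # prints collected metrics to the console
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [akamaicloudpulsereceiver]
      exporters: [debug]
```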

Receiver configuration

Akamai Cloud Pulse provides users with access to key operational metrics for core Akamai Cloud services. These metrics are exposed through the ACLP REST API and persisted in a Time Series Database (TSDB) managed by Akamai.

akamaicloudpulsereceiver queries the API, extracts relevant service metrics, and converts them into OpenTelemetry-compliant telemetry data for downstream processing and export.

This section outlines the configurable parameters for akamaicloudpulsereceiver. Customize these parameters in your config.yaml file.

Each parameter is listed below with whether it is mandatory and whether it supports a service-level override.

service_type
Mandatory: Yes | Service-level override: N/A
The type of service for which metrics are being fetched: dbaas.

Example usage: service_type: "dbaas"

metric_names
Mandatory: Yes | Service-level override: N/A
The metrics to be collected from the service. Find the list of metrics for a given service using the API. You can list up to five metric_names per service_type.

Example usage: metric_names: ["cpu_usage", "memory_usage", "disk_usage"]

agg_function
Mandatory: Yes | Service-level override: N/A
The function used to aggregate multiple data points, for example, min, max, or avg. Find the aggregate functions supported by a given service using the API.

Example usage: agg_function: "min"

entity_ids
Mandatory: Yes | Service-level override: N/A
The identifiers for the entities for which you wish to collect metrics, for example, 123 or 456. Use "*" to collect metrics for all entities.

This parameter can't be used with regions.

Example usage: entity_ids: ["123", "456"]

regions
Mandatory: Yes | Service-level override: N/A
The regions for the entities for which you wish to collect metrics, for example, us-ord, us-iad. Use "*" to collect metrics for all regions.

This parameter can't be used with entity_ids.

Example usage: regions: [us-ord, us-iad]

polling_interval
Mandatory: Yes | Service-level override: Yes
The frequency at which services are checked and new metrics data is collected. The polling interval must be equal to or greater than one minute. See also: query_delay.

A polling interval specified at the receiver level is overridden if one is also specified at the service level.

Example usage: polling_interval: "1m"

PAT
Mandatory: Yes | Service-level override: Yes
The personal access token (PAT) used to authenticate a user when querying APIs.

A PAT specified at the receiver level is overridden if one is also specified at the service level.

Note: The PAT used for fetching service metrics must be created with Monitor Read/Write permissions and the corresponding service’s Read/Write permissions to ensure full access for metric collection.

Example usage: PAT: "<your-PAT-token>"

query_delay
Mandatory: No | Service-level override: N/A
The number of minutes to shift a query backward in time to account for data latency in Akamai Cloud Pulse. For example, if the current time is X and the query delay is set to 2 minutes:

endTime = X - 2 minutes
startTime = endTime - polling_interval

Example usage: query_delay: "2m"

Note: Each metric has a predefined scrape_interval, which can be found using the Linode API.
Within the receiver, the scrape interval for your metric request is computed from the endTime and startTime values above:

computed_scrape_interval = endTime - startTime

Ensure that your query_delay and polling_interval values are configured such that computed_scrape_interval ≥ the predefined scrape_interval for the metric in the Linode API.

thread_pool_size
Mandatory: No | Service-level override: No
The maximum number of parallel workers used to collect metrics. Thread pool size can only be specified at the receiver level. The default value is the number of CPU cores * 8.

Example usage: thread_pool_size: 80

refresh_interval
Mandatory: No | Service-level override: Yes
The interval at which the list of services is re-evaluated and updated. The default interval is 15 minutes, but it can be set to a custom value of 10 minutes or more.

A refresh interval specified at the receiver level is overridden if one is also specified at the service level.

Example usage: refresh_interval: "30m"

group_by
Mandatory: Yes | Service-level override: N/A
The value used to group metric data or results. This field is mandatory for release 1.0.0 and above.

Example usage: group_by: ["entity_id"]
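As a worked example of the query window calculation (with hypothetical times), suppose a poll fires at 10:00 with query_delay: "2m" and polling_interval: "5m":

```
endTime   = 10:00 - 2m = 09:58
startTime = 09:58 - 5m = 09:53
computed_scrape_interval = endTime - startTime = 5 minutes
```

For this configuration to return data, the metric's predefined scrape_interval must be 5 minutes or less.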

Naming conventions

  • When defining multiple receivers of the type akamaicloudpulsereceiver, add suffixes to the receiver names following the pattern akamaicloudpulsereceiver/1, akamaicloudpulsereceiver/2, akamaicloudpulsereceiver/3, and so on.
  • When defining multiple services of the same type within a receiver block, add suffixes to the service names following the pattern dbaas/1, dbaas/2, and so on.

Example configuration

receivers:
  akamaicloudpulsereceiver/1:
    polling_interval: "1m"
    refresh_interval: "30m"
    PAT: "<your-PAT-token>"
    thread_pool_size: 80
    services:
      - service_type: "dbaas"
        entity_ids: ["*"]
        metric_names: ["memory_usage", "disk_usage"]
        agg_function: "min"
        query_delay: "2m"
        group_by: ["entity_id"]
  akamaicloudpulsereceiver/2:
    services:
      - service_type: "dbaas/1"
        entity_ids: [123456, 7891011]
        metric_names: ["cpu_usage"]
        polling_interval: "1m"
        PAT: "<your-PAT-token>"
        agg_function: "avg"
        group_by: ["entity_id"]
      - service_type: "dbaas/2"
        regions: ["*"]
        metric_names: ["memory_usage"]
        polling_interval: "5m"
        PAT: "<your-PAT-token>"
        agg_function: "sum"
        refresh_interval: "20m"
        group_by: ["entity_id"]

Step 2: Run the collector

You can deploy the collector as a precompiled binary, as a Docker image, or via a Helm chart in Kubernetes.

Collector binary

Download the aclp-collector binary. Then make the binary executable and run it with your config.yaml file:

chmod +x aclp-collector_v<binary_version>_linux_amd64
./aclp-collector_v<binary_version>_linux_amd64 --config=<path/to/config.yaml>

Docker image

Run the collector using the Docker image from Docker Hub along with a configured config.yaml file. Make sure the config.yaml file has been downloaded and customized before running the following command:

docker run -d --name aclp-collector \
  -v <path/to/config.yaml>:/etc/aclp-collector/config.yaml \
  linode/aclp-collector:<tag-name> \
  --config /etc/aclp-collector/config.yaml

Helm chart

Run the collector on a Kubernetes cluster using the Helm chart from Docker Hub along with a configured config.yaml file. Make sure the config.yaml file has been downloaded and customized before running the following command:

helm install aclp-collector oci://registry-1.docker.io/linode/aclp-collector \
  --version <chart-version> \
  --set-file config=<path/to/config.yaml>

Step 3: Visualize metrics

Assuming you have configured the otlphttp or Prometheus exporter sink, connect a visualization tool to the sink endpoint to visualize telemetry data from the collector. See opentelemetry-collector and opentelemetry-collector-contrib to learn more about configuring and using the supported exporters.
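For example, a minimal otlphttp exporter wired into a metrics pipeline might look like the following sketch; the endpoint and receiver name are placeholders for your environment:

```yaml
exporters:
  otlphttp:
    endpoint: "https://<your-backend-host>:4318"   # placeholder OTLP/HTTP endpoint

service:
  pipelines:
    metrics:
      receivers: [akamaicloudpulsereceiver/1]
      exporters: [otlphttp]
```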

Debugging

You can use the following exporters to verify the collector pipeline and inspect the telemetry data it emits.

Collector logs

By default, the collector sends its logs to standard output (stdout). The collector's config file doesn't include a path setting for log files, so if you want to save logs for later debugging, redirect stdout (and stderr) to a file:

./aclp-collector --config=<path/to/config.yaml> > aclp-collector.log 2>&1

You can configure the collector's telemetry logs by adding a telemetry section under service in the collector's config file:

service:
  telemetry:
    logs:
      level: "debug"           # Available: debug, info, warn, error, dpanic, panic, fatal
      encoding: "console"      # Options: "json" or "console"

Debug Exporter

You can use the debug exporter to print telemetry data (metrics, in this case) directly to the collector’s console logs. This is useful for troubleshooting without sending data to an external backend.

To use the Debug exporter, add the following to the exporters section of your config.yaml:

exporters:
  debug:
    verbosity: detailed
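Note that an exporter only takes effect once it's referenced in a pipeline. A minimal sketch, assuming a receiver named akamaicloudpulsereceiver/1:

```yaml
service:
  pipelines:
    metrics:
      receivers: [akamaicloudpulsereceiver/1]
      exporters: [debug]
```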

Prometheus Exporter

The Prometheus exporter exposes metrics on an HTTP endpoint that can be scraped by Prometheus or visualized in Grafana.

To use the Prometheus exporter, add the following to the exporters section of your config.yaml:

exporters:  
  prometheus:  
    endpoint: "<bind_address>:<port>"

You can view the raw exported metrics in a browser by visiting http://<bind_address>:<port>/metrics, or by using curl:

curl http://<bind_address>:<port>/metrics

Health check

OpenTelemetry Collector provides a built-in health check endpoint to monitor its liveness and readiness. This is especially useful in containerized or orchestrated environments like Docker and Kubernetes.

Health Check Extension

The health check extension exposes a simple HTTP endpoint that returns "200 OK" when the collector is healthy or "500" if the collector isn't ready or has failed.

To use health check, add the extension to the extensions section of your config.yaml:

extensions:  
  health_check:  
    endpoint: "<bind_address>:<port>"
    path: "/health"
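As with exporters, the extension must also be enabled under the service section of your config.yaml before it takes effect:

```yaml
service:
  extensions: [health_check]
```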

Once the collector is running, you can test the health check endpoint using:

curl http://<bind_address>:<port>/health

Performance optimization

If you observe issues such as delayed metrics or missing data in the visualization or debug logs, you can fine-tune the configuration for better performance.

For example, consider increasing the value of thread_pool_size to improve parallelism.

Hot reloading the collector with configuration updates

The collector supports hot-reloading of its configuration using signals such as SIGHUP. This means configuration adjustments can be applied in production environments without stopping and restarting the collector, minimizing downtime.

It is strongly recommended that you validate configuration changes before issuing a SIGHUP. Invalid configurations may cause errors or data loss. To validate a configuration, download the collector binary
and run the following command:

./aclp-collector_v<binary-version>_linux_amd64 validate --config=<path/to/config.yaml>

If any issues are present, the validation command will report them in advance.

Assuming the configuration is validated and the collector is already running, you can apply the updated configuration using SIGHUP. The process differs slightly depending on the deployment method.

Hot reload in binary mode

Modify the configuration file.

Find the process ID (PID) of the running collector:

ps aux | grep aclp-collector_v<binary-version>_linux_amd64

Send a SIGHUP signal to the collector process:

kill -HUP <collector-pid>

Hot reload in Docker

Modify the configuration file.

Send a SIGHUP signal to the running container:

docker kill --signal=SIGHUP <container-name>

Hot reload in Kubernetes

Retrieve the running pods:

kubectl get pods

Inspect the collector process inside the pod to confirm the PID:

kubectl exec -it <pod-name> -n <namespace> -- sh
> ps aux | grep aclp-collector
> exit

📘

In Kubernetes, configuration changes can't be directly applied via SIGHUP. You must first update the ConfigMap and reapply it.

Update and apply the ConfigMap:

kubectl create configmap <configmap-name> --from-file=config.yaml=config.yaml \
  --dry-run=client -o yaml | kubectl apply -f -

Once the updated ConfigMap is applied, send a SIGHUP to the collector process:

kubectl exec -n <namespace> <pod-name> -- kill -HUP <pid>