Create data streams

Version 5.0 subprovider update

We've updated our DataStream subprovider to provide a better developer experience and support new platform capabilities.

These improvements introduce breaking changes that are incompatible with the previous version.

Version 4.1 will continue to work for existing integrations until January 2024. To update your integration, see the migration information.

Data streams are monitoring and analytical components you set up independently of and later assign to your properties.

This component-rule relationship lets you configure multiple data streams, each focused on gathering a specific type of data. When combined, data streams deliver customized metrics and health insights about your traffic across multiple properties.

What you'll do

Configure a data stream, set up delivery of raw data logs to a destination, and connect the stream to your properties.

Get your properties

The properties you use in DataStream integrations must have room to pick up a new data stream and be active on a network.

  • You can assign up to three data streams to a single property. To see how many data streams are assigned to a property, review its Datastream rule.
  • A single data stream can support up to 100 properties. Use the Data streams resource to get a list of its assigned properties.

Note: If needed, create a new property and activate it on a network.
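
A minimal resource sketch for a new property looks like this. All values are placeholders; the available products and rule formats depend on your contract.

```hcl
resource "akamai_property" "my_new_property" {
  name        = "my_new_property"
  contract_id = "ctr_C-0N7RAC7"
  group_id    = "grp_12345"
  product_id  = "prd_Object_Delivery"
  rule_format = "v2023-01-05"
}
```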

  1. Get a list of your properties and determine which of them you'll use with your data stream.

    data "akamai_properties" "my_properties" {
        group_id    = "12345"
        contract_id = "C-0N7RAC7"
    }
    
    output "my_properties" {
      value = data.akamai_properties.my_properties
    }
    
    + my_properties = {
          + contract_id = "C-0N7RAC7"
          + group_id    = "12345"
          + id          = "grp_12345ctr_C-0N7RAC7"
          + properties  = [
              + {
                  + contract_id        = "ctr_C-0N7RAC7"
                  + group_id           = "grp_12345"
                  + latest_version     = 8
                  + note               = "Added hostname."
                  + product_id         = "prd_Adaptive_Media_Delivery"
                  + production_version = 8
                  + property_id        = "prp_12345"
                  + property_name      = "my_property1"
                  + rule_format        = ""
                  + staging_version    = 3
                },
              + {
                  + contract_id        = "ctr_C-0N7RAC7"
                  + group_id           = "grp_12345"
                  + latest_version     = 3
                  + note               = "File type update."
                  + product_id         = "prd_Object_Delivery"
                  + production_version = 3
                  + property_id        = "prp_98765"
                  + property_name      = "my_property2"
                  + rule_format        = ""
                  + staging_version    = 2
                },
            ]
        }
    
  2. Export all chosen properties using the Terraform CLI.

    The CLI export command places property configuration files in your current directory. To export them to a different location, add the --tfworkpath <path> flag to the command.

    When exporting more than one property, loop through the CLI command with the --tfworkpath <path> flag, changing the property name and export location on every iteration.

    akamai terraform --edgerc <edgerc-file-location> --section <edgerc-section> export-property <property-name>
    
  3. Run the included import script to populate your Terraform state. This prevents Terraform from attempting to recreate your assets.
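
The export loop in step 2 can be sketched as a dry run that prints each export command instead of running it, so you can review before executing. The property names, edgerc path, section name, and ./export directory are placeholders.

```shell
# Print one export command per property and prepare a per-property
# export directory for the --tfworkpath flag.
for prop in my_property1 my_property2; do
  mkdir -p "./export/${prop}"
  echo "akamai terraform --edgerc ~/.edgerc --section default export-property ${prop} --tfworkpath ./export/${prop}"
done
```

Remove the echo once the printed commands look right.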

After you create a data stream, you'll update your properties' rule trees.

Create a data stream

To create a data stream, configure a log destination and choose data sets that shape what's monitored and collected about your traffic.

Average processing time: 90 minutes to 3 hours

Provide a destination

You can use a custom HTTPS endpoint or a third-party object storage location. Supported storage locations:

  • Amazon S3
  • Azure Storage
  • Datadog
  • Elasticsearch
  • Google Cloud Storage
  • Loggly
  • New Relic
  • Oracle Cloud
  • Splunk
  • Sumo Logic

Create a connector block for your data destination. To name the block, take the destination's heading from the argument table below and append _connector, for example, gcs_connector.

Include all required arguments for your destination.

gcs_connector {
    bucket               = "my_bucket"
    display_name         = "my_connector_name"
    path                 = "akamai/logs"
    private_key          = "-----BEGIN PRIVATE KEY-----\nprivate_key\n-----END PRIVATE KEY-----\n"
    project_id           = "my_project_id"
    service_account_name = "my_service_account_name"
  }
Argument Required Description
azure
access_key ✔ The account access key for authentication.
account_name ✔ The Azure Storage account.
display_name ✔ The connector's name.
container_name ✔ The Azure Storage container name.
path ✔ The path to the log storage folder.
compress_logs Boolean that sets the compression of logs.
datadog
auth_token ✔ Your account's API key.
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
tags The Datadog connector tags.
compress_logs Boolean that sets the compression of logs.
service The Datadog service connector.
source The Datadog source connector.
elasticsearch
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
user_name ✔ The BASIC user name for authentication.
password ✔ The BASIC password for authentication.
index_name ✔ The index name for where to store log files.
tls_hostname The hostname that verifies the server's certificate and matches the Subject Alternative Names (SANs) in the certificate. If not provided, DataStream fetches the hostname from the endpoint URL.
ca_cert The certification authority (CA) certificate used to verify the origin server's certificate. If the certificate is not signed by a well-known certification authority, enter the CA certificate in PEM format for verification.
client_cert The digital certificate in PEM format you want to use to authenticate requests to your destination. If you want to use mutual authentication, you need to provide both the client certificate and the client key in PEM format.
client_key The private key for back-end authentication in non-encrypted PKCS8 format. If you want to use mutual authentication, you need to provide both the client certificate and the client key.
m_tls Boolean that sets mTLS enablement.
content_type The content type to pass in the log file header.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
gcs
bucket ✔ The bucket name.
display_name ✔ The connector's name.
private_key ✔ A JSON private key for a Google Cloud Storage account.
project_id ✔ A Google Cloud project ID.
service_account_name ✔ The name of the service account with the storage object create permission or storage object creator role.
compress_logs Boolean that sets the compression of logs.
path The path to the log storage folder.
https
authentication_type ✔ Either NONE for no authentication or BASIC for username and password authentication.
display_name ✔ The connector's name.
content_type ✔ The content type to pass in the log file header.
endpoint ✔ The storage endpoint for the logs.
m_tls Boolean that sets mTLS enablement.
compress_logs Boolean that sets the compression of logs.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
password The BASIC password for authentication.
user_name The BASIC user name for authentication.
tls_hostname The hostname that verifies the server's certificate and matches the Subject Alternative Names (SANs) in the certificate. If not provided, DataStream fetches the hostname from the endpoint URL.
ca_cert The certification authority (CA) certificate used to verify the origin server's certificate. If the certificate is not signed by a well-known certification authority, enter the CA certificate in PEM format for verification.
client_cert The digital certificate in PEM format you want to use to authenticate requests to your destination. If you want to use mutual authentication, you need to provide both the client certificate and the client key in PEM format.
client_key The private key for back-end authentication in non-encrypted PKCS8 format. If you want to use mutual authentication, you need to provide both the client certificate and the client key.
loggly
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
auth_token ✔ The HTTP code for your Loggly bulk endpoint.
content_type The content type to pass in the log file header.
tags Tags to segment and filter log events in Loggly.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
new_relic
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
auth_token ✔ Your account's API key.
content_type The content type to pass in the log file header.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
oracle
access_key ✔ The account access key for authentication.
bucket ✔ The bucket name.
compress_logs ✔ Boolean that sets the compression of logs.
display_name ✔ The connector's name.
namespace ✔ The Oracle Cloud storage account's namespace.
path ✔ The path to the log storage folder.
region ✔ The region where the bucket resides.
secret_access_key ✔ The secret access key used to authenticate requests to the Oracle Cloud account.
s3
access_key ✔ The account access key for authentication.
bucket ✔ The bucket name.
display_name ✔ The connector's name.
path ✔ The path to the log storage folder.
region ✔ The region where the bucket resides.
secret_access_key ✔ The secret access key used to authenticate requests to the Amazon S3 account.
compress_logs Boolean that sets the compression of logs.
splunk
display_name ✔ The connector's name.
event_collector_token ✔ The Splunk account's event collector token.
endpoint ✔ The storage endpoint for the logs.
client_key The private key for back-end authentication in non-encrypted PKCS8 format. If you want to use mutual authentication, you need to provide both the client certificate and the client key.
ca_cert The certification authority (CA) certificate used to verify the origin server's certificate. If the certificate is not signed by a well-known certification authority, enter the CA certificate in PEM format for verification.
client_cert The digital certificate in PEM format you want to use to authenticate requests to your destination. If you want to use mutual authentication, you need to provide both the client certificate and the client key in PEM format.
m_tls Boolean that sets mTLS enablement.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
tls_hostname The hostname that verifies the server's certificate and matches the Subject Alternative Names (SANs) in the certificate. If not provided, DataStream fetches the hostname from the endpoint URL.
compress_logs Boolean that sets the compression of logs.
sumologic
collector_code ✔ The Sumo Logic endpoint's HTTP collector code.
content_type ✔ The content type to pass in the log file header.
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
compress_logs Boolean that sets the compression of logs.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
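
For example, a custom HTTPS endpoint connector that uses the required arguments above plus BASIC authentication might look like this. The endpoint URL and credentials are placeholders.

```hcl
https_connector {
  authentication_type = "BASIC"
  display_name        = "my_https_connector"
  content_type        = "application/json"
  endpoint            = "https://logs.example.com/datastream"
  user_name           = "my_user"
  password            = "my_password"
  compress_logs       = true
}
```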

Choose data sets

Data set fields represent the types of data collected and returned in your log files.

Choose data set fields and add their IDs as a comma-separated list of integers in the dataset_fields argument. The order in which you place the IDs determines the order in which the fields appear in log files.

🚧

For fields that require additional behaviors, wait to adjust your property configuration until after data stream creation.

ID Field name Description
Log information
999 Stream ID The ID for the stream that logged the request data. You can log this field to troubleshoot and group logs between different streams.
1000 CP code The CP code associated with the request.
1002 Request ID The request ID.
1100 Request time The time when the edge server accepted the request from the client.
2024 Edge attempts The number of attempts to download the content from the edge in a specific time interval. Value based on the number of total manifest requests received.
Message exchange data
1005 Bytes The content bytes served in the response body. For HTTP/2, this includes overhead bytes.
1006 Client IP The requesting client's IPv4 or IPv6 address.
1008 HTTP status code The returned HTTP response code.
1009 Protocol type The request-response scheme, either HTTP or HTTPS.
1011 Request host The value of the host in the request header.
1012 Request method A request's HTTP method.
1013 Request path The path to a resource in the request, excluding query parameters.
1014 Request port The client TCP port number of the requested service.
1015 Response Content-Length The size of the entity-body in bytes returned to the client.
1016 Response Content-Type The type of the content returned to the client.
1017 User-Agent The URI-encoded user agent making the request.
2001 TLS overhead time The time in milliseconds between when the edge server accepts the connection and the completion of the SSL handshake.
2002 TLS version The protocol of the TLS handshake, either TLSv1.2 or TLSv1.3.
2003 Object size The size of the object, excluding HTTP response headers.
2004 Uncompressed size The size of the uncompressed object if zipped before sending to the client.
2006 Overhead bytes TCP overhead in bytes for the request and response.
2008 Total bytes The total bytes served in the response, including content and HTTP overhead.
2009 Query string The query string in the incoming URL from the client. To monitor this parameter in your logs, you need to update your property configuration to set the cache key query parameters behavior to include all parameters.
2023 File size bucket Groups of response content sorted into different buckets by size in kilobytes, megabytes, and gigabytes.
2060 Brotli status This field reports the status when serving a Brotli-compressed object. This field is available only for Ion Standard, Ion Premier, and Ion Media Advanced products. For details, see Brotli status.
2061 Origin Content-Length The compressible content-length object value, in bytes, in the response header from the origin. This field is only available for Ion Standard, Ion Premier, and Ion Media Advanced products.
2062 Download initiated The number of successful download initiations in a specific time interval.
2063 Download completed The number of successful downloads completed.
Request header data
1019 Accept-Language The list of languages acceptable in the response.
1023 Cookie A list of HTTP cookies previously sent by the server with the Set-Cookie header.
1031 Range The requested entity part returned.
1032 Referer The address of the resource that forwarded the request URL.
1037 X-Forwarded-For The originating IP address of a client connecting to a web server through an HTTP proxy or load balancer.
2005 Max-Age The time in seconds a response object is valid for positive cache responses.
Network performance data
1033 Request end time The time in milliseconds it takes the edge server to fully read the request.
1068 Error code A description detailing the issue with serving a request.
1102 Turn around time The time in milliseconds from when the edge server receives the last byte of the request to when it sends the first bytes of the response.
1103 Transfer time The time in milliseconds from when the edge server is ready to send the first byte of the response to when the last byte reaches the kernel.
2007 DNS lookup time The time in seconds between the start of the request and the completion of the DNS lookup, if one was required. For cached IP addresses, this value is zero.
2021 Last byte The last byte of the object that was served in a response. 0 indicates a part of a byte-range response. This field is available for all products supported by DataStream.
2022 Asnum The Autonomous System Number (ASN) of the request's internet service provider.
2025 Time to first byte The time taken to download the first byte of the received content in milliseconds.
2026 Startup errors The number of download initiation failures in a specific time interval.
2027 Download time The time taken to download the object in milliseconds.
2028 Throughput The byte transfer rate for the selected time interval in kilobits per second.
Cache data
2010 Cache status Returns 0 if there was no object in the cache, and 1 if the object was present in the cache. In the event of negatively cached errors or stale content, the object is served from upstream even if cached.
2011 Cache refresh source ?
2019 Cacheable Returns 1 if the object is cacheable based on response headers and metadata, and 0 if the object is not cacheable.
2020 Breadcrumbs Returns additional breadcrumbs data about the HTTP request-response cycle for improved visibility into the Akamai platform, such as the IP of the node or host, component, request end, turnaround, and DNS lookup time. This field is available only for Adaptive Media Delivery, Download Delivery, Object Delivery, Dynamic Site Accelerator, Ion Standard, Ion Premier, and API Acceleration products.

To log this parameter for Dynamic Site Accelerator, Ion Standard, and API Acceleration, you need to enable the breadcrumbs behavior in your stream's property configuration. For details, see Breadcrumbs.
Geo data
1066 Edge IP The IP address of the edge server that served the response to the client. This is useful when resolving issues with your account representative.
2012 Country/Region The ISO code of the country or region where the request originated.
2013 State The state or province where the request originated.
2014 City The city where the request originated.
2052 Server country/region The ISO code of the country or region from where the request was served.
2053 Billing region The Akamai geographical price zone for where the request was served.
Web security
2050 Security rules Returns data on security policy ID, non-deny, and deny rules when the request triggers any configured WAF or Bot Manager rules. Requires configuring the Web Application Firewall (WAF) behavior in your property or adding hostnames in your security configurations.
EdgeWorkers
3000 EdgeWorkers usage Returns EdgeWorkers data for client requests and responses if EdgeWorkers is enabled. The field format is: //[EdgeWorkers-Id]/[Version]/[Event Handler]/[Off Reason]/[Logic Executed]/[Status]/#[Metrics].
3001 EdgeWorkers execution Returns EdgeWorkers execution information if enabled, including the stage of execution, the EdgeWorker ID, process, total, and total stage time in milliseconds, used memory (in kilobytes), ghost flow, error code, HTTP status change when the response is generated using the API, CPU flits consumed during processing, tier ID for the request, indirect CPU time (in milliseconds) and ghost error code.
Media
2080 CMCD Returns a Common Media Client Data (CMCD) payload with detailed data on media traffic. This field is available only for the Adaptive Media Delivery product. For details, see Common media client data.
2081 Delivery type Limits logged data to a specific media delivery type, such as live or video on demand.
2082 Delivery format The format of the delivered media content.
2083 Media encryption Returns 1 if media encryption is enabled for the content delivered from the edge to the client.
Content protection
3011 Content protection information Returns Enhanced Proxy Detection (EPD) data, including the GeoGuard category and the action EPD performed on the request.
Midgress traffic
2084 Prefetch midgress hits The midgress traffic within the Akamai network, such as between two edge servers. To use this, enable the collect_midgress_traffic option in the DataStream behavior for your property in Property Manager. As a result, the second slot in the log line returns processing information about a request.
  • 0, if the request was processed between the client device and edge server (CLIENT_REQ), and isn't logged as midgress traffic.
  • 1, if the request was processed by an edge server within the region (PEER_REQ), and is logged as midgress traffic.
  • 2, if the request was processed by a parent Akamai edge server in the parent-child hierarchy (CHILD_REQ), and is logged as midgress traffic.
Custom fields
1082 Custom field The data specified in the custom log field of the log requests details that you want to receive in the stream. For details, see Custom log field.
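
To check which field IDs are available for your product before building the list, you can query the dataset fields data source. This is a sketch; the product ID is a placeholder.

```hcl
data "akamai_datastream_dataset_fields" "my_fields" {
  product_id = "prd_Adaptive_Media_Delivery"
}

output "available_dataset_fields" {
  value = data.akamai_datastream_dataset_fields.my_fields
}
```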

Construct the resource

Add your connector block and data stream fields and property lists to the remaining required arguments to create a data stream.

Argument Description
Required
active Whether your data stream is activated when it's created.

Important: Because the data stream creation process can take up to a few hours, set the value to true; activating on creation removes a second round of processing for activation.
delivery_configuration A set that provides configuration information for the logs.
  • field_delimiter. Sets a space as a delimiter to separate data set fields in log lines. Value is SPACE. If used, you must also set the format argument to STRUCTURED.
  • format. Required. The format in which you want to receive log files, STRUCTURED or JSON. If you've used a delimiter, the format must be STRUCTURED.
  • frequency. Required. A set that includes interval_in_secs, the time in seconds after which the system bundles log lines into a file and sends the file to a destination. Possible values are 30 and 60.
  • upload_file_prefix. The log file prefix to send to a destination. Maximum 200 characters. If unspecified, it defaults to ak.
  • upload_file_suffix. The log file suffix to send to a destination. Maximum 10 characters. If unspecified, it defaults to ds.
contract_id Your contract's ID.
dataset_fields A list of IDs for the data set fields within the product for which you want to receive logs. The order of the IDs defines their order in the log lines. For values, use the dataset_fields data source to get the available fields for your product.
group_id Your group's ID.
properties A list of properties the data stream monitors. Data can only be logged on active properties.
stream_name The name of your stream.
<connector>_connector Destination details for the data stream. Replace <connector> with the respective type listed in the connector table.
Optional
notification_emails A list of email addresses to which the data stream's activation and deactivation status are sent.
collect_midgress Boolean that sets the collection of midgress data.
resource "akamai_datastream" "my_datastream" {
  active = true
  delivery_configuration {
    field_delimiter = "SPACE"
    format          = "STRUCTURED"
    frequency {
      interval_in_secs = 30
    }
    upload_file_prefix = "prefix"
    upload_file_suffix = "suffix"
  }
  contract_id = "C-0N7RAC7"
  dataset_fields = [
    1000, 1002, 1102
  ]
  group_id = 12345
  properties = [
    12345, 98765
  ]
  stream_name = "Datastream_Example1"
  gcs_connector {
    bucket               = "my_bucket"
    display_name         = "my_connector_name"
    path                 = "akamai/logs"
    private_key          = "-----BEGIN PRIVATE KEY-----\nprivate_key\n-----END PRIVATE KEY-----\n"
    project_id           = "my_project_id"
    service_account_name = "my_service_account_name"
  }
  notification_emails = [
    "example1@example.com",
    "example2@example.com",
  ]
  collect_midgress = true
}

There is no default standard output because the attribute values are sensitive. You can get your data stream's ID from the last line of the process log or by using the data stream data source.

akamai_datastream.my_datastream: Creation complete after 1h20m16s [id=12345]
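
If the ID scrolled past, a lookup against your group lists the streams and their IDs. This is a sketch that assumes the akamai_datastreams data source; the group ID is a placeholder.

```hcl
data "akamai_datastreams" "my_streams" {
  group_id = "12345"
}

output "my_streams" {
  value = data.akamai_datastreams.my_streams
}
```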

Use the ID to connect your data stream to your property in the data stream rule.

Add rules and behaviors

Add the Datastream rule with its datastream behavior, along with any additional behaviors your data sets require, to your properties' default rule. For options configuration, see the datastream behavior.

  • For more than one property, loop through each property to add the rule.
  • If you use includes in your rule tree, activate them before you activate your property.
{
    "name": "Datastream",
    "children": [],
    "behaviors": [
        {
            "name": "datastream",
            "options": {}
        }
    ],
    "criteria": [],
    "criteriaMustSatisfy": "all"
}
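
When looping through exported properties, you can splice the rule in with jq. For demonstration, this sketch starts from a minimal rules.json; in practice, point jq at the rules file the CLI export produced for each property.

```shell
# Create a minimal sample rule tree to demonstrate against.
printf '%s' '{"rules":{"name":"default","children":[],"behaviors":[]}}' > rules.json

# Append the Datastream rule to the default rule's children.
RULE='{"name":"Datastream","children":[],"behaviors":[{"name":"datastream","options":{}}],"criteria":[],"criteriaMustSatisfy":"all"}'
jq --argjson rule "$RULE" '.rules.children += [$rule]' rules.json > rules.tmp && mv rules.tmp rules.json
```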

Activate properties

Activate your properties on a network to start collecting data with your data stream.

The required arguments for a property activation are property_id, contact, and version. If you don't specify a network, the default action targets staging.

// Change the network value to production for the production network
resource "akamai_property_activation" "my_activation" {
     property_id                    = "prp_12345"
     network                        = "staging"
     contact                        = ["jsmith@example.com"]
     note                           = "Sample activation"
     version                        = "1"
     auto_acknowledge_rule_warnings = true
     timeouts {
       default = "1h"
     }
}