Create data streams

Version 5.0 subprovider update

We've updated our DataStream subprovider to provide a better developer experience and support new platform capabilities.

These improvements introduce breaking changes that are incompatible with the previous version.

Version 4.1 will continue to work for existing integrations until January 2024. To update your integration, see the migration information.

Data streams are monitoring and analytical components you set up independently of and later assign to your properties.

This component-rule relationship lets you configure multiple data streams, each focused on gathering a specific type of data. When combined, data streams deliver customized metrics and health insights about your traffic across multiple properties.

What you'll do

Configure a data stream, set up delivery of raw data logs to a destination, and connect the stream to your properties.

Get your properties

The properties you use in DataStream integrations must have room to pick up a new data stream and be active on a network.

  • You can assign up to three data streams to a single property. To see how many data streams are assigned to a property, review its Datastream rule.
  • A single data stream can support up to 100 properties. Use the Data streams resource to get a list of its assigned properties.

Note: If needed, create a new property and activate it on a network.
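
A minimal resource sketch for a new property looks like this. All values are placeholders; the available products and rule formats depend on your contract.

```hcl
resource "akamai_property" "my_new_property" {
  name        = "my_new_property"
  contract_id = "ctr_C-0N7RAC7"
  group_id    = "grp_12345"
  product_id  = "prd_Object_Delivery"
  rule_format = "v2023-01-05"
}
```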

  1. Get a list of your properties and determine which of them you'll use with your data stream.

    data "akamai_properties" "my_properties" {
        group_id    = "12345"
        contract_id = "C-0N7RAC7"
    }
    
    output "my_properties" {
      value = data.akamai_properties.my_properties
    }
    
    + my_properties = {
          + contract_id = "C-0N7RAC7"
          + group_id    = "12345"
          + id          = "grp_12345ctr_C-0N7RAC7"
          + properties  = [
              + {
                  + contract_id        = "ctr_C-0N7RAC7"
                  + group_id           = "grp_12345"
                  + latest_version     = 8
                  + note               = "Added hostname."
                  + product_id         = "prd_Adaptive_Media_Delivery"
                  + production_version = 8
                  + property_id        = "prp_12345"
                  + property_name      = "my_property1"
                  + rule_format        = ""
                  + staging_version    = 3
                },
              + {
                  + contract_id        = "ctr_C-0N7RAC7"
                  + group_id           = "grp_12345"
                  + latest_version     = 3
                  + note               = "File type update."
                  + product_id         = "prd_Object_Delivery"
                  + production_version = 3
                  + property_id        = "prp_98765"
                  + property_name      = "my_property2"
                  + rule_format        = ""
                  + staging_version    = 2
                },
            ]
        }
    
  2. Export all chosen properties using the Terraform CLI.

    The CLI export command places property configuration files in your current directory. To export them to a different location, add the --tfworkpath <path> flag to the command.

    When exporting more than one property, loop through the CLI command with the --tfworkpath <path> flag, changing the property name and export location on every iteration.

    akamai terraform --edgerc <edgerc-file-location> --section <edgerc-section> export-property <property-name>
    
  3. Run the included import script to populate your Terraform state. This prevents Terraform from attempting to recreate your assets.
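
The export loop in step 2 can be sketched as a dry run that prints each export command instead of running it, so you can review before executing. The property names, edgerc path, section name, and ./export directory are placeholders.

```shell
# Print one export command per property and prepare a per-property
# export directory for the --tfworkpath flag.
for prop in my_property1 my_property2; do
  mkdir -p "./export/${prop}"
  echo "akamai terraform --edgerc ~/.edgerc --section default export-property ${prop} --tfworkpath ./export/${prop}"
done
```

Remove the echo once the printed commands look right.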

After you create a data stream, you'll update your properties' rule trees.

Create a data stream

To create a data stream, configure a log destination and choose data sets that shape what's monitored and collected about your traffic.

Average processing time: 90 minutes to 3 hours

Provide a destination

You can use a custom HTTPS endpoint or a third-party object storage location. Supported storage locations:

  • Amazon S3
  • Azure Storage
  • Datadog
  • Elasticsearch
  • Google Cloud Storage
  • Loggly
  • New Relic
  • Oracle Cloud
  • Splunk
  • Sumo Logic

Create a connector block for your data destination. To name the block, take the destination's heading from the argument table below and append _connector, for example, gcs_connector.

Include all required arguments for your destination.

gcs_connector {
    bucket               = "my_bucket"
    display_name         = "my_connector_name"
    path                 = "akamai/logs"
    private_key          = "-----BEGIN PRIVATE KEY-----\nprivate_key\n-----END PRIVATE KEY-----\n"
    project_id           = "my_project_id"
    service_account_name = "my_service_account_name"
  }
Argument Required Description
azure
access_key ✔ The account access key for authentication.
account_name ✔ The Azure Storage account.
display_name ✔ The connector's name.
container_name ✔ The Azure Storage container name.
path ✔ The path to the log storage folder.
compress_logs Boolean that sets the compression of logs.
datadog
auth_token ✔ Your account's API key.
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
tags The Datadog connector tags.
compress_logs Boolean that sets the compression of logs.
service The Datadog service connector.
source The Datadog source connector.
elasticsearch
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
user_name ✔ The BASIC user name for authentication.
password ✔ The BASIC password for authentication.
index_name ✔ The index name for where to store log files.
tls_hostname The hostname that verifies the server's certificate and matches the Subject Alternative Names (SANs) in the certificate. If not provided, DataStream fetches the hostname from the endpoint URL.
ca_cert The certification authority (CA) certificate used to verify the origin server's certificate. If the certificate is not signed by a well-known certification authority, enter the CA certificate in PEM format for verification.
client_cert The digital certificate in PEM format you want to use to authenticate requests to your destination. If you want to use mutual authentication, you need to provide both the client certificate and the client key in PEM format.
client_key The private key for back-end authentication in non-encrypted PKCS8 format. If you want to use mutual authentication, you need to provide both the client certificate and the client key.
m_tls Boolean that sets mTLS enablement.
content_type The content type to pass in the log file header.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
gcs
bucket ✔ The bucket name.
display_name ✔ The connector's name.
private_key ✔ A JSON private key for a Google Cloud Storage account.
project_id ✔ A Google Cloud project ID.
service_account_name ✔ The name of the service account with the storage object create permission or storage object creator role.
compress_logs Boolean that sets the compression of logs.
path The path to the log storage folder.
https
authentication_type ✔ Either NONE for no authentication or BASIC for username and password authentication.
display_name ✔ The connector's name.
content_type ✔ The content type to pass in the log file header.
endpoint ✔ The storage endpoint for the logs.
m_tls Boolean that sets mTLS enablement.
compress_logs Boolean that sets the compression of logs.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
password The BASIC password for authentication.
user_name The BASIC user name for authentication.
tls_hostname The hostname that verifies the server's certificate and matches the Subject Alternative Names (SANs) in the certificate. If not provided, DataStream fetches the hostname from the endpoint URL.
ca_cert The certification authority (CA) certificate used to verify the origin server's certificate. If the certificate is not signed by a well-known certification authority, enter the CA certificate in PEM format for verification.
client_cert The digital certificate in PEM format you want to use to authenticate requests to your destination. If you want to use mutual authentication, you need to provide both the client certificate and the client key in PEM format.
client_key The private key for back-end authentication in non-encrypted PKCS8 format. If you want to use mutual authentication, you need to provide both the client certificate and the client key.
loggly
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
auth_token ✔ The HTTP code for your Loggly bulk endpoint.
content_type The content type to pass in the log file header.
tags Tags to segment and filter log events in Loggly.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
new_relic
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
auth_token ✔ Your account's API key.
content_type The content type to pass in the log file header.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
oracle
access_key ✔ The account access key for authentication.
bucket ✔ The bucket name.
compress_logs ✔ Boolean that sets the compression of logs.
display_name ✔ The connector's name.
namespace ✔ The Oracle Cloud storage account's namespace.
path ✔ The path to the log storage folder.
region ✔ The region where the bucket resides.
secret_access_key ✔ The secret access key used to authenticate requests to the Oracle Cloud account.
s3
access_key ✔ The account access key for authentication.
bucket ✔ The bucket name.
display_name ✔ The connector's name.
path ✔ The path to the log storage folder.
region ✔ The region where the bucket resides.
secret_access_key ✔ The secret access key used to authenticate requests to the Amazon S3 account.
compress_logs Boolean that sets the compression of logs.
splunk
display_name ✔ The connector's name.
event_collector_token ✔ The Splunk account's event collector token.
endpoint ✔ The storage endpoint for the logs.
client_key The private key for back-end authentication in non-encrypted PKCS8 format. If you want to use mutual authentication, you need to provide both the client certificate and the client key.
ca_cert The certification authority (CA) certificate used to verify the origin server's certificate. If the certificate is not signed by a well-known certification authority, enter the CA certificate in PEM format for verification.
client_cert The digital certificate in PEM format you want to use to authenticate requests to your destination. If you want to use mutual authentication, you need to provide both the client certificate and the client key in PEM format.
m_tls Boolean that sets mTLS enablement.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
tls_hostname The hostname that verifies the server's certificate and matches the Subject Alternative Names (SANs) in the certificate. If not provided, DataStream fetches the hostname from the endpoint URL.
compress_logs Boolean that sets the compression of logs.
sumologic
collector_code ✔ The Sumo Logic endpoint's HTTP collector code.
content_type ✔ The content type to pass in the log file header.
display_name ✔ The connector's name.
endpoint ✔ The storage endpoint for the logs.
compress_logs Boolean that sets the compression of logs.
custom_header_name A custom header name passed with the request to the destination.
custom_header_value The custom header's value passed with the request to the destination.
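
For example, a custom HTTPS endpoint connector that uses the required arguments above plus BASIC authentication might look like this. The endpoint URL and credentials are placeholders.

```hcl
https_connector {
  authentication_type = "BASIC"
  display_name        = "my_https_connector"
  content_type        = "application/json"
  endpoint            = "https://logs.example.com/datastream"
  user_name           = "my_user"
  password            = "my_password"
  compress_logs       = true
}
```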

Choose data sets

Data set fields represent the types of data collected and returned in your log files.

Choose data set fields and add their IDs as a comma-separated list of integers in the dataset_fields argument. The order in which you place the IDs determines the order in which the fields appear in log files.

🚧

For fields that require additional behaviors, wait to adjust your property configuration until after data stream creation.

ID Field name Description
Log information
999 Stream ID The ID for the stream that logged the request data. You can log this field to troubleshoot and group logs between different streams.
1000 CP code The CP code associated with the request.
1002 Request ID The request ID.
1100 Request time The time when the edge server accepted the request from the client.
2024 Edge attempts The number of attempts to download the content from the edge in a specific time interval. Value based on the number of total manifest requests received.
Message exchange data
1005 Bytes The content bytes served in the response body. For HTTP/2, this includes overhead bytes.
1006 Client IP The requesting client's IPv4 or IPv6 address.
1008 HTTP status code The returned HTTP response code.
1009 Protocol type The request-response scheme, either HTTP or HTTPS.
1011 Request host The value of the host in the request header.
1012 Request method A request's HTTP method.
1013 Request path The path to a resource in the request, excluding query parameters.
1014 Request port The client TCP port number of the requested service.
1015 Response Content-Length The size of the entity-body in bytes returned to the client.
1016 Response Content-Type The type of the content returned to the client.
1017 User-Agent The URI-encoded user agent making the request.
2001 TLS overhead time The time in milliseconds between when the edge server accepts the connection and the completion of the SSL handshake.
2002 TLS version The protocol of the TLS handshake, either TLSv1.2 or TLSv1.3.
2003 Object size The size of the object, excluding HTTP response headers.
2004 Uncompressed size The size of the uncompressed object if zipped before sending to the client.
2006 Overhead bytes TCP overhead in bytes for the request and response.
2008 Total bytes The total bytes served in the response, including content and HTTP overhead.
2009 Query string The query string in the incoming URL from the client. To monitor this parameter in your logs, you need to update your property configuration to set the cache key query parameters behavior to include all parameters.
2023 File size bucket Groups of response content sorted into different buckets by size in kilobytes, megabytes, and gigabytes.
2060 Brotli status This field reports the status when serving a Brotli-compressed object. This field is available only for Ion Standard, Ion Premier, and Ion Media Advanced products. For details, see Brotli status.
2061 Origin Content-Length The compressible content-length object value, in bytes, in the response header from the origin. This field is only available for Ion Standard, Ion Premier, and Ion Media Advanced products.
2062 Download initiated The number of successful download initiations in a specific time interval.
2063 Download completed The number of successful downloads completed.
Request header data
1019 Accept-Language The list of languages acceptable in the response.
1023 Cookie A list of HTTP cookies previously sent by the server with the Set-Cookie header.
1031 Range The requested entity part returned.
1032 Referer The address of the resource that forwarded the request URL.
1037 X-Forwarded-For The originating IP address of a client connecting to a web server through an HTTP proxy or load balancer.
2005 Max-Age The time in seconds a response object is valid for positive cache responses.
Network performance data
1033 Request end time The time in milliseconds it takes the edge server to fully read the request.
1068 Error code A description detailing the issue with serving a request.
1102 Turn around time The time in milliseconds from when the edge server receives the last byte of the request to when it sends the first bytes of the response.
1103 Transfer time The time in milliseconds from when the edge server is ready to send the first byte of the response to when the last byte reaches the kernel.
2007 DNS lookup time The time in seconds between the start of the request and the completion of the DNS lookup, if one was required. For cached IP addresses, this value is zero.
2021 Last byte The last byte of the object that was served in a response. 0 indicates a part of a byte-range response. This field is available for all products supported by DataStream.
2022 Asnum The Autonomous System Number (ASN) of the request's internet service provider.
2025 Time to first byte The time taken to download the first byte of the received content in milliseconds.
2026 Startup errors The number of download initiation failures in a specific time interval.
2027 Download time The time taken to download the object in milliseconds.
2028 Throughput The byte transfer rate for the selected time interval in kilobits per second.
Cache data
2010 Cache status Returns 0 if there was no object in the cache, and 1 if the object was present in the cache. In the event of negatively cached errors or stale content, the object is served from upstream even if cached.
2011 Cache refresh source ?
2019 Cacheable Returns 1 if the object is cacheable based on response headers and metadata, and 0 if the object is not cacheable.
2020 Breadcrumbs Returns additional breadcrumbs data about the HTTP request-response cycle for improved visibility into the Akamai platform, such as the IP of the node or host, component, request end, turnaround, and DNS lookup time. This field is available only for Adaptive Media Delivery, Download Delivery, Object Delivery, Dynamic Site Accelerator, Ion Standard, Ion Premier, and API Acceleration products.

To log this parameter for Dynamic Site Accelerator, Ion Standard, and API Acceleration, you need to enable the breadcrumbs behavior in your stream's property configuration. For details, see Breadcrumbs.
Geo data
1066 Edge IP The IP address of the edge server that served the response to the client. This is useful when resolving issues with your account representative.
2012 Country/Region The ISO code of the country or region where the request originated.
2013 State The state or province where the request originated.
2014 City The city where the request originated.
2052 Server country/region The ISO code of the country or region from where the request was served.
2053 Billing region The Akamai geographical price zone for where the request was served.
Web security
2050 Security rules Returns data on security policy ID, non-deny, and deny rules when the request triggers any configured WAF or Bot Manager rules. Requires configuring the Web Application Firewall (WAF) behavior in your property or adding hostnames in your security configurations.
EdgeWorkers
3000 EdgeWorkers usage Returns EdgeWorkers data for client requests and responses if EdgeWorkers is enabled. The field format is: //[EdgeWorkers-Id]/[Version]/[Event Handler]/[Off Reason]/[Logic Executed]/[Status]/#[Metrics].
3001 EdgeWorkers execution Returns EdgeWorkers execution information if enabled, including the stage of execution, the EdgeWorker ID, process, total, and total stage time in milliseconds, used memory (in kilobytes), ghost flow, error code, HTTP status change when the response is generated using the API, CPU flits consumed during processing, tier ID for the request, indirect CPU time (in milliseconds) and ghost error code.
Media
2080 CMCD Returns a Common Media Client Data (CMCD) payload with detailed data on media traffic. This field is available only for the Adaptive Media Delivery product. For details, see Common media client data.
2081 Delivery type Limits logged data to a specific media delivery type, such as live or video on demand.
2082 Delivery format The format of the delivered media content.
2083 Media encryption Returns 1 if media encryption is enabled for the content delivered from the edge to the client.
Content protection
3011 Content protection information Returns Enhanced Proxy Detection (EPD) data, including the GeoGuard category and the action EPD performed on the request.
Midgress traffic
2084 Prefetch midgress hits The midgress traffic within the Akamai network, such as between two edge servers. To use this, enable the collect_midgress_traffic option in the DataStream behavior for your property in Property Manager. As a result, the second slot in the log line returns processing information about a request.
  • 0, if the request was processed between the client device and edge server (CLIENT_REQ), and isn't logged as midgress traffic.
  • 1, if the request was processed by an edge server within the region (PEER_REQ), and is logged as midgress traffic.
  • 2, if the request was processed by a parent Akamai edge server in the parent-child hierarchy (CHILD_REQ), and is logged as midgress traffic.
Custom fields
1082 Custom field The data specified in the custom log field of the log requests details that you want to receive in the stream. For details, see Custom log field.
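
To check which field IDs are available for your product before building the list, you can query the dataset fields data source. This is a sketch; the product ID is a placeholder.

```hcl
data "akamai_datastream_dataset_fields" "my_fields" {
  product_id = "prd_Adaptive_Media_Delivery"
}

output "available_dataset_fields" {
  value = data.akamai_datastream_dataset_fields.my_fields
}
```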

Construct the resource

Add your connector block and data stream fields and property lists to the remaining required arguments to create a data stream.

Argument Description
Required
active Whether your data stream is activated when it's created.

Important: Because the data stream creation process can take up to a few hours, set the value to true; activating on creation removes a second round of processing for activation.
delivery_configuration A set that provides configuration information for the logs.
  • field_delimiter. Sets a space as a delimiter to separate data set fields in log lines. Value is SPACE. If used, you must also set the format argument to STRUCTURED.
  • format. Required. The format in which you want to receive log files, STRUCTURED or JSON. If you've used a delimiter, the format must be STRUCTURED.
  • frequency. Required. A set that includes interval_in_secs, the time in seconds after which the system bundles log lines into a file and sends the file to a destination. Possible values are 30 and 60.
  • upload_file_prefix. The log file prefix to send to a destination. Maximum 200 characters. If unspecified, it defaults to ak.
  • upload_file_suffix. The log file suffix to send to a destination. Maximum 10 characters. If unspecified, it defaults to ds.
contract_id Your contract's ID.
dataset_fields A list of IDs for the data set fields within the product for which you want to receive logs. The order of the IDs defines their order in the log lines. For values, use the dataset_fields data source to get the available fields for your product.
group_id Your group's ID.
properties A list of properties the data stream monitors. Data can only be logged on active properties.
stream_name The name of your stream.
<connector>_connector Destination details for the data stream. Replace <connector> with the respective type listed in the connector table.
Optional
notification_emails A list of email addresses to which the data stream's activation and deactivation status are sent.
collect_midgress Boolean that sets the collection of midgress data.
resource "akamai_datastream" "my_datastream" {
  active = true
  delivery_configuration {
    field_delimiter = "SPACE"
    format          = "STRUCTURED"
    frequency {
      interval_in_secs = 30
    }
    upload_file_prefix = "prefix"
    upload_file_suffix = "suffix"
  }
  contract_id = "C-0N7RAC7"
  dataset_fields = [
    1000, 1002, 1102
  ]
  group_id = 12345
  properties = [
    12345, 98765
  ]
  stream_name = "Datastream_Example1"
  gcs_connector {
    bucket               = "my_bucket"
    display_name         = "my_connector_name"
    path                 = "akamai/logs"
    private_key          = "-----BEGIN PRIVATE KEY-----\nprivate_key\n-----END PRIVATE KEY-----\n"
    project_id           = "my_project_id"
    service_account_name = "my_service_account_name"
  }
  notification_emails = [
    "example1@example.com",
    "example2@example.com",
  ]
  collect_midgress = true
}

There is no default standard output because the attribute values are sensitive. You can get your data stream's ID from the last line of the process log or by using the data stream data source.

akamai_datastream.my_datastream: Creation complete after 1h20m16s [id=12345]
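
If the ID scrolled past, a lookup against your group lists the streams and their IDs. This is a sketch that assumes the akamai_datastreams data source; the group ID is a placeholder.

```hcl
data "akamai_datastreams" "my_streams" {
  group_id = "12345"
}

output "my_streams" {
  value = data.akamai_datastreams.my_streams
}
```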

Use the ID to connect your data stream to your property in the data stream rule.

Add rules and behaviors

Add the Datastream rule with its datastream behavior, along with any additional behaviors your data sets require, to your properties' default rule. For options configuration, see the datastream behavior.

  • For more than one property, loop through each property to add the rule.
  • If you use includes in your rule tree, activate them before you activate your property.
{
    "name": "Datastream",
    "children": [],
    "behaviors": [
        {
            "name": "datastream",
            "options": {}
        }
    ],
    "criteria": [],
    "criteriaMustSatisfy": "all"
}
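
When looping through exported properties, you can splice the rule in with jq. For demonstration, this sketch starts from a minimal rules.json; in practice, point jq at the rules file the CLI export produced for each property.

```shell
# Create a minimal sample rule tree to demonstrate against.
printf '%s' '{"rules":{"name":"default","children":[],"behaviors":[]}}' > rules.json

# Append the Datastream rule to the default rule's children.
RULE='{"name":"Datastream","children":[],"behaviors":[{"name":"datastream","options":{}}],"criteria":[],"criteriaMustSatisfy":"all"}'
jq --argjson rule "$RULE" '.rules.children += [$rule]' rules.json > rules.tmp && mv rules.tmp rules.json
```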

Activate properties

Activate your properties on a network to start collecting data with your data stream.

The required arguments for a property activation are property_id, contact, and version. If you don't specify a network, the default action targets staging.

// Change the network value to production for the production network
resource "akamai_property_activation" "my_activation" {
     property_id                    = "prp_12345"
     network                        = "staging"
     contact                        = ["jsmith@example.com"]
     note                           = "Sample activation"
     version                        = "1"
     auto_acknowledge_rule_warnings = true
     timeouts {
       default = "1h"
     }
}