
Protect your webpages from scrapers

Prevent harmful web scrapers from accessing your webpages, while still allowing beneficial bots to remain.

What you'll do

Create customizable rules that define how to monitor and mitigate web scrapers.

Create a content protection rule

Each rule requires a scope and response strategy. The scope defines the part of your site to protect, while the response determines the action to take on web scrapers.

Use the Content Protector API or Content Protector in Control Center to create a rule. After you create the rule, you can start using the resource in Terraform.

If you already know which rule conditions to specify, you can create the resource directly with Terraform. Either provide a file path to the conditions, as shown in the example below, or add them inline as a JSON-encoded array. Then run terraform validate to verify your syntax and terraform apply to create the rule.

resource "akamai_botman_content_protection_rule" "my_content_protection_rule" {
  config_id               = 12345
  security_policy_id      = "1234_56789"
  content_protection_rule = file("${path.module}/content-protection-rule.json")
}
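As an alternative to a file path, you can build the rule body inline with Terraform's built-in jsonencode function. In this sketch the body is left as a placeholder, since the actual rule schema comes from the rule you export from the API or Control Center.

# Sketch: supply the rule body inline with jsonencode() instead of file().
# The object contents are a placeholder, not the real Content Protector
# schema; paste in the JSON exported from the API or Control Center.
resource "akamai_botman_content_protection_rule" "my_inline_rule" {
  config_id          = 12345
  security_policy_id = "1234_56789"
  content_protection_rule = jsonencode({
    # Replace with the exported rule conditions.
  })
}

Keeping the body inline avoids a separate file but makes long rule definitions harder to read; for anything nontrivial, the file() approach above is usually cleaner.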

Reorder content protection rules

Content Protector evaluates rules from top to bottom. If rules overlap or conflict, the rule at the beginning of the list takes precedence; by default, that's the most recently created rule.

  1. Create an array of rule IDs in the new order, with the rule you want evaluated first at the beginning.
resource "akamai_botman_content_protection_rule_sequence" "my_content_protection_rule_sequence" {
  config_id                   = 12345
  content_protection_rule_ids = ["1234abcd-5678-efgh-910i-jk11l12mn13o", "5678efgh-1234-abcd-24ef-689ghijk1011"]
}
  2. Run terraform validate to verify your syntax and then run terraform apply to save the new sequence.
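Rather than hard-coding rule IDs, you can reference them from the rule resources Terraform already manages, so the sequence stays correct if a rule is destroyed and recreated. This sketch assumes each rule resource exports a content_protection_rule_id attribute; verify the exact attribute name in the Akamai provider documentation.

# Sketch: build the sequence from resource references instead of literal IDs.
# Assumes each rule resource exports content_protection_rule_id; confirm the
# attribute name against the Akamai Terraform provider reference.
resource "akamai_botman_content_protection_rule_sequence" "ordered" {
  config_id = 12345
  content_protection_rule_ids = [
    akamai_botman_content_protection_rule.rule_evaluated_first.content_protection_rule_id,
    akamai_botman_content_protection_rule.rule_evaluated_second.content_protection_rule_id,
  ]
}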

Create a JavaScript injection rule

Content Protector automatically injects JavaScript into all HTML content included in your content protection rule scopes. The JavaScript collects data that identifies bots. However, if a rule's scope covers only AJAX requests, it can miss the HTML that calls that AJAX. If that HTML doesn't otherwise need protection, you don't need to add it to your scope. Instead, create a JavaScript injection rule to inject JavaScript into the HTML that calls your AJAX content. This improves Content Protector's accuracy and reduces false positives.

Use the Content Protector API or Content Protector in Control Center to create a rule. After you create the rule, you can start using the resource in Terraform.

If you already know which rule conditions to specify, you can create the resource directly with Terraform. Either provide a file path to the conditions, as shown in the example below, or add them inline as a JSON-encoded array. Then run terraform validate to verify your syntax and terraform apply to create the rule.

📘 To create the rule conditions file or JSON array, use the Content Protector API or Content Protector in Control Center to create a rule. Then export it, or use the corresponding data source to get the JSON.

resource "akamai_botman_content_protection_javascript_injection_rule" "my_javascript_injection_rule" {
  config_id                                       = 123456
  security_policy_id                              = "abc1_12345"
  content_protection_javascript_injection_rule_id = "abcd1ef2-345g-6789-h012-34ij5kl6mn7o"
  content_protection_javascript_injection_rule    = file("${path.module}/my_javascript_injection_rule.json")
}
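To retrieve the JSON for an existing rule, as the note above suggests, you could read it through the matching data source. The data source name and its json output attribute in this sketch are assumptions; confirm both against the Akamai Terraform provider reference before using them.

# Sketch: read an existing JavaScript injection rule so its JSON can be reused.
# The data source name and "json" attribute are assumptions; confirm them in
# the Akamai provider documentation.
data "akamai_botman_content_protection_javascript_injection_rule" "existing" {
  config_id          = 123456
  security_policy_id = "abc1_12345"
}

output "javascript_injection_rule_json" {
  value = data.akamai_botman_content_protection_javascript_injection_rule.existing.json
}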