Regular expressions with Cloudlets rules

Regular expression support

Some Cloudlets support the use of regular expressions in match rules. The regular expression can be up to 256 characters.

RE2 regular expression library

Cloudlets use the RE2 regular expression library, which is built on a finite-state machine (FSM) computational model and exposes a C++ interface. For information about RE2's syntax, see the Syntax documentation in the RE2 repository on GitHub, or a similar reference.

Cloudlets that support regular expressions

These Cloudlets support regular expressions in match rules:

Audience Segmentation

  • Use regular expressions to match on the fully-qualified URL of an incoming request.
  • Use regular expression capture groups to form the forward path and query string.

Edge Redirector

  • Use regular expressions to match on the fully-qualified URL of an incoming request.
  • Use regular expression capture groups to form the redirect URL.

Forward Rewrite

  • Use regular expressions to match on the fully-qualified URL of an incoming request.
  • Use regular expression capture groups to form the forward path and query string.

Input Validation

  • Use regular expressions to match on the fully-qualified URL of an incoming request.
  • Use regular expressions to match on form field names or values in the incoming request.

    Valid characters

    The following diagram shows the characters you can use for each part of the URL when creating a regular expression for Cloudlets.

    [Diagram: valid characters for each part of a URL in a Cloudlets regular expression]

    Escaping special characters

    If your regular expression includes any characters that have a special use in regular expressions (like “.”, “+”, or “?”), you must use a backslash (“\”) to escape each special character.
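The effect of escaping can be sketched with Python's built-in `re` module. This is only a stand-in: Cloudlets use RE2, which differs from `re` in some features, and the hostname below is a hypothetical value chosen for illustration.

```python
import re

# Hypothetical hostname, used only for illustration.
literal_host = "www.example.com"

# Unescaped: each "." matches any single character.
unescaped = re.compile(literal_host)

# Escaped: re.escape() puts a backslash before each special character,
# so "." matches only a literal dot.
escaped = re.compile(re.escape(literal_host))

lookalike = "wwwXexample.com"

print(escaped.pattern)
print(unescaped.fullmatch(lookalike) is not None)  # the unescaped dot accepts "X"
print(escaped.fullmatch(lookalike) is not None)    # the escaped dot rejects "X"
```

Without the backslashes, the pattern accepts hostnames you probably did not intend to match.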

    Capture groups and substitution patterns

    Capture groups let you capture information from the incoming source URL, and substitution patterns let you refer to those capture groups in the modified URL. A substitution pattern is a backslash (“\”) followed by the number of the capture group. For example, \1 refers to the first capture group, \2 to the second, and so on.

    Note: Input Validation does not use capture groups.

    The following diagram shows how you can set up numbered capture groups on parts of an inbound URL for Edge Redirector.

    [Diagram: numbered capture groups on parts of an inbound URL for Edge Redirector]
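As a rough illustration of numbered capture groups and backslash substitution, the sketch below uses Python's `re` module as a stand-in for RE2 (an assumption; the two libraries are not identical). The regex and redirect pattern are the Edge Redirector examples used on this page, and the incoming URL is a made-up test value.

```python
import re

# Regex and substitution pattern as shown on this page.
regex = r"(http|https)://www.(vanity|vanity1).com/(.*)"
redirect_template = r"\1://www.company.com/\2/\3"

# Hypothetical incoming URL, used only for illustration.
incoming_url = "https://www.vanity1.com/sale/shoes?color=red"

match = re.match(regex, incoming_url)

# expand() replaces \1, \2 and \3 with the text captured by each group.
redirect_url = match.expand(redirect_template) if match else None

print(match.groups() if match else None)
print(redirect_url)
```

Here \1 carries the protocol, \2 the vanity name, and \3 everything after the first slash that follows the hostname.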

    Named capture groups

    The URL Regular Expression match for Cloudlets supports a maximum of nine numbered substitutions using capture groups.

    To use a named capture group, write it as (?P<name>regex), where name is the group name and regex is the pattern to capture. Because the RE2 library also numbers named capture groups, refer to them in the substitution string by number, using \n.

    Remember that the maximum size of a regular expression is 256 characters.

    Here's how a regex for Edge Redirector would look using both capture group methods:

    Numbered capture group

  • **Regex:** (http|https)://www.(vanity|vanity1).com/(.*)
  • **Redirect URL:** \1://www.company.com/\2/\3

    Named capture group

  • **Regex:** (?P<scheme>http|https)://www.(?P<site>vanity|vanity1).com/(?P<path>.*)
  • **Redirect URL:** \1://www.company.com/\2/\3
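The following sketch suggests why both capture group methods can share the same redirect pattern. It uses Python's `re` module, which accepts the same (?P<name>regex) syntax, as a stand-in for RE2. The group names `scheme`, `site`, and `path` and the test URL are assumptions made for this example.

```python
import re

# Group names (scheme, site, path) are illustrative assumptions.
named = re.compile(
    r"(?P<scheme>http|https)://www.(?P<site>vanity|vanity1).com/(?P<path>.*)"
)
numbered = re.compile(r"(http|https)://www.(vanity|vanity1).com/(.*)")

# The same numbered substitution pattern is used for both regexes.
template = r"\1://www.company.com/\2/\3"

# Hypothetical test URL.
url = "http://www.vanity.com/about"

# Named groups still receive numbers in the order they open.
print(dict(named.groupindex))

named_result = named.match(url).expand(template)
numbered_result = numbered.match(url).expand(template)

print(named_result)
print(named_result == numbered_result)
```

The names mainly help readability of the regex; the substitution pattern still refers to groups by position.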
    Using the Regex Tester

    You can use the Regex Tester to verify that the regular expression and substitution patterns you enter will produce the desired results.

    You can access the Regex Tester when you select the URL Regular Expression match type on the Create a Rule screen.

    How to

    1. In the main menu, go to CDN > Edge logic Cloudlets.

    2. On the Cloudlet Policies screen, select a policy.

    3. On the Policy Details screen, either create a new version or click the version you want to view.

    4. On the rule manager page, click Add Rule.

    5. On the Create a Rule screen:

      1. Select URL Regular Expression.

      2. Click Show Regex Tester.

    6. Complete the following fields for the Regex Tester:

      • Regular Expression: Enter a regular expression of up to 256 characters to match on the inbound URL, minus the port. For example: (http|https)://www.(vanity|vanity1).com/(.*)

      • Redirect URL (Edge Redirector): If applicable, enter the substitution pattern to create the modified URL. For example: \1://www.company.com/\2/\3

        Note that the pattern uses a backslash ("\"), not a dollar sign ("$").

      • Path and Query String (Forward Rewrite): Enter the substitution pattern to create the path and query string that will form the modified URL. To define the substitution pattern, use the capture groups in the Regular Expression field. For example: /\1/\2

        Note that the pattern uses a backslash ("\"), not a dollar sign ("$").

      • Test URL: Enter a URL to test whether the regular expression and any capture groups entered produce the desired URL.

    Note: In the THEN section of the rule screen, the Redirect URL or Path and Query String field is disabled, but displays values entered in the same field for the Regex Tester.

    7. Click Validate.

    8. In the results section, verify that the values entered produced the desired URLs.
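If you want a rough local check before opening the Regex Tester, something like the sketch below may help. It is not the Regex Tester itself: it uses Python's `re` module rather than RE2, `check_rule` and `strip_port` are hypothetical helper names, and matching from the start of the URL is an assumption. Only the 256-character limit and the "minus the port" behavior come from this page.

```python
import re
from urllib.parse import urlsplit, urlunsplit

MAX_REGEX_LENGTH = 256  # limit stated on this page


def strip_port(url):
    """Return the URL without its port, since the match excludes the port."""
    parts = urlsplit(url)
    if parts.port is None:
        return url
    host = parts.netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(netloc=host))


def check_rule(regex, template, test_url):
    """Hypothetical helper: return the modified URL, or None if nothing matches."""
    if len(regex) > MAX_REGEX_LENGTH:
        raise ValueError("regular expression exceeds 256 characters")
    # Assumption: the regex is applied from the start of the URL.
    match = re.match(regex, strip_port(test_url))
    return match.expand(template) if match else None


result = check_rule(
    r"(http|https)://www.(vanity|vanity1).com/(.*)",
    r"\1://www.company.com/\2/\3",
    "https://www.vanity.com:8443/a/b?x=1",  # hypothetical test URL
)
print(result)
```

Treat the output as a first pass only, and confirm the final behavior with the Regex Tester and with testing of the policy.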

    General considerations when using regular expressions

    Before using the URL Regular Expression match with Cloudlets, consider the following:

    • There is a maximum processing cost per policy. Regular expressions have a very high processing cost, often 100 times that of other match criteria. The actual number of rules processed per policy depends on the complexity of the regular expressions defined. You can exceed the maximum cost for the policy with as few as 50 to 100 regular expressions.

    • Only use regular expressions and capture groups if you need to extract a value and use it in either a redirect or a forward path. They add significant cost.

    • When using regular expressions, you can reduce the cost by constructing your rule to first match based on path or query string before matching on the regex. The path and query string matches both allow wildcards.

    • Don't include the incoming protocol in the regex if the redirect path uses the same protocol. In the regex implementation for Cloudlets, the incoming protocol is included by default.

      For example, if your regex is ^(https?)://www.test.com/(.*) and the redirect is \1://www.test1.com/\2, you can use www.test.com/(.*) as the regex and www.test1.com/\1 as the redirect instead.

    • Because processing errors occur at runtime, thorough testing is currently the only way to determine whether your policy will exceed the maximum processing cost. If you hit the maximum during testing, try making your regular expressions more efficient, and follow the best practices listed below.

    Best practices when using regular expressions

    If you need to use a regex match, consider following these best practices:

    • Don't use regular expressions to specify alternatives. For example, if you want to match on a limited number of paths, like http://www.example.com/(choice1|choice2|choice3)/, create separate rules for each option instead:

      • http://www.example.com/choice1/
      • http://www.example.com/choice2/
      • http://www.example.com/choice3/

      In this case, using the regular expression reduces the number of rules you have, but it increases the cost significantly. Remember, the cost to evaluate one regular expression is often 100 times that of the corresponding set of rules rewritten without regular expressions.

    • Review the list of rules for the entire policy version and sort based on the following order of precedence:

      • Protocol (HTTP/HTTPS)
      • Hostname
      • Path
      • Query String
    • If you have to use regular expressions in a rule, include a combination of hostname, path, and query string matches whenever possible to reduce the cost. For example:

      Example 1: You want to extract the product ID from a query string parameter, and redirect using the ID as a path parameter.

      • Match structure:
        1. Query String match: prod_id=*
        2. Regex match: ^https?://host1.example.com/path1(?:.*)[?&]prod_id(?:=([^&]*))?
      • Substitution pattern for redirect: https://host2.example.com/products/\1

      Example 2: You want to capture everything after /path1/* on host1.example.com and re-route to /path2/ on host2.example.com.

      • Match structure:
        1. Path match: /path1/*
        2. Regex match: ^https?://host1.example.com/path1/?(.*)?
      • Substitution pattern for redirect: https://host2.example.com/path2/\1
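The idea of matching on a cheap path wildcard before the regex can be sketched as follows, using the path example above. This is an illustration rather than how Cloudlets evaluates rules: Python's `re` stands in for RE2, `fnmatchcase` is assumed to approximate the path wildcard, and `redirect_for` is a hypothetical helper name.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Values taken from the path example above.
PATH_WILDCARD = "/path1/*"
REGEX = re.compile(r"^https?://host1.example.com/path1/?(.*)?")
REDIRECT = r"https://host2.example.com/path2/\1"


def redirect_for(url):
    """Hypothetical helper: apply the cheap path match first, then the regex."""
    # Cheap check: skip the regex entirely when the path cannot match.
    if not fnmatchcase(urlsplit(url).path, PATH_WILDCARD):
        return None
    match = REGEX.match(url)
    return match.expand(REDIRECT) if match else None


print(redirect_for("https://host1.example.com/path1/docs/intro"))
print(redirect_for("https://host1.example.com/other/page"))
```

Requests that fail the path match never reach the regex, which is where most of the evaluation cost would otherwise be spent.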

