Query or import large data sets

User data can be managed using the /entity API:

Endpoint              Description
/entity               Retrieves a single user record.
/entity.find          Retrieves a set of user records, determined by the filter applied.
/entity.create        Creates a single user record.
/entity.bulkCreate    Creates multiple user records in a single API call.
/entity.update        Updates only the specified attributes for an existing user record.
/entity.replace       Replaces all attributes for an existing user record; any attributes not specified are replaced with null values.
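
The difference between /entity.update and /entity.replace is easiest to see on a small record. The sketch below models the two behaviors on a plain Python dictionary. It makes no API calls, the function names apply_update and apply_replace are hypothetical helpers rather than part of the API, and it only considers flat, top-level attributes.

```python
def apply_update(record, changes):
    """Local model of /entity.update: only the specified attributes change."""
    updated = dict(record)
    updated.update(changes)
    return updated


def apply_replace(record, changes):
    """Local model of /entity.replace: unspecified attributes become null (None)."""
    replaced = {name: None for name in record}
    replaced.update(changes)
    return replaced


if __name__ == "__main__":
    # Hypothetical record used only for illustration.
    user = {"givenName": "Ada", "familyName": "Lovelace", "email": "ada@example.com"}
    print(apply_update(user, {"email": "new@example.com"}))
    print(apply_replace(user, {"email": "new@example.com"}))
```

With the same input, the update model keeps givenName and familyName, while the replace model sets both to None. That is why /entity.replace calls should carry the complete record.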

Querying large data sets

When you use the entity.find endpoint to iterate over a large data set (more than 100,000 records), optimize your queries by sorting on the id attribute, which follows the natural sort order of the database. This has two benefits:

  • Records created between the time iteration begins and the time it ends are included in the results.

  • Each page of results is queried and loaded with efficient, consistent performance.
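
To see why paging on id picks up records created during iteration, consider the small in-memory simulation below. It is only a sketch: find_page is a hypothetical stand-in for an entity.find call filtered on "id > last_id" and sorted on id, and it assumes that newly created records always receive a higher id than existing ones.

```python
def find_page(records, last_id, max_results):
    """Stand-in for entity.find with filter 'id > last_id', sorted on id."""
    matching = sorted(
        (r for r in records if r["id"] > last_id), key=lambda r: r["id"]
    )
    return matching[:max_results]


def iterate_all(records, max_results=2, on_page=None):
    """Walk the data set page by page, using the last seen id as the cursor."""
    seen = []
    last_id = 0
    while True:
        page = find_page(records, last_id, max_results)
        if not page:
            break  # no more results
        seen.extend(r["id"] for r in page)
        last_id = page[-1]["id"]
        if on_page is not None:
            on_page(records)  # lets the caller simulate writes mid-iteration
    return seen


if __name__ == "__main__":
    store = [{"id": 1}, {"id": 2}, {"id": 3}]

    def add_record_once(records):
        # Simulate a user record being created while iteration is in progress.
        if len(records) == 3:
            records.append({"id": 4})

    print(iterate_all(store, on_page=add_record_once))  # prints [1, 2, 3, 4]
```

Because each request asks only for ids greater than the last one seen, the record created after the first page is still returned by a later page, and no page needs to skip over rows that were already read.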

The following tips will help you optimize your queries:

  • Use the attributes parameter to limit the number of attributes returned for each record to minimize the size of the HTTP payload.

  • Experiment with the max_results parameter to optimize for responses under 10 seconds.

  • Include the timeout parameter (up to 60 seconds) if, and only if, you are unable to keep responses under 10 seconds using the max_results parameter.
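
One way to apply these tips is to adjust the page size from the measured response time and to add the timeout parameter only as a last resort. The helpers below are a minimal sketch under stated assumptions: next_max_results and build_find_params are hypothetical names, the halving and doubling rule is just one possible heuristic, and the ceiling of 1000 is an arbitrary illustrative cap rather than a documented limit.

```python
import json


def next_max_results(current, elapsed_seconds, target_seconds=10.0, ceiling=1000):
    """Suggest a page size for the next request based on the last response time."""
    if elapsed_seconds > target_seconds:
        return max(1, current // 2)        # too slow: halve the page size
    if elapsed_seconds < target_seconds / 2:
        return min(ceiling, current * 2)   # comfortably fast: try a larger page
    return current                         # close to the target: keep as is


def build_find_params(last_id, max_results, timeout=None):
    """Build entity.find form parameters; timeout is sent only when requested."""
    params = {
        "type_name": "user",
        "max_results": str(max_results),
        # Limit the attributes returned to keep the HTTP payload small.
        "attributes": json.dumps(["id", "uuid", "email"]),
        "sort_on": json.dumps(["id"]),
        "filter": "id > {}".format(last_id),
    }
    if timeout is not None:
        params["timeout"] = str(min(timeout, 60))  # the timeout tops out at 60 seconds
    return params
```

In a loop, you could time each request (for example with time.monotonic()), pass the elapsed time to next_max_results, and only start passing a timeout value once the page size cannot be reduced any further.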

The sample code below (written in Python) shows how to iterate over every record updated since January 1, 2016. Only the id, uuid, and email attributes are returned in the result set, and up to 100 records are returned with each request.

import requests
import json

last_id = 0
while True:
    response = requests.post(
        'https://YOUR_APP.janraincapture.com/entity.find',
        headers={
            'Authorization': 'Basic aW1fYV...NfbXk='
        },
        data={
            'type_name': 'user',
            'max_results': '100',
            'attributes': '["id", "uuid", "email"]',
            'sort_on': '["id"]',
            'filter': "id > {} and lastUpdated >= '2016-01-01'".format(last_id),
        }
    )
    json_resp = json.loads(response.text)
    if json_resp['stat'] == 'ok' and json_resp.get('result_count', 0) > 0:
        for record in json_resp['results']:
            # do something with record
            print(record)
            # update last_id variable with last record in the results
            last_id = record['id']
    else:
        # stop iterating when there are no more results
        break

Bulk data imports

If you need to import user records from an existing data store into the Identity Cloud platform, the /entity.bulkCreate API endpoint can be used for bulk loading data. The Janrain Data Loader is an example script utilizing this API that you may use to perform your own data migrations.

If you are considering utilizing this script, we recommend that you consult with Akamai Professional Services on setting appropriate arguments for batch size and rate limit. Always alert Akamai of the date and time you plan to run any bulk data events by submitting a Traffic Event request through the Support Portal.

Note that the entity.bulkCreate endpoint limits you to a body parameter no larger than 5 MB. If you encounter a "client intended to send too large body" error, you'll need to reduce the size of the body parameter (for example, by dividing the list of accounts to be created in half, and then making two API calls).
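
The halving approach described above can be sketched as a small recursive helper. This is an illustration, not the Data Loader's implementation: split_batches is a hypothetical name, the size check measures only the JSON text of the record list (form encoding adds overhead, so leave some headroom), and treating 5 MB as 5,000,000 bytes is a conservative assumption.

```python
import json

# Conservative reading of the 5 MB limit (assumption: decimal megabytes).
MAX_BODY_BYTES = 5 * 1000 * 1000


def split_batches(records, max_bytes=MAX_BODY_BYTES):
    """Split records into lists whose JSON encoding fits within max_bytes."""
    if not records:
        return []
    size = len(json.dumps(records).encode("utf-8"))
    if size <= max_bytes:
        return [records]
    if len(records) == 1:
        raise ValueError("a single record exceeds the size limit")
    # Too large: divide the list in half and check each half again.
    middle = len(records) // 2
    return (split_batches(records[:middle], max_bytes)
            + split_batches(records[middle:], max_bytes))
```

Each returned list can then be sent in its own /entity.bulkCreate call, in order, which keeps the original record order intact across requests.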

