Query or import large data sets

User data can be managed using the /entity API:

Operation             Description
/entity               Retrieves a single user record.
/entity.find          Retrieves a set of user records, determined by the filter applied.
/entity.create        Creates a single user record.
/entity.bulkCreate    Creates multiple user records in a single API call.
/entity.update        Updates only the specified attributes for an existing user record.
/entity.replace       Replaces all attributes for an existing user record; any attributes not specified are replaced with null values.

Querying large data sets

When using the entity.find operation to iterate over large data sets (more than 100,000 records), optimize your queries by sorting on the id attribute, which follows the natural database sort order. This has two benefits:

  • Records created between the time the iteration begins and the time it ends are included in the results.

  • Each page of results is queried and loaded with efficient, consistent performance.
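The first benefit can be illustrated with a small in-memory simulation. This is only a sketch: find_page is a hypothetical stand-in for entity.find, not a real API call, and it assumes that newly created records receive id values higher than any existing record.

```python
def find_page(store, last_id, max_results):
    """Hypothetical stand-in for entity.find: records with id > last_id, sorted by id."""
    matches = sorted((r for r in store if r["id"] > last_id), key=lambda r: r["id"])
    return matches[:max_results]


def iterate_by_id(store, max_results, on_page=None):
    """Walk the store page by page, resuming after the last id seen."""
    seen = []
    last_id = 0
    while True:
        page = find_page(store, last_id, max_results)
        if not page:
            break
        seen.extend(r["id"] for r in page)
        last_id = page[-1]["id"]
        if on_page is not None:
            on_page(store)  # simulate writes that happen between requests
    return seen


store = [{"id": i} for i in range(1, 6)]  # existing records with ids 1..5
added = []


def add_one_record(s):
    # Simulate a single new record arriving after the first page has been read.
    if not added:
        s.append({"id": 6})
        added.append(6)


ids = iterate_by_id(store, max_results=2, on_page=add_one_record)
print(ids)  # the record created mid-iteration (id 6) is included
```

Because each request resumes from the last id seen instead of a page offset, a record created partway through the run is picked up by a later page and no existing record is skipped or repeated.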

The following tips will help you optimize your queries:

  • Use the attributes member to limit the number of attributes returned for each record to minimize the size of the HTTP payload.

  • Experiment with the max_results member to optimize for responses under 10 seconds.

  • Include the timeout member (up to 60 seconds) if, and only if, you are unable to keep responses under 10 seconds using the max_results member.

The sample code below (written in Python) shows how to iterate over every record updated since January 1, 2016. Only the id, uuid, and email attributes are returned in the result set, and up to 100 records are returned with each request.

import requests

# id of the last record processed; each request resumes after it
last_id = 0
while True:
    response = requests.post(
        'https://YOUR_APP.janraincapture.com/entity.find',
        headers={
            'Authorization': 'Basic aW1fYV...NfbXk='
        },
        data={
            'type_name': 'user',
            'max_results': '100',
            'attributes': '["id", "uuid", "email"]',
            'sort_on': '["id"]',
            'filter': "id > {} and lastUpdated >= '2016-01-01'".format(last_id),
        }
    )
    json_resp = response.json()
    if json_resp['stat'] == 'ok' and json_resp.get('result_count', 0) > 0:
        for record in json_resp['results']:
            # do something with record
            print(record)
            # update last_id variable with last record in the results
            last_id = record['id']
    else:
        # stop iterating when there are no more results (or on an error response)
        break
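As a variation on the sample above, the request fields can be assembled by a small helper so that the timeout member is sent only when it is explicitly requested, in line with the tips listed earlier. This is a sketch rather than part of any official client: build_find_params is a hypothetical name, and the field names are taken from the sample code.

```python
import json


def build_find_params(last_id, since, attributes, max_results=100, timeout=None):
    """Build the form fields for an entity.find request that pages by id.

    Hypothetical helper: field names follow the sample code above.
    """
    if timeout is not None and not 0 < timeout <= 60:
        # The text above states that timeout may be at most 60 seconds.
        raise ValueError("timeout must be between 1 and 60 seconds")
    params = {
        "type_name": "user",
        "max_results": str(max_results),
        # Limit the attributes returned to keep the HTTP payload small.
        "attributes": json.dumps(attributes),
        "sort_on": json.dumps(["id"]),
        "filter": "id > {} and lastUpdated >= '{}'".format(last_id, since),
    }
    if timeout is not None:
        # Only sent when lowering max_results alone is not enough.
        params["timeout"] = str(timeout)
    return params


params = build_find_params(0, "2016-01-01", ["id", "uuid", "email"])
print(params["filter"])
```

The returned dictionary can be passed as the data argument of requests.post in place of the literal dictionary used in the sample.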

Bulk data imports

If you need to import user records from an existing data store into the Identity Cloud platform, the /entity.bulkCreate API operation can be used for bulk loading data. The Janrain Data Loader is an example script utilizing this API that you may use to perform your own data migrations.

If you are considering using this script, we recommend that you consult with Akamai Professional Services on setting appropriate arguments for batch size and rate limit. Always alert Akamai of the date and time you plan to run any bulk data events by submitting a Traffic Event request through the Support Portal.

Note that the entity.bulkCreate operation limits the request body to 5 MB. If you encounter a "client intended to send too large body" error, reduce the size of the request body (for example, by dividing the list of accounts to be created in half and then making two API calls).
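The halving approach described above can be sketched as a recursive split. This is a minimal illustration under stated assumptions: split_to_fit and encoded_size are hypothetical helpers, 5 MB is taken to mean 5 * 1024 * 1024 bytes, and size is measured on the JSON-encoded record list only. The actual request body also carries other fields and encoding overhead, so a lower max_bytes value leaves useful headroom.

```python
import json

# Assumption: "5 MB" interpreted as 5 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 5 * 1024 * 1024


def encoded_size(records):
    """Size in bytes of the records serialized as a UTF-8 JSON array."""
    return len(json.dumps(records).encode("utf-8"))


def split_to_fit(records, max_bytes=MAX_BODY_BYTES):
    """Recursively halve the list until every batch fits under max_bytes."""
    if not records:
        return []
    if encoded_size(records) <= max_bytes:
        return [records]
    if len(records) == 1:
        raise ValueError("a single record exceeds the size limit")
    mid = len(records) // 2
    return split_to_fit(records[:mid], max_bytes) + split_to_fit(records[mid:], max_bytes)


# Small demonstration with an artificially low limit.
records = [{"email": "user{}@example.com".format(i)} for i in range(8)]
batches = split_to_fit(records, max_bytes=120)
print([len(batch) for batch in batches])
```

Each resulting batch would then be sent in its own entity.bulkCreate call, preserving the original record order.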