Alternatives to self-service data migration
Before we go any further, there's a question we should probably address: are there any alternatives to the self-service data migration?
Well, sort of; after all, there are alternatives to practically anything. That said, the question that should really be asked, and answered, is this: are any of these alternatives truly viable?
For example, you might decide not to migrate your legacy data after all; that’s definitely simpler than migrating your data. If you go that route, however, you’ll lose all your legacy data (or, at the least, you won’t be able to access that data using Identity Cloud applications). In addition, your users will all need to recreate their accounts and repopulate their user profiles. That might be acceptable to you but, to be honest, it’s probably not acceptable to your users.
Another option is to have Akamai do the data migration for you: contact your Akamai representative for details regarding a data migration executed by Akamai on your behalf. Taking this approach will definitely reduce your workload: for example, you won’t have to make sure that your computer meets all the requirements for running the data migration script, and you won’t have to actually run that script.
Even so, your workload will not be eliminated. Although Akamai will create the needed data transformations, perform as many dry runs as needed, and then carry out the migration, you’ll still be responsible for such things as mapping the legacy data to your Identity Cloud user profile, creating the data migration datafile (the import CSV file), and verifying that any practice runs complete successfully. That entails at least some effort on your part. In fact, unless you have a really complicated migration scenario, you might find that having Akamai do the data migration results in a lengthier and more complex process than if you did that migration yourself.
Why? Largely because, by doing it yourself, you eliminate the need for constant back-and-forth communication between yourself and Akamai support personnel. And, of course, you can do things on your schedule, not on someone else’s schedule.
A final option is to write your own script for doing data migration. That’s not impossible: you just need a script that calls the [entity.bulkCreate API endpoint](https://techdocs.akamai.com/identity-cloud-entity/reference/post-entity-create). There’s nothing wrong with that, but the dataload.py script already uses the entity.bulkCreate API endpoint and is tried and true. Because of that, there’s typically very little reason to reinvent the data migration wheel.
Completion time estimates
Akamai supports the import of 10,000 records per minute when using dataload.py, which equates to 600,000 records per hour. These upper limits can be helpful in planning, but your actual run time can vary depending on the complexity of the records being imported. For example, records with a lot of plural data take longer to process. If you find your records-per-minute average is below 10,000, you can try to improve performance by changing the following arguments:
- -b (BATCH_SIZE)
- -w (WORKERS)
- -r (RATE_LIMIT)
Be careful not to allow too many API calls (-r) per second, as your Akamai Identity Cloud APIs are limited in the number of calls that can be made in one minute, and are limited in the number of concurrent calls at any given time. Note that this measurement includes all traffic to your Akamai Identity Cloud instance, not just API calls from dataload. Because of that, you might want to plan your migrations to coincide with periods of non-peak traffic.
A rough calculation that can be used is this:
BATCH_SIZE x RATE_LIMIT x 60 = Number of records per minute
This calculation assumes that the API response time from entity.bulkCreate is less than WORKERS/RATE_LIMIT. A larger BATCH_SIZE generally means a higher API response time. More attributes per record and the inclusion of complex structures like plurals will also increase API response time.
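The rough calculation and its response-time assumption can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the sample values are not dataload.py defaults.

```python
def estimated_records_per_minute(batch_size, rate_limit):
    """Rough estimate: BATCH_SIZE x RATE_LIMIT x 60."""
    return batch_size * rate_limit * 60


def workers_can_sustain_rate(workers, rate_limit, response_time_s):
    """True when the API response time (in seconds) is less than
    WORKERS / RATE_LIMIT, the condition the rough estimate assumes."""
    return response_time_s < workers / rate_limit


if __name__ == "__main__":
    # Illustrative values only (assumptions, not recommended settings).
    batch_size, workers, rate_limit = 50, 8, 4
    print(estimated_records_per_minute(batch_size, rate_limit))  # 12000
    # A hypothetical 1.5-second response time is below 8 / 4 = 2 seconds.
    print(workers_can_sustain_rate(workers, rate_limit, 1.5))  # True
```

If the second check comes back False, the estimate is likely too optimistic for those settings.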
The best strategy to improve your dataload performance is to start by migrating a small sample of test records that closely match the format of your actual records, and to keep track of how long the migration process takes. (It’s recommended that you perform test migrations in a non-production environment.) After the first test, adjust the arguments noted above and repeat as necessary.
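One way to keep track of test runs is to convert each run's record count and elapsed time into a records-per-minute figure, and then project how long the full migration might take at that rate. The helper names and numbers below are hypothetical, and the projection assumes throughput stays roughly constant, which may not hold for larger or more complex record sets.

```python
import math


def observed_records_per_minute(records, elapsed_seconds):
    """Throughput actually achieved during a test migration."""
    return records / elapsed_seconds * 60


def projected_minutes(total_records, records_per_minute):
    """Minutes needed for the full migration at a given rate, rounded up."""
    return math.ceil(total_records / records_per_minute)


if __name__ == "__main__":
    # Hypothetical test run: 5,000 sample records imported in 40 seconds.
    rate = observed_records_per_minute(5000, 40)
    print(rate)  # 7500.0
    # Projected time for a hypothetical 1,200,000-record migration.
    print(projected_minutes(1_200_000, rate))  # 160
```

Recording these figures after each adjustment to -b, -w, or -r makes it easier to see which change actually helped.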