Retry a failed migration

While dataload.py is running, network or server errors can prevent a batch of records from loading completely into the Akamai Identity Cloud. Depending on the nature of the error, some records in the batch might have been created while others were not. From the dataload script's perspective, these records are in an unknown state. To help you troubleshoot this scenario, dataload creates a CSV file named retry.csv and adds records to it when certain types of errors occur during processing.

The format of retry.csv is identical to that of the initial import CSV. That means you can run dataload.py a second time and specify retry.csv as the data file to be imported. This second migration attempts to import the records contained in retry.csv and produces artifacts (success.csv and fail.csv) that help you determine the true disposition of the records in an unknown state. After you run dataload with retry.csv, entries in success.csv represent unknown state records that were not written during the original migration but were successfully processed during the retry migration. Entries in fail.csv that have the error “Attempted to update a duplicate value” represent unknown state records that were successfully written during the original migration (the entity.bulkCreate response is telling us that these records do, in fact, exist).
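
As a rough illustration of the check described above, the following sketch separates the rows of fail.csv that carry the duplicate-value error from the rows that failed for other reasons. It assumes that fail.csv is an ordinary comma-separated file and that the error text appears somewhere in each failed row; the function name split_failures is hypothetical and is not part of dataload.py.

```python
import csv

# Error text quoted in this article for records that already exist.
DUPLICATE_ERROR = "Attempted to update a duplicate value"


def split_failures(fail_csv_path):
    """Split fail.csv rows into (already_loaded, other_failures).

    Assumption: the error message appears in one of the row's fields.
    The exact column layout of fail.csv is not assumed here.
    """
    already_loaded = []
    other_failures = []
    with open(fail_csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if any(DUPLICATE_ERROR in field for field in row):
                already_loaded.append(row)
            else:
                other_failures.append(row)
    return already_loaded, other_failures
```

If fail.csv starts with a header row, that row ends up in the second list and should be ignored when counting genuine failures.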

It’s possible that running dataload with retry.csv will produce yet another retry.csv file. You can continue to run dataload with each newly produced retry.csv file until a migration completes without producing one. When no retry.csv file is produced, there are no remaining unknown state records. Examine success.csv and fail.csv to understand the final disposition of the records in your migration.

To help you keep track of all your migration retries, a datetime is appended to the name of each retry.csv file.
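
The repeat-until-clean procedure can be sketched as a small loop. This is only an outline under stated assumptions: run_dataload is a hypothetical callable that you supply to run one migration against a given data file (the dataload.py command-line arguments are not shown here), and the glob pattern retry*.csv is a guess at how the datetime-stamped files are named, so adjust it to match the files you actually see.

```python
import glob
import os


def retry_until_clean(run_dataload, first_retry_file, directory=".",
                      pattern="retry*.csv", max_rounds=10):
    """Re-run the import until a run produces no new retry file.

    run_dataload: hypothetical callable taking the path of the data file
    to import. pattern: assumed naming of the retry files.
    Returns the list of data files that were processed, in order.
    """
    search = os.path.join(directory, pattern)
    data_file = first_retry_file
    processed = []
    for _ in range(max_rounds):
        before = set(glob.glob(search))
        run_dataload(data_file)
        processed.append(data_file)
        new_files = set(glob.glob(search)) - before
        if not new_files:
            # No retry file was produced: no unknown state records remain.
            return processed
        # Feed the newest retry file into the next round.
        data_file = max(new_files, key=os.path.getmtime)
    raise RuntimeError("retry files were still produced after max_rounds runs")
```

The max_rounds limit is a design choice of this sketch, not a dataload feature; it stops the loop if a persistent server problem keeps generating retry files.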

By default, dataload interprets the following response codes as “unknown state” errors. You might want to add codes to or remove codes from this list (found in dataload.py) if you encounter additional scenarios that warrant a retry.

  • API response codes: 403, 500, 504, 510
  • HTTP response codes: 403, 500, 501, 502
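
One way to picture how such a list drives the retry decision is the sketch below. The names RETRY_API_CODES, RETRY_HTTP_CODES, and is_unknown_state are hypothetical and chosen for illustration only; the actual variable names and logic in dataload.py may differ, so treat this as a model of the idea rather than the script's code. The code values are copied from the default lists above.

```python
# Hypothetical names; values copied from the default lists above.
RETRY_API_CODES = {403, 500, 504, 510}
RETRY_HTTP_CODES = {403, 500, 501, 502}


def is_unknown_state(http_status, api_code=None):
    """Return True if a response should send the batch to retry.csv.

    http_status: status code of the HTTP response.
    api_code: error code from the API response body, if one was returned.
    """
    return http_status in RETRY_HTTP_CODES or api_code in RETRY_API_CODES
```

In this model, adding a scenario that warrants a retry amounts to adding its code to the matching set, for example RETRY_HTTP_CODES.add(503), and removing a code takes it out of the retry path.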