Managing bulk exports of failed events

Webhooks v3 bulk export of failed events is a new Identity Cloud feature currently in limited availability. Because the feature isn’t in general release, both the product and this documentation are subject to change at any time.

Like the rest of Webhooks v3, you manage the bulk export of failed events service by using the Webhooks v3 API. The API now includes operations that let you:

List your bulk export of failed events jobs. This operation returns information about all the jobs you’ve initiated in the past 7 days. Jobs older than 7 days are automatically deleted by Webhooks v3.
Check the status of a bulk export of failed events job. This operation returns information about a specific job. It’s typically used to monitor the progress of an ongoing job and to retrieve the link for downloading the exported events.
Start a bulk export of failed events job. This operation exports all the events in the failed events table that haven’t already been exported. (After an event has been exported, it’s marked with a timestamp to ensure that it isn’t exported a second time.) Note that your only option is to export all the failed events for a subscription: there’s no way to export, say, only the events that occurred on a specified date.
Cancel a bulk export of failed events job. This operation terminates an export job. When a job is terminated, no download file is written to the Amazon Web Services S3 bucket, and no changes are made to any of the events in the failed events table. Those events remain as-is and will be exported the next time you start a job.
Download exported events. This operation downloads the specified file and automatically places it in your Downloads folder. Keep in mind that files are available for only 7 days before being automatically deleted. See the article Downloading a bulk export file for more information.
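To make the export and cancel rules concrete, here’s a minimal in-memory model in Python. It’s an illustration only, not how Webhooks v3 is implemented: the FailedEvent class, its exported_at field, and the run_export function are hypothetical names that mirror the behavior described above (a job exports only events that lack an export timestamp, and a cancelled job writes nothing and leaves the table unchanged).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class FailedEvent:
    # Hypothetical stand-in for a row in the failed events table.
    event_id: str
    exported_at: Optional[datetime] = None  # set once the event has been exported


def run_export(table: List[FailedEvent], cancel: bool = False) -> List[str]:
    """Model one export job: pick up every event that hasn't been exported yet.

    A cancelled job produces no download and touches no events, so the same
    events are picked up by the next job.
    """
    pending = [e for e in table if e.exported_at is None]
    if cancel:
        return []  # no download file; table left as-is
    now = datetime.now(timezone.utc)
    for event in pending:
        event.exported_at = now  # the timestamp prevents a second export
    return [e.event_id for e in pending]
```

With table = [FailedEvent("a"), FailedEvent("b")], calling run_export(table, cancel=True) returns an empty list and leaves both events unexported; a following run_export(table) returns ["a", "b"], and a third call returns [] because both events now carry a timestamp.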

Also like the rest of the Webhooks v3 API, the new operations require bearer token authentication. See the article Get started for more information.
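As a rough sketch of bearer token authentication, the following Python function uses the standard urllib.request module to build (but not send) an authenticated request. The base URL and the /jobs path are placeholders, not documented Webhooks v3 endpoints; substitute the values from the API reference and the token you obtain as described in Get started.

```python
import urllib.request

# Placeholder: replace with your actual Webhooks v3 base URL.
BASE_URL = "https://webhooks.example.com"


def build_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build an unsent request that authenticates with a bearer token."""
    req = urllib.request.Request(BASE_URL + path, method=method)
    # Bearer token authentication: "Authorization: Bearer <token>".
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req


# Hypothetical path; check the API reference for the real one.
req = build_request("/jobs", "YOUR_ACCESS_TOKEN")
# To actually send the request: urllib.request.urlopen(req)
```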

How do you know if you have failed events that need exporting?

Currently, there’s no way to directly view the events in the failed events table. So how do you know whether you need to export failed events? Here are a few pointers that can help:

As noted, this process is designed for disaster recovery: your listener endpoint crashed over a three-day weekend, no one noticed, and you now have tens of thousands of undelivered events. An outage like that is a big deal, so you probably already know about it. Has a major disaster like this taken place recently? If the answer is “no,” there’s a good chance you don’t need this feature. You might have a few failed events (one-offs happen every now and then), but a handful of failed events can easily be retrieved by using the /redelivery endpoint.

If you’re unsure how many failed events you have, use the /events endpoint and the state parameter to return information about all the failed events in the event store. If the store contains a large number of failed events, and you haven’t run an export job recently, there’s a good chance that you also have a large number of events in the failed events table.
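For example, a query for failed events might be built like this in Python. The base URL is a placeholder, and the value failed for the state parameter is an assumption; check the /events reference for the exact value it accepts. The request still needs a bearer token, as described earlier.

```python
from urllib.parse import urlencode

# Placeholder: replace with your actual Webhooks v3 base URL.
BASE_URL = "https://webhooks.example.com"


def failed_events_url(state: str = "failed") -> str:
    """Return an /events URL filtered by the state query parameter.

    The default value "failed" is an assumption, not a documented constant.
    """
    # urlencode escapes the value so it is safe to use in a query string.
    return f"{BASE_URL}/events?{urlencode({'state': state})}"
```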

However, keep in mind that a failed event returned by the /events endpoint isn’t necessarily still waiting in the failed events table. For example, suppose it’s Monday, a disaster occurs, and you now have 10,000 failed events in the event store. At that point, you’ll have a similar number of failed events in the failed events table.

On Tuesday, you export all those failed events from the failed events table. You now have 0 unexported events in that table: after an event has been exported, it’s timestamped to ensure that it doesn’t get exported again. However, if you query the event store right now, the store still shows 10,000 failed events. That’s because an export doesn’t do anything to events in the event store: those 10,000 events remain in the store until 7 days have passed, after which they’re automatically deleted.

The moral of the story? If you have a lot of failed events in the event store, then before you do anything else, check the timestamps for those events and see whether you’ve run an export job since those events were written to the event store.
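That check can be sketched in Python. Here, event_times stands for the timestamps of the failed events returned by /events, and last_export_start stands for the start time of your most recent export job (as reported by the list operation); both names are hypothetical, and the function assumes all times are timezone-aware. Using the job’s start time as the cutoff is deliberately conservative: an event written while a job was running is flagged as possibly unexported.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def possibly_unexported(
    event_times: Iterable[datetime], last_export_start: Optional[datetime]
) -> List[datetime]:
    """Return failed-event timestamps that a previous export may not have covered."""
    if last_export_start is None:
        # No export job on record: every failed event might still need exporting.
        return list(event_times)
    # Events written after the last job started might still be in the table.
    return [t for t in event_times if t > last_export_start]


# The Monday/Tuesday scenario, plus one new failure on Wednesday.
utc = timezone.utc
monday_failures = [
    datetime(2024, 1, 1, 9, 0, tzinfo=utc),
    datetime(2024, 1, 1, 9, 5, tzinfo=utc),
]
tuesday_export = datetime(2024, 1, 2, 10, 0, tzinfo=utc)
wednesday_failure = datetime(2024, 1, 3, 8, 0, tzinfo=utc)

pending = possibly_unexported(monday_failures + [wednesday_failure], tuesday_export)
```

In this scenario only the Wednesday event is returned: the Monday events are still in the event store, but the Tuesday export already covered them.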