Data Warehousing using the Katalys API
This page describes how a partner might use the Katalys API to pull their Katalys Partner Account data into their data warehouse. This use case assumes that the partner has an account in good standing with Katalys and has generated an API Key for their organization.
This document is meant as a high-level proof of concept. We cannot offer specific recommendations for your data infrastructure. Please consult your own development team to create a solution that works for your specific case!
Overview
A basic data warehousing architecture consists of two regularly scheduled tasks (or “cron jobs”):
A task that runs frequently, such as every hour, to query and store “realtime” data. This task processes the previous 2 hours of data; the 2-hour lookback matches the 2-hour “data settlement window.”
A task that runs infrequently, such as once per week, to update your datastore with any conversion adjustments. This task processes the previous 60 days of data; the 60-day lookback reflects Katalys standard billing practices, under which certain programs may, in extreme cases, change conversions up to 60 days later.
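The two lookback periods translate directly into report windows. The following is a minimal Python sketch, not part of any Katalys SDK: `report_window`, the constant names, and the choice to round the window end down to the top of the hour (so consecutive hourly runs produce aligned, overlapping windows) are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Lookback periods from the two tasks described above.
REALTIME_LOOKBACK = timedelta(hours=2)    # hourly task: 2-hour data settlement window
ADJUSTMENT_LOOKBACK = timedelta(days=60)  # weekly task: up to 60 days of adjustments


def report_window(now: datetime, lookback: timedelta) -> tuple[datetime, datetime]:
    """Return the (start, end) window a task should re-download.

    `now` must be timezone-aware. The end is rounded down to the top of
    the hour (an illustrative design choice, not a Katalys requirement).
    """
    end = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return end - lookback, end
```

For example, an hourly job could call `report_window(datetime.now(timezone.utc), REALTIME_LOOKBACK)` and request the Conversion Report for the returned range, while the weekly job would pass `ADJUSTMENT_LOOKBACK` instead.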
Both tasks use the Conversion Report API endpoint to download a list of conversions. Because data can change as it settles, each downloaded report must replace the previously stored report for the same time period so that your system stays in sync with Katalys. The downloaded report must include the seq and order_time columns. Also include any other columns relevant to your use case, such as payout and conversion_status.
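Before storing a report, it can help to fail fast if a required column is missing. This sketch assumes the report has already been downloaded and parsed into one dictionary per conversion; the download itself is omitted because the request details belong to the Conversion Report API documentation. `check_report_columns` is a hypothetical helper.

```python
from typing import Iterable, Mapping

# Columns the stored report must contain; other columns (e.g. payout) are optional here.
REQUIRED_COLUMNS = {"seq", "order_time"}


def check_report_columns(rows: Iterable[Mapping[str, object]]) -> list[dict]:
    """Return the rows as plain dicts, raising ValueError if any row lacks a required column."""
    checked = []
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS.difference(row.keys())
        if missing:
            raise ValueError(f"row {i} is missing required columns: {sorted(missing)}")
        checked.append(dict(row))
    return checked
```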
Data Schema
When building your schema, we recommend using each Katalys conversion’s seq value as your primary key. The Katalys seq (“Sequence”) value is a suitable durable key for de-duplicating rows across reports and for performing row updates. Note that seq values are case-sensitive, so make sure your database stores and compares them case-sensitively.
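As an illustration, the following sketch creates a hypothetical `conversions` table in SQLite using Python's standard `sqlite3` module. The table and column names, and the choice to store `order_time` as ISO 8601 UTC text in one consistent format (so that string comparisons match chronological order), are assumptions for this example. SQLite compares `TEXT` values with its default `BINARY` collation, which is case-sensitive, so `abc` and `ABC` remain distinct keys.

```python
import sqlite3

# Hypothetical table layout; column names mirror the report columns mentioned above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversions (
    seq               TEXT PRIMARY KEY NOT NULL,  -- Katalys seq; default BINARY collation is case-sensitive
    order_time        TEXT NOT NULL,              -- ISO 8601 UTC, e.g. 2024-05-01T10:15:00Z
    payout            REAL,
    conversion_status TEXT
)
"""


def open_warehouse(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical warehouse database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    # An index on order_time speeds up the time-window deletes used for updates.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_conversions_order_time ON conversions (order_time)"
    )
    return conn
```

If your warehouse uses a case-insensitive collation by default (as some MySQL configurations do), choose a case-sensitive or binary collation for the seq column.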
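When updating your data, we recommend deleting-then-inserting by timestamp: delete every stored row whose order_time falls within the time period of the new report, then insert the rows from that report. This is the most reliable way to keep your system in sync with Katalys.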
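A minimal sketch of delete-then-insert, assuming a hypothetical SQLite table named `conversions` with `seq` as its primary key and `order_time` stored as ISO 8601 UTC text. It also assumes the report was requested for the same half-open window (`start` inclusive, `end` exclusive) that is deleted; adjust this to however your report request defines its boundaries. `replace_window` is an illustrative name.

```python
import sqlite3
from typing import Iterable, Mapping


def replace_window(
    conn: sqlite3.Connection,
    start: str,
    end: str,
    rows: Iterable[Mapping[str, object]],
) -> None:
    """Replace all stored conversions with start <= order_time < end by `rows`."""
    # Fill optional columns with NULL so every row supplies all named parameters.
    normalized = ({"payout": None, "conversion_status": None, **row} for row in rows)
    with conn:  # one transaction: commits on success, rolls back on any exception
        conn.execute(
            "DELETE FROM conversions WHERE order_time >= ? AND order_time < ?",
            (start, end),
        )
        conn.executemany(
            "INSERT INTO conversions (seq, order_time, payout, conversion_status) "
            "VALUES (:seq, :order_time, :payout, :conversion_status)",
            normalized,
        )
```

Running the delete and the insert in one transaction means a failed run leaves the previously stored data in place rather than an empty window.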
Under most circumstances, once a seq appears, it remains visible in your dataset permanently, even as its status or value changes. However, in some cases a seq might “disappear” from your view. These include testing data, which is sometimes purged, and merged orders, where later data indicates that a seq was actually a duplicate order. These cases are infrequent, but they do happen. Because such rows no longer appear in new reports, deleting-then-inserting by timestamp removes them from your warehouse as well. For details, see the Data Settlement documentation.
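If you want to keep a record of these removals (for example, to audit purged test data or merged duplicates), you can compare stored seq values against a new report before replacing the window. This sketch assumes a hypothetical SQLite `conversions` table with `seq` and `order_time` (ISO 8601 UTC text) columns and a half-open `[start, end)` window; `disappeared_seqs` is a hypothetical helper.

```python
import sqlite3
from typing import Iterable, Mapping


def disappeared_seqs(
    conn: sqlite3.Connection,
    start: str,
    end: str,
    reported_rows: Iterable[Mapping[str, object]],
) -> set[str]:
    """Return seq values stored in [start, end) that the new report no longer contains."""
    stored = {
        seq
        for (seq,) in conn.execute(
            "SELECT seq FROM conversions WHERE order_time >= ? AND order_time < ?",
            (start, end),
        )
    }
    # seq values are case-sensitive, so they are compared exactly as received.
    reported = {str(row["seq"]) for row in reported_rows}
    return stored - reported
```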
Using Postbacks
As an alternative to the Katalys API, you can use Partner Postbacks to send create, update, and delete events from Katalys to your data warehouse. This approach removes the need for scheduled (cron) jobs and may be easier for your development team to operationalize.
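With postbacks, the warehouse applies each event as it arrives instead of replacing time windows. In this sketch the payload shape (an `event` field set to `create`, `update`, or `delete`, plus the conversion's columns) is a hypothetical assumption, not the documented Partner Postbacks format; consult the Partner Postbacks documentation for the real fields. It also assumes a hypothetical SQLite `conversions` table keyed on `seq`, and it omits receiving the HTTP request itself. `apply_postback` is an illustrative name.

```python
import sqlite3
from typing import Mapping

# Hypothetical event shape: {"event": "create" | "update" | "delete", "seq": ..., ...}.
# The real Partner Postbacks payload may differ.


def apply_postback(conn: sqlite3.Connection, event: Mapping[str, object]) -> None:
    """Apply one (hypothetical) postback event to the conversions table."""
    kind = event["event"]
    if kind not in ("create", "update", "delete"):
        raise ValueError(f"unknown postback event: {kind!r}")
    with conn:  # each event is applied in its own transaction
        if kind == "delete":
            conn.execute("DELETE FROM conversions WHERE seq = ?", (event["seq"],))
        else:
            # Keyed on seq, so a repeated delivery of the same event does not duplicate rows.
            conn.execute(
                "INSERT OR REPLACE INTO conversions (seq, order_time, payout, conversion_status) "
                "VALUES (:seq, :order_time, :payout, :conversion_status)",
                {"payout": None, "conversion_status": None, **event},
            )
```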