Question
How does the de-duplication process occur in Segment warehouses?
Product
Twilio Segment
Environment
Segment Console
Answer
We have a de-duplication layer at the point of ingestion that detects duplicate event data before it reaches the warehouse pipeline. If duplicated event data still gets through, a secondary layer detects and processes it before writing to your warehouse. How duplicates are handled at that stage depends on the warehouse connector you're using:
- Snowflake, Postgres & Databricks: We will discard incoming event data that already exists in your warehouse when loading.
- Redshift: We will overwrite event data that exists in your warehouse.
- BigQuery: We won't de-duplicate events when loading data into your warehouse.
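To make the difference between these three behaviors concrete, here is a minimal Python sketch. It is an illustration only, not Segment's actual loader: the function names (`load_discard`, `load_overwrite`, `load_append`) are hypothetical, tables are modeled as plain lists of rows, and it assumes each event carries a unique `id` field that duplicates share.

```python
from typing import Dict, List

Event = Dict[str, str]


def load_discard(table: List[Event], batch: List[Event]) -> List[Event]:
    """Discard style: skip incoming events whose id is already present."""
    seen = {row["id"] for row in table}
    result = list(table)
    for event in batch:
        if event["id"] not in seen:
            seen.add(event["id"])
            result.append(event)
    return result


def load_overwrite(table: List[Event], batch: List[Event]) -> List[Event]:
    """Overwrite style: an incoming event replaces the existing row with the same id."""
    by_id = {row["id"]: row for row in table}
    for event in batch:
        by_id[event["id"]] = event
    return list(by_id.values())


def load_append(table: List[Event], batch: List[Event]) -> List[Event]:
    """No de-duplication at load time: every incoming event is appended."""
    return list(table) + list(batch)


# Hypothetical data: event "a" is already in the warehouse and arrives again.
existing = [{"id": "a", "payload": "first copy"}]
incoming = [
    {"id": "a", "payload": "second copy"},
    {"id": "b", "payload": "first copy"},
]

discarded = load_discard(existing, incoming)      # keeps the first copy of "a"
overwritten = load_overwrite(existing, incoming)  # keeps the second copy of "a"
appended = load_append(existing, incoming)        # keeps both copies of "a"
```

In the append case, any duplicates that reach the warehouse remain in the table, so you would need to handle them at query time if they matter for your analysis.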
Additional Information
More details on our de-duplication mechanisms and warehouses below: