Lessons from Tracardi 1.0: Towards a Scalable, Flexible and Insightful Data Platform

The first version of Tracardi provided invaluable insights into the demands of real-world data processing at scale. While the system functioned successfully in a number of use cases, production feedback and technical limitations exposed critical lessons and areas for improvement. This article outlines the key observations, necessary architectural changes, and strategic shifts that will shape the future evolution of Tracardi.
1. Performance Bottlenecks and the Need for Scalability
Tracardi 1.0 could handle around 500-700 requests per second. However, modern data systems, especially those used in real-time marketing automation and analytics, must process 10,000 to 50,000 requests per second. The current linear architecture simply does not scale to this level.
Solution: Redesign the platform to support parallel processing and adopt better queue-based ingestion mechanisms (e.g. Apache Pulsar, Kafka). This allows for concurrent data handling and ensures that spikes in load do not cripple the system.
Example: A client sending thousands of product view events every second expects those events to be stored instantly. With a message queue buffer and consumer workers, we can decouple ingestion from processing.
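A minimal sketch of how this decoupling could look, assuming a local Kafka broker and the kafka-python client; the topic name, payload shape, and storage step are illustrative only:

```python
# Minimal sketch: decouple ingestion from processing with a Kafka buffer.
# Assumes a local Kafka broker and the kafka-python package; the topic
# name "raw-events" and the payload are illustrative, not Tracardi's API.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def collect(event: dict) -> None:
    """Collector side: enqueue the raw event and return immediately."""
    producer.send("raw-events", event)

def process_forever() -> None:
    """Worker side: consume at its own pace, independent of ingestion spikes."""
    consumer = KafkaConsumer(
        "raw-events",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        group_id="event-processors",
    )
    for message in consumer:
        store_event(message.value)  # hypothetical persistence step

def store_event(event: dict) -> None:
    print("stored", event.get("type"))

if __name__ == "__main__":
    collect({"type": "product-view", "product_id": "sku-42"})
    producer.flush()
```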
2. Enforcing Immutable Event Schemas
Clients frequently attempted to change the structure of events they had already defined and were sending. This broke profile logic, automation, and analytics processes downstream.
Lesson: Schema mutability must be strictly controlled.
Solution: Enforce event structure immutability through schema validation. Once an event type is defined, future events with mismatching structure should be rejected with a clear error message.
However, to support evolving business needs while maintaining schema consistency, other techniques should be incorporated. These include property aliasing and event reshaping, which allow for schema evolution without breaking downstream processes.
Example: Suppose the original event structure for "checkout" includes a property cart_total. Later, the sending system changes this to total_price. Instead of redefining the entire event schema, the system should map total_price as an alias of cart_total, ensuring that the internal schema remains unchanged. This enables backward compatibility while supporting newer clients.
Additionally, reshaping tools can normalize data—e.g., nesting or flattening structures—before they enter the processing pipeline, so that the schema remains uniform across versions.
Example: If a "product-view" event originally has product_id, price, and user_id, adding a new nested discount field without schema migration will be blocked.
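A minimal sketch of how schema immutability with property aliasing could be enforced; the schema registry and alias table below reuse the field names from the examples above but are purely illustrative:

```python
# Minimal sketch: reject structural changes to a registered event type,
# but tolerate renamed properties through an alias map. The registry and
# alias table are illustrative, not Tracardi's actual implementation.
REGISTERED_SCHEMAS = {
    "checkout": {"cart_total", "currency", "user_id"},
}

PROPERTY_ALIASES = {
    "checkout": {"total_price": "cart_total"},  # newer clients send total_price
}

def normalize(event_type: str, properties: dict) -> dict:
    """Map aliased property names back to the canonical schema."""
    aliases = PROPERTY_ALIASES.get(event_type, {})
    return {aliases.get(name, name): value for name, value in properties.items()}

def validate(event_type: str, properties: dict) -> dict:
    canonical = normalize(event_type, properties)
    unexpected = set(canonical) - REGISTERED_SCHEMAS[event_type]
    if unexpected:
        raise ValueError(
            f"Event '{event_type}' structure is immutable; "
            f"unexpected properties: {sorted(unexpected)}"
        )
    return canonical

# A newer client sending total_price still passes validation ...
print(validate("checkout", {"total_price": 99.0, "currency": "USD", "user_id": "u1"}))

# ... while an undeclared nested property is rejected with a clear error.
try:
    validate("checkout", {"cart_total": 99.0, "currency": "USD", "user_id": "u1",
                          "discount": {"code": "SPRING"}})
except ValueError as error:
    print(error)
```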
3. Separation of Data Collection and Processing
In the original architecture, data collection and processing were tightly coupled, meaning any processing issue could delay or block data ingestion.
Solution: Introduce two distinct APIs:
- Collector API: Responsible for receiving and queuing events.
- Processor API: Asynchronously processes events from the queue.
This separation improves resilience and modularity.
Example: Even if the database is temporarily offline, the Collector API can queue data and return 200 OK responses, while the Processor API retries the writes later.
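A minimal sketch of this separation; FastAPI and the in-process queue below stand in for whatever framework and message broker are actually chosen, and the database call is a placeholder:

```python
# Minimal sketch: a Collector API that only accepts and queues events,
# leaving persistence to a separate processor loop.
import asyncio
from fastapi import FastAPI

app = FastAPI()
queue: asyncio.Queue = asyncio.Queue()

@app.post("/collect")
async def collect(event: dict):
    await queue.put(event)          # buffer only; no database involved
    return {"status": "queued"}     # 200 OK even if the database is down

async def processor_loop():
    """Processor side: drain the queue and retry failed writes later."""
    while True:
        event = await queue.get()
        try:
            await write_to_database(event)   # hypothetical persistence call
        except ConnectionError:
            await queue.put(event)           # re-queue and retry after a delay
            await asyncio.sleep(5)

async def write_to_database(event: dict):
    ...

@app.on_event("startup")
async def start_processor():
    asyncio.create_task(processor_loop())
```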
4. Historical Replay for Fault Tolerance
In Tracardi 1.0, there was no reliable mechanism to recover the state of a user profile or any other entity that had been corrupted or deleted due to bugs, processing failures, or schema mismatches. Once such an issue occurred, the data was effectively lost.
Solution: Introduce a system-wide mechanism for storing all incoming raw events in a durable, append-only medium. These raw events act as the source of truth.
In parallel, implement an entity rebuild engine (state reconciliation) that can replay the entire event history for a given entity. This replay mechanism should consume events in chronological order and reapply all processing logic to rebuild the entity's current state exactly as it would have been.
Example: Imagine a bug in a processing plugin deletes all properties from a user's profile. Instead of losing that data permanently, the system can identify the profile ID, fetch all historical events associated with it from the event store, and replay them through the same enrichment and transformation pipeline. As a result, the user profile is reconstructed with full fidelity, preserving all original behavior and insights.
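A minimal sketch of such a rebuild, assuming events are kept in an append-only store; the event-store lookup and the apply() step are placeholders for the real enrichment pipeline:

```python
# Minimal sketch: rebuild a profile by replaying its raw events in
# chronological order through the same processing logic.
from datetime import datetime

def load_raw_events(profile_id: str) -> list[dict]:
    """Fetch all append-only raw events for this profile (placeholder data)."""
    return [
        {"type": "page-view", "timestamp": "2024-01-01T10:00:00", "properties": {"url": "/"}},
        {"type": "purchase", "timestamp": "2024-01-02T12:30:00", "properties": {"total": 50}},
    ]

def apply(profile: dict, event: dict) -> dict:
    """Reapply the same enrichment/transformation logic used at ingest time."""
    if event["type"] == "purchase":
        profile["lifetime_value"] = profile.get("lifetime_value", 0) + event["properties"]["total"]
    profile["last_seen"] = event["timestamp"]
    return profile

def rebuild_profile(profile_id: str) -> dict:
    events = sorted(load_raw_events(profile_id),
                    key=lambda e: datetime.fromisoformat(e["timestamp"]))
    profile: dict = {"id": profile_id}
    for event in events:
        profile = apply(profile, event)
    return profile

print(rebuild_profile("profile-123"))
```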
5. Tracardi as a General Fact Processing Platform
Tracardi began as a Customer Data Platform (CDP), primarily designed to collect and process user-related events such as logins, product views, or cart updates. However, production feedback revealed that clients needed to process far more than customer actions. They were increasingly interested in capturing operational data points like shipment updates, inventory changes, and even backend application logs or error events.
Lesson: The scope of Tracardi must expand beyond customer-centric use cases. To be truly valuable in modern data ecosystems, it should function as a general-purpose event processing platform capable of modeling and reacting to all business facts—not just user behavior.
Solution: Redesign the internal data model to support a wide range of entities (e.g., orders, products, shipments, warehouses, infrastructure nodes) and not just users. Each event should be linked to one or more entities, and the system should allow for relationship mapping between them to reflect how they interact and affect one another.
Example: A "delivery-status" event might update the shipment entity’s status to "delivered". That shipment is related to an order, which is in turn linked to a specific customer. Through this relational chain, the system can infer derived events (e.g., “order completed”) or trigger follow-up automations (e.g., send a satisfaction survey to the customer).
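A minimal sketch of how such a relational chain could be walked when the event arrives; the entity registry, identifiers, and the survey step are illustrative only:

```python
# Minimal sketch: an event attached to a shipment entity, with relations
# that let the system walk shipment -> order -> customer.
ENTITIES = {
    ("shipment", "sh-1"): {"status": "in_transit", "order_id": "ord-9"},
    ("order", "ord-9"): {"status": "open", "customer_id": "cust-7"},
    ("customer", "cust-7"): {"email": "jane@example.com"},
}

def handle_delivery_status(event: dict) -> None:
    shipment = ENTITIES[("shipment", event["shipment_id"])]
    shipment["status"] = event["status"]
    if event["status"] == "delivered":
        order = ENTITIES[("order", shipment["order_id"])]
        order["status"] = "completed"                        # derived "order completed"
        customer = ENTITIES[("customer", order["customer_id"])]
        send_satisfaction_survey(customer["email"])          # follow-up automation

def send_satisfaction_survey(email: str) -> None:
    print(f"survey sent to {email}")

handle_delivery_status({"type": "delivery-status", "shipment_id": "sh-1", "status": "delivered"})
```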
This relational and multi-entity approach not only broadens Tracardi’s utility but also aligns it with the architecture of modern fact-sourcing platforms, making it a foundational tool for capturing a complete picture of business operations.
6. Unified Event and Entity Model
To support more dynamic analytics, advanced automations, and full-context business tracking, Tracardi must adopt a unified and generalized observation model that treats both events and entities as first-class citizens.
Lesson: Events and entities are inherently interrelated. Events describe actions that occur over time, while entities are the participants or subjects of those actions. To capture both behavior and structure, the system should standardize the representation of both.
Solution: Introduce a normalized observation schema that encapsulates both events and entities using shared metadata fields such as observation_type (e.g., "event", "entity"), entity_type, entity_id, timestamp, and a flexible payload. This model enables Tracardi to process and analyze observations uniformly, regardless of whether they represent a one-time action or a persistent object.
Example: An event like "email-sent" may relate to the user and campaign entities, whereas a persistent entity like product may periodically emit status updates (e.g., price changes, stock updates). All of these can be expressed through the same observation interface, making it easier to query, join, and analyze across types.
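A minimal sketch of the normalized observation record using the shared metadata fields named above; the dataclass is only illustrative of the idea, not the final schema:

```python
# Minimal sketch: one observation record for both events and entities.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class Observation:
    observation_type: str            # "event" or "entity"
    entity_type: str                 # e.g. "user", "campaign", "product"
    entity_id: str
    timestamp: datetime
    payload: dict[str, Any] = field(default_factory=dict)

now = datetime.now(timezone.utc)

observations = [
    # a one-time action ...
    Observation("event", "user", "u-1", now, {"type": "email-sent", "campaign_id": "c-5"}),
    # ... and a status update emitted by a persistent entity
    Observation("entity", "product", "p-9", now, {"price": 19.99, "stock": 120}),
]

# Both kinds can be queried through the same interface.
print([o for o in observations if o.entity_type == "user"])
```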
This structure allows seamless correlation between actions and entities, simplifies downstream analytics, and lays the groundwork for consistent data modeling across the platform.
7. Destinations Should Trigger Workflows
While open-source users often relied on workflows, commercial clients preferred direct destination integrations.
Solution: Destinations should support triggering workflows—either within Tracardi or via external tools (Zapier, Airflow, etc.).
Example: A destination that pushes data to HubSpot should optionally trigger a lead-nurturing workflow.
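A minimal sketch of a destination with an optional follow-up workflow hook; both calls below are placeholders, not real Tracardi or HubSpot APIs:

```python
# Minimal sketch: a destination that pushes data and then optionally
# triggers a workflow, either inside Tracardi or in an external tool.
def push_to_hubspot(profile: dict) -> None:
    print("pushed", profile["id"], "to HubSpot")         # placeholder dispatch

def trigger_workflow(name: str, payload: dict) -> None:
    print("triggered workflow", name)                    # e.g. Tracardi, Zapier, Airflow

def dispatch(profile: dict, follow_up_workflow: str | None = None) -> None:
    push_to_hubspot(profile)
    if follow_up_workflow:
        trigger_workflow(follow_up_workflow, {"profile_id": profile["id"]})

dispatch({"id": "profile-123"}, follow_up_workflow="lead-nurturing")
```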
8. Rethinking Consent Management
The current consent management implementation is too simplistic and puts an unnecessary burden on performance.
Solution: Move consent checking to be part of profile enrichment instead of live processing. Store and version consents in a separate service.
Example: Before sending data to a marketing platform, a batch job checks if the profile has valid consent for the target channel.
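A minimal sketch of consent resolved as a batch enrichment step rather than on every live request; the consent store and channel names are illustrative only:

```python
# Minimal sketch: annotate profiles with consent in a batch job, then use
# that enrichment when dispatching to a marketing channel.
CONSENTS = {
    "profile-1": {"email_marketing": True, "sms": False},
    "profile-2": {"email_marketing": False},
}

def enrich_with_consent(profiles: list[dict], channel: str) -> list[dict]:
    """Batch step: attach consent for the target channel to each profile."""
    for profile in profiles:
        grants = CONSENTS.get(profile["id"], {})
        profile["consent"] = {channel: grants.get(channel, False)}
    return profiles

def send_campaign(profiles: list[dict], channel: str) -> None:
    for profile in enrich_with_consent(profiles, channel):
        if profile["consent"][channel]:
            print("sending", channel, "to", profile["id"])
        else:
            print("skipping", profile["id"], "(no consent for", channel + ")")

send_campaign([{"id": "profile-1"}, {"id": "profile-2"}], channel="email_marketing")
```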
9. SQL Integration for External Analytics
Clients want to connect their BI tools to raw event and profile data using SQL.
Solution: Store all normalized events and entities in an SQL-accessible data warehouse (e.g. ClickHouse, StarRocks).
Example: A business analyst writes a SQL query in Metabase to find how many users clicked a link after receiving an email campaign.
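A sketch of the kind of query such a warehouse could serve, here issued through the clickhouse-connect client; the table and column names are assumptions about how normalized events might be laid out, and the same SQL could equally be pasted into Metabase:

```python
# Minimal sketch: count users who clicked a link after receiving an email
# campaign. Table/column names and the warehouse host are placeholders.
import clickhouse_connect

QUERY = """
SELECT count(DISTINCT clicks.profile_id) AS users_who_clicked
FROM events AS clicks
JOIN events AS emails
  ON clicks.profile_id = emails.profile_id
WHERE emails.type = 'email-campaign-sent'
  AND clicks.type = 'link-clicked'
  AND clicks.timestamp > emails.timestamp
"""

client = clickhouse_connect.get_client(host="localhost")   # placeholder host
result = client.query(QUERY)
print(result.result_rows)
```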
10. Segmentation on Any Entity
Segmentation was originally limited to user profiles, but clients want to segment anything—orders, shipments, campaigns.
Solution: Build a general-purpose segmentation engine with filters and rules that apply to all entity types.
Example: Segment all orders over $500 shipped to a specific region in the last 7 days.
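A minimal sketch of a segmentation rule applied to an arbitrary entity type rather than only user profiles; the rule format and sample orders are illustrative:

```python
# Minimal sketch: segment any entity collection with plain predicate rules.
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

ORDERS = [
    {"id": "ord-1", "total": 620, "region": "EU", "shipped_at": now - timedelta(days=2)},
    {"id": "ord-2", "total": 180, "region": "EU", "shipped_at": now - timedelta(days=1)},
    {"id": "ord-3", "total": 900, "region": "US", "shipped_at": now - timedelta(days=30)},
]

def segment(entities: list[dict], rules: list) -> list[dict]:
    """Keep the entities that match every rule."""
    return [e for e in entities if all(rule(e) for rule in rules)]

high_value_recent_eu = segment(ORDERS, rules=[
    lambda order: order["total"] > 500,
    lambda order: order["region"] == "EU",
    lambda order: order["shipped_at"] > now - timedelta(days=7),
])

print([order["id"] for order in high_value_recent_eu])   # -> ['ord-1']
```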
11. Retention and Backup Strategy Redesign
Elasticsearch originally handled data retention automatically, but the shift to SQL and streaming changes this.
Solution: Implement a configurable backup and retention engine. Allow streaming cold data to cloud storage (e.g. S3, GCS).
Example: Events older than 90 days are archived to cold storage and removed from the live DB, but can be re-imported if needed.
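A minimal sketch of such an archival step using boto3; the bucket name, event layout, and the live-DB delete step are placeholders, and timestamps are assumed to be timezone-aware ISO 8601 strings:

```python
# Minimal sketch: move events older than 90 days to S3 cold storage and
# report which events remain live. Assumes boto3 is configured with
# credentials; the bucket and key layout are placeholders.
import json
from datetime import datetime, timedelta, timezone
import boto3

RETENTION = timedelta(days=90)

def archive_old_events(events: list[dict]) -> list[dict]:
    """Return the events that stay live; ship the rest to cold storage."""
    s3 = boto3.client("s3")
    cutoff = datetime.now(timezone.utc) - RETENTION
    live, cold = [], []
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])   # timezone-aware ISO 8601
        (cold if ts < cutoff else live).append(event)
    if cold:
        s3.put_object(
            Bucket="tracardi-cold-storage",                # placeholder bucket
            Key=f"archive/events-{cutoff.date()}.json",
            Body=json.dumps(cold).encode("utf-8"),
        )
    return live    # the caller then deletes the archived rows from the live DB
```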
12. Enabling AI Through Context-Rich Observations
As artificial intelligence (AI) systems become integral to business operations, their effectiveness relies heavily on access to structured, context-rich data. To facilitate this, Tracardi integrates the Model Context Protocol (MCP), an open standard designed to enable seamless connections between AI models and external data sources. This integration allows AI systems to interact directly with Tracardi's structured data, enhancing their ability to answer complex business queries.
Solution: Implement MCP within Tracardi to provide AI models with standardized access to the platform's rich, structured datasets. This ensures that AI systems can retrieve and process data efficiently, leading to more accurate and insightful responses.
Example: Consider a scenario where an AI assistant is asked, "What is our best-selling product, and when does it sell best?" Through MCP, the AI can directly access Tracardi's standardized event data, such as "product-purchased" events linked with product_id, order_value, and timestamps. This structured access enables the AI to analyze sales patterns, reason about them if needed, and provide precise answers regarding top-performing products and their peak sales periods.
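A minimal sketch of how such a query could be exposed as an MCP tool, using the FastMCP helper from the MCP Python SDK; the tool body and the sales data are illustrative placeholders for a real query against Tracardi's event store:

```python
# Minimal sketch: expose a Tracardi-backed aggregation as an MCP tool so
# an AI assistant can call it. Only the MCP wiring reflects the SDK; the
# sales lookup below is placeholder data.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("tracardi-insights")

@mcp.tool()
def best_selling_product() -> dict:
    """Aggregate 'product-purchased' events and report the top product."""
    sales = [
        {"product_id": "p-1", "order_value": 30, "hour": 20},
        {"product_id": "p-1", "order_value": 45, "hour": 21},
        {"product_id": "p-2", "order_value": 10, "hour": 9},
    ]
    totals: dict[str, float] = {}
    for sale in sales:
        totals[sale["product_id"]] = totals.get(sale["product_id"], 0) + sale["order_value"]
    top = max(totals, key=totals.get)
    peak_hours = [s["hour"] for s in sales if s["product_id"] == top]
    return {"product_id": top, "revenue": totals[top], "peak_hours": peak_hours}

if __name__ == "__main__":
    mcp.run()   # the AI assistant connects to this server over MCP
```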
By integrating MCP, Tracardi becomes a long-term memory of your business history.
Conclusion
The first version of Tracardi laid the foundation, but real-world usage uncovered fundamental shifts in how modern systems must be designed. From performance and fault tolerance to flexible data modeling and AI readiness, the next generation of Tracardi is being designed to be more modular, scalable, and intelligent. With these lessons, Tracardi moves beyond customer profiles toward becoming a universal fact-processing platform for all digital businesses.
If you're building a CDP or event processing platform, we hope these lessons help you avoid similar pitfalls—and maybe even spark new ideas.