← Blog

Wednesday, July 1, 2026

The real CDP decision: where should your customer profile live?

Most Customer Data Platofrms (CDPs) discussions I see start with features: connectors, audience builders, identity resolution, real-time events, and reverse ETL. But the harder question usually sits one level below the feature list: where should the customer profile actually live?

Imagine an AI assistant preparing a next-best-action suggestion for a sales rep. It can see signals from the CRM, billing, product usage, and the warehouse, but those systems may not agree on what the customer is doing or which state the account is really in. Before the assistant can recommend anything useful, the company has to decide which customer profile should be trusted, and where that profile should be built.

This is the basic prerequisite behind many AI use cases in marketing, sales, and customer success: a reliable view of the customer.

Whether the use case is an assistant that suggests the next best action for a sales rep, a workflow that prioritizes accounts at risk, or a campaign that adapts its message to product usage, they all need the same basic thing. They need to know who the customer is, what they did, which systems know something about them, and which version of that information should be trusted.

That sounds obvious until you look at how customer data usually lives inside a company. The CRM may hold contact and account details, billing may know the plan and payment history, the product database may have users and events, while support, email, and analytics tools each keep their own useful but partial view. Somewhere there may also be a data warehouse, where some or all of these signals are copied, modeled, and analyzed.

At that point, “the customer” is not a single record. It is closer to a distributed system, and this is the kind of problem Customer Data Platforms were created to solve.

The real CDP decision is not only which tool has the best connectors or audience builder. It is where the customer profile should live.

What a CDP is really doing

A Customer Data Platform, or CDP, is a system that collects customer data from different sources, connects records and identifiers that belong to the same person or account, builds customer profiles and segments, and makes those profiles available to other tools.

In practice, most CDPs deal with four broad jobs: collecting data, resolving identities, building profiles or audiences, and activating those audiences in downstream systems such as CRMs, email tools, ad platforms, support tools, or product engagement platforms.

For example, a CDP may receive an anonymous web event, later connect it to a logged-in user, enrich that user with CRM and billing data, place the user into a segment, and sync that segment to a marketing automation tool.

The value is easy to understand. Instead of every tool keeping its own partial view of the customer, the company gets a more coherent customer profile that can be used across teams.

The important question is not whether this is useful. It usually is. The more important question is where this profile should live.

The traditional model: the CDP as the center

Traditional CDPs solve the problem by creating a specialized platform with its own storage, data model, identity resolution logic, segmentation engine, and activation connectors.

The architecture is roughly this:

Sources → CDP database → Customer profiles / segments → Destinations

This model made a lot of sense, especially when many companies did not have a mature data warehouse or a data team able to build and maintain this infrastructure internally. A traditional CDP gave marketing and growth teams a ready-made environment to collect events, unify identities, build audiences, and push data to other tools without waiting months for a custom data platform.

This is worth saying clearly: traditional CDPs solved a real problem. They made customer data operational, not just analytical. They gave non-technical teams a way to work with data that would otherwise have stayed trapped in databases, logs, exports, or tickets in the data team backlog.

The trade-off is architectural.

A traditional CDP often becomes another place where customer data lives. Data is copied from source systems into the CDP. The CDP builds its own representation of the customer. That representation may then coexist with another one in the data warehouse, another one in the CRM, and several partial versions in downstream tools.

That can work, but it introduces familiar problems: duplication, synchronization logic, possible inconsistencies, vendor lock-in, and governance that is split across multiple systems. If the CDP says a customer is “active” and the warehouse says they are “at risk,” which one is correct? If an AI workflow uses a customer profile, which version should it use?

These are not just philosophical questions. They become practical as soon as customer data starts driving automated decisions.

Why warehouse-first changed the conversation

Warehouse-first comes from a different context.

In many companies today, the data warehouse is no longer just a place for dashboards and monthly reports. It is where product events, CRM data, billing data, support data, marketing data, and finance data are combined. Whether the stack is built on Snowflake, BigQuery, Redshift, Databricks, or something else, the warehouse is often the closest thing the company has to a governed data foundation.

At that point, the CDP conversation changes. If the warehouse is already where the company combines and governs its data, it is reasonable to ask why the canonical customer profile should live somewhere else.

A warehouse-first CDP starts from this premise. Instead of treating the CDP vendor’s database as the center, it treats the customer’s warehouse as the center.

The architecture becomes closer to this:

Sources → Customer warehouse → Customer profiles / segments → Destinations

The CDP still has a job to do. It may handle ingestion, identity resolution, profile building, audience management, reverse ETL, or user-facing workflows. But the key difference is that the profile is built in the customer's data environment, not inside a proprietary external silo.

For developers and data teams, this is the part that matters. The warehouse is queryable. It can be inspected. Models can be versioned. Lineage can be understood. Access control can follow existing policies. Profiles and segments can be reused by analytics, product, finance, customer success, and AI workflows, not only by marketing tools.

Warehouse-first is not always better, but it aligns the CDP with the place where many companies already manage their most important data assets.

Warehouse-first is not one architecture

Part of the confusion is that “warehouse-first CDP” is not a single architecture. It is a family of approaches.

Some tools are mostly read-in-place. They connect to tables that already exist in the warehouse, let users define audiences or mappings, and materialize only the outputs needed for activation.

Some tools are closer to reverse ETL platforms. They assume the warehouse already contains clean customer tables, traits, and segments, and focus on syncing those records to SaaS applications like Salesforce, HubSpot, Braze, Customer.io, Intercom, or advertising platforms.

Other systems take a more managed approach. They collect events, import identities and attributes from SaaS tools, and normalize everything into a canonical CDP schema inside the customer’s warehouse. On top of that schema, they perform identity resolution, build unified profiles, and activate those profiles back to downstream systems.

That last model matters because it sits between a pure “bring your own modeled tables” approach and a traditional CDP. It does create normalized tables, and in some cases that means duplicating data that was already present in the warehouse. But the copy lives in the customer’s warehouse, follows a schema the customer can inspect and control, and becomes part of the company’s own data architecture rather than a black box owned by the CDP vendor.

That is a small architectural detail with a large practical impact.

Duplication is not the real issue

Warehouse-first CDPs are often described as a way to avoid data duplication. That is directionally true compared with a traditional CDP that stores another customer database outside the warehouse, but it can also be misleading.

Some duplication is unavoidable. If customer data lives in SaaS tools, and you want to combine it with product usage, billing, support, and marketing data, you usually need to copy it into a common environment. If data already exists in the warehouse, it may still be useful to normalize it into a common schema for identity resolution, profile building, and activation.

So the real distinction is not “duplication or no duplication.” The better distinction is: who controls the duplication, where does it happen, and what purpose does it serve?

In a traditional CDP, the normalized customer profile often lives in the vendor's platform. In a warehouse-first model, any necessary materialization happens in the customer's warehouse. That does not eliminate every copy, but it avoids creating a separate proprietary customer data silo.

For a technical audience, this is the more honest way to frame it. Warehouse-first is not magic. It is an architectural choice. It says that if customer profiles, identities, and segments are strategic data assets, they should be built where the company can query them, govern them, reuse them, and move away from a vendor without losing the foundation of its customer data work.

AI raises the stakes

AI makes this discussion harder to ignore because it puts more pressure on the quality and consistency of customer context.

If AI systems are going to summarize accounts, suggest actions, personalize messages, classify users, detect churn risk, or update CRM records, they need customer context. That context cannot be a random mix of stale attributes, conflicting identifiers, and partial records from disconnected tools.

A traditional CDP can support AI use cases too. That is not the issue. But the warehouse-first model is often a more natural fit when AI needs cross-functional context: product usage, revenue, support, lifecycle, consent, account hierarchy, and operational history. Those signals often already exist, or should exist, in the warehouse.

In that sense, AI does not replace the need for a CDP. It raises the bar for what the underlying customer data layer needs to provide.

A CDP is an architecture decision

The comparison between traditional CDPs and warehouse-first CDPs is sometimes framed as a feature comparison: connectors, audience builder, real-time events, identity resolution, reverse ETL, and so on. Those features matter, but they are not the decision that shapes the architecture.

The core issue is where the customer profile should live. A traditional CDP builds it inside the CDP platform and activates it from there; a warehouse-first CDP builds it in the customer’s warehouse and uses the CDP layer to make it operational.

Both approaches can be valid, depending on the company’s maturity, team structure, use cases, and urgency. A marketing team without a mature data warehouse may get value from a traditional CDP faster. A company with a strong warehouse, a data team, and broader use cases across product, sales, customer success, and AI may prefer to keep the customer profile closer to its existing data foundation.

That trade-off should be explicit from the start. A CDP is not only a tool for sending audiences to other tools. It is a decision about the architecture of customer data.

Once customer data starts feeding automation and AI, that decision is no longer just an implementation detail.