Collect identities
Prepare user data for identity resolution.
Krenalis collects user data from different sources, including:
- User records from applications, databases, and files
- User traits collected from events on websites and mobile apps
User data is collected through pipelines. Each pipeline defines:
- Which users are collected
- How their data is transformed
- How it is mapped to the shared profile schema
Collected users are stored in the workspace data warehouse as identities. If an identity already exists, it is updated. Otherwise, a new identity is created. As a result, each identity represents what is known about a user from a specific source at ingestion time.
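The update-or-create behavior described above can be pictured as an upsert keyed by the identifier of the collected user. The following Python sketch uses hypothetical names throughout: the `Identity` class, the `(pipeline_id, key)` composite key, and `upsert_identity` are illustrations, not the Krenalis API or its warehouse schema. It also assumes that an update merges new trait values into the existing ones. The actual update semantics may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    # Hypothetical identity record: the pipeline that produced it, the key that
    # identifies the user within that pipeline, and the collected traits.
    pipeline_id: str
    key: str
    traits: dict = field(default_factory=dict)


def upsert_identity(store: dict, pipeline_id: str, key: str, traits: dict) -> bool:
    """Update the identity if it exists, otherwise create it.

    `store` maps (pipeline_id, key) to Identity. Returns True when a new
    identity was created and False when an existing one was updated.
    """
    store_key = (pipeline_id, key)
    existing = store.get(store_key)
    if existing is None:
        store[store_key] = Identity(pipeline_id, key, dict(traits))
        return True
    # Assumption of this sketch: new values are merged into the existing traits.
    existing.traits.update(traits)
    return False


store = {}
upsert_identity(store, "crm", "42", {"email": "ana@example.com"})  # created
upsert_identity(store, "crm", "42", {"plan": "pro"})               # updated
print(store[("crm", "42")].traits)  # {'email': 'ana@example.com', 'plan': 'pro'}
```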
Anonymous users and strategies
Users collected through event-based sources, such as websites and mobile apps, are anonymous until they are recognized (for example, when they log in), and they may become anonymous again (for example, after they log out).
As a result, a user typically transitions through three phases:
- Anonymous
- Recognized
- Anonymous again
In Krenalis, you can use strategies to determine whether activity across these phases should remain associated with separate users or be consolidated into a single user. The selected strategy is applied by a Krenalis SDK at the time of the transition between phases.
The strategy you choose affects how sessions are tracked and influences the result of the identity resolution process performed later in Krenalis. Unlike identity resolution, which is fully reversible in Krenalis, a strategy change is not retroactive: it applies only to events generated after the change, and past events remain associated as they were when they were processed.
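The following Python sketch is a toy model, not the Krenalis SDK. The `Tracker` class and the strategy names `consolidate` and `separate` are hypothetical and stand for two opposite behaviors only. It illustrates the two properties described above: the strategy is evaluated at the moment of each transition, and changing it later does not rewrite events that were already emitted.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical strategy names used only for this sketch.
CONSOLIDATE = "consolidate"  # keep one Anonymous ID across all phases
SEPARATE = "separate"        # start a new Anonymous ID at every transition


@dataclass(frozen=True)
class Event:
    name: str
    anonymous_id: str
    user_id: Optional[str]


class Tracker:
    """Toy client that applies a strategy at each phase transition."""

    def __init__(self, strategy: str) -> None:
        self.strategy = strategy
        self.anonymous_id = str(uuid.uuid4())
        self.user_id: Optional[str] = None
        self.events: List[Event] = []

    def _transition(self) -> None:
        # The strategy in effect *now* decides whether activity after this
        # transition keeps the current Anonymous ID.
        if self.strategy == SEPARATE:
            self.anonymous_id = str(uuid.uuid4())

    def track(self, name: str) -> None:
        # Each event captures the identifiers current at emission time.
        self.events.append(Event(name, self.anonymous_id, self.user_id))

    def identify(self, user_id: str) -> None:  # anonymous -> recognized
        self._transition()
        self.user_id = user_id

    def reset(self) -> None:  # recognized -> anonymous again
        self._transition()
        self.user_id = None


t = Tracker(CONSOLIDATE)
t.track("page_view")   # anonymous phase
t.identify("u-1")      # recognized: CONSOLIDATE keeps the Anonymous ID
t.track("purchase")
t.strategy = SEPARATE  # the change affects only later transitions
t.reset()              # anonymous again: SEPARATE issues a new Anonymous ID
t.track("page_view")
print(t.events[0].anonymous_id == t.events[1].anonymous_id)  # True
print(t.events[1].anonymous_id == t.events[2].anonymous_id)  # False
```

Because emitted events are immutable here, switching the strategy leaves the first two events linked by the same Anonymous ID.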
Anonymous and user identifiers
An identity can be anonymous or non-anonymous, depending on whether the user can be identified at the time the data is collected. Identification is based on the presence of a User ID, a value that uniquely identifies a user within a source.
Anonymous identities
Anonymous identities represent users who interact with a website or mobile app before a User ID becomes available. They are created only for event-based sources, such as websites and mobile apps. Batch sources, including databases and files, always produce non-anonymous identities.
Each anonymous identity is uniquely identified by an Anonymous ID, which is always present on events and is used to group events that belong to the same person while the user is still unknown. An identity remains anonymous until a User ID becomes available.
Non-anonymous identities
A non-anonymous identity represents a user who is known within a source.
Each non-anonymous identity includes a User ID that uniquely identifies the user in that source. The User ID varies depending on the source type:
- Applications: the User ID is the identifier used by the application to uniquely identify a user.
- Databases: the User ID is read from the column configured in the pipeline to represent the identity.
- Files: the User ID is read from the column configured in the pipeline to represent the identity.
- Event-based sources (websites and mobile apps): the User ID is provided through events when the user is identified.
As long as a User ID is present, the identity is considered non-anonymous.
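The rules above can be condensed into a small classification function. This Python sketch is hypothetical: the `identity_kind` name and the `source_type` labels (`"event"` for websites and mobile apps, anything else for applications, databases, and files) are assumptions, and raising an error when a non-event source lacks a User ID is a choice made for illustration.

```python
from typing import Optional


def identity_kind(source_type: str, user_id: Optional[str], anonymous_id: Optional[str]) -> str:
    """Classify a collected user as 'anonymous' or 'non-anonymous'."""
    if user_id:
        # A User ID is present: the identity is non-anonymous, whatever the source.
        return "non-anonymous"
    if source_type != "event":
        # Applications, databases, and files never produce anonymous
        # identities, so a missing User ID is treated as an error here.
        raise ValueError("non-event sources require a User ID")
    if not anonymous_id:
        raise ValueError("events always carry an Anonymous ID")
    return "anonymous"


print(identity_kind("event", None, "a-1"))     # anonymous
print(identity_kind("event", "u-1", "a-1"))    # non-anonymous
print(identity_kind("database", "42", None))   # non-anonymous
```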
Choosing the User ID
The User ID must uniquely identify a user within a source connection. If two different users share the same User ID for the same connection, their data ends up in the same identity, each update overwriting the other, which leads to unexpected results.
The User ID should be unique within the connection, stable over time, and consistent across pipelines of the same connection.
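To see why uniqueness matters, the following Python sketch models ingestion as a dictionary keyed by User ID. The `ingest` function and the sample rows are hypothetical, and last-write-wins is a simplification of the overwriting described above.

```python
def ingest(rows, user_id_column):
    """Toy ingestion: one identity per User ID; later rows overwrite earlier ones."""
    identities = {}
    for row in rows:
        identities[row[user_id_column]] = dict(row)
    return identities


rows = [
    {"id": 1, "team": "sales", "email": "ana@example.com"},
    {"id": 2, "team": "sales", "email": "bo@example.com"},
]

# Good choice: 'id' is unique, so each user keeps a separate identity.
print(len(ingest(rows, "id")))  # 2
# Bad choice: 'team' is shared, so Ana's data is overwritten by Bo's.
print(ingest(rows, "team"))  # {'sales': {'id': 2, 'team': 'sales', 'email': 'bo@example.com'}}
```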
Applications
Applications already provide a stable unique identifier for each user (for example, the HubSpot ID). You don't need to provide one.
Databases
Choose a column that uniquely identifies each row returned by your query. Good choices include a primary key or a stable business identifier that is guaranteed to be unique. If you have multiple pipelines under the same connection, they must all use a consistent User ID. In practice, this means they should all refer to the same identifier domain (for example, the same user primary key), so identities from the same connection can be reconciled correctly.
Files
Choose a column that uniquely identifies each user record in the file. Good choices include an internal user ID or a stable key that is guaranteed to be unique within the file. As with database pipelines, if you have multiple pipelines under the same connection, they must all use a consistent User ID.
Event-based sources
For event-based sources, such as websites and mobile apps, choose a property that uniquely identifies a user once they are identified, typically an internal user ID from your application. Avoid identifiers that can change over time (for example, an email address) unless you are certain they are stable and unique.
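Before configuring a database or file pipeline, you might check a candidate column against a sample of the rows it will read. The Python sketch below is a hypothetical helper, not part of Krenalis, and covers only the checks that can be made on a single snapshot: every row has a value, and no value repeats.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def check_user_id_column(rows: Iterable[Mapping], column: str) -> List[str]:
    """Return the problems that make `column` unsuitable as a User ID.

    Stability over time and consistency across pipelines cannot be
    verified from one snapshot, so they are not checked here.
    """
    values = [row.get(column) for row in rows]
    problems = []
    missing = sum(1 for v in values if v is None or v == "")
    if missing:
        problems.append(f"{missing} row(s) without a value in '{column}'")
    counts = Counter(v for v in values if v is not None and v != "")
    duplicates = sorted(str(v) for v, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate values in '{column}': {', '.join(duplicates)}")
    return problems


rows = [
    {"user_id": "u-1", "email": "ana@example.com"},
    {"user_id": "u-2", "email": "ana@example.com"},
    {"user_id": "u-3", "email": ""},
]
print(check_user_id_column(rows, "user_id"))  # []
print(check_user_id_column(rows, "email"))
# ["1 row(s) without a value in 'email'", "duplicate values in 'email': ana@example.com"]
```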
From anonymous to non-anonymous
At some point, a user may be recognized, for example after logging in or signing up. When this happens, a User ID becomes available. Depending on the identity strategy configured for the connection, the anonymous identity may be converted into a non-anonymous one, or a new non-anonymous identity may be created.
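The two outcomes can be sketched as follows. This Python code is a hypothetical model of one event-based pipeline's identities. The `("anon", …)` and `("user", …)` key scheme and the `convert` flag, which stands in for whatever the configured strategy decides, are assumptions. So is keeping the anonymous identity unchanged when a new one is created.

```python
from typing import Dict, Tuple

# Identities of one pipeline, keyed by ("anon", Anonymous ID) or ("user", User ID).
Store = Dict[Tuple[str, str], dict]


def on_recognized(store: Store, anonymous_id: str, user_id: str, convert: bool) -> Tuple[str, str]:
    """Handle the moment a User ID becomes available for an anonymous user.

    Returns the key of the resulting non-anonymous identity. For brevity,
    this sketch does not handle a User ID that already has an identity.
    """
    anon_key = ("anon", anonymous_id)
    user_key = ("user", user_id)
    if convert and anon_key in store:
        # Conversion: the anonymous identity now carries a User ID.
        identity = store.pop(anon_key)
        identity["user_id"] = user_id
        store[user_key] = identity
    else:
        # Creation: a separate non-anonymous identity is added and the
        # anonymous identity, if any, is left untouched.
        store[user_key] = {"anonymous_id": anonymous_id, "user_id": user_id}
    return user_key


store = {("anon", "a-1"): {"anonymous_id": "a-1", "user_id": None}}
on_recognized(store, "a-1", "u-1", convert=True)
print(sorted(store))  # [('user', 'u-1')]

store = {("anon", "a-1"): {"anonymous_id": "a-1", "user_id": None}}
on_recognized(store, "a-1", "u-1", convert=False)
print(sorted(store))  # [('anon', 'a-1'), ('user', 'u-1')]
```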
Key points
- An identity represents a user as seen by a single pipeline.
- Anonymous identities do not have a User ID.
- Non-anonymous identities always have a User ID.
- Anonymous identities are identified by an Anonymous ID.
- Anonymous identities exist only for event-based sources.
- Both User ID and Anonymous ID are core inputs to identity resolution.
Pipeline isolation
Identities created by a pipeline are stored separately from those created by other pipelines. This keeps identities isolated by pipeline and preserves a clear link to their origin.
Because identities are not mixed at ingestion time:
- changes to one pipeline never affect the others
- removing a pipeline removes only the identities it created
- profiles can always be rebuilt from the remaining identities
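The following Python sketch illustrates these properties with hypothetical data structures. Identities are kept in per-pipeline partitions, removing a pipeline drops only its partition, and profiles are recomputed from whatever remains. The email-matching rule in `rebuild_profiles` is a stand-in, not Krenalis's identity resolution logic.

```python
from collections import defaultdict
from typing import Dict, List

# Identities partitioned by pipeline: pipeline ID -> list of identities.
Identities = Dict[str, List[dict]]


def remove_pipeline(identities: Identities, pipeline_id: str) -> Identities:
    """Return a copy without the identities created by `pipeline_id`."""
    return {p: ids for p, ids in identities.items() if p != pipeline_id}


def rebuild_profiles(identities: Identities) -> Dict[str, List[str]]:
    """Toy identity resolution: group identities that share an email.

    Sorting the output makes it depend only on the input identities,
    not on the order in which pipelines or identities are stored.
    """
    profiles = defaultdict(list)
    for pipeline_id in sorted(identities):
        for identity in identities[pipeline_id]:
            profiles[identity["email"]].append(f"{pipeline_id}:{identity['id']}")
    return {email: sorted(members) for email, members in sorted(profiles.items())}


identities = {
    "crm": [{"id": "42", "email": "ana@example.com"}],
    "web": [{"id": "u-1", "email": "ana@example.com"}, {"id": "u-2", "email": "bo@example.com"}],
}
print(rebuild_profiles(identities))
# {'ana@example.com': ['crm:42', 'web:u-1'], 'bo@example.com': ['web:u-2']}
print(rebuild_profiles(remove_pipeline(identities, "web")))
# {'ana@example.com': ['crm:42']}
```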
This design provides several benefits:
- Safe experimentation: Pipelines can be added, changed, or removed without risking existing profiles.
- Clean rollbacks: If a pipeline produces incorrect data, it can be removed and profiles rebuilt as if it never existed.
- Clear data ownership: Every identity can be traced back to the pipeline that created it.
- Predictable results: Given the same identities and configuration, identity resolution always produces the same profiles.
- No hidden side effects: Pipelines do not interfere with each other.
Pipeline isolation ensures that identity resolution remains reliable, transparent, and reproducible over time.