Import users from files

This type of pipeline imports users from a file into the workspace's data warehouse. It is available only for source file storage connections.

Create pipeline

Create a source pipeline that imports users from a file.

Request

  • name

    string Required

    The pipeline's name.

    Must be a non-empty string with a maximum of 60 characters.
  • connection

    int Required

    The ID of the connection from which to read the users. It must be a source file storage.

  • target

    string Required

    The entity on which the pipeline operates, which must be "User" in order to create a pipeline that imports users.

    Possible values: "User".
  • enabled

    boolean

    Indicates if the pipeline is enabled once created.

  • format

    string Required

    The file format. It corresponds to the code of a file connector.

    Possible values: "csv", "excel", "parquet" or "json".
  • path

    string Required

    The file path relative to the root path defined in the file storage connection. Refer to the file storage connector documentation for details on the specific format.

    Must be a non-empty string with a maximum of 1024 characters.
  • sheet

    string Conditionally Required

    The sheet name. It can only be used with the "excel" format, where it is required.

    When provided, it must be between 1 and 31 characters long, must not start or end with a single quote ('), and must not contain any of the following characters: *, /, :, ?, [, \, or ].
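
These constraints can be checked client-side before submitting a request. A minimal sketch (the function name is illustrative, not part of the API):

```python
# Characters that are not allowed anywhere in a sheet name.
_FORBIDDEN = set('*/:?[\\]')

def is_valid_sheet_name(name: str) -> bool:
    """Check a sheet name against the documented constraints."""
    if not 1 <= len(name) <= 31:
        return False
    if name.startswith("'") or name.endswith("'"):
        return False
    return not any(ch in _FORBIDDEN for ch in name)
```
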

  • compression

    string

    The compression format of the file. It is empty if the file is not compressed.

    Note that an Excel file is inherently compressed, so no compression format needs to be specified unless the file has been further compressed.

    Possible values: "", "Zip", "Gzip" or "Snappy".
  • formatSettings

    nullable json

    The specific settings of the pipeline, which vary based on the file connector specified in the format field.

    Please refer to the page that documents the settings for each connector type.

  • filter

    nullable object

    The filter applied to the users in the file. If it's not null, only the users that match the filter will be included.

    See the filters documentation for more details.

    • filter.logical

      string Required Possible values: "and" or "or".
    • filter.conditions

      array of object Required

      The filter's conditions.

      • property

        string Required

        The name or path of the property. If the property has a json type, it can include a json path.

      • operator

        string Required

        The condition's operator. The allowed values depend on the property's type.

        Possible values: "is", "is not", "is less than", "is less than or equal to", "is greater than", "is greater than or equal to", "is between", "is not between", "contains", "does not contain", "is one of", "is not one of", "starts with", "ends with", "is before", "is on or before", "is after", "is on or after", "is true", "is false", "is empty", "is not empty", "is null", "is not null", "exists" or "does not exist".
      • values

        array of string

        The values the operator applies to, if any. Whether any values are present, and how many, depends on both the operator and the property's type.
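
To illustrate the filter semantics, here is a sketch of how such an object selects records. It is purely illustrative (not the service's implementation) and covers only the "is" and "is one of" operators:

```python
def matches(record: dict, flt: dict) -> bool:
    """Evaluate a filter object against a single record.

    Illustrative only: handles just the "is" and "is one of" operators.
    """
    def check(cond: dict) -> bool:
        value = record.get(cond["property"])
        if cond["operator"] == "is":
            return value == cond["values"][0]
        if cond["operator"] == "is one of":
            return value in cond["values"]
        raise NotImplementedError(cond["operator"])

    results = [check(c) for c in flt["conditions"]]
    return all(results) if flt["logical"] == "and" else any(results)
```

For example, the filter used in the request example below keeps a record only when its country column equals "US".
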

  • userIDColumn

    string Required

    The column in the file that uniquely identifies each user in the connection. It serves as the unique identifier for each user record.

    Only columns with types corresponding to the following Krenalis types can be used as an identity: string, int, uuid, and json.

    Must be a non-empty string with a maximum of 1024 characters.
  • updatedAtColumn

    string

    The column that stores the date when a user record was last updated. It tracks the most recent modification made to the user's data, helping to identify when changes occurred.

    The value of this column is used for incremental imports, where only records that have been modified since the last import need to be processed.

    Only columns with types corresponding to the following Krenalis types can be used as the update time: string, datetime, date, and json.

    It cannot be longer than 1024 characters.
  • updatedAtFormat

    string Conditionally Required

    The format of the value in the update time column. Set it to "ISO8601" if the column value follows the ISO 8601 format. If the format is "excel", it can also be set to "Excel". Otherwise, it must follow a format accepted by the Python strftime function.

    This field is required only if updatedAtColumn is provided, is not empty, and has type string or json.

    Must be a non-empty string with a maximum of 64 characters.
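
A custom format can be sanity-checked against a sample column value with Python itself. The sample value and format below are illustrative:

```python
from datetime import datetime

# A sample value from the update time column and a candidate
# strftime-style format for updatedAtFormat.
sample = "2024-03-01 17:45:00"
fmt = "%Y-%m-%d %H:%M:%S"

# strptime raises ValueError if the format does not match the sample.
parsed = datetime.strptime(sample, fmt)
```
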
  • incremental

    boolean

    Determines whether users are imported incrementally:

    • true: only users whose update time is equal to or later than the last imported user's change time are imported.
    • false: all users are imported again, regardless of their update time. This is the default.

    If set to true, a column for the update time must be specified (i.e., updatedAtColumn is not null).
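
Conceptually, the incremental flag amounts to the following selection step (a sketch, not the service's implementation; the watermark, i.e. the last imported user's change time, would be carried over from the previous run):

```python
def select_for_import(rows, incremental, watermark, column="updated_at"):
    """Return the rows to import.

    Full import: every row. Incremental import: only rows whose update
    time is equal to or later than the watermark.
    """
    if not incremental or watermark is None:
        return list(rows)
    return [row for row in rows if row[column] >= watermark]
```
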

  • transformation

    object Conditionally Required

    The mapping or function responsible for transforming file users into user identities linked to the pipeline. Once the identity resolution process is complete, the user identities associated with all pipelines are merged into unified users.

    Either a mapping or a function must be provided, but not both. The one that is not provided can be either missing or set to null.

    • transformation.function

      nullable object Conditionally Required

      The transformation function. A JavaScript or Python function that, given a user in the file, returns an identity.

      • transformation.function.source

        string Required

        The source code of the JavaScript or Python function.

        Must be a non-empty string with a maximum of 50000 characters.
      • transformation.function.language

        string Required

        The language of the function.

        Possible values: "JavaScript" or "Python".
      • transformation.function.preserveJSON

        boolean

        Specifies whether JSON values are passed to and returned from the function as strings, keeping their original format without any encoding or decoding.

      • transformation.function.inPaths

        array of string Required

        The paths of the properties that will be passed to the function. At least one path must be present.

      • transformation.function.outPaths

        array of string Required

        The paths of the properties that may be returned by the function. At least one path must be present.
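
As an illustration, a Python function suitable for transformation.function.source could look like the following. The property names match the request example below; the email normalization is an arbitrary illustrative choice:

```python
def transform(user: dict) -> dict:
    """Map a file row (inPaths properties) to an identity (outPaths properties)."""
    return {
        "email": user["email"].strip().lower(),
        "first_name": user["firstName"],
        "last_name": user["lastName"],
    }
```
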

  • inSchema

    schema Required

    The schema for the properties used in the filter, the identity column, the update time column, and the input properties for the transformation.

    When importing users from files, this should be a subset of the file schema.

  • outSchema

    schema Required

    The schema for the output properties of the transformation.

    When importing users from files, this should be a subset of the profile schema.

Response

  • id

    int

    The ID of the pipeline.

POST /v1/pipelines
curl https://example.com/v1/pipelines \
  -H "Authorization: Bearer api_xxxxxxx" \
  --json '{
    "name": "Newsletter Subscribers",
    "connection": 230527183,
    "target": "User",
    "enabled": true,
    "format": "excel",
    "path": "subscribers.xlsx",
    "sheet": "Sheet1",
    "formatSettings": {
      "HasColumnNames": true
    },
    "filter": {
      "logical": "and",
      "conditions": [
        {
          "property": "country",
          "operator": "is",
          "values": ["US"]
        }
      ]
    },
    "userIDColumn": "email",
    "updatedAtColumn": "updated_at",
    "updatedAtFormat": "ISO8601",
    "incremental": true,
    "transformation": {
      "function": {
        "source": "def transform(user: dict) -> dict:\n\treturn {}\n",
        "language": "Python",
        "preserveJSON": false,
        "inPaths": ["email", "firstName", "lastName"],
        "outPaths": ["email", "first_name", "last_name"]
      }
    },
    "inSchema": {
      "kind": "object",
      "properties": [
        { "name": "email", "type": { "kind": "string" } },
        { "name": "firstName", "type": { "kind": "string" } },
        { "name": "lastName", "type": { "kind": "string" } },
        { "name": "country", "type": { "kind": "string" } },
        { "name": "updated_at", "type": { "kind": "string", "maxLength": 60 } }
      ]
    },
    "outSchema": {
      "kind": "object",
      "properties": [
        {
          "name": "first_name",
          "type": { "kind": "string", "maxLength": 100 },
          "readOptional": true,
          "description": "First name"
        },
        {
          "name": "last_name",
          "type": { "kind": "string", "maxLength": 100 },
          "readOptional": true,
          "description": "Last name"
        },
        {
          "name": "email",
          "type": { "kind": "string", "maxLength": 254 },
          "readOptional": true,
          "description": "Email"
        }
      ]
    }
  }'
Response
{
  "id": 285017124
}
Errors
  • 404: workspace does not exist
  • 422: connection does not exist
  • 422: format does not exist
  • 422: format settings are not valid
  • 422: transformation language is not supported

Update pipeline

Update a source pipeline that imports users from a file.

Request

  • :id

    int Required

    The ID of the source file pipeline.

  • name

    string Required

    The pipeline's name.

    Must be a non-empty string with a maximum of 60 characters.
  • enabled

    boolean

    Indicates if the pipeline is enabled. Use the Set status endpoint to change only the pipeline's status.

  • format

    string Required

    The file format. It corresponds to the code of a file connector.

    Possible values: "csv", "excel", "parquet" or "json".
  • path

    string Required

    The file path relative to the root path defined in the file storage connection. Refer to the file storage connector documentation for details on the specific format.

    Must be a non-empty string with a maximum of 1024 characters.
  • sheet

    string Conditionally Required

    The sheet name. It can only be used with the "excel" format, where it is required.

    When provided, it must be between 1 and 31 characters long, must not start or end with a single quote ('), and must not contain any of the following characters: *, /, :, ?, [, \, or ].

  • compression

    string

    The compression format of the file. It is empty if the file is not compressed.

    Note that an Excel file is inherently compressed, so no compression format needs to be specified unless the file has been further compressed.

    Possible values: "", "Zip", "Gzip" or "Snappy".
  • formatSettings

    nullable json

    The specific settings of the pipeline, which vary based on the file connector specified in the format field.

    Please refer to the page that documents the settings for each connector type.

  • filter

    nullable object

    The filter applied to the users in the file. If it's not null, only the users that match the filter will be included.

    See the filters documentation for more details.

    • filter.logical

      string Required Possible values: "and" or "or".
    • filter.conditions

      array of object Required

      The filter's conditions.

      • property

        string Required

        The name or path of the property. If the property has a json type, it can include a json path.

      • operator

        string Required

        The condition's operator. The allowed values depend on the property's type.

        Possible values: "is", "is not", "is less than", "is less than or equal to", "is greater than", "is greater than or equal to", "is between", "is not between", "contains", "does not contain", "is one of", "is not one of", "starts with", "ends with", "is before", "is on or before", "is after", "is on or after", "is true", "is false", "is empty", "is not empty", "is null", "is not null", "exists" or "does not exist".
      • values

        array of string

        The values the operator applies to, if any. Whether any values are present, and how many, depends on both the operator and the property's type.

  • userIDColumn

    string Required

    The column in the file that uniquely identifies each user in the connection. It serves as the unique identifier for each user record.

    Only columns with types corresponding to the following Krenalis types can be used as an identity: string, int, uuid, and json.

    Must be a non-empty string with a maximum of 1024 characters.
  • updatedAtColumn

    string

    The column that stores the date when a user record was last updated. It tracks the most recent modification made to the user's data, helping to identify when changes occurred.

    The value of this column is used for incremental imports, where only records that have been modified since the last import need to be processed.

    Only columns with types corresponding to the following Krenalis types can be used as the update time: string, datetime, date, and json.

    It cannot be longer than 1024 characters.
  • updatedAtFormat

    string Conditionally Required

    The format of the value in the update time column. Set it to "ISO8601" if the column value follows the ISO 8601 format. If the format is "excel", it can also be set to "Excel". Otherwise, it must follow a format accepted by the Python strftime function.

    This field is required only if updatedAtColumn is provided, is not empty, and has type string or json.

    Must be a non-empty string with a maximum of 64 characters.
  • incremental

    boolean

    Determines whether users are imported incrementally:

    • true: only users whose update time is equal to or later than the last imported user's change time are imported.
    • false: all users are imported again, regardless of their update time. This is the default.

    If set to true, a column for the update time must be specified (i.e., updatedAtColumn is not null).

  • transformation

    object Conditionally Required

    The mapping or function responsible for transforming file users into user identities linked to the pipeline. Once the identity resolution process is complete, the user identities associated with all pipelines are merged into unified users.

    Either a mapping or a function must be provided, but not both. The one that is not provided can be either missing or set to null.

    • transformation.function

      nullable object Conditionally Required

      The transformation function. A JavaScript or Python function that, given a user in the file, returns an identity.

      • transformation.function.source

        string Required

        The source code of the JavaScript or Python function.

        Must be a non-empty string with a maximum of 50000 characters.
      • transformation.function.language

        string Required

        The language of the function.

        Possible values: "JavaScript" or "Python".
      • transformation.function.preserveJSON

        boolean

        Specifies whether JSON values are passed to and returned from the function as strings, keeping their original format without any encoding or decoding.

      • transformation.function.inPaths

        array of string Required

        The paths of the properties that will be passed to the function. At least one path must be present.

      • transformation.function.outPaths

        array of string Required

        The paths of the properties that may be returned by the function. At least one path must be present.

  • inSchema

    schema Required

    The schema for the properties used in the filter, the identity column, the update time column, and the input properties for the transformation.

    When importing users from files, this should be a subset of the file schema.

  • outSchema

    schema Required

    The schema for the output properties of the transformation.

    When importing users from files, this should be a subset of the profile schema.

Response

No response.
PUT /v1/pipelines/:id
curl -X PUT https://example.com/v1/pipelines/705981339 \
  -H "Authorization: Bearer api_xxxxxxx" \
  --json '{
    "name": "Newsletter Subscribers",
    "enabled": true,
    "format": "excel",
    "path": "subscribers.xlsx",
    "sheet": "Sheet1",
    "formatSettings": {
      "HasColumnNames": true
    },
    "filter": {
      "logical": "and",
      "conditions": [
        {
          "property": "country",
          "operator": "is",
          "values": ["US"]
        }
      ]
    },
    "userIDColumn": "email",
    "updatedAtColumn": "updated_at",
    "updatedAtFormat": "ISO8601",
    "incremental": true,
    "transformation": {
      "function": {
        "source": "def transform(user: dict) -> dict:\n\treturn {}\n",
        "language": "Python",
        "preserveJSON": false,
        "inPaths": ["email", "firstName", "lastName"],
        "outPaths": ["email", "first_name", "last_name"]
      }
    },
    "inSchema": {
      "kind": "object",
      "properties": [
        { "name": "email", "type": { "kind": "string" } },
        { "name": "firstName", "type": { "kind": "string" } },
        { "name": "lastName", "type": { "kind": "string" } },
        { "name": "country", "type": { "kind": "string" } },
        { "name": "updated_at", "type": { "kind": "string", "maxLength": 60 } }
      ]
    },
    "outSchema": {
      "kind": "object",
      "properties": [
        {
          "name": "first_name",
          "type": { "kind": "string", "maxLength": 100 },
          "readOptional": true,
          "description": "First name"
        },
        {
          "name": "last_name",
          "type": { "kind": "string", "maxLength": 100 },
          "readOptional": true,
          "description": "Last name"
        },
        {
          "name": "email",
          "type": { "kind": "string", "maxLength": 254 },
          "readOptional": true,
          "description": "Email"
        }
      ]
    }
  }'
Errors
  • 404: workspace does not exist
  • 404: pipeline does not exist
  • 422: format does not exist
  • 422: format settings are not valid
  • 422: transformation language is not supported

Get pipeline

Get a source pipeline that imports users from a file.

Request

  • :id

    int Required

    The ID of the source file pipeline.

Response

  • id

    int

    The ID of the source pipeline.

  • name

    string

    The pipeline's name.

    It is not longer than 60 characters.
  • connector

    string

    The code of the connection's connector.

  • connectorType

    string

    The type of the connection's connector. It is always "FileStorage" when the pipeline imports users from a file.

    Possible values: "Application", "Database", "FileStorage", "SDK", "MessageBroker" or "Webhook".
  • connection

    int

    The ID of the connection from which the file is read. It is a source file storage.

  • connectionRole

    string

    The role of the pipeline's connection. It is always "Source" when the pipeline imports users from a file.

    Possible values: "Source" or "Destination".
  • target

    string

    The entity on which the pipeline operates. It is always "User" when the pipeline imports users from a file.

    Possible values: "User" or "Event".
  • enabled

    boolean

    Indicates if the pipeline is enabled.

  • format

    string

    The file format. It corresponds to the code of a file connector.

    Possible values: "csv", "excel", "parquet" or "json".
  • path

    string

    The file path relative to the root path defined in the file storage connection. Refer to the file storage connector documentation for details on the specific format.

    It is not longer than 1024 characters.
  • sheet

    nullable string

    The name of the sheet. It is empty if the format is not "excel".

  • compression

    string

    The compression format of the file. It is empty if the file is not compressed.

    Note that an Excel file is inherently compressed, so no compression format needs to be specified unless the file has been further compressed.

    Possible values: "", "Zip", "Gzip" or "Snappy".
  • userIDColumn

    string

    The column in the file that uniquely identifies each user in the connection.

  • updatedAtColumn

    nullable string

    The column that stores the timestamp of the last update to a user record. It is null if no such column exists.

  • updatedAtFormat

    nullable string

    The format of the value in the update time column. It is null if no such column exists or if the corresponding Krenalis type is datetime or date.

    It is "ISO8601" if the column value follows the ISO 8601 format. It is "Excel" if the format is "excel" and the column value follows the Excel format. Otherwise, it follows the format accepted by the Python strftime function.

  • incremental

    boolean

    Indicates whether users are imported incrementally:

    • true: only users whose update time is equal to or later than the last imported user's change time are imported.
    • false: all users are imported again, regardless of their update time.
  • transformation

    object

    The mapping or function responsible for transforming file users into user identities linked to the pipeline. Once the identity resolution process is complete, the user identities associated with all pipelines are merged into unified users.

    Either a mapping or a function is present, but not both. The one that is not present is null.

    • transformation.mapping

      nullable object with string values

      The transformation mapping. A key represents a property path in the profile schema, and its corresponding value is an expression. This expression can reference columns of the file.

    • transformation.function

      nullable object

      The transformation function. A JavaScript or Python function that, given a user in the file, returns an identity.

      • transformation.function.source

        string

        The source code of the JavaScript or Python function.

        It is not longer than 50000 characters.
      • transformation.function.language

        string

        The language of the function.

        Possible values: "JavaScript" or "Python".
      • transformation.function.preserveJSON

        boolean

        Specifies whether JSON values are passed to and returned from the function as strings, keeping their original format without any encoding or decoding.

      • transformation.function.inPaths

        array of string

        The paths of the properties that will be passed to the function. At least one path must be present.

      • transformation.function.outPaths

        array of string

        The paths of the properties that may be returned by the function. At least one path must be present.

  • inSchema

    schema

    The schema for the properties used in the filter, the identity column, the update time column, and the input properties for the transformation.

  • outSchema

    schema

    The schema for the output properties of the transformation.

  • running

    boolean

    Indicates if the pipeline is running.

  • scheduleStart

    nullable int

    The start time of the schedule in minutes, counting from 00:00. It specifies the minute when the first scheduled run of the day begins. Subsequent runs occur based on the interval defined by the scheduler period. If the scheduler is disabled, this value is null.

  • schedulePeriod

    nullable string

    The schedule period, which determines how often the import runs automatically. If it is null, the scheduler is disabled, and no automatic run will occur.

    To change the schedule period, use the Set schedule period endpoint.

    Possible values: "5m", "15m", "30m", "1h", "2h", "3h", "6h", "8h", "12h" or "24h".
GET /v1/pipelines/:id
curl https://example.com/v1/pipelines/705981339 \
-H "Authorization: Bearer api_xxxxxxx"
Response
{
  "id": 705981339,
  "name": "Newsletter Subscribers",
  "connector": "sftp",
  "connectorType": "FileStorage",
  "connection": 1371036433,
  "connectionRole": "Source",
  "target": "User",
  "enabled": true,
  "format": "excel",
  "path": "subscribers.xlsx",
  "sheet": "Sheet1",
  "userIDColumn": "email",
  "updatedAtColumn": "updated_at",
  "updatedAtFormat": "ISO8601",
  "incremental": true,
  "transformation": {
    "function": {
      "source": "const transform = (user) => { ... }",
      "language": "JavaScript",
      "preserveJSON": false,
      "inPaths": [
        "email",
        "firstName",
        "lastName"
      ],
      "outPaths": [
        "email",
        "first_name",
        "last_name"
      ]
    }
  },
  "inSchema": {
    "kind": "object",
    "properties": [
      {
        "name": "email",
        "type": {
          "kind": "string"
        }
      },
      {
        "name": "firstName",
        "type": {
          "kind": "string"
        }
      },
      {
        "name": "lastName",
        "type": {
          "kind": "string"
        }
      },
      {
        "name": "country",
        "type": {
          "kind": "string"
        }
      },
      {
        "name": "updated_at",
        "type": {
          "kind": "string",
          "maxLength": 60
        }
      }
    ]
  },
  "outSchema": {
    "kind": "object",
    "properties": [
      {
        "name": "first_name",
        "type": {
          "kind": "string",
          "maxLength": 100
        },
        "readOptional": true,
        "description": "First name"
      },
      {
        "name": "last_name",
        "type": {
          "kind": "string",
          "maxLength": 100
        },
        "readOptional": true,
        "description": "Last name"
      },
      {
        "name": "email",
        "type": {
          "kind": "string",
          "maxLength": 254
        },
        "readOptional": true,
        "description": "Email"
      }
    ]
  },
  "running": false,
  "scheduleStart": 15,
  "schedulePeriod": "1h"
}
Errors
  • 404: workspace does not exist
  • 404: pipeline does not exist