Parquet

What you can do with this integration

The integration for Parquet lets you:

  • Read user data from a Parquet file and unify it into user profiles inside Krenalis.
  • Write unified users back into Parquet and keep the target synchronized over time.

How Parquet columns are imported

This section summarizes how Parquet column types are imported into Krenalis.

Physical types

This table describes how Parquet physical types (without any logical type annotations) are imported into Krenalis.

| Parquet type | Imported in Krenalis as |
| --- | --- |
| BOOLEAN | boolean |
| INT32 | int(32) |
| INT64 | int(64) |
| INT96 | datetime [1] |
| FLOAT | float(32) |
| DOUBLE | float(64) |
| BYTE_ARRAY | string |
| FIXED_LEN_BYTE_ARRAY | string |
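
To make the INT96 row above more concrete, here is a minimal Python sketch that decodes a raw INT96 value into a datetime. It assumes the layout conventionally produced by older Parquet writers: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. The function name `int96_to_datetime` is hypothetical, the value is assumed to be in UTC, and the code is an illustration only, not the Krenalis implementation.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (hypothetical helper, assumes UTC)."""
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes long")
    # "<qi": little-endian 64-bit nanoseconds of day, then 32-bit Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Python datetimes hold microseconds, so sub-microsecond digits are dropped.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1_000)
```

For example, a value built with `struct.pack("<qi", 3_600_000_000_000, 2_440_589)` decodes to 01:00:00 UTC on 2 January 1970.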

Logical and Converted types

This table describes how Parquet logical and converted types are imported into Krenalis.

| Logical (or Converted) type | Underlying physical type | Imported in Krenalis as |
| --- | --- | --- |
| STRING | BYTE_ARRAY | string |
| ENUM | BYTE_ARRAY | string |
| UUID | FIXED_LEN_BYTE_ARRAY | uuid |
| INT(8, true) | INT32 | int(8) |
| INT(16, true) | INT32 | int(16) |
| INT(32, true) | INT32 | int(32) |
| INT(64, true) | INT64 | int(64) |
| INT(8, false) | INT32 | unsigned int(8) |
| INT(16, false) | INT32 | unsigned int(16) |
| INT(32, false) | INT32 | unsigned int(32) |
| INT(64, false) | INT64 | unsigned int(64) |
| INT_8 | INT32 | int(8) |
| INT_16 | INT32 | int(16) |
| INT_32 | INT32 | int(32) |
| INT_64 | INT64 | int(64) |
| UINT_8 | INT32 | unsigned int(8) |
| UINT_16 | INT32 | unsigned int(16) |
| UINT_32 | INT32 | unsigned int(32) |
| UINT_64 | INT64 | unsigned int(64) |
| DECIMAL | INT32 | decimal [2] |
| DECIMAL | INT64 | decimal [2] |
| DECIMAL | FIXED_LEN_BYTE_ARRAY | decimal [2] |
| DECIMAL | BYTE_ARRAY | decimal [2] |
| DECIMAL (converted type) | INT32 | decimal [2] |
| DECIMAL (converted type) | INT64 | decimal [2] |
| DECIMAL (converted type) | FIXED_LEN_BYTE_ARRAY | decimal [2] |
| DECIMAL (converted type) | BYTE_ARRAY | decimal [2] |
| FLOAT16 | - | Not supported |
| DATE | INT32 | date |
| TIME (unit MILLIS) | INT32 | time |
| TIME (unit MICROS) | INT64 | time |
| TIME (unit NANOS) | INT64 | time |
| TIME_MILLIS | INT32 | time |
| TIME_MICROS | INT64 | time |
| TIMESTAMP (unit MILLIS) | INT64 | datetime |
| TIMESTAMP (unit MICROS) | INT64 | datetime |
| TIMESTAMP (unit NANOS) | INT64 | datetime |
| TIMESTAMP_MILLIS | - | Not supported [3] |
| TIMESTAMP_MICROS | - | Not supported [3] |
| INTERVAL | - | Not supported |
| JSON | BYTE_ARRAY | json |
| BSON | - | Not supported [4] |
| VARIANT | - | Not supported |
| GEOMETRY | - | Not supported |
| GEOGRAPHY | - | Not supported |
| LIST | - | Not supported [5] |
| MAP | - | Not supported [6] |
| UNKNOWN | - | Not supported |
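
The DECIMAL rows above can be illustrated with a short Python sketch. It checks the precision and scale limits stated in footnote 2 and decodes a stored value into an exact decimal. The decoding follows the Parquet format convention that a DECIMAL stores an unscaled integer (big-endian two's complement when the physical type is BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY). The names `is_supported_decimal` and `decode_decimal` are hypothetical and are not part of Krenalis.

```python
from decimal import Decimal

# Limits taken from footnote 2 of this page.
MAX_PRECISION = 76
MAX_SCALE = 37


def is_supported_decimal(precision: int, scale: int) -> bool:
    """Return True if a DECIMAL(precision, scale) column is within the limits."""
    return 1 <= precision <= MAX_PRECISION and 0 <= scale <= MAX_SCALE


def decode_decimal(raw, scale: int) -> Decimal:
    """Turn an unscaled Parquet DECIMAL value into an exact Decimal.

    `raw` is an int for INT32/INT64 columns, or bytes for
    BYTE_ARRAY/FIXED_LEN_BYTE_ARRAY columns.
    """
    if isinstance(raw, (bytes, bytearray)):
        unscaled = int.from_bytes(raw, "big", signed=True)
    else:
        unscaled = raw
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from (sign, digits, exponent) avoids any context rounding,
    # which matters for precisions above the default 28 digits.
    return Decimal((sign, digits, -scale))
```

For instance, the unscaled INT32 value 12345 with scale 2 represents 123.45, and the two bytes `FF 85` with scale 1 represent -12.3.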

Column groups

Importing column groups is currently not supported.

How Krenalis types are exported to Parquet

The following table shows how the user property types in Krenalis are mapped to the column types in the exported Parquet file:

| Type of user property in Krenalis | Physical type of exported Parquet column | Logical type of exported Parquet column |
| --- | --- | --- |
| boolean | BOOLEAN | (none) |
| int(8) | INT32 | INT(8, true) |
| int(16) | INT32 | INT(16, true) |
| int(24) | INT32 | (none) |
| int(32) | INT32 | (none) |
| int(64) | INT64 | (none) |
| unsigned int(8) | INT32 | INT(8, false) |
| unsigned int(16) | INT32 | INT(16, false) |
| unsigned int(24) | INT32 | INT(32, false) |
| unsigned int(32) | INT64 | INT(32, false) |
| unsigned int(64) | INT64 | INT(64, false) |
| float(32) | FLOAT | (none) |
| float(64) | DOUBLE | (none) |
| decimal(p, s) with p ≤ 9 | INT32 | DECIMAL(p, s) |
| decimal(p, s) with 10 ≤ p ≤ 18 | INT64 | DECIMAL(p, s) |
| decimal(p, s) with p ≥ 19 | BYTE_ARRAY | DECIMAL(p, s) |
| datetime | INT64 | TIMESTAMP(isAdjustedToUTC=true, unit=NANOS) |
| date | INT32 | DATE |
| time | INT64 | TIME(isAdjustedToUTC=true, unit=MICROS) [7] |
| year | INT32 | (none) |
| uuid | FIXED_LEN_BYTE_ARRAY with length 16 | UUID |
| json | BYTE_ARRAY | JSON |
| ip | BYTE_ARRAY | STRING |
| string | BYTE_ARRAY | STRING |
| array | BYTE_ARRAY [8] | JSON [8] |
| object | (column groups) | - |
| map | BYTE_ARRAY [6] | JSON [6] |
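
A few rows of the export table can be sketched in Python. The decimal thresholds follow from integer ranges: every 9-digit value fits in a signed 32-bit integer and every 18-digit value fits in a signed 64-bit integer. The datetime and time helpers assume the usual Parquet encodings, that is, nanoseconds since the Unix epoch for TIMESTAMP with unit NANOS and microseconds since midnight for TIME with unit MICROS. All function names are hypothetical, and the code only illustrates the table; it is not the Krenalis exporter.

```python
from datetime import datetime, time, timezone


def decimal_physical_type(precision: int) -> str:
    """Pick the Parquet physical type for decimal(p, s), following the table."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "BYTE_ARRAY"


def datetime_to_timestamp_nanos(dt: datetime) -> int:
    """Encode a timezone-aware datetime as nanoseconds since the Unix epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch  # raises TypeError if dt is naive
    seconds = delta.days * 86_400 + delta.seconds
    # Integer arithmetic only, so no floating-point rounding is introduced.
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


def time_to_time_micros(t: time) -> int:
    """Encode a time of day as microseconds since midnight."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1_000_000 + t.microsecond
```

Under these assumptions, a `decimal(12, 2)` property would be written as an INT64 column, and one second after the Unix epoch would be stored as 1,000,000,000.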

  1. INT96 values are always imported as Krenalis datetime values, because that is how they are used in practice in Parquet files. This representation is deprecated, and the integration for Parquet supports it only for compatibility with older Parquet files.

  2. DECIMAL types from Parquet are supported if the precision is ≤ 76 and the scale is ≤ 37.

  3. Support for importing TIMESTAMP_MILLIS and TIMESTAMP_MICROS is discussed here: https://github.com/krenalis/krenalis/issues/1385

  4. Support for the BSON type is discussed here: https://github.com/krenalis/krenalis/issues/1400

  5. Support for importing LIST columns is discussed here: https://github.com/krenalis/krenalis/issues/1325

  6. Support for importing MAP columns is discussed here: https://github.com/krenalis/krenalis/issues/1371

  7. Time values are exported with microsecond precision instead of nanosecond precision. See https://github.com/krenalis/krenalis/issues/1392

  8. Support for array properties is discussed here: https://github.com/krenalis/krenalis/issues/1325