# Parquet

## What you can do with this integration

The integration for Parquet lets you:

* **Read user data from a Parquet file** and unify it as user profiles inside Krenalis.
* **Write unified users back into Parquet** and keep the target synchronized over time.

## How Parquet columns are imported

This section summarizes how Parquet column types are imported into Krenalis.

### Physical types

This table describes how Parquet physical types (without any logical type annotations) are imported into Krenalis.

| Parquet Type           | Imported in Krenalis as |
|------------------------|-------------------------|
| `BOOLEAN`              | `boolean`               |
| `INT32`                | `int(32)`               |
| `INT64`                | `int(64)`               |
| `INT96`                | `datetime` [^int96]     |
| `FLOAT`                | `float(32)`             |
| `DOUBLE`               | `float(64)`             |
| `BYTE_ARRAY`           | `string`                |
| `FIXED_LEN_BYTE_ARRAY` | `string`                |

### Logical and Converted types

This table describes how Parquet logical and converted types are imported into Krenalis.

| Logical (or Converted) type | Underlying physical type | Imported in Krenalis as                |
|-----------------------------|--------------------------|----------------------------------------|
| `STRING`                    | `BYTE_ARRAY`             | `string`                               |
| `ENUM`                      | `BYTE_ARRAY`             | `string`                               |
| `UUID`                      | `FIXED_LEN_BYTE_ARRAY`   | `uuid`                                 |
| `INT(8, true)`              | `INT32`                  | `int(8)`                               |
| `INT(16, true)`             | `INT32`                  | `int(16)`                              |
| `INT(32, true)`             | `INT32`                  | `int(32)`                              |
| `INT(64, true)`             | `INT64`                  | `int(64)`                              |
| `INT(8, false)`             | `INT32`                  | `unsigned int(8)`                      |
| `INT(16, false)`            | `INT32`                  | `unsigned int(16)`                     |
| `INT(32, false)`            | `INT32`                  | `unsigned int(32)`                     |
| `INT(64, false)`            | `INT64`                  | `unsigned int(64)`                     |
| `INT_8`                     | `INT32`                  | `int(8)`                               |
| `INT_16`                    | `INT32`                  | `int(16)`                              |
| `INT_32`                    | `INT32`                  | `int(32)`                              |
| `INT_64`                    | `INT64`                  | `int(64)`                              |
| `UINT_8`                    | `INT32`                  | `unsigned int(8)`                      |
| `UINT_16`                   | `INT32`                  | `unsigned int(16)`                     |
| `UINT_32`                   | `INT32`                  | `unsigned int(32)`                     |
| `UINT_64`                   | `INT64`                  | `unsigned int(64)`                     |
| `DECIMAL`                   | `INT32`                  | `decimal` [^decimal_limits]            |
| `DECIMAL`                   | `INT64`                  | `decimal` [^decimal_limits]            |
| `DECIMAL`                   | `FIXED_LEN_BYTE_ARRAY`   | `decimal` [^decimal_limits]            |
| `DECIMAL`                   | `BYTE_ARRAY`             | `decimal` [^decimal_limits]            |
| `DECIMAL` (converted type)  | `INT32`                  | `decimal` [^decimal_limits]            |
| `DECIMAL` (converted type)  | `INT64`                  | `decimal` [^decimal_limits]            |
| `DECIMAL` (converted type)  | `FIXED_LEN_BYTE_ARRAY`   | `decimal` [^decimal_limits]            |
| `DECIMAL` (converted type)  | `BYTE_ARRAY`             | `decimal` [^decimal_limits]            |
| `FLOAT16`                   | -                        | Not supported                          |
| `DATE`                      | `INT32`                  | `date`                                 |
| `TIME` (unit `MILLIS`)      | `INT32`                  | `time`                                 |
| `TIME` (unit `MICROS`)      | `INT64`                  | `time`                                 |
| `TIME` (unit `NANOS`)       | `INT64`                  | `time`                                 |
| `TIME_MILLIS`               | `INT32`                  | `time`                                 |
| `TIME_MICROS`               | `INT64`                  | `time`                                 |
| `TIMESTAMP` (unit `MILLIS`) | `INT64`                  | `datetime`                             |
| `TIMESTAMP` (unit `MICROS`) | `INT64`                  | `datetime`                             |
| `TIMESTAMP` (unit `NANOS`)  | `INT64`                  | `datetime`                             |
| `TIMESTAMP_MILLIS`          | -                        | Not supported [^timestamp_milli_micro] |
| `TIMESTAMP_MICROS`          | -                        | Not supported [^timestamp_milli_micro] |
| `INTERVAL`                  | -                        | Not supported                          |
| `JSON`                      | `BYTE_ARRAY`             | `json`                                 |
| `BSON`                      | -                        | Not supported [^bson_support]          |
| `VARIANT`                   | -                        | Not supported                          |
| `GEOMETRY`                  | -                        | Not supported                          |
| `GEOGRAPHY`                 | -                        | Not supported                          |
| `LIST`                      | -                        | Not supported [^list_support]          |
| `MAP`                       | -                        | Not supported [^map_support]           |
| `UNKNOWN`                   | -                        | Not supported                          |

### Column groups

Import of column groups is currently not supported.

[^int96]: `INT96` values are always imported as the Krenalis `datetime` type, because that is how this type is used in practice in Parquet files. Note that this representation is deprecated; the integration for Parquet supports it only for compatibility with older Parquet files.
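As an illustration of the `INT96` import rule described above, the following is a minimal sketch of how a legacy `INT96` timestamp can be decoded into a date and time. It assumes the layout commonly written by older tools (8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number). The function name `int96_to_datetime` is hypothetical, and the sketch does not claim to reproduce how Krenalis decodes these values internally.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte legacy INT96 timestamp into a UTC datetime.

    Hypothetical helper for illustration only; the byte layout is an
    assumption based on the convention used by older Parquet writers.
    """
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes long")
    # "<qi": little-endian int64 (nanoseconds of day) + int32 (Julian day).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Python datetimes have microsecond resolution, so the
    # sub-microsecond part of the value is truncated here.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)
```

For example, a value built with `struct.pack("<qi", 3_600_000_000_000, 2_440_589)` (one hour into the day after the Unix epoch) decodes to `1970-01-02 01:00:00+00:00`.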
[^list_support]: Support for importing `LIST` columns is discussed here: https://github.com/krenalis/krenalis/issues/1325

[^map_support]: Support for importing `MAP` columns is discussed here: https://github.com/krenalis/krenalis/issues/1371

[^timestamp_milli_micro]: Support for importing `TIMESTAMP_MILLIS` and `TIMESTAMP_MICROS` is discussed here: https://github.com/krenalis/krenalis/issues/1385

[^decimal_limits]: `DECIMAL` types from Parquet are supported if the precision is ≤ 76 and the scale is ≤ 37.

[^bson_support]: Support for importing the `BSON` type is discussed here: https://github.com/krenalis/krenalis/issues/1400

## How Krenalis types are exported to Parquet

The following table shows how user property types in Krenalis map to column types in the exported Parquet file:

| Type of user property in Krenalis  | Physical Type of exported Parquet column | Logical Type of exported Parquet column                     |
|------------------------------------|------------------------------------------|-------------------------------------------------------------|
| `boolean`                          | `BOOLEAN`                                | *(none)*                                                    |
| `int(8)`                           | `INT32`                                  | `INT(8, true)`                                              |
| `int(16)`                          | `INT32`                                  | `INT(16, true)`                                             |
| `int(24)`                          | `INT32`                                  | *(none)*                                                    |
| `int(32)`                          | `INT32`                                  | *(none)*                                                    |
| `int(64)`                          | `INT64`                                  | *(none)*                                                    |
| `unsigned int(8)`                  | `INT32`                                  | `INT(8, false)`                                             |
| `unsigned int(16)`                 | `INT32`                                  | `INT(16, false)`                                            |
| `unsigned int(24)`                 | `INT32`                                  | `INT(32, false)`                                            |
| `unsigned int(32)`                 | `INT32`                                  | `INT(32, false)`                                            |
| `unsigned int(64)`                 | `INT64`                                  | `INT(64, false)`                                            |
| `float(32)`                        | `FLOAT`                                  | *(none)*                                                    |
| `float(64)`                        | `DOUBLE`                                 | *(none)*                                                    |
| `decimal(p, s)` with `p` ≤ 9       | `INT32`                                  | `DECIMAL(p, s)`                                             |
| `decimal(p, s)` with 10 ≤ `p` ≤ 18 | `INT64`                                  | `DECIMAL(p, s)`                                             |
| `decimal(p, s)` with `p` ≥ 19      | `BYTE_ARRAY`                             | `DECIMAL(p, s)`                                             |
| `datetime`                         | `INT64`                                  | `TIMESTAMP(isAdjustedToUTC=true, unit=NANOS)`               |
| `date`                             | `INT32`                                  | `DATE`                                                      |
| `time`                             | `INT64`                                  | `TIME(isAdjustedToUTC=true, unit=MICROS)` [^time_precision] |
| `year`                             | `INT32`                                  | *(none)*                                                    |
| `uuid`                             | `FIXED_LEN_BYTE_ARRAY` with length 16    | `UUID`                                                      |
| `json`                             | `BYTE_ARRAY`                             | `JSON`                                                      |
| `ip`                               | `BYTE_ARRAY`                             | `STRING`                                                    |
| `string`                           | `BYTE_ARRAY`                             | `STRING`                                                    |
| `array`                            | `BYTE_ARRAY` [^array_support]            | `JSON` [^array_support]                                     |
| `object`                           | *(column groups)*                        | -                                                           |
| `map`                              | `BYTE_ARRAY` [^map_property_support]     | `JSON` [^map_property_support]                              |

[^array_support]: Support for array properties is discussed here: https://github.com/krenalis/krenalis/issues/1325

[^map_property_support]: Support for map properties is discussed here: https://github.com/krenalis/krenalis/issues/1371

[^time_precision]: `time` values are exported with microsecond precision rather than nanosecond precision. See https://github.com/krenalis/krenalis/issues/1392.
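The precision thresholds in the `decimal(p, s)` rows of the export table can be sketched as a small function. This is only an illustration of the rule stated in the table: `decimal_physical_type` is a hypothetical name, not part of Krenalis, and the lower bound of 1 on the precision is an assumption taken from the Parquet format rather than from this page.

```python
def decimal_physical_type(precision: int) -> str:
    """Return the Parquet physical type used for an exported decimal(p, s).

    Hypothetical helper mirroring the export table:
    p <= 9 -> INT32, 10 <= p <= 18 -> INT64, p >= 19 -> BYTE_ARRAY.
    """
    if precision < 1:
        # Assumption: Parquet requires a positive precision.
        raise ValueError("precision must be a positive integer")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "BYTE_ARRAY"
```

With this sketch, `decimal_physical_type(9)` returns `"INT32"`, `decimal_physical_type(10)` returns `"INT64"`, and `decimal_physical_type(19)` returns `"BYTE_ARRAY"`. The logical type is `DECIMAL(p, s)` in all three cases; only the physical storage changes with the precision.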