Parquet
What you can do with this integration
The integration for Parquet lets you:
- Read user data from a Parquet file and unify it into user profiles inside Krenalis.
- Write unified users back into Parquet and keep the target synchronized over time.
- Ingest users: Read and sync user data from a Parquet file.
- Activate users: Write unified profiles to a Parquet file and keep the data updated.
How Parquet columns are imported
This section summarizes how Parquet column types are imported into Krenalis.
Physical types
This table describes how Parquet physical types (without any logical type annotations) are imported into Krenalis.
| Parquet physical type | Imported in Krenalis as |
|---|---|
| BOOLEAN | boolean |
| INT32 | int(32) |
| INT64 | int(64) |
| INT96 | datetime [1] |
| FLOAT | float(32) |
| DOUBLE | float(64) |
| BYTE_ARRAY | string |
| FIXED_LEN_BYTE_ARRAY | string |
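INT96 values carry no logical annotation, so a reader has to know their byte layout to turn them into a datetime. The Python sketch below uses only the standard library and assumes the layout commonly written by Impala, Hive, and Spark: 8 little-endian bytes holding the nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. `decode_int96` is a hypothetical helper for illustration, not part of Krenalis.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_int96(raw: bytes) -> datetime:
    """Decode a 12-byte legacy INT96 timestamp into a UTC datetime (hypothetical helper)."""
    if len(raw) != 12:
        raise ValueError(f"INT96 value must be 12 bytes, got {len(raw)}")
    # Assumed layout: int64 nanoseconds within the day, then int32 Julian day,
    # both little-endian ("<" also disables struct padding).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python datetimes stop at microseconds, so the last three nanosecond digits are dropped.
    return UNIX_EPOCH + timedelta(days=days_since_epoch, microseconds=nanos_of_day // 1000)
```

For example, `decode_int96(struct.pack("<qi", 3_600 * 10**9, 2_440_589))` returns 1970-01-02 01:00 UTC: one day after the epoch, plus one hour of nanoseconds.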
Logical and Converted types
This table describes how Parquet logical and converted types are imported into Krenalis.
| Logical (or converted) type | Underlying physical type | Imported in Krenalis as |
|---|---|---|
| STRING | BYTE_ARRAY | string |
| ENUM | BYTE_ARRAY | string |
| UUID | FIXED_LEN_BYTE_ARRAY | uuid |
| INT(8, true) | INT32 | int(8) |
| INT(16, true) | INT32 | int(16) |
| INT(32, true) | INT32 | int(32) |
| INT(64, true) | INT64 | int(64) |
| INT(8, false) | INT32 | unsigned int(8) |
| INT(16, false) | INT32 | unsigned int(16) |
| INT(32, false) | INT32 | unsigned int(32) |
| INT(64, false) | INT64 | unsigned int(64) |
| INT_8 | INT32 | int(8) |
| INT_16 | INT32 | int(16) |
| INT_32 | INT32 | int(32) |
| INT_64 | INT64 | int(64) |
| UINT_8 | INT32 | unsigned int(8) |
| UINT_16 | INT32 | unsigned int(16) |
| UINT_32 | INT32 | unsigned int(32) |
| UINT_64 | INT64 | unsigned int(64) |
| DECIMAL | INT32 | decimal [2] |
| DECIMAL | INT64 | decimal [2] |
| DECIMAL | FIXED_LEN_BYTE_ARRAY | decimal [2] |
| DECIMAL | BYTE_ARRAY | decimal [2] |
| DECIMAL (converted type) | INT32 | decimal [2] |
| DECIMAL (converted type) | INT64 | decimal [2] |
| DECIMAL (converted type) | FIXED_LEN_BYTE_ARRAY | decimal [2] |
| DECIMAL (converted type) | BYTE_ARRAY | decimal [2] |
| FLOAT16 | - | Not supported |
| DATE | INT32 | date |
| TIME (unit MILLIS) | INT32 | time |
| TIME (unit MICROS) | INT64 | time |
| TIME (unit NANOS) | INT64 | time |
| TIME_MILLIS | INT32 | time |
| TIME_MICROS | INT64 | time |
| TIMESTAMP (unit MILLIS) | INT64 | datetime |
| TIMESTAMP (unit MICROS) | INT64 | datetime |
| TIMESTAMP (unit NANOS) | INT64 | datetime |
| TIMESTAMP_MILLIS | - | Not supported [3] |
| TIMESTAMP_MICROS | - | Not supported [3] |
| INTERVAL | - | Not supported |
| JSON | BYTE_ARRAY | json |
| BSON | - | Not supported [4] |
| VARIANT | - | Not supported |
| GEOMETRY | - | Not supported |
| GEOGRAPHY | - | Not supported |
| LIST | - | Not supported [5] |
| MAP | - | Not supported [6] |
| UNKNOWN | - | Not supported |
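A DECIMAL column stores an unscaled integer, and the column metadata supplies the precision and scale: INT32 and INT64 columns hold that integer directly, while FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY columns hold it as big-endian two's-complement bytes. The Python sketch below shows the conversion to an exact decimal, using the precision and scale limits stated in the notes as an assumed validation step. `decode_decimal` is a hypothetical helper and does not describe how Krenalis implements the import.

```python
from decimal import Decimal
from typing import Union

# Import limits stated in the notes on DECIMAL columns.
MAX_PRECISION = 76
MAX_SCALE = 37


def decode_decimal(unscaled: Union[int, bytes], precision: int, scale: int) -> Decimal:
    """Convert a stored DECIMAL value into a Decimal (hypothetical helper).

    INT32/INT64 columns pass the unscaled integer directly; FIXED_LEN_BYTE_ARRAY
    and BYTE_ARRAY columns pass its big-endian two's-complement bytes.
    """
    if precision > MAX_PRECISION or scale > MAX_SCALE:
        raise ValueError(f"DECIMAL({precision}, {scale}) exceeds the supported limits")
    if isinstance(unscaled, (bytes, bytearray)):
        unscaled = int.from_bytes(unscaled, byteorder="big", signed=True)
    # Parsing "<unscaled>E-<scale>" is exact; arithmetic such as scaleb() would
    # round to the context precision (28 digits by default).
    return Decimal(f"{unscaled}E{-scale}")
```

For example, the two bytes `ff 85` in a DECIMAL(4, 1) column hold the unscaled value -123, which decodes to -12.3.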
Column groups
Importing column groups is currently not supported.
How Krenalis types are exported to Parquet
The following table shows how the user property types in Krenalis are mapped to the column types in the exported Parquet file:
| Type of user property in Krenalis | Physical type of exported Parquet column | Logical type of exported Parquet column |
|---|---|---|
| boolean | BOOLEAN | (none) |
| int(8) | INT32 | INT(8, true) |
| int(16) | INT32 | INT(16, true) |
| int(24) | INT32 | (none) |
| int(32) | INT32 | (none) |
| int(64) | INT64 | (none) |
| unsigned int(8) | INT32 | INT(8, false) |
| unsigned int(16) | INT32 | INT(16, false) |
| unsigned int(24) | INT32 | INT(32, false) |
| unsigned int(32) | INT64 | INT(32, false) |
| unsigned int(64) | INT64 | INT(64, false) |
| float(32) | FLOAT | (none) |
| float(64) | DOUBLE | (none) |
| decimal(p, s) with p ≤ 9 | INT32 | DECIMAL(p, s) |
| decimal(p, s) with 10 ≤ p ≤ 18 | INT64 | DECIMAL(p, s) |
| decimal(p, s) with p ≥ 19 | BYTE_ARRAY | DECIMAL(p, s) |
| datetime | INT64 | TIMESTAMP(isAdjustedToUTC=true, unit=NANOS) |
| date | INT32 | DATE |
| time | INT64 | TIME(isAdjustedToUTC=true, unit=MICROS) [7] |
| year | INT32 | (none) |
| uuid | FIXED_LEN_BYTE_ARRAY with length 16 | UUID |
| json | BYTE_ARRAY | JSON |
| ip | BYTE_ARRAY | STRING |
| string | BYTE_ARRAY | STRING |
| array | BYTE_ARRAY [8] | JSON [8] |
| object | (column groups) | - |
| map | BYTE_ARRAY [6] | JSON [6] |
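The precision thresholds for exported decimal(p, s) columns follow from integer widths: every unscaled value with at most 9 digits fits in a signed 32-bit integer, and every one with at most 18 digits fits in a signed 64-bit integer. The Python sketch below restates that rule from the table and adds a helper that encodes a wider unscaled value as big-endian two's-complement bytes, the encoding the Parquet DECIMAL annotation expects on BYTE_ARRAY. Both function names are hypothetical and only illustrate the mapping.

```python
def decimal_physical_type(precision: int) -> str:
    """Physical type for an exported decimal(p, s) column, following the table above."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    if precision <= 9:
        return "INT32"  # 999_999_999 < 2**31 - 1
    if precision <= 18:
        return "INT64"  # 999_999_999_999_999_999 < 2**63 - 1
    return "BYTE_ARRAY"


def encode_unscaled(unscaled: int) -> bytes:
    """Big-endian two's-complement bytes for a BYTE_ARRAY decimal (hypothetical helper)."""
    # bit_length() ignores the sign, so add one bit for it and round up to whole bytes.
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)
```

For example, `encode_unscaled(128)` returns `b"\x00\x80"`: the leading zero byte keeps the value from being read back as negative.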
1. INT96 types are always imported as Krenalis datetime types, because that is how they are used in practice in Parquet files. Note, however, that this representation is deprecated and is supported by the integration for Parquet only for compatibility with older Parquet files.
2. DECIMAL types from Parquet are supported if the precision is ≤ 76 and the scale is ≤ 37.
3. Support for importing TIMESTAMP_MILLIS and TIMESTAMP_MICROS is discussed here: https://github.com/krenalis/krenalis/issues/1385
4. Support for the BSON type is discussed here: https://github.com/krenalis/krenalis/issues/1400
5. Support for importing LIST columns is discussed here: https://github.com/krenalis/krenalis/issues/1325
6. Support for importing MAP columns is discussed here: https://github.com/krenalis/krenalis/issues/1371
7. Microsecond precision, rather than nanosecond precision, is used for time values. See https://github.com/krenalis/krenalis/issues/1392.
8. Support for array properties is discussed here: https://github.com/krenalis/krenalis/issues/1325
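According to the export table, datetime values are written as INT64 nanoseconds since the Unix epoch, while time values are written as INT64 microseconds since midnight (see the note on time precision above). The Python sketch below shows both conversions under two assumptions of its own: naive datetimes are treated as UTC, and datetimes outside the range of a signed 64-bit nanosecond count (roughly 1677-09-21 to 2262-04-11) are rejected. How Krenalis itself handles such values is not described here, and the function names are hypothetical.

```python
from datetime import datetime, time, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def time_to_micros(t: time) -> int:
    """Microseconds since midnight, the value stored by TIME(unit=MICROS)."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


def datetime_to_nanos(dt: datetime) -> int:
    """Nanoseconds since the Unix epoch, the value stored by TIMESTAMP(unit=NANOS)."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - UNIX_EPOCH
    # Integer arithmetic on the timedelta fields avoids float rounding.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    nanos = micros * 1_000
    if not INT64_MIN <= nanos <= INT64_MAX:
        # A signed 64-bit nanosecond count spans only about 1677-09-21 to 2262-04-11.
        raise OverflowError(f"{dt.isoformat()} does not fit in TIMESTAMP(unit=NANOS)")
    return nanos
```

Python's `time` type already stops at microseconds, so `time_to_micros` loses nothing; a source value with sub-microsecond digits would need to be truncated or rounded first, which is the precision trade-off the note on time values describes.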