Utility: Verify Parquet
Checks whether a Parquet file can be loaded into Vast DB.
Vast DB is able to import Parquet files from Vast S3 in a highly optimised manner. The Python SDK methods are:
The Vast DB import functionality requires Parquet files to contain only supported data types. This page provides an example script that checks the columns of a Parquet file and prints any columns whose types are not supported.
Limitations
The script currently cannot calculate the maximum column size for nested types (List, Map, Struct).
Install
pip3 install --upgrade --quiet git+https://github.com/snowch/vastdb_parq_schema_file.git --use-pep517