Utility: Verify Parquet#

Checks whether a Parquet file can be loaded into Vast DB.

Vast DB can import Parquet files from Vast S3 in a highly optimised manner using methods in its Python SDK.

The vastdb import functionality requires every column in a Parquet file to use a supported data type. This page provides an example script that checks the columns of a Parquet file and prints any that would prevent the import.

Limitations#

  • The script cannot currently calculate the maximum column size for nested types (List, Map, Struct).

Install#

pip3 install --upgrade --quiet git+https://github.com/snowch/vastdb_parq_schema_file.git --use-pep517

Examples#

New York Taxi Data#

NYT Data

Column too wide example:#

String field too large