# Data Ingest/Transformation
Efficient data ingestion is the critical first step in getting value from the VAST Database. This section covers best practices, tools, and techniques for loading your data into VAST quickly and reliably, whether you are bulk-loading large datasets, streaming real-time data, or transferring files from diverse sources. VAST's architecture is built to absorb ingest at scale while sustaining high performance, and the pages below show how to optimize your ingestion workflows so analytics can begin as soon as the data lands.
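Before diving into the tool-specific guides, the sketch below shows the basic shape of an ingest with the `vastdb` Python SDK: connect, open a transaction, create a table from an Arrow schema, and insert a batch. The endpoint, credentials, and the `demo-bucket`/`demo-schema`/`events` names are placeholders, and exact call signatures may vary between SDK versions.

```python
import pyarrow as pa
import vastdb

# Connect to a VAST cluster; the endpoint and keys are placeholders.
session = vastdb.connect(
    endpoint="http://vip-pool.example.com",
    access="<access-key>",
    secret="<secret-key>",
)

with session.transaction() as tx:
    schema = tx.bucket("demo-bucket").schema("demo-schema")
    # Create the target table from an Arrow schema, then insert one batch.
    table = schema.create_table(
        "events",
        pa.schema([("id", pa.int64()), ("msg", pa.utf8())]),
    )
    table.insert(pa.table({"id": [1, 2], "msg": ["hello", "world"]}))
```

The same connect/transaction/insert pattern underlies most of the ingest paths listed below; the individual guides differ mainly in how the Arrow batches are produced.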
## Table of Contents
- Python Data Ingestion
- Python SDK - import JSON
- Python SDK - import CSV
- Python SDK - import Parquet (see the sketch after this list)
- Python SDK - import GRIB2
- Apache NiFi
- Spark Data Ingestion
- Kafka via Python
- Kafka via Spark Streaming
- Trino with VAST Data
- Query Merge vs Insert Merge
- Spark Query Time Merge (id)
- Spark Query Time Merge (non-id)
- Trino Query Time Merge (id)
- Trino Query Time Merge (non-id)
- Apache Beam Sink (Python)
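As a preview of the Parquet import guide listed above, here is a minimal sketch that streams a Parquet file into an existing table with the `vastdb` Python SDK. It assumes the `events` table already exists; the file name, batch size, and connection details are illustrative. Iterating over record batches rather than loading the whole file keeps memory use bounded for large inputs.

```python
import pyarrow.parquet as pq
import vastdb

# Placeholder endpoint and credentials, as in the sketch above.
session = vastdb.connect(
    endpoint="http://vip-pool.example.com",
    access="<access-key>",
    secret="<secret-key>",
)

with session.transaction() as tx:
    table = tx.bucket("demo-bucket").schema("demo-schema").table("events")
    # Stream the file in record batches so memory use stays bounded.
    pf = pq.ParquetFile("events.parquet")
    for batch in pf.iter_batches(batch_size=100_000):
        table.insert(batch)
```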