VAST Vector Database Overview
VAST Vector Database enables high-performance vector similarity search with SQL-based querying. It integrates vector operations into a standard database workflow, allowing you to combine vector search with traditional SQL filtering and ordering.
Key Capabilities
Vector Storage and Retrieval
Store high-dimensional vectors alongside structured data in tables
Define vector columns with fixed dimensions using PyArrow list types
Native support for vector similarity search operations
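A fixed-dimension vector column only accepts vectors of exactly the declared length, so it can help to validate embeddings before building the Arrow table for insertion. The sketch below is plain Python and is only an illustration: the column names (id, embedding), the helper build_vector_batch, and the toy dimension of 4 are hypothetical. With PyArrow installed, the returned dict could be passed to pyarrow.table() together with a schema whose vector field uses pyarrow.list_(pyarrow.float32(), DIM), PyArrow's fixed-size list type, which itself refuses lists of the wrong length.

```python
from typing import Dict, List, Sequence

# Hypothetical embedding width; real embedding models often use 384 or 768.
DIM = 4


def build_vector_batch(
    ids: Sequence[int], vectors: Sequence[Sequence[float]], dim: int = DIM
) -> Dict[str, List]:
    """Return a column-oriented batch, rejecting vectors of the wrong length.

    The column names "id" and "embedding" are illustrative assumptions.
    """
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    for row_id, vec in zip(ids, vectors):
        # A fixed-size vector column needs exactly `dim` values per row.
        if len(vec) != dim:
            raise ValueError(f"row {row_id}: expected {dim} values, got {len(vec)}")
    return {
        "id": list(ids),
        # Normalize every element to float, matching a float vector column.
        "embedding": [[float(x) for x in vec] for vec in vectors],
    }


batch = build_vector_batch([1, 2], [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])
print(batch["embedding"][1])  # [0.5, 0.6, 0.7, 0.8]
```

Catching length mismatches in application code gives a clearer error message (with the offending row id) than a failure deep inside the insert path.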
SQL-Based Vector Search
Query vectors using standard SQL syntax
Built-in distance functions:
array_distance() for Euclidean distance
array_cosine_distance() for cosine distance, a dissimilarity measure derived from cosine similarity
Combine vector search with WHERE clauses for filtered queries
Order results by distance (closest first) and limit them to the top matches
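To make the semantics of a filtered similarity query concrete, the following plain-Python sketch computes the two distances and emulates WHERE ... ORDER BY distance LIMIT k by brute force. It assumes array_distance() and array_cosine_distance() follow the standard definitions (L2 distance, and 1 minus cosine similarity); confirm the exact definitions in the VAST SQL reference. This only illustrates what the query returns, not how VAST executes it. The helper names and row layout are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """L2 distance: sqrt(sum((a_i - b_i)^2)). Assumed to match array_distance()."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for same direction, 2 for opposite directions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def top_k(
    rows: List[Dict],
    query: Vector,
    k: int,
    where: Callable[[Dict], bool] = lambda r: True,
    distance: Callable[[Vector, Vector], float] = cosine_distance,
) -> List[Dict]:
    """Brute-force analogue of: WHERE <filter> ORDER BY distance LIMIT k."""
    candidates = [r for r in rows if where(r)]           # WHERE
    candidates.sort(key=lambda r: distance(r["embedding"], query))  # ORDER BY
    return candidates[:k]                                # LIMIT


rows = [
    {"id": 1, "category": "shoes", "embedding": [1.0, 0.0]},
    {"id": 2, "category": "shoes", "embedding": [0.0, 1.0]},
    {"id": 3, "category": "hats", "embedding": [0.9, 0.1]},
]
hits = top_k(rows, [1.0, 0.0], k=1, where=lambda r: r["category"] == "shoes")
print([r["id"] for r in hits])  # [1]
```

Row 3 is the second-closest vector overall, but the category filter removes it before ranking, which is exactly the behavior a WHERE clause adds to a vector search.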
Integration
Uses PyArrow for data structures and Arrow tables for efficient data transfer
ADBC (Arrow Database Connectivity) driver for querying
VastDB SDK for table creation and data insertion
Hierarchical organization: buckets → schemas → tables
Common Use Cases
Semantic Search: Find similar items based on vector embeddings from text, images, or other data
Recommendation Systems: Retrieve similar products, content, or users based on embedding vectors
Hybrid Search: Combine vector similarity with traditional filters (timestamps, categories, metadata) in a single SQL query
Time-Series Vector Data: Store and query vectors with timestamps for temporal analysis
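A hybrid or time-filtered search is a single SQL statement that combines a distance function with ordinary WHERE, ORDER BY, and LIMIT clauses. The helper below is a hypothetical sketch that only assembles such a statement as a string, so its shape can be inspected before sending it through ADBC. Several details are assumptions to verify against VAST's SQL reference: the vector literal syntax [...]::FLOAT[n], how the bucket and schema are quoted in the table reference, and the column names used in the example.

```python
from typing import Optional, Sequence


def build_vector_query(
    table_ref: str,
    vector_column: str,
    query_vector: Sequence[float],
    k: int,
    where: Optional[str] = None,
    columns: Sequence[str] = ("id",),
) -> str:
    """Assemble a hybrid vector-search SQL statement (hypothetical shape).

    ASSUMPTION: the `[...]::FLOAT[n]` literal and the quoting of `table_ref`
    are illustrative and must be checked against VAST's SQL dialect.
    `where` is inserted verbatim, so it must come from trusted code only.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # float() rejects non-numeric input, so the literal can contain only numbers.
    literal = "[" + ", ".join(repr(float(x)) for x in query_vector) + "]"
    literal += f"::FLOAT[{len(query_vector)}]"
    select_list = ", ".join(columns)
    sql = (
        f"SELECT {select_list}, "
        f"array_cosine_distance({vector_column}, {literal}) AS dist "
        f"FROM {table_ref}"
    )
    if where:
        sql += f" WHERE {where}"           # metadata / timestamp filters
    sql += f" ORDER BY dist LIMIT {int(k)}"  # closest matches first
    return sql


# Hypothetical bucket, schema, table, and column names.
sql = build_vector_query(
    '"my-bucket/my-schema".products',
    "embedding",
    [0.1, 0.2, 0.3],
    k=5,
    where="category = 'shoes' AND ts >= TIMESTAMP '2024-01-01 00:00:00'",
)
print(sql)
```

Building the vector literal from float() values keeps untrusted numbers from injecting SQL; if the driver supports bound parameters for array values, binding would be the cleaner choice.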
Getting Started
The typical workflow involves:
Obtaining the ADBC driver (libadbc_driver_vastdb.so)
Creating tables with vector columns using the VastDB SDK
Inserting vector data with PyArrow tables
Querying with SQL and vector distance functions via ADBC
Vector columns are defined with fixed dimensions, and queries return Arrow tables, which can be converted to pandas DataFrames (for example with to_pandas()) for analysis.