Persistent Storage

All data and computed results are automatically stored and versioned.

Incremental Updates

Data transformations run automatically on new data, with no orchestration code needed.

Multimodal-Native

Images, video, audio, and documents integrate seamlessly with structured data.

AI Integration

Built-in support for OpenAI, Anthropic, Gemini, Hugging Face, and dozens more.

Get started

Many documentation pages are interactive notebooks. Open them in Colab, Kaggle, or locally to follow along.

What can you build?

Declarative Pipelines

Replace complex orchestration with simple computed columns. Define a transformation once, and it runs automatically on all data.
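
To make the computed-column idea concrete, here is a minimal plain-Python sketch. It is not the product's API: `ToyTable`, `add_computed_column`, and `insert` are hypothetical names chosen for illustration. It only shows the pattern of registering a transformation once, backfilling existing rows, and then applying it to each newly inserted row.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ToyTable:
    """Hypothetical in-memory table used only to illustrate the idea."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.computed: Dict[str, Callable[[Row], Any]] = {}

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        # Register the transformation once and backfill rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, new_rows: List[Row]) -> None:
        # Only the newly inserted rows are computed; older rows are left as-is.
        for values in new_rows:
            row = dict(values)
            for name, fn in self.computed.items():
                row[name] = fn(row)
            self.rows.append(row)


table = ToyTable()
table.insert([{"text": "hello world"}])
table.add_computed_column("word_count", lambda r: len(r["text"].split()))
table.insert([{"text": "computed columns run on insert"}])

word_counts = [row["word_count"] for row in table.rows]
print(word_counts)
```

The caller never writes a loop over "new data"; the table owns that logic, which is what removes the need for separate orchestration code in this toy version.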

Multimodal Workloads

Production RAG with automatic embedding indexing. Find relevant scenes in video. Semantic search across text, images, and audio.
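
As a rough illustration of what semantic search over an embedding index involves, the sketch below ranks items by cosine similarity to a query vector. Everything in it is an assumption made for the example: the file names and three-dimensional vectors are invented, and `cosine_similarity` and `top_k` are hypothetical helpers, not part of any library named on this page. Real embeddings would come from a model and have far more dimensions.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[str]:
    # Brute-force scan; a real index would use an approximate nearest-neighbor structure.
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [key for key, _ in ranked[:k]]


# Made-up embeddings for items of different media types.
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.mp4": [0.8, 0.3, 0.1],
    "invoice.pdf": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```

Because all items share one vector space, the same ranking function works whether the underlying item is text, an image, or a video segment.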

Version Control and Lineage

Automatic versioning on every change. Time travel queries to any point. Full data lineage for reproducibility.
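
The versioning and time travel behavior can be sketched in plain Python as an append-only list of snapshots. This is a simplified model under stated assumptions, not the product's implementation: `VersionedStore`, `insert`, and `rows_at` are hypothetical names, and a real system would more likely store deltas than full copies.

```python
import copy
from typing import Any, Dict, List

Row = Dict[str, Any]


class VersionedStore:
    """Hypothetical append-only version log used only to illustrate the idea."""

    def __init__(self) -> None:
        # Version 0 is the empty table.
        self._versions: List[List[Row]] = [[]]

    @property
    def current_version(self) -> int:
        return len(self._versions) - 1

    def insert(self, rows: List[Row]) -> int:
        # Every change produces a new snapshot; earlier snapshots are never modified.
        snapshot = copy.deepcopy(self._versions[-1])
        snapshot.extend(copy.deepcopy(rows))
        self._versions.append(snapshot)
        return self.current_version

    def rows_at(self, version: int) -> List[Row]:
        # Time travel: read the table as it looked at an earlier version.
        return copy.deepcopy(self._versions[version])


store = VersionedStore()
v1 = store.insert([{"id": 1}])
v2 = store.insert([{"id": 2}])

rows_v1 = store.rows_at(v1)
rows_v2 = store.rows_at(v2)
print(v1, v2, len(rows_v1), len(rows_v2))
```

Keeping every version addressable by number is what makes results reproducible: a later reader can ask for exactly the data a given run saw.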

AI Agents & MCP

Build tool-calling agents with persistent memory, MCP server integration, and automatic conversation history.

ML Feature Engineering

Curate, augment, and export data to PyTorch, Parquet, COCO format, LanceDB, and pandas for training and analytics.

Explore by use case

Next steps