> ## Documentation Index > Fetch the complete documentation index at: https://docs.pixeltable.com/llms.txt > Use this file to discover all available pages before exploring further. # Data Wrangling for ML > Wrangle video, audio, documents, and images into ML-ready datasets with Pixeltable computed columns, iterators, and embedding indices. **Who:** ML Engineers, Data Scientists **Output:** Training/evaluation datasets **Pixeltable is your system of record**—all data, cached results, and references stay in sync. *** ## Data Lifecycle Load from any source: [`import_csv()`](/sdk/latest/io#func-import_csv), [`import_parquet()`](/sdk/latest/io#func-import_parquet), [HuggingFace](/howto/cookbooks/data/data-import-huggingface), [S3/GCS/Azure](/integrations/cloud-storage), RDBMS via Python DB API Load images/videos from cloud storage Load datasets from HuggingFace Hub Statistics & sampling: [`select()`](/tutorials/queries-and-expressions), [`.sample()`](/howto/cookbooks/data/data-sampling), `.head()` Sample and filter large datasets efficiently Transform & extract: [`add_computed_column()`](/tutorials/computed-columns), [`FrameIterator`](/platform/iterators), [`DocumentSplitter`](/platform/iterators) Process video into frame-level data Audio to text with Whisper **Model-in-the-loop:** Auto-generate labels with AI models * **Object Detection:** [`yolox.yolox()`](/sdk/latest/yolox), [`huggingface.detr_for_object_detection()`](/sdk/latest/huggingface) * **Vision LLMs:** [`openai.chat_completions()`](/sdk/latest/openai), [`anthropic.messages()`](/sdk/latest/anthropic), [`gemini.generate_content()`](/sdk/latest/gemini) * **Classification:** [`huggingface.image_classification()`](/sdk/latest/huggingface) Run YOLOX detection on images Analyze images with GPT-4o **Human-in-the-loop:** Refine labels with human annotators [Label Studio](/howto/using-label-studio-with-pixeltable) sync, [FiftyOne](/howto/working-with-fiftyone) export, [`add_embedding_index()`](/platform/embedding-indexes) for curation search Sync annotations bidirectionally Visualize and curate datasets **Model-in-the-loop vs Human-in-the-loop:** Use pre-annotation to generate initial labels with AI models, then refine with human annotators. Pixeltable keeps both in sync—model outputs and human corrections live in the same table. Find similar examples with embedding search, filter by quality metrics [`add_embedding_index()`](/platform/embedding-indexes), [`.similarity()`](/platform/embedding-indexes), `.where()`, `.order_by()` Find visually similar samples Search by meaning, not keywords **Test transformations before committing:** Run `SELECT` to preview results on samples before adding computed columns ```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}} # Test on 5 rows first (no storage cost) t.select(t.image, new_label=my_classifier(t.image)).head(5) # Happy? Commit to full dataset t.add_computed_column(new_label=my_classifier(t.image)) ``` Test UDFs and expressions before committing Version control: [`create_snapshot()`](/platform/version-control), [`create_view()`](/platform/views), [`history()`](/platform/version-control), lineage tracking Track changes and revert to previous states **Why curate?** ML models are only as good as their training data. Use Pixeltable's search and filtering to find edge cases, remove duplicates, balance classes, and iterate on your data quality before export. Publish to cloud: [`publish()`](/platform/data-sharing), [`replicate()`](/platform/data-sharing), `push()`, `pull()` Collaborate with your team via cloud replicas Training and data formats: [`export_csv()`](/sdk/latest/io#func-export_csv), [`export_json()`](/sdk/latest/io#func-export_json), [`export_parquet()`](/sdk/latest/io#func-export_parquet), [`to_pytorch_dataset()`](/sdk/latest/query#method-to_pytorch_dataset), [`to_coco_dataset()`](/sdk/latest/query#method-to_coco_dataset), [`export_lancedb()`](/sdk/latest/io#func-export_lancedb) Convert to PyTorch DataLoader format All import/export formats *** ## End-to-End Examples Complete workflow: ingest video → extract frames → detect objects → export Transcribe and analyze audio at scale Extract structured data from images with GPT-4o Auto-generate image descriptions