Model Hubs

Hugging Face Models

Pixeltable provides seamless integration with Hugging Face’s transformers library through built-in UDFs. These functions allow you to use state-of-the-art models directly in your data workflows.

Requirements: Install the core dependency with pip install transformers. Some models also require additional packages such as sentence-transformers or torch.
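
The snippets below assume a Pixeltable table t whose columns match each model's input type; the table and column names are illustrative and vary between examples. A minimal setup sketch:

import pixeltable as pxt

# Create a table with example columns of the types used on this page
t = pxt.create_table(
    'hf_demo',
    {
        'text': pxt.String,
        'image': pxt.Image,
        'audio': pxt.Audio,
    }
)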

CLIP Models

from pixeltable.functions.huggingface import clip

# For text embedding
t.add_computed_column(
    text_embedding=clip(
        t.text_column,
        model_id='openai/clip-vit-base-patch32'
    )
)

# For image embedding
t.add_computed_column(
    image_embedding=clip(
        t.image_column,
        model_id='openai/clip-vit-base-patch32'
    )
)

Perfect for multimodal applications combining text and image understanding.
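
A common follow-on is to put the CLIP embedding behind an index and run cross-modal similarity queries. A sketch, assuming a version of add_embedding_index that accepts an embedding argument (confirm the exact parameter names against your installed Pixeltable release):

from pixeltable.functions.huggingface import clip

# Index the image column with a CLIP embedding
t.add_embedding_index(
    'image_column',
    embedding=clip.using(model_id='openai/clip-vit-base-patch32')
)

# Rank images by similarity to a free-text query
sim = t.image_column.similarity('a photo of a dog on a beach')
t.order_by(sim, asc=False).limit(5).collect()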

Cross-Encoders

from pixeltable.functions.huggingface import cross_encoder

t.add_computed_column(
    similarity_score=cross_encoder(
        t.sentence1,
        t.sentence2,
        model_id='cross-encoder/ms-marco-MiniLM-L-4-v2'
    )
)

Ideal for semantic similarity tasks and sentence pair classification.
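
Once similarity_score is populated, ranking or filtering sentence pairs is an ordinary query. A usage sketch with the columns from the example above:

# Return the highest-scoring sentence pairs first
t.select(t.sentence1, t.sentence2, t.similarity_score) \
    .order_by(t.similarity_score, asc=False) \
    .limit(10) \
    .collect()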

DETR Object Detection

from pixeltable.functions.huggingface import detr_for_object_detection, detr_to_coco

t.add_computed_column(
    detections=detr_for_object_detection(
        t.image,
        model_id='facebook/detr-resnet-50',
        threshold=0.8
    )
)

# Convert to COCO format if needed
t.add_computed_column(
    coco_format=detr_to_coco(t.image, t.detections)
)

Powerful object detection with end-to-end transformer architecture.
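
The detection result is stored as JSON, so you can drill into it with JSON path expressions. A sketch, assuming the output includes label_text and scores keys (verify against the UDF's documented output format):

# Inspect detected labels and confidence scores per image
t.select(
    t.image,
    labels=t.detections['label_text'],
    scores=t.detections['scores']
).collect()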

Sentence Transformers

from pixeltable.functions.huggingface import sentence_transformer

t.add_computed_column(
    embeddings=sentence_transformer(
        t.text,
        model_id='sentence-transformers/all-mpnet-base-v2',
        normalize_embeddings=True
    )
)

State-of-the-art sentence and document embeddings for semantic search and similarity.
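
These embeddings typically back an embedding index for semantic search. A sketch along the same lines as the CLIP example (again, confirm the add_embedding_index parameter names for your Pixeltable version):

from pixeltable.functions.huggingface import sentence_transformer

# Index the text column for nearest-neighbor lookups
t.add_embedding_index(
    'text',
    embedding=sentence_transformer.using(
        model_id='sentence-transformers/all-mpnet-base-v2'
    )
)

# Retrieve the rows most similar to a query string
sim = t.text.similarity('how do I reset my password?')
t.order_by(sim, asc=False).limit(3).collect()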

Speech2Text Models

from pixeltable.functions.huggingface import speech2text_for_conditional_generation

# Basic transcription
t.add_computed_column(
    transcript=speech2text_for_conditional_generation(
        t.audio,
        model_id='facebook/s2t-small-librispeech-asr'
    )
)

# Multilingual translation
t.add_computed_column(
    translation=speech2text_for_conditional_generation(
        t.audio,
        model_id='facebook/s2t-medium-mustc-multilingual-st',
        language='fr'
    )
)

Support for both transcription and translation of audio content.
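
Transcripts and translations are computed automatically as audio rows arrive, so reading results back is an ordinary query. A sketch using a placeholder file path:

# Insert an audio file; computed columns are populated on insert
t.insert([{'audio': '/path/to/recording.mp3'}])

# Read back the transcription alongside the source audio
t.select(t.audio, t.transcript).collect()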

Vision Transformer (ViT)

from pixeltable.functions.huggingface import vit_for_image_classification

t.add_computed_column(
    classifications=vit_for_image_classification(
        t.image,
        model_id='google/vit-base-patch16-224',
        top_k=5
    )
)

Modern image classification using transformer architecture.
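
The classification output is returned as JSON containing the top-k predictions. A query sketch (the key names are an assumption; check the UDF's output schema):

# Inspect the top-k predictions per image
t.select(t.image, t.classifications).collect()

# Pull out the best label, assuming a 'label_text' list in the result
t.select(top_label=t.classifications['label_text'][0]).collect()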

Model Selection Guide

1. Choose Task

Select the appropriate model family based on your task:

  • Text/Image Similarity → CLIP
  • Object Detection → DETR
  • Text Embeddings → Sentence Transformers
  • Speech Processing → Speech2Text
  • Image Classification → ViT

2. Check Requirements

Install necessary dependencies:

pip install transformers torch sentence-transformers

3. Setup Integration

Import and use the model in your Pixeltable workflow:

from pixeltable.functions.huggingface import clip, sentence_transformer
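
Putting the steps together, a minimal end-to-end sketch (table and column names are illustrative):

import pixeltable as pxt
from pixeltable.functions.huggingface import sentence_transformer

# Create a table and attach a computed embedding column
docs = pxt.create_table('docs', {'text': pxt.String})
docs.add_computed_column(
    embedding=sentence_transformer(
        docs.text,
        model_id='sentence-transformers/all-mpnet-base-v2'
    )
)

# Embeddings are computed automatically on insert
docs.insert([{'text': 'Pixeltable runs Hugging Face models as computed columns.'}])
docs.select(docs.text, docs.embedding).collect()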

Need help choosing the right model? Check our example notebooks or join our Discord community.