Model Hub & Repositories

Model hubs

Hugging Face Hub

Access thousands of pre-trained models across vision, text, and audio domains

Replicate

Deploy and run ML models through Replicate’s cloud infrastructure

Hugging Face models

Pixeltable provides seamless integration with Hugging Face’s transformers library through built-in UDFs. These functions allow you to use state-of-the-art models directly in your data workflows.

Requirements: Install required dependencies with pip install transformers. Some models may require additional packages like sentence-transformers or torch.

CLIP models

from pixeltable.functions.huggingface import clip

# For text embedding
t.add_computed_column(
    text_embedding=clip(
        t.text_column,
        model_id='openai/clip-vit-base-patch32'
    )
)

# For image embedding
t.add_computed_column(
    image_embedding=clip(
        t.image_column,
        model_id='openai/clip-vit-base-patch32'
    )
)

Perfect for multimodal applications combining text and image understanding.

Cross-encoders

from pixeltable.functions.huggingface import cross_encoder

t.add_computed_column(
    similarity_score=cross_encoder(
        t.sentence1,
        t.sentence2,
        model_id='cross-encoder/ms-marco-MiniLM-L-4-v2'
    )
)

Ideal for semantic similarity tasks and sentence pair classification.

DETR object detection

from pixeltable.functions.huggingface import detr_for_object_detection

t.add_computed_column(
    detections=detr_for_object_detection(
        t.image,
        model_id='facebook/detr-resnet-50',
        threshold=0.8
    )
)

# Convert to COCO format if needed
t.add_computed_column(
    coco_format=detr_to_coco(t.image, t.detections)
)

Powerful object detection with end-to-end transformer architecture.

Sentence transformers

from pixeltable.functions.huggingface import sentence_transformer

t.add_computed_column(
    embeddings=sentence_transformer(
        t.text,
        model_id='sentence-transformers/all-mpnet-base-v2',
        normalize_embeddings=True
    )
)

State-of-the-art sentence and document embeddings for semantic search and similarity.

Speech2Text models

from pixeltable.functions.huggingface import speech2text_for_conditional_generation

# Basic transcription
t.add_computed_column(
    transcript=speech2text_for_conditional_generation(
        t.audio,
        model_id='facebook/s2t-small-librispeech-asr'
    )
)

# Multilingual translation
t.add_computed_column(
    translation=speech2text_for_conditional_generation(
        t.audio,
        model_id='facebook/s2t-medium-mustc-multilingual-st',
        language='fr'
    )
)

Support for both transcription and translation of audio content.

Vision Transformer (ViT)

from pixeltable.functions.huggingface import vit_for_image_classification

t.add_computed_column(
    classifications=vit_for_image_classification(
        t.image,
        model_id='google/vit-base-patch16-224',
        top_k=5
    )
)

Modern image classification using transformer architecture.

Integration features

Computed Columns

All models can be used directly in computed columns for automated processing:

# Example: Combine CLIP embeddings with ViT classification
t.add_computed_column(
    image_features=clip(t.image, model_id='openai/clip-vit-base-patch32')
)
t.add_computed_column(
    classifications=vit_for_image_classification(t.image, model_id='google/vit-base-patch16-224')
)

Batch Processing

Pixeltable automatically handles batch processing and optimization:

# Pixeltable efficiently processes large datasets
t.add_computed_column(
    embeddings=sentence_transformer(
        t.text,
        model_id='all-mpnet-base-v2'
    )
)

Model Output Format

# Object Detection Output
{
    'scores': [0.99, 0.98],  # confidence scores
    'labels': [25, 30],      # class labels
    'label_text': ['cat', 'dog'], # human-readable labels
    'boxes': [[x1, y1, x2, y2], ...] # bounding boxes
}

# Image Classification Output
{
    'scores': [0.8, 0.15],   # class probabilities
    'labels': [340, 353],    # class IDs
    'label_text': ['zebra', 'gazelle'] # class names
}

Model selection guide

Choose Task

Select the appropriate model family based on your task:

Text/Image Similarity → CLIP
Object Detection → DETR
Text Embeddings → Sentence Transformers
Speech Processing → Speech2Text
Image Classification → ViT

Check Requirements

Install necessary dependencies:

pip install transformers torch sentence-transformers

Setup Integration

Import and use the model in your Pixeltable workflow:

from pixeltable.functions.huggingface import clip, sentence_transformer

Need help choosing the right model? Check our example notebooks or join our Discord community.

Built-in Integrations

Infrastructure

Bring Your Own

Model hubs

Hugging Face Hub

Replicate

Hugging Face models

CLIP models

Cross-encoders

DETR object detection

Sentence transformers

Speech2Text models

Vision Transformer (ViT)

Integration features

Model selection guide

Built-in Integrations

Infrastructure

Bring Your Own

​Model hubs

Hugging Face Hub

Replicate

​Hugging Face models

​CLIP models

​Cross-encoders

​DETR object detection

​Sentence transformers

​Speech2Text models

​Vision Transformer (ViT)

​Integration features

​Model selection guide

Model hubs

Hugging Face models

CLIP models

Cross-encoders

DETR object detection

Sentence transformers

Speech2Text models

Vision Transformer (ViT)

Integration features

Model selection guide