Pixeltable UDFs that wrap various models from the Hugging Face `transformers` package.

These UDFs will cause Pixeltable to invoke the relevant models locally. In order to use them, you must first `pip install transformers` (or in some cases, `sentence-transformers`, as noted in the specific UDFs).
UDFs
clip()
udf
Computes a CLIP embedding for the specified text or image. `model_id` should be a reference to a pretrained CLIP Model.
Requirements:
pip install torch transformers
Parameters:

- `text` (String): The string to embed.
- `model_id` (String): The pretrained model to use for the embedding.

Returns:

- Array[(None,), Float]: An array containing the output of the embedding model.
Example: Add a computed column that applies the model `openai/clip-vit-base-patch32` to an existing Pixeltable column `tbl.text` of the table `tbl`:
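A minimal sketch (assuming `tbl` is an existing table and the standard `add_computed_column` API; the column name `text_embedding` is illustrative):

```python
from pixeltable.functions.huggingface import clip

# Compute a CLIP text embedding for each row of tbl.text.
tbl.add_computed_column(
    text_embedding=clip(tbl.text, model_id='openai/clip-vit-base-patch32')
)
```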
cross_encoder()
udf
Performs a prediction on the given sentence pair. `model_id` should be a pretrained Cross-Encoder model, as described in the Cross-Encoder Pretrained Models documentation.
Requirements:
pip install torch sentence-transformers
Parameters:

- `sentences1` (String): The first sentence to be paired.
- `sentences2` (String): The second sentence to be paired.
- `model_id` (String): The identifier of the cross-encoder model to use.

Returns:

- Float: The similarity score between the inputs.
Example: Add a computed column that applies the model `ms-marco-MiniLM-L-4-v2` to the sentences in columns `tbl.sentence1` and `tbl.sentence2`:
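A sketch under the same assumptions (existing table `tbl`; the column name `similarity` is illustrative):

```python
from pixeltable.functions.huggingface import cross_encoder

# Score each sentence pair with the cross-encoder model.
tbl.add_computed_column(
    similarity=cross_encoder(tbl.sentence1, tbl.sentence2, model_id='ms-marco-MiniLM-L-4-v2')
)
```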
cross_encoder_list()
udf
Signature:
detr_for_object_detection()
udf
Computes DETR object detections for the specified image. `model_id` should be a reference to a pretrained DETR Model.
Requirements:
pip install torch transformers
Parameters:

- `image` (Image): The image to run detection on.
- `model_id` (String): The pretrained model to use for object detection.

Returns:

- Json: A dictionary containing the output of the object detection model, in the following format:
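A sketch of the likely shape (the exact key names are an assumption based on typical DETR post-processing output, not confirmed here):

```python
# Assumed output shape (key names are illustrative):
{
    'scores': [0.99, 0.98],                   # confidence score per detection
    'labels': [25, 25],                       # numeric class ids
    'label_text': ['giraffe', 'giraffe'],     # human-readable class names
    'boxes': [[51.9, 356.8, 181.5, 413.2],    # one bounding box per detection
              [383.2, 58.1, 605.6, 361.5]]
}
```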
Example: Add a computed column that applies the model `facebook/detr-resnet-50` to an existing Pixeltable column `image` of the table `tbl`:
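A minimal sketch (the column name `detections` is illustrative):

```python
from pixeltable.functions.huggingface import detr_for_object_detection

# Run DETR object detection on each image.
tbl.add_computed_column(
    detections=detr_for_object_detection(tbl.image, model_id='facebook/detr-resnet-50')
)
```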
detr_to_coco()
udf
Converts the output of a DETR object detection model to COCO format.
Parameters:

- `image` (Image): The image for which detections were computed.
- `detr_info` (Json): The output of a DETR object detection model, as returned by `detr_for_object_detection`.
Returns:

- Json: A dictionary containing the data from `detr_info`, converted to COCO format.
Example: Convert the detections in `tbl.detections` to COCO format, where `tbl.image` is the image for which detections were computed:
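A sketch (assuming `tbl.detections` was produced by `detr_for_object_detection`; the column name `detections_coco` is illustrative):

```python
from pixeltable.functions.huggingface import detr_to_coco

# Convert stored DETR detections into COCO-format annotations.
tbl.add_computed_column(
    detections_coco=detr_to_coco(tbl.image, tbl.detections)
)
```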
sentence_transformer()
udf
Computes sentence embeddings. `model_id` should be a pretrained Sentence Transformers model, as described in the Sentence Transformers Pretrained Models documentation.
Requirements:
pip install torch sentence-transformers
Parameters:

- `sentence` (String): The sentence to embed.
- `model_id` (String): The pretrained model to use for the encoding.
- `normalize_embeddings` (Bool): If `True`, normalizes embeddings to length 1; see the Sentence Transformers API Docs for more details.

Returns:

- Array[(None,), Float]: An array containing the output of the embedding model.
Example: Add a computed column that applies the model `all-mpnet-base-v2` to an existing Pixeltable column `tbl.sentence` of the table `tbl`:
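A minimal sketch (the column name `embedding` is illustrative):

```python
from pixeltable.functions.huggingface import sentence_transformer

# Embed each sentence with the Sentence Transformers model.
tbl.add_computed_column(
    embedding=sentence_transformer(tbl.sentence, model_id='all-mpnet-base-v2')
)
```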
sentence_transformer_list()
udf
Signature:
speech2text_for_conditional_generation()
udf
Transcribes or translates speech to text using a Speech2Text model. `model_id` should be a reference to a pretrained Speech2Text model.
Requirements:
pip install torch torchaudio sentencepiece transformers
Parameters:

- `audio` (Audio): The audio clip to transcribe or translate.
- `model_id` (String): The pretrained model to use for the transcription or translation.
- `language` (Optional[String]): If using a multilingual translation model, the language code to translate to. If not provided, the model's default language will be used. If the model is not a translation model, is not a multilingual model, or does not support the specified language, an error will be raised.

Returns:

- String: The transcribed or translated text.
Example: Add a computed column that applies the model `facebook/s2t-small-librispeech-asr` to an existing Pixeltable column `audio` of the table `tbl`:
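A sketch (the column name `transcription` is illustrative):

```python
from pixeltable.functions.huggingface import speech2text_for_conditional_generation

# Transcribe each audio clip to text.
tbl.add_computed_column(
    transcription=speech2text_for_conditional_generation(
        tbl.audio, model_id='facebook/s2t-small-librispeech-asr'
    )
)
```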
Example: Add a computed column that applies the model `facebook/s2t-medium-mustc-multilingual-st` to an existing Pixeltable column `audio` of the table `tbl`, translating the audio to French:
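A sketch under the same assumptions (the language code `'fr'` for French is an assumption about the expected code format):

```python
# Translate each audio clip to French text.
tbl.add_computed_column(
    translation=speech2text_for_conditional_generation(
        tbl.audio, model_id='facebook/s2t-medium-mustc-multilingual-st', language='fr'
    )
)
```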
vit_for_image_classification()
udf
Computes image classifications for the specified image using a Vision Transformer (ViT) model. `model_id` should be a reference to a pretrained ViT Model.
Note: Be sure the model is a ViT model that is trained for image classification (that is, a model designed for use with the `ViTForImageClassification` class), such as `google/vit-base-patch16-224`. General feature-extraction models such as `google/vit-base-patch16-224-in21k` will not produce the desired results.
Requirements:
pip install torch transformers
Parameters:

- `image` (Image): The image to classify.
- `model_id` (String): The pretrained model to use for the classification.
- `top_k` (Int): The number of classes to return.

Returns:

- Json: A dictionary containing the output of the image classification model, in the following format:
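A sketch of the likely shape (the exact key names are an assumption, not confirmed here):

```python
# Assumed output shape (key names are illustrative):
{
    'scores': [0.325, 0.281],                 # probabilities of the top-k classes
    'labels': [340, 294],                     # numeric class ids
    'label_text': ['zebra', 'brown bear'],    # human-readable class names
}
```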
Example: Add a computed column that applies the model `google/vit-base-patch16-224` to an existing Pixeltable column `image` of the table `tbl`, returning the 10 most likely classes for each image:
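A minimal sketch (the column name `classification` is illustrative):

```python
from pixeltable.functions.huggingface import vit_for_image_classification

# Classify each image, keeping the 10 most likely classes.
tbl.add_computed_column(
    classification=vit_for_image_classification(
        tbl.image, model_id='google/vit-base-patch16-224', top_k=10
    )
)
```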