UDFs that wrap models from the Hugging Face transformers package.
These UDFs cause Pixeltable to run the relevant models locally. To use them, you must first pip install transformers (or, for some UDFs, sentence-transformers, as noted below).
UDFs
clip() udf
Computes a CLIP embedding for the specified text or image. model_id should be a reference to a pretrained
CLIP Model.
Requirements:
pip install torch transformers
Parameters:
- text (String): The string to embed.
- model_id (String): The pretrained model to use for the embedding.

Returns:
- Array[(None,), Float]: An array containing the output of the embedding model.
Example: Add a computed column that applies the model openai/clip-vit-base-patch32 to an existing Pixeltable column tbl.text of the table tbl:
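A minimal sketch of that call, assuming Pixeltable's add_computed_column API (the table name and the result column name are illustrative):

```python
import pixeltable as pxt
from pixeltable.functions.huggingface import clip

tbl = pxt.get_table('my_table')  # hypothetical table with a `text` column

# Compute a CLIP text embedding for each row and store it in a new column.
tbl.add_computed_column(
    result=clip(tbl.text, model_id='openai/clip-vit-base-patch32')
)
```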
cross_encoder() udf
Runs a cross-encoder prediction on the given sentence pair.
model_id should be a pretrained Cross-Encoder model, as described in the Cross-Encoder Pretrained Models documentation.
Requirements:
pip install torch sentence-transformers
Parameters:
- sentences1 (String): The first sentence to be paired.
- sentences2 (String): The second sentence to be paired.
- model_id (String): The identifier of the cross-encoder model to use.

Returns:
- Float: The similarity score between the inputs.
Example: Add a computed column that applies the model ms-marco-MiniLM-L-4-v2 to the sentences in columns tbl.sentence1 and tbl.sentence2:
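A sketch of the corresponding call (the similarity column name is illustrative):

```python
from pixeltable.functions.huggingface import cross_encoder

# Score each sentence pair with the cross-encoder model.
tbl.add_computed_column(
    similarity=cross_encoder(
        tbl.sentence1, tbl.sentence2, model_id='ms-marco-MiniLM-L-4-v2'
    )
)
```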
cross_encoder_list() udf
Signature:
detr_for_object_detection() udf
Computes DETR object detections for the specified image. model_id should be a reference to a pretrained
DETR Model.
Requirements:
pip install torch transformers
Parameters:
- image (Image): The image to run object detection on.
- model_id (String): The pretrained model to use for object detection.

Returns:
- Json: A dictionary containing the output of the object detection model, in the following format:
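The field names below are an illustration based on standard DETR post-processing output, not a verbatim schema:

```python
{
    'scores': [0.99, 0.999],               # confidence score per detection
    'labels': [25, 25],                    # numeric class labels
    'label_text': ['giraffe', 'giraffe'],  # human-readable class names
    'boxes': [
        [51.94, 356.85, 181.52, 413.26],   # [x1, y1, x2, y2] bounding boxes
        [383.24, 58.01, 605.09, 371.11],
    ],
}
```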
Example: Add a computed column that applies the model facebook/detr-resnet-50 to an existing Pixeltable column image of the table tbl:
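A sketch of the corresponding call (the detections column name is illustrative):

```python
from pixeltable.functions.huggingface import detr_for_object_detection

# Run DETR object detection on each image.
tbl.add_computed_column(
    detections=detr_for_object_detection(
        tbl.image, model_id='facebook/detr-resnet-50'
    )
)
```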
detr_to_coco() udf
Converts the output of a DETR object detection model to COCO format.
Signature:
detr_to_coco(image: Image, detr_info: Json) -> Json
Parameters:
- image (Image): The image for which detections were computed.
- detr_info (Json): The output of a DETR object detection model, as returned by detr_for_object_detection().

Returns:
- Json: A dictionary containing the data from detr_info, converted to COCO format.
Example: Convert the detections in tbl.detections to COCO format, where tbl.image is the image for which the detections were computed:
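A sketch of the corresponding call (the detections_coco column name is illustrative):

```python
from pixeltable.functions.huggingface import detr_to_coco

# Convert previously computed DETR detections to COCO format.
tbl.add_computed_column(
    detections_coco=detr_to_coco(tbl.image, tbl.detections)
)
```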
sentence_transformer() udf
Computes sentence embeddings. model_id should be a pretrained Sentence Transformers model, as described
in the Sentence Transformers Pretrained Models documentation.
Requirements:
pip install torch sentence-transformers
Parameters:
- sentence (String): The sentence to embed.
- model_id (String): The pretrained model to use for the encoding.
- normalize_embeddings (Bool): If True, normalizes embeddings to length 1; see the Sentence Transformers API Docs for more details.

Returns:
- Array[(None,), Float]: An array containing the output of the embedding model.
Example: Add a computed column that applies the model all-mpnet-base-v2 to an existing Pixeltable column tbl.sentence of the table tbl:
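A sketch of the corresponding call (the embedding column name is illustrative):

```python
from pixeltable.functions.huggingface import sentence_transformer

# Embed each sentence with the Sentence Transformers model.
tbl.add_computed_column(
    embedding=sentence_transformer(tbl.sentence, model_id='all-mpnet-base-v2')
)
```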
sentence_transformer_list() udf
Signature:
speech2text_for_conditional_generation() udf
Transcribes or translates speech to text using a Speech2Text model. model_id should be a reference to a
pretrained Speech2Text model.
Requirements:
pip install torch torchaudio sentencepiece transformers
Parameters:
- audio (Audio): The audio clip to transcribe or translate.
- model_id (String): The pretrained model to use for the transcription or translation.
- language (Optional[String]): If using a multilingual translation model, the language code to translate to. If not provided, the model's default language will be used. If the model is not a translation model, is not multilingual, or does not support the specified language, an error will be raised.

Returns:
- String: The transcribed or translated text.
Examples: Add a computed column that applies the model facebook/s2t-small-librispeech-asr to an existing Pixeltable column audio of the table tbl:
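A sketch of the corresponding call (the transcription column name is illustrative):

```python
from pixeltable.functions.huggingface import (
    speech2text_for_conditional_generation,
)

# Transcribe each audio clip with the ASR model.
tbl.add_computed_column(
    transcription=speech2text_for_conditional_generation(
        tbl.audio, model_id='facebook/s2t-small-librispeech-asr'
    )
)
```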
Add a computed column that applies the model facebook/s2t-medium-mustc-multilingual-st to an existing Pixeltable column audio of the table tbl, translating the audio to French:
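A sketch of the translation variant ('fr' is the language code for French; the translation column name is illustrative):

```python
# Translate each audio clip to French with the multilingual model.
tbl.add_computed_column(
    translation=speech2text_for_conditional_generation(
        tbl.audio,
        model_id='facebook/s2t-medium-mustc-multilingual-st',
        language='fr',
    )
)
```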
vit_for_image_classification() udf
Computes image classifications for the specified image using a Vision Transformer (ViT) model.
model_id should be a reference to a pretrained ViT Model.
Note: Be sure the model is a ViT model that is trained for image classification (that is, a model designed for use with the ViTForImageClassification class), such as google/vit-base-patch16-224. General feature-extraction models such as google/vit-base-patch16-224-in21k will not produce the desired results.
Requirements:
pip install torch transformers
Parameters:
- image (Image): The image to classify.
- model_id (String): The pretrained model to use for the classification.
- top_k (Int): The number of classes to return.

Returns:
- Json: A dictionary containing the output of the image classification model, in the following format:
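The field names below are an assumption, mirroring the object-detection output above, with top_k entries per image:

```python
{
    'scores': [0.325, 0.281, 0.087],          # probability per predicted class
    'labels': [340, 353, 386],                # numeric class labels
    'label_text': ['zebra', 'impala', 'ox'],  # human-readable class names
}
```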
Example: Add a computed column that applies the model google/vit-base-patch16-224 to an existing Pixeltable column image of the table tbl, returning the 10 most likely classes for each image:
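A sketch of the corresponding call (the classification column name is illustrative):

```python
from pixeltable.functions.huggingface import vit_for_image_classification

# Classify each image, keeping the 10 most likely classes.
tbl.add_computed_column(
    classification=vit_for_image_classification(
        tbl.image, model_id='google/vit-base-patch16-224', top_k=10
    )
)
```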