Pixeltable UDFs that wrap various models from the Hugging Face `transformers` package.

These UDFs will cause Pixeltable to invoke the relevant models locally. In order to use them, you must first `pip install transformers` (or in some cases, `sentence-transformers`, as noted in the specific UDFs).
UDFs
clip()
udf
Computes a CLIP embedding for the specified text or image. `model_id` should be a reference to a pretrained CLIP Model.
Requirements:
pip install torch transformers
Parameters:

- `text` (String): The string to embed.
- `model_id` (String): The pretrained model to use for the embedding.

Returns:

- Array[(None,), Float]: An array containing the output of the embedding model.
Example: Add a computed column that applies the model `openai/clip-vit-base-patch32` to an existing Pixeltable column `tbl.text` of the table `tbl`:
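A minimal sketch (assuming `tbl` is an existing table and the standard `add_computed_column` API; the column name `text_embedding` is illustrative):

```python
from pixeltable.functions.huggingface import clip

# Compute a CLIP text embedding for each row of tbl.text.
tbl.add_computed_column(
    text_embedding=clip(tbl.text, model_id='openai/clip-vit-base-patch32')
)
```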
cross_encoder()
udf
Performs a prediction on the given sentence pair. `model_id` should be a pretrained Cross-Encoder model, as described in the Cross-Encoder Pretrained Models documentation.
Requirements:
pip install torch sentence-transformers
Parameters:

- `sentences1` (String): The first sentence to be paired.
- `sentences2` (String): The second sentence to be paired.
- `model_id` (String): The identifier of the cross-encoder model to use.

Returns:

- Float: The similarity score between the inputs.
Example: Add a computed column that applies the model `ms-marco-MiniLM-L-4-v2` to the sentences in columns `tbl.sentence1` and `tbl.sentence2`:
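A sketch under the same assumptions (existing table `tbl`; the column name `similarity` is illustrative):

```python
from pixeltable.functions.huggingface import cross_encoder

# Score each sentence pair with the cross-encoder model.
tbl.add_computed_column(
    similarity=cross_encoder(tbl.sentence1, tbl.sentence2, model_id='ms-marco-MiniLM-L-4-v2')
)
```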
cross_encoder_list()
udf
Signature:
detr_for_object_detection()
udf
Computes DETR object detections for the specified image. `model_id` should be a reference to a pretrained DETR Model.
Requirements:
pip install torch transformers
Parameters:

- `image` (Image): The image to run detection on.
- `model_id` (String): The pretrained model to use for object detection.

Returns:

- Json: A dictionary containing the output of the object detection model, in the following format:
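A sketch of the likely shape (the exact key names are an assumption based on typical DETR post-processing output, not confirmed here):

```python
# Assumed output shape (key names are illustrative):
{
    'scores': [0.99, 0.98],                   # confidence score per detection
    'labels': [25, 25],                       # numeric class ids
    'label_text': ['giraffe', 'giraffe'],     # human-readable class names
    'boxes': [[51.9, 356.8, 181.5, 413.2],    # one bounding box per detection
              [383.2, 58.1, 605.6, 361.5]]
}
```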
Example: Add a computed column that applies the model `facebook/detr-resnet-50` to an existing Pixeltable column `image` of the table `tbl`:
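A minimal sketch (the column name `detections` is illustrative):

```python
from pixeltable.functions.huggingface import detr_for_object_detection

# Run DETR object detection on each image.
tbl.add_computed_column(
    detections=detr_for_object_detection(tbl.image, model_id='facebook/detr-resnet-50')
)
```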
detr_to_coco()
udf
Converts the output of a DETR object detection model to COCO format.
Parameters:

- `image` (Image): The image for which detections were computed.
- `detr_info` (Json): The output of a DETR object detection model, as returned by `detr_for_object_detection`.
Returns:

- Json: A dictionary containing the data from `detr_info`, converted to COCO format.
Example: Convert the detections in `tbl.detections` to COCO format, where `tbl.image` is the image for which detections were computed:
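A sketch (assuming `tbl.detections` was produced by `detr_for_object_detection`; the column name `detections_coco` is illustrative):

```python
from pixeltable.functions.huggingface import detr_to_coco

# Convert stored DETR detections into COCO-format annotations.
tbl.add_computed_column(
    detections_coco=detr_to_coco(tbl.image, tbl.detections)
)
```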
sentence_transformer()
udf
Computes sentence embeddings. `model_id` should be a pretrained Sentence Transformers model, as described in the Sentence Transformers Pretrained Models documentation.
Requirements:
pip install torch sentence-transformers
Parameters:

- `sentence` (String): The sentence to embed.
- `model_id` (String): The pretrained model to use for the encoding.
- `normalize_embeddings` (Bool): If `True`, normalizes embeddings to length 1; see the Sentence Transformers API Docs for more details.

Returns:

- Array[(None,), Float]: An array containing the output of the embedding model.
Example: Add a computed column that applies the model `all-mpnet-base-v2` to an existing Pixeltable column `tbl.sentence` of the table `tbl`:
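A minimal sketch (the column name `embedding` is illustrative):

```python
from pixeltable.functions.huggingface import sentence_transformer

# Embed each sentence with the Sentence Transformers model.
tbl.add_computed_column(
    embedding=sentence_transformer(tbl.sentence, model_id='all-mpnet-base-v2')
)
```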
sentence_transformer_list()
udf
Signature:
speech2text_for_conditional_generation()
udf
Transcribes or translates speech to text using a Speech2Text model. `model_id` should be a reference to a pretrained Speech2Text model.
Requirements:
pip install torch torchaudio sentencepiece transformers
Parameters:

- `audio` (Audio): The audio clip to transcribe or translate.
- `model_id` (String): The pretrained model to use for the transcription or translation.
- `language` (Optional[String]): If using a multilingual translation model, the language code to translate to. If not provided, the model's default language will be used. If the model is not a translation model, is not a multilingual model, or does not support the specified language, an error will be raised.

Returns:

- String: The transcribed or translated text.
Example: Add a computed column that applies the model `facebook/s2t-small-librispeech-asr` to an existing Pixeltable column `audio` of the table `tbl`:
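A sketch (the column name `transcription` is illustrative):

```python
from pixeltable.functions.huggingface import speech2text_for_conditional_generation

# Transcribe each audio clip to text.
tbl.add_computed_column(
    transcription=speech2text_for_conditional_generation(
        tbl.audio, model_id='facebook/s2t-small-librispeech-asr'
    )
)
```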
Example: Add a computed column that applies the model `facebook/s2t-medium-mustc-multilingual-st` to an existing Pixeltable column `audio` of the table `tbl`, translating the audio to French:
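A sketch under the same assumptions (the language code `'fr'` for French is an assumption about the expected code format):

```python
# Translate each audio clip to French text.
tbl.add_computed_column(
    translation=speech2text_for_conditional_generation(
        tbl.audio, model_id='facebook/s2t-medium-mustc-multilingual-st', language='fr'
    )
)
```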
vit_for_image_classification()
udf
Computes image classifications for the specified image using a Vision Transformer (ViT) model. `model_id` should be a reference to a pretrained ViT Model.
Note: Be sure the model is a ViT model that is trained for image classification (that is, a model designed for use with the `ViTForImageClassification` class), such as `google/vit-base-patch16-224`. General feature-extraction models such as `google/vit-base-patch16-224-in21k` will not produce the desired results.
Requirements:
pip install torch transformers
Parameters:

- `image` (Image): The image to classify.
- `model_id` (String): The pretrained model to use for the classification.
- `top_k` (Int): The number of classes to return.

Returns:

- Json: A dictionary containing the output of the image classification model, in the following format:
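A sketch of the likely shape (the exact key names are an assumption, not confirmed here):

```python
# Assumed output shape (key names are illustrative):
{
    'scores': [0.325, 0.281],                 # probabilities of the top-k classes
    'labels': [340, 294],                     # numeric class ids
    'label_text': ['zebra', 'brown bear'],    # human-readable class names
}
```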
Example: Add a computed column that applies the model `google/vit-base-patch16-224` to an existing Pixeltable column `image` of the table `tbl`, returning the 10 most likely classes for each image:
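A minimal sketch (the column name `classification` is illustrative):

```python
from pixeltable.functions.huggingface import vit_for_image_classification

# Classify each image, keeping the 10 most likely classes.
tbl.add_computed_column(
    classification=vit_for_image_classification(
        tbl.image, model_id='google/vit-base-patch16-224', top_k=10
    )
)
```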