
module  pixeltable.functions.voyageai

Pixeltable UDFs that wrap various endpoints from the Voyage AI API. In order to use them, you must first pip install voyageai and configure your Voyage AI credentials, as described in the Working with Voyage AI tutorial.

udf  embeddings()

Signature
embeddings(
    input: pxt.String,
    *,
    model: pxt.String,
    input_type: pxt.String | None = None,
    truncation: pxt.Bool | None = None,
    output_dimension: pxt.Int | None = None,
    output_dtype: pxt.String | None = None
) -> pxt.Array[(None,), Float]
Creates an embedding vector representing the input text. Equivalent to the Voyage AI embeddings API endpoint. For additional details, see: https://docs.voyageai.com/docs/embeddings
Request throttling: Applies the rate limit set in the config (section voyageai, key rate_limit). If no rate limit is configured, uses a default of 600 RPM.
Requirements:
  • pip install voyageai
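The throttling rule above is a simple lookup with a fallback. The sketch below assumes the Pixeltable config has already been parsed into a nested dict (for example, from a TOML file). The helper name resolve_voyage_rate_limit is hypothetical and is not part of Pixeltable's API. It only illustrates the documented behavior: read section voyageai, key rate_limit, and otherwise use 600 RPM.

```python
from __future__ import annotations

DEFAULT_VOYAGEAI_RPM = 600  # documented default when no rate_limit is configured


def resolve_voyage_rate_limit(config: dict) -> int:
    """Hypothetical helper: return the requests-per-minute limit for Voyage AI calls.

    Looks up config['voyageai']['rate_limit'] and falls back to 600 RPM
    when either the section or the key is missing.
    """
    section = config.get('voyageai', {})
    return int(section.get('rate_limit', DEFAULT_VOYAGEAI_RPM))


print(resolve_voyage_rate_limit({'voyageai': {'rate_limit': 300}}))  # 300
print(resolve_voyage_rate_limit({}))  # 600
```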
Parameters:
  • input (pxt.String): The text to embed.
  • model (pxt.String): The model to use for the embedding. Recommended options: voyage-3-large, voyage-3.5, voyage-3.5-lite, voyage-code-3, voyage-finance-2, voyage-law-2.
  • input_type (pxt.String | None): Type of the input text. Options: None, query, document. When input_type is None, the embedding model directly converts the inputs into numerical vectors. For retrieval/search purposes, we recommend setting this to query or document as appropriate.
  • truncation (pxt.Bool | None): Whether to truncate the input text to fit within the context length. If None, the Voyage AI default (True) applies.
  • output_dimension (pxt.Int | None): The number of dimensions for resulting output embeddings. Most models only support a single default dimension. Models voyage-3-large, voyage-3.5, voyage-3.5-lite, and voyage-code-3 support: 256, 512, 1024 (default), and 2048.
  • output_dtype (pxt.String | None): The data type for the embeddings to be returned. Options: float, int8, uint8, binary, ubinary. Only float is currently supported in Pixeltable.
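The output_dimension and output_dtype notes above can be expressed as a small validation step. The function check_embedding_args below is hypothetical and is not part of Pixeltable or the voyageai package. It is deliberately conservative: it rejects any explicit output_dimension for models outside the listed four, even though such a model might accept a value equal to its own default.

```python
from __future__ import annotations

# Per the notes above, only these models accept a non-default output_dimension.
FLEXIBLE_DIM_MODELS = {'voyage-3-large', 'voyage-3.5', 'voyage-3.5-lite', 'voyage-code-3'}
SUPPORTED_DIMS = (256, 512, 1024, 2048)  # 1024 is the default for these models


def check_embedding_args(
    model: str,
    output_dimension: int | None = None,
    output_dtype: str | None = None,
) -> None:
    """Hypothetical pre-flight check; raises ValueError for unsupported combinations."""
    if output_dtype not in (None, 'float'):
        # The API also offers int8, uint8, binary and ubinary, but Pixeltable only handles float.
        raise ValueError(f'output_dtype={output_dtype!r} is not supported in Pixeltable')
    if output_dimension is None:
        return  # keep the model's default dimension
    if model not in FLEXIBLE_DIM_MODELS:
        raise ValueError(f'{model} only supports its default dimension')
    if output_dimension not in SUPPORTED_DIMS:
        raise ValueError(f'{model} supports {SUPPORTED_DIMS}, got {output_dimension}')


check_embedding_args('voyage-3.5', output_dimension=512)  # accepted
check_embedding_args('voyage-law-2')  # accepted: default dimension
```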
Returns:
  • pxt.Array[(None,), Float]: The embedding vector for input. Its length depends on the model and on output_dimension.
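Because the result is a plain float vector, a query embedding (input_type='query') can be compared against document embeddings (input_type='document') with cosine similarity, a common choice for retrieval. The pure-Python sketch below uses toy 3-dimensional vectors, not real Voyage AI output, and the helper names are illustrative only.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError('embeddings must have the same dimension')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float], doc_vecs: list[Sequence[float]]) -> list[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for a query embedding and three document embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_documents(query, docs))  # [1, 2, 0]
```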
Examples: Add a computed column that applies the model voyage-3.5 to an existing Pixeltable column tbl.text of the table tbl:
tbl.add_computed_column(
    embed=embeddings(tbl.text, model='voyage-3.5', input_type='document')
)
Add an embedding index to an existing column text, using the model voyage-3.5:
tbl.add_embedding_index(
    'text', string_embed=embeddings.using(model='voyage-3.5')
)

udf  multimodal_embed()

Signatures
# Signature 1:
multimodal_embed(
    text: pxt.String,
    model: pxt.String = 'voyage-multimodal-3',
    input_type: pxt.String | None = None,
    truncation: pxt.Bool = True
) -> pxt.Array[(1024,), Float]

# Signature 2:
multimodal_embed(
    image: pxt.Image,
    model: pxt.String = 'voyage-multimodal-3',
    input_type: pxt.String | None = None,
    truncation: pxt.Bool = True
) -> pxt.Array[(1024,), Float]
Creates an embedding vector for text or images using Voyage AI’s multimodal model. Equivalent to the Voyage AI multimodal_embed API endpoint. For additional details, see: https://docs.voyageai.com/docs/multimodal-embeddings
Request throttling: Applies the rate limit set in the config (section voyageai, key rate_limit). If no rate limit is configured, uses a default of 600 RPM.
Requirements:
  • pip install voyageai
Parameters:
  • text (pxt.String): The text to embed (Signature 1).
  • image (pxt.Image): The image to embed (Signature 2).
  • model (pxt.String, default: 'voyage-multimodal-3'): The model to use. Currently only voyage-multimodal-3 is supported.
  • input_type (pxt.String | None, default: None): Type of the input. Options: None, query, document. For retrieval/search, set this to query or document as appropriate.
  • truncation (pxt.Bool, default: True): Whether to truncate inputs to fit within the context length.
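Each call embeds exactly one text or one image, using the defaults listed above. The sketch below shows how a single value might be wrapped into a request for the Voyage AI client. It assumes the client's multimodal endpoint takes inputs as a list of inputs, each of which is a list of interleaved text and image content; build_multimodal_request is a hypothetical helper, not Pixeltable's implementation.

```python
from __future__ import annotations

from typing import Any

DEFAULT_MULTIMODAL_MODEL = 'voyage-multimodal-3'
VALID_INPUT_TYPES = (None, 'query', 'document')


def build_multimodal_request(
    item: Any,
    model: str = DEFAULT_MULTIMODAL_MODEL,
    input_type: str | None = None,
    truncation: bool = True,
) -> dict[str, Any]:
    """Hypothetical helper: wrap one text string or image as a single multimodal input."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(f'unsupported input_type: {input_type!r}')
    return {
        'inputs': [[item]],  # assumption: one input holding one piece of content
        'model': model,
        'input_type': input_type,
        'truncation': truncation,
    }


request = build_multimodal_request('a red bicycle', input_type='document')
# Assumed usage with the voyageai package: voyageai.Client().multimodal_embed(**request)
```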
Returns:
  • pxt.Array[(1024,), Float]: An array of 1024 floats representing the embedding.
Examples: Embed a text column description:
tbl.add_computed_column(
    embed=multimodal_embed(tbl.description, input_type='document')
)
Add an embedding index for column description:
tbl.add_embedding_index(
    'description',
    string_embed=multimodal_embed.using(model='voyage-multimodal-3'),
)
Embed an image column img:
tbl.add_computed_column(
    embed=multimodal_embed(tbl.img, input_type='document')
)

udf  rerank()

Signature
rerank(
    query: pxt.String,
    documents: pxt.Json,
    *,
    model: pxt.String,
    top_k: pxt.Int | None = None,
    truncation: pxt.Bool = True
) -> pxt.Json
Reranks documents based on their relevance to a query. Equivalent to the Voyage AI rerank API endpoint. For additional details, see: https://docs.voyageai.com/docs/reranker
Request throttling: Applies the rate limit set in the config (section voyageai, key rate_limit). If no rate limit is configured, uses a default of 600 RPM.
Requirements:
  • pip install voyageai
Parameters:
  • query (pxt.String): The query as a string.
  • documents (pxt.Json): The documents to be reranked as a list of strings.
  • model (pxt.String): The model to use for reranking. Recommended options: rerank-2.5, rerank-2.5-lite.
  • top_k (pxt.Int | None): The number of most relevant documents to return. If not specified, all documents will be reranked and returned.
  • truncation (pxt.Bool): Whether to truncate the input to satisfy context length limits. Defaults to True.
Returns:
  • pxt.Json: A dictionary containing:
    • results: List of reranking results, each with index (the document's position in documents), document, and relevance_score
    • total_tokens: The total number of tokens used
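A common next step is to pull the document texts out of this dictionary, most relevant first. The helper below is hypothetical and works only on the structure described above; it sorts defensively by relevance_score rather than relying on the order returned by the API. The example dictionary is made up for illustration and is not real API output.

```python
from __future__ import annotations

from typing import Any


def top_documents(rerank_result: dict[str, Any], min_score: float = 0.0) -> list[str]:
    """Hypothetical helper: return reranked document texts, most relevant first.

    Expects {'results': [{'index', 'document', 'relevance_score'}, ...], 'total_tokens': ...}
    and drops entries whose relevance_score is below min_score.
    """
    results = sorted(rerank_result['results'], key=lambda r: r['relevance_score'], reverse=True)
    return [r['document'] for r in results if r['relevance_score'] >= min_score]


# Made-up response in the documented shape, for illustration only.
example = {
    'results': [
        {'index': 2, 'document': 'doc C', 'relevance_score': 0.31},
        {'index': 0, 'document': 'doc A', 'relevance_score': 0.87},
    ],
    'total_tokens': 42,
}
print(top_documents(example))  # ['doc A', 'doc C']
print(top_documents(example, min_score=0.5))  # ['doc A']
```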
Examples: Rerank similarity search results for better relevance. First, create a table with an embedding index, then use a query function to retrieve candidates and rerank them:
import pixeltable as pxt
from pixeltable.functions.voyageai import embeddings, rerank

docs = pxt.create_table('docs', {'text': pxt.String})
# The index computes and stores voyage-3.5 embeddings for docs.text.
docs.add_embedding_index(
    'text', string_embed=embeddings.using(model='voyage-3.5')
)


@pxt.query
def get_candidates(query_text: str):
    # similarity() embeds query_text with the index's embedding function.
    sim = docs.text.similarity(query_text)
    return docs.order_by(sim, asc=False).limit(20).select(docs.text)


queries = pxt.create_table('queries', {'query': pxt.String})
queries.add_computed_column(candidates=get_candidates(queries.query))
queries.add_computed_column(
    reranked=rerank(
        queries.query,
        queries.candidates.text,
        model='rerank-2.5',
        top_k=5,
    )
)