module  pixeltable.functions.vllm

Pixeltable UDFs for vLLM models. Provides integration with vLLM for high-throughput inference with large language models, supporting chat completions and text generation with HuggingFace models.

udf  chat_completions()

Signature
@pxt.udf
chat_completions(
    messages: pxt.Json[(Json, ...)],
    *,
    model: pxt.String,
    engine_args: pxt.Json | None = None,
    sampling_params: pxt.Json | None = None
) -> VllmRequestOutput
Generate a chat completion from a list of messages using vLLM. For additional details, see the vLLM documentation.
Requirements:
  • pip install vllm
Parameters:
  • messages (pxt.Json[(Json, ...)]): A list of messages to generate a response for. Each message should be a dict with role and content keys, following the OpenAI chat format.
  • model (pxt.String): The HuggingFace model identifier (e.g., 'Qwen/Qwen2.5-0.5B-Instruct').
  • engine_args (pxt.Json | None): Additional keyword args for the vLLM LLM constructor, such as dtype, max_model_len, gpu_memory_utilization, tensor_parallel_size. For details, see the vLLM engine args documentation.
  • sampling_params (pxt.Json | None): Keyword args for vLLM SamplingParams, such as max_tokens, temperature, top_p, top_k. For details, see the vLLM sampling params documentation.
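Because engine_args and sampling_params are typed as pxt.Json, they are passed as plain dicts whose values must be JSON-compatible. The sketch below builds both dicts using only the keys named in the parameter descriptions above and checks that they survive a JSON round trip. The specific values and the is_json_compatible helper are illustrative assumptions, not recommended settings or part of the Pixeltable API.

```python
import json

# Keys come from the parameter descriptions; the values are illustrative
# assumptions and should be tuned for the model and hardware in use.
engine_args = {
    'dtype': 'auto',
    'max_model_len': 4096,
    'gpu_memory_utilization': 0.8,
    'tensor_parallel_size': 1,
}
sampling_params = {
    'max_tokens': 256,
    'temperature': 0.7,
    'top_p': 0.9,
    'top_k': 40,
}


def is_json_compatible(value) -> bool:
    """Hypothetical helper: True if value survives a JSON round trip unchanged."""
    try:
        return json.loads(json.dumps(value)) == value
    except (TypeError, ValueError):
        return False


engine_args_ok = is_json_compatible(engine_args)
sampling_params_ok = is_json_compatible(sampling_params)
```

A dict that passes this check can be supplied directly as engine_args=... or sampling_params=... in the UDF call.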
Returns:
  • VllmRequestOutput: A dict containing the vLLM RequestOutput in its native format.
Examples:
Add a computed column that generates chat completions:
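from pixeltable.functions.vllm import chat_completions

# t is an existing Pixeltable table with a `messages` column
t.add_computed_column(
    result=chat_completions(
        t.messages, model='Qwen/Qwen2.5-0.5B-Instruct'
    )
)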
With custom sampling parameters:
t.add_computed_column(
    result=chat_completions(
        t.messages,
        model='Qwen/Qwen2.5-0.5B-Instruct',
        sampling_params={'max_tokens': 256, 'temperature': 0.7},
    )
)
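The messages parameter expects a list of dicts with role and content keys. As a minimal sketch of that shape, the following builds such a list and checks it before it is stored in a table column. The helper names build_messages and is_valid_messages are hypothetical, and the set of accepted roles is an assumption based on the common OpenAI chat convention.

```python
from typing import Optional

# Assumption: the usual OpenAI-style roles; a given model's chat template
# may accept a different set.
ALLOWED_ROLES = {'system', 'user', 'assistant'}


def build_messages(user_text: str, system_text: Optional[str] = None) -> list:
    """Hypothetical helper: build a chat message list in role/content form."""
    messages = []
    if system_text is not None:
        messages.append({'role': 'system', 'content': system_text})
    messages.append({'role': 'user', 'content': user_text})
    return messages


def is_valid_messages(messages) -> bool:
    """Hypothetical helper: check each message has a known role and string content."""
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        if message.get('role') not in ALLOWED_ROLES:
            return False
        if not isinstance(message.get('content'), str):
            return False
    return True


example_messages = build_messages('What is vLLM?', system_text='Answer briefly.')
```

A list built this way can be inserted into a JSON column and then referenced as t.messages in the calls shown above.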

udf  generate()

Signature
@pxt.udf
generate(
    prompt: pxt.String,
    *,
    model: pxt.String,
    engine_args: pxt.Json | None = None,
    sampling_params: pxt.Json | None = None
) -> VllmRequestOutput
Generate a text completion for a given prompt using vLLM. Uses vLLM’s high-throughput inference engine for efficient local LLM serving. Models are loaded from HuggingFace and cached for reuse across calls. For additional details, see the vLLM documentation.
Requirements:
  • pip install vllm
Parameters:
  • prompt (pxt.String): The text prompt to generate a completion for.
  • model (pxt.String): The HuggingFace model identifier (e.g., 'Qwen/Qwen2.5-0.5B-Instruct').
  • engine_args (pxt.Json | None): Additional keyword args for the vLLM LLM constructor, such as dtype, max_model_len, gpu_memory_utilization, tensor_parallel_size. For details, see the vLLM engine args documentation.
  • sampling_params (pxt.Json | None): Keyword args for vLLM SamplingParams, such as max_tokens, temperature, top_p, top_k. For details, see the vLLM sampling params documentation.
Returns:
  • VllmRequestOutput: A dict containing the vLLM RequestOutput in its native format.
Examples:
Add a computed column that generates text completions:
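from pixeltable.functions.vllm import generate

# t is an existing Pixeltable table with a `prompt` column
t.add_computed_column(
    result=generate(t.prompt, model='Qwen/Qwen2.5-0.5B-Instruct')
)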
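Both UDFs return a dict holding the vLLM RequestOutput, so downstream code usually needs to pull the generated text out of it. The sketch below assumes the dict mirrors the attribute names of vLLM's RequestOutput and CompletionOutput objects (an outputs list whose entries carry a text field). That key layout is an assumption, not something this page confirms, so inspect one stored row before relying on it. The helper name first_completion_text and the sample dict are hypothetical.

```python
from typing import Optional


def first_completion_text(result: Optional[dict]) -> Optional[str]:
    """Hypothetical helper: return the text of the first completion, or None.

    Assumes result looks like {'outputs': [{'text': ...}, ...], ...}.
    """
    if not result:
        return None
    outputs = result.get('outputs') or []
    if not outputs:
        return None
    return outputs[0].get('text')


# Hand-written stand-in for a stored result; not real model output.
sample_result = {
    'request_id': '0',
    'prompt': 'Say hello.',
    'outputs': [{'index': 0, 'text': 'Hello!', 'finish_reason': 'stop'}],
    'finished': True,
}

sample_text = first_completion_text(sample_result)
```

Returning None for a missing or empty result keeps the helper safe to apply to rows where generation has not produced any output.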
Last modified on June 12, 2026