module pixeltable.functions.vllm
Pixeltable UDFs for vLLM models. Provides integration with vLLM for high-throughput inference with large language models, supporting chat completions and text generation with HuggingFace models.

udf chat_completions()
Signature
Requirements: pip install vllm
Parameters:
messages (pxt.Json): A list of messages to generate a response for. Each message should be a dict with role and content keys, following the OpenAI chat format.
model (Any): The HuggingFace model identifier (e.g., 'Qwen/Qwen2.5-0.5B-Instruct').
engine_args (Any): Additional keyword args for the vLLM LLM constructor, such as dtype, max_model_len, gpu_memory_utilization, tensor_parallel_size. For details, see the vLLM engine args documentation.
sampling_params (Any): Keyword args for vLLM SamplingParams, such as max_tokens, temperature, top_p, top_k. For details, see the vLLM sampling params documentation.
Returns:
VllmRequestOutput: A dict containing the vLLM RequestOutput in its native format.
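
As a rough illustration of the parameters described for chat_completions(), the sketch below builds an OpenAI-style messages list plus engine_args and sampling_params dicts, then passes them to the UDF in a computed column. The helper names (build_messages, add_chat_column), the column names, and the specific values are assumptions made for illustration. Passing model, engine_args, and sampling_params as keyword arguments is also an assumption, since the signature is not reproduced here.

```python
from typing import Any

# Hypothetical settings: the keys come from the parameter descriptions,
# the values are illustrative only.
MODEL_ID = 'Qwen/Qwen2.5-0.5B-Instruct'
ENGINE_ARGS: dict[str, Any] = {'max_model_len': 2048, 'gpu_memory_utilization': 0.8}
SAMPLING_PARAMS: dict[str, Any] = {'max_tokens': 128, 'temperature': 0.7, 'top_p': 0.9}


def build_messages(
    user_content: Any, system_prompt: str = 'You are a helpful assistant.'
) -> list[dict[str, Any]]:
    """Build a message list in the OpenAI chat format (dicts with role and content keys)."""
    return [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_content},
    ]


def add_chat_column(table: Any, prompt_column: Any) -> None:
    """Add a computed column holding the chat_completions() result (hypothetical helper).

    Assumes pixeltable and vllm are installed, that `table` is a Pixeltable
    table, and that `prompt_column` is one of its string columns.
    """
    # Imported lazily so the rest of this sketch runs without vLLM installed.
    from pixeltable.functions import vllm

    table.add_computed_column(
        response=vllm.chat_completions(
            build_messages(prompt_column),
            model=MODEL_ID,
            engine_args=ENGINE_ARGS,
            sampling_params=SAMPLING_PARAMS,
        )
    )
```

With a table t that has a string column named prompt, this helper would be called as add_chat_column(t, t.prompt).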
udf generate()
Signature
Requirements: pip install vllm
Parameters:
prompt (pxt.String): The text prompt to generate a completion for.
model (pxt.String): The HuggingFace model identifier (e.g., 'Qwen/Qwen2.5-0.5B-Instruct').
engine_args (pxt.Json | None): Additional keyword args for the vLLM LLM constructor, such as dtype, max_model_len, gpu_memory_utilization, tensor_parallel_size. For details, see the vLLM engine args documentation.
sampling_params (pxt.Json | None): Keyword args for vLLM SamplingParams, such as max_tokens, temperature, top_p, top_k. For details, see the vLLM sampling params documentation.
Returns:
VllmRequestOutput: A dict containing the vLLM RequestOutput in its native format.
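
A minimal sketch of calling generate() and reading its result follows. It assumes the returned dict mirrors vLLM's RequestOutput, meaning it carries an outputs list whose entries have a text field. That layout is an assumption and is not confirmed by the description here. The helper names, the column name, and the sample dict are hypothetical.

```python
from typing import Any, Optional


def add_generate_column(table: Any, prompt_column: Any) -> None:
    """Add a computed column holding the generate() result (hypothetical helper).

    Assumes pixeltable and vllm are installed and that `prompt_column`
    is a string column of `table`.
    """
    # Imported lazily so the rest of this sketch runs without vLLM installed.
    from pixeltable.functions import vllm

    table.add_computed_column(
        completion=vllm.generate(
            prompt_column,
            model='Qwen/Qwen2.5-0.5B-Instruct',
            sampling_params={'max_tokens': 64, 'temperature': 0.0},
        )
    )


def first_completion_text(result: dict[str, Any]) -> Optional[str]:
    """Return the text of the first completion, or None if there is none.

    Relies on the assumed layout: result['outputs'][0]['text'].
    """
    outputs = result.get('outputs') or []
    if not outputs:
        return None
    return outputs[0].get('text')


# Hand-written sample shaped like the assumed layout (not real model output).
sample = {'prompt': 'The capital of France is', 'outputs': [{'text': ' Paris.'}]}
print(first_completion_text(sample))
```

Keeping the text extraction in a small helper means only one place needs to change if the actual dict layout differs from the one assumed here.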