llama.cpp
This tutorial uses Pixeltable’s llama.cpp integration to run local LLMs efficiently.
Important notes
- Models are automatically downloaded from Hugging Face and cached locally
- Different quantization levels are available for performance/quality tradeoffs
- Consider memory usage when choosing models and quantizations
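As a rough aid for the memory note above, the following sketch estimates the size of the weights alone from the parameter count and a nominal bits-per-weight figure. The helper name `approx_weight_bytes` is hypothetical, and the estimate is deliberately crude: real quantized files mix bit widths across tensors, and runtime memory also includes the context cache and other overhead.

```python
def approx_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in bytes.

    Ignores per-block quantization metadata, the KV cache, and runtime
    overhead, so treat the result as a lower bound.
    """
    return n_params * bits_per_weight / 8


# Compare a nominal 5-bit quantization against 16-bit weights
# for a 0.5-billion-parameter model.
for bits in (5, 16):
    size_mb = approx_weight_bytes(0.5e9, bits) / 1e6
    print(f"{bits:>2} bits per weight: about {size_mb:.1f} MB")
```

Under these assumptions a 0.5-billion-parameter model needs about 312.5 MB of weights at 5 bits and about 1000 MB at 16 bits, which is the kind of gap to weigh against output quality.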
Set up environment
First, let’s install Pixeltable with llama.cpp support.
Create a table for chat completions
Now let’s create a table that will contain our inputs and responses.
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory ‘llama_demo’.
Created table ‘chat’.
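The setup and table creation that produced the output above might look like the following sketch. The column name `input` is an assumption (the original code cell is not shown here), and so are the package names in the comment. Pixeltable is imported inside the function so that the snippet can be loaded without the library installed.

```python
# Assumed install step (package names are an assumption):
#   pip install -qU pixeltable llama-cpp-python huggingface-hub

DIR_NAME = 'llama_demo'
TABLE_PATH = f'{DIR_NAME}.chat'


def create_chat_table():
    """Create a fresh 'llama_demo' directory and a 'chat' table in it."""
    import pixeltable as pxt  # lazy import: only needed when this runs

    # Start from a clean slate so the sketch can be rerun.
    pxt.drop_dir(DIR_NAME, force=True)
    pxt.create_dir(DIR_NAME)

    # One string column for the user input (column name is an assumption).
    return pxt.create_table(TABLE_PATH, {'input': pxt.String})
```

Calling `t = create_chat_table()` in a session with Pixeltable installed should yield a table handle that the later steps can add columns to.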
Next, we add a computed column that calls the Pixeltable
create_chat_completion UDF, which adapts the corresponding llama.cpp
API call. In our examples, we’ll use pretrained models from the Hugging
Face repository. llama.cpp makes it easy to do this by specifying a
repo_id (from the URL of the model) and filename from the model repo;
the model will then be downloaded and cached automatically.
(If this is your first time using Pixeltable, the Pixeltable
Fundamentals tutorial contains more details about table creation,
computed columns, and UDFs.)
For this demo we’ll use Qwen2.5-0.5B, a very small (0.5-billion
parameter) model that still produces decent results. We’ll use Q5_K_M
(5-bit) quantization, which gives an excellent balance of quality and
efficiency.
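The computed columns described above can be sketched roughly as follows. The keyword names `repo_id` and `repo_filename`, the repository `Qwen/Qwen2.5-0.5B-Instruct-GGUF`, the filename pattern, the system prompt, and the column names `input`, `result`, and `output` are all assumptions rather than values confirmed by this page, so check them against the Pixeltable `llama_cpp` reference before relying on them.

```python
# Assumed Hugging Face repo and filename pattern for the Q5_K_M build.
QWEN_REPO_ID = 'Qwen/Qwen2.5-0.5B-Instruct-GGUF'
QWEN_FILENAME = '*q5_k_m.gguf'


def build_messages(user_content, system_prompt='You are a helpful assistant.'):
    """Build a chat message list; user_content may be a string or a column."""
    return [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_content},
    ]


def add_qwen_columns(t):
    """Add a raw-response column and a column with just the reply text."""
    from pixeltable.functions import llama_cpp  # lazy import

    # Full JSON response from the model for each row's input.
    t.add_computed_column(result=llama_cpp.create_chat_completion(
        build_messages(t.input),
        repo_id=QWEN_REPO_ID,
        repo_filename=QWEN_FILENAME,
    ))
    # Extract the assistant's text from the response structure.
    t.add_computed_column(output=t.result.choices[0].message.content)
```

Because the table is still empty at this point, adding the columns computes nothing yet, which matches the "Added 0 column values" output.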
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s
No rows affected.
Test chat completion
Let’s try a simple query:
Inserted 3 rows with 0 errors in 6.74 s (0.44 rows/s)
3 rows inserted.
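A query step like the one above might be sketched as an insert followed by a select. The three questions are hypothetical placeholders (the original inputs are not shown here), and the column names `input` and `output` are assumptions carried over from the earlier steps.

```python
# Hypothetical sample inputs; the tutorial's actual questions are not shown.
SAMPLE_ROWS = [
    {'input': 'What is the capital of France?'},
    {'input': 'Name three primary colors.'},
    {'input': 'What is 2 + 2?'},
]


def run_queries(t):
    """Insert the sample rows and return each input with its response."""
    # Inserting rows triggers evaluation of the computed columns.
    t.insert(SAMPLE_ROWS)
    return t.select(t.input, t.output).collect()
```

Inserting is what triggers the model calls, so most of the elapsed time reported for the insert is inference.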
Comparing models
Local model frameworks like llama.cpp make it easy to compare the
output of different models. Let’s try comparing the output from Qwen
against a somewhat larger model, Llama-3.2-1B. As always, when we add
a new computed column to our table, it’s automatically evaluated against
the existing table rows.
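Adding the second model can be sketched as two more computed columns on the same table. The repository `bartowski/Llama-3.2-1B-Instruct-GGUF`, the filename pattern, and the column names `result_l3` and `output_l3` are assumptions, as are the `repo_id` and `repo_filename` keyword names; the sketch also assumes the table already has `input` and `output` columns.

```python
# Assumed Hugging Face repo and filename pattern for Llama-3.2-1B.
LLAMA_REPO_ID = 'bartowski/Llama-3.2-1B-Instruct-GGUF'
LLAMA_FILENAME = '*Q5_K_M.gguf'


def add_llama_columns(t):
    """Add Llama-3.2-1B responses next to the existing Qwen responses."""
    from pixeltable.functions import llama_cpp  # lazy import

    messages = [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': t.input},
    ]
    # Evaluated immediately for every row already in the table.
    t.add_computed_column(result_l3=llama_cpp.create_chat_completion(
        messages,
        repo_id=LLAMA_REPO_ID,
        repo_filename=LLAMA_FILENAME,
    ))
    t.add_computed_column(output_l3=t.result_l3.choices[0].message.content)

    # Side-by-side view of both models' answers.
    return t.select(t.input, t.output, t.output_l3).collect()
```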
Added 3 column values with 0 errors in 6.32 s (0.47 rows/s)
Added 3 column values with 0 errors in 0.03 s (113.79 rows/s)
Just for fun, let’s try running against a different system prompt with a
different persona.
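Swapping the persona only requires changing the system message. In the sketch below, the persona text, the choice of model, and the column names `result_persona` and `output_persona` are hypothetical; the `repo_id` and `repo_filename` keyword names are assumptions as well.

```python
# Hypothetical persona; the tutorial's actual system prompt is not shown.
PERSONA_PROMPT = 'You are a ship captain who answers in nautical terms.'


def persona_messages(user_content, persona=PERSONA_PROMPT):
    """Same message layout as before, with a different system prompt."""
    return [
        {'role': 'system', 'content': persona},
        {'role': 'user', 'content': user_content},
    ]


def add_persona_columns(t, repo_id, repo_filename):
    """Add response columns that use the persona system prompt."""
    from pixeltable.functions import llama_cpp  # lazy import

    t.add_computed_column(result_persona=llama_cpp.create_chat_completion(
        persona_messages(t.input),
        repo_id=repo_id,
        repo_filename=repo_filename,
    ))
    t.add_computed_column(
        output_persona=t.result_persona.choices[0].message.content
    )
```

Keeping the persona in its own pair of columns leaves the earlier outputs in place, so all variants can be compared row by row.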
Added 3 column values with 0 errors in 7.70 s (0.39 rows/s)
Added 3 column values with 0 errors in 0.02 s (143.54 rows/s)