When Pixeltable generates media files (thumbnails, extracted frames,
processed images), by default it stores them locally. For production
workflows, you can configure Pixeltable to upload these files directly
to cloud blob storage including Amazon S3, Google Cloud Storage, Azure
Blob Storage, and S3-compatible services like Cloudflare R2, Backblaze
B2, and Tigris.
Key features:
- Computed media (AI-generated outputs) automatically uploads to your
bucket
- Input media can optionally be persisted for durability
- Files are cached locally and downloaded on-demand
Configuration options:
- Global defaults in config.toml:
  [pixeltable]
  input_media_dest = "s3://my-bucket/input/"
  output_media_dest = "s3://my-bucket/output/"
- Per-column destination (computed columns only):
  t.add_computed_column(
      thumbnail=t.image.thumbnail((128, 128)),
      destination='s3://my-bucket/thumbnails/'
  )
In this notebook, you’ll learn how to configure blob storage
destinations for your media files.
What you’ll learn
- Where Pixeltable stores files by default
- How to specify destinations for individual columns
- How to configure global destinations for all columns
- How destination precedence works
How it works
Pixeltable decides where to store media files using this priority:
- Column destination (highest priority): the destination parameter in add_computed_column()
- Global configuration: input_media_dest / output_media_dest in the config file
- Pixeltable’s default local storage: used if nothing else is configured
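The priority order above behaves like a fallback chain. The function below is a hypothetical model of that documented order, not Pixeltable's internal code; DEFAULT_LOCAL_MEDIA simply stands in for ~/.pixeltable/media. For a computed column, the global value would be output_media_dest; for inserted media, which has no per-column destination, only the last two steps apply.

```python
from pathlib import Path
from typing import Optional

# Stand-in for Pixeltable's default local media directory (~/.pixeltable/media)
DEFAULT_LOCAL_MEDIA = str(Path.home() / ".pixeltable" / "media")


def resolve_destination(
    column_dest: Optional[str] = None,
    global_dest: Optional[str] = None,
    default: str = DEFAULT_LOCAL_MEDIA,
) -> str:
    """Hypothetical model of the precedence: column destination > global config > default."""
    if column_dest is not None:
        return column_dest  # destination= on add_computed_column() wins
    if global_dest is not None:
        return global_dest  # input_media_dest / output_media_dest from the config
    return default  # nothing configured: local storage


print(resolve_destination("s3://my-bucket/thumbnails/", "s3://my-bucket/output/"))
print(resolve_destination(None, "s3://my-bucket/output/"))
print(resolve_destination())
```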
Prerequisites
For this notebook, you’ll need:
- pixeltable and boto3 installed
- (Optional) Cloud storage credentials if you want to use a cloud
provider
%pip install -qU pixeltable boto3
Setup
Let’s set up our demo environment. We’ll create a Pixeltable directory
for this demo, set up local destination paths, create a table, and
insert a sample image.
You can substitute cloud storage URIs (like s3://my-bucket/path/)
anywhere you see a local destination path.
import pixeltable as pxt
from pathlib import Path
# Clean slate for this demo
pxt.drop_dir('blob_storage_demo', force=True)
pxt.create_dir('blob_storage_demo')
Now we’ll create a table with an image column and insert a sample image
from the web.
# Create table
t = pxt.create_table(
    'blob_storage_demo/media',
    {'source_image': pxt.Image},
    if_exists='replace',
)
Created table ‘media’.
You can inspect the table’s schema at any point with t.describe().
Next, let’s insert a single sample image.
sample_image = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'
t.insert(source_image=sample_image)
Inserted 1 row with 0 errors in 0.77 s (1.29 rows/s)
1 row inserted.
We can view the image in our table with t.collect().
Default destinations
By default, Pixeltable stores all media files in local storage under
~/.pixeltable/media:
- Input files (files you insert) — If you insert a URL, Pixeltable
stores the URL and downloads it to cache on access. If you insert a
local file path, Pixeltable just stores the path reference (the file
stays where it is).
- Output files (files Pixeltable generates) — Stored in
~/.pixeltable/media
This works out of the box with no configuration. You can change these
defaults, which we’ll cover in the rest of this notebook.
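By default, the handling of inserted files depends only on whether you pass a URL or a local path. The sketch below restates that rule; the function name and the scheme check are illustrative assumptions, and it only recognizes http(s) URLs.

```python
from urllib.parse import urlparse


def default_input_handling(value: str) -> str:
    """Illustrative summary of the default behavior (no input_media_dest configured)."""
    scheme = urlparse(value).scheme.lower()
    if scheme in ("http", "https"):
        # URL: the reference is stored; the file is downloaded to the cache on access
        return "url-reference"
    # Anything else is treated here as a local path: the reference is stored, the file stays put
    return "path-reference"


print(default_input_handling("https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg"))
print(default_input_handling("/home/me/photos/cat.jpg"))
```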
Let’s check where the source image is stored. Since we inserted a URL
(not a local file), Pixeltable stores the URL reference and will
download it to cache when we access it.
# Let's see where the source_image is stored by default
t.select(t.source_image.fileurl).collect()
Now let’s add a computed column without specifying a destination. This
will show us where Pixeltable stores output files by default.
# Add computed column with no destination specified - uses default
t.add_computed_column(
    flipped=t.source_image.transpose(0), if_exists='replace'
)
Added 1 column value with 0 errors in 0.02 s (45.44 rows/s)
1 row updated.
Check the file URL: it points to a location under ~/.pixeltable/media,
the default location for generated files.
t.select(t.flipped, t.flipped.fileurl).collect()
Per-column destinations
When you create a computed column, you can specify exactly where to
store its generated files using the destination= parameter. This gives
you fine-grained control over where outputs live, which matters most for
outputs that are costly or difficult to regenerate.
We’ll create a destination directory for storing one of our processed
images. For this demo, we’re using a local directory on your Desktop,
but you can replace this path with a cloud storage URI (like
s3://my-bucket/rotated/).
# Create a local destination directory
# For S3: dest_rotated = "s3://my-bucket/rotated/"
# For GCS: dest_rotated = "gs://my-bucket/rotated/"
base_path = Path.home() / 'Desktop' / 'pixeltable_outputs'
base_path.mkdir(parents=True, exist_ok=True)
dest_rotated = str(base_path / 'rotated')
# Create directory (only needed for local paths)
Path(dest_rotated).mkdir(exist_ok=True)
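The mkdir call above is only needed for local paths; object storage has no directories to create in advance. If you switch between local and cloud destinations, a small helper can make that distinction. This is a hypothetical sketch: it treats any string with a multi-letter URI scheme (s3://, gs://, https://) as remote, and it does not handle file:// URLs.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def is_remote_uri(dest: str) -> bool:
    """True for URIs like s3://bucket/path/; False for plain filesystem paths."""
    scheme = urlparse(dest).scheme
    # A one-letter "scheme" is a Windows drive letter (C:\...), i.e. a local path
    return len(scheme) > 1


def ensure_local_destination(dest: str) -> bool:
    """Create the directory for a local destination and return True; skip remote URIs (False)."""
    if is_remote_uri(dest):
        return False
    Path(dest).mkdir(parents=True, exist_ok=True)
    return True


with tempfile.TemporaryDirectory() as tmp:
    local_dest = str(Path(tmp) / "pixeltable_outputs" / "rotated")
    print(ensure_local_destination(local_dest), Path(local_dest).is_dir())

print(ensure_local_destination("s3://my-bucket/rotated/"))
```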
Now let’s add a computed column with an explicit destination to see
the difference from the default behavior.
# Add column WITH explicit destination
t.add_computed_column(
    rotated=t.source_image.rotate(90),
    destination=dest_rotated,
    if_exists='replace',
)
Added 1 column value with 0 errors in 0.02 s (48.98 rows/s)
1 row updated.
Compare the file URLs. The rotated image uses our explicit
destination, while flipped (created earlier) uses the default
~/.pixeltable/media location.
t.select(t.rotated, t.rotated.fileurl).collect()
t.select(t.flipped, t.flipped.fileurl).collect()
Changing global destinations
Instead of setting destination= on every column, you can change the
global default for ALL columns.
You can configure two types of global destinations:
- output_media_dest: changes the default for files Pixeltable
  generates (computed columns)
- input_media_dest: changes the default for files you insert
  into tables
You can set them to the same bucket or different buckets depending on
your needs.
You have two options:
Option 1: Configuration file (~/.pixeltable/config.toml)
[pixeltable]
# Where files Pixeltable generates are stored
output_media_dest = "s3://my-bucket/output/"
# Where files you insert are stored
input_media_dest = "s3://my-bucket/input/"
Option 2: Environment variables
export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/output/"
export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/input/"
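Both options name the same two settings: each environment variable is the config key upper-cased with a PIXELTABLE_ prefix. The sketch below looks up a global destination from either source. It assumes the environment variable takes precedence over the config file when both are set; check the configuration documentation for Pixeltable's actual rule. The function and its environ parameter are hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Optional


def global_media_dest(
    kind: str,
    config: Mapping,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Look up the global 'input' or 'output' media destination.

    ASSUMPTION: a non-empty environment variable overrides the config-file value.
    """
    key = f"{kind}_media_dest"  # e.g. output_media_dest
    env_value = environ.get(f"PIXELTABLE_{key.upper()}")  # e.g. PIXELTABLE_OUTPUT_MEDIA_DEST
    if env_value:
        return env_value
    return config.get("pixeltable", {}).get(key)


config = {"pixeltable": {"output_media_dest": "s3://my-bucket/output/"}}
print(global_media_dest("output", config, environ={}))
print(global_media_dest("output", config, environ={"PIXELTABLE_OUTPUT_MEDIA_DEST": "s3://other-bucket/output/"}))
print(global_media_dest("input", config, environ={}))
```

Passing environ explicitly keeps the example from depending on variables already set in your shell.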
For complete authentication and setup details, see the Cloud Storage
documentation.
Overriding global destinations
Even if you configure global destinations, you can still override them
for specific columns using the destination= parameter in
add_computed_column().
Let’s create a new destination directory and add a thumbnail column that
uses it.
# Create a different destination for thumbnails
dest_thumbnails = str(base_path / 'thumbnails')
Path(dest_thumbnails).mkdir(exist_ok=True)
# Add column with explicit destination (overrides any global default)
t.add_computed_column(
    thumbnail=t.source_image.thumbnail((128, 128)),
    destination=dest_thumbnails,
    if_exists='replace',
)
Added 1 column value with 0 errors in 0.02 s (47.89 rows/s)
1 row updated.
Let’s view the thumbnail and its file URL. The explicit destination=
parameter always wins, regardless of global configuration.
t.select(t.thumbnail, t.thumbnail.fileurl).collect()
Getting URLs for your files
When your files are in blob storage, you can get URLs that point
directly to them. The .fileurl property returns the stored location of
each file. HTTP(S) URLs can be used directly in HTML, APIs, or any
application that serves media; storage URIs such as s3://... in private
buckets need the presigned URLs described in the next section.
t.select(
    source=t.source_image.fileurl,
    rotated=t.rotated.fileurl,
    flipped=t.flipped.fileurl,
).collect()
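For local destinations, you may want a filesystem path rather than a URL (for example, to open the file with another tool). The sketch below assumes that locally stored media is reported as a file:// URL and converts it with the standard library; it returns None for anything else, such as s3:// URIs. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


def fileurl_to_local_path(url: str) -> Optional[Path]:
    """Convert a file:// URL to a local Path; return None for other schemes (s3://, https://, ...)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        return None
    # url2pathname decodes percent-escapes such as %20
    return Path(url2pathname(parsed.path))


print(fileurl_to_local_path("file:///tmp/pixeltable_outputs/rotated/my%20image.jpg"))
print(fileurl_to_local_path("s3://my-bucket/rotated/image.jpg"))
```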
Generating presigned URLs
Note: This section only applies if you’re using cloud storage (S3,
GCS, Azure, R2, B2, Tigris). If you’re following along with local
destinations (as in the examples above), you can skip this section or
configure cloud storage to try it out.
When your files are in cloud storage, the .fileurl property returns
storage URIs like s3://bucket/path/file.jpg. These aren’t directly
accessible over HTTP.
For private buckets or when you need time-limited HTTP access, use
presigned URLs. These are temporary, authenticated URLs that allow
anyone to access your files for a limited time without needing
credentials.
Presigned URLs are particularly useful for:
- Sharing files from private buckets without making them public
- Creating temporary download links with expiration
- Serving media in web applications without exposing credentials
- Providing time-limited access to sensitive content
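The example below writes a small thumbnail to a Backblaze B2 bucket
through its S3-compatible HTTPS endpoint, then uses the presigned_url
function from pixeltable.functions.net to generate links. Replace the
region and bucket name with your own, and make sure your credentials are
configured.
# Use the HTTPS URL format for Backblaze B2 (replace with your own region and bucket)
b2_region = 'us-east-005'
b2_bucket = 'pixeltable'
cloud_destination = (
    f'https://s3.{b2_region}.backblazeb2.com/{b2_bucket}/presigned-demo/'
)

# Add the computed column, stored in the B2 bucket
t.add_computed_column(
    cloud_thumbnail=t.source_image.thumbnail((64, 64)),
    destination=cloud_destination,
    if_exists='replace',
)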
Added 1 column value with 0 errors in 0.22 s (4.46 rows/s)
1 row updated.
# Now generate presigned URLs for the cloud-stored files
from pixeltable.functions import net

t.select(
    cloud_thumbnail=t.cloud_thumbnail,
    storage_url=t.cloud_thumbnail.fileurl,
    presigned_url=net.presigned_url(t.cloud_thumbnail.fileurl, 3600),  # 1-hour expiration
).collect()
The presigned URLs in the output are fully authenticated HTTP/HTTPS URLs
that can be accessed directly in a browser or used in APIs without any
credentials.
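To see why a presigned URL can be shared without credentials yet stops working after it expires, here is a toy signer built on HMAC-SHA256 from Python's standard library. It is only a conceptual illustration: real providers use their own signing schemes (S3, for example, uses AWS Signature Version 4), and this is not how net.presigned_url() works internally. SECRET_KEY, the example URL, and the helper names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical key known only to the storage service


def _signature(base_url: str, expires: int) -> str:
    message = f"{base_url}|{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, expires_in: int, now: Optional[float] = None) -> str:
    """Append an expiry timestamp and signature to a URL that has no query string."""
    expires = int((time.time() if now is None else now) + expires_in)
    query = urlencode({"expires": expires, "sig": _signature(base_url, expires)})
    return f"{base_url}?{query}"


def verify_url(signed_url: str, now: Optional[float] = None) -> bool:
    """What the service checks: the signature matches and the link has not expired."""
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    if "expires" not in params or "sig" not in params:
        return False
    expires = int(params["expires"][0])
    base_url = parsed._replace(query="").geturl()
    current = time.time() if now is None else now
    signature_ok = hmac.compare_digest(_signature(base_url, expires), params["sig"][0])
    return signature_ok and current < expires


url = sign_url("https://storage.example.com/my-bucket/thumb.jpg", 3600, now=0)
print(url)
print(verify_url(url, now=0), verify_url(url, now=3601))  # True False
```

Anyone holding the URL can use it, but changing the path or the expiry invalidates the signature, and the service rejects it once the timestamp has passed.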
Common expiration times
Note: Different storage providers have different maximum expiration
limits. For example, Google Cloud Storage has a maximum 7-day expiration
for presigned URLs.
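The expiration argument is given in seconds (the example above passes 3600 for one hour). Converting from a timedelta avoids arithmetic slips. The sketch below also refuses values above a provider limit; it defaults to the 7-day Google Cloud Storage maximum mentioned in the note, so substitute your own provider's limit. The names are hypothetical.

```python
from datetime import timedelta

# A few common presigned-URL lifetimes
EXPIRATIONS = {
    "15 minutes": timedelta(minutes=15),
    "1 hour": timedelta(hours=1),
    "1 day": timedelta(days=1),
    "7 days": timedelta(days=7),
}

GCS_MAX_EXPIRATION = timedelta(days=7)  # maximum cited in the note above


def expiration_seconds(duration: timedelta, max_duration: timedelta = GCS_MAX_EXPIRATION) -> int:
    """Convert a lifetime to whole seconds, rejecting values above the provider limit."""
    if duration > max_duration:
        raise ValueError(f"{duration} exceeds the provider maximum of {max_duration}")
    return int(duration.total_seconds())


for label, duration in EXPIRATIONS.items():
    print(f"{label}: {expiration_seconds(duration)} s")  # 900, 3600, 86400, 604800
```

The result can be passed as the second argument, for example net.presigned_url(t.cloud_thumbnail.fileurl, expiration_seconds(timedelta(days=1))).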
Troubleshooting presigned URLs
If presigned_url() isn’t working:
- Local files: Presigned URLs only work with cloud storage (S3,
  GCS, Azure, R2, B2, Tigris). If your files are stored locally
  (the default), you’ll get an error. Configure a cloud destination first.
- Already HTTP URLs: If .fileurl returns an http:// or https:// URL
  (not a storage URI like s3://), the file may already be publicly
  accessible and not need a presigned URL. An https:// URL into a
  private bucket (for example, through an S3-compatible endpoint as in
  the Backblaze B2 example above) still needs one.
- Credentials: Ensure your cloud storage credentials are properly
  configured. See the Cloud Storage documentation for provider-specific
  setup.
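The first two checks above depend only on what .fileurl returns, so they can be turned into a quick triage by URL scheme. This is a rough, hypothetical helper: CLOUD_SCHEMES is an illustrative subset (extend it for your provider), and because it looks at the scheme alone, it reports "http" even for an https:// URL into a private bucket that still needs presigning.

```python
from urllib.parse import urlparse

CLOUD_SCHEMES = {"s3", "gs"}  # illustrative subset of storage-URI schemes


def presign_triage(fileurl: str) -> str:
    """Classify a .fileurl value following the troubleshooting checklist (scheme-based only)."""
    scheme = urlparse(fileurl).scheme.lower()
    if scheme in ("", "file") or len(scheme) == 1:  # one letter = Windows drive letter
        return "local"  # presigning fails: configure a cloud destination first
    if scheme in ("http", "https"):
        return "http"  # may already be reachable over HTTP(S)
    if scheme in CLOUD_SCHEMES:
        return "cloud"  # presign it; if that fails, check credentials
    return "unknown"


for url in ("file:///home/me/.pixeltable/media/x.jpg", "s3://my-bucket/x.jpg", "https://example.com/x.jpg"):
    print(url, "->", presign_triage(url))
```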
Common patterns
Here are a few real-world patterns you might use:
Pattern 1: Same bucket for everything
If you want everything in the same bucket, configure both input and
output destinations in ~/.pixeltable/config.toml:
[pixeltable]
input_media_dest = "s3://my-bucket/media/"
output_media_dest = "s3://my-bucket/media/"
Or set environment variables:
export PIXELTABLE_INPUT_MEDIA_DEST="s3://my-bucket/media/"
export PIXELTABLE_OUTPUT_MEDIA_DEST="s3://my-bucket/media/"
Pattern 2: Separate input and output locations
Keep source files separate from processed files in
~/.pixeltable/config.toml:
[pixeltable]
input_media_dest = "s3://my-bucket/uploads/"
output_media_dest = "s3://my-bucket/processed/"
Pattern 3: Override for specific columns
Use a global default, but send some columns elsewhere. First, set a
global default in your config:
[pixeltable]
output_media_dest = "s3://my-bucket/processed/"
Then in your code, most columns use the global default, but you can
override specific ones:
# Uses global default (s3://my-bucket/processed/)
t.add_computed_column(
    thumbnail=t.image.thumbnail((128, 128))
)

# Overrides global default - goes to a different location
t.add_computed_column(
    large_thumbnail=t.image.thumbnail((512, 512)),
    destination='s3://my-bucket/thumbnails/'
)
Where do my files go?
Understanding how Pixeltable handles different types of input files
helps you make better decisions about storage configuration.
When you configure a cloud destination, Pixeltable populates both the
destination and the local cache during insert(). For URLs, this means
downloading the file once and using that single download for both the
upload and the cache, which avoids a wasteful upload→download cycle.
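The download-once idea can be sketched with local directories standing in for the bucket and the cache. This is a conceptual model only, assuming a fetch callable that writes the remote file to a given path; it is not Pixeltable's insert() implementation, and all names are hypothetical. The point is that the network fetch happens once and both copies come from that single download.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def ingest_once(fetch: Callable[[Path], None], dest_dir: Path, cache_dir: Path, name: str) -> None:
    """Fetch a remote file once, then fill both the destination and the cache from that copy."""
    with tempfile.TemporaryDirectory() as tmp:
        downloaded = Path(tmp) / name
        fetch(downloaded)  # the only network round trip
        shutil.copy2(downloaded, dest_dir / name)  # stands in for the upload to the bucket
        shutil.copy2(downloaded, cache_dir / name)  # cache is filled without re-downloading


fetch_calls = []


def fake_fetch(target: Path) -> None:
    """Pretend download: record the call and write some bytes."""
    fetch_calls.append(target.name)
    target.write_bytes(b"image bytes")


with tempfile.TemporaryDirectory() as root:
    bucket, cache = Path(root) / "bucket", Path(root) / "cache"
    bucket.mkdir()
    cache.mkdir()
    ingest_once(fake_fetch, bucket, cache, "000000000036.jpg")
    copies_match = (bucket / "000000000036.jpg").read_bytes() == (cache / "000000000036.jpg").read_bytes()

print(len(fetch_calls), copies_match)  # 1 True
```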
What you learned
- Pixeltable uses local storage by default for all media files
- You can override the default for specific computed columns with the
  destination parameter
- You can change the global defaults with input_media_dest and
  output_media_dest
- Precedence: column destination > global config > Pixeltable’s
  default local storage
- Use .fileurl to get URLs for your stored files
- Use net.presigned_url() to generate time-limited, authenticated
  HTTP URLs for cloud storage files
- With a cloud destination, Pixeltable downloads each remote input once
  and uses that copy for both the upload and the local cache