Problem
You have media files stored in cloud storage (S3, GCS) or accessible via HTTP URLs. You need to process these files with AI models without downloading them all upfront.
Solution
What's in this recipe:
- Reference media files by URL (S3, HTTP, local paths)
- Automatic caching of remote files on access
- Process files lazily without bulk downloads
Setup
Setup connects to the local Pixeltable database and creates a directory named `cloud_demo` to hold this recipe's tables.
Load images from HTTP URLs
Reference images by URL, and Pixeltable downloads them on demand. In the example run, a table named `images` is created and 3 rows are inserted with 0 errors.
Load videos from S3
Reference videos in S3 buckets (this example uses the public Multimedia Commons bucket). In the example run, a table named `videos` is created and 2 rows are inserted with 0 errors.
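Media references can use several URL schemes. As a rough sketch in plain Python (not Pixeltable's own code), the hypothetical helpers `classify_media_ref` and `split_bucket_key` show how an `s3://` or `gs://` reference can be told apart from HTTP URLs and local paths, and how a bucket URI splits into a bucket name and an object key. The scheme-to-category mapping is an assumption based on common conventions.

```python
from urllib.parse import urlparse


def classify_media_ref(ref: str) -> str:
    """Hypothetical helper: label a media reference by where the file lives."""
    scheme = urlparse(ref).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme == "gs":
        return "gcs"
    if scheme in ("http", "https"):
        return "http"
    # No scheme (a bare POSIX path) or file:// means the file is local.
    if scheme in ("", "file"):
        return "local"
    raise ValueError(f"unsupported scheme: {scheme!r}")


def split_bucket_key(ref: str) -> tuple[str, str]:
    """Hypothetical helper: split an s3:// or gs:// URI into (bucket, object key)."""
    parsed = urlparse(ref)
    if parsed.scheme not in ("s3", "gs"):
        raise ValueError("not an object-storage URI")
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, `split_bucket_key("s3://example-bucket/videos/clip.mp4")` returns `("example-bucket", "videos/clip.mp4")`; the bucket name here is a placeholder.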
Add computed columns on remote media
Process remote media with computed columns; each file is fetched automatically when a column value is computed for its row. In the example run, the computed columns are added to all 3 rows with 0 errors.
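The sketch below models why adding a computed column does not require a bulk download up front: rows hold only URLs, and a file is fetched only when a column value is computed for its row. `RemoteMediaTable`, `compute_column`, and `fake_fetch` are hypothetical stand-ins, not Pixeltable classes or functions.

```python
from typing import Callable


class RemoteMediaTable:
    """Hypothetical sketch: rows store URLs; bytes are fetched only when a column needs them."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self.fetch = fetch  # stand-in for a real downloader (assumption)
        self.urls: list[str] = []
        self.columns: dict[str, list[object]] = {}

    def insert(self, urls: list[str]) -> None:
        # Inserting only records the references; nothing is downloaded here.
        self.urls.extend(urls)

    def compute_column(self, name: str, fn: Callable[[bytes], object]) -> None:
        # Each file is fetched when its row is processed, then fn runs on its bytes.
        self.columns[name] = [fn(self.fetch(url)) for url in self.urls]


downloads: list[str] = []


def fake_fetch(url: str) -> bytes:
    # Records each fetch and returns the URL's own bytes as fake file content.
    downloads.append(url)
    return url.encode()


table = RemoteMediaTable(fake_fetch)
table.insert(["https://example.com/a.jpg", "https://example.com/b.jpg"])
fetches_after_insert = len(downloads)  # 0: inserting downloaded nothing
table.compute_column("size", len)      # fetches each file once
```

Inserting the two URLs triggers no fetches; computing the `size` column fetches each file once. In Pixeltable, the file cache described under Explanation additionally avoids downloading the same URL again on later accesses.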
Generate presigned URLs for serving media
When you store media in private cloud storage, you need presigned URLs to serve files over HTTP. The `presigned_url` function converts storage URIs to time-limited, publicly accessible URLs.
In the example run, the presigned URL column is added to 2 rows with 0 errors.
Use cases for presigned URLs:
- Serve private media in web applications without exposing credentials
- Generate download links for end users
- Integrate with CDNs or video players that require HTTP URLs
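Real presigned URLs use each provider's own signing scheme (for example, AWS Signature Version 4 for S3), so the code below is not how `presigned_url` works internally. It is a simplified sketch that signs the URL and an expiry timestamp with HMAC-SHA256, to show why a presigned URL stops working after it expires and cannot be altered without the signing secret. `SECRET`, the `expires` and `signature` query parameters, and all three functions are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key known only to the server


def _signature(base_url: str, expires_at: int, secret: bytes) -> str:
    # HMAC-SHA256 over the URL and its expiry time.
    message = f"{base_url}|{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry timestamp and signature to a URL that has no query string."""
    sig = _signature(base_url, expires_at, secret)
    return f"{base_url}?{urlencode({'expires': expires_at, 'signature': sig})}"


def verify_url(signed_url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept the URL only if it has not expired and its signature matches."""
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    if "expires" not in params or "signature" not in params:
        return False
    expires_at = int(params["expires"][0])
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = _signature(base_url, expires_at, secret)
    # compare_digest avoids leaking information through comparison timing.
    return now <= expires_at and hmac.compare_digest(params["signature"][0], expected)
```

Because the expiry time is part of the signed message, a client cannot extend a link by editing the `expires` parameter; the server only needs the secret and the current time to check it.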
Supported URL formats
Pixeltable supports multiple URL schemes for media files, including S3, GCS, HTTP URLs, and local paths. For cloud storage URLs, configure AWS or GCP credentials via environment variables or config files.
Explanation
How caching works:
- URLs are stored as references in the table
- Files are downloaded on first access (query or computed column)
- Downloaded files are cached in `~/.pixeltable/file_cache/`
- Cache uses LRU eviction when space is needed
Benefits:
- Lazy loading - Only download files when needed
- Deduplication - Same URL is cached once
- Incremental processing - Add files without bulk downloads
- Cloud-native - Works directly with object storage
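As a simplified, in-memory model of the caching behavior described above (Pixeltable's real cache keeps files on disk under `~/.pixeltable/file_cache/`, and its exact eviction details may differ), the hypothetical `UrlFileCache` below keys entries by URL so each URL is downloaded once, and evicts the least recently used entries when a byte budget is exceeded.

```python
from collections import OrderedDict
from typing import Callable


class UrlFileCache:
    """Hypothetical LRU cache keyed by URL and bounded by total bytes."""

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch  # stand-in downloader (assumption)
        self.entries: OrderedDict[str, bytes] = OrderedDict()
        self.size_bytes = 0
        self.downloads = 0

    def get(self, url: str) -> bytes:
        if url in self.entries:
            # Cache hit: mark as most recently used, no download.
            self.entries.move_to_end(url)
            return self.entries[url]
        # First access: download once and remember the bytes.
        data = self.fetch(url)
        self.downloads += 1
        self.entries[url] = data
        self.size_bytes += len(data)
        # Evict least recently used entries until within budget (always keep the newest).
        while self.size_bytes > self.capacity_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.size_bytes -= len(evicted)
        return data
```

With a 25-byte budget and 10-byte files, requesting a third distinct URL evicts whichever of the first two was used least recently; requesting an evicted URL again triggers a fresh download.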
AWS credentials can be provided through:
- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- AWS credentials file (`~/.aws/credentials`)
- IAM roles (when running on EC2/ECS)
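These sources follow the relative order of the standard AWS SDK credential chain, where environment variables win over the credentials file, which wins over an instance or task IAM role. A minimal sketch of that lookup order, assuming only these three sources (the real chain used by boto3 checks more, such as the AWS config file and container credentials), might look like this; `resolve_aws_credentials_source` is a hypothetical helper that only reports which source would apply.

```python
import configparser
import os
from collections.abc import Mapping
from pathlib import Path


def resolve_aws_credentials_source(
    env: Mapping[str, str] = os.environ,
    credentials_file: Path = Path.home() / ".aws" / "credentials",
    profile: str = "default",
) -> str:
    """Hypothetical: report which source a simplified credential chain would use."""
    # 1. Environment variables take precedence when both keys are set.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    # 2. Next, the shared credentials file, if the profile defines an access key.
    if credentials_file.is_file():
        parser = configparser.ConfigParser()
        parser.read(credentials_file)
        if parser.has_option(profile, "aws_access_key_id"):
            return "credentials_file"
    # 3. Otherwise fall back to an IAM role from the EC2/ECS metadata endpoint.
    return "iam_role"
```

Passing `env` and `credentials_file` explicitly makes the order easy to check without touching real credentials.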
See also
- Upload to S3 - Store generated media in S3/GCS
- Import from CSV - Load structured data
- Extract frames from videos - Process video files
- Analyze images in batch - AI vision on images
- Configure API keys - Set up credentials