Learn how to publish datasets to Pixeltable Cloud and replicate datasets from the cloud to your local environment.

Overview

Pixeltable Cloud enables you to:
  • Publish your datasets for sharing with teams or the public
  • Replicate datasets from the cloud to your local environment
  • Share multimodal AI datasets (images, videos, audio, documents) without managing infrastructure
This guide demonstrates both publishing and replicating datasets.

Setup

Data sharing functionality requires Pixeltable version 0.4.24 or later.
%pip install -qU 'pixeltable>=0.4.24'
import pixeltable as pxt
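Because data sharing depends on the installed version, a quick check before continuing can save a confusing error later. The sketch below uses only the standard library. The helper names (`version_tuple`, `meets_minimum`, `installed_pixeltable_version`) are hypothetical and not part of Pixeltable, and the comparison deliberately ignores pre-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = "0.4.24"  # minimum version stated in this guide


def version_tuple(v: str) -> tuple:
    """Turn '0.4.24' into (0, 4, 24), stopping at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, required: str = MIN_VERSION) -> bool:
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def installed_pixeltable_version():
    """Return the installed pixeltable version string, or None if it is absent."""
    try:
        return version("pixeltable")
    except PackageNotFoundError:
        return None


installed = installed_pixeltable_version()
if installed is None:
    print("pixeltable is not installed")
elif meets_minimum(installed):
    print(f"pixeltable {installed} meets the {MIN_VERSION} minimum")
else:
    print(f"pixeltable {installed} is older than {MIN_VERSION}; upgrade first")
```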

Replicating Datasets

You can replicate any public dataset from Pixeltable Cloud to your local environment without needing an account or API key.

Replicate a Public Dataset

Let’s replicate a mini-version of the COCO-2017 dataset from Pixeltable Cloud. You can find this dataset at pixeltable.com/t/pixeltable:fiftyone/coco_mini_2017, or browse for other public datasets.
When calling replicate():
  • remote_uri (required): The URI of the cloud dataset you want to replicate
  • local_path (your choice): The local directory/table name where you want to store the replica
  • Variable name (your choice): The Python variable in your session/script to reference the table (e.g., coco_copy)
See the replicate() SDK reference for full documentation.
# The remote_uri is the specific cloud dataset you want to replicate
# The local_path and variable name are yours to choose
coco_copy = pxt.replicate(
    remote_uri='pxt://pixeltable:fiftyone/coco_mini_2017',
    local_path='coco-copy'
)
You can check that the replica exists at the local path with list_tables().
pxt.list_tables()

Working with Replicas

Replicated datasets are read-only locally, but you can still query, explore, and build on them:
1. Query and explore the data
# View the replicated data
coco_copy.limit(3).collect()
(Output; the image column is not shown here.)
coco_id | num_detections | width | height | caption
41 | 5 | 640 | 427 | A person wearing a helmet and protective gear rides a skateboard down a residential street, with houses and parked cars visible in the background.
47 | 9 | 426 | 640 | A young man in a red shirt and black cap is cooking at a grill in a diner-like setting, while various condiments and kitchen utensils are visible on the counter and walls.
44 | 1 | 640 | 427 | A brown eagle with a white head is flying low over a body of water, its wings spread wide against the dark, rippling surface.
2. Perform similarity searches
Replicas include embedding indexes, so you can run similarity searches immediately:
# Get a sample image to search with
sample_img = coco_copy.select(coco_copy.image).limit(1).collect()[0]['image']
sample_img
# Perform image-based similarity search
sim = coco_copy.image.similarity(sample_img)
results = (
    coco_copy
    .order_by(sim, asc=False)
    .limit(5)
    .select(coco_copy.image, sim)
    .collect()
)
results
(Output; the image column is not shown here.)
similarity
1.
0.708
0.669
0.607
0.606
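The top score of 1 is what you would expect if the query image is itself a row in the table and the index uses cosine similarity. The metric is an assumption here, since this guide does not state it. A pure-Python sketch with made-up three-dimensional vectors shows the effect; real embeddings have many more dimensions:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only; they are not real CLIP embeddings.
query = [0.2, 0.5, 0.1]
rows = {
    "same image": [0.2, 0.5, 0.1],
    "related": [0.1, 0.4, 0.3],
    "unrelated": [0.9, -0.1, 0.0],
}

# Rank rows from most to least similar, as order_by(sim, asc=False) does.
ranked = sorted(rows, key=lambda name: cosine_similarity(query, rows[name]), reverse=True)
for name in ranked:
    print(name, round(cosine_similarity(query, rows[name]), 3))
```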
Because the COCO dataset uses CLIP embeddings (which are multimodal), you can also search using text queries:
# Perform text-based similarity search
sim = coco_copy.image.similarity('surfing')
results = (
    coco_copy
    .order_by(sim, asc=False)
    .limit(4)
    .select(coco_copy.image, sim)
    .collect()
)
results
(Output; the image column is not shown here.)
similarity
0.268
0.262
0.234
0.22
3. Access replicas in new sessions
In a new Python session, use list_tables() and get_table() to access your replicas:
# List all tables to see your replica
pxt.list_tables()
# Assign a handle to the replica
coco_copy = pxt.get_table('coco-copy')
4. Create an independent copy
To work with the data in new ways, create an independent table with the replica as the source:
# Create a fresh table with values only
my_coco = pxt.create_table('my-coco-table', source=coco_copy)
The new table copies the values from the source but drops the computational definitions, and it does not pick up later changes to the source table.

Updating Replicas with Pull

If the upstream table changes, you can update your local replica using pull():
# Update your local replica with changes from the cloud
coco_copy.pull()
This synchronizes your local replica with any updates made to the source dataset.
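In a script that runs repeatedly, you may want to reuse an existing replica and refresh it instead of calling replicate() every time. The sketch below combines list_tables(), get_table(), pull(), and replicate() from this guide. `get_or_replicate` is a hypothetical helper, not part of Pixeltable. It assumes list_tables() returns table paths as strings, and it takes the pixeltable module as a parameter so the logic can be exercised without network access.

```python
def get_or_replicate(pxt, remote_uri, local_path):
    """Return a local replica, refreshing an existing one or creating a new one.

    `pxt` is the imported pixeltable module (or any object with the same
    list_tables / get_table / replicate functions).
    """
    if local_path in pxt.list_tables():
        # The replica already exists locally: reuse it and sync upstream changes.
        table = pxt.get_table(local_path)
        table.pull()
        return table
    # No local replica yet: create one from the cloud dataset.
    return pxt.replicate(remote_uri=remote_uri, local_path=local_path)


# Usage with the real library (requires pixeltable and network access):
# import pixeltable as pxt
# coco_copy = get_or_replicate(
#     pxt, 'pxt://pixeltable:fiftyone/coco_mini_2017', 'coco-copy'
# )
```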

Publishing Datasets

Requirements:
  • A Pixeltable Cloud account (Community Edition includes 1TB storage - see pricing)
  • Your API key from the account dashboard
Publishing allows you to share your datasets with your team or make them publicly available.

Configure Your API Key

Pixeltable reads your API key from the PIXELTABLE_API_KEY environment variable or from its config file. Choose one of these methods:
Option 1: In your notebook (secure and convenient)
Run this cell to securely enter your API key (get it from pixeltable.com/dashboard):
from getpass import getpass
import os

os.environ['PIXELTABLE_API_KEY'] = getpass('Pixeltable API Key:')
Option 2: Environment variable
Add to your ~/.zshrc or ~/.bashrc:
export PIXELTABLE_API_KEY='your-api-key-here'
Option 3: Config file
Add to ~/.pixeltable/config.toml:
[pixeltable]
api_key = 'your-api-key-here'
See the Configuration Guide for details.
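Before publishing, it can help to confirm that the key is actually visible to your Python process. The check below is a minimal sketch using only the standard library, and `api_key_is_set` is a hypothetical helper name. It covers Options 1 and 2 only; a key stored in config.toml (Option 3) is not detected by it.

```python
import os


def api_key_is_set(env=None):
    """Return True if PIXELTABLE_API_KEY is present and not blank."""
    env = os.environ if env is None else env
    return bool(env.get("PIXELTABLE_API_KEY", "").strip())


if api_key_is_set():
    print("PIXELTABLE_API_KEY is set")
else:
    print("PIXELTABLE_API_KEY is not set in this process")
```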

Create a Sample Dataset

Let’s create a table with images from the Pixeltable GitHub repository to publish. The comment parameter provides a description that will be visible on Pixeltable Cloud:
# Create a fresh directory
pxt.drop_dir('sample-images', force=True)
pxt.create_dir('sample-images')
t = pxt.create_table(
    'sample-images.photos',
    schema={'image': pxt.Image, 'description': pxt.String},
    comment='Sample image dataset for demonstrating Pixeltable Cloud publishing'
)
base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
t.insert([
    {'image': f'{base_url}/000000000009.jpg', 'description': 'Kitchen scene'},
    {'image': f'{base_url}/000000000025.jpg', 'description': 'Street view'},
    {'image': f'{base_url}/000000000042.jpg', 'description': 'Indoor setting'},
])

Publish Your Dataset

Publish your table to Pixeltable Cloud. When calling publish():
  • source (required): An existing local table - either a table path string (e.g., 'sample-images.photos') or table handle (e.g., t)
    • If you use a local table path string, it must match a table in your local database (you can verify with pxt.list_tables())
  • destination_uri (required): The cloud URI where you want to publish, in the format pxt://orgname/dataset
    • Pixeltable automatically creates any directory structure in the cloud based on this URI
    • Your local directory structure doesn’t need to match the cloud structure
See the publish() SDK reference for full documentation.
# Option 1: Publish using table path (string)
pxt.publish(
    source='sample-images.photos',  # Table path from list_tables()
    destination_uri='pxt://your-orgname/sample-images'
)

# Option 2: Publish using table handle
# pxt.publish(
#     source=t,  # Table handle you assigned
#     destination_uri='pxt://your-orgname/sample-images'
# )

Understanding Destination URIs

The destination_uri in publish() uses the format pxt://org:database/path.
URI components:
  • org (required): Your organization name
  • database (optional): Database name - defaults to main if omitted
  • path (required): Directory and table path in the cloud
Examples:
  • pxt://orgname/my-dataset → Uses the default main database
  • pxt://orgname:main/my-dataset → Explicitly specifies the main database
  • pxt://orgname:analytics/my-dataset → Uses the analytics database
About databases:
  • Every Pixeltable Cloud account includes a main database by default
  • Each database has its own storage bucket
  • You can create additional databases in your Pixeltable dashboard
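To make the URI format concrete, here is a small standard-library sketch that splits a pxt:// URI into the three components described above and applies the main default. `PxtUri` and `parse_pxt_uri` are hypothetical names for illustration; this is not how Pixeltable itself parses URIs.

```python
from typing import NamedTuple


class PxtUri(NamedTuple):
    org: str
    database: str
    path: str


def parse_pxt_uri(uri: str) -> PxtUri:
    """Split 'pxt://org[:database]/path' into its components."""
    prefix = "pxt://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}: {uri!r}")
    # Everything before the first '/' is 'org' or 'org:database'.
    authority, slash, path = uri[len(prefix):].partition("/")
    if not slash or not path:
        raise ValueError(f"missing path in {uri!r}")
    org, _, database = authority.partition(":")
    if not org:
        raise ValueError(f"missing organization in {uri!r}")
    # The database is optional and defaults to 'main' when omitted.
    return PxtUri(org=org, database=database or "main", path=path)


for example in (
    "pxt://orgname/my-dataset",
    "pxt://orgname:main/my-dataset",
    "pxt://orgname:analytics/my-dataset",
):
    print(example, "->", parse_pxt_uri(example))
```

The first two examples resolve to the same components, which matches the description of main as the default database.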

Updating Published Datasets with Push

After you’ve published a dataset, you can update the cloud replica with local changes using push():
# Make some changes to your local table
t.insert([{'image': f'{base_url}/000000000049.jpg', 'description': 'Outdoor scene'}])

# Push the changes to your published dataset
t.push()
This updates the published dataset on Pixeltable Cloud with your local changes.
Your dataset is now published and can be replicated by others using:
import pixeltable as pxt

sample_images = pxt.replicate(
    remote_uri='pxt://your-orgname/sample-images',
    local_path='sample-images-copy'
)
Note: If you are the owner of a published table, you cannot use replicate() to create a replica of your own table. This is because the table already exists in your Pixeltable database. The replicate() function is intended for pulling datasets published by others into your environment.

Access Control

The access parameter in publish() controls who can replicate your dataset:
  • access='private' (default): Only your team members can access the dataset
  • access='public': Anyone can replicate your dataset
You can set access control either at the time of publish using the access parameter, or change it later in the Pixeltable Cloud UI. You can also manage team members and permissions in your dashboard.
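As an illustration of setting access at publish time, the sketch below wraps publish() and rejects any value other than the two listed above before making the call. `publish_with_access` is a hypothetical helper, not part of Pixeltable, and it takes the pixeltable module as a parameter so the validation can be tried without an account.

```python
VALID_ACCESS = ("private", "public")  # the two values described in this guide


def publish_with_access(pxt, source, destination_uri, access="private"):
    """Call pxt.publish() after checking that `access` is a known value."""
    if access not in VALID_ACCESS:
        raise ValueError(f"access must be one of {VALID_ACCESS}, got {access!r}")
    return pxt.publish(source=source, destination_uri=destination_uri, access=access)


# Usage with the real library (requires an API key and network access):
# import pixeltable as pxt
# publish_with_access(
#     pxt, 'sample-images.photos', 'pxt://your-orgname/sample-images', access='public'
# )
```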

Deleting Published Tables

If you want to delete a published table, you have two options:
Option 1: Using the Pixeltable SDK
Use drop_table() with your table’s destination URI (the same pxt:// URI you used when publishing):
pxt.drop_table('pxt://your-orgname/sample-images')
Option 2: Using the Pixeltable Cloud dashboard
Navigate to your Pixeltable Cloud dashboard and delete the table from the UI.

Get Help

Have questions or need support? Join our community:
  • Discord Community: Ask questions, get community support, and share what you build with Pixeltable
  • YouTube: Watch tutorials, demos, and feature walkthroughs
  • GitHub Issues: Report bugs or request features
