Backup Strategies

| Deployment Approach | Backup Strategy | Recovery Method |
| --- | --- | --- |
| Orchestration Layer | External RDBMS + blob storage backups | Re-run transformation pipelines |
| Full Backend | pg_dump of ~/.pixeltable/pgdata + S3/GCS versioning | Restore pgdata + media files |

Full Backend Backup

For deployments using Pixeltable as the full backend:
# Backup PostgreSQL data (adjust the host/socket directory, user, and database
# name to match your installation)
pg_dump -h ~/.pixeltable/pgdata -U postgres pixeltable > backup.sql

# Backup media files (if stored locally)
tar -czf media_backup.tar.gz ~/.pixeltable/media/

# For cloud media storage, ensure S3/GCS versioning is enabled
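
These steps can also be scripted for scheduled runs. A minimal sketch in Python, assuming the pg_dump invocation above matches your installation and media files live under ~/.pixeltable/media/:
# backup_pixeltable.py - nightly backup sketch (connection details and paths are assumptions)
import datetime
import subprocess
import tarfile
from pathlib import Path

pixeltable_home = Path.home() / '.pixeltable'
stamp = datetime.date.today().isoformat()

# Dump the Pixeltable catalog database
with open(f'backup-{stamp}.sql', 'w') as f:
    subprocess.run(
        ['pg_dump', '-h', str(pixeltable_home / 'pgdata'), '-U', 'postgres', 'pixeltable'],
        stdout=f, check=True)

# Archive locally stored media files
with tarfile.open(f'media-{stamp}.tar.gz', 'w:gz') as tar:
    tar.add(str(pixeltable_home / 'media'), arcname='media')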

Orchestration Layer Backup

For orchestration-only deployments:
  • Primary data lives in your external RDBMS and blob storage
  • Pixeltable state can be rebuilt by re-running transformation pipelines
  • Back up your setup_pixeltable.py and UDF code in version control
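
If the pipeline definition is not already in version control, a minimal setup_pixeltable.py might look like the sketch below; the directory, table schema, and UDF are illustrative, not prescribed by Pixeltable:
# setup_pixeltable.py - recreates the schema and computed columns from scratch
import pixeltable as pxt

pxt.create_dir('myapp')

docs = pxt.create_table(
    'myapp.documents',
    {'source_uri': pxt.String, 'text': pxt.String})

@pxt.udf
def word_count(text: str) -> int:
    return len(text.split())

# Computed columns are populated automatically when source rows are inserted
docs.add_computed_column(n_words=word_count(docs.text))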

Recovery Procedures

Full Backend Recovery

  1. Stop the Pixeltable application
  2. Restore PostgreSQL data: psql -U postgres -d pixeltable -f backup.sql (use the same connection options as the backup)
  3. Restore media files to ~/.pixeltable/media/
  4. Restart the application
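
After restarting, a quick sanity check confirms the catalog and data came back intact. A minimal sketch; the table path is an example:
import pixeltable as pxt

# List restored tables and spot-check a row count
print(pxt.list_tables())
docs = pxt.get_table('myapp.documents')  # example path
print(docs.count())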

Orchestration Layer Recovery

  1. Deploy fresh Pixeltable instance
  2. Run setup_pixeltable.py to recreate schema
  3. Re-process data through computed columns (incremental)
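
Because computed columns are declarative, re-inserting source rows is enough to trigger recomputation of everything downstream. A sketch, assuming the myapp.documents table from the setup script above:
import pixeltable as pxt

docs = pxt.get_table('myapp.documents')

# Inserting source rows recomputes dependent computed columns incrementally
docs.insert([
    {'source_uri': 's3://my-bucket/doc1.txt', 'text': 'first document'},
    {'source_uri': 's3://my-bucket/doc2.txt', 'text': 'second document'},
])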

Security Best Practices

| Security Layer | Recommendation | Implementation |
| --- | --- | --- |
| Network | Deploy within a private VPC | Do not expose PostgreSQL port (5432) to the internet |
| Authentication | Application layer (FastAPI/Django) | Pixeltable does not manage end-user accounts |
| Cloud Credentials | IAM roles / Workload Identity | Avoid long-lived keys in config.toml |

Network Security

# Example: Kubernetes NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pixeltable-network-policy
spec:
  podSelector:
    matchLabels:
      app: pixeltable
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-server
      ports:
        - protocol: TCP
          port: 8000

Secrets Management

Never hardcode secrets. Use environment variables or secrets managers:
# config.py - Load secrets from the environment
import os

# Optional for local development: python-dotenv populates the environment from a
# .env file; it must run before the os.getenv calls below
from dotenv import load_dotenv
load_dotenv()

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
For production, use:
  • AWS: Secrets Manager, Parameter Store
  • GCP: Secret Manager
  • Kubernetes: Secrets, External Secrets Operator
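
For example, with AWS Secrets Manager the key can be fetched at startup rather than stored in the environment at all. A minimal sketch; the secret name and region are examples, and the caller's IAM role must allow secretsmanager:GetSecretValue:
import boto3

# Fetch the API key at application startup
client = boto3.client('secretsmanager', region_name='us-east-1')
secret = client.get_secret_value(SecretId='myapp/openai-api-key')  # example secret name
OPENAI_API_KEY = secret['SecretString']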

Cloud Storage Credentials

For S3/GCS/Azure media storage:
# Prefer IAM roles over long-lived credentials
# AWS: Use EC2 instance profile or EKS IRSA
# GCP: Use Workload Identity

# If credentials required, set via environment variables:
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
# GOOGLE_APPLICATION_CREDENTIALS
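
With IAM roles or Workload Identity in place, the cloud SDKs resolve credentials automatically through their default chains. A quick way to verify which mechanism is being picked up, shown for boto3 as an example:
import boto3

# boto3 resolves credentials via its default chain: environment variables, shared
# config files, then the instance profile / IRSA role - no keys in application config
creds = boto3.Session().get_credentials()
print(creds.method if creds else 'no credentials found')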

Audit and Compliance

Data Lineage

Pixeltable automatically tracks:
  • Table versions and schema changes
  • Computed column definitions and dependencies
  • Insert/update/delete operations
# View table history
table.history()

# Get specific version
old_version = pxt.get_table('myapp.documents:5')  # Version 5
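
Versioning also makes it possible to roll back an unwanted change; a short sketch, assuming the myapp.documents table exists:
import pixeltable as pxt

docs = pxt.get_table('myapp.documents')

# Undo the most recent operation (e.g. an accidental batch insert)
docs.revert()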

Access Logging

Implement application-level access logging:
from fastapi import FastAPI, Request
import logging

logger = logging.getLogger("audit")
app = FastAPI()

@app.middleware("http")
async def audit_log(request: Request, call_next):
    # Resolve the authenticated user from your own auth layer;
    # Pixeltable does not manage end-user accounts
    logger.info(f"Client: {request.client.host} Action: {request.method} {request.url}")
    response = await call_next(request)
    return response

Disaster Recovery

Recovery Time Objectives

| Deployment | RTO | Strategy |
| --- | --- | --- |
| Orchestration Layer | Minutes | Spin up a new instance, re-run pipelines |
| Full Backend | Hours | Restore from backup, validate data integrity |

Recommendations

  1. Regular backups: Daily for production workloads
  2. Test recovery: Quarterly disaster recovery drills
  3. Multi-region: Store backups in different region than primary
  4. Immutable backups: Use S3 Object Lock or GCS retention policies
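
For the immutable-backup recommendation, S3 Object Lock can enforce a retention window on backup objects. A sketch using boto3; the bucket name is an example, and the bucket must already have versioning and Object Lock enabled:
import boto3

s3 = boto3.client('s3')

# Enforce a 30-day compliance-mode retention on new objects in the backup bucket
s3.put_object_lock_configuration(
    Bucket='my-pixeltable-backups',  # example bucket name
    ObjectLockConfiguration={
        'ObjectLockEnabled': 'Enabled',
        'Rule': {'DefaultRetention': {'Mode': 'COMPLIANCE', 'Days': 30}},
    })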