Learn more about queries and expressions with this in-depth guide.

What are Queries?

Queries in Pixeltable allow you to filter, transform, and extract data from tables and views. Using Pixeltable’s expressive query interface, you can build powerful data pipelines without writing complex SQL or using external transformation tools.

Pixeltable queries use a fluent API design where operations can be chained together to:

  • Filter rows based on specific conditions using where()
  • Select and transform columns using select()
  • Sort results with order_by()
  • Limit result sets with limit()
  • Perform aggregations and calculations

Queries don’t execute until you call collect(), making them efficient for complex operations. You can build queries incrementally, adding conditions and transformations as needed.

# A basic query that filters, transforms, and sorts data
result = movies.where(
    (movies.year >= 2000) & (movies.budget > 100.0)
).select(
    movies.title,
    roi=movies.revenue / movies.budget
).order_by(
    'roi', asc=False
).limit(5).collect()

Basic Filtering

This guide builds on tables created in previous sections. The movies table from Tables

Use where() to filter rows based on basic conditions:

# Find movies with budget over $200M
movies.where(
    movies.budget >= 200.0
).collect()
titleyearbudget
Titanic1997200.0
Avengers: Endgame2019356.0

Intermediate Queries

Advanced Queries

Text Analysis

Analyze plot text and titles:

# Calculate plot length statistics
movies.select(
    movies.title,
    plot_length=movies.plot.len()
).collect()

# Find movies with subtitles (colon in title)
movies.where(
    movies.title.like('%: %')
).collect()

Numeric Analysis

Budget calculations and comparisons:

# Calculate budget statistics
movies.select(
    avg_budget=movies.budget.avg(),
    total_budget=movies.budget.sum(),
    movie_count=movies.title.count()
).collect()

# Movies above average budget
avg_budget = movies.select(
    movies.budget.avg()
).collect()[0][0]

movies.where(
    movies.budget > avg_budget
).collect()

Complex Filtering

Combine multiple operations:

# Top 3 highest-budget movies since 2000
movies.where(
    movies.year >= 2000
).order_by(
    movies.budget, 
    asc=False
).limit(3).collect()

# Analyze plots excluding specific keywords
movies.where(
    ~(movies.plot.contains('secret') | 
      movies.plot.contains('dream'))
).select(
    movies.title,
    plot_length=movies.plot.len()
).collect()

Key Concepts

Column References

Access columns using dot notation (movies.title).

Chaining Operations

Combine where(), select(), order_by(), and limit() in any order.

Boolean Logic

Use & (and), | (or), and ~ (not) for complex conditions.

Transformations

Transform data during selection with mathematical and string operations.

Build complex queries incrementally by starting with basic filters and adding operations one at a time. This makes debugging easier and helps ensure your query returns the expected results.