Open in Kaggle  Open in Colab  Download Notebook
This documentation page is also available as an interactive notebook. You can launch the notebook in Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the above links.
Generate concise summaries of long text, articles, or documents using large language models.

Problem

You have long text content—articles, transcripts, documents—that needs to be summarized. Processing each piece manually is time-consuming and inconsistent.

Solution

What’s in this recipe:
  • Summarize text using OpenAI GPT models
  • Customize summary style with prompts
  • Process multiple documents automatically
You add a computed column that calls an LLM to generate summaries. When you insert new text, summaries are generated automatically.

Setup

%pip install -qU pixeltable openai
WARNING: Ignoring invalid distribution ~orch (/opt/miniconda3/envs/pixeltable/lib/python3.11/site-packages)
Note: you may need to restart the kernel to use updated packages.
import os
import getpass

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import openai

Load sample text

# Create a fresh directory
pxt.drop_dir('summarize_demo', force=True)
pxt.create_dir('summarize_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘summarize_demo’.
<pixeltable.catalog.dir.Dir at 0x30d758b10>
# Create table for articles
articles = pxt.create_table('summarize_demo.articles', {
    'title': pxt.String,
    'content': pxt.String
})
Created table ‘articles’.
# Sample articles to summarize
sample_articles = [
    {
        'title': 'The Rise of Electric Vehicles',
        'content': '''Electric vehicles (EVs) have seen unprecedented growth in recent years,
        transforming the automotive industry. Sales increased by 60% globally in 2023,
        with China leading the market followed by Europe and North America. Major automakers
        like Tesla, BYD, and traditional manufacturers have invested billions in EV technology.
        Battery costs have dropped significantly, making EVs more affordable for consumers.
        Government incentives and stricter emissions regulations continue to drive adoption.
        Charging infrastructure is expanding rapidly, with new fast-charging networks being
        deployed across major highways. Despite challenges like range anxiety and charging
        times, consumer acceptance is growing steadily.'''
    },
    {
        'title': 'Advances in Renewable Energy',
        'content': '''Solar and wind power capacity reached record levels in 2023, accounting
        for over 30% of global electricity generation. The cost of solar panels has fallen
        by 90% over the past decade, making renewable energy competitive with fossil fuels.
        Offshore wind farms are being built at scale, with turbines now reaching heights
        of over 250 meters. Energy storage solutions, particularly lithium-ion batteries,
        are addressing intermittency challenges. Countries like Denmark and Scotland have
        achieved periods of 100% renewable electricity. Corporate power purchase agreements
        are accelerating the transition, with tech giants committing to carbon-neutral operations.'''
    }
]

articles.insert(sample_articles)
Inserting rows into `articles`: 2 rows [00:00, 316.21 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 4 values computed.
# View articles
articles.select(articles.title, articles.content).collect()
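The sample strings above are triple-quoted, so every continuation line carries its source indentation and a newline into the stored text. Models usually cope with this, but if you would rather store clean text, a minimal sketch of one way to normalize it before inserting follows. The helper name `normalize_whitespace` is hypothetical and not part of Pixeltable.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return ' '.join(text.split())


# A shortened excerpt with the same indentation pattern as the samples above
raw = '''Electric vehicles (EVs) have seen unprecedented growth in recent years,
        transforming the automotive industry.'''

clean = normalize_whitespace(raw)
print(clean)
```

You could apply such a helper to each `content` value before calling `articles.insert(...)`.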

Generate summaries

Add a computed column that generates summaries using GPT:
# Create prompt template for summarization
prompt = 'Summarize the following article in 2-3 sentences:\n\n' + articles.content

# Add computed column for LLM response
articles.add_computed_column(
    response=openai.chat_completions(
        messages=[{'role': 'user', 'content': prompt}],
        model='gpt-4o-mini'
    )
)
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
# Extract the summary text from the response
articles.add_computed_column(
    summary=articles.response.choices[0].message.content
)
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
# View titles and summaries
articles.select(articles.title, articles.summary).collect()
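The expression `articles.response.choices[0].message.content` follows the shape of a chat completion response: a list of choices, each holding a message with a `content` field. As a rough illustration, the plain-Python equivalent on a trimmed, made-up response looks like the sketch below. The dictionary shows only the fields used here, and `extract_summary` is a hypothetical helper name.

```python
# Trimmed, made-up stand-in for one chat completion response.
# A real response carries more fields (id, model, usage, ...).
mock_response = {
    'choices': [
        {
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': 'EV sales grew sharply in 2023.',
            },
        }
    ]
}


def extract_summary(response: dict) -> str:
    """Return the text of the first choice, mirroring choices[0].message.content."""
    return response['choices'][0]['message']['content']


summary_text = extract_summary(mock_response)
print(summary_text)
```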

Custom summary styles

You can customize the summary format by changing the prompt:
# Add bullet-point summary
bullet_prompt = 'List the 3 key points from this article as bullet points:\n\n' + articles.content

articles.add_computed_column(
    bullet_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': bullet_prompt}],
        model='gpt-4o-mini'
    )
)

articles.add_computed_column(
    key_points=articles.bullet_response.choices[0].message.content
)
Added 2 column values with 0 errors.
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
# View bullet-point summaries
articles.select(articles.title, articles.key_points).collect()
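If you end up with several summary styles, it can help to keep the instruction strings in one place. The sketch below works on plain strings only. The `STYLE_INSTRUCTIONS` mapping and `build_prompt` helper are hypothetical names, and the `headline` style is an invented extra that does not appear in this recipe.

```python
# Instruction prefixes keyed by style name. The first two match the prompts
# used in this recipe; 'headline' is an invented example of a further style.
STYLE_INSTRUCTIONS = {
    'brief': 'Summarize the following article in 2-3 sentences:',
    'bullets': 'List the 3 key points from this article as bullet points:',
    'headline': 'Write a one-line headline for this article:',
}


def build_prompt(style: str, content: str) -> str:
    """Join the instruction for `style` and the article text with a blank line."""
    if style not in STYLE_INSTRUCTIONS:
        raise ValueError(f'Unknown summary style: {style!r}')
    return STYLE_INSTRUCTIONS[style] + '\n\n' + content


print(build_prompt('bullets', 'Solar and wind power capacity reached record levels.'))
```

The same instruction strings can be reused in the `'...' + articles.content` pattern shown above when defining a computed column.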

Automatic processing

New articles are automatically summarized when inserted:
# Insert a new article - summaries are generated automatically
articles.insert([{
    'title': 'AI in Healthcare',
    'content': '''Artificial intelligence is revolutionizing healthcare diagnostics
    and treatment planning. Machine learning models can now detect diseases from
    medical images with accuracy matching or exceeding human specialists. AI-powered
    drug discovery is accelerating the development of new treatments. Natural language
    processing is being used to extract insights from clinical notes and research papers.'''
}])
Inserting rows into `articles`: 1 rows [00:00, 411.57 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 6 values computed.
# View all summaries including the new article
articles.select(articles.title, articles.summary).collect()
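To make the behavior concrete, here is a toy model of the idea in plain Python: adding a computed column backfills existing rows, and each later insert evaluates every computed column for the new rows. This is only a conceptual sketch and not how Pixeltable is implemented. `ToyTable` is a hypothetical class, and a word count stands in for the LLM call.

```python
class ToyTable:
    """Toy model: each computed column is a function of the row."""

    def __init__(self):
        self.rows = []
        self.computed = {}  # column name -> function(row) -> value

    def add_computed_column(self, name, fn):
        self.computed[name] = fn
        for row in self.rows:  # backfill rows that already exist
            row[name] = fn(row)

    def insert(self, new_rows):
        for row in new_rows:
            row = dict(row)  # copy so the caller's dict is not modified
            for name, fn in self.computed.items():
                row[name] = fn(row)  # evaluated in definition order
            self.rows.append(row)


table = ToyTable()
table.insert([{'title': 'A', 'content': 'one two three'}])
table.add_computed_column('word_count', lambda row: len(row['content'].split()))
table.insert([{'title': 'B', 'content': 'four five'}])

print([row['word_count'] for row in table.rows])
```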

Explanation

Prompt engineering for summaries:
Cost optimization:
  • Use gpt-4o-mini for most summarization tasks (fast and affordable)
  • Use gpt-4o for complex documents requiring deeper understanding
  • Summaries are cached, so you pay only once per article
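The "pay once per article" point can be illustrated with a small memoization sketch. It does not reproduce Pixeltable's storage mechanism; in the recipe, the stored computed column values play this role. `CachedSummarizer` and `fake_llm_summary` are hypothetical names, and the fake function stands in for a paid model call.

```python
import hashlib


class CachedSummarizer:
    """Call the underlying summarizer at most once per distinct text."""

    def __init__(self, summarize_fn):
        self._summarize_fn = summarize_fn
        self._cache = {}
        self.paid_calls = 0  # how many times the underlying function ran

    def summarize(self, content: str) -> str:
        key = hashlib.sha256(content.encode('utf-8')).hexdigest()
        if key not in self._cache:
            self.paid_calls += 1
            self._cache[key] = self._summarize_fn(content)
        return self._cache[key]


def fake_llm_summary(content: str) -> str:
    # Stand-in for a paid model call: keep only the first sentence
    return content.split('. ')[0].rstrip('.') + '.'


summarizer = CachedSummarizer(fake_llm_summary)
text = 'Solar capacity hit a record. Costs keep falling.'
first = summarizer.summarize(text)
second = summarizer.summarize(text)  # served from the cache

print(first, summarizer.paid_calls)
```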

See also