Generate concise summaries of long text, articles, or documents using
large language models.
Problem
You have long text content—articles, transcripts, documents—that needs
to be summarized. Processing each piece manually is time-consuming and
inconsistent.
Solution
What’s in this recipe:
- Summarize text using OpenAI GPT models
- Customize summary style with prompts
- Process multiple documents automatically
You add a computed column that calls an LLM to generate summaries. When
you insert new text, summaries are generated automatically.
Setup
%pip install -qU pixeltable openai
Note: you may need to restart the kernel to use updated packages.
import os
import getpass
# Prompt for the key only if it is not already set in the environment
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
import pixeltable as pxt
from pixeltable.functions import openai
Load sample text
# Create a fresh directory
pxt.drop_dir('summarize_demo', force=True)
pxt.create_dir('summarize_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘summarize_demo’.
<pixeltable.catalog.dir.Dir at 0x30d758b10>
# Create table for articles
articles = pxt.create_table('summarize_demo.articles', {
    'title': pxt.String,
    'content': pxt.String
})
Created table ‘articles’.
# Sample articles to summarize
sample_articles = [
{
'title': 'The Rise of Electric Vehicles',
'content': '''Electric vehicles (EVs) have seen unprecedented growth in recent years,
transforming the automotive industry. Sales increased by 60% globally in 2023,
with China leading the market followed by Europe and North America. Major automakers
like Tesla, BYD, and traditional manufacturers have invested billions in EV technology.
Battery costs have dropped significantly, making EVs more affordable for consumers.
Government incentives and stricter emissions regulations continue to drive adoption.
Charging infrastructure is expanding rapidly, with new fast-charging networks being
deployed across major highways. Despite challenges like range anxiety and charging
times, consumer acceptance is growing steadily.'''
},
{
'title': 'Advances in Renewable Energy',
'content': '''Solar and wind power capacity reached record levels in 2023, accounting
for over 30% of global electricity generation. The cost of solar panels has fallen
by 90% over the past decade, making renewable energy competitive with fossil fuels.
Offshore wind farms are being built at scale, with turbines now reaching heights
of over 250 meters. Energy storage solutions, particularly lithium-ion batteries,
are addressing intermittency challenges. Countries like Denmark and Scotland have
achieved periods of 100% renewable electricity. Corporate power purchase agreements
are accelerating the transition, with tech giants committing to carbon-neutral operations.'''
}
]
articles.insert(sample_articles)
Inserting rows into `articles`: 2 rows [00:00, 316.21 rows/s]
Inserted 2 rows with 0 errors.
2 rows inserted, 4 values computed.
# View articles
articles.select(articles.title, articles.content).collect()
Generate summaries
Add a computed column that generates summaries using GPT:
# Create prompt template for summarization
prompt = 'Summarize the following article in 2-3 sentences:\n\n' + articles.content
# Add computed column for LLM response
articles.add_computed_column(
    response=openai.chat_completions(
        messages=[{'role': 'user', 'content': prompt}],
        model='gpt-4o-mini'
    )
)
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
# Extract the summary text from the response
articles.add_computed_column(
    summary=articles.response.choices[0].message.content
)
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
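The path `choices[0].message.content` follows the JSON shape of an OpenAI chat completion response. The following is a minimal sketch in plain Python, using a hand-written `sample_response` dictionary (an illustrative stand-in, not real API output) and a hypothetical helper `extract_summary`, to show which field the `summary` column reads.

```python
# Hand-written stand-in for a chat completion response; the values are illustrative only.
sample_response = {
    'model': 'gpt-4o-mini',
    'choices': [
        {
            'index': 0,
            'message': {'role': 'assistant', 'content': 'EV sales grew quickly in 2023.'},
            'finish_reason': 'stop',
        }
    ],
}


def extract_summary(response: dict) -> str:
    """Return the text of the first choice, the same path the summary column uses."""
    return response['choices'][0]['message']['content']


summary_text = extract_summary(sample_response)
print(summary_text)
```

Keeping the full response in its own column, as this recipe does, leaves the other fields available if they are needed later.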
# View titles and summaries
articles.select(articles.title, articles.summary).collect()
Custom summary styles
You can customize the summary format by changing the prompt:
# Add bullet-point summary
bullet_prompt = 'List the 3 key points from this article as bullet points:\n\n' + articles.content
articles.add_computed_column(
    bullet_response=openai.chat_completions(
        messages=[{'role': 'user', 'content': bullet_prompt}],
        model='gpt-4o-mini'
    )
)
articles.add_computed_column(
    key_points=articles.bullet_response.choices[0].message.content
)
Added 2 column values with 0 errors.
Added 2 column values with 0 errors.
2 rows updated, 2 values computed.
# View bullet-point summaries
articles.select(articles.title, articles.key_points).collect()
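Because each style differs only in its instruction text, the instructions can be kept in one place. The sketch below is plain Python and independent of Pixeltable. `PROMPT_STYLES` and `build_prompt` are hypothetical names introduced for illustration, and the `one_line` style is an extra example that does not appear elsewhere in this recipe.

```python
# Hypothetical mapping from a style name to its instruction text.
PROMPT_STYLES = {
    'brief': 'Summarize the following article in 2-3 sentences:',
    'bullets': 'List the 3 key points from this article as bullet points:',
    'one_line': 'Summarize the following article in one sentence:',  # extra illustrative style
}


def build_prompt(style: str, content: str) -> str:
    """Join the instruction for `style` and the article text with a blank line."""
    if style not in PROMPT_STYLES:
        raise ValueError(f'Unknown summary style: {style!r}')
    return PROMPT_STYLES[style] + '\n\n' + content


example_prompt = build_prompt(
    'bullets', 'Solar and wind power capacity reached record levels in 2023.'
)
print(example_prompt)
```

In the recipe itself, the chosen instruction string would be concatenated with `articles.content` in the same way the `prompt` and `bullet_prompt` expressions are built.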
Automatic processing
New articles are automatically summarized when inserted:
# Insert a new article - summaries are generated automatically
articles.insert([{
'title': 'AI in Healthcare',
'content': '''Artificial intelligence is revolutionizing healthcare diagnostics
and treatment planning. Machine learning models can now detect diseases from
medical images with accuracy matching or exceeding human specialists. AI-powered
drug discovery is accelerating the development of new treatments. Natural language
processing is being used to extract insights from clinical notes and research papers.'''
}])
Inserting rows into `articles`: 1 rows [00:00, 411.57 rows/s]
Inserted 1 row with 0 errors.
1 row inserted, 6 values computed.
# View all summaries including the new article
articles.select(articles.title, articles.summary).collect()
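As a mental model only, the insert behaviour above can be imitated in a few lines of plain Python. `ToyTable` and `fake_summarize` are hypothetical stand-ins written for this sketch. They do not reflect how Pixeltable is implemented and make no API calls.

```python
from typing import Callable


def fake_summarize(content: str) -> str:
    """Stand-in for the LLM call: returns the first sentence of the text."""
    return content.split('. ')[0].rstrip('.') + '.'


class ToyTable:
    """Keeps rows in a list and fills derived columns for each newly inserted row."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self.computed: dict[str, Callable[[dict], str]] = {}
        self.calls = 0  # number of times a derived value was computed

    def add_computed_column(self, name: str, fn: Callable[[dict], str]) -> None:
        # Register the column, then backfill it for rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)
            self.calls += 1

    def insert(self, new_rows: list[dict]) -> None:
        # Only the new rows are computed; stored values for older rows are reused.
        for new_row in new_rows:
            row = dict(new_row)
            for name, fn in self.computed.items():
                row[name] = fn(row)
                self.calls += 1
            self.rows.append(row)


table = ToyTable()
table.insert([{'title': 'A', 'content': 'First point. Second point.'}])
table.add_computed_column('summary', lambda r: fake_summarize(r['content']))
table.insert([{'title': 'B', 'content': 'Another article. More detail.'}])
print([row['summary'] for row in table.rows])
print(table.calls)
```

The call counter shows the point of the design: the second insert computes one new value and leaves the first row's stored summary alone.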
Explanation
Prompt engineering for summaries:
Cost optimization:
- Use gpt-4o-mini for most summarization tasks (fast and affordable)
- Use gpt-4o for complex documents requiring deeper understanding
- Summaries are cached, so you only pay once per article
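A rough way to reason about cost is to estimate tokens from character counts. The sketch below assumes about 4 characters per token (a common rule of thumb for English text, not an exact tokenizer count) and uses placeholder prices. `estimate_tokens`, `estimate_cost`, `CHARS_PER_TOKEN`, and the price arguments are all hypothetical, so substitute current figures from the provider's pricing page.

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; an assumption, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_cost(
    texts: list[str],
    input_price_per_million: float,
    output_price_per_million: float,
    output_tokens_per_summary: int = 100,
) -> float:
    """Estimate the one-time cost of summarizing each text exactly once."""
    input_tokens = sum(estimate_tokens(t) for t in texts)
    output_tokens = output_tokens_per_summary * len(texts)
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Placeholder prices in dollars per million tokens; NOT actual OpenAI pricing.
docs = ['x' * 4000, 'y' * 2000]
cost = estimate_cost(docs, input_price_per_million=1.0, output_price_per_million=2.0)
print(f'{cost:.6f}')
```

Because each summary is computed once per article, an estimate like this applies to the initial run and to newly inserted rows, not to later queries that read the stored summaries.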
See also