| Source | Extract |
| --- | --- |
| News articles | People, organizations, locations mentioned |
| Customer feedback | Product names, feature requests |
| Legal documents | Parties, dates, monetary amounts |

| title | entities |
| --- | --- |
| Tech Acquisition | `{"people":["Satya Nadella"],"organizations":["Microsoft"],"locations":["Seattle"],"dates":["March 2024"]}` |
| Research Breakthrough | `{"people":["Dr. Sarah Chen"],"organizations":["Stanford University","National Science Foundation"],"locations":["Palo Alto","California"],"dates":[]}` |
| Sports Update | `{"people":["LeBron James","Darvin Ham"],"organizations":["Los Angeles Lakers","Boston Celtics","Staples Center"],"locations":[],"dates":["Tuesday night"]}` |

| Entity Type | Examples |
| --- | --- |
| People | Names, titles |
| Organizations | Companies, institutions |
| Locations | Cities, countries, addresses |
| Dates | Specific dates, time periods |
| Money | Amounts, currencies |
| Products | Brand names, model numbers |

## Solution

**What’s in this recipe:**

* Extract entities as structured JSON
* Use OpenAI’s structured output for reliable parsing
* Access extracted entities as queryable columns

You use structured output to get entities in a consistent JSON format. The entities are stored as JSON columns that you can query and filter.

### Setup

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable openai
```

  Note: you may need to restart the kernel to use updated packages.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import getpass
import os

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
```

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import json

import pixeltable as pxt
from pixeltable.functions.openai import chat_completions

# Create a fresh directory
pxt.drop_dir('entities_demo', force=True)
pxt.create_dir('entities_demo')
```

  Created directory 'entities\_demo'.

### Define entity extraction schema

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Define the JSON schema for entity extraction
entity_schema = {
    'type': 'json_schema',
    'json_schema': {
        'name': 'entities',
        'strict': True,
        'schema': {
            'type': 'object',
            'properties': {
                'people': {
                    'type': 'array',
                    'items': {'type': 'string'},
                    'description': 'Names of people mentioned',
                },
                'organizations': {
                    'type': 'array',
                    'items': {'type': 'string'},
                    'description': 'Names of companies, institutions, or groups',
                },
                'locations': {
                    'type': 'array',
                    'items': {'type': 'string'},
                    'description': 'Geographic locations (cities, countries, addresses)',
                },
                'dates': {
                    'type': 'array',
                    'items': {'type': 'string'},
                    'description': 'Dates or time references',
                },
            },
            'required': ['people', 'organizations', 'locations', 'dates'],
            'additionalProperties': False,
        },
    },
}
```

### Create extraction pipeline

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Create table for articles
articles = pxt.create_table(
    'entities_demo/articles', {'title': pxt.String, 'content': pxt.String}
)
```

  Created table 'articles'.
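
The schema defined above repeats the same array-of-strings shape for every entity type. As a rough sketch, a small helper could generate that structure from a mapping of field names to descriptions. The helper name `build_entity_schema` and the `legal_schema` fields below are hypothetical illustrations, not part of Pixeltable or the OpenAI SDK.

```python
def build_entity_schema(fields: dict[str, str], name: str = 'entities') -> dict:
    """Build a response format with one array-of-strings property per entity type."""
    properties = {
        field: {
            'type': 'array',
            'items': {'type': 'string'},
            'description': description,
        }
        for field, description in fields.items()
    }
    return {
        'type': 'json_schema',
        'json_schema': {
            'name': name,
            'strict': True,
            'schema': {
                'type': 'object',
                'properties': properties,
                # Mirror the recipe's schema: every property is listed as required
                'required': list(fields),
                'additionalProperties': False,
            },
        },
    }


# Hypothetical schema for legal documents (field names are illustrative)
legal_schema = build_entity_schema(
    {
        'parties': 'Names of the parties to the agreement',
        'dates': 'Dates or time references',
        'money': 'Monetary amounts, including currency',
    },
    name='legal_entities',
)
```

A dictionary built this way has the same layout as `entity_schema`, so it could presumably be passed through `model_kwargs={'response_format': ...}` in the same manner.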

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Add entity extraction column
extraction_prompt = (
    'Extract all named entities from the following text:\n\n' + articles.content
)

articles.add_computed_column(
    extraction_response=chat_completions(
        messages=[{'role': 'user', 'content': extraction_prompt}],
        model='gpt-4o-mini',
        model_kwargs={'response_format': entity_schema},
    )
)
```

  Added 0 column values with 0 errors.
  No rows affected.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Extract the entities JSON
articles.add_computed_column(
    entities=articles.extraction_response.choices[0].message.content
)
```

  Added 0 column values with 0 errors.
  No rows affected.
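
If the message content reaches your code as JSON text rather than as an already-parsed object (an assumption here, since it depends on how the value is retrieved), it may help to parse it once and normalize the keys. The sketch below uses only the standard library; `parse_entities` and `EXPECTED_KEYS` are hypothetical names, and the sample string is copied from this recipe's results.

```python
import json

# Keys declared as required in the recipe's entity_schema
EXPECTED_KEYS = ('people', 'organizations', 'locations', 'dates')


def parse_entities(raw: str) -> dict[str, list[str]]:
    """Parse JSON text and return a dict with a list for every expected key."""
    data = json.loads(raw)
    return {key: list(data.get(key, [])) for key in EXPECTED_KEYS}


raw = (
    '{"people":["Satya Nadella"],"organizations":["Microsoft"],'
    '"locations":["Seattle"],"dates":["March 2024"]}'
)
entities = parse_entities(raw)
print(entities['organizations'])  # ['Microsoft']
```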

### Extract entities from text

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# Insert sample articles
sample_articles = [
    {
        'title': 'Tech Acquisition',
        'content': 'Microsoft announced today that CEO Satya Nadella will lead the acquisition of a Seattle-based startup. The deal, expected to close in March 2024, is valued at $500 million.',
    },
    {
        'title': 'Sports Update',
        'content': "LeBron James led the Los Angeles Lakers to victory against the Boston Celtics on Tuesday night at Staples Center. Coach Darvin Ham praised the team's performance.",
    },
    {
        'title': 'Research Breakthrough',
        'content': 'Dr. Sarah Chen at Stanford University published groundbreaking research on renewable energy. The study, funded by the National Science Foundation, was conducted in Palo Alto, California.',
    },
]

articles.insert(sample_articles)
```

  Inserting rows into \`articles\`: 3 rows \[00:00, 404.21 rows/s]
  Inserted 3 rows with 0 errors.
  3 rows inserted, 12 values computed.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
# View extracted entities
articles.select(articles.title, articles.entities).collect()
```
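
Once the results are collected, ordinary Python is enough to work with them. The sketch below builds a reverse index from entity to article titles. To stay self-contained it uses a plain list of dictionaries as a stand-in for the collected rows (values copied from this recipe's results); `index_by_entity` is a hypothetical helper, not a Pixeltable function.

```python
from collections import defaultdict

# Stand-in for the collected rows, with entities already parsed into dicts
rows = [
    {
        'title': 'Tech Acquisition',
        'entities': {
            'people': ['Satya Nadella'],
            'organizations': ['Microsoft'],
            'locations': ['Seattle'],
            'dates': ['March 2024'],
        },
    },
    {
        'title': 'Research Breakthrough',
        'entities': {
            'people': ['Dr. Sarah Chen'],
            'organizations': ['Stanford University', 'National Science Foundation'],
            'locations': ['Palo Alto', 'California'],
            'dates': [],
        },
    },
    {
        'title': 'Sports Update',
        'entities': {
            'people': ['LeBron James', 'Darvin Ham'],
            'organizations': ['Los Angeles Lakers', 'Boston Celtics', 'Staples Center'],
            'locations': [],
            'dates': ['Tuesday night'],
        },
    },
]


def index_by_entity(rows: list[dict], entity_type: str) -> dict[str, list[str]]:
    """Map each entity of the given type to the titles of articles that mention it."""
    index = defaultdict(list)
    for row in rows:
        for value in row['entities'].get(entity_type, []):
            index[value].append(row['title'])
    return dict(index)


org_index = index_by_entity(rows, 'organizations')
print(org_index['Stanford University'])  # ['Research Breakthrough']
```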

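## Explanation

**Structured output ensures reliable extraction:**

With OpenAI’s structured output (`response_format`) and `'strict': True`, the model's reply is constrained to JSON that matches the schema, so you do not need post-processing to repair malformed output or guess at field names. Failures outside the schema's control, such as API errors, refusals, or a reply cut off by a token limit, can still occur and are worth checking for in production.

**Common entity types:**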

**Customizing the schema:**

Modify the `entity_schema` to extract domain-specific entities such as product SKUs, legal terms, or medical conditions.

## See also

* [Extract structured data from images](/howto/cookbooks/images/vision-structured-output) - JSON extraction from images
* [Extract fields from JSON](/howto/cookbooks/core/workflow-json-extraction) - Parse LLM response fields