
# Split data into multiple rows with iterators

> Split a single row into many derived rows in Pixeltable using built-in component iterators for chunks, frames, video segments, and tiled data.

<a href="https://kaggle.com/kernels/welcome?src=https://github.com/pixeltable/pixeltable/blob/release/docs/release/howto/cookbooks/core/data-split-rows.ipynb" id="openKaggle" target="_blank" rel="noopener noreferrer"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open in Kaggle" style={{ display: 'inline', margin: '0px' }} noZoom /></a>  <a href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/release/howto/cookbooks/core/data-split-rows.ipynb" id="openColab" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" style={{ display: 'inline', margin: '0px' }} noZoom /></a>  <a href="https://raw.githubusercontent.com/pixeltable/pixeltable/refs/tags/release/docs/release/howto/cookbooks/core/data-split-rows.ipynb" id="downloadNotebook" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/%E2%AC%87-Download%20Notebook-blue" alt="Download Notebook" style={{ display: 'inline', margin: '0px' }} noZoom /></a>

<Tip>This documentation page is also available as an interactive notebook. You can launch the notebook in
Kaggle or Colab, or download it for use with an IDE or local Jupyter installation, by clicking one of the
above links.</Tip>

Transform a single document, video, image, audio file, or string into
multiple rows for granular processing.

**What’s in this recipe:**

* Split documents into text chunks for RAG
* Extract frames or segments from videos
* Split strings into sentences
* Tile images for high-resolution analysis
* Chunk audio files for transcription

## Problem

You have documents, videos, images, audio, or text that you need to
break into smaller pieces for processing. A PDF needs to be split into
chunks for retrieval-augmented generation. A video needs individual
frames for analysis. Text needs to be divided into sentences or sliding
windows.

You need a way to transform one source row into multiple output rows
automatically.

## Solution

Create a view whose iterator function splits each source row into
multiple rows. Pixeltable provides built-in iterators for documents,
videos, images, audio, and strings.
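Conceptually, an iterator view performs a one-to-many expansion. The
sketch below is plain Python rather than Pixeltable code: `expand_rows`,
`split_words`, and the `source_id` and `pos` keys are hypothetical names
chosen for illustration. It only shows how each piece produced from a
source row becomes its own output row that still points back to that
source row.

```python
def expand_rows(source_rows, iterator_fn):
    """Yield one output row per piece produced from each source row."""
    for source_id, row in enumerate(source_rows):
        for pos, piece in enumerate(iterator_fn(row)):
            # Each derived row records which source row it came from
            # and its position among that row's pieces.
            yield {'source_id': source_id, 'pos': pos, **piece}


def split_words(row):
    """Toy iterator: one piece per whitespace-separated word."""
    for word in row['content'].split():
        yield {'text': word}


source = [{'content': 'one two'}, {'content': 'three'}]
derived = list(expand_rows(source, split_words))
for derived_row in derived:
    print(derived_row)
```

Two source rows become three derived rows here. The views in the
sections below follow the same shape, with a built-in iterator in place
of the toy one.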

### Setup

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
%pip install -qU pixeltable spacy tiktoken click
!python -m spacy download en_core_web_sm -q
```

### Split documents into chunks

Use `document_splitter` to break documents (PDF, HTML, Markdown, TXT)
into text chunks.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
import pixeltable as pxt
from pixeltable.functions.document import document_splitter

pxt.drop_dir('split_demo', force=True)
pxt.create_dir('split_demo')

docs = pxt.create_table('split_demo/docs', {'doc': pxt.Document})
docs.insert(
    [
        {
            'doc': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/rag-demo/Jefferson-Amazon.pdf'
        }
    ]
)
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Inserted 1 row with 0 errors in 0.13 s (7.68 rows/s)
  1 row inserted.
</pre>

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
chunks = pxt.create_view(
    'split_demo/doc_chunks',
    docs,
    iterator=document_splitter(
        docs.doc, separators='sentence,token_limit', limit=300
    ),
)
chunks.select(chunks.text).limit(3).collect()
```

<div style={{ 'margin': '0px 20px 0px 20px' }} dangerouslySetInnerHTML={{ __html: quartoRawHtml[0] }} />

**Available separators:**

* `heading` — Split on HTML/Markdown headings
* `sentence` — Split on sentence boundaries (requires spacy)
* `token_limit` — Split by token count (requires tiktoken)
* `char_limit` — Split by character count
* `page` — Split by page (PDF only)

[SDK Reference:
document\_splitter](/sdk/latest/document)
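To build intuition for combining `sentence` with a size limit, here is a
rough plain-Python sketch. It assumes the limit acts as a cap applied to
each sentence-level piece, which may not match the library's exact
behavior. It also swaps in stand-ins: a regular expression instead of
spaCy, and whitespace-separated words instead of tiktoken tokens. The
function names are hypothetical.

```python
import re


def split_sentences(text):
    # Naive stand-in for spaCy: break after ., !, or ? followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def cap_length(piece, limit):
    # Stand-in for tiktoken: treat whitespace-separated words as tokens
    # and cut the piece into runs of at most `limit` words.
    words = piece.split()
    return [' '.join(words[i:i + limit]) for i in range(0, len(words), limit)]


def chunk_text(text, limit):
    chunks = []
    for sentence in split_sentences(text):
        chunks.extend(cap_length(sentence, limit))
    return chunks


text = 'Short one. This sentence has exactly seven words total. Done!'
result = chunk_text(text, limit=4)
for chunk in result:
    print(chunk)
```

With a limit of 4, the seven-word sentence is cut into two pieces while
the shorter sentences pass through unchanged, so the sketch prints four
chunks.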

### Extract frames from videos

Use `frame_iterator` to extract frames at specified intervals.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import frame_iterator

videos = pxt.create_table('split_demo/videos', {'video': pxt.Video})
videos.insert(
    [
        {
            'video': 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/bangkok.mp4'
        }
    ]
)
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Inserted 1 row with 0 errors in 1.28 s (0.78 rows/s)
  1 row inserted.
</pre>

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
frames = pxt.create_view(
    'split_demo/frames',
    videos,
    iterator=frame_iterator(videos.video, fps=1.0),
)
frames.select(frames.frame, frames.frame_attrs).limit(3).collect()
```

<div style={{ 'margin': '0px 20px 0px 20px' }} dangerouslySetInnerHTML={{ __html: quartoRawHtml[1] }} />

**frame\_iterator options:**

* `fps` — Frames per second to extract
* `num_frames` — Extract exact number of frames (evenly spaced)
* `keyframes_only` — Extract only keyframes

[SDK Reference:
frame\_iterator](/sdk/latest/video)
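The difference between `fps` and `num_frames` is easier to see as
arithmetic on frame indices. The following is a plain-Python sketch, not
the library's implementation: the function names, the 30 fps native
frame rate, and the 90-frame clip length are all assumptions made for
the example, and the real iterator may round or select frames
differently.

```python
def indices_for_fps(total_frames, native_fps, target_fps):
    """Pick the source frame at or just before each sampling timestamp."""
    if target_fps <= 0:
        raise ValueError('target_fps must be positive')
    indices = []
    k = 0
    while True:
        # The k-th sample falls at k / target_fps seconds.
        idx = int(k * native_fps / target_fps)
        if idx >= total_frames:
            return indices
        indices.append(idx)
        k += 1


def indices_for_num_frames(total_frames, num_frames):
    """Spread exactly num_frames indices evenly across the clip."""
    return [int(i * total_frames / num_frames) for i in range(num_frames)]


# A hypothetical 3-second clip recorded at 30 fps (90 frames).
print(indices_for_fps(90, native_fps=30.0, target_fps=1.0))  # [0, 30, 60]
print(indices_for_num_frames(90, num_frames=4))              # [0, 22, 45, 67]
```

With `fps`, the number of rows grows with the video's length. With
`num_frames`, the row count is fixed and the spacing adapts instead.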

### Split videos into segments

Use `video_splitter` to divide videos into smaller clips.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.video import video_splitter

segments = pxt.create_view(
    'split_demo/segments',
    videos,
    iterator=video_splitter(
        videos.video, duration=5.0, min_segment_duration=1.0
    ),
)
segments.select(
    segments.segment_start, segments.segment_end, segments.video_segment
).limit(3).collect()
```

<div style={{ 'margin': '0px 20px 0px 20px' }} dangerouslySetInnerHTML={{ __html: quartoRawHtml[2] }} />

**video\_splitter options:**

* `duration` — Duration of each segment in seconds
* `overlap` — Overlap between segments in seconds
* `min_segment_duration` — Drop last segment if shorter than this

[SDK Reference:
video\_splitter](/sdk/latest/video)
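The interaction between `duration`, `overlap`, and
`min_segment_duration` can be sketched as boundary arithmetic. This is a
plain-Python illustration under the assumption that each segment starts
where the previous one ended, minus the overlap. `segment_bounds` is a
hypothetical helper, and actual clip boundaries may differ (for example,
if cuts have to align with how the video is encoded).

```python
def segment_bounds(total, duration, overlap=0.0, min_segment_duration=0.0):
    """Return (start, end) pairs in seconds covering `total` seconds."""
    if duration <= overlap:
        raise ValueError('duration must be greater than overlap')
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + duration, total)
        bounds.append((start, end))
        if end >= total:
            break
        # The next segment begins `overlap` seconds before this one ends.
        start = end - overlap
    # Drop the final segment if it is shorter than the minimum.
    if bounds and bounds[-1][1] - bounds[-1][0] < min_segment_duration:
        bounds.pop()
    return bounds


print(segment_bounds(12.5, duration=5.0, min_segment_duration=1.0))
# [(0.0, 5.0), (5.0, 10.0), (10.0, 12.5)]
print(segment_bounds(10.5, duration=5.0, min_segment_duration=1.0))
# [(0.0, 5.0), (5.0, 10.0)]
```

In the second call the trailing half-second remainder is shorter than
`min_segment_duration`, so it is discarded rather than emitted as a very
short clip.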

### Split strings into sentences

Use `string_splitter` to divide text into sentences.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.string import string_splitter

texts = pxt.create_table('split_demo/texts', {'content': pxt.String})
texts.insert(
    [
        {
            'content': 'AI data infrastructure simplifies ML workflows. Declarative pipelines update incrementally. This makes development faster and more maintainable.'
        }
    ]
)
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Inserted 1 row with 0 errors in 0.03 s (38.38 rows/s)
  1 row inserted.
</pre>

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
sentences = pxt.create_view(
    'split_demo/sentences',
    texts,
    iterator=string_splitter(texts.content, separators='sentence'),
)
sentences.select(sentences.text).collect()
```

<div style={{ 'margin': '0px 20px 0px 20px' }} dangerouslySetInnerHTML={{ __html: quartoRawHtml[3] }} />

[SDK Reference:
string\_splitter](/sdk/latest/string)

### Tile images for analysis

Use `tile_iterator` to divide large images into a grid of smaller tiles.
This is useful for processing high-resolution images that are too large
to analyze at once, or for running object detection on different
regions.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.image import tile_iterator

images = pxt.create_table('split_demo/images', {'image': pxt.Image})
images.insert(
    [
        {
            'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/pixeltable-logo-large.png'
        }
    ]
)
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Inserted 1 row with 0 errors in 0.09 s (11.69 rows/s)
  1 row inserted.
</pre>

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tiles = pxt.create_view(
    'split_demo/tiles',
    images,
    iterator=tile_iterator(images.image, tile_size=(100, 100)),
)
```

**tile\_iterator options:**

* `tile_size` — Size of each tile as `(width, height)`
* `overlap` — Overlap between adjacent tiles as `(width, height)`

[SDK Reference:
tile\_iterator](/sdk/latest/image)

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
tiles.select(tiles.tile_coord, tiles.tile).sample(n=4).collect()
```

<div style={{ 'margin': '0px 20px 0px 20px' }} dangerouslySetInnerHTML={{ __html: quartoRawHtml[4] }} />
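To estimate how many tiles a given `tile_size` and `overlap` produce,
the grid can be sketched in plain Python. `tile_origins` is a
hypothetical helper and the 250x150 image size is made up for the
example. The sketch assumes tiles advance by `tile_size - overlap` in
each direction and that the grid extends until the image is fully
covered. How the real iterator treats tiles that extend past the image
edge is not modeled here.

```python
def tile_origins(image_size, tile_size, overlap=(0, 0)):
    """Return the top-left (x, y) corner of every tile, row by row."""
    width, height = image_size
    step_x = tile_size[0] - overlap[0]
    step_y = tile_size[1] - overlap[1]
    if step_x <= 0 or step_y <= 0:
        raise ValueError('overlap must be smaller than tile_size')
    # Ceiling division: enough tiles to cover the image in each direction.
    n_x = -(-(width - overlap[0]) // step_x)
    n_y = -(-(height - overlap[1]) // step_y)
    return [(col * step_x, row * step_y) for row in range(n_y) for col in range(n_x)]


origins = tile_origins((250, 150), tile_size=(100, 100))
print(len(origins))  # 6
print(origins)
# [(0, 0), (100, 0), (200, 0), (0, 100), (100, 100), (200, 100)]
```

A 250x150 image with 100x100 tiles yields a 3x2 grid, so one source row
becomes six tile rows under these assumptions.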

### Split audio into chunks

Use `audio_splitter` to divide audio files into time-based segments for
transcription or analysis.

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
from pixeltable.functions.audio import audio_splitter

audio = pxt.create_table('split_demo/audio', {'audio': pxt.Audio})
audio.insert(
    [
        {
            'audio': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3'
        }
    ]
)
```

<pre style={{ 'margin': '-20px 20px 0px 20px', 'padding': '0px', 'background-color': 'transparent', 'color': 'black' }}>
  Inserted 1 row with 0 errors in 0.67 s (1.50 rows/s)
  1 row inserted.
</pre>

```python theme={"theme":{"light":"light-plus","dark":"dark-plus"}}
audio_segments = pxt.create_view(
    'split_demo/audio_chunks',
    audio,
    iterator=audio_splitter(audio.audio, duration=30.0, overlap=2.0),
)
audio_segments.select(
    audio_segments.segment_start, audio_segments.segment_end
).limit(5).collect()
```

<div style={{ 'margin': '0px 20px 0px 20px' }} dangerouslySetInnerHTML={{ __html: quartoRawHtml[5] }} />

**audio\_splitter options:**

* `duration` — Duration of each chunk in seconds
* `overlap` — Overlap between chunks in seconds
* `min_segment_duration` — Drop last chunk if shorter than this

[SDK Reference:
audio\_splitter](/sdk/latest/audio)
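Overlap increases the number of chunks, which matters when each chunk is
sent to a transcription model. The sketch below estimates the row count
in plain Python, assuming each chunk starts `duration - overlap` seconds
after the previous one and ignoring `min_segment_duration`.
`count_chunks` is a hypothetical helper, and the 600-second length is an
assumed value rather than the measured length of the sample file.

```python
import math


def count_chunks(total, duration, overlap=0.0):
    """Estimate how many chunks cover `total` seconds of audio."""
    stride = duration - overlap
    if stride <= 0:
        raise ValueError('duration must be greater than overlap')
    if total <= 0:
        return 0
    if total <= duration:
        return 1
    # The first chunk covers `duration` seconds; each further chunk
    # advances coverage by `stride` seconds.
    return math.ceil((total - duration) / stride) + 1


print(count_chunks(600.0, duration=30.0))               # 20
print(count_chunks(600.0, duration=30.0, overlap=2.0))  # 22
```

Under these assumptions, a 2-second overlap on 30-second chunks adds two
extra chunks over ten minutes of audio.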

## See also

* [Split documents for
  RAG](/howto/cookbooks/text/doc-chunk-for-rag)
* [Extract frames from
  videos](/howto/cookbooks/video/video-extract-frames)
* [Transcribe audio
  files](/howto/cookbooks/audio/audio-transcribe)
