RAG Operations in Pixeltable
In this tutorial, we'll explore Pixeltable's flexible handling of RAG operations on unstructured text. In a traditional AI workflow, such operations might be implemented as a Python script that runs on a periodic schedule or in response to certain events. In Pixeltable, as with everything else, they are implemented as persistent table operations that update incrementally as new data becomes available. In our tutorial workflow, we'll chunk Wikipedia articles in various ways with a document splitter, then apply several kinds of embeddings to the chunks.
Set Up the Table Structure
We start by installing the necessary dependencies, creating a fresh Pixeltable directory `rag_ops_demo` (dropping any existing copy first), and setting up the table structure for our new workflow.
%pip install -q pixeltable sentence-transformers spacy tiktoken
import pixeltable as pxt
# Ensure a clean slate for the demo
pxt.drop_dir('rag_ops_demo', force=True)
# Create the Pixeltable workspace
pxt.create_dir('rag_ops_demo')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory `rag_ops_demo`.
<pixeltable.catalog.dir.Dir at 0x33135d1f0>
Creating Tables and Views
Now we'll create the tables that represent our workflow, starting with a table to hold references to source documents. The table contains a single column `source_doc` whose elements have type `pxt.Document`, representing a general document instance. In this tutorial, we'll be working with HTML documents, but Pixeltable supports a range of other document types, such as Markdown and PDF.
docs = pxt.create_table(
'rag_ops_demo.docs',
{'source_doc': pxt.Document}
)
Created table `docs`.
If we take a peek at the `docs` table, we see its very simple structure.
docs
Column Name | Type | Computed With |
---|---|---|
source_doc | Document |
Next we create a view to represent chunks of our HTML documents. A Pixeltable view is a virtual table, which is dynamically derived from a source table by applying a transformation and/or selecting a subset of data. In this case, our view represents a one-to-many transformation from source documents into individual sentences. This is achieved using Pixeltable's built-in `DocumentSplitter` class.
Note that the `docs` table is currently empty, so creating this view doesn't actually do anything yet: it simply defines an operation that we want Pixeltable to execute when it sees new data.
from pixeltable.iterators.document import DocumentSplitter
sentences = pxt.create_view(
'rag_ops_demo.sentences', # Name of the view
docs, # Table from which the view is derived
iterator=DocumentSplitter.create(
document=docs.source_doc,
separators='sentence', # Chunk docs into sentences
metadata='title,heading,sourceline'
)
)
Created view `sentences` with 0 rows, 0 exceptions.
Let's take a peek at the new `sentences` view.
sentences
Column Name | Type | Computed With |
---|---|---|
pos | Required[Int] | |
text | Required[String] | |
title | String | |
heading | Json | |
sourceline | Int | |
source_doc | Document |
We see that `sentences` inherits the `source_doc` column from `docs`, together with some new fields:
- `pos`: The position in the source document where the sentence appears.
- `text`: The text of the sentence.
- `title`, `heading`, and `sourceline`: The metadata we requested when we set up the view.
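To make the one-to-many shape of this view concrete, here is a minimal pure-Python sketch. It is not how `DocumentSplitter` is implemented: the function name `split_into_sentences`, the naive regex split, the zero-based `pos`, and the example URL are all assumptions made for illustration.

```python
import re


def split_into_sentences(source_doc: str, text: str) -> list[dict]:
    """Expand one document into one row per sentence (naive regex split)."""
    # Split on whitespace that follows '.', '!' or '?'. Real sentence
    # segmentation is more involved; this only illustrates the row shape.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    sentences = [part for part in parts if part]
    return [
        # Each output row carries its position (zero-based here, an
        # assumption), its text, and the column inherited from the base row.
        {'pos': pos, 'text': sentence, 'source_doc': source_doc}
        for pos, sentence in enumerate(sentences)
    ]


rows = split_into_sentences(
    'https://example.com/doc.html',  # hypothetical URL
    'Chagall was born in 1887. He travelled to Paris! Did he return? Yes.',
)
for row in rows:
    print(row['pos'], row['text'])
```

One input row becomes four output rows here, each still pointing back at the same `source_doc`, which mirrors how the view's rows relate to the base table.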
Data Ingestion
Ok, now it's time to insert some data into our workflow. A document in Pixeltable is just a URL; the following command inserts a single row into the `docs` table with the `source_doc` field set to the specified URL:
docs.insert([{'source_doc': 'https://en.wikipedia.org/wiki/Marc_Chagall'}])
Inserting rows into `docs`: 1 rows [00:00, 705.52 rows/s]
Inserting rows into `sentences`: 1460 rows [00:00, 3744.19 rows/s]
Inserted 1461 rows with 0 errors.
UpdateStatus(num_rows=1461, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])
We can see that two things happened. First, a single row was inserted into `docs`, containing the URL representing our source document. Then, the view `sentences` was incrementally updated by applying the `DocumentSplitter` according to the definition of the view. The reported total of 1461 rows is the 1 row in `docs` plus the 1460 rows in `sentences`. This illustrates an important principle in Pixeltable: by default, anytime Pixeltable sees new data, the update is incrementally propagated to any downstream views or computed columns.
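The propagation principle can be sketched with a toy in-memory model. Everything below (`ToyTable`, `ToyView`, the `words` function, the `body` field) is hypothetical and says nothing about Pixeltable's internals; it only shows that an insert touches the new base rows and that the reported count sums base and view rows.

```python
from typing import Callable


class ToyTable:
    """In-memory stand-in for a base table with derived views."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self.views: list['ToyView'] = []

    def insert(self, new_rows: list[dict]) -> int:
        """Store new rows, push only those rows to every view, return the total."""
        self.rows.extend(new_rows)
        total = len(new_rows)
        for view in self.views:
            total += view.update(new_rows)
        return total


class ToyView:
    """Derived rows produced by applying `expand` to each base row."""

    def __init__(self, base: ToyTable, expand: Callable[[dict], list[dict]]) -> None:
        self.expand = expand
        self.rows: list[dict] = []
        base.views.append(self)
        # A view created over a non-empty table starts with the existing data.
        self.update(base.rows)

    def update(self, new_base_rows: list[dict]) -> int:
        """Expand only the given base rows and return how many rows were added."""
        added = [out for row in new_base_rows for out in self.expand(row)]
        self.rows.extend(added)
        return len(added)


def words(row: dict) -> list[dict]:
    """Toy one-to-many transformation: one output row per word."""
    return [{'pos': i, 'text': w} for i, w in enumerate(row['body'].split())]


docs_toy = ToyTable()
words_view = ToyView(docs_toy, words)
first = docs_toy.insert([{'body': 'one two three'}])  # 1 base row + 3 view rows
second = docs_toy.insert([{'body': 'four five'}])     # 1 base row + 2 view rows
print(first, second, len(words_view.rows))
```

The second insert expands only the second document; the rows derived from the first one are left untouched.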
We can see the effect of the insertion with the `select` command. There's a single row in `docs`:
docs.select(docs.source_doc.fileurl).show()
source_doc_fileurl |
---|
https://en.wikipedia.org/wiki/Marc_Chagall |
And here are the first 20 rows in `sentences`. The content of the article is broken into individual sentences, as expected.
sentences.select(sentences.text, sentences.heading).show(20)
text | heading |
---|---|
Marc Chagall - Wikipedia Jump to content Search Search | {} |
Marc Chagall 81 languages Afrikaans Alemannisch العربية | {"h1": "Marc Chagall"} |
Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी | {"h1": "Marc Chagall"} |
Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi Svenska ไทย | {"h1": "Marc Chagall"} |
Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש | {"h1": "Marc Chagall"} |
粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) "Chagall" redirects here. | {"h1": "Marc Chagall"} |
For other uses, see Chagall (disambiguation) . | {"h1": "Marc Chagall"} |
Marc Chagall Chagall, c. 1920 | {"h1": "Marc Chagall"} |
Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) | {"h1": "Marc Chagall"} |
[1] Died 28 March 1985 (1985-03-28) (aged 97) | {"h1": "Marc Chagall"} |
Saint-Paul-de-Vence , France Nationality Russian Empire, | {"h1": "Marc Chagall"} |
later French | {"h1": "Marc Chagall"} |
[2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ( m. 1915; died 1944) | {"h1": "Marc Chagall"} |
Valentina (Vava) Brodsky ( m. 1952) | {"h1": "Marc Chagall"} |
| {"h1": "Marc Chagall"} |
[3] Children 2 | {"h1": "Marc Chagall"} |
[4] | {"h1": "Marc Chagall"} |
Marc Chagall | {"h1": "Marc Chagall"} |
[a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist. | {"h1": "Marc Chagall"} |
[b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints. | {"h1": "Marc Chagall"} |
Experimenting with Chunking
Of course, chunking into sentences isn't the only way to split a document. Perhaps we want to experiment with different chunking methodologies, in order to see which one performs best in a particular application. Pixeltable makes it easy to do this, by creating several views of the same source table. Here are a few examples. Notice that as each new view is created, it is initially populated from the data already in `docs`.
chunks = pxt.create_view(
'rag_ops_demo.chunks', docs,
iterator=DocumentSplitter.create(
document=docs.source_doc,
separators='paragraph,token_limit',
limit=2048,
overlap=0,
metadata='title,heading,sourceline'
)
)
Inserting rows into `chunks`: 205 rows [00:00, 25110.46 rows/s]
Created view `chunks` with 205 rows, 0 exceptions.
short_chunks = pxt.create_view(
'rag_ops_demo.short_chunks', docs,
iterator=DocumentSplitter.create(
document=docs.source_doc,
separators='paragraph,token_limit',
limit=72,
overlap=0,
metadata='title,heading,sourceline'
)
)
Inserting rows into `short_chunks`: 531 rows [00:00, 32679.53 rows/s]
Created view `short_chunks` with 531 rows, 0 exceptions.
short_char_chunks = pxt.create_view(
'rag_ops_demo.short_char_chunks', docs,
iterator=DocumentSplitter.create(
document=docs.source_doc,
separators='paragraph,char_limit',
limit=72,
overlap=0,
metadata='title,heading,sourceline'
)
)
Inserting rows into `short_char_chunks`: 1764 rows [00:00, 24326.96 rows/s]
Created view `short_char_chunks` with 1764 rows, 0 exceptions.
chunks.select(chunks.text, chunks.heading).show(20)
text | heading |
---|---|
Marc Chagall - Wikipedia Jump to content Search Search | {} |
Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi Svenska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) "Chagall" redirects here. For other uses, see Chagall (disambiguation) . | {"h1": "Marc Chagall"} |
Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [1] Died 28 March 1985 (1985-03-28) (aged 97) Saint-Paul-de-Vence , France Nationality Russian Empire, later French [2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ( m. 1915; died 1944) Valentina (Vava) Brodsky ( m. 1952) [3] Children 2 [4] | {"h1": "Marc Chagall"} |
Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist. [b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints. | {"h1": "Marc Chagall"} |
Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlement of the Russian Empire. Before World War I , he travelled between Saint Petersburg , Paris , and Berlin . During that period, he created his own mixture and style of modern art, based on his ideas of Eastern European and Jewish folklore. He spent the wartime years in his native Belarus, becoming one of the country's most distinguished artists and a member of the modernist avant-garde , founding the Vitebsk Arts College . He later worked in and near Moscow in difficult conditions during hard times in Russia following the Bolshevik Revolution , before leaving again for Paris in 1923. During World War II , he escaped occupied France to the United States, where he lived in New York City for seven years before returning to France in 1948. | {"h1": "Marc Chagall"} |
Art critic Robert Hughes referred to Chagall as "the quintessential Jewish artist of the twentieth century". According to art historian Michael J. Lewis, Chagall was considered to be "the last survivor of the first generation of European modernists". For decades, he "had also been respected as the world's pre-eminent Jewish artist". [15] Using the medium of stained glass, he produced windows for the cathedrals of Reims and Metz as well as the Fraumünster in Zürich , windows for the UN and th ...... e experienced modernism's "golden age" in Paris, where "he synthesized the art forms of Cubism , Symbolism , and Fauvism , and the influence of Fauvism gave rise to Surrealism ". Yet throughout these phases of his style "he remained most emphatically a Jewish artist, whose work was one long dreamy reverie of life in his native village of Vitebsk." [16] "When Matisse dies", Pablo Picasso remarked in the 1950s, "Chagall will be the only painter left who understands what colour really is". [17] | {"h1": "Marc Chagall"} |
Early life and education [ edit ] | {"h1": "Marc Chagall", "h2": "Early life and education[edit]"} |
Early life [ edit ] Marc Chagall's childhood home in Vitebsk , Belarus. Currently site of the Marc Chagall Museum . Marc Chagall, 1912, The Spoonful of Milk (La Cuillerée de lait) , gouache on paper | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} |
Marc Chagall was born Moishe Shagal in 1887, into a Jewish family in Liozna , [1] near the city of Vitebsk , Belarus, then part of the Russian Empire . [c] [18] At the time of his birth, Vitebsk's population was about 66,000. Half of the population was Jewish. [16] A picturesque city of churches and synagogues, it was called "Russian Toledo " by artist Ilya Repin , after the cosmopolitan city of the former Spanish Empire . [19] Because the city was built mostly of wood, little of it survived years of occupation and destruction during World War II. | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} |
Chagall was the eldest of nine children. The family name, Shagal, is a variant of the name Segal , which in a Jewish community was usually borne by a Levitic family. [20] His father, Khatskl (Zachar) Shagal, was employed by a herring merchant, and his mother, Feige-Ite, sold groceries from their home. His father worked hard, carrying heavy barrels, earning 20 roubles each month (the average wages across the Russian Empire was 13 roubles a month). Chagall wrote of those early years: | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} |
Day after day, winter and summer, at six o'clock in the morning, my father got up and went off to the synagogue. There he said his usual prayer for some dead man or other. On his return he made ready the samovar , drank some tea and went to work. Hellish work, the work of a galley-slave. Why try to hide it? How tell about it? No word will ever ease my father's lot... There was always plenty of butter and cheese on our table. Buttered bread, like an eternal symbol, was never out of my childish hands. [21] | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} |
One of the main sources of income for the Jewish population of the town was from the manufacture of clothing that was sold throughout the Russian Empire. They also made furniture and various agricultural tools. [22] From the late 18th century to the First World War, the Imperial Russian government confined Jews to living within the Pale of Settlement , which included modern Ukraine, Belarus, Poland, Lithuania, and Latvia, almost exactly corresponding to the territory of the Polish-Lithuanian Commonwealth which was taken over by Imperial Russia in the late 18th century. That led to the creation of Jewish market-villages ( shtetls ) throughout today's Eastern Europe, with their own markets, schools, hospitals, and other community institutions. [23] : 14 | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} |
Chagall wrote as a boy; "I felt at every step that I was a Jew—people made me feel it". [24] [25] During a pogrom , Chagall wrote that: "The street lamps are out. I feel panicky, especially in front of butchers' windows. There you can see calves that are still alive lying beside the butchers' hatchets and knives". [25] [26] When asked by some pogromniks "Jew or not?", Chagall remembered thinking: "My pockets are empty, my fingers sensitive, my legs weak and they are out for blood. My death would be futile. I so wanted to live". [25] [26] Chagall denied being a Jew, leading the pogromniks to shout "All right! Get along!" [25] [26] | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} |
Most of what is known about Chagall's early life has come from his autobiography, My Life . In it, he described the major influence that the culture of Hasidic Judaism had on his life as an artist. Chagall related how he realised that the Jewish traditions in which he had grown up were fast disappearing and that he needed to document them. From the 1730s, Vitebsk itself had been a centre of that culture, with its teachings derived from the Kabbalah . Chagall scholar, Susan Tumarkin Goodman, describes the links and sources of his art to his early home: | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} |
Chagall's art can be understood as the response to a situation that has long marked the history of Russian Jews. Though they were cultural innovators who made important contributions to the broader society, Jews were considered outsiders in a frequently hostile society ... Chagall himself was born of a family steeped in religious life; his parents were observant Hasidic Jews who found spiritual satisfaction in a life defined by their faith and organized by prayer. [23] : 14 | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} |
Art education [ edit ] Portrait of Chagall by Yehuda Pen , his first art teacher in Vitebsk | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Art education[edit]"} |
In the Russian Empire at that time, Jewish children were not allowed to attend regular schools and universities imposed a quota on Jews . Their movement within the city was also restricted. Chagall therefore received his primary education at the local Jewish religious school, where he studied Hebrew and the Bible. At the age of 13, his mother tried to enrol him in a regular high school, and he recalled: "But in that school, they don't take Jews. Without a moment's hesitation, my courageous mother walks up to a professor." She offered the headmaster 50 roubles to let him attend, which he accepted. [21] | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Art education[edit]"} |
A turning point of his artistic life came when he first noticed a fellow student drawing. Baal-Teshuva writes that, for the young Chagall, watching someone draw "was like a vision, a revelation in black and white". Chagall would later say that there was no art of any kind in his family's home and the concept was totally alien to him. When Chagall asked the schoolmate how he learned to draw, his friend replied, "Go and find a book in the library, idiot, choose any picture you like, and just copy it". He soon began copying images from books and found the experience so rewarding he then decided he wanted to become an artist. [22] | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Art education[edit]"} |
Goodman writes that Chagall eventually confided to his mother, "I want to be a painter", although she could not yet understand his sudden interest in art or why he would choose a vocation that "seemed so impractical". The young Chagall explained: "There's a place in town; if I'm admitted and if I complete the course, I'll come out a regular artist. I'd be so happy!" It was 1906, and he had noticed the studio of Yehuda (Yuri) Pen , a realist artist who operated a drawing school in Vitebsk. At the same time, future artists El Lissitzky and Ossip Zadkine were also Pen's students. Due to Chagall's youth and lack of income, Pen offered to teach him free of charge. However, after a few months at the school, Chagall realized that academic portrait painting did not suit him. [22] | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Art education[edit]"} |
Artistic inspiration [ edit ] Marc Chagall, 1912, Calvary ( Golgotha ) , oil on canvas, 174.6 × 192.4 cm, Museum of Modern Art , New York. Alternative titles: Kreuzigung Bild 2 Christus gewidmet [Golgotha. Crucifixion. Dedicated to Christ] . Sold through Galerie Der Sturm (Herwarth Walden), Berlin to Bernhard Koehler (1849–1927), Berlin, 1913. Exhibited: Erster Deutscher Herbstsalon , Berlin, 1913 | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Artistic inspiration[edit]"} |
short_chunks.select(short_chunks.text, short_chunks.heading).show(20)
text | heading |
---|---|
Marc Chagall - Wikipedia Jump to content Search Search | {} |
Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡ | {"h1": "Marc Chagall"} |
ортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한 | {"h1": "Marc Chagall"} |
국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lë | {"h1": "Marc Chagall"} |
tzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemont | {"h1": "Marc Chagall"} |
èis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi | {"h1": "Marc Chagall"} |
Svenska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) "Chagall" redirects here | {"h1": "Marc Chagall"} |
. For other uses, see Chagall (disambiguation) . | {"h1": "Marc Chagall"} |
Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [1] Died 28 March 1985 (1985-03- | {"h1": "Marc Chagall"} |
28) (aged 97) Saint-Paul-de-Vence , France Nationality Russian Empire, later French [2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ( m. 1915; died 1944) Val | {"h1": "Marc Chagall"} |
entina (Vava) Brodsky ( m. 1952) [3] Children 2 [4] | {"h1": "Marc Chagall"} |
Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist. [b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created | {"h1": "Marc Chagall"} |
works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints. | {"h1": "Marc Chagall"} |
Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlement of the Russian Empire. Before World War I , he travelled between Saint Petersburg , Paris , and Berlin . During that period, he created his own mixture and style of modern art, based on his | {"h1": "Marc Chagall"} |
ideas of Eastern European and Jewish folklore. He spent the wartime years in his native Belarus, becoming one of the country's most distinguished artists and a member of the modernist avant-garde , founding the Vitebsk Arts College . He later worked in and near Moscow in difficult conditions during hard times in Russia following the Bolshevik Revolution , before leaving again for Paris | {"h1": "Marc Chagall"} |
in 1923. During World War II , he escaped occupied France to the United States, where he lived in New York City for seven years before returning to France in 1948. | {"h1": "Marc Chagall"} |
Art critic Robert Hughes referred to Chagall as "the quintessential Jewish artist of the twentieth century". According to art historian Michael J. Lewis, Chagall was considered to be "the last survivor of the first generation of European modernists". For decades, he "had also been respected as the world's pre-eminent Jewish artist". [15] | {"h1": "Marc Chagall"} |
Using the medium of stained glass, he produced windows for the cathedrals of Reims and Metz as well as the Fraumünster in Zürich , windows for the UN and the Art Institute of Chicago and the Jerusalem Windows in Israel. He also did large-scale paintings, including part of the ceiling of the Paris Opéra . He experienced | {"h1": "Marc Chagall"} |
modernism's "golden age" in Paris, where "he synthesized the art forms of Cubism , Symbolism , and Fauvism , and the influence of Fauvism gave rise to Surrealism ". Yet throughout these phases of his style "he remained most emphatically a Jewish artist, whose work was one long dreamy reverie of | {"h1": "Marc Chagall"} |
life in his native village of Vitebsk." [16] "When Matisse dies", Pablo Picasso remarked in the 1950s, "Chagall will be the only painter left who understands what colour really is". [17] | {"h1": "Marc Chagall"} |
short_char_chunks.select(short_char_chunks.text, short_char_chunks.heading).show(20)
text | heading |
---|---|
Marc Chagall - Wikipedia Jump to content Search Search | {} |
Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտա | {"h1": "Marc Chagall"} |
հայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (та | {"h1": "Marc Chagall"} |
рашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά | {"h1": "Marc Chagall"} |
Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी Hrva | {"h1": "Marc Chagall"} |
tski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswah | {"h1": "Marc Chagall"} |
ili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy م | {"h1": "Marc Chagall"} |
صرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbe | {"h1": "Marc Chagall"} |
kcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Ro | {"h1": "Marc Chagall"} |
mână Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina S | {"h1": "Marc Chagall"} |
lovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi S | {"h1": "Marc Chagall"} |
venska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit lin | {"h1": "Marc Chagall"} |
ks From Wikipedia, the free encyclopedia Russian-French artist (1887–198 | {"h1": "Marc Chagall"} |
5) "Chagall" redirects here. For other uses, see Chagall (disambiguation | {"h1": "Marc Chagall"} |
) . | {"h1": "Marc Chagall"} |
Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1 | {"h1": "Marc Chagall"} |
887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [ | {"h1": "Marc Chagall"} |
1] Died 28 March 1985 (1985-03-28) (aged 97) Saint-Paul-de-Vence , Franc | {"h1": "Marc Chagall"} |
e Nationality Russian Empire, later French [2] Known for Painting staine | {"h1": "Marc Chagall"} |
d glass Notable work See list of artworks by Marc Chagall Movement Cubis | {"h1": "Marc Chagall"} |
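The mid-word breaks in the `short_char_chunks` output above are what a hard character limit produces. The following pure-Python sketch shows the idea; `chunk_by_chars` and `chunk_paragraphs` are hypothetical helpers, the blank-line paragraph rule is an assumption, and the `overlap` semantics (consecutive pieces sharing that many characters) are assumed rather than taken from Pixeltable's documentation.

```python
def chunk_by_chars(text: str, limit: int, overlap: int = 0) -> list[str]:
    """Cut text into pieces of at most `limit` characters.

    Consecutive pieces share `overlap` characters (assumed semantics).
    """
    if limit <= 0 or not 0 <= overlap < limit:
        raise ValueError('need limit > 0 and 0 <= overlap < limit')
    step = limit - overlap
    pieces: list[str] = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + limit])
        if start + limit >= len(text):
            break  # the last piece already reaches the end of the text
        start += step
    return pieces


def chunk_paragraphs(doc: str, limit: int) -> list[str]:
    """Split on blank lines first, then apply the character limit per paragraph."""
    pieces: list[str] = []
    for paragraph in doc.split('\n\n'):
        paragraph = paragraph.strip()
        if paragraph:
            pieces.extend(chunk_by_chars(paragraph, limit))
    return pieces


print(chunk_by_chars('stained glass windows', limit=10))
print(chunk_by_chars('abcdefghij', limit=4, overlap=2))
print(chunk_paragraphs('abcdef\n\nxy', limit=4))
```

The limit is applied without regard to word boundaries, so a word that straddles the cut is split. The `short_chunks` output shows similar breaks, presumably because token boundaries can also fall inside a word.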
Now let's add a few more documents to our workflow. Notice how all of the downstream views are updated incrementally, processing just the new documents as they are inserted.
urls = [
'https://en.wikipedia.org/wiki/Pierre-Auguste_Renoir',
'https://en.wikipedia.org/wiki/Henri_Matisse',
'https://en.wikipedia.org/wiki/Marcel_Duchamp'
]
docs.insert({'source_doc': url} for url in urls)
Inserting rows into `docs`: 3 rows [00:00, 2279.51 rows/s]
Inserting rows into `sentences`: 2106 rows [00:02, 753.12 rows/s]
Inserting rows into `chunks`: 276 rows [00:00, 30292.50 rows/s]
Inserting rows into `short_chunks`: 812 rows [00:00, 35848.00 rows/s]
Inserting rows into `short_char_chunks`: 2638 rows [00:00, 13724.29 rows/s]
Inserted 5835 rows with 0 errors.
UpdateStatus(num_rows=5835, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])
Further Experiments
This is a good time to mention another important guiding principle of Pixeltable. The preceding examples all used the built-in `DocumentSplitter` class with various configurations. That's probably fine as a first cut or to prototype an application quickly, and it might be sufficient for some applications. But other applications might want to do more sophisticated kinds of chunking, implementing their own specialized logic or leveraging third-party tools. Pixeltable imposes no constraints on the AI or RAG operations a workflow uses: the iterator interface is highly general, and it's easy to implement new operations or adapt existing code or third-party tools into the Pixeltable workflow.
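As one example of specialized chunking logic, the sketch below groups sentences into overlapping windows. It is plain Python and deliberately does not use Pixeltable's iterator interface, whose details are not covered here; the function name `sliding_sentence_windows` and its `size` and `stride` parameters are hypothetical.

```python
from typing import Iterator


def sliding_sentence_windows(
    sentences: list[str], size: int, stride: int
) -> Iterator[dict]:
    """Yield windows of `size` sentences, advancing `stride` sentences each time."""
    if size <= 0 or stride <= 0:
        raise ValueError('size and stride must be positive')
    pos = 0
    for start in range(0, len(sentences), stride):
        window = sentences[start:start + size]
        yield {'pos': pos, 'text': ' '.join(window)}
        pos += 1
        if start + size >= len(sentences):
            break  # this window already covers the final sentence


demo_sentences = ['A.', 'B.', 'C.', 'D.', 'E.']
windows = list(sliding_sentence_windows(demo_sentences, size=3, stride=2))
for window in windows:
    print(window)
```

With `stride` smaller than `size`, neighbouring windows share sentences, which is one way to keep context that a hard cut would lose.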
Computing Embeddings
Next, let's look at how embedding indices can be added seamlessly to existing Pixeltable workflows. To compute our embeddings, we'll use the Hugging Face `sentence-transformers` package, running it over the `chunks` view that broke our documents up into larger paragraphs. Pixeltable has a built-in `sentence_transformer` adapter, and all we have to do is add a new column that leverages it. Pixeltable takes care of the rest, applying the new column to all existing data in the view.
from pixeltable.functions.huggingface import sentence_transformer
chunks['minilm_embed'] = sentence_transformer(
chunks.text, model_id='paraphrase-MiniLM-L6-v2'
)
Computing cells: 100%|███████████████████████████████████████| 481/481 [00:01<00:00, 379.93 cells/s]
Added 481 column values with 0 errors.
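The new column is a computed column: it is defined as a function on top of existing data and updated incrementally as new data is added to the workflow. The 481 cells computed above are the 205 chunks from the first article plus the 276 chunks from the three articles added later. Let's have a look at how the new column affected the `chunks` view.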
chunks
Column Name | Type | Computed With |
---|---|---|
pos | Required[Int] | |
text | Required[String] | |
title | String | |
heading | Json | |
sourceline | Int | |
minilm_embed | Required[Array[(384,), Float]] | sentence_transformer(text, model_id='paraphrase-MiniLM-L6-v2') |
source_doc | Document |
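The behavior of a computed column (backfill once over existing rows, then evaluate only for newly inserted rows) can be sketched with a toy class. `ToyComputedTable` and its methods are hypothetical and are not Pixeltable's API; the `n_chars` column stands in for an embedding function.

```python
from typing import Callable


class ToyComputedTable:
    """In-memory table whose computed columns are functions of each row."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self.computed: dict[str, Callable[[dict], object]] = {}

    def add_computed_column(self, name: str, fn: Callable[[dict], object]) -> int:
        """Register `fn` and backfill it over existing rows; return cells computed."""
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)
        return len(self.rows)

    def insert(self, new_rows: list[dict]) -> int:
        """Store new rows, evaluating computed columns for those rows only."""
        cells = 0
        for row in new_rows:
            stored = dict(row)
            for name, fn in self.computed.items():
                stored[name] = fn(stored)
                cells += 1
            self.rows.append(stored)
        return cells


table = ToyComputedTable()
table.insert([{'text': 'stained glass'}, {'text': 'oil on canvas'}])
backfilled = table.add_computed_column('n_chars', lambda row: len(row['text']))
new_cells = table.insert([{'text': 'gouache'}])
print(backfilled, new_cells, [row['n_chars'] for row in table.rows])
```

Adding the column computes one cell per existing row, and the later insert computes a single new cell rather than recomputing the whole column.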
chunks.select(chunks.text, chunks.heading, chunks.minilm_embed).head()
text | heading | minilm_embed |
---|---|---|
Marc Chagall - Wikipedia Jump to content Search Search | {} | [-0.262 -0.119 -0.133 0.048 0.12 -0.006 ... -0.556 0.372 0.468 -0.234 -0.226 0.164] |
Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi Svenska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) "Chagall" redirects here. For other uses, see Chagall (disambiguation) . | {"h1": "Marc Chagall"} | [-0.136 0.401 -0.53 -0.181 -0.453 -0.125 ... -0.184 0.122 0.644 -0.54 0.188 0.203] |
Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [1] Died 28 March 1985 (1985-03-28) (aged 97) Saint-Paul-de-Vence , France Nationality Russian Empire, later French [2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ( m. 1915; died 1944) Valentina (Vava) Brodsky ( m. 1952) [3] Children 2 [4] | {"h1": "Marc Chagall"} | [-0.005 0.34 -0.315 0.17 -0.124 0.384 ... -0.144 -0.131 0.104 -0.412 -0.195 -0.058] |
Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist. [b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints. | {"h1": "Marc Chagall"} | [ 0.053 0.138 -0.219 0.192 -0.1 0.234 ... -0.138 -0.294 0.306 -0.012 -0.059 0.007] |
Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlement of the Russian Empire. Before World War I , he travelled between Saint Petersburg , Paris , and Berlin . During that period, he created his own mixture and style of modern art, based on his ideas of Eastern European and Jewish folklore. He spent the wartime years in his native Belarus, becoming one of the country's most distinguished artists and a member of the modernist avant-garde , founding the Vitebsk Arts College . He later worked in and near Moscow in difficult conditions during hard times in Russia following the Bolshevik Revolution , before leaving again for Paris in 1923. During World War II , he escaped occupied France to the United States, where he lived in New York City for seven years before returning to France in 1948. | {"h1": "Marc Chagall"} | [ 0.013 0.248 -0.692 0.143 -0.379 0.254 ... -0.232 -0.157 -0.018 -0.225 -0.208 -0.095] |
Art critic Robert Hughes referred to Chagall as "the quintessential Jewish artist of the twentieth century". According to art historian Michael J. Lewis, Chagall was considered to be "the last survivor of the first generation of European modernists". For decades, he "had also been respected as the world's pre-eminent Jewish artist". [15] Using the medium of stained glass, he produced windows for the cathedrals of Reims and Metz as well as the Fraumünster in Zürich , windows for the UN and th ...... e experienced modernism's "golden age" in Paris, where "he synthesized the art forms of Cubism , Symbolism , and Fauvism , and the influence of Fauvism gave rise to Surrealism ". Yet throughout these phases of his style "he remained most emphatically a Jewish artist, whose work was one long dreamy reverie of life in his native village of Vitebsk." [16] "When Matisse dies", Pablo Picasso remarked in the 1950s, "Chagall will be the only painter left who understands what colour really is". [17] | {"h1": "Marc Chagall"} | [-0.172 0.348 -0.307 0.034 -0.071 0.111 ... -0.31 -0.011 0.302 -0.273 -0.163 0.152] |
Early life and education [ edit ] | {"h1": "Marc Chagall", "h2": "Early life and education[edit]"} | [-0.213 0.418 0.094 0.135 -0.069 0.265 ... -0.548 0.164 0.075 0.205 0.309 0.277] |
Early life [ edit ] Marc Chagall's childhood home in Vitebsk , Belarus. Currently site of the Marc Chagall Museum . Marc Chagall, 1912, The Spoonful of Milk (La Cuillerée de lait) , gouache on paper | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} | [-0.04 0.143 -0.357 0.412 -0.331 0.201 ... -0.006 -0.057 0.255 0.181 0.018 -0.021] |
Marc Chagall was born Moishe Shagal in 1887, into a Jewish family in Liozna , [1] near the city of Vitebsk , Belarus, then part of the Russian Empire . [c] [18] At the time of his birth, Vitebsk's population was about 66,000. Half of the population was Jewish. [16] A picturesque city of churches and synagogues, it was called "Russian Toledo " by artist Ilya Repin , after the cosmopolitan city of the former Spanish Empire . [19] Because the city was built mostly of wood, little of it survived years of occupation and destruction during World War II. | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} | [ 0.123 0.198 -0.496 0.154 -0.368 0.078 ... -0.057 -0.141 -0.063 -0.096 -0.136 -0.232] |
Chagall was the eldest of nine children. The family name, Shagal, is a variant of the name Segal , which in a Jewish community was usually borne by a Levitic family. [20] His father, Khatskl (Zachar) Shagal, was employed by a herring merchant, and his mother, Feige-Ite, sold groceries from their home. His father worked hard, carrying heavy barrels, earning 20 roubles each month (the average wages across the Russian Empire was 13 roubles a month). Chagall wrote of those early years: | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} | [-0.19 0.266 -0.4 0.129 -0.493 0.063 ... -0.194 -0.2 0.322 0.024 -0.068 0.031] |
Similarly, we might want to add a CLIP embedding to our workflow; once again, it's just another computed column:
from pixeltable.functions.huggingface import clip_text
# Embed each chunk's text with CLIP and store the result as a computed column
chunks['clip_embed'] = clip_text(
    chunks.text, model_id='openai/clip-vit-base-patch32'
)
Computing cells: 100%|███████████████████████████████████████| 481/481 [00:02<00:00, 199.09 cells/s]
Added 481 column values with 0 errors.
chunks
Column Name | Type | Computed With |
---|---|---|
pos | Required[Int] | |
text | Required[String] | |
title | String | |
heading | Json | |
sourceline | Int | |
minilm_embed | Required[Array[(384,), Float]] | sentence_transformer(text, model_id='paraphrase-MiniLM-L6-v2') |
clip_embed | Required[Array[(512,), Float]] | clip_text(text, model_id='openai/clip-vit-base-patch32') |
source_doc | Document |
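The schema lists two embedding columns of different sizes: 384 values for `minilm_embed` and 512 for `clip_embed`. One practical consequence is that similarity scores only make sense between vectors from the same column, since vectors produced by different models live in different spaces. The following is a minimal sketch in plain Python, independent of Pixeltable. The helper names `cosine_similarity` and `random_vector` are hypothetical, and the vectors are synthetic placeholders rather than real embeddings; only the two dimensions are taken from the schema.

```python
import math
import random

# Dimensions taken from the schema; the vectors below are synthetic.
MINILM_DIM = 384
CLIP_DIM = 512


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f'dimension mismatch: {len(a)} vs {len(b)}')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


def random_vector(dim, rng):
    """Synthetic stand-in for an embedding (not real model output)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
minilm_a = random_vector(MINILM_DIM, rng)
minilm_b = random_vector(MINILM_DIM, rng)
clip_a = random_vector(CLIP_DIM, rng)

# Comparing two vectors from the same column works.
same_column_score = cosine_similarity(minilm_a, minilm_b)

# Comparing across columns fails because the dimensions differ.
try:
    cosine_similarity(minilm_a, clip_a)
    cross_column_error = None
except ValueError as exc:
    cross_column_error = str(exc)
```

In a retrieval experiment, this means a query would need to be embedded once per model and scored against the matching column, which is why keeping each embedding in its own computed column is convenient for side-by-side comparison.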
chunks.select(chunks.text, chunks.heading, chunks.minilm_embed).head()
text | heading | minilm_embed |
---|---|---|
Marc Chagall - Wikipedia Jump to content Search Search | {} | [-0.262 -0.119 -0.133 0.048 0.12 -0.006 ... -0.556 0.372 0.468 -0.234 -0.226 0.164] |
Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi Svenska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) "Chagall" redirects here. For other uses, see Chagall (disambiguation) . | {"h1": "Marc Chagall"} | [-0.136 0.401 -0.53 -0.181 -0.453 -0.125 ... -0.184 0.122 0.644 -0.54 0.188 0.203] |
Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [1] Died 28 March 1985 (1985-03-28) (aged 97) Saint-Paul-de-Vence , France Nationality Russian Empire, later French [2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ( m. 1915; died 1944) Valentina (Vava) Brodsky ( m. 1952) [3] Children 2 [4] | {"h1": "Marc Chagall"} | [-0.005 0.34 -0.315 0.17 -0.124 0.384 ... -0.144 -0.131 0.104 -0.412 -0.195 -0.058] |
Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist. [b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints. | {"h1": "Marc Chagall"} | [ 0.053 0.138 -0.219 0.192 -0.1 0.234 ... -0.138 -0.294 0.306 -0.012 -0.059 0.007] |
Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlement of the Russian Empire. Before World War I , he travelled between Saint Petersburg , Paris , and Berlin . During that period, he created his own mixture and style of modern art, based on his ideas of Eastern European and Jewish folklore. He spent the wartime years in his native Belarus, becoming one of the country's most distinguished artists and a member of the modernist avant-garde , founding the Vitebsk Arts College . He later worked in and near Moscow in difficult conditions during hard times in Russia following the Bolshevik Revolution , before leaving again for Paris in 1923. During World War II , he escaped occupied France to the United States, where he lived in New York City for seven years before returning to France in 1948. | {"h1": "Marc Chagall"} | [ 0.013 0.248 -0.692 0.143 -0.379 0.254 ... -0.232 -0.157 -0.018 -0.225 -0.208 -0.095] |
Art critic Robert Hughes referred to Chagall as "the quintessential Jewish artist of the twentieth century". According to art historian Michael J. Lewis, Chagall was considered to be "the last survivor of the first generation of European modernists". For decades, he "had also been respected as the world's pre-eminent Jewish artist". [15] Using the medium of stained glass, he produced windows for the cathedrals of Reims and Metz as well as the Fraumünster in Zürich , windows for the UN and th ...... e experienced modernism's "golden age" in Paris, where "he synthesized the art forms of Cubism , Symbolism , and Fauvism , and the influence of Fauvism gave rise to Surrealism ". Yet throughout these phases of his style "he remained most emphatically a Jewish artist, whose work was one long dreamy reverie of life in his native village of Vitebsk." [16] "When Matisse dies", Pablo Picasso remarked in the 1950s, "Chagall will be the only painter left who understands what colour really is". [17] | {"h1": "Marc Chagall"} | [-0.172 0.348 -0.307 0.034 -0.071 0.111 ... -0.31 -0.011 0.302 -0.273 -0.163 0.152] |
Early life and education [ edit ] | {"h1": "Marc Chagall", "h2": "Early life and education[edit]"} | [-0.213 0.418 0.094 0.135 -0.069 0.265 ... -0.548 0.164 0.075 0.205 0.309 0.277] |
Early life [ edit ] Marc Chagall's childhood home in Vitebsk , Belarus. Currently site of the Marc Chagall Museum . Marc Chagall, 1912, The Spoonful of Milk (La Cuillerée de lait) , gouache on paper | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} | [-0.04 0.143 -0.357 0.412 -0.331 0.201 ... -0.006 -0.057 0.255 0.181 0.018 -0.021] |
Marc Chagall was born Moishe Shagal in 1887, into a Jewish family in Liozna , [1] near the city of Vitebsk , Belarus, then part of the Russian Empire . [c] [18] At the time of his birth, Vitebsk's population was about 66,000. Half of the population was Jewish. [16] A picturesque city of churches and synagogues, it was called "Russian Toledo " by artist Ilya Repin , after the cosmopolitan city of the former Spanish Empire . [19] Because the city was built mostly of wood, little of it survived years of occupation and destruction during World War II. | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} | [ 0.123 0.198 -0.496 0.154 -0.368 0.078 ... -0.057 -0.141 -0.063 -0.096 -0.136 -0.232] |
Chagall was the eldest of nine children. The family name, Shagal, is a variant of the name Segal , which in a Jewish community was usually borne by a Levitic family. [20] His father, Khatskl (Zachar) Shagal, was employed by a herring merchant, and his mother, Feige-Ite, sold groceries from their home. His father worked hard, carrying heavy barrels, earning 20 roubles each month (the average wages across the Russian Empire was 13 roubles a month). Chagall wrote of those early years: | {"h1": "Marc Chagall", "h2": "Early life and education[edit]", "h3": "Early life[edit]"} | [-0.19 0.266 -0.4 0.129 -0.493 0.063 ... -0.194 -0.2 0.322 0.024 -0.068 0.031] |