Click the Runtime -> Change runtime type menu item at the top, then select the
GPU radio button and click Save.
Create a Table for Video Data
Let’s first install the Python packages we’ll need for the demo. We’re going to use the popular Whisper library, running locally. Later in the demo, we’ll see how to use the OpenAI API endpoints as an alternative.

| Table 'transcription_demo.video_table' |
| Column Name | Type | Computed With |
|---|---|---|
| video | Video | |
The videos here are given as https links, but Pixeltable also accepts local files and S3
URLs as input.
| video |
|---|
The audio column is defined to be the audio
track extracted from whatever’s in the video column.
| video | audio |
|---|---|
| Table 'transcription_demo.video_table' |
| Column Name | Type | Computed With |
|---|---|---|
| video | Video | |
| audio | Audio | extract_audio(video, format='mp3') |
| video | audio | metadata |
|---|---|---|
| | | {"size": 959276, "streams": [{"type": "audio", "frames": 0, "duration": 845703936, "metadata": {"encoder": "Lavf"}, "time_base": 7.086e-08, "codec_context": {"name": "mp3float", "profile": null, "channels": 2, "codec_tag": "\x00\x00\x00\x00"}, "duration_seconds": 59.928}], "bit_rate": 128057, "metadata": {"encoder": "Lavf60.3.100"}, "bit_exact": false} |
| | | {"size": 959276, "streams": [{"type": "audio", "frames": 0, "duration": 845703936, "metadata": {"encoder": "Lavf"}, "time_base": 7.086e-08, "codec_context": {"name": "mp3float", "profile": null, "channels": 2, "codec_tag": "\x00\x00\x00\x00"}, "duration_seconds": 59.928}], "bit_rate": 128057, "metadata": {"encoder": "Lavf60.3.100"}, "bit_exact": false} |
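
The metadata values in the table above are plain JSON, so once a row has been fetched, individual fields can be read with ordinary dictionary access. Here is a minimal sketch using only the Python standard library. The `metadata_json` string is a trimmed copy of the output above, and `audio_summary` is a hypothetical helper name used for illustration; it is not part of Pixeltable.

```python
import json

# Trimmed copy of the metadata value shown in the table above.
metadata_json = '''
{"size": 959276,
 "streams": [{"type": "audio", "duration_seconds": 59.928,
              "codec_context": {"name": "mp3float", "channels": 2}}],
 "bit_rate": 128057}
'''


def audio_summary(metadata: dict) -> dict:
    """Collect a few headline fields from the first audio stream (hypothetical helper)."""
    stream = next(s for s in metadata["streams"] if s["type"] == "audio")
    return {
        "codec": stream["codec_context"]["name"],
        "channels": stream["codec_context"]["channels"],
        "duration_seconds": stream["duration_seconds"],
        "bit_rate": metadata["bit_rate"],
    }


summary = audio_summary(json.loads(metadata_json))
print(summary)
```

A summary like this is a quick sanity check that the extracted track really is a roughly one-minute stereo MP3 before spending time on transcription.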
Create Transcriptions
Now we’ll add a step to create transcriptions of our videos. As mentioned above, we’re going to use the Whisper library for this, running locally. Pixeltable has a built-in function, whisper.transcribe, that serves as an adapter for the Whisper
library’s transcription capability. All we have to do is add a computed
column that calls this function:
| video | transcription_text |
|---|---|
| of experiencing self versus remembering self. I was hoping you can give a simple answer of how we should live life. Based on the fact that our memories could be a source of happiness or could be the primary source of happiness, that an event when experienced bears its fruits the most when it's remembered over and over and over and over. And maybe there is some wisdom in the fact that we can control to some degree how we remember how we evolve our memory of it, such that it can maximize the long-term happiness of that repeated experience. Oh, well, first I'll say I wish I could take you on the road with me. That was such a great description. Can I be your opening answer? Oh my God, no, I'm gonna open for you, dude. Otherwise it's like, you know, everybody leaves after you're done. | |
| worse, the young adults had episodic memory. And I always say, would say, God, that's so weird. Why would we have this period of time that's so short when we're perfect, right? Or optimal. And I like to use that word optimal now because there's such a culture of optimization right now. And it's like, I realize I have to redefine what optimal is because for most of the human condition, I think we had a series of stages of life where you have basically adults saying, okay, young adults saying, I've got a child and, you know, I'm part of this village and I have to hunt and forage and get things done. I need a prefrontal cortex so I can stay focused on the big picture and the long haul goals. Now, I'm a child. I'm in this village. I'm kind of wandering around. And I've got some safety and I need to learn about this culture because I know so little. What's the best way to do that? Let's explore. I don't want to be constrained by goals as much. I want to really be |
StringSplitter iterator.
StringSplitter creates a new view, with the audio transcriptions
broken into individual, one-sentence chunks.
| pos | text |
|---|---|
| 0 | of experiencing self versus remembering self. |
| 1 | I was hoping you can give a simple answer of how we should live life. |
| 2 | Based on the fact that our memories could be a source of happiness or could be the primary source of happiness, that an event when experienced bears its fruits the most when it's remembered over and over and over and over. |
| 3 | And maybe there is some wisdom in the fact that we can control to some degree how we remember how we evolve our memory of it, such that it can maximize the long-term happiness of that repeated experience. |
| 4 | Oh, well, first I'll say I wish I could take you on the road with me. |
| 5 | That was such a great description. |
| 6 | Can I be your opening answer? |
| 7 | Oh my God, no, I'm gonna open for you, dude. |
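
To make the shape of this view concrete, here is a minimal sketch of sentence chunking in plain Python. It is a naive stand-in and not how StringSplitter is implemented: it simply breaks after `.`, `?`, or `!` followed by whitespace, and `split_sentences` is a hypothetical name. A real sentence splitter also has to handle abbreviations, numbers, and similar cases, so its chunks can differ from these.

```python
import re


def split_sentences(text: str) -> list[dict]:
    """Naive splitter: break after '.', '?' or '!' when followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    # Each chunk keeps its position, mirroring the pos/text columns above.
    return [{"pos": i, "text": part} for i, part in enumerate(parts) if part]


# Short excerpt of the first transcription shown earlier.
transcript = (
    "of experiencing self versus remembering self. "
    "I was hoping you can give a simple answer of how we should live life. "
    "That was such a great description. Can I be your opening answer?"
)

rows = split_sentences(transcript)
for row in rows:
    print(row["pos"], row["text"])
```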
Add an Embedding Index
Next, let’s use the Hugging Face sentence_transformers library to
create an embedding index of our sentences, attaching it to the text
column of our sentences_view.
| text | similarity |
|---|---|
| Based on the fact that our memories could be a source of happiness or could be the primary source of happiness, that an event when experienced bears its fruits the most when it's remembered over and over and over and over. | 0.805 |
| I was hoping you can give a simple answer of how we should live life. | 0.792 |
| Why would we have this period of time that's so short when we're perfect, right? | 0.789 |
| I want to really be | 0.788 |
| Can I be your opening answer? | 0.785 |
| of experiencing self versus remembering self. | 0.785 |
| I need a prefrontal cortex | 0.785 |
| And maybe there is some wisdom in the fact that we can control to some degree how we remember how we evolve our memory of it, such that it can maximize the long-term happiness of that repeated experience. | 0.785 |
| What's the best way to do that? | 0.783 |
| And it's like, I realize I have to redefine what optimal is because for most of the human condition, I think we had a series of stages of life where you have basically adults saying, okay, young adults saying, I've got a child and, you know, I'm part of this village and I have to hunt and forage and get things done. | 0.776 |
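
The ranking in the table above can be pictured as "embed the query, score every sentence, sort by score". The sketch below illustrates that idea with cosine similarity in plain Python. Several things here are assumptions: the text does not say which metric the index uses, the three-dimensional vectors are made up for the example (a real embedding model produces far longer vectors), and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple], k: int) -> list[tuple]:
    """Score every (text, vector) pair against the query and keep the k best."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors standing in for real sentence embeddings.
indexed = [
    ("That was such a great description.", [0.1, 0.9, 0.2]),
    ("I was hoping you can give a simple answer of how we should live life.", [0.8, 0.3, 0.5]),
    ("What's the best way to do that?", [0.6, 0.1, 0.7]),
]
query_vec = [0.9, 0.2, 0.4]

results = top_k(query_vec, indexed, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

An embedding index exists so that this lookup does not have to score every row on every query, which is what the brute-force loop above does.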
Incremental Updates
Incremental updates are a key feature of Pixeltable. Whenever a new video is added to the original table, all of its downstream computed columns are updated automatically. Let’s demonstrate this by adding a third video to the table and seeing how the updates propagate through to the index.

| video | metadata | transcription_text |
|---|---|---|
| {"size": 959276, "streams": [{"type": "audio", "frames": 0, "duration": 845703936, "metadata": {"encoder": "Lavf"}, "time_base": 7.086e-08, "codec_context": {"name": "mp3float", "profile": null, "channels": 2, "codec_tag": "\x00\x00\x00\x00"}, "duration_seconds": 59.928}], "bit_rate": 128057, "metadata": {"encoder": "Lavf60.3.100"}, "bit_exact": false} | of experiencing self versus remembering self. I was hoping you can give a simple answer of how we should live life. Based on the fact that our memories could be a source of happiness or could be the primary source of happiness, that an event when experienced bears its fruits the most when it's remembered over and over and over and over. And maybe there is some wisdom in the fact that we can control to some degree how we remember how we evolve our memory of it, such that it can maximize the long-term happiness of that repeated experience. Oh, well, first I'll say I wish I could take you on the road with me. That was such a great description. Can I be your opening answer? Oh my God, no, I'm gonna open for you, dude. Otherwise it's like, you know, everybody leaves after you're done. | |
| {"size": 959276, "streams": [{"type": "audio", "frames": 0, "duration": 845703936, "metadata": {"encoder": "Lavf"}, "time_base": 7.086e-08, "codec_context": {"name": "mp3float", "profile": null, "channels": 2, "codec_tag": "\x00\x00\x00\x00"}, "duration_seconds": 59.928}], "bit_rate": 128057, "metadata": {"encoder": "Lavf60.3.100"}, "bit_exact": false} | worse, the young adults had episodic memory. And I always say, would say, God, that's so weird. Why would we have this period of time that's so short when we're perfect, right? Or optimal. And I like to use that word optimal now because there's such a culture of optimization right now. And it's like, I realize I have to redefine what optimal is because for most of the human condition, I think we had a series of stages of life where you have basically adults saying, okay, young adults saying, I've got a child and, you know, I'm part of this village and I have to hunt and forage and get things done. I need a prefrontal cortex so I can stay focused on the big picture and the long haul goals. Now, I'm a child. I'm in this village. I'm kind of wandering around. And I've got some safety and I need to learn about this culture because I know so little. What's the best way to do that? Let's explore. I don't want to be constrained by goals as much. I want to really be | |
| {"size": 959276, "streams": [{"type": "audio", "frames": 0, "duration": 845703936, "metadata": {"encoder": "Lavf"}, "time_base": 7.086e-08, "codec_context": {"name": "mp3float", "profile": null, "channels": 2, "codec_tag": "\x00\x00\x00\x00"}, "duration_seconds": 59.928}], "bit_rate": 128057, "metadata": {"encoder": "Lavf60.3.100"}, "bit_exact": false} | about reusing information and making the most of what we already have. And so that's why basically again, what you see biologically is neuromodulators, for instance, these chemicals in the brain like norepinephrine, dopamine, serotonin. These are chemicals that are released during moments that tend to be biologically significant, prize, fear, stress, etc. And so these chemicals promote lasting plasticity, right? Essentially some mechanisms for which the brain can say prioritize the information that you carry with you into the future. Attention is a big factor as well, our ability to focus our attention on what's important. And so there's different schools of thought on training attention, for instance. So one of my colleagues, Amishi Jia, she wrote a book called Peak Mind and talks about mindfulness as a method for improving attention and focus. |
| text | similarity |
|---|---|
| Based on the fact that our memories could be a source of happiness or could be the primary source of happiness, that an event when experienced bears its fruits the most when it's remembered over and over and over and over. | 0.805 |
| These are chemicals that are released during moments that tend to be biologically significant, prize, fear, stress, etc. | 0.798 |
| I was hoping you can give a simple answer of how we should live life. | 0.792 |
| Why would we have this period of time that's so short when we're perfect, right? | 0.789 |
| I want to really be | 0.788 |
| Can I be your opening answer? | 0.785 |
| of experiencing self versus remembering self. | 0.785 |
| I need a prefrontal cortex | 0.785 |
| And maybe there is some wisdom in the fact that we can control to some degree how we remember how we evolve our memory of it, such that it can maximize the long-term happiness of that repeated experience. | 0.785 |
| Essentially some mechanisms for which the brain can say prioritize the information that you carry with you into the future. | 0.783 |
| Attention is a big factor as well, our ability to focus our attention on what's important. | 0.783 |
| What's the best way to do that? | 0.783 |
| And it's like, I realize I have to redefine what optimal is because for most of the human condition, I think we had a series of stages of life where you have basically adults saying, okay, young adults saying, I've got a child and, you know, I'm part of this village and I have to hunt and forage and get things done. | 0.776 |
| about reusing information and making the most of what we already have. | 0.774 |
| so I can stay focused on the big picture and the long haul goals. | 0.772 |
| I don't want to be constrained by goals as much. | 0.767 |
| Or optimal. | 0.767 |
| That was such a great description. | 0.766 |
| And so that's why basically again, what you see biologically is neuromodulators, for instance, these chemicals in the brain like norepinephrine, dopamine, serotonin. | 0.759 |
| So one of my colleagues, Amishi Jia, she wrote a book called Peak Mind and talks about mindfulness as a method for improving attention and focus. | 0.756 |
sentences_view.
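
The behavior described in this section, where new rows pick up every computed column automatically, can be illustrated with a toy model. This sketch is not Pixeltable's implementation. `ToyTable` and its methods are hypothetical names, and the "computed column" is a trivial word count rather than a transcription.

```python
from typing import Callable


class ToyTable:
    """Toy stand-in for a table with computed columns (illustration only)."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self.computed: dict[str, Callable[[dict], object]] = {}

    def add_computed(self, name: str, fn: Callable[[dict], object]) -> None:
        # Register the column and backfill it for rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, row: dict) -> None:
        # Evaluate every registered column for the new row only;
        # existing rows are left untouched.
        new_row = dict(row)
        for name, fn in self.computed.items():
            new_row[name] = fn(new_row)
        self.rows.append(new_row)


table = ToyTable()
table.insert({"text": "Or optimal."})
table.insert({"text": "That was such a great description."})
table.add_computed("word_count", lambda row: len(row["text"].split()))

# A later insert is processed incrementally, without touching earlier rows.
table.insert({"text": "Can I be your opening answer?"})
word_counts = [row["word_count"] for row in table.rows]
print(word_counts)
```

The point of the design is that work is proportional to the new data: the third insert evaluates the function once, not three times.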
Using the OpenAI API
This concludes our tutorial using the locally installed Whisper library. Sometimes, it may be preferable to use the OpenAI API rather than a locally installed library. In this section we’ll show how this can be done in Pixeltable, simply by using a different function to construct our computed columns. Since this section relies on calling out to the OpenAI API, you’ll need to have an API key, which you can enter below.

| video | transcription_text | transcription_from_api_text |
|---|---|---|
| of experiencing self versus remembering self. I was hoping you can give a simple answer of how we should live life. Based on the fact that our memories could be a source of happiness or could be the primary source of happiness, that an event when experienced bears its fruits the most when it's remembered over and over and over and over. And maybe there is some wisdom in the fact that we can control to some degree how we remember how we evolve our memory of it, such that it can maximize the long-term happiness of that repeated experience. Oh, well, first I'll say I wish I could take you on the road with me. That was such a great description. Can I be your opening answer? Oh my God, no, I'm gonna open for you, dude. Otherwise it's like, you know, everybody leaves after you're done. | of experiencing self versus remembering self, I was hoping you can give a simple answer of how we should live life. Based on the fact that our memories could be a source of happiness or could be the primary source of happiness, that an event, when experienced, bears its fruits the most when it's remembered over and over and over and over. And maybe there is some wisdom in the fact that we can control, to some degree, how we remember it, how we evolve our memory of it, such that it can maximize the long-term happiness of that repeated experience. Okay, well, first I'll say, I wish I could take you on the road with me because that was such a great description. Can I be your opening excerpt? Oh my God, no, I'm gonna open for you, dude. Otherwise, it's like, you know, everybody leaves after you're done. Ha, ha, ha, ha, ha. | |
| worse, the young adults had episodic memory. And I always say, would say, God, that's so weird. Why would we have this period of time that's so short when we're perfect, right? Or optimal. And I like to use that word optimal now because there's such a culture of optimization right now. And it's like, I realize I have to redefine what optimal is because for most of the human condition, I think we had a series of stages of life where you have basically adults saying, okay, young adults saying, I've got a child and, you know, I'm part of this village and I have to hunt and forage and get things done. I need a prefrontal cortex so I can stay focused on the big picture and the long haul goals. Now, I'm a child. I'm in this village. I'm kind of wandering around. And I've got some safety and I need to learn about this culture because I know so little. What's the best way to do that? Let's explore. I don't want to be constrained by goals as much. I want to really be | or worse, the young adults at episodic memory. And I always would say, God, this is so weird. Why would we have this period of time that's so short when we're perfect, right? Or optimal. And I like to use that word optimal now because there's such a culture of optimization right now. And it's like, I realized I have to redefine what optimal is because for most of the human condition, I think we had a series of stages of life where you have basically adults saying, okay, young adults saying, I've got a child and I'm part of this village and I have to hunt and forage and get things done. I need a prefrontal cortex so I can stay focused on the big picture and long haul goals. Now I'm a child, I'm in this village, I'm kind of wandering around and I've got some safety and I need to learn about this culture because I know so little. What's the best way to do that? Let's explore. I don't wanna be constrained by goals as much. I wanna really be free. | |
| about reusing information and making the most of what we already have. And so that's why basically again, what you see biologically is neuromodulators, for instance, these chemicals in the brain like norepinephrine, dopamine, serotonin. These are chemicals that are released during moments that tend to be biologically significant, prize, fear, stress, etc. And so these chemicals promote lasting plasticity, right? Essentially some mechanisms for which the brain can say prioritize the information that you carry with you into the future. Attention is a big factor as well, our ability to focus our attention on what's important. And so there's different schools of thought on training attention, for instance. So one of my colleagues, Amishi Jia, she wrote a book called Peak Mind and talks about mindfulness as a method for improving attention and focus. | about reusing information and making the most of what we already have. And so that's why basically, again, what you see biologically is neuromodulators, for instance, these chemicals in the brain like norepinephrine, dopamine, serotonin. These are chemicals that are released during moments that tend to be biologically significant, surprise, fear, stress, et cetera. And so these chemicals promote lasting plasticity, right? Essentially some mechanisms by which the brain can prioritize the information that you carry with you into the future. Attention is a big factor as well, our ability to focus our attention on what's important. And so there's different schools of thought on training attention, for instance. So one of my colleagues, Amishi Jha, she wrote a book called Peak Mind and talks about mindfulness as a method for improving attention and focus. |
We selected video_table.transcription.text in the preceding
queries, which pulls out just the text field of the transcription
results. The actual results are a sizable JSON structure that includes a
lot of metadata. To see the full output, we can select
video_table.transcription instead, to get the full JSON struct. Here’s
what it looks like (we’ll select just one row, since it’s a lot of
output):
| transcription | transcription_from_api |
|---|---|
| {"text": " of experiencing self versus remembering self. I was hoping you can give a simple answer of how we should live life. Based on the fact that our me ...... ion. Can I be your opening answer? Oh my God, no, I'm gonna open for you, dude. Otherwise it's like, you know, everybody leaves after you're done.", "language": "en", "segments": [{"id": 0, "end": 5., "seek": 0, "text": " of experiencing self versus remembering self.", "start": 0., "tokens": [50363, 286, 13456, 2116, 9051, 24865, 2116, 13, 50613], "avg_logprob": -0.282, "temperature": 0., "no_speech_prob": 0.213, "compression_ratio": 1.632}, {"id": 1, "end": 8.68, "seek": 0, "text": " I was hoping you can give a simple answer", "start": 6., "tokens": [50663, 314, 373, 7725, 345, 460, 1577, 257, 2829, 3280, 50797], "avg_logprob": -0.282, "temperature": 0., "no_speech_prob": 0.213, "compression_ratio": 1.632}, {"id": 2, "end": 10.2, "seek": 0, "text": " of how we should live life.", "start": 8.68, "tokens": [50797, 286, 703, 356, 815, 2107, 1204, 13, 50873], "avg_logprob": -0.282, "temperature": 0., "no_speech_prob": 0.213, "compression_ratio": 1.632}, {"id": 3, "end": 16.04, "seek": 0, "text": " Based on the fact that our memories", "start": 12.24, "tokens": [50975, 13403, 319, 262, 1109, 326, 674, 9846, 51165], "avg_logprob": -0.282, "temperature": 0., "no_speech_prob": 0.213, "compression_ratio": 1.632}, {"id": 4, "end": 17.84, "seek": 0, "text": " could be a source of happiness", "start": 16.04, "tokens": [51165, 714, 307, 257, 2723, 286, 12157, 51255], "avg_logprob": -0.282, "temperature": 0., "no_speech_prob": 0.213, "compression_ratio": 1.632}, {"id": 5, "end": 20.52, "seek": 0, "text": " or could be the primary source of happiness,", "start": 17.84, "tokens": [51255, 393, 714, 307, 262, 4165, 2723, 286, 12157, 11, 51389], "avg_logprob": -0.282, "temperature": 0., "no_speech_prob": 0.213, "compression_ratio": 1.632}, ..., {"id": 14, "end": 48.52, "seek": 2552, "text": " on the road with me.", 
"start": 47.52, "tokens": [51463, 319, 262, 2975, 351, 502, 13, 51513], "avg_logprob": -0.285, "temperature": 0., "no_speech_prob": 0.001, "compression_ratio": 1.611}, {"id": 15, "end": 50.52, "seek": 2552, "text": " That was such a great description.", "start": 48.52, "tokens": [51513, 1320, 373, 884, 257, 1049, 6764, 13, 51613], "avg_logprob": -0.285, "temperature": 0., "no_speech_prob": 0.001, "compression_ratio": 1.611}, {"id": 16, "end": 52.88, "seek": 2552, "text": " Can I be your opening answer?", "start": 51.52, "tokens": [51663, 1680, 314, 307, 534, 4756, 3280, 30, 51731], "avg_logprob": -0.285, "temperature": 0., "no_speech_prob": 0.001, "compression_ratio": 1.611}, {"id": 17, "end": 56.08, "seek": 5288, "text": " Oh my God, no, I'm gonna open for you, dude.", "start": 52.88, "tokens": [50363, 3966, 616, 1793, 11, 645, ..., 329, 345, 11, 18396, 13, 50523], "avg_logprob": -0.337, "temperature": 0., "no_speech_prob": 0.012, "compression_ratio": 1.121}, {"id": 18, "end": 57.28, "seek": 5288, "text": " Otherwise it's like, you know,", "start": 56.08, "tokens": [50523, 15323, 340, 338, 588, 11, 345, 760, 11, 50583], "avg_logprob": -0.337, "temperature": 0., "no_speech_prob": 0.012, "compression_ratio": 1.121}, {"id": 19, "end": 58.88, "seek": 5288, "text": " everybody leaves after you're done.", "start": 57.28, "tokens": [50583, 7288, 5667, 706, 345, 821, 1760, 13, 50663], "avg_logprob": -0.337, "temperature": 0., "no_speech_prob": 0.012, "compression_ratio": 1.121}]} | {"text": "of experiencing self versus remembering self, I was hoping you can give a simple answer of how we should live life. Based on the fact that our mem ...... ning excerpt? Oh my God, no, I'm gonna open for you, dude. Otherwise, it's like, you know, everybody leaves after you're done. Ha, ha, ha, ha, ha."} |
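
Since the full result is ordinary JSON, the text field and the per-segment timings can be pulled apart with the standard library. The sketch below works on a trimmed copy of the local Whisper output above: only the first three segments are kept, the `text` field is shortened to match them, and fields such as `tokens` and `avg_logprob` are omitted. The variable names are illustrative.

```python
import json

# Trimmed copy of the local Whisper output shown above (first three segments only).
transcription_json = '''
{"text": " of experiencing self versus remembering self. I was hoping you can give a simple answer of how we should live life.",
 "language": "en",
 "segments": [
   {"id": 0, "start": 0.0, "end": 5.0, "text": " of experiencing self versus remembering self."},
   {"id": 1, "start": 6.0, "end": 8.68, "text": " I was hoping you can give a simple answer"},
   {"id": 2, "start": 8.68, "end": 10.2, "text": " of how we should live life."}
 ]}
'''

transcription = json.loads(transcription_json)

# The plain transcript, as selected via the text field.
full_text = transcription["text"].strip()

# A (start, end, text) timeline, useful for subtitles or seeking.
timeline = [
    (seg["start"], seg["end"], seg["text"].strip())
    for seg in transcription["segments"]
]

# In this trimmed sample, joining the segment texts reproduces the text field.
rebuilt = "".join(seg["text"] for seg in transcription["segments"]).strip()

for start, end, text in timeline:
    print(f"[{start:6.2f} - {end:6.2f}] {text}")
```

Note that the API result in the right-hand column above shows only a `text` field, so code that relies on `segments` would apply to the local output only.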