If you’re running this tutorial in Colab and want GPU acceleration, select the Runtime -> Change runtime type menu item at the top, then select the GPU radio button and click on Save.
Creating a tutorial directory and table
First, let’s make sure the packages we need for this tutorial are installed: Pixeltable itself, PyTorch, and the YOLOX object detection library. Then we create a detection_demo directory and a table to hold our videos, with a single column of type pxt.Video.
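Here’s a minimal sketch of that setup (package names are assumptions; check the Pixeltable docs for the exact YOLOX dependency your Pixeltable version needs):

```python
# Package names are assumptions; adjust to your environment
%pip install -qU pixeltable torch torchvision pixeltable-yolox

import pixeltable as pxt

# Create a directory to hold the tutorial's tables and views
pxt.create_dir('detection_demo')

# A table with a single column of type pxt.Video
videos = pxt.create_table('detection_demo.videos', {'video': pxt.Video})
```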
In order to process the video frame by frame, we also create a view of the videos table that exposes one row per frame; Pixeltable provides the built-in FrameIterator class for this.
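A sketch of that view creation (the frame rate is an assumption; with fps=0 the iterator extracts every frame):

```python
from pixeltable.iterators import FrameIterator

# One row per extracted frame; fps=0 means "extract every frame"
frames = pxt.create_view(
    'detection_demo.frames',
    videos,
    iterator=FrameIterator.create(video=videos.video, fps=0)
)
```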
Note that neither the videos table nor the frames view has any actual data yet, because we haven’t yet added any videos to the table. However, the frames view is now configured to automatically track the videos table as new data shows up.
The new view is automatically configured with six columns:
- pos - a system column that is part of every component view
- video - the column inherited from our base table (all base table columns are visible in any of its views)
- frame_idx, pos_msec, pos_frame, frame - these four columns are created by the FrameIterator class.
Let’s have a look at the new view:
| View 'detection_demo.frames' (of 'detection_demo.videos') |
| Column Name | Type | Computed With |
|---|---|---|
| pos | Required[Int] | |
| frame_idx | Required[Int] | |
| pos_msec | Required[Float] | |
| pos_frame | Required[Int] | |
| frame | Required[Image] | |
| video | Video | |
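Now let’s add a video to the table. A minimal sketch, with a placeholder URL standing in for whatever video file you want to use:

```python
# Any local path or URL pointing to a video works; this URL is a placeholder
video_url = 'https://example.com/traffic.mp4'
videos.insert([{'video': video_url}])
```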
Both the videos table and the frames view were automatically updated, expanding the single video into 461 rows in the view. Let’s have a look at videos first.
| video |
|---|
Now let’s look at a few rows of frames, along with the dimensions of each frame:
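A sketch of that query (image properties such as width and height can be accessed directly on the frame expression):

```python
frames.select(
    frames.pos, frames.frame, frames.frame.width, frames.frame.height
).limit(5).collect()
```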
| pos | frame | width | height |
|---|---|---|---|
| 0 | | 1280 | 720 |
| 1 | | 1280 | 720 |
| 2 | | 1280 | 720 |
| 3 | | 1280 | 720 |
| 4 | | 1280 | 720 |
Object Detection with Pixeltable
Now let’s apply an object detection model to our frames. Pixeltable includes built-in support for a number of models; we’re going to use the YOLOX family of models, which are lightweight models with solid performance. We first import the yolox Pixeltable function and try it out on a few frames using a select comprehension.
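A sketch of that first experiment (the import path for yolox has moved between Pixeltable versions, and the confidence threshold here is an assumption):

```python
# In some Pixeltable versions the function lives under pixeltable.ext.functions.yolox;
# check the docs for the path that matches your installation
from pixeltable.ext.functions.yolox import yolox

# Try the smallest model on a few frames without storing anything
frames.select(
    frames.frame,
    yolox(frames.frame, model_id='yolox_tiny', threshold=0.5)
).limit(3).collect()
```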
| frame | yolox |
|---|---|
| {"bboxes": [[338.1894836425781, 345.59979248046875, 433.25408935546875, 402.1943359375], [101.51329803466797, 419.788330078125, 259.7282409667969, 512.6688842773438], [-0.27876633405685425, 555.6809692382812, 96.86800384521484, 675.8363037109375], [478.0632629394531, 290.70819091796875, 541.0510864257812, 333.1060791015625], [317.3229064941406, 488.96636962890625, 571.7535400390625, 640.4901733398438], [561.7608032226562, 282.067138671875, 609.9308471679688, 318.03826904296875], [582.622802734375, 409.7627258300781, 675.9083862304688, 518.007568359375], [884.5014038085938, 341.38031005859375, 994.5653076171875, 413.2419128417969], [40.29541778564453, 447.2200622558594, 98.4399642944336, 512.966064453125], [261.9676513671875, 626.4561767578125, 395.9239807128906, 716.9423217773438], [483.1141357421875, 574.5985717773438, 597.1210327148438, 686.2103271484375], [881.5115356445312, 340.0090026855469, 997.948974609375, 415.1181945800781]], "scores": [0.83, 0.812, 0.797, 0.763, 0.755, 0.688, 0.606, 0.541, 0.514, 0.503, 0.501, 0.5], "labels": [2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 7]} | |
| {"bboxes": [[-0.688212513923645, 552.1438598632812, 99.95340728759766, 674.526123046875], [341.2273254394531, 343.3525695800781, 436.5255126953125, 401.6473083496094], [105.3248291015625, 417.08953857421875, 261.75408935546875, 508.39202880859375], [317.1871337890625, 489.1314392089844, 571.6358642578125, 640.3237915039062], [478.8114318847656, 288.74725341796875, 539.8158569335938, 329.8291931152344], [261.3262939453125, 623.7770385742188, 396.0945129394531, 717.0194702148438], [563.2278442382812, 280.2528991699219, 607.8761596679688, 317.1873474121094], [583.1015625, 409.9834289550781, 675.8267211914062, 516.9342041015625]], "scores": [0.823, 0.816, 0.767, 0.764, 0.76, 0.636, 0.633, 0.586], "labels": [2, 2, 2, 2, 2, 3, 2, 2]} | |
| {"bboxes": [[-0.2689361572265625, 550.2958374023438, 105.7685546875, 674.0948486328125], [342.9891052246094, 343.697998046875, 436.35302734375, 400.53289794921875], [317.16131591796875, 489.21612548828125, 571.1922607421875, 640.4578857421875], [484.03167724609375, 290.1268005371094, 541.7189331054688, 330.689697265625], [103.90556335449219, 415.7806091308594, 265.3205261230469, 503.9178161621094], [563.6137084960938, 280.4433898925781, 609.4480590820312, 316.4215393066406], [582.57763671875, 410.0472412109375, 675.9314575195312, 516.9766235351562], [886.6967163085938, 344.7193908691406, 1004.0186767578125, 418.2158203125], [829.179931640625, 309.6842346191406, 875.6467895507812, 347.74945068359375], [833.5235595703125, 257.552490234375, 880.591064453125, 302.0234069824219], [1039.58984375, 366.77166748046875, 1095.6329345703125, 410.4030456542969], [481.5091247558594, 576.0469360351562, 596.9995727539062, 686.8408203125], [885.1788330078125, 343.36322021484375, 1004.2406005859375, 417.9952392578125]], "scores": [0.847, 0.831, 0.767, 0.73, 0.722, 0.695, 0.623, 0.553, 0.546, 0.545, 0.543, 0.531, 0.502], "labels": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7]} |
To run the model over every frame, we add a computed column detections_tiny to the frames view.
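A sketch of that step (add_computed_column is the current API; the method name has varied across Pixeltable versions):

```python
# Stored, computed column: Pixeltable runs the model over all existing frames now,
# and automatically over any frames added later
frames.add_computed_column(
    detections_tiny=yolox(frames.frame, model_id='yolox_tiny', threshold=0.25)
)
```

Afterwards, the view’s schema shows the new computed column: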
| View 'detection_demo.frames' (of 'detection_demo.videos') |
| Column Name | Type | Computed With |
|---|---|---|
| pos | Required[Int] | |
| frame_idx | Required[Int] | |
| pos_msec | Required[Float] | |
| pos_frame | Required[Int] | |
| frame | Required[Image] | |
| detections_tiny | Required[Json] | yolox(frame, model_id='yolox_tiny', threshold=0.25) |
| video | Video | |
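The stored inferences can now be queried like any other column, for example roughly:

```python
frames.select(frames.frame, frames.detections_tiny).limit(3).collect()
```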
| frame | detections_tiny |
|---|---|
| {"bboxes": [[338.189, 345.6, 433.254, 402.194], [101.513, 419.788, 259.728, 512.669], [-0.279, 555.681, 96.868, 675.836], [478.063, 290.708, 541.051, 333.106], [317.323, 488.966, 571.754, 640.49], [561.761, 282.067, 609.931, 318.038], ..., [796.496, 271.656, 833.955, 303.78], [444.182, 301.092, 475.788, 339.79], [539.506, 238.815, 584.09, 286.23], [1091.483, 384.693, 1122.668, 419.25], [455.287, 542.279, 596.589, 672.937], [829.069, 290.945, 877.77, 330.228]], "labels": [2, 2, 2, 2, 2, 2, ..., 2, 2, 2, 2, 2, 2], "scores": [0.83, 0.812, 0.797, 0.763, 0.755, 0.688, ..., 0.391, 0.361, 0.359, 0.262, 0.252, 0.251]} | |
| {"bboxes": [[-0.688, 552.144, 99.953, 674.526], [341.227, 343.353, 436.526, 401.647], [105.325, 417.09, 261.754, 508.392], [317.187, 489.131, 571.636, 640.324], [478.811, 288.747, 539.816, 329.829], [261.326, 623.777, 396.095, 717.019], ..., [486.762, 243.492, 535.403, 283.84], [830.206, 271.019, 881.183, 316.182], [830.853, 248.258, 877.105, 291.472], [455.334, 541.789, 596.258, 667.598], [804.526, 209.59, 833.616, 239.673], [1091.282, 384.247, 1125.306, 419.068]], "labels": [2, 2, 2, 2, 2, 3, ..., 2, 2, 2, 2, 2, 2], "scores": [0.823, 0.816, 0.767, 0.764, 0.76, 0.636, ..., 0.311, 0.291, 0.289, 0.264, 0.263, 0.256]} | |
| {"bboxes": [[-0.269, 550.296, 105.769, 674.095], [342.989, 343.698, 436.353, 400.533], [317.161, 489.216, 571.192, 640.458], [484.032, 290.127, 541.719, 330.69], [103.906, 415.781, 265.321, 503.918], [563.614, 280.443, 609.448, 316.422], ..., [796.171, 270.344, 832.105, 302.89], [1081.594, 381.512, 1122.797, 417.943], [1100.1, 386.188, 1132.332, 421.028], [455.572, 541.766, 596.424, 668.731], [55.976, 437.112, 109.67, 507.464], [520.065, 575.431, 596.629, 665.397]], "labels": [2, 2, 2, 2, 2, 2, ..., 2, 2, 2, 2, 3, 2], "scores": [0.847, 0.831, 0.767, 0.73, 0.722, 0.695, ..., 0.29, 0.289, 0.284, 0.275, 0.272, 0.256]} |
Now let’s visualize the detections by drawing the bounding boxes on top of the frames; Pixeltable provides the draw_bounding_boxes UDF for this. We could create a new computed column to hold the superimposed images, but we don’t have to; sometimes it’s easier just to use a select comprehension, as we did when we were first experimenting with the detection model.
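A sketch of that visualization (the module path and the exact parameter names of draw_bounding_boxes are assumptions; it takes the image plus the detected boxes, and optionally labels):

```python
# Module path and keyword names are assumptions; check your Pixeltable version
from pixeltable.functions.vision import draw_bounding_boxes

frames.select(
    frames.frame,
    draw_bounding_boxes(
        frames.frame,
        boxes=frames.detections_tiny.bboxes,
        labels=frames.detections_tiny.labels,
    )
).show(1)
```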
| frame | draw_bounding_boxes |
|---|---|
The select comprehension ranged over the entire table, but just as before, Pixeltable computes the output lazily: image operations are performed at retrieval time, so in this case Pixeltable drew the annotations just for the one frame that we actually displayed.
Looking at individual frames gives us some idea of how well our
detection algorithm works, but it would be more instructive to turn the
visualization output back into a video.
We do that with the built-in function make_video(), which is an
aggregation function that takes a frame index (actually: any expression
that can be used to order the frames; a timestamp would also work) and
an image, and then assembles the sequence of images into a video.
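Putting this together, here’s a sketch that groups the frames by their source video and assembles the annotated frames in pos order (assuming make_video lives in pixeltable.functions.video):

```python
from pixeltable.functions.video import make_video

frames.group_by(videos).select(
    make_video(
        frames.pos,
        draw_bounding_boxes(
            frames.frame,
            boxes=frames.detections_tiny.bboxes,
            labels=frames.detections_tiny.labels,
        ),
    )
).collect()
```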
| make_video |
|---|
Comparing Object Detection Models
The detections that we get out of yolox_tiny are passable, but a
little choppy. Suppose we want to experiment with a more powerful object
detection model, to see if there is any improvement in detection
quality. We can create an additional column to hold the new inferences.
The larger model takes longer to download and run, so please be patient.
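A sketch of that comparison: add a second computed column with the medium model, then render both sets of detections side by side:

```python
# Run the medium YOLOX model over every frame (downloads the larger weights on first use)
frames.add_computed_column(
    detections_m=yolox(frames.frame, model_id='yolox_m', threshold=0.25)
)

# One annotated video per detections column, shown side by side
frames.group_by(videos).select(
    make_video(
        frames.pos,
        draw_bounding_boxes(frames.frame, boxes=frames.detections_tiny.bboxes),
    ),
    make_video(
        frames.pos,
        draw_bounding_boxes(frames.frame, boxes=frames.detections_m.bboxes),
    ),
).collect()
```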
| make_video | make_video_1 |
|---|---|
Evaluating Models Against a Ground Truth
In order to do a quantitative evaluation of model performance, we need a ground truth to compare them against. Let’s generate some (synthetic) “ground truth” data by running against the largest YOLOX model available. It will take even longer to cache and evaluate this model.
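A sketch of generating the synthetic ground truth with the largest model in the family:

```python
# yolox_x is the largest (and slowest) YOLOX model
frames.add_computed_column(
    detections_x=yolox(frames.frame, model_id='yolox_x', threshold=0.25)
)
```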
The frames view now has three detections columns:
| View 'detection_demo.frames' (of 'detection_demo.videos') |
| Column Name | Type | Computed With |
|---|---|---|
| pos | Required[Int] | |
| frame_idx | Required[Int] | |
| pos_msec | Required[Float] | |
| pos_frame | Required[Int] | |
| frame | Required[Image] | |
| detections_tiny | Required[Json] | yolox(frame, model_id='yolox_tiny', threshold=0.25) |
| detections_m | Required[Json] | yolox(frame, model_id='yolox_m', threshold=0.25) |
| detections_x | Required[Json] | yolox(frame, model_id='yolox_x', threshold=0.25) |
| video | Video | |
We can now score the yolox_tiny and yolox_m detections against that ground truth, using the eval_detections() and mean_ap() built-in functions. First, eval_detections() computes per-frame true/false-positive data for each class:
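A sketch of that step (the module path and parameter names are assumptions based on the pixeltable.functions.vision API):

```python
# Module path and parameter names are assumptions; check your Pixeltable version
from pixeltable.functions.vision import eval_detections, mean_ap

# Per-frame evaluation of each model against the yolox_x "ground truth"
frames.add_computed_column(
    eval_yolox_tiny=eval_detections(
        pred_bboxes=frames.detections_tiny.bboxes,
        pred_labels=frames.detections_tiny.labels,
        pred_scores=frames.detections_tiny.scores,
        gt_bboxes=frames.detections_x.bboxes,
        gt_labels=frames.detections_x.labels,
    )
)
frames.add_computed_column(
    eval_yolox_m=eval_detections(
        pred_bboxes=frames.detections_m.bboxes,
        pred_labels=frames.detections_m.labels,
        pred_scores=frames.detections_m.scores,
        gt_bboxes=frames.detections_x.bboxes,
        gt_labels=frames.detections_x.labels,
    )
)

frames.select(frames.eval_yolox_tiny, frames.eval_yolox_m).limit(1).collect()
```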
| eval_yolox_tiny | eval_yolox_m |
|---|---|
| [{"fp": [], "tp": [], "class": 0, "scores": [], "min_iou": 0.5, "num_gts": 4}, {"fp": [0, 0, 0, 0, 0, 0, ..., 0, 1, 0, 1, 1, 1], "tp": [1, 1, 1, 1, 1, 1, ..., 1, 0, 1, 0, 0, 0], "class": 2, "scores": [0.83, 0.812, 0.797, 0.763, 0.755, 0.688, ..., 0.391, 0.361, 0.359, 0.262, 0.252, 0.251], "min_iou": 0.5, "num_gts": 27}, {"fp": [0, 0], "tp": [1, 1], "class": 3, "scores": [0.514, 0.503], "min_iou": 0.5, "num_gts": 3}, {"fp": [0, 1, 1], "tp": [1, 0, 0], "class": 7, "scores": [0.5, 0.45, 0.43], "min_iou": 0.5, "num_gts": 5}, {"fp": [], "tp": [], "class": 62, "scores": [], "min_iou": 0.5, "num_gts": 1}] | [{"fp": [0, 0, 0, 0], "tp": [1, 1, 1, 1], "class": 0, "scores": [0.583, 0.467, 0.431, 0.315], "min_iou": 0.5, "num_gts": 4}, {"fp": [0, 0, 0, 0, 0, 0, ..., 0, 1, 0, 0, 0, 1], "tp": [1, 1, 1, 1, 1, 1, ..., 1, 0, 1, 1, 1, 0], "class": 2, "scores": [0.932, 0.903, 0.902, 0.88, 0.879, 0.864, ..., 0.515, 0.512, 0.468, 0.425, 0.366, 0.318], "min_iou": 0.5, "num_gts": 27}, {"fp": [0, 0, 1, 1, 0], "tp": [1, 1, 0, 0, 1], "class": 3, "scores": [0.866, 0.69, 0.67, 0.574, 0.428], "min_iou": 0.5, "num_gts": 3}, {"fp": [1], "tp": [0], "class": 4, "scores": [0.282], "min_iou": 0.5, "num_gts": 0}, {"fp": [0, 0], "tp": [1, 1], "class": 7, "scores": [0.757, 0.388], "min_iou": 0.5, "num_gts": 5}, {"fp": [], "tp": [], "class": 62, "scores": [], "min_iou": 0.5, "num_gts": 1}] |
Finally, we aggregate those per-frame results into per-class average precision values with the mean_ap() function.
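A sketch of that final aggregation (mean_ap() is an aggregate function, so the query returns a single row with one class-to-AP mapping per model):

```python
frames.select(
    mean_ap(frames.eval_yolox_tiny),
    mean_ap(frames.eval_yolox_m),
).collect()
```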
| mean_ap | mean_ap_1 |
|---|---|
| {0: 0.101, 2: 0.622, 3: 0.278, 7: 0.098, 62: 0., 5: 0.01, 58: 0., 9: 0., 1: 0., 8: 0., 24: 0.} | {0: 0.564, 2: 0.911, 3: 0.723, 4: 0., 7: 0.53, 62: 0., 5: 0.483, 58: 0.032, 1: 0., 9: 0., 25: 0., 24: 0., 8: 0.} |