Working with Voxel51 in Pixeltable
Pixeltable can export data directly from tables and views to the popular Voxel51 frontend, providing a way to visualize and explore image and video datasets. In this tutorial, we'll learn how to:
- Export data from Pixeltable to Voxel51
- Apply labels from image classification and object detection models to exported data
We begin by installing the necessary libraries for this tutorial.
%pip install -qU pixeltable fiftyone transformers
Example 1: An Image Dataset
import fiftyone as fo
import pixeltable as pxt
# Create a Pixeltable directory for the demo. We first drop the directory if it
# exists, in order to ensure a clean environment.
pxt.drop_dir('fo_demo', force=True)
pxt.create_dir('fo_demo')
Connected to Pixeltable database at:
postgresql+psycopg://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
Created directory `fo_demo`.
<pixeltable.catalog.dir.Dir at 0x387766700>
# Create a Pixeltable table for our dataset and insert some sample images.
url_prefix = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/source/data/images'
# Build the full image URLs from the shared prefix.
urls = [
    f'{url_prefix}/000000000019.jpg',
    f'{url_prefix}/000000000025.jpg',
    f'{url_prefix}/000000000030.jpg',
    f'{url_prefix}/000000000034.jpg',
]
t = pxt.create_table('fo_demo.images', {'image': pxt.Image})
t.insert({'image': url} for url in urls)
t.head()
Created table `images`.
Inserting rows into `images`: 4 rows [00:00, 2775.85 rows/s]
Inserted 4 rows with 0 errors.
Now we export our new table to a Voxel51 dataset and load it into a new Voxel51 session within our demo notebook. Once it's been loaded, the images can be interactively navigated as with any other Voxel51 dataset.
fo_dataset = pxt.io.export_images_as_fo_dataset(t, t.image)
session = fo.launch_app(fo_dataset)
4 [20.0ms elapsed, ? remaining, 199.8 samples/s]
INFO:eta.core.utils: 4 [20.0ms elapsed, ? remaining, 199.8 samples/s]
Adding Labels
We'll now show how Voxel51 labels can be attached to the exported dataset. Currently, Pixeltable supports only classification and detection labels; other Voxel51 label types may be added in the future.
First, let's generate some labels by applying two models from the Hugging Face transformers library: a ViT model for image classification and a DETR model for object detection.
from pixeltable.functions.huggingface import vit_for_image_classification, detr_for_object_detection
t.add_column(classifications=vit_for_image_classification(
    t.image, model_id='google/vit-base-patch16-224'
))
t.add_column(detections=detr_for_object_detection(
    t.image, model_id='facebook/detr-resnet-50'
))
Computing cells: 100%|████████████████████████████████████████████| 4/4 [00:00<00:00, 5.86 cells/s]
Added 4 column values with 0 errors.
Computing cells: 100%|████████████████████████████████████████████| 4/4 [00:01<00:00, 2.15 cells/s]
Added 4 column values with 0 errors.
UpdateStatus(num_rows=4, num_computed_values=4, num_excs=0, updated_cols=[], cols_with_excs=[])
Both models output JSON containing the model results. Let's peek at the contents of our table now:
t.head()
Now we need to transform our model data into the format the Voxel51 API expects (see the Pixeltable documentation for pxt.io.export_images_as_fo_dataset for details). We'll use Pixeltable UDFs to do the appropriate conversions.
@pxt.udf
def vit_to_fo(vit_labels: dict) -> list:
    # The ViT output is a dict with parallel 'label_text' and 'scores' lists.
    return [
        {'label': label, 'confidence': score}
        for label, score in zip(vit_labels['label_text'], vit_labels['scores'])
    ]
@pxt.udf
def detr_to_fo(img: pxt.Image, detr_labels: dict) -> list:
    result = []
    for label, box, score in zip(detr_labels['label_text'], detr_labels['boxes'], detr_labels['scores']):
        # DETR gives us bounding boxes in (x1,y1,x2,y2) absolute (pixel) coordinates.
        # Voxel51 expects (x,y,w,h) relative (fractional) coordinates.
        # So we need to do a conversion.
        fo_box = [
            box[0] / img.width,
            box[1] / img.height,
            (box[2] - box[0]) / img.width,
            (box[3] - box[1]) / img.height,
        ]
        result.append({'label': label, 'bounding_box': fo_box, 'confidence': score})
    return result
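The coordinate conversion inside detr_to_fo can also be checked on its own, without Pixeltable or a model. The following is a minimal sketch in plain Python; the helper name xyxy_abs_to_xywh_rel and the sample box are illustrative assumptions and are not part of Pixeltable or Voxel51.

```python
def xyxy_abs_to_xywh_rel(box, width, height):
    """Convert an (x1, y1, x2, y2) pixel box to (x, y, w, h) fractions of the image size."""
    x1, y1, x2, y2 = box
    return [
        x1 / width,          # left edge as a fraction of image width
        y1 / height,         # top edge as a fraction of image height
        (x2 - x1) / width,   # box width as a fraction of image width
        (y2 - y1) / height,  # box height as a fraction of image height
    ]

# Hypothetical box on a 640x480 image (values chosen for illustration only).
sample_box = [160, 120, 480, 360]
rel_box = xyxy_abs_to_xywh_rel(sample_box, width=640, height=480)
print(rel_box)  # [0.25, 0.25, 0.5, 0.5]
```

A box covering the whole image maps to [0.0, 0.0, 1.0, 1.0], which is a convenient spot check when adapting the conversion to another detector's output format.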
We can test that our UDFs are working as expected with a select() statement.
t.select(
    t.image,
    t.classifications,
    vit_to_fo(t.classifications),
    t.detections,
    detr_to_fo(t.image, t.detections)
).head()
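For a quick check that does not depend on the table, the classification conversion can also be run on a hand-written dictionary. This sketch assumes the ViT output carries the 'label_text' and 'scores' keys that vit_to_fo reads; the plain function name vit_labels_to_fo and the sample values are made up for illustration.

```python
def vit_labels_to_fo(vit_labels: dict) -> list:
    """Pair each label with its score, mirroring the logic of the vit_to_fo UDF."""
    return [
        {'label': label, 'confidence': score}
        for label, score in zip(vit_labels['label_text'], vit_labels['scores'])
    ]

# Hypothetical model output with two candidate labels (not real model results).
sample = {'label_text': ['tabby cat', 'tiger cat'], 'scores': [0.75, 0.125]}
converted = vit_labels_to_fo(sample)
print(converted)
```

Each entry in the result is a dict with a 'label' and a 'confidence' key, the same shape that vit_to_fo produces for the export.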
Now we pass the modified structures to export_images_as_fo_dataset.
fo_dataset = pxt.io.export_images_as_fo_dataset(
    t,
    t.image,
    classifications=vit_to_fo(t.classifications),
    detections=detr_to_fo(t.image, t.detections)
)
session = fo.launch_app(fo_dataset)
4 [36.6ms elapsed, ? remaining, 109.4 samples/s]
INFO:eta.core.utils: 4 [36.6ms elapsed, ? remaining, 109.4 samples/s]
Adding Multiple Label Sets
You can include multiple label sets of the same type in the same dataset by passing a list or dict of expressions to the classifications and/or detections parameters. If a list is specified, default names will be assigned to the label sets; if a dict is specified, the label sets will be named according to its keys.
As an example, let's recompute our detections using the more powerful DETR ResNet-101 model, then load them into the same Voxel51 dataset as the earlier detections so we can compare them side by side.
t.add_column(detections_101=detr_for_object_detection(
    t.image, model_id='facebook/detr-resnet-101'
))
Computing cells: 100%|████████████████████████████████████████████| 4/4 [00:01<00:00, 2.90 cells/s]
Added 4 column values with 0 errors.
UpdateStatus(num_rows=4, num_computed_values=4, num_excs=0, updated_cols=[], cols_with_excs=[])
fo_dataset = pxt.io.export_images_as_fo_dataset(
    t,
    t.image,
    classifications=vit_to_fo(t.classifications),
    detections={
        'detections_50': detr_to_fo(t.image, t.detections),
        'detections_101': detr_to_fo(t.image, t.detections_101)
    }
)
session = fo.launch_app(fo_dataset)
4 [53.9ms elapsed, ? remaining, 74.2 samples/s]
INFO:eta.core.utils: 4 [53.9ms elapsed, ? remaining, 74.2 samples/s]
Exploring the resulting images, we can see that the results are not much different between the two models, at least on our small sample dataset.