Understand when to use bounding boxes versus pixel-level masks for image
analysis.
What’s in this recipe:
- Run object detection to get bounding boxes and labels
- Run panoptic segmentation to get pixel-level masks
- Visualize and compare outputs side-by-side
Problem
You need to analyze objects in images, but there are two approaches: object detection, which returns bounding boxes, and panoptic segmentation, which returns pixel-level masks.
Which should you use? Detection is faster but approximate. Segmentation
is slower but precise.
Solution
Run both approaches on the same images using DETR models and compare the
results.
Setup
%pip install -qU pixeltable torch transformers timm
import numpy as np
import pixeltable as pxt
from pixeltable.functions.huggingface import detr_for_object_detection, detr_for_segmentation
from pixeltable.functions.vision import draw_bounding_boxes, overlay_segmentation
Load images
pxt.drop_dir('detection_vs_seg', force=True)
pxt.create_dir('detection_vs_seg')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘detection_vs_seg’.
<pixeltable.catalog.dir.Dir at 0x145b43f90>
images = pxt.create_table('detection_vs_seg.images', {'image': pxt.Image})
base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
images.insert([
{'image': f'{base_url}/000000000034.jpg'},
{'image': f'{base_url}/000000000049.jpg'},
])
Created table ‘images’.
Inserted 2 rows with 0 errors in 0.22 s (9.21 rows/s)
2 rows inserted.
Run object detection
The detr_for_object_detection function returns bounding boxes, labels,
and confidence scores.
Parameters:
- model_id: DETR variant (facebook/detr-resnet-50 or facebook/detr-resnet-101)
- threshold: Confidence threshold (0.0-1.0). Higher values yield fewer, but more confident, detections.
Output:
{'boxes': [[x1, y1, x2, y2], ...], 'scores': [0.98, ...], 'label_text': ['person', ...]}
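A quick way to get a feel for the threshold parameter is to run the model inline at two settings before committing to a stored column. A sketch (this runs the model twice, and the exact labels depend on your images):
# Compare a loose and a strict confidence cutoff side by side
images.select(
    loose=detr_for_object_detection(
        images.image, model_id='facebook/detr-resnet-50', threshold=0.5
    ).label_text,
    strict=detr_for_object_detection(
        images.image, model_id='facebook/detr-resnet-50', threshold=0.9
    ).label_text,
).collect()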
images.add_computed_column(
detections=detr_for_object_detection(
images.image,
model_id='facebook/detr-resnet-50',
threshold=0.8
)
)
Added 2 column values with 0 errors in 4.09 s (0.49 rows/s)
2 rows updated.
# View detection results
images.select(images.image, images.detections).collect()
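The stored JSON supports path access, so individual fields can be projected directly (the same pattern as images.detections.boxes used below); a minimal sketch:
# Project just the labels and confidence scores from the stored JSON
images.select(
    images.image,
    labels=images.detections.label_text,
    scores=images.detections.scores,
).collect()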
Visualize detections with bounding boxes
Use draw_bounding_boxes to overlay the detection results on the
original image.
images.add_computed_column(
detection_viz=draw_bounding_boxes(
images.image,
boxes=images.detections.boxes,
labels=images.detections.label_text,
fill=True,
width=2
)
)
Added 2 column values with 0 errors in 0.03 s (58.89 rows/s)
2 rows updated.
images.select(images.detection_viz).collect()
Run panoptic segmentation
The detr_for_segmentation function returns pixel-level masks and
segment metadata.
Parameters:
- model_id: Segmentation model (facebook/detr-resnet-50-panoptic)
- threshold: Confidence threshold for filtering segments
Output:
{
'segmentation': np.ndarray, # (H, W) array where each pixel = segment ID
'segments_info': [{'id': 1, 'label_text': 'person', 'score': 0.98}, ...]
}
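Because each pixel in the segmentation array holds a segment ID, per-segment pixel areas reduce to a mask-and-count. A plain-numpy sketch over a single output dict (the function name is illustrative, not part of Pixeltable):
import numpy as np

def segment_areas(seg_output: dict) -> dict:
    # Pixel area per segment, keyed by (segment id, label) so that
    # duplicate labels (e.g. two 'person' segments) stay distinct
    seg_map = np.asarray(seg_output['segmentation'])
    return {
        (info['id'], info['label_text']): int((seg_map == info['id']).sum())
        for info in seg_output['segments_info']
    }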
Note: The full segmentation output contains a numpy array that
can’t be stored as JSON. We store just the segments_info metadata
and compute the pixel-level visualization inline.
# Store just the segments_info (JSON-serializable) as a computed column
# The segmentation array will be computed inline for visualization
seg_expr = detr_for_segmentation(
images.image,
model_id='facebook/detr-resnet-50-panoptic',
threshold=0.5
)
images.add_computed_column(segments_info=seg_expr.segments_info)
# View stored segmentation info
images.select(images.image, images.segments_info).collect()
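If you only need the label names, the stored metadata can be reduced with apply, the same mechanism this recipe uses for counting later; a sketch (assuming apply accepts a lambda when col_type is given):
# Flatten segments_info to a list of labels per image
seg_labels = images.segments_info.apply(
    lambda infos: [s['label_text'] for s in infos], col_type=pxt.Json
)
images.select(images.image, seg_labels=seg_labels).collect()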
Visualize segmentation with colored overlay
Use overlay_segmentation to visualize the pixel masks with colored
regions and contours.
# Compute segmentation visualization inline
# Cast the segmentation array to the proper type for overlay_segmentation
seg_expr = detr_for_segmentation(
images.image,
model_id='facebook/detr-resnet-50-panoptic',
threshold=0.5
)
segmentation_map = seg_expr.segmentation.astype(pxt.Array[(None, None), np.int32])
images.select(
segmentation_viz=overlay_segmentation(
images.image,
segmentation_map,
alpha=0.5,
draw_contours=True,
contour_thickness=2
)
).collect()
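Only the raw segmentation array resists storage; the rendered overlay is an ordinary image, so it can be persisted in a computed column if you'd rather not recompute it per query. A sketch reusing seg_expr and segmentation_map from above (the column name segmentation_viz_stored is arbitrary, chosen not to clash with the inline alias used here):
# Persist the rendered overlay (an image, hence storable) as a column
images.add_computed_column(
    segmentation_viz_stored=overlay_segmentation(
        images.image,
        segmentation_map,
        alpha=0.5,
        draw_contours=True,
        contour_thickness=2
    )
)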
Compare side-by-side
# Side-by-side comparison: original, detection, segmentation
seg_expr = detr_for_segmentation(
images.image,
model_id='facebook/detr-resnet-50-panoptic',
threshold=0.5
)
segmentation_map = seg_expr.segmentation.astype(pxt.Array[(None, None), np.int32])
images.select(
images.image,
images.detection_viz,
segmentation_viz=overlay_segmentation(
images.image,
segmentation_map,
alpha=0.5,
draw_contours=True,
contour_thickness=2
)
).collect()
Count objects per image
# Count objects per image (using stored columns)
images.select(
images.image,
num_detections=images.detections.boxes.apply(len, col_type=pxt.Int),
num_segments=images.segments_info.apply(len, col_type=pxt.Int)
).collect()
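The same count expression also works as a filter; a sketch keeping only images with more than one detected object:
# Filter rows on the computed detection count
num_objects = images.detections.boxes.apply(len, col_type=pxt.Int)
images.where(num_objects > 1).select(images.image).collect()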
Explanation
Detection gives fast, approximate locations. Segmentation gives slower
but precise boundaries.
Capability comparison
- Output: detection returns bounding boxes, labels, and confidence scores; segmentation returns a pixel-level mask plus per-segment metadata
- Boundary precision: detection boxes are approximate rectangles; segmentation masks are pixel-perfect
- Speed: detection is roughly 2x faster than segmentation
When to use each
Choose detection when:
- You need to know what objects are present and where
(approximately)
- Speed matters (detection is 2x faster)
- You need search, filtering, or counting
- Bounding boxes suffice for visualization
Choose segmentation when:
- You need exact object boundaries (pixel-perfect masks)
- You’re doing image editing, compositing, or AR
- You need to measure actual object area/coverage
- You want scene composition analysis (what % is sky vs buildings)
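For the coverage and composition questions above, the mask-and-count trick from the segment_areas sketch extends directly to fractions of the whole image (plain numpy, one output dict; names are illustrative):
import numpy as np

def scene_composition(seg_output: dict) -> dict:
    # Fraction of total pixels covered by each segment
    seg_map = np.asarray(seg_output['segmentation'])
    total = seg_map.size
    return {
        (info['id'], info['label_text']): float((seg_map == info['id']).sum()) / total
        for info in seg_output['segments_info']
    }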