Understand when to use bounding boxes versus pixel-level masks for image analysis.

What’s in this recipe:
  • Run object detection to get bounding boxes and labels
  • Run panoptic segmentation to get pixel-level masks
  • Visualize and compare outputs side-by-side

Problem

You need to analyze objects in images, but there are two approaches: object detection, which returns bounding boxes, and panoptic segmentation, which returns pixel-level masks.
Which should you use? Detection is faster but approximate; segmentation is slower but precise.

Solution

Run both approaches on the same images using DETR models and compare the results.

Setup

%pip install -qU pixeltable torch transformers timm
import numpy as np

import pixeltable as pxt
from pixeltable.functions.huggingface import detr_for_object_detection, detr_for_segmentation
from pixeltable.functions.vision import draw_bounding_boxes, overlay_segmentation

Load images

pxt.drop_dir('detection_vs_seg', force=True)
pxt.create_dir('detection_vs_seg')
Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory ‘detection_vs_seg’.
<pixeltable.catalog.dir.Dir at 0x145b43f90>
images = pxt.create_table('detection_vs_seg.images', {'image': pxt.Image})

base_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images'
images.insert([
    {'image': f'{base_url}/000000000034.jpg'},
    {'image': f'{base_url}/000000000049.jpg'},
])
Created table ‘images’.
Inserted 2 rows with 0 errors in 0.22 s (9.21 rows/s)
2 rows inserted.

Run object detection

The detr_for_object_detection function returns bounding boxes, labels, and confidence scores. Parameters:
  • model_id: DETR variant (facebook/detr-resnet-50 or facebook/detr-resnet-101)
  • threshold: Confidence threshold (0.0-1.0). Higher = fewer but more confident detections
Output:
{'boxes': [[x1, y1, x2, y2], ...], 'scores': [0.98, ...], 'label_text': ['person', ...]}
images.add_computed_column(
    detections=detr_for_object_detection(
        images.image,
        model_id='facebook/detr-resnet-50',
        threshold=0.8
    )
)
Added 2 column values with 0 errors in 4.09 s (0.49 rows/s)
2 rows updated.
# View detection results
images.select(images.image, images.detections).collect()
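Because the detection output is stored as JSON, you can pull out individual fields with path expressions; this is the same pattern the visualization step below uses for boxes and label_text. A minimal sketch:
# Select individual fields of the detection output via JSON path expressions
images.select(
    images.detections.label_text,
    images.detections.scores
).collect()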

Visualize detections with bounding boxes

Use draw_bounding_boxes to overlay the detection results on the original image.
images.add_computed_column(
    detection_viz=draw_bounding_boxes(
        images.image,
        boxes=images.detections.boxes,
        labels=images.detections.label_text,
        fill=True,
        width=2
    )
)
Added 2 column values with 0 errors in 0.03 s (58.89 rows/s)
2 rows updated.
images.select(images.detection_viz).collect()
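To see how the threshold parameter described above changes the output, you can evaluate the detector inline at a looser cutoff without adding a column. A hedged sketch; the 0.5 value is arbitrary, chosen only for illustration:
# Compare labels at a loose (0.5) cutoff vs. the stored strict (0.8) detections.
# Lower thresholds admit more, but less confident, detections.
loose = detr_for_object_detection(
    images.image,
    model_id='facebook/detr-resnet-50',
    threshold=0.5
)
images.select(
    strict_labels=images.detections.label_text,
    loose_labels=loose.label_text
).collect()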

Run panoptic segmentation

The detr_for_segmentation function returns pixel-level masks and segment metadata. Parameters:
  • model_id: Segmentation model (facebook/detr-resnet-50-panoptic)
  • threshold: Confidence threshold for filtering segments
Output:
{
    'segmentation': np.ndarray,  # (H, W) array where each pixel = segment ID
    'segments_info': [{'id': 1, 'label_text': 'person', 'score': 0.98}, ...]
}
Note: The full segmentation output contains a numpy array that can’t be stored as JSON. We store just the segments_info metadata and compute the pixel-level visualization inline.
# Store just the segments_info (JSON-serializable) as a computed column
# The segmentation array will be computed inline for visualization
seg_expr = detr_for_segmentation(
    images.image,
    model_id='facebook/detr-resnet-50-panoptic',
    threshold=0.5
)

images.add_computed_column(segments_info=seg_expr.segments_info)
# View stored segmentation info
images.select(images.image, images.segments_info).collect()
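If you only need the segment labels, you can project them out of segments_info with apply, the same mechanism the counting step at the end of this recipe uses; the lambda below is a sketch:
# Extract just the label names from each image's segment metadata
images.select(
    seg_labels=images.segments_info.apply(
        lambda segs: [s['label_text'] for s in segs],
        col_type=pxt.Json
    )
).collect()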

Visualize segmentation with colored overlay

Use overlay_segmentation to visualize the pixel masks with colored regions and contours.
# Compute segmentation visualization inline
# Cast the segmentation array to the proper type for overlay_segmentation
seg_expr = detr_for_segmentation(
    images.image,
    model_id='facebook/detr-resnet-50-panoptic',
    threshold=0.5
)
segmentation_map = seg_expr.segmentation.astype(pxt.Array[(None, None), np.int32])

images.select(
    segmentation_viz=overlay_segmentation(
        images.image,
        segmentation_map,
        alpha=0.5,
        draw_contours=True,
        contour_thickness=2
    )
).collect()

Compare side-by-side

# Side-by-side comparison: original, detection, segmentation
seg_expr = detr_for_segmentation(
    images.image,
    model_id='facebook/detr-resnet-50-panoptic',
    threshold=0.5
)
segmentation_map = seg_expr.segmentation.astype(pxt.Array[(None, None), np.int32])

images.select(
    images.image,
    images.detection_viz,
    segmentation_viz=overlay_segmentation(
        images.image,
        segmentation_map,
        alpha=0.5,
        draw_contours=True,
        contour_thickness=2
    )
).collect()

Count objects per image

# Count objects per image (using stored columns)
images.select(
    images.image,
    num_detections=images.detections.boxes.apply(len, col_type=pxt.Int),
    num_segments=images.segments_info.apply(len, col_type=pxt.Int)
).collect()
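The same counts can also drive filtering. A minimal sketch using the standard where()/select() pattern; the cutoff of one object is arbitrary:
# Keep only images with more than one detected object
images.where(
    images.detections.boxes.apply(len, col_type=pxt.Int) > 1
).select(images.image, images.detections.label_text).collect()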

Explanation

Detection gives fast, approximate object locations. Segmentation is slower but yields precise, pixel-level boundaries.

Capability comparison

  • Detection (detr_for_object_detection): bounding boxes, class labels, and confidence scores; approximate object locations that are easy to store and query as JSON
  • Segmentation (detr_for_segmentation): a per-pixel map of segment IDs plus segments_info metadata; exact boundaries, including background regions such as sky

Performance tradeoffs

  • Detection runs roughly 2x faster than panoptic segmentation on the same images
  • The segmentation map is an (H, W) array that can’t be stored as JSON, so its visualization is computed inline rather than stored

When to use each

Choose detection when:
  • You need to know what objects are present and where (approximately)
  • Speed matters (detection is roughly 2x faster)
  • You need search, filtering, or counting
  • Bounding boxes suffice for visualization
Choose segmentation when:
  • You need exact object boundaries (pixel-perfect masks)
  • You’re doing image editing, compositing, or AR
  • You need to measure actual object area/coverage (see the sketch after this list)
  • You want scene composition analysis (what % is sky vs buildings)
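As a sketch of that kind of coverage analysis, the UDF below computes the fraction of pixels each segment label occupies. It assumes the segmentation_map expression defined in the comparison step above; the UDF name and signature are illustrative, not part of the Pixeltable API:
# Illustrative UDF: fraction of pixels covered by each segment label.
# 'seg' arrives as a NumPy array of segment IDs; 'segments' as the
# segments_info list produced by detr_for_segmentation.
@pxt.udf
def label_coverage(seg: pxt.Array[(None, None), np.int32], segments: list[dict]) -> dict:
    total = seg.size
    return {
        s['label_text']: float((seg == s['id']).sum()) / total
        for s in segments
    }

images.select(
    coverage=label_coverage(segmentation_map, images.segments_info)
).collect()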
