Segmentation of Road Scenes

Semantic Segmentation of Road Scenes to support reasoning for autonomous driving

Category: ROB 535 – Self Driving Cars
Services: Autonomy; Self-Driving; Computer Vision
Client: Course Final Project
Year: Fall 2024
Date: May 15, 2024

🧠 Mask2Former Universal Segmentation Evaluation

Evaluating the Performance on KITTI and SYNTHIA Datasets

By Andrew Albano, Saket Pradhan, Utkrisht Sahai
GitHub: Mask2FormerImageSegementation

📄 Overview

This project investigates the generalization capabilities of Mask2Former, a universal image segmentation architecture, by evaluating its performance on KITTI and SYNTHIA datasets. While Mask2Former has proven competitive on standard datasets like COCO, Cityscapes, and ADE20K, this study expands the evaluation to new domains—specifically real-world urban and synthetic scenes.
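Mask2Former treats all segmentation tasks as mask classification: the model predicts a fixed set of (mask, class) pairs, and a per-pixel semantic map is recovered by weighting each mask by its class probabilities. The sketch below illustrates that combination step under simplifying assumptions (no "no-object" class, NumPy instead of the actual PyTorch post-processing); the function name `semantic_map` is ours, not from the Mask2Former codebase.

```python
import numpy as np

def semantic_map(mask_logits, class_logits):
    """Combine mask-classification outputs into a per-pixel semantic map.

    mask_logits:  (N, H, W) per-query mask logits
    class_logits: (N, C) per-query class scores
    Returns an (H, W) array of class ids.
    """
    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))   # sigmoid per pixel
    class_probs = softmax(class_logits, axis=-1)      # (N, C)
    # Per-pixel class score: sum over queries of class prob * mask prob.
    scores = np.einsum("nc,nhw->chw", class_probs, mask_probs)
    return scores.argmax(axis=0)
```

Because the same outputs can be post-processed into semantic, instance, or panoptic predictions, a single architecture covers all three tasks, which is what "universal" refers to.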

🧰 Architecture

The evaluation centers on Mask2Former, which builds upon MaskFormer by incorporating:

  • Masked attention in the Transformer decoder

  • Multi-scale high-res features for better small object segmentation

  • Removal of dropout, and use of learnable query features

These improvements target segmentation quality and computational efficiency without altering training procedures or loss functions.
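The key change, masked attention, restricts each query's cross-attention to the pixels inside its own predicted foreground mask rather than the full feature map. A minimal single-head NumPy sketch of that idea (the function name and shapes are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def masked_attention(queries, keys, values, mask):
    """Single-head masked attention: each query attends only to pixel
    locations where its mask is True; background pixels are suppressed
    by setting their scores to a large negative value before softmax.

    queries: (Q, d)   keys/values: (P, d)   mask: (Q, P) boolean
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)        # (Q, P) attention scores
    scores = np.where(mask, scores, -1e9)         # mask out background pixels
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values                       # (Q, d) attended features
```

Constraining attention to the predicted mask focuses each query on its object and, per the Mask2Former paper, also speeds up convergence relative to full cross-attention.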

📊 Datasets Used

| Dataset | Type | Key Features |
| --- | --- | --- |
| KITTI | Real-world | Urban driving scenes, semantic & instance labels |
| SYNTHIA | Synthetic | Diverse environmental/weather conditions |
| COCO | Benchmark | General-purpose object segmentation |
| Cityscapes | Benchmark | Urban street scenes |
| ADE20K | Benchmark | Diverse visual environments |

🧪 Methodology

  • Used pre-trained Mask2Former model

  • Fine-tuned on KITTI and SYNTHIA without altering architecture or training procedure

  • Evaluated with mIoU (Mean Intersection over Union) and qualitative visualization

  • Focused on small object segmentation performance
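The mIoU metric used above averages per-class intersection-over-union across semantic classes. A minimal sketch of the computation (real evaluation scripts for KITTI/SYNTHIA typically also handle an "ignore" label, which is omitted here):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union over classes present in pred or gt.

    pred, gt: integer arrays of per-pixel class ids (same shape).
    """
    ious = []
    for c in range(num_classes):
        pred_c, gt_c = pred == c, gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:                      # class absent everywhere: skip
            continue
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```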

📈 Results

| Dataset | mIoU (%) | Notes |
| --- | --- | --- |
| Cityscapes | 62 | ~5,000 finely annotated images: 2,975 train, 500 val, 1,525 test (only train + val have public labels) |
| COCO | 61 | ~164,000 images: ~118k train, ~5k val, ~41k test; supports panoptic, semantic, and instance segmentation |
| ADE20K | 58 | ~25,000 images: 20k train, 2k val, 3k test; wide variety of scenes |
| KITTI | 55 | ~200 labeled images for segmentation; limited semantic labels, used mostly for depth/object detection |
| SYNTHIA | 49 | ~9,400 images (SYNTHIA-RAND-CITYSCAPES subset); synthetic, diverse weather/time conditions |

  • Generalization: Strong performance on KITTI/SYNTHIA despite no architectural tuning

  • Challenge: Small object segmentation remains a weak point

🎯 Key Insights

  • Robust across domains: Performs well on both real and synthetic data

  • Limitations: Misses small-scale details in crowded scenes

  • Future Work: Improve multi-scale feature extraction, explore domain adaptation

📷 Visual Results

Visual comparisons with ground truth for KITTI images are provided to highlight:

  • Effective large object segmentation

  • Missed small-scale details

  • Color mismatches due to post-processing (not misclassification)

📚 References

Key citations include works on Mask2Former, Mask R-CNN, semantic/instance/panoptic segmentation, and transformer-based vision models (see full paper for full reference list).

📬 Contact

For questions or collaborations, reach out via the GitHub repo or email any of the authors:

Email: saketp@umich.edu

© Copyright 2025