🧠 Mask2Former Universal Segmentation Evaluation
Evaluating the Performance on KITTI and SYNTHIA Datasets
By Andrew Albano, Saket Pradhan, Utkrisht Sahai
GitHub: Mask2FormerImageSegementation
📄 Overview
This project investigates the generalization capabilities of Mask2Former, a universal image segmentation architecture, by evaluating its performance on KITTI and SYNTHIA datasets. While Mask2Former has proven competitive on standard datasets like COCO, Cityscapes, and ADE20K, this study expands the evaluation to new domains—specifically real-world urban and synthetic scenes.
🧰 Architecture
The evaluation centers on Mask2Former, which builds upon MaskFormer by incorporating:
Masked attention in the Transformer decoder
Multi-scale high-resolution features for better small-object segmentation
Removal of dropout and use of learnable query features
These improvements target segmentation quality and computational efficiency without altering training procedures or loss functions.
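To make the masked-attention idea concrete, here is a minimal NumPy sketch: cross-attention scores are computed as usual, but each query may only attend within its predicted foreground mask, with a fallback to full attention when a predicted mask is empty. The function name, shapes, and the `-1e9` stand-in for negative infinity are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def masked_attention(queries, keys, values, mask):
    # queries: (Q, d), keys: (N, d), values: (N, dv)
    # mask: (Q, N) boolean, True where a query may attend
    # (i.e. inside that query's predicted foreground region).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    # Queries whose predicted mask is empty fall back to full attention,
    # so the softmax below never divides by zero.
    empty = ~mask.any(axis=1, keepdims=True)
    mask = mask | empty
    # Large negative value stands in for -inf in the additive mask.
    scores = np.where(mask, scores, -1e9)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values
```

Restricting attention this way focuses each query on its own predicted region, which is the mechanism Mask2Former credits for faster convergence over standard cross-attention.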
📊 Datasets Used
| Dataset | Type | Key Features |
|---|---|---|
| KITTI | Real-world | Urban driving scenes, semantic & instance labels |
| SYNTHIA | Synthetic | Diverse environmental/weather conditions |
| COCO | Benchmark | General-purpose object segmentation |
| Cityscapes | Benchmark | Urban street scenes |
| ADE20K | Benchmark | Diverse visual environments |
🧪 Methodology
Used pre-trained Mask2Former model
Fine-tuned on KITTI and SYNTHIA without altering architecture or training procedure
Evaluated with mIoU (Mean Intersection over Union) and qualitative visualization
Focused on small object segmentation performance
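The mIoU metric used above can be sketched as follows: per-class intersection over union, averaged over the classes present, with an ignore label excluded. This is a minimal reference implementation for illustration; the helper name and the `ignore_index=255` convention are assumptions (255 is the common void label in Cityscapes-style annotations), not this project's exact evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    # pred, target: integer label arrays of the same shape.
    valid = target != ignore_index  # drop void/unlabeled pixels
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip, don't count as 0
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```

Because the score averages over classes rather than pixels, rare small-object classes weigh as much as large ones, which is why weak small-object segmentation shows up clearly in the mIoU numbers below.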
📈 Results
| Dataset | mIoU Score | Notes |
|---|---|---|
| Cityscapes | 62 | ~5,000 finely annotated images (2,975 train / 500 val / 1,525 test; only train and val have labels) |
| COCO | 61 | ~164,000 images (~118k train / ~5k val / ~41k test); supports panoptic, semantic, and instance segmentation |
| ADE20K | 58 | ~25,000 images (20k train / 2k val / 3k test); wide variety of scenes |
| KITTI | 55 | ~200 labeled images for segmentation; limited semantic labels, as the dataset is used mostly for depth/object detection |
| SYNTHIA | 49 | ~9,400 images (SYNTHIA-RAND-CITYSCAPES subset); synthetic, with diverse weather/time conditions |
Generalization: Strong performance on KITTI/SYNTHIA despite no architectural tuning
Challenge: Small object segmentation remains a weak point
🎯 Key Insights
Robust across domains: Performs well on both real and synthetic data
Limitations: Misses small-scale details in crowded scenes
Future Work: Improve multi-scale feature extraction, explore domain adaptation
📷 Visual Results
Visual comparisons with ground truth for KITTI images are provided to highlight:
Effective large object segmentation
Missed small-scale details
Color mismatches due to post-processing (not misclassification)
📚 References
Key citations include works on Mask2Former, Mask R-CNN, semantic/instance/panoptic segmentation, and transformer-based vision models (see full paper for full reference list).
📬 Contact
For questions or collaborations, reach out via the GitHub repo or email any of the authors:
Andrew Albano – aalbano@umich.edu
Saket Pradhan – saketp@umich.edu
Utkrisht Sahai – usahai@umich.edu