Autonomous System Engineer
Perception | Sensor Fusion (Camera-LiDAR) | SLAM
My work is focused on 2D and 3D perception, sensor fusion (LiDAR-camera), and localization & mapping techniques, with applications in Autonomous Mobile Robots (AMRs) and Advanced Driver Assistance Systems (ADAS).
My career began with a focus on advancing the perception capabilities of Autonomous Mobile Robots (AMRs). As it evolved, I broadened my skills by working on LiDAR and camera-based sensor fusion for Advanced Driver Assistance Systems (ADAS). Currently, I am expanding my expertise in Simultaneous Localization and Mapping (SLAM) technologies to achieve long-term autonomy in autonomous systems.
With my experience in AMR and ADAS projects, I divide my work into three core areas:
Egocentric Robust Perception
Sensor Fusion (LiDAR-Camera)
SLAM
ADAS & AMR: Extensive experience working with LiDAR-camera-based sensor fusion for vehicles in ADAS, as well as enhancing the perception capabilities of Autonomous Mobile Robots (AMRs).
Artificial Intelligence & Perception: Work experience with Deep Learning, Computer Vision, and Metric Learning for:
2D Vision: Object Detection, Recognition, Tracking & Segmentation
Edge Deployment: Jetson Xavier NX
3D Vision: Object Detection & Recognition, Point Cloud Processing
Sensor Fusion & Calibration – Work experience in multi-sensor (camera–LiDAR) fusion-based object recognition and tracking, including calibration for robust perception pipelines in real-world autonomous driving systems.
SLAM & Localization – Practical experience with visual odometry, LiDAR odometry, loop closure, bundle adjustment, and graph-based SLAM (g2o optimization) using ROS2.
Data Collection & Model Optimization – Work experience in acquiring, preprocessing, and managing large-scale datasets, as well as optimizing models for robustness and efficiency.
Leadership & Collaboration – Experienced in leading teams, managing projects from concept to deployment, and aligning technical goals with organizational objectives.
Technical Writing & Knowledge Sharing – Active in writing technical blogs, sharing insights on 3D Vision, LiDAR, camera systems, and sensor fusion technologies, with SLAM-related posts coming soon.
My research interests are:
Robotics, Autonomous Vehicles
Artificial Intelligence, Deep Learning, Computer Vision, Machine Learning, Metric Learning
Object detection, Recognition, Tracking, Segmentation, 3D Vision, Sensor Fusion
SLAM, Reinforcement Learning, Navigation & Control
I am passionate about mobile robotics and autonomous driving systems, with a strong focus on expanding my expertise from sensor fusion into full SLAM, navigation, and control systems. Beyond my professional work, I actively dedicate time to hands-on projects and self-directed learning to build a comprehensive, end-to-end expertise in the entire robotic autonomy stack.
My goal is to become an expert in full-stack robotics, mastering every part of a robot's autonomy, from how it sees the world to how it moves through it.
Ph.D. (Computer Engineering)
2017-2023
Department of Electrical and Computer Engineering, Sungkyunkwan University
South Korea
MSCS (Computer Science)
2014-2016
Department of Computer Science,
COMSATS University
Pakistan
BSIT (Information Technology, Computer Science)
2009-2013
Department of Computer Science,
Virtual University
Pakistan
Position:
Robotic Software Research & Development Engineer
Company:
Creative Algorithms and Sensor Evolution Laboratory
Duration:
May 2021 - November 2023
From May 2021 to November 2023, I worked as a Robotic Software Research and Development Engineer at Creative Algorithms and Sensor Evolution Laboratory (CASELab), South Korea. My responsibilities were three-fold:
Developing AI vision-based perception models for robotic applications using deep learning,
Optimizing models for edge deployment, particularly on the NVIDIA Jetson Xavier NX platform,
Publishing research outcomes in SCIE-indexed journals, as well as international and national conferences and workshops.
Over the course of 2 years and 7 months at CASELab, I gained extensive industrial experience in deep learning-based perception for human-robot interaction (HRI). My work included projects such as:
Semantic perception for service robots
Multi-floor robot navigation using elevator button recognition
Human-following robot
During this time, I contributed to a wide range of computer vision and AI tasks—including object detection, classification, recognition, tracking, and segmentation—as well as face detection and recognition, gesture recognition, and voice command recognition, with a primary focus on robotic applications. I also worked on deploying these models efficiently on edge devices for real-time robotic perception.
Position:
Team Leader
(Infra Team / Autonomous Driving)
Company:
Youngshin Co., Ltd.
Duration:
December 2023 - Present
Since December 2023, I have been working as the Team Leader for the Autonomous Driving project at Youngshin Corporation, South Korea. My key responsibilities include:
Developing AI-based vision and perception models for autonomous driving using deep learning,
Working on multi-sensor fusion,
Designing and developing AI solutions in line with the company’s patent applications.
At Youngshin, I have further expanded my expertise in sensor fusion, camera–LiDAR calibration, projection, and 3D vision. These skills have been applied to real-world autonomous driving systems for object recognition and tracking, as well as lane generation and vehicle position estimation, enabling robust perception for intelligent navigation.
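To make the projection step concrete, the sketch below projects LiDAR points into a camera image given an intrinsic matrix and a LiDAR-to-camera extrinsic transform. It is a minimal, hypothetical example: the calibration values are placeholders, not the parameters of the system described above.

```python
import numpy as np

def project_lidar_to_image(points_lidar, K, T_cam_lidar):
    """Project Nx3 LiDAR points into pixel coordinates.

    points_lidar : (N, 3) XYZ points in the LiDAR frame
    K            : (3, 3) camera intrinsic matrix
    T_cam_lidar  : (4, 4) extrinsic transform (LiDAR frame -> camera frame)
    Returns (M, 2) pixel coordinates and the indices of the points kept.
    """
    # Homogeneous coordinates, then transform into the camera frame
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera (positive depth)
    front = pts_cam[:, 2] > 0.1
    pts_cam = pts_cam[front]

    # Perspective projection with the intrinsics, then divide by depth
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, np.where(front)[0]

if __name__ == "__main__":
    # Placeholder calibration for illustration only
    K = np.array([[700.0, 0.0, 640.0],
                  [0.0, 700.0, 360.0],
                  [0.0, 0.0, 1.0]])
    T_cam_lidar = np.eye(4)  # identity extrinsics as a stand-in
    points = np.random.uniform(-10.0, 10.0, size=(1000, 3))
    uv, kept = project_lidar_to_image(points, K, T_cam_lidar)
    print(uv.shape[0], "points projected")
```

In the real pipeline, the extrinsics come from camera–LiDAR calibration, and the projected points are associated with image detections for fusion.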
During my time at Youngshin, I started writing blogs about the new technologies I work with, covering LiDAR and different types of cameras for sensor-fusion-based object recognition and tracking in ADAS.
Sensor Fusion (LiDAR and Camera) based Marker Recognition and Tracking
Lane Line Generation & Vehicle Position Estimation using Sensor Fusion based Marker Recognition and Tracking
Vehicle Position: First Lane
Vehicle Position: Lane Crossing
Vehicle Position: Second Lane
I am passionate about expanding my expertise beyond sensor fusion into SLAM, and I am actively building hands-on experience to support this transition. I have worked extensively on feature detection and matching, visual odometry, LiDAR odometry, bundle adjustment, loop closure, and graph SLAM with g2o optimization. For example, in one project I integrated these components to evaluate improvements in loop closure detection, applying g2o-based optimization across multiple datasets using ROS2 (Humble) bag files.
A selection of my SLAM-related work, demonstrating these methods in practice, is presented below.
Implemented a visual odometry pipeline for keypoint detection and feature matching (a minimal front-end sketch follows this list).
Developed a dynamic trajectory plotter that expands the visualization canvas automatically to avoid clipping as the trajectory extends.
Enabled keypoint visualization alongside trajectory for detailed analysis using KITTI sequence 08
Estimated (green color) and ground truth (white color) trajectories with average error feedback
Logged pose data for evaluation
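The snippet below is a minimal sketch of the front end of such a pipeline, using ORB keypoint detection, brute-force matching, and essential-matrix-based relative pose recovery in OpenCV. The intrinsics and image paths are hypothetical placeholders; this is an illustrative sketch rather than the exact implementation summarized above.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate the relative camera pose between two grayscale frames."""
    orb = cv2.ORB_create(2000)                      # detect up to 2000 ORB keypoints
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Brute-force Hamming matching with cross-check for reliable correspondences
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Essential matrix with RANSAC, then recover rotation and unit-scale translation
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t, matches

if __name__ == "__main__":
    # Placeholder KITTI-like intrinsics; replace with the calibrated matrix
    K = np.array([[718.856, 0.0, 607.19],
                  [0.0, 718.856, 185.22],
                  [0.0, 0.0, 1.0]])
    f1 = cv2.imread("frame_000000.png", cv2.IMREAD_GRAYSCALE)  # hypothetical paths
    f2 = cv2.imread("frame_000001.png", cv2.IMREAD_GRAYSCALE)
    R, t, matches = relative_pose(f1, f2, K)
    print("matches:", len(matches))
    print("R:", R)
    print("t:", t.ravel())
```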
Blue arrows represent LiDAR odometry
Blue points represent all point clouds accumulated in the map
Green points represent the odometry
Yellow points represent loop closures
Blue points represent all point clouds accumulated in the map
Green points represent the odometry
Logged poses to the graph
The graph optimization pipeline is evaluated in three stages (an illustrative pose-graph sketch follows the list):
1: Initial Graph Construction
Added 57 pose vertices (green) with 56 odometry edges.
Incorporated 172 landmark vertices (blue) and 559 observation edges, resulting in 229 vertices and 615 edges.
2: Optimization
Graph structure remains the same (229 vertices, 615 edges).
Positions of poses & landmarks are adjusted to minimize error.
Since only vertex positions are updated, no new log output is generated.
3: Ground Truth Evaluation
Added 57 ground-truth poses (yellow) with 56 edges.
Final graph: 286 vertices, 671 edges.
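To illustrate the principle behind this pipeline without depending on g2o, here is a small, self-contained SE(2) pose-graph example using SciPy least squares: pose vertices are connected by noisy odometry edges plus one loop-closure edge, and optimization pulls the drifted chain back into consistency. The trajectory, noise levels, and edge values are invented for illustration and do not correspond to the graphs described above.

```python
import numpy as np
from scipy.optimize import least_squares

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def relative_pose(pi, pj):
    """SE(2) pose of pj expressed in the frame of pi: [dx, dy, dtheta]."""
    xi, yi, ti = pi
    xj, yj, tj = pj
    c, s = np.cos(ti), np.sin(ti)
    dx, dy = xj - xi, yj - yi
    return np.array([c * dx + s * dy, -s * dx + c * dy, wrap(tj - ti)])

def residuals(flat_poses, edges, anchor):
    poses = flat_poses.reshape(-1, 3)
    res = [poses[0] - anchor]                      # gauge constraint: pin the first pose
    for i, j, z in edges:
        err = relative_pose(poses[i], poses[j]) - z
        err[2] = wrap(err[2])
        res.append(err)
    return np.concatenate(res)

# Odometry edges: drive 1 m, turn 45 degrees, eight times (a closed octagon), with noise
rng = np.random.default_rng(0)
step = np.array([1.0, 0.0, np.pi / 4])
odometry = [(k, k + 1, step + rng.normal(0.0, 0.03, 3)) for k in range(8)]

# One loop-closure edge: pose 8 should coincide with pose 0
edges = odometry + [(8, 0, np.zeros(3))]

# Initial guess: integrate the noisy odometry (this trajectory drifts)
init = [np.zeros(3)]
for _, _, z in odometry:
    x, y, t = init[-1]
    c, s = np.cos(t), np.sin(t)
    init.append(np.array([x + c * z[0] - s * z[1],
                          y + s * z[0] + c * z[1],
                          wrap(t + z[2])]))
init = np.vstack(init)

sol = least_squares(residuals, init.ravel(), args=(edges, init[0]))
optimized = sol.x.reshape(-1, 3)
print("loop-closure gap before:", np.linalg.norm(init[8, :2] - init[0, :2]))
print("loop-closure gap after: ", np.linalg.norm(optimized[8, :2] - optimized[0, :2]))
```

The project itself builds the analogous structure (pose and landmark vertices, odometry and observation edges) and optimizes it with g2o over SE(3).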
Track Single Target Object using Kalman Filter
Multi-Object Tracking using Kalman Filter
Track Single Target despite occlusion - Kalman Filter
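As a minimal sketch of the idea behind these demos, the snippet below runs an OpenCV constant-velocity Kalman filter over a stream of 2D detections: it predicts every frame and corrects only when a detection is available, so the track coasts through short occlusions. The detections are simulated placeholders; the demos above pair the filter with a real detector and, for multi-object tracking, data association.

```python
import numpy as np
import cv2

def make_kf(dt=1.0):
    """Constant-velocity Kalman filter over state [x, y, vx, vy] with [x, y] measurements."""
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                    [0, 1, 0, dt],
                                    [0, 0, 1,  0],
                                    [0, 0, 0,  1]], dtype=np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], dtype=np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
    kf.errorCovPost = np.eye(4, dtype=np.float32)
    return kf

kf = make_kf()

# Simulated detections: the detector loses the target (None) during an occlusion
detections = [(10, 10), (12, 11), (14, 12), None, None, (20, 15), (22, 16)]

for z in detections:
    prediction = kf.predict()                      # predict every frame
    if z is not None:                              # correct only when a detection exists
        measurement = np.array([[z[0]], [z[1]]], dtype=np.float32)
        kf.correct(measurement)
        state = kf.statePost
    else:
        state = prediction                         # coast through the occlusion
    x, y = state.ravel()[:2]
    print(f"estimated position: ({x:.1f}, {y:.1f})")
```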
I possess a robust and well-rounded technical skill set, gained through practical experience with a diverse range of tools, libraries, and frameworks critical to robotics, autonomous systems, and machine learning applications. My expertise spans middleware, simulation environments, programming languages, and both 2D and 3D perception frameworks, enabling me to design, develop, and optimize complex intelligent systems. My technical stack includes the following:
Software Experience
I have extensive experience working with advanced sensor platforms and hardware critical for autonomous driving and robotics applications, including various LiDARs, cameras, and edge computing platforms. My work has focused on the following key hardware components for autonomous systems:
Hardware Experience
# 1: Year 2025
Distance-Adaptive Sensor Fusion to Enhance 2D–3D Object Localization for Perception in Autonomous Driving Systems
Type: Conference Article
Accepted in:
ICRCV - International Conference on Robotics and Computer Vision
Abstract: Sensor fusion plays a pivotal role in the perception of autonomous robots and self-driving vehicles, enabling accurate object localization by combining the rich visual cues from cameras with the precise depth information from LiDAR. However, traditional fusion methods, namely early and late fusion, struggle with noise amplification, weak spatial alignment, and limited adaptability across varying distances. We propose a distance-adaptive fusion strategy that integrates 2D and 3D bounding boxes using a weighted averaging mechanism based on object range to address these limitations. This method improves both spatial precision and detection reliability by balancing the visual and depth data contributions according to their relative strengths over distance. Evaluation on the KITTI dataset shows that our method achieves: (1) 68% lower short-range error (2.8-23.7%) vs. early fusion, (2) 0% mid-range error without late fusion’s 25% fragmentation, and (3) 1.8% long-range error where early fusion fails. With consistent 2.8-12.3% errors (48-85% better than early fusion’s worst cases) and 33% higher recall, it outperforms both approaches.
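The toy snippet below sketches only the general idea of range-weighted fusion, not the published implementation: the camera-derived and LiDAR-derived position estimates of a matched object are blended with a camera weight that decays as range grows. The thresholds and weights are illustrative placeholders.

```python
import numpy as np

def fuse_position(cam_xyz, lidar_xyz, distance,
                  near=10.0, far=40.0, w_cam_near=0.7, w_cam_far=0.2):
    """Blend camera- and LiDAR-derived 3D object positions by range.

    The camera weight decays linearly from w_cam_near to w_cam_far between the
    'near' and 'far' distance thresholds (all values are illustrative placeholders).
    """
    alpha = np.clip((distance - near) / (far - near), 0.0, 1.0)
    w_cam = (1.0 - alpha) * w_cam_near + alpha * w_cam_far
    w_lidar = 1.0 - w_cam
    return w_cam * np.asarray(cam_xyz) + w_lidar * np.asarray(lidar_xyz)

# Example: the same matched object reported at 8 m, 25 m, and 60 m
for d in (8.0, 25.0, 60.0):
    fused = fuse_position(cam_xyz=[d, 0.5, 0.0], lidar_xyz=[d + 0.3, 0.4, 0.1], distance=d)
    print(f"range {d:>5.1f} m -> fused position {np.round(fused, 2)}")
```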
# 2: Year 2023
Edge Deployment of Vision-based Model for Human-Following Robot
Type: Conference Article
Published in:
ICCAS - International Conference on Control, Automation & Systems
Abstract: Mobile robots are proliferating at a significant pace and the continuous interaction between humans and robots opens the doors to facilitate our daily life activities. Following the target person with the robot is an important human-robot interaction (HRI) task that leads to its applications in industrial, domestic, and medical assistant robots. To implement the robotic tasks, traditional solutions rely on cloud servers that cause significant communication overhead due to data offloading. In our work, we overcome this potential issue of cloud-based solutions by implementing the task of a human-following robot (HFR) on the Nvidia Jetson Xavier NX edge platform. To perform the HFR task, typical approaches track the target person only from behind. In contrast, our work allows the robot to track the person from behind, front, and side views (left & right). In this article, we combine the latest advances of deep learning and metric learning by presenting two trackers: the Single Person Head Detection-based Tracking (SPHDT) model and the Single Person full-Body Detection-based Tracking (SPBDT) model. For both models, we leverage a deep learning-based single object detector called MobileNetSSD with a metric learning-based re-identification model, DaSiamRPN. We perform the qualitative analysis considering six major environmental factors: pose change, illumination variations, partial occlusion, full occlusion, wall corner, and different viewing angles. Based on the better performance of SPBDT compared to SPHDT in the experimental results, we select the SPBDT model for the robot to track the target. We also use this vision model to provide the relative position, location, distance, and angle of the target person to control the robot’s movement for performing the human-following task.
# 3: Year 2023
SPT: Single Pedestrian Tracking Framework with Re-Identification-Based Learning Using the Siamese Model
Type: Journal Article
Published in:
Sensors
Abstract: Pedestrian tracking is a challenging task in the area of visual object tracking research and it is a vital component of various vision-based applications such as surveillance systems, human following robots, and autonomous vehicles. In this paper, we proposed a single pedestrian tracking (SPT) framework for identifying each instance of a person across all video frames through a tracking-by-detection paradigm that combines deep learning and metric learning-based approaches. The SPT framework comprises three main modules: detection, re-identification, and tracking. Our contribution is a significant improvement in the results by designing two compact metric learning-based models using Siamese architecture in the pedestrian re-identification module and combining one of the most robust re-identification models for data association with the pedestrian detector in the tracking module. We carried out several analyses to evaluate the performance of our SPT framework for single pedestrian tracking in the videos. The results of the re-identification module validate that our two proposed re-identification models surpass existing state-of-the-art models with increased accuracies of 79.2% and 83.9% on the large dataset and 92% and 96% on the small dataset. Moreover, the proposed SPT tracker, along with six state-of-the-art (SOTA) tracking models, has been tested on various indoor and outdoor video sequences. A qualitative analysis considering six major environmental factors verifies the effectiveness of our SPT tracker under illumination changes, appearance variations due to pose changes, changes in target position, and partial occlusions. In addition, quantitative analysis based on experimental results also demonstrates that our proposed SPT tracker outperforms the GOTURN, CSRT, KCF, and SiamFC trackers with a success rate of 79.7% while beating the DaSiamRPN, SiamFC, CSRT, GOTURN, and SiamMask trackers with an average of 18 tracking frames per second.
# 4: Year 2022
Edge Deployment Framework of GuardBot for Optimized Face Mask Recognition With Real-Time Inference Using Deep Learning
Type: Journal Article
Published in:
IEEE Access
Abstract: Deep learning-based models on edge devices have received considerable attention as a promising means to handle a variety of AI applications. However, deploying deep learning models in the production environment with efficient inference on edge devices is still a challenging task due to computation and memory constraints. This paper proposes a framework for the service robot named GuardBot powered by Jetson Xavier NX and presents a real-world case study of deploying the optimized face mask recognition application with real-time inference on the edge device. It assists the robot to detect whether people are wearing a mask to guard against COVID-19 and gives a polite voice reminder to wear the mask. Our framework contains a dual-stage architecture based on convolutional neural networks with three main modules that employ (1) MTCNN for face detection, (2) our proposed CNN model and seven transfer learning based custom models which are Inception-v3, VGG16, denseNet121, resNet50, NASNetMobile, XceptionNet, MobileNet-v2 for face mask classification, (3) TensorRT for optimization of all the models to speed up inference on the Jetson Xavier NX. Our study carries out several analyses based on the models’ performance in terms of their frames per second, execution time and images per second. It also evaluates the accuracy, precision, recall & F1-score and makes the comparison of all models before and after optimization with a main focus on high throughput and low latency. Finally, the framework is deployed on a mobile robot to perform experiments in both outdoor and multi-floor indoor environments with patrolling and non-patrolling modes. Compared to other state-of-the-art models, our proposed CNN model for face mask recognition based on the classification obtains 94.5%, 95.9% and 94.28% accuracy on training, validation and testing datasets respectively, which is better than MobileNet-v2, Xception and InceptionNet-v3, while it achieves the highest throughput and lowest latency among all models after optimization at different precision levels.
# 5: Year 2022
Qualitative Analysis of Single Object and Multi Object Tracking Models
Type: Conference Article
Published in:
ICCAS - International Conference on Control, Automation & Systems
Abstract: Tracking the object(s) of interest in the real world is one of the most salient research areas that has gained widespread attention due to its applications. Although different approaches based on traditional machine learning and modern deep learning have been proposed to tackle the single and multi-object tracking problems, these tasks are still challenging to perform. In our work, we conduct a comparative analysis of eleven object trackers to determine the most robust single object tracker (SOT) and multi-object tracker (MOT). The main contributions of our work are (1) employing nine pre-trained tracking algorithms to carry out the analysis for SOT that include: SiamMask, GOTURN, BOOSTING, MIL, KCF, TLD, MedianFlow, MOSSE, CSRT; (2) investigating MOT by integrating object detection models with object trackers using YOLOv4 combined with DeepSort, and CenterNet coupled with SORT; (3) creating our own testing videos dataset to perform experiments; (4) performing the qualitative analysis based on the visual representation of results by considering nine significant factors that are appearance and illumination variations, speed, accuracy, scale, partial and full-occlusion, report failure, and fast motion. Experimental results demonstrate that the SiamMask tracker overcomes most of the environmental challenges for SOT while the YOLOv4+DeepSort tracker obtains good performance for MOT. However, these trackers are not robust enough to handle full occlusion in real-world scenarios and there is always a trade-off between tracking accuracy and speed.
# 6: Year 2021
Ontology-Based Knowledge Representation in Robotic Systems: A Survey Oriented toward Applications
Type: Journal Article
Published in:
Applied Sciences
Abstract: Knowledge representation in autonomous robots with social roles has steadily gained importance through their supportive task assistance in domestic, hospital, and industrial activities. For active assistance, these robots must process semantic knowledge to perform the task more efficiently. In this context, ontology-based knowledge representation and reasoning (KR & R) techniques appear as a powerful tool and provide sophisticated domain knowledge for processing complex robotic tasks in a real-world environment. In this article, we surveyed ontology-based semantic representation unified into the current state of robotic knowledge base systems, with our aim being three-fold: (i) to present the recent developments in ontology-based knowledge representation systems that have led to the effective solutions of real-world robotic applications; (ii) to review the selected knowledge-based systems in seven dimensions: application, idea, development tools, architecture, ontology scope, reasoning scope, and limitations; (iii) to pin-down lessons learned from the review of existing knowledge-based systems for designing better solutions and delineating research limitations that might be addressed in future studies. This survey article concludes with a discussion of future research challenges that can serve as a guide to those who are interested in working on the ontology-based semantic knowledge representation systems for autonomous robots.
# 7: Year 2021
Ontology-based Knowledge Representation for Cognitive Robotic Systems: A Review
Type: Conference Article
Published in:
ICROS - Control, Robots, and Systems Society Conference
Abstract: Ontology-based knowledge representation endows autonomous robots with cognitive skills that are required to perform actions in compliance with goals. In this paper, we review five knowledge base systems that represent knowledge using ontologies and enable robots to model semantic information to perform a variety of tasks in domestic, hospital and industrial environments. We also highlight the research gaps by discussing the limitations that might be addressed in future work and conclude our review with a brief discussion. This review is intended to show recent developments for motivating those who are interested in working in this area.
# 8: Year 2021
3D Recognition Based on Sensor Modalities for Robotic Systems: A Survey
Type: Journal Article
Published in:
Sensors
Abstract: 3D visual recognition is a prerequisite for most autonomous robotic systems operating in the real world. It empowers robots to perform a variety of tasks, such as tracking, understanding the environment, and human–robot interaction. Autonomous robots equipped with 3D recognition capability can better perform their social roles through supportive task assistance in professional jobs and effective domestic services. For active assistance, social robots must recognize their surroundings, including objects and places to perform the task more efficiently. This article first highlights the value-centric role of social robots in society by presenting recently developed robots and describes their main features. Instigated by the recognition capability of social robots, we present the analysis of data representation methods based on sensor modalities for 3D object and place recognition using deep learning models. In this direction, we delineate the research gaps that need to be addressed, summarize 3D recognition datasets, and present performance comparisons. Finally, a discussion of future research directions concludes the article. This survey is intended to show how recent developments in 3D visual recognition based on sensor modalities using deep-learning-based approaches can lay the groundwork to inspire further research and serves as a guide to those who are interested in vision-based robotics applications.
# 9: Year 2021
Performance Evaluation of YOLOv3 and YOLOv4 Detectors on Elevator Button Dataset for Mobile Robot
Type: Conference Article
Published in:
ICCAS - International Conference on Control, Automation & Systems
Abstract: The performance evaluation of an AI network model is an important part of building an effective solution before its real-world deployment on the robot. In our study, we have implemented YOLOv3-tiny and YOLOv4-tiny darknet-based frameworks for performance evaluation of the elevator button recognition task and tested both variants on image and video datasets. The objective of our study is two-fold: First, to overcome the limitation of the elevator buttons dataset by creating a new dataset and increasing its quantity without compromising the quality; Second, to provide a comparative analysis through experimental results and the performance evaluation of both detectors using four machine learning metrics. The purpose of our work is to assist researchers and developers in selecting a suitable detector for deployment on the elevator robot for the button recognition application. The results show that YOLOv4-tiny outperforms YOLOv3-tiny with an overall accuracy of 98.60% compared to 97.91% at 0.5 IoU.
# 10: Year 2020
Autonomous navigation framework for intelligent robots based on a semantic environment modeling
Type: Journal Article
Published in:
Applied Sciences
Abstract: Humans have an innate ability of environment modeling, perception, and planning while simultaneously performing tasks. However, it is still a challenging problem in the study of robotic cognition. We address this issue by proposing a neuro-inspired cognitive navigation framework, which is composed of three major components: semantic modeling framework (SMF), semantic information processing (SIP) module, and semantic autonomous navigation (SAN) module to enable the robot to perform cognitive tasks. The SMF creates an environment database using the Triplet Ontological Semantic Model (TOSM) and builds semantic models of the environment. The environment maps from these semantic models are generated in an on-demand database and downloaded by the SIP and SAN modules when required by the robot. The SIP module contains active environment perception components for recognition and localization. It also feeds relevant perception information to the behavior planner for safely performing the task. The SAN module uses a behavior planner that is connected with a knowledge base and behavior database for querying during action planning and execution. The main contributions of our work are the development of the TOSM, integration of SMF, SIP, and SAN modules in one single framework, and interaction between these components based on the findings of cognitive science. We deploy our cognitive navigation framework on a mobile robot platform, considering implicit and explicit constraints for autonomous robot navigation in a real-world environment. The robotic experiments demonstrate the validity of our proposed framework.
# 11: Year 2019
Comparison of Object Recognition Approaches using Traditional Machine Vision and Modern Deep Learning Techniques for Mobile Robot
Type: Conference Article
Published in:
ICCAS - International Conference on Control, Automation & Systems
Abstract: In this paper, we consider the problem of object recognition for a mobile robot in an indoor environment using two different vision approaches. Our first approach uses a HOG descriptor with an SVM classifier as the traditional machine vision model, while the second approach uses Tiny-YOLOv3 as the modern deep learning model. The purpose of this study is to gain intuitive insight into both approaches for understanding the principles behind these techniques through their practical implementation in the real world. We train both approaches with our own dataset for doors. The proposed work is assessed through the real-world implementation of both approaches using a mobile robot with a Zed camera in a real-world indoor environment, and the robustness has been evaluated by comparing and analyzing the experimental results of both models on the same dataset.
# 12: Year 2019
A Novel Semantic SLAM Framework for Humanlike High-Level Interaction and Planning in Global Environment
Type: Workshop Article
Published in:
IROS: SDMM19
Abstract: In this paper, we propose a novel semantic SLAM framework based on human cognitive skills and capabilities that endow the robot with high-level interaction and planning in a real-world dynamic environment. The two-fold strengths of our framework aim at contributing: 1) A semantic map resulting from the integration of SLAM with the Triplet Ontological Semantic Model (TOSM); 2) A human-like robotic perception system that is optimal and biologically plausible for place and object recognition in a dynamic environment, proposing a semantic descriptor and CNN. We demonstrate the effectiveness of our proposed framework using a mobile robot with a Zed camera (3D sensor) and a laser range finder (2D sensor) in a real-world indoor environment. Experimental results demonstrate the practical merit of our proposed framework.
Google Scholar: Sumaira Manzoor
Since 2024, I have been writing blogs in my leisure time, sharing insights on advanced technologies I work with, focusing on 3D perception, calibration, and camera-LiDAR-based sensor fusion using various LiDAR and camera systems for ADAS (Advanced Driver Assistance Systems) in autonomous driving.
(My upcoming blogs will focus on SLAM and will be posted shortly.)
May 8, 2025
(updated blog will be available soon)
June 1, 2025
February 9, 2025
March 15, 2025
January 2, 2025
January 3, 2025
(updated blog will be available soon)
November 2, 2024
(updated blog will be available soon)
December 5, 2024
From August 28, 2017, to August 26, 2019, I was awarded the STEM Scholarship at Sungkyunkwan University, which covered my semester fees for two years during my PhD coursework. This scholarship recognized academic excellence and supported my advanced studies in STEM fields.
Robot Task / Motion Planning (ROSPlan)
Training and Research
I completed Korean Language – Level 1 through the Korean Immigration and Social Integration Program, from September 10, 2020, to November 26, 2020, gaining foundational proficiency in the Korean language and cultural integration.
In June 2025, I was granted permanent residency in Korea, officially recognizing my long-term commitment to living and working in the country.
This status allows me full eligibility for professional opportunities and reflects my dedication to establishing a stable, long-term career in Korea.