Sensor Fusion and Object Tracking

Multi-Object Tracking (MOT) for autonomous driving is a task that involves multiple sub-systems operating simultaneously. Since the autonomous vehicle is itself in motion, ego-motion has to be compensated so that tracking can be performed in absolute coordinates. Objects approaching from any direction pose a collision threat and must be detected regardless of occlusion, lighting, or weather conditions, which places high demands on the sensor setup and calibration. Finally, detected objects may be static, or moving toward or away from the ego vehicle, so the estimation of relative velocities is critical for accurate path planning. Based on the object's class and size, an appropriate kinematic model should be applied to explain the sensor measurements.
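To make the ego-motion compensation step concrete, the following is a minimal sketch of transforming a detection from the vehicle frame into a fixed world frame, assuming a 2D odometry pose (x, y, yaw). The function name and noise-free pose are illustrative, not tied to any particular implementation.

```python
import numpy as np

def compensate_ego_motion(detection_xy, ego_pose):
    """Transform a detection from the vehicle (ego) frame into a fixed
    world frame so that tracking can run in absolute coordinates.

    detection_xy : (x, y) position of the object in the ego frame [m]
    ego_pose     : (x, y, yaw) of the ego vehicle in the world frame,
                   e.g. from wheel odometry or a GPS/IMU filter
    """
    ex, ey, yaw = ego_pose
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotate out of the ego frame, then translate by the ego position
    R = np.array([[c, -s],
                  [s,  c]])
    return R @ np.asarray(detection_xy) + np.array([ex, ey])

# Example: object seen 10 m ahead while the ego vehicle is at (5, 2), heading 90 degrees
world_xy = compensate_ego_motion((10.0, 0.0), (5.0, 2.0, np.pi / 2))
# -> approximately (5, 12): the object lies 10 m north of the vehicle in world coordinates
```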

This page showcases research results in Camera-LiDAR and Camera-RADAR multi-object tracking for applications in autonomous vehicles.

Tracking by fusion of Camera and LiDAR

Example use case scenario: A large object is detected in the current LiDAR point cloud. Its 3D projection creates a "footprint", or region of interest (ROI), in the RGB camera image, which is then classified by a convolutional neural network (CNN). The CNN determines that the label of the object is "Truck". In the meantime, the vehicle odometry has estimated the ego-motion and the relative distance to the detected object. The 3D position, volume, appearance, and label are then fed into a Multi-Object, Multi-Class tracker. Based on an appropriate set of motion parameters, the tracker assigns this detection to a tracklet and predicts the most probable future position of the object.
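As an illustration of the projection step that produces the camera ROI, the sketch below projects the eight corners of a 3D LiDAR bounding box into the image plane, assuming a known LiDAR-to-camera extrinsic calibration and camera intrinsics. The function name and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

def lidar_box_to_image_roi(box_corners_lidar, T_cam_lidar, K):
    """Project the 8 corners of a 3D LiDAR bounding box into the image and
    return the enclosing 2D ROI (the "footprint" passed to the CNN classifier).

    box_corners_lidar : (8, 3) corner coordinates in the LiDAR frame [m]
    T_cam_lidar       : (4, 4) homogeneous LiDAR-to-camera transform (calibration)
    K                 : (3, 3) camera intrinsic matrix
    """
    # Homogeneous coordinates, then transform into the camera frame
    corners_h = np.hstack([box_corners_lidar, np.ones((8, 1))])
    corners_cam = (T_cam_lidar @ corners_h.T)[:3]          # (3, 8)

    # Keep only corners in front of the camera
    corners_cam = corners_cam[:, corners_cam[2] > 0.1]
    if corners_cam.shape[1] == 0:
        return None                                        # box is behind the camera

    # Pinhole projection and normalization to pixel coordinates
    uv = K @ corners_cam
    uv = uv[:2] / uv[2]

    # Axis-aligned ROI that encloses all projected corners
    u_min, v_min = uv.min(axis=1)
    u_max, v_max = uv.max(axis=1)
    return int(u_min), int(v_min), int(u_max), int(v_max)
```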
The following demo video shows the system operating on real-world data (KITTI sequence 13). On the left, a 3D plot shows pedestrians with unique colors and motion vectors; on the right, a visualization of the raw object detections and the projection of the tracklets onto the RGB image.

For more details, please refer to the open-access paper: "Behavioral Pedestrian Tracking Using a Camera and LiDAR Sensors on a Moving Vehicle" DOI link

At the time of submission, our multi-object tracker achieved the highest tracking accuracy (MOTA) of all publicly available methods on the KITTI pedestrian tracking benchmark.

Tracking by cooperative Camera and RADAR fusion

Ranging by RADAR has proven effective in highway environments; however, people tracking remains beyond the capability of single-sensor systems. In this work, I showcase a cooperative RADAR-camera fusion method for tracking people on the ground plane. Using the average person height, a joint detection likelihood is calculated by projecting detections from the RADAR onto the camera frame. Peaks in the joint likelihood, representing candidate targets, are fed into a Particle Filter tracker. Depending on the association outcome, particles are updated using the associated detections (Tracking by Detection) or by sampling the raw likelihood itself (Tracking Before Detection). Utilizing the raw likelihood data has the advantage that lost targets continue to be tracked even when the camera or RADAR signal is below the detection threshold. A radar-to-camera cooperative feedback loop is employed, allowing improved camera object detection in poor lighting conditions. In single-target, uncluttered environments, the proposed method clearly outperforms camera-only tracking. Experiments in a real-world urban environment also confirm that the cooperative fusion tracker produces significantly better estimates, even in difficult and ambiguous situations.
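The switch between the two particle-update modes can be sketched as follows. This is a minimal, illustrative predict/update cycle assuming a constant-velocity model on the ground plane; the function name, noise magnitudes, and resampling threshold are assumptions for the example, not the exact values used in the paper.

```python
import numpy as np

def particle_filter_step(particles, weights, detection, likelihood_map, dt=0.1):
    """One predict/update cycle of a ground-plane particle filter.

    particles      : (N, 4) array of [x, y, vx, vy] hypotheses
    weights        : (N,) particle weights, summing to 1
    detection      : associated (x, y) detection, or None if association failed
    likelihood_map : callable (x, y) -> joint camera-RADAR likelihood values
    """
    # Predict: constant-velocity motion plus process noise (illustrative magnitudes)
    particles[:, :2] += particles[:, 2:] * dt
    particles += np.random.normal(0.0, [0.1, 0.1, 0.2, 0.2], particles.shape)

    if detection is not None:
        # Tracking by Detection: weight particles by distance to the associated detection
        d2 = np.sum((particles[:, :2] - np.asarray(detection)) ** 2, axis=1)
        weights = weights * np.exp(-0.5 * d2 / 0.5 ** 2)
    else:
        # Tracking Before Detection: weight by the raw joint likelihood, so the
        # target is not lost when detections fall below the threshold
        weights = weights * likelihood_map(particles[:, 0], particles[:, 1])

    weights = weights / (weights.sum() + 1e-12)

    # Resample when the effective sample size gets low
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = np.random.choice(len(weights), len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))

    return particles, weights
```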
In the following demos we present our VRU (vulnerable road user) tracking results in the image and on the ground plane for typical urban traffic scenes.

For more details, please refer to the open-access paper: "Cooperative Multi-Sensor Tracking of Vulnerable Road Users in the Presence of Missing Detections" DOI link

Scenario 1: Tracking of pedestrians and cyclists in various traffic situations through a dense city center.



Scenario 2: A single person walking in and out of the camera view, continuously tracked using the joint Camera-RADAR information.


Scenario 3: Night-time video surveillance, with multiple people in a heavily cluttered RADAR environment.

Get in touch