Annotation Techniques for Diverse Autonomous Driving Sensor Streams
By Umang Dayal
August 21, 2024
Autonomous driving requires large quantities of sensory data, fueling the development of accurate and capable sensors. These sensors can be categorized by sensing modality, such as cameras, LiDAR, radar, ultrasonics, or microphones, and by their position on the vehicle, such as in-car, on-vehicle, or external sensors.
Not all vehicles carry all sensor types, and the choice of what sensor to employ is influenced by operating conditions (e.g., inside cities, on highways) and technical and economic constraints.
Despite these differences, all sensor data must be rendered intelligible to an autonomous driving agent, for example by annotating it with geometric shapes such as bounding boxes. These annotations are critical to the development of sensor-independent models for a multitude of tasks, including object detection, semantic segmentation, optical flow, and pose estimation.
Common to all sensors is also the need to record the vehicle's own state and position relative to the environment, whether for situational awareness, adaptive speed control, or navigation. Recording these signals often also requires a means to combine and synchronize the autonomous driving sensor streams.
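As a minimal sketch of such synchronization, the snippet below pairs each camera frame with the nearest LiDAR sweep by timestamp; the stream layout, the `max_offset` tolerance, and the example rates are illustrative assumptions rather than a description of any specific platform.

```python
import numpy as np

def synchronize_streams(camera_ts, lidar_ts, max_offset=0.05):
    """Match each camera timestamp to the nearest LiDAR timestamp.

    camera_ts, lidar_ts: 1-D arrays of timestamps in seconds.
    max_offset: maximum tolerated time difference for a valid pair.
    Returns a list of (camera_index, lidar_index) pairs.
    """
    lidar_ts = np.asarray(lidar_ts)
    pairs = []
    for i, t in enumerate(camera_ts):
        j = int(np.argmin(np.abs(lidar_ts - t)))  # nearest LiDAR sweep
        if abs(lidar_ts[j] - t) <= max_offset:
            pairs.append((i, j))
    return pairs

# Example: 10 Hz camera frames paired with 20 Hz LiDAR sweeps
camera_ts = np.arange(0.0, 1.0, 0.10)
lidar_ts = np.arange(0.0, 1.0, 0.05) + 0.002  # small clock offset
print(synchronize_streams(camera_ts, lidar_ts))
```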
Types of Autonomous Driving Sensors
When discussing annotation techniques for autonomous cars, it is crucial to distinguish the different types of sensors used in these vehicles. Data from each sensor serves different purposes and often calls for its own annotation techniques. Autonomous vehicles primarily use three sensor types: LiDAR, radar, and cameras.
LiDAR
LiDAR sensors measure the distance to objects based on the travel time of laser signals. Most LiDAR configurations provide a 360° horizontal field of view, a 30° to 40° (up to 45°) vertical field of view, and measurement ranges of up to 300 m.
LiDAR point clouds, consisting of coordinate data and the reflectance or intensity signal for every measured point, are typically used as a backbone for obstacle detection, feature extraction, and most mapping techniques. A notable disadvantage of LiDAR is the distribution of the point cloud over the sensor's 3D range field, which follows a specific scan pattern.
One full revolution produces a complete scan made up of a certain number of vertical layers, but the limited measurement rate of each individual sensor leads to a low number of points per scan layer.
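To make the structure of such a point cloud concrete, here is a small sketch that reads a sweep stored as packed float32 (x, y, z, intensity) records, the layout used by KITTI-style .bin files; the file name and the assumption of exactly four channels per point are illustrative.

```python
import numpy as np

def load_point_cloud(path):
    """Load a LiDAR sweep stored as packed float32 (x, y, z, intensity)
    records, as used by KITTI-style .bin files. Returns an (N, 4) array."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# points[:, :3] are Cartesian coordinates in metres,
# points[:, 3] is the reflectance/intensity of each return.
# points = load_point_cloud("sweep_000123.bin")   # hypothetical file name
# print(points.shape, points[:, 3].min(), points[:, 3].max())
```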
Three main challenges exist when working with LiDAR data.
First, the reflectance measured by LiDAR is a function of the object's surface and the laser intensity. An abrupt change in 3D geometry (in areas such as vertical structures and vehicle corners) can produce low-intensity returns, making these spots harder to distinguish, particularly in fog or dust.
Second, the density of information is uneven: LiDAR devices return many measurements on large planar structures close to the sensor but far fewer on small or distant objects.
Third, 3D object boundaries are often more evident in point clouds than object interiors, which is why center-point offsets are commonly used. Unfortunately, objects with low point density are difficult to analyze topologically, leading to ambiguity during annotation.
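One practical way to surface this ambiguity during annotation is to count the LiDAR returns inside each labeled box and flag sparse ones for manual review. The sketch below assumes axis-aligned boxes given as (center, size) in metres and an arbitrary `min_points` threshold; real pipelines would also account for box yaw.

```python
import numpy as np

def points_in_box(points, center, size):
    """Return the LiDAR points falling inside an axis-aligned 3-D box."""
    half = np.asarray(size) / 2.0
    inside = np.all(np.abs(points[:, :3] - center) <= half, axis=1)
    return points[inside]

def flag_sparse_boxes(points, boxes, min_points=20):
    """Flag annotated boxes with too few LiDAR returns for manual review."""
    return [i for i, (center, size) in enumerate(boxes)
            if len(points_in_box(points, center, size)) < min_points]

# boxes = [((12.0, 3.5, -0.8), (4.2, 1.8, 1.6))]  # (center, size) in metres
# review_indices = flag_sparse_boxes(points, boxes)
```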
Radar
Autonomous cars often utilize radar sensors to enable key features such as blind spot monitoring, cross-traffic alert, and adaptive cruise control. Radar works reliably in real-world scenarios whether it is dark or foggy, independent of other road users, and largely unaffected by environmental conditions, and these features are already in widespread industrial use.
For autonomous driving, however, the key requirement is a system that can navigate complex, crowded urban settings. Annotating the massive volumes of radar data needed to train such a system requires significant manual effort and informs many industrial development decisions.
Compared with LiDAR and camera data, radar returns are sparse and noisy, which limits the quality of low-level, box-style annotation. In practice, available radar annotations rely on the reflections caused by nearby objects in the environment.
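A common first step toward object-level radar annotation is to group nearby reflections into candidate objects. The following is a deliberately simple sketch using greedy distance-based grouping of 2-D detections; the `radius` value and the point format are assumptions, and production systems typically use more robust clustering.

```python
import numpy as np

def cluster_radar_points(points_xy, radius=2.0):
    """Greedy grouping of 2-D radar detections: each point joins the first
    cluster whose seed lies within `radius` metres, otherwise it starts a
    new one. Returns a list of index lists, one per candidate object."""
    clusters, seeds = [], []
    for i, p in enumerate(points_xy):
        for cluster, seed in zip(clusters, seeds):
            if np.linalg.norm(p - seed) <= radius:
                cluster.append(i)
                break
        else:
            clusters.append([i])
            seeds.append(p)
    return clusters

# detections = np.array([[10.1, 0.2], [10.4, -0.1], [35.0, 3.0]])
# print(cluster_radar_points(detections))  # -> [[0, 1], [2]]
```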
Camera
Cameras are vital sensors for autonomous driving applications, and many autonomous driving datasets consist of or contain camera data. Images are also a critical part of many annotation pipelines. Digital cameras capture color images at a range of spatial resolutions, which in turn affects how quickly the image data can be processed.
The majority of camera sensors in autonomous driving applications capture visible light. This makes it possible to reuse scene- and object-recognition models already trained on visible-light imagery, and there is a significant literature on extending camera capability to different conditions and scenes. Cameras also impose lower cost and integration requirements than LiDAR and radar, which makes them an attractive sensor choice in some applications.
There have been several advances in the automated annotation of camera images for autonomous vehicles. Bounding boxes are the most popular annotation type for camera images, and cameras often resolve objects beyond the effective range of a LiDAR or radar sensor, so annotated images can support the depth estimation and data association of other sensors.
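As a hedged illustration of that data association, the sketch below projects LiDAR points (assumed to be already transformed into the camera frame, with a known pinhole intrinsic matrix `K`) onto the image and takes the median depth of the points that fall inside a 2-D box; the variable names and the median heuristic are illustrative.

```python
import numpy as np

def project_to_image(points_cam, K):
    """Project 3-D points (already in the camera frame, z forward) onto the
    image plane using a pinhole intrinsic matrix K. Returns (N, 2) pixels."""
    pts = (K @ points_cam.T).T          # (N, 3) in homogeneous image coords
    return pts[:, :2] / pts[:, 2:3]     # divide by depth

def box_depth_estimate(points_cam, K, box):
    """Median depth of the LiDAR points whose projection falls inside a
    2-D bounding box (x1, y1, x2, y2): a crude depth estimate for the box."""
    x1, y1, x2, y2 = box
    uv = project_to_image(points_cam, K)
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
             (uv[:, 1] >= y1) & (uv[:, 1] <= y2) & (points_cam[:, 2] > 0)
    return float(np.median(points_cam[inside, 2])) if inside.any() else None
```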
Camera data also poses annotation challenges: the viewpoint is close to the ground, causing objects to overlap; motion along the vehicle's axis causes complex relative motion; trees and buildings occlude regions that cannot be labeled; there is no explicit depth or rotation signal; and lighting conditions vary widely.
Challenges in Annotating Sensor Data
Annotating sensor data is an essential step in the development of intelligent systems. This step becomes particularly challenging in the context of autonomous driving for several reasons.
The scale at which data is collected in autonomous driving leads to a large volume that humans alone cannot fully process.
A wide variety of sensor inputs are involved and must be annotated. Their diversity covers inputs from cameras, Light Detection And Ranging (LiDAR) sensors, and Global Navigation Satellite System (GNSS) modules, as well as environmental information such as semantic maps, road furniture, and events. Furthermore, while cameras are the most widely employed sensors in autonomous cars, nothing restricts the set of modalities that may be added.
The content, in the form of objects and events, must be annotated accurately because it is classified and used as input for decision-making. Finally, annotations such as 2D bounding boxes for object detection describe a 3D world mapped onto a 2D space and inevitably encapsulate sensor noise, an inherent characteristic of sensory data.
Problems mainly arise from variations in the collected data. Some labeled categories are inadequate, and defects (errors and omissions) incur quality costs such as missed obstacles and crashes, making the data unusable for supervised learning.
Sensor input data is often noisy, and correcting annotations during review is extremely time-consuming. Due to the inherent diversity of sensor data, some annotations are not even possible. Overall, missing, mislabeled, or low-quality annotations produce non-representative data that biases model training and degrades model performance.
It is therefore important to improve annotation quality and to assess annotation consistency thoroughly before implementing and testing semi-automated annotation approaches.
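One simple way to assess annotation consistency is to measure how often two annotators' boxes agree above an intersection-over-union threshold. The sketch below assumes 2-D boxes in (x1, y1, x2, y2) form and an arbitrary 0.7 threshold; it is a minimal consistency check, not a full agreement metric.

```python
def iou_2d(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def consistency(annotator_a, annotator_b, threshold=0.7):
    """Fraction of annotator A's boxes matched by annotator B at IoU >= threshold."""
    matched = sum(any(iou_2d(a, b) >= threshold for b in annotator_b)
                  for a in annotator_a)
    return matched / len(annotator_a) if annotator_a else 1.0

# a = [(10, 20, 110, 220)]; b = [(12, 22, 108, 215)]
# print(consistency(a, b))  # close to 1.0 when the two annotators agree
```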
Traditional Annotation Methods
Traditional road vehicle sensor streams include structured data such as internal state variables and structured outputs from image-processing pipelines, which are typically generated through annotation. Purely manual annotation is often infeasible because of its cost and the inconsistent agreement between different annotators.
Rule-based systems that can infer these structured outputs directly from raw sensor data are not available, and vision-based state estimation remains an open problem despite significant efforts in computer graphics, computer vision, and machine learning.
The level of annotation can be divided into manual, semi-automatic, automatic, and hyper-automatic. The last term refers to a form of automatic annotation in which unsupervised learning methods annotate the video according to the same criteria used by humans.
Many video perception systems require a clear description of the experimental conditions (targets, perspectives, brightness) and a clear instruction set for the annotation task. The performance of automatic methods depends strongly on the level of detail in that description, as well as on the confidence threshold required to accept an automatic annotation as correct.
As an example, end users can easily annotate car equipment simply by marking the boundary of the windshield in the captured images. After this step, driving lanes can be defined from regions of homogeneous-color pixels, yielding a form of initial segmentation.
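A minimal sketch of such color-based initial segmentation is shown below: it marks near-white pixels as lane-marking candidates. The color bounds and the synthetic test image are assumptions; a real pipeline would work in a more robust color space and post-process the resulting mask.

```python
import numpy as np

def lane_mask(image_rgb, lower=(180, 180, 180), upper=(255, 255, 255)):
    """Rough initial segmentation of lane markings: mark pixels whose RGB
    values fall inside a near-white color range. Returns a boolean mask."""
    lo = np.array(lower)
    hi = np.array(upper)
    return np.all((image_rgb >= lo) & (image_rgb <= hi), axis=-1)

# image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
# mask = lane_mask(image)
# print(mask.mean())  # fraction of pixels proposed as lane marking
```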
Advanced Annotation Techniques
Beyond traditional methods, many companies are using or experimenting with state-of-the-art annotation tools that address specific autonomy requirements more precisely. These tools employ machine learning and computer vision algorithms to handle difficult annotations. We are among the pioneers in using this technology to support advanced ADAS and automated vehicle programs.
Today, these tools are built into a large-scale machine learning platform that enables an end-to-end ML training pipeline with advanced support for annotating vast datasets of many different sensor types and ADAS features.
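A typical building block of such tooling is model-assisted pre-labeling, where an off-the-shelf detector proposes draft boxes for a human to correct. The sketch below assumes torchvision is available and uses its pre-trained Faster R-CNN purely as an example; the score threshold and function name are illustrative, not a description of any particular vendor's platform.

```python
import torch
import torchvision

def prelabel(image_chw, score_threshold=0.6):
    """Generate draft 2-D boxes for human review using an off-the-shelf
    detector (a pre-trained Faster R-CNN from torchvision).

    image_chw: float tensor of shape (3, H, W) with values in [0, 1].
    Returns a list of (box, label, score) proposals above the threshold."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()
    with torch.no_grad():
        out = model([image_chw])[0]
    keep = out["scores"] >= score_threshold
    return list(zip(out["boxes"][keep].tolist(),
                    out["labels"][keep].tolist(),
                    out["scores"][keep].tolist()))

# image = torch.rand(3, 480, 640)   # replace with a real camera frame
# proposals = prelabel(image)       # draft boxes for an annotator to correct
```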
Conclusion
The development of autonomous vehicles has proved to be highly complicated, partly because perception and reasoning operate in a closed loop that relies on a high-level understanding of vehicle motion and complex surrounding scenes. More importantly, robust autonomous driving algorithms built on different sensors depend on high-quality, large-scale, sensor-specific annotated datasets. In summary, techniques for annotating diverse sensor data from autonomous vehicles will be an essential development for future self-driving use cases.
Object detection and instance segmentation are among the most popular computer vision tasks and form the basis for many others, representing a fundamental step of semantic scene understanding.
As one of the leading data annotation companies, we provide comprehensive data annotation solutions for diverse autonomous driving sensor streams.