[IEEE2019/PaperSummary] LDLS: 3D Object Segmentation through Label
Diffusion from 2D Images

  1. Applying an off-the-shelf object segmentation algorithm (Mask-RCNN [1]) to the 2D image to detect object classes and instances at the pixel level.
  2. Constructing a graph by connecting 2D pixels to 3D lidar points according to their 2D projected locations, as well as connecting lidar points that neighbor one another in 3D space.
  3. Using a label diffusion method [2] to propagate 2D segmentation labels through this graph, thereby labeling the 3D lidar points.
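The first step of this pipeline, connecting each lidar point to image pixels, can be sketched as below. This is an illustrative version, not the paper's exact code; it assumes a standard 3x4 camera projection matrix `P` and a per-pixel 2D mask, and shows the naive "direct projection" labeling that LDLS improves on with diffusion:

```python
import numpy as np

def project_points(points_3d, P):
    """Project Nx3 lidar points into the image plane with a 3x4 camera matrix P.

    Returns Nx2 pixel coordinates and a mask of points in front of the camera.
    """
    homo = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])  # Nx4 homogeneous
    proj = homo @ P.T                                                # Nx3
    in_front = proj[:, 2] > 0
    pixels = proj[:, :2] / proj[:, 2:3]                              # perspective divide
    return pixels, in_front

def direct_projection_labels(points_3d, P, mask_2d):
    """Naive baseline: label each lidar point by the 2D mask value at its projection."""
    pixels, in_front = project_points(points_3d, P)
    h, w = mask_2d.shape
    cols = np.round(pixels[:, 0]).astype(int)
    rows = np.round(pixels[:, 1]).astype(int)
    valid = in_front & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    labels = np.zeros(points_3d.shape[0], dtype=mask_2d.dtype)
    labels[valid] = mask_2d[rows[valid], cols[valid]]
    return labels
```

Points that fall outside the image or behind the camera keep the background label 0.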
  1. Deep Learning on Point Clouds: PointNet [3] defines a network architecture that operates directly on unstructured point clouds and extracts features invariant to point re-ordering, capturing both local and global point cloud information. Other methods extend convolutional neural networks to point clouds; since 3D points lack the grid structure of images, (a) one approach arranges the points into a 3D voxel grid and performs 3D convolution, while others project the points into (b) a panoramic view or (c) a bird's-eye view.
  2. Graphical Models and 2D-3D Fusion: Wang et al. [4] propose a semantic segmentation method for image-aligned 3D point clouds by retrieving labeled reference images of similar appearance and then propagating their labels to the 3D points using a graphical model. Zhang et al. [5] train a neural network for 2D semantic segmentation, then project the 2D labels onto dense 3D data from a long-range laser scanner.
Fig. 1. The full segmentation pipeline, from the input point cloud and image
to the final lidar point cloud segmentation
  1. Object instance segmentation in 3D point clouds is formulated by considering all input lidar points x_i, for i = 1 to N_points.
  2. Similarly, all input pixels in an image are denoted p_i, for i = 1 to N_pixels.
  3. Semi-supervised learning assumes a set of data points of which only a subset is labeled. Graph-based semi-supervised learning annotates the remaining points by defining connections between data points and then diffusing labels along these connections.
  4. To build a framework for lidar point cloud segmentation, the authors construct a graph by drawing connections from 2D pixels to 3D lidar points, as well as among the 3D points. The 2D pixels are labeled according to results from 2D object segmentation of the RGB image, and the graph is then used to diffuse labels onto the 3D points, which are all initially unlabeled.
  1. Two types of nodes: 2D image pixels and 3D lidar points.
  2. Two types of connections between nodes: from a 2D pixel to a 3D point, and between two 3D points.
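The two edge types above can be sketched as follows. This is a simplified illustration under assumed conventions (each point connects to its single nearest projected pixel, while the paper connects to a small pixel neighborhood; `pixels` are precomputed projections of the lidar points):

```python
import numpy as np
from scipy.spatial import cKDTree

def build_graph_edges(points_3d, pixels, image_shape, k=4):
    """Build the two LDLS-style edge sets: 2D pixel -> 3D point, and 3D <-> 3D.

    points_3d: (N, 3) lidar points; pixels: (N, 2) their projected image locations.
    Returns (pixel_edges, point_edges) as arrays of index pairs.
    """
    h, w = image_shape
    # 2D -> 3D edges: connect each lidar point to the pixel it projects onto
    # (pixels are flattened to a single node index, row-major).
    cols = np.clip(np.round(pixels[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(pixels[:, 1]).astype(int), 0, h - 1)
    pixel_edges = np.stack([rows * w + cols, np.arange(len(points_3d))], axis=1)
    # 3D <-> 3D edges: connect each lidar point to its k nearest neighbors in 3D.
    tree = cKDTree(points_3d)
    _, nn = tree.query(points_3d, k=k + 1)  # first neighbor is the point itself
    point_edges = np.stack([np.repeat(np.arange(len(points_3d)), k),
                            nn[:, 1:].ravel()], axis=1)
    return pixel_edges, point_edges
```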
Eq. 1: Graph matrix for 2D-3D connections
Eq. 2: Graph matrix for 3D-3D connections
Eq. 3: Label diffusion graph matrix
Eq. 4: Instance label vector
Eq. 5: Iterative label diffusion update
Eq. 6: Likelihood after convergence
Eq. 7: Outlier removal equation
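The diffusion step summarized by Eqs. 3-6 can be sketched generically. The code below is a standard graph label-diffusion iteration in the spirit of [2] (row-normalized affinity matrix, labeled pixel nodes clamped each step), not the paper's exact matrix construction:

```python
import numpy as np

def diffuse_labels(W, init_labels, labeled_mask, n_iters=200, tol=1e-6):
    """Generic graph label diffusion sketch.

    W: (N, N) nonnegative affinity matrix over all nodes (pixels + lidar points).
    init_labels: (N, C) one-hot rows for labeled nodes, zeros for unlabeled ones.
    labeled_mask: (N,) bool; labeled (pixel) nodes are clamped every iteration.
    Returns (N, C) class likelihoods after convergence.
    """
    deg = W.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                      # avoid division by zero for isolated nodes
    P = W / deg                              # row-normalized transition matrix D^-1 W
    z = init_labels.astype(float).copy()
    for _ in range(n_iters):
        z_new = P @ z
        z_new[labeled_mask] = init_labels[labeled_mask]  # clamp source labels
        if np.abs(z_new - z).max() < tol:    # stop once the update has converged
            z = z_new
            break
        z = z_new
    return z
```

After convergence, each unlabeled lidar point is assigned the class with the highest likelihood in its row.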
Fig. 2. The LDLS algorithm
Table 1: Comparison of semantic segmentation accuracy
Table 2: Instance segmentation performance on manual annotations
Fig. 3. Effect of range on semantic and instance segmentation precision and recall. Instance segmentation metrics use IoU = 0.70.
Fig. 4. Scatter plot showing object range versus segmentation IoU. Each point is a pedestrian or car instance. Zero-IoU points indicate false negatives.
Table 3: Ablation study
  1. Direct projection labeling: lidar points are naively labeled, without graph diffusion, based on whether they project to within a 2D segmentation mask in the image.
  2. Diffusion without outlier removal: the full pipeline is executed, except for the final outlier removal step.
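To make the ablated outlier-removal step concrete, here is one simple illustrative filter (not the paper's exact Eq. 7): a labeled point is reset to background when too few of its nearest 3D neighbors share its label, which removes isolated mislabeled points left behind by diffusion:

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_outliers(points_3d, labels, k=8, min_agree=3):
    """Illustrative outlier filter: reset a point's label to background (0)
    when fewer than min_agree of its k nearest 3D neighbors share that label."""
    tree = cKDTree(points_3d)
    _, nn = tree.query(points_3d, k=k + 1)   # first neighbor is the point itself
    cleaned = labels.copy()
    for i in np.where(labels != 0)[0]:
        agree = np.sum(labels[nn[i, 1:]] == labels[i])
        if agree < min_agree:
            cleaned[i] = 0
    return cleaned
```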
Fig. 5. Qualitative results from running LDLS on the KITTI Drive 91 sequence, as well as on data collected on the Cornell campus using a mobile robot
  1. The VLP-16 sensor outputs only 16 laser scan lines, as opposed to the 64-scan lidar used in KITTI, making the lidar point clouds sparser and more difficult to segment, especially at further ranges.
  2. Errors from sensor calibration and time synchronization
    were higher on the Jackal, compared to the KITTI data set.
  1. LDLS is a simple projection-based method for annotating large point cloud datasets.
  2. The Python implementation averages approximately 0.38 seconds per frame on an Nvidia GTX 1080 Ti, excluding the computation of Mask-RCNN [1] results.
  3. The Mask-RCNN model can be replaced with higher-accuracy 2D segmentation models.
  4. LDLS scales to different object classes.
  1. High latency, caused by the two modules ((a) 2D segmentation using a pre-trained model and (b) semi-supervised label diffusion), makes the method unsuitable for real-time scenarios such as self-driving cars.
  2. Accuracy depends heavily on the density of the lidar point cloud captured by the sensor.
  3. LDLS segmentation accuracy decreases as object range increases.
  4. The cost of the semi-supervised graph method grows linearly with the number of points belonging to the object class.
