Vanishing Point Detection in Indoor Scenes

Srinath Sridhar, Yu Xiang
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor
{srinaths, yuxiang}@umich.edu

Abstract

Vanishing point detection has many useful applications, including camera calibration, autonomous vehicle navigation, 3D reconstruction and object recognition. In this project, we have explored two state-of-the-art vanishing point detection algorithms, proposed by Rother [1] and Tardif [2]. We implemented both algorithms from the bottom up and used them to detect three orthogonal vanishing points in indoor scenes. We have also compared the two algorithms through both theoretical analysis and extensive experiments. Finally, we propose a hybrid method which combines the two algorithms for increased efficiency; it is capable of efficiently detecting three orthogonal vanishing points without calibrating the camera. Results of our implementation and a comparison of the methods are shown. We have also explored the possibility of using the three orthogonal vanishing points for camera calibration and for finding the pose of objects in the scene.

1 Introduction

In the perspective projection of images, two parallel lines in the 3D world intersect at a single point in the image plane, which is called the vanishing point. In Figure 1, the parallel railway tracks intersect at a single point in the image, which is the vanishing point of the tracks. Vanishing points encode much information about both the camera and the real world. As a result, vanishing point detection is an important problem in computer vision, with applications in camera calibration, autonomous vehicle navigation, 3D reconstruction, object recognition and so on.

Vanishing points are the result of the transformation that projects 3D points onto the 2D image plane. This projective transformation does not preserve parallelism, and hence vanishing points are formed. Assuming perfect projection, the vanishing point is given by v = Kd, where v is the homogeneous coordinate of the vanishing point in the image plane, K is the camera intrinsic matrix and d is the direction of the line in the real world [3].

There are numerous algorithms for vanishing point detection, each targeted at a specific application. For instance, the method proposed by Rother [1] is targeted at building reconstruction and uses a computationally intensive approach, while other algorithms, like the one proposed by Tardif [2], are computationally less intensive. Thus the choice of algorithm depends largely on the application in mind. Some of the factors considered while choosing an algorithm include robustness, accuracy, computational efficiency and the optimization technique used.

In this project we demonstrate the detection of vanishing points in uncalibrated images of indoor scenes using two state-of-the-art algorithms. We consider images from uncalibrated cameras because in many situations calibration information is not available. Only indoor scenes are considered because, in general, indoor scenes are dominated by parallel lines (e.g., roofs, walls, tables) which provide much information for vanishing point detection. We have implemented the two algorithms from the bottom up and compared their results. We have also combined the best features of the two algorithms into a hybrid method. Finally, we have explored the possibility of using the vanishing points detected by this hybrid method to estimate the pose of objects in an indoor scene.

(a) The railway tracks appear to converge to a point
(b) The convergence point is called the vanishing point (red dot)

Figure 1: Illustration of a vanishing point
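The relation v = Kd can be illustrated with a short numerical sketch. Python with NumPy is used here purely for illustration (our implementation is in C/C++), and the intrinsic values are made up:

```python
import numpy as np

# Illustrative intrinsic matrix K; the focal length and principal point
# are made-up values, not taken from our experiments.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A family of parallel lines with world direction d vanishes at v = K d
# (homogeneous image coordinates).
d = np.array([1.0, 0.0, 0.0])      # lines parallel to the camera x-axis
v = K @ d                          # third component 0: point at infinity

# Lines along the optical axis vanish at the principal point.
d_axis = np.array([0.0, 0.0, 1.0])
v_axis = K @ d_axis
v_axis = v_axis / v_axis[2]        # (320, 240): the principal point
```

Note that lines parallel to the image plane (third component of Kd equal to zero) produce a vanishing point at infinity, which is why algorithms must handle unbounded accumulation spaces.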

2 Review of Previous Work

Vanishing point detection in images has been an active area of research in computer vision for quite a while [4, 5], with one of the oldest papers published over 25 years ago [6]. A majority of vanishing point detection algorithms are divided into two steps, viz. line detection and model estimation. In the line detection step, edges in the image that correspond to straight lines are extracted; line fitting is performed to retain only lines and discard other edges. An important problem here is that, due to errors in the image projection (such as lens distortion), lines in the real world may not be imaged as straight lines.

In the model estimation step, the detected lines are considered as a whole to estimate the vanishing points corresponding to parallel lines. This process is computationally intensive, and several optimization techniques have been proposed for it. One approach is to map the unbounded image plane to a bounded space called the accumulator space, such as the Gaussian sphere, and then to compute intersections in this space; the intersections of the detected lines are considered as candidate vanishing points. Other approaches do not employ an accumulator space but work on the detected lines directly and estimate the vanishing points simultaneously. Examples of the second kind include the method proposed by Kosecka and Zhang [4], which uses the Expectation Maximization (EM) algorithm, and the non-iterative technique proposed by Tardif [2].

For this project, two recent algorithms were implemented [1, 2], each employing a different technique in the model estimation step. Some parts of the algorithms were modified to suit our requirements. In the following sections the two algorithms are explained, with notes where modifications were made.


3 Technical Details

3.1 Rother's Algorithm

The first algorithm implemented was proposed by Rother [1] to detect three orthogonal vanishing points in architectural environments. The algorithm consists of two steps, viz. the accumulation step and the search step. Compared with methods that use a Gaussian sphere as the accumulation space, Rother's algorithm uses the unbounded image plane as the accumulation space. First, given an image, line segments are detected using the method proposed by Kosecka and Zhang [7]. Then the intersection points, possibly at infinity, of all pairs of non-collinear line segments are considered as potential vanishing points. A vote value is calculated for each vanishing point candidate based on the relationships between the line segments and the candidate. Finally, in the search step, the three orthogonal vanishing points with the maximal vote are selected from the candidates.

To calculate the vote of a potential vanishing point, we need a compatibility measure between a line segment and a vanishing point. We adopt the exponential voting scheme proposed in [8], which discriminates between good and bad vanishing point candidates better than the original voting scheme used by Rother. For each detected line segment s in the image, we define its vote for a vanishing point p as

    v(s, p) = |s| · exp(−γ² / 2σ²),    (1)

where |s| denotes the length of line segment s, γ is the angle between line segment s and the line connecting p and the midpoint of s, and σ is a robustness threshold. We say that line segment s votes for vanishing point p if and only if γ < t_γ, where t_γ is a predefined constant. The vote of vanishing point p is then

    vote(p) = Σ_{s votes for p} v(s, p).    (2)
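This voting scheme can be sketched as follows (Python/NumPy for illustration; the σ and t_γ values below are made up, not the ones used in our experiments):

```python
import numpy as np

def segment_vote(seg, vp, sigma=0.1, t_gamma=np.deg2rad(5)):
    """Vote of line segment seg = (p1, p2) for vanishing point candidate vp.

    gamma is the angle between the segment and the line joining vp to the
    segment midpoint; the vote decays as exp(-gamma^2 / (2 sigma^2)) and
    is weighted by segment length. sigma and t_gamma are illustrative.
    """
    p1, p2 = np.asarray(seg[0], float), np.asarray(seg[1], float)
    mid = 0.5 * (p1 + p2)
    seg_dir = p2 - p1
    vp_dir = np.asarray(vp, float) - mid
    # Angle between the two directions (orientation only, fold to [0, pi/2]).
    cosg = abs(seg_dir @ vp_dir) / (np.linalg.norm(seg_dir) * np.linalg.norm(vp_dir))
    gamma = np.arccos(np.clip(cosg, -1.0, 1.0))
    if gamma >= t_gamma:
        return 0.0          # segment does not vote for this candidate
    return np.linalg.norm(seg_dir) * np.exp(-gamma**2 / (2 * sigma**2))

# A horizontal segment votes strongly for a candidate on its own line...
strong = segment_vote(((0, 0), (10, 0)), (100, 0))
# ...and not at all for a candidate well off its direction.
weak = segment_vote(((0, 0), (10, 0)), (100, 200))
```

The length weighting gives long, reliably detected segments more influence than short, noisy ones.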

Finally, the orthogonal vanishing point triplet with the maximal sum of vote values is selected. Rother proposes three criteria to check the orthogonality of three vanishing point candidates, viz. the orthogonality criterion, the camera criterion and the vanishing line criterion. Only triplets that satisfy all three criteria are considered in the search step.
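The geometric relation underlying the orthogonality check can be sketched as follows (Python/NumPy for illustration; Rother's actual criteria also handle infinite vanishing points and unknown intrinsics, whereas this sketch assumes K is known):

```python
import numpy as np

def directions_orthogonal(v1, v2, K, tol=1e-6):
    """Check whether two vanishing points correspond to orthogonal 3D
    directions given an intrinsic matrix K: since v = K d, the
    back-projected directions K^-1 v1 and K^-1 v2 must be perpendicular."""
    d1 = np.linalg.solve(K, np.asarray(v1, float))
    d2 = np.linalg.solve(K, np.asarray(v2, float))
    d1 /= np.linalg.norm(d1)
    d2 /= np.linalg.norm(d2)
    return abs(d1 @ d2) < tol

# Illustrative K and two vanishing points built from orthogonal directions.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
v1 = K @ np.array([1.0, 0.0,  1.0])   # direction (1, 0, 1)
v2 = K @ np.array([1.0, 0.0, -1.0])   # orthogonal direction (1, 0, -1)
```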

3.2 Tardif's Algorithm

The second algorithm implemented was proposed by Tardif [2] for estimating vanishing points in man-made environments. As opposed to most algorithms, Tardif's algorithm uses a non-iterative approach, which increases computational efficiency. It is based on a multiple-model estimation technique called J-Linkage [9]. Tardif's method consists of the same two steps as before, viz. the accumulation step and the search step. As in Rother's algorithm, the accumulation takes place in the image plane. For line segment detection Tardif suggests a technique based on the Canny edge detector, but in our experiments we found that Kosecka's method [7] worked much better. The line segments are then clustered using the J-Linkage algorithm, which is explained in the following section. In the search step, the vanishing point corresponding to each of the detected clusters is found using a least squares approach. The three clusters with the largest numbers of member line segments are chosen to yield the three dominant vanishing points.
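The least-squares step can be sketched as follows (Python/NumPy for illustration). This is the standard homogeneous least-squares construction for a pencil of lines and omits the error weighting Tardif applies:

```python
import numpy as np

def vanishing_point_ls(segments):
    """Least-squares vanishing point of a cluster of line segments.

    Each segment (p1, p2) defines a homogeneous image line l = p1 x p2,
    and the vanishing point v should satisfy l . v = 0 for every line in
    the cluster. The solution is the smallest right singular vector of
    the stacked line matrix."""
    lines = []
    for p1, p2 in segments:
        h1 = np.array([p1[0], p1[1], 1.0])
        h2 = np.array([p2[0], p2[1], 1.0])
        lines.append(np.cross(h1, h2))
    L = np.array(lines)
    _, _, Vt = np.linalg.svd(L)
    v = Vt[-1]
    # Normalize unless the point is at infinity.
    return v / v[2] if abs(v[2]) > 1e-12 else v

# Three segments lying on lines that all pass through the origin.
segs = [((1, 1), (2, 2)), ((1, -1), (2, -2)), ((2, 1), (4, 2))]
vp = vanishing_point_ls(segs)   # intersection at (0, 0)
```

Working in homogeneous coordinates lets the same code return finite intersections and points at infinity uniformly.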


3.2.1 J-Linkage

Tardif's algorithm uses J-Linkage for clustering line segments that correspond to the same vanishing point. J-Linkage is similar in principle to RANSAC; the difference is that RANSAC fits data to a single model, whereas J-Linkage can fit data to multiple models. In the present case the models are the vanishing points. The first step in the agglomerative clustering is to choose M minimal sets of two line segments each, which are assumed to correspond to M vanishing point hypotheses. Then a preference matrix is created with one row per line segment and M columns; it stores, in boolean form, the vote of every line segment for each of the M models. Each row of the preference matrix is initially treated as a cluster. Next, a distance metric is used to find the distances between all the clusters. The metric used is the Jaccard distance,

    d_J(A, B) = (|A ∪ B| − |A ∩ B|) / |A ∪ B|,    (3)

where A and B are clusters. The two clusters with the minimum distance are merged into a single cluster, and this process is repeated until the distance between all remaining clusters is 1. At the end of the clustering there are typically 3–7 clusters, of which the top three are chosen to find the dominant vanishing points.
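A compact sketch of this clustering step follows (Python, illustrative only; as in J-Linkage, a merged cluster's preference set is taken as the intersection of its members' sets):

```python
def jaccard_distance(a, b):
    """Jaccard distance between two preference sets, as in eq. (3):
    d_J = (|A u B| - |A n B|) / |A u B|."""
    union = a | b
    if not union:
        return 1.0
    return (len(union) - len(a & b)) / len(union)

def j_linkage(pref_sets):
    """Greedy agglomerative clustering: repeatedly merge the two clusters
    with the smallest Jaccard distance until all pairwise distances are 1.
    This is a compact sketch of the J-Linkage step, not Tardif's code."""
    clusters = [(frozenset([i]), set(p)) for i, p in enumerate(pref_sets)]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = jaccard_distance(clusters[i][1], clusters[j][1])
                if d < 1.0 and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return [sorted(members) for members, _ in clusters]
        _, i, j = best
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] & clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

# Segments 0-1 prefer models {0, 1}; segments 2-3 both prefer model 2,
# so two clusters of segment indices emerge.
clusters = j_linkage([{0, 1}, {0, 1}, {2}, {2, 3}])
```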

3.3 The Hybrid Method

Although Tardif's algorithm is computationally much faster than Rother's, it has the drawback that it detects only dominant vanishing points, not orthogonal ones. Therefore we have devised a hybrid method that combines the two algorithms. In our approach, edges are detected as explained earlier. Subsequently, Tardif's clustering technique is used to find the clusters corresponding to the dominant vanishing points. Finally, the three criteria given by Rother are applied to the dominant vanishing points to select the most orthogonal triplet. This hybrid method is thus able to detect the three orthogonal vanishing points without the need for camera calibration.

3.4 Using Vanishing Points

The three orthogonal vanishing points obtained from the above methods can be used for numerous purposes, including estimating the camera matrix K. Another application is finding the pose of objects in the scene. Figure 2 shows the line membership corresponding to the three orthogonal vanishing points for an image containing a common object, a chair. Once K and the rotation matrix R between the camera and world reference frames are found, we can compute the pose of the chair. This has not been attempted in this project and is a possible future direction.
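Assuming K is known, the rotation between the camera and the world frames can be recovered from the three orthogonal vanishing points, since each back-projected direction K⁻¹vᵢ gives one column of R. A Python/NumPy sketch (sign and column-order ambiguities, which a full implementation must resolve, are ignored here):

```python
import numpy as np

def rotation_from_vps(vps, K):
    """Recover the rotation from three orthogonal vanishing points:
    each column of R is the unit direction K^-1 v_i (signs as computed)."""
    cols = []
    for v in vps:
        d = np.linalg.solve(K, np.asarray(v, float))
        cols.append(d / np.linalg.norm(d))
    return np.column_stack(cols)

# Round-trip check with a made-up K and a known rotation about the z-axis.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0],
                   [s,  c, 0.0],
                   [0.0, 0.0, 1.0]])
vps = [K @ R_true[:, i] for i in range(3)]   # synthetic vanishing points
R = rotation_from_vps(vps, K)                # recovers R_true
```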

4 Experimental Results

4.1 Vanishing Points

The results of Rother's algorithm and Tardif's algorithm from our implementation are shown in this section. Both algorithms were used to detect three orthogonal vanishing points in indoor images. Various images of indoor scenes obtained from the WWW were used to test our implementation. Figure 3 displays the vanishing point detection results for two indoor images using Rother's algorithm. The vertical, horizontal-left and horizontal-right vanishing points are displayed in distinct colors. In each image, the orthocenter of the triangle defined by the three vanishing lines is the principal point. Figure 4 shows the same results for Tardif's method. For a given image, if the lines along a particular direction are almost parallel, the corresponding vanishing point will be very far away in the image plane; in such cases visualization using the triangle scheme is not possible. For the given images, Rother's algorithm produces a principal point that is more accurate than Tardif's. The numerical coordinates of the vanishing points for one of the images are given in Table 1.

Figure 2: Line membership for a common object like a chair

          Rother's Algorithm    Tardif's Algorithm
    VP 1  (459.14, 2687.44)     (469.08, 2740.82)
    VP 2  (-264.14, -33.06)     (-254.13, 16.17)
    VP 3  (1914.77, -19.57)     (2033.93, -51.44)

Table 1: Numerical coordinates of the three orthogonal vanishing points for the box image

4.2 Line Membership

The line membership corresponding to the three orthogonal vanishing points is shown in Figure 5 for one test image. Figures 6 and 7 show the line memberships for numerous other images tested using our implementation.

4.3 Execution Time

The execution times in seconds for images of different sizes are given in Table 2. They clearly show that Tardif's algorithm is computationally more efficient than Rother's.


Figure 3: Three orthogonal vanishing points detected using Rother's algorithm. The orthocenter of the triangle formed by the three vanishing points is the principal point of the image

Figure 4: Three orthogonal vanishing points detected using Tardif's algorithm. The orthocenter of the triangle formed by the three vanishing points is the principal point of the image

    Image Size     No. of Line Segments    Rother's    Tardif's
    640 × 480      66                      0.56        0.06
    1000 × 747     153                     22.7        0.26
    2592 × 1932    261                     –           1.01

Table 2: Execution time in seconds for different image sizes and line segment numbers

(a) Rother’s Algorithm

(b) Tardif’s Algorithm

Figure 5: Line membership for the three orthogonal vanishing points

4.4 Implementation Details

In order to implement and test the vanishing point detection algorithms, we used C/C++ as the base programming language. The OpenCV library was used to implement common computer vision algorithms, and MATLAB was used for visualization and testing. All the code, along with additional information, is available online¹. The choice of the above tools, with the exception of MATLAB, was largely motivated by their availability under free and open source licenses. We also plan to submit our implementation of the algorithms for inclusion in the OpenCV library.

5 Conclusion

In this project, we have implemented Rother's and Tardif's vanishing point detection algorithms and applied them to detecting three orthogonal vanishing points in indoor scenes. We have compared the two algorithms through both theoretical analysis and extensive experiments. Rother's method can find three orthogonal vanishing points without calibration of the camera, but it is computationally expensive. Tardif's method uses the J-Linkage algorithm, which is efficient, but it requires calibration of the camera to find the three orthogonal vanishing points. We have therefore proposed a hybrid method combining Rother's and Tardif's algorithms, which can detect three orthogonal vanishing points efficiently without calibration of the camera. As an extension to this project, we have explored the possibility of using the three orthogonal vanishing points to calibrate the camera, estimate the relative orientation of the camera with respect to the scene and ultimately find the pose of objects in the scene.

¹ http://www.umich.edu/~srinaths/courses/442/project/


Figure 6: Line membership for the three orthogonal vanishing points using Rother's Algorithm

Figure 7: Line membership for the three orthogonal vanishing points using Tardif's Algorithm

References

[1] C. Rother. A new approach to vanishing point detection in architectural environments. Image and Vision Computing, 20(9–10):647–655, 2002.
[2] J.-P. Tardif. Non-iterative approach for fast and accurate vanishing point detection. In IEEE 12th International Conference on Computer Vision (ICCV), pages 1250–1257. IEEE, 2009.
[3] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, UK, 2000.
[4] J. Kosecka and W. Zhang. Efficient computation of vanishing points. In IEEE International Conference on Robotics and Automation (ICRA), volume 1, pages 223–228. IEEE, 2002.
[5] J.M. Coughlan and A.L. Yuille. The Manhattan world assumption: Regularities in scene statistics which enable Bayesian inference. In Advances in Neural Information Processing Systems, pages 845–851, 2001.
[6] M.J. Magee and J.K. Aggarwal. Determining vanishing points from perspective images. Computer Vision, Graphics, and Image Processing, 26(2):256–267, 1984.
[7] J. Kosecka and W. Zhang. Video compass. In Proceedings of the 7th European Conference on Computer Vision, Part IV, pages 476–490. Springer-Verlag, 2002.
[8] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In International Conference on Computer Vision (ICCV), 2009.
[9] R. Toldo and A. Fusiello. Robust multiple structures estimation with J-linkage. In Computer Vision – ECCV 2008, pages 537–547, 2008.
