A Comparison of Image Data Representations for Local Stereo Matching

Although the stereo matching problem has been studied for several decades, it remains an active area of research. The goal is to find correspondences between elements in a set of stereoscopic images. From these correspondences, the distance of objects in a scene relative to the observer can be inferred. Advances in the field have led to experimentation with a wide range of techniques, from graph-cut energy minimization to artificial neural networks. Underlying these techniques is a cost function, which evaluates the likelihood that a given point in one image matches a given point in the other. Although this cost is fundamentally computed by comparing image pixel data, there is little consistency in which image data representation is used. This paper presents an experimental analysis comparing the effectiveness of the more common image data representations. The aim is to determine how well each representation reduces the cost of the correct correspondence relative to the other possible matches.
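
The role of the cost function, and of the data representation it operates on, can be illustrated with a small sketch. It assumes a simple local method: a sum-of-absolute-differences (SAD) cost over a square window. It compares three illustrative representations: raw RGB, grayscale intensity, and a horizontal intensity gradient. The function names (`to_rgb`, `to_gray`, `to_grad_x`, `sad_cost`, `correct_match_rate`) are hypothetical, and so is the scoring rule, which counts the fraction of pixels whose ground-truth disparity has a strictly lower cost than every other candidate. These are choices made for illustration, not the representations or metric used in this paper. The stereo pair is synthetic.

```python
import random
from typing import Callable, List

# image[y][x] -> feature vector; the input images hold RGB triples.
# Illustrative sketch only: representations, cost, and metric are assumptions.
Image = List[List[List[float]]]

def to_rgb(img: Image) -> Image:
    """Raw colour values, one feature per channel."""
    return [[[float(c) for c in px] for px in row] for row in img]

def to_gray(img: Image) -> Image:
    """One luminance value per pixel (BT.601 weights, one common choice)."""
    return [[[0.299 * px[0] + 0.587 * px[1] + 0.114 * px[2]] for px in row] for row in img]

def to_grad_x(img: Image) -> Image:
    """Horizontal central-difference gradient of the grayscale image, clamped at borders."""
    out = []
    for row in to_gray(img):
        w = len(row)
        out.append([[row[min(x + 1, w - 1)][0] - row[max(x - 1, 0)][0]] for x in range(w)])
    return out

def sad_cost(fl: Image, fr: Image, y: int, x: int, d: int, radius: int) -> float:
    """SAD between the window at (y, x) in the left features and (y, x - d) in the right."""
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            a, b = fl[y + dy][x + dx], fr[y + dy][x + dx - d]
            total += sum(abs(p - q) for p, q in zip(a, b))
    return total

def correct_match_rate(left: Image, right: Image, true_disp: List[List[int]],
                       represent: Callable[[Image], Image],
                       max_disp: int, radius: int = 1) -> float:
    """Fraction of evaluated pixels whose true disparity beats every other candidate."""
    fl, fr = represent(left), represent(right)
    h, w = len(left), len(left[0])
    hits = total = 0
    # Only pixels whose window and every candidate match lie inside both images.
    for y in range(radius, h - radius):
        for x in range(max_disp + radius, w - radius):
            costs = [sad_cost(fl, fr, y, x, d, radius) for d in range(max_disp + 1)]
            t = true_disp[y][x]
            if all(costs[t] < c for d, c in enumerate(costs) if d != t):
                hits += 1
            total += 1
    return hits / total if total else 0.0

# Synthetic pair: the left view is the right view shifted by a constant disparity of 2.
rng = random.Random(0)
H, W, D = 8, 20, 2
right = [[[rng.randint(0, 255) for _ in range(3)] for _ in range(W)] for _ in range(H)]
left = [[right[y][max(x - D, 0)] for x in range(W)] for y in range(H)]
truth = [[D] * W for _ in range(H)]

for name, rep in [("rgb", to_rgb), ("gray", to_gray), ("grad_x", to_grad_x)]:
    print(name, correct_match_rate(left, right, truth, rep, max_disp=3))
```

On this noise-free pair, RGB and grayscale should both score 1.0. The gradient representation can lose a few pixels at the right border, where the clamped gradients no longer line up between the two views. Real image pairs add noise, occlusions, and illumination changes, and these are what separate the representations in practice.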
