One of the most fundamental problems in computer vision is visual correspondence, which is the basis of both stereo and motion. Given two images of the same scene, a pixel in one image corresponds to a pixel in the other if both pixels are projections, along lines of sight, of the same physical scene element. The problem is to determine this correspondence between the pixels of the two images.
The input to a stereo vision algorithm is a pair of images, "left" and "right", obtained from two cameras (like two eyes) pointed at the same real scene. Objects appear at slightly shifted positions in the two images, and objects that are closer to the cameras have larger shifts ("disparities"). The problem is to find, for each pixel in the "left" image, the corresponding pixel in the "right" image. Equivalently, a stereo vision algorithm has to determine the disparity (the shift, from which depth can be inferred) for each pixel in the "left" (or "right") image.
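To make the per-pixel search concrete, here is a minimal sketch of window-based matching with normalized correlation, a classical baseline for this problem. It is not the [BVZ1] method. It assumes the two images are rectified, so that a pixel at column `col` in the "left" image matches a pixel at column `col - d` on the same row of the "right" image for some disparity `d >= 0`. All function names and parameters (`ncc`, `window`, `disparity_at`, `make_shifted_pair`, `half`, `max_disp`) are hypothetical and chosen only for illustration.

```python
import math
import random


def ncc(a, b):
    """Normalized cross-correlation of two equal-length lists of intensities."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat window carries no texture to match on
    return sum(x * y for x, y in zip(da, db)) / denom


def window(image, row, col, half):
    """Flatten the (2*half+1) x (2*half+1) window centred on (row, col).

    The caller must keep the window inside the image.
    """
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def disparity_at(left, right, row, col, half=2, max_disp=8):
    """Return the shift d in [0, max_disp] whose right-image window scores best."""
    ref = window(left, row, col, half)
    best_d, best_score = 0, -2.0  # NCC is never below -1
    for d in range(max_disp + 1):
        if col - d - half < 0:
            break  # candidate window would leave the image
        score = ncc(ref, window(right, row, col - d, half))
        if score > best_score:
            best_d, best_score = d, score
    return best_d


def make_shifted_pair(height=12, width=40, shift=3, seed=0):
    """Synthetic test pair: the left image is the right image moved `shift` pixels right."""
    rng = random.Random(seed)
    right = [[rng.randrange(256) for _ in range(width)] for _ in range(height)]
    left = [[row[max(c - shift, 0)] for c in range(width)] for row in right]
    return left, right


if __name__ == "__main__":
    left, right = make_shifted_pair(shift=3)
    print(disparity_at(left, right, row=6, col=20))
```

Each pixel is decided independently from a small window, which is what makes this kind of matcher simple but also sensitive to textureless regions and to windows that straddle object boundaries.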
The table below shows the results produced by several stereo vision algorithms on some real benchmark scenes. To describe each scene we show only the "left" image (the b/w image at the top of each column), since the difference between the "left" and the "right" images is usually very small. (Note that human vision can reasonably assess the relative depth of various elements of a scene even from a single image.) The output of each stereo vision algorithm is represented as an image in which pixels with a larger estimated disparity (closer to the viewer) are marked with a "warmer" color.
Algorithm | Head Scene | Tree Scene | Meter Scene | Shrub Scene |
---|---|---|---|---|
Boykov Veksler Zabih [BVZ1] | | | | |
Normalized correlation | | | | |
The "Head" scene is courtesy of the Computer Vision Lab at the University of Tsukuba (Japan). The other scenes are from the Vision Center at Carnegie Mellon University.