As I said in the previous post, the first assignment in that computer vision course I took was to write an image stitching program. The basic idea is to take a series of pictures while rotating the camera around a fixed point. Then you find how each picture “maps” to the others and “stitch” them into a single coherent panorama.
At first I thought this was too ambitious for a first foray into CV algorithms and/or would take a gargantuan amount of time. Turns out only the second part was right, but mostly because I had to learn to use Mathematica and program with OpenCV along the way. And the hardest part was already done for us by either of these tools, anyway: finding correspondences between pictures.
The basic principle, as I said, is to find corresponding points between pictures. To do that you need to identify “keypoints”: points that look distinctive enough that they can be recognized in another picture, even if they appear slightly differently there. There’s a whole world of research on how to do that, but in our case it boiled down to using either SIFT or SURF features in Mathematica and OpenCV. Then some “distance”, which measures how much the regions around two keypoints differ, is used to find points that match between images.
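To give a sense of what that matching step looks like, here is a minimal sketch in Python/NumPy (not the assignment’s actual code): each keypoint comes with a descriptor vector, and for each descriptor in one image we find the closest descriptor in the other under Euclidean distance. The toy arrays are placeholders standing in for real SIFT/SURF descriptors.

```python
import numpy as np

def match_descriptors(desc1, desc2):
    """For each descriptor in desc1, return the index of the closest
    descriptor in desc2 under Euclidean (L2) distance.

    desc1: (n, d) array, desc2: (m, d) array. Returns an (n,) index array.
    In practice these would be 128-dim SIFT (or 64-dim SURF) descriptors;
    here any toy vectors will do.
    """
    # Pairwise squared distances via broadcasting: shape (n, m).
    diffs = desc1[:, None, :] - desc2[None, :, :]
    d2 = np.sum(diffs ** 2, axis=2)
    return np.argmin(d2, axis=1)
```

Brute-force matching like this is quadratic in the number of keypoints; real implementations typically use approximate nearest-neighbor search, but the idea is the same.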
Once you’ve got these matches between pictures, supposing for the moment they’re all correct, if the pictures were taken “right” you can find a transformation matrix H (a homography) such that, if x is the position of a point in the first image written in homogeneous coordinates, Hx gives (up to scale) the position of the same point in the second.
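Concretely, applying a homography means lifting the pixel position to homogeneous coordinates, multiplying by H, and dividing by the last component. A small sketch, with an illustrative H (a rotation plus translation, made up for the example, not taken from the assignment):

```python
import numpy as np

# Hypothetical homography: rotate by 10 degrees, then translate.
theta = np.deg2rad(10)
H = np.array([
    [np.cos(theta), -np.sin(theta), 30.0],
    [np.sin(theta),  np.cos(theta), -5.0],
    [0.0,            0.0,            1.0],
])

def apply_homography(H, x, y):
    """Map pixel (x, y) from image 1 into image 2's coordinates.

    The point is lifted to homogeneous coordinates (x, y, 1); Hx is only
    defined up to scale, so we divide by the last component at the end.
    """
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

For a pure rotation-plus-translation the last row of H is (0, 0, 1) and the division is a no-op, but a general homography (e.g. between views of a rotating camera) has a nontrivial last row, which is what produces the perspective “keystone” distortion.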
So, skipping some details about weeding out bad matches, what I did was find a homography between each pair of consecutive images, and bring everything back into the coordinates of the central image. Starting from these, say:
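The “bring everything back into the coordinates of the central image” step works because homographies compose by matrix multiplication: if H_i maps image i into image i+1, the product of the pairwise matrices (or their inverses, on the other side of the center) maps any image straight into the central one. A hypothetical helper illustrating the chaining, not the post’s actual code:

```python
import numpy as np

def chain_to_center(pairwise_H, center):
    """Given pairwise_H[i] mapping image i into image i+1's coordinates,
    return a list H where H[i] maps image i into image `center`'s frame.

    For i < center we accumulate the forward homographies; for i > center
    we accumulate inverses, walking outward from the center.
    """
    n = len(pairwise_H) + 1        # number of images
    to_center = [None] * n
    to_center[center] = np.eye(3)  # the central image maps to itself
    for i in range(center - 1, -1, -1):    # images left of center
        to_center[i] = to_center[i + 1] @ pairwise_H[i]
    for i in range(center + 1, n):         # images right of center
        to_center[i] = to_center[i - 1] @ np.linalg.inv(pairwise_H[i - 1])
    return to_center
```

Once every image has its matrix into the central frame, warping them all with those matrices and compositing produces the panorama.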
I get this:
Cropping to keep only the central part:
Now I will concede my C program probably doesn’t have all the gizmos of other commercial options (notably if you look at the full size version above, you’ll see problems stemming from auto lighting on the camera), but hey, for a first assignment I was pretty impressed with the result. (And it turns out it was actually a pretty good first assignment, as those transformations and keypoint finding operations were fundamental for the rest of the course).
(As a last technical point: to filter out bad matches, we used RANSAC along with constraints on the match distances; in my case, the best match had to be much better than the second-best.)
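That “best must be much better than second-best” constraint is usually called the ratio test. A minimal sketch of it (the 0.7 threshold is a common default, not necessarily the value used in the assignment; RANSAC would then be run on the surviving matches):

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.7):
    """Keep a match only if the best candidate in desc2 is clearly
    better than the runner-up: dist(best) < ratio * dist(second).

    Ambiguous keypoints (e.g. on repeated texture) tend to have two
    near-equal candidates and get discarded. Returns (i, j) index pairs.
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```

The intuition: a genuinely unique keypoint has one standout neighbor, while a point on repetitive structure matches many places almost equally well, so the ratio filters out exactly the matches most likely to be wrong.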