Computer vision project: overlaying 3D reconstruction from webcam on the original scene

I’ve been taking a few graduate courses at Université de Montréal over the past two years, but the last one, the computer vision course, was by far the one with the most “showable” (mighty Google says that’s a word) projects. I’ll be posting about the first project, a basic image stitching app, later on. For now, I’ll just leave this here:

It’s an example result from my term project.

In a nutshell, skipping the technical:

  • I take a couple of pictures with a camera for which I know some parameters, notably focal length;
  • In the scene there’s this augmented reality tag which helps me find out where the camera is in each pose;
  • I find corresponding points between images to create a (sparse) 3D point cloud of the object;
  • I then use MeshLab to get a mesh (I only have points, I need triangles) and then find some “good enough” textures for the triangles;
  • I then use the 3D model that comes out of all this and overlay it on the original scene, so the model ends up being … well where it should be in the first place.

I wrote this using ARToolkit+ and OpenCV. Reprojecting onto the original image is done with OpenGL.
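
(For the curious: the usual trick for that reprojection is to turn the calibrated intrinsics into an OpenGL projection matrix, so the rendered model lines up pixel-for-pixel with the photo. Here’s a minimal numpy sketch of one common construction; sign conventions vary with where you put the image origin, and this one assumes OpenCV’s top-left, y-down convention.)

```python
import numpy as np

def gl_projection_from_K(K, W, H, near=0.1, far=100.0):
    """Build an OpenGL projection matrix from OpenCV-style intrinsics K,
    for an image of size W x H. Assumes origin top-left, y down."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    P = np.array([
        [2 * fx / W, 0.0,        1 - 2 * cx / W,               0.0],
        [0.0,        2 * fy / H, 2 * cy / H - 1,               0.0],
        [0.0,        0.0,        -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0.0,        0.0,        -1.0,                         0.0],
    ])
    return P.T.astype(np.float32)  # transposed: glLoadMatrixf is column-major
```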

Some technical details for those who are into this kind of thing, with rough Python/OpenCV sketches of each step after the list:

  • I pre-calibrate the camera model with OpenCV calibration routines and a “chessboard”;
  • Getting the camera position from the detected tag is simply a matter of decomposing a homography between the tag plane and its image, which happened to be something we’d already done for another course’s homework;
  • I find corresponding points with SURF, and filter correspondences with epipolar line constraints found from camera poses, among other filters;
  • I first find correspondences between image pairs, and then merge the correspondences from all pairs using an algorithm I cooked up;
  • Finding a texture for each triangle involves finding a “good” image to project the triangle onto, then building one huge texture map for OpenGL from the extracted bits of the original images.
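
First, the calibration step. With OpenCV’s Python bindings it boils down to detecting the chessboard corners in a handful of views and handing them to calibrateCamera. A rough sketch, where image_paths, the pattern size and the square size are placeholders:

```python
import cv2
import numpy as np

pattern_size = (9, 6)    # inner-corner count of the printed chessboard (assumed)
square_size = 0.025      # square edge length in metres (assumed)

# 3D corner coordinates in the board's own frame (the z = 0 plane)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size

obj_points, img_points = [], []
for path in image_paths:  # image_paths: placeholder list of board photos
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 intrinsic matrix (focal lengths, principal point);
# dist holds the lens distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```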
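
For the pose: a plane at z = 0 projects through H ≅ K·[r1 r2 t], so once K is stripped off, the pose can be read out of the columns of the homography. A minimal sketch (the tag-corner variable names are placeholders):

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover the camera pose [R | t] from a homography mapping the tag
    plane (z = 0) into the image, given the intrinsic matrix K."""
    M = np.linalg.inv(K) @ H
    scale = 1.0 / np.linalg.norm(M[:, 0])
    if M[2, 2] < 0:          # keep the tag in front of the camera
        scale = -scale
    r1, r2, t = scale * M[:, 0], scale * M[:, 1], scale * M[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Snap to the nearest true rotation (the columns are only approximately
    # orthonormal because of noise in H)
    U, _, Vt = np.linalg.svd(R)
    R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    return R, t

# H itself can come from the four tag corners, e.g.
# H, _ = cv2.findHomography(tag_corners_plane, tag_corners_image)
```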
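
The matching and filtering step looks roughly like this. SURF lives in the opencv-contrib build, img1/img2 and the poses (R1, t1), (R2, t2) are assumed known by this point, and a ratio test stands in for the unspecified “other filters”:

```python
import cv2
import numpy as np

def skew(v):
    """Cross-product matrix: skew(v) @ x == np.cross(v, x)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # opencv-contrib
kp1, des1 = surf.detectAndCompute(img1, None)
kp2, des2 = surf.detectAndCompute(img2, None)

# Lowe-style ratio test, standing in for the other filters
good = []
for pair in cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])

# Fundamental matrix from the two known poses (convention: x_cam = R @ X + t)
R_rel = R2 @ R1.T
t_rel = t2 - R_rel @ t1
F = np.linalg.inv(K).T @ skew(t_rel) @ R_rel @ np.linalg.inv(K)

def on_epipolar_line(m, tol=2.0):
    """Keep a match only if the point in image 2 sits within tol pixels
    of the epipolar line of its counterpart in image 1."""
    x1 = np.append(kp1[m.queryIdx].pt, 1.0)
    x2 = np.append(kp2[m.trainIdx].pt, 1.0)
    line = F @ x1
    return abs(x2 @ line) / np.hypot(line[0], line[1]) < tol

good = [m for m in good if on_epipolar_line(m)]
```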
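
I won’t spell out my merging algorithm here, but for illustration, the textbook way to fuse pairwise matches into multi-view tracks is a union-find over (image, keypoint) observations; this sketch is that textbook version, not my exact code:

```python
from collections import defaultdict

parent = {}

def find(x):
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# pairwise_matches: {(img_i, img_j): [(kp_index_in_i, kp_index_in_j), ...]}
for (i, j), matches in pairwise_matches.items():
    for a, b in matches:
        union((i, a), (j, b))

# Each connected component of observations is one candidate 3D point
tracks = defaultdict(list)
for obs in list(parent):
    tracks[find(obs)].append(obs)

# Discard tracks that claim two different keypoints in the same image
tracks = [t for t in tracks.values()
          if len({img for img, _ in t}) == len(t)]
```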
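
As for picking a “good” image per triangle, one simple plausible rule (not necessarily the one I used) is projected area: more area means more pixels and a more head-on view. A sketch, with poses, tri_3d and the calibration as placeholders:

```python
import cv2
import numpy as np

def projected_area(uv):
    """Area of a 2D triangle given its three projected vertices."""
    a, b, c = uv
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                     - (b[1] - a[1]) * (c[0] - a[0]))

def best_view_for_triangle(tri_3d, poses, K, dist):
    """Pick the image in which the triangle projects largest; returns the
    image index and the triangle's pixel coordinates there, which become
    its texture coordinates once the patch is copied into the atlas."""
    best = None
    for idx, (rvec, tvec) in enumerate(poses):
        uv, _ = cv2.projectPoints(tri_3d, rvec, tvec, K, dist)
        uv = uv.reshape(3, 2)
        area = projected_area(uv)
        if best is None or area > best[0]:
            best = (area, idx, uv)
    return best[1], best[2]
```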

I know the tracking’s pretty bad (I had to hack ARToolkit+, which would return false positives at times) and the model is far from the reconstruction quality you’d get with denser techniques (another student in the course tackled reconstruction with structured light, for example, and had ~2 million points… I get ~2000), but hey, it (kinda) works and uses only a webcam (for the example shown above).

Maybe at some point I’ll get around to making the tracking smoother, at least. And to posting a more detailed demo of the app itself. It shows a bunch of fun stuff about the reconstruction process, like epipolar lines and camera positions for each pose.
