image warping and mosaicing
part a:
background
imagine stitching together photos seamlessly, like creating panoramas on your phone or the immersive scenes in vr headsets. this project dives into image mosaicing, where multiple images are registered, projectively warped, resampled, and composited into a single, cohesive mosaic. by computing homographies and warping images, i explored key techniques used in modern camera apps and augmented reality systems.
shooting the pictures
the pictures below were shot with my iphone, keeping the camera itself at the center of projection; the camera acted as a pivot as i slowly rotated it and snapped some photos. this is important for when we warp our images and stitch them together, since it minimizes errors and artifacts. currently, these images are in their raw and original form; they will later be projected onto the center image via the projective transformation (homography) matrix.
recover homographies
to recover the homographies of the images, we simply find the transformation p' = Hp
where H is a 3x3 matrix with 8 degrees of freedom. in my code, i use the function
H = computeH(im1_pts,im2_pts)
to compute the matrix H, where im1_pts and im2_pts are the corresponding (x, y) points of the two images. the homography matrix can be recovered by solving for its 8 unknown parameters in the following equation:
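with the bottom-right entry of H fixed to 1 (one common convention, matching the 8 degrees of freedom above), this looks like:

$$
\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$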
we can then expand this out to solve for the parameters a to h:
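for a single correspondence (x, y) → (x', y'), one way to arrange this is as the linear system:

$$
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\
0 & 0 & 0 & x & y & 1 & -xy' & -yy'
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix}
=
\begin{bmatrix} x' \\ y' \end{bmatrix}
$$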
this gives us two equations for a single pair of corresponding points. to use more corresponding points, we simply stack these 2-row blocks vertically and solve the (now overdetermined) system with least squares.
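for reference, here’s a minimal numpy sketch of what computeH could look like (a sketch, not verbatim from my implementation; it assumes im1_pts and im2_pts are n x 2 arrays of (x, y) points):

```python
import numpy as np

def computeH(im1_pts, im2_pts):
    """estimate the 3x3 homography mapping im1_pts -> im2_pts (n >= 4 correspondences)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(im1_pts, im2_pts):
        # two rows of the stacked linear system per correspondence
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # least-squares solution for the 8 unknown parameters a..h
    params, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(params, 1).reshape(3, 3)
```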
warp the images
we can use the computeH()
function to find the homographies between different images. in our case we will project the left and right images onto the middle image, then later stitch them together to form an image mosaic! as seen in lecture, inverse warping produces better results since it doesn’t leave holes in the final image; we use it in our function
imwarped = warpImage(im,H)
it is important to note that after warping the points of one image into another, some of the warped points may land at negative coordinates. to deal with this, we keep track of the displacement of the warped image from the original and apply that translation (dx, dy) to the warped image. furthermore, the warped image is generally not the same size as the original, which is why we first find the ‘boundaries’ of the warped corners and then create a new canvas with the newly obtained dimensions.
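here’s a rough sketch of this inverse-warping step (assuming a color float image; scipy’s map_coordinates handles the bilinear resampling, and the sketch also returns the (dx, dy) offset described above):

```python
import numpy as np
import scipy.ndimage

def warpImage(im, H):
    """inverse-warp im with homography H; returns the warped image and its (dx, dy) offset."""
    h, w = im.shape[:2]
    # forward-map the corners to find the bounding box of the warped image
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], dtype=float).T
    wc = H @ corners
    wc = wc[:2] / wc[2]
    xmin, ymin = np.floor(wc.min(axis=1)).astype(int)
    xmax, ymax = np.ceil(wc.max(axis=1)).astype(int)

    # build a grid over the output canvas and pull pixels back through H^-1
    xs, ys = np.meshgrid(np.arange(xmin, xmax), np.arange(ymin, ymax))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    src = src[:2] / src[2]

    out = np.zeros((ymax - ymin, xmax - xmin, im.shape[2]))
    coords = np.stack([src[1], src[0]])  # map_coordinates expects (row, col)
    for c in range(im.shape[2]):
        out[..., c] = scipy.ndimage.map_coordinates(
            im[..., c], coords, order=1, cval=0
        ).reshape(out.shape[:2])
    return out, (xmin, ymin)
```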
corresponding points
below is the visualization of the corresponding points for one of our soon-to-be mosaics:
below are some results of the warped images:
tennis courts:
heyns reading room:
street:
image rectification
to test that our warp function is working so far, i perform rectification on some other images that i took and some from the web. i select the corner points of a planar object in the image that i know should be rectangular, and since i know its shape on the ground plane, i can define the target points as the corners of an axis-aligned rectangle. below are some of the original images, the corresponding points, as well as the rectified (cropped) images, followed by a small sketch of the setup.
art gallery (weitman gallery, washington university):
taking a peek at my roommate’s homework:
berkeley magazine on the coffee table:
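as a concrete (and purely illustrative) example of the rectification setup, using the computeH and warpImage functions from above with made-up corner coordinates and a made-up file name:

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical file name and clicked corner coordinates, just to show the setup
painting = plt.imread("gallery.jpg") / 255.0
im_pts = np.array([[421, 310], [982, 355], [968, 890], [430, 845]])  # clicked (x, y) corners
rect_pts = np.array([[0, 0], [600, 0], [600, 500], [0, 500]])        # the rectangle we want

H = computeH(im_pts, rect_pts)              # map the clicked corners to the rectangle
rectified, offset = warpImage(painting, H)  # then inverse-warp the whole photo
```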
now that we know our warpImage() function works, we can move on to creating our image mosaics of the photographs we took before!
blend the images into a mosaic
using our previously warped images (tennis courts, heyns reading room, and my street view), we can stitch them together, perform some blending, and end up with a panorama!
below are the results of placing the warped (left and right) images beside the (unwarped) center image via translation onto a large canvas:
the results are pretty good! we can see that the images are aligned. but there’s an issue: the seams where the separate images meet are very visible, and we can tell that these are separate images put side-by-side, not a mosaic.
to fix this, we use what we did in the image blending project and blend the seams together with a level-2 laplacian pyramid. i first blend the warped left image into the center, then the warped right image into the center, and then blend both of those results together. here are the level-2 laplacian pyramid masks i used for the heyns panoramas, along with a small sketch of the blending step:
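here’s a minimal sketch of one such two-level blend between two aligned images (sigma here is illustrative; it assumes float images and a float mask that is 1 where the first image should show):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend_two_level(im1, im2, mask, sigma=20):
    """two-level (low/high frequency) laplacian blend of two aligned h x w x 3 float images."""
    blur = lambda x: gaussian_filter(x, sigma=(sigma, sigma, 0))  # blur each channel
    m = mask[..., None]                                # broadcast the mask over channels
    low1, low2 = blur(im1), blur(im2)                  # low-frequency (gaussian) band
    high1, high2 = im1 - low1, im2 - low2              # high-frequency (laplacian) band
    soft = gaussian_filter(mask, sigma)[..., None]     # feathered mask for the low band
    return low1 * soft + low2 * (1 - soft) + high1 * m + high2 * (1 - m)
```

the high band is blended with the sharp mask while the low band uses a feathered mask, which hides the seam without ghosting the details.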
the final results are here:
you can see the huge improvement in the images! the seams are now gone - we have created panoramas from 3 different photographs.
part b: feature matching and autostitching
even though our mosaics turned out pretty nice, much of the process is very manual. picking the correspondence points itself takes a while. in this part of the project, i will implement feature matching and autostitching to create our mosaics for us.
corner detection
similar to how i manually selected the points, we want to select corner points automatically using the harris interest point detector. the harris detector builds a corner response map from the image gradients (the second moment matrix) and finds corners at peaks in that response.
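a rough sketch of this step using scikit-image (the parameters here are illustrative):

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im, num_keep=None):
    """harris corners of a grayscale float image; returns (row, col) coords and strengths."""
    response = corner_harris(im, sigma=1)              # harris corner response map
    coords = peak_local_max(response, min_distance=1)  # local maxima of the response
    strengths = response[coords[:, 0], coords[:, 1]]
    if num_keep is not None:
        # keep only the num_keep strongest corners (the "threshold" below)
        order = np.argsort(-strengths)[:num_keep]
        coords, strengths = coords[order], strengths[order]
    return coords, strengths
```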
below are images of harris points overlaid onto my photos:
no threshold (all harris points):
but you can see that there are way too many points. let’s set a threshold to only keep the top n points.
threshold = 2000:
threshold = 1000:
keeping only the top n corners is a good idea, but it is not very sophisticated. you can see in the images above that the corners are not well spaced out; the strong corners cluster together in a few regions while the rest of the image is left sparse.
adaptive non-maximal suppression (anms)
with the harris interest points, we get lots of redundant information since many points are clustered together. to space out the points a little more, we use adaptive non-maximal suppression (anms) to make our feature matching more effective.
i implemented the algorithm from section 3 of the paper “multi-image matching using multi-scale oriented patches” by brown et al..
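a small sketch of the suppression step (c_robust = 0.9 follows the paper; the rest is illustrative):

```python
import numpy as np

def anms(coords, strengths, num_points=500, c_robust=0.9):
    """keep the num_points corners with the largest suppression radius."""
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        # points whose response "significantly" dominates point i
        stronger = strengths * c_robust > strengths[i]
        if np.any(stronger):
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()                 # distance to the nearest dominating point
    keep = np.argsort(-radii)[:num_points]     # largest radii = most locally dominant points
    return coords[keep]
```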
you can see my anms points overlaid onto the images below. you can observe that the points are a lot more evenly spread out compared to just using the threshold:
500 anms points:
extracting feature descriptors
before we can match our correspondence points, we need some context around each point, and to get it, we use feature descriptors. for each point, we take a 40x40 pixel window centered on it and downsample with a spacing of s=5, giving an 8x8 patch. after we do this, we bias/gain-normalize each descriptor (subtract its mean and divide by its standard deviation). here are some of the descriptors from the heyns reading room images, followed by a small sketch of the extraction step.
point at (x=683, y=546):
point at (x=856, y=1041):
point at (x=893, y=930):
point at (x=1277, y=1034):
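a minimal sketch of the extraction step (assuming a grayscale float image and a point at least 20 pixels from the border):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def get_descriptor(im, r, c, window=40, spacing=5):
    """8x8 bias/gain-normalized descriptor around pixel (r, c) of a grayscale image."""
    blurred = gaussian_filter(im, sigma=2)             # blur first to avoid aliasing when subsampling
    half = window // 2
    patch = blurred[r - half:r + half, c - half:c + half]
    desc = patch[::spacing, ::spacing]                 # 40x40 window -> 8x8 sample
    return (desc - desc.mean()) / desc.std()           # bias/gain normalization
```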
feature matching between two images
we use the feature descriptors from both images and match each descriptor to its closest counterpart in the other image. we implement section 5 of brown’s paper in this part of the project. we also make use of lowe’s trick for thresholding: for each feature we compute the ratio between the distances to its nearest and second-nearest neighbors, and only keep the match if that ratio is below a threshold.
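a small sketch of the matching step (the 0.6 ratio threshold is just an illustrative value):

```python
import numpy as np

def match_features(desc1, desc2, ratio_thresh=0.6):
    """match flattened descriptors (n1 x 64 and n2 x 64) with lowe's ratio test; returns (i, j) pairs."""
    # pairwise squared distances between every descriptor in image 1 and image 2
    dists = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(dists[i])
        nn1, nn2 = dists[i, order[0]], dists[i, order[1]]
        # keep the match only if the best match is much better than the second best
        if nn1 / nn2 < ratio_thresh:
            matches.append((i, order[0]))
    return np.array(matches)
```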
4-point random sample consensus (ransac)
lowe’s trick improves the matching process by reducing the number of outliers. however, some outliers will inevitably remain, and they can badly skew the least-squares solution: just one pair of incorrect points can throw off the homography estimate. this is why we need ransac.
the general idea of ransac:
- select four feature pairs (at random)
- compute the exact homography H from those four pairs
- compute the inliers: pairs where dist(pi’, H pi) < ε
- repeat the steps above many times and keep the largest set of inliers
- re-compute a least-squares estimate of H using all of the inliers
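a sketch of this loop, reusing computeH from part a (num_iters and eps here are illustrative values):

```python
import numpy as np

def ransac_homography(pts1, pts2, num_iters=1000, eps=2.0):
    """4-point ransac: returns a homography H (pts1 -> pts2) and the indices of its inliers."""
    best_inliers = np.array([], dtype=int)
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous coordinates
    for _ in range(num_iters):
        sample = np.random.choice(len(pts1), 4, replace=False)
        H = computeH(pts1[sample], pts2[sample])          # exact homography from 4 pairs
        proj = H @ pts1_h.T
        proj = (proj[:2] / proj[2]).T                     # project pts1 into image 2
        errors = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.where(errors < eps)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # final least-squares fit on all inliers of the best model
    return computeH(pts1[best_inliers], pts2[best_inliers]), best_inliers
```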
we can see below that ransac does a really good job at eliminating the outliers:
you can see that the homography H produced by ransac is pretty accurate, as the warped images from the manually and automatically selected points look essentially the same.
autostitching (putting everything together)
heyns
you can see that the results are really good! they are basically identical. if we zoom in a little more on both images, we can see some minor differences.
you can see that the automatic correspondence points, using all the techniques from the second part of the project, actually do a better job than the manually selected points (as expected). in the image on the left (manual correspondences), there are some noticeable artifacts, while there are none in the image on the right.
tennis
street view
closing remarks
this project encompasses many techniques we learned in our past assignments and combines them into a single project. it is also quite fascinating to see how much can be done in image processing with just mathematics and smart tricks/techniques.
being able to automate the process of selecting correspondence points and stitching/blending the images together has been both satisfying and rewarding. :)