CS 194-26 Project 1

Roger Chen

In this project, the task is to take a scanned image containing 3 grayscale exposures, which record the scene as seen through blue, green, and red filters, and align them into an RGB color image. I chose to use Python and NumPy for this task, because they are what I am most familiar with.

Input images

There are a lot of blotches and defects in the input images that make them hard to interpret. I ended up just taking three equal-sized slices of the input image array, by dividing the height of the original image by 3, and making those the three channels.
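A minimal sketch of this slicing step, assuming the plate is already loaded as a 2D NumPy array with the three exposures stacked top to bottom. The function name split_channels is made up for illustration and is not taken from my code.

```python
import numpy as np

def split_channels(plate):
    """Split a stacked plate into three equal-height slices, top to bottom."""
    height = plate.shape[0] // 3  # integer division drops up to 2 leftover rows
    return [plate[i * height:(i + 1) * height] for i in range(3)]

# Toy 10-row "plate": each slice gets 3 rows and the last row is discarded.
plate = np.arange(10 * 4).reshape(10, 4)
top, middle, bottom = split_channels(plate)
```

Because the slices are plain array views, no pixel data is copied at this stage.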

I noticed that when I loaded the images into Python with the Pillow image library, the small JPEG images were arrays of type np.uint8 with range 0 to 255, whereas the large TIFF images were arrays of type np.uint16 with range 0 to 65535. I wanted to work with floating point numbers instead of integers, so the first thing I did was convert the values to floats in the range of approximately 0 to 256. That way, I do not have to worry about numerical issues caused by integer overflow.
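The conversion could look roughly like the sketch below. Dividing the 16-bit values by 256 is an assumption about how to reach the 0 to 256 range, and to_float is a hypothetical name. The last lines show the kind of wrap-around that integer arrays produce and floats avoid.

```python
import numpy as np

def to_float(pixels):
    """Convert uint8 or uint16 pixel data to float64 on a roughly 0-256 scale."""
    pixels = np.asarray(pixels)
    if pixels.dtype == np.uint16:
        return pixels.astype(np.float64) / 256.0  # 65535 / 256 is just under 256
    return pixels.astype(np.float64)

a = np.array([10], dtype=np.uint8)
b = np.array([20], dtype=np.uint8)
wrapped = a - b                    # uint8 arithmetic wraps around modulo 256
safe = to_float(a) - to_float(b)   # float arithmetic keeps the sign
```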

Saving the output

Before I started doing any alignment, I wanted to write the output function so I could save the final output files and visualize my work. My output function creates an H*W*3 array of type np.uint8 and fills it in with 3 NumPy arrays that represent the three color channels. I noticed that the input images seemed to be in Blue-Green-Red order, whereas the output array needs its channels in Red-Green-Blue order, so I saved input channel N to channel (2 - N) of the output array. I output the final images as PNG instead of JPEG, because I do not think I should be using a lossy file format if I'm hunting for alignment errors in the output images. I can always convert the PNG to JPEG if I want to publish the results.
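As an illustration of the channel reordering, here is a small sketch. The name compose_rgb, the rounding, and the clipping to 0..255 are assumptions made for the example, not details of my actual output function.

```python
import numpy as np

def compose_rgb(blue, green, red):
    """Stack three float channels (0-256 scale) into an H x W x 3 uint8 RGB array."""
    channels = [blue, green, red]  # input order: channel n = 0, 1, 2
    height, width = blue.shape
    rgb = np.zeros((height, width, 3), dtype=np.uint8)
    for n, channel in enumerate(channels):
        # Input channel n lands in output channel 2 - n (B-G-R becomes R-G-B).
        rgb[:, :, 2 - n] = np.clip(np.rint(channel), 0, 255).astype(np.uint8)
    return rgb

rgb = compose_rgb(np.full((2, 2), 10.0), np.full((2, 2), 20.0), np.full((2, 2), 300.0))
```

With Pillow, an array like this can be written losslessly with Image.fromarray(rgb).save("out.png"), where the file name is a placeholder.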

First attempt

The first version of my code tried to align all 3 images at the same time, by searching over two pairs of displacements: one for green and one for red. There were a total of 4 parameters to match. I thought that, even if the luminosity of two different color channels might have a lot of differences, I could get a pretty good answer by using the luminosity of all 3 color channels at once. I noticed my code ran really slowly and this was not the suggested method on the project spec, so I tried something else.

For my error metric, I used the L2 norm of the difference of the two images. I did not normalize this error metric, because I would only be comparing images that were all the same size. This is because I designed my get_displaced() function to also take in an extra parameter that would guarantee that, in one iteration of the algorithm, all of the compared images would be the same size.
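The sketch below shows one way to get same-sized comparisons. shifted_crop is a hypothetical stand-in for get_displaced(), whose real signature is not shown here. The margin argument plays the role of the extra parameter, and the sign convention of (dy, dx) is an assumption.

```python
import numpy as np

def l2_error(a, b):
    """L2 norm of the pixel-wise difference between two same-shaped images."""
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return np.sqrt(np.sum((a - b) ** 2))

def shifted_crop(image, dy, dx, margin):
    """Crop a window shifted by (dy, dx); any shift within the margin gives the same shape."""
    if max(abs(dy), abs(dx)) > margin:
        raise ValueError("shift exceeds margin")
    height, width = image.shape
    return image[margin + dy:height - margin + dy, margin + dx:width - margin + dx]

image = np.arange(100.0).reshape(10, 10)
window_a = shifted_crop(image, 0, 0, margin=2)
window_b = shifted_crop(image, -2, 2, margin=2)
```

Since every window in one search has the shape (H - 2 * margin, W - 2 * margin), the raw L2 values can be compared directly without dividing by the pixel count.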

Finally, I added the image pyramid optimization to my algorithm. I set a recursion base case threshold at 100 pixels, so images that were less than 100 pixels on the longest side would not be resized any smaller. On each recursive call, I reduced the size of my input images by a factor of 2. When the recursive call returned, I scaled up the estimated alignment parameters by a factor of 2.
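The recursion can be sketched as follows for a single (dy, dx) displacement. The 100-pixel base case and the factor of 2 come from the description above. The 2x2 block averaging, the search radii, and all function names are assumptions chosen to keep the example short, and the block repeats two small helpers so that it runs on its own.

```python
import numpy as np

def l2_error(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def shifted_crop(image, dy, dx, margin):
    height, width = image.shape
    return image[margin + dy:height - margin + dy, margin + dx:width - margin + dx]

def downsample(image):
    """Halve each dimension by averaging 2x2 blocks (odd edges are cropped)."""
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    return image[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def best_shift(moving, reference, center, radius):
    """Exhaustively search shifts within `radius` of `center`."""
    cy, cx = center
    margin = max(abs(cy), abs(cx)) + radius  # keeps every window the same size
    target = shifted_crop(reference, 0, 0, margin)
    best, best_error = center, np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            error = l2_error(shifted_crop(moving, dy, dx, margin), target)
            if error < best_error:
                best, best_error = (dy, dx), error
    return best

def pyramid_align(moving, reference, base_size=100, coarse_radius=8, refine_radius=2):
    """Return the (dy, dx) crop offset of `moving` that best matches `reference`."""
    if max(reference.shape) < base_size:
        # Base case: small enough, so search a wide window around zero.
        return best_shift(moving, reference, (0, 0), coarse_radius)
    dy, dx = pyramid_align(downsample(moving), downsample(reference),
                           base_size, coarse_radius, refine_radius)
    # Scale the coarse estimate up by 2, then refine it at this resolution.
    return best_shift(moving, reference, (2 * dy, 2 * dx), refine_radius)

rng = np.random.default_rng(0)
reference = rng.random((256, 256))
moving = np.roll(reference, (8, -4), axis=(0, 1))  # synthetic known displacement
estimate = pyramid_align(moving, reference)
```

Each level only searches a few pixels around the doubled estimate, so the cost of the full-resolution search stays small even when the true displacement is large.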

This initial version of the algorithm produced this image, which is pretty good at a first glance:

However, the 100% crop reveals that there are a lot of odd colors on the edges:

Plus, some of the example input images did not align at all:

First attempt results

Here are all of the results produced by this initial algorithm. The parameters are expressed as (dy, dx).

bridge (-12, 7), (-68, -6)
cathedral (-5, -2), (-12, -3)
emir misaligned (-48, -24), (-442, -442)
harvesters (-59, -17), (-123, -14)
icon (-41, -17), (-89, -23)
lady (-54, -8), (-114, -12)
melons (-82, -10), (-178, -13)
monastery (3, -2), (-3, -2)
nativity (-3, -1), (-8, 0)
onion_church (-51, -27), (-108, -37)
self_portrait (-78, -29), (-175, -37)
settlers (-7, 0), (-15, 1)
three_generations (-51, -14), (-110, -12)
tobolsk (-3, -3), (-7, -3)
train (-42, -6), (-86, -32)
turkmen (-55, -21), (-116, -28)
village (-64, -12), (-137, -22)
workshop (-53, 1), (-105, 12)

We see imperfections in the alignment of most images, but only one image (emir) is seriously misaligned. This misalignment will be addressed in the next section.

Better transformations and features

I used some extra techniques to improve the performance of my algorithm on the images that it fails to align.

Distortion and scaling

My first idea was to expand the search space from a displacement pair (x, y) to an 8-tuple of parameters (x1, y1, x2, y2, x3, y3, x4, y4) that represented the 4 corners of a rectangular image. With the 4 corners, I would use a linear perspective transformation matrix to align the green channel to the blue channel, and another transformation matrix to align the red channel to the blue channel.

The main problem with this approach is that it would be too slow to run if I just added the extra parameters to my current code. So, I came up with two modifications to this plan: #1) I would use 2 corners (top-left and bottom-right) instead of 4, which still covers translation and per-axis scaling with half as many dimensions, and #2) I would try to optimize one parameter at a time, while holding the others fixed.

My algorithm tweaks each parameter one at a time, and finds the tweak that yields the lowest error. Then, it updates the best guess for the alignment parameters with these new values. On each level of the image pyramid, this tweak-update cycle is performed 8 times.
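One reading of this tweak-update cycle is sketched below. The step size of 1, the early exit when no tweak helps, and the name tweak_update are assumptions. error_fn stands for whatever function scores a parameter tuple.

```python
def tweak_update(params, error_fn, cycles=8, step=1):
    """Repeatedly apply the single-parameter tweak that lowers the error the most."""
    best = tuple(params)
    best_error = error_fn(best)
    for _ in range(cycles):
        candidate, candidate_error = best, best_error
        for index in range(len(best)):
            for delta in (-step, step):
                trial = list(best)
                trial[index] += delta  # change one parameter, hold the others fixed
                trial = tuple(trial)
                trial_error = error_fn(trial)
                if trial_error < candidate_error:
                    candidate, candidate_error = trial, trial_error
        if candidate == best:
            break  # no single tweak improves the error
        best, best_error = candidate, candidate_error
    return best

# Toy error surface with its minimum at (2, -1, 3, 0).
goal = (2, -1, 3, 0)
toy_error = lambda p: sum((a - b) ** 2 for a, b in zip(p, goal))
result = tweak_update((0, 0, 0, 0), toy_error)
```

Each cycle costs two error evaluations per parameter, which is far cheaper than an exhaustive search over all four parameters at once, at the price of possibly stopping in a local minimum.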

So, my search space became (x1, y1, x2, y2). I represented this internally in my code as (y1, x1, y2, x2), because it makes more sense with Python's array indexing.
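To make the two-corner representation concrete, here is a sketch that resamples the rectangle between the corners onto a fixed output grid. Nearest-neighbor sampling, treating (y2, x2) as exclusive like a Python slice, and the name warp_to_corners are all assumptions for this example.

```python
import numpy as np

def warp_to_corners(image, corners, out_shape):
    """Resample the rectangle (y1, x1)-(y2, x2) of `image` onto an out_shape grid."""
    y1, x1, y2, x2 = corners
    out_h, out_w = out_shape
    # Evenly spaced source coordinates, rounded to the nearest pixel.
    rows = np.rint(np.linspace(y1, y2 - 1, out_h)).astype(int)
    cols = np.rint(np.linspace(x1, x2 - 1, out_w)).astype(int)
    # Corners outside the image reuse the nearest edge pixel.
    rows = np.clip(rows, 0, image.shape[0] - 1)
    cols = np.clip(cols, 0, image.shape[1] - 1)
    return image[np.ix_(rows, cols)]

image = np.arange(20.0).reshape(4, 5)
same = warp_to_corners(image, (0, 0, 4, 5), (4, 5))     # corners at the full extent
shifted = warp_to_corners(image, (1, 1, 4, 5), (3, 4))  # pure translation
scaled = warp_to_corners(image, (0, 0, 4, 5), (8, 10))  # 2x scaling on both axes
```

Moving both corners together translates the channel, while moving them relative to each other stretches or shrinks it along each axis independently.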

Edge detection

I used a function called sobel(), which is part of the scikit-image filters module. It produces an outline of the image that corresponds to the edges. The output is not binary: weakly defined edges appear darker, and strongly defined edges appear brighter. This is nicer than binary True/False edges, because edges in different color channels might not always show up the same way.
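To show what a graded edge response looks like, here is a NumPy-only Sobel gradient magnitude. It is a stand-in for illustration, not scikit-image's implementation. It skips the border pixels and uses unnormalized kernels, so its values are scaled differently from what sobel() returns.

```python
import numpy as np

def sobel_magnitude(image):
    """Gradient magnitude from 3x3 Sobel kernels, computed on interior pixels only."""
    image = np.asarray(image, dtype=np.float64)
    # Horizontal derivative: right column minus left column, rows weighted 1-2-1.
    gx = ((image[:-2, 2:] + 2 * image[1:-1, 2:] + image[2:, 2:])
          - (image[:-2, :-2] + 2 * image[1:-1, :-2] + image[2:, :-2]))
    # Vertical derivative: bottom row minus top row, columns weighted 1-2-1.
    gy = ((image[2:, :-2] + 2 * image[2:, 1:-1] + image[2:, 2:])
          - (image[:-2, :-2] + 2 * image[:-2, 1:-1] + image[:-2, 2:]))
    return np.hypot(gx, gy)

strong = np.zeros((5, 6))
strong[:, 3:] = 1.0            # sharp vertical edge
weak = strong * 0.25           # same edge with a quarter of the contrast
strong_edges = sobel_magnitude(strong)
weak_edges = sobel_magnitude(weak)
```

The weak edge produces a proportionally smaller response instead of disappearing, which is what lets the same edge be matched across channels that differ in brightness.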

Improved Results

Bridge color fringing

One reason I wanted to implement the additional parameters (x1, y1, x2, y2) was to remove the color fringes in the bridge photo. The effect is especially noticeable at the edges of the image, so I have chosen to examine those pink and yellow fringes. Here is a full preview and a 100% crop of the original algorithm's result:

I improved the result by enabling edge detection and the extra scaling parameters. Here is the same full preview and same 100% crop of the improved algorithm's result:

Not bad!

Emir misalignment

The combination of edge detection and scaled parameters seems to have fixed the misaligned emir.tif input as well. Here is the result of the original algorithm:

And here is the result of the improved algorithm:

Strangely, turning off either scaling or edge detection causes the image to be misaligned. The image is only properly aligned when both features are enabled together.

emir (no scaling)
(0, 0, 3209, 3702),
(-22, -44, 3184, 3648),
(-50, -159, 3177, 3752)
emir (no edge detection)
(-49, -24), (-572, -51)

All results

Here are all of the results produced by the final version of the algorithm. The parameters are expressed as (dy1, dx1, dy2, dx2).

bridge
(0, 0, 3225, 3742),
(17, -6, 3221, 3717),
(0, -64, 3213, 3666)
cathedral
(0, 0, 341, 390),
(-1, -7, 338, 388),
(-4, -15, 338, 384)
emir
(0, 0, 3209, 3702),
(-20, -48, 3183, 3652),
(-45, -107, 3171, 3596)
harvesters
(0, 0, 3218, 3683),
(-7, -60, 3194, 3624),
(-10, -115, 3200, 3550)
icon
(0, 0, 3244, 3741),
(-10, -36, 3220, 3695),
(-20, -86, 3219, 3648)
lady misaligned
(0, 0, 3212, 3761),
(1, -47, 3200, 3700),
(-27, -75, 3210, 3611)
melons misaligned
(0, 0, 3241, 3770),
(-2, -93, 3226, 3719),
(-11, -174, 3226, 3584)
monastery
(0, 0, 341, 391),
(-3, 2, 340, 395),
(-1, -2, 338, 387)
nativity
(0, 0, 341, 395),
(0, -2, 340, 391),
(1, -8, 340, 388)
onion_church
(0, 0, 3215, 3781),
(-13, -48, 3180, 3727),
(-30, -112, 3174, 3677)
self_portrait
(0, 0, 3251, 3810),
(-20, -84, 3216, 3741),
(-32, -176, 3211, 3635)
settlers
(0, 0, 341, 396),
(1, -4, 340, 386),
(0, -13, 342, 379)
three_generations
(0, 0, 3209, 3714),
(-6, -41, 3189, 3652),
(-7, -112, 3197, 3603)
tobolsk misaligned
(0, 0, 341, 396),
(-2, -2, 338, 392),
(-4, -11, 338, 394)
train misaligned
(0, 0, 3238, 3741),
(-6, -23, 3231, 3670),
(-24, -84, 3201, 3653)
turkmen
(0, 0, 3209, 3762),
(-17, -56, 3184, 3705),
(-28, -120, 3180, 3648)
village misaligned
(0, 0, 3270, 3819),
(-6, -64, 3253, 3754),
(-17, -208, 3243, 3794)
workshop
(0, 0, 3209, 3741),
(4, -44, 3207, 3678),
(9, -92, 3225, 3622)

Whoa! It looks like there are a lot more misaligned images than there were even with the first-attempt algorithm. However, I noticed that if I turn off either edge detection or scaling, I can make the misalignment disappear.

Fixes for misaligned images

lady
Fixed by disabling scaling
(-55, -8), (-115, -13)
melons
Fixed by disabling edge detection
(0, 0, 3241, 3770),
(-3, -76, 3227, 3677),
(-8, -174, 3224, 3584)
tobolsk
Fixed by disabling scaling
(0, 0), (-3, -3), (-7, -3)
train
Fixed by disabling edge detection
(0, 0, 3238, 3741),
(4, -42, 3226, 3698),
(-25, -87, 3201, 3655)
village
Fixed by disabling scaling
(-64, -11), (-137, -22)

Other images from the Prokudin-Gorskii collection

Just for fun, here are some other images from the Prokudin-Gorskii collection. They were all aligned properly with both edge detection and scaling turned on.

00027a
(0, 0, 3208, 3723),
(-17, -46, 3180, 3667),
(-38, -108, 3162, 3608)

00476a
(0, 0, 3241, 3751),
(1, 22, 3228, 3757),
(4, -23, 3237, 3708)

00488a
(0, 0, 3248, 3770),
(-22, -45, 3211, 3749),
(-34, -109, 3206, 3709)