# Augmenting reality; Changing the image in a poster

I’m going to show you how to place a picture into another with the correct perspective.

Image A: two side-by-side pictures. In one, the drawing is crudely pasted onto the frame (wrong). In the other, it is correctly perspective mapped (correct).

The goal is to map our rectangular source image into an appropriate polygon (but for now let’s ignore “appropriate” and just choose an arbitrary polygon). Some unknown function maps source image coordinates to the correct destination image coordinates.

(Apple photo by hjl, with additions by Imagemagick and Fred’s imagemagick scripts)

So, what is $$f(x,y) = (x',y')$$? How do we find it, and what are its properties?

# It Isn’t (entirely) Linear

This section can be skipped. It exists to motivate the following section on optics.

We know the four input corners and where they are going: Top left from $$(0,0) \rightarrow (26,10)$$, top right from $$(1024,0) \rightarrow (600,150)$$, bottom right from $$(1024,640) \rightarrow (500,400)$$, and bottom left from $$(0, 640) \rightarrow (35,620)$$.

Let’s assume a linear relationship and solve for it. Our four points give 8 equations in 6 unknowns, so we solve for the variables with a least-squares fit:

\begin{eqnarray} x' &=& ax + by + c \\
y' &=& dx + ey + f \end{eqnarray}

$$
\begin{bmatrix} 0 & 0 & 1 & & & \\
& & & 0 & 0 & 1 \\
1024 & 0 & 1 & & & \\
& & & 1024 & 0 & 1 \\
1024 & 640 & 1 & & & \\
& & & 1024 & 640 & 1 \\
0 & 640 & 1 & & & \\
& & & 0 & 640 & 1 \end{bmatrix} \times \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = \begin{bmatrix} 26 \\ 10 \\ 600 \\ 150 \\ 500 \\ 400 \\ 35 \\ 620 \end{bmatrix}
$$

Here is the Python code to compute the solution:

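```python
import numpy

# Each corner contributes two rows: one for x' and one for y'.
# The unknowns are ordered [a, b, c, d, e, f].
a = numpy.array([[   0,   0, 1,    0,   0, 0],
                 [   0,   0, 0,    0,   0, 1],
                 [1024,   0, 1,    0,   0, 0],
                 [   0,   0, 0, 1024,   0, 1],
                 [1024, 640, 1,    0,   0, 0],
                 [   0,   0, 0, 1024, 640, 1],
                 [   0, 640, 1,    0,   0, 0],
                 [   0,   0, 0,    0, 640, 1]])

# Target coordinates, interleaved as x', y' for each corner.
b = numpy.array([26, 10, 600, 150, 500, 400, 35, 620])

# Least-squares solution of a * [a, b, c, d, e, f] = b.
result = numpy.linalg.lstsq(a, b, rcond=None)
print(result[0])
```

```
[  5.07324219e-01  -7.10937500e-02   5.32500000e+01  -3.90625000e-02
   6.71875000e-01   1.00000000e+02]
```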


What does this look like?

\begin{eqnarray} x' &=& 0.507x - 0.0711y + 53.25 \\
y' &=& -0.0391x + 0.672y + 100 \end{eqnarray}
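One way to judge these equations is to push the four source corners through them and measure how far each lands from its target. The following is a minimal sketch, with the coefficients copied from the solver output; the helper name `affine_map` is made up for this illustration.

```python
import math

# Coefficients [a, b, c, d, e, f] copied from the least-squares output.
coeffs = (0.50732421875, -0.07109375, 53.25, -0.0390625, 0.671875, 100.0)


def affine_map(coeffs, x, y):
    """Apply x' = ax + by + c, y' = dx + ey + f (hypothetical helper)."""
    a, b, c, d, e, f = coeffs
    return (a * x + b * y + c, d * x + e * y + f)


# Corners in the order: top left, top right, bottom right, bottom left.
sources = [(0, 0), (1024, 0), (1024, 640), (0, 640)]
targets = [(26, 10), (600, 150), (500, 400), (35, 620)]

errors = []
for (x, y), (tx, ty) in zip(sources, targets):
    px, py = affine_map(coeffs, x, y)
    # Euclidean distance, in pixels, between fitted and desired position.
    err = math.hypot(px - tx, py - ty)
    errors.append(err)
    print("(%4d, %3d) -> (%7.2f, %7.2f), wanted (%3d, %3d), off by %.2f px"
          % (x, y, px, py, tx, ty, err))
```

Every corner lands about 94 pixels from where it should, which is a large miss on an image of this size.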

So, this is not a good fit: the fitted transform misses every one of the four target corners. What we have just built is the most general type of 2D linear transformation (plus a translation): an affine transformation. In an affine transformation, parallel lines remain parallel, though angles may change. In the perspective transform, however, parallel lines do not remain parallel. What’s going on?
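The parallel-lines claim can be checked numerically. This sketch compares the top and bottom edges of the quadrilateral produced by the fitted affine map with the top and bottom edges of the target quadrilateral, using the 2D cross product as a parallelism test. The helper names (`affine_map`, `edge`, `cross`) are assumptions made for this illustration, and the coefficients are copied from the solver output.

```python
def affine_map(coeffs, x, y):
    """Apply x' = ax + by + c, y' = dx + ey + f (hypothetical helper)."""
    a, b, c, d, e, f = coeffs
    return (a * x + b * y + c, d * x + e * y + f)


def edge(p, q):
    """Vector pointing from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


def cross(u, v):
    """2D cross product; it is zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


coeffs = (0.50732421875, -0.07109375, 53.25, -0.0390625, 0.671875, 100.0)

# Corners in the order: top left, top right, bottom right, bottom left.
sources = [(0, 0), (1024, 0), (1024, 640), (0, 640)]
targets = [(26, 10), (600, 150), (500, 400), (35, 620)]
fitted = [affine_map(coeffs, x, y) for x, y in sources]

# Top edge runs TL -> TR, bottom edge runs BL -> BR.
fitted_cross = cross(edge(fitted[0], fitted[1]), edge(fitted[3], fitted[2]))
target_cross = cross(edge(targets[0], targets[1]), edge(targets[3], targets[2]))

print("affine result, top x bottom:", fitted_cross)  # zero up to rounding
print("target quad,   top x bottom:", target_cross)  # -191380, not parallel
```

The affine result keeps the top and bottom edges parallel, while the quadrilateral we actually want does not, so no choice of the six coefficients can reproduce it exactly.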

# Introducing 3D space

Affine geometry is like two pairs of scissors whose blade tips are joined. As you open and close one pair, the other mirrors it. The blades frame a skewed box. You can also lengthen or shorten each pair of parallel blades.

Perspective geometry is very different: It describes what you see after embedding your 2D plane in 3D space. In other words, it’s like your object is throwing off rays towards your eyeballs and you’re catching them on a plane just in front of your eyes.

So, let’s look at the equation for the intersection between a ray from the eye to the image and the plane $$z=1$$. The point on the image we will choose is the bottom right corner (coordinates $$(W-1, H-1)$$ in the picture, but $$(0.5, 0.5, 4)$$ in 3D space). The eye is located at $$(0,0,0)$$.

We can find the intersection point by substituting the line equation into the plane equation. From the line equation below, when $$t = .25$$ the point has $$z = 4 \times .25 = 1$$, so it lies on the plane.

\begin{eqnarray} 0x + 0y + z &=& 1 \quad\textrm{(plane)} \\
P(t) &=& t\,(0.5,\ 0.5,\ 4) \quad\textrm{(line)} \end{eqnarray}

Note that we can now trivially compute the x and y coordinates of the intersection with the plane: they are just $$x/z$$ and $$y/z$$! This useful fact is what makes homogeneous coordinates possible. But we’ll skip that and solve for a perspective transform without introducing that concept.
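The intersection worked out above can be sketched in a few lines. This is only an illustration under the setup described in the text (eye at the origin, image plane at $$z=1$$); the function name `project_to_plane` and its `plane_z` parameter are made up for this example.

```python
def project_to_plane(point, plane_z=1.0):
    """Intersect the ray from the origin through `point` with z = plane_z.

    The ray is P(t) = t * point, so it hits the plane when
    t * z = plane_z, that is, at t = plane_z / z.
    """
    x, y, z = point
    if z == 0:
        raise ValueError("ray is parallel to the plane and never hits it")
    t = plane_z / z
    return (x * t, y * t, z * t)


# Bottom right corner of the picture, expressed in 3D space.
corner = (0.5, 0.5, 4.0)
hit = project_to_plane(corner)
print(hit)  # (0.125, 0.125, 1.0), which is (x/z, y/z, 1)
```

With the plane at $$z=1$$, the parameter is simply $$t = 1/z$$, which is why the projected coordinates reduce to a division by $$z$$.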