Chapter 16 Image Matching

16.1 Introduction

16.1.1 Image Matching and Object Reconstruction
object reconstruction: determination of object's pose or shape
image matching and object reconstruction needed for

$I^1,I^2,...,I^k$: object mapped into images
$T_0^1,T_0^2,...,T_0^k$: object to image mapping transformations
$T_0^1,T_0^2,...,T_0^k$: describe illumination, reflectance, sensing, ...
assumption: transformations $T_0^k$ are one to one
assumption implicitly excludes transparent surfaces or occlusions
$p_G$: unknown parameters specifying geometry of object
$p_R$: unknown parameters specifying reflectance property of object
$p_P$: unknown parameters specifying pose of camera $\hat{u}=u_0-\widehat{\Delta u},
\forall x$
$p_A$: unknown parameters specifying atmospheric response
mapping can be summarized as:

I^k = T_0^k(p_G,p_R,p_P,p_A,...)

general setup of object reconstruction and image matching
(a) object mapped to $I^1,I^2$ via transformations $T_0^1,T_0^2$
(b) setup for two perspective images
=====Fig. 16.1=====

image $I^2$ can be predicted from $I^1$ by:

T_2^1 = T_0^1 \cdot T_2^0 = T_0^1 \cdot (T_0^2)^{-1}(p_G,p_R,

16.1.2 The Principle of Image Matching
$I', I''$: two digital or digitized images or image patches
$I', I''$: size $5\times 5$ (point tracking) to $10000\times 10000$ (satellite image registration)
$P',P''$: points of images $I', I''$
$(r',c'),(r'',c'')$: coordinates of points $P',P''$
$g',g''$: intensities of points $P',P''$
$T_G$: specified geometric spatial-mapping function
$p_G$: set or vector of unknown parameters
if $(r',c')$ corresponds to $(r'',c'')$, their coordinates related by:


$T_I$: intensity-mapping function containing intensity relation
$p_I$: unknown vector
intensities on one image related to other image:


complete model of image matching:


$T_G,T_I$: may be deterministic, stochastic, and/or piecewise continuous

correspondence of $P',P''$: $P',P''$ relate to same point on object
problem of image matching or correspondence consists of:

  1. finding all corresponding points
  2. determining parameters $p_G,p_I$ of mapping functions $T_G,T_I$
=====Fig. 16.2=====

$a',a''$: point attributes derived from intensity functions in neighborhood
$P',P''$: points in a neighborhood around $(r',c'),(r'',c'')$

P'=P'(r',c';a') {\rm\ and \ } P''=P''(r'',c'';a'')

image matching solution achieved in three-step procedure
  1. select appropriate image features in one or in both images
    e.g. interest operator to derive list of interesting points or edges
  2. find corresponding point pairs $(P'_i,P''_j)$ with similarity and consistency
    with $P'_i$ from first list and $P''_j$ from second list
    similarity: based on attributes and properties of intensity function
    consistency: based on degree to which spatial-mapping function fulfilled
  3. interpolate parallaxes $(r''-r',c''-c')$ between selected feature pairs

16.1.3 Image Matching Procedures
properties of some correspondence algorithms for image matching
=====Table 16.1=====
similarity measures reflect the assumed intensity-mapping function

consistency measure or interpolation method on stereo images:

=====Fig. 16.3=====
precision of image matching and edge detection (all figures in pixels)
=====Table 16.2=====
=====Garfield 17:28=====

16.2 Intensity-Based Matching of One-Dimensional Signals
$g$: replacing $g'$: intensity of first image
$T_0^k$: replacing $g''$: intensity of second image
$x$: replacing $r'$: location of first image
$y$: replacing $r''$: location of second image
$n(x)$: observational noise component
1D intensity-based matching model:


16.2.1 The Principle of differential Matching
assumption: no intensity changes due to viewing direction
$x_i$: location in first image
$\hat{u}(x_i)$: unknown deformation at $x_i$ to produce corresponding point
point $x_i$ in first image corresponds to point $y_i$ in second image


nonlinear model valid for all $m$ observed values $g(x_i),i=1,...,m$:

g(x_i)=f[x_i - \hat{u}(x_i)]+n(x_i), \ \ \ i=1,...,m

$n(x_i)$: additive noise; independent and identically distributed
$n(x_i)$: mean zero, variance $\sigma_n^2$
assumed observation process: $u(x)+ \Rightarrow x_i$ to the right of $y(x_i)$
=====Fig. 16.4=====

$u_0(x_i)$: approximate values of unknown deformation
$\widehat{\Delta u}(x_i)$: unknown correction for unknown value $\hat{u}(x_i)$

\hat{u}(x_i) = u_0(x_i) + \widehat{\Delta u}(x_i)

abbreviations: $g_i=g(x_i), \widehat{\Delta u}_i=\widehat{\Delta u}(x_i),
n_i=n(x_i), u_{0i}=u_0(x_i), \hat{u}_i=\hat{u}(x_i)$
nonlinear model can be written as:

g_i = f(x_i-\hat{u}_i)+n_i=f(x_i-u_{0i}-\widehat{\Delta u}_i)+n_i, \ \ \

linearize $T_0^k$ around point $x_i-u_{0i}$ to obtain:

g_i=f(x_i-u_{0i})-f'(x_i-u_{0i})\widehat{\Delta u}_i+\frac{1...
-\xi\widehat{\Delta u}_i)(\widehat{\Delta u}_i)^2+n_i

for some $\xi \in [0,1],$ and where $f'(y)=df/dy, f''(y)=d^2 f / dy^2$
assumption: $K', K''$ does not vanish and second-order term negligible

\Delta g_i = \Delta g(x_i) = g(x_i)-f(x_i-u_{0i})

f'_i = f'(x_i - u_{0i})

linearized model with known: $\Delta g_i, f'_i$, unknown: $\widehat{\Delta u}_i$

\Delta g_i = -f'_i \widehat{\Delta u}_i + n_i, \ \ \ i = 1,...,m

or explicitly

g(x_i)-f[x_i-u_0(x_i)] = - \frac{d f(y)}{dy}\vert _{y=x_i-u_0(x_i)}
\widehat{\Delta u}_i + n(x_i), \ \ \ i=1,...,m

easily determine random variable $\widehat{\Delta u}_i$ assuming $f'_i \neq 0$:

\widehat{\Delta u}_i = - \frac{\Delta g_i}{f'_i}

or explicitly

\widehat{\Delta u}_i = - \frac{g(x_i)-f[x_i-u_0(x_i)] }{\frac{d f(y)}{dy}\vert _{y=x_i-u_0(x_i)}}


\hat{u}_i = u_{0i}+\widehat{\Delta u}_i

principle of the applied differential approach to estimate local deformation
=====Fig. 16.5=====

16.2.2 Estimating an Unknown Shift
assumption: only uniform shift i.e. $\hat{u}(x)=\hat{u}$, $y(x_i)=x_i-\hat{u}$
$u_0$: initial approximation for $u$, thus $\hat{u}=u_0-\widehat{\Delta u},
\forall x$
linearizing $T_0^k$ around $x_i-u_0$ for each $i$:

f[x_i-(u_0+\widehat{\Delta u})]=f(x_i-u_0)-f'(x_i-u_0)\widehat{\Delta u}

linearized model can be expressed as:

g(x_i)=f(x_i-u_0)-f'(x_i-u_0)\widehat{\Delta u}+n_i, \ \ \ i=1,...,m

or in short

\Delta g_i = -f'_i \widehat{\Delta u} + n_i, \ \ \ i=1,...,m

minimizing $\Omega=\sum_{i=1}^m n_i^2$ by choosing appropriate $\widehat{\Delta u}$

\frac{\partial \Omega}{\partial \widehat{\Delta u}}
...=1}^m f'_i \Delta g_i + 2(\sum {f'_i}^2)\widehat{\Delta u} = 0

from which follows the estimate

\widehat{\Delta u} = - \frac{\sum_{i=1}^m f'_i \Delta g_i}{\...

16.2.3 Estimating Unknown Shift and Scale
$x_0$: fixed reference point
augment model by assuming transformation contains unknown scale:


in reduced coordinates


linear model reads as:

g(x_i)=f(s\bar{x}_i-\hat{u})+n(x_i), \ \ \ i=1,...,m

linearized model:

\Delta g_i = -f'_i \widehat{\Delta u} + f'_i \bar{x}_i \widehat{\Delta s}
+ n(x_i), \ \ \ i=1,...,m

use Taylor's expansion for linearized model and minimize $\sum n^2(x_i)$, thus

\left (
\sum {f'_i}^2 & - \sum {f'_i}^2 \b...
...lta g_i \\
\sum f'_i \bar{x}_i \Delta g_i
\end{array}\right )

16.2.4 Compensation for Brightness and Contrast
$a$: change in contrast
$b$: change in brightness
nonlinear model with brightness and contrast but without scale parameter:


linearized model:

\Delta g_i = -a_0 f'_i \widehat{\Delta u} + f_i \widehat{\Delta a} + \hat{b}
+n_i, \ \ \ i=1, ..., m

normal equation system for least-squares solution to minimize $\sum n^2(x_i)$:

\left (
a_0^2 \sum {f'_i}^2 & - a_0\sum f...
\sum f_i \Delta g_i \\
\sum \Delta g_i
\end{array}\right )

16.2.5 Estimating Smooth Deformations
used for larger windows but transformation still smooth

16.2.6 Iterations and Resampling
iterative estimation scheme: if initial approximations are crude

16.2.7 Matching of Two Observed Profiles
both profiles corrupted by noise: reduce to methods developed so far

16.2.8 Relations to Cross-Correlation Techniques
cross-correlation model: assumes a shift between two corresponding image
=====Oldie 33:79=====

16.3 Intensity-Based Matching of Two-Dimensional Signals
differential techniques for intensity-based matching expanded to 2D

16.3.1 The Principle and the Relation to Optical Flow
nonlinear model:

g(r_i,c_i)=f(p_i,q_i)+n(r_i,c_i) \ \ \ i=1,...,m


\left (
p \\
\end{array}\right )_i = \le...
\hat{u}(r,c) \\
\end{array}\right )_i

assumption: approximate values of unknown $[\hat{u}(r,c),\hat{v}(r,c)]$ known

related to optical flow equation: i.e. noise term omitted:

f[r+u(r,c)\cdot dt, c+v(r,c)\cdot dt, t+dt]=f(r,c,t)

linearize by Taylor's expansion:

f(r,c,t)+\frac{\partial f(r,c,t)}{\partial r} \cdot u(r,c) \...
...) \cdot dt
+\frac{\partial f(r,c,t)}{\partial t} dt = f(r,c,t)

let $\nabla f = (f_r,f_c)^T,V=(u,v)^T, \dot{f}=\partial f/\partial t$, we get optical flow equation:

\nabla f \cdot V + \dot{f} = f_r u + f_c v + f_t = 0

16.3.2 Estimating Constant-Shift Parameters
simplest model for matching:

p = x-u, \ \ \ q = y-v

linearized model:

\Delta g_i = - f_{r_i} \cdot \widehat{\Delta u}
- f_{c_i} \cdot \widehat{\Delta v} + n_i, \ \ \ i=1, ..., m

normal equations to minimize $\sum n^2(x_i)$:

\left (
\sum_{i=1}^m f_{r_i}^2 & \sum_{i=1...
\sum_{i=1}^m f_{c_i} \cdot \Delta g_i
\end{array}\right )

16.3.3 Estimating Linear Transformations
$a_k,k=1,...,8$: eight unknown parameters
model to match two small windows:



\left (
p \\
\end{array}\right )_i = \le...
...)_i + \left (
a_3 \\
\end{array}\right )

h(f)=a_7 \cdot f + a_8

linearized model:

\Delta g_i = f_{r_i} \cdot r_i \cdot \widehat{\Delta a_1}
+...{\Delta a_4}
+ f_{c_i} \cdot c_i \cdot \widehat{\Delta a_5}

+ f_{c_i} \widehat{\Delta a_6}
+ f_i \widehat{\Delta a_7}
+ 1 \widehat{\Delta a_8} + \bar{n}_i

to minimize $\sum \bar{n}_i^2$, normal equation matrix $T_0^1,T_0^2$:

\left (
\sum f_r^2 r^2 & \sum f_r^2 ...
...sum f^2 & \sum f \\
& & & & & & & \sum 1
\end{array}\right )

eight unknown parameters:

\hat{y} = \left (
\widehat{\Delta a_1} \\
...ehat{\Delta a_7} \\
\widehat{\Delta a_8}
\end{array}\right )

right-hand side $h$:

h=\left (
\sum_{i=1}^m f_{r_i} r_i \Delta g...
... \Delta g_i \\
\sum_{i=1}^m 1 \Delta g_i
\end{array}\right )

solution: normal equation system $N \hat{y} = h$
$\widehat{\Delta a_k}, \ k=1,...,6$: six corrections for geometric transformation
$\widehat{\Delta a_7}, \widehat{\Delta a_8}$: corrections for radiometric parameter

16.3.4 Invariant Points
(a),(b): corner points; scale difference cannot be determined
(c),(d): circularly symmetric; rotation cannot be determined
=====Fig. 16.8=====

16.4 An Interest Operator

16.4.1 Introduction
interesting has several meanings, depending on context:

  1. distinctness: distinguishable from immediate neighbors
    distinct points: corners, blobs, highly textured places, etc.
  2. invariance: position and selection invariant w.r.t. geometric distortion
    invariance and distinctness: influence all subsequent steps in analysis
  3. stability: position and selection invariant w.r.t. viewing
    stability: ensures interesting points in image correspond to object points
    corner points of polyhedra: stable
    T-junction: unstable since it results from occlusions
    stability: decisive for image-matching, 3D reconstruction
    =====Nalwa, A Guided Tour of Computer Vision, p. 137=====
  4. uniqueness: global separability, i.e. imagewide separability
    distinctness: aims at local separability
    uniqueness: to avoid locally distinct but repetitive features or points
    uniqueness: closest notion to interestingness
  5. interpretability: requires extracted points to have meaning
    e.g. corners, junctions of lines, centers of circles, rings, etc.

an interest operator with a three-step procedure:

  1. selection of optimal windows: search for local maxima of average gradient magnitude
  2. classification of the image function within the selected windows
    classification distinguishes between types of singular points
    e.g. corners, rings, spirals, isotropic texture
  3. estimation of the optimal point within the window as the classification
    precise for corners and for centers of circular symmetric features

16.4.2 Estimating Corner Points

16.4.3 Evaluation and Classification of Selected Windows

16.4.4 Selection of Optimal Windows

16.4.5 Uniqueness of Selected Points
uniqueness of selected points: based on similarity derived from attributes
highest uniqueness measure: points with no features in common with other
repetitive features show low uniqueness
uniqueness measure represented by the area of the circles
=====Fig. 16.11=====

16.5 Robust Estimation for Feature-Based Matching

16.5.1 The Principle of Feature-Based Matching
three steps of feature-based matching:

  1. selecting features by using some interest operator
  2. finding correspondences by using similarity and consistency measure
  3. interpolating between parallaxes by spatial-mapping function

individual steps for similarity measure:

  1. similarity between extracted features:
    form preliminary list of correspondences, including weights
  2. mapping function hypothesis: found by robust estimation procedure, similar to relaxation techniques
    by enforcing one-to-one correspondence between image features
  3. final parameters of mapping function: by maximum likelihood estimate allowing rigorous evaluation of the match

$\alpha$: parameters to be estimated
$\beta$: observed measurement
maximum likelihood (ML) estimate:

\alpha_{ML} = \arg \max_{\alpha} P(\beta\vert\alpha)

$P(\alpha)$: prior information about parameters
Bayesian estimate: maximum a posteriori (MAP) estimate:

\alpha_{MAP} = \arg \max_\alpha P(\alpha\vert\beta)
= \arg \max_\alpha \frac{P(\beta\vert\alpha)P(\alpha)}{P(\beta)}

=====J. V. Beck and K. J. Arnold, Parameter Estimation in=====
=====Engineering and Science, Wiley, New York, 1977.=====

16.5.2 The Similarity Measure

16.5.3 Heuristics for Selecting Candidate Pairs
finding candidate pairs of features: first step after selecting image features
finding candidate pairs: to reduce algorithmic complexity in final match
finding candidate pairs: all types of a priori knowledge may be included
heuristics and strategies for selecting candidate pairs:

  1. expected parallax may be used to exclude unlikely feature pairs
  2. similarity of features required to be above certain threshold
  3. uniqueness of selected points used to reduce number of candidate pairs

16.5.4 Robust Estimation for Determining the Spatial-Mapping Function
preliminary correspondences must be compared to see if consistent with model
most general model applied: local smoothness constraint almost everywhere
finite-element description of spatial-mapping function would be appropriate
robust estimation: used only for finding good hypothesis for match
images with mostly translation $(10, -5.7)$, no rotation (identity matrix)
=====Fig. 16.12=====

16.5.5 Evaluating the Final Result
result of finding most likely parameters of mapping function must be evaluated:

apply classical hypothesis tests to evaluate the result:

  1. global check if data and model are consistent using sum of squared residuals
  2. precision of estimated parameters can be determined
  3. result termed reliable only if enough points used to determine mapping
  4. projection of one image into other: decisive check on matching correctness

16.6 Structure from Stereo by Using Correspondence
=====Experiment: close one eye and put pen cap back=====
implication of ambiguous correspondences between image points
=====Nalwa, A Guided Tour of Computer Vision, Fig. 7.3=====
monotonic-ordering assumption: conjugate image points have same order
violation of the monotonic-ordering assumption
=====Nalwa, A Guided Tour of Computer Vision, Fig. 7.7=====
occlusion as an impediment to stereo
=====Nalwa, A Guided Tour of Computer Vision, Fig. 7.4=====
scene recovered from a pair of stereo images
=====Nalwa, A Guided Tour of Computer Vision, Fig. 7.9=====
large windows: to establish global correspondence
small windows: using global correspondence for local correspondence
stochastic relaxation as a tool for correspondence establishment
=====Nalwa, A Guided Tour of Computer Vision, Fig. 7.10=====
assumption: interior orientation of cameras determined by calibration
for reasons of stability: at least five groups of points used for calibration

16.6.1 Epipolar Geometry
image matching tremendously simplified: if relative orientation known
relative orientation known: 2D search reduced to 1D by epipolar geometry
$O', O''$: projection centers
$b$: baseline length
$f', f''$: focal lengths
$H', H''$: principal points assumed to be two image coordinate origins
$(u', v'), (u'', v'')$: two image coordinate systems
$P(x, y, z)$: 3D object point
$P'(u', v'), P''(u'', v'')$: 2D image coordinates for the point
collinearity condition: five points $P, O', O'', P', P''$ lie in one plane
epipolar plane $\epsilon(P)$: the plane $P, O', O'', P', P''$ lie in
epipolar lines $e'(P), e''(P)$: intersection lines of epipolar plane and image planes
different points: have different pairs of epipolar lines
all epipolar planes: form pencil of planes passing through baseline $b=(O'O'')$
epipoles $F', F''$: epipolar line intersection point
epipoles $F', F''$: intersection of baseline and image planes
in general: epipolar lines are not parallel
epipolar geometry of a general image pair
=====Fig. 16.13=====

epipolar plane $\epsilon(P)$: defined by $P', O', O''$
given one image point: epipolar line $e''(P)$ fixed and $P''$ sits on this line
since epipolar line known: search is necessary in only one dimension
epipolar line constraint: strongest constraint in image matching
epipolar line constraint: used as soon as available

$K', K''$: principal points
$(x', y'), (x'', y'')$: image coordinate system
$x'$ parallel to $x''$
epipolar lines $e'(P), e''(P)$ identical and parallel to baseline $b$
image planes $\mu', \mu''$: identical and parallel to baseline $b$
search space for $P''$ given $P'(x', y')$: line $y=y'$
epipolar geometry of a normal image pair
=====Fig. 16.14=====

16.6.2 Generation of Normal Images
to rectify image pairs to normal: for correspondence and 3D determination
$\nu$: new image plane
relation between general image pair and normal image pair
=====Fig. 16.15=====
procedure for rectification to normal image pair:

  1. Choose a plane $\nu$ parallel to baseline $b$.
    now focal length $\bar{f}$ of normal images fixed
    Choose new image coordinate systems $(x', y'), (x'', y'')$ with origins $\bar{K}', \bar{K}''$
    $x'$ parallel to $x''$
    $\bar{K}', \bar{K}''$: principal points of new images
    $\bar{K}'O'$ parallel to $\bar{K}''O''$
    Choose a common pixel spacing in the normal images.
  2. For each of the images:
    (a) Choose four points well distributed over the new image.
    $(x_i, y_i), i = 1,...,4$: coordinates of 4 points in normal images
    (b) Project the four points into the original image.
    Use known pose of the cameras here.
    $(u_i, v_i), i=1,...,4$: four coordinates in image planes $\mu', \mu''$
    (c) Solve the equation system:

(g x_i + h y_i + 1) \cdot u_i = a x_i + b y_i + c

(g x_i + h y_i + 1) \cdot v_i = d x_i + e y_i + f

    (d) Rectify the original image.
    each pixel $(x_j, y_j)$ in normal image: determines $(u_j, v_j)$ in original image

16.6.3 Specializing the Image-Matching Procedures
interest operator reduced to 1D: searching edges across epipolar lines

16.6.4 Precision of Three-Dimensional Points from Image Points
wide-angle stereo: distance between two centers of projection large
wide-angle stereo: more precise estimates for 3D positions
wide-angle stereo: more difficult to establish correspondence
adaptive matching window: size inversely proportional to intensity variance
adaptive matching window: size inversely proportional to estimated disparity
(a) original (b) \begin{displaymath}
g_i=f(x_i-u_{0i})-f'(x_i-u_{0i})\widehat{\Delta u}_i+\frac{1...
-\xi\widehat{\Delta u}_i)(\widehat{\Delta u}_i)^2+n_i
\end{displaymath} (c) $7\times 7$ (d) adaptive matching window size
=====Nalwa, A Guided Tour of Computer Vision, Fig. 7.8=====
stereo image pair taken with a photogrammetric stereo camera
=====Fig. 16.17=====
=====Garfield 17:52=====