Chapter 18 Object Models and Matching

18.1 Introduction
object recognition: one of most important aspects of computer vision

18.2 Two-Dimensional Object Representation
2D shape analysis useful in machine vision application:

• medical image analysis
• aerial image analysis
• manufacturing
2D shape representation classes:
• global features
• local features
• boundary description
• skeleton
• 2D parts

18.2.1 Global Feature Representation
2D object: can be thought of as binary image
value 1: pixels of object
value 0: pixels outside object
2D shape features: area, perimeter, moments, circularity, elongation

Shape Recognition by Moments
1#1: binary image function
2#2: 2D shape
digital 3#3th moment of 4#4:

5#5

6#6: area of 4#4: number of pixels in 4#4
moment invariants: functions of moments invariant under shape transform
prefer moment invariants: under translation, rotation, scaling, skewing
center of gravity 7#7 of 4#4:

8#8

9#9

central 3#3th moment of 4#4:

10#10

central moments: translation invariant

normalized central moments of 4#4:

11#11

seven functions that are rotation invariant:
12#12
13#13: skew invariant

Shape Recognition with Fourier Descriptors
Fourier descriptors: another way for extracting features from 2D shapes
Fourier descriptors: defined to characterize boundary
The main idea is to represent the boundary as a function of one variable 14#14, expand 14#14 in its Fourier series, and use the coefficients of the series as Fourier descriptors (FDs).
finite number of FDs: can be used to describe the shape

15#15: arc length along 16#16
16#16: clockwise-oriented, simple closed curve represented by:

17#17

point moving along curve generates complex function:

18#18

19#19: periodic function with period 20#20
Fourier series expansion of 19#19:

21#21

FDs are coefficients 22#22 defined by:

23#23

assume 2D shape described as sequence of 24#24 points 25#25
define sequence of unit vectors:

26#26

define sequence of cumulative differences:

27#27

28#28

FDs are defined by:

29#29

30#30: two curves to be compared
22#22: sequence of FDs of 31#31
32#32: sequence of FDs of 33#33
34#34: number of harmonics used
distance measure in shape comparison:

35#35

=====Garfield 17:61=====

18.2.2 Local Feature Representation
2D object characterized by: local features, attributes, relationships
most commonly used local features: holes, corners
holes: found by connected component procedure followed by boundary tracing
holes: detected by binary mathematical morphology, if hole shapes known
hole properties: areas, shapes
corner detection: can be performed on binary or gray tone image
corner property: angle at which lines meet

18.2.3 Boundary Representation
boundary representation: most common representation for 2D objects
3 main ways to represent object boundary:

1. sequence of points
2. chain code
3. sequence of line segments

The Boundary as a Sequence of Points
boundary points: from border-following or edge-tracking algorithms
interest points: boundary points with special property useful in matching

The Chain Code Representation
chain encoding: can be used at any level of quantization
chain encoding: saves space required for row and column coordinates
boundary encoded: first quantized by placing over square grid
grid side length: determines resolution of encoding
marked points: grid intersections closest to curve and used in encoding
36#36: marks starting point of curve
chain encoding of boundary curve
=====Fig. 18.1=====
line segments: links: to be used to approximate the curve
encoding scheme: eight possible directions assigned integer between 0, 7
chain: chain encoding: in the form

37#37

or

38#38

length of chain code with 39#39 chains: can be simply estimated as 39#39
40#40: number of odd chain codes
41#41: number of even chain codes
42#42: number of corners
20#20: unbiased estimate of perimeter length
Freeman suggested: 43#43

The Boundary as a Sequence of Line Segments
line segment sequence: after boundary segmented into near-linear portion
line segment sequence: used in shape recognition or other matching tasks
44#44: coordinate location where pair of lines meet
45#45: angle magnitude where pair of lines meet
46#46: sequence of junction points to represent line segment sequence
47#47: sequence of junction points representing model object 48#48
49#49: sequence of junction points representing test object 50#50
51#51: an association
goal: given 52#52, to find 53#53 satisfying 54#54 or 55#55
or 56#56

18.2.4 Skeleton Representation
strokes: long, sometimes thin parts forming shapes
line segments that characterize the strokes of set of characters
=====Fig. 18.2=====
symmetric axis transform: set of maximal circular disks inside object
symmetric axis: locus of centers of these maximal disks
symmetric axes of the characters
=====Fig. 18.3=====
symmetric axis: one example of skeleton description of 2D object
symmetric axis of rectangle: consists of five line segments, not single line
symmetric axis: extremely sensitive to noise
symmetric axis: difficult to use in matching
=====Nalwa, A Guided Tour of Computer Vision, Fig. 9.3=====

axis of smoothed local symmetries: separate definition for skeleton
local symmetry: midpoint 57#57 of line segment 58#58 joining pair of points 59#59
31#31: angle between 58#58 and outward normal 60#60 at 61#61
31#31: angle between 58#58 and inward normal 62#62 at 63#63
point 57#57 that is local symmetry with respect to boundary points 61#61 and 63#63
=====Fig. 18.4=====
axes: spines: loci of local symmetries maximal w.r.t. forming smooth curve
cover of axis: portion of shape subtended by axis
axis cover properly contained in another cover: second axis subsumes first
symmetric axes of local symmetry of a rectangle
=====Fig. 18.5=====
axes of smoothed local symmetries of several objects
=====Fig. 18.6=====

18.2.5 Two-Dimensional Part Representation
parts, attributes, interrelationships: form structural description of shape
nuclei: regions where primary convex subset overlap
decomposition of shape into primary convex subsets and nuclei
=====Fig. 18.7=====

near-convexity: allows noisy, distorted instances to have same decompositions
64#64: two points on object boundary
65#65 relation: visibility relation
if line 66#66 completely interior to object boundary, 67#67
the graph-theoretic clustering to determine clusters of visibility relation
decomposition of three similar shapes into near-convex pieces
=====Fig. 18.8=====

18.3 Three-Dimensional Object Representations

18.3.1 Local Features Representation
range data: obtained from laser range finder, light striping, stereo, etc.
from depth, try to infer surfaces, edges, corners, holes, other features
3D matching more difficult than 2D because of occlusion
=====Oldie 33:116=====

18.3.2 Wire Frame Representation
wire frame model: 3D object model with only edges of object
two-color hyperboloid and its line drawing
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.1=====
Necker cube: lower-vertical face or upper vertical face closer to viewer
Schroder staircase: viewed either from above or from below
two well-known ambiguous line drawings
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.2=====
inherent ambiguity of line drawing: owing to complete loss of depth
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.3=====

general-viewpoint assumption: none of the following situations

• two vertices of scene objects represented at same picture point
• two scene edges seen as single line in picture
• vertex seen exactly in line with unrelated edge
general-viewpoint assumption: heart of line-drawing interpretation
viewpoint in perspective projection: center of projection
viewpoint in orthographic projection: direction of projection
subjective contours of Kanizsa: white occluding triangle in space
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.4=====

line labels for visible projections of surface-normal discontinuities:
68#68: occluding, depth discontinuity
68#68: oriented such that occluding surface to the right of arrow
69#69: nonoccluding convex
70#70: nonoccluding concave
nonpolyhedral object: line label may change from 69#69 to 68#68
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.5=====
four basic ways in which three planar surfaces can form polyhedron vertex
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.6=====
all possible distinct appearances of trihedral vertices of polyhedra
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.7=====
complete junction catalog for line drawings of trihedral-vertex polyhedra
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.8=====
consistent labeling necessary but not sufficient for physical realizability
(a) line drawing with no consistent labeling
(b) consistent labeling of line drawing of impossible polyhedron
(c) physically realizable line drawing with unrealizable consistent labeling
=====Nalwa, A Guided Tour of Computer Vision, Fig. 4.9=====

18.3.3 Surface-Edge-Vertex Representation
VISIONS system: Visual Integration by Semantic Interpretation of Natural
Scenes
PREMIO system: Prediction in Matching Images to Objects
PREMIO 3D object model: hierarchical, relational model with five levels
5 levels: world, object, face/edge/vertex, surface/boundary, arc/2D, 1D piece
world level: arrangement of different objects in world
object level: arrangement of different faces, edges, vertices forming objects
face level: describes face in terms of surfaces and boundaries
surface level: specifies elemental pieces forming surfaces
2D piece level: describes pieces and specifies arcs forming boundaries
1D piece level: describes elemental pieces forming arcs
SDS: spatial data structure
71#71: attribute-value table
PREMIO 3D object model
=====Fig. 18.9=====

18.3.4 Sticks, Plates, and Blobs
sticks, plates, blobs model: rough models of 3D objects used in rough-matching
sticks: long, thin parts with only one significant dimension
stick: cannot bend very much
stick: has two logical endpoints, set of interior points, center of mass
plates: flattish, wide parts with two nearly flat surfaces
plates: have two significant dimensions
plate surfaces: cannot fold very much
plate: has set of edge points, set of surface points, center of mass
blobs: parts with three significant dimensions
blob: can be bumpy but cannot have concavities
blob: set of surface points and center of mass
sticks, plates, blobs: near-convex
examples of sticks, plates, and blobs
=====Fig. 18.10=====

attribute-value table: contains global attributes
simple-parts relation: lists the parts and their attributes
connects-supports relation: gives connections between pairs of parts
triples relation: specifies connections between three parts at a time
parallel relation: lists pairs of parts that are parallel
perpendicular relation: lists pairs of parts that are perpendicular
TYPE: 1 for stick, 2 for plate, 3 for blob
full relational structure of sticks-plates-blobs model of chair object
=====Fig. 18.11=====

18.3.5 Generalized Cylinder Representation
generalized cylinder: volumetric primitive defined by axis and cross-section
cross section: swept along axis, creating a solid
e.g. actual cylinder: generalized cylinder whose axis is straight-line segment
and whose cross section is circle of constant radius
e.g. cone: generalized cylinder whose axis is straight-line segment and
cross section is circle with radius initially zero to maximum
e.g. rectangular solid: generalized cylinder whose axis is straight line
segment and cross section is constant rectangle
e.g. torus: generalized cylinder whose axis is circle and
whose cross section is constant circle
generalized cylinder representation: uses generalized cylinders as primitives

surface-edge-vertex model: very precise
sticks-plates-and-blobs model: very rough
generalized cylinder model: somewhere in between
person: modeled roughly as cylinders for head, torso, arms, legs
dotted lines: axes of cylinders
=====Fig. 18.12=====

superquadrics: lumps of clay deformable and can be glued into object models
superquadric models: mainly used with range data
range data image of (a) a doll, (b) its superquadric fit (c), (d) wire frame
=====Fig. 18.13=====

18.3.7 Octree Representation
octree encoding: geometric modeling technique used to represent 3D objects
octree encoding: used in computer vision, robotics, computer graphics
octree: hierarchical 8-ary tree structure
each node in octree: corresponds to cubic region of universe
full: if cube is completely enclosed by 3D object
empty: if cube contains no part of object
partial: if cube partly intersects object
full, empty, partial: correspond to black",white",gray" in quadtrees
node with label full or empty: has no children
partial: has eight children representing partition of cube into octants
simple 3D object (a) and its octree encoding (b)
=====Fig. 18.14=====

18.3.8 The Extended Gaussian Image
3D object: collection of surface normals, one at each point of object surface
planar surface: all points on surface map to same surface normal
convex with positive curvature everywhere: distinct surface normal everywhere
Gaussian sphere: unit sphere
set of surface normals mapped to Gaussian sphere: tail at center head outward
Gaussian image of object: resultant set of points on Gaussian sphere
concepts associated with Gaussian image of 3D object
=====Fig. 18.15=====

for planar objects: Gaussian image not invertible, not precise enough for use
72#72: small surface patch of object
73#73: corresponding surface patch on Gaussian sphere
Gaussian curvature 74#74:

75#75

76#76: point on Gaussian sphere corresponding to point 77#77 on object surface
extended Gaussian image:

78#78

planar region: Gaussian curvature 0, point mass in extended Gaussian image
=====joke=====

18.3.9 View-Class Representation
view classes: each representing set of viewpoints sharing some property
property: e.g. same object surfaces visible
property: e.g. same line segments visible
property: e.g. relational distances between relational structures are similar
characteristic views: sets producing topologically isomorphic line drawings
three view classes of cube producing topologically isomorphic line drawings
=====Fig. 18.16=====
surface-edge-vertex representation of a wedge
=====Fig. 18.17=====
aspect graph of object: graph structure where

1. each node represents topologically distinct view of object
2. a node for each such view of object
3. each arc represents a visual event at transition
4. there is an arc for each such transition
aspect graph of a cube
=====Fig. 18.18=====

18.4 General Frameworks for Matching
matching: finding correspondence between two entities
consistent labeling procedures: examples of matching algorithms

18.4.1 Relational-Distance Approach to Matching
relational distance: compares two structures and determines similarity

Relational-Distance Definition
79#79: relational description
80#80: sequence of relations
81#81: set of parts of entity being described
82#82: relation indicating various relationships among parts
83#83: relational description with part set 61#61
84#84: relational description with part set 63#63
assumption: 85#85, otherwise add dummy parts to smaller set
1#1: any one-one, onto mapping from 61#61 to 63#63
86#86: positive integer
composition 87#87 of relation 88#88 with function 1#1:

89#89

1#1: maps parts from set 61#61 to parts from set 63#63
structural error of 1#1 for 90#90th pair of corresponding relations in 91#91:

92#92

total error of 1#1 with respect to 91#91:

93#93

relational distance 94#94 between 91#91:

95#95

best mapping from 96#96 to 97#97: mapping 1#1 that minimizes total error

Relational-Distance Examples
best mapping from 98#98 to 99#99 is
100#100
for this mapping:

101#101

102#102

103#103

104#104

105#105

two digraphs whose relational distance is 3
=====Fig. 18.19=====

four object models, 106#106
=====Fig. 18.20=====

Relational Distance as a Metric
relational distance: used to determine similarity of unknow object to model
relational distance: used to compare object models to group models
1#1 relational isomorphism: if 1#1 one-one, onto from 61#61 to 63#63 and 107#107
108#108 relational isomorphism: 91#91 isomorphic
109#109: relational-distance measure
110#110: arbitrary relational descriptions
metric property of 109#109:

1. 111#111 91#91 isomorphic
2. 112#112
3. 113#113

Attributed Relational Descriptions and Relational Distance
extend relational description and relational distance to include

• properties of parts
• properties of the whole
• properties of these relationships

18.4.2 Ordered Structural Matching
definition of ordering on primitives: greatly reduces complexity of search

18.4.3 Hypothesizing and Testing with Viewpoint Consistency Constraint
viewpoint consistency constraint:

The locations of all projected model features in an image must be consistent with projection from a single viewpoint.

18.4.4 View-Class Matching
if 3D object represented by view-class model, matching divided into 2 stages:

1. determining view class of object
2. determining precise viewpoint within that view class

Determining View Class
relational pyramid: hierarchical relational structure to represent view class
level-1 primitives: straight- and curved-line segments
level-2 relations: junctions and loops
level-3 relations: adjacency, collinearity, junction parallelness, loop-loop

Pose Determination within View Class
relational pyramid: hierarchical, relational structure to constrain matching

18.4.5 Affine-Invariant Matching
114#114: set of interest points lying in 115#115 plane
rotation matrix relating model reference frame to camera reference frame:

116#116

translation of object reference frame to camera reference frame:

117#117

1#1: distance between image plane and center of perspectivity
118#118: observed image data points by perspective projection:

119#119

120#120

when translation 121#121 in 122#122-direction large compared with 123#123:

124#124

61#61: 125#125 (scaling, rotation, skewing) matrix
126#126: 2D (translation) vector
affine 2D correspondence: 127#127

Affine Transformation of Points in a Plane
necessary and sufficient to define plane uniquely: 3 noncollinear points

The Hummel-Wolfson-Lamdan Matching Algorithm
to match noncollinear triplets in model interest points with scene:

• preprocessing: convert model interest points into affine-invariant model
• recognition: match model against image using affine representation

Shortcomings of the Affine-Invariant Matching Technique
affine-invariant matching technique: mathematically sound in noiseless case
shortcomings of affine-invariant matching in practice:

1. if three noncollinear points not numerically stable, points not reliable
2. coordinates of detected interest points: noisy in real image
3. partial object symmetries may cause wrong matching

An Explicit Noise Model and Optimal Voting

18.5 Model Database Organization
organize database of models: to allow rapid access to most likely candidate
group similar relational models into clusters and choose representative
arrows: indicate mapping from parts of object 2 to parts of other objects
cluster of object models whose representative is object 2
=====Fig. 18.21=====
=====joke=====

2002-02-26