Chapter 18 Object Models and Matching

18.1 Introduction

object recognition: one of most important aspects of computer vision

18.2 Two-Dimensional Object Representation

2D shape analysis useful in machine vision application:

- medical image analysis
- aerial image analysis
- manufacturing

- global features
- local features
- boundary description
- skeleton
- 2D parts

18.2.1 Global Feature Representation

2D object: can be thought of as binary image

value 1: pixels of object

value 0: pixels outside object

2D shape features: area, perimeter, moments, circularity, elongation

Shape Recognition by Moments

1#1: binary image function

2#2: 2D shape

digital 3#3th moment of 4#4:

5#5

6#6: area of 4#4: number of pixels in 4#4

moment invariants: functions of moments invariant under shape transform

prefer moment invariants: under translation, rotation, scaling, skewing

center of gravity 7#7 of 4#4:

8#8

9#9

central 3#3th moment of 4#4:

10#10

central moments: translation invariant

normalized central moments of 4#4:

11#11

seven functions that are rotation invariant:

12#12

13#13: skew invariant

Shape Recognition with Fourier Descriptors

Fourier descriptors: another way for extracting features from 2D shapes

Fourier descriptors: defined to characterize boundary

The main idea is to represent the boundary as a function of one variable
14#14, expand 14#14 in its Fourier series, and use the coefficients
of the series as Fourier descriptors (FDs).

finite number of FDs: can be used to describe the shape

15#15: arc length along 16#16

16#16: clockwise-oriented, simple closed curve represented by:

17#17

point moving along curve generates complex function:

18#18

19#19: periodic function with period 20#20

Fourier series expansion of 19#19:

21#21

FDs are coefficients 22#22 defined by:

23#23

assume 2D shape described as sequence of 24#24 points
25#25

define sequence of unit vectors:

26#26

define sequence of cumulative differences:

27#27

28#28

FDs are defined by:

29#29

30#30: two curves to be compared

22#22: sequence of FDs of 31#31

32#32: sequence of FDs of 33#33

34#34: number of harmonics used

distance measure in shape comparison:

35#35

=====Garfield 17:61=====

18.2.2 Local Feature Representation

2D object characterized by: local features, attributes, relationships

most commonly used local features: holes, corners

holes: found by connected component procedure followed by boundary tracing

holes: detected by binary mathematical morphology, if hole shapes known

hole properties: areas, shapes

corner detection: can be performed on binary or gray tone image

corner property: angle at which lines meet

18.2.3 Boundary Representation

boundary representation: most common representation for 2D objects

3 main ways to represent object boundary:

- sequence of points
- chain code
- sequence of line segments

The Boundary as a Sequence of Points

boundary points: from border-following or edge-tracking algorithms

interest points: boundary points with special property useful in matching

The Chain Code Representation

chain encoding: can be used at any level of quantization

chain encoding: saves space required for row and column coordinates

boundary encoded: first quantized by placing over square grid

grid side length: determines resolution of encoding

marked points: grid intersections closest to curve and used in encoding

36#36: marks starting point of curve

chain encoding of boundary curve

=====Fig. 18.1=====

line segments: links: to be used to approximate the curve

encoding scheme: eight possible directions assigned integer between 0, 7

chain: chain encoding: in the form

37#37

or

38#38

length of chain code with 39#39 chains: can be simply estimated as 39#39

40#40: number of odd chain codes

41#41: number of even chain codes

42#42: number of corners

20#20: unbiased estimate of perimeter length

Freeman suggested:
43#43

The Boundary as a Sequence of Line Segments

line segment sequence: after boundary segmented into near-linear portion

line segment sequence: used in shape recognition or other matching tasks

44#44: coordinate location where pair of lines meet

45#45: angle magnitude where pair of lines meet

46#46: sequence of junction points to represent line segment sequence

47#47: sequence of junction points representing model object 48#48

49#49: sequence of junction points representing test object 50#50

51#51: an association

goal: given 52#52, to find 53#53 satisfying
54#54 or
55#55

or 56#56

18.2.4 Skeleton Representation

strokes: long, sometimes thin parts forming shapes

line segments that characterize the strokes of set of characters

=====Fig. 18.2=====

symmetric axis transform: set of maximal circular disks inside object

symmetric axis: locus of centers of these maximal disks

symmetric axes of the characters

=====Fig. 18.3=====

symmetric axis: one example of skeleton description of 2D object

symmetric axis of rectangle: consists of five line segments, not single line

symmetric axis: extremely sensitive to noise

symmetric axis: difficult to use in matching

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 9.3=====

axis of smoothed local symmetries: separate definition for skeleton

local symmetry: midpoint 57#57 of line segment 58#58 joining pair of points 59#59

31#31: angle between 58#58 and outward normal 60#60 at 61#61

31#31: angle between 58#58 and inward normal 62#62 at 63#63

point 57#57 that is local symmetry with respect to boundary points 61#61 and 63#63

=====Fig. 18.4=====

axes: spines: loci of local symmetries maximal w.r.t. forming smooth curve

cover of axis: portion of shape subtended by axis

axis cover properly contained in another cover: second axis subsumes first

symmetric axes of local symmetry of a rectangle

=====Fig. 18.5=====

axes of smoothed local symmetries of several objects

=====Fig. 18.6=====

18.2.5 Two-Dimensional Part Representation

parts, attributes, interrelationships: form structural description of shape

nuclei: regions where primary convex subset overlap

nuclei: shaded areas of overlap

decomposition of shape into primary convex subsets and nuclei

=====Fig. 18.7=====

near-convexity: allows noisy, distorted instances to have same decompositions

64#64: two points on object boundary

65#65 relation: visibility relation

if line 66#66 completely interior to object boundary,
67#67

the graph-theoretic clustering to determine clusters of visibility relation

decomposition of three similar shapes into near-convex pieces

=====Fig. 18.8=====

18.3 Three-Dimensional Object Representations

18.3.1 Local Features Representation

range data: obtained from laser range finder, light striping, stereo, etc.

from depth, try to infer surfaces, edges, corners, holes, other features

3D matching more difficult than 2D because of occlusion

=====Oldie 33:116=====

18.3.2 Wire Frame Representation

wire frame model: 3D object model with only edges of object

two-color hyperboloid and its line drawing

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 4.1=====

Necker cube: lower-vertical face or upper vertical face closer to viewer

Schroder staircase: viewed either from above or from below

two well-known ambiguous line drawings

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 4.2=====

inherent ambiguity of line drawing: owing to complete loss of depth

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 4.3=====

general-viewpoint assumption: none of the following situations

- two vertices of scene objects represented at same picture point
- two scene edges seen as single line in picture
- vertex seen exactly in line with unrelated edge

viewpoint in perspective projection: center of projection

viewpoint in orthographic projection: direction of projection

subjective contours of Kanizsa: white occluding triangle in space

=====Nalwa,

line labels for visible projections of surface-normal discontinuities:

68#68: occluding, depth discontinuity

68#68: oriented such that occluding surface to the right of arrow

69#69: nonoccluding convex

70#70: nonoccluding concave

nonpolyhedral object: line label may change from 69#69 to 68#68

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 4.5=====

four basic ways in which three planar surfaces can form polyhedron vertex

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 4.6=====

all possible distinct appearances of trihedral vertices of polyhedra

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 4.7=====

complete junction catalog for line drawings of trihedral-vertex polyhedra

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 4.8=====

consistent labeling necessary but not sufficient for physical realizability

(a) line drawing with no consistent labeling

(b) consistent labeling of line drawing of impossible polyhedron

(c) physically realizable line drawing with unrealizable consistent labeling

=====Nalwa, *A Guided Tour of Computer Vision*, Fig. 4.9=====

18.3.3 Surface-Edge-Vertex Representation

VISIONS system: Visual Integration by Semantic Interpretation of Natural

Scenes

PREMIO system: Prediction in Matching Images to Objects

PREMIO 3D object model: hierarchical, relational model with five levels

5 levels: world, object, face/edge/vertex, surface/boundary, arc/2D, 1D piece

world level: arrangement of different objects in world

object level: arrangement of different faces, edges, vertices forming objects

face level: describes face in terms of surfaces and boundaries

surface level: specifies elemental pieces forming surfaces

2D piece level: describes pieces and specifies arcs forming boundaries

1D piece level: describes elemental pieces forming arcs

SDS: spatial data structure

71#71: attribute-value table

PREMIO 3D object model

=====Fig. 18.9=====

18.3.4 Sticks, Plates, and Blobs

sticks, plates, blobs model: rough models of 3D objects used in rough-matching

sticks: long, thin parts with only one significant dimension

stick: cannot bend very much

stick: has two logical endpoints, set of interior points, center of mass

plates: flattish, wide parts with two nearly flat surfaces

plates: have two significant dimensions

plate surfaces: cannot fold very much

plate: has set of edge points, set of surface points, center of mass

blobs: parts with three significant dimensions

blob: can be bumpy but cannot have concavities

blob: set of surface points and center of mass

sticks, plates, blobs: near-convex

examples of sticks, plates, and blobs

=====Fig. 18.10=====

attribute-value table: contains global attributes

simple-parts relation: lists the parts and their attributes

connects-supports relation: gives connections between pairs of parts

triples relation: specifies connections between three parts at a time

parallel relation: lists pairs of parts that are parallel

perpendicular relation: lists pairs of parts that are perpendicular

TYPE: 1 for stick, 2 for plate, 3 for blob

full relational structure of sticks-plates-blobs model of chair object

=====Fig. 18.11=====

18.3.5 Generalized Cylinder Representation

generalized cylinder: volumetric primitive defined by axis and cross-section

cross section: swept along axis, creating a solid

e.g. actual cylinder: generalized cylinder whose axis is straight-line segment

and whose cross section is circle of constant radius

e.g. cone: generalized cylinder whose axis is straight-line segment and

cross section is circle with radius initially zero to maximum

e.g. rectangular solid: generalized cylinder whose axis is straight line

segment and cross section is constant rectangle

e.g. torus: generalized cylinder whose axis is circle and

whose cross section is constant circle

generalized cylinder representation: uses generalized cylinders as primitives

surface-edge-vertex model: very precise

sticks-plates-and-blobs model: very rough

generalized cylinder model: somewhere in between

person: modeled roughly as cylinders for head, torso, arms, legs

dotted lines: axes of cylinders

=====Fig. 18.12=====

18.3.6 Superquadric Representation

superquadrics: lumps of clay deformable and can be glued into object models

superquadric models: mainly used with range data

range data image of (a) a doll, (b) its superquadric fit (c), (d) wire frame

=====Fig. 18.13=====

18.3.7 Octree Representation

octree encoding: geometric modeling technique used to represent 3D objects

octree encoding: used in computer vision, robotics, computer graphics

octree: hierarchical 8-ary tree structure

each node in octree: corresponds to cubic region of universe

full: if cube is completely enclosed by 3D object

empty: if cube contains no part of object

partial: if cube partly intersects object

full, empty, partial: correspond to ``black",``white",``gray" in quadtrees

node with label full or empty: has no children

partial: has eight children representing partition of cube into octants

simple 3D object (a) and its octree encoding (b)

=====Fig. 18.14=====

18.3.8 The Extended Gaussian Image

3D object: collection of surface normals, one at each point of object surface

planar surface: all points on surface map to same surface normal

convex with positive curvature everywhere: distinct surface normal everywhere

Gaussian sphere: unit sphere

set of surface normals mapped to Gaussian sphere: tail at center head outward

Gaussian image of object: resultant set of points on Gaussian sphere

concepts associated with Gaussian image of 3D object

=====Fig. 18.15=====

for planar objects: Gaussian image not invertible, not precise enough for use

72#72: small surface patch of object

73#73: corresponding surface patch on Gaussian sphere

Gaussian curvature 74#74:

75#75

76#76: point on Gaussian sphere corresponding to point 77#77 on object surface

extended Gaussian image:

78#78

planar region: Gaussian curvature 0, point mass in extended Gaussian image

=====joke=====

18.3.9 View-Class Representation

view classes: each representing set of viewpoints sharing some property

property: e.g. same object surfaces visible

property: e.g. same line segments visible

property: e.g. relational distances between relational structures are similar

characteristic views: sets producing topologically isomorphic line drawings

three view classes of cube producing topologically isomorphic line drawings

=====Fig. 18.16=====

surface-edge-vertex representation of a wedge

=====Fig. 18.17=====

aspect graph of object: graph structure where

- each node represents topologically distinct view of object
- a node for each such view of object
- each arc represents a visual event at transition
- there is an arc for each such transition

=====Fig. 18.18=====

18.4 General Frameworks for Matching

matching: finding correspondence between two entities

consistent labeling procedures: examples of matching algorithms

18.4.1 Relational-Distance Approach to Matching

relational distance: compares two structures and determines similarity

Relational-Distance Definition

79#79: relational description

80#80: sequence of relations

81#81: set of parts of entity being described

82#82: relation indicating various relationships among parts

83#83: relational description with part set 61#61

84#84: relational description with part set 63#63

assumption: 85#85, otherwise add dummy parts to smaller set

1#1: any one-one, onto mapping from 61#61 to 63#63

86#86: positive integer

composition 87#87 of relation
88#88 with function 1#1:

89#89

1#1: maps parts from set 61#61 to parts from set 63#63

structural error of 1#1 for 90#90th pair of corresponding relations in 91#91:

92#92

total error of 1#1 with respect to 91#91:

93#93

relational distance 94#94 between 91#91:

95#95

best mapping from 96#96 to 97#97: mapping 1#1 that minimizes total error

Relational-Distance Examples

best mapping from 98#98 to 99#99 is

100#100

for this mapping:

101#101

102#102

103#103

104#104

105#105

two digraphs whose relational distance is 3

=====Fig. 18.19=====

four object models,
106#106

=====Fig. 18.20=====

Relational Distance as a Metric

relational distance: used to determine similarity of unknow object to model

relational distance: used to compare object models to group models

1#1 relational isomorphism: if 1#1 one-one, onto from 61#61 to 63#63 and 107#107

108#108 relational isomorphism: 91#91 isomorphic

109#109: relational-distance measure

110#110: arbitrary relational descriptions

metric property of 109#109:

- 111#111 91#91 isomorphic
- 112#112
- 113#113

Attributed Relational Descriptions and Relational Distance

extend relational description and relational distance to include

- properties of parts
- properties of the whole
- properties of these relationships

18.4.2 Ordered Structural Matching

definition of ordering on primitives: greatly reduces complexity of search

18.4.3 Hypothesizing and Testing with Viewpoint Consistency Constraint

viewpoint consistency constraint:

The locations of all projected model features in an image must be consistent with projection from a single viewpoint.

18.4.4 View-Class Matching

if 3D object represented by view-class model, matching divided into 2 stages:

- determining view class of object
- determining precise viewpoint within that view class

Determining View Class

relational pyramid: hierarchical relational structure to represent view class

level-1 primitives: straight- and curved-line segments

level-2 relations: junctions and loops

level-3 relations: adjacency, collinearity, junction parallelness, loop-loop

Pose Determination within View Class

relational pyramid: hierarchical, relational structure to constrain matching

18.4.5 Affine-Invariant Matching

114#114: set of interest points lying in 115#115 plane

rotation matrix relating model reference frame to camera reference frame:

116#116

translation of object reference frame to camera reference frame:

117#117

1#1: distance between image plane and center of perspectivity

118#118: observed image data points by perspective projection:

119#119

120#120

when translation 121#121 in 122#122-direction large compared with 123#123:

124#124

61#61: 125#125 (scaling, rotation, skewing) matrix

126#126: 2D (translation) vector

affine 2D correspondence: 127#127

Affine Transformation of Points in a Plane

necessary and sufficient to define plane uniquely: 3 noncollinear points

The Hummel-Wolfson-Lamdan Matching Algorithm

to match noncollinear triplets in model interest points with scene:

- preprocessing: convert model interest points into affine-invariant model
- recognition: match model against image using affine representation

Shortcomings of the Affine-Invariant Matching Technique

affine-invariant matching technique: mathematically sound in noiseless case

shortcomings of affine-invariant matching in practice:

- if three noncollinear points not numerically stable, points not reliable
- coordinates of detected interest points: noisy in real image
- partial object symmetries may cause wrong matching

An Explicit Noise Model and Optimal Voting

18.5 Model Database Organization

organize database of models: to allow rapid access to most likely candidate

group similar relational models into clusters and choose representative

arrows: indicate mapping from parts of object 2 to parts of other objects

cluster of object models whose representative is object 2

=====Fig. 18.21=====

=====joke=====

2002-02-26