Chapter 4 Statistical Pattern Recognition

4.1 Introduction
units of statistical pattern recognition: image regions, projected segments
each unit: has associated measurement vector
decision rule: designed optimally to assign unit to class or category
statistical pattern recognition techniques include

1. feature selection and extraction techniques
2. decision rule construction techniques
3. techniques for estimating decision rule error

4.2 Bayes Decision Rules: Maximum Utility Model for Pattern Discrimination
In the simple pattern discrimination or pattern identification process, a unit is observed or measured, and a category assignment is made that names or classifies the unit as a type of object.
unit category assignment: made solely on the basis of the observed measurement (pattern)

distinct facts characterizing the situation

• $a$: assigned category from set $C$ of categories
• $t$: true category identification from set $C$
• $d$: observed measurement from a set $D$ of measurements
$(t, a, d)$: event of classifying the observed unit
$P(t, a, d)$: probability of the event

4.2.1 Economic Gain Matrix
making category assignments: carries economic consequences
$e(t, a)$: economic gain or utility with true category $t$, assigned category $a$
decision rule: chosen so that the resulting expected utility is highest

identity gain matrix: optimal rule is right the largest possible fraction of the time

economic gain matrix for jet fan blade inspection:
test costs $10; decision positive on crack: $500 fan blade has to be discarded
blade really cracked but decision negative: $50,000,000 for airplane crash
=====Table 4.1=====
decision rule optimized for this matrix: differs from the rule optimized for the identity matrix

automatic defect-inspection machine: good, bad objects
$P(g, g)$: probability of true good, assigned good
$P(g, b)$: probability of true good, assigned bad
$P(b, g)$: probability of true bad, assigned good
=====Table 4.2=====
$P(g) = P(g, g) + P(g, b)$: fraction of good objects manufactured

positive gain matrix entry: profit consequence
negative gain matrix entry: loss consequence
=====Table 4.3=====

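Expected profit per object
$$E = P(g, g)\,e(g, g) + P(g, b)\,e(g, b) + P(b, g)\,e(b, g) + P(b, b)\,e(b, b)$$
where $P(t, a)$ is the joint probability of true category $t$ and assigned category $a$, and $e(t, a)$ is the corresponding economic gain

inspection machine performance specified by conditional probabilities
given object good, probability detected as good
$$P(\text{assigned } g \mid \text{true } g) = P(g, g) / P(g)$$
given object good, probability detected as bad
$$P(\text{assigned } b \mid \text{true } g) = P(g, b) / P(g)$$
given object bad, probability detected as good
$$P(\text{assigned } g \mid \text{true } b) = P(b, g) / P(b)$$

$P(\text{assigned } b \mid \text{true } g)$: false-detection rate: false-alarm rate
$P(\text{assigned } g \mid \text{true } b)$: misdetection rate

$P(g)$, $P(b) = 1 - P(g)$: characterize manufacturing environment
conditional probabilities of assigned given true category: characterize inspection machine performance
$e(t, a)$: economic consequences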
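
The expected profit per object can be sketched in Python as follows. This is a minimal illustration, not the textbook's procedure: the function name `expected_profit`, the category labels `"g"` and `"b"`, and all numeric values are assumptions chosen for the example.

```python
def expected_profit(p_good, false_alarm, misdetection, gain):
    """Expected profit per object.

    p_good       -- prior probability that an object is good
    false_alarm  -- P(assigned bad | true good)
    misdetection -- P(assigned good | true bad)
    gain         -- dict mapping (true, assigned) to economic gain
    """
    p_bad = 1.0 - p_good
    # Joint probabilities P(true, assigned) = P(assigned | true) * P(true)
    joint = {
        ("g", "g"): (1.0 - false_alarm) * p_good,
        ("g", "b"): false_alarm * p_good,
        ("b", "g"): misdetection * p_bad,
        ("b", "b"): (1.0 - misdetection) * p_bad,
    }
    # Expectation of the gain over the four (true, assigned) outcomes
    return sum(joint[key] * gain[key] for key in joint)


# Hypothetical gain matrix and rates, for illustration only
gain = {("g", "g"): 10.0, ("g", "b"): -5.0, ("b", "g"): -100.0, ("b", "b"): 0.0}
profit = expected_profit(p_good=0.9, false_alarm=0.1, misdetection=0.2, gain=gain)
print(profit)
```

The three groups of arguments mirror the three groups of quantities in the notes: the prior describes the manufacturing environment, the two rates describe the machine, and the gain dictionary carries the economic consequences.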

Discussion
automatic defect-detection machine operating curve: false-alarm/misdetection
possible to trade false-alarm for misdetection rate, or vice-versa

trade-off between misdetection and false-alarm rate
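
One way to use the trade-off is to pick the point on the operating curve with the highest expected profit. The sketch below assumes a hypothetical operating curve given as a list of (false-alarm rate, misdetection rate) pairs; the curve, the gains, and the function names are illustrative assumptions, not values from the text.

```python
def expected_profit(p_good, false_alarm, misdetection, e_gg, e_gb, e_bg, e_bb):
    """Expected profit per object; e_ta is the gain for true t, assigned a."""
    p_bad = 1.0 - p_good
    good_part = (1.0 - false_alarm) * e_gg + false_alarm * e_gb
    bad_part = misdetection * e_bg + (1.0 - misdetection) * e_bb
    return p_good * good_part + p_bad * bad_part


def best_operating_point(points, p_good, e_gg, e_gb, e_bg, e_bb):
    """Return the (false_alarm, misdetection) pair with the highest expected profit."""
    return max(
        points,
        key=lambda fm: expected_profit(p_good, fm[0], fm[1], e_gg, e_gb, e_bg, e_bb),
    )


# Hypothetical operating curve: lowering one rate raises the other
operating_points = [(0.01, 0.40), (0.05, 0.20), (0.10, 0.10), (0.20, 0.05), (0.40, 0.01)]

# Moderate penalty for a misdetection
mild = best_operating_point(operating_points, 0.95, 10.0, -5.0, -100.0, 0.0)
# Ten times larger penalty for a misdetection
severe = best_operating_point(operating_points, 0.95, 10.0, -5.0, -1000.0, 0.0)
print(mild, severe)
```

With these assumed numbers, raising the cost of a misdetection moves the preferred operating point toward a lower misdetection rate at the price of more false alarms.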
=====Example 4.1=====
=====Table 4.4=====
=====Table 4.5=====
=====Example 4.2=====
=====Table 4.6=====

4.2.2 Decision Rule Construction
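$P(t, a, d)$: probability of true category $t$, assigned category $a$, measurement $d$
average economic gain, with $C$ the set of categories and $D$ the set of measurements
$$E[e] = \sum_{t \in C} \sum_{a \in C} e(t, a)\, P(t, a), \qquad P(t, a) = \sum_{d \in D} P(t, a, d)$$

=====Fig. 4.1=====

when economic gain matrix is identity matrix
$$e(t, a) = \begin{cases} 1 & t = a \\ 0 & t \ne a \end{cases}$$
the expected gain is the probability of correct assignment
$$E[e] = \sum_{t \in C} P(t, t)$$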

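fair game assumption: decision rule uses only measurement data in assignment
$P(a \mid t, d)$: conditional probability of assignment $a$ given true category $t$, observed measurement $d$
$P(a \mid d)$: conditional probability of making assignment $a$ given measurement $d$

fair game assumption says
$$P(a \mid t, d) = P(a \mid d)$$

$f(a \mid d) = P(a \mid d)$: completely defines decision rule
deterministic decision rule: $f(a \mid d) \in \{0, 1\}$, one category assigned with certainty for each $d$
decision rules not deterministic: probabilistic: nondeterministic: stochastic

expected value of economic gain depending on decision rule, using $P(t, a, d) = f(a \mid d)\, P(t, d)$
$$E[e; f] = \sum_{t \in C} \sum_{a \in C} e(t, a) \sum_{d \in D} f(a \mid d)\, P(t, d)$$

Bayes decision rules: maximize expected economic gain
Bayes decision rule $f$ satisfies, for every decision rule $g$,
$$E[e; f] \ge E[e; g]$$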
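
A deterministic Bayes rule for a discrete measurement space can be sketched as follows: for each measurement, assign the category whose gain, weighted by the joint probability of true category and measurement, is largest. The table layout (`joint[d][t]`, `gain[(t, a)]`), the function names, and all numbers are assumptions made for this illustration.

```python
def bayes_rule(joint, gain, categories):
    """Build a deterministic rule d -> a.

    joint[d][t] -- joint probability of true category t and measurement d
    gain[(t, a)] -- economic gain for true category t, assigned category a
    """
    rule = {}
    for d, p_td in joint.items():
        # Choose the assignment with the largest gain summed over true categories.
        # On ties, max() keeps the first category in the list.
        rule[d] = max(
            categories,
            key=lambda a: sum(gain[(t, a)] * p_td[t] for t in categories),
        )
    return rule


def expected_gain(rule, joint, gain, categories):
    """Expected economic gain of a deterministic rule."""
    return sum(gain[(t, rule[d])] * joint[d][t] for d in joint for t in categories)


categories = ["c1", "c2"]
# Hypothetical joint probabilities; they sum to 1
joint = {
    "d1": {"c1": 0.30, "c2": 0.05},
    "d2": {"c1": 0.10, "c2": 0.15},
    "d3": {"c1": 0.10, "c2": 0.30},
}
identity = {("c1", "c1"): 1.0, ("c1", "c2"): 0.0, ("c2", "c1"): 0.0, ("c2", "c2"): 1.0}
# Hypothetical asymmetric gains: calling a true c1 unit c2 is penalized
asymmetric = {("c1", "c1"): 1.0, ("c1", "c2"): -3.0, ("c2", "c1"): 0.0, ("c2", "c2"): 1.0}

rule_identity = bayes_rule(joint, identity, categories)
rule_asymmetric = bayes_rule(joint, asymmetric, categories)
print(rule_identity, expected_gain(rule_identity, joint, identity, categories))
print(rule_asymmetric)
```

With the identity gain matrix, the expected gain returned is the probability of correct assignment. Changing the gain matrix can change the rule even though the probabilities stay the same.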

=====Fig. 4.2=====
=====Fig. 4.3=====
continuous measurement space rather than discrete
=====p. 108, p. 109=====

4.3 Prior Probability

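$P(d \mid t)$: conditional probability of measurement $d$ given true category $t$
$P(t)$: prior probability that the true category is $t$
joint probability of true category and measurement: $P(t, d) = P(d \mid t)\, P(t)$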
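
The role of the prior can be sketched in a few lines: the joint probability of true category and measurement is the class-conditional probability times the prior, and normalizing over categories gives the probability of each category given the measurement. Names and numbers below are illustrative assumptions.

```python
def joint_from_prior(cond, prior):
    """cond[t][d] = P(d | t), prior[t] = P(t). Returns joint[d][t] = P(t, d)."""
    measurements = next(iter(cond.values())).keys()
    return {d: {t: cond[t][d] * prior[t] for t in prior} for d in measurements}


def posterior(joint, d):
    """Probability of each true category given measurement d."""
    total = sum(joint[d].values())
    return {t: p / total for t, p in joint[d].items()}


# Hypothetical class-conditional probabilities and priors
cond = {
    "c1": {"d1": 0.6, "d2": 0.2, "d3": 0.2},
    "c2": {"d1": 0.1, "d2": 0.3, "d3": 0.6},
}
prior = {"c1": 0.5, "c2": 0.5}

joint = joint_from_prior(cond, prior)
print(posterior(joint, "d1"))
```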

4.4 Economic Gain Matrix and the Decision Rule
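identity economic gain matrix: Bayes rule maximizes the probability of correct classification

gain matrix where correct assignment gains 0 and incorrect loses 1: same optimal rule, since it is the identity matrix with a constant subtracted from every entry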
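
The equivalence of the two gain matrices can be checked numerically. The sketch below picks, for each measurement, the assignment with the largest weighted gain under both matrices and compares the results; the joint probabilities and function names are assumptions for illustration.

```python
def best_assignment(p_td, gain, categories):
    """Return the category a maximizing the sum over t of gain(t, a) * P(t, d)."""
    return max(categories, key=lambda a: sum(gain(t, a) * p_td[t] for t in categories))


def identity_gain(t, a):
    """Correct assignment gains 1, incorrect gains 0."""
    return 1.0 if t == a else 0.0


def zero_one_loss_gain(t, a):
    """Correct assignment gains 0, incorrect loses 1."""
    return 0.0 if t == a else -1.0


categories = ["c1", "c2", "c3"]
# Hypothetical joint probabilities joint[d][t] of true category t and measurement d
joint = {
    "d1": {"c1": 0.20, "c2": 0.05, "c3": 0.05},
    "d2": {"c1": 0.05, "c2": 0.25, "c3": 0.10},
    "d3": {"c1": 0.05, "c2": 0.05, "c3": 0.20},
}

rule_identity = {d: best_assignment(p, identity_gain, categories) for d, p in joint.items()}
rule_loss = {d: best_assignment(p, zero_one_loss_gain, categories) for d, p in joint.items()}
print(rule_identity == rule_loss)
```

Subtracting the same constant from every gain lowers the score of every candidate assignment for a given measurement by the same amount, so the maximizing assignment does not move.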

4.5 Maximin Decision Rule
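maximin decision rule: maximizes the average gain under the worst-case prior probability
$C = \{c_1, c_2\}$: two categories
$D = \{d_1, d_2, d_3\}$: three possible measurements
$P(d \mid c)$: conditional probability of measured $d$, given category $c$
$f_1, \ldots, f_8$: eight possible deterministic decision rules
$e$: gain matrix with 0s off the diagonal and 1s on the diagonal
$E[e \mid c; f]$: conditional gain given category $c$ under decision rule $f$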
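
The search over the eight deterministic rules can be sketched by brute force. The conditional probabilities below are hypothetical, not the values of the textbook examples, and the sketch only searches deterministic rules; a probabilistic rule may achieve a higher minimum conditional gain.

```python
from itertools import product

categories = ["c1", "c2"]
measurements = ["d1", "d2", "d3"]

# Hypothetical conditional probabilities of each measurement given the category
cond = {
    "c1": {"d1": 0.2, "d2": 0.3, "d3": 0.5},
    "c2": {"d1": 0.6, "d2": 0.3, "d3": 0.1},
}


def conditional_gains(rule):
    """With the identity gain matrix, the conditional gain given category c is
    the probability that a unit of category c is assigned to c."""
    return {
        c: sum(cond[c][d] for d in measurements if rule[d] == c)
        for c in categories
    }


# All 2**3 = 8 deterministic rules: one category per measurement
rules = [
    dict(zip(measurements, assignment))
    for assignment in product(categories, repeat=len(measurements))
]

# Deterministic maximin rule: largest worst-case conditional gain
maximin_rule = max(rules, key=lambda r: min(conditional_gains(r).values()))
print(maximin_rule, conditional_gains(maximin_rule))
```

Because the expected gain for any prior is a weighted average of the conditional gains, the smallest conditional gain is the expected gain under the worst prior.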

=====Example 4.3=====
: yield minimum expected conditional gain of 0.5
: deterministic maximin rules
: maximize the minimum expected conditional gain
expected gain that the eight possible decision rules yield
=====Fig. 4.4=====
convex set of the expected conditional gains possible for any decision rule
=====Fig. 4.5=====

=====Example 4.4=====
: maximin rule yielding minimum expected gain of
expected gains as a function of prior probability for various decision rules
=====Fig. 4.6=====
convex set of expected conditional gains =====Fig. 4.7=====

=====Example 4.5=====
expected gain as a function of prior probability for various decision rules
=====Fig. 4.8=====
convex set of the conditional gains possible for any decision rule
=====Fig. 4.9=====

4.6 Decision Rule Error: Misidentification/False Identification
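misidentification error for category $c_k$: probability that a unit with true category $c_k$ is assigned another category, with $P(t, a)$ the joint probability of true $t$ and assigned $a$
$$\alpha_k = \frac{\sum_{j \ne k} P(c_k, c_j)}{\sum_{j} P(c_k, c_j)}$$

false-identification error for category $c_k$: probability that a unit assigned $c_k$ has another true category
$$\beta_k = \frac{\sum_{j \ne k} P(c_j, c_k)}{\sum_{j} P(c_j, c_k)}$$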
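
Both error types can be computed from a table of joint probabilities of true and assigned category. The sketch below assumes the table is a dictionary keyed by (true, assigned) pairs; the labels and numbers are illustrative assumptions.

```python
def misidentification_error(joint, k):
    """Probability that a unit whose true category is k is assigned another category."""
    true_k = sum(p for (t, a), p in joint.items() if t == k)
    wrong = sum(p for (t, a), p in joint.items() if t == k and a != k)
    return wrong / true_k


def false_identification_error(joint, k):
    """Probability that a unit assigned category k has a different true category."""
    assigned_k = sum(p for (t, a), p in joint.items() if a == k)
    wrong = sum(p for (t, a), p in joint.items() if a == k and t != k)
    return wrong / assigned_k


# Hypothetical joint probabilities keyed by (true, assigned)
joint = {("g", "g"): 0.81, ("g", "b"): 0.09, ("b", "g"): 0.02, ("b", "b"): 0.08}

print(misidentification_error(joint, "g"))
print(false_identification_error(joint, "g"))
```

The first error normalizes by the row of the true category, the second by the column of the assigned category, so the two generally differ.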

=====p. 129=====

4.7 Reserving Judgement
reserved judgement: important technique to control the error rate
reserved judgement: decision rule may withhold judgement for some measurements
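
One simple way to reserve judgement is to assign a category only when its probability given the measurement reaches a threshold. This thresholding scheme is an assumption made for illustration and not necessarily the formulation used in the text; the names `RESERVED` and `decide_with_reservation` are hypothetical.

```python
RESERVED = None  # marker meaning "no category assigned"


def decide_with_reservation(posterior, threshold):
    """posterior: dict mapping category to its probability given the measurement.

    Returns the most probable category, or RESERVED when that probability
    is below the threshold.
    """
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= threshold else RESERVED


# Hypothetical posteriors for two measurements
confident = decide_with_reservation({"c1": 0.90, "c2": 0.10}, threshold=0.8)
ambiguous = decide_with_reservation({"c1": 0.55, "c2": 0.45}, threshold=0.8)
print(confident, ambiguous)
```

Raising the threshold lowers the error rate on the units that are classified, at the cost of withholding judgement on more units.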
=====ITRI, MIRL, plate=====

4.8 Nearest Neighbor Rule
nearest neighbor rule: assigns a pattern the category of the closest vector in the training set
chief difficulty: computational complexity of the brute-force nearest neighbor algorithm is proportional to the number of patterns in the training set
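
A brute-force version of the rule can be sketched as follows, assuming Euclidean distance and a training set stored as (vector, category) pairs. The distance choice and the sample data are assumptions for illustration.

```python
import math


def nearest_neighbor(training_set, pattern):
    """training_set: list of (vector, category) pairs.

    Scans every training vector once, so the cost per query grows
    linearly with the size of the training set.
    """
    _, category = min(training_set, key=lambda item: math.dist(item[0], pattern))
    return category


# Hypothetical two-dimensional training patterns
training_set = [
    ((0.0, 0.0), "a"),
    ((1.0, 0.5), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 4.5), "b"),
]

print(nearest_neighbor(training_set, (1.0, 1.0)))
print(nearest_neighbor(training_set, (4.0, 4.5)))
```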

4.9 A Binary Decision Tree Classifier
decision tree classifier: assigns a category by a hierarchical decision procedure
typical binary decision tree classifier
=====Fig. 4.10=====
three major problems in constructing a decision tree classifier

• choosing tree structure
• choosing features used at each nonterminal node
• choosing decision rule at each nonterminal node
five decision rules at each nonterminal node:
• thresholding the measurement component
• Fisher's linear decision rule
• Bayes quadratic decision rule
• Bayes linear decision rule
• linear decision rule from the first principal component
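
The simplest of these node rules, thresholding a single measurement component, is enough to sketch how a binary tree classifies a pattern. The tuple-based tree layout, the thresholds, and the labels below are assumptions for illustration; tree construction is not shown.

```python
def classify(tree, x):
    """tree is either a category label (leaf) or a tuple
    (feature_index, threshold, left_subtree, right_subtree).

    At each nonterminal node, go left when the selected measurement
    component is at most the threshold, otherwise go right.
    """
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


# Hypothetical tree: first split on component 0, then on component 1
tree = (0, 2.0, "small", (1, 0.5, "dark", "bright"))

print(classify(tree, (1.0, 0.9)))
print(classify(tree, (3.0, 0.2)))
print(classify(tree, (3.0, 0.8)))
```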

4.10 Decision Rule Error Estimation
once a decision rule is constructed: important to characterize its performance by its error rates
training data set: must be independent of testing data set
hold-out method: one common error estimation technique
hold-out method: divide total data set in half
hold-out method: one half to construct decision rule, other half to test it
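
The hold-out method can be sketched as follows. The random split, the fixed seed, and the one-dimensional nearest-mean classifier used as the rule under test are assumptions chosen to keep the example short.

```python
import random


def hold_out_error(data, train, seed=0):
    """data: list of (measurement, category) pairs.
    train: function that builds a classifier from a list of such pairs.

    Splits the data in half, builds the rule on one half, and returns
    the fraction of the other half that is misclassified.
    """
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    train_half, test_half = shuffled[:half], shuffled[half:]
    classify = train(train_half)
    errors = sum(1 for x, c in test_half if classify(x) != c)
    return errors / len(test_half)


def train_nearest_mean(samples):
    """Hypothetical 1-D classifier: assign the category with the closest mean."""
    sums, counts = {}, {}
    for x, c in samples:
        sums[c] = sums.get(c, 0.0) + x
        counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


# Hypothetical data: two overlapping groups of scalar measurements
data = [(float(x), "low") for x in range(0, 12)] + [(float(x), "high") for x in range(8, 20)]
print(hold_out_error(data, train_nearest_mean))
```

Because the testing half is never seen during construction, the estimate is not biased by the rule having been fitted to the same samples.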

4.11 Neural Networks
neural network: set of units, each taking a linear combination of its input values
linear combination: goes through a nonlinear function, e.g. a threshold
neural network: has a training algorithm that adjusts weights from observed responses
neural network: reinforcement algorithms or back propagation used to change weights
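
A single unit and a small network of such units can be sketched as follows. The weights are set by hand to compute exclusive-or; this only illustrates the forward computation and is not a training algorithm. The function names and weights are assumptions for illustration.

```python
def threshold_unit(weights, bias, inputs):
    """Linear combination of the inputs followed by a threshold nonlinearity."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else 0


def xor_network(x1, x2):
    """Two hidden units feeding one output unit, with hand-picked weights."""
    h_or = threshold_unit([1.0, 1.0], -0.5, [x1, x2])   # fires if either input is 1
    h_and = threshold_unit([1.0, 1.0], -1.5, [x1, x2])  # fires only if both are 1
    # Output fires when the OR unit is on and the AND unit is off
    return threshold_unit([1.0, -1.0], -0.5, [h_or, h_and])


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

A training algorithm such as back propagation would replace the hand-picked weights with weights adjusted from observed responses, which typically requires a differentiable nonlinearity in place of the hard threshold.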
neural network literature extensive, course by Prof. Cheng-Yuan Liou

4.12 Summary
pattern recognition literature extensive, course by Prof. Yi-Ping Hung

2001-09-19