Chapter 4 Statistical Pattern Recognition

4.1 Introduction

units of statistical pattern recognition: image regions, projected segments

each unit: has associated measurement vector

decision rule: designed optimally to assign unit to class or category

statistical pattern recognition techniques include

- feature selection and extraction techniques
- decision rule construction techniques
- techniques for estimating decision rule error

4.2 Bayes Decision Rules: Maximum Utility Model for Pattern Discrimination

In the simple pattern discrimination or pattern identification process, a unit
is observed or measured and a category assignment is made that names or
classifies the unit as a type of object.

unit category assignment: made solely on observed measurement (pattern)

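three distinct facts characterize the situation

- $c_k \in C$: assigned category from the set of categories $C = \{c_1, \dots, c_K\}$
- $c_j \in C$: true category identification from the same set $C$
- $d \in D$: observed measurement from a set of measurements $D$

$P(c_j, c_k, d)$: probability of the event that the true category is $c_j$, the assigned category is $c_k$, and the measurement is $d$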

4.2.1 Economic Gain Matrix

making category assignments: carries economic consequences

$e(c_j, c_k)$: economic gain or utility when the true category is $c_j$ and the assigned category is $c_k$

decision rule: chosen so that the resulting expected utility is highest

identity gain matrix: the optimal decision rule is right the largest possible fraction of the time

economic gain matrix for jet fan blade inspection:

- each test costs $10
- positive decision (crack reported): the $500 fan blade has to be discarded
- real crack but negative decision: $50,000,000 loss from an airplane crash

=====Table 4.1=====

decision rule optimized for this matrix: differs from the one optimized for the identity matrix
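Because Table 4.1 is not reproduced here, the following sketch rebuilds a fan-blade gain matrix from the three stated costs, under assumptions made only for this illustration: each gain is the negative of the total cost, the $10 test is always paid, a positive decision always discards the $500 blade, and the crash cost is incurred only when a real crack is missed. It compares the best assignment under that matrix with the best assignment under the identity matrix for a few hypothetical values of `p_crack`, the probability of a crack given the measurement.

```python
# Gains (negative costs) rebuilt from the stated costs; an assumed
# reconstruction, not Table 4.1 itself. Keys: (true category, assigned category).
FAN_BLADE_GAIN = {
    ("crack", "crack"): -10 - 500,
    ("no crack", "crack"): -10 - 500,
    ("crack", "no crack"): -10 - 50_000_000,
    ("no crack", "no crack"): -10,
}
IDENTITY_GAIN = {
    ("crack", "crack"): 1, ("no crack", "crack"): 0,
    ("crack", "no crack"): 0, ("no crack", "no crack"): 1,
}

def best_assignment(gain, p_crack):
    """Assignment with the highest expected gain, given P(crack | measurement)."""
    posterior = {"crack": p_crack, "no crack": 1.0 - p_crack}

    def expected(assigned):
        return sum(posterior[t] * gain[(t, assigned)] for t in posterior)

    return max(["crack", "no crack"], key=expected)

for p in (1e-6, 1e-4, 0.3):
    print(p, best_assignment(FAN_BLADE_GAIN, p), best_assignment(IDENTITY_GAIN, p))
```

Under these assumptions the fan-blade matrix reports a crack once `p_crack` exceeds 500 / 50,000,000 = 0.00001, whereas the identity matrix does so only above 0.5.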

automatic defect-inspection machine: each object is either good ($g$) or bad ($b$)

$P(g, g)$: probability of true good, assigned good

$P(g, b)$: probability of true good, assigned bad

$P(b, g)$: probability of true bad, assigned good

$P(b, b)$: probability of true bad, assigned bad

=====Table 4.2=====

$P(g)$: fraction of good objects manufactured

$P(b)$: fraction of bad objects manufactured

positive gain: profit consequence

negative gain: loss consequence

=====Table 4.3=====

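Expected profit per object, summing gain times probability over the four true/assigned combinations, with $e(t, a)$ the gain for true category $t$ and assigned category $a$:

$E[e] = P(g, g)\,e(g, g) + P(g, b)\,e(g, b) + P(b, g)\,e(b, g) + P(b, b)\,e(b, b)$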

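inspection machine performance specified by conditional probabilities

$P(g, g) / P(g)$: given object good, probability detected as good

$P(g, b) / P(g)$: given object good, probability detected as bad

$P(b, g) / P(b)$: given object bad, probability detected as good

$P(b, b) / P(b)$: given object bad, probability detected as bad

$P(g, b) / P(g)$: false-detection rate, also called false-alarm rate

$P(b, g) / P(b)$: misdetection rate

expected profit factors into three pieces:

$E[e] = P(g)\left[\frac{P(g, g)}{P(g)}\,e(g, g) + \frac{P(g, b)}{P(g)}\,e(g, b)\right] + P(b)\left[\frac{P(b, g)}{P(b)}\,e(b, g) + \frac{P(b, b)}{P(b)}\,e(b, b)\right]$

$P(g)$, $P(b)$: characterize the manufacturing environment

conditional probabilities: characterize inspection machine performance

$e(g, g)$, $e(g, b)$, $e(b, g)$, $e(b, b)$: economic consequences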
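The expected profit per object can be evaluated directly from these three ingredients. The sketch below is a hypothetical helper: it takes $P(g)$, a table of the machine's conditional rates keyed by (true, assigned), and a gain table with the same keys, and sums prior times conditional rate times gain. All numbers are made up for illustration.

```python
def expected_profit(p_good, machine, gain):
    """Expected profit per object, E[e].

    p_good:  P(g), the manufacturing environment; P(b) = 1 - P(g)
    machine: conditional rates keyed by (true, assigned),
             e.g. machine[("g", "b")] = P(g, b) / P(g), the false-alarm rate
    gain:    economic consequences e(true, assigned)
    """
    prior = {"g": p_good, "b": 1.0 - p_good}
    return sum(prior[t] * machine[(t, a)] * gain[(t, a)]
               for t in ("g", "b") for a in ("g", "b"))

# Hypothetical machine: false-alarm rate 0.05, misdetection rate 0.10.
MACHINE = {("g", "g"): 0.95, ("g", "b"): 0.05,
           ("b", "g"): 0.10, ("b", "b"): 0.90}
# Made-up gains: ship good +10, scrap good -2, ship bad -50, scrap bad 0.
GAIN = {("g", "g"): 10.0, ("g", "b"): -2.0, ("b", "g"): -50.0, ("b", "b"): 0.0}

print(expected_profit(0.95, MACHINE, GAIN))
```

Keeping the three pieces as separate arguments makes it easy to ask how profit changes when only the environment, only the machine, or only the economics changes.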

Discussion

operating curve of an automatic defect-detection machine: misdetection rate versus false-alarm rate

by changing the machine's operating point, a lower false-alarm rate can be traded for a higher misdetection rate, or vice versa
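To see the trade-off at work, the sketch below scores a few hypothetical operating points of one machine with the expected-profit formula and keeps the most profitable one. The operating points, gains, and function names are invented for illustration; the point is that the best operating point depends on the manufacturing environment $P(g)$, not on the machine alone.

```python
# Hypothetical operating points (false-alarm rate, misdetection rate) on one
# machine's operating curve: lowering one rate raises the other.
OPERATING_POINTS = [(0.01, 0.30), (0.05, 0.10), (0.15, 0.03), (0.30, 0.01)]

# Made-up gains keyed by (true, assigned): "g" = good, "b" = bad.
GAIN = {("g", "g"): 10.0, ("g", "b"): -2.0, ("b", "g"): -50.0, ("b", "b"): 0.0}

def expected_profit(p_good, false_alarm, misdetection, gain):
    """Sum of gain times joint probability over the four true/assigned outcomes."""
    p_bad = 1.0 - p_good
    return (p_good * (1.0 - false_alarm) * gain[("g", "g")]
            + p_good * false_alarm * gain[("g", "b")]
            + p_bad * misdetection * gain[("b", "g")]
            + p_bad * (1.0 - misdetection) * gain[("b", "b")])

def best_operating_point(p_good):
    """Operating point with the highest expected profit per object."""
    return max(OPERATING_POINTS,
               key=lambda point: expected_profit(p_good, *point, GAIN))

print(best_operating_point(0.95))  # mostly good parts
print(best_operating_point(0.50))  # many bad parts
```

With these made-up numbers the best point moves from (0.05, 0.10) to (0.15, 0.03) as $P(g)$ drops from 0.95 to 0.5: when bad parts are common, accepting more false alarms to catch them pays off.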

=====Example 4.1=====

=====Table 4.4=====

=====Table 4.5=====

=====Example 4.2=====

=====Table 4.6=====


4.2.2 Decision Rule Construction

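$P(c_j, c_k, d)$: probability that the true category is $c_j$, the assigned category is $c_k$, and the measurement is $d$

average economic gain

$E[e] = \sum_{j=1}^{K} \sum_{k=1}^{K} e(c_j, c_k)\, P(c_j, c_k)$, where $P(c_j, c_k) = \sum_{d \in D} P(c_j, c_k, d)$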

=====Fig. 4.1=====

when the economic gain matrix is the identity matrix, the expected gain is the probability of correct assignment, $E[e] = \sum_{j=1}^{K} P(c_j, c_j)$

fair game assumption: decision rule uses only measurement data in assignment

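$P(c_k \mid c_j, d)$: conditional probability of assignment $c_k$ given true category $c_j$ and observed measurement $d$

$P(c_k \mid d)$: conditional probability of making assignment $c_k$ given measurement $d$

fair game assumption says $P(c_k \mid c_j, d) = P(c_k \mid d)$

$f(c_k \mid d) = P(c_k \mid d)$: completely defines the decision rule

deterministic decision rule: for each $d$, $f(c_k \mid d) = 1$ for exactly one $c_k$ and $f(c_k \mid d) = 0$ otherwise

decision rules that are not deterministic: called probabilistic, nondeterministic, or stochastic

expected value of economic gain depends on the decision rule: under the fair game assumption $P(c_j, c_k, d) = f(c_k \mid d)\,P(c_j, d)$, so

$E[e; f] = \sum_{j=1}^{K} \sum_{k=1}^{K} \sum_{d \in D} e(c_j, c_k)\, f(c_k \mid d)\, P(c_j, d)$

Bayes decision rules: maximize expected economic gain

Bayes decision rule satisfies: $f(c_k \mid d) > 0$ only if $\sum_{j=1}^{K} e(c_j, c_k)\, P(c_j, d) \ge \sum_{j=1}^{K} e(c_j, c_m)\, P(c_j, d)$ for every $c_m \in C$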

=====Fig. 4.2=====

=====Fig. 4.3=====
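A minimal sketch of a deterministic Bayes rule for a discrete measurement space, assuming the joint probabilities $P(c_j, d)$ are available as a table `joint[j][d]` and the gain matrix as `gain[j][k]` $= e(c_j, c_k)$. For each measurement it picks an assignment with the largest $\sum_j e(c_j, c_k) P(c_j, d)$, which is the Bayes condition on $f(c_k \mid d)$; ties go to the lowest index. The tables are hypothetical.

```python
def bayes_rule(joint, gain):
    """Deterministic Bayes rule for a discrete measurement space.

    joint[j][d]: P(c_j, d), true category j and measurement d
    gain[j][k]:  e(c_j, c_k), true category j and assigned category k
    Returns rule[d] = index k of the category assigned to measurement d.
    """
    n_cat, n_meas = len(gain), len(joint[0])
    rule = []
    for d in range(n_meas):
        # sum_j e(c_j, c_k) P(c_j, d) for every candidate assignment k
        scores = [sum(gain[j][k] * joint[j][d] for j in range(n_cat))
                  for k in range(n_cat)]
        rule.append(max(range(n_cat), key=scores.__getitem__))  # ties -> lowest k
    return rule

def expected_gain(joint, gain, rule):
    """E[e; f] = sum_j sum_d e(c_j, f(d)) P(c_j, d) for a deterministic rule f."""
    return sum(gain[j][rule[d]] * joint[j][d]
               for j in range(len(joint)) for d in range(len(joint[0])))

# Hypothetical joint probabilities P(c_j, d): 2 categories x 3 measurements.
JOINT = [[0.30, 0.15, 0.05],
         [0.05, 0.10, 0.35]]
IDENTITY = [[1, 0], [0, 1]]
ASYMMETRIC = [[1, 0], [-4, 1]]  # assigning c_1 to a true c_2 costs 4

for gain in (IDENTITY, ASYMMETRIC):
    rule = bayes_rule(JOINT, gain)
    print(rule, expected_gain(JOINT, gain, rule))
```

With the identity matrix the rule picks the most probable category for each measurement; the asymmetric matrix, which penalizes assigning $c_1$ to a true $c_2$, moves the second measurement over to $c_2$.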

continuous measurement space rather than discrete: sums over measurements $d$ become integrals and $P(c_j, d)$ becomes a probability density in $d$

=====p. 108, p. 109=====

4.3 Prior Probability

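$P(d \mid c_j)$: conditional probability of measurement $d$ given true category $c_j$

$P(c_j)$: prior probability that the true category is $c_j$, so that $P(c_j, d) = P(d \mid c_j)\,P(c_j)$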

4.4 Economic Gain Matrix and the Decision Rule

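identity economic gain matrix: the Bayes rule maximizes the probability of correct classification

correct assignment gains 0, incorrect assignment loses 1: same optimal rule, because this matrix is the identity minus 1 in every entry, which lowers the expected gain of every decision rule by exactly 1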
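The claim can be checked numerically. In the sketch below, with hypothetical likelihoods $P(d \mid c_j)$ and priors $P(c_j)$, the joint table is built as $P(d \mid c_j) P(c_j)$ and the Bayes rule is computed under both the identity matrix and the 0/-1 matrix; both pick, for every measurement, the category with the largest $P(d \mid c_j) P(c_j)$.

```python
def joint_from_prior(likelihood, prior):
    """P(c_j, d) = P(d | c_j) P(c_j), with likelihood[j][d] = P(d | c_j)."""
    return [[p * prior[j] for p in likelihood[j]] for j in range(len(prior))]

def bayes_rule(joint, gain):
    """Assign each measurement d to a k maximizing sum_j e(c_j, c_k) P(c_j, d)."""
    n = len(gain)
    return [max(range(n), key=lambda k: sum(gain[j][k] * joint[j][d] for j in range(n)))
            for d in range(len(joint[0]))]

# Hypothetical likelihoods P(d | c_j) (3 categories x 4 measurements) and priors.
LIKELIHOOD = [[0.6, 0.2, 0.1, 0.1],
              [0.1, 0.5, 0.3, 0.1],
              [0.1, 0.1, 0.2, 0.6]]
PRIOR = [0.5, 0.3, 0.2]
K, N_MEAS = len(PRIOR), len(LIKELIHOOD[0])
IDENTITY = [[1 if j == k else 0 for k in range(K)] for j in range(K)]
ZERO_MINUS_ONE = [[0 if j == k else -1 for k in range(K)] for j in range(K)]

joint = joint_from_prior(LIKELIHOOD, PRIOR)
map_rule = [max(range(K), key=lambda j: joint[j][d]) for d in range(N_MEAS)]
print(bayes_rule(joint, IDENTITY), bayes_rule(joint, ZERO_MINUS_ONE), map_rule)
```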

4.5 Maximin Decision Rule

maximin decision rule: maximizes the minimum, over all prior probabilities, of the expected gain

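$C = \{c_1, c_2\}$: two categories

$D = \{d_1, d_2, d_3\}$: three possible measurements

$P(d \mid c_j)$: conditional probability of measurement $d$, given category $c_j$

$f_1, \dots, f_8$: eight possible deterministic decision rules, one for each way of assigning $c_1$ or $c_2$ to each of the three measurements ($2^3 = 8$)

$e(c_j, c_k) = 1$ if $j = k$ and $0$ otherwise: gain matrix with 0s off the diagonal and 1s on the diagonal

$E[e \mid c_j; f] = \sum_{k} e(c_j, c_k) \sum_{d \in D} f(c_k \mid d)\, P(d \mid c_j)$: conditional gain, the expected gain given that the true category is $c_j$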

=====Example 4.3=====

: yield minimum expected conditional gain of 0.5

: deterministic maximin rules

: maximize the minimum expected conditional gain

expected gain that the eight possible decision rules yield

=====Fig. 4.4=====

convex set of the expected conditional gains possible for any decision rule

=====Fig. 4.5=====
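The sketch below mimics this two-category, three-measurement setting with hypothetical values of $P(d \mid c_j)$ (not the ones used in Example 4.3). It enumerates the eight deterministic rules under the identity gain matrix, finds the best worst-case conditional gain among them, and then searches mixtures of pairs of rules, that is, randomized rules lying in the convex set of conditional gains. With two categories every boundary point of that convex set is a mixture of at most two deterministic rules, so checking pairs is enough.

```python
from itertools import product

# Hypothetical class-conditional probabilities P(d | c_j), not Example 4.3's values.
LIKELIHOOD = [[0.5, 0.3, 0.2],   # P(d_1), P(d_2), P(d_3) given c_1
              [0.1, 0.3, 0.6]]   # P(d_1), P(d_2), P(d_3) given c_2

def conditional_gains(rule):
    """Identity gain: E[e | c_j; f] = P(correct assignment | true c_j)."""
    return tuple(sum(p for d, p in enumerate(LIKELIHOOD[j]) if rule[d] == j)
                 for j in range(2))

RULES = list(product([0, 1], repeat=3))  # the eight deterministic rules
GAINS = {rule: conditional_gains(rule) for rule in RULES}

best_det = max(RULES, key=lambda r: min(GAINS[r]))

def best_mixture(a, b):
    """Maximize min(g_1, g_2) over mixtures (1 - t) * a + t * b, 0 <= t <= 1."""
    candidates = [0.0, 1.0]
    denom = (b[0] - a[0]) - (b[1] - a[1])
    if denom != 0:
        t = (a[1] - a[0]) / denom  # where the two conditional gains are equal
        if 0.0 <= t <= 1.0:
            candidates.append(t)

    def value(t):
        return min(a[i] + t * (b[i] - a[i]) for i in range(2))

    t_best = max(candidates, key=value)
    return value(t_best), t_best

best_random = max((best_mixture(GAINS[r], GAINS[s]) + (r, s)
                   for r in RULES for s in RULES), key=lambda x: x[0])
print("deterministic maximin:", best_det, min(GAINS[best_det]))
print("randomized maximin:", best_random)
```

With these numbers the best deterministic rule guarantees 0.6, while mixing the rules that assign $(c_1, c_1, c_2)$ and $(c_1, c_2, c_2)$ to $(d_1, d_2, d_3)$ guarantees 0.7, which is why a maximin rule may need to be randomized.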

=====Example 4.4=====

: maximin rule yielding minimum expected gain of

expected gains as a function of prior probability for various decision rules

=====Fig. 4.6=====

convex set of expected conditional gains
=====Fig. 4.7=====

=====Example 4.5=====

expected gain as a function of prior probability for various decision rules

=====Fig. 4.8=====

convex set of the conditional gains possible for any decision rule

=====Fig. 4.9=====


4.6 Decision Rule Error: Misidentification/False Identification

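misidentification error for category $c_j$: the true category is $c_j$ but the assigned category is not $c_j$

false-identification error for category $c_k$: the assigned category is $c_k$ but the true category is not $c_k$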
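Both error types can be read off a table of joint probabilities $P(c_j, c_k)$ of true category $c_j$ and assigned category $c_k$. The sketch below normalizes them as conditional rates, misidentification by the probability of the true category and false identification by the probability of the assigned category; that normalization, the function name, and the numbers are assumptions for illustration.

```python
def error_rates(confusion):
    """confusion[j][k] = P(c_j, c_k): true category c_j, assigned category c_k.

    Returns two lists indexed by category:
      misidentification[j]    = P(assigned != c_j | true = c_j)
      false_identification[k] = P(true != c_k | assigned = c_k)
    """
    n = len(confusion)
    p_true = [sum(confusion[j]) for j in range(n)]
    p_assigned = [sum(confusion[j][k] for j in range(n)) for k in range(n)]
    misidentification = [1.0 - confusion[j][j] / p_true[j] for j in range(n)]
    false_identification = [1.0 - confusion[k][k] / p_assigned[k] for k in range(n)]
    return misidentification, false_identification

# Hypothetical joint probabilities for two categories.
CONFUSION = [[0.40, 0.10],
             [0.05, 0.45]]
print(error_rates(CONFUSION))
```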

=====p. 129=====

4.7 Reserving Judgement

reserved judgement: important technique to control error rate

reserved judgement: the decision rule may withhold judgement (make no assignment) for some measurements
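One common way to reserve judgement, assumed here rather than taken from the text, is a reject threshold on the posterior probability: assign the most probable category only when its posterior $P(c_j \mid d)$ reaches a threshold, and otherwise make no assignment. The hypothetical sketch below also tallies how raising the threshold lowers the error probability at the cost of reserving judgement more often.

```python
def assign_or_reserve(posterior, threshold):
    """Index of the most probable category, or None to reserve judgement.

    posterior[j] = P(c_j | d); threshold is a hypothetical tuning parameter.
    """
    best = max(range(len(posterior)), key=lambda j: posterior[j])
    return best if posterior[best] >= threshold else None

def error_and_reserve_rates(joint, threshold):
    """joint[j][d] = P(c_j, d), assumed positive for every d. Returns (P(error), P(reserve))."""
    error = reserve = 0.0
    for d in range(len(joint[0])):
        p_d = sum(joint[j][d] for j in range(len(joint)))
        posterior = [joint[j][d] / p_d for j in range(len(joint))]
        k = assign_or_reserve(posterior, threshold)
        if k is None:
            reserve += p_d            # no assignment made for this measurement
        else:
            error += p_d - joint[k][d]  # probability mass of the wrong categories
    return error, reserve

# Hypothetical joint probabilities P(c_j, d): 2 categories x 3 measurements.
JOINT = [[0.30, 0.15, 0.05],
         [0.05, 0.10, 0.35]]
for threshold in (0.5, 0.8):
    print(threshold, error_and_reserve_rates(JOINT, threshold))
```

With this table, a threshold of 0.5 assigns every measurement and errs with probability 0.2; raising it to 0.8 reserves judgement on $d_2$ (probability 0.25) and halves the error probability to 0.1.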

=====ITRI, MIRL, plate=====

4.8 Nearest Neighbor Rule

nearest neighbor rule: assigns a pattern to the category of the closest vector in the training set

chief difficulty: the brute-force nearest neighbor algorithm has computational complexity proportional to the number of patterns in the training set
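A brute-force nearest neighbor rule makes the cost explicit: every query is compared against every stored pattern. The sketch assumes Euclidean distance and a training set given as (vector, label) pairs; both are illustrative choices.

```python
import math

def nearest_neighbor(training, x):
    """Brute-force 1-NN: return the label of the closest training vector.

    training is a list of (vector, label) pairs; every pair is examined, so one
    query costs time proportional to the number of training patterns.
    """
    best_label, best_dist = None, math.inf
    for vector, label in training:
        dist = math.dist(vector, x)  # Euclidean distance, Python 3.8+
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical 2-D training set.
TRAINING = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
print(nearest_neighbor(TRAINING, (0.8, 0.3)), nearest_neighbor(TRAINING, (5.2, 4.9)))
```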

4.9 A Binary Decision Tree Classifier

decision tree classifier: assigns a category by a hierarchical decision procedure

typical binary decision tree classifier

=====Fig. 4.10=====

three major problems in constructing a decision tree classifier

- choosing tree structure
- choosing features used at each nonterminal node
- choosing decision rule at each nonterminal node

possible decision rules at a nonterminal node

- thresholding a measurement component
- Fisher's linear decision rule
- Bayes quadratic decision rule
- Bayes linear decision rule
- linear decision rule from the first principal component
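A small hand-built binary tree shows the hierarchical procedure, using only the simplest node rule listed above, thresholding one measurement component. The tree structure, feature indices, thresholds, and category names are hypothetical; choosing them from data is exactly the construction problem described above.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    category: str

@dataclass
class Node:
    feature: int      # which measurement component this node examines
    threshold: float  # node decision rule: threshold that component
    left: "Tree"      # branch taken when x[feature] <= threshold
    right: "Tree"     # branch taken otherwise

Tree = Union[Leaf, Node]

def classify(tree, x):
    """Walk from the root to a leaf, applying one threshold test per node."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.category

# Hypothetical tree: test component 0 at the root, component 1 below it.
TREE = Node(0, 2.0, Leaf("c1"), Node(1, 1.0, Leaf("c2"), Leaf("c3")))
print(classify(TREE, (1.5, 3.0)), classify(TREE, (3.0, 0.5)), classify(TREE, (3.0, 2.0)))
```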

4.10 Decision Rule Error Estimation

once a decision rule is constructed: important to characterize its performance by its error rates

training data set: must be independent of testing data set

hold-out method: one common error estimation technique

hold-out method: divide total data set in half

hold-out method: one half to construct decision rule, other half to test it
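A minimal hold-out sketch, assuming a nearest-mean classifier as a stand-in for whatever decision rule is being evaluated: the data are shuffled, the rule is built from one half, and the error rate is counted on the other half only. The synthetic two-cluster data and all function names are hypothetical.

```python
import random

def nearest_mean_train(data):
    """Hypothetical stand-in classifier: one mean vector per category."""
    sums, counts = {}, {}
    for x, label in data:
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def nearest_mean_classify(means, x):
    """Assign x to the category whose mean is closest (squared Euclidean)."""
    return min(means, key=lambda label: sum((a - b) ** 2 for a, b in zip(means[label], x)))

def hold_out_error(data, seed=0):
    """Shuffle, build the rule on one half, and count errors on the other half."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]
    means = nearest_mean_train(train)
    errors = sum(nearest_mean_classify(means, x) != label for x, label in test)
    return errors / len(test)

# Synthetic, well-separated 2-D clusters for illustration.
rng = random.Random(1)
DATA = ([((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
        + [((rng.gauss(6, 1), rng.gauss(6, 1)), "b") for _ in range(50)])
print(hold_out_error(DATA))
```

Because the test half never influences the means, the reported rate is an honest estimate for this rule; the price is that only half the data is used to build it.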

4.11 Neural Networks

neural network: set of units taking linear combination of input values

linear combination: passed through a nonlinear function, e.g. a threshold

neural network: has a training algorithm in which responses to training inputs are observed

neural network: reinforcement algorithms or back propagation used to change the weights
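The sketch below shows one unit of the kind described, a linear combination of inputs passed through a hard threshold, trained with the classical perceptron rule. The perceptron rule is a simpler weight-change algorithm than back propagation, which needs a differentiable nonlinearity and is not shown; the AND data set, learning rate, and epoch count are illustrative choices.

```python
def threshold_unit(weights, bias, x):
    """One unit: linear combination of inputs passed through a hard threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, epochs=20, rate=1.0):
    """Classical perceptron rule: observe each response, adjust weights on errors."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - threshold_unit(weights, bias, x)  # -1, 0, or +1
            if error:
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                bias += rate * error
    return weights, bias

# Logical AND is linearly separable, so the perceptron rule converges.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print(w, b, [threshold_unit(w, b, x) for x, _ in AND])
```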

neural network literature is extensive; see the course by Prof. Cheng-Yuan Liou

4.12 Summary

pattern recognition literature is extensive; see the course by Prof. Yi-Ping Hung


2001-09-19