Knowledge-Based Methods in Image Processing and Pattern Recognition
Ulrich Bodenhofer, Johannes Kepler Universität Linz

Unit 12
Introduction to Classification
Motivation
Classification is an essential component of pattern recognition: based on some characteristic feature values, we may want to determine

1. whether a given object belongs to a specific class or not (checking class hypotheses), or
2. to which class of a set of known classes a given object belongs (assigning to a class).
Simple Examples
Given: images of flowers from which we are able to compute the features color, length and width of blossom, and number of petals.
Goal: design of a classifier that guesses from these features which kind of flower is shown in the picture.

Given: images of prints from which we are able to compute the parts that differ from a reference print, along with their size, contrast, and shape.
Goal: design of a classifier that assigns a quality class to each print.
A Few Application Scenarios
- Fingerprint recognition
- Character recognition
- Face recognition
- Quality control (object OK or NG, classification of type or severity of defects)
- Scene analysis (i.e. separation of objects)
- ...
Basic Setup
We assume that we are given n objects, each of which is represented by a p-dimensional feature vector (i = 1, ..., n):

$$x_i \in X_1 \times \cdots \times X_p$$

For simplicity, we will not distinguish between the objects and their feature vectors in the following.

If X_j is a finite set of labels, we speak of a categorical variable/feature. If X_j = R, a real interval, etc., we speak of a numerical variable/feature.
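
As a small illustration of this setup, consider the flower example from above with p = 4. The concrete feature spaces and values below are made-up assumptions, not part of the original example:

```python
# A hypothetical feature vector for the flower example (p = 4).
# Assumed feature spaces for illustration only:
#   X1 = {"red", "white", "yellow"}  (categorical: blossom color)
#   X2, X3 = positive reals          (numerical: blossom length/width in cm)
#   X4 = non-negative integers       (numerical: number of petals)

x1 = ("white", 2.3, 1.8, 21)  # one object, represented by its feature vector
```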
The Task of Classification (Simplistic)
A classification task is usually concerned with finding a mapping

$$f\colon X_1 \times \cdots \times X_p \longrightarrow Y, \qquad x \longmapsto f(x),$$

where Y is a finite set of labels (i.e. a categorical variable). In other words, f is a classification function that assigns classes to objects.
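
A minimal sketch of such a mapping, reusing the hypothetical flower feature space from above; the class labels and threshold values are invented for illustration:

```python
from typing import Tuple

# Assumed feature space: (color, blossom length in cm,
# blossom width in cm, number of petals)
FeatureVector = Tuple[str, float, float, int]

def classify_flower(x: FeatureVector) -> str:
    """A hand-crafted classification function f: X1 x ... x Xp -> Y,
    here with Y = {"daisy", "sunflower", "unknown"}."""
    color, length, width, petals = x
    if color == "white" and petals >= 20:
        return "daisy"
    if color == "yellow" and length > 4.0:
        return "sunflower"
    return "unknown"

print(classify_flower(("white", 2.3, 1.8, 21)))  # -> daisy
```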
Classification Paradigms
- Discriminative approach: the simple variant outlined on the previous slide; each object is assigned to exactly one class, i.e. the output for a given feature vector x is a single class label.
- Probabilistic approach: given an object, the probabilities with which the object belongs to the different classes are computed, i.e. the output for a given feature vector x is a probability distribution over Y.
- Fuzzy approach: given an object, the degrees of membership with which the object belongs to the different classes are computed, i.e. the output for a given feature vector x is a fuzzy set over Y.
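
The sketch below contrasts the three output types for one feature vector; all numbers are made up for illustration:

```python
# Assumed two-class problem Y = {"ok", "defect"}; values are invented.
x = (0.8, 12.0)  # some feature vector

# Discriminative: exactly one class label
y_discriminative = "defect"

# Probabilistic: a probability distribution over Y (sums to 1)
y_probabilistic = {"ok": 0.25, "defect": 0.75}

# Fuzzy: degrees of membership in [0, 1] (need not sum to 1)
y_fuzzy = {"ok": 0.4, "defect": 0.9}
```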
Constructing Classifiers
- Explicit construction: the classification function (distribution or fuzzy function, respectively) is explicitly constructed, e.g. from explicit expert knowledge.
- Data-driven construction: the classification function (distribution or fuzzy function, respectively) is constructed from example data.

In the following, we will only deal with data-driven approaches.
Data-Driven Classification: Basic Setup
We assume that we are given n objects, each of which is represented by a p-dimensional feature vector (i = 1, ..., n),

$$x_i \in X_1 \times \cdots \times X_p,$$

and, for each x_i, we know the correct classification y_i ∈ Y. The data set X, therefore, consists of n tuples

$$\{(x_i, y_i) \mid i = 1, \ldots, n\}.$$
The Task of Data-Driven Classification
Generally, the goal of data-driven classification is to construct a classification function f such that the match between the known classifications y_i and the actually obtained classifications f(x_i) is as good as possible. Typically, we want to minimize the following error function:

$$\frac{1}{n} \sum_{i=1}^{n} d\bigl(y_i, f(x_i)\bigr) \tag{1}$$

where d measures the mismatch between two classifications (e.g. the 0/1 distance for categorical Y).
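
A minimal sketch of error function (1) with the 0/1 distance d(y, y') = 0 if y = y' and 1 otherwise, i.e. the misclassification rate. The toy data set and the threshold classifier are invented placeholders:

```python
# Hypothetical data set: feature vectors (sepal length, petal length)
# with known class labels y_i.
data = [((4.9, 1.4), "setosa"), ((6.1, 4.7), "versicolor"),
        ((5.0, 1.5), "setosa"), ((6.4, 5.3), "virginica")]

def f(x):
    # toy classifier: simple threshold on the second feature
    return "setosa" if x[1] < 2.5 else "versicolor"

def d(y_true, y_pred):
    # 0/1 distance on the label set Y
    return 0 if y_true == y_pred else 1

# error function (1): mean mismatch over all n examples
error = sum(d(y, f(x)) for x, y in data) / len(data)
print(error)  # 0.25: one of the four examples is misclassified
```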
Parametric vs. Non-Parametric Models
It is practically impossible to identify a fully general classification function f. Instead, a restriction to certain classes of classification functions that can be represented in a practically feasible way is necessary.

- Non-parametric models: there is no specific underlying parameter model; the data points/representatives themselves are the parameters fully describing the classifier (see the nearest-neighbor sketch below).
- Parametric models: the set of classification functions under consideration is modeled with parameters outside or exceeding the data space.
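
A minimal sketch of a non-parametric model, using 1-nearest-neighbor as an example: the stored training points themselves are the only "parameters" of the classifier. The data values are hypothetical:

```python
import math

# Hypothetical training data: the classifier is fully described by
# these points; no parameters beyond the data are fitted.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.5), "B")]

def nn_classify(x):
    # assign the class of the closest training point (Euclidean distance)
    _, label = min(train, key=lambda t: math.dist(t[0], x))
    return label

print(nn_classify((1.1, 0.9)))  # -> "A"
```

A parametric counterpart would instead fit, say, the coefficients of a linear decision boundary and could then discard the training data.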
Underfitting vs. Overfitting
Usually, one does not have sufficient prior information about the appropriate underlying model complexity:

- Assuming a simple model may lead to robust (i.e. noise-tolerant) classifiers, but may not capture all details (underfitting).
- Assuming a very complex model may lead to overfitting, i.e. an overly complex, less robust model that takes noise and outliers too much into account.
Validating Classifiers
Some data-driven classification methods directly minimize the error function (1); some follow heuristic search approaches. In any case, the error function (1) is a good measure of the performance of a classification function.

Note that the error function only measures the performance of the classifier on the given data set; it does not tell us whether the model under- or overfits the data.
Assessing the Generalization Performance
A classification function is only good if it also performs well on previously unseen data.

To assess the generalization capability of a classifier directly, it is common to select part of the data for constructing the classification function (training data) and to evaluate the classifier with respect to the remaining data not used for training (test data); analogously, one can use the error function (1) to assess the performance on the test data (generalization error). This strategy is often called the holdout method.
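
A minimal sketch of the holdout method, assuming a hypothetical `fit` placeholder that constructs a classifier from labeled training data:

```python
import random

def holdout(data, fit, test_fraction=0.3, seed=0):
    """Estimate the generalization error (1) with a random train/test split.

    `data` is a list of (x, y) pairs; `fit` is any learning method that
    maps such a list to a classification function (both placeholders)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    f = fit(train)  # construct the classifier from the training data only
    # generalization error: mean 0/1 loss on the held-out test data
    return sum(y != f(x) for x, y in test) / len(test)
```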
Cross Validation (1/2)
The holdout method has the disadvantage that its evaluation can have a high variance: the evaluation may depend heavily on which data points are selected for training and which for testing; hence, the result may be significantly different depending on how the division is made.
Cross Validation (2/2)
k-fold cross validation tries to overcome the shortcomings of the holdout method:

- The data set is divided into k equally large subsets. Then the holdout method is repeated k times, where each time k − 1 subsets are used for training and the remaining subset is used for testing.
- The mean generalization error over the k trials is then an improved measure of the overall performance of the model (see the sketch below).
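
A minimal sketch of k-fold cross validation, reusing the hypothetical `fit` placeholder from the holdout sketch above; it assumes the data set contains at least k examples:

```python
def cross_validate(data, fit, k=5):
    """Mean generalization error (1) over k folds; 0/1 loss on labels."""
    folds = [data[i::k] for i in range(k)]  # k roughly equal subsets
    errors = []
    for i in range(k):
        test = folds[i]  # each fold serves once as test data
        train = [pair for j, fold in enumerate(folds) if j != i
                 for pair in fold]  # the other k - 1 folds form the training data
        f = fit(train)
        errors.append(sum(y != f(x) for x, y in test) / len(test))
    return sum(errors) / k  # mean generalization error over the k trials
```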
White-Box vs. Black-Box Models
- White-box: the parameters allow a detailed analysis of the behavior of the classifier; possibly, even qualitative information can be extracted from the parameters.
- Black-box: the internal representation of the classifier does not allow any qualitative analysis.
Regression
If the goal is to construct a numerical prediction function (i.e. Y is a numerical feature), one speaks of regression. Most concepts from above can be transferred to regression with only obvious modifications. Although our main focus is classification, some of the methods discussed in the following can be used for both classification and regression.

The collective term for classification and regression is predictive modeling.