Knowledge-Based Methods in Image Processing and Pattern Recognition
Unit 12: Introduction to Classification
Ulrich Bodenhofer, Johannes Kepler Universität Linz

Motivation

Classification is an essential component of pattern recognition: based on some characteristic feature values, we may want to determine
1. whether a given object belongs to a specific class or not (checking class hypotheses),
2. to which class out of a set of known classes a given object belongs (assigning to a class).

Simple Examples

Given: images of flowers from which we are able to compute the features color, length and width of blossom, and number of petals.
Goal: design of a classifier that guesses from these features which kind of flower is shown in the picture.

Given: images of prints from which we are able to compute the regions that differ from a reference print, along with their size, contrast, and shape.
Goal: design of a classifier that assigns a quality class to each print.

A Few Application Scenarios

- Fingerprint recognition
- Character recognition
- Face recognition
- Quality control (object OK or NG, classification of type or severity of defects)
- Scene analysis (i.e. separation of objects)
- ...

Basic Setup

We assume that we are given n objects, each of which is represented by a p-dimensional feature vector (i = 1, ..., n):

    x_i ∈ X_1 × · · · × X_p

For simplicity, we will not distinguish between the objects and their feature vectors in the following.
- If X_j is a finite set of labels, we speak of a categorical variable/feature.
- If X_j = ℝ, a real interval, etc., we speak of a numerical variable/feature.

The Task of Classification (Simplistic)

A classification task is usually concerned with finding a mapping

    f : X_1 × · · · × X_p → Y,    x ↦ f(x),

where Y is a finite set of labels (i.e. a categorical variable). In other words, f is a classification function that assigns classes to objects.

Classification Paradigms

- Discriminative approach: the simple variant outlined above; each object is assigned to exactly one class, i.e. the output for a given feature vector x is a single class label.
- Probabilistic approach: given an object, the probabilities with which the object belongs to the different classes are computed, i.e. the output for a given feature vector x is a probability distribution over Y.
- Fuzzy approach: given an object, the degrees of membership with which the object belongs to the different classes are computed, i.e. the output for a given feature vector x is a fuzzy set over Y.
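As a purely illustrative sketch (not part of the original slides), the following Python snippet contrasts the three output types for a toy flower example; the feature names, thresholds, and class labels are made-up assumptions.

    # Minimal sketch of the three classification paradigms for a toy flower example.
    # All feature names, thresholds, and labels are illustrative assumptions.

    Y = ["tulip", "rose", "daisy"]          # finite label set Y

    def discriminative(x):
        """Output: a single class label from Y."""
        if x["petals"] > 20:
            return "daisy"
        return "rose" if x["color"] == "red" else "tulip"

    def probabilistic(x):
        """Output: a probability distribution over Y (values sum to 1)."""
        if x["petals"] > 20:
            return {"tulip": 0.1, "rose": 0.1, "daisy": 0.8}
        return {"tulip": 0.5, "rose": 0.4, "daisy": 0.1}

    def fuzzy(x):
        """Output: a fuzzy set over Y (membership degrees in [0, 1], need not sum to 1)."""
        width = x["blossom_width"]
        return {
            "tulip": max(0.0, min(1.0, (6.0 - width) / 3.0)),
            "rose":  max(0.0, min(1.0, (width - 3.0) / 3.0)),
            "daisy": 0.2,
        }

    x = {"color": "red", "blossom_width": 4.0, "petals": 6}
    print(discriminative(x))   # a single label, e.g. 'rose'
    print(probabilistic(x))    # a distribution, e.g. {'tulip': 0.5, 'rose': 0.4, 'daisy': 0.1}
    print(fuzzy(x))            # membership degrees, e.g. {'tulip': 0.67, 'rose': 0.33, 'daisy': 0.2}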
Constructing Classifiers

- Explicit construction: the classification function (probability distribution or fuzzy classification function, respectively) is explicitly constructed, e.g. from explicit expert knowledge.
- Data-driven construction: the classification function (probability distribution or fuzzy classification function, respectively) is constructed from example data.

In the following, we will only deal with data-driven approaches.

Data-Driven Classification: Basic Setup

We assume that we are given n objects, each of which is represented by a p-dimensional feature vector (i = 1, ..., n):

    x_i ∈ X_1 × · · · × X_p

and, for each x_i, we know the correct classification y_i ∈ Y. The data set X, therefore, consists of n tuples {(x_i, y_i) | i = 1, ..., n}.

The Task of Data-Driven Classification

Generally, the goal of data-driven classification is to construct a classification function such that the match between the known classifications y_i and the actually obtained classifications f(x_i) is as good as possible. Typically, we want to minimize the following error function:

    (1/n) · ∑_{i=1}^{n} d(y_i, f(x_i))    (1)

Parametric vs. Non-Parametric Models

It is practically impossible to identify a completely general classification function f. Instead, a restriction to certain classes of classification functions that can be represented in a practically feasible way is necessary.
- Non-parametric models: there is no specific underlying parameter model; the data points/representatives themselves are the parameters fully describing the classifier.
- Parametric models: the set of classification functions under consideration is described by parameters outside of, or going beyond, the data space.

Underfitting vs. Overfitting

Usually, one does not have sufficient prior information about the appropriate underlying model complexity.
- Assuming a simple model may lead to robust (i.e. noise-tolerant) classifiers, but may not capture all details (underfitting).
- Assuming a very complex model may lead to overfitting, i.e. a too complex, less robust model that takes noise and outliers too much into account.

Validating Classifiers

Some data-driven classification methods directly minimize the error function (1); some follow heuristic search approaches. In any case, the error function (1) is a good measure for the performance of a classification function. Note, however, that the error function only measures the performance of the classifier on the data set; it does not tell us whether the model under- or overfits the data.
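As an illustrative sketch (not from the slides), the error function (1) can be computed as follows; the 0/1 distance d, the toy data set, and the threshold classifier are assumptions chosen for the example.

    # Sketch: empirical error (1) with the 0/1 distance d(y, y') = 0 if y == y', else 1.
    # The data set and classifier below are made-up assumptions for illustration.

    def d(y_true, y_pred):
        """0/1 distance between the known class and the predicted class."""
        return 0 if y_true == y_pred else 1

    def empirical_error(f, data):
        """(1/n) * sum_i d(y_i, f(x_i)) over the data set {(x_i, y_i)}."""
        n = len(data)
        return sum(d(y, f(x)) for x, y in data) / n

    # Toy data set: feature vector (blossom_width,) and class label.
    data = [((2.1,), "tulip"), ((4.8,), "rose"), ((5.2,), "rose"), ((1.9,), "tulip")]

    # Toy threshold classifier on the single numerical feature.
    f = lambda x: "rose" if x[0] > 3.0 else "tulip"

    print(empirical_error(f, data))   # 0.0 for this toy data set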
Assessing the Generalization Performance

A classification function is only good if it also performs well on previously unknown data. To directly assess the generalization capability of a classifier, it is common to select part of the data for constructing the classification function (training data) and to evaluate the classifier on the remaining data not used for training (test data); analogously, one can use the error function (1) to assess the performance on the test data (generalization error). This strategy is often called the holdout method.

Cross Validation (1/2)

The holdout method has the disadvantage that its evaluation can have a high variance: the evaluation may depend heavily on which data points are selected for training and which for testing; hence, the result may differ significantly depending on how the division is made.

Cross Validation (2/2)

k-fold cross validation tries to overcome the shortcomings of the holdout method:
- The data set is divided into k equally large subsets.
- The holdout method is then repeated k times, where in each trial k − 1 subsets are used for training and the remaining subset is used for testing.
- The mean generalization error over the k trials is then an improved measure for the overall performance of the model.

White-Box vs. Black-Box Models

- White-box: the parameters allow a detailed analysis of the behavior of the classifier; possibly, even qualitative information can be extracted from the parameters.
- Black-box: the internal representation of the classifier does not allow any qualitative analysis.

Regression

If the goal is to construct a numerical prediction function (i.e. Y is a numerical feature), one speaks of regression. Most of the concepts from above can be transferred to regression with only obvious modifications. Although our main focus is classification, some of the methods discussed in the following can be used for both classification and regression. The collective term for classification and regression is predictive modeling.
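To make the k-fold cross validation procedure described above concrete, here is a minimal, self-contained Python sketch (not part of the slides); the shuffling, the 0/1 error, and the simple nearest-neighbour classifier are assumptions chosen purely for illustration.

    # Sketch of k-fold cross validation with a simple 1-nearest-neighbour classifier.
    # The classifier, distance, and toy data are illustrative assumptions.
    import random

    def train_1nn(train):
        """'Training' for 1-NN just stores the training tuples (x_i, y_i)."""
        def f(x):
            # Predict the label of the closest stored training point (squared Euclidean distance).
            _, label = min(train, key=lambda xy: sum((a - b) ** 2 for a, b in zip(xy[0], x)))
            return label
        return f

    def error(f, data):
        """Error function (1) with the 0/1 distance."""
        return sum(1 for x, y in data if f(x) != y) / len(data)

    def k_fold_cv(data, k, seed=0):
        """Mean generalization error over k train/test splits."""
        data = data[:]
        random.Random(seed).shuffle(data)
        folds = [data[i::k] for i in range(k)]            # k (roughly) equally large subsets
        errors = []
        for i in range(k):
            test = folds[i]
            train = [xy for j, fold in enumerate(folds) if j != i for xy in fold]
            errors.append(error(train_1nn(train), test))  # holdout repeated k times
        return sum(errors) / k

    # Toy data: (blossom_length, blossom_width) -> flower class.
    data = [((5.1, 3.5), "tulip"), ((4.9, 3.0), "tulip"), ((6.3, 2.5), "rose"),
            ((6.1, 2.8), "rose"), ((5.0, 3.4), "tulip"), ((6.4, 2.9), "rose")]
    print(k_fold_cv(data, k=3))   # mean test error over the 3 folds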