XII Simpósio Brasileiro de Automação Inteligente (SBAI), Natal – RN, 25 a 28 de outubro de 2015

LINEAR REGRESSION BASED ON CORRENTROPY FOR SENSOR CALIBRATION

Joilson B. A. Rêgo*, Aluisio I. R. Fontes†, Adrião D. D. Neto‡, Luiz F. Q. Silveira§, Allan de M. Martins¶

* Instituto de Computação, Universidade de Alagoas, 57072-900, Maceió, AL, Brasil
†‡§ Departamento de Engenharia de Computação e Automação, Universidade Federal do Rio Grande do Norte, 59078-970, Natal, RN, Brasil
¶ Departamento de Engenharia Elétrica, Universidade Federal do Rio Grande do Norte, 59078-970, Natal, RN, Brasil

Emails: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract— Correntropy is a measure of statistical similarity between two random variables. This paper proposes an extension of correntropy to N random variables and defines a new linear regression method based on this extension. Experiments carried out in the presence of impulse noise demonstrate the efficiency of the proposed method, using the sensor calibration problem as a reference.

Keywords— Correntropy, Linear Regression, Sensor Calibration.

1 Introduction

The concept of similarity is fundamental in statistics when the objective is to compare two random variables. Second-order statistics, in the form of correlation and the mean square error (MSE), are probably the most widely used methods to quantify how similar two random variables are (Singh and Principe, 2010). Problem solving based on second-order statistics depends on Gaussianity and linearity in the data, so the presence of impulse noise (outliers) in the measurements makes the estimation error increase significantly. In recent years, data-fitting approaches based on the mean square error have been improved by the inclusion of criteria from information theory, establishing a new field called Information Theoretic Learning (ITL). Santamaria et al. (2006) extended the concept of correlation to a new similarity measure named correntropy, a robust similarity measure between two random variables that uses a symmetric, positive-definite kernel function.

Correntropy is directly related to the probability that two random variables are similar within a sample-space region controlled by a parameter called the kernel size. Adjusting the kernel size provides an efficient mechanism to eliminate outliers.

However, both the MSE and correntropy are similarity measures defined for only two random variables. Furthermore, in the context of correntropy, the kernel size must be adjusted according to the application (Liu et al., 2006). This paper proposes a generalization of correntropy to N random variables, even with different dimensions, using statistical concepts. The new measure was evaluated in a sensor calibration application. The results show good efficiency and a weaker dependence between the application and the kernel-size parameter.

The remainder of the paper is organized as follows. Section 2 presents concepts related to correntropy as well as the theoretical tools used in the development of the proposed method. Section 3 presents the considered problem and the results obtained from the experimental analysis of the linear regression method based on correntropy. Finally, Section 4 presents some final remarks on the proposed approach.

2 Correntropy

Correntropy is a generalization of the correlation measure between random signals, defined as (Santamaria et al., 2006):

V(X, Y) = E_{XY}[G_\sigma(X - Y)] = \iint G_\sigma(x, y)\, p_{X,Y}(x, y)\, dx\, dy,  (1)

where G(·, ·) is any symmetric, positive-definite function. In this work, G(·, ·) is the Gaussian kernel

G_\sigma(X, Y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(X - Y)^2}{2\sigma^2}\right).  (2)

In practice, the joint PDF in Equation 1 is unknown and only a finite number of data points {(x_i, y_i)}_{i=1}^{n} is available. Estimating the PDF from the samples with Parzen estimators, we obtain (Liu et al., 2006):

p(z) = \frac{1}{N} \sum_{i=1}^{N} G_\sigma(z - z_i).  (3)

Figure 1: Correntropy as the integral in the joint space along the X = Y line.

There is still a problem to be solved when using correntropy: the roughness of the cost function. In order to reject outliers one must, in general, pick a small kernel size, so that the error probability of the parameter is modelled well. Due to the limited number of points available in practical problems, however, the Parzen estimate of the probability is noisy, which can make the optimization of the parameters very difficult. To overcome this difficulty, we will propose a generalization of correntropy in light of the interpretation above: correntropy is a Parzen estimate of the probability that two variables are equal, and a regression problem amounts to finding the constant that maximizes that probability. When only a limited number of data points is available, that estimate can become noisy and imprecise; moreover, one must still pick a suitable kernel size.
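As a concrete illustration of Equation 3, the minimal sketch below implements a one-dimensional Parzen window estimate. The paper's simulations use MATLAB; this sketch is in plain Python, and the sample data and kernel size are illustrative assumptions, not values from the paper:

```python
import math
import random

def gaussian_kernel(u, sigma):
    """Gaussian kernel of Equation 2, evaluated at u = z - z_i."""
    return math.exp(-u * u / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def parzen_pdf(z, samples, sigma):
    """Parzen estimate p(z) = (1/N) sum_i G_sigma(z - z_i), Equation 3."""
    return sum(gaussian_kernel(z - zi, sigma) for zi in samples) / len(samples)

# Illustrative use: with many standard-normal samples, the estimate at z = 0
# approaches the true density 1/sqrt(2*pi), slightly smoothed by the kernel.
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
print(parzen_pdf(0.0, samples, sigma=0.3))
```

A smaller kernel size tracks the data more closely but makes the estimate noisier, which is exactly the trade-off discussed above.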
In Equation 3, z and z_i are vectors formed by the random variables X and Y, that is, z = [X, Y]^T. Substituting Equation 3 into Equation 1 and solving the integral, correntropy can be estimated as (Liu et al., 2006):

\hat{V}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} G_{\sigma\sqrt{2}}(x_i - y_i),  (4)

where σ is the kernel size.

Figure 1 shows diagrammatically that correntropy provides the probability density of the event x = y. This interpretation is interesting because it means we are able to quantify the probability of two events being equal (Liu et al., 2006).

Nowadays, correntropy has been successfully used in a wide variety of applications where the signals are non-Gaussian or nonlinear, e.g. automatic modulation classification (Fontes, Martins, Silveira and Principe, 2014), classification systems for pathological voices (Fontes, Souza, Neto, Martins and Silveira, 2014), principal component analysis (He et al., 2011), and adaptive filtering (Singh and Principe, 2009).

3 Correntropy for Regression

In the proposed problem, the goal is to obtain a conversion constant a that relates two values linearly as

T = aV.  (5)

One of the possible generalizations of correntropy is to consider several random variables being equal to one another (not just two). In this case, one must extend the idea of integration along the 45-degree line and integrate along the line x_1 = x_2 = x_3 = ... = x_n. Using this approach, we need to reformulate the problem in the following manner. We have L variables and N measurements of each, and we want to maximize the probability of the event

a v_1^i = a v_2^i = \cdots = a v_L^i = T^i,  (6)

where a is the constant we wish to find, v_j^i is the i-th measurement of the j-th sensor, and T^i is the i-th known value. It turns out that we can improve the numerical stability of the expression for the probability by assuming that we have more known values of T^i.
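The classical two-variable estimator of Equation 4 is a one-line computation. The sketch below (Python, with illustrative data) also shows the behaviour this section exploits: pairs contaminated by impulse noise fall outside the kernel and contribute almost nothing to the similarity value.

```python
import math

def correntropy(x, y, sigma):
    """Sample correntropy, Equation 4: (1/n) sum_i G_{sigma*sqrt(2)}(x_i - y_i)."""
    s = sigma * math.sqrt(2.0)
    return sum(math.exp(-(a - b) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)
               for a, b in zip(x, y)) / len(x)

x = [i / 99.0 for i in range(100)]
y = [v + 0.01 for v in x]                  # nearly identical signal
y_out = [v + 10.0 for v in y[:5]] + y[5:]  # five impulse-noise outliers
v_clean = correntropy(x, y, sigma=0.5)
v_noisy = correntropy(x, y_out, sigma=0.5)
print(v_clean, v_noisy)
```

The five outlier pairs only remove their own 5% share of the average; they do not drag the similarity value towards themselves the way a squared-error term would.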
The problem in Equation 5 is, at first sight, very trivial and actually consists of a one-variable regression problem. However, as we will see in the results section, in the presence of impulse noise, or when the measurements contain outliers, correntropy can be used to naturally weight off those outliers (Santamaria et al., 2006).

Yet another numerical improvement that can be made in this problem is to avoid multiplications and use the logarithm of the quantities (hence using the logarithm of the constant a and of the values T^i). The expression in Equation 6 can then be rewritten as

a' + \lambda_1^i = a' + \lambda_2^i = \cdots = a' + \lambda_L^i = t^i,  (7)

where a' = \log(a), \lambda_j^i = \log(v_j^i), and t^i = \log(T^i). Using a Parzen estimate for the probability, and assuming the use of logarithms, the probability of Equation 7 holding can be computed as

p = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\sqrt{2L}\,\left(2\pi\sigma^2\right)^{(L-1)/2}}\; k_{\sigma/\sqrt{2L}}\!\left(\sqrt{C}\right),  (8)

where

C = 2\,(t^i)^2 + \sum_{j=1}^{L} \left(a' + \lambda_j^i - t^i\right)^2 + \left(a' + \frac{1}{L}\sum_{j=1}^{L} \lambda_j^i\right)^2.  (9)

Maximizing the probability in Equation 8 with respect to a' yields the best value of the constant a to use in the relation shown in Equation 5.

4 Correntropy in Sensor Calibration

In order to show the performance of the new expression for computing the constant a, we use as an example the problem of calibrating a temperature sensor. The idea is to find the best conversion constant (in °C per volt) that translates a measured voltage V into a temperature T; the relation is the same as that presented in Equation 5. In this example we consider L sensors and perform N measurements with each. We also have N known temperatures, so each of the N measurements from each of the L sensors can be compared against the corresponding temperature. For the purposes of this paper, we use only 5 measurements and only 5 sensors. Moreover, the data contain two outliers that have a huge impact on the Mean Square Error (MSE) estimation, as we will see in the results.
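The setup just described can be sketched end to end. The code below is a hedged illustration, not the paper's MATLAB implementation: it uses a plain grid search and a simple Gaussian-kernel score over the log-domain errors of Equation 7 as a simplified stand-in for maximizing Equation 8, with 5 sensors, 5 measurements, a = 30, kernel size 0.5 and two injected outliers, matching the setup described in the text. The data values themselves are illustrative assumptions.

```python
import math
import random

def mse_estimate(V, T):
    """Least-squares estimate of a in T = a*V, pooled over all sensors:
    a = sum(T_i * v_j^i) / sum((v_j^i)^2)."""
    num = sum(T[i] * v for i, row in enumerate(V) for v in row)
    den = sum(v * v for row in V for v in row)
    return num / den

def correntropy_estimate(V, T, sigma=0.5):
    """Grid-search a' = log(a) maximizing a Gaussian-kernel score of the
    log-domain errors a' + lambda_j^i - t^i (Equation 7); a simplified
    stand-in for maximizing Equation 8."""
    errs = [math.log(v) - math.log(T[i]) for i, row in enumerate(V) for v in row]
    best_g, best_score = 1.0, -1.0
    for k in range(4001):
        g = 1.0 + 4.0 * k / 4000.0        # candidate a' = log(a) in [1, 5]
        score = sum(math.exp(-(g + e) ** 2 / (2 * sigma ** 2)) for e in errs)
        if score > best_score:
            best_g, best_score = g, score
    return math.exp(best_g)

# Illustrative data: L = 5 sensors, N = 5 measurements, true a = 30.
random.seed(1)
T = [10.0, 20.0, 30.0, 40.0, 50.0]
V = [[T[i] / 30.0 * (1.0 + random.gauss(0.0, 0.005)) for _ in range(5)]
     for i in range(5)]
V[0][0] *= 8.0                            # two impulse-noise outliers
V[3][2] *= 4.0
print("MSE:", mse_estimate(V, T), " correntropy:", correntropy_estimate(V, T))
```

The squared term lets the two outliers dominate the least-squares estimate, while the Gaussian kernel gives them negligible weight, so the correntropy-style estimate stays close to the true constant.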
Normally this is a difficult problem due to the lack of more data but, as we will see, using correntropy and, further, its generalization, we can estimate the constant a with good precision. Figure 2 shows a plot of one example of the data points used in this paper.

Figure 2: Ideal regression for calibration of the sensors.

Equation 8 was evaluated using the Monte Carlo method. For each experiment, a minimum of 200 trials was used, with the computational simulations carried out in MATLAB. The constant a was set to 30 and the kernel size to 0.5. The proposed method was evaluated and compared with the MSE and with classical correntropy as defined by Equation 4.

Figure 3 shows that the proposed method has lower sensitivity to variations of the kernel size, in contrast with conventional correntropy, which is highly dependent on this parameter.

Figure 3: Comparison of estimating the probability with kernel size set to 0.5.

Figure 4 highlights the robustness of correntropy to the presence of outliers and compares this estimator with the MSE. The MSE is highly skewed by the outliers, given their influence within the squared term. The proposed method minimized the effect of the outliers and gave a better approximation than classical correntropy.

Figure 4: Illustrative curve of sensor calibration.

Table 1 shows the error simulation data. We observe that the MSE produced the greatest mean error and classical correntropy the highest standard deviation. The proposed method obtained the lowest mean and standard deviation of the error, showing good efficiency in function approximation.

Table 1: Error simulation data

Method        Mean    Standard deviation
MSE           15.05   4.42
Correntropy    8.36   33.79
This work      1.68   1.42

We can observe a significant difference between the performances of the algorithms. It is clear that the proposed method is both smooth and robust to the presence of noise. In addition, the method is insensitive to the kernel size, like the MSE, and is able to filter Gaussian noise. Analyzing Figures 3, 4 and 5, we conclude that the proposed method is a minimum-variance estimator, which combines two main advantages: the smoothness of the MSE and the robustness to outliers of conventional correntropy.

Figure 5: Comparison of estimating the probability with kernel size set to 10.

Figure 6: Comparison of estimating the probability with kernel size set to 0.1.

5 Conclusion

This paper develops a new algorithm that extends the application of correntropy to linear estimation. The new extension computes correntropy in any dimension rather than only comparing two scalar variables, as in the original definition. Due to this fact, each extension can lead to a different interpretation in terms of probability. The estimators for the probabilities are all based on the Parzen estimate of the joint PDF of the data, so the kernel size of the estimator has to be chosen. The experiments showed that, for the linear estimation application, the choice of the kernel size does not interfere much in the final results, leading to a very practical algorithm for use in real situations.

Acknowledgment

The authors acknowledge CNPq (process number 486135/2013-6) for the financial support.

References

Fontes, A. I., Martins, A. d. M., Silveira, L. F. and Principe, J. (2014). Performance evaluation of the correntropy coefficient in automatic modulation classification, Expert Systems with Applications.

Fontes, A. I., Souza, P. T., Neto, A. D., Martins, A. d. M. and Silveira, L. F. (2014). Classification system of pathological voices using correntropy, Mathematical Problems in Engineering 2014.

He, R., Hu, B.-G., Zheng, W.-S. and Kong, X.-W. (2011). Robust principal component analysis based on maximum correntropy criterion, IEEE Transactions on Image Processing 20(6): 1485–1494.

Liu, W., Pokharel, P. P. and Principe, J. C. (2006). Correntropy: A localized similarity measure, pp. 4919–4924.

Santamaria, I., Pokharel, P. and Principe, J. (2006). Generalized correlation function: definition, properties, and application to blind equalization, IEEE Transactions on Signal Processing 54.

Singh, A. and Principe, J. C. (2009). Using correntropy as a cost function in linear adaptive filters, pp. 2950–2955.

Singh, A. and Principe, J. C. (2010). A closed form recursive solution for maximum correntropy training, pp. 2070–2073.