XII Simpósio Brasileiro de Automação Inteligente (SBAI)
Natal – RN, October 25–28, 2015
LINEAR REGRESSION BASED ON CORRENTROPY FOR SENSOR CALIBRATION
Joilson B. A. Rêgo∗, Aluisio I. R. Fontes†, Adrião D. D. Neto‡, Luiz F. Q. Silveira§, Allan de M. Martins¶

∗ Instituto de Computação, Universidade de Alagoas, 57072-900, Maceió, AL, Brazil
†‡§ Departamento de Engenharia de Computação e Automação, Universidade Federal do Rio Grande do Norte, 59078-970, Natal, RN, Brazil
¶ Departamento de Engenharia Elétrica, Universidade Federal do Rio Grande do Norte, 59078-970, Natal, RN, Brazil

Emails: [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract— Correntropy is a measure of statistical similarity between two random variables. This paper proposes an extension of correntropy to N random variables and defines a new linear regression method based on this extension. Experiments carried out in the presence of impulse noise demonstrate the efficiency of the proposed method, taking the sensor calibration problem as reference.

Keywords— Correntropy, Linear Regression, Sensor Calibration.
1 Introduction

The concept of similarity is fundamental in statistical measures when the objective is to compare two random variables. Second-order statistics, in the form of correlation and the mean square error (MSE), are probably the most widely used methods to quantify how similar two random variables are (Singh and Principe, 2010). Solutions based on second-order statistics depend on the gaussianity and linearity of the data, so the presence of impulse noise (outliers) in the measurements makes the estimation error increase significantly.

In recent years, data-fitting approaches using the mean square error have been improved with the inclusion of criteria based on information theory, establishing a new field called Information Theoretic Learning (ITL). Santamaria et al. (2006) extended the concept of correlation to a new similarity measure named correntropy. Correntropy is a robust similarity measure between two random variables which uses a symmetric and positive-definite kernel function.

Correntropy is directly related to the probability of two random variables being similar in a sample space controlled by a parameter called the kernel size. The adjustment of the kernel size gives an efficient mechanism to eliminate outliers. However, both the MSE and correntropy are similarity measures defined for only two random variables. Furthermore, in the context of correntropy, the kernel size should be adjusted according to the application (Liu et al., 2006).

This paper proposes a generalization of correntropy for N random variables, even with different dimensions, using statistical concepts. This new measure was evaluated in a sensor calibration application. The results show good efficiency and lower dependence between the application and the kernel size parameter.

The remainder of the paper is organized as follows. Section 2 presents concepts related to correntropy as well as the theoretical tools used in the development of the proposed method. Section 3 derives the proposed regression method, and Section 4 presents the considered problem and the results obtained from its experimental analysis. Finally, Section 5 presents some final remarks on the proposed approach.
2 Correntropy

Correntropy is a generalization of the correlation measure between random signals, defined as (Santamaria et al., 2006):
V(X, Y) = E_{XY}[G_\sigma(X - Y)]
        = \iint G_\sigma(x - y)\, p_{X,Y}(x, y)\, dx\, dy,    (1)

where G_\sigma(\cdot) corresponds to any symmetric positive-definite function. In this work, G_\sigma(\cdot) is a Gaussian kernel given as

G_\sigma(X - Y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(X - Y)^2}{2\sigma^2}\right).    (2)
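For concreteness, the Gaussian kernel of Equation 2 and the sample correntropy estimator that follows from it (Equation 4) can be written as a short Python sketch; the function names are ours:

```python
import math

def gaussian_kernel(u, sigma):
    # G_sigma(u): the Gaussian kernel of Equation (2)
    return math.exp(-u * u / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def correntropy(x, y, sigma):
    # Sample correntropy of Equation (4): average of G_{sigma*sqrt(2)}(x_i - y_i)
    s = sigma * math.sqrt(2)
    return sum(gaussian_kernel(xi - yi, s) for xi, yi in zip(x, y)) / len(x)
```

Identical signals attain the maximum value G_{σ√2}(0) = 1/(2σ√π), and increasingly dissimilar pairs drive the average toward zero, which is what makes the measure insensitive to a few large outliers.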
In practice, the joint PDF in Equation 1 is unknown and only a finite number of data points {(x_i, y_i)}_{i=1}^{n} are available. Estimating the PDF from the samples with Parzen estimators, we obtain (Liu et al., 2006):

p(z) = \frac{1}{N} \sum_{i=1}^{N} G_\sigma(z - z_i),    (3)
where z and z_i are vectors formed by the random variables X and Y, that is, z = [X, Y]^t. Substituting 3 in 1 and solving the integral, correntropy can be estimated by:

\hat{V}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} G_{\sigma\sqrt{2}}(x_i - y_i),    (4)

where σ is the kernel width, referred to as the kernel size.

Figure 1: Correntropy as the integral in the joint space along the X = Y line.

Figure 1 shows diagrammatically that correntropy provides the probability density of the event p(x = y). This understanding is interesting because it means we are able to quantify the probability of two events being equal (Liu et al., 2006).

Nowadays, correntropy has been successfully used in a wide variety of applications where the signals are non-Gaussian or nonlinear, e.g., automatic modulation classification (Fontes, Martins, Silveira and Principe, 2014), classification systems for pathological voices (Fontes, Souza, Neto, Martins and Silveira, 2014), principal component analysis (He et al., 2011), and adaptive filtering (Singh and Principe, 2009).

3 Correntropy for Regression

In the proposed problem, the goal is to obtain a conversion constant that relates two values linearly as

T = aV.    (5)

This problem is, at first view, very trivial, and actually consists of a one-variable regression problem. However, as we will see in the results section, in the presence of impulse noise, or when the measurements have outliers, correntropy can be used to naturally weight off those outliers (Santamaria et al., 2006). There is still a problem to be solved when using correntropy regarding the roughness of the cost function. In order to avoid outliers, one must in general pick a small kernel size, which will model the error probability of the parameter well. Due to the limited number of points available in practical problems, the Parzen estimate of the probability is noisy. That problem can make the optimization of the parameters very difficult.

To overcome this difficulty, we propose a generalization of correntropy in light of the interpretation of Section 2. As was shown, correntropy is a measure of a Parzen estimate of the probability of both sides of Equation 5 being equal. In this problem, one would find the constant that maximizes that probability. When we have just a limited number of data points (pairs of values for T and V), that estimate can become noisy and imprecise. Moreover, one must pick a suitable kernel size.

One of the possible generalizations is to consider several random variables being equal to one another (not just two). In this case, one must extend the idea of integration along the 45-degree line and integrate along the line x_1 = x_2 = x_3 = ... = x_n. Using this approach, we need to reformulate the problem in the following manner. We have L sensors and N measurements from each, and we want to maximize the probability of

a v_1^i = a v_2^i = \cdots = a v_L^i = T^i,    (6)

where a is the constant we wish to find, v_j^i is the i-th measurement of the j-th sensor, and T^i is the i-th known value. It turns out that we can improve the numerical stability of the expression for the probability by assuming that we have more known values of T^i.

Yet another numerical improvement that can be made in this problem is to avoid multiplications
and use the logarithms of the quantities (hence finding the logarithm of the constant and of T^i). Now, the expression in 6 can be rewritten as

a' + \lambda_1^i = a' + \lambda_2^i = \cdots = a' + \lambda_L^i = t^i,    (7)

where a' = \log(a), \lambda_j^i = \log(v_j^i), and t^i = \log(T^i).

Using a Parzen estimate for the probability, and assuming the use of logarithms, we can compute the probability of Equation 7 as

p = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2L\,(2\pi\sigma^2)^{L-1}}\, G_{\sigma/\sqrt{2L}}\!\left(\sqrt{\frac{C}{4}}\right),    (8)

where

C = \frac{2}{L} \sum_{j=1}^{L} \left(a' + \lambda_j^i - t^i\right)^2 - \left(a' - t^i + \frac{1}{L} \sum_{j=1}^{L} \lambda_j^i\right)^2.    (9)

Now, maximizing Equation 8 with respect to a' will find the best value to use in the relation shown in 5.

Figure 2: Ideal regression for the calibration of the sensors.
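As an illustration, the maximization above can be sketched with a simple grid search in Python. This is a minimal sketch based on Equations 8 and 9; the function and variable names are ours, and the constant normalization factor of Equation 8 is dropped since it does not change the arg max:

```python
import math

def cost_C(a_log, lam_i, t_i):
    # Equation (9): lam_i holds the L log-readings lambda_j^i of measurement i
    L = len(lam_i)
    d = [a_log + lam - t_i for lam in lam_i]
    return (2.0 / L) * sum(dj * dj for dj in d) - (sum(d) / L) ** 2

def probability(a_log, lam, t, sigma):
    # Equation (8) up to a constant factor: a Gaussian of width sigma / sqrt(2L)
    # evaluated at sqrt(C) / 2, averaged over the N measurements
    L = len(lam[0])
    s = sigma / math.sqrt(2 * L)
    total = 0.0
    for lam_i, t_i in zip(lam, t):
        x = math.sqrt(max(cost_C(a_log, lam_i, t_i), 0.0)) / 2.0
        total += math.exp(-x * x / (2 * s * s))
    return total / len(t)

def estimate_a(lam, t, sigma, grid):
    # Pick the candidate a' = log(a) that maximizes the probability
    best = max(grid, key=lambda a_log: probability(a_log, lam, t, sigma))
    return math.exp(best)
```

With noise-free data (every log-reading equal to t^i − a'), C vanishes at the true a', so the grid search recovers a exactly; with outliers, the Gaussian in Equation 8 bounds their influence instead of letting them dominate a squared-error sum.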
4 Correntropy in Sensor Calibration
In order to show the performance of the new expression for computing the constant a, we will use as an example the problem of calibrating a temperature sensor. The idea is to find the best conversion constant (in °C per volt) that translates a measured voltage V into a temperature T. The relation is the same as presented in Equation 5. In this example we consider L sensors and perform N measurements with each. We also have N known temperatures, so each of the N measurements at each of the L sensors can be compared against the corresponding temperature. For the purposes of this paper, we will use only 5 measurements and only 5 sensors. Moreover, it is possible to notice two outliers that have a huge impact on the Mean Square Error (MSE) estimation, as we will see in the results. Normally this is a difficult problem due to the lack of more data but, as we can see, using correntropy and, further, its generalization, we can estimate the constant a with good precision. Figure 2 shows a plot of one example of the data points used in this paper.
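A setup of this kind can be reproduced with a small synthetic generator. The temperature values, noise level, and outlier magnitudes below are illustrative choices of ours, not the paper's dataset; the closed-form least-squares (MSE) estimate of a is included for comparison:

```python
import random

def make_data(a=30.0, n_meas=5, n_sens=5, noise=0.01, n_outliers=2, seed=1):
    # Synthetic calibration data: known temperatures T^i and readings
    # v_j^i = T^i / a plus small Gaussian noise, with impulse outliers injected
    rng = random.Random(seed)
    T = [10.0 * (i + 1) for i in range(n_meas)]
    V = [[Ti / a + rng.gauss(0.0, noise) for _ in range(n_sens)] for Ti in T]
    for _ in range(n_outliers):
        i, j = rng.randrange(n_meas), rng.randrange(n_sens)
        V[i][j] += rng.uniform(2.0, 5.0)  # impulse noise
    return T, V

def mse_estimate(T, V):
    # Least-squares solution of T = a V over all (reading, temperature) pairs
    pairs = [(v, Ti) for Ti, row in zip(T, V) for v in row]
    return sum(Ti * v for v, Ti in pairs) / sum(v * v for v, _ in pairs)
```

On clean data the least-squares estimate recovers a, while even a couple of impulse outliers pull it far from the true value, which is exactly the effect the correntropy-based estimator is designed to resist.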
Equation 8 was evaluated according to the Monte Carlo method. For each experiment, a minimum of 200 trials was used in computational simulations carried out in MATLAB. The constant a was set to 30 and the kernel size to 0.5. The proposed method was evaluated and compared with the MSE and the classical correntropy defined by Equation 4.

Figure 3 shows that the proposed method has lower sensitivity to variations of the kernel size, in contrast with the conventional correntropy, which is highly dependent on this parameter.
Figure 3: Comparison of the probability estimates with kernel size set to 0.5
Figure 4 highlights the robustness of correntropy in the presence of outliers and the comparison between this estimator and the MSE. The MSE is highly skewed by the outliers, given their influence within the squared term. The proposed method minimizes the effect of the outliers and gives a better approximation than classical correntropy.

Table 1 shows the error simulation data. We observe that the MSE obtained the greatest mean error and classical correntropy the highest standard deviation. The proposed method obtained the lowest mean and standard deviation of the error, showing good efficiency in function approximation.
Table 1: Error simulation data

Method       | Mean  | Standard deviation
MSE          | 15.05 | 4.42
Correntropy  | 8.36  | 33.79
This work    | 1.68  | 1.42
We can observe a significant difference between the performances of the algorithms. It is clear that the proposed method is both smooth and robust to the presence of noise. In addition, the method is insensitive to the kernel size, like the MSE, and is able to filter Gaussian noise. Analyzing Figures 3, 4 and 5, we conclude that the proposed method is a minimum variance estimator, which encompasses two main advantages: the smoothness of the MSE and the robustness to outliers of the conventional correntropy.

Figure 4: Illustrative curve of sensor calibration

Figure 5: Comparison of the probability estimates with kernel size set to 10

Figure 6: Comparison of the probability estimates with kernel size set to 0.1

5 Conclusion

This paper develops a new algorithm that extends the application of correntropy to linear estimation. The new extension computes correntropy in any dimension, rather than only comparing two scalar variables as in the original definition. Due to this fact, each extension can lead to a different interpretation in terms of probability. The estimators for the probabilities are all based on the Parzen estimation of the joint PDF of the data; therefore, the kernel size of the estimator has to be chosen. The experiments showed that, for the linear estimation application, the choice of the kernel size does not interfere too much with the final results, leading to a very good and practical algorithm to be used in real situations.

Acknowledgment

The authors acknowledge CNPq (process number 486135/2013-6) for the financial support.

References

Fontes, A. I., Martins, A. d. M., Silveira, L. F. and Principe, J. (2014). Performance evaluation of the correntropy coefficient in automatic modulation classification, Expert Systems with Applications.

Fontes, A. I., Souza, P. T., Neto, A. D., Martins, A. d. M. and Silveira, L. F. (2014). Classification system of pathological voices using correntropy, Mathematical Problems in Engineering 2014.

He, R., Hu, B.-G., Zheng, W.-S. and Kong, X.-W. (2011). Robust principal component analysis based on maximum correntropy criterion, IEEE Transactions on Image Processing 20(6): 1485–1494.

Liu, W., Pokharel, P. P. and Principe, J. C. (2006). Correntropy: A localized similarity measure, pp. 4919–4924.

Santamaria, I., Pokharel, P. and Principe, J. (2006). Generalized correlation function: definition, properties, and application to blind equalization, IEEE Transactions on Signal Processing 54.

Singh, A. and Principe, J. C. (2009). Using correntropy as a cost function in linear adaptive filters, pp. 2950–2955.

Singh, A. and Principe, J. C. (2010). A closed form recursive solution for maximum correntropy training, pp. 2070–2073.