Novelty Detection - Recognition and Evaluation of

Transcrição

Novelty Detection - Recognition and Evaluation of
Environ Monit Assess
DOI 10.1007/s10661-006-9538-5
Novelty Detection – Recognition and Evaluation
of Exceptional Water Reflectance Spectra
Helmut Schiller · Wolfgang Schönfeld · Hansjörg L. Krasemann · Kathrin Schiller
Received: 16 May 2006 / Accepted: 27 September 2006
© Springer Science + Business Media B.V. 2007
Abstract The aim of environmental surveillance
is to monitor known phenomena as well as to
detect exceptional situations. Synoptic monitoring of large areas in coastal waters can be performed by remote sensing using multispectral
sensors onboard satellites. Many methods are in
use which enable the detection and quantification
of ‘standard algae’ or specific algae blooms using their known spectral response. The present
study focusses on the detection of spectra outside
the known range and which are referred to as
exceptional spectra. In a first step observations
from a one-year period were used to establish the
parameterisation of what is defined as ‘normal.’
In a second step observations from a different
period were used to test the novelty detection
application, i.e. to look for features not occurring
in the first period.
Podi tuda – ne zna kuda,
prinesi to – ne zna qto.
Go I know not where,
bring back I know not what.
– Russian Fairy Tale
H. Schiller (B) · W. Schönfeld ·
H. L. Krasemann · K. Schiller
GKSS Forschungszentrum, PF 1160,
21494 Geesthacht, Germany
e-mail: [email protected]
Keywords Novelty search · Monitoring ·
Water quality · Reflectance · Remote sensing
1 Introduction
This order given to the hero of the quoted fairy
tale could be used to summarise the request received to detect exceptional water reflectance
spectra. ‘Exceptional’ here applies to unusual
events not lying within the bounds of operationally processed multi-spectral satellite data. Of
course, neither the time nor the place of such
events or their spectral characteristic are known
beforehand. What we are looking for are deviations from what is occurring ‘normally.’
Out of range spectra can result from e.g. exceptional plankton blooms which may form so-called
Red Tides (due to the red discolouration of the
water) or white water caused by extreme coccolithophore blooms. It is of great interest to detect
and map the temporal and spatial development
of these events, since it might turn out that such
exceptional events or blooms can not be handled
by standard algorithms for known substances.
In short the above task is one of novelty detection: i.e. training a learning system to identify
new or unknown data or data which is different
from that presented during the training. Novelty
detection is used in signal processing and when
Environ Monit Assess
applied to multidimensional data, statistical approaches as well as Neural Network based approaches have been used. A recent review of the
techniques available is presented by (Markou and
Singh 2003a, 2003b), which also illustrates that
there is a great variety of areas where novelty
detection techniques can be applied. Examples of
the use of such novelty detection techniques in the
analysis of remotely sensed data are (Auguststeijn
and Folkert 2002) and (D’Alimonte et al. 2003).
In (Auguststeijn and Folkert 2002) two Neural
Net (NN) classification schemes are compared
with respect to their ability to identify novel
patterns in Thematic Mapper (TM) imagery acquired in spring 1993 of United States Air Force
Academy grounds, north of Colorado Springs.
Although the temporal and spatial range is rather
restricted the variability on the ground was found
to be extremely high. In conclusion, it could be
shown that the Probabilistic Neural Network
(PNN) (Washburne et al. 1993) is significantly
more successful in resolving new ground covers
types compared to the back-propagation NN.
(D’Alimonte et al. 2003) used novelty detection
to identify the range of applicability of empirical ocean colour algorithms. Two empirical ocean
colour algorithms were constructed on the basis
of their respective training data sets. The accuracy
of the output of a given algorithm was found
to depend on the representativeness of the input
data set in the respective training. To estimate
the representativeness of a given data set it was
shown that a multivariate Gaussian function could
accurately model the distribution of the logarithm
of the three remote sensing reflectances (wavelengths λ = 490, 555 and 665 nm) used in the
algorithm.
In contrast to the aforementioned example the
present study aimed to develop a novelty detection scheme for water reflectance data which
could be used operationally. The basic idea behind
this approach is to use a large number of reflectance spectra from one complete year for the
North Sea so as to include the annual cycle of
plankton species (including their blooms et cetera) to teach a learning system ‘what is normal.’
Only (distinctive) deviations from these known
spectra should be classified as novel. Thus, the algorithm should accept all reflectance spectra from
the training period as being normal but should still
be sensitive enough to detect novel situations. To
check if this was the case, a test application data
set was used which also included the spectra of a
Red Tide which did not occur in the training set.
The selection of the appropriate algorithm is
the central issue of this paper. It is intimately
connected with the type and structure of the data
to be tackled. The water reflectances are measured at nine different wavelengths, all of which
will be used in the present algorithms. Bio-optical
models often exclude several wavebands or give
other wavelengths more weight. Since our aim is
to detect novel situations with unknown spectral
signature we use all the information available.
However, due to the resulting high dimensionality
of the data set (nine dimensions), performance of
different algorithms can not be easily visualised.
In Section 2 we therefore start by analysing the
basic characteristics of the reflectance data and
then construct an artificial two-dimesional showcase data set resembling some properties of these
reflectances. This artificial showcase data set is
used to test different approaches, namely: a simple
statistical scheme, an auto-associative Neural Net,
a Neural Net classifier and a tessellation scheme.
An algorithm is required which classifies all data
from the training (the two-dimensional showcase
data set) as normal but will still be sensitive
enough to detect novelties. In other words, we
need to envelop the data set as tightly as possible
and such that all points inside this envelope can
be defined as normal. Arguments for choosing
the tessellation as the most appropriate novelty
detection scheme will be given.
In Section 3 the North Sea data sets for training
and application testing are presented. The tessellation of the training data is described and the
results of the novelty search in the test data are
discussed.
In Section 4 the conclusions are presented.
2 Choice of the Algorithm
The data for which we want to develop a novelty
detection scheme are obtained by remote sensing.
The sensor and some important aspects of the reflectance spectra are described in Subsection 2.1.
Environ Monit Assess
10
reflectance [1/sr]
The main characteristics of these data are then
used to construct a two-dimensional showcase
data set. The performance of different novelty
detection schemes with this showcase data set is
described in Subsection 2.2. It was found that
an appropriate novelty detection scheme can be
based on a set of Gaussians.
10
–1
–2
2.1 MERIS water reflectance spectra
In March 2002, the European Space Agency
launched ENVISAT, an advanced polar-orbiting
earth observation satellite which has provided
measurements of the atmosphere, ocean, land,
and ice for over the five-year period to the present.
The ENVISAT satellite has an ambitious and innovative payload that will ensure the continuity
of the measurements of the ESA ERS satellites.
ENVISAT data supports earth science research
and allows the monitoring of the evolution of
environmental and climatic changes.
One of the instruments onboard ENVISAT is
MERIS (Rast et al. 1999), the MEdium Resolution Imaging Spectrometer. The primary mission
of MERIS is the measurement of water colour in
the oceans and in coastal areas. After atmospheric
correction, measurements of the water colour, can
be converted into an estimate of the concentrations of chlorophyll pigment, suspended matter
and Gelbstoff (yellow substance). The main application domains of such data are (1) the ocean
carbon cycle, (2) the thermal regime of the upper
ocean and (3) the management of fisheries and
coastal zones. MERIS allows global coverage of
the earth every 3 days.
In this paper the spectrum of water leaving
radiance reflectance (reflectance, for short) is used
to monitor the oceanic and coastal environment.
The nine spectral bands are centred at wavelengths λ = 412, 442, 490, 510, 560, 620, 664, 681,
708 nm. Examples of water reflectance spectra are
shown in Fig. 1.
This illustrates the great variability of water
reflectance spectra: the sea is not always blue.
Randomly chosen pixels from MERIS scenes obtained during 2002/2003 in a region1 of the North
1 Details
of this data are discussed in Section 3.
10
–3
450
500
550
λ [nm]
600
650
700
Fig. 1 Examples of water reflectance spectra measured by
MERIS
Sea were used to produce the scatter-plots of logarithms2 of reflectances at different wavelengths as
shown in Fig. 2.
This shows that there are strong correlations
between the reflectances at different wavelengths.
Obviously the correlation behaviour is not the
same for all pixels as this changes with the type
of water present. This leads to point clouds having
v-like shapes in some scatter-plots.
For real applications of novelty detection hundreds of MERIS scenes, each with tens of thousands of pixels, have to be checked. Therefore it is
important to devise an algorithm which has a low
rate of false alarms.
2.2 Algorithm performance with the showcase
data
In order to study the pros and cons of different
novelty detection schemes in a realistic manner
as well as being able to visualise the findings we
construct an artificial two-dimensional data set.
This demo data set is a superposition of two
2 We use the natural logarithm of the reflectance rather
than the reflectance itself since the latter have distributions
strongly skewed to small values, especially in the long
wavelength bands. As the reflectance has the unit sr−1 we
divide the reflectance by its unit before taking the logarithm. Any scaling of the unit would not change anything
since it is always the difference between the variables which
is most important for the algorithms considered .
Environ Monit Assess
"Meris water pixel
"Meris water pixel
–2
–2.5
–3
–2.5
ln(reflectance at 664nm)
ln(reflectance at 490nm)
–3.5
–3
–3.5
–4
–4.5
–5
–4
–4.5
–5
–5.5
–6
–6.5
–5.5
–6
–6
–7
–5.5
–5
–4.5
–4
–3.5
ln(reflectance at 412nm)
–3
–2.5
–7.5
–6
–2
–5.5
–5
–2
–3
–3
–4
–4
–5
–6
–7
–8
–9
–10
–6
–3
–2.5
–2
"Meris water pixel
–2
ln(reflectance at 708nm)
ln(reflectance at 708nm)
"Meris water pixel
–4.5
–4
–3.5
ln(reflectance at 412nm)
–5
–6
–7
–8
–9
–5.5
–5
–4.5
–4
–3.5
ln(reflectance at 442nm)
–3
–2.5
–2
–10
–5.5
–5
–4.5
–4
–3.5
–3
ln(reflectance at 560nm)
–2.5
–2
Fig. 2 Scatter-plots showing examples of water reflectances at different wavelengths as measured by the MERIS sensor
onboard the environmental satellite ENVISAT
Gaussians which exhibit the kind of v–like structure seen in the MERIS reflectance data shown
above. Like the reflectance data the demo data
set shows correlations, its variables are continuous
and have the same dimension and order of magnitude. Therefore, the demo data set is well suited to
exemplifying the novelty search in real reflectance
data. A randomly chosen subset of 800 points out
of a total of 5, 000 points is shown in Fig. 3.
In any novelty search a model of the ‘known’
data K is built which serves to construct a measure
ρ(x) of how certain a given point x belongs to
the kind of data characterised by the proxy K.
In the following subsections we construct four
different models of the demo data set and discuss
for each of the models the isoline of ρ(x) which
encloses 95% of the entire demo data set. For a
convenient comparison of the different algorithm
performances the comprised results are shown in
Fig. 5.
2.2.1 Simple statistical scheme
Let x(m) , m ∈ [1, . . . , M] be the M points of the
known data K. Then K is characterised to some
extent by its first two moments, the centroid q
qj =
M
1 (m)
x
M m=1 j
and the covariance matrix C
Cij =
M
1 (m)
(x − qi )(x(m)
j − qj )
M m=1 i
Environ Monit Assess
showcase data set to visualize features of novelty searches
3.5
3
2.5
2
y
1.5
1
0.5
0
–0.5
–1
–3
–2
–1
0
1
2
3
4
x
Fig. 3 Subset of the demo data which is a superposition of
two Gaussians
The moments contain information about the position of the points and shape of the point cloud
K which is used in the Mahalanobis distance function. The Mahalanobis distance of a given point x
from the centre of the points q is
ρM (x, q) = (x − q)T (C)−1 (x − q)
In Fig. 5a the isoline of the Mahalanobis distance from the centre enclosing 95% of the demo
data is shown. This clearly shows that the ellipsoid of equal Mahalanobis distance accounts for
the gross arrangement of the data. However, it
is not able to account for details such as the v–
like shape and the isoline in Fig. 5a also encloses
points such as A which could represent a ‘novelty.’
This example shows that such a simple model is
not appropriate for our MERIS spectra of water
reflectances.
Since AANN’s are trained to map the points
x ∈ K onto themselves A AN N(x) x after successful
training, the distance ρ(x, A AN N(x)) =
(x − A AN N(x))2 is small for points from K.
Now, to use AANN’s for novelty search the idea
is that points which are not like the points in
the training set are not projected onto themselves
and consequently lead to large ρ(x, A AN N(x)).
For the demo data the isoline of ρ(x, A AN N(x))
which encloses 95% of the demo data is shown
in Fig. 5b. It clearly demonstrates the AANN’s
ability to account for nonlinear correlations. The
weakness of the AANN scheme, however, lies in
the fact that there is no way of preventing the
AANN from mapping points which do not belong
to K also onto themselves . Therefore, one can not
exclude the possibility of small ρ(x, A AN N(x))
for points such as B in Fig. 5b which could also be
‘novelty’ candidates.
2.2.3 Neural net classifier
To avoid the possibility that regions close to K
are classified as belonging to K, one can generate
points close to K and then train a NN to classify
the points. This is illustrated in Fig. 4.
In the surrounding of points from K (dots),
points belonging to the class ‘non-K’ (crosses) are
generated. All the points are then taken together
and used as a training and test data set to train
subset of two class training data
3.5
3
2.5
2.2.2 Auto-associative neural net
1.5
y
Auto-associative Neural Nets (AANN’s) became
quite popular for novelty search as illustrated in
(Markou and Singh 2003a, 2003b). AANN’s are
trained to map the points of K onto themselves
whereby at one stage of the mapping the input
information is passed through a NN-layer of lower
dimension compared to the dimension of x itself
(hence a ‘bottleneck’). This can be accomplished
only by exploiting correlations existing in K. The
AANN’s can account for nonlinear correlations
and can handle very distinct input sources.
2
1
0.5
0
–0.5
–1
–3
–2
–1
0
1
2
3
4
x
Fig. 4 Two class data to train NN classifier: dots indicate
points from the original showcase dataset (K), crosses indicate points which where added and are representatives of
‘non-K’
Environ Monit Assess
a)
b)
isoline: ρ ((x,y),centroid) = 2.6
isoline: ρ((x,y),AANN(x,y)) = 0.71
M
3
3
2.5
2.5
2
2
1.5
1.5
B
x
y
3.5
y
3.5
x
1
1
A
0.5
0.5
0
0
–0.5
–0.5
–1
–3
–2
–1
0
1
2
3
–1
–3
4
–2
–1
0
x
1
2
3
4
3
4
x
c)
d)
isoline: ρ((x,y),centers) = 2.42
isoline: NN(x,y) = 0.35
3.5
3
3
2.5
2.5
2
2
1.5
1.5
y
y
3.5
1
1
0.5
0.5
0
0
–0.5
–0.5
–1
–3
–2
–1
0
1
2
3
4
–1
–3
–2
–1
0
x
1
2
x
Fig. 5 The figures show a subset of the showcase dataset
together with isolines of the measure ρ(x) which enclose
95% of the data for the four different approaches. a refers
to the statistical model, b to the auto-associative Neural
Net, c to the Neural Net classifier and d to the tesselation
a NN to classify the points accordingly. In this
case the NN has just one output value in which
we code ‘0’ to represent membership in K and
‘1’ nonmembership in K. The isoline of the NN
output enclosing 95% of the data is shown in
Fig. 5c. The isoline is now quite close to K.
For this demo data the generation of neighbouring points is trivial. For real data the generation of neighbouring points can become a very
difficult task in cases of point clouds extending
in different directions and having varying intrinsic
dimensionalities.
2.2.4 Tessellation scheme
Here we first tessellate the known data K and
then define a distance of a given point from the
tessellation.3
3 This
could be the initialisation step of a cluster algorithm
(e.g. ‘k – means clustering’) or the construction of a radial basis function (RBF) neural network. However, the
respective iterations would compromise the description of
the border of K in favour of the better density estimation.
Environ Monit Assess
To start the tessellation we choose by trial and
error a radius R (this is discussed below in detail
for the tessellation of the real data). Now
– The first point becomes the first centre.
– For each of the remaining points the smallest of the distances to all centres is compared with R. If the distance is larger than R
the point is included in the set of centres.
– In the second step each point is assigned to
that centre to which it is closest (Voronoi
tessellation).
For each of the patches obtained by the tessellation the mean (centre of gravity) and the covariance matrix is calculated and as the distance of a
given point to the tessellation the minimum of the
Mahalanobis distances is used.4 The isoline of the
tesselation enclosing 95% of the data is shown in
Fig. 5d. The isoline is again quite close to K.
In summary, the simple statistical model is
not appropriate for novelty detection in water
reflectance spectra. The auto-associative Neural
Net approach provides a tighter enclosure of the
data set but it is still not tight enough to ensure
novelty detection. The Neural Net classifier and
the tessellation both provide a tight enclosure of
the data set K. The advantage of the tessellation
approach compared to the two class Neural Net
classifier (where ‘non-K’ points had to be constructed) is its easy and straightforward set up.
Thus the tessellation approach will be used for the
water reflectance data.
Fig. 6 The roi in the North Sea chosen for the tessellation
novelty detection scheme
in which to search for novelties. In this section
we first describe the selection of data used to
represent the ‘normal’ situation and then the construction of the tessellation on these data and the
cleaning of the results. With a few amendments we
found a tessellation which describes our training
data.
We also discuss the different kinds of novelty
which were detected by our tesselation procedure
for scenes obtained from the test application
period.
3.1 The dataset defining the ‘normal’ situation
We used all MERIS scenes from the 26th June
2003 to the 29th June 2004 from a region of the
Projection of the roi pixels onto first eigenvectors
1
As an application for the tessellation novelty detection scheme we used MERIS data of the North
Sea. The logarithms of the remote sensing reflectances in the first nine MERIS channels (centred at wavelengths λ = 413, 443, 490, 510, 560,
620, 665, 681, 708 nm) were used to span the space
4 It should be mentioned here that this scheme is a
blend of the schemes (Auguststeijn and Folkert 2002) and
(D’Alimonte et al. 2003) mentioned in the introduction:
like in the PNN a set of Gaussians is used but here the covariance matrix of the points of each patch is used instead
of univariate smoothing parameters.
Projection onto second eigenvector
3 North Sea Results
0
–1
–2
–3
–4
–5
–6
–7
–8
–24
–22
–20
–18
–16
–14
–12
–10
Projection onto first eigenvector
–8
–6
Fig. 7 The projection of the roi data onto the eigenvectors
with the two largest eigenvalues
Environ Monit Assess
distance of training pixels from tesselation
7000
6000
80
5000
60
4000
40
3000
20
#
North Sea (region of interest = roi) as shown in
Fig. 6. We excluded pixels from the sun glint part
as well as those which had one or more negative
reflectances (bad atmospheric correction). Then
∼ 1, 000 pixels from each scene were randomly
sampled thus ending up with 115, 331 pixels for
the ‘training’. The projection of (a 10th of) this
data onto the eigenvectors with the two largest
eigenvalues is shown in Fig. 7. This projection
contains 94.9% of the variance of this data. It
is obvious that the point cloud has not a simple
shape.
0
6
2000
8
10
12
1000
0
0
2
4
6
8
10
12
Mahalanobis distance
Fig. 9 The distribution of the distances of the training
points from the tessellation. The inset shows the tail of the
distribution
3.2 The tessellation
The algorithm described in Subsection 2.2.4 was
applied to the above data with varying radius R
and hence varying number of resulting centres.
To choose a sensible tessellation one has to make
a compromise between a low number of centres
(computing time) and a large number of centres
(modelling the details of the probability density
function). For this we study Fig. 8 which shows the
weighted mean of the ratios of the two largest
eigenvalues vs. the number of centres
=
C
1 λc2
nc
N c=1 λc1
where nc , c = 1, . . . , C is the number of points
belonging to the cth centre (N = C
c=1 nc ) and
λc1(2) is the largest (second largest) eigenvalue of
the covariance matrix of these points.
The quantity is a ‘shape parameter’: if all
pixels are forced into one group the ratio λλ21 is
about one tenth. With an increasing number of
centres this ratio becomes larger and – if many
details are modelled – remains constant at around
0.6. As a compromise we decided to use the
26-centres tessellation (R = 2.52). This decision
is supported by the finding that the use of the
31-centres tessellation leads to the same large distance points.
The distribution of the distances of the training
points from the tessellation is shown in Fig. 9.
false novelty pixels from training set
Weighted mean ratios of eigenvalues vs. # of centers
–2
0.7
–3
0.6
–4
0.5
ln(reflectance)
–5
ε
0.4
0.3
–6
–7
–8
–9
0.2
–10
0.1
0
0
–11
10
20
30
40
50
# of centers
60
70
80
Fig. 8 The shape parameter as a function of the number
of centres
–12
400
450
500
550
600
λ [nm]
650
700
750
Fig. 10 The spectra of the 38 pixels of the training set with
a distance above 7.5
Environ Monit Assess
Fig. 11 One of the four
groups which was
appended to the
tessellation was located
near northern Denmark
From the blow-up included in this figure we decided to declare points as representing novelty if
their distance to the tessellation was above 7.5.
The spectra of the 38 pixels of the training set
having a distance above 7.5 is shown in Fig. 10.
3.3 Clean-up of the findings
The distance cut-off of 7.5 leaves only 38 pixels remaining from the training points. However,
if we apply the cut to the 288 complete roi’s
(s. Fig. 6), with 16, 575, 000 pixels of the time
range from which the training data set was selected, we find 110, 718 pixels with a distance >
7.5 from the tessellation. Inspection of these pixels
shows that many are very near to clouds and
includes also many single noisy spectra not well
represented in the training data set. We therefore
introduced two conditions which were not related
to the spectral information, namely:
1. Pixels have to be at least three pixels away
from clouds. This reduces the number of false
alarms from 110, 718 to 39, 860.
Fig. 12 The remaining
false alarm in the
Skagerrak
2. Novelty has to appear in a minimum patch
size. We accept pixels as novel only if at least
19 pixel in the surrounding 5 × 5 square have
a distance above 7.5. This reduces the number
of false alarms from 39, 860 to 2, 670 in altogether 26 scenes.
To get rid of the remaining false alarms we identified four groups of pixels such as those shown in
Fig. 11. For these four groups we calculated the
centre of gravity and the covariance matrix and
appended this to the existing tessellation information. After this addition only one false alarm as
shown in Fig. 12, remained in the training set.
3.4 Novelties
The test application data set comprised 223 scenes
from the 30th June 2004 to the 13th October 2005.
After the application of our novelty detection
scheme a total of 10 scenes remained. Three show
a red tide near to the island Helgoland on the 30th
July, and on the 3rd and 5th August 2004 (see
Fig. 13).
Environ Monit Assess
Fig. 13 Red tide in the
German Bight found as
novelty on the 30th of
July, 3rd and 5th of
August 2004. The fourth
sub-picture shows the
spectrum of the 3rd
of August
This red tide was also observed on a measuring
campaign between Cuxhaven and Helgoland on
the 3rd August 2004 and identified as Myrionecta
rubra. Standard algorithms also detected an increase in the chlorophyll content but by construction gave no hint to an unusual bloom.
In the scene from the 19th November 2004
novelty was signalled in the Skagerrak area, however, after further inspection this was shown to
Fig. 14 Coccolithophoride found as novelty
(example from 23rd of
June 2005)
be simply a processing failure in that a significant
number of water reflectances were just unity.
The remaining six novelties of 19th, 23rd, 26th,
28th June and 2nd, 18th July 2005 are located in
the central and northwestern North Sea. These
findings look like distinctive blooms of coccolithophores. An example is shown in Fig. 14.
Blooms of coccolithophores are quite frequent
every year. As we are signalling novel situations
Environ Monit Assess
we looked further into the period from which our
training data were taken. For several days MODIS
reports coccolithophore blooms in the MOD25
product (Balch et al. 1996, 1999; Gordon et al.
1988). We localised these blooms in the MERIS
training data and compared the spectra with the
novelty detections in our test data application.
These spectra were classified as novelties due to
their more pronounced blue reflectance as compared to those events within the training period.
This is illustrated in Fig. 15 where the band ratio of
blue over green reflectance (442/510 nm) is shown.
In the upper part for patches in the training period
classified by MODIS as coccolithophores the ratio
is well below 1.0. The lower part shows clearly
that the band ratio is above 1.0 for patches seen
as novelty in our application test.
For further practical novelty detection, and
for those not immediately interested in coccol-
coccolithophores in training data (2004)
300
250
#
4 Conclusions
A novelty detection scheme which is suitable for
regular monitoring purposes using multispectral
remote sensing data has been constructed.
It is applied to MERIS water reflectances and
has a low failure (false positive) rate. It was successful in finding exceptional algae blooms, one of
which was also identified as a red tide by in situ
measurements.
To reach this low failure rate it was necessary
to use not only the spectral information but also
to avoid pixels close to clouds and to require a
minimum novelty patch size.
The novelty detection scheme developed is
flexible and adaptable to various monitoring
needs. It easily allows the exclusion of explained
novelties from subsequent searches.
Acknowledgements This research was supported by the
European Space Agency ESA as a part of the ‘MERIS
Case2 Regional’ project.
The authors like to thank the BEAM/VISAT
development team (see URL http://www.brockmannconsult.de/beam/). The novelty detection processor was
developed on the basis of this open source software
package and can thus be freely used.
200
150
100
50
0
0.5
ithophore bloom alarms, one would append these
spectra to the tesselation as outlined in the data
clean-up step (see Section 3.3).
1
1.5
References
novelties in test data (2005)
250
200
#
150
100
50
0
0.5
1
band ratio 442nm/510nm
1.5
Fig. 15 Ratio of blue to green reflectance (442/510 nm).
Comparison of training data classified as coccolithophores
vs. novelties emerging from the test data
Auguststeijn, M. F., & Folkert, B. A. (2002). Neural
network classification and novelty detection. International Journal of Remote Sensing, 23, 2891–2902.
Balch, W. M., Kilpatrick, K., Holligan, P. M., Harbour,
D., & Fernandez, E. (1996). The 1991 coccolithophore
bloom in the central north Atlantic. II. Relating optics
to coccolith concentration. Limnology and Oceanography, 41, 1684–1696.
Balch, W. M., Drapeau, D. T., Cucci, T. L., Vaillancourt,
R. D., Kilpatrick, K. A., & Fritz, J. J. (1999). Optical backscattering by calcifying algae – Separating the
contribution by particulate inorganic and organic carbon fractions. Journal of Geophysical Research, 104,
1541–1558.
D’Alimonte, D., Melin, F., Zibordi, G., & Berthon, J.
J. (2003). Use of the novelty detection technique
to identify the range of applicability of empiri-
Environ Monit Assess
cal ocean color algorithms. IEEE Transactions on
Geoscience and Remote Sensing, 41, 2833–2843.
Gordon, H. R., Brown, O. B., Evans, R. H., Brown,
J. W., Smith, R. C., Baker, K. S., et al. (1988). A
semianalytic radiance model of ocean color. Journal
of Geophysical Research, 93, 10909–10924.
Markou, M., & Singh, S. (2003a). Novelty detection: A
review – Part 1: Statistical approaches. Signal Processing, 83(12), 2481–2497, December.
Markou, M., & Singh, S. (2003b). Novelty detection: A review – Part 2: Neural network based
approaches. Signal Processing, 83(12), 2499–2521,
December.
Rast, M., Bézy, J. L., & Bruzzi, S. (1999). The ESA
Medium Resolution Imaging Spectrometer MERIS –
A review of the instrument and its mission.
International Journal of Remote Sensing, 20, 1681–
1702. (see also URL http://envisat.esa.int/instruments/
meris/descr/charact.html).
Washburne, T. P., Specht, D. F., & Drake, R. M.
(1993). Identification of unknown categories with
probabilistic neural networks. In Proceedings
of the 1993 international conference on neural
networks, San Francisco, CA, 28 March – 1 April,
1993, vol. 1. (pp. 434–437). Piscataway, NJ: IEEE
Service Center.