NR Edição 01 - NICS

Transcription

Masthead
NICS Reports
Electronic periodical of the
Núcleo Interdisciplinar de Comunicação Sonora (NICS)
Universidade Estadual de Campinas (UNICAMP)
Editors:
Jônatas Manzolli, NICS/UNICAMP
José Fornari, NICS/UNICAMP
Marcelo Gimenes, NICS/UNICAMP
Address:
Rua da Reitoria, 165
Cidade Universitária Zeferino Vaz
13.083-872 Campinas, SP
Telephone:
+55 (19) 3521-7770
+55 (19) 3521-2570
E-mail:
[email protected]
Website:
http://www.nics.unicamp.br/nr
Technical Support
Edelson Constantino
[email protected]
Contents
Editorial
Articles
1. The pursuit of happiness in music: retrieving valence with contextual music descriptors
2. Panorama dos modelos computacionais aplicados à musicologia cognitiva
3. An a-life approach to machine learning of musical worldviews for improvisation systems
4. Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition
5. Abduction and Meaning in Evolutionary Soundscapes
Editorial

The inaugural edition of NICS Reports (NR) presents articles related to two themes that have become recurrent over the last few years of the Núcleo's research.

The first is the use of Evolutionary Computation methods, such as genetic algorithms and derived processes, with the goal of producing musical diversity. The evolutionary processes developed at NICS were pioneering in addressing this fundamental question in computer-supported composition and sound design, undertaking a study of computational models that sees in sound creativity a field of sonic exploration as broad as the biological process that inspires Evolutionary Computation itself.

The second theme addressed by the inaugural edition is Music Cognition studied through computational simulation. Here the computer operates as a simulacrum of the countless potentialities produced by the human cognitive process when focused on musical creation and performance. It is a systemic approach in which that immense universe, the interaction of perception with the sonic environment, is translated into specific aspects handled by computational models.

We hope that these two themes, and those to come, may prompt new readings of fundamental questions about composition, performance and interactive musical processes. We also hope that reading this first edition of NR may inspire new paths and studies.

Campinas, October 2012
The Editors
Articles
1. The pursuit of happiness in music: retrieving valence with contextual music descriptors¹
José Fornari
Interdisciplinary Nucleus for Sound
Communication (NICS), University of Campinas
(Unicamp), Brazil
[email protected]
Tuomas Eerola
Music Department, University of Jyvaskyla (JYU),
Finland
[email protected]
Abstract. In the study of music emotions, Valence usually refers to one of the dimensions of the circumplex model of emotions that describes the appraisal of happiness in music, on a scale that goes from sad to happy. Nevertheless, the related literature shows that Valence is known to be particularly difficult to predict with a computational model. As Valence is a contextual music feature, it is assumed here that its prediction should also require contextual music descriptors in the predicting model. This work describes the usage of eight contextual (also known as higher-level) descriptors, previously developed by us, to calculate happiness in music. Each of these descriptors was independently tested using the correlation coefficient between its prediction and the mean rating of Valence, collected from thirty-five listeners, over a piece of music. Next, a linear model using these eight descriptors was created, and the result of its prediction for the same piece of music is described and compared with two other computational models from the literature designed for the dynamic prediction of music emotion. Finally, we propose an initial investigation into the effects of expressive performance and musical structure on the prediction of Valence. Our descriptors are separated into two groups, performance and structural, and a linear model is built with each group. The predictions of Valence given by these two models, over two other pieces of music, are compared with the corresponding listeners' mean ratings of Valence, and the results are depicted, described and discussed.
Keywords: music information retrieval, music cognition, music emotion.
¹ Original reference for this work: Fornari, J. and Eerola, T. (2009). The Pursuit of Happiness in Music: Retrieving Valence with Contextual Music Descriptors. In S. Ystad, R. Kronland-Martinet and K. Jensen (Eds.), Computer Music Modeling and Retrieval. Genesis of Meaning in Sound and Music (vol. 5493, pp. 119-133). Springer Berlin Heidelberg.
1 Introduction
Music emotion has been studied by many researchers in the field of psychology, such as those described in [1]. The literature mentions three main models used in the study of music emotion: 1) the categorical model, originated in the work of [2], which describes music in terms of a list of basic emotions [3]; 2) the dimensional model, originated in the research of [4], who proposed that all emotions can be described in a Cartesian coordinate system of emotional dimensions, also named the circumplex model [5]; and 3) the component process model, from the work of [6], which describes emotion as appraised according to the situation of its occurrence and the listener's current mental (emotional) state.
Computational models for the analysis and retrieval of emotional content in music have also been studied and developed, in particular by the Music Information Retrieval (MIR) community, which maintains a repository of publications in its field (available at the International Society for MIR website: www.ismir.net). To name a few: [7] developed a computational model for musical genre classification, a task similar to, although simpler than, the retrieval of emotions in music; [8] provided a good example of audio feature extraction using multivariate data analysis and behavioral validation of its features. There are also several examples of computational models developed for the retrieval of emotional features evoked by music, such as [9] and [10], which studied the retrieval of higher-level features of music, such as tonality, in a variety of music audio files.
1.1 The dynamic variation of appraised Valence
In the study of the dynamic aspects of music emotion, [11] used a two-dimensional model to measure the emotions appraised by listeners over time, in several music pieces. The emotional dimensions described are the classical ones: Arousal (which ranges from calm to agitated) and Valence (which goes from sad to happy). That study used Time Series techniques to create linear models with five acoustic descriptors to predict each of these two dimensions, for each music piece. [12] used the same listeners' mean ratings collected by [11] to develop and test a general model for each emotional dimension (i.e. one general model for Arousal and another for Valence), using System Identification techniques to create the two prediction models.
In neither of the two studies described above was any effort made to distinguish between the musical aspects predicted by the descriptors that relate to the composition, given by its musical structure, and those that relate to its expressive performance.
1.2 The balance between expressive performance and musical structure
for the appraisal of Valence
Music emotion is influenced by two groups of musical aspects. One is given by the structural features created by the composer and described in terms of musical notation. The other relates to the emotions aroused in listeners during the musicians' expressive performance. The first group is here named structural aspects and the second, performance aspects. Sometimes the difference between a mediocre and a breathtaking interpretation of a musical structure relies on the performers' ability to properly manipulate basic musical aspects such as tempo, dynamics and articulation. Such skill often seems to be the key for the musician to recreate the emotional depth that the composer supposedly tried to convey in the musical structure.
On this subject, [13] mentions that "expert musical performance is not just a matter of technical motor skill; it also requires the ability to generate expressively different performances of the same piece of music according to the nature of intended structural and emotional communication". Similarly, [14] states that "Music performance is not unique in its underlying cognitive mechanisms". These arguments seem to imply that, in music, structure and performance both cooperate to evoke emotion. The question is how musical structure and expressive performance cooperate and interact with each other in the appraisal of music emotion.
There is a substantial body of research on this subject. For instance, [15] provided an overview of the state of the art in the field of computational modeling of expressive music performance and mentioned three important models. The KTH model consists of a set of performance rules that predict timing, dynamics, and articulation based on the current musical context [16]. The Todd model, in contrast, applies the notion of "analysis-by-measurement", since its empirical evidence comes directly from measurements of expressive performances [17]. Finally, there is the Mazzola model, which is mainly based on mathematical modeling [18] (see www.rubato.org).
Recently, a Machine Learning approach has also been developed. It builds computational models of expressive performance from a large set of empirical data (precisely measured performances by skilled musicians), in which the system autonomously seeks significant regularities via inductive machine learning and data mining techniques [19].
As seen, finding the hidden correlations between musical structure and performance and their effects on music emotion is a broad field of research. Fully mapping this relation is obviously beyond our scope. Here we intend to initiate an investigation on the subject, using our contextual descriptors as structural and performance ones.
The underlying musical aspects that influence the emotional state of listeners have been the subject of a number of previous studies, although few have isolated their individual influences, sometimes leading to conflicting qualitative results. In fact, it seems that a thorough attempt at combining these aspects of music is still to be made, despite some studies, such as [20], which described a quite comprehensive study with the "adjective circle". Other studies, such as [21] and [22], have examined the interaction of mode and tempo with music emotion. It would, however, be rather ambitious to attempt to evaluate the interactions between tempo, dynamics, articulation, mode, and timbre in a large factorial experiment.
We aim here to initiate an investigation, using our eight higher-level descriptors, into the prediction of appraised Valence and into how structural and performance features contribute to this particular musical emotion. We first show the prediction of Valence by each of our descriptors and by a linear model using them all. Next, we separate these descriptors into two groups, structural and performance, and create a linear model with each one to calculate Valence. The experiment first takes one piece of music and its corresponding Valence ground-truth and calculates the prediction with each descriptor and with the linear model using all descriptors. Then we take two other pieces of music and their Valence ground-truths and calculate their predictions with the structural and performance models.
2 The difficulty of predicting Valence
As seen in the results shown in [11] and [12], these models successfully predicted the dimension of Arousal, with high correlation with their ground-truths. However, the retrieval of Valence has proved difficult to measure with these models. This may be due to the fact that the previous models did not make extensive use of higher-level descriptors. The literature in this field names as a descriptor a model (usually a computational model) that predicts one aspect of music, emulating the perception, cognition or emotion of a human listener. While low-level descriptors account for perceptual aspects of music, such as loudness (perception of sound intensity) or pitch (perception of the fundamental partial), higher-level ones account for contextual musical features, such as pulse, tonality or complexity. These refer to the cognitive aspects of music and deliver one prediction for the music excerpt as a whole.
If this assumption is true, it is understandable why Valence, as a highly
contextual dimension of music emotion, is poorly described by models using
mostly low-level descriptors.
Intuitively, it was expected that Valence, as the measurement of happiness in music, would be mostly correlated with the predictions of higher-level descriptors such as key clarity (major versus minor mode), harmonic complexity, and pulse clarity. However, as described below, the experimental results pointed in a different direction.
3 Designing contextual musical descriptors
In 2007, during the Braintuning project (see the Discussion section for details), we were involved in the development of computational models for contextual descriptors of specific musical aspects. This effort resulted in eight higher-level music descriptors. Their design used a variety of audio processing techniques (e.g. chromagram, similarity function, autocorrelation, filtering, entropy measurement, peak detection, etc.) to predict specific contextual musical aspects. Their output is normalized between zero (normally meaning the lack of that feature in the analyzed music excerpt) and one (referring to the clear presence of such a contextual music aspect).
These eight descriptors were designed and simulated in Matlab, as algorithms written in the form of script files that process music stimuli as digital audio files with 16-bit resolution, 44.1 kHz sampling rate and one channel (mono).
To test and improve the development of these descriptors, behavioral data was collected from thirty-three listeners who were asked to rate the same features predicted by the descriptors. They rated one hundred short excerpts of music (five seconds each) from movie soundtracks. Their mean ratings were then correlated with the descriptors' predictions. After several experiments and adjustments, all descriptors presented correlation coefficients between 0.5 and 0.65 with their respective ground-truths. They are briefly described below.
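As an illustration of this validation step, the sketch below computes the Pearson correlation between a descriptor's per-excerpt predictions and the listeners' mean ratings. It is a minimal Python sketch (the original descriptors were Matlab scripts); the function name and the numbers in the example are hypothetical, not data from the study.

```python
import numpy as np

def validate_descriptor(predictions, mean_ratings):
    """Pearson correlation between a descriptor's per-excerpt predictions
    and the listeners' mean ratings for the same excerpts."""
    predictions = np.asarray(predictions, dtype=float)
    mean_ratings = np.asarray(mean_ratings, dtype=float)
    return np.corrcoef(predictions, mean_ratings)[0, 1]

# Hypothetical example: five excerpts, descriptor outputs in [0, 1]
# versus mean listener ratings rescaled to the same range.
pred = [0.12, 0.55, 0.80, 0.40, 0.95]
ratings = [0.20, 0.50, 0.70, 0.35, 0.90]
print(f"r = {validate_descriptor(pred, ratings):.2f}")
```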
3.1 Pulse Clarity
This descriptor measures the sensation of pulse in music. Pulse is here seen as a fluctuation of musical periodicity that is perceptible as "beating" at a sub-tonal frequency (below 20 Hz), therefore perceived not as tone (frequency domain) but as pulse (time domain). It can be of any musical nature (melodic, harmonic or rhythmic), as long as it is perceived by listeners as a fluctuation in time. The measuring scale of this descriptor is continuous, going from zero (no sensation of musical pulse) to one (clear sensation of musical pulse).
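A minimal sketch of how a pulse-clarity-like measure could be approximated is given below, assuming the librosa library for audio loading and onset-strength computation; the lag range and the normalization are illustrative choices, not the descriptor's actual implementation.

```python
import numpy as np
import librosa

def pulse_clarity(path):
    """Rough pulse-clarity estimate: autocorrelate the onset-strength
    envelope and take the strongest normalized peak at sub-tonal lags
    (here 0.25-2 s, i.e. roughly 0.5-4 Hz)."""
    y, sr = librosa.load(path, mono=True)
    env = librosa.onset.onset_strength(y=y, sr=sr)   # ~43 frames/s (hop 512)
    env = env - env.mean()
    ac = np.correlate(env, env, mode="full")[env.size - 1:]
    if ac[0] <= 0:                                   # silent or constant input
        return 0.0
    ac = ac / ac[0]                                  # normalize by lag-0 energy
    fps = sr / 512.0
    lo, hi = int(0.25 * fps), int(2.0 * fps)
    return float(np.clip(ac[lo:hi].max(), 0.0, 1.0))
```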
3.2 Key Clarity
This descriptor measures the sensation of tonality, or tonal center, in music. It relates to how tonal an excerpt of music is perceived to be by listeners, disregarding the specific key and focusing on how clearly it is perceived. Its scale is also continuous, ranging from zero (atonal) to one (tonal). Intermediate values, near the middle of the scale, tend to correspond to musical excerpts with sudden tonal changes or ambiguous tonalities.
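One common way to approximate such a measure, sketched below under the assumption that librosa is available, is to correlate the time-averaged chromagram with the 24 rotated Krumhansl-Kessler key profiles and keep the best correlation; this is only an illustration, not the descriptor's published implementation.

```python
import numpy as np
import librosa

# Krumhansl-Kessler major and minor key profiles (probe-tone ratings).
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def key_clarity(path):
    """Correlate the mean chroma vector with all 24 rotated key profiles
    and return the best correlation, clipped to [0, 1]: high values mean
    a clear tonal centre, values near zero mean no key fits well."""
    y, sr = librosa.load(path, mono=True)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)
    scores = [np.corrcoef(chroma, np.roll(profile, k))[0, 1]
              for profile in (MAJOR, MINOR) for k in range(12)]
    return float(np.clip(max(scores), 0.0, 1.0))
```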
3.3 Harmonic complexity
This descriptor measures the sensation of complexity conveyed by musical
harmony. In communication theory, musical complexity is related to entropy,
which can be seen as the degree of disorder of a system. However, here we
are interested in measuring the perception of its entropy, instead of the entropy
itself. For example, in acoustic terms, white noise could be seen as a very complex sound, yet it is perceived as a very simple, unchanging stimulus. The challenge here is to capture the cognitive sense of complexity.
Here we focused only on the complexity of musical harmony, leaving the
melodic and rhythmic complexity to further studies. The measuring scale of this
descriptor is continuous and goes from zero (no harmonic complexity
perceptible) to one (clear perception of harmonic complexity).
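A rough proxy for this idea, sketched below assuming librosa, is the normalized entropy of the time-averaged chroma distribution; the descriptor's actual entropy measurement is not documented here, so this is only an illustration.

```python
import numpy as np
import librosa

def harmonic_complexity(path):
    """Proxy for perceived harmonic complexity: normalized entropy of the
    time-averaged chroma distribution. A single clear chord concentrates
    energy in a few pitch classes (low entropy); dense, changing harmony
    spreads it out (entropy approaching 1)."""
    y, sr = librosa.load(path, mono=True)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)
    p = chroma / (chroma.sum() + 1e-12)
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return float(entropy / np.log2(12))   # 0 = one pitch class, 1 = uniform
```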
3.4 Articulation
In music theory, the term articulation usually refers to the way in which a melody is performed. If a pause is clearly noticeable between the notes of the melodic line, the articulation is said to be staccato, meaning "detached". On the other hand, if there is no pause between the notes of the melody, it is said to be legato, meaning "linked". This descriptor attempts to estimate the articulation from musical audio files, attributing to it an overall grade that ranges continuously from zero (staccato) to one (legato).
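The sketch below illustrates one possible audio-level approximation, assuming librosa: the fraction of time between consecutive detected onsets during which the signal stays above a silence threshold. The threshold, the onset detector and the frame alignment are illustrative assumptions, not the descriptor's actual design.

```python
import numpy as np
import librosa

def articulation(path, silence_db=-40.0):
    """Rough articulation grade: fraction of each inter-onset interval
    during which the RMS level stays above a threshold relative to the
    loudest frame. Values near 0 suggest staccato (gaps between notes),
    values near 1 suggest legato."""
    y, sr = librosa.load(path, mono=True)
    hop = 512
    rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y, hop_length=hop)[0])
    active = rms_db > rms_db.max() + silence_db        # frames above threshold
    onsets = librosa.onset.onset_detect(y=y, sr=sr, hop_length=hop)
    ratios = [active[a:b].mean() for a, b in zip(onsets[:-1], onsets[1:])
              if a < b <= active.size]
    return float(np.mean(ratios)) if ratios else 1.0
```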
3.5 Repetition
This descriptor accounts for the presence of repeating patterns in a musical
excerpt. These patterns can be melodic, harmonic or rhythmic. This is done by measuring the similarity of hopped time frames along the audio file, tracking repeated similarities occurring within a perceptible time delay (repetition rates of roughly 1 Hz to 10 Hz). Its scale ranges continuously from zero (no noticeable repetition within the musical excerpt) to one (clear presence of repeating musical patterns).
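A minimal illustration of tracking frame similarity at short lags is sketched below, assuming librosa for the chroma features; the lag range mirrors the 1-10 Hz repetition rates mentioned above, and averaging over lags is an arbitrary simplification of the tracking described in the text.

```python
import numpy as np
import librosa

def repetition(path):
    """Rough repetition score: cosine self-similarity of chroma frames at
    time lags of 0.1-1 s (repetition rates of about 1-10 Hz), averaged
    over lags and frames and clipped to [0, 1]."""
    y, sr = librosa.load(path, mono=True)
    hop = 512
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop)
    chroma = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-12)
    fps = sr / hop
    lags = range(max(1, int(0.1 * fps)), int(1.0 * fps) + 1)
    sims = [np.mean(np.sum(chroma[:, :-lag] * chroma[:, lag:], axis=0))
            for lag in lags if lag < chroma.shape[1]]
    return float(np.clip(np.mean(sims), 0.0, 1.0)) if sims else 0.0
```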
3.6 Mode
Mode is the musical term referring to one of the seven modes of the diatonic scale. The most well known are the major (first mode) and the minor (sixth mode). In the case of our descriptor, mode refers to a computational model that calculates, from an audio file, an overall output that ranges continuously from zero (minor mode) to one (major mode). It is somewhat fuzzy to intuit what its middle-range grades would stand for, but the intention of this descriptor is mostly to distinguish between major and minor excerpts, as there is still ongoing discussion on whether the major mode in itself carries a valence of appraised happiness and the minor mode accounts for sadness (see the Discussion section for a counter-intuitive result on this subject).
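A simple way to approximate such a grade, sketched below assuming librosa, is to compare how well the averaged chroma correlates with major versus minor key profiles; the mapping of the difference onto [0, 1] is an illustrative choice, not the descriptor's actual formula.

```python
import numpy as np
import librosa

# Krumhansl-Kessler profiles, as in the Key Clarity sketch above.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def mode_grade(path):
    """Rough major/minor grade: difference between the best major-profile
    and best minor-profile correlations, mapped from [-1, 1] to [0, 1]
    (0 = clearly minor, 1 = clearly major, ~0.5 = ambiguous or atonal)."""
    y, sr = librosa.load(path, mono=True)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)
    best = lambda prof: max(np.corrcoef(chroma, np.roll(prof, k))[0, 1]
                            for k in range(12))
    return float(np.clip(0.5 + 0.5 * (best(MAJOR) - best(MINOR)), 0.0, 1.0))
```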
3.7 Event Density
This descriptor refers to the overall amount of perceptually distinguishable, yet
simultaneous, events in a musical excerpt. These events can be melodic, harmonic or rhythmic, as long as they can be perceived as independent entities by our cognition. Its scale ranges continuously from zero (perception of only one musical event) to one (the maximum perception of simultaneous events that the average listener can grasp).
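The crude sketch below, assuming librosa, only approximates this idea through the rate of detected onsets rather than truly simultaneous events, and the perceptual ceiling constant is an arbitrary illustrative value, not one taken from the paper.

```python
import numpy as np
import librosa

def event_density(path, max_onsets_per_s=8.0):
    """Very rough event-density estimate: detected onsets per second,
    rescaled to [0, 1] against an assumed perceptual ceiling."""
    y, sr = librosa.load(path, mono=True)
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    duration = len(y) / sr
    rate = len(onsets) / max(duration, 1e-9)
    return float(np.clip(rate / max_onsets_per_s, 0.0, 1.0))
```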
3.8 Brightness
This descriptor measures the sensation of how bright a music excerpt is felt to
be. It is intuitive that this perception is somehow related to the spectral centroid, which accounts for the presence of partials with higher frequencies in the frequency spectrum of an audio file. However, other aspects can also influence this perception, such as attack, articulation, or the imbalance or lack of partials in other regions of the frequency spectrum. Its measurement goes continuously from zero (the excerpt lacks brightness, or is muffled) to one (the excerpt is clearly bright).
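A common brightness proxy, sketched below assuming librosa, is the proportion of spectral energy above a cutoff frequency; the 1.5 kHz cutoff is an illustrative assumption, not a value from the paper.

```python
import numpy as np
import librosa

def brightness(path, cutoff_hz=1500.0):
    """Rough brightness estimate: proportion of spectral energy above a
    cutoff frequency. 0 = muffled, 1 = all energy above the cutoff."""
    y, sr = librosa.load(path, mono=True)
    spec = np.abs(librosa.stft(y)) ** 2               # power spectrogram
    freqs = librosa.fft_frequencies(sr=sr)            # matches default n_fft
    high = spec[freqs >= cutoff_hz, :].sum()
    return float(high / (spec.sum() + 1e-12))
```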
4 Building a model to predict Valence
In the research on the temporal dynamics of emotion described in [11], Schubert created ground-truths with data collected from thirty-five listeners who dynamically rated the appraised emotion on a two-dimensional emotion plane, later mapped into two coordinates, or dimensions: Arousal and Valence. Variations in the listeners' ratings were sampled every second. The pruned data of these measurements, mean-rated and mapped into Arousal and Valence, created the ground-truths later used by Korhonen in [12], as well as in this work. Here, we calculated the correlation between each descriptor's prediction and Schubert's Valence ground-truth for one piece of music, the "Aranjuez concerto" by Joaquín Rodrigo. During the initial minute of this 2:45-long piece, the guitar plays alone (solo). It is then suddenly accompanied by the full orchestra, whose intensity fades towards the end, until the guitar once again plays the theme alone.
For this piece, the correlation coefficients between the descriptors' predictions and the Valence ground-truth are: event density, r = 0.59; harmonic complexity, r = 0.43; brightness, r = 0.40; pulse clarity, r = 0.35; repetition, r = 0.16; articulation, r = 0.09; key clarity, r = 0.07; mode, r = 0.05.
Then, a multiple regression linear model was created with all eight descriptors. The model employs a time frame of three seconds (related to the cognitive "now time" of music) and a hop size of one second to predict the continuous development of Valence. This model presented a correlation coefficient of r = 0.6484, which leads to a coefficient of determination of R² = 42%.
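To make the procedure concrete, the sketch below fits a least-squares model mapping per-frame descriptor outputs to a Valence curve and reports r and R². The frame data are random placeholders (in the study each frame aggregates three seconds of audio with a one-second hop), and the small ridge term is an illustrative numerical safeguard, not part of the original model.

```python
import numpy as np

def fit_valence_model(descriptor_tracks, valence, ridge=1e-6):
    """Least-squares fit of a per-second Valence curve from per-second
    descriptor tracks (shape [n_frames, n_descriptors]); returns the
    weights, the correlation r and the coefficient of determination R^2."""
    X = np.column_stack([descriptor_tracks, np.ones(len(descriptor_tracks))])
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ valence)
    pred = X @ w
    r = np.corrcoef(pred, valence)[0, 1]
    return w, r, r ** 2

# Hypothetical example: 165 one-second frames, 8 descriptor outputs each.
rng = np.random.default_rng(0)
tracks = rng.random((165, 8))
valence = tracks @ rng.random(8) + 0.1 * rng.standard_normal(165)
_, r, r2 = fit_valence_model(tracks, valence)
print(f"r = {r:.3f}, R^2 = {100 * r2:.0f}%")
```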
For the same ground-truth, Schubert's model used five music descriptors: 1) Tempo, 2) Spectral Centroid, 3) Loudness, 4) Melodic Contour and 5) Texture. The differentiated descriptor outputs were used as the model's predictors. Using time series analysis, he built an ordinary least squares (OLS) model for this particular music excerpt. Korhonen's approach used eighteen low-level descriptors (see [12] for details) to test several models designed with System Identification techniques. The best general model reported in his work was an ARX (Auto-Regressive with eXogenous inputs) model.
Table 1 below compares the results of the three models in terms of the best achieved R² (coefficient of determination) in the measurement of Valence for the Aranjuez concerto.
Table 1. Emotional dimension: VALENCE. Ground-truth: Aranjuez concerto.
This table shows that our model performed significantly better than the previous ones for this specific ground-truth. The last column of Table 1 shows the result achieved by the descriptor "event density" alone, the one that presented the highest correlation with the ground-truth; this single descriptor presented better results than the two previous models. These results seem to suggest that higher-level descriptors can in fact be successfully used to improve the dynamic prediction of Valence.
Figure 1 depicts the comparison between this ground-truth, given by the listeners' mean rating of Valence for the Aranjuez concerto, and the prediction given by our multiple regression model using all eight descriptors.
Fig. 1. Mean rating of the behavioral data for Valence (continuous line) and our model
prediction (dashed line).
Although the prediction curve presents some rippling when visually compared with the ground-truth (the mean-rating behavioral data), overall it follows the major variations of Valence along the performance time, which resulted in a high coefficient of determination.
As described in the next sections, the next step of this study was to distinguish between the performance and structural aspects of music and to study how they account for the prediction of Valence. Hence, we separated our eight contextual descriptors into these two groups and created two new linear models with them: one to predict the performance aspects influencing the appraisal of Valence, and another to predict its structural aspects.
4.1 Performance Model
This model is formed by the higher-level descriptors related to the dynamic
aspects of musical performance. These descriptors try to capture music
features that are manipulated mostly by the performer(s) instead of the aspects
already described in the musical structure (i.e. its composition). They are
commonly related to musical features such as: articulation, dynamics, tempo
and micro-timing variability.
As the "dynamics" aspect is related to Arousal, as seen in [11, 12], and the examples studied here kept their "tempo" approximately unchanged, we focused on the "pulse clarity" and "brightness" aspects, since they have also been used as descriptors of expressive performance in other studies, such as [21].
We considered the following descriptors as belonging to performance: 1) articulation, 2) pulse clarity and 3) brightness.
Articulation is the descriptor that measures how much similar musical events are perceptually separated from each other. This is a fundamental component of expressive performance that has been studied in many works, such as [22], which analyzed the articulation strategies applied by pianists in expressive performances of the same scores. Articulation may also be seen as a musical trademark or fingerprint that helps to identify a musical genre or a performer's style.
Pulse clarity is the descriptor that measures how clear, or perceptible, the pulse is in a musical performance. This is chiefly related to the distinction between expressive performances characterized by an interpretation closer to Ad Libitum (without a clear pulse) or to Marcato (with a clear pulse).
Brightness is the descriptor that accounts for the musical aspects related to the variation of the perception of brightness along an expressive performance. Its scale covers from muffled (without brightness) to bright (brilliant).
4.2 Structural Model
Structural descriptors are the ones that account for the static, or structural, aspects of a piece of music given by the composition's musical score, or any other kind of notation, and are therefore supposed to be little influenced by expressive performance. Several studies have examined them, such as [23]. We considered the following as structural descriptors: 1) mode, 2) key clarity, 3) harmonic complexity, 4) repetition and 5) event density.
Mode is the descriptor that grades the modality of the musical structure. If the structure of the analyzed excerpt is clearly minor, the value will be near zero; if it is clearly major, the value will be near one. If the excerpt presents an ambiguous tonality, or if it is atonal, the value will be around 0.5.
Key Clarity measures how tonal a particular excerpt of musical structure is. Its scale goes from atonal (e.g. electro-acoustic, serial, spectral music structures) to clearly tonal structures (e.g. diatonic, modal, minimalist structures).
Harmonic Complexity is the descriptor that refers to the complexity of a structure in terms of its harmonic content, which is related to the perceptual entropy of 1) the chord progression and 2) the chord structures.
Repetition describes the amount of repeating similar patterns found in the musical structure. This repetition has to happen at a sub-tonal frequency, and is thus perceived as rhythmic information.
Event Density is the descriptor that accounts for the amount of perceptible simultaneous musical events found in an excerpt of the structure. They can be melodic, harmonic or rhythmic, as long as they can be distinctly perceived by ear.
4.3 Valence prediction with Structural and Performance Models
As before, we used the ground-truth developed in [11], in which thirty-five listeners dynamically rated the appraised music emotion in a circumplex model, for several pieces of music, later mapped to the dimensions of Arousal and Valence.
For this part we chose the Valence ratings of two musical pieces: 1) "Pizzicato Polka" by Strauss, and 2) "Morning", from Grieg's Peer Gynt. Their Valence ground-truths were chosen mainly because these pieces present a repeating musical structure with slight changes in expressive performance, so that both the structural and performance models could be tested and compared.
Figures 2 and 3 show the comparison between each Valence ground-truth and its prediction by the structural and performance models, both created using the multiple regression technique.
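The sketch below shows how two such sub-models could be fitted and compared, using the descriptor grouping defined above. The descriptor tracks and ratings are random placeholders, and the least-squares fit stands in for the multiple regression described in the text.

```python
import numpy as np

DESCRIPTORS = ["articulation", "pulse_clarity", "brightness",
               "mode", "key_clarity", "harmonic_complexity",
               "repetition", "event_density"]
PERFORMANCE = ["articulation", "pulse_clarity", "brightness"]
STRUCTURAL = ["mode", "key_clarity", "harmonic_complexity",
              "repetition", "event_density"]

def sub_model_correlation(tracks, valence, names):
    """Fit a least-squares model on the selected descriptor columns and
    return the correlation of its prediction with the Valence ground-truth."""
    idx = [DESCRIPTORS.index(n) for n in names]
    X = np.column_stack([tracks[:, idx], np.ones(len(tracks))])
    w, *_ = np.linalg.lstsq(X, valence, rcond=None)
    return float(np.corrcoef(X @ w, valence)[0, 1])

# tracks: [n_seconds, 8] descriptor outputs; valence: listeners' mean rating.
rng = np.random.default_rng(1)
tracks, valence = rng.random((120, 8)), rng.random(120)
print("performance r =", round(sub_model_correlation(tracks, valence, PERFORMANCE), 2))
print("structural  r =", round(sub_model_correlation(tracks, valence, STRUCTURAL), 2))
```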
Figure 2 shows the "Pizzicato Polka" example. Three overlapping curves are seen: the mean rating, the structural model prediction and the performance model prediction. The "Mean Rating" curve is the Valence ground-truth; the "Structural Model" curve is the prediction of the structural linear model, just as the "Performance Model" curve is the prediction of the performance model.
Fig. 2. Structural and performance models for “Pizzicato Polka”, by Strauss.
"Pizzicato" is a simple and tonal orchestral piece in which the strings are mostly played pizzicato (i.e. plucked). Its musical parts are repeated several times, and each part even has two similar sub-parts (i.e. A = A1 + A2). The musical parts that compose this piece of music are shown in Table 2.
Table 2. Musical parts of “Pizzicato”.
The second and more complex example is shown in Figure 3: the rating and predictions of Valence for the piece of music named "Morning". The figure shows the rating (Valence ground-truth) and its predictions by the structural and performance models.
Fig. 3. Structural and performance models for "Morning", from Grieg's Peer Gynt.
"Morning" has a more elaborate orchestration, whose melody alternates between solo instruments and tonalities (key changes), although it still has a repetitive musical structure. The musical parts that constitute this piece are shown in Table 3, where an extra column was included to describe what these changes represent in terms of musical structure.
Table 3. Musical parts of “Morning”.
Finally, Table 4 shows the correlation coefficients of the structural and performance models for each piece of music.
Table 4. Experimental results for the overall correlation between the Valence ground-truths and the Performance and Structural models.
As seen in the table above, the correlation coefficients for these two pieces of music are approximately the same, with the structural model's correlation higher than the performance model's for the overall prediction of Valence.
5 Discussion
This work was developed during the project "Tuning your Brain for Music", the Braintuning project (www.braintuning.fi). An important part of it was the study of acoustic features retrieved from musical excerpts and their correlation with specific appraised emotions. Pursuing this goal, we designed the contextual descriptors briefly described here. They were initially conceived because of the lack of such descriptors in the literature. In Braintuning, a fairly large number of studies on the retrieval of emotional connotations in music were carried out. As seen in the previous models, for the dynamic retrieval of contextual emotions such as the appraisal of happiness (represented here by the dimension of Valence), low-level descriptors are not enough, since they do not take into consideration the contextual aspects of music.
It was interesting to notice that the prediction of Valence by the descriptor "Event Density" presented the highest correlation with the Valence ground-truth, while the predictions of "Key Clarity" and "Mode" correlated very poorly. This seems to indicate that, at least in this particular case, the perception of major or minor mode in music (represented by "Mode") or of its tonal center (given by "Key Clarity") is not as relevant to predicting Valence as might be intuitively inferred. What counted most here was the amount of simultaneous musical events (given by "event density"), remembering that by "event" we understand here any perceivable rhythmic, melodic or harmonic stimulus. The first part of this experiment chose the piece "Aranjuez" because it was the one for which the previous models presented the lowest correlation with the Valence ground-truth. Although the result presented here is enticing, further studies are definitely needed in order to establish solid evidence.
The second part of this experiment studied the effects of expressive performance and musical structure on the appraisal of Valence. In "Pizzicato", the rating curve starts near zero and then abruptly plummets to negative values (i.e. sad). During musical part A, the rating rises until it becomes positive (i.e. happy), when part B starts. Both models approximately follow the rating and present a peak where the rating inverts, as part B starts. They both present a negative peak around 35s, where part A repeats for the first time; at the same time the rating declines a little but still remains positive (happy). Maybe this is related to the listeners' memory of part A, which the models cannot capture, since they do not take their previous predictions into consideration. When part C starts, the rating rises sharply, as this passage is appraised as a particularly "happy" one. Here, the performance model seems to present higher values (although wavy) than the structural model, until the beginning of part D, around 68s. Parts A-B-A repeat again around 92s, where the rating shows a shape similar to before, although much narrower, maybe because the listeners have "recognized" this part. Here the performance model follows the rating closely. The structural model presents an abrupt rise between 110s and 120s, where part B is taking place. In the Coda, both models present positive predictions, but the rating is negative.
In "Morning", the rating starts negative (sad) and rises continuously until it reaches positive values (happy), when part A3 starts. This is understandable since part A3 begins with an upward key change, which indeed conveys an appraisal of joy. The rating keeps rising until the next key change in part A5, and reaches its highest values in part A6, from 50s to 80s, when the whole orchestra plays the "A" theme together, back in the original key. Both models start from values close to zero. They show a steep rise from part A1 to part A2 (more visible in the performance model prediction). When part A3 starts, both models' predictions decrease and the performance model goes to the negative side. This may have happened because articulation and pulse clarity, the descriptors within the performance model, decrease in value at this passage, as well as at 40s, when A5 starts. During part A6, the structural model prediction is more similar to the rating than the performance model's, which makes sense since this is mostly a structural change and the performance parameters remain almost still. The rating decreases when part B1 starts at 78s. This is expected since, in this part, the music mode changes from major to minor. Accordingly, at this moment, the performance model prediction remains almost unchanged, while the structural model prediction rises from negative to near-zero (or positive) values and shows a peak around the beginning of part B2. When A7 starts at 123s the rating drops to negative values and then rises continuously until 138s, when the Coda starts. Neither model follows this behavior: the structural model prediction remains positive, as in every other part "A", and the performance model is also little affected by this passage.
6 Conclusion
This work set out to investigate the use of contextual descriptors for predicting the dynamic variation of music emotions. We chose to study the emotional dimension of Valence (here referred to as the perception of happiness in music) because it is a highly contextual aspect of music and is known to be particularly difficult to predict with computational models.
We briefly introduced eight contextual descriptors previously developed by us.
They are: event density, harmonic complexity, brightness, pulse clarity,
repetition, articulation, key clarity and mode.
We used the same music stimuli and corresponding Valence ground-truths as two important models from the literature. First, we selected a piece of music for which the previous models did not reach satisfactory correlations in the prediction of Valence. We then predicted Valence with each descriptor and with a linear model built with all descriptors. The descriptor with the highest correlation was "event density", presenting a coefficient of determination higher than those of the previous models.
Secondly, we studied the relation between the appraisal of Valence and the expressive performance and musical structure aspects. Our descriptors were separated into two groups, one covering the structural aspects (mode, key clarity, harmonic complexity, repetition and event density) and the other the performance ones (articulation, pulse clarity and brightness). Two models, one for each descriptor group, were then created and named structural and performance. Although these models did not reach outstanding coefficients of correlation with the ground-truths (around 0.33 for the performance model and 0.44 for the structural one), they reached very similar coefficients for two stylistically very distinct pieces of music. This seems to indicate that the results of these models, despite their simplicity and limitations, point to a promising outcome in further work.
The finding that the structural model presents a higher correlation with the ground-truth than the performance one also seems to make sense. The structural model accounts for a greater portion of musical aspects: the structure comprises the musical composition, arrangement, orchestration, and so forth. In theory, it carries "the seed" of all the emotional aspects that the expressive performance is supposed to bring about.
There is a great number of topics that can be tested in further investigations on this subject. For instance, we did not take into consideration the memory aspects that certainly influence the emotional appraisal of Valence. New models including this aspect should consider principles found in the literature, such as the forgetting curve and the novelty curve.
We used rating data from the ground-truth of another experiment which, despite yielding enticing results, was not designed for this kind of experiment. In a further investigation, new listener rating data should be collected, with different performances of the same musical structure, as well as different structures with similar performances. This is a quite demanding task, but it seems to be the correct path to follow in order to enable the development of better descriptors and models.
7 Acknowledgements
We would like to thank the BrainTuning project (www.braintuning.fi), FP6-2004-NEST-PATH-028570, the Music Cognition Group at the University of Jyväskylä (JYU), and the Interdisciplinary Nucleus of Sound Communication (NICS) at the State University of Campinas (UNICAMP). We are especially grateful to Mark Korhonen for sharing the ground-truth data from his experiments with us.
References
1. Sloboda, J. A. and Juslin, P. (Eds.): Music and Emotion: Theory and
Research. Oxford: Oxford University Press. ISBN 0-19-263188-8. (2001)
2. Ekman, P.: An argument for basic emotions. Cognition & Emotion, 6 (3/4):
169–200, (1992).
3. Juslin, P. N., & Laukka, P.: Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129, 770-814. (2003)
4. Russell, J.A.: Core affect and the psychological construction of emotion. Psychological Review, Vol. 110, No. 1, 145-172. (2003)
5. Laukka, P., Juslin, P. N., & Bresin, R.: A dimensional approach to vocal
expression of emotion. Cognition and Emotion, 19, 633-653. (2005)
6. Scherer, K. R., & Zentner, K. R.: Emotional effects of music: production rules. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361-392). Oxford: Oxford University Press (2001)
7. Tzanetakis, G., & Cook, P.: Musical Genre Classification of Audio Signals.
IEEE Transactions on Speech and Audio Processing, 10(5), 293-302. (2002)
8. Leman, M., Vermeulen, V., De Voogdt, L., Moelants, D., & Lesaffre, M.:
Correlation of Gestural Musical Audio Cues. Gesture-Based Communication in
Human-Computer Interaction. 5th International Gesture Workshop, GW 2003,
40-54. (2004)
9. Wu, T.-L., & Jeng, S.-K.: Automatic emotion classification of musical
segments. Proceedings of the 9th International Conference on Music
Perception & Cognition, Bologna, (2006)
10. Gomez, E., & Herrera, P.: Estimating the Tonality of Polyphonic Audio Files: Cognitive Versus Machine Learning Modelling Strategies. Proceedings of the 5th International ISMIR 2004 Conference, Barcelona, Spain, October 2004. (2004)
11. Schubert, E.: Measuring emotion continuously: Validity and reliability of the two-dimensional emotion space. Aust. J. Psychol., vol. 51, no. 3, pp. 154–165. (1999)
12. Korhonen, M., Clausi, D., Jernigan, M.: Modeling Emotional Content of Music Using System Identification. IEEE Transactions on Systems, Man and Cybernetics, Volume 36, Issue 3, pages 588-599. (2006)
13. Sloboda, J. A.: Individual differences in music performance. Trends in Cognitive Sciences, Volume 4, Issue 10, Pages 397-403. (2000)
14. Palmer, C.: Music Performance. Annual Review of Psychology. 48:115-38.
(1997)
15. Widmer, G., Goebl, W.: Computational Models of Expressive Music Performance: The State of the Art. Journal of New Music Research, Vol. 33, No. 3, pp. 203–216. (2004)
16. Friberg, A., Bresin, R., Sundberg, J.: Overview of the KTH rule system for music performance. Advances in Cognitive Psychology, special issue on Music Performance, 2(2-3), 145-161. (2006)
17. Todd, N.P.M.: A computational model of Rubato. Contemporary Music
Review, 3, 69–88. (1989)
18. Mazzola, G., Göller, S.: Performance and interpretation. Journal of New
Music Research, 31, 221–232. (2002)
19. Widmer, G., Dixon, S. E., Goebl, W., Pampalk, E., Tobudic, A.: Search of
the Horowitz factor. AI Magazine, 24, 111–130. (2003)
20. Hevner, K.: Experimental studies of the elements of expression in music.
American Journal of Psychology, Vol. 48, pp. 246-268. (1936)
21. Gagnon, L., Peretz, I.: Mode and tempo relative contributions to "happy-sad" judgments in equitone melodies. Cognition and Emotion, vol. 17, pp. 25-40. (2003)
22. Dalla Bella, S., Peretz, I., Rousseau, L., Gosselin, N.: A developmental study of the affective value of tempo and mode in music. Cognition, 80(3), B1-B10. (2001)
21. Juslin, P. N.: Cue utilization in communication of emotion in music performance: relating performance to perception. J Exp Psychol Hum Percept Perform, 26(6), 1797–1813. (2000)
22. Bresin, R., Battel, G.: Articulation strategies in expressive piano performance. Journal of New Music Research, Vol. 29, No. 3, Sep 2000, pp. 211-224. (2000)
23. Ong, B. S.: Towards Automatic Music Structural Analysis: Identifying Characteristic Within-Song Excerpts in Popular Music. Doctoral dissertation, Department of Technology, Universitat Pompeu Fabra. (2005)
2. Panorama dos modelos computacionais aplicados à musicologia cognitiva²
Marcelo Gimenes
Núcleo Interdisciplinar de Comunicação Sonora
Universidade Estadual de Campinas
[email protected]
Abstract: This article presents an overview of the state of the art of the computational models of interest to cognitive musicology. Some of them are inspired by natural phenomena, attempting to imitate, for example, processes carried out by the human mind, while others have no such concern. Different models may co-exist within more complex models. The systems are organized following the flow of musical information, from the perception of sounds and the acquisition of musical knowledge to the manipulation of this knowledge in creative processes. Among the approaches presented are rule-based and grammar-based systems and systems that use machine learning. In addition, models based on evolutionary computation (e.g., genetic algorithms) and on artificial life are also presented.
Keywords: cognitive musicology, computational models, artificial intelligence
1. Introduction
Among the many transformations undergone by science during the twentieth century, the emergence of computing, of artificial intelligence and of brain imaging techniques, together with the decline in popularity of behavioral psychology, among other factors, led to what we call the cognitive revolution (Huron, 1999). Progressively, a growing interest in the study of memory, attention, pattern recognition, concept formation, categorization, reasoning and language (Huron, 1999) occupied the space that had previously belonged to behavioral psychology.
In this context, the cognitive sciences emerged as an interdisciplinary field of research that brings together especially philosophy, experimental psychology, the neurosciences and computing with the goal of studying the nature and structure of cognitive processes. To this end, a particularly important role is played by computational modeling, since it provides a formal representation of knowledge and allows the experimental verification of different cognitive theories.
² Original reference for this work: Gimenes, M. (2011). "Panorama dos modelos computacionais aplicados à musicologia cognitiva." Revista Cognição & Artes Musicais 3(2).
Accompanying these transformations, musicology, especially in recent decades, has adopted a perspective in which music is not only seen as a work of art but, in particular, as a process resulting from the action of various agents (musicians, listeners, etc.) (Honing, 2006). This view led to new musicological strands that began to borrow the rigorous scientific method (testing and falsification), the formalization of knowledge (computational models) and empiricism (the search for evidence).
In view of these facts, cognitive musicology (also known as music cognition or computational musicology) has in recent decades progressively attracted more and more researchers interested in studying musical thought or, in other words, the musical habits of the mind (Huron, 1999). Being a branch of the cognitive sciences, cognitive musicology has the same interdisciplinary character, bringing together theories and methods developed by philosophy (e.g., theories of knowledge), psychology (e.g., experimentalism), the neurosciences (e.g., brain imaging) and computer science (e.g., simulation).
The object of study of cognitive musicology is, therefore, the representation and processing (e.g., acquisition, storage, generation) of musical knowledge by the mind, for which it seeks support in computational models. With their aid, simulations attempt to demonstrate theories about human cognitive processes. Obviously, the closer the model is to the characteristics of these processes, the closer it will be to achieving that goal. It is known, however, that these models have not yet fully achieved the goal of evaluating and falsifying the theories they represent (Honing, 2006).
Being an intelligent activity, music offers abundant material for the investigation of human cognitive activities. In computer science, the area that explores intelligent behavior is called artificial intelligence. Broadly speaking, two paradigms are used. The first, called symbolic modeling, explicitly represents the parts of the problem under analysis through a vocabulary of symbols corresponding to objects and/or concepts, and may have a model of the world in which it operates (Geraint Wiggins & Smail, 2000). Understanding the result of the system's operations is facilitated by the semantic correspondence with these symbols.
The second paradigm adopts a sub-symbolic approach, also known as connectionist. Connectionist systems organize and manipulate knowledge through so-called neural networks, a system of nodes (simple processors) that are interconnected, (loosely) simulating the connections of neurons in the brain. Since these processors have no explicit relation of meaning to symbols in the real world, their operation is difficult to interpret.
With these preliminary considerations in place, the next sections present an overview of the various computational models used by cognitive musicology. Some of them, as we will see, are concerned with implementing theoretical models of human cognition and are therefore of direct interest to cognitive musicology. Others, however, adopt an "engineering" stance, more oriented towards the result (musical creation) than towards the description of such models. We chose to include the latter because of the interest they raise and because they have many parallels with the former.
Broadly, the sections are organized so as to follow the flow of musical information, from the perception of sounds and the acquisition and representation of knowledge to processes of music generation. Before presenting these models, the next section, "2. Experiments in Musical Intelligence", introduces Experiments in Musical Intelligence, a system that became a reference in the field, in order to give an overview of how computational systems can exhibit intelligent behavior. In the penultimate section, we close this overview with the system Musical Interactive Environments (iMe), which explicitly adopts cognitive models to explore musical evolution.
2. Experiments in Musical Intelligence
David Cope (1991) began the Experiments in Musical Intelligence (EMI) project about 30 years ago, aiming at the computational simulation of musical styles. The initial idea was to create a system that could incorporate the way he manipulated his own musical ideas. If at any moment he felt the need for help because of a creative block, for example, the system could be used to automatically generate a number of new measures in the same way he would have written them himself.
The initial implementations of this system encoded musical knowledge through part-writing rules. Cope reports that the results were not very satisfactory and that, only "after much trial and error", the system produced "flavorless music that basically adhered to these rules" (Cope, 1999, p. 21). Building on this experience, and having overcome the first obstacles, Cope went on to face a series of other questions, such as what would be the best way to segment the original pieces, or how the segments should be reorganized so that the music generated by the system would make musical sense.
Cope observed that composers tend to reuse certain structures throughout their work and that these end up characterizing their musical styles. He found that these elements last between 2 and 5 beats (7 to 10 melodic notes), often combine melodic, harmonic and rhythmic structures, and normally occur four to ten times in a piece (Cope, 1999, p. 23). Cope called these recurring elements "signatures".
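As a toy illustration of the idea of recurring elements (and not of Cope's actual algorithm), the sketch below counts melodic-interval n-grams in a symbolic melody and keeps those that recur at least a minimum number of times, echoing the 4-10 occurrences reported above; the melody and the thresholds are hypothetical.

```python
from collections import Counter

def find_signatures(melody_midi, min_len=3, max_len=6, min_count=4):
    """Count recurring melodic-interval n-grams and keep those appearing
    at least min_count times (a crude stand-in for 'signatures')."""
    intervals = [b - a for a, b in zip(melody_midi, melody_midi[1:])]
    counts = Counter(tuple(intervals[i:i + n])
                     for n in range(min_len, max_len + 1)
                     for i in range(len(intervals) - n + 1))
    return {ngram: c for ngram, c in counts.items() if c >= min_count}

# Hypothetical melody as MIDI pitches; a short motif repeats several times.
melody = [60, 62, 64, 65, 64, 62, 60, 62, 64, 65, 64, 62,
          67, 69, 60, 62, 64, 65, 64, 62, 60, 62, 64, 65, 64, 62]
print(find_signatures(melody))
```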
At a later stage, Cope began experimenting with Bach chorales, segmenting them at each beat of the measures. The system analyzed a body of musical pieces and extracted the signatures, which were then categorized into lexicons. The system also stored the notes to which the voices moved from one beat of the measure to the next. Again, the results were unsatisfactory: new pieces tended to wander, without a defined large-scale structure. The problem, this time, was that the logic of the musical phrases was not being observed.
To solve this problem, information about global structures had to be incorporated, together with the rules for moving from one note to another. New analysis modules were added to EMI to allow the preservation of the place each segment occupied in the sections of the original pieces. The "character" of each beat, defined by elements such as rhythm and number of notes, also had to be preserved in order to ensure that the music produced by the system conveyed a sense of continuity. Indeed, once these initial obstacles were overcome, the pieces the system went on to produce were quite convincing, especially when played by human musicians.
If, on the one hand, EMI is capable of simulating certain musical styles, on the other, the architecture of the system as a whole is extremely complex. In short, the process begins with the preparation of a musical database, a manual, tedious and time-consuming task that depends entirely on the user's musical experience. A series of similar pieces has to be chosen to ensure that the final result is consistent. Key, tempo and meter must be considered in this analysis. Cope once mentioned that this initial phase, from the selection to the encoding of the pieces, took several months of work (Muscutt, 2007).
Once the musical database is ready, EMI analyzes the pieces and derives musical signatures and rules for composition. A full pattern-matching algorithm is applied to the input material and all possibilities (partial or total results) are computed statistically. All segments are marked for hierarchical structural and harmonic functions. The connectivity of the structures is also checked for melody, accompaniment and harmony.
During recombination (the generation of new material), the signatures must survive, keeping their original form (intervallic relations) and local context. The global structure of one of the original compositions is used as a reference for the new pieces produced. The system fixes the signatures in their original locations and then fills the gaps based on the rules found during the statistical analysis. For this, the system uses an Augmented Transition Network (ATN) (Woods, 1970), a structure used in the definition of natural languages and "designed to produce logical sentences from fragments of phrases and pieces that have been stored according to sentence function" (Cope, 1991, p. 26)³.
Finally, Cope listens to each of the pieces generated by the system and keeps those he considers most convincing, on average one in every four or five pieces, the rest being discarded (Muscutt, 2007).
³ A deeper treatment of this topic is beyond the scope of this text. Further information can be found in (Woods, 1970).
3. Percepção musical
As pessoas são capazes de fazer generalizações e de aprender conceitos
musicais elementares (e.g., alturas, escalas) a partir de exemplos musicais.
Esse conhecimento, uma vez adquirido, passa a ser o ponto de partida para a
apreciação de novas peças musicais (Cambouropoulos, 1998, p. 31). A
modelagem da percepção humana envolve, portanto, a descoberta de
estruturas de diferentes tipos e hierarquias.
Cambouropoulos (1998) propôs um modelo computacional teórico denominado
Teoria Geral Computacional da Estrutura Musical (General Computational
Theory of Musical Structure - GCTMS) que tem por objetivo precisamente
descrever os componentes estruturais da música. O modelo propõe captar
elementos que seriam reconhecidos por um ouvinte e, conseqüentemente,
inclui conceitos típicos das habilidades cognitivas humanas (e.g., abstração,
reconhecimento de identidades e/ou semelhanças e categorização).
O GCTMS é constituído por uma série de componentes que abordam
separadamente cada tarefa analítica. Um deles, a Representação de Intervalos
de Alturas Gerais (General Pitch Interval Representation - GPIR), codifica a
informação musical. O Modelo de Detecção de Limites Locais (Local Boundary
Detection Model - LBDM) é responsável pela segmentação e os Modelos de
Estruturas de Acentuação e Métrica (Accentuation and Metrical Structure
Models - AMSM), pela definição de modelos estruturais.
Segundo o autor, esse sistema não requer que a música seja previamente
marcada com elementos de nível estrutural e pode consistir em apenas uma
seqüência de eventos simbólicos (notas, etc.) que o sistema traduz para a sua
representação interna. Uma vez obtida a representação, o próximo passo é a
3 O aprofundamento desse tema fugiria ao escopo deste texto. Maiores informações podem ser
obtidas em (Woods, 1970).
31"
NICS Reports
segmentação do fluxo musical, para a qual o GCTMS leva em conta princípios
da psicologia Gestalt (Bod, 2001).
A palavra Gestalt significa "forma" em alemão e contém a idéia de que os
sentidos humanos são orientados pela percepção do todo (e.g., uma entidade
física, psicológica ou simbólica) antes da percepção das partes. Grupamentos
melódicos, por exemplo, podem ser definidos em razão da sua similaridade
(movimento ascendente e/ou descendente), ou proximidade (ocorrência de
pausas). Esses grupamentos são realizados pela memória de curto prazo 4
(Snyder, 2000).
Os conceitos da Gestalt vêm sendo adotados por diversos pesquisadores
(McAdams, 1984; Polansky, 1978; Tenney & Polansky, 1980). Deutsch (1982a, 1982b), por exemplo, analisa como as regras da Gestalt
podem ser aplicadas a combinações de notas. A Teoria Geradora da Música
Tonal (Generative Theory of Tonal Music - GTTM) de Lerdahl e Jackendoff
(1983) também usa princípios da Gestalt para definir segmentos e
agrupamentos.
A segmentação musical, uma questão básica para muitos dos sistemas que exploram a cognição e/ou a análise musical, permanece, em grande medida e apesar dos muitos progressos alcançados até hoje, um problema de difícil solução, em função da infinidade de parâmetros (melodia, ritmo, etc.) e de níveis de hierarquia a serem considerados. Em muitos casos, os segmentos se
sobrepõem, o que indica a possibilidade de haver várias soluções aceitáveis.
Para lidar com esses problemas, diversos sistemas aplicam filtros para
simplificar a entrada de dados. Ao invés de considerar a altura das notas, por
exemplo, podem ser usadas as distâncias intervalares (Deutsch, 1982b).
Obviamente, esta e outras estratégias têm o potencial de comprometer os
resultados da segmentação, algo que deve ser levado em consideração caso a
caso.
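A título de ilustração, o esboço abaixo (em Python, com nomes de função meramente ilustrativos) mostra esse tipo de filtro: a melodia passa a ser representada pelas distâncias intervalares, de modo que a altura absoluta é descartada e transposições produzem a mesma seqüência.

```python
# Esboço hipotético: converte alturas MIDI em distâncias intervalares,
# descartando a altura absoluta como forma de simplificar a entrada.
def alturas_para_intervalos(alturas):
    """Recebe uma lista de alturas MIDI e devolve os intervalos sucessivos."""
    return [b - a for a, b in zip(alturas, alturas[1:])]

# A mesma melodia transposta gera a mesma seqüência de intervalos.
print(alturas_para_intervalos([60, 62, 64, 62, 67]))  # [2, 2, -2, 5]
print(alturas_para_intervalos([65, 67, 69, 67, 72]))  # [2, 2, -2, 5]
```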
Diversos sistemas (Baker, 1989a, 1989b; Camilleri, Carreras, & Duranti, 1990;
Chouvel, 1990; Hasty, 1978) adotam diferentes algoritmos para lidar com a
segmentação. O LBDM, mencionado acima, constrói uma representação de
intervalos a partir da seqüência de notas musicais. Em seguida, tenta detectar
"descontinuidades perceptuais" ou "limites de percepção", através de
parâmetros como duração (notas longas/curtas) e saltos melódicos. Para
descobrir os pontos máximos de mudança local são aplicadas duas regras
4 Em um modelo funcional simplificado, a memória pode ser descrita através de três processos
(memória ecóica, memória de curto prazo e memória de longo prazo) que correspondem a
diferentes níveis temporais da experiência musical. A memória de curto prazo processa
eventos separados por mais de 63 milissegundos (16 eventos por segundo), o nível dos
grupamentos melódicos e rítmicos (Snyder, 2000).
32"
NICS Reports
(regra de mudança de identidade e regra de proximidade), inspiradas nos princípios de
semelhança e proximidade da Gestalt. Com base nessa análise, para cada par
de notas de uma melodia é atribuído um coeficiente de descontinuidade que
determina a "força da divisão".
Thom et al. (2002) apresentaram uma revisão abrangente de diversos algoritmos de segmentação melódica, comparando os resultados com a segmentação executada por músicos. Entre os algoritmos analisados encontra-se, além do já mencionado LBDM, o sistema Grouper, proposto por Temperley (2004), que se baseia em um conjunto de regras de preferência (lacunas, extensão de frase, paralelismo métrico, etc.) adaptadas da já citada GTTM (Lerdahl & Jackendoff, 1983).
O modelo Implicação-Realização (Narmour, 1990), também inspirado em
princípios da Gestalt, envolve a análise de processos que ocorrem na
percepção de estruturas melódicas. "Estruturas de implicação" são as
expectativas que orientam a percepção e a criação musical e correspondem às
influências estilísticas recebidas através da exposição a contextos musicais.
Essas estruturas conduzem a "estruturas de realização", que são arquétipos
para possíveis continuações das estruturas de implicação.
Alguns pesquisadores usam o conceito de agentes inteligentes para definir
critérios e implementar algoritmos de segmentação. Gimenes (2008), por
exemplo, aplica conceitos da Gestalt a uma combinação de "informações
perceptivas" (e.g., direção melódica, salto melódico, o intervalo melódico entre
ataques, etc.), extraídas dos "órgãos sensoriais" de agentes inteligentes. No
sistema Cypher (Rowe, 2004) categorias diferentes de agentes são
especializadas em parâmetros musicais diferentes (harmonia, registro,
dinâmica, etc.).
Após a segmentação, uma vez definidas as estruturas locais, muitas vezes é necessário, em função da tarefa analítica em questão, que estas sejam comparadas. Estabelecer que duas estruturas são iguais é algo relativamente
fácil de fazer. Encontrar estruturas "semelhantes", por outro lado, é algo bem
mais difícil.
Visando a contribuir para a solução desta questão, Martins et al. (2005)
propuseram um algoritmo para medir a similaridade entre sub-seqüências em
um espaço geral rítmico usando uma estrutura chamada Vetor de Coeficientes
de Similaridade. Neste modelo, capaz de comparar estruturas rítmicas de
tamanhos diferentes, todas as sub-seqüências de um determinado ritmo são
comparadas. Uma subdivisão hierárquica das seqüências de ritmo é feita em
vários níveis e uma matriz de distância para cada nível é calculada usando
uma medida conhecida como "distância de bloco". A informação sobre a similaridade das sub-estruturas rítmicas é então recuperada a partir das matrizes de distâncias e codificada no Vetor de Coeficientes de
Similaridade5.
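Apenas como ilustração da medida básica empregada (e não do algoritmo completo de Martins et al.), o esboço abaixo calcula a "distância de bloco" entre sub-seqüências rítmicas de mesmo tamanho; os nomes de função são hipotéticos.

```python
# Esboço hipotético da "distância de bloco" entre seqüências rítmicas
# representadas como vetores de durações.
def distancia_de_bloco(ritmo_a, ritmo_b):
    """Soma das diferenças absolutas entre durações correspondentes."""
    assert len(ritmo_a) == len(ritmo_b)
    return sum(abs(a - b) for a, b in zip(ritmo_a, ritmo_b))

def comparar_subsequencias(ritmo_a, ritmo_b, tamanho):
    """Matriz de distâncias entre todas as sub-seqüências de um dado tamanho."""
    subs_a = [ritmo_a[i:i + tamanho] for i in range(len(ritmo_a) - tamanho + 1)]
    subs_b = [ritmo_b[i:i + tamanho] for i in range(len(ritmo_b) - tamanho + 1)]
    return [[distancia_de_bloco(sa, sb) for sb in subs_b] for sa in subs_a]

# matriz de distâncias para sub-seqüências de duas durações
print(comparar_subsequencias([1, 1, 2, 4], [1, 2, 1, 4], 2))
```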
4. Conhecimento Musical
Esta seção apresenta alguns sistemas computacionais tendo em vista a
aquisição e a armazenagem do conhecimento musical.
4.1 Sistemas baseados em regras
Sistemas baseados em regras, também conhecidos como sistemas
especialistas ou baseados em conhecimento, tentam encapsular explicitamente
o conhecimento especialista humano em um determinado domínio. No caso da
música, a quantidade de elementos que devem ser tratados de forma eficiente
para descrever uma peça musical é enorme, fato que explica os muitos
problemas dessa abordagem.
Um exemplo de sistema musical baseado em regras é CHORAL, proposto por
Ebcioglu (1988). Este sistema codifica cerca de 350 normas destinadas à
harmonização de melodias no estilo coral de Bach e aborda aspectos como
progressões de acordes e linhas melódicas das partes.
Pachet (1998) propôs um sistema para explorar variações harmônicas em
seqüências de acordes de jazz. Uma dessas variações é a conhecida "regra de
substituição pelo trítono" segundo a qual um acorde dominante (ch1) pode ser
substituído por outro acorde dominante (ch2), em que a raiz de ch2 é a quarta
aumentada (ou trítono) de ch1. Esta substituição é possível uma vez que o
terceiro e o sétimo graus de ch1 correspondem ao sétimo e terceiro graus de
ch2. A Figura 1 abaixo mostra um acorde dominante de dó e a sua correspondente substituição pelo trítono.
Figura 1: Substituição pelo trítono.
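Um esboço mínimo da regra, sobre classes de altura (0 = dó, 1 = dó sustenido, ...), ajuda a visualizar a troca entre terça e sétima; os nomes usados são apenas ilustrativos.

```python
# Esboço mínimo da substituição pelo trítono sobre classes de altura.
NOMES = ['C', 'C#', 'D', 'Eb', 'E', 'F', 'F#', 'G', 'Ab', 'A', 'Bb', 'B']

def acorde_dominante(fundamental):
    """Fundamental, terça maior e sétima menor de um acorde dominante."""
    return {'fund': fundamental % 12,
            'terca': (fundamental + 4) % 12,
            'setima': (fundamental + 10) % 12}

def substituicao_tritono(fundamental):
    return acorde_dominante(fundamental + 6)  # nova fundamental a um trítono

c7 = acorde_dominante(0)        # C7:  terça = E, sétima = Bb
gb7 = substituicao_tritono(0)   # Gb7: terça = Bb, sétima = E (graus trocados)
print([NOMES[c7['terca']], NOMES[c7['setima']]])    # ['E', 'Bb']
print([NOMES[gb7['terca']], NOMES[gb7['setima']]])  # ['Bb', 'E']
```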
Outra substituição de acordes muito utilizada também é aplicável aos acordes
dominantes e consiste na preparação destes por acordes de sétima menor com
base no segundo grau da escala local.
Também é de Pachet (1994, p. 1) o sistema MusES, que tem como objetivo
experimentar "várias técnicas de representação do conhecimento orientadas a
objeto no campo da harmonia tonal". Este sistema faz análises de seqüências
de acordes de jazz, assim como gera automaticamente harmonizações e
improvisações.
5 O aprofundamento desse tema fugiria ao escopo deste texto. Maiores informações podem ser
obtidas em (Martins et al., 2005).
34"
NICS Reports
4.2 Sistemas baseados em gramática
A música, assim como a linguagem, é constituída por seqüências de estruturas
ordenadas e pode, desse modo, ser descrita em termos gramaticais.
Gramáticas são conjuntos finitos de regras que permitem a descrição de uma coleção potencialmente infinita de símbolos estruturados (Wiggins, 1998, p. 3). A Figura 2 mostra um exemplo simples de gramática.
Figura 2: Exemplo de gramática.
(SN: sintagma nominal, SV: sintagma verbal, A: artigo, S: substantivo, V: verbo)
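O esboço abaixo ilustra o princípio: um conjunto finito de regras (as regras e o léxico são apenas um exemplo hipotético, inspirado nas categorias da Figura 2) é expandido recursivamente e pode gerar uma coleção potencialmente infinita de frases.

```python
# Esboço hipotético de uma gramática simples no espírito da Figura 2.
import random

REGRAS = {
    'FRASE': [['SN', 'SV']],
    'SN': [['A', 'S']],
    'SV': [['V', 'SN']],
    'A': [['o'], ['um']],
    'S': [['compositor'], ['acorde'], ['ritmo']],
    'V': [['ouve'], ['gera']],
}

def expandir(simbolo):
    if simbolo not in REGRAS:          # símbolo terminal
        return [simbolo]
    producao = random.choice(REGRAS[simbolo])
    return [palavra for s in producao for palavra in expandir(s)]

print(' '.join(expandir('FRASE')))  # e.g., "o compositor gera um acorde"
```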
Assim, um segundo paradigma para a codificação do conhecimento musical
são os sistemas baseados em gramática. Na realidade, sistemas baseados em
conhecimento e sistemas gramaticais são muito semelhantes, uma vez que as duas abordagens são constituídas por regras e focam na forma que está sendo produzida (Wiggins, 1999, p. 4).
O conhecido método de análise de Schenker adota princípios gramaticais
(Forte, 1983; Marsden, 2007). Em termos gerais, este método consiste em
submeter uma música a uma série de reduções (e.g., progressões auxiliares e
notas de passagem) até que uma estrutura elementar global ("ursatz") seja
revelada.
Abordagens semelhantes também são adotadas pela GTTM de Lerdahl e
Jackendoff (Cambouropoulos, 1998). Neste caso, o objetivo é descrever os
processos cognitivos envolvidos na música tonal, em termos de agrupamentos
(com base nos princípios da Gestalt), métrica, período de tempo e estruturas
redutoras. As regras de Steedman (Wiggins, 1999) são um outro sistema baseado em gramática que visa a captar estruturas musicais do jazz e
de peças pop de blues de 12 compassos. Neste sistema, os processos mentais
que levam à expectativa em progressões de jazz são considerados.
4.3 Aprendizagem de máquina
Ao ter contato com a música, as pessoas começam a identificar naturalmente
determinadas estruturas e regularidades. Se no futuro os mesmos elementos
se repetirem, conexões com o material previamente aprendido irão surgir
35"
NICS Reports
espontaneamente. Portanto, além das abordagens anteriormente mencionadas
(sistemas baseados em regras e sistemas baseados em gramática), é possível
também adquirir conhecimento através de indução, ou seja, inferindo regras
gerais a partir de exemplos particulares.
O objetivo de sistemas que usam essa técnica, que chamamos de
aprendizagem de máquina, é fazer com que o computador "aprenda" a partir de
um conjunto de dados de exemplo. O sistema extrai os padrões locais usando
sistemas probabilísticos. Ao gerar novas seqüências, essas probabilidades são
utilizadas (Pachet, 2002a). Um caso particular de processo estocástico
comumente adotado, o modelo conhecido como cadeias de Markov permite
estabelecer as probabilidades de ocorrência de um estado futuro com base no
estado atual.
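Um esboço mínimo dessa ideia, com nomes de função ilustrativos: as transições são contadas a partir de uma seqüência de exemplo e depois usadas para gerar uma nova seqüência (o esboço assume que todo estado observado possui ao menos um sucessor).

```python
# Esboço de uma cadeia de Markov de primeira ordem sobre alturas MIDI.
import random
from collections import defaultdict

def treinar(sequencia):
    transicoes = defaultdict(list)
    for atual, proximo in zip(sequencia, sequencia[1:]):
        transicoes[atual].append(proximo)   # contagem implícita por repetição
    return transicoes

def gerar(transicoes, inicio, tamanho):
    estado, saida = inicio, [inicio]
    for _ in range(tamanho - 1):
        estado = random.choice(transicoes[estado])  # sorteio proporcional às ocorrências
        saida.append(estado)
    return saida

modelo = treinar([60, 62, 64, 62, 60, 62, 64, 65, 64, 62, 60])
print(gerar(modelo, 60, 10))
```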
Uma das desvantagens dos modelos de Markov é a ausência de
informações de longo prazo (Pachet, 2002a) e, assim, a dificuldade de capturar
a estrutura geral de peças musicais. Além disso, o tamanho do contexto
musical tem uma implicação direta na eficiência de algoritmos. Cadeias de
Markov de baixa ordem não capturam eficientemente regras probabilísticas,
enquanto que ordens superiores, apesar de capturar algumas estruturas de
curto prazo (W. F. Walker, 1994), possuem um custo computacional importante
(Assayag, Dubnov, & Delerue, 1999).
A conhecida Suíte ILLIAC, de Hiller e Isaacson (1959), foi composta com o uso de cadeias de Markov. Xenakis usou a mesma técnica para as composições Analogique em fins dos anos 1950. Diversos sistemas mais recentes usam modelos probabilísticos para modelagem do estilo musical (Cope, 2004;
Pachet, 2003; Thom, 2000a; W. Walker, Hebel, Martirano, & Scaletti, 1992) e
improvisação de música interativa (Assayag, Bloch, Chemillier, Cont, &
Dubnov, 2006; Pachet, 2003; Raphael, 1999; Thom, 2000b; Vercoe & Puckette,
1985), entre outras finalidades. Trivino-Rodriguez e Morales-Bueno (2001)
usaram grafos de predição com atributos múltiplos para gerar novas músicas.
O sistema iMe introduzido por Gimenes (2007) utiliza técnicas estocásticas
para modelagem da memória e geração de música.
4.3.1 Métodos e estruturas de dados
Alguns métodos e estruturas de dados têm sido freqüentemente utilizados por
sistemas baseados em aprendizagem de máquina. Pachet (2002b), por
exemplo, usa árvores de prefixo para armazenar uma árvore ordenada de todas as sub-seqüências (ponderadas pelo seu número de ocorrências) de uma
seqüência musical. Neste caso, os dados de entrada são simplificados
armazenando-se reduções ao invés da seqüência inteira.
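A ideia pode ser esboçada, de forma bastante simplificada (e sem reproduzir o sistema de Pachet), como um dicionário de sub-seqüências ponderadas pelo número de ocorrências; uma árvore de prefixos propriamente dita organizaria essas mesmas contagens em nós encadeados.

```python
# Esboço hipotético: contagem de todas as sub-seqüências de uma seqüência
# musical até uma ordem máxima, ponderadas pelo número de ocorrências.
from collections import defaultdict

def contar_subsequencias(sequencia, ordem_maxima=3):
    contagens = defaultdict(int)
    for i in range(len(sequencia)):
        for n in range(1, ordem_maxima + 1):
            if i + n <= len(sequencia):
                contagens[tuple(sequencia[i:i + n])] += 1
    return contagens

arvore = contar_subsequencias(['C', 'D', 'E', 'C', 'D', 'G'])
print(arvore[('C', 'D')])       # 2 ocorrências do prefixo C-D
print(arvore[('C', 'D', 'E')])  # 1 ocorrência
```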
No contexto de compressão sem perda, Jakob Ziv e Abraham Lempel
propuseram o algoritmo de análise incremental (Dubnov, Assayag, Lartillot, & Bejerano, 2003), em que um dicionário de motivos é construído
percorrendo-se uma seqüência de símbolos. Novas frases são adicionadas ao
dicionário quando o algoritmo encontra uma seqüência que se diferencia das
anteriores por um único caractere.
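O esboço a seguir ilustra esse princípio de análise incremental (no estilo do LZ78) sobre uma seqüência de símbolos; trata-se apenas de uma aproximação didática, não da implementação citada.

```python
# Esboço da análise incremental: uma nova frase entra no dicionário quando
# difere das frases já registradas por um único símbolo final.
def analise_incremental(sequencia):
    dicionario, frase = [], []
    for simbolo in sequencia:
        frase.append(simbolo)
        if frase not in dicionario:   # frase inédita: registra e recomeça
            dicionario.append(list(frase))
            frase = []
    return dicionario

print(analise_incremental(list('ABABABA')))
# [['A'], ['B'], ['A', 'B'], ['A', 'B', 'A']]
```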
Outra estrutura de dados é a Árvore de Previsão de Sufixo (Ron, Singer, &
Tishby, 1996), que armazena cadeias de Markov de comprimento variável.
Esta estrutura foi proposta por Rissanen (1983) a fim de superar as
desvantagens (e.g., o crescimento de parâmetros) do modelo original de
Markov. O algoritmo constrói um dicionário dos motivos que aparecem um
número importante de vezes e que, portanto, são significativos para predizer o
futuro imediato. Há, conseqüentemente, perda da informação original (Assayag
& Dubnov, 2004, p. 1).
Finalmente, uma outra estrutura, o Factor Oracle (FO), é um autômato que capta
todos os fatores (sub-frases) em uma seqüência musical e uma série linear de
elos de transição (ponteiros) (S. Dubnov & Assayag, 2005). Ponteiros para a
frente (ou links de fator), no momento da geração, permitem a reconstrução das
frases originais. Os ponteiros para trás (ou links de sufixo) acompanham as
outras sub-frases que compartilham o mesmo sufixo e geram recombinações
baseadas no contexto do material aprendido (S. Dubnov & Assayag, 2005). O
sistema OMAX (Assayag et al., 2006) usa este modelo para a aprendizagem
de estilo musical e improvisação em tempo real6.
5. Processos generativos
A criação musical pode ser vista como o resultado da interação entre
representações do conhecimento musical e processos generativos associados
a ela. Um paradigma explorado nos primórdios da Inteligência Artificial (IA) foi a
composição algorítmica (Hiller & Isaacson, 1959). Outros modelos incluem a
computação evolutiva, os agentes inteligentes e os modelos de inspiração
biológica (e.g., vida artificial, autômatos celulares e enxames). Muitas vezes
sistemas musicais complexos utilizam mais de um desses modelos.
5.1 Composição Algorítmica
O uso de algoritmos na música é provavelmente tão antigo quanto a própria
música. Cope afirma que seria impossível compor sem usar pelo menos alguns
algoritmos: "aqueles que confiam amplamente em sua intuição para a música
realmente usam algoritmos subconscientemente" (Muscutt, 2007, p. 20). Um
algoritmo é simplesmente uma "receita passo a passo para alcançar um
objetivo específico"; a música algorítmica, portanto, pode ser considerada
6 Uma explanação mais aprofundada sobre esses algoritmos pode ser encontrada em (Dubnov et al., 2003) e (Dubnov & Assayag, 2005).
37"
NICS Reports
como "uma receita passo a passo para a criação de novas composições"
(Muscutt, 2007, p. 10).
Um exemplo famoso de composição algorítmica são os Jogos Musicais de
Dados (Musikalisches Würfelspiel) atribuídos a Mozart. O processo consiste em
se criar segmentos musicais que depois são utilizados em composições onde a
ordem das seqüências é determinada pelo lançamento dos dados (Cope,
1991). A Suíte ILLIAC, mencionada acima, é conhecida por ter sido a primeira
peça musical gerada por computador. Mesmo que os computadores não sejam
um pré-requisito para a composição algorítmica, eles facilitam muito sua
execução (Muscutt, 2007).
Diversos sistemas musicais algorítmicos não necessariamente focam na
música, mas simplesmente mapeiam ou fazem associações entre o resultado
de algoritmos genéricos e parâmetros musicais. Esses sistemas devem ser
diferenciados daqueles que incorporam conhecimento musical (Miranda,
2002b), de maior interesse para a musicologia cognitiva.
5.2 Computação Evolutiva
As principais proposições teóricas sobre as origens e a evolução das espécies
foram introduzidas durante o século XIX. Lamarck (Packard, 2007) sugeriu
inicialmente que os indivíduos teriam a capacidade de se adaptar ao ambiente
e que os resultados dessa adaptação poderiam ser transmitidos de pais para filhos.
Para Darwin (1998), indivíduos com características favoráveis em relação ao
seu ambiente teriam mais chances de sobreviver, se comparados a indivíduos
com traços menos favoráveis. Por este motivo, após uma série de gerações, a
população de indivíduos com características favoráveis cresceria e seria mais
adaptada ao ambiente. Eventualmente, após diversas gerações, as diferenças
seriam tão significativas que resultariam em novas espécies.
As idéias que fundamentam os modelos evolutivos são a adaptação, a
transmissão e a sobrevivência do mais apto. Sabemos que os genes (Mendel,
1865) permitem a transmissão de características particulares, mas a evolução
"ocorre quando um processo de transformação cria variantes de algum tipo de
informação. Normalmente, há um mecanismo que favorece a melhor
transformação e descarta aquelas que são consideradas inferiores, de acordo
com determinados critérios" (Miranda, 1999, p. 8).
Um número crescente de pesquisadores está desenvolvendo modelos computacionais para estudar a evolução musical. Miranda (2003) estudou as
origens e a evolução da música "no contexto das convenções culturais que
podem emergir sob uma série de restrições (por exemplo, psicológicas,
fisiológicas e ecológicas)". Em seu sistema, uma comunidade de agentes
38"
NICS Reports
evolui um conjunto de melodias (canções) "após um período de criação
espontânea, adaptação e reforço de memória" (Miranda et al., 2003, p. 94).
Para atingir esta meta, os agentes possuem habilidades motoras, auditivas e
cognitivas e evoluem vetores de parâmetros de controle motor imitando as
canções uns dos outros. Todd e Werner (1999) modelaram a pressão de
acasalamento seletivo nas origens do gosto musical, onde uma sociedade
evolui canções de acasalamento através de "machos" compositores e "fêmeas"
críticas.
5.2.1 Algoritmos genéticos
Algoritmos Genéticos (AGs), um caso particular em computação evolutiva
(Holland, 1992), são uma técnica de busca inspirada por alguns dos conceitos
(e.g., herança, mutação, seleção) da teoria de Darwin sobre a evolução pela
seleção natural. Os AGs têm sido utilizados em muitas aplicações musicais
(Brown, 1999; Horowitz, 1994; Jacob, 1995; McIntyre, 1994; Moroni, Manzolli,
Zuben, & Gudwin, 2000; Tokui & Iba, 2000; Weinberg, Godfrey, Rae, &
Rhoads, 2007) em diferentes contextos, em especial para gerar material de
composição e improvisação.
Grosso modo, um AG envolve a geração sucessiva de populações de
cromossomos que representam o domínio a ser explorado. A cada geração, a
população anterior de cromossomos é transformada por um número de
operadores (mutação, crossover, etc.) e uma função de aptidão avalia a
adequação dos novos candidatos para uma determinada solução. De uma
geração para outra, apenas os candidatos mais aptos sobrevivem (Figura 3).
Figura 3: Algoritmo genético.
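O esboço abaixo ilustra esse laço genérico aplicado a melodias; a função de aptidão usada (que favorece saltos pequenos) é meramente ilustrativa e não corresponde a nenhum dos sistemas citados.

```python
# Esboço genérico do laço de um algoritmo genético sobre melodias MIDI.
import random

def aptidao(melodia):
    # função de aptidão ilustrativa: penaliza saltos melódicos grandes
    return -sum(abs(b - a) for a, b in zip(melodia, melodia[1:]))

def mutacao(melodia):
    m = list(melodia)
    m[random.randrange(len(m))] += random.choice([-2, -1, 1, 2])
    return m

def crossover(a, b):
    corte = random.randrange(1, len(a))
    return a[:corte] + b[corte:]

def evoluir(tamanho_pop=20, notas=8, geracoes=50):
    populacao = [[random.randint(55, 79) for _ in range(notas)]
                 for _ in range(tamanho_pop)]
    for _ in range(geracoes):
        populacao.sort(key=aptidao, reverse=True)
        sobreviventes = populacao[:tamanho_pop // 2]            # seleção
        filhos = [mutacao(crossover(random.choice(sobreviventes),
                                    random.choice(sobreviventes)))
                  for _ in range(tamanho_pop - len(sobreviventes))]
        populacao = sobreviventes + filhos                      # nova geração
    return max(populacao, key=aptidao)

print(evoluir())
```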
Criar uma função de aptidão adequada, porém, não é uma tarefa fácil. No sistema
GenJam (Biles, 1999), por exemplo, a função de aptidão é executada pelo
operador humano, que avalia cada candidato recém-gerado. Esta abordagem,
conhecida como Algoritmo Genético Interativo (AGI), apresenta um sério
problema, pois o número de candidatos gerados é normalmente grande. No
sistema Vox Populi (Moroni et al., 2000), outro AGI, a função de aptidão é controlada em tempo real pelo usuário através de uma interface gráfica. Em
39"
NICS Reports
qualquer caso, a seleção dos candidatos mais aptos se baseia no julgamento
(experiência musical prévia, etc.) do controlador humano.
Biles (1994) define GenJam (abreviação de Genetic Jammer) como um
estudante aprendendo a improvisar solos de jazz. Este sistema integra um
conversor de áudio para MIDI, o que permite improvisações e "trading fours"7
em tempo real com um instrumento monofônico. Neste modo, GenJam ouve os últimos quatro compassos tocados pelo ser humano, mapeando-os para sua
representação cromossômica. Em seguida os cromossomos são modificados e
o resultado é tocado durante os quatro compassos seguintes (Biles, 1998, p.
1).
Na realidade, a adequação dos sistemas baseados em AG para a musicologia
cognitiva é muito limitada, já que de nenhuma maneira simulam o
comportamento cognitivo humano. Nas palavras de Wiggins (1999, p. 12), "...
eles carecem de estrutura em seu raciocínio - compositores desenvolveram
métodos complexos e sutis ao longo de séculos que envolvem diferentes
técnicas para resolver os problemas abordados aqui. Ninguém poderia
seriamente sugerir que um autor de hinos trabalha da mesma forma que um
AG, por isso, enquanto podem produzir resultados (quase) aceitáveis, não
esclarecem em nada o funcionamento da mente do compositor".
5.3 Agentes Inteligentes
Agentes inteligentes (Figura 4), também conhecidos como agentes racionais,
autônomos ou de software (Jones, 2008; Russell & Norvig, 2002), são sistemas
adaptativos que residem em um ambiente dinâmico e complexo em que
sentem e agem de forma autônoma executando uma série de tarefas, a fim de
atingir os objetivos para os quais foram concebidos (Maes, 1991; Russell &
Norvig, 2002).
7 Modo de execução no jazz em que os músicos se revezam improvisando trechos de quatro
compassos.
40"
NICS Reports
Figura 4: Arquitetura de um agente
Miranda e Todd (2003) identificaram três abordagens para a construção de
sistemas baseados em agentes para a composição: (i) a conversão de
comportamento extra-musical em som, (ii) algoritmo de inspiração genética e
(iii) sistemas culturais. O primeiro caso é exemplificado por uma perspectiva em que os agentes não necessariamente realizam tarefas musicais, mas alguns aspectos do seu comportamento (como se movimentar em um espaço definido, etc.) são mapeados para o som. A interação afeta o comportamento dos agentes e a música que eles produzem, mas essa música, por outro lado, não necessariamente afeta o seu comportamento.
Na segunda abordagem (algoritmo de inspiração genética), os agentes
reproduzem artificialmente os mecanismos da teoria da evolução pela seleção
natural de Darwin. A sobrevivência e a reprodução dos agentes dependem da
música que eles produzem e o sistema como um todo tenderia a produzir mais
"músicas de sucesso".
A terceira e última abordagem utiliza agentes virtuais e processos de auto-organização para modelar sistemas culturais em que mecanismos de reforço
evoluem as habilidades dos agentes. Apenas esta última abordagem permitiria
o "estudo das circunstâncias e dos mecanismos pelos quais a música teria
surgido e evoluído em comunidades virtuais de músicos e ouvintes" (Miranda & Todd, 2003, p. 1).
Sistemas em que atua mais de um agente são chamados de multi-agentes (Wulfhorst, Nakayama, & Vicari, 2003) e são muitas vezes usados em simulações
de interações sociais. Para Miranda (1999, p. 5), a linguagem e a música
devem ser encaradas como um fenômeno cultural que emerge de interações
sociais e não como um recurso completo que surge no nascimento de um
bebê.
Em um mesmo sistema podem existir diversos tipos de agentes, cada um
especializado em uma habilidade específica, como o que ocorre na Sociedade
da Mente proposta por Minsky (1988). Outra possibilidade é que todos
possuam as mesmas habilidades. No campo musical, Cypher (Rowe, 2004)
adota a primeira abordagem, enquanto os músicos virtuais de Miranda (2002b)
e os agentes de iMe (Gimenes & Miranda, 2008), a segunda.
Além dos sistemas acima mencionados, muitos outros sistemas são baseados
em agentes. OMAX, por exemplo, modela uma topologia de agentes interativos
com foco em diferentes habilidades (ouvintes, fatiadores, alunos, etc.)
(Assayag et al., 2006). Frank (Casal & Morelli, 2007) usa técnicas de MPEG7 e de co-evolução genética, juntamente com agentes artificiais, em performances ao vivo. Wulfhorst et al. (2003) apresentaram uma arquitetura genérica de um
sistema multi-agentes que interagem com músicos humanos. Impett (2001) utiliza um sistema para gerar composições musicais em que os agentes se adaptam às mudanças do ambiente em que residem. Pachet (2000) usa agentes em um contexto evolutivo para fazer emergir formas rítmicas em simulações
em tempo real.
O modelo mimético de Miranda (2002b) utiliza agentes inteligentes para
incorporar mecanismos de evolução musical. Mimese seria a habilidade de
imitar as ações de outras pessoas e animais. A hipótese é de que esta característica seria uma das chaves para o surgimento da música em uma
sociedade virtual (Miranda, 2002b, p. 79). Todos os agentes são capazes de
ouvir e produzir sons (sintetizador vocal), além de guardar associações entre
os parâmetros motores e perceptivos (memória). Como são programados para
imitar uns aos outros, após algum tempo, um repertório compartilhado de
melodias é criado.
5.4 Modelos biologicamente inspirados
Além dos algoritmos genéticos e dos agentes, outros modelos, tais como a vida
artificial (a-life), autômatos celulares e enxames procuram inspiração em
fenômenos biológicos para abordar a criatividade musical.
5.4.1 Os modelos da vida artificial
Os sistemas baseados na vida artificial tentam replicar fenômenos biológicos
através de simulações em computador (Miranda, 2003) e lidar com conceitos
(e.g., as origens dos organismos vivos, o comportamento emergente e a auto-organização) que buscam esclarecer a gênese e a evolução da música.
Miranda e Todd (2003, p. 6) observam que talvez a aplicação mais interessante
de técnicas de a-life "seja o estudo das circunstâncias e dos mecanismos pelos
quais a música teria surgido e evoluído em mundos artificiais habitados por
comunidades virtuais de músicos e ouvintes". Alguns estudiosos têm abordado
esta questão ao longo da história (Thomas, 1995; Wallin, Merker, & Brown,
2000), embora modelos computacionais não tenham sido freqüentemente
utilizados para a validação teórica.
5.4.2 Autômatos celulares
Autômatos celulares consistem em uma rede multidimensional de células, cada uma das quais assume, em um determinado momento, um estado dentre uma série de estados possíveis. Uma função determina a evolução desses estados em passos de tempo discretos.
42"
NICS Reports
Figura 5: Autômato celular.
No conhecido Jogo da Vida de Conway, por exemplo, o estado das células
(viva ou morta) é determinado pelo estado de seus vizinhos. Em cada ciclo de
tempo todas as células são avaliadas e seus estados alterados de acordo com
um conjunto de regras (Tabela 1). A configuração inicial das células afeta a
dinâmica do sistema e pode permitir a emergência de comportamentos
interessantes, especialmente no domínio visual.
Tempo t | Condição              | Tempo t + 1
morta   | 3 vizinhos vivos      | viva
viva    | 4 ou + vizinhos vivos | morta
viva    | 1 ou 0 vizinhos vivos | morta
viva    | 2 ou 3 vizinhos vivos | viva
Tabela 1: Regras de evolução do Jogo da Vida de Conway.
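O esboço abaixo implementa um passo de evolução seguindo as regras da Tabela 1, em uma grade retangular com bordas "mortas" (uma simplificação em relação a implementações usuais com bordas toroidais).

```python
# Esboço do Jogo da Vida de Conway segundo as regras da Tabela 1.
def passo(grade):
    linhas, colunas = len(grade), len(grade[0])
    def vizinhos_vivos(i, j):
        return sum(grade[a][b]
                   for a in range(max(0, i - 1), min(linhas, i + 2))
                   for b in range(max(0, j - 1), min(colunas, j + 2))
                   if (a, b) != (i, j))
    nova = [[0] * colunas for _ in range(linhas)]
    for i in range(linhas):
        for j in range(colunas):
            n = vizinhos_vivos(i, j)
            if grade[i][j] == 1:
                nova[i][j] = 1 if n in (2, 3) else 0  # sobrevive com 2 ou 3
            else:
                nova[i][j] = 1 if n == 3 else 0       # nasce com 3 vizinhos
    return nova

# oscilador "blinker": alterna entre linha e coluna a cada passo
g = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(passo(g))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```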
Diversos sistemas têm usado autômatos celulares. Chaosynth (Miranda,
2002a) usa essa técnica para controlar um sintetizador de áudio de síntese
granular. Camus (Miranda, 2001) utiliza dois autômatos celulares simultâneos o Jogo da Vida e o Demon Cyclic Space (Griffeath & Moore, 2003) - para gerar
estruturas musicais (acordes, melodias, etc.).
5.4.3 Enxames
Os elementos individuais de um sistema auto-organizado podem se comunicar
uns com os outros e modificar seu meio ambiente através de um método
conhecido como estigmergia. O comportamento de cada elemento do sistema
não é suficiente para determinar a organização do sistema como um todo. Por
outro lado, esta organização resulta (emerge) daquele comportamento. Esse
fenômeno ocorre, por exemplo, com enxames de abelhas e bandos de
pássaros. O Swarm Granulator (Blackwell, 2006), por exemplo, é um sistema que se baseia no
conceito de enxames. Um ser humano toca um instrumento musical, o que
produz atratores em torno dos quais gravitam partículas artificiais. As regras
seguidas por estas partículas são simples e envolvem os conceitos de coesão
("se separados, aproximem-se"), separação ("se muito pertos, afastem-se") e
43"
NICS Reports
alinhamento ("tentativa de igualar as velocidades") (Blackwell, 2006, p. 4). O
comportamento do sistema, que é mapeado em parâmetros de som, emerge
destas interações entre as partículas.
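O esboço a seguir, deliberadamente simplificado (uma dimensão, parâmetros arbitrários), ilustra como essas três regras e um atrator podem ser combinados; não corresponde à implementação do Swarm Granulator.

```python
# Esboço de um passo de simulação de enxame em uma dimensão.
def passo_enxame(posicoes, velocidades, atrator,
                 coesao=0.05, separacao=0.3, alinhamento=0.1, dist_min=1.0):
    n = len(posicoes)
    centro = sum(posicoes) / n
    vel_media = sum(velocidades) / n
    novas_vel = []
    for i, (p, v) in enumerate(zip(posicoes, velocidades)):
        v += coesao * (centro - p)            # coesão: "se separados, aproximem-se"
        v += alinhamento * (vel_media - v)    # alinhamento: igualar as velocidades
        for j, q in enumerate(posicoes):      # separação: "se muito pertos, afastem-se"
            if j != i and abs(q - p) < dist_min:
                v -= separacao * (q - p)
        v += 0.02 * (atrator - p)             # atrator produzido pelo músico
        novas_vel.append(v)
    novas_pos = [p + v for p, v in zip(posicoes, novas_vel)]
    return novas_pos, novas_vel

pos, vel = [0.0, 2.0, 5.0], [0.0, 0.0, 0.0]
for _ in range(3):
    pos, vel = passo_enxame(pos, vel, atrator=10.0)
print(pos)  # as partículas derivam em direção ao grupo e ao atrator
```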
6. Ambientes Interativos Musicais
O último dos sistemas abordados neste artigo, o iMe (Interactive Musical Environments - Ambientes Interativos Musicais), adota diversas das técnicas mencionadas acima. Trata-se de um sistema interativo musical que tem como
objetivo principal explorar a evolução da música tendo como referência a
transmissão de memes (estruturas) musicais e, conseqüentemente, as
faculdades perceptivas e cognitivas dos seres humanos. Este sistema segue
as condições do Modelo Ontomemético de Evolução Musical (Ontomemetical
Model of Music Evolution - OMME), que se baseia nas noções de ontogênese e
de memética8.
No sistema iMe, especialmente concebido para abordar a interatividade sob um
ponto de vista improvisacional, agentes executam atividades inspiradas no
mundo real (ouvir, executar, praticar, improvisar-solo, improvisar-grupo, ler e
compor música) e se comunicam entre si e com o mundo exterior. O resultado
dessas atividades é que a memória dos agentes é constantemente alterada e,
conseqüentemente, seus estilos musicais evoluem.
O sistema utiliza o protocolo de comunicação MIDI para a troca de mensagens
entre os agentes e entre estes e o mundo exterior, a partir do qual os agentes
extraem a representação musical simbólica necessária para as interações. Esta
representação possui paralelos com os modelos perceptivos e cognitivos
humanos, ou seja, com a forma como os sons são captados pelos ouvidos,
processados e armazenados pela memória (Snyder, 2000). Uma série de filtros
equipam os "ouvidos" dos agentes e são responsáveis pela extração de
características particulares do fluxo sonoro, tais como o aumento e/ou a
diminuição da freqüência sonora (direção da melodia) ou a densidade musical (número de notas simultâneas).
A segmentação implementada no iMe inspira-se em princípios da psicologia
Gestalt. Em linhas gerais, o algoritmo de segmentação simula o fenômeno da
habituação, ou seja: quando um sinal (determinada característica do fluxo sonoro) permanece estável durante algum tempo, o interesse (atenção) por ele decai. Enquanto os agentes percebem o fluxo sonoro, a repetição do mesmo sinal resulta em perda de interesse, ao passo que uma mudança de comportamento desse sinal, depois de um certo número de repetições, desperta sua atenção.
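O esboço abaixo ilustra livremente esse mecanismo (os parâmetros e nomes são suposições, não a implementação do iMe): a atenção decai enquanto a característica observada se repete e um limite é marcado quando ela muda após um certo número de repetições.

```python
# Esboço livre do mecanismo de habituação aplicado à segmentação.
def segmentar_por_habituacao(caracteristicas, repeticoes_minimas=3):
    limites, repeticoes = [], 0
    for i in range(1, len(caracteristicas)):
        if caracteristicas[i] == caracteristicas[i - 1]:
            repeticoes += 1          # sinal estável: interesse decai
        else:
            if repeticoes >= repeticoes_minimas:
                limites.append(i)    # mudança após habituação: novo segmento
            repeticoes = 0
    return limites

# direção da melodia em cada passo: +1 sobe, -1 desce
print(segmentar_por_habituacao([1, 1, 1, 1, -1, -1, 1, -1, -1, -1, -1, 1]))
```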
8 Maiores detalhes sobre esse modelo podem ser encontrados em (Gimenes, 2010).
44"
NICS Reports
Em (Gimenes, 2009) são descritas com detalhe algumas das possibilidades do
OMME implementadas pelo sistema iMe, especialmente na área da
musicologia cognitiva. É possível, por exemplo, que, em um determinado
cenário, um agente ouça uma peça e um outro agente ouça uma outra peça.
Ao final da simulação, a diferença do conhecimento musical dos agentes irá
corresponder à diferença dos estilos musicais entre as peças.
Uma outra área explorada pelo sistema iMe é a criatividade musical visando,
mais especificamente, contribuir para a construção da "musicalidade das
máquinas" e para a interação entre máquinas e seres humanos. Uma
performance pública foi realizada durante o Peninsula Arts Contemporary
Music Festival em fevereiro de 2008 na Universidade de Plymouth (Reino
Unido) onde essa possibilidade foi demonstrada.
7. Conclusão
Este artigo apresentou o estado da arte dos modelos computacionais com
aplicações para a musicologia cognitiva. De várias maneiras, estes modelos
procuram modelar como os seres humanos lidam com questões como
percepção, representação do conhecimento, aprendizagem, criatividade e
raciocínio.
Diversas abordagens foram apresentadas. Sistemas baseados em regras
encapsulam o conhecimento especialista humano através de regras explícitas,
enquanto que os sistemas baseados em gramática definem um conjunto finito
de regras que descrevem a estrutura desse conhecimento. Sistemas que usam
aprendizagem de máquina, por outro lado, tentam reproduzir os processos de
aquisição do conhecimento humano.
Além desses, modelos baseados na computação evolutiva (e.g., algoritmos
genéticos) e na vida artificial tentam replicar fenômenos biológicos através de
simulações computacionais, e analisam temas como as origens dos
organismos vivos, o comportamento emergente e a auto-organização.
Como vimos, alguns desses modelos se preocupam em implementar teorias
que versam sobre a cognição humana e, portanto, interessam diretamente à
musicologia cognitiva. Outros, contudo, são mais voltados a um resultado
sonoro do que propriamente à descrição desses modelos. Estes últimos foram
incluídos pelo interesse que despertam e por possuírem muitos paralelos com
os primeiros.
Bibliografia
Assayag, G., Bloch, G., Chemillier, M., Cont, A., & Dubnov, S. (2006). Omax
Brothers: a Dynamic Topology of Agents for Improvisation Learning. Workshop
on Audio and Music Computing for Multimedia, Santa Barbara, EUA.
45"
NICS Reports
Assayag, G., & Dubnov, S. (2004). Using Factor Oracles for machine
Improvisation. Soft Computing - A Fusion of Foundations, Methodologies and
Applications, 8(9), 604-610.
Assayag, G., Dubnov, S., & Delerue, O. (1999). Guessing the Composer's
Mind: Applying Universal Prediction to Musical Style. International Computer
Music Conference, Beijing, China.
Baker, M. (1989a). An Artificial Intelligence approach to musical grouping
analysis. Contemporary Music Review, 3(1), 43-68.
Baker, M. (1989b). A cognitive model for the perception of musical grouping
structures. Contemporary Music Review (Music and the Cognitive Sciences).
Biles, J. A. (1994). GenJam: A Genetic Algorithm for Generating Jazz Solos.
International Computer Music Conference, Aarhus, Denmark.
Biles, J. A. (1998). Interactive GenJam: Integrating Real-Time Performance with
a Genetic Algorithm. International Computer Music Conference, Univ. of
Michigan, Ann Arbor, EUA.
Biles, J. A. (1999). Life with GenJam: Interacting with a Musical IGA.
International Conference on Systems, Man, and Cybernetics, Tokyo, Japan.
Blackwell, T. (2006). Swarming and Music. In E. Miranda & J. A. Biles (Org.),
Evolutionary Computer Music. London: Springer.
Bod, R. (2001). A Memory-Based Model For Music Analysis: Challenging The
Gestalt Principles. International Computer Music Conference, Havana, Cuba.
Brown, C. (1999). Talking Drum: A Local Area Network Music Installation.
Leonardo Music Journal, 9, 23-28.
Cambouropoulos, E. (1998). Towards a General Computational Theory of
Musical Structure. University of Edinburgh, Edinburgh.
Camilleri, L., Carreras, F., & Duranti, C. (1990). An Expert System Prototype for
the Study of Musical Segmentation. Interface, 19(2-3), 147-154.
Casal, D. P., & Morelli, D. (2007). Remembering the future: towards an
application of genetic co-evolution in music improvisation. MusicAL Workshop,
European Conference on Artificial Life, Lisboa, Portugal.
Chouvel, J. M. (1990). Musical Form: From a Model of Hearing to an Analytic
Procedure. Interface, 22, 99-117.
Cope, D. (1991). Recombinant Music: Using the Computer to Explore Musical
Style. Computer, 24(7), 22-28.
Cope, D. (1999). One approach to musical intelligence. IEEE Intelligent
Systems, 14(3), 21-25.
46"
NICS Reports
Cope, D. (2004). A Musical Learning Algorithm. Computer Music Journal, 28(3),
12-27.
Darwin, C. (1998). The Origin of Species (new ed.): Wordsworth Editions Ltd.
Deutsch, D. (1982a). Grouping Mechanisms in Music. In D. Deutsch (Org.), The
Psychology of Music. Nova York: Academic Press.
Deutsch, D. (1982b). The Processing of Pitch Combinations. In D. Deutsch
(Org.), The Psychology of Music. Nova York: Academic Press.
Dubnov, S., & Assayag, G. (2005). Improvisation Planning And Jam Session
Design Using Concepts Of Sequence Variation And Flow Experience. Sound
and Music Computing, Salerno, Italia.
Dubnov, S., Assayag, G., Lartillot, O., & Bejerano, G. (2003). Using Machine-Learning Methods for Musical Style Modeling. IEEE Computer, 10(38), 73-80.
Ebcioglu, K. (1988). An expert system for harmonizing four-part chorales.
Computer Music Journal, 12(3), 43-51.
Forte, A. (1983). Introduction to Schenkerian Analysis: Form and Content in
Tonal Music: W. W. Norton & Company.
Gimenes, M. (2010). A Ontomemética e a Evolução Musical. VI Simpósio de
Cognição e Artes Musicais, Rio de Janeiro.
Gimenes, M., & Miranda, E. (2008). An A-Life Approach to Machine Learning of
Musical Worldviews for Improvisation Systems. 5th Sound and Music
Computing Conference, Berlin, Germany.
Gimenes, M., Miranda, E., & Johnson, C. (2007). The Emergent Musical
Environments: An Artificial Life Approach. Workshop on Music and Artificial Life
(ECAL), Lisboa, Portugal.
Griffeath, D., & Moore, C. (2003). New Directions in Cellular Automata: Oxford
University Press.
Hasty, C. F. (1978). A theory of segmentation developed from late works of
Stefan Wolpe. Yale University.
Hiller, L., & Isaacson, L. (1959). Experimental Music. Nova York: McGraw-Hill.
Holland, J. H. (1992). Adaptation in Natural and Artificial Systems - An
Introductory Analysis with Applications to Biology, Control, and Artificial
Intelligence: The MIT Press.
Honing, H. (2006). On the growing role of observation, formalization and
experimental method in musicology. Empirical Musicology Review, 1(1), 2-6.
Horowitz, D. (1994). Generating Rhythms with Genetic Algorithms. International
Computer Music Conference, Aarhus, Denmark.
47"
NICS Reports
Huron, D. (Producer). (1999, 27/03/2007) The 1999 Ernest Bloch Lectures.
Lecture 1 - Music and Mind: Foundations of Cognitive Musicology. Retrieved from
http://www.music-cog.ohiostate.edu/Music220/Bloch.lectures/1.Preamble.html
Impett, J. (2001). Interaction, simulation and invention: a model for interactive
music. Workshop on Artificial Models for Musical Applications, Cosenza, Italia.
Jacob, B. L. (1995). Composing with genetic algorithms. International Computer
Music Conference, Banff Centre for the Arts, Canada.
Jones, M. T. (2008). Artificial Intelligence - A System Approach. Hingham,
Massachusetts: Infinity Science Press.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music.
Cambridge, Mass.: MIT Press.
Maes, P. (1991). Designing Autonomous Agents: Theory and Practice from
Biology to Engineering and Back: MIT Press.
Marsden, A. (2007). Automatic Derivation Of Musical Structure: A Tool For
Research On Schenkerian Analysis. International Conference on Music
Information Retrieval, Viena, Austria.
Martins, J., Gimenes, M., Manzolli, J., & Maia Jr, A. (2005). Similarity Measures
for Rhythmic Sequences. Simpósio Brasileiro de Computação Musical, Belo
Horizonte, Brasil.
McAdams, S. (1984). The auditory Image: A metaphor for musical and
psychological research on auditory organisation. In W. R. Crozier & A. J.
Chapman (Org.), Cognitive Processes in the Perception of Art. Amsterdam:
North-Holland Press.
McIntyre, R. A. (1994). Bach in a box: the evolution of four part Baroque
harmony using the genetic algorithm. IEEE World Congress on Computational
Intelligence, Orlando, EUA.
Mendel, G. (1865). Experiments on Plant Hybridization. Paper presented at the
Meetings of the Natural History Society of Brünn. Retrieved from
http://www.mendelweb.org/Mendel.html
Minsky, M. (1988). The Society of Mind: Pocket Books.
Miranda, E. (1999). The artificial life route to the origins of music. Scientia,
10(1), 5-33.
Miranda, E. (2001). Composing music with computers. Oxford: Focal Press.
Miranda, E. (2002a). Computer sound design: synthesis techniques and
programming (2nd ed.). Oxford: Focal Press.
48"
NICS Reports
Miranda, E. (2002b). Emergent Sound Repertoires in Virtual Societies.
Computer Music Journal, 26(2), 77-90.
Miranda, E. (2003). On the evolution of music in a society of self-taught digital
creatures. Digital Creativity, 14(1), 29-42.
Miranda, E., Kirby, S., & Todd, P. (2003). On Computational Models of the
Evolution of Music: From the Origins of Musical Taste to the Emergence of
Grammars. Contemporary Music Review, 22(2), 91-111.
Miranda, E., & Todd, P. M. (2003). A-Life and Musical Composition: A Brief
Survey. Simpósio Brasileiro de Computação Musical, Campinas, Brasil.
Moroni, A., Manzolli, J., Zuben, F. V., & Gudwin, R. (2000). Vox Populi: An
Interactive Evolutionary System for Algorithmic Music Composition. Leonardo
Music Journal, 10, 49-54.
Muscutt, K. (2007). Composing with Algorithms An Interview with David Cope.
Computer Music Journal, 31(3), 10-22.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures:
University Of Chicago Press.
Pachet, F. (1994). The MusES system: an environment for experimenting with
knowledge representation techniques in tonal harmony. Simpósio Brasileiro de
Computação Musical, Baxambu, Brasil.
Pachet, F. (1998). Sur la structure algebrique des sequences d'accords de
Jazz. Journees d'Informatique Musicale, Agelonde, France.
Pachet, F. (2000). Rhythms as emerging structures. International Computer
Music Conference, Berlin, Germany.
Pachet, F. (2002a). The continuator: Musical interaction with style. International
Computer Music Conference, Gothenburg, Sweden.
Pachet, F. (2002b). Interacting with a Musical Learning System: The
Continuator. In C. Anagnostopoulou, M. Ferrand & A. Smaill (Org.), Music and
Artificial Intelligence, Lecture Notes in Artificial Intelligence (Vol. 2445, pp. 119-132): Springer Verlag.
Pachet, F. (2003). The Continuator: Musical Interaction With Style. Journal of
New Music Research, 32(3), 333-341.
Packard, A. S. (2007). Lamarck, the Founder of Evolution: His Life and Work:
Dodo Press.
Polansky, L. (1978). A hierarchical gestalt analysis of Ruggle's Portals.
International Computer Music Conference, Evanston, EUA.
49"
NICS Reports
Raphael, C. (1999). A Probabilistic Expert System for Automatic Musical
Accompaniment. Journal of Computational and Graphical Statistics, 10(3), 487-512.
Rissanen, J. (1983). A universal data compression system. IEEE Transactions
on Information Theory, 29(5), 656-664.
Ron, D., Singer, Y., & Tishby, N. (1996). The Power of Amnesia: Learning
Probabilistic Automata with Variable Memory Length. Machine Learning, 25,
117-149.
Rowe, R. (2004). Machine Musicianship. Cambridge, MA: MIT Press.
Russell, S. J., & Norvig, P. (2002). Artificial Intelligence: A Modern Approach:
Prentice Hall.
Snyder, B. (2000). Music and Memory: An Introduction. Cambridge, MA: MIT
Press.
Temperley, D. (2004). Cognition of Basic Musical Structures: The MIT Press.
Tenney, J., & Polansky, L. (1980). Temporal Gestalt Perception in Music.
Journal of Music Theory, 24(2), 205-241.
Thom, B. (2000a). BoB: an Interactive Improvisational Music Companion.
International Conference on Autonomous Agents, Barcelona, Spain.
Thom, B. (2000b). Unsupervised Learning and Interactive Jazz/Blues
Improvisation. Seventeenth National Conference on Artificial Intelligence and
Twelfth Conference on Innovative Applications of Artificial Intelligence.
Thom, B., Spevak, C., & Hothker, K. (2002). Melodic segmentation: evaluating
the performance of algorithms and musical experts. International Computer
Music Conference, Gothenburg, Sweden.
Thomas, D. A. (1995). Music and the Origins of Language. Cambridge:
Cambridge University Press.
Todd, P. M., & Werner, G. M. (1999). Frankensteinian Methods for Evolutionary
Music Composition. In N. Griffith & P. M. Todd (Org.), Musical networks:
Parallel distributed perception and performance. Cambridge, MA: MIT
Press/Bradford Books.
Tokui, N., & Iba, H. (2000). Music Composition with Interactive Evolutionary
Computation. International Generative Art, Milan, Italia.
Trivino-Rodriguez, J. L., & Morales-Bueno, R. (2001). Using Multiattribute
Prediction Suffix Graphs to Predict and Generate Music. Computer Music
Journal, 25(3), 62-79.
50"
NICS Reports
Vercoe, B., & Puckette, M. (1985). The synthetic rehearsal: Training the
synthetic performer. International Computer Music Conference, Vancouver,
Canada.
Walker, W., Hebel, K., Martirano, S., & Scaletti, C. (1992). ImprovisationBuilder:
improvisation as conversation. International Computer Music Conference, San
Jose State University, EUA
Walker, W. F. (1994). A Conversation-Based Framework For Musical
Improvisation. University of Illinois.
Wallin, N. J., Merker, B., & Brown, S. (Eds.). (2000). The Origins of Music.
Cambridge, MA: MIT Press.
Weinberg, G., Godfrey, M., Rae, A., & Rhoads, J. (2007). A Real-Time Genetic
Algorithm In Human-Robot Musical Improvisation. CMMR, Copenhagen.
Wiggins, G. (1998). Music, syntax and the meaning of "meaning". First
Symposium on Music and Computers, Corfu, Greece.
Wiggins, G. (1999). Automated generation of musical harmony: what's missing?
International Joint Conference on Artificial Intelligence.
Wiggins, G., Papadopoulos, A., Phon-Amnuaisuk, S., & Tuson, A. (1999).
Evolutionary Methods for Musical Composition. International Journal of
Computing Anticipatory Systems.
Wiggins, G., & Smaill, A. (2000). Musical Knowledge: what can Artificial
Intelligence bring to the musician? Readings in Music and Artificial Intelligence
(pp. 29-46).
Woods, W. (1970). Transition Network Grammars for Natural Language
Analysis. Communications of the ACM, 13(10), 591-606.
Wulfhorst, R. D., Nakayama, L., & Vicari, R. M. (2003). A Multiagent approach
for Musical Interactive Systems. International Joint Conference on Autonomous
Agents and Multiagent Systems, Nova York, EUA.
51"
NICS Reports
3. An a-life approach to machine learning of musical worldviews for improvisation systems9
Marcelo Gimenes
Interdisciplinary Centre for Computer Music
Research
University of Plymouth, UK
[email protected]
Eduardo R. Miranda
Interdisciplinary Centre for Computer Music
Research
University of Plymouth, UK
[email protected]
Abstract: In this paper we introduce Interactive Musical Environments (iMe), an
interactive intelligent music system based on software agents that is capable of
learning how to generate music autonomously and in real-time. iMe belongs to
a new paradigm of interactive musical systems that we call “ontomemetical
musical systems” for which a series of conditions are proposed.
1. Introduction
Tools and techniques associated with Artificial Life (A-Life), a discipline that
studies natural living systems by simulating their biological occurrence on
computers, are an interesting paradigm that deals with extremely complex
phenomena. Actually, the attempt to mimic biological events on computers is
proving to be a viable route for a better theoretical understanding of living
organisms [1].
We have adopted an A-Life approach to intelligent systems design in order to
develop a system called iMe (Interactive Music Environment) whereby
autonomous software agents perceive and are influenced by the music they
hear and produce. Whereas most A-Life approaches to implementing computer
music systems are chiefly based on algorithms inspired by biological
development and evolution (for example, Genetic Algorithms [2]), iMe is based
on cultural development (for example, Imitation Games [3, 4]).
Central to iMe are the notions of musical style and musical worldview. Style,
according to a famous definition proposed by Meyer, is “a replication of
patterning, whether in human behaviour or in the artefacts produced by human
behaviour, that results from a series of choices made within some set of
constraints” [5]. Patterning implies the sensitive perception of the world and its
9 Referência original deste trabalho: Gimenes, M. and E. Miranda (2008). An A-Life Approach
to Machine Learning of Musical Worldviews for Improvisation Systems. 5th Sound and Music
Computing Conference, Berlin, Germany.
52"
NICS Reports
categorisation into forms and classes of forms through cognitive activity, “the
mental action or process of acquiring knowledge and understanding through
thought, experience and the senses” (Oxford Dictionary).
Worldview, according to Park [6], is “the collective interpretation of and
response to the natural and cultural environments in which a group of people
lives. Their assumptions about those environments and the values derived from
those assumptions.” Through their worldview people are connected to the
world, absorbing and exercising influence, communicating and interacting with
it. Hence, a musical worldview is a two-way route that connects individuals with
their musical environment.
In our research we want to tackle the issue of how different musical influences
can lead to particular musical worldviews. We therefore developed a computer
system that simulates environments where software agents interact among
themselves as well as with external agents, such as other systems and
humans. iMe's general characteristics were inspired by the real world: agents
perform musical tasks for which they possess perceptive and cognitive abilities.
Generally speaking, agents perceive and are influenced by music. This influence is transmitted to other agents as they generate new music that is then perceived by other agents, and so forth.
iMe enables the design and/or observation of chains of musical influence, similar to what happens in human musical apprenticeship. The system
addresses the perceptive and cognitive issues involved in musical influence. It
is precisely the description of a certain number of musical elements and the
balance between them (differences of relative importance) that define a musical
style or, as we prefer to call it, a musical worldview: the musical aesthetics of an
individual or of a group of like-minded individuals (both, artificial and natural).
iMe is referred to as an ontomemetic computer music system. In Philosophy of
Science, ontogenesis refers to the sequence of events involved in the
development of an individual organism from its birth to its death. However, our
research is concerned with the development of cultural organisms rather than
biological organisms. We therefore coined the term “ontomemetic” by replacing
the affix “genetic” by the term “memetic”. The notion of “meme” was suggested
by Dawkins [7] as the cultural equivalent of gene in Biology. Musical
ontomemesis therefore refers to the sequence of events involved in the
development of the musicality of an individual.
An ontomemetic musical system should foster interaction between entities and,
at the same time, allow for the observation of how different paths of
development can lead to different musical worldviews. Modelling perception and
cognition abilities plays an important role in our system, as we believe that the
way in which music is perceived and organized in our memory has direct
53"
NICS Reports
connections with the music we make and appreciate. The more we get exposed
to certain types of elements, the more these elements get meaningful
representations in our memory. The result of this exposure and interaction is
that our memory is constantly changing, with new elements being added and
old elements being forgotten.
Despite the existence of excellent systems that can learn to simulate musical
styles [8] or interact with human performers in real-time ([9-11]), none of them
address the problem from the ontomemetic point of view, i.e.:
• to model perceptive and cognitive abilities in artificial entities based on their
human correlatives
• to foster interaction between these entities as to nurture the emergence of
new musical worldviews
• to model interactivity as ways through which reciprocal actions or influences
are established
• to provide mechanisms to objectively compare different paths and worldviews
in order to assess their impact in the evolution of a musical style.
An ontomemetic musical system should be able to develop its own style. This
means that we should not rely on a fixed set of rules that restrain the musical
experience to particular styles. Rather, we should create mechanisms through
which musical style could eventually emerge from scratch.
In iMe, software entities (or agents) are programmed with identical abilities.
Nevertheless, different modes of interactions give rise to different worldviews.
The developmental path, that is, the order in which the events involved in the development of a worldview take place, plays a crucial role here. Paths are
preserved in order to be reviewed and compared with other developmental
paths and worldviews. A fundamental requisite of an ontomemetic system is to
provide mechanisms to objectively compare different paths and worldviews in
order to assess the impact that different developmental paths might have had in
the evolution of a style. This is not trivial to implement.
1.1 Improvisation
Before we introduce the details of iMe, a short discussion about musical
improvisation will help to better contextualise our system. Not surprisingly,
improvised music seems to be a preferred field when it comes to the application
of interactivity, and many systems have been implemented focusing on
controllers and sound synthesis systems designed to be operated during
performance. The interest in exploring this area, under the point of view of an
ontomemetic musical system relies on the fact that, because of the intrinsic
characteristics of improvisation, it is intimately connected with the ways human
54"
NICS Reports
learning operates. However, not many systems produced for music improvisation to date are able to learn.
According to a traditional definition, musical improvisation is the spontaneous
creative process of making music while it is being performed. It is like speaking
or having a conversation as opposed to reciting a written text.
As it encompasses musical performance, it is natural to observe that
improvisation has a direct connection with performance related issues such as
instrument design and technique. Considering the universe of musical elements
played by improvisers, it is known that certain musical ideas are more adapted
to be played with polyphonic (e.g., piano, guitar) as opposed to monophonic
instruments (e.g., saxophone, flute) or with keyboards as opposed to wind
instruments, and so forth.
Since instrument design and technique affect the ease or difficulty of
performing certain musical ideas, we deduce that different musical elements
must affect the cognition of different players in different ways.
The technical or “performance part” of a musical improvisation is, at the same
time, passionate and extremely complex but, although we acknowledge the
importance of its role in defining one's musical worldview, our research (and this
paper) is focused primarily on how: (i) music is perceived by the sensory
organs, (ii) represented in memory and (iii) the resulting cognitive processes
relevant to musical creation in general (and more specifically, to improvisation)
convey the emergence and development of musical worldviews.
Regarding specifically the creative issue, it is important to remember that
improvisation, at least in its most generalised form, follows a protocol that
consists of developing musical ideas “on top” of pre-existing schemes. In
general, these include a musical theme that comprises, among other elements,
melody and harmonic structure. Therefore, in this particular case, which
happens to be the most common, one does not need to create specific
strategies for each individual improvisational session but rather follow the
generally accepted protocol.
Despite the fact that this may give the impression of limiting the system, preventing the use of more complex compositional strategies, one of the major interests of research into music improvisation lies in the fact that once a
musical idea has been played, one cannot erase it. Therefore, each individual
idea is an “imposition” in itself that requires completion that leads to other ideas,
which themselves require completion, and so on. Newly played elements
complete and re-signify previous ones in such ways that the improviser's
musical worldview is revealed. In this continuous process two concurrent and
different plans play inter-dependent roles: a pathway (the “lead sheet”) to which
the generated ideas have to adapt and the “flow of musical ideas” that is
55"
NICS Reports
particular to each individual at each given moment and that implies (once more)
their musical worldview.
The general concepts introduced so far are all an integral part of iMe and will be
further clarified as we introduce the system.
2. The iMe System
iMe was conceived as a platform in which software agents perform music-related
tasks that convey musical influence and from which their particular styles emerge.
Tasks such as read, listen, perform, compose and improvise have already been
implemented; a number of others are planned for the future. In a multi-agent
environment one can design different developmental paths by controlling how
and when different agents interact; a hypothetical example is shown in Fig. 1.
Fig. 1. The developmental paths of two agents.
In the previous figure we see the representation of a hypothetical timeline
during which two agents (Agent 'A' and Agent 'B') perform a number of tasks.
Initially, Agent 'A' would listen to one piece of music previously present in the
environment. After that, Agent 'B' would listen to 4 pieces of music and so forth
until one of them, Agent 'A' would start to compose its own pieces. From this
moment Agent 'B' would listen to the pieces composed by Agent 'A' until Agent
'B' itself would start to compose and then Agent 'A' would interact with Agent
'B's music as well.
In general, software agents should normally act autonomously and decide if and
when to interact. Nevertheless, in the current implementation of iMe we decided
to constrain their skills in order to have a better control over the development of
their musical styles: agents can choose which music they interact with but not
how many times or when they interact.
When agents perform composition or improvisation tasks, new pieces are
delivered to the environment and can be used for further interactions. On the
other hand, by performing tasks such as read or listen to music, agents only
receive influence.
Interaction can be established not only amongst the agents themselves, but
also between agents and human musicians. The main outcome of these
56"
NICS Reports
interactions is the emergence and development of the agents' musical styles as
well as the musical style of the environment as a whole.
The current implementation of iMe's perceptive algorithms was specially
designed to take into account a particular type of musical texture (homophony),
in which one voice (the melody) is distinguishable from the accompanying harmony. In
the case of the piano for instance, the player would be using the left hand to
play a series of chords while the right hand would be playing the melodic line.
iMe addresses this genre of music but also accepts music that could be
considered a subset of it; e.g., a series of chords, a single melody or any
combination of the two. Any music that fits into these categories should
generate an optimal response by the system. However, we are also
experimenting with other types of polyphonic music with a view on widening the
scope of the system.
In a very basic scenario, simulations can be designed by simply specifying:
• A number of agents
• A number of tasks for each agent
• Some initial music material for the interactions
iMe generates a series of consecutive numbers that correspond to an abstract
time control (cycle). Once the system is started, each cycle number is sent to
the agents, which then execute the tasks that were scheduled to that particular
cycle.
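As an illustration, a cycle-driven scheduler of this kind could look roughly as follows (a minimal sketch in Python; the Agent class, its fields and the task names are hypothetical and do not reflect iMe's actual code):

    # Sketch: a cycle-driven scheduler in the spirit of iMe (hypothetical names).
    class Agent:
        def __init__(self, name, schedule):
            self.name = name
            self.schedule = schedule             # {cycle_number: [task_name, ...]}

        def on_cycle(self, cycle):
            for task in self.schedule.get(cycle, []):
                print(f"cycle {cycle}: agent {self.name} performs '{task}'")

    def run_simulation(agents, n_cycles):
        # the environment emits consecutive cycle numbers; each agent executes
        # whatever tasks were scheduled for that particular cycle
        for cycle in range(1, n_cycles + 1):
            for agent in agents:
                agent.on_cycle(cycle)

    run_simulation([Agent("A", {1: ["listen"], 3: ["compose"]}),
                    Agent("B", {2: ["listen", "listen"], 4: ["compose"]})], 4)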
As a general rule, when an agent chooses a piece of music to read (in the form
of a MIDI file) or is connected to another agent to listen to its music, it receives
a data stream which is initially decomposed into several feature streams, and
then segmented as described in the next section.
2.1 System's Perception and Memory
iMe's perception and memory mechanisms are greatly inspired by the work of
Snyder [12] on musical memories. According to Snyder, “the organisation of
memory and the limits of our ability to remember have a profound effect on how
we perceive patterns of events and boundaries in time. Memory influences how
we decide when groups of events end and other groups of events begin, and
how these events are related. It also allows us to comprehend time sequences
of events in their totality, and to have expectations about what will happen next.
Thus, in music that has communication as its goal, the structure of the music
must take into consideration the structure of memory - even if we want to work
against that structure”.
iMe's agents initially “hear” music and subsequently use a number of filters to
extract independent but interconnected streams of data, such as melodic
57"
NICS Reports
direction, melodic inter-onset intervals, and so on. This results in a feature data
stream that is used for the purposes of segmentation, storage (memory) and
style definition (Fig. 2).
Fig. 2. Feature extraction and segmentation.
To date we have implemented ten filters, which extract information from melodic
(direction, leap, inter-onset interval, duration and intensity) and non-melodic
notes (vertical number of notes, note intervals from the melody, inter-onset
interval, duration and intensity). As might be expected, the higher the number
of filters, the more accurate the representation of the music. In order to help
clarify these concepts, in Fig. 3 we present a simple example and give the
corresponding feature data streams that would have been extracted by an
agent, using the ten filters:
        1    2    3    4    5    6    7    8    9   10   11  ...
  a)    0    1    1    1    1   -1   -1   -1    1    1    1  ...
  b)    0    2    2    1    2    2    1    2    2    1    2  ...
  c)  120  120  120  120  120  120  120  120  120  120  120  ...
  d)  120  120  120  120  120  120  120  120  120  120  120  ...
  e)    6    6    6    6    6    6    6    6    6    6    6  ...
  f)    2    0    0    0    0    0    0    0    2    0    0  ...
  g)  5,7   -2   -2   -2   -2   -2   -2   -2  7,9   -2   -2  ...
  h)  120   -2   -2   -2   -2   -2   -2   -2  120   -2   -2  ...
  i)  960   -2   -2   -2   -2   -2   -2   -2  960   -2   -2  ...
  j)    6   -2   -2   -2   -2   -2   -2   -2    6   -2   -2  ...
Fig. 3. Feature streams, where a) melody direction, b) melody leap, c) melody interonset
interval, d) melody duration, e) melody intensity, f) non melody number of notes, g) non
melody note intervals from melody, h) non melody interonset interval, i) non melody
duration, j) non melody intensity.
Number -2 represents the absence of data in a particular stream. Melody
direction can value -1, 0 and 1, meaning descending, lack of and ascending
movement, respectively. Melody leaps and intervals are shown in half steps. In
streams that hold time information (interonset intervals and duration) the value
240 (time resolution) is assigned to quarter notes. Intensity is represented by
the MIDI range (0 to 127); in Fig. 3 this was simplified by dividing this value by
ten.
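As an illustration of the filters, the melodic direction and melodic leap streams could be extracted roughly as follows (a minimal sketch assuming a simple tuple representation for notes, which is not iMe's internal format):

    # Sketch: extracting two of the melodic feature streams described in the text.
    # Each melody note is assumed to be a (midi_pitch, onset, duration, velocity) tuple.
    def melody_direction(notes):
        stream = [0]                               # first note: no movement
        for prev, curr in zip(notes, notes[1:]):
            diff = curr[0] - prev[0]
            stream.append(0 if diff == 0 else (1 if diff > 0 else -1))
        return stream

    def melody_leap(notes):
        stream = [0]                               # leaps measured in half steps
        for prev, curr in zip(notes, notes[1:]):
            stream.append(abs(curr[0] - prev[0]))
        return stream

    melody = [(60, 0, 120, 64), (62, 120, 120, 64), (64, 240, 120, 64), (62, 360, 120, 64)]
    print(melody_direction(melody))                # [0, 1, 1, -1]
    print(melody_leap(melody))                     # [0, 2, 2, 2]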
After the extraction of the feature data stream, the next step is the segmentation
of the music. A fair amount of research has been conducted on this subject by a
number of scholars. In general, the issue of music segmentation remains
unsolved to a great extent due to its complexity. One of the paradigms that
substantiate segmentation systems has been settled by Gestalt psychologists
who argued that perception is driven from the whole to the parts by the
application of concepts that involve simplicity and uniformity in organising
perceptual information [13]. Proximity, closure, similarity and good continuation
are some of these concepts.
Fig. 4 shows a possible segment from a piece by J. S. Bach (First Invention for
Two Voices) according to Gestalt theory. In this case the same time length
separates all except for the first and the last notes, which are disconnected from
the previous and the following notes by rests. This implies the application of
similarity and proximity rules.
Fig. 4. An example of a music segment.
In the example discussed below we decided to build the segmentation algorithm
on top of only one of the principles that guide group organization: the
occurrence of surprise. As the agents perceive the continuous musical stream
through the various expert sensors (filters), wherever there is a break in the
continuity of the behaviour of one (or a combination of some) of the feature
streams, this is an indication of positions for a possible segmentation. The
whole musical stream is segmented at these positions. If discontinuities happen
in more than one feature at the same time, this indicates the existence of
59"
NICS Reports
different levels of structural organization within the musical piece; this conflict
must be resolved (this will be clarified later).
In the example of Fig. 3, we shall only consider the melody direction stream ('a'
of Fig. 3). Hence, every time the direction of the melody is about to change, a
new grouping starts. These places are indicated on the musical score shown in
Fig. 3 with the symbol 'v'.
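A minimal sketch of this single-feature segmentation rule, splitting the stream wherever the melodic direction changes (an illustration of the principle only, not iMe's exact algorithm):

    # Sketch: segment wherever the behaviour of one feature stream breaks.
    def segment_on_direction_change(direction_stream):
        boundaries = []
        for i in range(1, len(direction_stream)):
            if direction_stream[i] != direction_stream[i - 1]:
                boundaries.append(i)               # a new grouping starts here
        return boundaries

    directions = [0, 1, 1, 1, 1, -1, -1, -1, 1, 1, 1]    # stream 'a' of Fig. 3
    print(segment_on_direction_change(directions))       # [1, 5, 8]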
To designate these segmented musical structures we adopted the expression
“musical meme” or simply “meme”, a term that has been introduced by Dawkins
[7] to describe basic units of cultural transmission in the same way that genes,
in biology, are units of genetic information. “Examples of memes are tunes,
catch-phrases, clothes fashions, ways of making pots or of building arches. Just
as genes propagate themselves in the gene pool by leaping from body to body
via sperm and eggs, so memes propagate in the meme pool by leaping from
brain to brain via a process which, in a broad sense, can be called imitation.”
[7].
The idea of employing this concept is attractive because it covers both the
concept of structural elements and processes of cultural development, which fits
well with the purpose of our research.
A meme is generally defined as a short musical structure, but it is difficult to
ascertain what is the minimal acceptable size for a meme. In iMe, memes are
generally small structures in the time dimension and they can have any number
of simultaneous notes. Fig. 5 shows a meme (from the same piece of the
segment shown in Fig. 4) and its memotype representation following the
application of three filters: melodic direction, leap and duration:
Mel. direction:   0   1   1   1  -1   1  -1   1  -1
Mel. leap:        0   2   2   1   3   2   4   7  12
Mel. duration:    0  60  60  60  60  60  60 120 120
Fig. 5. Meme and corresponding memotype representation.
Since the memes were previously separated into streams of data, they can be
represented as a group of memotypes, each corresponding to a particular
musical feature. A meme is therefore represented by 'n' memotypes, in which 'n'
is the number of streams of data representing musical features. In any meme
the number of elements of all the memotypes is the same and corresponds to
the number of vertical structures. By “vertical structure” we mean all music
elements that happen at the same time.
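A meme can thus be sketched as a record holding one memotype per feature, all of the same length (the class and field names below are ours, used only for illustration; the data is taken from Fig. 5):

    # Sketch: a meme as a bundle of parallel memotypes (one per feature stream).
    class Meme:
        def __init__(self, memotypes):
            # memotypes: {feature_name: list_of_values}; all lists share one length,
            # which equals the number of vertical structures in the segment
            assert len({len(v) for v in memotypes.values()}) == 1
            self.memotypes = memotypes

    meme = Meme({
        "mel_direction": [0, 1, 1, 1, -1, 1, -1, 1, -1],
        "mel_leap":      [0, 2, 2, 1, 3, 2, 4, 7, 12],
        "mel_duration":  [0, 60, 60, 60, 60, 60, 60, 120, 120],
    })
    print(len(meme.memotypes["mel_leap"]))   # 9 vertical structures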
60"
NICS Reports
2.2 Memory
The execution of any of the musical tasks requires the perception and
segmentation of the musical flow and the adaptation of the memory. As a result,
the agents need to store this information in their memory by comparing it with
the elements that were previously perceived. This is a continuous process that
constantly changes the state of the memory of the agents.
In iMe, the memory of the agents comprises a Short Term Memory (STM) and a
Long Term Memory (LTM). The STM consists of the last x memes (x is defined
“a priori” by the user) that were most recently brought to the agent's attention,
representing the focus of their “awareness”.
A much more complex structure, the LTM is a series of specialized “Feature
Tables” (FTs), a place designed to store all the memotypes according to their
categories. FTs are formed by “Feature Lines” (FLs) that keep a record of the
memotypes, the dates of when the interactions took place (date of first contact - dfc, date of last contact - dlc), the number of contacts (noc), weight (w) and
“connection pointers” (cp). In Fig. 6 we present the excerpt of a hypothetical FT
(for melody leaps) in which there are 11 FLs. The information between brackets
in this Fig. corresponds to the memotype and the numbers after the colon
correspond to the connection pointers. This representation will be clarified by
the examples given later.
Feature n. 2 (melody leaps):
Line 0: [0 0]: 0 0 0 0 0 0 0 0 0 0
Line 1: [2 2 0 1 0 1 2 5 0]: 1
Line 2: [1 0 0 3 2 2 0]: 2 20 10 10
Line 3: [1 0 0 0 1 2 2 4]: 3
Line 4: [2 0 2 0 4 1 3 0]: 4
Line 5: [0 3 2 7 0 2 0 4]: 5 8 10
Line 6: [3 0 2 0 3 2 4]: 6 5 3
Line 7: [1 0 1 2 2 0 3]: 7 3
Line 8: [2 0 2 0 2 0 0]: 8 31 8
Line 9: [2 0]: 47 4 9 9 4 9 9
Line 10: [5 0 8 2 1 2]: 10
Fig. 6. A Feature Table excerpt.
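In code, this memory layout could be sketched roughly as follows (the class names and fields mirror the description above but are hypothetical, not iMe's implementation):

    # Sketch of the Long Term Memory: Feature Tables made of Feature Lines.
    class FeatureLine:
        def __init__(self, memotype, cycle):
            self.memotype = list(memotype)
            self.dfc = cycle          # date (cycle) of first contact
            self.dlc = cycle          # date (cycle) of last contact
            self.noc = 1              # number of contacts
            self.w = 1.0              # weight
            self.cp = []              # connection pointers (indices into FT1)

    class FeatureTable:
        def __init__(self, name):
            self.name = name
            self.lines = []           # position in this list is the memotype index

        def find(self, memotype):
            for i, line in enumerate(self.lines):
                if line.memotype == list(memotype):
                    return i
            return None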
2.2.1 Adaptation
Adaptation is generally accepted as one of the cornerstones of evolutionary
theories, Biology and indeed A-Life systems. With respect to cultural evolution,
however, the notion of adaptation still seems to generate heated debates
amongst memetic theory scholars. Cox [14] asserts that the “memetic
hypothesis” is based on the idea that the understanding someone has of sounds
comes from comparing them with the sounds that person has already produced. This
process of comparison would involve tacit imitation, or memetic participation,
grounded in that person's previous experience of producing the sound.
61"
NICS Reports
According to Jan [15] “the evolution of music occurs because of the differential
selection and replication of mutant memes within idioms and dialects. Slowly
and incrementally, these mutations alter the memetic configuration of the dialect
they constitute. Whilst gradualistic, this process eventually leads to fundamental
changes in the profile of the dialect and, ultimately, to seismic shifts in the
overarching principles of musical organization, the rules, propagated within
several dialects.”
iMe defines that every time agents interact with a piece of music their musical
knowledge changes according to the similarities and/or differences that exist
between this piece and their own musical “knowledge”. At any given time, each
memotype for each one of the FTs in an agent's memory is assigned with a
weight that represents their relative importance in comparison with the
corresponding memotypes in the other memes.
The adaptation mechanism is fairly simple: the weight is increased when a
memotype is perceived by an agent. The more an agent listens to a memotype,
the more its weight is increased. Conversely, if a memotype is not listened to for
some time, its weight is decreased; in other words, the agent begins to forget it.
The forgetting mechanism - an innovation if compared to other systems, such
as the ones cited earlier - is central to the idea of an ontomemetic musical
system and is responsible for much of the ever-changing dynamics of the
weights of memotypes. In addition to this mechanism, we have implemented a
“coefficient of permeability” (values between 0 and 1) that modulates the
calculation of the memotype weights. This coefficient is defined by a group of
other variables (attentiveness, character and emotiveness), the motivation
being that some tasks entail more or less transformation to the agent's memory
depending on the required level of attentiveness (e.g., a reading task requires
less attention than an improvisation task). On the other hand, attributes such as
character and emotiveness can also influence the level of “permeability” of the
memory.
When a new meme is received by the memory, if the memotype is not present
in the corresponding FT, a new FL is created and added to the corresponding
FT. The same applies to all the FTs in the LTM. The other information in the
FLs (dates, weight and pointers) is then (re)calculated. This process is
exemplified below.
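Building on the previous sketch, the adaptation of one Feature Table to an incoming memotype could look roughly like this (the reinforcement factor follows the f = 0.1 value mentioned later in the text; the decay value and the way permeability modulates the update are our assumptions):

    # Sketch (uses FeatureLine/FeatureTable above): adapting one FT to a memotype.
    REINFORCE = 0.1      # added when a memotype is perceived again
    DECAY = 0.05         # assumed value: subtracted at end of cycle if not perceived

    def adapt(ft, memotype, cycle, permeability=1.0):
        i = ft.find(memotype)
        if i is None:                        # unseen memotype: create a new Feature Line
            ft.lines.append(FeatureLine(memotype, cycle))
            return len(ft.lines) - 1
        line = ft.lines[i]                   # known memotype: reinforce it
        line.dlc = cycle
        line.noc += 1
        line.w += REINFORCE * permeability
        return i

    def forget(ft, perceived_indices):
        # the "forgetting" mechanism: unperceived memotypes lose weight over time
        for i, line in enumerate(ft.lines):
            if i not in perceived_indices:
                line.w = max(0.0, line.w - DECAY)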
Let us start a hypothetical run in which the memory of an agent is completely
empty. As the agent starts perceiving the musical flow (Fig. 3), the agent's
“sensory organs” (feature filters) generate a parallel stream of musical features,
according to the mechanism described earlier. The first meme (Fig. 7) then
arrives at the agent's memory and, as a result, the memory is adapted (Fig. 8).
62"
NICS Reports
Feature stream:
mdi: 0, 1, 1, 1
mle: 0, 2, 2, 1
mii: 120, 120, 120, 120
mdu: 120, 120, 120, 120
Fig. 7. Meme 1, where mdi is melody direction, mle is melody leap, mii is melody
interonset interval and mdu is melody duration.
In order to keep the example simple, we are only showing the representation of
four selected features: melody direction (FT1), leap (FT2), interonset interval
(FT3) and duration (FT4). Fig. 8 shows the memotypes in each of the Feature
Tables. Notice that the connection pointers (cp) of FTs 2 to 4 actually point to
the index (i) of the memotype of FT1. The initial weight (w) was set to 1.0 for all
of the memotypes and the information date (dfc, dlc) refers to the cycle in which
this task is performed during the simulation; in this case, the first task.
i  Memotype                        dfc  dlc  noc  w    cp
Melody direction:
1  0, 1, 1, 1                      1    1    1    1.0
Melody leap:
1  0, 2, 2, 1                      1    1    1    1.0  1
Melody interonset interval:
1  120, 120, 120, 120              1    1    1    1.0  1
Melody duration:
1  120, 120, 120, 120              1    1    1    1.0  1
Fig. 8. Agent's memory after adaptation to meme 1.
Then comes the next meme (Fig. 9), as follows:
Feature stream:
mdi: 1, -1, -1
mle: 2, 2, 1
mii: 120, 120, 120
mdu: 120, 120, 120
Fig. 9. Meme 2.
And the memory is adapted accordingly (Fig. 10):
i  Memotype                        dfc  dlc  noc  w    cp
Melody direction:
1  0, 1, 1, 1                      1    1    1    1.0  2
2  1, -1, -1                       1    1    1    1.0
Melody leap:
1  0, 2, 2, 1                      1    1    1    1.0  1
2  2, 2, 1                         1    1    1    1.0  2
Melody interonset interval:
1  120, 120, 120, 120              1    1    1    1.0  1
2  120, 120, 120                   1    1    1    1.0  2
Melody duration:
1  120, 120, 120, 120              1    1    1    1.0  1
2  120, 120, 120                   1    1    1    1.0  2
Fig. 10. Agent's memory after adaptation to meme 2.
Here all the new memotypes are different from the previous ones and stored in
separate FLs in the corresponding FTs. Now the memotype of index 1 in FT1
points (cp) to the index 2. Differently from the other FTs, this information
represents the fact that memotype of index 2 comes after the memotype of
index 1. This shows how iMe keeps track of the sequence of memes to which
the agents are exposed. The cp of the other FTs still point to the index in FT1
that connect the elements of the meme to which the memory is being adapted.
The weights of the new memes are set to 1.0 as previously.
The same process is repeated with the arrival of meme 3 (Figs. 11 and 12) and
meme 4 (Figs. 13 and 14).
Feature stream:
mdi: -1, 1, 1, 1, 1, 1
mle: 2, 2, 1, 2, 2, 2
mii: 120, 120, 120, 120, 120, 120
mdu: 120, 120, 120, 120, 120, 120
Fig. 11. Meme 3.
i  Memotype                        dfc  dlc  noc  w    cp
Melody direction:
1  0, 1, 1, 1                      1    1    1    1.0  2
2  1, -1, -1                       1    1    1    1.0  3
3  -1, 1, 1, 1, 1, 1               1    1    1    1.0
Melody leap:
1  0, 2, 2, 1                      1    1    1    1.0  1
2  2, 2, 1                         1    1    1    1.0  2
3  2, 2, 1, 2, 2, 2                1    1    1    1.0  3
Melody interonset interval:
1  120, 120, 120, 120              1    1    1    1.0  1
2  120, 120, 120                   1    1    1    1.0  2
3  120, 120, 120, 120, 120, 120    1    1    1    1.0  3
Melody duration:
1  120, 120, 120, 120              1    1    1    1.0  1
2  120, 120, 120                   1    1    1    1.0  2
3  120, 120, 120, 120, 120, 120    1    1    1    1.0  3
Fig. 12. Agent's memory after adaptation to Meme 3.
Feature stream:
mdi: 1, -1, -1
mle: 1, 1, 2
mii: 120, 120, 120
mdu: 120, 120, 120
Fig. 13. Meme 4.
The novelty here is that the memotypes for melody direction, interonset interval
and duration had already been stored in the memory. Only the melody leap has
new information and, as a result, a new FL was added to FT2 and not to the
other FTs. The weights of the repeated memotypes were increased by '0.1',
which means that the relative weight of this information increased if compared
to the other memotypes. We can therefore say that the weights ultimately
represent the relative importance of all the memotypes in relation to each other.
The memotype weight is increased by a constant factor (e.g., f = 0.1) every time
it is received and decreased by another factor if, at the end of the cycle, it is not
“perceived”. The latter case will not happen in this example because we are
considering that the run is being executed entirely in one single cycle.
i  Memotype                        dfc  dlc  noc  w    cp
Melody direction:
1  0, 1, 1, 1                      1    1    1    1.0  2
2  1, -1, -1                       1    1    2    1.1  3
3  -1, 1, 1, 1, 1, 1               1    1    1    1.0  2
Melody leap:
1  0, 2, 2, 1                      1    1    1    1.0  1
2  2, 2, 1                         1    1    1    1.0  2
3  2, 2, 1, 2, 2, 2                1    1    1    1.0  3
4  1, 1, 2                         1    1    1    1.0  2
Melody interonset interval:
1  120, 120, 120, 120              1    1    1    1.0  1
2  120, 120, 120                   1    1    2    1.1  2, 2
3  120, 120, 120, 120, 120, 120    1    1    1    1.0  3
Melody duration:
1  120, 120, 120, 120              1    1    1    1.0  1
2  120, 120, 120                   1    1    2    1.1  2, 2
3  120, 120, 120, 120, 120, 120    1    1    1    1.0  3
Fig. 14. Agent's memory after adaptation to meme 4.
Finally, the memory receives the last meme (Fig. 15) and is adapted
accordingly (Fig. 16).
Feature stream:
mdi: -1, 1, -1, -1, -1
mle: 2, 2, 2, 2, 1
mii: 120, 120, 120, 120, 120
mdu: 120, 120, 120, 120, 480
Fig. 15. Meme 5.
65"
NICS Reports
i  Memotype                        dfc  dlc  noc  w    cp
Melody direction:
1  0, 1, 1, 1                      1    1    1    1.0  2
2  1, -1, -1                       1    1    2    1.1  3, 4
3  -1, 1, 1, 1, 1, 1               1    1    1    1.0  2
4  -1, 1, -1, -1, -1               1    1    1    1.0
Melody leap:
1  0, 2, 2, 1                      1    1    1    1.0  1
2  2, 2, 1                         1    1    1    1.0  2
3  2, 2, 1, 2, 2, 2                1    1    1    1.0  3
4  1, 1, 2                         1    1    1    1.0  2
5  2, 2, 2, 2, 1                   1    1    1    1.0  4
Melody interonset interval:
1  120, 120, 120, 120              1    1    1    1.0  1
2  120, 120, 120                   1    1    2    1.1  2, 2
3  120, 120, 120, 120, 120, 120    1    1    1    1.0  3
4  120, 120, 120, 120, 120         1    1    1    1.0  4
Melody duration:
1  120, 120, 120, 120              1    1    1    1.0  1
2  120, 120, 120                   1    1    2    1.1  2, 2
3  120, 120, 120, 120, 120, 120    1    1    1    1.0  3
4  120, 120, 120, 120, 480         1    1    1    1.0  4
Fig. 16. Agent's memory after adaptation to meme 5.
2.3 Generative Processes
Gabora [16] explains that, in the same way that information patterns evolve
through biological processes, mental representation - or memes - evolves
through the adaptive exploration and transformation of an informational space
through variation, selection and transmission. Our minds carry out this
replication across a fitness landscape that reflects internal processes and a
worldview that is continuously updated through the renewal of memes.
In iMe agents are also able to compose through processes of re-synthesis of
the different memes from their worldview. Obviously, the selection of the
memes that will be used in a new composition implies that the musical
worldview of this agent is also re-adapted by reinforcing the weights of the
memes that are chosen.
In addition to compositions (non real-time), agents also execute two types of
real-time generative tasks: solo and collective improvisations. The algorithm is
described below.
66"
NICS Reports
2.3.1 Solo improvisations
During solo improvisations, only one agent plays at a time, following the steps
below:
Step 1: Generate a new meme according to the current “meme generation
mode”
The very first memotype of a new piece of music is chosen from the first
Feature Table (FT1), which guides the generation of the whole sequence of
memes, in a Markov-like chain. Let us assume that the user configured FT1 to
represent melody direction. Hence, this memotype could be, hypothetically [0,
1, 1, -1], where 0 represents “repeat the previous note”, 1 represents upward
motion and -1 represents downward motion. Once the memotype from FT1 is
chosen (based on the distribution of probability of the weights of the memotypes
in that table), the algorithm looks at the memotypes in the other FTs to which
the memotype at FT1 points, and chooses a memotype for each FT of
the LTM according to the distribution of probability of the weights at each FT. At
this point we would end up with a new meme (a series of n memotypes, where
n = number of FTs in the LTM).
The algorithm of the previous paragraph describes one of the generation modes
that we have implemented: the “LTM generation mode”. There are other modes.
For instance, there is the “STM generation mode”, where agents choose from
the memes stored in their Short Term Memory. Every time a new meme is
generated, the agent checks the Compositional and Performance Map
(explanation below) to see which generation mode is applicable at any given
time.
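A minimal sketch of the "LTM generation mode" draw, reusing the FeatureTable sketch given earlier (the fallback to the whole table when no connection pointer matches is our assumption):

    import random

    # Sketch of the "LTM generation mode" (reuses the FeatureTable sketch above).
    def weighted_pick(lines):
        # roulette-style draw: probability proportional to the memotype weights
        return random.choices(range(len(lines)), weights=[l.w for l in lines])[0]

    def generate_meme(ltm):
        ft1 = ltm[0]                                    # FT1 guides the generation
        i1 = weighted_pick(ft1.lines)
        meme = {ft1.name: ft1.lines[i1].memotype}
        for ft in ltm[1:]:
            # prefer memotypes whose connection pointers refer to the chosen FT1
            # index; falling back to the whole table is our assumption
            candidates = [l for l in ft.lines if i1 in l.cp] or ft.lines
            meme[ft.name] = candidates[weighted_pick(candidates)].memotype
        return meme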
Step 2: Adapt the memory with the newly generated meme
Once the new meme is generated, the memory is immediately adapted to
reflect this choice, according to the criteria explained in the previous section.
Step 3: Adapt the meme to the Compositional and Performance Map (CPM)
The new meme is then adapted according to criteria foreseen at the CPM. The
CPM (Fig. 17), iMe's equivalent to a “lead sheet”, possesses instructions
regarding a number of parameters that address both aspects of the
improvisation: the generation of new musical ideas and the performance of
these ideas. Examples of the former are: the meme generation mode,
transformations to the meme, local scales and chords, note ranges for right and
left hand. Examples of the latter are: ratio of loudness between melodic and
non-melodic notes, shifts for note onset, loudness and duration both for melodic
and non-melodic notes. Instructions regarding the performance only affect the
sound that is generated by the audio output of the system and are not stored with
the composition.
67"
NICS Reports
Fig. 17. A CPM excerpt.
The instructions (or “constraints”) contained in the CPM are distributed on a
timeline. The agent checks the constraints that are applicable at the
“compositional pointer”, a variable that controls the position of the composition
on the timeline, and acts accordingly.
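A CPM can be pictured as constraints laid out on a timeline and queried at the compositional pointer; a minimal sketch (the constraint fields shown are only a subset of those listed above, and their names are ours):

    # Sketch: constraints laid out on a timeline, queried at the compositional pointer.
    cpm = [
        {"from": 0,   "to": 960,  "mode": "LTM", "scale": "C major", "note_range": (48, 84)},
        {"from": 960, "to": 1920, "mode": "STM", "scale": "A minor", "note_range": (36, 72)},
    ]

    def constraints_at(cpm, pointer):
        return [c for c in cpm if c["from"] <= pointer < c["to"]]

    print(constraints_at(cpm, 1200))    # only the second entry applies at this pointer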
Step 4: Generate notes and play the meme (if in real time mode)
Until this moment, the memes are not real notes but only meta-representations
described by the memotypes (melody direction, melody leap, etc.). Given the
previously generated notes and the CPM, the “actual notes” of the meme must
be calculated and sent to a playing buffer.
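Realising the "actual notes" from, say, the melodic direction and leap memotypes could work roughly as follows (a sketch; the starting pitch and the clipping to a note range are assumptions):

    # Sketch: turning melodic direction/leap memotypes into concrete MIDI pitches.
    def realise_melody(direction, leap, previous_pitch=60, low=48, high=84):
        pitches = []
        current = previous_pitch
        for d, l in zip(direction, leap):
            current = current + d * l                  # move up or down by the leap
            current = min(max(current, low), high)     # clip to the CPM note range
            pitches.append(current)
        return pitches

    print(realise_melody([0, 1, 1, 1, -1], [0, 2, 2, 1, 3]))   # [60, 62, 64, 65, 62]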
Step 5: Store the meme in the composition
An array with the information of the sequence of the memes is kept with the
composition for future reference and tracking of the origin of each meme. There
is another generation mode, the “MemeArray generation mode”, where an
agent can retrieve any previously generated meme and choose it again during
the composition.
Step 6: Repeat previous steps until the end of the CPM
The agent continuously plays the notes of the playing buffer. When the number
of notes in this buffer is equal to or less than 'x' (parameter configured by the
user), the algorithm goes back to step 1 above and a new meme is generated
until the whole CPM is completed.
2.3.2 Collective improvisations
The steps for collective improvisations are very similar to the steps for solo
improvisations, except for the fact that the agents play along with a human
being. We have implemented this task as two separate sub-tasks (a listening
sub-task and a solo improvisation sub-task) running in separate threads.
Memes are generated as in a solo improvisation and the agents' memory is
equally affected by the memes they choose as well as by the memes that they
68"
NICS Reports
listen to in the musical data originating from the external improviser. Both agent
and external improviser follow the same CPM.
At the end of the improvisation (solo or interactive), the composition is stored in
the system in order to be used in further runs of the system.
3. Conclusions and Further Work
In this paper we introduced Interactive Musical Environments (iMe) for the
investigation of the emergence and evolution of musical styles in environments
inhabited by artificial agents, under the perspective of human perception and
cognition. This system belongs to a new paradigm of interactive musical
systems that we refer to as “ontomemetical musical systems” for which we
propose a series of prerequisites and applications.
As seen from some of the experiments that we have presented, we understand
that iMe has the potential to be extremely helpful in areas such as the
musicological investigation of musical styles and influences. Besides the study
of the development of musical styles in artificial worlds, we are also conducting
experiments with human subjects in order to assess iMe's effectiveness to
evaluate musical influences in inter-human interaction. The study of creativity
and interactive music in artificial and real worlds could also benefit from a
number of iMe's features, which we are currently evaluating as well.
The memory of an agent is complex and dynamic, comprising all memotypes,
their weights and connection pointers. The execution of musical tasks affects
the memory state in proportion to the appearance of different memes and
memotypes. A particular musical ontomemesis can thereafter be objectively
associated with the development of any agent's “musicality”.
Bearing in mind that iMe can be regarded as a tool for the investigation of
musical ontomemesis as much as a tool for different sorts of musicological
analyses, a series of different simulation designs could be described.
Future improvements to the system will include the introduction of algorithms
that would allow iMe to become a self-sustained artificial musical environment
such as criteria to control the birth and demise of agents and the automatic
definition of their general characteristics such as attentiveness, character,
emotiveness, etc. Agents should also possess the ability to decide when and
what tasks to perform, besides being able to develop their own Compositional
and Performance Maps.
Acknowledgment
The authors would like to thank the funding support from the Brazilian
Government's Fundacao Coordenacao de Aperfeicoamento de Pessoal de
Nivel Superior (CAPES).
69"
NICS Reports
References
1. Miranda, E.R., The artificial life route to the origins of music. Scientia, 1999.
10(1): p. 5-33.
2. Biles, J.A. GenJam: A Genetic Algorithm for Generating Jazz Solos. in
International Computer Music Conference. 1994.
3. Miranda, E.R., Emergent Sound Repertoires in Virtual Societies. Computer
Music Journal, 2002. 26(2): p. 77-90.
4. Miranda, E.R., At the Crossroads of Evolutionary Computation and Music:
Self-Programming Synthesizers, Swarm Orchestras and the Origins of Melody.
Evolutionary Computation, 2004. 12(2): p. 137-158.
5. Meyer, L.B., Style and Music: Theory, History, and Ideology. 1989,
Philadelphia: University of Pennsylvania Press.
6. Park, M.A., Introducing Anthropology: An Integrated Approach. 2002:
McGraw-Hill Companies.
7. Dawkins, R., The Selfish Gene. 1989, Oxford: Oxford University Press.
8. Cope, D., Computers and Musical Style. 1991, Oxford: Oxford University
Press.
9. Rowe, R., Interactive Music Systems: Machine Listening and Composing.
1993: MIT Press.
10. Pachet, F., Musical Interaction with Style. Journal of New Music Research,
2003. 32(3): p. 333-341.
11. Assayag, G., et al. Omax Brothers: a Dynamic Topology of Agents for
Improvization Learning. in Workshop on Audio and Music Computing for
Multimedia, ACM Multimedia. 2006. Santa Barbara.
12. Snyder, B., Music and Memory: An Introduction. 2000, Cambridge, MA: MIT
Press.
13. Eysenck, M.W. and M.T. Keane, Cognitive Psychology: A Student's
Handbook. 2005: Psychology Press.
14. Cox, A., The mimetic hypothesis and embodied musical meaning.
Musicæ Scientiæ, 2001. 2: p. 195–212.
15. Jan, S., Replicating sonorities: towards a memetics of music. Journal of
Memetics - Evolutionary Models of Information Transmission, 2000. 4.
16. Gabora, L., The Origin and Evolution of Culture and Creativity. Journal of
Memetics, 1997.
70"
NICS Reports
4.
Vox Populi: An Interactive Evolutionary
System for Algorithmic Music Composition10
Artemis Moroni
Artemis Moroni (researcher), Technological Center
for Informatics—The Automation Institute (CTI/IA),
Rod D. Pedro I, km 143,6, Campinas, São Paulo
13081/1970, Brazil. E-mail: <[email protected]>.
Jônatas Manzolli
Jônatas Manzolli (educator), State University of
Campinas—Interdisciplinary Nucleus of Sound
Communication (UNICAMP/NICS), Cidade
Universitária “Zeferino Vaz,” Barão Geraldo,
Campinas, São Paulo 13081/970, Brazil. E-mail:
<[email protected]>.
Fernando Von Zuben
Fernando Von Zuben (educator), State University
of Campinas—Faculty of Electrical and Computer
Engineering (UNICAMP/FEEC), Cidade
Universitária “Zeferino Vaz,” Barão Geraldo,
Campinas, São Paulo 13081/970, Brazil. E-mail:
<[email protected]>.
Ricardo Gudwin
Ricardo Gudwin (educator), State University of
Campinas—Faculty of Electrical and Computer
Engineering (UNICAMP/FEEC) Cidade
Universitária “Zeferino Vaz,” Barão Geraldo,
Campinas, São Paulo 13081/970, Brazil. E-mail:
<[email protected]>.
Abstract
While recent techniques of digital sound synthesis have put numerous new
sounds on the musician’s desktop, several artificial-intelligence (AI) techniques
have also been applied to algorithmic composition. This article introduces Vox
Populi, a system based on evolutionary computation techniques for composing
music in real time. In Vox Populi, a population of chords codified according to
MIDI protocol evolves through the application of genetic algorithms to maximize
a fitness criterion based on physical factors relevant to music. Graphical
controls allow the user to manipulate fitness and sound attributes.
In Darwin’s time, most geologists subscribed to “catastrophe theory”: the idea that the
Earth had been struck many times over by floods, earthquakes and other
catastrophes capable of destroying all forms of life. On his voyage on board the
10 Original reference for this work: Moroni, A., J. Manzolli, et al. (2000). "Vox Populi: An
Interactive Evolutionary System for Algorithmic Music Composition." Leonardo Music Journal
10: 49-54.
71"
NICS Reports
Beagle, Darwin verified that the diverse animal species of a region differed from
each other in minimal details, but he did not understand how this could result
from a “natural” selection. In October 1838, he learned from a small book,
Essay on Population Origin by Thomas Malthus, about the factors influencing
evolution. Malthus, in turn, was inspired by Benjamin Franklin (the same person
who had invented the lightning rod). Franklin had noted the fact that in nature
there must be locally limiting factors, or a unique plant or animal would spread
all over the Earth; it was only the existence of different kinds of animals that
maintained them in equilibrium. This was the universal mechanism that Darwin
was looking for. The factor responsible for the way evolution happens is natural
selection in the fight for life, i.e. those who are better adapted to the
environment survive and assure species continuity. Furthermore, the fight for
survival among members of a species is more obstinate, since they must fight
over shared resources; small differences, or positive deviations from the typical,
are most valuable. The more obstinate the fight is, the faster the evolution; in
this context only those better adapted themselves survive. However,
characteristics that are positive in a specific environment may have no value in
another.
D. Hofstadter, in Metamagical Themas [1], discusses the arbitrariness of the
genetic code. According to him, the first moral of this development is: Efficiency
matters. A second moral, more implicit, is: Having variants matters. The ratchet
of evolution will advance toward ever more efficient variants. If, however, there
is no mechanism for producing variants, then the individual will live or die simply
on the basis of its own qualities vis-à-vis the rest of the world.
Algorithmic composition and evolution
R. Dawkins demonstrated the power of Darwinism in The Blind Watchmaker,
using a simulated evolution of two-dimensional (2D) branching structures made
from sets of genetic parameters. The user selects the “biomorphs” that survive
and reproduce to create a new generation [2]. S. Todd and W. Latham applied
these concepts to help generate computer sculptures using constructive solid
geometry techniques [3,4]. K. Sims used evolutionary mechanisms of creating
variations and making selections to “evolve” complex equations to be used in
procedural models for computer graphics and animation [5].
A new generation of algorithmic composition researchers has discovered that it
is easy to obtain new musical material by using simulated-evolution techniques
to create new approaches for composition. These techniques have been useful
for searching large spaces using simulated systems of variation and selection.
J.A. Biles has described an application of genetic algorithms to generate jazz
solos [6] that has also been studied by D. Horovitz as a way of controlling
rhythmic structures [7]. On the other hand, it is difficult to drive the results in a
72"
NICS Reports
desired direction. The challenge faced by the designers of evolutionary
composition systems is how to bring more structures and knowledge into the
compositional loop. This loop, in an evolutionary system, is a rather simple one;
it generates, tests and repeats. Such systems maintain a population of potential
solutions; they have a selection process and some “genetic operators,” typically
mathematical functions that simulate crossover and mutation. Basically, a
population is generated; the individuals of the population are tested according
to certain criteria, and the best are kept. The process is
Fig. 1. Vox Populi Reproduction and MIDI Cycles: The Reproduction Cycle is an evolving
process that generates chords by using genetic operators and selecting individuals and
is based on the general framework provided by J.H. Holland’s original genetic algorithm.
The MIDI Cycle refers to the interface’s search for notes to be played by the computer.
When selected, a chord is put in a critical area that is continually verified by the
interface. These notes are played until the next group is selected. (© Artemis Moroni)
repeated by generating a new population of individuals—or things or
solutions—based on the old ones [8]. This loop continues until the results are
satisfactory according to the criteria being used. The effective challenge is to
specify what “to generate” and “to test” mean.
All evolutionary approaches do, however, share many features. They are all
based, like the diagram in Fig. 1, on the general framework provided by J.H.
Holland’s original genetic algorithm (GA) [9] or, indirectly, by the genetic
programming paradigm of J.R. Koza, who proposed a system based on
evolution to search for the computer program most fit for solving a particular
problem [10]. In nearly every case, new populations of potential solutions to
problems (here, the problem of music composition) are created, generation after
generation, through three main processes:
1. By making sure that better solutions to the problem will prevail over time,
more copies of currently better solutions are put into the next generation.
2. By introducing new solutions into the population; that is, a low level of
mutation operates on all acts of reproduction, so that some offspring will have
randomly changed characteristics.
3. By employing sexual crossover to combine good components between
solutions; that is, the “genes” of the parents are mixed to form offspring with
aspects of both.
With these three processes taking place, the evolutionary loop can efficiently
explore many points of the solution space in parallel, and good solutions can
often be found quite quickly. In creative processes such as music composition,
however, the goal is rarely to find a single good solution and then stop; an
ongoing process of innovation and refinement is usually more appropriate.
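A generic generate-test-repeat loop of this kind can be sketched as follows (a simplified GA skeleton, not the actual Vox Populi implementation; the fitness function in the usage example is a placeholder):

    import random

    # Sketch: a generic generate-test-repeat loop with selection, crossover and mutation.
    def evolve(population, fitness, generations=50, p_mut=0.01):
        for _ in range(generations):
            scored = sorted(population, key=fitness, reverse=True)
            survivors = scored[: len(scored) // 2]             # better solutions prevail
            offspring = []
            while len(survivors) + len(offspring) < len(population):
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, len(a))              # sexual crossover
                child = a[:cut] + b[cut:]
                child = [bit ^ 1 if random.random() < p_mut else bit
                         for bit in child]                     # low-rate mutation
                offspring.append(child)
            population = survivors + offspring
        return max(population, key=fitness)

    # toy usage: 30 chromosomes of 28 bits, with a placeholder fitness (count of 1-bits)
    pop = [[random.randint(0, 1) for _ in range(28)] for _ in range(30)]
    best = evolve(pop, fitness=sum)
    print(sum(best))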
73"
NICS Reports
Information seen as genotypes and phenotypes
Both biological and simulated evolution involve the basic concepts of genotype
and phenotype, and the processes of selection and reproduction with variations.
The genotype is the genetic code for the creation of an individual. In biological
systems, genotypes are normally composed of DNA. In simulated evolutions
there are many possible representations of genotypes, such as strings of binary
digits, sets of procedural parameters or symbolic expressions. The phenotype is
the individual itself or the form that results from the developing rules and
genotypes. Selection depends on the process by which the fitness of
phenotypes is determined. The likelihood of survival and the number of new
offspring that an individual generates are proportional to its fitness measure.
Fitness is simply a numerical index expressing the ability of an organism to
survive and reproduce. In simulation, it can be evaluated by an explicitly defined
mathematical function or it can be provided by a human observer. Reproduction
is the process by which new genotypes are generated from an existing
genotype. For evolution to progress, there must be variations, or mutations in
new genotypes having some frequency of occurrence. Mutations are usually
probabilistic, as opposed to deterministic.
Note that selection is, in general, nonrandom and operates on phenotypes,
while variation is usually random and operates on the corresponding genotypes.
The repeated cycle of reproduction with variations and selections of the fittest
individuals drives the evolution of a population toward a higher and higher level
of fitness. Sexual combination allows genetic material of more than one parent
to be mixed together in some way to create new genotypes. This permits
features to evolve independently and later to combine into an individual
genotype. Although it is not necessary for evolution to occur, it is a valuable
achievement that may enhance progress in both biological and simulated
evolutions.
If the mechanics of an evolutionary system are well understood and the chain of
causation is properly represented, the process of evolution can be stated in
rather simple terms and can be simulated for engineering and art purposes.
Given the complexity of evolved structures, it may be somewhat surprising that
evolution here appears reduced to rather few rules [11]. In our approach, the
population is made up of four note groups, or chords, as potential survivors of a
selection process. Melodic, harmonic and vocal-range fitnesses are used to
control musical features. Based on the ordering of consonance of musical
intervals, the notion of approximating a sequence of notes to its harmonically
compatible note, or tonal center, is used. The selected notes are sent to the
MIDI port and can be heard as sound events in real time. This sequence
produces a sound resembling a chord cadence or fast counterpoint of note
blocks.
74"
NICS Reports
Individuals of the population are defined as groups of four voices, or notes.
(Henceforth, voices and notes will be used interchangeably.) These voices are
randomly generated in the interval 0– 127, with each value representing a MIDI
event, described by a string of 7 bits. In each iteration, 30 groups are
generated. Figure 2 shows an example of a group— the genotype—internally
represented as a chromosome of 28 bits, or 4 words of 7 bits, one word for
each voice. The phenotype is the corresponding chord.
Two processes are integrated: (1) Reproduction Cycle: an evolving process that
generates chords using genetic operators and selecting individuals; (2) MIDI
Cycle: the interface looking for notes to be played by the computer. When a
chord is selected, the program puts it in a critical area that is continually verified
by the interface. These notes are played until the next group is selected.
The timing of these two processes determines the rhythm of the music being
heard. In any case, a graphic interface allows the user to interfere with the
rhythm by modifying the cycles. Figure 1 depicts the Reproduction Cycle and
the MIDI Cycle.
Fitness evaluation
Traditionally, Western music is based on harmony; hence, a general theory of
music has to engage deeply with formal theories on this matter. The term
“harmony” is inherently ambiguous, since it refers to a lower level where
smoothness and roughness are evaluated and, at the same time, to a higher
aesthetic level where harmony is functional to a given style. However, harmony
is a very subjective concept; the perception of harmony does not seem to have
a natural basis, but appears to be a common response acquired by people in
specific cultural settings. Nevertheless, while there is a difference of opinion on
what constitutes harmony, there is a general agreement on the relative order of
music interval consonance. Numerical theories of consonance have tried to
capture this aspect, but here again, a lot is left to the imagination, as theory
does not clearly define what constitutes the order of simplicity of musical
intervals.
In our case, we have applied, as a fitness function, a numerical theory of
consonance from a physical point of view. Based on a relative ordering of
consonance of musical intervals, a sequence of notes is approximated to its
most harmonically compatible note or tonal center. Tonal centers can be
thought of as an approximation of the melody, describing its flow. This method
uses fuzzy formalism, or fuzzy sets, which are classes of objects with a
continuum of membership grades. Such a set is characterized by a function that
assigns to each object a grade of membership ranging between 0 and 1 [12]. In
Vox Populi, harmony is treated as a function of the commonality, or overlap,
between the harmonic series of notes. This overlap measurement is then
75"
NICS Reports
scaled to be a value between 0 and 1, with 1 denoting complete overlap (i.e. the
two notes are the same) and 0 denoting no overlap at all [13].
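A minimal sketch of this fuzzy-overlap idea (the number of partials, the 1/k membership grades and the coincidence tolerance are illustrative assumptions, not the values used in Vox Populi):

    # Sketch: consonance as the overlap between harmonic series treated as fuzzy sets.
    def harmonic_series(midi_note, n_partials=8):
        # fuzzy set: partial frequencies with decreasing membership grades (assumed 1/k)
        f0 = 440.0 * 2 ** ((midi_note - 69) / 12)
        return {f0 * k: 1.0 / k for k in range(1, n_partials + 1)}

    def consonance(note_a, note_b, tolerance=0.02):
        sa, sb = harmonic_series(note_a), harmonic_series(note_b)
        overlap = 0.0
        for fa, ga in sa.items():
            for fb, gb in sb.items():
                if abs(fa - fb) / fa < tolerance:      # partials considered coincident
                    overlap += min(ga, gb)             # fuzzy intersection
        return overlap / sum(sa.values())              # scaled towards the interval [0, 1]

    print(round(consonance(60, 60), 2))   # identical notes: complete overlap (1.0)
    print(round(consonance(60, 64), 2))   # do and mi: partial overlap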
Fig. 2. Vox Populi MIDI chromosome: An example of a group—the genotype—internally
represented as a chromosome of 28 bits, or 4 words of 7 bits, one word for each voice.
The phenotype is the corresponding chord. (© Artemis Moroni)
The harmonic series of notes 60 and 64 (do and mi, in the center of the piano,
according to the MIDI protocol) are depicted in Fig. 3, while Fig. 4 depicts their
overlap, or consonance measure. According to our approach, approximation to
the tonal center is posed as an optimization problem based on physical factors
relevant to hearing music. This approach is technically detailed in Moroni et al.
76"
NICS Reports
[14]. In the selection process, the group of voices with the highest musical
fitness is selected and played. The musical fitness for each chord is a
conjunction of three partial fitness functions: melody, harmony and vocal range,
each having a numerical value.
Musical Fitness = Melodic Fitness
+ Harmonic Fitness
+ Vocal Range Fitness
Fig. 3. Vox Populi harmonic series of notes 60 (the piano center, do) and 64 (mi). Each
series represents the relative ordering of musical intervals for notes do and mi and is
treated as a fuzzy set. (© Artemis Moroni)
Melodic fitness is evaluated by comparing the notes that compose a chord to a
value Id (identity), which can be modified by the composer in real time using the
melodic control of the interface. This control “forces” the notes of the selected
77"
NICS Reports
chord to be close to (or distant from) the Id value, which acts as a tonal center
and is treated as an attractor. Harmonic fitness is a function of the consonance
among the components of the chords. Vocal range fitness verifies which notes
of the chord are in the range desired by the composer, who may modify it
through the octave control.
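Putting the three terms together, the fitness of a four-voice chord could be sketched as follows (the individual measures are simplified assumptions; harmonic_fitness reuses the consonance() sketch given above):

    # Sketch: the three-part musical fitness of a four-voice chord (MIDI note numbers).
    def melodic_fitness(chord, tonal_center_id):
        # closeness of the voices to the Id value set with the melodic control
        return sum(1.0 / (1 + abs(n - tonal_center_id)) for n in chord) / len(chord)

    def harmonic_fitness(chord):
        # mean pairwise consonance among the voices (uses consonance() above)
        pairs = [(a, b) for i, a in enumerate(chord) for b in chord[i + 1:]]
        return sum(consonance(a, b) for a, b in pairs) / len(pairs)

    def vocal_range_fitness(chord, low=48, high=72):
        # fraction of voices inside the range chosen with the octave control
        return sum(low <= n <= high for n in chord) / len(chord)

    def musical_fitness(chord, tonal_center_id=60):
        return (melodic_fitness(chord, tonal_center_id)
                + harmonic_fitness(chord)
                + vocal_range_fitness(chord))

    print(round(musical_fitness([60, 64, 67, 72]), 2))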
The melodic control and the octave control allow the composer to conduct the
music that is being created, interfering directly in the musical fitness, while other
controls simply modify attributes of the chord that has been selected. Also, the
biological and rhythmic controls allow the user to modify the duration of the
genetic cycle by modifying the duration of the evolution eras. Eras can be thought
of as the number of iterations necessary to generate a new population.
The combined use of the controls gives birth to sound orbits, which can be
perceived through intermittent cycles.
Fitness tuning
Part of the reason why evolution in nature is very slow is that the forces of
selection can be imperfect and at times ineffectual. Non-privileged individual
organisms may still succeed in finding mates, having offspring and passing on
Fig. 4. Vox Populi: Overlap between the harmonic series of notes 60 and 64. Note 60 can
be thought of as one of the notes of the chord and note 64 as the tonal center. The sum
of heights of the components of the overlap is the consonance measure between the two
notes. (© Artemis Moroni)
their genes, while organisms with a new advantageous trait may not manage to
live long enough to find a mate and influence the next generation. Todd and
Werner have made a charming comparison with the Frankenstein tale;
Frankenstein hoped for much more than the creation of a single superior living
being—he intended his creature to beget a whole new race that would grow in
number and goodness, generation after generation. Later he worried that this
78"
NICS Reports
process might not go exactly as he planned, with the children becoming more
monstrous than their parents, a realization that led him to abandon his efforts to
create a female progenitor. But, suppose, like Frankenstein, one wants to enter
the “workshop of filthy creation” [15] and replace the human composer with an
artificial composition system— due to a wish to ease a composer’s workload, an
intellectual interest in understanding the composition process, the desire to
explore unknown musical styles or mere curiosity about the possibilities. Maybe
Vox Populi could have been initially included only in the last group as inspired
by a “mere curiosity about the possibilities” but given Vox Populi’s surprising
results, it can now be included in the first two.
Two main approaches have been tried to express the fitness evaluation, both
presenting interesting effects. The first one, derived from a composer’s musical
experience, provided a faster fitness evaluation. This method allows the use of
a large population, 100–200 chords, producing greater diversification and
resulting in a slower convergence to the best chord sequence. In the second
approach, the consonance criterion is used, and a longer calculation is needed
to evaluate musical fitness. In order to assure quick enough real-time
performance by the system, the population was limited to 30 chords. The
advantage of this approach is that it formalizes mathematically the concept of
consonance. It can be easily described and flexibly programmed and modified.
Since the musical fitness criterion used was stricter in the second example
(using 30 chords instead of 100–200), the resulting sound output was less
diversified; it was possible to hear the musical sequence converging to unison.
This fact highlighted the notion that, in musical composition, not only
consonance but also dissonance is desirable. Figure 5 depicts a Vox Populi
musical output.
Vox Populi differs from other systems found in genetic algorithms or
evolutionary computation in which people have to listen to and judge musical
items; instead, Vox Populi uses the keyboard and mouse as real-time music
controllers, acting as an interactive computer-based musical instrument. It
explores evolutionary computation in the context of algorithmic composition and
provides a graphical interface that allows the composer to change the evolution
of the music by using the mouse. These results reflect current concerns at the
forefront of interactive composition computer music and in the development of
new control interfaces.
Interface controls use nonlinear iterative mappings. They can give rise to
attractors, defined as geometric figures that represent the set of stationary
states of a dynamic system or simply trajectories to which the system is
attracted. A piece of music consists of several sets of musical raw material
manipulated and exposed to the listener, such as pitches, harmonies, rhythms,
timbres, etc. These sets are composed of a finite number of elements, and the
79"
NICS Reports
basic aim of a composer is to organize them in an aesthetic way. Modeling a
piece as a dynamic system implies a view in which the composer draws
trajectories or orbits using the elements of each set [16].
Fig. 5. Score of MIDI raw material produced by Vox Populi. This material was produced
by Vox Populi in an interactive session by Jônatas Manzolli, composer. In the latest Vox
Populi version, the user is able to record a piece that is composed during performance.
The interactive pad control supplies a graphical area in which 2D curves can be
drawn. These curves, a blue one and a red one, are linked to the controls of the
interface. The red curve links to the melodic and octave range controls; and the
blue curve links to the biological and rhythmic controls. When the interactive
pad is active, the four other linked controls are disabled. Each curve describes
a relation between the linked variables. They are traversed in the order in which
they were created; their horizontal and vertical components are used for fitness
evaluation and to modify the duration of the genetic cycles, interfering directly in
the rhythm of the composition. The pad control allows the composer to conduct
the music through drawings, suggesting metaphorical “conductor gestures”
used when conducting an orchestra. Using different drawings, the composer
can experience the generated music and conduct it, trying different trajectories
or sound orbits. The trajectories then affect the reproduction cycle and musical
fitness evaluation.
Interface and parameter control
The resulting music moves from very pointillistic sounds to sustained chords,
depending upon the duration of the genetic cycle and the number of individuals
of the original population. The interface is designed to be flexible enough for the
user to modify the music being generated. Below is a short description of the
controls available to the user interacting with Vox Populi. The melodic,
80"
NICS Reports
biological, rhythmic and octave controls allow the composer to modify the
fitness function in real time and are associated with attractors. Vox Populi’s
interface is depicted in Fig. 6 and in Color Plate A No. 2.
Fig. 6. Vox Populi interface. (© Artemis Moroni)
Melodic Control
The mel scroll bar allows one to modify the value Id, which is the tonal center in
the evaluation of melodic fitness. Given an ordered sequence of notes, it seems
intuitively appealing to call the note that is most consonant with all the other
notes the coloring, or tonal, center. Hence, the extraction of the tonal center of a
sequence of notes would involve finding an optimally harmonically compatible
note. As mentioned before, in Vox Populi, the consonance is measured
according to the Id value. This value is obtained from the interface control and
can be changed by the user.
Biological Control
The bio scroll bar allows interference in the duration of the genetic cycle,
modifying the time between genetic iterations. Since the music is being
generated in real time, this artifice is necessary for the timing of the different
processes that are running. This value determines the slice of time necessary to
apply the genetic operators, such as crossover and mutation, and may also be
interpreted as the reproduction time for each generation.
Rhythmic Control
81"
NICS Reports
The rhy scroll bar changes the time between evaluations of musical fitness. It
determines the “time to produce a new generation” or the slice of time
necessary to evaluate the musical fitness of the population. It interferes directly
in the rhythm of the music; any change makes the rhythm faster or slower.
Octave Control
The oct scroll bar allows enlarging or diminishing the interval of voices
considered in the vocal range criterion. The octave fitness forces the notes to
be in range H, assuming that H is the range of the human voice and associated
with the central keys on the piano; but since several orchestras of instruments
are used, this range is too limited for some instruments. We originally intended
to restrict the generated voices to specific ranges in order to make those voices
resemble the human voice. Nevertheless, a user can now enlarge these ranges
by using the octave control.
Orchestra Control
Six MIDI orchestras are used to play the sounds: (1) keyboards; (2) strings and
brasses; (3) keyboards, strings and percussion; (4) percussion; (5) sound
effects and (6) random orchestral parts, by taking an instrument from the
general MIDI list. Using the order above, these orchestras are sequentially
changed into time segments controlled by the seg scroll bar.
Interactive Pad Control
The “Pad On” button enables and disables the pad change on the controls
defined above. They may be grouped into two pairs, which may be interpreted
as variables of a 2D phase space. This allows a user to draw and orient the
curve to determine the evolution of the music.
Fitness Displays
Three other displays allow the user to follow the evolution of fitness. The upper
display, at the right side of Fig. 6, shows the notes and the fitness of the chord
that is being played.
In the middle display, a bar graph shows the four voices (bass, tenor, contralto,
soprano) and their values. It is equivalent to the membership function values
related to the range of the voices. The bottom display shows the melodic,
harmonic and octave fitness bars.
Conclusion
Despite the fact that Vox Populi works at the level of sound events controlled by
MIDI protocols, or notes, in a macrostructural context, we learned two lessons.
First, an evolutionary computational approach was successfully applied to
generate complex sound structures with a perceptual and efficient control in
real time. Second, applications of evolutionary computation may be foreseen to
82"
NICS Reports
prospect sound synthesis. Complex behavior systems have been used for
sound synthesis, like Chaosynth, which uses cellular automata to control
structures [17]. In Chaosynth, the generation occurs via granular synthesis. In
another approach, Fracwave [18] uses the dynamics generated by complex
systems to synthesize sounds.
We may say that varying the fitness controls in Vox Populi promotes a “sound
catastrophe,” in which the previous winner may no longer be the best.
Conditions for survival have changed, as they do in nature.
The question we pose is how does an idea, or concept, survive? Vox Populi is
simple, efficient and has been used in different ways, which may be considered
variants: as an autonomous or demonstrative system generating music; as a
sound laboratory, where people can try and experience the sound produced; as
a studio, manipulating and generating samples that have been used in
compositions and in sound landscapes. Another use currently being considered
is to couple the system with sensors, allowing the user to describe orbits in
space that would be treated like the 2D curves supplied by the interactive pad.
Will Vox Populi survive?
Vox Populi means “voice of the people.” Since the individuals in the population
are defined as groups of four voices, we can think of them as “choirs,” fighting
to survive and to be present in the next generation, while the environment and
survival conditions are changing dynamically.
One of the first known proposals to formalize composition was made by the
Italian monk Guido d’Arezzo in 1026, who resorted to using a number of simple
rules to map liturgical texts in Gregorian chants [19] due to the overwhelming
number of orders he received for his compositions. The text below is attributed
to d’Arezzo. His compositional approach has survived for several centuries, and
even today, we still seek strategies for constructing the unknown melody.
As I cannot come to you at present, I am in the meantime addressing you using
a most excellent method of finding an unknown melody, recently given to us by
God and I found it most useful in practice. . . .
To find an unknown melody, most blessed brother, the first and common
procedure is this. You sound on the monochord the letters belonging to each
neume, and by listening you will be able to learn the melody as if you were
hearing it sung by a teacher. But this procedure is childish, good indeed for
beginners, but very bad for pupils who have made some progress. For I have
seen many keen witted philosophers who had sought out not merely Italian, but
French, German, and even Greek teachers for the study of this art, but who,
because they relied on this procedure alone, could never become, I should not
say, skilled musicians, but even choristers, nor could they duplicate the
performance of our choir boys [20].
83"
NICS Reports
References
1. D. Hofstadter, Metamagical Themas (New York: Basic Books, 1985) p. 694.
2. R. Dawkins, The Blind Watchmaker (London: Penguin Books, 1991) p. 313.
3. M. Haggerty, “Evolution by Esthetics, an Interview with W. Latham and S.
Todd,” IEEE Computer Graphics 11 (1991) pp. 5–9.
4. S. Todd and W. Latham, Evolutionary Art and Computers (New York:
Academic Press, 1992).
5. K. Sims, “Interactive Evolution of Equations for Procedural Models,” The
Visual Computer 9, No. 9, pp. 466–476 (1993).
6. J.A. Biles, “GenJam: A Genetic Algorithm for Generating Jazz Solos,”
Proceedings of Computer Music Conference (ICMC ’94) (1994) pp. 131–137.
7. D. Horovitz, “Generating Rhythms with Genetic Algorithms,” Proceedings of
Computer Music Conference (ICMC ’94) (1994) pp. 142–143.
8. P. Todd and G. Werner, “Frankensteinian Methods for Evolutionary Music
Composition,” in N. Griffith and P. M. Todd, eds., Musical Networks—Parallel
Distributed Perception and Performance (Cambridge, MA: MIT Press, 1999) p.
313.
9. J.H. Holland, Adaptation in Natural and Artificial Systems (Cambridge, MA:
MIT Press, Bradford Books, 1995) p. 122.
10. J.R. Koza, Genetic Programming (Cambridge, MA: MIT Press, Bradford
Books, 1998) p. 29.
11. W. Atmar, “Notes on the Simulation of Evolution,” IEEE Transactions on
Neural Networks 5, No. 1, pp. 130–147 (1994).
12. L.A. Zadeh, “Fuzzy Sets,” Information and Control 8 (1965) pp. 338–353.
13. G. Vidyamurthy and J. Chakrapani, “Cognition of Tonal Centers: A Fuzzy
Approach,” Computer Music Journal 16, No. 2, pp. 45–50 (1992).
14. A. Moroni, J. Manzolli, F. Von Zuben and R. Gudwin, “Evolutionary
Computation Applied to Algorithmic Composition,” Proceedings of the 1999
Congress on Evolutionary Computation (CEC99) 2 (1999) pp. 807–811.
15. M. Shelley, Frankenstein or The Modern Prometheus (USA: Penguin,
1993).
16. J. Manzolli, “Harmonic Strange Attractors,” CEM BULLETIN 2, No. 2, pp. 4–7
(1991).
17. E.R. Miranda, “Granular Synthesis of Sounds by Means of a Cellular
Automaton,” Leonardo 28, No. 4, pp. 297–300 (1995).
84"
NICS Reports
18. F. Damiani, J. Manzolli and P.J. Tatsch, “A Non-Linear Algorithm for the
Design and Production of Digitally Synthesized Sounds,” Technical Digest of
the International Conference on Microelectronics and Packaging (ICMP99)
(1999) pp. 196–199.
19. O. Strunk, Source Readings in Music History (New York: Vail-Ballou Press,
1950) p. 123.
20. Strunk [19].
Manuscript received 18 January 1999.
Artemis Moroni is a technologist at the Automation Institute of the Technological
Center for Informatics in Campinas, São Paulo, Brazil. The main topics of her
research are multimedia devices applied to automation environments,
evolutionary computation and technology applied to art and music.
Jônatas Manzolli is a composer and head of the Interdisciplinary Nucleus of
Sound Communication at the State University of Campinas, São Paulo, Brazil.
He teaches in the department of music, and the main topics of his research are
algorithmic composition, gesture interfaces and multimedia devices for sound
environments.
F.J. Von Zuben is a member of the department of computer engineering and
industrial automation at the State University of Campinas, São Paulo, Brazil.
The main topics of his research are artificial neural networks, evolutionary
computation, nonlinear control systems, nonlinear optimization and multivariate
data analysis.
Ricardo Gudwin is a faculty member of the electrical and computer engineering
department at the State University of Campinas, São Paulo, Brazil, where he
develops research into intelligence and intelligent systems, intelligent agents,
semiotics and computational semiotics. His topics of interest also include fuzzy
systems, neural networks, evolving systems and artificial life.
85"
NICS Reports
5.
Abduction and Meaning in Evolutionary Soundscapes
Original reference for this work: Shellard, M., L. Oliveira, et al. (2010). Abduction and Meaning in Evolutionary Soundscapes. Model-Based Reasoning in Science and Technology. L. Magnani, W. Carnielli and C. Pizzi, Springer Berlin / Heidelberg. 314: 407-427.
Mariana Shellard
Instituto de Artes (IA) – UNICAMP
[email protected]
Luis Felipe Oliveira
Departamento de Comunicação e Artes. Univ.
Federal de Mato Grosso do Sul
[email protected]
Jose E. Fornari
Núcleo Interdisciplinar de Comunicação Sonora
(NICS) – UNICAMP
[email protected]
Instituto de Artes (IA) - UNICAMP
Jonatas Manzolli
Núcleo Interdisciplinar de Comunicação Sonora
(NICS) - UNICAMP
[email protected]
Summary. The creation of an artwork named RePartitura is discussed here
under the principles of Evolutionary Computation (EC) and the triadic model of
thought: Abduction, Induction and Deduction, as conceived by Charles S.
Peirce. RePartitura uses a custom-designed algorithm to map image features
from a collection of drawings and an Evolutionary Sound Synthesis (ESSynth)
computational model that dynamically creates sound objects. The output of this
process is an immersive computer generated sonic landscape, i.e. a
synthesized Soundscape. The computer generative paradigm used here comes
from the EC methodology where the drawings are interpreted as a population of
individuals as they all have in common the characteristic of being similar but
never identical. The set of specific features of each drawing is called its
genotype. Interaction between different genotypes and sound features
produces a population of evolving sounds. The evolutionary behavior of this
sonic process entails the self-organization of a Soundscape, made of a
population of complex, never-repeating sound objects, in dynamic
transformation, but always maintaining an overall perceptual self-similarity in
order to keep a cognitive identity that can be recognized by any listener. In this
article we present this generative and evolutionary system and describe the
topics that run from its conceptual creation to its computational
implementation. We underline the concept of self-organization in the generation
of soundscapes and its relationship with computer evolutionary creation,
abductive reasoning and musical meaning for the computational modeling of
synthesized soundscapes.
1 Introduction
One of the foremost philosophical problems is to rationally explain how we
interact with the external world (outside of the mind), in order to understand
reality. We take the assumption that the human mind understands, recognizes
and relates to reality through a constant and dynamic process of mental
modeling. This process is here seen as divided into three states: 1) Perception,
where the mind receives sensory information from outside, through its bodily
senses. This information comes from distinct mediums, such as mechanical
(e.g. hearing and touch), chemical (e.g. olfaction and taste) and
electromagnetic (e.g. vision). According to evolutionary premises, these stimuli
are non-linearly translated into electrochemical information for the nervous
system. 2) Cognition, the state that creates, stores and compares models
against the gathered information or against previously reasoned models. This is
the information processing stage. 3) Affection, where emotions are aroused as
an evolutionary strategy to motivate the individual to act, to be placed in motion,
in order to ratify, refute or redefine the cognitive modeling of a perceived
phenomenon. Here we introduce RePartitura: a case study in which we
correlate these three stages with a pragmatic approach that combines logic
principles and synthetic simulation of creativity using computer models.
RePartitura is here analyzed based on the assumption of mental model
reconstruction and re-building. This cycle of model recreation has so far proved
to be an eternal process in all fields of human culture, in the Arts as well as in
Science. As described by G. Chaitin12, the search for a definite certainty
throughout the history of mathematics has always led to models that are
incomplete, uncomputable and random (Chaitin, 1990). Inspired by Umberto
Eco’s book “The Search for the Perfect Language”, Chaitin describes the
herculean efforts of great minds of science to find completeness in
mathematics, such as Georg Cantor’s unresting (and unfinished) pursuit of
defining infinity and Kurt Gödel’s proof that “any mathematical model is
incomplete”. These were followed by Alan Turing’s realization of
uncomputability in computational models and, lastly, by Chaitin’s own
Algorithmic Information Theory, which leads to randomness. In conclusion, “any
formal axiomatic theory is fated to be incomplete”. On the other hand, he also
recognizes that, “viewed from the perspective of Middle Ages,
programming languages give us the God-like power to breathe life into (some)
inanimate matter”. So, computer modeling can be used to create artworks that
resemble the evolution of life in a never-ending march for completeness, in an
unending process of eternal self-recreation.
12. Chaitin, G. “The search for the perfect language.” http://www.cs.umaine.edu/~chaitin/hu.html
RePartitura is a multimodal installation that uses the ESSynth (Fornari et al.,
2001) method for the creation of a synthetic soundscape13 whose forming sound
objects are initially built from hand-made drawings used to retrieve the artistic
gesture. ESSynth is a sound synthesis method that uses the Evolutionary
Computation (EC) methodology, initially inspired by the Darwinian theory of
evolution. ESSynth was originally constituted by a Population of digital audio
segments, which were defined as the population’s Individuals. This population
evolved in time, in generation steps, through the interaction of two processes: 1)
Reproduction, which creates new individuals based on the ones from the previous
generation; and 2) Selection, which eliminates the individuals poorly fit to the
environmental conditions and selects the best-fit individual, which creates (through
the process of Reproduction) the next generation of the population (Bäck, 2000).
In this way, ESSynth is an adaptive model of non-deterministic sound synthesis
that presents complex sonic results while keeping these sounds bounded by a
variant similarity, giving the overall generated sound a quality somewhat similar
to the perceptual quality of a soundscape.
13. A soundscape refers to both the natural and the human acoustic environment, consisting of a complex and immersive landscape of sounds that is self-similar but always new.
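A minimal, generic sketch of such a generational loop is given below in Python with NumPy. It assumes a fitness based on the Euclidean distance to a reference segment, consistent with the selection criterion mentioned in Section 2.1, but it is not the actual ESSynth implementation (described in Section 2.5 as written in Pure Data); population size, mutation rate and segment length are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)

def fitness(individual, target):
    # Selection criterion: Euclidean distance to a reference segment; smaller
    # is better (an assumption consistent with the description of ESSynth).
    return np.linalg.norm(individual - target)

def reproduce(parent, population, mutation=0.05):
    # Reproduction: one-point crossover of the best-fit parent with a random
    # individual, plus a small random mutation.
    mate = population[rng.integers(len(population))]
    cut = rng.integers(1, parent.size)
    child = np.concatenate([parent[:cut], mate[cut:]])
    return child + rng.normal(0.0, mutation, size=child.shape)

def evolve(population, target, generations=50):
    for _ in range(generations):
        scores = [fitness(ind, target) for ind in population]
        best = population[int(np.argmin(scores))]
        # Poorly-fit individuals are discarded; the best-fit one reproduces.
        population = [reproduce(best, population) for _ in population]
    return population

# Toy run: individuals are short "audio segments" of 64 samples.
target = np.sin(np.linspace(0, 2 * np.pi, 64))
population = [rng.normal(0.0, 1.0, 64) for _ in range(8)]
population = evolve(population, target)
print(round(min(fitness(ind, target) for ind in population), 3))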
In section two we introduce the conceptual artistic perspective of RePartitura.
We describe the process of creating the drawing collection and mapping its
graphic features, inserted by the hand-made gesture that created the drawings,
into genotypes used by the ESSynth that creates the soundscapes. We also
describe the abduction process from which the sonic meaning of a
soundscape emerges. In section three, we discuss the possibility of
self-organization in the computer model’s sonic output, which is here claimed to
describe an immersive, self-similar perceptual environment: a soundscape. In
section four we discuss the capacity of this evolutionary artistic system to
emulate a creative process of abduction by expressing an algorithmic
(computational) behavior, here described as artificial abduction. In section five,
we discuss the aesthetic meaning of the dynamic creation of soundscapes,
comparing it with musical meaning in terms of its cognitive process and
emotional arousal (through a “prosody” of expectations). Finally, we end this
article with a conclusion, reassessing the ideas and concepts from the previous
sections and offering further perspectives on the design of artificial creative systems.
2 Conceptual perspective
In this section we elucidate the interaction between concepts that were in the
genesis of RePartitura. Firstly, we relate the concept of abductive reasoning, as
presented by Charles S. Peirce, to the computational adaptive methodologies,
such as EC. Secondly, we create RePartitura in line with the concept of
Generative Art and the idea that iterative processes can be related to the
Peircean concept of habits.
2.1 Abduction and Computational Adaptive Methods
The pragmatism of Peirce points to the conceptualization of three
categories of logical reasoning: 1) Deduction, 2) Induction and 3) Abduction.
Abduction is the process of hypothesis building, by the generation of an initial
model, as an attempt at understanding or explaining a perceived phenomenon.
Induction tests this model against other factual data and makes the necessary
adjustments. Deduction applies the established model of the observed
phenomenon. This model will be used for deductive reasoning until the advent
of further information that may jeopardize its trustworthiness, or require
adjusting it to a change in reality (which always happens), at which point the
whole process of Abduction, Induction and Deduction creates a new model of
reasoning.
In this article our goal is to present a computer methodology related to the
Peircean pragmatic reasoning. In computational terms, it is usual to refer to an
observed phenomenon as a problem. In the concept expressed by this article,
we consider the Peircean triadic logical process as related to the following
methodological taxonomy: a) Deduction corresponds to Deterministic Methods,
as they can present predictable solutions to a problem; b) Induction is related to
Statistical Methods, since they present not a single but a range of possible
solutions to the same problem; c) Abduction is then related to Adaptive
Methods, which can redefine and recreate themselves based on the further
understanding of a problem, or on its dynamic change.
Among computational adaptive methods, Evolutionary Computation (EC) is the
one inspired by the biological strategy of adapting populations of individuals,
as initially described by Charles Darwin. EC is normally used to find the best
possible solution to problems when there is not enough information to solve
them through formal (deterministic) methods. An EC algorithm usually seeks out
the best solution to a complex problem within an evolving landscape of possible
solutions. In our research group at NICS, we have studied adaptive
methodologies in line with the creation of artworks, such as the systems: 1)
VoxPopuli, which generates complex and harmonic profiles using genetic
algorithms (Moroni et al., 2000); 2) RoBoser, created in collaboration with the
SPECS group from UPF, Barcelona, which uses Distributed Adaptive Control
(DAC) to develop a correlation between robotic adaptive behavior and
algorithmic composition (Verschure & Manzolli, 2005); and 3) Evolutionary
Sound Synthesis (ESSynth) (Fornari et al., 2001), a method to generate sound
segments with dynamic spectral changes, using genetic algorithms in the
reproduction process and the Euclidean distance between individuals as the
fitness function for the selection process. ESSynth showed the ability to
generate a queue of waveforms that were perceptually similar but never
identical, which is a fundamental condition of a soundscape. This system was
later developed further to also manipulate the spatial sound location of
individuals in order to create the dynamically spreading acoustic landscape so
typical of a soundscape (Fornari et al., 2008).
In all of these studies, we considered that adaptive methods, such as EC, could
be used in artistic endeavours. Particularly, in this paper we describe the
RePartitura research, which relates a multimodal installation to the ESSynth
method. Furthermore, the discussion presented here is also related to the work
of Oliveira et al. (2008), where the process of musical meaning and logical
inference is discussed from the perspective of Peircean pragmatism. This idea
is taken up in section 5, “Soundscape Meaning”, where we focus on how
listeners deduce general patterns of musical structure that are inductively
applied to new listening situations, such as computer-generated soundscapes.
2.2 Habits, Drawings and Evolution
The collection of drawings that preceded RePartitura (see example in Figure
1) was based on the concept of defining a generative process as an artwork.
Particularly, the process analyzed here was defined as a daily habit of repetitive
actions, which lasted ten months and generated almost three hundred
drawings. This action was done by the artist’s right arm in repetitive
movements, upward and semicircular. The movement pattern, over time,
evolved from thick and short curves to long and narrow ones. This
evolutionary characteristic of a gestural habit reflected an adaptation of the
arm’s movement to the area within the paper sheet.
Our first assumption here was to consider this long process of adaptation
producing a visual invariance as a creation of a visual habit.
Fig. 1. Sequence of Original Drawings that preceded RePartitura.
Initially, different kinds of paper sheets were tested, such as: newsprint, rice, and a type of coffee filter
paper. The filter paper was better suited for the characteristics of the
movement, it was resistant, absorbent and with a nice tone of slightly yellowish
white. The Indian ink was appropriate to the dynamics of gesture and, as black
color is neutral, it did not cause visual noise. The paper size was established
when the movement was stable, after a period of training. Japanese brushes
and a bamboo pen were tested. The latter produced a better result, by
allowing a greater number of movement repetitions without loss of sharpness.
Once that was defined, the material (filter paper, black ink and bamboo pen)
remained the same throughout the entire process. The standardization of the
material restrained the action and helped to create the habit of the arm’s movement.
Fig. 2. Sequence of initial drawings created during the experimentation period.
As the gesture became a habit, the drawings
stretched and the repetition was concentrated in a reduced area, showing a
narrow and long curve (Figure 1) compared to the initial ones (Figure 2). During
the process, new experiments occurred, resulting in new patterns, such as
pouring ink on the paper to avoid interrupting the gesture due to the need of
reloading the pen with ink. But, in doing so, the paper was softened by the ink,
tearing easily, and this new method was discarded.
91"
NICS Reports
The gradual and progressive adaptation of the gesture and stabilization of
the drawing is considered here as a way of generating a habit, which can be
associated, according to Peirce, with the removal of stimuli (Peirce, 1998, pg.
261). At the same time, each drawing was influenced by the environment
(physical and emotional) which led to the disruption of habit. Considering
Peirce’s affirmation that the breaking up of habit and renewed fortuitous
spontaneity will, according to the law of mind, be accompanied by an
intensification of feeling (Peirce, 1998, pg. 262), the emotional and physical
conditions involved in the moment of the action interfered with the individual
gestures and resulted in accidental variations (e.g. outflow of ink or paper
ripping), causing changes and triggering new possible repetitions.
The collection of drawings shown in Figure 1 was presented as an installation
named Mo(vi)mento. After that, an analysis of visual features and perceived
graphical invariance led us to create a reassignment of this process in the sonic
domain. This was the genesis of RePartitura. The first idea was to represent
similar behaviors in different mediums. After identifying invariant patterns in all
drawings, they were parameterized and used in the creation of sound objects.
ESSynth was chosen because of its similarity with the artistic process that
created the collection of drawings, described above, which was also
characterized by an evolutionary process.
2.3 Repetition, Fragments and Accumulation mapped into Sound Features
We developed an analytical approach in order to identify visual invariance in the
original drawings and to represent them in the sound domain. Our idea was to
describe the habits embedded in the drawings, in parametrical terms, and to
further use them to control the computer model of an evolutionary sound
generation process. We found three categories of visual similarity in each
drawing of the collection. They were named: 1) Repetitions, thin quasi-parallel
lines that compose the drawing’s main body; 2) Fragments, spots of ink
smeared outside the drawing’s main body; and 3) Accumulation, the largest
concentration of ink at the bottom of the drawing (where the movement started).
These three aspects are shown in Figure 3.
The identity of each drawing was related to the characteristics of these three
categories. An algorithm was developed to automatically map these categories
from the drawings’ digital images and to attribute specific parametric values to
them. These categories were related to the evolution of the gesture and to the
conditions of each drawing moment. Their evolution was characterized by the
habit of the movement that created the drawings. The values of the parameters
of the drawings created within the same day tended to be similar. However, at
times when emotional inference and external intervention were higher, the
drawings underwent a break in the gesture habit, which could be detected by
changes in the parametric values of the three categories.
Fig. 3. The three categories of graphic objects found in all drawings.
From this visual perspective, we developed a translation into sonic features,
described in the next stage.
Initially, we established three sound durations: long-term, middle and very short.
The first was associated with the Accumulation parameter and was represented
by low-frequency noisy sounds. The Repetition parameter was associated with
cycles of sinusoidal waves. Fragments were related to sharp sounds varying
from noisy to sinusoidal ones. This mapping is presented in Table 1.
Table 1. Mapping of formal aspects of the drawings into their sonic equivalents.
Invariance | Drawing Aspects | Sonic Aspects
Accumulation | Concentration of ink in the lower area of the drawing, characterized by ink stains. | Constant, long-term duration and low frequency noisy sounds.
Repetition | Number of repetition curve. | Cycles of sinusoidal waves with average duration.
Fragments | Drips of ink. | Very short sounds, varying from noisy to sinusoidal waveforms.
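As a hedged illustration of the mapping in Table 1, the Python/NumPy sketch below renders each category as a sound buffer; the durations, frequencies and the crude moving-average filtering are assumptions made for the sake of the example and do not correspond to the actual ESSynth synthesis described in Section 2.5.

import numpy as np

SR = 44100  # sample rate in Hz

def accumulation_sound(duration=20.0, cutoff_hz=200):
    # Long-term, low-frequency noisy sound: white noise smoothed by a simple
    # moving-average low-pass filter (illustrative only).
    noise = np.random.randn(int(SR * duration))
    k = int(SR / cutoff_hz)          # length of the averaging window
    return np.convolve(noise, np.ones(k) / k, mode="same")

def repetition_sound(freq=440.0, duration=2.0):
    # Middle-duration cycles of a sinusoidal wave.
    t = np.linspace(0, duration, int(SR * duration), endpoint=False)
    return np.sin(2 * np.pi * freq * t)

def fragment_sound(duration=0.1, noisiness=0.5, freq=1500.0):
    # Very short sound, morphing between noisy and sinusoidal.
    t = np.linspace(0, duration, int(SR * duration), endpoint=False)
    return noisiness * np.random.randn(t.size) + (1 - noisiness) * np.sin(2 * np.pi * freq * t)

# Toy usage: print the duration, in seconds, of each generated buffer.
print(len(accumulation_sound(1.0)) / SR, len(repetition_sound()) / SR, len(fragment_sound()) / SR)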
The duration of each element of the mapping was also related to the idea that
Perception, Cognition and Affection can be expressed in different time scales of
the sonic ambient. In this domain, the perceptive level can be related to the
sensorial activation of auditory aspects, such as intensity, frequency, and phase
of sounds, which is studied by psychoacoustics. Cognition is related to the
sonic characteristics that can be learned and recognized by the listener. Its time
scale was initially studied by the psychologist William James, who developed
this concept (James, 1890), which refers by “specious present” to the seemingly
present time of awareness for a sonic or musical event. It can be argued that
the “specious present” is related to short-term memory, which can vary from
individual to individual and according to the mode or range in which the musical
information is perceived as a whole, such as a language sentence, a sound
signal or a musical phrase (Poidevin, 2000). Some experiments have shown
that, in music, their identification is approximately of the order of one to three
seconds (Leman, 2000). The emotional aspects are those that evoke emotion in
the listener. Affective characteristics are associated with a longer period of time
(up to thirty seconds) and may be processed by long-term memory, from which
it is possible to recognize the genre of a piece of music or of a soundscape. The
recognition of the whole sonic environment and its association with listeners’
expectations is further explored in this article, when we discuss the research of
Huron (2006) and Meyer (1956).
2.4 Drawings, Adaptation and Abduction
In RePartitura, the gestures that engendered the drawings were mapped to
sonic objects and became individuals within an evolutionary population that
composed the soundscape. This suggests an analogy with the evolution of
habits of gestures throughout time. The sound objects are like a mirror for the
striking differences expressed by the visual invariances of the drawing
categories. The application of EC methodology can be seen as a way of
representing the drawing habits in the sonic domain and the trajectories of
these individuals (sound objects) are correlated to the evolution of the initial
drawing gestures. The unique aspects of each drawing, influenced by several
conditions, such as the artist’s variations of affection and mood, and by the
environmental conditions, such as external interruptions of any sort,
characterize the hidden organizing force that makes possible the adaptive
evolution of habits in this system, which is a paramount characteristic of
abduction.
As postulated by Peirce: “... diversification is the vestige of chance-spontaneity;
and wherever diversity is increasing, there chance must be operative. On the
other hand, wherever uniformity is increasing, habit must be operative. But
wherever actions take place under an established uniformity, there so much
feeling as there may be takes the mode of a sense of reaction” (Hoopes, 1991).
The difference between drawing gestures, which generated the seed of chance
for the change of habits in the sound system, is a representation of the
spontaneity embedded in the process of making each drawing unique, yet
similar. In our work we infer a correlation of this idea to the notion of
Abduction, when Peirce defines it as a “method of forming a general prediction
without any positive assurance that it will succeed either in the special case or
usually, its justification being that it is the only possible hope of regulating our
94"
NICS Reports
future conduct rationally, and that Induction from past experience gives us
strong encouragement to hope that it will be successful in the future” (Weiss,
1966).
In another paragraph, Peirce correlates habits to the listening of a piece of
music: “ . . . whole function of thought is to produce habits of action; and that
whatever there is connected with a thought, but irrelevant to its purpose, is an
accretion to it, but no part of it. If there be a unity among our sensations which
has no reference to how we shall act on a given occasion, as when we listen to
a piece of music, why we do not call that thinking. To develop its meaning, we
have, therefore, simply to determine what habits it produces, for what a thing
means is simply what habits it involves. Now, the identity of a habit depends on
how it might lead us to act, not merely under such circumstances as are likely to
arise, but under such as might possibly occur, no matter how improbable they
may be. What the habit is depends on when and how it causes us to act. As for
the when, every stimulus to action is derived from perception; as for the how,
every purpose of action is to produce some sensible result. Thus, we come
down to what is tangible and conceivably practical, as the root of every real
distinction of thought, no matter how subtle it may be; and there is no distinction
of meaning so fine as to consist in anything but a possible difference of
practice. (CP 5.400)”.
Thus, meaning is pragmatically connected to habit, and habit is a necessary
condition for the occurrence of action. Meaning is at the heart of actions of
inquiry and of predicting consequences of future actions. For each inquiry there
is an action that occurs in a very specific way. At the core of such a process,
there is a very special category of reasoning (or action): Abduction.
Abductive reasoning can be considered as a valuable analytical tool for the
expansion of knowledge, helping with the understanding of the logical process
of formulating new hypotheses. In regular and coherent situations, the mind
operates deductively and inductively upon stable habits. When an anomalous
situation occurs, abduction comes into play, helping with the reconstruction of
articulated models (the generation of explanatory hypotheses) so that the mind
can be free of doubts. We elucidate this point of view by presenting here the
artwork RePartitura, a computer model that uses a pragmatic approach
paradigm to describe the creative process in the sound domain. Here we used
processual gestures and adaptive computation in order to digitally generate
soundscapes. Our focus in this article is to examine the theoretical implications
of that methodology towards a synthetic approach for the logic of creativity in
the sound domain involving interactive installations. Logic of discovery is a
theory that attempts to establish a logical system for the process of creativity.
Peirce argued that, in order for creativity to manifest, new habits must first
emerge as signs in the mental domain, taking into account that any semiotic
system is primarily a logical system.
2.5 Computer Modeling
The computer design and implementation of RePartitura is further discussed in
(Fornari et al., 2009a, 2009b). In the next paragraphs we present a brief
overview of it. The collection of drawings was mapped by an algorithm
written in Matlab, where the features, classified into three categories, were
processed at different sonic time scales. Accumulation was mapped into a long
time scale, representing affective aspects. Repetitions went into a middle time
scale, related to the specious present, as defined by William James, and thus
representing the cognitive aspects of sounds. Fragments were mapped into
short time scales, corresponding to the perceptual aspects. The first feature
retrieved was a simple roundness metric, m, describing the roundness of each
object: for m = 1, the object is a circle; for m = 0, the object is a line. The second
feature retrieved was the object Area, in pixels, where the object with the
biggest value of Area is the Accumulation. The third feature was the object
distance to the image origin, given by the two numbers of its coordinates (x, y)
in the image plane. We set apart Fragments and Repetitions using the value of
m: the roundest objects (m ≥ 0.5) were classified as Fragments; the stretched
objects (m < 0.5) were classified as Repetitions. Each of these object features
was mapped into the Sound Object genotype.
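A sketch of this feature extraction is given below in Python, assuming scikit-image is available for labeling the binary image of a drawing. Since the original roundness equation is not reproduced in this transcription, the sketch uses the standard circularity measure 4*pi*area/perimeter^2, which matches the stated behavior (1 for a circle, tending to 0 for a line) but is only an assumption about the authors' metric; the original code was written in Matlab.

import numpy as np
from skimage import measure  # assumed dependency for region properties

def drawing_genotypes(binary_image, threshold=0.5):
    # Extract roundness m, area and position for each ink object in the image.
    # m = 4*pi*area/perimeter**2 is an assumed stand-in for the original metric.
    genotypes = []
    for region in measure.regionprops(measure.label(binary_image)):
        area = region.area
        perimeter = max(region.perimeter, 1.0)
        m = min(4 * np.pi * area / perimeter ** 2, 1.0)
        y, x = region.centroid                      # distance to the image origin
        category = "Fragment" if m >= threshold else "Repetition"
        genotypes.append({"m": m, "area": area, "position": (x, y), "category": category})
    if genotypes:
        # The object with the largest area is the Accumulation.
        max(genotypes, key=lambda g: g["area"])["category"] = "Accumulation"
    return genotypes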
The genotypes were transferred to an implementation of ESSynth written in PD
(PureData) language. The individuals (sound objects) were also designed in
PD, as PD patches (in PD jargon). Our model of individual is created by the
main system, as a meta-programming strategy where “code writes code”, to a
certain extent. The individuals would be “born”, live within the population as
sound objects and, once their lifetime was over, they would die, never to be
repeated again. The initial individuals received their genotypes from the drawing
mapping. After that, through the reproduction of individuals, new genotypes would
be created and eliminated as the individuals died. Each genotype is described by
the acoustic descriptors of a sound object. In this work, the sound object
features used are divided into two categories: deterministic (melodic or tonal)
and stochastic (percussive or noisy). For each category, there were intensity,
frequency and distortion, which would bridge these two sonic worlds (deterministic
to stochastic) as a metaphor for the reasoning processes of, respectively,
deduction and induction. In turn, abduction would be represented by the
evolutionary process per se: the soundscape. This is given by the self-
organization of the population of sound objects, whose overall sound output is
the output of the system.
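The genotype described above can be summarized, purely as an illustrative Python data structure and not as the actual Pure Data patches, by one deterministic and one stochastic component, each carrying intensity, frequency and distortion, plus a lifetime for the individual; the field names and values below are assumptions.

from dataclasses import dataclass

@dataclass
class SoundCategory:
    intensity: float   # amplitude of this component
    frequency: float   # in Hz
    distortion: float  # 0.0 = purely tonal/deterministic, 1.0 = purely noisy/stochastic

@dataclass
class Genotype:
    # One deterministic (melodic/tonal) and one stochastic (percussive/noisy)
    # component, as described above; the actual individuals are PD patches
    # created by the main system.
    deterministic: SoundCategory
    stochastic: SoundCategory
    lifetime: float  # seconds the individual lives within the population

g = Genotype(SoundCategory(0.8, 220.0, 0.1), SoundCategory(0.3, 90.0, 0.9), lifetime=12.0)
print(g)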
3 Self-Organizing Soundscapes
After presenting the conceptual framework related to the creation and analysis
of RePartitura, we will now discuss the sonic aspects of this work. Our attention
is focused on the idea that a computer generative process can synthesize a
sonic process that resembles a soundscape. Thus, firstly we present a formal
definition of soundscape and correlate that to the computer model that
implements the evolutionary process used here to produce RePartitura dynamic
sonification.
Soundscape is a term coined by Murray Schafer that refers to the immersive
sonic environment perceived by listeners that can recognize it and even be part
of its composition (Schafer, 1977). Thus, a soundscape is initially a fruit of the
listener’s acoustic perception. As such, a soundscape can be recognized by its
cognitive aspects, such as foreground, background, contour, rhythm, space,
density, volume and silence. According to Schafer, soundscapes can be formed
by five distinct categories of analytical sonic concepts, derived from their
cognitive units (or aspects). They are: Keynotes, Signals, Soundmark, Sound
Objects, and Sound Symbols. Keynote is formed by the resilient, omnipresent
sounds, usually in the background of listeners’ perception. It corresponds to the
musical concept of tonality or key. Signals are the foreground sounds that grasp
the listener’s conscious attention, as they may convey important information.
Soundmarks are the unique sounds only found in a specific soundscape. Sound
Objects are the atomic components of a soundscape. As defined by Pierre
Schaeffer, who coined the term, a Sound Object is formed by sounds that
deliver a particular and unique sonic perception to the listener. Sound symbols
are the sounds which evoke cognitive and affective responses based on the
listener’s individual and sociocultural context. The taxonomy used by Schafer to
categorize soundscapes based on their cognitive units serves us well to describe
them from the perspective of their macro-structure, which is easily noticed by the
listener. These cognitive units are actually emergent features self-organized by
the complex sonic system that forms a soundscape. As such, these units can
be retrieved and analyzed by acoustic descriptors, but they are not enough to
define a process of truly generating soundscapes. In order to do that, it is
necessary to define not merely the acoustic representation of sound objects but
their intrinsic features that can be used as a recipe to synthesize a set of
similar-bound but always original sound objects.
In terms of their generation, as part of an environmental behavior, soundscapes
can be seen as self-organized complex open systems, formed by sound objects
acting as dynamic agents. Together, they orchestrate a sonic environment that
is always acoustically original but that, perceptually speaking, retains enough
self-similarity to enable any listener to easily recognize (cognitive similarity) and
discriminate it. This variant similarity, or invariance, is a trait found in any
soundscape. As such, in order to synthesize a soundscape using a computer
model it is necessary to have an algorithm able to generate sound objects with
perceptual sound invariance. Our investigation is to associate this perceptual
need with a class of computer methods related to adaptive systems. Among
them, we studied the EC methodology. In the next section, we are going to
correlate EC systems with the concept of Artificial Abduction. With the
next considerations, we aim to link the computer generative process and the
conceptual perspective presented in Section 2.
4 Artificial Abduction
Abduction is initially described as an essentially human mental reasoning
process. However, its concept has a strong relation with Darwinian natural
selection, as both may be seen as “blind” methods of guessing the right solution
for not-well defined problems. In such, EC methodology, that is inspired in the
Darwinian theory, may be able to emulate, to some extent, abductive reasoning.
This is what is named here as Artificial Abduction, and is explained below. Most
of the ideas in this section were discussed in (Moroni, 2005). Here, we point out
the main topics that are linked to RePartitura creative process.
4.1 Abduction and Evolution
As already mentioned, abduction is related to the production of more convincing
hypotheses to explain a given phenomenon through relative evaluation of
several candidate hypotheses, as also discussed in (Chibeni, 1996). In short,
the general scheme of Abductive arguments consists in the proposition of
alternative hypotheses to explain specific evidence (a fact or set of facts), and in
the availability of an appreciation (or recognition) mechanism capable of
attributing a relative value to each explanation. The best one is probably true if,
besides being comparatively superior to the others, it is good in some absolute
sense. In opposition to deductive arguments, the conclusion in abductive
inference does not follow logically from the premises and does not depend on
their contents. In opposition to inductive arguments, the conclusion does not
necessarily consist of the uniform extension of the evidence.
Our main concern here is simply the existence and specificity of abductive
inference, and its widespread application in customary reasoning. As
mentioned above, this article examines the theoretical implications of a model
for the logic of creativity in the sound domain. Our aim is to relate the
construction of an alternative hypothesis in the search for the best explanation
for a phenomenon, with the possibility of simulating an artificial evolution using
evolutionary algorithms. EC simulates an artificial evolution categorized by
hierarchical levels: the gene, the chromosome, the individual, the species, the
ecosystem. The result of such modeling is a series of optimization algorithms
that result from very simple operations and procedures (crossover, mutation,
evaluation, selection, reproduction) applied to a computer-represented genetic
code (genotype). These procedures are implemented in a search algorithm, in
this case, a population-based search. The revolutionary idea behind
evolutionary algorithms is that they work with a population of solutions subject
to a cumulative process of evolutionary steps. Classic problem-solving methods
usually rely on a single solution as the basis for future exploration, attempting to
improve that solution. But there is an additional component that can make
population-based algorithms essentially different from other problem-solving
methods: the concept of competition and/or cooperation among solutions in a
population (Bäck, 2000). Essentially, the degree of adaptation of each
candidate solution will be determined in consonance with the effective influence
of the remainder candidates. As a competitive aspect, each candidate has to
fight for a place in the next generation. On the other hand, symbiotic
relationships may improve the adaptation degree of the population individuals.
Moreover, random variation is applied to search for new solutions in a manner
similar to natural evolution (Michalewicz & Fogel, 1998). This adaptive behavior
produced by EC is also related here to the notion of Abductive reasoning.
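The operations listed above (crossover, mutation, evaluation, selection, reproduction) can be sketched generically in a few lines of Python. This is a textbook-style illustration of a population-based search under assumed parameter choices, not the specific algorithm of Vox Populi, ESSynth or RePartitura.

import random

def crossover(parent_a, parent_b):
    # One-point crossover over two genotypes represented as lists of numbers.
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(genotype, rate=0.1, scale=0.05):
    # Each gene is perturbed with probability `rate`.
    return [g + random.gauss(0.0, scale) if random.random() < rate else g
            for g in genotype]

def next_generation(population, fitness, size=None):
    # Evaluation and selection: the fitter half reproduces by crossover and
    # mutation to form the next generation.
    size = size or len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: max(2, len(ranked) // 2)]
    return [mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(size)]

# Toy usage: evolve genotypes of five genes to maximize the sum of their values.
pop = [[random.random() for _ in range(5)] for _ in range(10)]
for _ in range(20):
    pop = next_generation(pop, fitness=sum)
print(round(max(sum(ind) for ind in pop), 2))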
4.2 Evolution and Musical Creativity
Probably, the most famous enquiry about the musical creative capacity of
computers was formulated by Ada Lovelace. She realized that Charles
Babbage’s “Analytical Engine” - in essence, a design for a digital computer -
could “compose elaborate and scientific pieces of music of any degree of
complexity or extent”. But she insisted that the creativity involved in any
elaborated pieces of music emanating from the Analytical Engine would have
to be attributed not to the engine but to the engineer (Boden, 1998). She said:
“The Analytical Engine has no pretensions whatsoever to originate anything. It
can do [only] whatever we know how to order it to perform”. The Analytical
Engine was never built, but Babbage supposed that, in principle, his
machine would be able to play games such as checkers and chess by
looking forward to possible alternative outcomes, based on current potential
moves.
Since then, artworks have been emerging from computer models for many
years. The main goal is to understand, either for theoretical or practical
purposes, how representational structures can generate behavior, and how
intelligent behavior can emerge out of unintelligent (machinery) behavior
(Boden, 1998). The usage of EC presented here can be seen as an effective
way to produce art based on an efficient manipulation of information. A proper
use of computational creativity is devoted to incrementally increasing the fitness
of candidate solutions without neglecting their aesthetic aspects. A new
generation of computer researchers is applying EC and looking for some kind of
artistic creativity simulation in computers with some surprising results. The
ideas discussed here suggest an effective way of producing art, based on a
dynamic manipulation of information and a proper use of a computational model
resembling the Abductive processes, through EC with an interactive interface.
EC seems to be a good paradigm for computational creativity, because the
process of upgrading hypotheses is implemented as an interactive and iterative
population-based search.
5 Soundscape Meaning
The concept of musical meaning is controversial and has led to a myriad of
different perspectives in the philosophy of western music, and the problems of
musical meaning are conceptually even more daring when considering pure
music, without words, a.k.a. instrumental music. This very distinct essence that
music has, and its non-conceptual nature, gives that subject a distinct
consideration in modern aesthetics. It is from the rise of the Modern Age that
this kind of problem emerges, when music loses its connection with the old
cosmologies that assured its proper role in human knowledge and culture.
Roughly, since then the music of the Modern Age was understood in terms of
language and rhetorical analysis: a sort of special language, or the language of
the emotions, as in the philosophy of the 19th century.
Notwithstanding, also in the 19th century, Eduard Hanslick initiated a formalist
perspective of musical aesthetics that takes music as music, without any
necessary connection with emotions or natural language, for its
meaningfulness. Apart from the common-sense understanding of music, the
formalist approach dominated musicology and related fields in the 20th century.
Regarding the problem of meaning, the formalist approach led to the question
of how music is understood by the human mind and of how it results in affective
reactions and emotions in the listener.14 In the last century, music
psychologists, still in a very formalist perspective, furnished some hypotheses on
how the mind engages with musical form in (meaningful and affective) listening.
Mainly, it is assumed that the mind operates logically in listening to music
actively, and the models so far proposed in psychology are instantiations of a
deductive-inductive perspective (Huron, 2006; Meyer, 1956).
14. Hanslick never denied that music induces emotions in the listener, but considered that effect a secondary one and claimed that the meaning of music is given not by the mimesis of emotions, as usually said, but by the perception of its structures.
Those models claim that, by exposure to a cultural environment, the listener
deduces some general patterns of music structures that are inductively applied
to new listening situations, assuming the general inductive belief that the future
should conform to the past. Thus, a key concept of meaning in music is
expectation; a meaningful music is one with which the listener can engage
structurally and predict consequent relations. Emotions arise in the struggle
between the expected patterns and the actual patterns the music displays;
when they are similar, there is a limbic reward for the efficient prediction; when
the prediction is false, there is a contrastive valence that results in the surprise
effect (see Huron, 2006).
The process of acquisition of knowledge, or inquiry, as Peirce usually points
out, is not sufficiently accounted for by a deductive-inductive model, for the very
reason that, before any deduction can be made, a hypothesis must be
presented to the mind. Abduction is the logical process by which hypotheses are
generated. This threefold logical model of inquiry offers another viewpoint to
consider musical meaning and affect, not opposed to the models of music
psychology but complementary to them. In fact, through the perspective of the
Logic of Discovery, creativity turns out to be a logical process, instead of a
mysterious and obscure one, beyond understanding. The abductive creation of
hypothesis is the very basis of inquiry and, by extension, of knowledge itself. In
Peirce’s philosophy, this threefold logicality is involved in any process of
signification, assuming the possibility of different distributions of the three kinds
of reasonings in each particular case. The maxim of pragmatism, as formulated
by Peirce, claims that the whole meaning of an idea is the sum of all the
practical consequences of such idea. In this sense, the concept of meaning is a
matter of: habits and believes, that, consequently, govern our actions. Habbits
and beliefs are firstly and priorly design by abduction. There is, thus, a
connection between logic, habit and action, in the pragmatic conception of
meaning.
Musical (structural) listening is an action (as much as thought is an action for
Peirce). As such, it is active rather than passive. This action, as any action, is
guided by beliefs15 and habits, which form a conceptual space that is the
interface between the listener and his cultural ambient (Boden, 1994). It is in the
coupling interaction between habits and structures that music becomes
meaningful and affective. Habits are created by the logical process of Abductive
reasoning. In ordinary music listening, when the audience is familiar with the
stimuli, i.e., it is culturally embedded and has embodied habits that respond
properly to that music genre, listening might be a more deductive-inductive
logical process. The more predictable the music, the more inductive its
thinking action. In listening situations with unfamiliar music, or when a music
piece presents non-culturally-standard structures, habitual action might not
conform to those structures and expectations might not be derived properly.
This music requires a process of habit reformulation by the active listener, i.e.,
Abduction.
15. For the relevance of belief in aesthetic appreciation see, for instance, Aiken (1949; 1951).
The conceptual space is altered every time a new habit is called into existence,
shifting the listening experiences from that moment. That is why one could have
a lifelong listening experience with one piece of music and it is absolutely not
the repetition of such experience over and over again. Even if that daily
appreciation is made with the same recording of the piece, the conceptual
space is not the same because it is dynamically altered by abduction
processes. Signification is an emergent property of such conceptual space, i.e.,
the dynamic coupling of a listener (with his audition history embodied as habits
and beliefs) and musical works (culturally embedded).
Similarly, in the case of soundscapes, the conceptual space is also created and
recreated by the Abductive reasoning of listeners, when they recognize and
even contribute to it, as parts of this environment (such as in a crowded
audience).
Soundscapes are formed anywhere, as long as there is at least one listener to
abduct them. As asked by the old riddle: “If a tree falls in the forest and no one is
around to hear it, does it make a sound?”. If there is no listener to abduct the
meaning of the sound waves generated by this natural process, there is no
soundscape, as its meaning depends upon its reasoning.
In the case of RePartitura, the EC computer model that synthesizes
soundscapes attempts to create a doorway through which the signification that
emerged from the habits acquired by the artist during the production of the
drawing collection passes into a population of sound objects whose genotypes
are given by the mappings of the drawings’ features. The conceptual space of
the synthesized soundscape is dynamically recreated in a self-similar fashion,
which guarantees that a listener, although not (yet) able to participate in its
recreation, can easily abduct its perpetuated meaning.
6 Discussion
RePartitura is a computational model-based system that attempts to create
artificial abduction, thus emulating the reasoning process that an artist goes
through when creating an artwork. The artist abducts from the first insight, when
he or she has the initial idea of creating a piece of artwork, and afterwards,
during the process of its confection, when habits are developed while the
artwork is being shaped and reshaped according to the bounding conditions
imposed by the environment, be they external (e.g. material, ambient, etc.) or internal (e.g.
subjective, affective, mood, willingness, inspiration, etc.). To model that in a
computational system, we used an evolutionary sound synthesis system, the
102"
NICS Reports
ESSynth, based on EC methodology, that was inspired on the natural evolution
of species, as described by Darwin. EC is sometimes defined as a nonsupervised method of seeking solution, mostly used for problems not-well
defined (non-deterministic). The idea of a non-supervised method that is able of
finding complex solutions, such as the creation of living beings, without the
supervenience of an even more complex and sophisticated system, such that
would be an “intelligent designer”, is the core of Darwinism and is being
increasingly used in a broad range of fields in order to try to explain the natural
law that allow systems to be self-organized and/or becoming autopoietic. For
that perspective, a complex system can emerge from the habits of its component
agents, under the influence of permeating laws that regulate their environment
and their mutual interactions. Similarly, abduction can be seen as a mental
process that allows us to naturally identify the self-similarity of a self-organized
system. Peirce himself acknowledges that abduction must be a product of
natural evolution, when he points out that: “...if the universe conforms, with any
approach to accuracy, to certain highly pervasive laws, and if man’s mind has
been developed under the influence of these laws, it is to be expected that he
should have a natural light, or light of nature, or instinctive insight, or genius,
tending to make him guess those laws aright, or nearly aright” (Peirce, 1957).
As an adaptive model that generates self-organized
soundscapes, considered here as embodying aesthetic value, RePartitura
seemed to fulfill the pre-requisites of being a system that presents a form of
Artificial Abduction.
As the population of sound objects of RePartitura evolves in time, so does its
soundscape. Thus, new sound events can emerge during this process. In the
computational implementation presented here, we did not set up an interaction
of the system with the external world. This can be further done using common
sensors, such as the ones for audio (microphone) and/or image (webcam).
Nevertheless, the soundscape will present ripples in its cognitive surface of
self-similarity, which is welcome. We had RePartitura exhibited for several days
in an art gallery (Sesc - São Paulo, 2009) and it was interesting to realize that,
despite the long hours of exposure to this sonic ambient, it did not tire the
audience as much as it would have if it were given by the same acoustic
information, although its overall sound was always very similar. This feature is
found in natural soundscapes, such as the sonic ambient near waterfalls,
forests, or the sea. This seemingly constant sonic information has a soothing
affective effect on most people. Maybe this is due to the fact that our abductive
reasoning is always activated to keep track of the continuity of sameness.
Expectations will, however, be minimal as, cognitively speaking, this information
does not bring novelty that would arouse limbic reactions, such as the ones
related to fight, flight or freeze. This prosody is smooth, as it stays similar, yet
enticing, as it brings a constant flux of perceptual change. We might say, in poetic terms,
103"
NICS Reports
that the prosody of a soundscape is Epic, as it describes a thread of perceptual
change; a cognitive never-ending sonic story, instead of Dramatic, as it
normally does not startle emotive reactions in the listeners through drastic changes in
their expectations (Huron, 2006).
If aesthetic appreciation were governed only by subjective opinion, there
would be no means to obtain automatic forms of artistic production, with some
aesthetic value, without a total human (artist)-machine integration. On the other
hand, if the rules and laws that conduct art creation did not allow the
maintenance of a set of degrees of free expression, then the automation would
be complete, despite the apparent complexity of the artwork. Since both
extremes do not properly reflect the process of artistic production, the general
conclusion is that there is room for automation either in the exploration of
degrees of free expression, through a human-machine interactive search
procedure, or in the application of mathematical models capable of
incorporating general rules during the computer-assisted creation. In few words,
the degrees of freedom can be modeled, in the form of optimization problems,
and the general rules can be mathematically formalized and inserted in
computational models, as restrictions or directions to be followed by the
algorithm. The single trait of each creation will be understood as the result of a
specific exploration of the search space, by the best blend of free attributes
among all possibilities.
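The following is a minimal, hypothetical sketch of this idea in Python. It is not the system described in this article, and all names, ranges, and rules are illustrative: the degrees of free expression are encoded as a parameter vector searched by a simple evolutionary loop, while a general rule is formalized as a penalty that restricts the fitness function.

# A minimal, hypothetical sketch (not the authors' implementation) of the idea
# above: degrees of free expression are a parameter vector searched by an
# evolutionary loop, while a general rule enters the fitness as a penalty.
import random

N_PARAMS = 8          # free expressive attributes (e.g. density, pitch range)
POP_SIZE = 30
GENERATIONS = 50

def random_individual():
    return [random.uniform(0.0, 1.0) for _ in range(N_PARAMS)]

def rule_penalty(ind):
    # A "general rule" formalized as a restriction the algorithm must respect;
    # here, a toy constraint: the first two attributes must stay balanced.
    return abs(ind[0] - ind[1])

def fitness(ind, target):
    # Exploration of the search space: closeness to a desired profile,
    # discounted by the penalty imposed by the formalized rule.
    distance = sum((a - b) ** 2 for a, b in zip(ind, target))
    return -(distance + 2.0 * rule_penalty(ind))

def evolve(target):
    population = [random_individual() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=lambda ind: fitness(ind, target), reverse=True)
        parents = population[: POP_SIZE // 2]
        children = []
        while len(children) < POP_SIZE - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_PARAMS)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N_PARAMS)               # small mutation
            child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=lambda ind: fitness(ind, target))

if __name__ == "__main__":
    best = evolve(target=[0.5] * N_PARAMS)
    print("best parameter blend:", [round(x, 2) for x in best])

In this toy setting, the "best blend of free attributes" is simply the individual with the highest penalized fitness after the last generation.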
7 Conclusion
We started this article by describing how the drawings used in RePartitura explored
the development of a gesture over a period of time. The drawings showed
pattern changes according to the day of their execution, and this pattern variation
was associated with physical and physiological influences. The analysis of pattern
variation led us to associate the formation of the gesture with the acquisition of
habit and its breaking up. The acquisition of habit was associated with the gradual
and progressive aspect of the drawings (an elongated, narrow curve aspect), while
the breaking up of habit was associated with the influence of chance (resulting in
drawings with an overflow of ink). The first was characterized by drawings with less
visual information, the second by drawings with more visual information. In turn,
all these ideas were associated with the Peircean perspective on the formation of
habits.
In RePartitura we used ESSynth for the creation of computer-generated
soundscapes in which the sound objects that form them are generated from the
patterns and invariances of the drawings. The image invariances were identified and
parameterized to create genotypes of sonic objects, which became individuals
within a sonic evolutionary environment. The sound objects orchestrate a sonic
environment that is always acoustically original but that, perceptually,
104"
NICS Reports
retains enough self-similarity to enable any listener to easily recognize and
distinguish it.
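As an illustration of this mapping, the following is a minimal, hypothetical sketch in Python. It is not the ESSynth implementation, and the feature names and parameter ranges are invented for the example: a few toy "invariances" extracted from a drawn curve seed a genotype, and a small population of sound-object parameters drifts around that genotype, so that each generation is acoustically new while remaining self-similar to the seed.

# A minimal, hypothetical sketch of the mapping described above (not ESSynth):
# drawing invariances -> genotype -> evolving population of sound-object
# parameters that stays self-similar while never repeating exactly.
import random

def genotype_from_drawing(curve_points):
    # Toy "invariances": overall spread of the curve and its mean height,
    # scaled into a base frequency (Hz) and a duration (s) for a sound object.
    xs = [p[0] for p in curve_points]
    ys = [p[1] for p in curve_points]
    spread = (max(xs) - min(xs)) + (max(ys) - min(ys))
    mean_y = sum(ys) / len(ys)
    return {"freq": 100.0 + 5.0 * mean_y, "dur": 0.2 + 0.01 * spread}

def mutate(genotype, rate=0.05):
    # Each generation the sound objects vary slightly: acoustically original,
    # perceptually still recognizable as the same soundscape.
    return {k: v * (1.0 + random.gauss(0.0, rate)) for k, v in genotype.items()}

def evolve_soundscape(curve_points, generations=10, pop_size=6):
    seed = genotype_from_drawing(curve_points)
    population = [mutate(seed) for _ in range(pop_size)]
    for g in range(generations):
        # Selection toward the seed keeps self-similarity; mutation keeps novelty.
        population.sort(key=lambda ind: abs(ind["freq"] - seed["freq"]))
        survivors = population[: pop_size // 2]
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(pop_size - len(survivors))
        ]
        print(f"generation {g}: freqs =",
              [round(ind["freq"], 1) for ind in population])
    return population

if __name__ == "__main__":
    drawing = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(200)]
    evolve_soundscape(drawing)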
Soundscape meaning differs from musical meaning due to its absence of an a
priori, paradigmatic syntax. The discourse of soundscapes is less affective and
more perceptual and cognitive, thus differing from the traditional aesthetics of
Western music. However, some relations can be observed if one compares the
components of soundscapes with traditional concepts employed in music analysis.
For instance, a soundmark or a signal may take the role that a theme or motive
usually has; motivic developments are built upon similarities and differences in the
spectro-morphology of sound objects, and the relations among these sound objects
are unique for each composition, just as the thematic development of a symphony
has no other one similar to it. Beyond these similarities, however, the absence of
a priori syntactical rules makes the listening less directional and open to other,
alternative ways of understanding it. Even so, signification in this less directional
listening occurs through the very same logical processes: a deductive-inductive
basis updated and adapted by abductive inferences. Soundscape meaning is
nonetheless more abductive because it lacks a priori syntactical rules of
development that could be presumed by the listener and incorporated into his or
her listening habits and aesthetic beliefs. Thus, each soundscape is a unique
aesthetic experience that calls for the logic of guessing more often in order to be
understood. We may say that evolutionary soundscapes are doubly abductive, as
adaptation and abduction occur together in such a sonic environment, both in its
algorithmic generation and in the listener's meaningful and affective appreciation
of it as a piece of art.
References
1. Bäck T, Fogel DB, Michalewicz Z (eds) (2000) Evolutionary Computation 2: Advanced Algorithms and Operators. Institute of Physics Publishing
2. Boden M (1996) What is creativity? In: Boden M (ed) Dimensions of Creativity, pp 75-117. MIT Press, London
3. Boden M (1998) Creativity and artificial intelligence. Artificial Intelligence 103:347-356
4. Csikszentmihalyi M (1996) Creativity: Flow and the Psychology of Discovery and Invention. HarperPerennial, New York
5. Chaitin GJ (1990) Information, Randomness and Incompleteness. World Scientific, Singapore. ISBN 981-02-0154-0
6. Chibeni SS (1996) Cadernos de História e Filosofia da Ciência, Series 3, 6(1):45-73. Centre for Logic, Epistemology and the History of Science (CLE), Unicamp
105"
NICS Reports
7. Fornari J, Manzolli J, Maia Jr A, Damiani F (2001) The Evolutionary Sound Synthesis Method. In: Proceedings of ACM Multimedia, Toronto
8. Fornari J, Maia Jr A, Manzolli J (2000) Soundscape Design through Evolutionary Engines. Special issue "Music at the Leading Edge of Computer Science". JBCS - Journal of the Brazilian Computer Society. ISSN 0104-6500
9. Fornari J, Shellard M, Manzolli J (2009) Creating Evolutionary Soundscapes with Gestural Data. Article and presentation. SBCM - Simpósio Brasileiro de Computação Musical
10. Fornari J, Shellard M (2009) Breeding Patches, Evolving Soundscapes. Article presentation. 3rd PureData International Convention (PDCon09), São Paulo
11. Harman G (1965) The inference to the best explanation. Philosophical Review 74(1):88-95
12. Holland JH (1998) Emergence: From Chaos to Order. Helix Books, Addison-Wesley
13. Huron D (2006) Sweet Anticipation: Music and the Psychology of Expectation. The MIT Press, Cambridge
14. Manzolli J (1996) Auto-organização: um Paradigma Composicional. In: Debrun M, Gonzales MEQ, Pessoa Jr O (eds) Auto-organização: Estudos Interdisciplinares, pp 417-435. CLE/Unicamp, Campinas
15. Meyer LB (1956) Emotion and Meaning in Music. University of Chicago Press, Chicago
16. Manzolli J, Verschure P (2005) Roboser: A Real-World Composition System. Computer Music Journal 29(3):55-74
17. Moroni A, Manzolli J, Von Zuben F, Gudwin R (2000) Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition. Leonardo Music Journal 10:49-54
18. Moroni A, Manzolli J, Von Zuben F (2005) Artificial Abduction: A Cumulative Evolutionary Process. Semiotica 153(1/4):343-362. ISSN (Online) 1613-3692, ISSN (Print) 0037-1998
19. Oliveira LF, Haselager WFG, Manzolli J, Gonzalez (2008) Musical meaning and logical inference from the perspective of Peircean pragmatism. In: Tsougras C, Parncutt R (eds) Proceedings of the IV Conference on Interdisciplinary Musicology (CIM08), Thessaloniki, Greece
20. Peirce CS (1931–1965) The Collected Papers of Charles S. Peirce, 8 vols. Harvard University Press, Cambridge. (References to Peirce's papers are designated CP followed by volume and paragraph number.)
106"
NICS Reports
21. Peirce CS (1957) Essays in the Philosophy of Science, Tomas V (ed). Bobbs-Merrill, New York
22. Peirce CS, Hartshorne C, Weiss P (1966) Collected Papers of Charles Sanders Peirce, Volumes V and VI: Pragmatism and Pragmaticism and Scientific Metaphysics. ISBN 0-674-13802-3
23. Peirce CS, Hoopes J (1991) Peirce on Signs: Writings on Semiotic. The University of North Carolina Press, USA
24. Schafer RM (1994) The Soundscape. ISBN 0-89281-455-1
25. Truax B (ed) (1978) Handbook for Acoustic Ecology. ISBN 0-88985-011-9
107"
108"
NICS Reports