NR Edition 01 - NICS
Transcription
Masthead

NICS Reports
Electronic periodical of the Interdisciplinary Nucleus for Sound Communication (NICS), University of Campinas (UNICAMP)

Editors: Jônatas Manzolli, NICS/UNICAMP; José Fornari, NICS/UNICAMP; Marcelo Gimenes, NICS/UNICAMP
Address: Rua da Reitoria, 165, Cidade Universitária Zeferino Vaz, 13.083-872 Campinas, SP
Telephones: +55 (19) 3521-7770 / +55 (19) 3521-2570
E-mail: [email protected]
Site: http://www.nics.unicamp.br/nr
Technical Support: Edelson Constantino, [email protected]

Contents

Editorial
Articles
1. The pursuit of happiness in music: retrieving valence with contextual music descriptors
2. Panorama dos modelos computacionais aplicados à musicologia cognitiva
3. An a-life approach to machine learning of musical worldviews for improvisation systems
4. Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition
5. Abduction and Meaning in Evolutionary Soundscapes

Editorial

The inaugural edition of NICS Reports (NR) presents articles related to two themes that have become recurrent over the Nucleus's recent years of research. The first is the use of Evolutionary Computation methods, such as genetic algorithms and other derived processes, with the goal of producing musical diversity.
The evolutionary processes developed at NICS were pioneers in addressing this fundamental question of computer-supported composition and sound design: undertaking a study of computational models that sees in sound creativity a field of exploration as broad as the biological process that inspires Evolutionary Computation itself.

The second theme addressed by this inaugural edition is Music Cognition studied through computational simulation. Here the computer operates as a simulacrum of the countless potentialities that the human cognitive process produces when focused on musical creation and performance. It is a systemic approach in which this immense universe, the interaction of perception with the sonic environment, is translated into specific aspects treated by computational models.

We hope that these two themes, and those to come, may prompt new readings of fundamental questions about composition, performance and interactive musical processes. We also hope that reading this first edition of NR may inspire new paths and studies.

Campinas, October 2012
The Editors

Articles

1. The pursuit of happiness in music: retrieving valence with contextual music descriptors¹

José Fornari
Interdisciplinary Nucleus for Sound Communication (NICS), University of Campinas (Unicamp), Brazil
[email protected]

Tuomas Eerola
Music Department, University of Jyvaskyla (JYU), Finland
[email protected]

Abstract. In the study of music emotions, Valence usually refers to one of the dimensions of the circumplex model of emotions that describes the musical appraisal of happiness, on a scale going from sad to happy. Nevertheless, the related literature shows that Valence is particularly difficult to predict with a computational model.
As Valence is a contextual music feature, it is assumed here that its prediction also requires contextual music descriptors in the predicting model. This work describes the usage of eight contextual (also known as higher-level) descriptors, previously developed by us, to calculate happiness in music. Each of these descriptors was independently tested using the correlation coefficient between its prediction and the mean rating of Valence, collected from thirty-five listeners, over a piece of music. Next, a linear model using these eight descriptors was created, and the result of its prediction for the same piece of music is described and compared with two other computational models from the literature designed for the dynamic prediction of music emotion. Finally, we propose an initial investigation of the effects of expressive performance and musical structure on the prediction of Valence. Our descriptors are separated into two groups, performance and structural, and with each group we built a linear model. The predictions of Valence given by these two models over two other pieces of music are compared with the corresponding listeners' mean ratings of Valence, and the achieved results are depicted, described and discussed.

Keywords: music information retrieval, music cognition, music emotion.

1 Original reference for this work: Fornari, J. and T. Eerola (2009). The Pursuit of Happiness in Music: Retrieving Valence with Contextual Music Descriptors. Computer Music Modeling and Retrieval. Genesis of Meaning in Sound and Music. S. Ystad, R. Kronland-Martinet and K. Jensen (eds.), Springer Berlin Heidelberg. 5493: 119-133.

1 Introduction

Music emotion has been studied by many researchers in the field of psychology, such as the studies described in [1].
The literature mentions three main models used in the study of music emotion: 1) the categorical model, originated from the work of [2], which describes music in terms of a list of basic emotions [3]; 2) the dimensional model, originated from the research of [4], who proposed that all emotions can be described in a Cartesian coordinate system of emotional dimensions, also named the circumplex model [5]; and 3) the component process model, from the work of [6], which describes emotion as appraised according to the situation of its occurrence and the listener's current mental (emotional) state. Computational models for the analysis and retrieval of emotional content in music have also been studied and developed, in particular by the Music Information Retrieval (MIR) community, which maintains a repository of publications in its field (available at the International Society for MIR site: www.ismir.net). To name a few: [7] developed a computational model for musical genre classification, a task similar to, although simpler than, the retrieval of emotions in music. [8] provided a good example of audio feature extraction using multivariate data analysis and behavioral validation of its features. There are also several examples of computational models developed for the retrieval of emotional features evoked by music, such as [9] and [10], which studied the retrieval of higher-level features of music, such as tonality, in a variety of music audio files.

1.1 The dynamic variation of appraised Valence

In the study of the dynamic aspects of music emotion, [11] used a two-dimensional model to measure emotions appraised by listeners along time, in several music pieces. The emotional dimensions described are the classical ones: Arousal (ranging from calm to agitated) and Valence (going from sad to happy). That study used Time Series techniques to create linear models with five acoustic descriptors to predict each of these two dimensions, for each music piece.
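The dimensional (circumplex) model above places each emotion at a point on a Valence-Arousal plane. As a minimal sketch of that idea, the snippet below maps a (valence, arousal) coordinate to the nearest labeled emotion; the coordinate values are illustrative assumptions, not taken from the paper or from [4, 5].

```python
import math

# Hypothetical (valence, arousal) coordinates on the circumplex plane,
# both axes normalized to [-1, 1]. Values are illustrative only.
CIRCUMPLEX = {
    "happy":   ( 0.8,  0.5),
    "excited": ( 0.6,  0.9),
    "calm":    ( 0.5, -0.6),
    "sad":     (-0.7, -0.4),
    "angry":   (-0.6,  0.8),
}

def nearest_emotion(valence, arousal):
    """Map a (valence, arousal) point to the closest labeled emotion."""
    return min(CIRCUMPLEX,
               key=lambda e: math.dist(CIRCUMPLEX[e], (valence, arousal)))
```

For example, a point with high positive valence and moderate arousal falls near "happy", while a strongly negative-valence, low-arousal point falls near "sad".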
[12] used the same listeners' mean ratings collected by [11] to develop and test a general model for each emotional dimension (i.e. one general model for Arousal and another for Valence), using System Identification techniques to create its two models of prediction. In any case, in these two studies no effort was made to distinguish between the musical aspects predicted by the descriptors that are related to the composition, given by its musical structure, and those related to its expressive performance.

1.2 The balance between expressive performance and musical structure for the appraisal of Valence

Music emotion is influenced by two groups of musical aspects. One is given by the structural features created by the composer and described in terms of musical notation. The other relates to the emotions aroused in the listeners during the musician(s)' expressive performance. The first group is here named structural aspects and the second, performance aspects. Sometimes the difference between a mediocre and a breathtaking interpretation of a musical structure relies on the performers' ability to properly manipulate basic musical aspects such as tempo, dynamics and articulation. Such skill often seems to be the key for the musician to recreate the emotional depths that the composer supposedly tried to convey in the musical structure. On this subject, [13] mentions that: "expert musical performance is not just a matter of technical motor skill; it also requires the ability to generate expressively different performances of the same piece of music according to the nature of intended structural and emotional communication". Also, [14] states that: "Music performance is not unique in its underlying cognitive mechanisms". These arguments seem to imply that, in music, structure and performance cooperate to evoke emotion.
The question is how musical structure and expressive performance cooperate and interact with each other in the appraisal of music emotion. There are several studies on this subject. For instance, [15] provided an overview of the state of the art in the field of computational modeling of expressive music performance, mentioning three important models. The KTH model consists of a set of performance rules that predict timing, dynamics, and articulation based on the current musical context [16]. The Todd model, in contrast, applies the notion of "analysis-by-measurement", since its empirical evidence comes directly from the ratings of the expressive performances [17]. Finally, there is the Mazzola model, which is mainly based on mathematical modeling [18] (see www.rubato.org). Recently, a Machine Learning approach has also been developed. It builds computational models of expressive performance from a large set of empirical data (precisely measured performances by skilled musicians), where the system autonomously seeks out significant regularities in the data via inductive machine learning and data mining techniques [19]. As seen, finding the hidden correlations between musical structure and performance, and their effects on music emotion, is a broad field of research. Obviously, fully mapping this relation is beyond our scope. Here we intend to initiate an investigation on the subject, using our contextual descriptors as structural and performance ones. The underlying musical aspects that influence the emotional state of listeners have been the subject of a number of previous studies, although few isolated their individual influences, sometimes leading to conflicting qualitative results. In fact, it seems that a thorough attempt at combining these aspects of music is still to be done, despite some studies, such as [20], which described a quite comprehensive study with the "adjective circle".
There have been other studies, such as [21], on the interaction of mode and tempo with music emotion, also studied by [22]. It would, however, be rather ambitious to evaluate the interactions between tempo, dynamics, articulation, mode, and timbre in a large factorial experiment. We aim here to initiate an investigation, using our eight higher-level descriptors, on the prediction of appraised Valence and on how structural and performance features contribute to this particular musical emotion. We first show the prediction of Valence for each of our descriptors and for a linear model using them all. Next, we separate these descriptors into two groups, structural and performance, and create with each one a linear model to calculate Valence. The experiment first takes one piece of music, its corresponding Valence ground-truth, and calculates its prediction with each descriptor and with the linear model using all descriptors. Next, we take two other pieces of music and their Valence ground-truths to calculate their predictions with the structural and performance models.

2 The difficulty of predicting Valence

As seen in the results shown in [11] and [12], these models successfully predicted the dimension of Arousal, with high correlation with their ground-truths. However, Valence has proved difficult for these models to retrieve. This may be due to the fact that the previous models did not make extensive usage of higher-level descriptors. The literature in this field names as descriptor a model (usually a computational model) that predicts one aspect of music, emulating the perception, cognition or emotion of a human listener. While low-level descriptors account for perceptual aspects of music, such as loudness (perception of sound intensity) or pitch (perception of the fundamental partial), the higher-level ones account for contextual musical features, such as pulse, tonality or complexity.
These refer to the cognitive aspects of music and deliver one prediction for each overall music excerpt. If this assumption is true, it is understandable why Valence, as a highly contextual dimension of music emotion, is poorly described by models using mostly low-level descriptors. Intuitively, it was expected that Valence, as the measurement of happiness in music, would be mostly correlated with the prediction of higher-level descriptors such as key clarity (major versus minor mode), harmonic complexity, and pulse clarity. However, as described further on, the experimental results pointed in another direction.

3 Designing contextual musical descriptors

In 2007, during the Braintuning project (see the Discussion section for details), we were involved in the development of computational models for contextual descriptors of specific musical aspects. This effort resulted in the development of eight higher-level music descriptors. Their design used a variety of audio processing techniques (e.g. chromagram, similarity function, autocorrelation, filtering, entropy measurement, peak detection, etc.) to predict specific contextual musical aspects. Their output is normalized between zero (normally meaning the lack of that feature in the analyzed music excerpt) and one (referring to the clear presence of that contextual music aspect). These eight descriptors were designed and simulated in Matlab, as algorithms written in the form of script files that process music stimuli as digital audio files, with 16 bits of resolution, 44.1 kHz sampling rate and 1 channel (mono). To test and improve the development of these descriptors, behavioral data was collected from thirty-three listeners who were asked to rate the same features predicted by these descriptors. They rated one hundred short excerpts of music (five seconds long each) from movie soundtracks. Their mean rating was then correlated with the descriptors' predictions.
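The validation step just described, correlating a descriptor's excerpt-level predictions with the listeners' mean ratings, reduces to a Pearson correlation. A minimal sketch (in Python rather than the original Matlab, and with synthetic stand-in data, since the behavioral data is not reproduced here):

```python
import numpy as np

def pearson_r(prediction, mean_rating):
    """Pearson correlation between a descriptor's predictions and
    the listeners' mean ratings over a set of excerpts."""
    p = np.asarray(prediction, dtype=float)
    m = np.asarray(mean_rating, dtype=float)
    return float(np.corrcoef(p, m)[0, 1])

# Toy stand-in for 100 excerpt-level ratings and one descriptor's output.
rng = np.random.default_rng(0)
rating = rng.uniform(0.0, 1.0, 100)
descriptor_output = 0.7 * rating + 0.3 * rng.uniform(0.0, 1.0, 100)
r = pearson_r(descriptor_output, rating)  # a moderately noisy descriptor
```

A perfectly linear descriptor would give r = 1; the 0.5-0.65 range reported next corresponds to descriptors that track the ratings only partially.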
After several experiments and adjustments, all descriptors presented correlation coefficients from 0.5 to 0.65 with their respective ground-truths. They are briefly described below.

3.1 Pulse Clarity

This descriptor measures the sensation of pulse in music. Pulse is here seen as a fluctuation of musical periodicity that is perceptible as "beatings", at a sub-tonal frequency (below 20 Hz), therefore perceived not as tone (frequency domain) but as pulse (time domain). It can be of any musical nature (melodic, harmonic or rhythmic) as long as it is perceived by listeners as a fluctuation in time. The measuring scale of this descriptor is continuous, going from zero (no sensation of musical pulse) to one (clear sensation of musical pulse).

3.2 Key Clarity

This descriptor measures the sensation of tonality, or tonal center, in music. This is related to how tonal an excerpt of music is perceived to be by listeners, disregarding its specific tonality and focusing on how clear its perception is. Its scale is also continuous, ranging from zero (atonal) to one (tonal). Intermediate regions, neighboring the middle of the scale, tend to refer to musical excerpts with sudden tonal changes, or dubious tonalities.

3.3 Harmonic complexity

This descriptor measures the sensation of complexity conveyed by musical harmony. In communication theory, musical complexity is related to entropy, which can be seen as the degree of disorder of a system. However, here we are interested in measuring the perception of entropy, instead of the entropy itself. For example, in acoustical terms, white noise could be seen as a very complex sound, yet its auditory perception is of a very simple, unchanging stimulus. The challenge here is finding the cognitive sense of complexity. We focused only on the complexity of musical harmony, leaving melodic and rhythmic complexity to further studies.
The measuring scale of this descriptor is continuous and goes from zero (no perceptible harmonic complexity) to one (clear perception of harmonic complexity).

3.4 Articulation

In music theory, the term articulation usually refers to the way in which a melody is performed. If a pause is clearly noticeable between each note in the melodic prosody, the articulation of the melody is said to be staccato, which means "detached". On the other hand, if there is no pause between the notes of the melody, the melody is said to be legato, meaning "linked". This descriptor attempts to grasp the articulation from musical audio files, attributing to it an overall grade that ranges continuously from zero (staccato) to one (legato).

3.5 Repetition

This descriptor accounts for the presence of repeating patterns in a musical excerpt. These patterns can be melodic, harmonic or rhythmic. This is done by measuring the similarity of hopped time-frames along the audio file, tracking repeating similarities happening at a perceptible rate (around 1 Hz to 10 Hz). Its scale ranges continuously from zero (no noticeable repetition within the musical excerpt) to one (clear presence of repeating musical patterns).

3.6 Mode

Mode is the musical term referring to one of the eight modes of the diatonic musical scale. The most well-known are major (first mode) and minor (sixth mode). In the case of our descriptor, mode refers to a computational model that calculates from an audio file an overall output that ranges continuously from zero (minor mode) to one (major mode).
It is somewhat fuzzy to intuit what its middle-range grades would stand for, but the intention of this descriptor is mostly to distinguish between major and minor excerpts, as there is still ongoing discussion on whether the major mode carries in itself the valence of appraised happiness, and the minor mode accounts for sadness (see the Discussion section for a counter-intuitive result on this subject).

3.7 Event Density

This descriptor refers to the overall amount of perceptually distinguishable, yet simultaneous, events in a musical excerpt. These events can also be melodic, harmonic or rhythmic, as long as they can be perceived as independent entities by our cognition. Its scale ranges continuously from zero (perception of only one musical event) to one (the maximum perception of simultaneous events that the average listener can grasp).

3.8 Brightness

This descriptor measures the sensation of how bright a music excerpt is felt to be. Intuitively, this perception is somehow related to the spectral centroid, which accounts for the presence of partials of higher frequencies in the frequency spectrum of an audio file. However, other aspects can also influence its perception, such as attack, articulation, or the imbalance or lack of partials in other regions of the frequency spectrum. Its measurement goes continuously from zero (excerpt lacking brightness, or muffled) to one (excerpt clearly bright).

4 Building a model to predict Valence

In the research on the temporal dynamics of emotion described in [11], Schubert created ground-truths with data collected from thirty-five listeners who dynamically rated emotion categories depicted in a two-dimensional emotion plane, which was then mapped into two coordinates, or dimensions: Arousal and Valence. Listeners' rating variations were sampled once every second.
The pruned data of these measurements, mean-rated and mapped into Arousal and Valence, created the ground-truths that were used later by Korhonen in [12], as well as in this work. Here, we calculated the correlation between each descriptor's prediction and Schubert's Valence ground-truth for one music piece, the "Aranjuez concerto" by Joaquín Rodrigo. During the initial minute of this 2:45-long piece of music, the guitar plays alone (solo). Then it is suddenly accompanied by the full orchestra, whose intensity fades towards the end, until the guitar once again plays the theme alone. For this piece, the correlation coefficients between the descriptors' predictions and the Valence ground-truth are: event density: r = 0.59, harmonic complexity: r = 0.43, brightness: r = 0.40, pulse clarity: r = 0.35, repetition: r = 0.16, articulation: r = 0.09, key clarity: r = 0.07, mode: r = 0.05. Then, a multiple regression linear model was created with all eight descriptors. The model employs a time frame of three seconds (related to the cognitive "now time" of music) and a hop-size of one second to predict the continuous development of Valence. This model presented a correlation coefficient of r = 0.6484, which led to a coefficient of determination of R2 = 42%. For the same ground-truth, Schubert's model used five music descriptors: 1) Tempo, 2) Spectral Centroid, 3) Loudness, 4) Melodic Contour and 5) Texture. The differentiated descriptor outputs were regarded as the model's predictors. Using time series analysis, he built an ordinary least squares (OLS) model for this particular music excerpt. Korhonen's approach used eighteen low-level descriptors (see [12] for details) to test several models designed with System Identification techniques. The best general model reported in his work was an ARX (Auto-Regressive with eXtra inputs) model.
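The multiple-regression step above (eight descriptor time series, 1 s hop, fitted to the Valence ground-truth) can be sketched with an ordinary least-squares fit. This is a minimal Python/numpy sketch, not the original Matlab code; it assumes the descriptors have already been computed once per second over 3 s frames.

```python
import numpy as np

def fit_valence_model(descriptors, valence):
    """Ordinary least-squares linear model: valence ≈ descriptors @ w + bias.

    descriptors: (n_frames, n_descriptors) array, one row per 1 s hop
    valence:     (n_frames,) mean listener rating (ground-truth)
    Returns (weights including bias term, R^2 of the fit).
    """
    # Append a column of ones so the fit includes an intercept.
    X = np.column_stack([descriptors, np.ones(len(descriptors))])
    w, *_ = np.linalg.lstsq(X, valence, rcond=None)
    pred = X @ w
    ss_res = float(np.sum((valence - pred) ** 2))
    ss_tot = float(np.sum((valence - np.mean(valence)) ** 2))
    return w, 1.0 - ss_res / ss_tot
```

With the paper's data this fit yields R2 = 42%; on synthetic data that is exactly linear in the descriptors, R2 approaches 1 by construction.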
Table 1 below shows the comparison of results for all three models, in terms of best achieved R2 (coefficient of determination) in the measurement of Valence for the Aranjuez concerto.

Table 1. Emotional dimension: VALENCE. Ground-truth: Aranjuez concerto.

This table shows that our model performed significantly better than the previous ones, for this specific ground-truth. The last column of table 1 shows the result achieved by the descriptor "event density" alone, the one that presented the highest correlation with the ground-truth. This single descriptor presented better results than the two previous models. These results seem to suggest that higher-level descriptors can in fact be successfully used to improve the dynamic prediction of Valence. Figure 1 depicts the comparison between this ground-truth, given by listeners' mean rating of Valence for the Aranjuez concerto, and the prediction given by our multiple-regression model using all eight descriptors.

Fig. 1. Mean rating of the behavioral data for Valence (continuous line) and our model prediction (dashed line).

Although the prediction curve presents some rippling when visually compared with the ground-truth (the mean-rating behavioral data), overall the prediction follows the major variations of Valence along the music's performing time, which resulted in a high coefficient of determination. As described in the next sections, the next step of this study was to distinguish between performance and structural aspects of music and to study how they account for the prediction of Valence. Hence, we separated our eight contextual descriptors into these two groups and created with them two new linear models: one to predict the performance aspects influencing the appraisal of Valence, and another to predict its structural aspects.
4.1 Performance Model

This model is formed by the higher-level descriptors related to the dynamic aspects of musical performance. These descriptors try to capture music features that are manipulated mostly by the performer(s), rather than aspects already described in the musical structure (i.e. its composition). They are commonly related to musical features such as articulation, dynamics, tempo and micro-timing variability. As the "dynamics" aspect is related to Arousal, as seen in [11, 12], and the examples studied here had their "tempo" aspect approximately unchanged, we focused on the "pulse clarity" and "brightness" aspects, as they have also been used as descriptors of expressive performance in other studies, such as [21]. We considered the following descriptors as belonging to the performance group: 1) articulation, 2) pulse clarity and 3) brightness. Articulation is a descriptor that measures how much similar musical events are perceptually separated from each other. This is a fundamental component of expressive performance that has been studied in many works, such as [22], which analyzed the articulation strategies applied by pianists in expressive performances of the same scores. Articulation may also be seen as a musical trademark, or fingerprint, that helps identify a musical genre or a performing artist's style. Pulse clarity is the descriptor that measures how clear, or perceptible, the pulse is in a musical performance. This is chiefly useful in distinguishing between expressive performances characterized by an interpretation more towards the Ad Libitum (without clear pulse) or the Marcato (with clear pulse). Brightness is the descriptor that accounts for the musical aspects related to the variation of the perception of brightness along an expressive performance. Its scale covers from muffled (without brightness) to bright (or brilliant).
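A pulse-clarity descriptor of the kind described above is often built from the autocorrelation of an amplitude or onset envelope. The sketch below is a toy Python/numpy illustration under that assumption, not the NICS Matlab implementation; the 1-10 Hz lag range follows the sub-tonal periodicity discussed in section 3.1.

```python
import numpy as np

def pulse_clarity(envelope, env_rate=100.0):
    """Toy pulse-clarity estimate from an amplitude/onset envelope.

    Autocorrelates the mean-removed envelope and takes the strongest
    peak in the 1-10 Hz periodicity range (lags of 0.1 s to 1 s),
    normalized by the zero-lag energy, so the result lies in [0, 1].
    """
    e = np.asarray(envelope, dtype=float)
    e = e - e.mean()
    ac = np.correlate(e, e, mode="full")[len(e) - 1:]  # non-negative lags
    if ac[0] <= 0.0:
        return 0.0  # flat envelope: no pulse
    lo, hi = int(0.1 * env_rate), int(1.0 * env_rate)
    return float(np.clip(ac[lo:hi].max() / ac[0], 0.0, 1.0))
```

A strictly periodic envelope (a clear Marcato pulse) scores high, while an aperiodic, noise-like envelope (Ad Libitum) scores low.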
4.2 Structural Model

Structural descriptors are the ones that account for the static, or structural, aspects of a piece of music given by the composition's musical score, or any other kind of notation, so they are supposed to be little influenced by expressive performance aspects. Several works have studied them, such as [23]. We considered the following as structural descriptors: 1) mode, 2) key clarity, 3) harmonic complexity, 4) repetition and 5) event density. Mode is the descriptor that grades the tonality of the musical structure. If the structure of the analyzed excerpt is clearly minor, the scale will have a value near zero; if it is clearly major, the scale will have a value towards one. If the music excerpt presents ambiguity in its tonality, or if it is atonal, its scale will have values around 0.5. Key Clarity measures how tonal a particular excerpt of musical structure is. Its scale goes from atonal (e.g. electro-acoustic, serialistic, spectral music structures) to clearly tonal structures (e.g. diatonic, modal, minimalist structures). Harmonic Complexity is a descriptor that refers to the complexity of a structure in terms of its harmonic clusters, which is related to the perceptual entropy of: 1) chord progressions and 2) chord structures. Repetition describes the amount of repeating similar patterns found in the musical structure. This repetition has to happen at a sub-tonal frequency, thus perceived as rhythmic information. Event Density is the descriptor that accounts for the amount of perceptible simultaneous musical events found in a structure excerpt. They can be melodic, harmonic or rhythmic, as long as they can be aurally and distinctly perceived.
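Mode and key clarity descriptors of this kind are commonly sketched by correlating a 12-bin chroma (pitch-class energy) vector against the 24 rotated Krumhansl-Kessler key profiles. The snippet below illustrates that standard approach; it is an assumption-laden stand-in, not the actual NICS descriptors, whose internal processing differs.

```python
import numpy as np

# Krumhansl-Kessler probe-tone profiles for C major and C minor.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def key_scores(chroma):
    """Correlate a 12-bin chroma vector with all 24 rotated key profiles."""
    c = np.asarray(chroma, dtype=float)
    scores = []
    for k in range(12):            # tonic pitch class
        for name, prof in (("maj", MAJOR), ("min", MINOR)):
            r = float(np.corrcoef(c, np.roll(prof, k))[0, 1])
            scores.append((r, k, name))
    return sorted(scores, reverse=True)

def key_clarity_and_mode(chroma):
    """Key clarity ≈ best profile correlation, clipped to [0, 1];
    mode → 1.0 if the best-matching key is major, 0.0 if minor."""
    best = key_scores(chroma)[0]
    return max(0.0, min(1.0, best[0])), (1.0 if best[2] == "maj" else 0.0)
```

A chroma vector dominated by a major-key profile yields mode near 1 with high clarity; a flat or ambiguous chroma yields low clarity, matching the "values around 0.5" behavior described for ambiguous excerpts only loosely.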
4.3 Valence prediction with Structural and Performance Models

As before, here we also used the ground-truth developed by the work of [11], in which thirty-five listeners dynamically rated the appraised music emotion on a circumplex model, for several pieces of music, later mapped to the dimensions of Arousal and Valence. For this part we chose the Valence ratings of two musical pieces: 1) "Pizzicato Polka" by Strauss, and 2) "Morning" from Grieg's Peer Gynt. Their Valence ground-truths were chosen mainly because these pieces present a repeating musical structure with slight changes in the expressive performance, so both structural and performance models could be tested and compared. Figures 2 and 3 show the comparison between each Valence ground-truth and its prediction by the structural and performance models. These models were created using the multiple regression technique. Figure 2 shows the "Pizzicato Polka" example. Three overlapping curves are seen: mean rating, structural model prediction and performance model prediction. The "Mean Rating" curve is the Valence ground-truth. The "Structural Model" curve is the prediction of the structural linear model, in the same way that the "Performance Model" curve is the prediction of the performance model.

Fig. 2. Structural and performance models for "Pizzicato Polka", by Strauss.

"Pizzicato Polka" is a simple and tonal orchestral piece where the strings are mostly played pizzicato (i.e. plucked). Its musical parts are repeated several times. Each part even has two similar sub-parts (i.e. A = A1 + A2). The musical parts that compose this piece of music are shown in table 2.

Table 2. Musical parts of "Pizzicato".

The second, and more complex, example is shown in Figure 3: the rating and predictions of Valence for the piece of music named "Morning". The figure describes the results for the Rating (Valence ground-truth) and its predictions by the structural and performance models.

Fig. 3.
Structural and performance models for "Morning", from Grieg's Peer Gynt.

"Morning" has a more advanced orchestration, whose melody swaps between solo instruments and tonalities (key changes), although it still has a repetitive musical structure. The musical parts that constitute this piece are shown in table 3, where an extra column was included to describe what these changes represent in terms of musical structure.

Table 3. Musical parts of "Morning".

Finally, table 4 shows the coefficient of correlation for each piece of music, for the structural and performance models.

Table 4. Experimental results for the overall correlation between the Valence ground-truths and the Performance and Structural models.

As seen in the table above, the correlation coefficients for these two pieces of music are approximately the same, and the structural model's correlation is higher than the performance model's for the overall prediction of Valence.

5 Discussion

This work was developed during the project named "Tuning your Brain for Music", the Braintuning project (www.braintuning.fi). An important part of it was the study of acoustic features retrieved from musical excerpts and their correlation with specific appraised emotions. Following this goal, we designed the contextual descriptors briefly described here. They were initially conceived because of the lack of such descriptors in the literature. In Braintuning, a fairly large number of studies for the retrieval of emotional connotations in music were investigated. As seen in previous models, for the dynamic retrieval of contextual emotions such as the appraisal of happiness (represented here by the dimension of Valence), low-level descriptors are not enough, since they do not take into consideration the contextual aspects of music.
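Since event density plays a central role in the results, it may help to see how such a contextual descriptor differs from a low-level one. The toy sketch below, an illustrative assumption rather than the NICS implementation, counts onsets per second on a normalized envelope and rescales to [0, 1]; the real descriptor also accounts for simultaneous melodic, harmonic and rhythmic events.

```python
import numpy as np

def event_density(envelope, env_rate=100.0, thresh=0.3):
    """Crude event-density estimate: onsets per second, rescaled to [0, 1].

    Counts upward threshold crossings of a normalized amplitude envelope.
    The 10 events/s ceiling is an arbitrary illustrative choice.
    """
    e = np.asarray(envelope, dtype=float)
    if e.max() > 0:
        e = e / e.max()                     # normalize to [0, 1]
    above = e > thresh
    onsets = np.count_nonzero(above[1:] & ~above[:-1])
    rate = onsets / (len(e) / env_rate)     # onsets per second
    return float(np.clip(rate / 10.0, 0.0, 1.0))
```

A silent excerpt scores zero; a dense stream of distinct onsets pushes the value towards one.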
It was interesting to notice that the prediction of Valence made by the descriptor "Event Density" presented the highest correlation with the Valence ground-truth, while the predictions of "Key Clarity" and "Mode" correlated very poorly. This seems to indicate that, at least in this particular case, the perception of major or minor tonality (represented by "Mode") or of the tonal center (given by "Key Clarity") is not as relevant to predict Valence as could be intuitively inferred. What counted most here was the amount of simultaneous musical events (given by "Event Density"), recalling that by "event" we mean any perceivable rhythmic, melodic or harmonic stimulus. The first part of this experiment chose the piece "Aranjuez" because it was the one for which the previous models presented the lowest correlation with the Valence ground-truth. Although the result presented here is enticing, further studies are definitely needed in order to establish any solid evidence.

The second part of this experiment studied the effects of expressive performance and musical structure on the appraisal of Valence. In "Pizzicato", the rating curve starts near zero and then abruptly plummets to negative values (i.e. sad). During musical part A, the rating rises until it becomes positive (i.e. happy), when part B starts. Both models approximately follow the rating and present a peak where the rating inverts, as part B starts. They both present a negative peak around 35s, where part A repeats for the first time. At the same time the rating declines a little but still remains positive (happy). This may be related to the listeners' memory of part A, since the models do not take their previous predictions into consideration. When part C starts, the rating rises sharply, as this is appraised as a particularly "happy" passage of this piece.
Here, the performance model seems to present higher (although wavier) values than the structural model, until the beginning of part D, around 68s. Parts A-B-A repeat again around 92s, where the rating shows a similar shape as before, although much narrower, maybe because the listeners have "recognized" this part. Here the performance model follows the rating closely. The structural model presents an abrupt rise between 110s and 120s, where part B is taking place. In the Coda, both models present positive predictions, but the rating is negative.

In "Morning", the rating starts from negative (sad) and rises continuously until it reaches positive values (happy), when part A3 starts. This is understandable since part A3 begins with an upward key change, which in fact delivers an appraisal of joy. The rating keeps rising until the next key change in part A5, and reaches its highest values in part A6, from 50s to 80s, when the whole orchestra plays the "A" theme together, back in the original key. Both models start from values close to zero. They show a steep rise from part A1 to part A2 (more visible in the performance model prediction). When part A3 starts, both model predictions decrease and the performance model goes negative. This may have happened because articulation and pulse clarity, the descriptors within the performance model, decrease in value at this passage, as well as at 40s, when A5 starts. During part A6, the structural model prediction is more similar to the rating than the performance model's, which makes sense since this is mostly a structural change and the performance parameters remain almost still. The rating decreases when part B1 starts at 78s. This is expected since, in this part, the mode changes from major to minor. Accordingly, at this moment, the performance model prediction remains almost unchanged.
The structural model prediction rises from negative to near-zero (or positive) values and shows a peak around the beginning of part B2. When A7 starts at 123s the rating drops to negative values and then rises continuously until 138s, when the Coda starts. Neither model follows this behavior: the structural model prediction remains positive, as in every other "A" part, and the performance model is also little affected by this passage.

6 Conclusion

This work investigated the usage of contextual descriptors for the prediction of the dynamic variation of music emotions. We chose to study the emotional dimension of Valence (here referred to as the perception of happiness in music) because it is a highly contextual aspect of music, known to be particularly difficult for computational models to predict. We briefly introduced eight contextual descriptors previously developed by us: event density, harmonic complexity, brightness, pulse clarity, repetition, articulation, key clarity and mode. We used the same music stimuli and corresponding Valence ground-truths of two important models from the literature. First, we selected a piece of music for which the previous models did not reach satisfactory correlations in the prediction of Valence. We then predicted Valence with each descriptor and with a linear model built from all descriptors. The descriptor with the highest correlation was "event density", presenting a coefficient of determination higher than the ones presented by the previous models. Second, we studied the relation between the appraisal of Valence and the aspects of expressive performance and of musical structure. Our descriptors were then separated into two groups, one covering the structural aspects (mode, key clarity, harmonic complexity, repetition and event density) and the other the performance ones (articulation, pulse clarity and brightness).
Two models, one for each descriptor group, were then created and named structural and performance. Although these models did not reach outstanding coefficients of correlation with the ground-truths (around 0.33 for the performance model and 0.44 for the structural one), they reached very similar coefficients for two stylistically very distinct pieces of music. This seems to indicate that the results of these models, despite their simplicity and limitations, point to a promising outcome. The finding that the structural model presents a higher correlation with the ground-truth than the performance one also seems to make sense: the structural model accounts for a greater portion of musical aspects. The structure comprehends the musical composition, arrangement, orchestration, and so forth; in theory, it conveys "the seed" of all the emotional aspects that the expressive performance is supposed to bring about.

There is a great number of topics that can be tested in further investigations on this subject. For instance, we did not take into consideration the memory aspects that certainly influence the emotional appraisal of Valence. New models including this aspect should consider principles found in the literature such as the forgetting curve and the novelty curve. We used rating data from the ground-truth of another experiment that, in spite of bringing enticing results, was not meant for this kind of experiment. In a further investigation, new listeners' rating data should be collected, with different performances of the same musical structure, as well as different structures with similar performances. This is a quite demanding task, but it seems to be the correct path towards the development of better descriptors and models.
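One simple way to incorporate the memory effects mentioned above, sketched here as a hypothetical extension (it is not part of the models evaluated in this work), is to weight past descriptor values with an exponential, forgetting-curve-like decay:

```python
def forgetting_average(values, decay=0.9):
    """Exponentially weighted running average of a descriptor time series:
    recent frames dominate, older frames fade following a forgetting-like decay."""
    smoothed = []
    state = values[0]
    for v in values:
        state = decay * state + (1 - decay) * v
        smoothed.append(state)
    return smoothed

print(forgetting_average([0.0, 1.0, 1.0, 1.0], decay=0.5))
```

A prediction built on such smoothed descriptors would carry information about earlier passages, which could, for instance, let a model react differently to a repeated part A than to its first occurrence.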
7 Acknowledgements

We would like to thank the BrainTuning project (www.braintuning.fi), FP6-2004-NEST-PATH-028570, the Music Cognition Group at the University of Jyväskylä (JYU), and the Interdisciplinary Nucleus of Sound Communication (NICS) at the State University of Campinas (UNICAMP). We are especially grateful to Mark Korhonen for sharing the ground-truth data from his experiments with us.

References

1. Sloboda, J. A., Juslin, P. (Eds.): Music and Emotion: Theory and Research. Oxford: Oxford University Press. ISBN 0-19-263188-8. (2001)
2. Ekman, P.: An argument for basic emotions. Cognition & Emotion, 6(3/4), 169–200. (1992)
3. Juslin, P. N., Laukka, P.: Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129, 770–814. (2003)
4. Russell, J. A.: Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172. (2003)
5. Laukka, P., Juslin, P. N., Bresin, R.: A dimensional approach to vocal expression of emotion. Cognition and Emotion, 19, 633–653. (2005)
6. Scherer, K. R., Zentner, K. R.: Emotional effects of music: production rules. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361–392). Oxford: Oxford University Press. (2001)
7. Tzanetakis, G., Cook, P.: Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302. (2002)
8. Leman, M., Vermeulen, V., De Voogdt, L., Moelants, D., Lesaffre, M.: Correlation of Gestural Musical Audio Cues. Gesture-Based Communication in Human-Computer Interaction. 5th International Gesture Workshop, GW 2003, 40–54. (2004)
9. Wu, T.-L., Jeng, S.-K.: Automatic emotion classification of musical segments. Proceedings of the 9th International Conference on Music Perception & Cognition, Bologna. (2006)
10.
Gomez, E., Herrera, P.: Estimating the Tonality of Polyphonic Audio Files: Cognitive Versus Machine Learning Modelling Strategies. Proceedings of the 5th International ISMIR 2004 Conference, October 2004, Barcelona, Spain. (2004)
11. Schubert, E.: Measuring emotion continuously: Validity and reliability of the two-dimensional emotion space. Australian Journal of Psychology, 51(3), 154–165. (1999)
12. Korhonen, M., Clausi, D., Jernigan, M.: Modeling Emotional Content of Music Using System Identification. IEEE Transactions on Systems, Man and Cybernetics, 36(3), 588–599. (2006)
13. Sloboda, J. A.: Individual differences in music performance. Trends in Cognitive Sciences, 4(10), 397–403. (2000)
14. Palmer, C.: Music Performance. Annual Review of Psychology, 48, 115–138. (1997)
15. Widmer, G., Goebl, W.: Computational Models of Expressive Music Performance: The State of the Art. Journal of New Music Research, 33(3), 203–216. (2004)
16. Friberg, A., Bresin, R., Sundberg, J.: Overview of the KTH rule system for music performance. Advances in Cognitive Psychology, special issue on Music Performance, 2(2-3), 145–161. (2006)
17. Todd, N. P. M.: A computational model of Rubato. Contemporary Music Review, 3, 69–88. (1989)
18. Mazzola, G., Göller, S.: Performance and interpretation. Journal of New Music Research, 31, 221–232. (2002)
19. Widmer, G., Dixon, S. E., Goebl, W., Pampalk, E., Tobudic, A.: In Search of the Horowitz Factor. AI Magazine, 24, 111–130. (2003)
20. Hevner, K.: Experimental studies of the elements of expression in music. American Journal of Psychology, 48, 246–268. (1936)
21. Gagnon, L., Peretz, I.: Mode and tempo relative contributions to "happy-sad" judgements in equitone melodies. Cognition and Emotion, 17, 25–40. (2003)
22.
Dalla Bella, S., Peretz, I., Rousseau, L., Gosselin, N.: A developmental study of the affective value of tempo and mode in music. Cognition, 80(3), B1–B10. (2001)
23. Juslin, P. N.: Cue utilization in communication of emotion in music performance: relating performance to perception. Journal of Experimental Psychology: Human Perception and Performance, 26(6), 1797–1813. (2000)
24. Bresin, R., Battel, G.: Articulation strategies in expressive piano performance. Journal of New Music Research, 29(3), 211–224. (2000)
25. Ong, B. S.: Towards Automatic Music Structural Analysis: Identifying Characteristic Within-Song Excerpts in Popular Music. Doctoral dissertation, Department of Technology, Universitat Pompeu Fabra. (2005)

2. An overview of computational models applied to cognitive musicology2

Marcelo Gimenes
Núcleo Interdisciplinar de Comunicação Sonora
Universidade Estadual de Campinas
[email protected]

Abstract: This article presents an overview of the state of the art of the computational models of interest to cognitive musicology. Some of these are inspired by natural phenomena, attempting to imitate, for instance, processes carried out by the human mind, while others have no such concern. Different models may coexist within more complex models. The systems are organized according to the flow of musical information, from the perception of sounds and the acquisition of musical knowledge to the manipulation of this knowledge in creative processes. Among the approaches presented are rule-based systems, grammar-based systems and systems that use machine learning. In addition, models based on evolutionary computation (e.g., genetic algorithms) and on artificial life are also presented.

Keywords: cognitive musicology, computational models, artificial intelligence

1.
Introduction

Among the many transformations undergone by science during the twentieth century, the emergence of computing, of artificial intelligence and of brain-imaging techniques, together with the decline in popularity of behavioural psychology, among other factors, led to what we call the cognitive revolution (Huron, 1999). Progressively, a growing interest in the study of memory, attention, pattern recognition, concept formation, categorization, reasoning and language (Huron, 1999) occupied the space that previously belonged to behavioural psychology. In this context, the cognitive sciences emerged as an interdisciplinary research area that brings together, in particular, philosophy, experimental psychology, the neurosciences and computer science, with the goal of studying the nature and structure of cognitive processes. To this end, a particularly important role is played by computational modelling, which provides a formal representation of knowledge and the experimental verification of different cognitive theories.

2 Original reference of this work: Gimenes, M. (2011). "Panorama dos modelos computacionais aplicados à musicologia cognitiva." Revista Cognição & Artes Musicais 3(2).

Following these transformations, musicology, especially in recent decades, has adopted a perspective in which music is not seen merely as a work of art but, in particular, as a process resulting from the activity of several agents (musicians, listeners, etc.) (Honing, 2006). This view led to new musicological strands that began to borrow the rigorous scientific method (testing and falsification), the formalization of knowledge (computational models) and empiricism (the search for evidence).
In view of these facts, cognitive musicology (also known as music cognition or computational musicology) has, over recent decades, progressively won more and more adherents interested in studying musical thought or, in other words, the musical habits of the mind (Huron, 1999). Being a branch of the cognitive sciences, cognitive musicology shares their interdisciplinary character, bringing together theories and methods developed by philosophy (e.g., theories of knowledge), psychology (e.g., experimentalism), the neurosciences (e.g., brain imaging) and computer science (e.g., simulation). The object of study of cognitive musicology is therefore the representation and processing (e.g., acquisition, storage, generation) of musical knowledge by the mind, for which it seeks support in computational models. With their aid, simulations seek to demonstrate theories about human cognitive processes. Obviously, the closer a model is to the characteristics of these processes, the closer it will be to achieving that purpose. It is known, however, that these models have not yet fully achieved the goal of evaluating and falsifying the theories they represent (Honing, 2006).

Being an intelligent activity, music offers abundant material for the investigation of human cognitive activities. In computer science, the area that explores intelligent behaviour is called artificial intelligence. In general terms, two paradigms are used. The first, symbolic models, explicitly represents the parts of the problem under analysis through a vocabulary of symbols corresponding to objects and/or concepts, and may hold a model of the world in which it operates (Wiggins & Smaill, 2000). Understanding the results of the system's operations is facilitated by the semantic correspondence with these symbols.
The second paradigm adopts a sub-symbolic approach, also known as connectionist. Connectionist systems organize and manipulate knowledge through so-called neural networks, a system of interconnected nodes (simple processors) that (loosely) simulates the connections between neurons in the brain. Since these processors have no explicit relation of meaning with real-world symbols, their operation is difficult to understand.

With these preliminary considerations made, the next sections present an overview of the computational models used by cognitive musicology. Some of them, as we shall see, are concerned with implementing theoretical models of human cognition and are therefore of direct interest to cognitive musicology. Others, however, adopt an "engineering" stance, oriented more towards the result (musical creation) than towards the description of those models. We chose to include the latter because of the interest they arouse and their many parallels with the former. Roughly speaking, the sections are organized to follow the flow of musical information, from the perception of sounds, through the acquisition and representation of knowledge, to processes of music generation. Before we begin presenting these models, the next section, "2. Experiments in Musical Intelligence", presents Experiments in Musical Intelligence, a system that has become a reference in the area, in order to give an overall view of how computational systems can exhibit intelligent behaviour. In the penultimate section, we close this overview with the Musical Interactive Environments (iMe) system, which explicitly adopts cognitive models to explore musical evolution.

2.
Experiments in Musical Intelligence

David Cope (1991) started the Experiments in Musical Intelligence (EMI) project around 30 years ago, aiming at the computational simulation of musical styles. The initial idea was to create a system that could incorporate the way he manipulated his own musical ideas. If at any moment he felt the need for help, because of a mental block for instance, the system could be used to generate automatically a number of new measures in the same way he would do personally. The initial implementations of this system encoded musical knowledge through part-writing rules. Cope reports that the results were not very satisfactory and that, "after much trial and error", the system produced only bland music that basically adhered to those rules (Cope, 1999, p. 21).

Building on this experience, and having overcome the first obstacles, Cope went on to face a series of other questions, such as the best way to segment the original pieces, or how the segments should be reorganized so that the music generated by the system made musical sense. Cope observed that composers tend to reuse certain structures throughout their works, and that these end up characterizing their musical styles. He found that these elements last between 2 and 5 beats (7 to 10 melodic notes), often combine melodic, harmonic and rhythmic structures, and normally occur four to ten times in a piece (Cope, 1999, p. 23). Cope called these recurring elements "signatures". At a later stage, Cope began to experiment with Bach chorales, segmenting them at each beat of the measures. The system analysed a body of musical pieces and extracted the signatures, which were then categorized into lexicons.
The system also stored the notes to which the voices moved from one beat of the measure to the next. Again, the results were unsatisfactory: the new pieces tended to wander, without a defined large-scale structure. The problem this time was that the logic of the musical phrases was not being observed. To solve it, information about global structures had to be incorporated, together with the rules for moving from one note to another. New analysis modules were added to EMI to preserve the location each segment occupied in the sections of the original pieces. The "character" of each beat, defined through elements such as rhythm and number of notes, also had to be preserved in order to guarantee that the music produced by the system conveyed a sense of continuity. Indeed, once these initial obstacles were overcome, the music the system went on to produce was quite convincing, especially when played by human musicians.

If, on the one hand, EMI is able to simulate certain musical styles, on the other, the architecture of the system as a whole is extremely complex. Briefly, the process begins with the preparation of a musical database, a manual, tedious and time-consuming task that depends entirely on the musical experience of the user. A series of similar pieces has to be chosen in order to guarantee that the final result is consistent. Key, tempo and metre must be considered in this analysis. Cope once mentioned that this initial phase, from the selection to the encoding of the pieces, took several months of work (Muscutt, 2007). Once the musical database is ready, EMI analyses the pieces and derives musical signatures and rules for composition. A thorough pattern-matching algorithm is applied to the input material and all the possibilities (partial or total results) are computed statistically.
All segments are marked for hierarchical structural and harmonic functions. The connectivity of the structures is also checked for melody, accompaniment and harmony. During recombination (the generation of new material), the signatures must survive, keeping their original form (intervallic relations) and local context. The global structure of one of the original compositions is used as a reference for the new pieces. The system fixes the signatures in their places of origin and then fills in the gaps based on the rules found during the statistical analysis. For this, the system uses an Augmented Transition Network (ATN) (Woods, 1970), a structure used in the definition of natural languages and "designed to produce logical sentences from phrase and piece fragments that have been stored according to sentence function" (Cope, 1991, p. 26)3. Finally, Cope listens to each of the pieces generated by the system and keeps those he considers most convincing; on average, one is kept for every four or five pieces discarded (Muscutt, 2007).

3. Musical perception

People are able to make generalizations and to learn elementary musical concepts (e.g., pitches, scales) from musical examples. This knowledge, once acquired, becomes the starting point for the appreciation of new musical pieces (Cambouropoulos, 1998, p. 31). Modelling human perception therefore involves the discovery of structures of different kinds and hierarchies. Cambouropoulos (1998) proposed a theoretical computational model named the General Computational Theory of Musical Structure (GCTMS), whose goal is precisely to describe the structural components of music.
The model aims to capture elements that would be recognized by a listener and consequently includes concepts typical of human cognitive abilities (e.g., abstraction, recognition of identities and/or similarities, and categorization). The GCTMS consists of a series of components that address each analytical task separately. One of them, the General Pitch Interval Representation (GPIR), encodes the musical information. The Local Boundary Detection Model (LBDM) is responsible for segmentation, and the Accentuation and Metrical Structure Models (AMSM) for the definition of structural models. According to the author, this system does not require the music to be previously annotated with structural-level elements; the input may consist simply of a sequence of symbolic events (notes, etc.) that the system translates into its internal representation.

Once the representation is obtained, the next step is the segmentation of the musical flow, for which the GCTMS takes into account principles of Gestalt psychology (Bod, 2001). The word Gestalt means "form" in German and carries the idea that the human senses are guided by the perception of the whole (e.g., a physical, psychological or symbolic entity) before the perception of the parts. Melodic groupings, for example, can be defined by similarity (ascending and/or descending movement) or proximity (occurrence of rests). These groupings are carried out by short-term memory4 (Snyder, 2000). Gestalt concepts have been adopted by several researchers (McAdams, 1984; Polansky, 1978; Tenney & Polansky, 1980). Deutsch (1982a, 1982b), for example, analyses how Gestalt rules can be applied to combinations of notes.

3 A deeper treatment of this topic is beyond the scope of this text. More information can be found in (Woods, 1970).
The Generative Theory of Tonal Music (GTTM) of Lerdahl and Jackendoff (1983) also uses Gestalt principles to define segments and groupings. Musical segmentation, a basic issue for many of the systems that explore music cognition and/or analysis, remains to a large extent, despite the many advances achieved so far, a hard problem, given the multitude of parameters (melody, rhythm, etc.) and levels of hierarchy to be considered. In many cases segments overlap, which indicates that several acceptable solutions may exist. To deal with these problems, several systems apply filters to simplify the input data. Instead of considering the pitch of the notes, for example, intervallic distances can be used (Deutsch, 1982b). Obviously, this and other strategies have the potential to compromise the segmentation results, something to be weighed case by case. Several systems (Baker, 1989a, 1989b; Camilleri, Carreras, & Duranti, 1990; Chouvel, 1990; Hasty, 1978) adopt different segmentation algorithms. The LBDM, mentioned above, builds a representation of intervals from the sequence of musical notes. It then tries to detect "perceptual discontinuities", or perceptual boundaries, through parameters such as duration (long/short notes) and melodic leaps.

4 In a simplified functional model, memory can be described through three processes (echoic memory, short-term memory and long-term memory) corresponding to different temporal levels of musical experience. Short-term memory processes events separated by more than 63 milliseconds (16 events per second), the level of melodic and rhythmic groupings (Snyder, 2000).

To discover the maximum points of local change, two rules are applied
(identity change and the proximity rule), inspired by the Gestalt principles of similarity and proximity. Based on this analysis, each pair of notes in a melody is assigned a discontinuity coefficient that determines the "strength of the division". Thom et al. (2002) presented a comprehensive review of several melodic segmentation algorithms, comparing their results with segmentations performed by musicians. Among the algorithms analysed are, besides the already mentioned LBDM, the Grouper system proposed by Temperley (2004), which is based on a set of preference rules (gaps, phrase length, metrical parallelism, etc.) adapted from the aforementioned GTTM (1983). The Implication-Realization model (Narmour, 1990), also inspired by Gestalt principles, involves the analysis of processes that occur in the perception of melodic structures. "Implication structures" are the expectations that guide musical perception and creation, corresponding to the stylistic influences received through exposure to musical contexts. These structures lead to "realization structures", which are archetypes for possible continuations of the implication structures.

Some researchers use the concept of intelligent agents to define criteria and implement segmentation algorithms. Gimenes (2008), for example, applies Gestalt concepts to a combination of "perceptual information" (e.g., melodic direction, melodic leap, the melodic interval between attacks, etc.) extracted by the "sensory organs" of intelligent agents. In the Cypher system (Rowe, 2004), different categories of agents specialize in different musical parameters (harmony, register, dynamics, etc.). After segmentation, once local structures have been defined, it is often necessary, depending on the analytical task at hand, to compare them. Establishing that two structures are equal is relatively easy.
Finding "similar" structures, on the other hand, is much harder. Aiming to contribute to the solution of this problem, Martins et al. (2005) proposed an algorithm to measure the similarity between sub-sequences in a general rhythmic space using a structure called the Vector of Similarity Coefficients. In this model, which can compare rhythmic structures of different sizes, all the sub-sequences of a given rhythm are compared. A hierarchical subdivision of the rhythm sequences is made at several levels, and a distance matrix for each level is computed using a measure known as the "block distance". The information about the similarity of the rhythmic sub-structures is then retrieved from the distance matrices and encoded into the Vector of Similarity Coefficients5.

4. Musical Knowledge

This section presents some computational systems concerned with the acquisition and storage of musical knowledge.

4.1 Rule-based systems

Rule-based systems, also known as expert or knowledge-based systems, try to explicitly encapsulate human expert knowledge in a given domain. In the case of music, the number of elements that must be handled efficiently to describe a musical piece is enormous, a fact that explains the many problems of this approach. An example of a rule-based musical system is CHORAL, proposed by Ebcioglu (1988). This system encodes about 350 rules for the harmonization of melodies in the style of Bach chorales, addressing aspects such as chord progressions and the melodic lines of the parts. Pachet (1998) proposed a system to explore harmonic variations in jazz chord sequences. One of these variations is the well-known "tritone substitution rule", according to which a dominant chord (ch1) can be replaced by another dominant chord (ch2) whose root is the augmented fourth (or tritone) of ch1.
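This kind of rule lends itself to a very small rewriting function over chord symbols. The sketch below is illustrative only (the pitch-class encoding and the function name are ours, not Pachet's implementation):

```python
# Pitch classes in semitones; the root of the substitute dominant chord lies
# a tritone (6 semitones) away from the root of the original dominant.
NOTES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def tritone_substitution(chord):
    """Rewrite a dominant chord 'X7' as the dominant a tritone away (ch1 -> ch2)."""
    if not chord.endswith("7"):
        return chord  # the rule only applies to dominant chords
    root = NOTES.index(chord[:-1])
    return NOTES[(root + 6) % 12] + "7"

# G7 -> Db7: the third and seventh of G7 (B, F) are the seventh and third of Db7.
print(tritone_substitution("G7"))   # Db7
print(tritone_substitution("C7"))   # Gb7
```

A rule-based harmonic system can be seen as a library of such rewriting rules, each with its own applicability condition.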
Esta substituição é possível uma vez que o terceiro e o sétimo graus de ch1 correspondem ao sétimo e ao terceiro graus de ch2. A Figura 1 abaixo mostra um acorde dominante de dó e a sua correspondente substituição pelo trítono. Figura 1: Substituição pelo trítono. Outra substituição de acordes muito utilizada também é aplicável aos acordes dominantes e consiste na preparação destes por acordes de sétima menor com base no segundo grau da escala local. Também é de Pachet (1994, p. 1) o sistema MusES, que tem como objetivo experimentar "várias técnicas de representação do conhecimento orientadas a objeto no campo da harmonia tonal". Este sistema faz análises de seqüências de acordes de jazz, assim como gera automaticamente harmonizações e improvisações. 5 O aprofundamento desse tema fugiria ao escopo deste texto. Maiores informações podem ser obtidas em (Martins et al., 2005). 4.2 Sistemas baseados em gramática A música, assim como a linguagem, é constituída por seqüências de estruturas ordenadas e pode, desse modo, ser descrita em termos gramaticais. Gramáticas são conjuntos finitos de regras que permitem a descrição de uma coleção potencialmente infinita de símbolos estruturados (Wiggins, 1998, p. 3). A Figura 2 mostra um exemplo simples de gramática. Figura 2: Exemplo de gramática. (SN: sintagma nominal, SV: sintagma verbal, A: artigo, S: substantivo, V: verbo) Assim, um segundo paradigma para a codificação do conhecimento musical são os sistemas baseados em gramática. Na realidade, sistemas baseados em conhecimento e sistemas gramaticais são muito semelhantes, uma vez que as duas abordagens são constituídas de regras e focam na forma que está sendo produzida (Wiggins, 1999, p. 4). O conhecido método de análise de Schenker adota princípios gramaticais (Forte, 1983; Marsden, 2007).
Em termos gerais, este método consiste em submeter uma música a uma série de reduções (e.g., progressões auxiliares e notas de passagem) até que uma estrutura elementar global ("ursatz") seja revelada. Abordagens semelhantes também são adotadas pela GTTM de Lerdahl e Jackendoff (Cambouropoulos, 1998). Neste caso, o objetivo é descrever os processos cognitivos envolvidos na música tonal em termos de agrupamentos (com base nos princípios da Gestalt), métrica, períodos de tempo e estruturas redutoras. As regras de Steedman (Wiggins, 1999) são um outro sistema baseado em gramática, que visa a captar estruturas musicais do jazz e de peças pop de blues de 12 compassos. Neste sistema, são considerados os processos mentais que levam à expectativa em progressões de jazz. 4.3 Aprendizagem de máquina Ao ter contato com a música, as pessoas começam a identificar naturalmente determinadas estruturas e regularidades. Se no futuro os mesmos elementos se repetirem, conexões com o material previamente aprendido irão surgir espontaneamente. Portanto, além das abordagens anteriormente mencionadas (sistemas baseados em regras e sistemas baseados em gramática), é possível também adquirir conhecimento através de indução, ou seja, inferindo regras gerais a partir de exemplos particulares. O objetivo de sistemas que usam essa técnica, que chamamos de aprendizagem de máquina, é fazer com que o computador "aprenda" a partir de um conjunto de dados de exemplo. O sistema extrai os padrões locais usando métodos probabilísticos e, ao gerar novas seqüências, essas probabilidades são utilizadas (Pachet, 2002a). Um caso particular de processo estocástico comumente adotado, o modelo conhecido como cadeias de Markov permite estabelecer as probabilidades de ocorrência de um estado futuro com base no estado atual.
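Uma cadeia de Markov de primeira ordem sobre notas pode ser esboçada em poucas linhas (esboço mínimo e ilustrativo, não a implementação de nenhum dos sistemas citados): treina-se contando as transições observadas e gera-se uma nova seqüência sorteando o próximo estado a partir do estado atual.

```python
import random
from collections import defaultdict

def treina_markov(sequencia):
    # Cadeia de Markov de primeira ordem: registra, para cada estado
    # (nota), os sucessores observados; repetições codificam a
    # probabilidade de transição.
    transicoes = defaultdict(list)
    for atual, proximo in zip(sequencia, sequencia[1:]):
        transicoes[atual].append(proximo)
    return transicoes

def gera(transicoes, inicio, n, rng=random):
    # Gera uma nova seqüência sorteando cada próximo estado com base
    # apenas no estado atual (sem memória de longo prazo).
    saida = [inicio]
    for _ in range(n - 1):
        candidatos = transicoes.get(saida[-1])
        if not candidatos:
            break
        saida.append(rng.choice(candidatos))
    return saida
```

Treinada sobre ['C', 'D', 'E', 'D', 'C'], por exemplo, a cadeia aprende que 'D' é seguido de 'E' ou de 'C' com igual probabilidade, e toda seqüência gerada respeita as transições observadas.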
Uma das desvantagens dos modelos de Markov é a ausência de informações de longo prazo (Pachet, 2002a) e, assim, a dificuldade de capturar a estrutura geral de peças musicais. Além disso, o tamanho do contexto musical tem uma implicação direta na eficiência dos algoritmos. Cadeias de Markov de baixa ordem não capturam eficientemente regras probabilísticas, enquanto que ordens superiores, apesar de capturar algumas estruturas de curto prazo (W. F. Walker, 1994), possuem um custo computacional importante (Assayag, Dubnov, & Delerue, 1999). A conhecida Suíte ILLIAC, de Hiller e Isaacson (1959), foi composta com o uso de cadeias de Markov. Xenakis usou a mesma técnica para as composições Analogique em fins dos anos 1950. Diversos sistemas mais recentes usam modelos probabilísticos para a modelagem do estilo musical (Cope, 2004; Pachet, 2003; Thom, 2000a; W. Walker, Hebel, Martirano, & Scaletti, 1992) e a improvisação de música interativa (Assayag, Bloch, Chemillier, Cont, & Dubnov, 2006; Pachet, 2003; Raphael, 1999; Thom, 2000b; Vercoe & Puckette, 1985), entre outras finalidades. Trivino-Rodriguez e Morales-Bueno (2001) usaram grafos de predição com atributos múltiplos para gerar novas músicas. O sistema iMe, introduzido por Gimenes (2007), utiliza técnicas estocásticas para a modelagem da memória e a geração de música. 4.3.1 Métodos e estruturas de dados Alguns métodos e estruturas de dados têm sido freqüentemente utilizados por sistemas baseados em aprendizagem de máquina. Pachet (2002b), por exemplo, usa árvores de prefixo para armazenar uma árvore ordenada de todas as sub-seqüências (ponderadas pelo seu número de ocorrências) de uma seqüência musical. Neste caso, os dados de entrada são simplificados armazenando-se reduções ao invés da seqüência inteira.
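A idéia de armazenar todas as sub-seqüências ponderadas pelo número de ocorrências pode ser esboçada assim (esboço ilustrativo: por simplicidade, a estrutura é representada como um dicionário de n-gramas em vez de uma árvore de prefixos explícita, e o nome da função é hipotético):

```python
def conta_subsequencias(sequencia, ordem_maxima=3):
    # Conta todas as sub-seqüências contíguas de comprimento 1 até
    # ordem_maxima, ponderadas pelo número de ocorrências — a mesma
    # informação que uma árvore de prefixos armazenaria nos seus nós.
    contagens = {}
    for i in range(len(sequencia)):
        for n in range(1, ordem_maxima + 1):
            if i + n > len(sequencia):
                break
            sub = tuple(sequencia[i:i + n])
            contagens[sub] = contagens.get(sub, 0) + 1
    return contagens
```

Na seqüência ['A', 'B', 'A', 'B'], por exemplo, a sub-seqüência ('A', 'B') ocorre duas vezes e ('B', 'A') apenas uma, e é essa ponderação que orienta a geração de continuações prováveis.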
No contexto de compressão sem perda, Jakob Ziv e Abraham Lempel propuseram o algoritmo de análise incremental (Dubnov, Assayag, Lartillot, & Bejerano, 2003), em que um dicionário de motivos é construído percorrendo-se uma seqüência de símbolos. Novas frases são adicionadas ao dicionário quando o algoritmo encontra uma seqüência que se diferencia das anteriores por um único caractere. Outra estrutura de dados é a Árvore de Previsão de Sufixo (Ron, Singer, & Tishby, 1996), que armazena cadeias de Markov de comprimento variável. Esta estrutura foi proposta por Rissanen (1983) a fim de superar as desvantagens (e.g., o crescimento do número de parâmetros) do modelo original de Markov. O algoritmo constrói um dicionário dos motivos que aparecem um número expressivo de vezes e que, portanto, são significativos para predizer o futuro imediato. Há, conseqüentemente, perda da informação original (Assayag & Dubnov, 2004, p. 1). Finalmente, uma outra estrutura, o Factor Oracle (FO), é um autômato que capta todos os fatores (sub-frases) de uma seqüência musical em uma série linear de elos de transição (ponteiros) (Dubnov & Assayag, 2005). Ponteiros para a frente (ou links de fator), no momento da geração, permitem a reconstrução das frases originais. Os ponteiros para trás (ou links de sufixo) acompanham as outras sub-frases que compartilham o mesmo sufixo e geram recombinações baseadas no contexto do material aprendido (Dubnov & Assayag, 2005). O sistema OMAX (Assayag et al., 2006) usa este modelo para a aprendizagem de estilo musical e a improvisação em tempo real6. 5. Processos generativos A criação musical pode ser vista como o resultado da interação entre representações do conhecimento musical e os processos generativos a elas associados. Um paradigma explorado nos primórdios da Inteligência Artificial (IA) foi a composição algorítmica (Hiller & Isaacson, 1959).
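Um exemplo clássico desse paradigma, discutido adiante, são os jogos de dados musicais do século XVIII: compassos pré-compostos são encadeados segundo o lançamento de dois dados. O procedimento pode ser esboçado assim (esboço ilustrativo; a estrutura da tabela e os nomes são hipotéticos):

```python
import random

def jogo_de_dados(tabela, rng=random):
    # Para cada posição da peça, a soma de dois dados (2 a 12)
    # escolhe um compasso pré-composto na coluna correspondente.
    # 'tabela' é uma lista de dicionários {soma_dos_dados: compasso}.
    peca = []
    for coluna in tabela:
        soma = rng.randint(1, 6) + rng.randint(1, 6)
        peca.append(coluna[soma])
    return peca
```

O algoritmo em si não contém conhecimento musical algum: toda a "musicalidade" está nos compassos pré-compostos, o que ilustra a diferença, apontada adiante, entre sistemas que mapeiam resultados de algoritmos genéricos e sistemas que incorporam conhecimento musical.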
Outros modelos incluem a computação evolutiva, os agentes inteligentes e os modelos de inspiração biológica (e.g., vida artificial, autômatos celulares e enxames). Muitas vezes, sistemas musicais complexos utilizam mais de um desses modelos. 5.1 Composição Algorítmica O uso de algoritmos na música é provavelmente tão antigo quanto a própria música. Cope afirma que seria impossível compor sem usar pelo menos alguns algoritmos: "aqueles que confiam amplamente em sua intuição para a música realmente usam algoritmos subconscientemente" (Muscutt, 2007, p. 20). Um algoritmo é simplesmente uma "receita passo a passo para alcançar um objetivo específico"; a música algorítmica, portanto, pode ser considerada como "uma receita passo a passo para a criação de novas composições" (Muscutt, 2007, p. 10). 6 Uma explanação mais aprofundada sobre esses algoritmos pode ser encontrada em (Dubnov et al., 2003) e (Dubnov & Assayag, 2005). Um exemplo famoso de composição algorítmica são os Jogos Musicais de Dados (Musikalisches Würfelspiel) atribuídos a Mozart. O processo consiste em se criar segmentos musicais que depois são utilizados em composições onde a ordem das seqüências é determinada pelo lançamento dos dados (Cope, 1991). A Suíte ILLIAC, mencionada acima, é conhecida por ter sido a primeira peça musical gerada por computador. Mesmo que os computadores não sejam um pré-requisito para a composição algorítmica, eles facilitam muito sua execução (Muscutt, 2007). Diversos sistemas musicais algorítmicos não necessariamente focam na música, mas simplesmente mapeiam ou fazem associações entre o resultado de algoritmos genéricos e parâmetros musicais. Esses sistemas devem ser diferenciados daqueles que incorporam conhecimento musical (Miranda, 2002b), de maior interesse para a musicologia cognitiva. 5.2 Computação Evolutiva As principais proposições teóricas sobre as origens e a evolução das espécies foram introduzidas durante o século XIX.
Lamarck (Packard, 2007) sugeriu inicialmente que os indivíduos teriam a capacidade de se adaptar ao ambiente e que os resultados dessa adaptação poderiam ser transmitidos de pais para filhos. Para Darwin (1998), indivíduos com características favoráveis em relação ao seu ambiente teriam mais chances de sobreviver, se comparados a indivíduos com traços menos favoráveis. Por este motivo, após uma série de gerações, a população de indivíduos com características favoráveis cresceria e seria mais adaptada ao ambiente. Eventualmente, após diversas gerações, as diferenças seriam tão significativas que resultariam em novas espécies. As idéias que fundamentam os modelos evolutivos são a adaptação, a transmissão e a sobrevivência do mais apto. Sabemos que os genes (Mendel, 1865) permitem a transmissão de características particulares, mas a evolução "ocorre quando um processo de transformação cria variantes de algum tipo de informação. Normalmente, há um mecanismo que favorece a melhor transformação e descarta aquelas que são consideradas inferiores, de acordo com determinados critérios" (Miranda, 1999, p. 8). Um número crescente de pesquisadores está desenvolvendo modelos computacionais para estudar a evolução musical. Miranda (2003) estudou as origens e a evolução da música "no contexto das convenções culturais que podem emergir sob uma série de restrições (por exemplo, psicológicas, fisiológicas e ecológicas)". Em seu sistema, uma comunidade de agentes evolui um conjunto de melodias (canções) "após um período de criação espontânea, adaptação e reforço de memória" (Miranda et al., 2003, p. 94). Para atingir esta meta, os agentes possuem habilidades motoras, auditivas e cognitivas e evoluem vetores de parâmetros de controle motor imitando as canções uns dos outros. Todd e Werner (1999) modelaram a pressão de acasalamento seletivo nas origens do gosto musical, onde uma sociedade evolui canções de acasalamento através de "machos" compositores e "fêmeas" críticas.
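Os mecanismos citados (variação, transmissão e sobrevivência do mais apto) podem ser esboçados em um modelo evolutivo mínimo de melodias. O esboço abaixo é hipotético e muito simplificado em relação aos sistemas citados: uma população de seqüências de notas é recombinada e mutada a cada geração, e uma função de aptidão (o "crítico") decide quem sobrevive.

```python
import random

def evolui_melodias(aptidao, comprimento, alfabeto,
                    tam_pop=30, geracoes=50, p_mut=0.1, rng=random):
    # Esboço ilustrativo: população inicial aleatória; a cada geração,
    # a metade mais apta sobrevive (seleção) e gera filhos por
    # recombinação de um ponto (transmissão) e mutação (variação).
    pop = [[rng.choice(alfabeto) for _ in range(comprimento)]
           for _ in range(tam_pop)]
    for _ in range(geracoes):
        pop.sort(key=aptidao, reverse=True)
        sobreviventes = pop[:tam_pop // 2]
        filhos = []
        while len(sobreviventes) + len(filhos) < tam_pop:
            pai, mae = rng.sample(sobreviventes, 2)
            corte = rng.randrange(1, comprimento)
            filho = pai[:corte] + mae[corte:]
            filho = [rng.choice(alfabeto) if rng.random() < p_mut else g
                     for g in filho]
            filhos.append(filho)
        pop = sobreviventes + filhos
    return max(pop, key=aptidao)
```

A função de aptidão pode ser, por exemplo, o número de notas coincidentes com uma melodia-alvo; nos sistemas discutidos a seguir, é justamente a definição dessa função que concentra a dificuldade.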
5.2.1 Algoritmos genéticos Algoritmos Genéticos (AGs), um caso particular em computação evolutiva (Holland, 1992), são uma técnica de busca inspirada por alguns dos conceitos (e.g., herança, mutação, seleção) da teoria de Darwin sobre a evolução pela seleção natural. Os AGs têm sido utilizados em muitas aplicações musicais (Brown, 1999; Horowitz, 1994; Jacob, 1995; McIntyre, 1994; Moroni, Manzolli, Zuben, & Gudwin, 2000; Tokui & Iba, 2000; Weinberg, Godfrey, Rae, & Rhoads, 2007) em diferentes contextos, em especial para gerar material de composição e improvisação. Grosso modo, um AG envolve a geração sucessiva de populações de cromossomos que representam o domínio a ser explorado. A cada geração, a população anterior de cromossomos é transformada por uma série de operadores (mutação, crossover, etc.) e uma função de aptidão avalia a adequação dos novos candidatos para uma determinada solução. De uma geração para outra, apenas os candidatos mais aptos sobrevivem (Figura 3). Figura 3: Algoritmo genético. Criar uma função de aptidão adequada, porém, não é uma tarefa fácil. No sistema GenJam (Biles, 1999), por exemplo, a função de aptidão é executada pelo operador humano, que avalia cada candidato recém-gerado. Esta abordagem, conhecida como Algoritmo Genético Interativo (AGI), apresenta um sério problema, pois o número de candidatos gerados é normalmente grande. No sistema Vox Populi (Moroni et al., 2000), outro AGI, a função de aptidão é controlada em tempo real pelo usuário através de uma interface gráfica. Em qualquer caso, a seleção dos candidatos mais aptos se baseia no julgamento (experiência musical prévia, etc.) do controlador humano. Biles (1994) define GenJam (abreviação de Genetic Jammer) como um estudante aprendendo a improvisar solos de jazz. Este sistema integra um conversor de áudio para MIDI, o que permite improvisações e "trading fours"7 em tempo real com um instrumento monofônico.
Neste modo, GenJam ouve os últimos quatro compassos tocados pelo ser humano, mapeando-os para a sua representação cromossômica. Em seguida, os cromossomos são modificados e o resultado é tocado durante os quatro compassos seguintes (Biles, 1998, p. 1). Na realidade, a adequação dos sistemas baseados em AG para a musicologia cognitiva é muito limitada, já que de nenhuma maneira simulam o comportamento cognitivo humano. Nas palavras de Wiggins (1999, p. 12), "... eles carecem de estrutura em seu raciocínio - compositores desenvolveram métodos complexos e sutis ao longo de séculos que envolvem diferentes técnicas para resolver os problemas abordados aqui. Ninguém poderia seriamente sugerir que um autor de hinos trabalha da mesma forma que um AG, por isso, enquanto podem produzir resultados (quase) aceitáveis, não esclarecem em nada o funcionamento da mente do compositor". 5.3 Agentes Inteligentes Agentes inteligentes (Figura 4), também conhecidos como agentes racionais, autônomos ou de software (Jones, 2008; Russell & Norvig, 2002), são sistemas adaptativos que residem em um ambiente dinâmico e complexo, no qual sentem e agem de forma autônoma, executando uma série de tarefas a fim de atingir os objetivos para os quais foram concebidos (Maes, 1991; Russell & Norvig, 2002). 7 Modo de execução no jazz em que os músicos se revezam improvisando trechos de quatro compassos. Figura 4: Arquitetura de um agente. Miranda e Todd (2003) identificaram três abordagens para a construção de sistemas baseados em agentes para a composição: (i) a conversão de comportamento extra-musical em som, (ii) algoritmos de inspiração genética e (iii) sistemas culturais. Uma perspectiva em que os agentes não necessariamente realizam tarefas musicais exemplifica o primeiro caso, em que alguns aspectos do seu comportamento (como movimentar-se dentro de um espaço definido, etc.) são mapeados para o som.
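O ciclo sentir-agir da arquitetura de agente (Figura 4) pode ser esboçado em uma classe mínima (esboço ilustrativo; a "política" adotada é hipotética e trivial):

```python
class Agente:
    # Agente mínimo: percebe o ambiente, atualiza um estado interno
    # (memória) e escolhe uma ação de forma autônoma.
    def __init__(self):
        self.memoria = []

    def sentir(self, percepcao):
        # Armazena o que foi percebido do ambiente (e.g., uma nota).
        self.memoria.append(percepcao)

    def agir(self):
        # Política trivial (hipotética): imita o último som percebido.
        return self.memoria[-1] if self.memoria else None
```

Sistemas reais substituem essa política trivial por mecanismos de aprendizagem e de decisão bem mais elaborados, mas o laço perceber-atualizar-agir permanece o mesmo.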
A interação afeta o comportamento dos agentes e as músicas que eles produzem, mas essa música, por outro lado, não necessariamente afeta o seu comportamento. Na segunda abordagem (algoritmos de inspiração genética), os agentes reproduzem artificialmente os mecanismos da teoria da evolução pela seleção natural de Darwin. A sobrevivência e a reprodução dos agentes dependem da música que eles produzem e o sistema como um todo tenderia a produzir mais "músicas de sucesso". A terceira e última abordagem utiliza agentes virtuais e processos de auto-organização para modelar sistemas culturais onde mecanismos de reforço evoluem as habilidades dos agentes. Apenas esta última abordagem permitiria o "estudo das circunstâncias e dos mecanismos pelos quais a música teria surgido e evoluído em comunidades virtuais de músicos e ouvintes" (2003, p. 1). Sistemas em que atua mais de um agente são chamados de multi-agentes (Wulfhorst, Nakayama, & Vicari, 2003) e são muitas vezes usados em simulações de interações sociais. Para Miranda (1999, p. 5), a linguagem e a música devem ser encaradas como um fenômeno cultural que emerge de interações sociais e não como um recurso completo que surge no nascimento de um bebê. Em um mesmo sistema podem existir diversos tipos de agentes, cada um especializado em uma habilidade específica, como ocorre na Sociedade da Mente proposta por Minsky (1988). Outra possibilidade é que todos possuam as mesmas habilidades. No campo musical, Cypher (Rowe, 2004) adota a primeira abordagem, enquanto os músicos virtuais de Miranda (2002b) e os agentes do iMe (Gimenes & Miranda, 2008) adotam a segunda. Além dos sistemas acima mencionados, muitos outros sistemas são baseados em agentes. OMAX, por exemplo, modela uma topologia de agentes interativos com foco em diferentes habilidades (ouvintes, fatiadores, alunos, etc.) (Assayag et al., 2006).
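A emergência de convenções a partir de interações sociais entre agentes pode ser ilustrada com uma simulação mínima (esboço hipotético, sem relação direta com os sistemas citados): a cada ciclo, um agente "canta" uma melodia de sua memória e outro a memoriza, e um repertório compartilhado tende a se formar.

```python
import random

def simula_imitacao(n_agentes, repertorio_inicial, ciclos, rng=random):
    # Cada agente começa com uma única melodia sorteada; a cada ciclo,
    # um par (cantor, ouvinte) é escolhido e o ouvinte memoriza o que
    # ouviu, caso ainda não o conheça.
    memorias = [[rng.choice(repertorio_inicial)] for _ in range(n_agentes)]
    for _ in range(ciclos):
        cantor, ouvinte = rng.sample(range(n_agentes), 2)
        melodia = rng.choice(memorias[cantor])
        if melodia not in memorias[ouvinte]:
            memorias[ouvinte].append(melodia)
    return memorias
```

Nenhum agente "decide" formar um repertório comum: a convergência emerge das interações locais, que é exatamente o tipo de fenômeno que os sistemas culturais discutidos acima procuram estudar.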
Frank (Casal & Morelli, 2007) usa técnicas de MPEG7 e de co-evolução genética, juntamente com agentes artificiais, em performances ao vivo. Wulfhorst et al (2003) apresentaram uma arquitetura genérica de um sistema multi-agentes que interagem com músicos humanos. Impett (2001) utiliza um sistema para gerar composições musicais onde os agentes se adaptam às mudanças do ambiente em que residem. Pachet (2000) usa agentes em um contexto evolutivo para fazer emergir formas rítmicas em simulações em tempo real. O modelo mimético de Miranda (2002b) utiliza agentes inteligentes para incorporar mecanismos de evolução musical. Mimese seria a habilidade de imitar as ações de outras pessoas e animais. A hipótese é de que esta característica seria uma das chaves para o surgimento da música em uma sociedade virtual (Miranda, 2002b, p. 79). Todos os agentes são capazes de ouvir e produzir sons (sintetizador vocal), além de guardar associações entre os parâmetros motores e perceptivos (memória). Como são programados para imitar uns aos outros, após algum tempo, um repertório compartilhado de melodias é criado. 5.4 Modelos biologicamente inspirados Além dos algoritmos genéticos e dos agentes, outros modelos, tais como a vida artificial (a-life), os autômatos celulares e os enxames, procuram inspiração em fenômenos biológicos para abordar a criatividade musical. 5.4.1 Os modelos da vida artificial Os sistemas baseados na vida artificial tentam replicar fenômenos biológicos através de simulações em computador (Miranda, 2003) e lidam com conceitos (e.g., as origens dos organismos vivos, o comportamento emergente e a auto-organização) que buscam esclarecer a gênese e a evolução da música. Miranda e Todd (2003, p. 6) observam que talvez a aplicação mais interessante de técnicas de a-life "seja o estudo das circunstâncias e dos mecanismos pelos quais a música teria surgido e evoluído em mundos artificiais habitados por comunidades virtuais de músicos e ouvintes".
Alguns estudiosos têm abordado esta questão ao longo da história (Thomas, 1995; Wallin, Merker, & Brown, 2000), embora modelos computacionais não tenham sido freqüentemente utilizados para a validação teórica. 5.4.2 Autômatos celulares Autômatos celulares consistem em uma rede multidimensional de células, cada uma possuindo, em um determinado momento, um estado dentre uma série de estados possíveis. Uma função determina a evolução destes estados em passos de tempo discretos. Figura 5: Autômato celular. No conhecido Jogo da Vida de Conway, por exemplo, o estado das células (viva ou morta) é determinado pelo estado de seus vizinhos. Em cada ciclo de tempo todas as células são avaliadas e seus estados alterados de acordo com um conjunto de regras (Tabela 1). A configuração inicial das células afeta a dinâmica do sistema e pode permitir a emergência de comportamentos interessantes, especialmente no domínio visual.

Tempo t | Condição                 | Tempo t + 1
morta   | 3 vizinhos vivos         | viva
viva    | 4 ou mais vizinhos vivos | morta
viva    | 1 ou 0 vizinhos vivos    | morta
viva    | 2 ou 3 vizinhos vivos    | viva

Tabela 1: Regras de evolução do Jogo da Vida de Conway. Diversos sistemas têm usado autômatos celulares. Chaosynth (Miranda, 2002a) usa essa técnica para controlar um sintetizador de síntese granular. Camus (Miranda, 2001) utiliza dois autômatos celulares simultâneos - o Jogo da Vida e o Demon Cyclic Space (Griffeath & Moore, 2003) - para gerar estruturas musicais (acordes, melodias, etc.). 5.4.3 Enxames Os elementos individuais de um sistema auto-organizado podem se comunicar uns com os outros e modificar seu meio ambiente através de um método conhecido como estigmergia. O comportamento de cada elemento do sistema não é suficiente para determinar a organização do sistema como um todo. Por outro lado, esta organização resulta (emerge) daquele comportamento. Esse fenômeno ocorre, por exemplo, com enxames de abelhas e bandos de pássaros.
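As regras da Tabela 1, acima, cabem em poucas linhas de Python (esboço mínimo, representando apenas o conjunto de células vivas):

```python
def passo_jogo_da_vida(vivas):
    # Um passo do Jogo da Vida de Conway (regras da Tabela 1).
    # 'vivas' é um conjunto de coordenadas (x, y) de células vivas.
    vizinhos = {}
    for (x, y) in vivas:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    v = (x + dx, y + dy)
                    vizinhos[v] = vizinhos.get(v, 0) + 1
    novas = set()
    for celula, n in vizinhos.items():
        # Nasce com 3 vizinhos vivos; sobrevive com 2 ou 3;
        # morre por isolamento (0-1) ou superpopulação (4+).
        if n == 3 or (n == 2 and celula in vivas):
            novas.add(celula)
    return novas
```

Um "blinker" (três células em linha), por exemplo, oscila entre a orientação horizontal e a vertical a cada passo, ilustrando a dinâmica que sistemas como Camus mapeiam para estruturas musicais.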
Um sistema que se baseia no conceito de enxames é o Swarm Granulator (Blackwell, 2006). Um ser humano toca um instrumento musical, o que produz atratores em torno dos quais gravitam partículas artificiais. As regras seguidas por estas partículas são simples e envolvem os conceitos de coesão ("se separados, aproximem-se"), separação ("se muito pertos, afastem-se") e alinhamento ("tentativa de igualar as velocidades") (Blackwell, 2006, p. 4). O comportamento do sistema, que é mapeado em parâmetros de som, emerge destas interações entre as partículas. 6. Ambientes Interativos Musicais O último dos sistemas abordados neste artigo, os Ambientes Interativos Musicais (Musical Interactive Environments - iMe), adota diversas das técnicas mencionadas acima. Trata-se de um sistema interativo musical que tem como objetivo principal explorar a evolução da música tendo como referência a transmissão de memes (estruturas) musicais e, conseqüentemente, as faculdades perceptivas e cognitivas dos seres humanos. Este sistema segue as condições do Modelo Ontomemético de Evolução Musical (Ontomemetical Model of Music Evolution - OMME), que se baseia nas noções de ontogênese e de memética8. No sistema iMe, especialmente concebido para abordar a interatividade sob um ponto de vista improvisacional, agentes executam atividades inspiradas no mundo real (ouvir, executar, praticar, improvisar-solo, improvisar-grupo, ler e compor música) e se comunicam entre si e com o mundo exterior. O resultado dessas atividades é que a memória dos agentes é constantemente alterada e, conseqüentemente, seus estilos musicais evoluem. O sistema utiliza o protocolo de comunicação MIDI para a troca de mensagens entre os agentes e entre estes e o mundo exterior, a partir do qual os agentes extraem a representação musical simbólica necessária para as interações.
Esta representação possui paralelos com os modelos perceptivos e cognitivos humanos, ou seja, com a forma como os sons são captados pelos ouvidos, processados e armazenados pela memória (Snyder, 2000). Os "ouvidos" dos agentes são equipados com uma série de filtros responsáveis pela extração de características particulares do fluxo sonoro, tais como o aumento e/ou a diminuição da freqüência sonora (direção da melodia) ou a densidade musical (número de notas simultâneas). A segmentação implementada no iMe inspira-se em princípios da psicologia da Gestalt. Em linhas gerais, o algoritmo de segmentação simula o fenômeno da habituação, ou seja, dado que um sinal (determinada característica do fluxo sonoro) permanece estável durante algum tempo, o seu interesse (atenção) decai. Enquanto os agentes percebem o fluxo sonoro, a repetição do mesmo sinal resulta em uma falta de interesse, enquanto que uma mudança de comportamento desse sinal, depois de um certo número de repetições, desperta sua atenção. 8 Maiores detalhes sobre esse modelo podem ser encontrados em (Gimenes, 2010). Em (Gimenes, 2009) são descritas com detalhe algumas das possibilidades do OMME implementadas pelo sistema iMe, especialmente na área da musicologia cognitiva. É possível, por exemplo, que, em um determinado cenário, um agente ouça uma peça e um outro agente ouça uma outra peça. Ao final da simulação, a diferença do conhecimento musical dos agentes irá corresponder à diferença dos estilos musicais entre as peças. Uma outra área explorada pelo sistema iMe é a criatividade musical, visando, mais especificamente, contribuir para a construção da "musicalidade das máquinas" e para a interação entre máquinas e seres humanos. Uma performance pública, realizada durante o Peninsula Arts Contemporary Music Festival em fevereiro de 2008 na Universidade de Plymouth (Reino Unido), demonstrou essa possibilidade. 7.
Conclusão Este artigo apresentou o estado da arte dos modelos computacionais com aplicações para a musicologia cognitiva. De várias maneiras, estes modelos procuram descrever como os seres humanos lidam com questões como percepção, representação do conhecimento, aprendizagem, criatividade e raciocínio. Diversas abordagens foram apresentadas. Sistemas baseados em regras encapsulam o conhecimento especialista humano através de regras explícitas, enquanto que os sistemas baseados em gramática definem um conjunto finito de regras que descrevem a estrutura desse conhecimento. Sistemas que usam aprendizagem de máquina, por outro lado, tentam reproduzir os processos de aquisição do conhecimento humano. Além desses, modelos baseados na computação evolutiva (e.g., algoritmos genéticos) e na vida artificial tentam replicar fenômenos biológicos através de simulações computacionais, e analisam temas como as origens dos organismos vivos, o comportamento emergente e a auto-organização. Como vimos, alguns desses modelos se preocupam em implementar teorias que versam sobre a cognição humana e, portanto, interessam diretamente à musicologia cognitiva. Outros, contudo, são mais voltados a um resultado sonoro do que propriamente à descrição desses modelos. Estes últimos foram incluídos pelo interesse que despertam e por possuírem muitos paralelos com os primeiros. Bibliografia Assayag, G., Bloch, G., Chemillier, M., Cont, A., & Dubnov, S. (2006). Omax Brothers: a Dynamic Topology of Agents for Improvisation Learning. Workshop on Audio and Music Computing for Multimedia, Santa Barbara, EUA. Assayag, G., & Dubnov, S. (2004). Using Factor Oracles for machine Improvisation. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 8(9), 604-610. Assayag, G., Dubnov, S., & Delerue, O. (1999). Guessing the Composer's Mind: Applying Universal Prediction to Musical Style. International Computer Music Conference, Beijing, China. Baker, M. (1989a).
An Artificial Intelligence approach to musical grouping analysis. Contemporary Music Review, 3(1), 43-68. Baker, M. (1989b). A cognitive model for the perception of musical grouping structures. Contemporary Music Review (Music and the Cognitive Sciences). Biles, J. A. (1994). GenJam: A Genetic Algorithm for Generating Jazz Solos. International Computer Music Conference, Aarhus, Denmark. Biles, J. A. (1998). Interactive GenJam: Integrating Real-Time Performance with a Genetic Algorithm. International Computer Music Conference, Univ. of Michigan, Ann Arbor, EUA. Biles, J. A. (1999). Life with GenJam: Interacting with a Musical IGA. International Conference on Systems, Man, and Cybernetics, Tokyo, Japan. Blackwell, T. (2006). Swarming and Music. In E. Miranda & J. A. Biles (Org.), Evolutionary Computer Music. London: Springer. Bod, R. (2001). A Memory-Based Model For Music Analysis: Challenging The Gestalt Principles. International Computer Music Conference, Havana, Cuba. Brown, C. (1999). Talking Drum: A Local Area Network Music Installation. Leonardo Music Journal, 9, 23-28. Cambouropoulos, E. (1998). Towards a General Computational Theory of Musical Structure. University of Edinburgh, Edinburgh. Camilleri, L., Carreras, F., & Duranti, C. (1990). An Expert System Prototype for the Study of Musical Segmentation. Interface, 19(2-3), 147-154. Casal, D. P., & Morelli, D. (2007). Remembering the future: towards an application of genetic co-evolution in music improvisation. MusicAL Workshop, European Conference on Artificial Life, Lisboa, Portugal. Chouvel, J. M. (1990). Musical Form: From a Model of Hearing to an Analytic Procedure. Interface, 22, 99-117. Cope, D. (1991). Recombinant Music: Using the Computer to Explore Musical Style. Computer, 24(7), 22-28. Cope, D. (1999). One approach to musical intelligence. IEEE Intelligent Systems, 14(3), 21-25. Cope, D. (2004). A Musical Learning Algorithm. Computer Music Journal, 28(3), 12-27. Darwin, C. (1998).
The Origin of Species (new ed.): Wordsworth Editions Ltd. Deutsch, D. (1982a). Grouping Mechanisms in Music. In D. Deutsch (Org.), The Psychology of Music. Nova York: Academic Press. Deutsch, D. (1982b). The Processing of Pitch Combinations. In D. Deutsch (Org.), The Psychology of Music. Nova York: Academic Press. Dubnov, S., & Assayag, G. (2005). Improvisation Planning And Jam Session Design Using Concepts Of Sequence Variation And Flow Experience. Sound and Music Computing, Salerno, Italia. Dubnov, S., Assayag, G., Lartillot, O., & Bejerano, G. (2003). Using Machine-Learning Methods for Musical Style Modeling. IEEE Computer, 10(38), 73-80. Ebcioglu, K. (1988). An expert system for harmonizing four-part chorales. Computer Music Journal, 12(3), 43-51. Forte, A. (1983). Introduction to Schenkerian Analysis: Form and Content in Tonal Music: W. W. Norton & Company. Gimenes, M. (2010). A Ontomemética e a Evolução Musical. VI Simpósio de Cognição e Artes Musicais, Rio de Janeiro. Gimenes, M., & Miranda, E. (2008). An A-Life Approach to Machine Learning of Musical Worldviews for Improvisation Systems. 5th Sound and Music Computing Conference, Berlin, Germany. Gimenes, M., Miranda, E., & Johnson, C. (2007). The Emergent Musical Environments: An Artificial Life Approach. Workshop on Music and Artificial Life (ECAL), Lisboa, Portugal. Griffeath, D., & Moore, C. (2003). New Directions in Cellular Automata: Oxford University Press. Hasty, C. F. (1978). A theory of segmentation developed from late works of Stefan Wolpe. Yale University. Hiller, L., & Isaacson, L. (1959). Experimental Music. Nova York: McGraw-Hill. Holland, J. H. (1992). Adaptation in Natural and Artificial Systems - An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence: The MIT Press. Honing, H. (2006). On the growing role of observation, formalization and experimental method in musicology. Empirical Musicology Review, 1(1), 2-6. Horowitz, D. (1994).
Generating Rhythms with Genetic Algorithms. International Computer Music Conference, Aarhus, Denmark. Huron, D. (Producer). (1999, 27/03/2007) The 1999 Ernest Bloch Lectures. Lecture 1 - Music and Mind: Foundations of Cognitive Musicology. Retrieved from http://www.music-cog.ohiostate.edu/Music220/Bloch.lectures/1.Preamble.html Impett, J. (2001). Interaction, simulation and invention: a model for interactive music. Workshop on Artificial Models for Musical Applications, Cosenza, Italia. Jacob, B. L. (1995). Composing with genetic algorithms. International Computer Music Conference, Banff Centre for the Arts, Canada. Jones, M. T. (2008). Artificial Intelligence - A System Approach. Hingham, Massachusetts: Infinity Science Press. Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music. Cambridge, Mass.: MIT Press. Maes, P. (1991). Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back: MIT Press. Marsden, A. (2007). Automatic Derivation Of Musical Structure: A Tool For Research On Schenkerian Analysis. International Conference on Music Information Retrieval, Viena, Austria. Martins, J., Gimenes, M., Manzolli, J., & Maia Jr, A. (2005). Similarity Measures for Rhythmic Sequences. Simpósio Brasileiro de Computação Musical, Belo Horizonte, Brasil. McAdams, S. (1984). The auditory Image: A metaphor for musical and psychological research on auditory organisation. In W. R. Crozier & A. J. Chapman (Org.), Cognitive Processes in the Perception of Art. Amsterdam: North-Holland Press. McIntyre, R. A. (1994). Bach in a box: the evolution of four part Baroque harmony using the genetic algorithm. IEEE World Congress on Computational Intelligence, Orlando, EUA. Mendel, G. (1865). Experiments on Plant Hybridization. Paper presented at the Meetings of the Natural History Society of Brünn. Retrieved from http://www.mendelweb.org/Mendel.html Minsky, M. (1988). The Society of Mind: Pocket Books. Miranda, E. (1999).
The artificial life route to the origins of music. Scientia, 10(1), 5-33. Miranda, E. (2001). Composing music with computers. Oxford: Focal Press. Miranda, E. (2002a). Computer sound design: synthesis techniques and programming (2nd ed.). Oxford: Focal Press. 48" NICS Reports Miranda, E. (2002b). Emergent Sound Repertoires in Virtual Societies. Computer Music Journal, 26(2), 77-90. Miranda, E. (2003). On the evolution of music in a society of self-taught digital creatures. Digital Creativity, 14(1), 29-42. Miranda, E., Kirby, S., & Todd, P. (2003). On Computational Models of the Evolution fo Music: From the Origins of Musical Taste to the Emergence of Grammars. Contemporary Music Review, 22(2), 91-111. Miranda, E., & Todd, P. M. (2003). A-Life and Musical Composition: A Brief Survey. Simpósio Brasileiro de Computação Musical, Campinas, Brasil. Moroni, A., Manzolli, J., Zuben, F. V., & Gudwin, R. (2000). Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition. Leonardo Music Journal, 10, 49-54. Muscutt, K. (2007). Composing with Algorithms An Interview with David Cope. Computer Music Journal, 31(3), 10-22. Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: University Of Chicago Press. Pachet, F. (1994). The MusES system: an environment for experimenting with knowledge representation techniques in tonal harmony. Simpósio Brasileiro de Computação Musical, Baxambu, Brasil. Pachet, F. (1998). Sur la structure algebrique des sequences d'accords de Jazz. Journees d'Informatique Musicale, Agelonde, France. Pachet, F. (2000). Rhythms as emerging structures. International Computer Music Conference, Berlin, Germany. Pachet, F. (2002a). The continuator: Musical interaction with style. International Computer Music Conference, Gothenburg, Sweden. Pachet, F. (2002b). Interacting with a Musical Learning System: The Continuator. In C. Anagnostopoulou, M. Ferrand & A. 
Smaill (Org.), Music and Artificial Intelligence, Lecture Notes in Artificial Intelligence (Vol. 2445, pp. 119132): Springer Verlag. Pachet, F. (2003). The Continuator: Musical Interaction With Style. Journal of New Music Research, 32(3), 333-341. Packard, A. S. (2007). Lamarck, the Founder of Evolution: His Life and Work: Dodo Press. Polansky, L. (1978). A hierarchical gestalt analysis of Ruggle's Portals. International Computer Music Conference, Evanston, EUA. 49" NICS Reports Raphael, C. (1999). A Probabilistic Expert System for Automatic Musical Accompaniment. Journal of Computational and Graphical Statistics, 10(3), 487512. Rissanen, J. (1983). A universal data compression system. IEEE Transactions on Information Theory, 29(5), 656-664. Ron, D., Singer, Y., & Tishby, N. (1996). The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length. Machine Learning, 25, 117-149. Rowe, R. (2004). Machine Musicianship. Cambridge, MA: MIT Press. Russell, S. J., & Norvig, P. (2002). Artificial Intelligence: A Modern Approach: Prentice Hall. Snyder, B. (2000). Music and Memory: An Introduction. Cambridge, MA: MIT Press. Temperley, D. (2004). Cognition of Basic Musical Structures: The MIT Press. Tenney, J., & Polansky, L. (1980). Temporal Gestalt Perception in Music. Journal of Music Theory, 24(2), 205-241. Thom, B. (2000a). BoB: an Interactive Improvisational Music Companion. International Conference on Autonomous Agents, Barcelona, Spain. Thom, B. (2000b). Unsupervised Learning and Interactive Jazz/Blues Improvisation. Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence. Thom, B., Spevak, C., & Hothker, K. (2002). Melodic segmentation: evaluating the performance of algorithms and musical experts. International Computer Music Conference, Gothenburg, Sweden. Thomas, D. A. (1995). Music and the Origins of Language. Cambridge: Cambridge University Press. Todd, P. 
M., & Werner, G. M. (1999). Frankensteinian Methods for Evolutionary Music Composition. In N. Griffith & P. M. Todd (Org.), Musical networks: Parallel distributed perception and performance. Cambridge, MA: MIT Press/Bradford Books. Tokui, N., & Iba, H. (2000). Music Composition with Interactive Evolutionary Computation. International Generative Art, Milan, Italia. Trivino-Rodriguez, J. L., & Morales-Bueno, R. (2001). Using Multiattribute Prediction Suffix Graphs to Predict and Generate Music. Computer Music Journal, 25(3), 62-79. 50" NICS Reports Vercoe, B., & Puckette, M. (1985). The synthetic rehearsal: Training the synthetic performer. International Computer Music Conference, Vancouver, Canada. Walker, W., Hebel, K., Martirano, S., & Scaletti, C. (1992). ImprovisationBuilder: improvisation as conversation. International Computer Music Conference, San Jose State University, EUA Walker, W. F. (1994). A Conversation-Based Framework For Musical Improvisation. University of Illinois. Wallin, N. J., Merker, B., & Brown, S. (Eds.). (2000). The Origins of Music. Cambridge, MA: MIT Press. Weinberg, G., Godfrey, M., Rae, A., & Rhoads, J. (2007). A Real-Time Genetic Algorithm In Human-Robot Musical Improvisation. CMMR, Copenhagen. Wiggins, G. (1998). Music, syntax and the meaning of "meaning". First Symposium on Music and Computers, Corfu, Greece. Wiggins, G. (1999). Automated generation of musical harmony: what's missing? International Joint Conference on Artificial Intelligence. Wiggins, G., Papadopoulos, A., Phon-Amnuaisuk, S., & Tuson, A. (1999). Evolutionary Methods for Musical Composition. International Journal of Computing Anticipatory Systems. Wiggins, G., & Smail, A. (2000). Musical Knowledge: what can Artificial Intelligence bring to the musician? Readings in Music and Artificial Intelligence (pp. 29-46). Woods, W. (1970). Transition Network Grammars for Natural Language Analysis. Communications of the ACM, 13(10), 591-606. Wulfhorst, R. 
D., Nakayama, L., & Vicari, R. M. (2003). A Multiagent approach for Musical Interactive Systems. International Joint Conference on Autonomous Agents and Multiagent Systems, Nova York, EUA. 51" NICS Reports 3. An a-life approach to machine learning of musical worldviews for improvisation systems9 Marcelo Gimenes Interdisciplinary Centre for Computer Music Research University of Plymouth, UK [email protected] Eduardo R. Miranda Interdisciplinary Centre for Computer Music Research University of Plymouth, UK [email protected] Abstract: In this paper we introduce Interactive Musical Environments (iMe), an interactive intelligent music system based on software agents that is capable of learning how to generate music autonomously and in real-time. iMe belongs to a new paradigm of interactive musical systems that we call “ontomemetical musical systems” for which a series of conditions are proposed. 1. Introduction Tools and techniques associated with Artificial Life (A-Life), a discipline that studies natural living systems by simulating their biological occurrence on computers, are an interesting paradigm that deals with extremely complex phenomena. Actually, the attempt to mimic biological events on computers is proving to be a viable route for a better theoretical understanding of living organisms [1]. We have adopted an A-Life approach to intelligent systems design in order to develop a system called iMe (Interactive Music Environment) whereby autonomous software agents perceive and are influenced by the music they hear and produce. Whereas most A-Life approaches to implementing computer music systems are chiefly based on algorithms inspired by biological development and evolution (for example, Genetic Algorithms [2]), iMe is based on cultural development (for example, Imitation Games [3, 4]). Central to iMe are the notions of musical style and musical worldview. 
Style, according to a famous definition proposed by Meyer, is “a replication of patterning, whether in human behaviour or in the artefacts produced by human behaviour, that results from a series of choices made within some set of constraints” [5]. Patterning implies the sensitive perception of the world and its categorisation into forms and classes of forms through cognitive activity, “the mental action or process of acquiring knowledge and understanding through thought, experience and the senses” (Oxford Dictionary). Worldview, according to Park [6], is “the collective interpretation of and response to the natural and cultural environments in which a group of people lives. Their assumptions about those environments and the values derived from those assumptions.” Through their worldview people are connected to the world, absorbing and exerting influence, communicating and interacting with it. Hence, a musical worldview is a two-way route that connects individuals with their musical environment.

In our research we want to tackle the issue of how different musical influences can lead to particular musical worldviews. We therefore developed a computer system that simulates environments where software agents interact among themselves as well as with external agents, such as other systems and humans. iMe's general characteristics were inspired by the real world: agents perform musical tasks for which they possess perceptive and cognitive abilities. Generally speaking, agents perceive and are influenced by music. This influence is passed on to other agents whenever they generate new music, which is in turn perceived by other agents, and so forth.

9 Original reference for this work: Gimenes, M. and E. Miranda (2008). An A-Life Approach to Machine Learning of Musical Worldviews for Improvisation Systems. 5th Sound and Music Computing Conference, Berlin, Germany.
iMe enables the design and/or observation of chains of musical influence similar to those found in human musical apprenticeship. The system addresses the perceptive and cognitive issues involved in musical influence. It is precisely the description of a certain number of musical elements and the balance between them (differences of relative importance) that defines a musical style or, as we prefer to call it, a musical worldview: the musical aesthetics of an individual or of a group of like-minded individuals (both artificial and natural).

iMe is referred to as an ontomemetic computer music system. In Philosophy of Science, ontogenesis refers to the sequence of events involved in the development of an individual organism from its birth to its death. However, our research is concerned with the development of cultural rather than biological organisms. We therefore coined the term “ontomemetic” by replacing the affix “genetic” with the term “memetic”. The notion of “meme” was suggested by Dawkins [7] as the cultural equivalent of the gene in Biology. Musical ontomemesis therefore refers to the sequence of events involved in the development of the musicality of an individual. An ontomemetic musical system should foster interaction between entities and, at the same time, allow for the observation of how different paths of development can lead to different musical worldviews.

Modelling perception and cognition abilities plays an important role in our system, as we believe that the way in which music is perceived and organized in our memory has direct connections with the music we make and appreciate. The more we are exposed to certain types of elements, the more meaningful the representations these elements acquire in our memory. The result of this exposure and interaction is that our memory is constantly changing, with new elements being added and old elements being forgotten.
Despite the existence of excellent systems that can learn to simulate musical styles [8] or interact with human performers in real-time ([9-11]), none of them addresses the problem from the ontomemetic point of view, that is:

• to model perceptive and cognitive abilities in artificial entities based on their human correlatives;
• to foster interaction between these entities so as to nurture the emergence of new musical worldviews;
• to model interactivity as the ways through which reciprocal actions or influences are established;
• to provide mechanisms to objectively compare different paths and worldviews in order to assess their impact on the evolution of a musical style.

An ontomemetic musical system should be able to develop its own style. This means that we should not rely on a fixed set of rules that restrain the musical experience to particular styles. Rather, we should create mechanisms through which musical style can eventually emerge from scratch. In iMe, software entities (or agents) are programmed with identical abilities. Nevertheless, different modes of interaction give rise to different worldviews. The developmental path, that is, the order in which the events involved in the development of a worldview take place, plays a crucial role here. Paths are preserved so that they can be reviewed and compared with other developmental paths and worldviews. A fundamental requisite of an ontomemetic system is to provide mechanisms to objectively compare different paths and worldviews in order to assess the impact that different developmental paths might have had in the evolution of a style. This is not trivial to implement.

1.1 Improvisation

Before we introduce the details of iMe, a short discussion about musical improvisation will help to better contextualise our system.
Not surprisingly, improvised music seems to be a preferred field when it comes to the application of interactivity, and many systems have been implemented focusing on controllers and sound synthesis systems designed to be operated during performance. The interest in exploring this area from the point of view of an ontomemetic musical system lies in the fact that, because of the intrinsic characteristics of improvisation, it is intimately connected with the ways human learning operates. However, not many improvisation systems produced for music to date are able to learn.

According to a traditional definition, musical improvisation is the spontaneous creative process of making music while it is being performed. It is like speaking or having a conversation as opposed to reciting a written text. As it encompasses musical performance, it is natural to observe that improvisation has a direct connection with performance-related issues such as instrument design and technique. Considering the universe of musical elements played by improvisers, it is known that certain musical ideas are better suited to polyphonic instruments (e.g., piano, guitar) than to monophonic ones (e.g., saxophone, flute), or to keyboards as opposed to wind instruments, and so forth. Since instrument design and technique affect the ease or difficulty of performing certain musical ideas, we deduce that different musical elements must affect the cognition of different players in different ways.
The technical or “performance part” of a musical improvisation is at once fascinating and extremely complex but, although we acknowledge the importance of its role in defining one's musical worldview, our research (and this paper) is focused primarily on how: (i) music is perceived by the sensory organs, (ii) it is represented in memory and (iii) the resulting cognitive processes relevant to musical creation in general (and to improvisation in particular) convey the emergence and development of musical worldviews.

Regarding the creative issue specifically, it is important to remember that improvisation, at least in its most generalised form, follows a protocol that consists of developing musical ideas “on top” of pre-existing schemes. In general, these include a musical theme that comprises, among other elements, a melody and a harmonic structure. Therefore, in this particular case, which happens to be the most common, one does not need to create specific strategies for each individual improvisational session but rather follow the generally accepted protocol. Although this may give the impression of limiting the system, preventing the use of more complex compositional strategies, one of the major interests of research into music improvisation lies in the fact that once a musical idea has been played, it cannot be erased. Each individual idea is therefore an “imposition” in itself that requires completion, leading to other ideas, which themselves require completion, and so on. Newly played elements complete and re-signify previous ones in such ways that the improviser's musical worldview is revealed. In this continuous process two concurrent and different plans play inter-dependent roles: a pathway (the “lead sheet”) to which the generated ideas have to adapt, and the “flow of musical ideas” that is particular to each individual at each given moment and that reflects (once more) their musical worldview.
The general concepts introduced so far are all an integral part of iMe and will be further clarified as we introduce the system.

2. The iMe System

iMe was conceived as a platform in which software agents perform music-related tasks that convey musical influence and from which their particular styles emerge. Tasks such as read, listen, perform, compose and improvise have already been implemented; a number of others are planned for the future. In a multi-agent environment one can design different developmental paths by controlling how and when different agents interact; a hypothetical example is shown in Fig. 1.

Fig. 1. The developmental paths of two agents.

In this figure we see the representation of a hypothetical timeline during which two agents (Agent 'A' and Agent 'B') perform a number of tasks. Initially, Agent 'A' would listen to one piece of music previously present in the environment. After that, Agent 'B' would listen to four pieces of music, and so forth, until one of them, Agent 'A', would start to compose its own pieces. From this moment Agent 'B' would listen to the pieces composed by Agent 'A' until Agent 'B' itself would start to compose, and then Agent 'A' would interact with Agent 'B's music as well.

Software agents would normally act autonomously and decide if and when to interact. Nevertheless, in the current implementation of iMe we decided to constrain their skills in order to have better control over the development of their musical styles: agents can choose which music they interact with, but not how many times or when they interact. When agents perform composition or improvisation tasks, new pieces are delivered to the environment and can be used for further interactions. On the other hand, by performing tasks such as reading or listening to music, agents only receive influence. Interaction can be established not only amongst the agents themselves, but also between agents and human musicians.
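The kind of developmental-path scheduling illustrated in Fig. 1 can be sketched as a toy model. This is only an illustration under our own assumptions, not iMe's actual code: the names (Agent, schedule, environment) and the task set are hypothetical, and each schedule step stands for one cycle.

```python
# Toy sketch of a developmental path as in Fig. 1: agents take turns
# performing "listen" (influence only) and "compose" (delivers a new
# piece to the environment) tasks. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    heard: list = field(default_factory=list)      # pieces this agent interacted with
    composed: list = field(default_factory=list)   # pieces it delivered

    def listen(self, piece):
        self.heard.append(piece)                   # influence only: no new piece

    def compose(self, environment):
        piece = f"{self.name}-piece-{len(self.composed)}"
        self.composed.append(piece)
        environment.append(piece)                  # available for further interactions
        return piece

environment = ["seed-piece"]                       # initial music material
a, b = Agent("A"), Agent("B")

# One (agent, task) pair per cycle: the order of events is the
# developmental path that shapes each agent's worldview.
schedule = [(a, "listen"), (b, "listen"), (a, "compose"),
            (b, "listen"), (b, "compose")]

for agent, task in schedule:
    if task == "listen":
        agent.listen(environment[-1])              # hear the newest piece available
    else:
        agent.compose(environment)
```

Running the schedule above, Agent 'B' ends up influenced both by the seed material and by Agent 'A's composition, mirroring the chain of influence described in the text.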
The main outcome of these interactions is the emergence and development of the agents' musical styles as well as the musical style of the environment as a whole.

The current implementation of iMe's perceptive algorithms was specially designed to take into account a genre of musical texture (homophony) in which one voice (the melody) is distinguishable from the accompanying harmony. In the case of the piano, for instance, the player would use the left hand to play a series of chords while the right hand plays the melodic line. iMe addresses this genre of music but also accepts music that could be considered a subset of it, e.g., a series of chords, a single melody, or any combination of the two. Any music that fits into these categories should generate an optimal response by the system. However, we are also experimenting with other types of polyphonic music with a view to widening the scope of the system.

In a very basic scenario, simulations can be designed by simply specifying:

• a number of agents;
• a number of tasks for each agent;
• some initial music material for the interactions.

iMe generates a series of consecutive numbers that correspond to an abstract time control (cycles). Once the system is started, each cycle number is sent to the agents, which then execute the tasks that were scheduled for that particular cycle. As a general rule, when an agent chooses a piece of music to read (in the form of a MIDI file) or is connected to another agent to listen to its music, it receives a data stream which is initially decomposed into several feature streams and then segmented, as described in the next section.

2.1 System's Perception and Memory

iMe's perception and memory mechanisms are greatly inspired by the work of Snyder [12] on musical memories. According to Snyder, “the organisation of memory and the limits of our ability to remember have a profound effect on how we perceive patterns of events and boundaries in time.
Memory influences how we decide when groups of events end and other groups of events begin, and how these events are related. It also allows us to comprehend time sequences of events in their totality, and to have expectations about what will happen next. Thus, in music that has communication as its goal, the structure of the music must take into consideration the structure of memory - even if we want to work against that structure”.

iMe's agents initially “hear” music and subsequently use a number of filters to extract independent but interconnected streams of data, such as melodic direction, melodic inter-onset intervals, and so on. This results in a feature data stream that is used for the purposes of segmentation, storage (memory) and style definition (Fig. 2).

Fig. 2. Feature extraction and segmentation.

To date we have implemented ten filters, which extract information from melodic notes (direction, leap, inter-onset interval, duration and intensity) and non-melodic notes (vertical number of notes, note intervals from the melody, inter-onset interval, duration and intensity). As might be expected, the higher the number of filters, the more accurate the representation of the music. In order to help clarify these concepts, in Fig. 3 we present a simple example and give the corresponding feature data streams that would have been extracted by an agent using the ten filters:

       1    2    3    4    5    6    7    8     9   10   11  ...
a)     0    1    1    1    1   -1   -1   -1     1    1    1  ...
b)     0    2    2    1    2    2    1    2     2    1    2  ...
c)   120  120  120  120  120  120  120  120   120  120  120  ...
d)   120  120  120  120  120  120  120  120   120  120  120  ...
e)     6    6    6    6    6    6    6    6     6    6    6  ...
f)     2    0    0    0    0    0    0    0     2    0    0  ...
g)  5, 7   -2   -2   -2   -2   -2   -2   -2  7, 9   -2   -2  ...
h)   120   -2   -2   -2   -2   -2   -2   -2   120   -2   -2  ...
i)   960   -2   -2   -2   -2   -2   -2   -2   960   -2   -2  ...
j)     6   -2   -2   -2   -2   -2   -2   -2     6   -2   -2  ...

Fig. 3.
Feature streams, where a) melody direction, b) melody leap, c) melody interonset interval, d) melody duration, e) melody intensity, f) non-melody number of notes, g) non-melody note intervals from melody, h) non-melody interonset interval, i) non-melody duration, j) non-melody intensity. The number -2 represents the absence of data in a particular stream. Melody direction can take the values -1, 0 and 1, meaning descending, no and ascending movement, respectively. Melody leaps and intervals are given in half steps. In streams that hold time information (interonset intervals and durations) the value 240 (the time resolution) is assigned to quarter notes. Intensity is represented by the MIDI range (0 to 127); in Fig. 3 this was simplified by dividing the value by ten.

After the extraction of the feature data stream, the next step is the segmentation of the music. A fair amount of research has been conducted on this subject by a number of scholars. In general, the issue of music segmentation remains unsolved to a great extent due to its complexity. One of the paradigms that substantiate segmentation systems was established by Gestalt psychologists, who argued that perception is driven from the whole to the parts by the application of concepts that involve simplicity and uniformity in organising perceptual information [13]. Proximity, closure, similarity and good continuation are some of these concepts. Fig. 4 shows a possible segment from a piece by J. S. Bach (First Invention for Two Voices) according to Gestalt theory. In this case the same time length separates all notes except the first and the last, which are disconnected from the previous and following notes by rests. This implies the application of the similarity and proximity rules.

Fig. 4. An example of a music segment.

In the example discussed below we decided to build the segmentation algorithm on top of only one of the principles that guide group organization: the occurrence of surprise.
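To make the mechanism concrete, the two melodic filters used above (direction and leap) and a direction-driven “surprise” segmentation can be sketched as follows. This is a minimal illustration under our own assumptions, not iMe's implementation; the function names are ours, and the pitch sequence is a plausible reconstruction consistent with streams a) and b) of Fig. 3.

```python
# Minimal sketch of two feature filters and segmentation at
# discontinuities in the melody-direction stream (not iMe's code).

def direction_stream(pitches):
    """-1/0/1 per note: descending, no or ascending movement (first note: 0)."""
    return [0] + [(p > q) - (p < q) for q, p in zip(pitches, pitches[1:])]

def leap_stream(pitches):
    """Melodic leap in half steps (first note: 0)."""
    return [0] + [abs(p - q) for q, p in zip(pitches, pitches[1:])]

def segment_on_direction(pitches):
    """Start a new meme wherever the direction stream changes value."""
    dirs = direction_stream(pitches)
    memes, current = [], [pitches[0]]
    for i in range(1, len(pitches)):
        if i > 1 and dirs[i] != dirs[i - 1]:   # discontinuity -> boundary
            memes.append(current)
            current = []
        current.append(pitches[i])
    memes.append(current)
    return memes

# MIDI pitches consistent with streams a) and b) of Fig. 3
# (the actual pitches are our assumption; only the intervals matter):
pitches = [60, 62, 64, 65, 67, 65, 64, 62, 64, 65, 67]
```

With this sequence, `segment_on_direction` yields three memes (an ascending run, a descending run and another ascending run), matching the groupings implied by the changes of sign in stream a).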
As the agents perceive the continuous musical stream through the various expert sensors (filters), wherever there is a break in the continuity of the behaviour of one (or a combination of some) of the feature streams, this indicates a position for a possible segmentation. The whole musical stream is segmented at these positions. If discontinuities happen in more than one feature at the same time, this indicates the existence of different levels of structural organization within the musical piece; this conflict must be resolved (this will be clarified later). In the example of Fig. 3, we shall only consider the melody direction stream ('a' in Fig. 3). Hence, every time the direction of the melody is about to change, a new grouping starts. These places are indicated on the musical score shown in Fig. 3 with the symbol 'v'.

To designate these segmented musical structures we adopted the expression “musical meme”, or simply “meme”, a term introduced by Dawkins [7] to describe basic units of cultural transmission in the same way that genes, in biology, are units of genetic information. “Examples of memes are tunes, catch-phrases, clothes fashions, ways of making pots or of building arches. Just as genes propagate themselves in the gene pool by leaping from body to body via sperm and eggs, so memes propagate in the meme pool by leaping from brain to brain via a process which, in a broad sense, can be called imitation.” [7]. The idea of employing this concept is attractive because it covers both the notion of structural elements and processes of cultural development, which fits well with the purpose of our research. A meme is generally defined as a short musical structure, but it is difficult to ascertain what the minimal acceptable size for a meme is. In iMe, memes are generally small structures in the time dimension and they can have any number of simultaneous notes. Fig. 5 shows a meme (from the same piece as the segment shown in Fig.
4) and its memotype representation following the application of three filters: melodic direction, leap and duration:

Mel. direction:  0    1    1    1   -1    1   -1    1   -1
Mel. leap:       0    2    2    1    3    2    4    7   12
Mel. duration:   0   60   60   60   60   60   60  120  120

Fig. 5. Meme and corresponding memotype representation.

Since the memes were previously separated into streams of data, they can be represented as a group of memotypes, each corresponding to a particular musical feature. A meme is therefore represented by 'n' memotypes, where 'n' is the number of streams of data representing musical features. In any meme the number of elements in all the memotypes is the same and corresponds to the number of vertical structures. By “vertical structure” we mean all music elements that happen at the same time.

2.2 Memory

The execution of any of the musical tasks requires the perception and segmentation of the musical flow and the adaptation of the memory. As a result, the agents need to store this information in their memory, comparing it with the elements that were previously perceived. This is a continuous process that constantly changes the state of the agents' memory. In iMe, the memory of the agents comprises a Short Term Memory (STM) and a Long Term Memory (LTM). The STM consists of the last x memes (x is defined a priori by the user) that were most recently brought to the agent's attention, representing the focus of its “awareness”. A much more complex structure, the LTM is a series of specialized “Feature Tables” (FTs), designed to store all the memotypes according to their categories. FTs are formed by “Feature Lines” (FLs) that keep a record of the memotypes, the dates when the interactions took place (date of first contact - dfc, date of last contact - dlc), the number of contacts (noc), a weight (w) and “connection pointers” (cp). In Fig. 6 we present an excerpt of a hypothetical FT (for melody leaps) containing 11 FLs.
The information between brackets in this figure corresponds to the memotype, and the numbers after the colon correspond to the connection pointers. This representation will be clarified by the examples given later.

Feature n. 2 (melody leaps):
Line 0: [0 0]: 0 0 0 0 0 0 0 0 0 0
Line 1: [2 2 0 1 0 1 2 5 0]: 1
Line 2: [1 0 0 3 2 2 0]: 2 20 10 10
Line 3: [1 0 0 0 1 2 2 4]: 3
Line 4: [2 0 2 0 4 1 3 0]: 4
Line 5: [0 3 2 7 0 2 0 4]: 5 8 10
Line 6: [3 0 2 0 3 2 4]: 6 5 3
Line 7: [1 0 1 2 2 0 3]: 7 3
Line 8: [2 0 2 0 2 0 0]: 8 31 8
Line 9: [2 0]: 47 4 9 9 4 9 9
Line 10: [5 0 8 2 1 2]: 10

Fig. 6. A Feature Table excerpt.

2.2.1 Adaptation

Adaptation is generally accepted as one of the cornerstones of evolutionary theories, of Biology, and indeed of A-Life systems. With respect to cultural evolution, however, the notion of adaptation still seems to generate heated debate amongst memetic theory scholars. Cox [14] asserts that the “memetic hypothesis” rests on the idea that a person's understanding of sounds comes from comparing them with the sounds that person has already produced. This comparison would involve tacit imitation, or memetic participation, based on one's previous personal experience of producing the sound.

According to Jan [15], “the evolution of music occurs because of the differential selection and replication of mutant memes within idioms and dialects. Slowly and incrementally, these mutations alter the memetic configuration of the dialect they constitute. Whilst gradualistic, this process eventually leads to fundamental changes in the profile of the dialect and, ultimately, to seismic shifts in the overarching principles of musical organization, the rules, propagated within several dialects.”

In iMe, every time agents interact with a piece of music their musical knowledge changes according to the similarities and/or differences between that piece and what they already “know”.
At any given time, each memotype in each of the FTs in an agent's memory is assigned a weight that represents its relative importance in comparison with the corresponding memotypes in the other memes. The adaptation mechanism is fairly simple: the weight is increased when a memotype is perceived by an agent. The more an agent listens to a memotype, the more its weight is increased. Conversely, if a memotype is not listened to for some time, its weight is decreased; in other words, the agent begins to forget it. The forgetting mechanism - an innovation compared to other systems, such as the ones cited earlier - is central to the idea of an ontomemetic musical system and is responsible for much of the ever-changing dynamics of the memotype weights. In addition to this mechanism, we have implemented a “coefficient of permeability” (with values between 0 and 1) that modulates the calculation of the memotype weights. This coefficient is defined by a group of other variables (attentiveness, character and emotiveness), the motivation being that some tasks entail more or less transformation of the agent's memory depending on the required level of attentiveness (e.g., a reading task requires less attention than an improvisation task). On the other hand, attributes such as character and emotiveness can also influence the level of “permeability” of the memory.

When a new meme is received by the memory, if a memotype is not present in the corresponding FT, a new FL is created and added to that FT. The same applies to all the FTs in the LTM. The other information in the FLs (dates, weight and pointers) is then (re)calculated. This process is exemplified below. Let us start a hypothetical run in which the memory of an agent is completely empty. As the agent starts perceiving the musical flow (Fig. 3), the agent's “sensory organs” (feature filters) generate a parallel stream of musical features, according to the mechanism described earlier.
The first meme (Fig. 7) then arrives at the agent's memory and, as a result, the memory is adapted (Fig. 8).

Feature stream:
mdi: 0, 1, 1, 1
mle: 0, 2, 2, 1
mii: 120, 120, 120, 120
mdu: 120, 120, 120, 120

Fig. 7. Meme 1, where mdi is melody direction, mle is melody leap, mii is melody interonset interval and mdu is melody duration.

In order to keep the example simple, we are only showing the representation of four selected features: melody direction (FT1), leap (FT2), interonset interval (FT3) and duration (FT4). Fig. 8 shows the memotypes in each of the Feature Tables. Notice that the connection pointers (cp) of FTs 2 to 4 actually point to the index (i) of the memotype of FT1. The initial weight (w) was set to 1.0 for all of the memotypes and the date information (dfc, dlc) refers to the cycle in which this task is performed during the simulation; in this case, the first task.

i | Memotype | dfc | dlc | noc | w | cp
Melody direction:
1 | 0, 1, 1, 1 | 1 | 1 | 1 | 1.0 | 1
Melody leap:
1 | 0, 2, 2, 1 | 1 | 1 | 1 | 1.0 | 1
Melody interonset interval:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
Melody duration:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1

Fig. 8. Agent's memory after adaptation to meme 1.

Then comes the next meme (Fig. 9), as follows:

Feature stream:
mdi: 1, -1, -1
mle: 2, 2, 1
mii: 120, 120, 120
mdu: 120, 120, 120

Fig. 9. Meme 2.

And the memory is adapted accordingly (Fig. 10):

i | Memotype | dfc | dlc | noc | w | cp
Melody direction:
1 | 0, 1, 1, 1 | 1 | 1 | 1 | 1.0 | 2
2 | 1, -1, -1 | 1 | 1 | 1 | 1.0 | -
Melody leap:
1 | 0, 2, 2, 1 | 1 | 1 | 1 | 1.0 | 1
2 | 2, 2, 1 | 1 | 1 | 1 | 1.0 | 2
Melody interonset interval:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
2 | 120, 120, 120 | 1 | 1 | 1 | 1.0 | 2
Melody duration:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
2 | 120, 120, 120 | 1 | 1 | 1 | 1.0 | 2

Fig. 10. Agent's memory after adaptation to meme 2.

Here all the new memotypes are different from the previous ones and are stored in separate FLs in the corresponding FTs. Now the memotype of index 1 in FT1 points (cp) to index 2.
Differently from the other FTs, this information represents the fact that the memotype of index 2 comes after the memotype of index 1. This shows how iMe keeps track of the sequence of memes to which the agents are exposed. The cp of the other FTs still point to the index in FT1 that connects the elements of the meme to which the memory is being adapted. The weights of the new memotypes are set to 1.0 as previously. The same process is repeated with the arrival of meme 3 (Figs. 11 and 12) and meme 4 (Figs. 13 and 14).

Feature stream:
mdi: -1, 1, 1, 1, 1, 1
mle: 2, 2, 1, 2, 2, 2
mii: 120, 120, 120, 120, 120, 120
mdu: 120, 120, 120, 120, 120, 120

Fig. 11. Meme 3.

i | Memotype | dfc | dlc | noc | w | cp
Melody direction:
1 | 0, 1, 1, 1 | 1 | 1 | 1 | 1.0 | 2
2 | 1, -1, -1 | 1 | 1 | 1 | 1.0 | 3
3 | -1, 1, 1, 1, 1, 1 | 1 | 1 | 1 | 1.0 | -
Melody leap:
1 | 0, 2, 2, 1 | 1 | 1 | 1 | 1.0 | 1
2 | 2, 2, 1 | 1 | 1 | 1 | 1.0 | 2
3 | 2, 2, 1, 2, 2, 2 | 1 | 1 | 1 | 1.0 | 3
Melody interonset interval:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
2 | 120, 120, 120 | 1 | 1 | 1 | 1.0 | 2
3 | 120, 120, 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 3
Melody duration:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
2 | 120, 120, 120 | 1 | 1 | 1 | 1.0 | 2
3 | 120, 120, 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 3

Fig. 12. Agent's memory after adaptation to meme 3.

Feature stream:
mdi: 1, -1, -1
mle: 1, 1, 2
mii: 120, 120, 120
mdu: 120, 120, 120

Fig. 13. Meme 4.

The novelty here is that the memotypes for melody direction, interonset interval and duration had already been stored in the memory. Only the melody leap carries new information and, as a result, a new FL was added to FT2 and not to the other FTs. The weights of the repeated memotypes were increased by 0.1, which means that the relative weight of this information increased compared to the other memotypes. We can therefore say that the weights ultimately represent the relative importance of all the memotypes in relation to each other.
The memotype weight is increased by a constant factor (e.g., f = 0.1) every time it is received and decreased by another factor if, at the end of the cycle, it is not "perceived". The latter case does not happen in this example because we are considering that the run is executed entirely in one single cycle.

i | Memotype | dfc | dlc | noc | w | cp
Melody direction:
1 | 0, 1, 1, 1 | 1 | 1 | 1 | 1.0 | 2
2 | 1, -1, -1 | 1 | 1 | 2 | 1.1 | 3
3 | -1, 1, 1, 1, 1, 1 | 1 | 1 | 1 | 1.0 | 2
Melody leap:
1 | 0, 2, 2, 1 | 1 | 1 | 1 | 1.0 | 1
2 | 2, 2, 1 | 1 | 1 | 1 | 1.0 | 2
3 | 2, 2, 1, 2, 2, 2 | 1 | 1 | 1 | 1.0 | 3
4 | 1, 1, 2 | 1 | 1 | 1 | 1.0 | 2
Melody interonset interval:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
2 | 120, 120, 120 | 1 | 1 | 2 | 1.1 | 2, 2
3 | 120, 120, 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 3
Melody duration:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
2 | 120, 120, 120 | 1 | 1 | 2 | 1.1 | 2, 2
3 | 120, 120, 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 3

Fig. 14. Agent's memory after adaptation to meme 4.

Finally, the memory receives the last meme (Fig. 15) and is adapted accordingly (Fig. 16).

Feature stream:
mdi: -1, 1, -1, -1, -1
mle: 2, 2, 2, 2, 1
mii: 120, 120, 120, 120, 120
mdu: 120, 120, 120, 120, 480

Fig. 15. Meme 5.

i | Memotype | dfc | dlc | noc | w | cp
Melody direction:
1 | 0, 1, 1, 1 | 1 | 1 | 1 | 1.0 | 2
2 | 1, -1, -1 | 1 | 1 | 2 | 1.1 | 3, 4
3 | -1, 1, 1, 1, 1, 1 | 1 | 1 | 1 | 1.0 | 2
4 | -1, 1, -1, -1, -1 | 1 | 1 | 1 | 1.0 | -
Melody leap:
1 | 0, 2, 2, 1 | 1 | 1 | 1 | 1.0 | 1
2 | 2, 2, 1 | 1 | 1 | 1 | 1.0 | 2
3 | 2, 2, 1, 2, 2, 2 | 1 | 1 | 1 | 1.0 | 3
4 | 1, 1, 2 | 1 | 1 | 1 | 1.0 | 2
5 | 2, 2, 2, 2, 1 | 1 | 1 | 1 | 1.0 | 4
Melody interonset interval:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
2 | 120, 120, 120 | 1 | 1 | 2 | 1.1 | 2, 2
3 | 120, 120, 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 3
4 | 120, 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 4
Melody duration:
1 | 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 1
2 | 120, 120, 120 | 1 | 1 | 2 | 1.1 | 2, 2
3 | 120, 120, 120, 120, 120, 120 | 1 | 1 | 1 | 1.0 | 3
4 | 120, 120, 120, 120, 480 | 1 | 1 | 1 | 1.0 | 4

Fig. 16. Agent's memory after adaptation to meme 5.
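The example above can be reproduced with a minimal sketch of one Feature Table. The data layout and function name are illustrative assumptions; only the behaviour described in the text is modelled: new memotypes start with weight 1.0, repeated ones are reinforced by f = 0.1, and each memotype records which index followed it (the connection pointers that track the meme sequence).

```python
F = 0.1  # reinforcement factor from the text

def adapt(ft, meme, prev_idx):
    """Adapt one Feature Table; ft is a list of [memotype, weight, pointers].

    Returns the 1-based index of the memotype just perceived."""
    for i, entry in enumerate(ft, start=1):
        if entry[0] == meme:
            entry[1] += F            # reinforce a repeated memotype
            idx = i
            break
    else:
        ft.append([meme, 1.0, []])   # new memotype starts with weight 1.0
        idx = len(ft)
    if prev_idx is not None:
        ft[prev_idx - 1][2].append(idx)  # connection pointer: successor index
    return idx

# The melody-direction memes of the example (memes 1 to 5):
memes = [(0, 1, 1, 1), (1, -1, -1), (-1, 1, 1, 1, 1, 1),
         (1, -1, -1), (-1, 1, -1, -1, -1)]
ft1, prev = [], None
for m in memes:
    prev = adapt(ft1, m, prev)
# ft1 now matches the melody-direction rows of Fig. 16: index 2 has
# weight 1.1 and pointers [3, 4]; indices 1 and 3 both point to 2.
```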
3.3 Generative Processes

Gabora [16] explains that, in the same way that information patterns evolve through biological processes, mental representations - or memes - evolve through the adaptive exploration and transformation of an informational space through variation, selection and transmission. Our minds carry out this replication across a fitness landscape that reflects internal movements and a worldview that is continuously updated through the renewal of memes.

In iMe, agents are also able to compose through processes of re-synthesis of the different memes of their worldview. Naturally, the selection of the memes to be used in a new composition implies that the agent's musical worldview is also re-adapted, by reinforcing the weights of the memes that are chosen. In addition to compositions (non real-time), agents also execute two types of real-time generative tasks: solo and collective improvisations. The algorithm is described below.

3.3.1 Solo improvisations

During solo improvisations, only one agent plays at a time, following the steps below.

Step 1: Generate a new meme according to the current "meme generation mode"

The very first memotype of a new piece of music is chosen from the first Feature Table (FT1), which guides the generation of the whole sequence of memes, in a Markov-like chain. Let us assume that the user configured FT1 to represent melody direction. Hence, this memotype could hypothetically be [0, 1, 1, -1], where 0 represents "repeat the previous note", 1 represents upward motion and -1 represents downward motion. Once the memotype from FT1 is chosen (based on the probability distribution given by the weights of the memotypes in that table), the algorithm looks at the memotypes in the other FTs to which the chosen FT1 memotype points and chooses a memotype from each FT of the LTM according to the probability distribution of the weights in that FT.
At this point we end up with a new meme (a series of n memotypes, where n is the number of FTs in the LTM). The algorithm of the previous paragraph describes one of the generation modes that we have implemented: the "LTM generation mode". There are other modes; for instance, in the "STM generation mode", agents choose from the memes stored in their Short Term Memory. Every time a new meme is generated, the agent checks the Compositional and Performance Map (explained below) to see which generation mode is applicable at that point.

Step 2: Adapt the memory with the newly generated meme

Once the new meme is generated, the memory is immediately adapted to reflect this choice, according to the criteria explained in the previous section.

Step 3: Adapt the meme to the Compositional and Performance Map (CPM)

The new meme is then adapted according to the criteria laid down in the CPM. The CPM (Fig. 17), iMe's equivalent to a "lead sheet", holds instructions for a number of parameters that address both aspects of the improvisation: the generation of new musical ideas and the performance of these ideas. Examples of the former are: the meme generation mode, transformations to the meme, local scales and chords, and note ranges for right and left hand. Examples of the latter are: the loudness ratio between melodic and non-melodic notes, and shifts of note onset, loudness and duration for both melodic and non-melodic notes. Instructions regarding the performance only affect the sound generated by the audio output of the system and are not stored with the composition.

Fig. 17. A CPM excerpt.

The instructions (or "constraints") contained in the CPM are distributed on a timeline. The agent checks the constraints that are applicable at the "compositional pointer", a variable that controls the position of the composition on the timeline, and acts accordingly.
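The "LTM generation mode" described in Step 1 can be sketched as a weighted walk over FT1: pick a memotype in proportion to its weight, then follow its connection pointers to the next one, Markov-chain style. The data layout below (reusing the melody-direction table from the example) and the restart rule for memotypes without successors are illustrative assumptions, not iMe's actual code.

```python
import random

# FT1 (melody direction) as left after the adaptation example:
ft1 = {
    1: {"memotype": [0, 1, 1, 1],        "w": 1.0, "cp": [2]},
    2: {"memotype": [1, -1, -1],         "w": 1.1, "cp": [3, 4]},
    3: {"memotype": [-1, 1, 1, 1, 1, 1], "w": 1.0, "cp": [2]},
    4: {"memotype": [-1, 1, -1, -1, -1], "w": 1.0, "cp": []},
}

def choose(indices):
    """Weighted choice among FT1 indices, proportional to memotype weight."""
    weights = [ft1[i]["w"] for i in indices]
    return random.choices(indices, weights=weights, k=1)[0]

def generate_sequence(n):
    """Generate n FT1 indices: the backbone of a new sequence of memes."""
    idx = choose(list(ft1))          # the very first memotype of the piece
    seq = [idx]
    for _ in range(n - 1):
        successors = ft1[idx]["cp"]
        if not successors:           # assumed: no known successor, restart
            successors = list(ft1)
        idx = choose(successors)
        seq.append(idx)
    return seq

random.seed(0)
seq = generate_sequence(8)
```

In the full system, each chosen FT1 memotype would then select one memotype per remaining FT (leap, interonset interval, duration) by the same weighted-choice rule, yielding the complete meme.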
Step 4: Generate notes and play the meme (if in real-time mode)

Up to this point, the memes are not actual notes but only meta-representations described by the memotypes (melody direction, melody leap, etc.). Given the previously generated notes and the CPM, the "actual notes" of the meme must be calculated and sent to a playing buffer.

Step 5: Store the meme in the composition

An array with the information of the sequence of memes is kept with the composition for future reference and for tracking the origin of each meme. There is another generation mode, the "MemeArray generation mode", in which an agent can retrieve any previously generated meme and choose it again during the composition.

Step 6: Repeat the previous steps until the end of the CPM

The agent continuously plays the notes in the playing buffer. When the number of notes in this buffer is equal to or less than 'x' (a parameter configured by the user), the algorithm goes back to Step 1 above and a new meme is generated, until the whole CPM is completed.

3.3.2 Collective improvisations

The steps for collective improvisations are very similar to those for solo improvisations, except that the agents play along with a human being. We have implemented this task as two separate sub-tasks (a listening sub-task and a solo improvisation sub-task) running in separate threads. Memes are generated as in a solo improvisation, and the agents' memory is equally affected by the memes they choose as well as by the memes they hear in the musical data originated by the external improviser. Both agent and external improviser follow the same CPM. At the end of the improvisation (solo or interactive), the composition is stored in the system in order to be used in further runs of the system.
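The solo-improvisation loop (Steps 1 to 6) can be sketched as follows. Everything here is an illustrative assumption standing in for the mechanisms described in the text: the stub functions at the top replace meme generation, memory adaptation, CPM constraints and note rendering, and the buffer threshold plays the role of the user-configured 'x'.

```python
# Minimal stubs so the sketch runs; in iMe these are the real mechanisms.
def generate_meme():                 return [0, 1, 1, -1]   # Step 1 stand-in
def adapt_memory(meme):              pass                   # Step 2 stand-in
def apply_cpm_constraints(m, ptr):   return m               # Step 3 stand-in
def render_notes(meme):              return [60 + d for d in meme]

played = []
def play(note):                      played.append(note)

X_THRESHOLD = 4   # user-configured 'x': refill point of the playing buffer

def improvise(cpm_length, buffer):
    composition = []   # Step 5: array tracking the sequence of memes
    pointer = 0        # the "compositional pointer" on the CPM timeline
    while pointer < cpm_length:                          # Step 6
        if len(buffer) <= X_THRESHOLD:
            meme = generate_meme()                       # Step 1
            adapt_memory(meme)                           # Step 2
            meme = apply_cpm_constraints(meme, pointer)  # Step 3
            buffer.extend(render_notes(meme))            # Step 4
            composition.append(meme)                     # Step 5
        play(buffer.pop(0))        # continuously consume the buffer
        pointer += 1
    return composition

comp = improvise(cpm_length=12, buffer=[])
```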
Conclusions and Further Work

In this paper we introduced Interactive Musical Environments (iMe), a system for the investigation of the emergence and evolution of musical styles in environments inhabited by artificial agents, under the perspective of human perception and cognition. This system belongs to a new paradigm of interactive musical systems that we refer to as "ontomemetical musical systems", for which we propose a series of prerequisites and applications. As seen from some of the experiments we have presented, we believe that iMe has the potential to be extremely helpful in areas such as the musicological investigation of musical styles and influences.

Besides the study of the development of musical styles in artificial worlds, we are also conducting experiments with human subjects in order to assess iMe's effectiveness in evaluating musical influences in inter-human interaction. The study of creativity and interactive music in artificial and real worlds could also benefit from a number of iMe's features, which we are currently evaluating as well.

The memory of an agent is complex and dynamic, comprising all memotypes, their weights and connection pointers. The execution of musical tasks affects the memory state in proportion to the appearance of different memes and memotypes. A particular musical ontomemesis can thereby be objectively associated with the development of any agent's "musicality". Bearing in mind that iMe can be regarded as a tool for the investigation of musical ontomemesis as much as a tool for different sorts of musicological analyses, a series of different simulation designs could be devised.

Future improvements to the system will include the introduction of algorithms that would allow iMe to become a self-sustained artificial musical environment, such as criteria to control the birth and demise of agents and the automatic definition of their general characteristics (attentiveness, character, emotiveness, etc.).
Agents should also possess the ability to decide when and which tasks to perform, besides being able to develop their own Compositional and Performance Maps.

Acknowledgment

The authors would like to thank the Brazilian Government's Fundação Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for its funding support.

References

1. Miranda, E.R., The artificial life route to the origins of music. Scientia, 1999. 10(1): p. 5-33.
2. Biles, J.A., GenJam: A Genetic Algorithm for Generating Jazz Solos. In: International Computer Music Conference, 1994.
3. Miranda, E.R., Emergent Sound Repertoires in Virtual Societies. Computer Music Journal, 2002. 26(2): p. 77-90.
4. Miranda, E.R., At the Crossroads of Evolutionary Computation and Music: Self-Programming Synthesizers, Swarm Orchestras and the Origins of Melody. Evolutionary Computation, 2004. 12(2): p. 137-158.
5. Meyer, L.B., Style and Music: Theory, History, and Ideology. 1989, Philadelphia: University of Pennsylvania Press.
6. Park, M.A., Introducing Anthropology: An Integrated Approach. 2002: McGraw-Hill Companies.
7. Dawkins, R., The Selfish Gene. 1989, Oxford: Oxford University Press.
8. Cope, D., Computers and Musical Style. 1991, Oxford: Oxford University Press.
9. Rowe, R., Interactive Music Systems: Machine Listening and Composing. 1993: MIT Press.
10. Pachet, F., Musical Interaction with Style. Journal of New Music Research, 2003. 32(3): p. 333-341.
11. Assayag, G., et al., Omax Brothers: a Dynamic Topology of Agents for Improvization Learning. In: Workshop on Audio and Music Computing for Multimedia, ACM Multimedia, 2006, Santa Barbara.
12. Snyder, B., Music and Memory: An Introduction. 2000, Cambridge, MA: MIT Press.
13. Eysenck, M.W. and M.T. Keane, Cognitive Psychology: A Student's Handbook. 2005: Psychology Press.
14. Cox, A., The mimetic hypothesis and embodied musical meaning. Musicæ Scientiæ, 2001. 2: p. 195-212.
15.
Jan, S., Replicating sonorities: towards a memetics of music. Journal of Memetics - Evolutionary Models of Information Transmission, 2000. 4.
16. Gabora, L., The Origin and Evolution of Culture and Creativity. Journal of Memetics, 1997.

4. Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition10

Artemis Moroni (researcher), Technological Center for Informatics—The Automation Institute (CTI/IA), Rod. D. Pedro I, km 143,6, Campinas, São Paulo 13081/1970, Brazil. E-mail: <[email protected]>.

Jônatas Manzolli (educator), State University of Campinas—Interdisciplinary Nucleus of Sound Communication (UNICAMP/NICS), Cidade Universitária "Zeferino Vaz," Barão Geraldo, Campinas, São Paulo 13081/970, Brazil. E-mail: <[email protected]>.

Fernando Von Zuben (educator), State University of Campinas—Faculty of Electrical and Computer Engineering (UNICAMP/FEEC), Cidade Universitária "Zeferino Vaz," Barão Geraldo, Campinas, São Paulo 13081/970, Brazil. E-mail: <[email protected]>.

Ricardo Gudwin (educator), State University of Campinas—Faculty of Electrical and Computer Engineering (UNICAMP/FEEC), Cidade Universitária "Zeferino Vaz," Barão Geraldo, Campinas, São Paulo 13081/970, Brazil. E-mail: <[email protected]>.

Abstract

While recent techniques of digital sound synthesis have put numerous new sounds on the musician's desktop, several artificial-intelligence (AI) techniques have also been applied to algorithmic composition. This article introduces Vox Populi, a system based on evolutionary computation techniques for composing music in real time. In Vox Populi, a population of chords codified according to MIDI protocol evolves through the application of genetic algorithms to maximize a fitness criterion based on physical factors relevant to music. Graphical controls allow the user to manipulate fitness and sound attributes.
In Darwin's time, most geologists subscribed to "catastrophe theory": the idea that the Earth had been struck many times by floods, earthquakes and other catastrophes capable of destroying all forms of life. On his voyage on board the Beagle, Darwin observed that the diverse animal species of a region differed from each other in minimal details, but he did not understand how this could result from a "natural" selection. In October 1838 he learned about the factors influencing evolution from a small book by Thomas Malthus, An Essay on the Principle of Population. Malthus, in turn, was inspired by Benjamin Franklin (the same person who had invented the lightning rod). Franklin had noted that in nature there must be locally limiting factors, or a single plant or animal would spread all over the Earth; it was only the existence of different kinds of animals that kept them in equilibrium. This was the universal mechanism that Darwin was looking for. The factor responsible for the way evolution happens is natural selection in the fight for life, i.e. those who are better adapted to the environment survive and assure the continuity of the species. Furthermore, the fight for survival among members of the same species is more obstinate, since they must fight over shared resources; small differences, or positive deviations from the typical, are most valuable. The more obstinate the fight, the faster the evolution; in this context only the better adapted survive. However, characteristics that are positive in a specific environment may have no value in another. D. Hofstadter, in Metamagical Themas [1], discusses the arbitrariness of the genetic code. According to him, the first moral of this development is: Efficiency matters.

10 Original reference for this work: Moroni, A., J. Manzolli, et al. (2000). "Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition." Leonardo Music Journal 10: 49-54.
A second moral, more implicit, is: Having variants matters. The ratchet of evolution will advance toward ever more efficient variants. If, however, there is no mechanism for producing variants, then the individual will live or die simply on the basis of its own qualities vis-à-vis the rest of the world.

Algorithmic composition and evolution

R. Dawkins demonstrated the power of Darwinism in The Blind Watchmaker, using a simulated evolution of two-dimensional (2D) branching structures made from sets of genetic parameters. The user selects the "biomorphs" that survive and reproduce to create a new generation [2]. S. Todd and W. Latham applied these concepts to help generate computer sculptures using constructive solid geometry techniques [3,4]. K. Sims used evolutionary mechanisms of creating variations and making selections to "evolve" complex equations to be used in procedural models for computer graphics and animation [5].

A new generation of algorithmic composition researchers has discovered that it is easy to obtain new musical material by using simulated-evolution techniques to create new approaches to composition. These techniques have been useful for searching large spaces using simulated systems of variation and selection. J.A. Biles has described an application of genetic algorithms to generate jazz solos [6], an approach also studied by D. Horovitz as a way of controlling rhythmic structures [7]. On the other hand, it is difficult to drive the results in a desired direction. The challenge faced by the designers of evolutionary composition systems is how to bring more structure and knowledge into the compositional loop. This loop, in an evolutionary system, is a rather simple one: it generates, tests and repeats. Such systems maintain a population of potential solutions; they have a selection process and some "genetic operators," typically mathematical functions that simulate crossover and mutation.
Basically, a population is generated; the individuals of the population are tested according to certain criteria, and the best are kept. The process is repeated by generating a new population of individuals—or things, or solutions—based on the old ones [8]. This loop continues until the results are satisfactory according to the criteria being used. The effective challenge is to specify what "to generate" and "to test" mean.

Fig. 1. Vox Populi Reproduction and MIDI Cycles: The Reproduction Cycle is an evolving process that generates chords by using genetic operators and selecting individuals and is based on the general framework provided by J.H. Holland's original genetic algorithm. The MIDI Cycle refers to the interface's search for notes to be played by the computer. When selected, a chord is put in a critical area that is continually verified by the interface. These notes are played until the next group is selected. (© Artemis Moroni)

All evolutionary approaches do, however, share many features. They are all based, like the diagram in Fig. 1, on the general framework provided by J.H. Holland's original genetic algorithm (GA) [9] or, indirectly, on the genetic programming paradigm of J.R. Koza, who proposed a system based on evolution to search for the computer program most fit for solving a particular problem [10]. In nearly every case, new populations of potential solutions to problems (here, the problem of music composition) are created, generation after generation, through three main processes:

1. By making sure that better solutions to the problem will prevail over time, more copies of currently better solutions are put into the next generation.
2. By introducing new solutions into the population; that is, a low level of mutation operates on all acts of reproduction, so that some offspring will have randomly changed characteristics.
3.
By employing sexual crossover to combine good components between solutions; that is, the "genes" of the parents are mixed to form offspring with aspects of both.

With these three processes taking place, the evolutionary loop can efficiently explore many points of the solution space in parallel, and good solutions can often be found quite quickly. In creative processes such as music composition, however, the goal is rarely to find a single good solution and then stop; an ongoing process of innovation and refinement is usually more appropriate.

Information seen as genotypes and phenotypes

Both biological and simulated evolution involve the basic concepts of genotype and phenotype, and the processes of selection and reproduction with variation. The genotype is the genetic code for the creation of an individual. In biological systems, genotypes are normally composed of DNA. In simulated evolution there are many possible representations of genotypes, such as strings of binary digits, sets of procedural parameters or symbolic expressions. The phenotype is the individual itself, or the form that results from the developmental rules and the genotype.

Selection depends on the process by which the fitness of phenotypes is determined. The likelihood of survival and the number of new offspring that an individual generates are proportional to its fitness measure. Fitness is simply a numerical index expressing the ability of an organism to survive and reproduce. In simulation, it can be evaluated by an explicitly defined mathematical function or it can be provided by a human observer. Reproduction is the process by which new genotypes are generated from an existing genotype. For evolution to progress, there must be variations, or mutations, occurring in new genotypes with some frequency. Mutations are usually probabilistic, as opposed to deterministic.
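The three processes above (copying the fitter solutions, mutation and sexual crossover) can be sketched as a minimal genetic-algorithm loop. The 28-bit genome size echoes Vox Populi's chromosome, but the fitness function, rates and selection scheme below are illustrative assumptions, not the system's actual parameters.

```python
import random

GENOME_BITS = 28     # e.g., a four-voice chord of 7-bit words
POP_SIZE = 30
MUTATION_RATE = 0.01

def fitness(genome):
    # toy criterion for illustration: number of 1-bits
    return bin(genome).count("1")

def mutate(genome):
    # process 2: low-level random changes on all acts of reproduction
    for bit in range(GENOME_BITS):
        if random.random() < MUTATION_RATE:
            genome ^= 1 << bit
    return genome

def crossover(a, b):
    # process 3: single-point sexual crossover mixing parental "genes"
    point = random.randrange(1, GENOME_BITS)
    mask = (1 << point) - 1
    return (a & ~mask) | (b & mask)

def evolve(generations=50):
    pop = [random.getrandbits(GENOME_BITS) for _ in range(POP_SIZE)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:POP_SIZE // 2]      # process 1: the better prevail
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(POP_SIZE - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

random.seed(1)
best = evolve()
```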
Note that selection is, in general, nonrandom and operates on phenotypes, while variation is usually random and operates on the corresponding genotypes. The repeated cycle of reproduction with variation and selection of the fittest individuals drives the evolution of a population toward higher and higher levels of fitness. Sexual combination allows the genetic material of more than one parent to be mixed together in some way to create new genotypes. This permits features to evolve independently and later combine in a single genotype. Although it is not necessary for evolution to occur, it is a valuable achievement that may enhance progress in both biological and simulated evolution.

If the mechanics of an evolutionary system are well understood and the chain of causation is properly represented, the process of evolution can be stated in rather simple terms and can be simulated for engineering and art purposes. Given the complexity of evolved structures, it may be somewhat surprising that evolution here appears reduced to rather few rules [11].

In our approach, the population is made up of four-note groups, or chords, as potential survivors of a selection process. Melodic, harmonic and vocal-range fitnesses are used to control musical features. Based on the ordering of consonance of musical intervals, the notion of approximating a sequence of notes to its most harmonically compatible note, or tonal center, is used. The selected notes are sent to the MIDI port and can be heard as sound events in real time. This sequence produces a sound resembling a chord cadence or a fast counterpoint of note blocks.

Individuals of the population are defined as groups of four voices, or notes. (Henceforth, "voices" and "notes" will be used interchangeably.) These voices are randomly generated in the interval 0-127, with each value representing a MIDI event, described by a string of 7 bits. In each iteration, 30 groups are generated.
Figure 2 shows an example of a group—the genotype—internally represented as a chromosome of 28 bits, or 4 words of 7 bits, one word for each voice. The phenotype is the corresponding chord. Two processes are integrated: (1) the Reproduction Cycle, an evolving process that generates chords using genetic operators and selecting individuals; and (2) the MIDI Cycle, in which the interface looks for notes to be played by the computer. When a chord is selected, the program puts it in a critical area that is continually verified by the interface. These notes are played until the next group is selected. The timing of these two processes determines the rhythm of the music being heard. In any case, a graphic interface allows the user to interfere with the rhythm by modifying the cycles. Figure 1 depicts the Reproduction Cycle and the MIDI Cycle.

Fitness evaluation

Traditionally, Western music is based on harmony; hence, a general theory of music has to engage deeply with formal theories on this matter. The term "harmony" is inherently ambiguous, since it refers both to a lower level, where smoothness and roughness are evaluated, and to a higher aesthetic level, where harmony is functional to a given style. Moreover, harmony is a very subjective concept; the perception of harmony does not seem to have a natural basis, but appears to be a common response acquired by people in specific cultural settings. Nevertheless, while there are differences of opinion on what constitutes harmony, there is general agreement on the relative order of consonance of musical intervals. Numerical theories of consonance have tried to capture this aspect, but here again a lot is left to the imagination, as theory does not clearly define what constitutes the order of simplicity of musical intervals. In our case, we have applied, as a fitness function, a numerical theory of consonance from a physical point of view.
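The genotype just described (four MIDI voices packed as four 7-bit words in a 28-bit chromosome) can be sketched as a pair of encode/decode helpers. The function names are illustrative assumptions; the bit layout follows the description in the text.

```python
def encode(chord):
    """Pack four MIDI notes (0-127) into one 28-bit chromosome."""
    assert len(chord) == 4 and all(0 <= n <= 127 for n in chord)
    genotype = 0
    for note in chord:
        genotype = (genotype << 7) | note   # one 7-bit word per voice
    return genotype

def decode(genotype):
    """Unpack a 28-bit chromosome back into its four-voice chord."""
    return [(genotype >> shift) & 0x7F for shift in (21, 14, 7, 0)]

g = encode([60, 64, 67, 72])        # the phenotype is the chord itself
assert decode(g) == [60, 64, 67, 72]
```

Bitwise crossover and mutation on such an integer genotype automatically stay within the 0-127 range of each voice, since every 7-bit word decodes to a valid MIDI note number.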
Based on a relative ordering of the consonance of musical intervals, a sequence of notes is approximated to its most harmonically compatible note, or tonal center. Tonal centers can be thought of as an approximation of the melody, describing its flow. This method uses fuzzy formalism, or fuzzy sets, which are classes of objects with a continuum of membership grades. Such a set is characterized by a function that assigns to each object a grade of membership ranging between 0 and 1 [12]. In Vox Populi, harmony is treated as a function of the commonality, or overlap, between the harmonic series of notes. This overlap measurement is then scaled to be a value between 0 and 1, with 1 denoting complete overlap (i.e. the two notes are the same) and 0 denoting no overlap at all [13].

Fig. 2. Vox Populi MIDI chromosome: An example of a group—the genotype—internally represented as a chromosome of 28 bits, or 4 words of 7 bits, one word for each voice. The phenotype is the corresponding chord. (© Artemis Moroni)

The harmonic series of notes 60 and 64 (do and mi, in the center of the piano, according to the MIDI protocol) are depicted in Fig. 3, while Fig. 4 depicts their overlap, or consonance measure. In our approach, approximation to the tonal center is posed as an optimization problem based on physical factors relevant to hearing music. This approach is technically detailed in Moroni et al. [14]. In the selection process, the group of voices with the highest musical fitness is selected and played. The musical fitness for each chord is a conjunction of three partial fitness functions: melody, harmony and vocal range, each having a numerical value:

Musical Fitness = Melodic Fitness + Harmonic Fitness + Vocal Range Fitness

Fig. 3. Vox Populi harmonic series of notes 60 (the piano center, do) and 64 (mi). Each series represents the relative ordering of musical intervals for notes do and mi and is treated as a fuzzy set.
(© Artemis Moroni)

Melodic fitness is evaluated by comparing the notes that compose a chord to a value Id (identity), which can be modified by the composer in real time using the melodic control of the interface. This control "forces" the notes of the selected chord to be close to (or distant from) the Id value, which acts as a tonal center and is treated as an attractor. Harmonic fitness is a function of the consonance among the components of the chords. Vocal range fitness verifies which notes of the chord are in the range desired by the composer, who may modify it through the octave control.

The melodic control and the octave control allow the composer to conduct the music being created, interfering directly in the musical fitness, while the other controls simply modify attributes of the chord that has been selected. The biological and rhythmic controls also allow the user to modify the duration of the genetic cycle by modifying the duration of the evolution eras. Eras can be thought of as the number of iterations necessary to generate a new population. The combined use of the controls gives birth to sound orbits, which can be perceived through intermittent cycles.

Fitness tuning

Part of the reason why evolution in nature is very slow is that the forces of selection can be imperfect and at times ineffectual. Non-privileged individual organisms may still succeed in finding mates, having offspring and passing on their genes, while organisms with a new advantageous trait may not manage to live long enough to find a mate and influence the next generation.

Fig. 4. Vox Populi: Overlap between the harmonic series of notes 60 and 64. Note 60 can be thought of as one of the notes of the chord and note 64 as the tonal center. The sum of the heights of the components of the overlap is the consonance measure between the two notes. (© Artemis Moroni)
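The fitness machinery described above can be sketched as follows. The overall shape (a sum of melodic, harmonic and vocal-range terms, with harmony measured as the scaled overlap of harmonic series) follows the text; the triangular fuzzy membership, the distance-based melodic term and all constants are illustrative assumptions, as the actual formulation is detailed in [14].

```python
def harmonic_series(midi_note, n_partials=8):
    """First partials of a note, as frequencies in Hz."""
    f0 = 440.0 * 2 ** ((midi_note - 69) / 12)
    return [f0 * k for k in range(1, n_partials + 1)]

def membership(f, partial, width=15.0):
    """Assumed triangular fuzzy membership of frequency f around a partial."""
    return max(0.0, 1.0 - abs(f - partial) / width)

def consonance(a, b):
    """Overlap of two harmonic series, scaled to [0, 1]."""
    sa, sb = harmonic_series(a), harmonic_series(b)
    return sum(max(membership(fa, fb) for fb in sb) for fa in sa) / len(sa)

def melodic_fitness(chord, identity):
    """Notes are attracted toward the Id value set by the melodic control."""
    return sum(1.0 - abs(n - identity) / 127.0 for n in chord) / len(chord)

def harmonic_fitness(chord):
    """Mean consonance over all pairs of voices."""
    pairs = [(x, y) for i, x in enumerate(chord) for y in chord[i + 1:]]
    return sum(consonance(x, y) for x, y in pairs) / len(pairs)

def vocal_range_fitness(chord, low, high):
    """Fraction of voices inside the range set by the octave control."""
    return sum(low <= n <= high for n in chord) / len(chord)

def musical_fitness(chord, identity=60, low=48, high=72):
    return (melodic_fitness(chord, identity)
            + harmonic_fitness(chord)
            + vocal_range_fitness(chord, low, high))

unison = musical_fitness([60, 60, 60, 60])   # maximal for every term
```

With these assumptions, a unison on the Id value scores the maximum of each term, and a perfect fifth (60, 67) measures as more consonant than a minor second (60, 61), matching the agreed ordering of interval consonance mentioned in the text.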
Todd and Werner have made a charming comparison with the Frankenstein tale; Frankenstein hoped for much more than the creation of a single superior living being—he intended his creature to beget a whole new race that would grow in number and goodness, generation after generation. Later he worried that this process might not go exactly as he planned, with the children becoming more monstrous than their parents, a realization that led him to abandon his efforts to create a female progenitor. But suppose, like Frankenstein, one wants to enter the "workshop of filthy creation" [15] and replace the human composer with an artificial composition system, whether out of a wish to ease a composer's workload, an intellectual interest in understanding the composition process, the desire to explore unknown musical styles or mere curiosity about the possibilities. Vox Populi might initially have been placed only in the last group, as inspired by "mere curiosity about the possibilities," but given its surprising results, it can now be included in the first two as well. Two main approaches have been tried to express the fitness evaluation, both presenting interesting effects. The first one, derived from a composer's musical experience, provided a faster fitness evaluation. This method allows the use of a large population, 100–200 chords, producing greater diversification and resulting in a slower convergence to the best chord sequence. In the second approach, the consonance criterion is used, and a longer calculation is needed to evaluate musical fitness. In order to assure quick enough real-time performance by the system, the population was limited to 30 chords. The advantage of this approach is that it formalizes the concept of consonance mathematically: it can be easily described and flexibly programmed and modified.
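As an illustration of how such a consonance measure can be "easily described and flexibly programmed," the sketch below computes the harmonic series of a note as a fuzzy set, measures consonance as the scaled overlap of two series, and extracts a tonal center as the note most consonant with a whole sequence. It is our reconstruction from the description above, not the original Vox Populi source; the function names, the eight-partial limit and the 3% matching tolerance are assumptions:

```python
# Reconstruction sketch: consonance as scaled fuzzy overlap between two
# notes' harmonic series; tonal center as the optimally compatible note.

def harmonic_series(midi_note, n_partials=8):
    """Fuzzy set over frequency: each partial of the note gets a
    membership grade in (0, 1], decreasing with partial number."""
    f0 = 440.0 * 2 ** ((midi_note - 69) / 12)   # MIDI number to Hz
    return {f0 * k: 1.0 / k for k in range(1, n_partials + 1)}

def consonance(note_a, note_b, tol=0.03):
    """Overlap of two harmonic series, scaled into [0, 1]: 1 means the
    notes are identical, 0 means no overlap at all."""
    sa, sb = harmonic_series(note_a), harmonic_series(note_b)
    overlap = 0.0
    for fa, ga in sa.items():
        for fb, gb in sb.items():
            if abs(fa - fb) / fb < tol:      # partials (nearly) coincide
                overlap += min(ga, gb)       # fuzzy intersection grade
    return overlap / sum(sa.values())        # scale by complete overlap

def tonal_center(notes, candidates=None):
    """The note of a sequence most consonant with all the other notes."""
    cands = sorted(set(notes)) if candidates is None else candidates
    return max(cands, key=lambda c: sum(consonance(n, c) for n in notes))

print(consonance(60, 60))                       # identical notes -> 1.0
print(consonance(60, 64) > consonance(60, 61))  # major third vs minor second
print(tonal_center([60, 64, 67]))               # center of a C major arpeggio
```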
Since the musical fitness criterion used was stricter in the second example (using 30 chords instead of 100–200), the resulting sound output was less diversified; it was possible to hear the musical sequence converging to unison. This fact highlighted the notion that, in musical composition, not only consonance but also dissonance is desirable. Figure 5 depicts a Vox Populi musical output. Vox Populi differs from other systems based on genetic algorithms or evolutionary computation in which people have to listen to and judge musical items; instead, Vox Populi uses the keyboard and mouse as real-time music controllers, acting as an interactive computer-based musical instrument. It explores evolutionary computation in the context of algorithmic composition and provides a graphical interface that allows the composer to change the evolution of the music by using the mouse. These results reflect current concerns at the forefront of interactive computer music composition and in the development of new control interfaces. Interface controls use nonlinear iterative mappings. They can give rise to attractors, defined as geometric figures that represent the set of stationary states of a dynamic system, or simply trajectories to which the system is attracted. A piece of music consists of several sets of musical raw material manipulated and exposed to the listener, such as pitches, harmonies, rhythms, timbres, etc. These sets are composed of a finite number of elements, and the basic aim of a composer is to organize them in an aesthetic way. Modeling a piece as a dynamic system implies a view in which the composer draws trajectories or orbits using the elements of each set [16].

Fig. 5. Score of MIDI raw material produced by Vox Populi in an interactive session by the composer Jônatas Manzolli. In the latest Vox Populi version, the user is able to record a piece that is composed during performance.
The interactive pad control supplies a graphical area in which 2D curves can be drawn. These curves, a blue one and a red one, are linked to the controls of the interface. The red curve links to the melodic and octave range controls, and the blue curve links to the biological and rhythmic controls. When the interactive pad is active, the four other linked controls are disabled. Each curve describes a relation between the linked variables. The curves are traversed in the order in which they were created; their horizontal and vertical components are used for fitness evaluation and to modify the duration of the genetic cycles, interfering directly in the rhythm of the composition. The pad control allows the composer to conduct the music through drawings, suggesting the metaphorical "conductor gestures" used when conducting an orchestra. Using different drawings, the composer can experience the generated music and conduct it, trying different trajectories or sound orbits. The trajectories then affect the reproduction cycle and the musical fitness evaluation.

Interface and parameter control

The resulting music moves from very pointillistic sounds to sustained chords, depending upon the duration of the genetic cycle and the number of individuals in the original population. The interface is designed to be flexible enough for the user to modify the music being generated. Below is a short description of the controls available to the user interacting with Vox Populi. The melodic, biological, rhythmic and octave controls allow the composer to modify the fitness function in real time and are associated with attractors. Vox Populi's interface is depicted in Fig. 6 and in Color Plate A No. 2.

Fig. 6. Vox Populi interface. (© Artemis Moroni)

Melodic Control

The mel scroll bar allows one to modify the value Id, which is the tonal center in the evaluation of melodic fitness.
Given an ordered sequence of notes, it seems intuitively appealing to call the note that is most consonant with all the other notes the coloring, or tonal, center. Hence, extracting the tonal center of a sequence of notes involves finding an optimally harmonically compatible note. As mentioned before, in Vox Populi the consonance is measured according to the Id value. This value is obtained from the interface control and can be changed by the user.

Biological Control

The bio scroll bar allows interference in the duration of the genetic cycle, modifying the time between genetic iterations. Since the music is being generated in real time, this artifice is necessary for the timing of the different processes that are running. This value determines the slice of time necessary to apply the genetic operators, such as crossover and mutation, and may also be interpreted as the reproduction time for each generation.

Rhythmic Control

The rhy scroll bar changes the time between evaluations of musical fitness. It determines the "time to produce a new generation," or the slice of time necessary to evaluate the musical fitness of the population. It interferes directly in the rhythm of the music; any change makes the rhythm faster or slower.

Octave Control

The oct scroll bar allows enlarging or diminishing the interval of voices considered in the vocal range criterion. The octave fitness forces the notes to be in a range H, assumed to be the range of the human voice and associated with the central keys of the piano; but since several orchestras of instruments are used, this range is too limited for some instruments. We originally intended to restrict the generated voices to specific ranges in order to make those voices resemble the human voice. Nevertheless, a user can now enlarge these ranges by using the octave control.
Orchestra Control

Six MIDI orchestras are used to play the sounds: (1) keyboards; (2) strings and brasses; (3) keyboards, strings and percussion; (4) percussion; (5) sound effects; and (6) random orchestral parts, taking an instrument from the general MIDI list. Using the order above, these orchestras are sequentially changed in time segments controlled by the seg scroll bar.

Interactive Pad Control

The "Pad On" button enables and disables the pad, which takes over the controls defined above. These controls may be grouped into two pairs, which may be interpreted as variables of a 2D phase space. This allows a user to draw and orient the curve that determines the evolution of the music.

Fitness Displays

Three other displays allow the user to follow the evolution of fitness. The upper display, at the right side of Fig. 6, shows the notes and the fitness of the chord that is being played. In the middle display, a bar graph shows the four voices (bass, tenor, contralto, soprano) and their values; it is equivalent to the membership function values related to the range of the voices. The bottom display shows the melodic, harmonic and octave fitness bars.

Conclusion

Despite the fact that Vox Populi works at the level of sound events controlled by the MIDI protocol, or notes, in a macrostructural context, we learned two lessons. First, an evolutionary computational approach was successfully applied to generate complex sound structures with perceptually meaningful and efficient control in real time. Second, applications of evolutionary computation to sound synthesis may be foreseen. Complex-behavior systems have been used for sound synthesis, such as Chaosynth, which uses cellular automata to control sound structures [17]; in Chaosynth, the generation occurs via granular synthesis. In another approach, Fracwave [18] uses the dynamics generated by complex systems to synthesize sounds.
We may say that varying the fitness controls in Vox Populi promotes a "sound catastrophe," in which the previous winner may no longer be the best. Conditions for survival have changed, as they do in nature. The question we pose is: how does an idea, or concept, survive? Vox Populi is simple and efficient and has been used in different ways, which may be considered variants: as an autonomous or demonstrative system generating music; as a sound laboratory, where people can try out and experience the sound produced; and as a studio, manipulating and generating samples that have been used in compositions and in sound landscapes. Another use currently being considered is to couple the system with sensors, allowing the user to describe orbits in space that would be treated like the 2D curves supplied by the interactive pad. Will Vox Populi survive? Vox Populi means "voice of the people." Since the individuals in the population are defined as groups of four voices, we can think of them as "choirs," fighting to survive and to be present in the next generation, while the environment and survival conditions change dynamically. One of the first known proposals to formalize composition was made by the Italian monk Guido d'Arezzo in 1026, who resorted to a number of simple rules to map liturgical texts into Gregorian chants [19], due to the overwhelming number of orders he received for his compositions. The text below is attributed to d'Arezzo. His compositional approach has survived for several centuries, and even today we still seek strategies for constructing the unknown melody.

As I cannot come to you at present, I am in the meantime addressing you using a most excellent method of finding an unknown melody, recently given to us by God and I found it most useful in practice. . . . To find an unknown melody, most blessed brother, the first and common procedure is this.
You sound on the monochord the letters belonging to each neume, and by listening you will be able to learn the melody as if you were hearing it sung by a teacher. But this procedure is childish, good indeed for beginners, but very bad for pupils who have made some progress. For I have seen many keen witted philosophers who had sought out not merely Italian, but French, German, and even Greek teachers for the study of this art, but who, because they relied on this procedure alone, could never become, I should not say, skilled musicians, but even choristers, nor could they duplicate the performance of our choir boys [20].

References

1. D. Hofstadter, Metamagical Themas (New York: Basic Books, 1985) p. 694.
2. R. Dawkins, The Blind Watchmaker (London: Penguin Books, 1991) p. 313.
3. M. Haggerty, "Evolution by Esthetics, an Interview with W. Latham and S. Todd," IEEE Computer Graphics 11 (1991) pp. 5–9.
4. S. Todd and W. Latham, Evolutionary Art and Computers (New York: Academic Press, 1992).
5. K. Sims, "Interactive Evolution of Equations for Procedural Models," The Visual Computer 9, No. 9 (1993) pp. 466–476.
6. J.A. Biles, "GenJam: A Genetic Algorithm for Generating Jazz Solos," Proceedings of the International Computer Music Conference (ICMC '94) (1994) pp. 131–137.
7. D. Horovitz, "Generating Rhythms with Genetic Algorithms," Proceedings of the International Computer Music Conference (ICMC '94) (1994) pp. 142–143.
8. P. Todd and G. Werner, "Frankensteinian Methods for Evolutionary Music Composition," in N. Griffith and P.M. Todd, eds., Musical Networks: Parallel Distributed Perception and Performance (Cambridge, MA: MIT Press, 1999) p. 313.
9. J.H. Holland, Adaptation in Natural and Artificial Systems (Cambridge, MA: MIT Press, Bradford Books, 1995) p. 122.
10. J.R. Koza, Genetic Programming (Cambridge, MA: MIT Press, Bradford Books, 1998) p. 29.
11. W. Atmar, "Notes on the Simulation of Evolution," IEEE Transactions on Neural Networks 5, No. 1 (1994) pp. 130–147.
12. L.A. Zadeh, "Fuzzy Sets," Information and Control 8 (1965) pp. 338–353.
13. G. Vidyamurthy and J. Chakrapani, "Cognition of Tonal Centers: A Fuzzy Approach," Computer Music Journal 16, No. 2 (1992) pp. 45–50.
14. A. Moroni, J. Manzolli, F. Von Zuben and R. Gudwin, "Evolutionary Computation Applied to Algorithmic Composition," Proceedings of the 1999 Congress on Evolutionary Computation (CEC99) 2 (1999) pp. 807–811.
15. M. Shelley, Frankenstein or The Modern Prometheus (USA: Penguin, 1993).
16. J. Manzolli, "Harmonic Strange Attractors," CEM Bulletin 2, No. 2 (1991) pp. 4–7.
17. E.R. Miranda, "Granular Synthesis of Sounds by Means of a Cellular Automaton," Leonardo 28, No. 4 (1995) pp. 297–300.
18. F. Damiani, J. Manzolli and P.J. Tatsch, "A Non-Linear Algorithm for the Design and Production of Digitally Synthesized Sounds," Technical Digest of the International Conference on Microelectronics and Packaging (ICMP99) (1999) pp. 196–199.
19. O. Strunk, Source Readings in Music History (New York: Vail-Ballou Press, 1950) p. 123.
20. Strunk [19].

Manuscript received 18 January 1999.

Artemis Moroni is a technologist at the Automation Institute of the Technological Center for Informatics in Campinas, São Paulo, Brazil. The main topics of her research are multimedia devices applied to automation environments, evolutionary computation and technology applied to art and music. Jônatas Manzolli is a composer and the head of the Interdisciplinary Nucleus of Sound Communication at the State University of Campinas, São Paulo, Brazil. He teaches in the department of music, and the main topics of his research are algorithmic composition, gesture interfaces and multimedia devices for sound environments. F.J. Von Zuben is a member of the department of computer engineering and industrial automation at the State University of Campinas, São Paulo, Brazil.
The main topics of his research are artificial neural networks, evolutionary computation, nonlinear control systems, nonlinear optimization and multivariate data analysis. Ricardo Gudwin is a faculty member of the electrical and computer engineering department at the State University of Campinas, São Paulo, Brazil, where he conducts research on intelligence and intelligent systems, intelligent agents, semiotics and computational semiotics. His topics of interest also include fuzzy systems, neural networks, evolving systems and artificial life.

5. Abduction and Meaning in Evolutionary Soundscapes11

Mariana Shellard, Instituto de Artes (IA) – UNICAMP, [email protected]
Luis Felipe Oliveira, Departamento de Comunicação e Artes, Univ. Federal de Mato Grosso do Sul, [email protected]
Jose E. Fornari, Núcleo Interdisciplinar de Comunicação Sonora (NICS) – UNICAMP, Instituto de Artes (IA) – UNICAMP, [email protected]
Jonatas Manzolli, Núcleo Interdisciplinar de Comunicação Sonora (NICS) – UNICAMP, [email protected]

Summary. The creation of an artwork named RePartitura is discussed here under the principles of Evolutionary Computation (EC) and the triadic model of thought, Abduction, Induction and Deduction, as conceived by Charles S. Peirce. RePartitura uses a custom-designed algorithm to map image features from a collection of drawings and an Evolutionary Sound Synthesis (ESSynth) computational model that dynamically creates sound objects. The output of this process is an immersive, computer-generated sonic landscape, i.e. a synthesized soundscape. The generative paradigm used here comes from the EC methodology: the drawings are interpreted as a population of individuals, since they all share the characteristic of being similar but never identical. The set of specific features of each drawing is named its genotype. Interaction between different genotypes and sound features produces a population of evolving sounds.
The evolutionary behavior of this sonic process entails the self-organization of a soundscape, made of a population of complex, never-repeating sound objects in dynamic transformation, but always maintaining an overall perceptual self-similarity in order to keep a cognitive identity that can be recognized by any listener. In this article we present this generative and evolutionary system and describe the topics that permeate it, from its conceptual creation to its computational implementation. We underline the concept of self-organization in the generation of soundscapes and its relationship with evolutionary computational creation, abductive reasoning and musical meaning for the computational modeling of synthesized soundscapes.

11 Referência original deste trabalho [original reference for this work]: Shellard, M., Oliveira, L. F., Fornari, J. and Manzolli, J. (2010). "Abduction and Meaning in Evolutionary Soundscapes," in L. Magnani, W. Carnielli and C. Pizzi, eds., Model-Based Reasoning in Science and Technology (Berlin/Heidelberg: Springer) pp. 407–427.

1 Introduction

One of the foremost philosophical problems is to rationally explain how we interact with the external world (outside of the mind) in order to understand reality. We adopt the assumption that the human mind understands, recognizes and relates to reality through a constant and dynamic process of mental modeling. This process is here seen as divided into three stages: 1) Perception, in which the mind receives sensory information from outside through its bodily senses. This information comes from distinct mediums, such as mechanical (e.g. hearing and touch), chemical (e.g. olfaction and taste) and electromagnetic (e.g. vision). According to evolutionary premises, these stimuli are non-linearly translated into electrochemical information in the nervous system. 2) Cognition, the stage that creates models and stores and compares them against the gathered information, or against previously reasoned models. This is the information-processing stage.
3) Affection, in which emotions are aroused, as an evolutionary strategy to motivate the individual to act, to be set in motion, in order to ratify, refute or redefine the cognitive model of a perceived phenomenon. Here we introduce RePartitura, a case study in which we correlate these three stages with a pragmatic approach that combines logical principles and the synthetic simulation of creativity using computer models. RePartitura is analyzed here based on the assumption of mental model reconstruction and re-building. This cycle of model recreation has so far proved to be an eternal process in all fields of human culture, in the Arts as well as in Science. As described by G. Chaitin,12 the search for definite certainty throughout the history of mathematics has always led to models that are incomplete, uncomputable and random (Chaitin, 1990). Inspired by Umberto Eco's book "The Search for the Perfect Language," Chaitin describes the herculean efforts of great minds of science to find completeness in mathematics: Georg Cantor's unresting (and unfinished) pursuit of defining infinity; Kurt Gödel's proof that any mathematical model is incomplete; Alan Turing's realization of uncomputability in computational models; and, lastly, Chaitin's own Algorithmic Information Theory, which leads to randomness. In conclusion, "any formal axiomatic theory is fated to be incomplete." On the other hand, he also recognizes that, "viewed from the perspective of the Middle Ages, programming languages give us the God-like power to breathe life into (some) inanimate matter." So, computer modeling can be used to create artworks that resemble life's evolution in a never-ending march toward completeness, in an unreachable process of eternal self-recreation.

12 Chaitin, G. "The search for the perfect language." http://www.cs.umaine.edu/~chaitin/hu.html
RePartitura is a multimodal installation that uses the ESSynth method (Fornari et al., 2001) for the creation of a synthetic soundscape13 in which the formant sound objects are initially built from hand-made drawings used to retrieve the artistic gesture. ESSynth is a sound synthesis method that uses the Evolutionary Computation (EC) methodology, initially inspired by the Darwinian theory of evolution. ESSynth was originally constituted by a Population of digital audio segments, defined as the population's Individuals. This population evolves in time, in generation steps, through the interaction of two processes: 1) Reproduction, which creates new individuals based on the ones from the previous generation; and 2) Selection, which eliminates the individuals poorly fit to the environmental conditions and selects the best-fit individual, which creates (through the process of Reproduction) the next generation of the population (Bäck, 2000). In this way, ESSynth is an adaptive model of non-deterministic sound synthesis that presents complex sonic results, while these sounds remain bounded by a variant similarity, giving the overall generated sound something of the perceptual quality of a soundscape. In section two we introduce the conceptual artistic perspective of RePartitura. We describe the process of creating the drawing collection and mapping its graphic features, imparted by the hand-made gesture that created the drawings, into genotypes used by ESSynth to create the soundscapes. We also describe the abduction process from which the sonic meaning of a soundscape emerges. In section three, we discuss the possibility of self-organization in the computer model's sonic output, which is here claimed to describe an immersive, self-similar perceptual environment: a soundscape.
In section four we discuss the capacity of this evolutionary artistic system to emulate a creative process of abduction by expressing an algorithmic (computational) behavior here described as artificial abduction. In section five, we discuss the aesthetic meaning of the dynamic creation of soundscapes, comparing it with musical meaning in terms of its cognitive process and emotional arousal (through a "prosody" of expectations). Finally, we end this article with a conclusion, reassessing the ideas and concepts from the previous sections and offering further perspectives on the design of artificial creative systems.

2 Conceptual perspective

In this section we elucidate the interaction between the concepts that were at the genesis of RePartitura. Firstly, we relate the concept of abductive reasoning, as presented by Charles S. Peirce, to computational adaptive methodologies, such as EC. Secondly, we situate RePartitura in line with the concept of Generative Art and the idea that iterative processes can be related to the Peircean concept of habits.

13 Soundscape refers to both the natural and human acoustic environment, consisting of a complex and immersive landscape of sounds that is self-similar but always new.

2.1 Abduction and Computational Adaptive Methods

The pragmatism of Peirce points to the conceptualization of three categories of logical reasoning: 1) Deduction, 2) Induction and 3) Abduction. Abduction is the process of hypothesis building, through the generation of an initial model, as an attempt to understand or explain a perceived phenomenon. Induction tests this model against other factual data and makes the necessary adjustments. Deduction applies the established model of the observed phenomenon.
This model will be used for deductive reasoning until the advent of further information that jeopardizes the model's trustworthiness, or requires its adjustment to a change in reality (which always happens), whereupon the whole process of Abduction, Induction and Deduction creates a new model of reasoning. In this article our goal is to present a computer methodology related to Peircean pragmatic reasoning. In computational terms, it is usual to refer to an observed phenomenon as a problem. In the conception expressed in this article, we consider the Peircean triadic logical process to be related to the following methodological taxonomy: a) Deduction corresponds to Deterministic Methods, as they present predictable solutions to a problem; b) Induction is related to Statistical Methods, since they present not a single solution but a range of possible solutions to the same problem; c) Abduction is then related to Adaptive Methods, which can redefine and recreate themselves based on a further understanding of a problem, or on its dynamic change. Among computational adaptive methods, Evolutionary Computation (EC) is the one inspired by the biological strategy of adapting populations of individuals, as initially described by Charles Darwin. EC is normally used to find the best possible solution to problems when there is not enough information to solve them through formal (deterministic) methods. An EC algorithm usually seeks the best solution to a complex problem within an evolving landscape of possible solutions.
In our research group at NICS, we have studied adaptive methodologies in line with the creation of artworks, through systems such as: 1) Vox Populi, which generates complex harmonic profiles using genetic algorithms (Moroni et al., 2000); 2) RoBoser, created in collaboration with the SPECS group at UPF, Barcelona, which uses Distributed Adaptive Control (DAC) to develop a correlation between adaptive robotic behavior and algorithmic composition (Verschure & Manzolli, 2005); and 3) Evolutionary Sound Synthesis (ESSynth) (Fornari et al., 2001), a method to generate sound segments with dynamic spectral changes, using genetic algorithms in the reproduction process and the Euclidean distance between individuals as the fitness function for the selection process. ESSynth showed the ability to generate a queue of waveforms that were perceptually similar but never identical, which is a fundamental condition of a soundscape. This system was later developed further to also manipulate the spatial location of the individuals' sounds, in order to create the dynamically spreading acoustic landscape so typical of a soundscape (Fornari et al., 2008). In all of these studies, we considered that adaptive methods, such as EC, could be used in artistic endeavours. In this paper, in particular, we describe the RePartitura research, which relates a multimodal installation to the ESSynth method. Furthermore, the discussion presented here is also related to the work of Oliveira et al. (2008), which discusses the process of musical meaning and logical inference from the perspective of Peircean pragmatism. This idea is taken up in section 5, "Soundscape Meaning," where we focus our discussion on how listeners deduce general patterns of musical structures that are inductively applied to new listening situations, such as computer-generated soundscapes.
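A minimal sketch of the ESSynth generation cycle just described (Reproduction by genetic operators, Selection by Euclidean distance) might look as follows. The actual method is detailed in Fornari et al. (2001); all names, parameter values and the simple elitism used here are our assumptions:

```python
# Sketch of an ESSynth-like cycle: a population of waveform segments
# evolved by Reproduction (crossover plus mutation) and Selection
# (Euclidean distance as the fitness criterion). Illustrative only.
import random

N = 64  # samples per waveform individual (tiny, for illustration)

def random_individual():
    return [random.uniform(-1.0, 1.0) for _ in range(N)]

def distance(a, b):
    """Euclidean distance between two waveforms (selection criterion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def reproduce(best, other, mutation=0.05):
    """One-point crossover of two individuals plus small Gaussian mutation."""
    cut = random.randrange(1, N)
    child = best[:cut] + other[cut:]
    return [max(-1.0, min(1.0, s + random.gauss(0.0, mutation))) for s in child]

def generation(population, target):
    """Selection keeps the half closest to the target waveform; the
    best-fit individual breeds the next generation with the survivors."""
    ranked = sorted(population, key=lambda ind: distance(ind, target))
    best, survivors = ranked[0], ranked[: len(ranked) // 2]
    children = [reproduce(best, random.choice(survivors))
                for _ in range(len(population) - 1)]
    return [best] + children  # simple elitism: the best individual survives

random.seed(1)
target = [0.5] * N  # a stand-in "environment" the population adapts to
pop = [random_individual() for _ in range(16)]
for _ in range(30):
    pop = generation(pop, target)
print(min(distance(ind, target) for ind in pop))  # shrinks over generations
```

Because every child is a perturbed recombination of surviving individuals, successive generations stay perceptually similar to each other while never repeating exactly, which is the soundscape-like property described above.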
2.2 Habits, Drawings and Evolution

The collection of drawings that preceded RePartitura (see the example in Figure 1) was based on the concept of defining a generative process as an artwork. In particular, the process analyzed here was defined as a daily habit of repetitive actions, which lasted ten months and generated almost three hundred drawings. The action was performed by the artist's right arm in repetitive movements, bottom-up and semicircular. The movement pattern evolved over time from thick, short curves to long, narrow ones. This evolutionary characteristic of a gestural habit reflected an adaptation of the arm's movement to the area of the paper sheet. Our first assumption here was to consider this long process of adaptation, producing a visual invariance, as the creation of a visual habit.

Fig. 1. Sequence of original drawings that preceded RePartitura.

Initially, different kinds of paper sheets were tested, such as newsprint, rice paper and a type of coffee filter paper. The filter paper was better suited to the characteristics of the movement: it was resistant and absorbent, with a nice tone of slightly yellowish white. India ink was appropriate to the dynamics of the gesture and, as black is a neutral color, it did not cause visual noise. The paper size was established once the movement was stable, after a period of training. Japanese brushes and a bamboo pen were tested; the latter produced a better result, allowing a greater number of movement repetitions without loss of sharpness. Once that was defined, the material (filter paper, black ink and bamboo pen) remained the same throughout the entire process. The standardization of the material restrained the action and helped to create the habit of the arm's movement.

Fig. 2. Sequence of initial drawings created during the experimentation period.
As the gesture became a habit, the drawings stretched and the repetition was concentrated in a reduced area, showing a narrow and long curve (Figure 1) compared to the initial ones (Figure 2). During the process, new experiments occurred, resulting in new patterns, such as pouring ink on the paper to avoid interrupting the gesture in order to load the pen with ink. But, in doing so, the paper was softened by the ink, tearing easily, and this new method was discarded. The gradual and progressive adaptation of the gesture and stabilization of the drawing is considered here as a way of generating a habit, which can be associated, according to Peirce, with the removal of stimuli (Peirce, 1998, p. 261). At the same time, each drawing was influenced by the (physical and emotional) environment, which led to the disruption of the habit. Considering Peirce's affirmation that the breaking up of habit and renewed fortuitous spontaneity will, according to the law of mind, be accompanied by an intensification of feeling (Peirce, 1998, p. 262), the emotional and physical conditions involved at the moment of the action interfered in the individual gestures and resulted in accidental variations (e.g. outflow of ink or ripping of the paper), causing changes and triggering new possible repetitions. The collection of drawings shown in Figure 1 was presented as an installation named Mo(vi)mento. After that, an analysis of visual features and perceived graphical invariance led us to create a reassignment of this process in the sonic domain. This was the genesis of RePartitura. The first idea was to represent similar behaviors in different mediums. After identifying invariant patterns in all the drawings, these patterns were parameterized and used in the creation of sound objects. ESSynth was chosen because of its similarity with the artistic process that created the collection of drawings, described above, which was also characterized by an evolutionary process.
2.3 Repetition, Fragments and Accumulation Mapped into Sound Features

We developed an analytical approach in order to identify visual invariances in the original drawings and represent them in the sound domain. Our idea was to describe the habits embedded in the drawings in parametrical terms, and then use these parameters to control the computer model of an evolutionary sound generation process. We identified three categories of visual similarity in each drawing of the collection. They were named: 1) Repetitions, thin quasi-parallel lines that compose the drawing's main body; 2) Fragments, spots of ink smeared outside the drawing's main body; and 3) Accumulation, the largest concentration of ink at the bottom of the drawing (where the movement started). These three aspects are shown in Figure 3. The identity of each drawing was related to the characteristics of these three categories. An algorithm was developed to automatically extract these categories from the digital images of the drawings and attribute specific parametric values to them. The categories were related to the evolution of the gesture and to the conditions of each drawing's moment. Their evolution was characterized by the habit of the movement that created the drawings. The values of the parameters of the drawings created within the same day tended to be similar. However, at times when emotional inference and external intervention were higher, the drawings underwent a break in the gesture habit, which could be detected by the changes in the parametric values of the three categories.

Fig. 3. The three categories of graphic objects found in all drawings.

From this visual perspective, we developed a translation into the sonic features of the next stage. Initially, we established three sound durations: long-term, middle and very short. The first was associated with the Accumulation parameter and was represented by low-frequency noisy sounds. The Repetition parameter was associated with cycles of sinusoidal waves.
Fragments were related to sharp sounds varying from noisy to sinusoidal ones. This mapping is presented in Table 1.

Table 1. Mapping of formal aspects of the drawings into their sonic equivalents.

Accumulation. Drawing aspects: concentration of ink in the lower area of the drawing, characterized by ink stains. Sonic aspects: constant, long-duration, low-frequency noisy sounds.
Repetition. Drawing aspects: number of repetition curves. Sonic aspects: cycles of sinusoidal waves with average duration.
Fragments. Drawing aspects: drips of ink. Sonic aspects: very short sounds, varying from noisy to sinusoidal waveforms.

The duration of each element of the mapping was also related to the idea that Perception, Cognition and Affection can be expressed in different time scales of the sonic ambient. In this domain, the perceptive level can be related to the sensorial activation of auditory aspects, such as intensity, frequency, and phase of sounds, which is studied by psychoacoustics. Cognition is related to the sonic characteristics that can be learned and recognized by the listener. Its time scale was initially studied by the psychologist William James (James, 1890), who used the term "specious present" to refer to the seemingly present time of awareness for a sonic or musical event. It can be argued that the specious present is related to short-term memory, which can vary from individual to individual and according to the mode or range in which the musical information is perceived as a whole, such as a language sentence, a sound signal or a musical phrase (Poidevin, 2000). Some experiments have shown that, in music, its identification is on the order of one to three seconds (Leman, 2000). The emotional aspects are those that evoke emotion in the listener.
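The category-to-sound mapping of Table 1, together with the three perceptual time scales discussed above, can be sketched as a simple lookup structure. The numeric ranges and field names below are illustrative assumptions only; the original mapping was implemented in Matlab and PureData and is not reproduced here.

```python
# Hypothetical sketch of the Table 1 mapping. All numeric ranges and
# field names are assumptions for illustration, not the original values.

CATEGORY_MAP = {
    "accumulation": {  # affective level: long, low, noisy sounds
        "duration_s": (10.0, 30.0),
        "frequency_hz": (40.0, 200.0),
        "timbre": "noise",
    },
    "repetition": {    # cognitive level: the 'specious present', 1-3 s
        "duration_s": (1.0, 3.0),
        "frequency_hz": (200.0, 2000.0),
        "timbre": "sinusoidal",
    },
    "fragment": {      # perceptual level: very short events
        "duration_s": (0.01, 0.5),
        "frequency_hz": (500.0, 8000.0),
        "timbre": "noise-to-sinusoidal",
    },
}

def sonic_profile(category: str) -> dict:
    """Return the sonic parameter ranges for a drawing category."""
    return CATEGORY_MAP[category.lower()]
```

The three entries correspond line by line to the rows of Table 1, with durations chosen to reflect the perceptual, cognitive and affective time scales discussed in the text.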
Affective characteristics are associated with a longer period of time (up to thirty seconds) and may be processed with long-term memory, through which it is possible to recognize the genre of a piece of music or a soundscape. The recognition of the whole sonic environment and its association with listeners' expectations is further explored in this article, when we discuss the research of Huron (2006) and Meyer (1956).

2.4 Drawings, Adaptation and Abduction

In RePartitura, the gestures that engendered the drawings were mapped to sonic objects and became individuals within an evolutionary population that composed the soundscape. This establishes an analogy with the evolution of gesture habits throughout time. The sound objects are like a mirror for the striking differences expressed by the visual invariances of the drawing categories. The application of the EC methodology can be seen as a way of representing the drawing habits in the sonic domain, and the trajectories of these individuals (sound objects) are correlated to the evolution of the initial drawing gestures. The unique aspects of each drawing, influenced by several conditions, such as the artist's variations of affection and mood, and by environmental conditions, such as external interruptions of any sort, characterize the hidden organizing force that makes possible the adaptive evolution of habits in this system, which is a paramount characteristic of abduction. As postulated by Peirce: "... diversification is the vestige of chance-spontaneity; and wherever diversity is increasing, there chance must be operative. On the other hand, wherever uniformity is increasing, habit must be operative. But wherever actions take place under an established uniformity, there so much feeling as there may be takes the mode of a sense of reaction" (Hoopes, 1991).
The difference between drawing gestures, which generated the seed of chance for the change of habits in the sound system, is a representation of the spontaneity embedded in the process of making each drawing unique, yet similar. In our work we infer a correlation of this idea with the notion of Abduction, which Peirce defines as a "method of forming a general prediction without any positive assurance that it will succeed either in the special case or usually, its justification being that it is the only possible hope of regulating our future conduct rationally, and that Induction from past experience gives us strong encouragement to hope that it will be successful in the future" (Weiss, 1966). In another passage, Peirce correlates habits to the listening of a piece of music: "... the whole function of thought is to produce habits of action; and that whatever there is connected with a thought, but irrelevant to its purpose, is an accretion to it, but no part of it. If there be a unity among our sensations which has no reference to how we shall act on a given occasion, as when we listen to a piece of music, why we do not call that thinking. To develop its meaning, we have, therefore, simply to determine what habits it produces, for what a thing means is simply what habits it involves. Now, the identity of a habit depends on how it might lead us to act, not merely under such circumstances as are likely to arise, but under such as might possibly occur, no matter how improbable they may be. What the habit is depends on when and how it causes us to act. As for the when, every stimulus to action is derived from perception; as for the how, every purpose of action is to produce some sensible result.
Thus, we come down to what is tangible and conceivably practical, as the root of every real distinction of thought, no matter how subtle it may be; and there is no distinction of meaning so fine as to consist in anything but a possible difference of practice" (CP 5.400). Thus, meaning is pragmatically connected to habit, and habit is a necessary condition for the occurrence of action. Meaning is at the heart of actions of inquiry and of predicting the consequences of future actions. For each inquiry there is an action that occurs in a very specific way. At the core of such a process there is a very special category of reasoning (or action): Abduction. Abductive reasoning can be considered a valuable analytical tool for the expansion of knowledge, helping with the understanding of the logical process of formulating new hypotheses. In regular and coherent situations, the mind operates deductively and inductively upon stable habits. When an anomalous situation occurs, abduction comes into play, helping with the reconstruction of articulated models (the generation of explanatory hypotheses) so that the mind can be freed of doubts. We elucidate this point of view by presenting the artwork RePartitura, a computer model that uses a pragmatic paradigm to describe the creative process in the sound domain. Here we used processual gestures and adaptive computation to digitally generate soundscapes. Our focus in this article is to examine the theoretical implications of that methodology towards a synthetic approach for the logic of creativity in the sound domain involving interactive installations. The logic of discovery is a theory that attempts to establish a logical system for the process of creativity. Peirce argued that, for creativity to manifest, new habits must first emerge as signs in the mental domain, given that any semiotic system is primarily a logical system.
2.5 Computer Modeling

The computer design and implementation of RePartitura is further discussed in (Fornari et al., 2009a, 2009b). In the next paragraphs we present a brief overview of it. The collection of drawings was mapped by an algorithm written in Matlab, where the features, classified in three categories, were processed in different sonic time scales. Accumulations were mapped into a long time scale, representing the affective aspects. Repetitions went into a middle time scale, related to the specious present, as defined by William James, and thus representing the cognitive aspects of sounds. Fragments were mapped into short time scales, corresponding to the perceptual aspects. The first feature retrieved was a simple roundness metric m for each object: for m = 1, the object is a circle; for m = 0, the object is a line. The second feature was the object area, in pixels, where the object with the largest area is the Accumulation. The third feature was the object's distance to the image origin, given by its coordinates (x, y) in the image plane. We set apart Fragments and Repetitions using the value of m. The roundest objects (m ≥ 0.5) were classified as Fragments. The stretched objects (m < 0.5) were classified as Repetitions. Each object's features were mapped into a sound object genotype. The genotypes were transferred to an implementation of ESSynth written in the PD (PureData) language. The individuals (sound objects) were also designed in PD, as PD patches (in PD jargon). Our model of individual is created by the main system, as a meta-programming strategy where, to a certain extent, "code writes code". The individuals would be "born", live within the population as sound objects, and, once their lifetime was over, they would die, never to be repeated again. The initial individuals received their genotypes from the drawing mapping.
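The feature extraction described above can be sketched as follows. The original roundness equation does not survive in this transcription; the isoperimetric circularity 4πA/P² is used here as a plausible stand-in, since it equals 1 for a circle and approaches 0 for a line, matching the endpoints stated in the text. The object representation (a dict with area and perimeter in pixels) and the orientation of the 0.5 threshold are likewise assumptions for illustration.

```python
import math

def roundness(area, perimeter):
    # Isoperimetric circularity: 1.0 for a circle, approaching 0 for a line.
    # A stand-in for the paper's roundness metric m, whose equation is
    # not reproduced in this transcription.
    return 4.0 * math.pi * area / (perimeter ** 2)

def classify(objects):
    """Split segmented drawing objects into the three categories.

    `objects`: list of dicts with 'area' and 'perimeter' in pixels
    (an assumed representation, for illustration only).
    """
    # The object with the largest area is the Accumulation.
    accumulation = max(objects, key=lambda o: o["area"])
    fragments, repetitions = [], []
    for obj in objects:
        if obj is accumulation:
            continue
        m = roundness(obj["area"], obj["perimeter"])
        # Round blobs (m >= 0.5) are Fragments; stretched ones, Repetitions.
        (fragments if m >= 0.5 else repetitions).append(obj)
    return accumulation, repetitions, fragments
```

For example, a circle of radius r has area πr² and perimeter 2πr, giving m = 4π·πr²/(2πr)² = 1, while a long thin stroke has a small area relative to its perimeter, so m stays close to 0.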
After that, through the reproduction of individuals, new genotypes would be created and eliminated as the individuals died. Each genotype is described by the acoustic descriptors of a sound object. In this work, the sound object features used are divided into two categories: deterministic (melodic or tonal) and stochastic (percussive or noisy). For each category there were intensity, frequency and distortion, which would bridge these two sonic worlds (deterministic to stochastic) as a metaphor for the reasoning processes of, respectively, deduction and induction. In this scheme, abduction would be represented by the evolutionary process per se: the soundscape. These are given by the self-organization of the population of sound objects, whose overall sound output is the output of the system.

3 Self-Organizing Soundscapes

After presenting the conceptual framework related to the creation and analysis of RePartitura, we now discuss the sonic aspects of this work. Our attention is focused on the idea that a computer generative process can synthesize a sonic process that resembles a soundscape. Thus, we first present a formal definition of soundscape and correlate it to the computer model that implements the evolutionary process used here to produce RePartitura's dynamic sonification. Soundscape is a term coined by Murray Schafer that refers to the immersive sonic environment perceived by listeners, who can recognize it and even be part of its composition (Schafer, 1977). Thus, a soundscape is initially a product of the listener's acoustic perception. As such, a soundscape can be recognized by its cognitive aspects, such as foreground, background, contour, rhythm, space, density, volume and silence. According to Schafer, soundscapes can be formed by five distinct categories of analytical sonic concepts, derived from their cognitive units (or aspects): Keynotes, Signals, Soundmarks, Sound Objects, and Sound Symbols.
Keynotes are formed by the resilient, omnipresent sounds, usually in the background of the listeners' perception, and correspond to the musical concept of tonality or key. Signals are the foreground sounds that grasp the listener's conscious attention, as they may convey important information. Soundmarks are the unique sounds found only in a specific soundscape. Sound Objects are the atomic components of a soundscape. As defined by Pierre Schaeffer, who coined the term, a Sound Object is formed by sounds that deliver a particular and unique sonic perception to the listener. Sound Symbols are the sounds that evoke cognitive and affective responses based on the listener's individual and sociocultural context. The taxonomy used by Schafer to categorize soundscapes based on their cognitive units serves us well to describe them from the perspective of their macro-structure, as it is easily noticed by the listener. These cognitive units are actually emergent features self-organized by the complex sonic system that forms a soundscape. As such, these units can be retrieved and analyzed by acoustic descriptors, but they are not enough to define a process of truly generating soundscapes. In order to do that, it is necessary to define not merely the acoustic representation of sound objects but their intrinsic features, which can be used as a recipe to synthesize a set of similar yet always original sound objects. In terms of its generation, as part of an environmental behavior, soundscapes can be seen as self-organized complex open systems, formed by sound objects acting as dynamic agents. Together, they orchestrate a sonic environment that is always acoustically original but, perceptually speaking, retains enough self-similarity to enable any listener to easily recognize (cognitive similarity) and discriminate it. This variant similarity, or invariance, is a trace found in any soundscape.
As such, in order to synthesize a soundscape using a computer model, it is necessary to have an algorithm able to generate sound objects with perceptual sound invariance. Our investigation associates this perceptual need with a class of computer methods related to adaptive systems. Among them, we studied the EC methodology. In the next section, we correlate EC systems with the concept of Artificial Abduction. With these considerations, we aim to link the computer generative process to the conceptual perspective presented in Section 2.

4 Artificial Abduction

Abduction is initially described as an essentially human mental reasoning process. However, the concept has a strong relation with Darwinian natural selection, as both may be seen as "blind" methods of guessing the right solution for ill-defined problems. As such, the EC methodology, which is inspired by Darwinian theory, may be able to emulate, to some extent, abductive reasoning. This is what is named here Artificial Abduction, as explained below. Most of the ideas in this section were discussed in (Moroni, 2005). Here, we point out the main topics linked to RePartitura's creative process.

4.1 Abduction and Evolution

As already mentioned, abduction is related to the production of more convincing hypotheses to explain a given phenomenon through the relative evaluation of several candidate hypotheses, as also discussed in (Chibeni, 1996). In short, the general scheme of abductive arguments consists in the proposition of alternative hypotheses to explain specific evidence (a fact or set of facts), and the availability of an appreciation (or recognition) mechanism capable of attributing a relative value to each explanation. The best one is probably true if, besides being comparatively superior to the others, it is good in some absolute sense.
Unlike deductive arguments, the conclusion of an abductive inference does not follow logically from the premises and does not depend on their contents. Unlike inductive arguments, the conclusion does not necessarily consist of a uniform extension of the evidence. Our main concern here is simply the existence and specificity of abductive inference, and its widespread application in customary reasoning. As mentioned above, this article examines the theoretical implications of a model for the logic of creativity in the sound domain. Our aim is to relate the construction of alternative hypotheses in the search for the best explanation of a phenomenon with the possibility of simulating artificial evolution using evolutionary algorithms. EC simulates an artificial evolution categorized by hierarchical levels: the gene, the chromosome, the individual, the species, the ecosystem. The result of such modeling is a series of optimization algorithms built from very simple operations and procedures (crossover, mutation, evaluation, selection, reproduction) applied to a computer-represented genetic code (genotype). These procedures are implemented in a search algorithm, in this case a population-based search. The revolutionary idea behind evolutionary algorithms is that they work with a population of solutions subject to a cumulative process of evolutionary steps. Classic problem-solving methods usually rely on a single solution as the basis for future exploration, attempting to improve that solution. But there is an additional component that can make population-based algorithms essentially different from other problem-solving methods: the concept of competition and/or cooperation among solutions in a population (Bäck, 2000). Essentially, the degree of adaptation of each candidate solution will be determined in consonance with the effective influence of the remaining candidates.
As a competitive aspect, each candidate has to fight for a place in the next generation. On the other hand, symbiotic relationships may improve the degree of adaptation of the population's individuals. Moreover, random variation is applied to search for new solutions in a manner similar to natural evolution (Michalewicz & Fogel, 1998). This adaptive behavior produced by EC is also related here with the notion of abductive reasoning.

4.2 Evolution and Musical Creativity

Probably the most famous enquiry about the musical creative capacity of computers was formulated by Ada Lovelace. She realized that Charles Babbage's "Analytical Engine" - in essence, a design for a digital computer - could "compose and elaborate scientific pieces of music of any degree of complexity or extent". But she insisted that the creativity involved in any elaborate piece of music emanating from the Analytical Engine would have to be attributed not to the engine but to the engineer (Boden, 1998). She said: "The Analytical Engine has no pretensions whatsoever to originate anything. It can do [only] whatever we know how to order it to perform". The Analytical Engine was never built, but Babbage supposed that, in principle, his machine would be capable of playing games such as checkers and chess by looking forward to possible alternative outcomes based on current potential moves. Since then, artworks have emerged from computer models for many years. The main goal is to understand, either for theoretical or practical purposes, how representational structures can generate behavior, and how intelligent behavior can emerge out of unintelligent (machinery) behavior (Boden, 1998). The usage of EC presented here can be seen as an effective way to produce art based on an efficient manipulation of information. A proper use of computational creativity is devoted to incrementally increasing the fitness of candidate solutions without neglecting their aesthetic aspects.
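The population-based search with competition (selection) and random variation (crossover and mutation) described in this section can be illustrated by a minimal generational loop. This is a generic sketch, not the ESSynth implementation; the fitness function, operators and parameter values are arbitrary choices for demonstration.

```python
import random

def evolve(fitness, pop_size=20, genome_len=4, generations=50,
           mutation_rate=0.2, seed=1):
    """Minimal generational EC loop: evaluate, select, recombine, mutate."""
    rng = random.Random(seed)
    # Initial population of random real-valued genomes.
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # evaluation + ranking
        parents = pop[: pop_size // 2]           # competition: top half survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:     # random variation
                i = rng.randrange(genome_len)
                child[i] += rng.gauss(0, 0.1)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: genomes closer to the all-ones vector score higher.
best = evolve(lambda g: -sum((x - 1.0) ** 2 for x in g))
```

Because the best-ranked parents are carried over unchanged, the best fitness never decreases across generations; the cumulative, population-wide improvement is what distinguishes this scheme from single-solution search methods.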
A new generation of computer researchers is applying EC in search of some kind of simulation of artistic creativity in computers, with some surprising results. The ideas discussed here suggest an effective way of producing art, based on a dynamic manipulation of information and a proper use of a computational model resembling abductive processes, through EC with an interactive interface. EC seems to be a good paradigm for computational creativity because the process of upgrading hypotheses is implemented as an interactive and iterative population-based search.

5 Soundscape Meaning

The concept of musical meaning is controversial and has led to a myriad of different perspectives in the philosophy of western music, and the problems of musical meaning are conceptually even more daring when considering pure music, without words, a.k.a. instrumental music. This very distinct essence of music and its non-conceptual nature give the subject a distinct consideration in modern aesthetics. It is from the rise of the Modern Age that this kind of problem emerges, when music loses its connection with the old cosmologies that assured its proper role in human knowledge and culture. Roughly, from then on the music of the Modern Age was understood in terms of language and rhetorical analysis: a sort of special language, or the language of the emotions, as in the philosophy of the 19th century. Notwithstanding, also in the 19th century, Eduard Hanslick initiated a formalist perspective of musical aesthetics that takes music as music, without any necessary connection with emotions or natural language for its meaningfulness. Apart from the common-sense understanding of music, the formalist approach dominated musicology and related fields in the 20th century. Regarding the problem of meaning, the formalist approach led to the question of how music is understood by the human mind and how it results in affective reactions and emotions in the listener 14.
In the last century, music psychologists, still in a very formalist perspective, furnished some hypotheses on how the mind engages with musical form in (meaningful and affective) listening. Mainly, it is assumed that the mind operates logically in actively listening to music, and the models so far proposed in psychology are instantiations of a deductive-inductive perspective (Huron, 2006; Meyer, 1956). Those models claim that, by exposure to a cultural environment, the listener deduces some general patterns of music structures that are inductively applied to new listening situations, assuming the general inductive belief that the future should conform to the past. Thus, a key concept of meaning in music is expectation; a meaningful music is one with which the listener can engage structurally and predict consequent relations. Emotions arise in the struggle between the expected patterns and the actual patterns the music displays; when they are similar, there is a limbic reward for the efficient prediction; when the prediction fails, there is a contrastive valence that results in the surprise effect (see Huron, 2006). The process of acquisition of knowledge, or inquiry, as Peirce usually points out, is not sufficiently accounted for by a deductive-inductive model, for the very reason that before any deduction can be made, a hypothesis must be presented to the mind. Abduction is the logical process by which hypotheses are generated. This threefold logical model of inquiry offers another viewpoint from which to consider musical meaning and affect, not opposed to the models of music psychology but complementary to them.

14 Hanslick never denied that music induces emotions in the listener, but considered that a secondary effect, and claimed that music is meaningful not through the mimesis of emotions, as usually said, but through the perception of its structures.
In fact, through the perspective of the Logic of Discovery, creativity turns out to be a logical process, instead of a mysterious and obscure one beyond understanding. The abductive creation of hypotheses is the very basis of inquiry and, by extension, of knowledge itself. In Peirce's philosophy, this threefold logicality is involved in any process of signification, assuming the possibility of different distributions of the three kinds of reasoning in each particular case. The maxim of pragmatism, as formulated by Peirce, claims that the whole meaning of an idea is the sum of all the practical consequences of that idea. In this sense, the concept of meaning is a matter of habits and beliefs that, consequently, govern our actions. Habits and beliefs are first and foremost designed by abduction. There is, thus, a connection between logic, habit and action in the pragmatic conception of meaning. Musical (structural) listening is an action (as much as thought is an action for Peirce). As such, it is active rather than passive. This action, as any action, is guided by beliefs 15 and habits, which form a conceptual space that is the interface between the listener and his cultural ambient (Boden, 1994). It is in the coupling interaction between habits and structures that music becomes meaningful and affective. Habits are created by the logical process of abductive reasoning. In ordinary music listening, when the audience is familiar with the stimuli, i.e., it is culturally embedded and has embodied habits that respond properly to that music genre, listening might be a more deductive-inductive logical process. The more predictable the music, the more inductive its thinking action. In listening situations with unfamiliar music, or when a piece presents structures outside cultural standards, habitual action might not

15 For the relevance of belief in aesthetic appreciation see, for instance, Aiken (1949; 1951).
conform to those structures, and expectations cannot be derived properly. Such music requires a process of habit reformulation by the active listener, i.e., Abduction. The conceptual space is altered every time a new habit is called into existence, shifting the listening experiences from that moment on. That is why one can have a lifelong listening experience with one piece of music that is absolutely not the repetition of the same experience over and over again. Even if that daily appreciation is made with the same recording of the piece, the conceptual space is not the same, because it is dynamically altered by abduction processes. Signification is an emergent property of such a conceptual space, i.e., of the dynamic coupling of a listener (with his audition history embodied as habits and beliefs) and musical works (culturally embedded). Similarly, in the case of soundscapes, the conceptual space is also created and recreated by the abductive reasoning of listeners, when they recognize and even contribute to it as parts of the environment (such as in a crowded audience). Soundscapes are formed anywhere, as long as there is at least one listener to abduct them. As asked by the old riddle: "If a tree falls in the forest and no one is around to hear it, does it make a sound?" If there is no listener to abduct the meaning of the sound waves generated by this natural process, there is no soundscape, as its meaning depends upon its reasoning. In the case of RePartitura, the EC computer model that synthesizes soundscapes attempts to create a doorway through which the signification that emerged from the habits acquired by the artist during the production of the drawing collection passes into a population of sound objects whose genotypes are given by the mappings of the drawings' features.
The conceptual space of the synthesized soundscape is dynamically recreated in a self-similar fashion, which guarantees that a listener, although not (yet) able to participate in its recreation, can easily abduct its perpetuated meaning.

6 Discussion

RePartitura is a computational model that attempts to create artificial abduction, thus emulating the reasoning process of an artist creating an artwork. The artist abducts from the first insight, when he or she has the initial idea of creating a piece of artwork, and afterwards, during the process of its confection, when habits are developed while the artwork is being shaped and reshaped according to the bounding conditions imposed by the environment, whether external (e.g. material, ambient, etc.) or internal (e.g. subjective, affective, mood, willingness, inspiration, etc.). To model that in a computational system, we used an evolutionary sound synthesis system, ESSynth, based on the EC methodology, which was inspired by the natural evolution of species as described by Darwin. EC is sometimes defined as an unsupervised method of seeking solutions, mostly used for ill-defined (non-deterministic) problems. The idea of an unsupervised method capable of finding complex solutions, such as the creation of living beings, without the supervenience of an even more complex and sophisticated system, such as an "intelligent designer", is the core of Darwinism, and it is increasingly being used in a broad range of fields to try to explain the natural laws that allow systems to self-organize and/or become autopoietic. From that perspective, a complex system can emerge as habits of its compounding agents, under the influence of permeating laws that regulate their environment and their mutual interactions. Similarly, abduction can be seen as a mental process that allows us to naturally identify the self-similarity of a self-organized system.
Peirce himself acknowledges that abduction must be a product of natural evolution, when he points out that: "...if the universe conforms, with any approach to accuracy, to certain highly pervasive laws, and if man's mind has been developed under the influence of these laws, it is to be expected that he should have a natural light, or light of nature, or instinctive insight, or genius, tending to make him guess those laws aright, or nearly aright" (Peirce, 1957). As an adaptive model that generates self-organized soundscapes, considered here as embodying aesthetic value, RePartitura seems to fulfill the prerequisites of a system that presents a form of Artificial Abduction. As the sound object population of RePartitura evolves in time, so does its soundscape. Thus, new sound events can emerge during this process. In the computational implementation presented here, we did not implement interaction of the system with the external world. This can later be done using common sensors, such as those for audio (a microphone) and/or image (a webcam). Nevertheless, the soundscape will present ripples in its cognitive surface of self-similarity, which is welcome. We exhibited RePartitura for several days in an art gallery (Sesc - São Paulo, 2009), and it was interesting to realize that, despite the long hours of exposure to this sonic ambient, it did not tire the audience as much as it would have if it were given by the same acoustic information, although its overall sound was always very similar. This feature is found in natural soundscapes, such as the sonic ambient near waterfalls, forests, or by the sea. This seemingly constant sonic information has a soothing affective response for most people. Maybe this is due to the fact that our abductive reasoning is always activated to keep track of the continuity of sameness.
Expectations will, however, be minimal, as, cognitively speaking, this information does not bring enough novelty to arouse limbic reactions, such as those related to fight, flight or freeze. This prosody is smooth, being similar, yet enticing, as it brings a constant flux of perceptual change. We might say, in poetic terms, that the prosody of a soundscape is Epic, as it describes a thread of perceptual change, a cognitive never-ending sonic story, instead of Dramatic, as it normally does not startle emotive reactions in the listeners by drastic changes in their expectations (Huron, 2006). If aesthetic appreciation were governed by subjective opinion alone, there would be no means to obtain automatic forms of artistic production with some aesthetic value without a total human(artist)-machine integration. On the other hand, if the rules and laws that conduct art creation did not allow the maintenance of a set of degrees of free expression, then the automation would be complete, despite the apparent complexity of the artwork. Since both extremes do not properly reflect the process of artistic production, the general conclusion is that there is room for automation, either in the exploration of degrees of free expression, through a human-machine interactive search procedure, or in the application of mathematical models capable of incorporating general rules during computer-assisted creation. In a few words, the degrees of freedom can be modeled in the form of optimization problems, and the general rules can be mathematically formalized and inserted in computational models as restrictions or directions to be followed by the algorithm. The singular trait of each creation will be understood as the result of a specific exploration of the search space, by the best blend of free attributes among all possibilities.
7 Conclusion

We began this article by describing how the drawings used in RePartitura explored the development of a gesture over a period of time. The drawings showed pattern changes according to the day of their execution, and this pattern variation was associated with physical and physiological influences. The analysis of pattern variation led us to associate the formation of the gesture with the acquisition of habit and its breaking. The acquisition of habit was associated with the gradual and progressive aspect of the drawings (an elongated, narrow curve aspect), while the breaking of habit was associated with the influence of chance (resulting in drawings with an overflow of ink). The former was characterized by drawings with less visual information, the latter by more. In turn, all these ideas were associated with the Peircean perspective on the formation of habits. In RePartitura we used ESSynth for the creation of computer-generated soundscapes in which the formant sound objects are generated from the patterns and invariances of the drawings. The image invariances were identified and parameterized to create genotypes of sonic objects, which became individuals within a sonic evolutionary ambient. The sound objects orchestrate a sonic environment that is always acoustically original but that, perceptually, retains enough self-similarity to enable any listener to easily recognize and discriminate it. Soundscape meaning differs from musical meaning due to the absence of a prior and paradigmatic syntax. Soundscapes have a discourse that is less affective and more perceptual and cognitive, thus differing from the traditional aesthetics of Western music. However, some relations can be observed if one compares the components of soundscapes with traditional concepts employed in music analysis.
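The pipeline summarized above, from drawing invariances to genotypes to an evolving population of sound objects, can be sketched as follows. This is a simplified illustration under assumed names and mappings: the feature set (curvature, extent, ink density), the parameter mappings, and the mutation step are hypothetical and do not reproduce the published ESSynth algorithm or RePartitura's actual feature extraction.

```python
import random

def genotype_from_drawing(curvature, extent, density):
    """Map hypothetical image invariances (each normalized to [0, 1])
    to synthesis parameters of a sound object."""
    return {
        "freq": 110.0 * (1.0 + 3.0 * curvature),  # Hz: higher curvature -> higher pitch
        "dur": 0.2 + 1.8 * extent,                # s: longer strokes -> longer events
        "amp": 0.1 + 0.9 * density,               # denser ink -> louder
    }

def step(population, mutation=0.05):
    """One generation: small multiplicative mutations keep the soundscape
    self-similar (recognizable) yet never acoustically identical."""
    return [
        {k: v * (1.0 + random.gauss(0.0, mutation)) for k, v in g.items()}
        for g in population
    ]

# A small population seeded from (here, randomly generated) drawing features.
pop = [genotype_from_drawing(random.random(), random.random(), random.random())
       for _ in range(8)]
pop = step(pop)  # each call yields a perceptually similar but novel generation
```

The design point this sketch tries to capture is the one made in the text: the population drifts continuously, so no two generations sound the same, but the drift is small enough that the soundscape's identity is preserved.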
For instance, a soundmark or a signal may play the role that a theme or motive usually has; motivic developments are made on similarities and differences in the spectromorphology of sound objects, and the relations among these sound objects are unique to each composition, just as the thematic development of a symphony has no other similar to it. But despite these similarities, the absence of a priori syntactical rules makes the listening less directional and open to other, alternative ways of understanding it. The signification over this less directional listening nevertheless occurs through the very same logical processes: a deductive-inductive basis updated and adapted by abductive inferences. Soundscape meaning, however, is more abductive because it has no a priori syntactical rules of development that can be presumed by the listener and incorporated into his listening habits and aesthetic beliefs. Thus, each soundscape is a unique aesthetic experience that calls on the logic of guessing more often in order to be understood. We may say that evolutionary soundscapes are twice abductive, as adaptation and abduction occur together in such a sonic environment: in its algorithmic generation as well as in its listener's meaningful and affective appreciation of it as a piece of art.

References

1. Bäck T, Fogel DB, Michalewicz Z (eds) (2000) Evolutionary Computation 2: Advanced Algorithms and Operators. Institute of Physics Publishing
2. Boden M (1996) What is creativity? In: Boden M (ed) Dimensions of Creativity, pp 75-117. MIT Press, London
3. Boden M (1998) Creativity and Artificial Intelligence. Artificial Intelligence 103:347-356
4. Csikszentmihalyi M (1996) Creativity: Flow and the Psychology of Discovery and Invention. HarperPerennial, New York
5. Chaitin GJ (1990) Information, Randomness and Incompleteness. World Scientific, Singapore. ISBN 981-02-0154-0
6. Chibeni SS (1996) Cadernos de História e Filosofia da Ciência, Series 3.
Centre for Epistemology and Logic, Unicamp, 6(1):45-73
7. Fornari J, Manzolli J, Maia Jr A, Damiani F (2001) The Evolutionary Sound Synthesis Method. In: Proceedings of ACM Multimedia, Toronto
8. Fornari J, Maia Jr A, Manzolli J (2000) Soundscape Design through Evolutionary Engines. Special Issue "Music at the Leading Edge of Computer Science". JBCS - Journal of the Brazilian Computer Society. ISSN 0104-6500
9. Fornari J, Shellard M, Manzolli J (2009) Creating Evolutionary Soundscapes with Gestural Data. SBCM - Simpósio Brasileiro de Computação Musical
10. Fornari J, Shellard M (2009) Breeding Patches, Evolving Soundscapes. 3rd PureData International Convention - PDCon09, São Paulo
11. Harman G (1965) The Inference to the Best Explanation. Philosophical Review 74(1):88-95
12. Holland JH (1998) Emergence: From Chaos to Order. Helix Books, Addison-Wesley
13. Huron D (2006) Sweet Anticipation: Music and the Psychology of Expectation. The MIT Press, Cambridge
14. Manzolli J (1996) "Auto-organização: um Paradigma Composicional". In: Debrun M, Gonzales MEQ, Pessoa Jr O (eds) Auto-organização: Estudos Interdisciplinares, pp 417-435. CLE/Unicamp, Campinas
15. Meyer LB (1956) Emotion and Meaning in Music. University of Chicago Press, Chicago
16. Manzolli J, Verschure P (2005) Roboser: a Real-World Composition System. Computer Music Journal 29(3):55-74
17. Moroni A, Manzolli J, Von Zuben F, Gudwin R (2000) Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition. Leonardo Music Journal 10:49-54
18. Moroni A, Manzolli J, Von Zuben F (2005) Artificial Abduction: a Cumulative Evolutionary Process. Semiotica 153(1/4):343-362. ISSN (Online) 1613-3692, ISSN (Print) 0037-1998
19. Oliveira LF, Haselager WFG, Manzolli J,
Gonzalez (2008) "Musical meaning and logical inference from the perspective of Peircean pragmatism". In: Tsougras C, Parncutt R (eds) Proceedings of the IV Conference on Interdisciplinary Musicology (CIM08), Thessaloniki, Greece
20. Peirce CS (1931-1965) The Collected Papers of Charles S. Peirce, 8 vols. Harvard University Press, Cambridge. (References to Peirce's papers are designated CP followed by volume and paragraph number.)
21. Peirce CS (1957) Essays in the Philosophy of Science, Tomas V (ed). Bobbs-Merrill, New York
22. Peirce CS, Hartshorne C, Weiss P (1966) Collected Papers of Charles Sanders Peirce, Volumes V and VI: Pragmatism and Pragmaticism and Scientific Metaphysics. ISBN 0-67413-802-3
23. Peirce CS, Hoopes J (1991) Peirce on Signs: Writings on Semiotic. The University of North Carolina Press
24. Schafer RM (1977) The Soundscape. ISBN 0-89281-455-1
25. Truax B (1978) Handbook for Acoustic Ecology. ISBN 0-88985-011-9