David Ralley
University of Illinois
Department of Computer Science
1304 Springfield Ave.
Urbana, Ill., 61801
david@cmp-rs.music.uiuc.edu
Abstract: One of the fundamental challenges of music composition is the development of existing material. Melodic development is often described as an organic process in which key characteristics of a melody are re-arranged, isolated and permuted to form new material. This paper explores the possibility of using an interactive genetic algorithm as a means for developing a user supplied melodic idea, freeing composers from finding developmental material, and allowing them more time evaluating possibilities proposed by the system. Results are mixed, showing the limitations of the human echoic memory in interactive aural evaluations.
1 Introduction
The process of musical composition has at its core two problems, one of which is common to other artistic endeavors--discovery of novel and interesting material, and one which is more unique to music--the processing of ideas through systematic permutation and recombination of themes. The fact that variations of a musical idea play such an important role in serious music, and that these variations are somehow derived through recombination and mutation of original material suggests that the process of finding developmental material from extant melodies might be a problem that could be undertaken by a genetic algorithm (GA).
This paper will describe an application designed to produce candidate melodic strings for the user to rate, and then recombine using GA techniques. There will be a brief discussion of related applications of AI techniques to algorithmic composition, a look at some other attempts at using GAs for musical purposes, followed by a discussion of the theories and implementational issues used in this application. Finally, the paper will conclude with a summary of the experimental results from two separate experiments and a brief discussion of the strengths and weakness of this system. The reader is referred to Goldberg (1989) for details of GA construction and operation.
2 Related work
The process of automating musical composition has intrigued composers as far back as 1757, when Johan Phillip Kirnberger published Ever Ready minuet and polonaise composer (Hedges, 1978). The twentieth century has seen a dramatic increase in the use of algorithms for compositional purposes. Almost every major strategy that has been used in artificial intelligence problems has been employed at one time or another. Ebcioglu (1988) used an expert system, Millen (1990) used cellular automata, constraint satisfaction has been widely used by people such as Ovans (1990), Courtot (1990) and Langston (1991); Schwanauer (1988) has used a generate-and-test algorithm with backtracking, and Todd (1991) used a neural net.
Hartmann (1990) employed GA-like techniques in a compositional program LUDWIG, which used dissonance levels within harmonies as a fitness function, but met with poor results. The first direct use of GAs on a musical problem was the work of Horner and Goldberg (1990), which attempted to find bridge material that transformed one musical theme into another. Used for finding transitional material in minimalistic pieces, the results were quite successful. McIntyre(1993) imitated Ebicioglu's CHORAL system using a GA, and Putnam (1994) has done work in evolving new compositions using GAs in a internet-accessible system but the enormous amount of noise involved in a distributed, subjective evaluation has left Putnam's results decidedly mixed.
Of these four GA based composition programs, three attempt to solve small, highly constrained subproblems in the larger compositional space. Putnam's research attempts a very general tool for musical composition, and it is seminal that his system also uses interactive evaluation, for it is apparent that if the composer wishes to retain some stylistic involvement in the system's final product this sort of interaction must take place.
Using human interaction as a means for evaluation in a GA has an important precedent in Cladwell and Johnson's (1990) criminal identification program, which required users to rate potential suspect faces based on similarity to the actual suspect. Obviously, the introduction of human evaluation means that the evaluation function will not be as error free as it could be, but being able to separate noise from the evaluation signal is another of a GAs strong suits (Goldberg, 1991). The use of even a medium sized population can become tedious for a human evaluator to deal with, which lead this application to employ cluster analysis as outlined by Sprath (1980) and Yin and Germay (1993) as a means of data reduction, allowing the user to choose how many individuals per generation will be interactively evaluated.
In this work, the measurement of similarity during clustering was done by using a simple string matching algorithm that looked for the longest match between two strings of notes. The actual similarity function is:
St=W1(Pp)+W2(Pi)+W3(Pr)+W4(Pri)+W5(Prel)
where
St is the overall similarity metric
Pp is the prime pitch similarity
Pi is the inverted pitch similarity<
Pr is the retrograde pitch similarity
Pri is the retrograde inversion similarity
Prel is the relative pitch similarity
W1..W5 are the user supplied weights for evaluation
Relative values deal with the distance between two successive elements in the underlying alphabet. Over the English alphabet, for instance, the word
L O N G F E L L O W uses alphabet positions
12 15 14 7 6 5 12 12 15 23 so the relative offsets would be
0 3 -1 -7 -1 -1 7 0 3 8
Any type of transposition encoding (where the alphabet has been rotated to start on a different letter) of a word would have a different set of absolute letter values, but would share the same relative offsets. Similar changes happen to melodic strings over an alphabet when the melody is transposed to a different key. Retrograde is the reversing of a string, and inversion, in the context of music, means that the relative offsets of a melody have all been multiplied by -1. Retrograde inversion is the combination of the last two transformations.
3 Representation
The representation of melodies includes two parts, a header, and a list of melodic events. Notes are stored as intervallic offsets from some starting pitch and starting index in an alphabet, which are specified in the header. In this way, many of the standard musical transformations, such as changing from one alphabet to another (from a major key to a minor key for instance), transposition or inversion are very simple to perform. Assuming that alphabet 0 is a major scale Mary Had a Little Lamb, shown in figure 1, would be represented by:
(71 6 0)(0 -1 -1 1 1 0 0 -1 0 0 1 2 0)
Figure 1: Sample Input
The header information shows that this melody begins on note 71 (B above middle C), which is the sixth note of scale 0.
Most mutations alter a single bit, changing one note into an adjacent note in the alphabet. If the bit happens to be in the header, can change the entire pitch content of the individual. Mutations on the body of the string are implemented so that they effect one note, not subsequent notes. Additional mutations can reverse a string, or invert it by multiplying all the pitch offsets by -1.
The initial population was seeded with melodies similar to the user's input melody. To do so, some rudimentary spectral analysis was performed, finding the distribution of pitches and intervals in the user's input. This information was then used to determine initial melodies stochastically so that the population retained a certain amount of similarity to the starting melody. The initial population might be represented by a Venn diagram like figure2.
Figure 2: Venn diagram of initial population
Each population is subdivided into some number of clusters (in this case 3), and from each cluster there is one representative centroid chosen (points A, B and C) The user's ranking of these points then determines the mix of melodies from the various subsets of the current generation that will be included in the next. It is conceivable that mutations might even push the boundaries of the initial melody space subset, introducing new material that had previously not been considered.
The size of the search space is dependent on the length of the initial melody and the size of the pitch alphabet. For this experiment the size of the alphabet was limited by the pitch and intervallic material in the initial population, and never exceeded 12. Strings of length 12 were used for most of the runs, so the search space was bounded by 1212, which is on the order of 9.1012 possibilities. The fundamental building blocks in this problem are sections melodic material, which match some section of the goal melody.
The selection scheme uses binary tournament selection with replacement as outlined in Oei et al (1991), which has been to less likely to be susceptible to sampling noise than proportional selection. Reproduction is standard one point crossover.
The user is given control over the number of elements in the population, the number of clusters evaluated after each generation, the frequency of mutation operators, as well as the mix of the similarity submetrics described above, so as to encourage cluster groupings that are meaningful to the user. For the test system, all notes were given the same duration, all similarity metric weights were set at .5, and the mutation rate was set to .001. A population size of 100 was used, as this was proportional to the string length and alphabet sized used in the sample input.
4 Experimental Results
There were no problems deriving interesting output from recombination and mutation of the user's input. The GA was able to produce a large number of interesting melodic variations of initial material. If anything, the system was too cautious, as there was a propensity toward homogeneity caused by the mediating tendencies of the population seeding mechanism, coupled with the averaging aspects of the clustering algorithm. A sample result from the 15th generation is presented for the reader's own evaluation. If the unchanged rhythmic elements are ignored can the listener perceive the origins of this melody? Is the end result, shown in figure 2 an improvement on Mary Had a Little Lamb?
The real problem with such a system is its subjectiveness. There is no way to really measure the success or failure of such a system if the solution the user desires is not predetermined, and the acceptability of the final melodic material is entirely up to the user. The question that had to be asked in order to get quantitative results is "If a composer had a particular goal in mind, could the system be directed towards that goal?"
Figure 3: Experimental Results
The answer, perhaps not surprisingly, is "No". Even with simple input and goal melodies as short as 5 notes, it is almost impossible to tell which of these candidates is closer to the goal than the others. As a result the evaluation noise is high, and the GA is not able to make headway towards the goal state.
What if the noise inherent in the human evaluation was removed from the system? To do this, the clustering mechanism was removed, and the similarity metrics that had previously been used to determine proper cluster placement were now used to evaluate the similarity, and thus fitness between candidate solutions and the goal. This method varied in success, sometimes yielding excellent results, and other times failing miserably.
It was quickly discovered that there were two flaws to this technique, one which involved the fitness function itself, and the other which involved the types of mutation operators that were in use. We begin by looking at the problems with the string matching fitness function.
When Horner and Goldberg produced their melodic GA, they used a dual fitness function for judging the similarity between a candidate solution and the goal, one component was a comparison of alphabet mix, which looked at how many of the same notes were in the candidate and the goal, regardless of position, and the second looked at absolute matches between the two based on position. The two metrics were then averaged to give the overall fitness. The current project avoided using the position by position comparison because it was not generalizable to compare strings of different lengths. There is a problem with not using any position-wise matching, however, as any positionless evaluation function must overcome successively higher order Hamming cliffs as building blocks are built in the wrong place, often thwarting their improvement because of their position. The worst case comes with order n-1:
goal: A B C D E
candidate: F A B C D
Here the GA has built a string that almost matches, but now must
change all n positions to improve the fitness of the candidate string. There is
some luck involved, for the GA may begin building correct partial strings
anywhere, and
they may happen to match the goal, but more often than not an alignment
problem is faced. To try to remedy this problem a rotation mutation,
capable of moving the last element to the first position, or the first
element to the last position was added. Even with such a mutation
however, the GA still has no guidance as to how to fill the non-matching
positions, and this must either be done through good crossover, or
mutation. Since the worst Hamming cliff is faced late in the run, it is
often the case that the population diversity no longer supports the needed
gene in a mismatched position. Mutations can be used to alter these non-
matching positions, but with just the string matching algorithm as a guide
for the fitness function, the resultant search is blind, not heuristic.
The other problem with this system is that the mutations that were seen as basic from a musical perspective, were really quite large from a string matching perspective. For instance, suppose the goal string is ABCDE, and one candidate is ABFDE, which is close to being correct. A transposition of this individual might yield BCGEF, which preserves the relative relations between elements in the string, but destroys the absolute similarity between the individual and the goal. If fitness is an average of absolute and relative similarity, then a single mutation could halve the creature's fitness, by eliminating all absolute similarity. For a musician transposition is a simple, fundamental mutation, but for a string matching algorithm, mutations that change the entire string are too great to be beneficial.
Given this knowledge about flaws in the mutation operators and the fitness function, the system was modified to add the Horner and Goldberg metrics of position matching and mix to encourage good material to be retained early, and to encourage partial strings to be built in the matching positions. In addition, all the mutation operators that performed anything other than bit mutations (changing one note into an adjacent note) were eliminated. Under these conditions the GA was much more successful at handling the Hamming cliff mentioned above, as it had a guide as to which optima in the space was the correct one.
One further problem was the loss of good potential solutions from the population due to sampling error. Since binary tournament selection samples the population randomly for new parents, it was not difficult for a single beneficial mutation to be lost during the run. To further encourage retention of the fittest creatures from one generation into the next, a simple caching mechanism was built in, and this shortened the number of generations needed to converge close to the goal state.
5 Summary & Conclusions
There were really two separate problems being addressed by this paper, one of which was the use of interactive evaluation as a means for moving towards an implicit goal state, and the other is using the system to move towards an explicit goal. The first goal of this project, to use a GA as a tool for searching melody space interactively was essentially successful, in that it produced new and novel musical material derived from the users input. Judging this success was subjective however. The user had no explicit goal state in mind, so there was no way to determine if they were able to lead the GA to a particular solution, or if the GA was simply adept at producing interesting random strings.
To use this system to move towards an explicit goal, it was quickly determined that users were unable to judge the relative similarity of two candidate solutions to the goal effectively, and this was particularly true if both candidate solutions were very dissimilar from the goal. The noise inherent in this evaluation was too great to be handled by the GA effectively. Eliminating the use of human evaluation improved the signal to noise ratio, but served to highlight differences between the kinds of changes a string matching algorithm and a musician see as being atomic. In general, composers take larger steps through the search space with mutations, and GAs prefer to take smaller steps that preserve building blocks.
There are two issues that are key to making such a system work, one has to do with the choice of similarity/fitness metrics, and the other with human aural perception and memory. Similarity metrics should be intelligent enough to recognize the similarity between two sequences that have been transposed or inverted, and be able to weigh the absolute and relative similarities in such a way that the musically "simple" mutations applied to the strings don't radically change the string's fitness. For instance, if one of two identical melodies has been transposed by a step, then the fitness operators need to recognize that the two melodies are still, perhaps, 90% alike, not that the transposition has removed 50% of the similarity.
The other issue has to do with human perception. Melodies exist in time, and must be evaluated by listening to a series of notes strung one after another. Accurately judging the similarity of two melodies to a goal based on listening is almost impossible, especially if neither of the melodies are similar to the goal. The longer the melody, the more difficult a task this becomes, which calls into question the persistence of a listener's echoic memory over time. Perhaps the reason for Cladwell and Johnson's success in criminal suspect identification can be attributed to the larger size of our iconic memory, and the speed with which that buffer can be reloaded, compared to aural information. (Baddeley, 1982)
This paper raises more questions than it answers. Some of these questions have to deal with algorithmic decisions made during the course of the experiment: would niching be a more effective means of data reduction than clustering? Could increasing tournament size during selection increase the speed and efficiency of the search? Could the distance between an incorrect element in a string and the correct one in the alphabet be used to guide the proper mixing of notes in the melodies? Would using messy GA techniques be an effective way of evaluating similarity, as well as for handling mutations that could allow the population to contain melodies of differing lengths?
This system could easily be expanded to cover any aspect of musical language that can be quantized into a dictionary. Rhythm is one such obvious extension, but it is not inconceivable that timbre might also be developed using GA techniques.
Although this paper calls into question the ability of a human evaluator to guide a musical GA through melody space to an explicit goal, it does show that a GA can be used as a tool for music composition, either though interactive evaluation and subjective success, or through algorithmic evaluation towards a single goal state. There is no lack of room for future research along these lines, as the complexity of musical composition is high, but it is hoped that this paper shows that there is promise for use of a GA as a general compositional tool.
References
Baddeley, A.D. (1982). Your memory: a user's guide. New York: Macmillan.
Cladwell, C., & Johnson, V.S. (1991). Tracking a criminal suspect through "face space" with a genetic algorithm. Genetic Algorithms and their Applications: Proceedings of the Fourth International Conference on Genetic Algorithms, 416-421.
Courtot, F. (1990). A constraint-based logic program for generating polyphonies. In S. Arnold and G. Hair. (Ed.), Proceedings (pp. 292- 294). Glasgow: ICMC Glasgow.
Ebcioglu, K. (1988). An Expert system for harmonizing four-part chorales. Computer Music Journal, 12(3) 43-51.
Goldberg, D.E. (1989). Genetic algorithms in search, optimization and machine learning. Reading, Addison-Wesley.
href="http://uxh.cso.uiuc.edu/~gedept/ge/directory/faculty/Goldberg.html ">Goldberg, D.E. (1991). Genetic algorithms, noise, and the sizing of populations. (IlliGAL Report No. 91010). Urbana, Ill.: University of Illinois at Urbana-Champaign.
Hartmann, P. (1990). Natural selection of musical identities. Proceedings of the International Conference on Computer Music, 103-106.
Hedges, S.A. (1978). Dice music in the eighteenth century. Music and Letters 59, 180-187.
Horner, A. & Goldberg D. E. (1991). Genetic algorithms and computer-assisted composition. Genetic Algorithms and Their Applications: Proceedings of the Fourth International Conference on Genetic Algorithms, 437-441.
Langston, P. (1991) IMG/1: An incidental music generator. Computer Music Journal,15(1), 29-39.
McIntyre, R.A. (1993) Bach in a box: the evolution of four part baroque harmony using the genetic algorithm. In J. Koza (Ed.) Genetic Algorithms at Stanford 1993 (pp.135-144). Stanford: Stanford University Press.
Millen, D. (1990) Cellular automata music. In S. Arnold and G. Hair. (Ed.), Proceedings (pp. 314-416). Glasgow: ICMC Glasgow.
Oei, C.K., Goldberg, D.E. & Chang, S-J. (1991). Tournament selection, niching and preservation of diversity. (IlliGAL Report No. 91011). Urbana, Ill.: University of Illinois at Urbana- Champaign. Ovans, R. (1990) Music composition as a constraint satisfaction problem. In S. Arnold andG. Hair. (Ed.), Proceedings (pp. 317-319). Glasgow: ICMC Glasgow.
Putnam, J.B. (1994) Genetic programming of music. Unpublished manuscript viewed April 20, 1995 at http://nmt.edu/~jefu/notes.html.
Schwanauer, S. (1988) Learning machines & Tonal composition. In Proceedings of the First Workshop on Artificial Intelligence and Music. (pp. 121-135). Boston: MIT Press.
Spath, H. (1980) Cluster analysis algorithms. Chichester: Ellis Horwood Limited.
Todd, P.M. (1991) A connectionist approach to algorithmic composition. In P. Todd andG. Loy (Eds.), Music and Connectionism. Cambridge, Mass: MIT Press.
Yin, X., & Germay, N. (1993). A fast genetic algorithm with sharing scheme using cluster analysis methods in multimodal function optimization. Artificial Neural Nets and Genetic Algorithms: Proceedings of the International Conference, 450-457.