Music Analysis and Generation in Video Games

Wednesday, March 11, 2009

The function of sound in games

The use of interactive sound in video games has largely been the work of people working independently, reinventing the wheel on a case-by-case basis. Very little thought has been put into defining a framework general enough to be used in a wide range of applications. Communities of composers and programmers can address this problem informally, and conferences are helping by featuring audio and music tracks, but innovation will come faster if there is a way to analyze why certain techniques work where others fail.

This hasn’t happened yet for video games, partly because of the nature of the industry. Working on one project after another often leads designers to be shortsighted: they solve the problem in front of them, drawing only slightly on previous work and the work of their peers. It’s also partly because video game music (Nobuo Uematsu fans excluded) often doesn’t get the attention it deserves from media and consumers. Sound is very important for drawing people into a game, and it does so in such subtle ways that people don’t realize the effect it has on them. In fact, when you do notice the music it’s often because something is off. Mahito Yokota (audio designer of Super Mario Galaxy) says that “players are able to focus better on the game play” when the game tempo and music tempo are synchronized in a meaningful way (http://us.wii.com/iwata_asks_vol3_index.jsp).

So what would such a framework look like?
Studying music, I spent most of my time determining how notes, chords and counterpoint functioned in a larger piece of music. Most time in composition class was spent justifying musical choices to my teacher. Nothing was really disapproved of as long as it had a functional reason for existing. This approach can obviously be taken too far, become overly academic and lead to terrible-sounding music with highly detailed internal rules, but for its purpose, which is to teach rather than to create, it works great. It teaches the student to understand how each note functions within the context of the whole piece. Although you may not explicitly use this knowledge when composing, it can still inform your choices, give you places to start and let you know why something just doesn’t sound right.

And it doesn’t take a huge leap of faith to see how this can be applied to music in games: all we need to do is find a way to analyze how sound functions in game space. Luckily there’s a lot of overlap in how games and music unfold over time.

Saturday, August 23, 2008

Analyzing songs to create in-game objects

Concept: Each song becomes a unit - generating a character or an object that can be used in a game from a single song

This concept of using music in games borrows from Song Summoner and Monster Rancher. It's easy to take their idea of using a common resource (MP3s or CDs) and improve on it. Basically the question is: How do we create a character that actually represents the song you're creating it from?
The type of analysis for this technique lends itself to global attributes of a song, such as tempo, overall melodic stability, overall energy, etc. This is because the song isn't unfolding over time. Instead we are treating the song as a single object, kind of like a box that we can peer into with MIR (music information retrieval) techniques.
So what can we extract that would be relevant for a user? How would the user feel a character relates to a certain song? This is a difficult question for me to answer since I have been trained to recognize certain musical attributes. For example, "harmonic stability" means something very real to me, but when I try to point out how it changes in a song to someone untrained, they don't know what I'm talking about. On the other hand, if some basic attributes are directly recognizable by the average user, then we can include some more esoteric features and lose nothing if they go unrecognized.

So we need features that can be arranged along a single axis (like tempo: a value from "slow" (50?) to "fast" (120?)). A character should not be penalized for being anywhere on the axis. For example, if we tie tempo only to character speed, then a slow tempo would create a slow character with no advantage. Instead the two ends of the axis should be opposite but positive characteristics. This way each character can accurately represent the song. Finally, the mapping should avoid genre cliches (such as distorted guitars = AN ANGRY CHARACTER!!~!!), since for every example where the shoe fits there are 20 examples you didn't think of that don't work that way.

Features into attributes - Initial mappings

1) Average Tempo

Slow                       Fast
<----------------------------->
Slow Speed,         Fast Speed,
High Defense        Low Defense

Explanation: The faster the song, the faster the character is. Since there is no advantage to being slow, another attribute should be tied in to make being slow advantageous. In this case a slow song creates a slow-moving character with high defense.
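A minimal sketch of this tempo axis, not a finished design. The axis ends reuse the tentative "50?" and "120?" values from above; the function name `tempo_to_stats` and the 100-point budget are assumptions made up for illustration.

```python
def tempo_to_stats(bpm, slow_bpm=50.0, fast_bpm=120.0, total_points=100):
    """Map average tempo onto a speed/defense trade-off.

    slow_bpm and fast_bpm are the tentative axis ends from the post;
    total_points is an assumed budget shared by the two stats.
    """
    # Clamp so songs outside the axis land on one of its ends.
    clamped = max(slow_bpm, min(fast_bpm, bpm))
    # 0.0 means the slow end of the axis, 1.0 means the fast end.
    position = (clamped - slow_bpm) / (fast_bpm - slow_bpm)
    speed = round(total_points * position)
    # Whatever speed gives up, defense gains, so no song is penalized.
    defense = total_points - speed
    return {"speed": speed, "defense": defense}
```

Because the two stats always add up to the same budget, a slow ballad and a fast punk song produce characters of equal overall strength with opposite strengths.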

2) Overall Energy

Low energy          High energy
<----------------------------->
Healing               Attacking

Explanation: The louder a song is (or more compressed a song is) the more the character is orientated towards attacking. So quiet songs become healing characters and loud, aggressive songs become more attack orientated characters.
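One way this axis could be computed, as a sketch only. The post does not name an energy measure, so root-mean-square (RMS) level is used here as an assumed stand-in for "overall energy"; the function names and the idea of splitting a character between attack and healing weights are hypothetical.

```python
import math


def rms_energy(samples):
    """RMS level of audio samples assumed to be normalized to -1.0..1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def energy_to_role(samples, max_rms=1.0):
    """Split a character between attacking and healing based on loudness.

    max_rms is an assumed calibration value: the RMS level that counts
    as a fully attack-oriented song.
    """
    energy = min(rms_energy(samples) / max_rms, 1.0)
    # Loud songs lean toward attacking, quiet songs toward healing.
    return {"attack": energy, "healing": 1.0 - energy}
```

A heavily compressed song keeps its samples near full scale, so its RMS is high and the character leans toward attacking, which matches the parenthetical about compression above.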

3) Harmonic stability/Pitch certainty (IE how distorted/clean the song is)

Unstable                 Stable
<----------------------------->
Chaotic                 Orderly
(Not an attribute, but a character type)

Explanation: This is an attribute that may not be easy to explain to someone but might be felt intuitively. Basically, the lower the value, the noisier a song is.

4) Number of sections in the song = number of moves

Explanation: A standard verse/chorus/verse/chorus song is said to have two sections, although it might also have an intro and outro, making it have 4 sections. As it turns out section analysis is quite robust, so this is a good feature to use. The number of moves available to a character could be tied to the number of sections. So a Bob Dylan song might have 2 or 3 powerful moves, whereas a Mr. Bungle song would have 10 or more less powerful moves.
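The trade-off between a few powerful moves and many weaker ones could be sketched like this. The fixed power budget, the function name and the move names are all assumptions for illustration; the section count itself would come from whatever section analysis is used.

```python
def moves_from_sections(section_count, power_budget=100):
    """Give a character one move per song section.

    power_budget is an assumed fixed total that is split evenly, so
    songs with more sections get more moves that are each weaker.
    """
    # Every character gets at least one move, even if analysis finds none.
    count = max(1, section_count)
    power_each = power_budget // count
    return [{"name": f"move_{i + 1}", "power": power_each} for i in range(count)]
```

With these assumed numbers a two-section song yields two moves of power 50, while a ten-section song yields ten moves of power 10.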

5) Scale Harmonic Stability

Unstable                 Stable
<----------------------------->
Magic-using        Object-using

Explanation: Atonal music tends to be linked to mystical experiences, so on this axis music that fits easily into a tonal context produces more "real-life" characters, such as warriors or merchants. As the music becomes more atonal, the character becomes more reliant on magic than on items.
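The two stability axes (3 and 5) could be combined into a character description along these lines. This is only a sketch: it assumes both features have already been extracted and normalized to 0.0 (unstable) through 1.0 (stable), and the function name, field names and the 0.5 threshold are hypothetical.

```python
def character_type(pitch_certainty, tonal_stability):
    """Turn two stability features into a character description.

    Both inputs are assumed to be normalized to 0.0 (unstable)
    through 1.0 (stable) by an earlier analysis step.
    """
    for value in (pitch_certainty, tonal_stability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("features must be normalized to 0.0..1.0")
    return {
        # Axis 3: noisy, distorted songs give chaotic characters.
        "temperament": "orderly" if pitch_certainty >= 0.5 else "chaotic",
        # Axis 5: atonal songs rely on magic, tonal songs rely on items.
        "magic_reliance": 1.0 - tonal_stability,
        "item_reliance": tonal_stability,
    }
```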

Wednesday, August 13, 2008

Song Summoner: The Unsung Heroes

http://www.dsfanboy.com/2008/07/08/ipods-new-square-enix-srpg-better-than-ffta2/

Talk about 'Unsung': this game has received very little press. Probably because it is on such an underrepresented platform for gaming, no major game reviewers have done more than briefly mention it, which is a surprise because everyone who HAS played it loves it. Square-Enix doesn't seem to have done much to promote it, probably because it is an experiment. What it shows is that there doesn't need to be a meaningful connection between a song and the troops generated from it (excepting the song-related puns, which are apparently the worst part of the game). Instead it uses the platform's advantages, mining the MP3s on your iPod like a vast resource. It creates in the user a Pokémon-like drive to "Collect them all" and to constantly see which of your favorite songs create the best troopers.

Verdict:
Pros:
Uses musical resources on a platform with musical resources to spare

Cons:
Does no actual analysis of the music; Troopers generated from songs have no connection to the actual song

Monday, August 11, 2008

MobileTaps & Ludum Dare 12

I haven't posted in a few days since I've been working heavily on MobileTaps (the first product for the startup I'm working with, Gesture Blue) and over the weekend spent roughly 42 hours (plus six hours sleeping) on the Ludum Dare 48-hour solo game development contest.

While music isn't the center of the game, it does play a role. Enemies' movements are synced with the music so that it is predictable when they will move. The normal mode is basically a song, with enemies appearing at faster rates as the song gets more intense. This would be interesting to do with music analysis: tempo determines the speed at which the characters move and intensity determines the rate at which they appear. In Time Attack mode, the music gets more intense as you get closer to losing, but as of right now the transitions aren't that smooth.
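The music-analysis version of this idea could be sketched as follows. It is not how the contest entry works; the function names, the 0.0 to 1.0 intensity scale and the interval limits are assumptions chosen for illustration.

```python
def move_interval_seconds(bpm, beats_per_move=1):
    """Seconds between enemy moves when movement is locked to the beat."""
    return 60.0 / bpm * beats_per_move


def spawn_interval_seconds(intensity, slowest=4.0, fastest=0.5):
    """Seconds between enemy spawns for an intensity in 0.0..1.0.

    slowest and fastest are assumed tuning values: calm passages spawn
    an enemy every `slowest` seconds, the most intense every `fastest`.
    """
    clamped = max(0.0, min(1.0, intensity))
    # Linear blend from the slow interval to the fast one.
    return slowest + (fastest - slowest) * clamped
```

At 120 BPM enemies would move every half second, and as the analyzed intensity rises from 0.0 to 1.0 the gap between spawns shrinks from 4 seconds to half a second.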

Win32 : for those with Visual Studio
http://www.andrewbeckmusic.com/LD12/TowerOfAbbaAbeckWin32.zip

Win32 : for those without Visual Studio
http://www.andrewbeckmusic.com/LD12/TowerOfAbbaAbeckWin32NoVC.zip

And for OS X:
http://www.andrewbeckmusic.com/LD12/LD12-TowerOfAbbaAbeck.zip

Wednesday, August 6, 2008

Why use music to generate content: Monster Rancher

Why use music to generate game content? The simplest reason, ignoring the joy of seeing your favorite song turn into a playable game, is that there is so much out there. Let's look at Monster Rancher, Tecmo's proto-Pokémon title from 1997. The game is primarily a breeding game where you create monsters, breed them to create new monsters, pit them against each other and profit.

The game has nothing to do with music per se, except that new monsters can be created by analyzing a CD in the drive. Now let me be clear about what I mean by "analyzing a CD". There is NO music analysis at all, just looking at the raw data in the CD and interpreting it to fit the game. Think along the lines of, well 6 tracks means a water monster and 7 tracks means a fire monster, etc, although what they do is a little more intricate than that. The monsters then have nothing to do with the music, unless it's a CD that the designers specifically included a special monster for.
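To make "interpreting the raw data" concrete, here is a toy version in the spirit of the track-count example above. It is emphatically not Monster Rancher's actual algorithm, which is more intricate; the type list, the function name and the strength rule are invented for illustration.

```python
# Hypothetical monster types, ordered so that 6 tracks gives "water"
# and 7 tracks gives "fire", as in the example above.
MONSTER_TYPES = ["earth", "wind", "water", "fire"]


def monster_from_disc(track_lengths_seconds):
    """Derive a monster from disc layout alone, with no music analysis."""
    track_count = len(track_lengths_seconds)
    total_seconds = sum(track_lengths_seconds)
    # The track count picks the type; the total running time seeds a stat.
    monster_type = MONSTER_TYPES[track_count % len(MONSTER_TYPES)]
    strength = total_seconds % 100
    return {"type": monster_type, "strength": strength}
```

Two completely different albums with the same track count and running time would produce the same monster, which is exactly why the monsters have nothing to do with the music.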

Even though there is no music analysis, this game highlights a reason to use music analysis in video games: everyone has music to use. In 1997, when Monster Rancher first came out, it was CDs. I personally had a huge collection of CDs because I would save up my lunch money and get a new CD every week, so this game allowed me to use my CDs in a unique way. Now it's MP3s. I can imagine a kid the same age I was when I played Monster Rancher feeding his favorite songs into a video game to see what would happen. Next post I'll look at a game that does this on the iPod - Square Enix's Song Summoner.

Intro

Mega Man 3 was my first introduction to music. According to my family I would turn on the opening screen and just listen to the main menu music for five minutes before actually playing. In fact, when I went back to my favorite NES games, I found they all had great music. There was one level in particular in Little Nemo that I loved even though I found the rest of the game boring. Sure enough that level had the song that I can still hum from that game. In retrospect I realize that the games I remember the most had the best music.

Is this to say that I loved these video games for their music? I don't think so - good music enhances the experience and changes what would just be a good game into a fever-inducing race against time. Both music and games, like many of the best experiences in life, follow a similar arc. To simplify a profound experience: things start slow and introduce you to the elements, then get more and more complicated and build up until the final climactic moment. So good video game music can follow this arc and enhance the gameplay. In the simplest case, during simple moments the music sits in the background and builds tension, and during chaotic moments the music becomes more intense and releases tension. A great example of this is in Super Mario Galaxy, especially during the boss battles with Bowser.
Skip to 1:30 in this video (http://www.youtube.com/watch?v=g1zHLX7q8Mg&feature=related) and listen to how the music starts and dynamically changes depending on the action. The lead-up to about 3:20 is a good example of a smooth transition between two versions of the same song. Super Mario Galaxy has very subtle changes of music throughout the whole game that back up the mood very well. Following the action is much more difficult in games than in movies, since the action cannot be predicted. Super Mario Galaxy solves this problem by writing multiple versions of the same music for different "modes" the player might be in.
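One generic way to move smoothly between two versions of the same piece is an equal-power crossfade, sketched below. I don't know how Super Mario Galaxy actually implements its transitions, so treat this as a stand-in technique; it assumes both versions are playing in sync and that only their volumes change, and the function name is hypothetical.

```python
import math


def crossfade_gains(progress):
    """Equal-power gains for the outgoing and incoming music versions.

    progress runs from 0.0 (only the old version) to 1.0 (only the new
    version). The two versions are assumed to be playing in sync.
    """
    p = max(0.0, min(1.0, progress))
    outgoing = math.cos(p * math.pi / 2)
    incoming = math.sin(p * math.pi / 2)
    return outgoing, incoming
```

The squared gains always sum to 1, so the perceived loudness stays roughly constant through the transition instead of dipping in the middle as it would with a plain linear fade.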

Another great example (though watching doesn't do it justice, it must be played) is Jonathan Mak's Everyday Shooter. Instead of explosions, shooting enemies produces musical events: notes, phrases or just atmospheric sounds. Naturally, the music becomes more intense as the fighting becomes more intense. When each level starts there aren't many enemies on screen, so the music is simple. By the end, when the whole screen is filled with enemies, the music becomes a multi-layered, complex mess, which works perfectly.
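The general idea of gameplay events producing notes can be sketched in a few lines. This is not Everyday Shooter's actual system; the scale, the function name and the rule that kill chains raise the octave are hypothetical choices for illustration.

```python
# C major pentatonic as MIDI note numbers (C, D, E, G, A), an assumed
# scale picked because any combination of its notes sounds consonant.
PENTATONIC_MIDI = [60, 62, 64, 67, 69]


def note_for_kill(enemy_kind_index, chain_length):
    """Pick the MIDI note to play when an enemy is destroyed.

    The enemy kind selects the scale degree and a longer kill chain
    raises the octave, capped at two octaves up.
    """
    degree = enemy_kind_index % len(PENTATONIC_MIDI)
    octave_shift = 12 * min(chain_length, 2)
    return PENTATONIC_MIDI[degree] + octave_shift
```

Nothing in the code schedules density directly: more enemies on screen means more kills per second, so the texture thickens on its own as the fight heats up.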

So those themes are one half of what this blog is about: the "Music Generation in Video Games" half. The other half is "Music Analysis in Video Games", which also focuses on the link between the video game experience and the music listening experience but goes in the opposite direction. Instead of actions in the game affecting events in the music, the events in the game follow events in the music. AudioSurf is a great, though far from perfect, recent example of how this could work. It attempts to analyze tempo and other features and fails at extracting most of them, but it excels at extracting the intensity of the song and finding similar moments within it. Intense moments are shown in game as a ship going downhill, while calmer moments are shown as going uphill. This works incredibly well because the game's action follows the arc of intensity that is inherent in the best songs. The song in this video shows how this works well (http://www.youtube.com/watch?v=shZpeWWGh48). The moment of coming over the hill into an intense part of the song can be really exhilarating while playing.
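The downhill/uphill idea can be sketched by integrating an intensity curve into a height profile. This is not AudioSurf's actual algorithm; the function name, the per-segment intensity list and the choice to measure intensity against the song's own average are assumptions.

```python
def track_heights(intensities, step=1.0):
    """Turn per-segment intensity into a track height profile.

    intensities holds one assumed intensity value per song segment.
    Segments more intense than the song's average slope downhill and
    calmer segments slope uphill.
    """
    if not intensities:
        return []
    mean = sum(intensities) / len(intensities)
    heights = [0.0]
    for value in intensities:
        # Above-average intensity lowers the track, below-average raises it.
        heights.append(heights[-1] - (value - mean) * step)
    return heights
```

For a song that is calm for two segments and intense for two, `track_heights([0.0, 0.0, 1.0, 1.0])` climbs to a peak at the midpoint and then drops, which is the "coming over the hill" moment described above. Measuring against the average also means the track ends at the height it started.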

About this blog

During the course of writing this blog I will be exploring different ways of extracting features from music that can be used in-game, and writing a small suite of open source games that show how music and gameplay can become intertwined. I will also be discussing some open-source libraries for game development, such as Ogre, Chipmunk, OIS, ODE, etc., and libraries that can be used to simplify audio feature extraction, such as Aubio, Vamp plugins, or the Echo Nest Web API.