Musing About Music Similarity

This post was written for my client Spectralmind and appeared initially on their blog:

When we demo Spectralmind’s SEARCH by Sound, a similarity search engine for music, we often notice how differently listeners weight certain aspects of “similarity”. The similarity results calculated by the Spectralmind platform appear “similar” to one listener, but are judged “not similar” by another and “somewhat similar” by a third.

Musical similarity is a complex area, and the deviations in judgement stem from the fact that similarity has so many dimensions. This raises the question: which dimension do people relate to when asked about the similarity of music?

Personally, I observe that people tend to assess similarity first of all by melody. The particular succession of higher and lower tones that forms a melody is clearly a distinctive feature, which allows the listener to determine the degree of likeness, or even closeness, between two musical works.
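
To make this idea concrete, here is a minimal sketch in Python, assuming melodies have already been reduced to plain note-name sequences (a strong simplification; real melody matching would also have to deal with transposition, rhythm and ornamentation). The note sequences are invented for illustration and say nothing about how the Spectralmind platform works.

```python
# Sketch: comparing two melodies as note-name sequences via edit distance.
# The melodies below are purely illustrative.

def edit_distance(a, b):
    """Classic Levenshtein distance between two sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[-1][-1]

def melodic_similarity(m1, m2):
    """Normalize the edit distance into a 0..1 similarity score."""
    longest = max(len(m1), len(m2))
    return 1.0 - edit_distance(m1, m2) / longest

melody_a = ["C4", "D4", "E4", "G4", "E4", "D4", "C4"]
melody_b = ["C4", "D4", "E4", "A4", "E4", "D4", "C4"]  # one note differs
print(melodic_similarity(melody_a, melody_b))  # ~0.86
```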

Trombone Shorty at the Jazzfest Wien, 2011

But there are other dimensions of similarity as well:

  • Timbral similarity: timbre refers to the tone color of a sound, which varies significantly with the characteristics of the sound-creating device, such as the voice, string instruments or wind instruments. As listeners, we are able to identify the kinds of instruments playing, even in an ensemble like a band or an orchestra. The same melody played on a piano, a saxophone or a guitar makes a big difference in terms of timbral similarity.
  • Rhythmic similarity: rhythm is made up of a repeating pattern of sounds and silences. We perceive rhythm as fast or slow. Through rhythmic beats alone, we can tell musical genres apart, like rock from reggae. Music, dance and even spoken language rely on rhythm as a main and defining element. Different rhythms can be put underneath the same melody (which can be highly entertaining or massively disturbing). This practical example of melodic similarity combined with rhythmic dissimilarity highlights the difficulty of assessing an overall measure of similarity between two pieces of music (a small code sketch after this list shows one way to put numbers on the timbral and rhythmic dimensions).
  • Structural similarity: this refers to the occurrence of specific sections within a piece of music. Common sections are the intro, verse, chorus (also known as refrain), interlude and outro, among many more. These are formal criteria which can be applied to describe constructive or sequential similarities of, for example, pop songs or symphonic compositions.
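
To give a feeling for how such dimensions can be turned into numbers, here is a minimal sketch using the open-source librosa library. This is purely an illustration under my own assumptions, not a description of how the Spectralmind platform is implemented, and the file names are placeholders.

```python
# Sketch: a rough timbre descriptor (mean MFCCs) and a rough rhythm
# descriptor (estimated tempo) for two tracks, then a simple comparison.
# librosa and the file names are assumptions for illustration only.
import librosa
import numpy as np

def describe(path):
    # Load the audio, compute MFCCs (timbre-related) and estimate tempo (rhythm-related).
    y, sr = librosa.load(path, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    return mfcc.mean(axis=1), float(np.atleast_1d(tempo)[0])

# "track_a.mp3" and "track_b.mp3" are placeholder file names.
timbre_a, tempo_a = describe("track_a.mp3")
timbre_b, tempo_b = describe("track_b.mp3")

# Smaller numbers mean "more similar" along that dimension.
timbral_distance = float(np.linalg.norm(timbre_a - timbre_b))
tempo_difference = abs(tempo_a - tempo_b)

print(f"timbral distance: {timbral_distance:.2f}")
print(f"tempo difference: {tempo_difference:.1f} BPM")
```

Note that the two numbers live on different scales, which is exactly the point of the paragraph above: there is no single, obvious way to merge them into one overall similarity.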

There are many more dimensions of similarity beyond the ones mentioned. Some of them are even inaccessible to human perception, but very perceptible to musical data-mining programs such as the Spectralmind Audio Intelligence Platform.

Similarity judgements also need to be weighed against the rationale of the similarity search. Sometimes, melodic resemblance is the searched-for attribute. In other cases it might be rhythmic conformity or timbral affinity, or a mix of multiple qualities. The crucial factor is the intended use of the similar-sounding music. Having this intention in mind helps to escape a possible bias.
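
As a hedged sketch of this idea: if per-dimension similarity scores are available (however they were computed), the intended use can be expressed as a set of weights. All numbers below are hypothetical.

```python
# Sketch: blending per-dimension similarity scores (0..1) with weights
# chosen for a specific use case. All values here are hypothetical.

def overall_similarity(scores, weights):
    """Weighted average of per-dimension similarity scores."""
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight

scores = {"melodic": 0.9, "rhythmic": 0.4, "timbral": 0.7}

# A DJ looking for tracks to mix might care mostly about rhythm ...
dj_weights = {"melodic": 0.1, "rhythmic": 0.7, "timbral": 0.2}
# ... while a playlist curator might weight melody and timbre higher.
curator_weights = {"melodic": 0.4, "rhythmic": 0.2, "timbral": 0.4}

print(overall_similarity(scores, dj_weights))       # ~0.51
print(overall_similarity(scores, curator_weights))  # ~0.72
```

The same pair of tracks can thus come out as "quite similar" for one purpose and "not very similar" for another.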

We are striving to improve our software in a way that makes its similarity opinion more comprehensible and transparent. Users want to understand which dimensions of similarity the software uses when it suggests something as similar.

The body language of online communication

This post was written for my client gnowsis and appeared initially on their blog:

When we participate in a meeting, our brain registers much more than just the spoken conversation. Alongside verbal expressions, a good, often significant part of the communication happens nonverbally. We have learnt to draw conclusions from the body language of our counterparts, their facial expressions, posture, gestures, or the signals sent through mere eye contact. We decode a smile and flushing cheeks, the tone and volume of a voice, and plenty of other signals through which we “read” others. In a meeting, everybody communicates, even without saying a single word.

Psychologists claim that nonverbal communication makes up about two-thirds of all communication between two people or between one speaker and a group of listeners.

The nonverbal parts of the communication are essential contributors to our understanding. At a conscious or unconscious level, they help us to make judgements on other people's attention, involvement, interest, engagement, sympathy, tension, uncertainty, ambivalence or frustration. And sometimes even on truth or falsehood.

When it comes to online communication, we seem to lose most of the subtext transmitted through this variety of nonverbal channels. Instead of our five senses, we are thrown back onto the perception of what's visible on a screen: the text of an email, the status updates and notifications in a social network, the counts of a like button. Compared to the richness of a personal encounter, this looks like a rather poor form of communication. Nonetheless, in today's work environments, online communication represents a large portion, sometimes even the predominant mode, of our interactions with others. But as human communication is infinitely multifaceted, I'm wondering about equivalents to nonverbal expression in our remote forms of communication.

What is the body language of online communication? 

Technology has provided us with undreamed-of possibilities of communication in terms of reach and speed. At the same time, technology has not given us much to maintain richness of expression beyond the level of handwritten letters: next to the interpretation of written information, we might be able to draw conclusions from certain, more or less implicit, online behaviors, like

  • choice of communications channel
  • switching between communications channels
  • preference or avoidance of “real-time” communications channels
  • immediacy or delay of response
  • extensiveness of online expression
  • inclusion or exclusion of others in the communicative exchange
  • amplification of expression through typography
  • symbolic content, like emoticons or “likes”
  • receptiveness to requests to connect or to “follow”
  • choice of online communication to avoid meeting in person
  • affirmation of content through forwarding, reposting or retweeting

The relative contribution of these indicators to the overall value of our online communications often remains in the dark.

Nonverbal messages interact with verbal messages in six ways: repeating, conflicting, complementing, substituting, regulating and accenting/moderating. As online communication proliferates further, more unambiguous nonverbal signals, embedded in software functionality, might emerge as a welcome addition or even a necessity.

Would this be a way to overcome the technology-induced intermediation of human communication?
What's your take?

Music – and How Computers Hear It

This post was written for my client Spectralmind and appeared initially on their blog:

Spectralmind works with music. But what is “music”? A look into Wikipedia gives some helpful clues about music and, unwittingly, even about Spectralmind:

“Music is an art form whose medium is sound and silence. Its common elements are pitch (which governs melody and harmony), rhythm (and its associated concepts tempo, meter, and articulation), dynamics, and the sonic qualities of timbre and texture.”

In fact, these described elements of music are the ingredients Spectralmind uses for the creation of music tech products. Music is the base material from which we explore, analyze and extract information:

Algorithms, packaged into software, “listen” to music. What the algorithm “hears”, are music properties, including rhythm, timbre and many more.

Of course, a computer does not perceive music like humans do. Computers just calculate; they cannot take into consideration the cultural heritage, emotions and interpretations human listeners feel or are aware of.

“The border between music and noise is always culturally defined—which implies that, even within a single society, this border does not always pass through the same place; in short, there is rarely a consensus … By all accounts there is no single and intercultural universal concept defining what music might be.” (musicologist Jean-Jacques Nattiez, quoted in Wikipedia).

Applying a uniform algorithmic evaluation across a large number of music titles creates an objective mathematical description of each piece of analyzed music and, derived from that, a basis for comparability. We call it “music intelligence”. Such intelligence can be exploited in various ways, like identifying music, determining similarities between music titles or organizing music. Still, there will always remain a gap between “human understanding” and “machine understanding” of music, just as there will always be gaps in the understanding of music between human listeners.
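
To illustrate what such an objective mathematical description and the resulting comparability could look like in its simplest form, here is a small sketch that assumes each title has already been reduced to a fixed-length feature vector. The vectors and titles are invented for illustration and are not Spectralmind's implementation.

```python
# Sketch: once every title is described by a fixed-length feature vector,
# "comparability" can be as simple as a distance in that vector space.
# Vectors and titles are invented for illustration.
import numpy as np

catalog = {
    "Title A": np.array([0.12, 0.80, 0.33, 0.05]),
    "Title B": np.array([0.10, 0.78, 0.35, 0.07]),
    "Title C": np.array([0.90, 0.15, 0.60, 0.40]),
}

def most_similar(query_title, catalog, k=2):
    """Return the k titles closest to the query in feature space."""
    q = catalog[query_title]
    distances = {
        title: float(np.linalg.norm(q - vec))
        for title, vec in catalog.items()
        if title != query_title
    }
    return sorted(distances.items(), key=lambda item: item[1])[:k]

print(most_similar("Title A", catalog))
# [('Title B', ...), ('Title C', ...)] -- Title B is closest to Title A
```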

“The creation, performance, significance, and even the definition of music vary according to culture and social context.”

The ever-increasing sophistication of algorithms and the availability of computational power let us apply the music intelligence approach to large catalogs of music, thus eliminating great portions of the cost and manual labor of large-inventory music classification.

Sensory Search

This post was written for my client Spectralmind and appeared initially on their blog:

When we search for things, we use all of our senses. We look around for orientation, we feel for the keys in our pocket, we smell the scent of food in a restaurant, we listen for our kids playing in the garden. Our senses help us to discover what we are looking for and they provide us with rich impressions, which we can match against our preferences, desires and needs.

In comparison to such sensual searching in real life, searching for something on a computer is a poor experience. Search as we know it is limited to entering textual terms into a form, hoping for a result that is straightforward enough to get what we need. This works reasonably well with all things written. Computerized search is mainly search for written stuff.

But how do you search for something unwritten, like an image, a color, or a song? Search engines would require us to describe these things through words, using a language, forcing us into conventions of search terms and search operators. But how do you describe a piece of music within the narrow confines of a search engine's syntax? How do you express the deeply subjective impressions a song leaves behind in your mind? What is not described in words is hard to find. What can't be described in words remains hidden.

This problem is the baseline of what Spectralmind does: searching, finding and discovering music in addition to and beyond what can be expressed in words. As a result, Spectralmind brings seeing and hearing, the visual and acoustic senses, back into the digital search and discovery of music.

You’ve made it to this blog and we are happy to have you here with us. We would love to see you come back from time to time to learn more about Spectralmind and the way we approach music over and above bare tunes. Music is a carrier of rich and universal information, which we believe we can unleash through our technology, creativity and passion to give it the attention it deserves.