Shazam – how does it work its magic?
To be honest, if a potential client had come to us with such a project idea before Shazam existed, we would hardly have believed it was possible, at least not as easily as it seems to be now.
We believe the majority of smartphone users are already acquainted with the miraculous application called Shazam. It is a mobile application that uses your phone’s built-in microphone to record a sample of whatever audio is playing wherever you are, say in a restaurant, in a night club, or in your car with the radio on. It sends this sample to a central database, where data on millions of audio tracks is stored, and returns the name of the song and the artist. Needless to say, all of this happens in a few seconds.
But how does Shazam work its magic? Without going into scientific-level detail, let’s try to get a brief understanding.
The company has a library of millions of songs, and it has devised a technique to break down each track into a simple numeric signature, a code that is unique to each track. “The main thing here is creating a ‘fingerprint’ of each performance,” says Andrew Fisher, Shazam’s CEO. When you hold your phone up to a song you’d like to ID, Shazam turns your clip into a signature using the same method. Then it’s just a matter of pattern-matching: Shazam searches its library for the code it created from your clip, and when it finds that code, it knows it has found your song.
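To make the pattern-matching idea concrete, here is a minimal sketch in Python. It is not Shazam’s actual implementation. It assumes each track’s signature is a list of integer codes, and the track names, the code values, and the helper names build_index and identify are all made up for illustration. The library is turned into a lookup table from code to tracks, and a clip is identified by counting how many of its codes point to the same track.

```python
from collections import Counter


def build_index(library):
    """Map each fingerprint code to the set of tracks that contain it."""
    index = {}
    for title, codes in library.items():
        for code in codes:
            index.setdefault(code, set()).add(title)
    return index


def identify(index, clip_codes, min_hits=2):
    """Return the track that shares the most codes with the clip, or None."""
    votes = Counter()
    for code in clip_codes:
        for title in index.get(code, ()):
            votes[title] += 1
    if not votes:
        return None
    title, hits = votes.most_common(1)[0]
    # Require a minimum number of shared codes (an assumed threshold)
    # so that a single coincidental code is not reported as a match.
    return title if hits >= min_hits else None


# Hypothetical toy library: the codes are invented numbers.
library = {
    "Track A": [101, 205, 317, 422],
    "Track B": [101, 530, 641, 777],
}
index = build_index(library)

print(identify(index, [205, 317, 999]))  # two codes shared with Track A
print(identify(index, [999, 888]))       # no shared codes, so no match
```

Counting votes instead of demanding an exact match means a clip can still be identified when some of its codes are missing or wrong, for example because of background noise.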
OK, but then comes the second question: how does Shazam make these fingerprints? Shazam creates a spectrogram for each song in its database, a graph that plots three dimensions of music: frequency vs. amplitude vs. time. The algorithm then picks out just those points that represent the peaks of the graph, that is, notes that contain “higher energy content” than all the other notes around them. In practice, this seems to work out to about three data points per second per song.
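The peak-picking step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Shazam’s real algorithm: the spectrogram is built with a plain discrete Fourier transform over non-overlapping frames, a “peak” is any cell louder than its eight immediate neighbours, and the frame size, the min_magnitude threshold, and the function names are our own choices.

```python
import cmath
import math


def spectrogram(samples, frame_size):
    """Split samples into non-overlapping frames; return magnitudes[t][f]."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        row = []
        for k in range(frame_size // 2):  # keep only bins below Nyquist
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            row.append(abs(acc))
        frames.append(row)
    return frames


def pick_peaks(spec, min_magnitude):
    """Return (time_index, freq_bin) cells louder than all their neighbours."""
    peaks = []
    for t, row in enumerate(spec):
        for f, mag in enumerate(row):
            if mag < min_magnitude:
                continue
            neighbours = [
                spec[tt][ff]
                for tt in range(max(0, t - 1), min(len(spec), t + 2))
                for ff in range(max(0, f - 1), min(len(row), f + 2))
                if (tt, ff) != (t, f)
            ]
            if all(mag > other for other in neighbours):
                peaks.append((t, f))
    return peaks


def tone(freq_bin, frame_size):
    """One frame of a pure sine wave that lands exactly on freq_bin."""
    return [math.sin(2 * math.pi * freq_bin * n / frame_size)
            for n in range(frame_size)]


# Toy "song": one frame of a low tone followed by one frame of a higher tone.
frame_size = 32
samples = tone(4, frame_size) + tone(10, frame_size)
peaks = pick_peaks(spectrogram(samples, frame_size), min_magnitude=1.0)
print(peaks)  # [(0, 4), (1, 10)]
```

Out of the 32 time-frequency cells in this toy spectrogram, only two survive as peaks, which shows how little of the original signal a fingerprint needs to keep.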
One could think that ignoring nearly all of the information in a song would lead to inaccurate matches, but Shazam’s fingerprinting technique is remarkably immune to disturbances—it can match songs in noisy environments over bad cell connections. Fisher says that the company has also recently found a way to match music that has been imperceptibly sped up (as club DJs sometimes do to match a specific tempo or as radio DJs do to fit in a song before an ad break). And it can tell the difference between different versions of the same song.
Fisher declined to tell me Shazam’s overall hit-and-miss rate. All he would say is that the service is good enough to keep people coming back for more: the average user looks for songs eight times a month. The most common reason Shazam fails to identify a song is that it doesn’t have enough data. The system needs at least five seconds of music to make a match, and sometimes people turn it on just as the song is ending. There are also frequent errors when people look up live performances. Fisher says that Shazam is technically capable of working on live performances, but they’ve turned off that ability for what he terms “business reasons.” “Right now people trust the brand – trying to match live songs wouldn’t get very high accuracy,” he says.
For those who are interested, more technical and specific details on how Shazam works can be found here.