MuseAmi is a company that uses machine learning and optical character recognition (OCR) technology to translate written music to audio and audio back to written form. MuseAmi also offers applications for smartphones and tablets that place real-time recording studio effects in an accessible, user-friendly format.
The company’s founder and CEO, internationally acclaimed concert pianist Robert Taub, hopes MuseAmi can become the “Holy Grail of music translation” and says the idea came from a wish to help his daughter learn the violin while he was away on tour.
“I thought, ‘Sure I can tell her exactly what note to play and she can copy and play by ear, but wouldn’t it be great if there were a daddy substitute where she could just pick up a cell phone or a dedicated device and snap a photo of the printed music that her teacher had assigned to her and listen to it, right then and there?’”
Robert asked around among his friends at Princeton University as to who might help him reach his goal, and the name that kept popping up was that of Yann LeCun, whom Robert describes as “legendary” in the field of machine learning, a branch of artificial intelligence.
A brief meeting with Yann in a New Jersey coffee shop convinced Robert that MuseAmi was a viable project.
“Yann developed convolutional neural networks to perform OCR on cursive handwriting. He broke cursive handwriting into a series of discrete symbol sets, and taught his system to read them. He realised that music is a symbol-based language, and that we can harness the language of written music into a set of discrete symbols in much the same way as he did with cursive handwriting, and perform a similar type of OCR.”
Other programs that read sheet music are available, but MuseAmi’s approach will be, as Robert stresses, “wholly different. We use machine learning. We’ve taught our architecture, our neural network, which is the recreation of the human visual cortex, to read music”.
This technology is not yet commercially available, but several patents have been registered and the end is in sight, according to Robert. “We are still working on it. Our internal milestones, of which there are many, lead us to commercialisation by the end of 2011.”
In the meantime, MuseAmi has produced Improvox, an application for smartphones and tablets that can transform the user’s voice in myriad ways, all in real time. This means any singer can sing in tune without the benefit of studio wizardry such as Auto-Tune, which is frequently used to hide the vocal imperfections of well-known pop acts.
“[With Improvox] we have waveform analysis that processes for pitch detection, we quantize rhythm, which is unique, and we process for amplitude, in other words volume. We also process for the clarity of the note, whether you’re singing with a gravelly voice or with bell-like clarity.”
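The kinds of processing Robert lists can be illustrated with a short sketch. This is not MuseAmi’s code: the function names, the 440 Hz reference pitch, the equal-tempered semitone grid and the sixteenth-note rhythm grid are all assumptions chosen for illustration, and the detected pitch and onset time are taken as given rather than extracted from a waveform.

```python
import math

A4_HZ = 440.0  # assumption: standard concert pitch as the tuning reference


def nearest_semitone_hz(freq_hz):
    """Snap a detected frequency to the nearest equal-tempered semitone."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def quantize_onset(time_s, bpm, subdivisions_per_beat=4):
    """Snap a note onset (in seconds) to the nearest rhythmic grid point."""
    grid_s = 60.0 / bpm / subdivisions_per_beat
    return round(time_s / grid_s) * grid_s


def rms_amplitude(samples):
    """Root-mean-square level of a block of samples, a simple volume measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# A slightly sharp A, sung a little late at 120 beats per minute.
corrected_pitch = nearest_semitone_hz(450.0)
corrected_onset = quantize_onset(0.27, bpm=120)
level = rms_amplitude([1.0, -1.0, 1.0, -1.0])
```

In this sketch the sharp note is pulled back to 440 Hz and the late onset moves to the nearest sixteenth-note position. A real-time product would also need to estimate pitch and onsets from the audio itself, which is the harder part and is not shown here.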
Because Robert and his team are all musicians, they have striven to make their software flexible, as music is not always as rigid as the technology that serves it.
“We developed a module we’re calling vibrato detection because we noticed that people who were playing games like Guitar Hero were punished if they were a good singer. If they’re a good singer and introduce a little vibrato, the algorithms that were in those games would give you a low score because it considered you to be singing out of tune, when in fact you were singing really well.”
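The problem Robert describes can be shown with a small example. The sketch below is an illustration under stated assumptions, not the algorithm used by MuseAmi or by any game: it models vibrato as a sinusoidal wobble of 50 cents at 6 Hz, uses a hypothetical 20-cent tolerance, and treats "vibrato-aware" scoring as simply judging the average pitch of the note rather than each frame.

```python
import math


def cents_off(freq_hz, target_hz):
    """Deviation from the target pitch in cents (100 cents = 1 semitone)."""
    return 1200 * math.log2(freq_hz / target_hz)


def naive_score(freqs, target_hz, tol_cents=20):
    """Fraction of frames within tolerance; this penalises vibrato."""
    hits = sum(1 for f in freqs if abs(cents_off(f, target_hz)) <= tol_cents)
    return hits / len(freqs)


def vibrato_aware_in_tune(freqs, target_hz, tol_cents=20):
    """Judge the note by its average pitch, so a centred wobble still passes."""
    mean_cents = sum(cents_off(f, target_hz) for f in freqs) / len(freqs)
    return abs(mean_cents) <= tol_cents


def vibrato_track(center_hz, depth_cents=50, rate_hz=6.0, frame_rate=100, seconds=1.0):
    """Synthetic pitch track: a sinusoidal wobble around center_hz (assumed model)."""
    n = int(frame_rate * seconds)
    return [
        center_hz * 2 ** (depth_cents * math.sin(2 * math.pi * rate_hz * i / frame_rate) / 1200)
        for i in range(n)
    ]


good_singer = vibrato_track(440.0)   # centred on the note, with vibrato
flat_singer = [430.0] * 100          # steady, but roughly 40 cents flat

naive = naive_score(good_singer, 440.0)
good_ok = vibrato_aware_in_tune(good_singer, 440.0)
flat_ok = vibrato_aware_in_tune(flat_singer, 440.0)
```

Under these assumptions the frame-by-frame score marks most of the good singer’s frames as out of tune, while the averaged check accepts the note and still rejects the singer who is consistently flat.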
All these functions are designed to restore the participatory nature of music-making, which declined over the last century with the advent of radio and recorded music. The aim is a 21st-century equivalent of a piano in every home.
“In the twentieth century, technology gave rise to a new type of listening which was in my mind an aberration, passive listening. I would like to regard background music or passive listening as something that came about because of technology, and something that we can now transcend because of technology.”
One might think that the idea of making music creation accessible to all, even those who are not musically talented, would have drawn a negative reaction from some of Robert’s peers in the orchestral world, but the opposite appears to be the case.
“It’s been extremely positive. We took forty musicians from one of the major orchestras after a rehearsal one day, and they played around with this technology and the overwhelming response was, ‘When can I have it?’”
Nor is the former artist-in-residence at Princeton’s Institute for Advanced Study worried that technological aids such as these will discourage young musicians from the kind of practice that has seen him play with some of the world’s great orchestras.
“I wondered about that. My feeling is that if you can make some of the tasks of learning a little bit easier, if you can make those hurdles that you have to jump over just a little bit lower, and get rid of some of the barriers to entry, that’s great, that helps everybody.”
The scale of Robert Taub’s ambition is not limited to music. The technology being employed in MuseAmi can lend itself to other symbol-based languages, offering a service akin to Google Translate but without requiring an Internet connection.
“There are other languages that also use symbols, like Mandarin. There’s no reason that we can’t proceed down that line in chapter two. We feel that our whole approach with machine learning for OCR really gives us an advantage. We’ve patented certain processes. When you’re in a restaurant in the middle of Shanghai and there’s a menu on the wall, we feel that processing in the palm of your hand is more powerful than needing an Internet connection.”
For now, Robert is happy to help develop music through technology, and join some fairly illustrious company in the process.
“Beethoven was in the vanguard of pushing forward the evolving piano technology in the early nineteenth century. He was not content to allow his compositions to be confined to the pianoforte instruments of the time. What we take for granted is actually technology that has evolved over the course of time.”