Google Can Find A Song By Humming

Google recently launched Hum to Search, a new machine-learned system within Google Search that lets users find a song by humming it. The approach produces an embedding of a melody directly from a song’s spectrogram, without creating an intermediate representation. This allows the model to match a hummed tune against the original polyphonic recordings without needing a MIDI (Musical Instrument Digital Interface) version of each track or any other complex hand-engineered logic to extract the melody.

One of the significant challenges in recognizing a hummed melody is that a hummed tune often contains relatively little information; consider, for instance, a hummed rendition of Bella Ciao. The difference between the hummed version and the original recording can be visualized using spectrograms, as shown below:
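A spectrogram like the ones compared here can be computed with a short-time Fourier transform: slice the waveform into overlapping windowed frames and take the magnitude of each frame's FFT. Below is a minimal NumPy sketch; the frame length, hop size, and sample rate are illustrative values, not the ones Google uses.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a short-time Fourier transform:
    Hann-windowed overlapping frames, FFT of each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

# A pure 440 Hz tone at a 16 kHz sample rate should show a single
# bright horizontal line near bin 440 / (16000 / 512) ≈ 14.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

For a hummed clip, the resulting image is dominated by a single melodic line, whereas a studio recording also contains the energy of instruments and backing vocals.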

Visualization of a hummed clip and a matching studio recording.

https://ai.googleblog.com/2020/11/the-machine-learning-behind-hum-to.html

Given the spectrogram on the left, the model needs to locate the audio corresponding to the image on the right. To do this, it must learn to focus on the dominant melody while ignoring background vocals, instruments, voice timbre, and other noise. The dominant melody that links the two spectrograms appears as similar lines toward the bottom of the images.
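A very crude stand-in for "finding the dominant line near the bottom of the image" is to pick the strongest frequency bin in each spectrogram frame. The real system learns this focus end to end; the sketch below (with a hypothetical toy spectrogram) only illustrates the idea of reducing a spectrogram to a melody contour.

```python
import numpy as np

def melody_contour(spec, freqs):
    """Return the frequency of the strongest bin in each frame.
    A naive proxy for the dominant melodic line; learned models
    handle accompaniment and timbre far more robustly."""
    return freqs[np.argmax(spec, axis=1)]

# Toy spectrogram: 4 frames x 5 frequency bins, with all the
# energy concentrated in the 110 Hz bin.
freqs = np.array([0.0, 110.0, 220.0, 440.0, 880.0])
spec = np.zeros((4, 5))
spec[:, 1] = 1.0
contour = melody_contour(spec, freqs)
```

On real polyphonic audio, per-frame argmax would frequently latch onto instruments or harmonics, which is precisely why a learned embedding is used instead.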

The Machine Learning Behind the Feature

The initial step in developing Hum to Search was modifying the music-recognition models used in Now Playing and Sound Search to work with hummed recordings. A neural network is trained on pairs of inputs, each pair consisting of a hummed or sung clip and the corresponding recorded audio; the network produces an embedding for each input, to be used for matching later.

Training setup for the neural network

https://ai.googleblog.com/2020/11/the-machine-learning-behind-hum-to.html

To recognize humming, the network must produce embeddings in which pairs of audio clips containing the same melody lie close to each other, despite differing in instrumental accompaniment and singing voice. The trained model can then generate an embedding for a hummed tune that is similar to the embedding of the song it refers to.
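One common way to train such an embedding space is a triplet-style metric-learning objective: pull a hummed clip (anchor) toward its matching recording (positive) and push it away from an unrelated track (negative). The sketch below shows this objective on precomputed embedding vectors; it is an illustration of the kind of loss used in metric learning, not Google's actual training objective.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss on embedding distances: the anchor should be at
    least `margin` closer to the positive than to the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Well-separated embeddings incur zero loss; confusable ones do not.
hum = np.array([1.0, 0.0])
same_song = np.array([1.0, 0.1])
other_song = np.array([-1.0, 0.0])
loss = triplet_loss(hum, same_song, other_song)
```

In practice the embeddings come out of the neural network itself, and the loss gradient updates the network so that same-melody pairs converge.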

Training of the model

  • The first challenge in training the model was obtaining training data. To address this, Google augmented the audio during training, for example by randomly varying the pitch or tempo of the sung input. The resulting model worked well enough for sung clips, but not for hummed or whistled ones.
  • To improve performance on humming and whistling, the team used SPICE, a pitch-extraction model, to produce melodies consisting of discrete audio tones, generating additional training data of simulated hummed melodies from the existing audio dataset.
  • The simple tone generator was later replaced with a neural network that produces audio resembling an actual hummed or whistled tune; for example, a sung input clip can be transformed into a humming clip or a whistling clip.
  • Finally, the training data was improved by mixing and matching audio samples. For example, when similar clips from two different singers were available, the preliminary models were used to align them, giving the model an additional pair of audio clips representing the same melody.
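The augmentation step above can be sketched in a few lines. The example below naively time-stretches a waveform by linear resampling; note that this simple approach also shifts the pitch, so it only illustrates the idea of randomly perturbing training clips. The function name and parameter ranges are assumptions for illustration.

```python
import numpy as np

def stretch_tempo(signal, rate):
    """Time-stretch a waveform by linear resampling.
    rate > 1 speeds the clip up (fewer samples);
    rate < 1 slows it down (more samples)."""
    n_out = int(len(signal) / rate)
    idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(idx, np.arange(len(signal)), signal)

# Augment a (fake) one-second clip with a random tempo change.
rng = np.random.default_rng(0)
clip = rng.standard_normal(16000)
augmented = stretch_tempo(clip, rate=rng.uniform(0.8, 1.2))
```

Production audio augmentation typically uses phase-vocoder-style methods that change tempo and pitch independently.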

Generating hummed audio from sung audio

https://ai.googleblog.com/2020/11/the-machine-learning-behind-hum-to.html

However, the model needed some further changes. With those changes applied, the current system achieves improved accuracy on a song database that contains over half a million songs and is continuously updated.
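At search time, matching against such a database reduces to a nearest-neighbor lookup in embedding space. Here is a minimal cosine-similarity version over a tiny hypothetical catalog; a real deployment would use an approximate nearest-neighbor index over hundreds of thousands of song embeddings.

```python
import numpy as np

def best_match(query_emb, catalog_embs):
    """Return the index and cosine similarity of the catalog
    embedding closest to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    c = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    scores = c @ q
    best = int(np.argmax(scores))
    return best, float(scores[best])

# Three made-up 2-D song embeddings; the hummed query is closest
# to the first one.
catalog = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, score = best_match(np.array([0.9, 0.1]), catalog)
```

Returning the top few scores instead of a single argmax would support showing the user a ranked list of likely matches, which is what the feature does in practice.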

Hum to Search in the Google App

https://ai.googleblog.com/2020/11/the-machine-learning-behind-hum-to.html

To try this feature,

  • Open the latest version of the Google app.
  • Tap the mic icon and ask, “what’s this song?”, or tap the “Search a song” button.
  • You can hum, sing, or whistle.
  • Hum to Search can then find the song without you having to type its name.

This article has been published from a wire agency feed without modifications to the text. Only the headline has been changed.
