Latent Timbre Synthesis

Fast Deep Learning tools for experimental electronic music
by Kıvanç Tatar, Daniel Bisig, and Philippe Pasquier

Publications: 

->Tatar, K., Bisig, D., & Pasquier, P. (Upcoming). Latent Timbre Synthesis: Audio-based Variational Auto-Encoders for Music Composition Applications. Accepted to the special issue of Neural Computing and Applications: "Networks in Art, Sound and Design". Springer.

->Tatar, K., Bisig, D., & Pasquier, P. (2020). Introducing Latent Timbre Synthesis. https://arxiv.org/abs/2006.00408

Source code ->

The model learns from a dataset of audio files and synthesizes sounds in real time. It lets users explore a universe of timbre learned from that dataset.
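To make the idea concrete, here is a minimal sketch in Python of a frame-level Variational Autoencoder of the kind described above, written in PyTorch. The module names, layer sizes, and latent dimension are hypothetical placeholders, not the actual LTS implementation:

import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    """Encodes one magnitude-spectrum frame into a latent vector and back."""
    def __init__(self, n_bins=256, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bins, 512), nn.ReLU())
        self.to_mu = nn.Linear(512, latent_dim)
        self.to_logvar = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, n_bins), nn.Softplus(),  # magnitudes are non-negative
        )

    def encode(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # reparameterization trick: sample z from N(mu, sigma^2)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

    def decode(self, z):
        return self.decoder(z)

vae = FrameVAE()
frame = torch.rand(1, 256)       # one magnitude-spectrum frame (placeholder data)
z, mu, logvar = vae.encode(frame)
recon = vae.decode(z)            # synthesized magnitude frame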

The visualizations below illustrate a latent space generated by a Variational Autoencoder. On the right, the green and red dots represent the latent vectors of audio frames from two audio examples in the dataset, while a third set of dots represents the latent vectors of a synthesized sound generated by interpolating between the latent vectors of the two original audio files.
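A plot of this kind can be produced, for instance, by projecting the latent vectors to two dimensions and scattering them. The sketch below uses random placeholder data and a PCA projection purely for illustration; it is not how the visualizations on this page were generated:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
z1 = rng.normal(0.0, 1.0, (200, 64))   # latent frames of audio example 1 (placeholder)
z2 = rng.normal(1.0, 1.0, (200, 64))   # latent frames of audio example 2 (placeholder)
z_mix = 0.5 * (z1 + z2)                # frame-wise interpolation between the two

# project all latent vectors to 2D with the same PCA so they share axes
pts = PCA(n_components=2).fit_transform(np.vstack([z1, z2, z_mix]))
p1, p2, pm = pts[:200], pts[200:400], pts[400:]
plt.scatter(p1[:, 0], p1[:, 1], c="green", s=8, label="audio 1")
plt.scatter(p2[:, 0], p2[:, 1], c="red", s=8, label="audio 2")
plt.scatter(pm[:, 0], pm[:, 1], c="blue", s=8, label="interpolation")
plt.legend()
plt.show()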

Examples

A set of examples is available here ->

The naming convention of the example audio files is as follows. Original 1 and Original 2 are excerpts of the original samples. The 00-original-icqt+gL_1 and 00-original-icqt+gL_2 tracks are generated from the original magnitude spectra, with the phase estimated afterwards using a phase reconstruction technique (inverse constant-Q transform plus Griffin-Lim, hence "icqt+gL"). Likewise, our Deep Learning model generates only the magnitude spectrum, and the phase is added later using the same reconstruction technique. Hence, original-icqt+gL_1 and original-icqt+gL_2 are the ideal reconstructions that the Deep Learning model aims to achieve during training.
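For reference, this magnitude-only round trip can be approximated with librosa, assuming "icqt+gL" stands for inverse CQT plus Griffin-Lim as above. The file name, hop length, and bin counts below are placeholders, not the project's actual settings:

import librosa
import soundfile as sf

# any mono audio file works; 'original_1.wav' is a hypothetical placeholder
y, sr = librosa.load('original_1.wav', sr=None, mono=True)

# constant-Q transform; keep only the magnitude, as the model does
C = librosa.cqt(y, sr=sr, hop_length=512, n_bins=96, bins_per_octave=12)
mag = abs(C)

# Griffin-Lim phase reconstruction from the magnitude CQT
y_hat = librosa.griffinlim_cqt(mag, sr=sr, hop_length=512,
                               bins_per_octave=12, n_iter=32)
sf.write('reconstruction.wav', y_hat, sr)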

Reconstructions -> 00-x_interpolations 0.0 and 00-x_interpolations 1.0 are reconstructions of the original audio files (original 1 and 2, respectively) using the Deep Learning model. Ideally, these reconstructions should be as close as possible to the original magnitude spectra combined with the phase estimations, that is, to the original-icqt+gL_1 and original-icqt+gL_2 files, respectively.

Timbre Interpolations -> 00-x_interpolations 0.1 means that the sample is generated using 90% of the timbre of original_1 and 10% of the timbre of original_2. Think of 0.1 as a slider value running from audio_1 to audio_2.

Timbre Extrapolations -> x_interpolations 1.1 means that we draw an abstract line from timbre 1 to timbre 2 and, following the direction of that line, move 10% further away beyond timbre 2. Conversely, x_interpolations -0.1 means that we draw the line from timbre 2 to timbre 1 and move 10% further away beyond timbre 1, as sketched below.
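In code, interpolation and extrapolation reduce to the same linear blend of latent vectors: the suffix in the file name is the blend amount a, applied frame by frame. A small sketch with placeholder arrays:

import numpy as np

def mix_latents(z1, z2, a):
    """a=0.0 -> timbre 1, a=1.0 -> timbre 2, a=0.1 -> 90%/10% mix;
    a > 1.0 or a < 0.0 extrapolates past either end of the line."""
    return (1.0 - a) * z1 + a * z2

z1 = np.random.randn(200, 64)   # latent frames of original 1 (placeholder)
z2 = np.random.randn(200, 64)   # latent frames of original 2 (placeholder)

z_interp = mix_latents(z1, z2, 0.1)   # "x_interpolations 0.1"
z_extrap = mix_latents(z1, z2, 1.1)   # "x_interpolations 1.1": 10% past timbre 2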

An example video of the interpolate_two app is on the way! We are also finalizing a set of visualizations as well as a qualitative study. I will keep this page updated as we progress.

Nine composers joined our qualitative study. We share their compositions as a compilation album ->💿️


Acknowledgements


This work has been supported by the Swiss National Science Foundation, the Natural Sciences and Engineering Research Council of Canada, and the Social Sciences and Humanities Research Council of Canada.

Copyright
Kıvanç Tatar ©2018-2020


Music

Bandcamp: ->Tatar ->Çekiç
Soundcloud