
Mimicry

Wed, 26 Oct, 2022

A.I. & Bird Mimicry

A screenshot of a YouTube thumbnail for the video 'Parrots & Cockatiels Singing Hit Songs FUNNY BIRDS COMPLILATION', showing an image of a parrot with a tick over it beside an image of Rihanna with an x over it.

My recent work has focused a lot on A.I. (artificial intelligence), specifically on generative art made using deep learning algorithms. I have primarily explored image and sound generation using natural, or non-traditional (within machine learning), input samples, seeing how the algorithm interacts with and produces art from them. See, for example, Artificial illumination and Artificial artefacts.

Recently I’ve been thinking a lot about the concept of mimicry. After watching a video of a crow communicating in a human vocal style, I became interested in the parallels between the crow mimicking the human voice and an artificial intelligence mimicking sound. Neither the crow nor the machine has any human contextual knowledge of the sound it is producing, although the crow is more than likely producing the sound as a form of communication, whereas the machine is simply mimicking the sound because it has been primed to do so.

I wanted to explore this idea further, looking at a sort of mimicry feedback loop between the bird, artificial intelligence and the human audio environment. The samples below show a range of these experiments: starting with an A.I. mimicking a bird mimicking an iPhone notification; then, moving away from environmental noise, a sample of an A.I. mimicking a bird mimicking Beyoncé; and finally, closing the feedback loop, an A.I. mimicking a wren’s birdsong. I go into more detail about each sample below, along with some of my understanding of the technical elements of how the algorithm works. Like all things A.I., these algorithms are incredibly complex and take a high degree of skill to fully understand - something I don’t have at the moment. Instead, I know how to take apart various open-source scripts and get them to work for my experiments, so any technical explanations I give are just my understanding of how these systems work. For a detailed explanation of the various scripts used, see the sources list at the bottom of the page.

One thing of note: although I use the term mimicry throughout this blog post about machine-generated audio, the way the A.I. RAW audio generation process (OpenAI’s Jukebox) works is in fact much more akin to call and response. The machine is primed on an initial sample of audio; using this primed sample, it then attempts to generate a similar (but not identical) sample, drawing from the thousands of hours of audio the A.I. has been trained on. The result is a unique RAW audio sample shaped by those thousands of hours of training data but primed on the initial bird-call. Although the machine is not directly mimicking the sound the way the bird is, there is still a form of call-and-response audio communication happening between the bird, the environment, and the A.I.

To help visualise what is going on in each sample, I have accompanied each audio sample with a video. In each video, the first 12 seconds play the original sample that I primed the A.I. on, visualised on the left side of the screen; after this, the visuals move to the right and instead display a blending, blobular, birdlike form. These amorphous bird visuals are also A.I.-generated, trained on ~1,000 images of birds for roughly 24 hours (nowhere near long enough, hence the distinctive form of the visuals) using NVIDIA Labs’ StyleGAN3. Although the machine-made audio and visuals do not match in any way, I felt it helps to explain the transition from bird-call to machine-call.
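
For anyone curious about the nuts and bolts, generating a single frame from a trained StyleGAN3 network looks roughly like this - a minimal sketch based on the usage shown in the NVlabs stylegan3 README, where the checkpoint filename is a placeholder for my own training snapshot:

```python
import pickle
import torch

# Load a trained generator snapshot. The filename is a placeholder for
# my own training output; unpickling also needs the stylegan3 repo's
# modules (torch_utils, dnnlib) on the Python path.
with open('birds-snapshot.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()   # torch.nn.Module

z = torch.randn([1, G.z_dim]).cuda()     # a random latent code
img = G(z, c=None)                       # NCHW float32, roughly [-1, 1]

# Rescale to an 8-bit RGB frame for the video.
frame = ((img.clamp(-1, 1) + 1) * 127.5).to(torch.uint8)
frame = frame[0].permute(1, 2, 0).cpu().numpy()
```

Interpolating between latent codes (different values of z) is what produces that blending, morphing quality in the videos.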

Finally, these pieces were initially presented at D.A.T.A. event 73.0 as part of their open projector. Thanks to Aisling, Paul and Tom for setting up the event and letting me present.

Algorithm mimicking bird mimicking iPhone notification sound.

This first piece is primed using this sample of a cockatiel mimicking an iPhone notification sound. I find this a really interesting sample to prime the A.I. on: there is a sort of feedback loop happening, with a machine mimicking a cockatiel mimicking a machine. Although it isn’t happening in a physical context, there is a strange sort of machine-animal communication here that has quite a surreal element to it.

In terms of the actual A.I.-generated audio, we can see some of the fundamentals of how OpenAI’s Jukebox algorithm works. Initially, the A.I. takes the primed sample, in this case the cockatiel’s iPhone song, and breaks the raw audio down into smaller chunks of data, compressing the audio to reduce the amount of information the algorithm needs to process. “As an example, a four-minute-long audio segment will have an input length of ∼10 million, where each position can have 16 bits of information. In comparison, a high-resolution RGB image with 1024 × 1024 pixels has an input length of ∼3 million and each position has 24 bits of information” (Dhariwal et al., 2020). These chunks of data are then processed through a series of VQ-VAE algorithms that tokenise the sample audio. I don’t fully understand how these VQ-VAE processes work, but if you want to dig into the details, OpenAI’s Jukebox paper can be found here.
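
To give a rough feel for what “tokenising” means here, below is a toy numpy sketch of the vector-quantisation idea at the heart of a VQ-VAE - not Jukebox’s actual code, just the mechanics. Frames of audio become latent vectors, and each vector snaps to its nearest entry in a learned codebook, turning the waveform into a sequence of integers (the hop of 128 mirrors Jukebox’s coarsest compression level, and 2048 matches its codebook size, but everything else is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for learned components: a real VQ-VAE learns both the
# encoder and the codebook; here they are random, purely to show
# the mechanics.
codebook = rng.normal(size=(2048, 64))   # 2048 codes, 64 dims each

def encode(waveform, hop=128):
    """Toy 'encoder': chop the waveform into hop-sized frames and
    project each frame to a 64-dim latent (a real encoder is a CNN)."""
    frames = waveform[: len(waveform) // hop * hop].reshape(-1, hop)
    projection = rng.normal(size=(hop, 64))
    return frames @ projection

def quantise(latents):
    """Snap each latent to the index of its nearest codebook entry;
    this integer sequence is the tokenised audio."""
    d = ((latents ** 2).sum(1, keepdims=True)
         - 2 * latents @ codebook.T
         + (codebook ** 2).sum(1))
    return d.argmin(axis=1)

audio = rng.normal(size=44100)           # one second of stand-in audio
tokens = quantise(encode(audio))         # -> 344 integer tokens
```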

These tokens of primed sample audio are then analysed by the A.I., and a new piece of audio is generated, blending thousands of tokens of previously trained audio samples into a piece of RAW audio, morphing minute samples of audio tokens into a semi-coherent structure. (Side note: I use the term RAW audio throughout the text; this is the audio format (WAV) that the A.I. generates its sound in. Rather than simply generating MIDI tracks or sheet music, the A.I. is generating the audio itself, which gives the pieces a surreal and natural quality.) I will discuss the tokenised audio samples in the next piece, specifically focusing on the new musical technique of ‘Spawning’. The audio that OpenAI’s Jukebox is trained on is primarily English-language mainstream music, so the A.I. almost always tries to move towards a musical element in its generated pieces.
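
Sticking with the toy sketch rather than Jukebox’s real sampling code, the generation step then amounts to feeding the primed tokens into an autoregressive prior that repeatedly predicts the next token, before a decoder turns the finished token sequence back into a waveform. Every function here is a hypothetical stand-in for Jukebox’s much larger components:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 2048   # matches the VQ-VAE codebook size above

def prior_logits(context):
    """Hypothetical stand-in for the trained transformer prior, which
    scores every candidate next token given the tokens so far."""
    return rng.normal(size=VOCAB)

def sample_continuation(primed, n_new, temperature=1.0):
    """Start from the primed (bird-call) tokens and repeatedly sample
    a next token - the 'response' half of the call and response."""
    tokens = list(primed)
    for _ in range(n_new):
        logits = prior_logits(np.array(tokens)) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(VOCAB, p=probs)))
    return tokens

primed_tokens = [17, 404, 9, 1023]       # placeholder encoded bird-call
full_piece = sample_continuation(primed_tokens, n_new=64)
# A decoder (the VQ-VAE running in reverse) would then turn
# full_piece back into a waveform - the generated RAW audio.
```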

This can be seen in this first piece: the A.I.-generated audio initially holds a similar timbre to the primed audio of the cockatiel but quickly moves towards a musical quality, with a stringed instrument taking hold of the sample, then morphing into a flute-like sound, finally fading out with an electronic phase. I find the way the generated audio seamlessly morphs from one instrument to another fascinating; there are really strong links between this A.I. process and music sampling, which I discuss in further detail in the next sample.

Algorithm mimicking bird mimicking Beyoncé.

For the next piece, I wanted to explore the human voice within this machine-bird mimicry loop, using this video of a parrot singing Beyoncé’s ‘If I Were a Boy’ as the priming audio sample. OpenAI’s Jukebox has lyrical capabilities that allow the user to condition the generated sample on lyrics; however, for this piece I decided to leave out the lyrics to focus instead on the raw audio mimicry of the A.I. and the bird. This results in the generated part of the piece having an unsettling quality to its vocal track, as the A.I. is not singing any discernible words but instead just vocalising sounds.
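
For context, conditioning information (including lyrics) is supplied to Jukebox’s sampling code as a small metadata dictionary per sample. This is my recollection of the shape it takes in the official sampling notebook, with placeholder values rather than the exact ones I used; leaving the lyrics field empty gives the wordless vocalising described above:

```python
# Rough sketch of Jukebox's per-sample conditioning metadata (field
# names from my reading of the official notebook; values here are
# illustrative placeholders, not my exact settings).
metas = [dict(
    artist="unknown",          # no artist conditioning
    genre="unknown",           # no genre conditioning
    total_length=20 * 44100,   # sample length in audio frames at 44.1 kHz
    offset=0,                  # where in the imagined song this clip sits
    lyrics="",                 # empty string: no lyric conditioning
)]
```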

Similar to the previous sample, the A.I.-generated audio starts in a similar vein to the primed audio; however, this time the A.I. picks up on some Latin hip hop playing in the background of the primed sample and runs with it, producing heavy drums and Latin rapping.

I feel like this sample highlights an interesting area within A.I.-generated art, one that was initially highlighted to me on the amazing Interdependence podcast: the new concept of Spawning. The term Spawning has been coined by Holly Herndon, Mat Dryhurst and Jordan Meyer of Spawning.ai: “Spawning is a term we created to describe the act of creating entirely new art with an AI trained on older art. We felt it was needed to distinguish that this process is different from older techniques like sampling or collage.” (Spawning.ai). As the quote says, spawning is a new technique similar to sampling, but still distinct from it, and I feel the A.I.-generated pieces in this post reflect this. The pieces themselves are generated, or spawned, from thousands of hours of human-made audio data; although the morphing and mixing of these tokens of audio may be akin to sampling, the level of data blending the A.I. is doing is unique. On the other side of this point, the A.I. is also not producing the audio you hear in these pieces from nothing: there is the initial primed audio, but also thousands of hours of training audio, all initially created by actual artists, living and dead. It’s important to remember with A.I. that the data is essential to how it works, and behind each piece of data is an actual natural interaction - in this case, artists creating music.

Algorithm mimicking wren.

Finally, for the last piece, I wanted to close the mimicry loop with an A.I. mimicking birdsong, in this example the song of a wren. Interestingly, the A.I. is quite good at recreating the wren’s birdsong. There are still some minor deviations made by the A.I.: a slight bass note that can be heard towards the end, as well as the hoot of an owl breaking its way through the wren’s song.

These experiments were an exploration of A.I.-generated RAW audio through the lens of mimicry and the various forms of audio mimicry that take place in our natural environment. At this moment I don’t plan on continuing with this conceptual idea of mimicry and A.I.; instead, I want to focus on building my own A.I. audio generator. OpenAI’s Jukebox has a lot of ethical issues surrounding it, mostly to do with consent and whether the artists whose music was used to train the A.I. gave consent for their I.P. to be used in this way. (Also, OpenAI is funded by Elon Musk, and I don’t want to work with any platform that is heavily associated with him.) Inspired by Holly Herndon and Spawning.ai, I want to use consensual data sets in my work from here on out, building my own A.I. generators using more ethical, open-source libraries.

Audio sources:

Script sources:

Image set sources: