Left to our own devices: The state of creative AI tools for artists
This article comes with a member-exclusive database of over 30 different creative AI tools for artists, ranging from off-the-shelf web applications to more customizable APIs and machine-learning models. You can access the database here.
A handful of W&M community members helped us curate tools and resources for this database; they are listed at the bottom of this article.
Technology has changed the nature of music creation in far-reaching ways that few could have imagined in the early 2000s. The proliferation of powerful personal computers — alongside better, more accessible software for making music on those devices — now enables anyone with a laptop and headphones to craft radio-ready hits, on par with recordings that once required heaps of expensive gear. In short, commercial music creation has escaped the exclusive world of million-dollar studios, reserved for the few, and entered a world of near-limitless creation for the many.
With the power of musical creation in the hands of several million artists around the globe, we must now consider the next chapter of technological innovation — Artificial Intelligence (AI) and Machine Learning (ML) — and the ways in which these technologies will likely transform and destabilize the art of making music as we know it.
Few terms can conjure more acute feelings of anxiety and trepidation from musicians than “AI.” For decades, commentators in mainstream media and entertainment have depicted AI and ML as the endgame for human creativity — and, in some cases, for survival itself. Many in music view a looming age of computer-led creation as anathema to the core purpose of the arts and human expression. Is this perspective justified, or a mere fever dream of ambitious science fiction?
In this two-part research series, we’ll seek to answer this question, among many others, about the fledgling fields of AI and ML and what they mean for the present and future of music creation. In part one, we’ll assess the current landscape of AI tools for musicians as it stands in mid-2022, with a concise but thorough look at how software led or assisted by artificial intelligence has already worked its way into the creative process. In part two, we’ll turn our focus to a far less certain future, exploring what music creation might look like in five to ten years’ time — alongside reasons for both optimism and concern about what such a future might bring for music’s creative class.
But first: What are AI and ML?
Before we dive in, let’s establish a basic understanding of the terms that are the focal point of this piece: Artificial Intelligence and Machine Learning. Although the two terms are often used interchangeably, they are not in fact one and the same:
- Artificial Intelligence is a burgeoning field of computing at large, which seeks to develop computers, devices, and software that are capable of behaving in ways that emulate or exceed human capabilities.
- Machine Learning, in contrast, is a subset of AI itself: ML is a catch-all for the processes, algorithms, and designs that allow computerized systems to experiment, learn, and derive insights automatically, without constant human supervision or intervention.
A brief look at a ubiquitous piece of software — the word processor — is instructive in understanding the difference.
- A traditional word processor — one that simply lets you create, edit, and proofread text — represents neither AI nor ML. Every function within the application was explicitly programmed by an engineer, and nothing additional is inferred, learned, or derived from user data or other sources.
- Now let’s look at Google Docs, which has a feature called Smart Compose that suggests words or even entire phrases to make the act of writing easier. Smart Compose uses a predictive language model, drawing on several different ML techniques for natural language processing (NLP). The model analyzes large amounts of data to reinforce its ideal outputs and continually refine its suggestions. Several other emerging features in Google Docs, such as document auto-summarization, use similar ML models.
- This system is what makes Google Docs, as a whole, an AI-assisted word processor: in addition to the basic functions of any similar application, it anticipates what the user will type next and offers suggestions accordingly. Put another way, Machine Learning is the specific technique used to give Google Docs its AI qualities for the end user. (A minimal sketch of this kind of learned suggestion follows below.)
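To make the distinction concrete, here is a minimal sketch, in Python, of the kind of learned next-word suggestion described above. It is emphatically not Google’s Smart Compose model; it is a toy bigram counter trained on four sentences, included only to show how suggestion behavior can be derived from data rather than explicitly programmed.

```python
from collections import Counter, defaultdict

# Toy training corpus standing in for the large datasets a real language
# model learns from (purely illustrative).
corpus = [
    "thanks for your help",
    "thanks for your time",
    "thanks for the update",
    "looking forward to your reply",
]

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1

def suggest(previous_word, k=2):
    """Return the k most frequent next words learned from the corpus."""
    return [word for word, _ in following[previous_word].most_common(k)]

print(suggest("your"))  # e.g. ['help', 'time'] -- learned, not hand-coded
```

Retrain the same code on a different corpus and the suggestions change without any engineer rewriting the rules; that difference is what separates an ML-assisted word processor from a traditional one.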
It is worth noting that nearly all the tools covered in this report utilize human engineers in the process of training their AI models. While their plugins or apps seek to automate the complex process of engineering, songwriting, vocal performance, or sound selection either partially or fully, real live humans remain an integral part of the training and development process.
The current state of AI tools for artists
In this piece, we’ll examine five areas of the music creation process in which AI is already being employed: sound selection and design, source/stem separation, songwriting and ideation (including fully generative tools), voice and speech synthesis, and mixing/mastering.
As part of this research report, the W&M team curated a list of active AI Tools that musicians and artists can use today, which can be found here.
The most popular form of automation represented in our database by far is songwriting/ideation (15 tools), followed by mixing/mastering (6 tools) and voice/speech synthesis (5 tools). In terms of where these tools live, the most popular forms in our database are web-based apps (14 tools), DAW plugins (13 tools), and APIs and code libraries (12 tools) — representing a wide range of interfaces and assumptions about users’ prior technical or musical expertise.
Sound selection and design
As sample-based genres have grown in popularity — with hip-hop and electronic music at the forefront — producers and songwriters feel a mounting need to continually expand their personal sample collections.
Nowadays, almost every commercial music producer you know suffers from the same problem: a deluge of samples to choose from when crafting their latest track. It’s not uncommon to find producers with hard drives holding more samples than one human could sort through over a lifetime of making music. This is precisely where this class of AI-assisted tools aims to help.
In mid-2022, any discussion of samples in music — whether individual drum one-shots or radio-ready arrangements and loops — starts with one of the foremost (and best-funded) industry leaders, Splice. The music creation company raised $55 million in February 2021 from investors including Goldman Sachs. Its Splice Sounds subscription, which gives customers access to royalty-free samples and loops starting at $9.99/month or $99.99/year, reportedly generates over 1 million daily sound downloads and has paid out over $36 million in royalties to featured producers to date.
In November 2019, Splice introduced a feature called Similar Sounds (SiSo) to their Splice Sounds toolkit, which lets artists search the platform’s extensive database for loops or sounds with a similar sonic profile to any given sample. The search capability offers producers a powerful way to dive deeper into Splice’s catalog, exploring other samples the company offers that may be complementary — or outright better for the project at hand.
Several other third-party options — Waves COSMOS, XLN XO, and Algonaut Atlas — offer plugins that can be used within a music creator’s DAW of choice to comb through their own sample library (usually by pointing the plugin directly to a master sample folder to index), with speed and breadth that no human can match manually. This kind of tool is especially valuable for producers with tens (or hundreds) of gigabytes of samples on their hard drive.
From a technical point of view, all of these plugins leverage one of the most well-researched capabilities of artificial intelligence: classification, the audio counterpart of object recognition in images. By training on enormous datasets of audio samples and reinforcing the correct identification of specific types of sounds, an AI model can reliably identify specific audio characteristics such as drum variants (snares, hi-hats, etc.), synth lines, guitar riffs, vocal samples, timbres, and key signatures.
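As a rough illustration of how a sample classifier or similar-sounds search might work under the hood, the sketch below computes a crude audio fingerprint for each file (averaged MFCCs, via the open-source librosa library) and ranks a library by cosine similarity to a query sample. The file names are hypothetical placeholders, and commercial tools such as Splice’s SiSo rely on far more sophisticated learned embeddings; this is only a minimal sketch of the underlying idea.

```python
import numpy as np
import librosa  # widely used open-source audio analysis library

def embed(path):
    """Very rough audio 'fingerprint': average MFCCs across the whole file."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

def similarity(a, b):
    """Cosine similarity between two fingerprints (higher = more alike)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sample library -- the file names are placeholders.
library = {name: embed(name) for name in ["snare_01.wav", "hat_03.wav", "kick_07.wav"]}
query = embed("my_snare.wav")

# Rank the library by similarity to the query sample, most similar first.
ranked = sorted(library, key=lambda name: similarity(query, library[name]), reverse=True)
print(ranked)
```

Swap the hand-picked MFCC features for embeddings produced by a trained neural network, and index millions of files instead of three, and you arrive at the kind of large-scale similarity search these products offer.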
Source/stem separation
In the past few years, we’ve also seen a growth in AI/ML-powered stem separation technology that can isolate stems — batches of related sounds, such as vocals, drums, melodies, bass lines, and more — from a fully mixed and mastered stereo file.
In theory, this technology could allow recordings to be remixed or remastered without having access to the original multitrack sessions, completely democratizing the creation of derivative works. In reality, today’s stem separation tools still produce digital artifacts, offering fidelity that is not yet equal to a multitrack recording.
There are several larger tech companies that offer this service, including French streaming platform Deezer, which released an open-source model in 2020 called Spleeter. According to GitHub, Spleeter has been incorporated into several different tools, including those offered by iZotope, VirtualDJ, Algoriddim (creators of the djay app), SpectralLayers, and Acon Digital. Facebook — now Meta — also released an open-source stem separation model called Demucs, which is now in its third iteration.
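For a sense of how accessible this technology has become, Spleeter can be driven from Python in a few lines. The snippet below follows the project’s documented usage; model names and flags can differ between versions, and the file paths are placeholders.

```python
from spleeter.separator import Separator

# Load Deezer's pretrained 4-stem model (vocals / drums / bass / other).
separator = Separator('spleeter:4stems')

# Separate song.mp3 into stems, written as .wav files under output/.
# Roughly equivalent to the CLI: spleeter separate -p spleeter:4stems -o output song.mp3
separator.separate_to_file('song.mp3', 'output/')
```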
Elsewhere, independent startup Audioshake, which raised a $2M seed round in October 2021 and won Sony’s Demixing Challenge last year, claims to have clients ranging from Mute Records and Warner/Chappell to Concord Publishing and music distributor CDBaby. Other independent platforms include Lalal.ai and Audionamix, which offers the ability to “remonetize classic content by separating music and dialogue — even when no session exists.”
There are many compelling reasons for investment and interest in this sector — from restoring audio where no multitrack exists, to quickly isolating or removing vocals for DJs and remix producers, to remixing tracks for spatial audio formats like Dolby Atmos, to name just a few. Less obvious are the potential applications for VR and AR, where legacy audio can be made “3D” in real time, or for streaming companies to further refine their recommendation algorithms to better “understand” music by splitting it into its constituent parts. Both Meta and Deezer are well positioned for these scenarios, so it will be fascinating to see how this tech evolves, especially once the digital artifacts are all but removed.
Songwriting/ideation
Generative songwriting and ideation software comprises not only the most common class of tools in this piece, but also perhaps the most thought-provoking one. The category encompasses a broad range of plugins and applications that aim to do everything from kickstarting an idea to delivering a fully-polished, finished musical product, with minimal human intervention. Adaptive tools, which are closely related to generative ones, additionally give creators the ability to make a composition evolve or adapt over time, often in reaction to external input sources such as MIDI or other information.
Let’s start with the elephant in the room. If you’re wondering if we’ve entered an age of fully computer-generated music, where an algorithm can generate a song without a human hitting a single note on their piano or guitar, the answer is yes.
Whether the output is “radio-ready” or musically on par with commercial hits is another debate that we discuss later in this piece. But as far as simply composing songs is concerned, there are already a wide range of options for doing so:
- Loudly offers users a configurable algorithm trained on over 8 million songs, wrapped in an Android mobile app that delivers more complex, developed musical compositions (as opposed to mere templates or idea starters) in just five seconds.
- Boomy takes a similar approach, offering AI-led music creation via a mobile or desktop app, even for creators who have no musical experience or training. The company’s website claims its creators have made over 6.5 million tracks with the tool — which, according to the company, amounts to nearly 7% of the world’s recorded music.
- Amper AI — a subsidiary of Shutterstock, a global leader in stock image and video footage — allows users to generate a full song with only two human inputs: Desired genre and song length. Their main target customers are not musicians, but rather creators of commercial audiovisual content (e.g. vloggers, brand agencies).
- AIVA offers a comparable AI-led creation tool, also targeted more to film and video production houses. In addition to a “click-and-create” offering, AIVA also enables MIDI file uploads, for composers who need to generate a soundtrack that has a similar “emotional impact” to an existing score.
- Musico puts a slightly different spin on AI-led creation, with a focus on adaptive music that can sync to real-time control signals (mobile device, controller, etc) in an effort to appeal to livestreamers, DJs, and other content creators.
As of mid-2022, there are also a number of software offerings for producers and creators who want to experiment with AI without relinquishing full control over the creative process:
- CoSo by Splice uses the company’s Similar Sounds technology (discussed earlier in this piece) to bring AI-assisted creation to mobile devices, letting users stack sounds from Splice’s vast library to create an original work.
- Soundful offers artists a mix of human-created templates — augmented by AI generative algorithms — that aim to serve as everything from a song starter in Ableton to fully turnkey creation. The software offers full MIDI and audio export, so a human creator can adapt the generative work to their own liking.
- BandLab SongStarter, the latest offering from the buzzy browser and mobile-based DAW, offers AI-generated song ideas that can be further developed and refined within BandLab itself.
- ORB Producer Suite offers a range of plugins for use in any major DAW, which seek to empower human creators with AI-generated rhythmic patterns, melodic sequences, and more.
Although there is indeed some overlap between the two groups we’ve established above, we feel the two product cohorts represent an important distinction in scope. While the former set of tools aims to make music creation as simple as clicking a button, the latter does not seek to take the creative process entirely out of the human creator’s hands; instead, these tools position themselves as aids for starting fresh ideas, overcoming writer’s block, or simply augmenting your existing studio capabilities and talents.
Finally, in addition to the above — which are all easily accessible products for just about any end user — it’s also worth noting that a number of options exist for those with higher levels of comfort with machine learning, programming, and computer science, such as Dadabots’ SampleRNN, Google Magenta, and OpenAI’s MuseNet. For artists, programmers, and the generally curious who seek to explore the cutting edge of musical AI, such libraries and Git repositories can often yield more customizable and powerful results than consumer-facing mobile tools.
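For readers curious about what working at this level looks like, here is a small sketch using note_seq, the open-source Python library that underpins Google Magenta’s models. It does not run a generative model; it simply builds a NoteSequence (the data structure Magenta’s models consume and produce) by hand and exports it to MIDI. The pitches, timings, and file name are arbitrary choices for illustration.

```python
import note_seq  # pip install note-seq; the data library behind Google Magenta

# Build a short C-major arpeggio as a NoteSequence, the format that
# Magenta's generative models (MelodyRNN, MusicVAE, etc.) read and write.
sequence = note_seq.NoteSequence()
sequence.tempos.add(qpm=120)

for i, pitch in enumerate([60, 64, 67, 72]):  # C4, E4, G4, C5
    sequence.notes.add(
        pitch=pitch,
        velocity=80,
        start_time=i * 0.5,
        end_time=(i + 1) * 0.5,
    )
sequence.total_time = 2.0

# Export a standard MIDI file that any DAW can open.
note_seq.sequence_proto_to_midi_file(sequence, 'arpeggio.mid')
```

From here, a sequence like this can serve as a primer that Magenta’s pretrained models continue or interpolate between, which is exactly the kind of flexibility that consumer-facing apps abstract away.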
Voice/speech synthesis
While it might not seem possible to find a more ambitious class of tools than those seeking to create entire musical works themselves, there’s another category worthy of careful study and consideration for implications that are equally existential, if not more so: Vocal synthesis.
The forces of biological evolution have left us with exceptionally discerning mechanisms for identifying and understanding the human voice separately from other audio signals, presenting an extremely high bar for any AI tools looking to emulate the world’s most flexible and timbre-rich instrument.
Supertone AI represents one noteworthy approach in this field. The company’s product, which has already attracted prominent entertainment investors in Korea, claims the ability to generate entirely synthetic human vocal performances given ample training data, in a way that is indistinguishable from the “real thing.” The implications of such a tool, if fully realized, are mind-boggling. It may soon be possible to “resurrect” artists who have passed away (complementing the pre-pandemic trend of hologram tours), or for record labels to churn out copious amounts of original music for an act while they are on tour, sick, or otherwise unavailable to record.
In fact, some other companies are already working directly with artists on developing AI models of their own voices — although in most cases, the motivation is more about experimentation with modern-day creative processes and IP frameworks than about picture-perfect imitation or legacy preservation per se. For instance, the startup Never Before Heard Sounds collaborated with the independent artist Holly Herndon to develop a custom voice model and website known as Holly+, through which anyone can run any audio file of their choice to get a new rendering in the “style” of Herndon’s voice. (Many of these derivative works are already being monetized as NFTs on an approved auction site.)
On the other side of the vocal spectrum lies Eclipsed Sounds’ SOLARIA, an “AI vocalist” for use within Dreamtronics’ Synthesizer V software. Rather than seeking to resurrect or model a specific vocalist based on training data, SOLARIA offers a generative AI algorithm which allows producers and artists to create harmonies, doubles, and even lead vocal lines based on their own specifications and melodic needs.
Mixing/mastering
Mixing (sculpting a rough song into a polished mix) and mastering (preparing a finished mix for streaming, radio, and optimized playback across a wide variety of listening environments) are near-universal parts of the creative process for artists, producers, and composers. The field of AI-assisted mixing and mastering is currently dominated by two heavyweights, LANDR and iZotope — each of which takes a unique approach to a highly complex problem.
As one of the commercial pioneers in AI-led mastering, LANDR represents one of the most technologically ambitious and forward-looking offerings covered in this piece. While the aforementioned sample classifiers promote an automated approach to identifying sounds, they still assume the human creator takes the lead in a largely manual production process. LANDR’s mastering tool takes a radically different approach: it asks the artist to hand over a mixed (but unmastered) record, which is then mastered start-to-finish by an algorithm. While LANDR does offer some areas for creator input, such as loudness and desired style (e.g. warm vs. airy), the mastering tool represents a fully machine-led approach to this particular stage of the creative process.
Separately, iZotope’s primary AI plugins — Neutron and Ozone — offer artists intelligent mixing and mastering tools that can be used during the creative process inside of any major DAW. Both plugins offer similar tools (Mix Assistant and Master Assistant, respectively) that aim to simplify and automate the mixing and mastering process with only a few bits of initial input from the creator.
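To make the category concrete, the sketch below performs the single simplest decision such assistants automate: measuring a mix’s integrated loudness and normalizing it toward a streaming-style target, using the open-source pyloudnorm library. Tools like Ozone and LANDR do vastly more (EQ, dynamics, stereo imaging, limiting), and the -14 LUFS target and file names here are assumptions for illustration only.

```python
import soundfile as sf      # audio file I/O
import pyloudnorm as pyln   # ITU-R BS.1770 loudness measurement

# Load the finished (unmastered) mix; the path is a placeholder.
data, rate = sf.read("mix.wav")

# Measure integrated loudness in LUFS.
meter = pyln.Meter(rate)
loudness = meter.integrated_loudness(data)

# Apply gain toward a common streaming target (-14 LUFS assumed here),
# then write the result. Note this is pure gain, with no limiting or EQ.
normalized = pyln.normalize.loudness(data, loudness, -14.0)
sf.write("mix_normalized.wav", normalized, rate)
```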
Challenges and misconceptions
As AI and ML technologies continue to become more capable, they’ll also likely become more disruptive in all aspects of our lives. That said, they are not without their challenges and misconceptions, especially when it comes to creativity.
If the technology is here to stay, it’s important to cut through the noise and dig into some of the key preconceived notions around the sector, and how it might be warping our judgment on the role the tech will play in our everyday creative lives. To round out our research, we spoke with several key reps across the creative AI companies we listed above, to get a sense of the myths and misconceptions that they encounter most often in their work.
Adoption requires a deeper understanding of workflow and UX
Some in the industry feel the user experience is being overlooked in favor of flashy, one-off showcases of the technology’s capabilities.
Alejandro Koretzky, Splice’s Head of AI and ML, led the team behind the development of Splice’s ML tools, including SiSo and CoSo (discussed earlier). According to Koretzky, the challenge at the heart of building AI tools for artists is putting the user first at every step of the process. “The way I see it is, people and companies have been too distracted by the shiny mirrors of modern AI,” says Koretzky. “[We’ve] seen so many standalone demos [of AI technology] that make a lot of noise and are great for PR, but never become a product because people don’t really care. It’s not enough to come up with a standalone demo showing capability, [you have to] address standalone workflow.”
“Workflow” is the magic word when it comes to creativity. Breaking down barriers and resolving friction between an idea and its execution is the golden path for many songwriters, producers, and musicians. But how does something as revolutionary as AI fit into a structured and often idiosyncratic creative workflow, without completely disrupting it?
“I don’t want to force AI into a workflow that doesn’t need AI,” says Koretzky. “I try not to sit down and say ‘Let’s build cool AI technology and then figure out how to plug it into [a] product’ — it’s the other way around. I always think in terms of workflow, and then go back to identify what type of technologies can unlock certain types of workflow. Companies have been getting more serious about what matters at the end of the day, and that’s user experience.”
Daai El All, CEO of Soundful, agrees on the importance of workflow as the central problem space to solve for. “Our focus, in the beginning, was the up-and-coming producers, artists, singers, songwriters, that don’t know how to produce, or even those that do,” he says. “How can you help them get over creative blocks, get from Point A to Point B faster?”
Machines may generate ideas, but humans still make the decisions (if they want to)
The idea of creating music in an instant with no prior background knowledge, musical ability, or technical skills could understandably have working musicians concerned about the potential result of flooding an already oversaturated music market. But much like sampling before it, and synthesizers before that, El All says the perceived threat of AI is actually an opportunity.
“When synthesizers came out, the orchestral people said, ‘We don’t want it, we don’t want to use it, it’s going to take our jobs away,’” says El All. “But it actually did the complete opposite; it created new jobs, it created a new way of being creative, and added an instrument or a tool into your [creative arsenal].”
Koretzky agrees: “The key word here is ‘decision-making.’ These systems make decisions. They’re predictive. They get input, they classify something or they get input and they create something new, so it’s up to us — people — to say how far the machine should go in terms of making decisions. In the case of the music space, far from being a threat, AI can enhance human decision-making in ways that just weren’t possible before.”
That said, there are certain fields in music where, Koretzky claims, people do not want to be explicitly creative in their output, and are more interested in the ends than the means. “Think about people who need a piece of music for some advertising, for example. In that case, it might be OK for the machine to spit out a piece of music,” says Koretzky. “For that industry need, or use case, those people might not want to get creative, they might want the opposite: ‘Don’t make me make decisions.’”
Amper’s former CEO Drew Silverstein has made a similar, if somewhat controversial, comparison in the past between “artistic music” and “functional music” — the latter of which is valued more for its utilitarian use case than for its inherent creativity (e.g. music for advertising), and arguably presents a lower barrier to entry for people without prior musical knowledge or skill sets.
New tech enables new forms of experimentation — even if it removes others
David S. Rosen is the CEO and co-founder of Dopr, a platform that uses neuroscience to predict listener habits. His take on the AI-taking-our-jobs rhetoric is that rather than focusing on shortcuts devaluing music’s output, we should look at shortcuts as inspiration.
“I think the generative component really adds a new level of access and inclusivity where you can bring a reverse philosophy to the discipline and the craft,” Rosen says. “So rather than saying ‘Hey, I’m eight years old and taking music lessons and I learned all the key signatures and notes,’ it’s inspiring creative pursuits and being artistic in some way by actually creating first, and hoping that is an easier access point to give people that feeling of what it’s like to actually create early on.”
While it may be true that AI and ML tools can get you from a blank page to an idea faster than ever, in a creative context learning and discovery often happen within the journey, not at the destination. Could it be, then, that while AI tools facilitate results, they also leapfrog the process of experimentation and happy accidents that gives an artist their sound and style?
Jonathan Bailey is the CTO of iZotope, whose plugin suite was one of the first mainstream products to feature AI-assisted mixing and mastering functionality. “When questions like these come up, I often look to other creative industries to see how it plays out there,” he says. “In the world of photography, at one point in time, you needed to understand the chemistry of your developer bath in order to literally print your photograph or develop your film. In 2021 that’s not very important anymore. So you could argue that that’s a fundamental skill that’s been lost to advances in technology. Arguably, there are happy accidents or certain process elements that get lost along with that.”
At the same time, Bailey is “not too nostalgic for those days, in feeling like there’s no great photographic art being created in the world now.” Just like in photography, the discipline of music-making has naturally evolved with technology, from mixing on a 64-track SSL console to mixing today in Pro Tools. The evolution represents “a new inflection point on [the] curve [of tools for creatives continuing to evolve and knowledge expectations adapting] and I hope it continues to change — that’s part of the advancement of the craft.”
“Studio-quality” isn’t always the end goal
While the output of today’s creative AI tools falls short of what we associate with “studio-quality” music, that end goal is not necessarily relevant for today’s more experimental artists.
For Berlin-based artist and musician Portrait XO, who uses AI models frequently in her work, the goal of engaging with technology is more about pushing the boundaries of creativity than about aiming for some preconceived notion of perfection. “The biggest misconceptions around AI and music-making is that I’m not interested in [using] AI to create any ‘perfect’ version of anything for me, or generate fully composed music for me,” she says. “I’ve fallen in love with the weird glitches, and strange morphing sounds. I think AI tools for music-making can unlock the most intimate relationship we can achieve with ourselves, the way we create, and how we sound. We can hear ourselves back in unexpected fascinating ways that naturally spark new ideas as we experience new feelings about ourselves.”
This is consistent with what other pioneering artists in creative AI have expressed, in terms of intentionally keeping the output of these tools rough and low-fidelity. For instance, in a 2019 interview with Resident Advisor, Holly Herndon — who has built an ML model that can recreate any audio file in the style of her own voice, and used the model as a de facto instrument in her album PROTO — shared that she intentionally leaves the output of the model “pretty raw and noisy, to show that this is the state of the tech we have access to.”
Conclusion
AI and ML are ushering in a whole new paradigm for music-making — where fundamental compositional and engineering skills could be replaced by automated processes and tools, reframing the role of artists and producers more as high-level arbiters of taste, style, and voice than solely as creators of original sounds.
It’s difficult to foresee the impact that this paradigm shift may have. But the recent surge in private investment in AI-powered music tools — $99.8M so far in 2022 alone, according to our research — suggests that technological development in the sector, as well as surrounding debates about how it may or may not affect creativity, show no sign of slowing down. And in spite of doomsday predictions, history has shown us that when similarly disruptive technologies such as synthesizers, sampling, and autotune are introduced, it’s often the most innovative artists who define the most cutting-edge use cases and creative techniques for the rest of the industry, while retaining their distinct voice.
In part two of this series, we’ll look ahead to try and predict how AI and ML will shift and shape our music-making tools over the next decade. We’ll speak to the artists, companies and creatives experimenting with new modeling technology that could redefine how we think about recording, songwriting, and the modern DAW in 2030 and beyond.
Contributors
Project leads, writers, and interviewers: Dave Edwards, Declan McGlynn
Tool curators: Brodie Conley, CJ Carr (dadabots), Alexander Flores (W&M tech lead), Cherie Hu (W&M founder/publisher), Henry Li, Tony Rovello, Reed Tantiviramanond (Local Dialect), Thomas Vieira
Article/database editor: Cherie Hu