Following recent security concerns about Amazon’s plans to enable Alexa to mimic voices, we look at how easy it is to do, what the benefits are, and what risks it poses.
Alexa The Mimic
Recently, Amazon announced that it was working on technology to enable its Alexa digital assistant to take on the voice of anyone, e.g. the voice of a user or any of their loved ones. It was reported that Rohit Prasad, an Amazon senior vice president, told a conference in Las Vegas that the aim was to help users “make the memories last” after losing loved ones during the pandemic. A video segment also reportedly showed how Alexa could, in theory, read a story to a child in the voice of his or her grandmother!
Other Voice Mimicking Options
There are already many different options available for creating a fake voice or digitally cloning a user’s own voice. Some examples include:
– Microsoft’s Custom Neural Voice, a text-to-speech feature that allows users to create a one-of-a-kind, customised synthetic voice for their applications, building a highly natural-sounding voice from their own audio samples as the training data. Microsoft says it can “represent brands, personify machines, and allow users to interact with applications conversationally”. It can also be used to help restore the speech of people with speech impairments.
– Respeecher, a digital voice cloning tool whose output the company says is “indistinguishable from the original speaker”, and which has been designed for use by filmmakers, game developers, and other content creators.
– Resemble AI, which offers custom brand voices for voice assistants (e.g. a user’s own voice for their smart assistant, such as Alexa or Google Assistant), and integrates with Dialogflow, IBM Watson, or any other natural language understanding (NLU) engine.
– Descript, a deepfake voice generator that can create realistic voices from transcripts or audio clips, including a text-to-speech model of your own voice.
– ‘CereVoice Me’, a voice cloning system from a Scotland-based company that allows users to produce a text-to-speech (TTS) version of their own voice for Windows.
– iSpeech, a free voice cloning platform for creating familiar voice interfaces for products, applications, and services.
– ReadSpeaker, proprietary voice cloning software that produces text-to-speech (TTS) voices that are indistinguishable from the source, and which offers a range of TTS engines that allow a cloned voice to speak across all of a user’s audio channels: smart speaker apps, interactive marketing campaigns, advertisements, and more.
What Could Possibly Go Wrong?
The recent announcement of Amazon’s plans to allow Alexa to mimic voices revived long-held concerns that cloned voices could be used to launch deepfake audio attacks on some voice authentication security systems.
One real-life example came in 2019, when hackers used AI software to mimic an energy company chief executive’s voice in order to steal £201,000 from a UK-based energy company. The CEO of the UK company received a phone call from someone he believed to be the German chief executive of its parent company. The person on the other end of the phone ordered him to transfer €220,000 (£201,000) immediately into the bank account of a Hungarian supplier. The voice was reportedly so accurate that the UK CEO recognised what he thought were the subtleties of his boss’s German accent, and even its “melody”. The call was so convincing that the energy company made the transfer as requested.
Other concerns about the use of voice cloning include:
– Issues of consent and disclosure, i.e. obtaining the consent of the person whose voice is used, and informing listeners that the voice is fake. For example, Microsoft has now stipulated that its Custom Neural Voice AI model cannot be used to mimic a voice without that person’s consent, and that software using it will have to disclose that voices are fake.
– Concerns that AI (e.g. for faking voices) is advancing too far ahead of regulation, which has led Microsoft to say that existing customers must obtain permission to continue using the Custom Neural Voice tool from June 30, 2023, and new customers will have to apply to use it, with Microsoft deciding whether the intended usage is appropriate.
– Criticism from rights activists that internal company ethics committees, which decide what counts as appropriate use of voice mimicking software, can’t be truly independent, that their public transparency is limited by competitive pressures, and that external oversight may therefore be necessary.
What Does This Mean For Your Business?
Although there are good arguments for the value of software that can clone a voice, e.g. for creating interfaces for products, applications, and services, and for use by filmmakers, game developers, and other content creators, there are concerns that it could also be used to make deepfakes for sinister purposes. For example, it could be used to get past voice authentication security systems, or to impersonate people to obtain money. There are also ethical concerns about how producers of these tools decide on appropriate usage and handle matters of consent. Clearly, a balance needs to be struck, and many people feel that more regulation and external oversight are needed to limit risk and potential harm.