OpenAI has announced that third-party developers can now build ChatGPT and the newly introduced Whisper API into their own products and services, at a far lower price than its earlier language-model offerings.
The company released the open-source Whisper speech-to-text model in September 2022; the Whisper API is a hosted version of that model. According to OpenAI, Whisper is an automatic speech recognition system that costs $0.006 per minute and enables large-scale transcription in a number of languages. It accepts files in a variety of formats, including M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM.
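As a rough illustration, a hosted transcription call might look like the sketch below. It assumes the official `openai` Python package (v1+ client style) and an `OPENAI_API_KEY` environment variable; the file name `meeting.m4a` and the `estimate_cost` helper are illustrative, not part of OpenAI's API. The `whisper-1` model name and the $0.006/minute rate come from OpenAI's announcement.

```python
# Sketch: transcribing an audio file with the hosted Whisper API.
# Assumes the official `openai` package and an OPENAI_API_KEY env var;
# "meeting.m4a" is a placeholder file name.

def estimate_cost(duration_minutes: float, rate_per_minute: float = 0.006) -> float:
    """Estimate transcription cost at the announced $0.006/minute rate."""
    return round(duration_minutes * rate_per_minute, 4)

def transcribe(path: str) -> str:
    # Imported inside the function so the cost helper works without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,  # M4A, MP3, MP4, MPEG, MPGA, WAV, or WEBM
        )
    return result.text

if __name__ == "__main__":
    print(f"Estimated cost for a 60-minute file: ${estimate_cost(60)}")
    # print(transcribe("meeting.m4a"))  # uncomment with a real file and API key
```

At the listed rate, an hour of audio comes to about $0.36, which is the cost advantage OpenAI is emphasizing over running the open-source model on your own hardware.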
Although competitors such as Google, Amazon, and Meta have built high-quality speech recognition systems of their own, Whisper stands out because it was trained on 680,000 hours of multilingual and "multitask" data gathered from the web. Greg Brockman, the president and chairman of OpenAI, claims this gives it stronger recognition capabilities, including handling unusual accents, background noise, and technical jargon.
According to Brockman, a developer ecosystem never really formed around the open-source model. "The Whisper API is the same huge model that is available as open source, but we've optimized it to the maximum degree," he said. "It's incredibly convenient and much, much faster."
Brockman's viewpoint
Brockman's remark reflects the restrictions businesses face when adopting voice transcription technology. According to a 2020 Statista survey, the main obstacles to adopting speech-to-text tech are accuracy, accent- or dialect-related recognition issues, and cost.
One of Whisper's limitations stems from its "next-word prediction" behavior, a byproduct of its extensive training data. OpenAI warns that Whisper's transcriptions may include words that were never actually spoken, likely because the system is simultaneously trying to predict the next word and transcribe the audio.
In addition, Whisper's performance varies by language: speakers of languages underrepresented in the training set see higher error rates. Speech recognition bias is a broader industry problem; a 2020 Stanford study found that systems from Amazon, Apple, Google, IBM, and Microsoft made 19% fewer errors with white users than with Black users.
OpenAI’s Whisper API strategy
OpenAI anticipates enhancing existing software, services, tools, and solutions with Whisper’s transcription capabilities. The Whisper API is currently used by the AI-powered language learning app Speak to offer a brand-new virtual speaking partner within the app.
In addition, OpenAI’s entry into the speech-to-text business could be quite lucrative. The potential market value is projected to reach $5.4 billion by 2026, up from $2.2 billion in 2021.
Brockman stated, “Our vision is to become an all-encompassing intellect.” “We want to be able to very flexibly accept whatever type of data you have and whatever type of work you’re attempting to complete in order to multiply your focus.”