AWS Transcribe

Overview

Amazon Transcribe (referred to in this article as AWS Transcribe) is an automatic speech recognition service provided by AWS. It uses a machine learning model to take voice data as input and return text as output. AWS Transcribe can be used to add speech-to-text functionality to an application. Real-time streaming transcription is also available in some regions.

Introduction to AWS Transcribe

Audio transcription is usually a time-consuming process. It involves either hiring someone to transcribe manually, which takes a lot of time, or deploying an application that is difficult to maintain. AWS Transcribe eases the task by converting live or recorded audio into text.

The cost of using AWS Transcribe is comparatively low. It makes it easy for developers and customers to add speech-to-text capability to an application. Behind the scenes, AWS Transcribe uses deep learning algorithms to do the conversion.

Key Features

Let's explore some of the common features of AWS Transcribe:

1. Automatic Speech Recognition

AWS Transcribe uses machine learning and deep learning algorithms to convert speech to text accurately.
It also supports real-time transcription of live audio.
It enables users to open a bidirectional stream over HTTP/2.
Users can send an audio stream and receive the transcribed text simultaneously in real time.
A single Amazon Transcribe service call is limited to a maximum of four hours of audio.

2. Clear Formatting and Punctuation

The generated text includes suitable punctuation and formatting, so it reads like natural written language.
This feature sets AWS Transcribe apart from transcription services that return raw, unformatted text.
AWS Transcribe integrates multiple real-time transcription technologies to serve various use cases.

3. Variety of Languages

AWS Transcribe started with support for just two languages, Spanish and English, and more languages have been added since.
Currently, it provides transcription for many languages, including French and German. This eases the process of creating subtitles and captions.

4. Automatic Language Detection

AWS Transcribe can automatically detect the language of the input source, so you do not have to specify it before transcribing.
By default, it does not identify multiple languages within one recording; it recognizes the dominant language and carries out the conversion in that language.
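
Language detection is requested when a job is started. The sketch below only builds the request parameters for a StartTranscriptionJob call and does not contact AWS. The job name and S3 URI are hypothetical placeholders; IdentifyLanguage and LanguageOptions are the request fields assumed here, and they should be checked against the current API reference before use.

```python
def language_detection_request(job_name, media_uri, candidates=None):
    """Build parameters for a job that detects the dominant language."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        # IdentifyLanguage is used instead of an explicit LanguageCode.
        "IdentifyLanguage": True,
    }
    if candidates:
        # Optional hint: restrict detection to the languages you expect.
        request["LanguageOptions"] = list(candidates)
    return request


# Hypothetical job name and S3 URI, for illustration only.
request = language_detection_request(
    "demo-language-detection",
    "s3://example-bucket/interview.mp3",
    candidates=["en-US", "fr-FR", "de-DE"],
)
print(request)
```

The resulting dictionary could be passed as keyword arguments to a Transcribe client's start_transcription_job method.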

5. Output Customization

Using AWS Transcribe, we can define a custom vocabulary of domain-specific words and phrases, which changes how they are recognized and shown in the output.
For a single transcription, it can provide up to ten alternative outputs.
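
Both options are set in the Settings part of a transcription request. The following is a minimal sketch that builds such a Settings dictionary. The vocabulary name is a hypothetical placeholder, and the accepted range of 2 to 10 alternatives reflects the limit mentioned above as it is understood here.

```python
def alternatives_settings(max_alternatives=2, vocabulary_name=None):
    """Build the Settings block for alternatives and a custom vocabulary."""
    if not 2 <= max_alternatives <= 10:
        raise ValueError("max_alternatives must be between 2 and 10")
    settings = {
        "ShowAlternatives": True,
        "MaxAlternatives": max_alternatives,
    }
    if vocabulary_name is not None:
        # Name of a custom vocabulary created beforehand (hypothetical here).
        settings["VocabularyName"] = vocabulary_name
    return settings


settings = alternatives_settings(max_alternatives=3, vocabulary_name="demo-product-terms")
print(settings)
```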

6. Safety and Privacy

AWS Transcribe helps us mask or remove words that are unsuitable or sensitive.
Using the vocabulary filtering feature, you can specify the list of words or phrases that have to be masked or removed.
You can use AWS Key Management Service (KMS) keys to encrypt the transcripts.
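
The difference between masking and removing can be illustrated locally. The sketch below is not the service's implementation; it only mimics the visible effect on a transcript, assuming that masked words are replaced with three asterisks. In a real job the filtering is done by the service when a vocabulary filter is named in the job settings.

```python
import re


def apply_vocabulary_filter(text, blocked_words, method="mask"):
    """Mimic the effect of a vocabulary filter on a piece of text."""
    if method not in ("mask", "remove"):
        raise ValueError("method must be 'mask' or 'remove'")
    if not blocked_words:
        return text
    # Match whole words only, ignoring case.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in blocked_words) + r")\b",
        re.IGNORECASE,
    )
    if method == "mask":
        return pattern.sub("***", text)
    # For "remove", drop the word and collapse the leftover double space.
    return " ".join(pattern.sub("", text).split())


sentence = "That darn printer jammed again"
print(apply_vocabulary_filter(sentence, ["darn"], method="mask"))
print(apply_vocabulary_filter(sentence, ["darn"], method="remove"))
```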

7. Timestamp Generation

Amazon Transcribe generates a timestamp for each word.
This helps to instantly locate a particular word or phrase within the original recording.
This feature is also helpful in creating subtitles, because each word is tagged with a timestamp.
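
Locating a word by its timestamp can be sketched as follows. The sample dictionary is hand-written for illustration and follows the transcript layout assumed here: a results.items list in which spoken words have the type "pronunciation" with start_time and end_time given as strings, while punctuation items carry no timestamps.

```python
# Hand-written sample in the assumed transcript layout (not real output).
transcript = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.04", "end_time": "0.38",
             "alternatives": [{"confidence": "0.99", "content": "Welcome"}]},
            {"type": "pronunciation", "start_time": "0.38", "end_time": "0.51",
             "alternatives": [{"confidence": "0.99", "content": "to"}]},
            {"type": "pronunciation", "start_time": "0.51", "end_time": "0.72",
             "alternatives": [{"confidence": "0.98", "content": "the"}]},
            {"type": "pronunciation", "start_time": "0.72", "end_time": "1.2",
             "alternatives": [{"confidence": "0.97", "content": "meeting"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ]
    }
}


def find_word(transcript, word):
    """Return (start, end) times in seconds for each occurrence of a word."""
    hits = []
    for item in transcript["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items have no timestamps
        content = item["alternatives"][0]["content"]
        if content.lower() == word.lower():
            hits.append((float(item["start_time"]), float(item["end_time"])))
    return hits


print(find_word(transcript, "meeting"))
```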

8. Recognize Multiple Speakers

Amazon Transcribe can detect when the speaker changes and attribute each part of the transcript to the right speaker.
It reduces the hassle involved in transcribing audio recordings with multiple speakers.
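
Turning speaker labels into readable turns might look like the sketch below. The sample is hand-written, and the layout is an assumption: a results.speaker_labels.segments list whose entries name a speaker_label and list the start times of the words spoken in that segment. Field names should be checked against an actual transcript before relying on them.

```python
# Hand-written sample in the assumed layout (not real service output).
transcript = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
             "alternatives": [{"content": "Hello"}]},
            {"type": "pronunciation", "start_time": "0.4", "end_time": "0.8",
             "alternatives": [{"content": "there"}]},
            {"type": "pronunciation", "start_time": "1.0", "end_time": "1.3",
             "alternatives": [{"content": "Hi"}]},
        ],
        "speaker_labels": {
            "segments": [
                {"speaker_label": "spk_0", "start_time": "0.0", "end_time": "0.8",
                 "items": [{"start_time": "0.0"}, {"start_time": "0.4"}]},
                {"speaker_label": "spk_1", "start_time": "1.0", "end_time": "1.3",
                 "items": [{"start_time": "1.0"}]},
            ]
        },
    }
}


def speaker_turns(transcript):
    """Return a list of (speaker_label, text) pairs, one per segment."""
    results = transcript["results"]
    # Index spoken words by their start time so segments can look them up.
    words_by_start = {
        item["start_time"]: item["alternatives"][0]["content"]
        for item in results["items"]
        if item["type"] == "pronunciation"
    }
    turns = []
    for segment in results["speaker_labels"]["segments"]:
        words = [
            words_by_start[entry["start_time"]]
            for entry in segment["items"]
            if entry["start_time"] in words_by_start
        ]
        turns.append((segment["speaker_label"], " ".join(words)))
    return turns


for speaker, text in speaker_turns(transcript):
    print(f"{speaker}: {text}")
```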

How does Speech-To-Text Work?

Figure: How speech-to-text conversion works

Speech-to-text conversion takes place in a series of steps:

The audio contains sounds, which are a series of vibrations.
These vibrations are picked up and converted into a digital signal using an analogue-to-digital converter.
The converter measures the sound waves in detail and filters them to distinguish relevant sounds.
The sounds are split into tiny segments and matched to phonemes. A phoneme is a unit of sound used to distinguish one word from another in a particular language.
The phonemes are then run through a mathematical model and compared with known letters, words, or phrases.
The best match is then output as text.
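
The first steps above (digitizing the sound and splitting it into short segments) can be sketched in a few lines. This is a toy illustration, not how Amazon Transcribe is implemented: the 16 kHz sample rate, the 16-bit depth, and the 25 ms segment length are assumed values commonly used for speech, and the phoneme-matching steps are left out because they require a trained model.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed value)


def sample_tone(freq_hz, duration_ms):
    """Simulate a vibration: a pure tone sampled at SAMPLE_RATE."""
    count = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(count)]


def quantize_16bit(samples):
    """Analogue-to-digital step: map values in [-1, 1] to 16-bit integers."""
    return [int(round(s * 32767)) for s in samples]


def split_into_frames(samples, frame_ms=25):
    """Split the digital signal into short, equal-length segments."""
    size = SAMPLE_RATE * frame_ms // 1000
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


analog = sample_tone(440, duration_ms=100)
digital = quantize_16bit(analog)
frames = split_into_frames(digital)
print(len(digital), "samples in", len(frames), "frames of", len(frames[0]))
```

In a real recognizer, each frame would be turned into acoustic features and scored against phoneme models.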

Use Cases

Following are some of the real-world use cases of AWS Transcribe.

1. Customer Service Improvement

If you are not available at a specific time to talk to a customer, you can use the AWS Transcribe service to transcribe the call and look at it later when you have time.

In addition, you can carry out data analytics on the calls, because the output includes sentiment data, talk time, non-talk time, loudness, and more.

2. Automated Closed Captioning and Meeting Notes

Creating subtitles and captions for your videos can be made much easier using AWS Transcribe. It can also increase workplace productivity by generating notes from meetings and discussions.

3. Analyze Media Content

Amazon Transcribe can be used to automatically make audio and video content searchable by generating transcripts. This feature of AWS Transcribe proves to be of great help to media distributors and content producers.

4. Medical Documentation

AWS Transcribe Medical can be used to transcribe medical documentation. Physicians and clinicians can use it to record clinical interviews and meetings efficiently. It is designed to recognize medical terminology.

Benefits:

It reduces the cost involved in hiring a transcriber.
It removes the hassle of maintaining speech-to-text converter applications.
It helps in noting down the summary of meetings quickly.
It provides transcripts with punctuation and formatting at affordable rates.
It removes the need to use multiple applications for transcribing different languages, because AWS Transcribe supports a wide range of languages.
It adds value to video and audio content by making it searchable.

Start a Transcription Job Using AWS CLI

The following commands are used to create a transcription job. Before using AWS CLI commands, ensure you run the aws configure command. The aws configure command stores your credentials and default region so that the CLI can authenticate to your AWS account and usage is billed to it.

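Start by typing the command aws transcribe start-transcription-job followed by a backslash (\). The backslash is a shell line continuation, so the shell waits for more input and you can enter the required parameters on the following lines.

If you want to pass all the parameters on one line, append them after the command name. The job name is given with --transcription-job-name, the location of the audio file in Amazon S3 with --media (as MediaFileUri=<S3 URI of the audio file>), and the language with --language-code.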
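
A complete command line can be assembled as in the sketch below, which builds the argument list in Python and prints it in shell-quoted form without running it. The job name and S3 URI are hypothetical placeholders, and the helper function is only an illustration.

```python
import shlex


def start_job_command(job_name, media_uri, language_code="en-US", output_bucket=None):
    """Build the argument list for aws transcribe start-transcription-job."""
    argv = [
        "aws", "transcribe", "start-transcription-job",
        "--transcription-job-name", job_name,
        "--language-code", language_code,
        "--media", f"MediaFileUri={media_uri}",
    ]
    if output_bucket:
        # Optional: store the transcript in a bucket you own.
        argv += ["--output-bucket-name", output_bucket]
    return argv


# Hypothetical job name and S3 URI, for illustration only.
argv = start_job_command("demo-job", "s3://example-bucket/meeting.mp3")
print(shlex.join(argv))
```

The printed line can be pasted into a terminal. To run it from Python instead, the list could be passed to subprocess.run, which requires the AWS CLI to be installed and configured.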

Get Status of AWS Transcription Job

You can confirm that the transcription job was created by checking its status with the aws transcribe get-transcription-job command, passing the job name in --transcription-job-name.

If you open your dashboard, you will see the transcription job listed on the transcription job page. Once processing has finished, its status is Completed.
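
A job passes through the statuses QUEUED and IN_PROGRESS before it ends as COMPLETED or FAILED, so scripts usually poll until it finishes. The sketch below assumes a client object that behaves like the one returned by boto3.client("transcribe"); the function name, polling interval, and poll limit are illustrative choices.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5.0, max_polls=60):
    """Poll a transcription job until it completes or fails."""
    for _ in range(max_polls):
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        status = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return status
        # Still QUEUED or IN_PROGRESS: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_name!r} did not finish after {max_polls} polls")
```

With boto3 installed and credentials configured, it could be called as wait_for_job(boto3.client("transcribe"), "demo-job"), where demo-job is a hypothetical job name.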

Deleting an AWS Transcription Job

To delete a transcription job using the command line, use the aws transcribe delete-transcription-job command and pass the job name in --transcription-job-name.
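
As a sketch, the delete command can be assembled and optionally executed from Python. The helper and its dry_run flag are illustrative, and the job name is a hypothetical placeholder. With dry_run left at its default, nothing is executed and the command is only returned as text.

```python
import shlex
import subprocess


def delete_job(job_name, dry_run=True):
    """Build (and optionally run) aws transcribe delete-transcription-job."""
    argv = [
        "aws", "transcribe", "delete-transcription-job",
        "--transcription-job-name", job_name,
    ]
    if not dry_run:
        # Requires the AWS CLI to be installed and configured.
        subprocess.run(argv, check=True)
    return shlex.join(argv)


print(delete_job("demo-job"))
```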

Pricing

AWS works on a pay-as-you-go model. For AWS Transcribe, you need to pay based on the seconds of audio transcribed every month.

Free-Tier:

You can use the free tier to start with the AWS Transcribe service.
Under the free tier, we can use the AWS Transcribe service for up to 12 months with up to 60 minutes of transcription each month.
Unused free-tier minutes cannot be rolled over to the next month.
The calculation is made based on usage across all AWS regions.

Standard Pricing:

The standard pricing varies from region to region. The pricing structure for the US East region is shown in the figure below.

Figure: AWS Transcribe standard pricing (US East region)

The pricing varies depending on custom vocabulary, vocabulary filtering, etc.
Depending on your region, a discount may apply.
Usage is billed per second, with a minimum charge of 15 seconds per request.
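
The billing rules above can be turned into a small estimate. This is a sketch under stated assumptions: the per-minute rate is a placeholder and not a quoted price, the 60 free minutes are assumed to be subtracted after the 15-second minimum is applied to each request, and durations are given in whole seconds.

```python
MIN_BILLED_SECONDS = 15       # minimum charge per request
FREE_TIER_SECONDS = 60 * 60   # 60 free minutes per month


def billed_seconds(durations):
    """Apply the per-request minimum to each audio duration (in seconds)."""
    return sum(max(MIN_BILLED_SECONDS, seconds) for seconds in durations)


def monthly_cost(durations, rate_per_minute, free_tier=True):
    """Estimate one month's cost for a list of request durations."""
    seconds = billed_seconds(durations)
    if free_tier:
        seconds = max(0, seconds - FREE_TIER_SECONDS)
    return seconds * rate_per_minute / 60


# Three requests: 5 s (billed as 15 s), 90 s, and one hour.
requests = [5, 90, 3600]
PLACEHOLDER_RATE = 0.024  # hypothetical USD per minute; check the pricing page
print(billed_seconds(requests))
print(round(monthly_cost(requests, PLACEHOLDER_RATE), 4))
```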

AWS Transcribe Call Analytics

Free-Tier:

You can get started with call analytics for free using the free tier.
Under the free tier, you can analyze approximately 60 minutes of call audio per month for 12 months.

Standard Pricing:

The standard pricing is calculated per second use and varies from region to region.
The pricing for the US East region is shown in the figure below.

Figure: AWS Transcribe Call Analytics pricing (US East region)

AWS Transcribe Medical

Free-Tier:

You can easily get started with Transcribe Medical using the free tier.
Under the free tier, you can transcribe approximately 60 minutes of audio per month for 12 months, starting from the first transcription.

Standard Pricing:

Here also, the standard pricing is calculated per second use and varies from region to region.
The pricing for the US East region is shown in the figure below.

Figure: AWS Transcribe Medical standard pricing (US East region)

Companies using AWS Transcribe

Some of the companies using AWS Transcribe are:

Slack

Slack is a collaboration hub.
Slack uses Transcribe to generate live subtitles for meetings.
Users can send short audio and video messages using Slack.
These audio and video messages are made searchable using Transcribe.

Byjus

Byjus is an Ed-tech platform.
They use Amazon Transcribe to help developers add speech-to-text functions to their applications.
It is used in meetings to produce subtitles and notes.
It is also used to analyze customer service calls.

Livongo

Livongo is a health-based app.
It uses transcribing to analyze the health coaches' delivery.

Intuit

Intuit is a financial management solutions provider.
It uses Transcribe to analyze contact centre calls.
It also uses it to produce accurate transcripts in which sensitive information is redacted.

And many more companies use AWS Transcribe to serve their needs.

Conclusion

AWS Transcribe is an AWS service that provides the facility to convert speech into text.
It takes audio files as input and generates text files as output.
It offers features like automatic punctuation and formatting, automatic language detection, etc.
Some of the everyday use cases of AWS Transcribe include creating subtitles and captions, transcribing customer calls and meeting discussions, and medical documentation.
It saves cost and time while improving the efficiency and accuracy of transcription.
We need to pay for each second of audio transcribed using AWS Transcribe within a month.
The free tier can be used for 12 months on signup with a transcription limit of 60 minutes per month.