Audio to text conversion using AWS Transcribe and Sentiment Analysis using Comprehend API
Amazon Transcribe is an Automatic Speech Recognition (ASR) service by Amazon. It can recognize speech from an existing audio or video file, from a stream of audio or video content, or from audio input coming directly from your computer's microphone.
Amazon Transcribe uses advanced machine learning technologies to recognize speech in audio files and transcribe it into text. You can use Amazon Transcribe to convert audio to text and to create applications that incorporate the content of audio files. For example, you can transcribe the audio track from a video recording to create closed captioning for the video.
Use cases of Amazon Transcribe
- Voice analytics
- Media and entertainment
- Advertising
- Search and compliance
What type of service is it?
It is a fully managed application service in the machine learning stack: you don't have to provision any servers or manage any infrastructure. You simply supply the source file through an S3 bucket and receive the transcribed output via the same bucket, a different bucket, or a bucket owned by Amazon.
1. Amazon Transcribe
14 Supported Languages for transcription
- Modern Standard Arabic (ar-SA), added to the supported list on May 28, 2019
- Australian English (en-AU)
- British English (en-GB)
- Indian English (en-IN), added to the supported list on May 15, 2019
- US English (en-US)
- French (fr-FR)
- Canadian French (fr-CA)
- German (de-DE)
- Indian Hindi (hi-IN), added to the supported list on May 15, 2019
- Italian (it-IT)
- Korean (ko-KR)
- Brazilian Portuguese (pt-BR)
- Spanish (es-ES), added to the supported list on April 19, 2019
- US Spanish (es-US)
https://docs.aws.amazon.com/transcribe/?id=docs_gateway
11 Supported Regions
Amazon Transcribe is supported in 11 regions. For those who don't know what an AWS region is: it is a geographical boundary defined by AWS that contains multiple Availability Zones (groups of data centres), giving fault-tolerance and load-balancing capabilities to AWS services within that region or across multiple regions simultaneously. That being said, not all services launched by AWS are made available in all regions.
- Asia Pacific (Sydney)
- Asia Pacific (Singapore)
- Asia Pacific (Mumbai)
- Canada (Central)
- EU (Ireland)
- EU (London)
- EU (Paris)
- US East (Northern Virginia)
- US East (Ohio)
- US West (Oregon)
- US West (N. California)
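Since not every service is available in every region, it can help to fail fast in code when a caller picks an unsupported region. A minimal sketch (the region codes are my mapping of the region names above; double-check them against the current AWS Region Table, since availability changes over time):

```python
# Regions where Amazon Transcribe was available when this was written;
# verify against the current AWS Region Table before relying on it.
TRANSCRIBE_REGIONS = {
    "ap-southeast-2",  # Asia Pacific (Sydney)
    "ap-southeast-1",  # Asia Pacific (Singapore)
    "ap-south-1",      # Asia Pacific (Mumbai)
    "ca-central-1",    # Canada (Central)
    "eu-west-1",       # EU (Ireland)
    "eu-west-2",       # EU (London)
    "eu-west-3",       # EU (Paris)
    "us-east-1",       # US East (N. Virginia)
    "us-east-2",       # US East (Ohio)
    "us-west-2",       # US West (Oregon)
    "us-west-1",       # US West (N. California)
}

def is_transcribe_region(region):
    """Return True if Amazon Transcribe is available in the given region."""
    return region in TRANSCRIBE_REGIONS

def make_transcribe_client(region):
    """Create a region-scoped Transcribe client, failing fast on bad regions."""
    import boto3  # requires `pip install boto3` and configured credentials
    if not is_transcribe_region(region):
        raise ValueError(f"Amazon Transcribe is not available in {region}")
    return boto3.client("transcribe", region_name=region)
```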
Key Features
- Recognize voices (identify multiple speakers in an audio clip)
- Transcribe separate audio channels (e.g., agent on the left channel, customer on the right)
- Transcribe streaming audio (real-time speech to text, e.g., from a microphone)
- Custom vocabulary (custom words such as EC2, S3, names, industry terms)
- Support for telephony audio (8 kHz audio with high accuracy)
- Timestamp generation and confidence scores (a timestamp for each word to locate it in the recording, along with a confidence score between 0.0 and 1.0)
Technical Specification of Speech Input
Supported formats: • FLAC, MP3, MP4, or WAV
Supported duration and size:
• Less than 4 hours in length and less than 2 GB of audio data
You must specify the language and format of the input file.
For best results:
• Use a lossless format, such as FLAC or WAV, with PCM 16 bit encoding.
• Use a sample rate of 8000 Hz for telephone audio.
You can specify that Amazon Transcribe identify between 2 and 10 speakers in the audio clip.
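These constraints can be encoded when building the request for Transcribe's StartTranscriptionJob API. A sketch using boto3's parameter names (the job name and S3 URI in the usage example are placeholders):

```python
SUPPORTED_FORMATS = ("flac", "mp3", "mp4", "wav")

def build_transcription_request(job_name, media_uri, language_code="hi-IN",
                                media_format="wav", max_speakers=None):
    """Build kwargs for boto3's start_transcription_job, enforcing the
    input constraints described above."""
    if media_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported media format: {media_format}")
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }
    if max_speakers is not None:
        # Speaker identification accepts between 2 and 10 speakers
        if not 2 <= max_speakers <= 10:
            raise ValueError("max_speakers must be between 2 and 10")
        request["Settings"] = {
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": max_speakers,
        }
    return request

# Usage (requires boto3 and AWS credentials):
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(
#     **build_transcription_request("demo-job", "s3://my-bucket/speech.wav",
#                                   max_speakers=2))
```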
Technical Specification of Custom Vocabulary
A custom vocabulary is a list of specific words that you want Amazon Transcribe to recognize in your audio input. These are generally domain-specific words and phrases, words that Amazon Transcribe isn't recognizing, or proper nouns.
You can have up to 100 custom vocabularies in your account, and the size limit for each is 50 KB. A vocabulary can be defined in either a list format or a table format.
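Creating a list-format vocabulary is a single API call, and the 50 KB cap can be checked locally before uploading. A sketch (the vocabulary name and phrases are examples):

```python
MAX_VOCAB_BYTES = 50 * 1024  # 50 KB limit per custom vocabulary

def validate_vocabulary(phrases):
    """Raise if the list-format vocabulary would exceed the 50 KB limit."""
    size = sum(len(p.encode("utf-8")) for p in phrases)
    if size > MAX_VOCAB_BYTES:
        raise ValueError(f"vocabulary is {size} bytes; the limit is {MAX_VOCAB_BYTES}")
    return phrases

def create_custom_vocabulary(name, phrases, language_code="en-US"):
    """Register a list-format custom vocabulary with Amazon Transcribe."""
    import boto3  # requires `pip install boto3` and configured credentials
    transcribe = boto3.client("transcribe")
    return transcribe.create_vocabulary(
        VocabularyName=name,
        LanguageCode=language_code,
        Phrases=validate_vocabulary(phrases),
    )

# e.g. create_custom_vocabulary("cloud-terms", ["EC2", "S3", "SageMaker"])
```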
Pricing:
The Architecture
Create an account
IAM Role Best for Programmatic Access
An IAM role is basically a set of permissions that can be assumed by someone (or an entity) to gain access to the allowed services, as befits their responsibility and allowed scope. Roles are a way of providing temporary credentials that AWS generates to ensure maximum security for our workloads. A role provides a temporary access key ID and secret access key, plus one additional component: a security token. These temporary keys are used to grant the desired access to the entity that assumes the role, and they are generally valid for 12 hours. The security-token component makes sure new keys are generated about 5 minutes before the end of the 12-hour window, so we don't have to worry about rotating the keys ourselves; it just happens automatically.
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios.html
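The temporary credentials described above can be obtained programmatically with STS. A sketch, assuming a role ARN you are permitted to assume (the ARN and session name in the usage example are placeholders; STS AssumeRole accepts session durations from 15 minutes up to 12 hours):

```python
def valid_session_duration(seconds):
    """STS AssumeRole accepts 900 s (15 min) up to 43200 s (12 h)."""
    return 900 <= seconds <= 43200

def session_from_role(role_arn, session_name, duration_seconds=3600):
    """Assume a role and return a boto3 session built from the temporary
    access key ID, secret key, and security token it hands back."""
    import boto3  # requires `pip install boto3` and configured credentials
    if not valid_session_duration(duration_seconds):
        raise ValueError("duration must be between 900 and 43200 seconds")
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )["Credentials"]
    return boto3.session.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# e.g. session = session_from_role(
#     "arn:aws:iam::123456789012:role/TranscribeRole",  # placeholder ARN
#     "transcribe-demo")
```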
AWS Python SDK Setup with code
https://docs.aws.amazon.com/transcribe/latest/dg/API_Operations.html
1. Import libraries
2. Link the name of each audio file to the speaker
3. Set key access with the AWS platform
4. Set S3 credentials and check the bucket
5. Create a new S3 bucket to upload the audio files
6. Upload the files to the created bucket
7. Define the file URLs on the bucket using the S3 convention for file paths
8. Create a vocabulary list for transcribing
9. Define a function to start an Amazon Transcribe job
10. Create a SageMaker role
11. Iterate over the audio file URLs on S3 and call the start_transcription function defined above
12. Download the JSON file from the S3 bucket after transcribing
13. Delete the Transcribe job, taking the name from the bucket
14. Verify the Amazon Transcribe jobs that are under the status COMPLETED
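The core of the steps above can be sketched end-to-end: build the S3 URI for an already-uploaded file, start the transcription job, and poll until it finishes (bucket, key, and job names are placeholders; requires boto3 and configured credentials):

```python
import time

def s3_uri(bucket, key):
    """Build the S3 URI form that Transcribe accepts as MediaFileUri."""
    return f"s3://{bucket}/{key}"

def media_format(key):
    """Infer the media format (flac/mp3/mp4/wav) from the file extension."""
    return key.rsplit(".", 1)[-1].lower()

def transcribe_file(bucket, key, job_name, language_code="hi-IN"):
    """Start a transcription job for a file already in the bucket, poll
    every 10 seconds, and return the final job status."""
    import boto3  # requires `pip install boto3` and configured credentials
    transcribe = boto3.client("transcribe")
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode=language_code,
        MediaFormat=media_format(key),
        Media={"MediaFileUri": s3_uri(bucket, key)},
        OutputBucketName=bucket,  # write the result JSON back to the same bucket
    )
    while True:
        job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        status = job["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(10)

# e.g. transcribe_file("my-audio-bucket", "speeches/speaker1.wav", "speaker1-job")
```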
Result:
The outcome is a JSON file for the Hindi audio that comprises the Hindi transcript of the audio, diarization, and a timestamp for each word along with its confidence score.
2. Amazon Comprehend:
In the last part of our analysis we are going to use Amazon Comprehend for sentiment analysis of the speeches. As mentioned before, AWS offers a pre-trained model that returns the percentage of four different sentiments: positive, negative, mixed, or neutral.
To perform the sentiment analysis we simply need to provide the text as a string and the language. One limitation imposed by Amazon Comprehend is the size of the text.
Sentiment: Sentiment allows you to understand whether what the user is saying is positive or negative, or even neutral. Sometimes neutrality is important as well: knowing there is no strong sentiment can itself be a signal.
Entities: This feature goes through the unstructured text, extracts entities, and categorizes them for you, so things like people or organizations are each given a category.
Language detection: For a company with a multilingual application and a multilingual customer base, you can determine what language the text is in, so you know whether you have to translate the text or take some other kind of business action on it.
Key phrases: Think of these as noun phrases. Where entity extraction picks out proper nouns, key phrase extraction catches everything else in the unstructured text, so you can go deeper into the meaning: what were they saying about the person, or about the organization, for example?
Topic modeling: Topic modeling works over a large corpus of documents and helps you organize them into the topics they contain, so it's really useful for organization and information management.
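Each of the single-document features above maps to one boto3 call on the comprehend client. A sketch, plus a small helper that pulls the dominant label out of a sentiment response (the response shape is Comprehend's `{"Sentiment": ..., "SentimentScore": {...}}`):

```python
def top_sentiment(response):
    """Return the highest-scoring label from a detect_sentiment response."""
    scores = response["SentimentScore"]  # Positive / Negative / Neutral / Mixed
    return max(scores, key=scores.get)

def analyze_text(text, language_code="en"):
    """Run the Comprehend features described above on one piece of text."""
    import boto3  # requires `pip install boto3` and configured credentials
    comprehend = boto3.client("comprehend")
    return {
        "sentiment": comprehend.detect_sentiment(Text=text, LanguageCode=language_code),
        "entities": comprehend.detect_entities(Text=text, LanguageCode=language_code),
        "key_phrases": comprehend.detect_key_phrases(Text=text, LanguageCode=language_code),
        "language": comprehend.detect_dominant_language(Text=text),
    }
```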
An example use case is social analytics.
Detail step by step followed in sentiment analysis
1. Import libraries
2. Read the JSON file from the directory
3. Set key access with the AWS platform
4. Set the JSON input and output directories
5. Get the file paths from the input directory
6. Define the Comprehend function to compute sentiment in 5000-byte chunks. Comprehend can analyze at most 5000 bytes at a time (about 5000 characters for ASCII text; fewer for scripts like Devanagari, where UTF-8 uses multiple bytes per character). Since we are dealing with text transcripts larger than this limit, we created a start_comprehend_job function that splits the input text into smaller chunks and calls the sentiment analysis via boto3 for each independent part.
7. Define the transcribe function to use Amazon Comprehend for sentiment values in a dataframe
8. Define the main function
9. Run the main function
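The chunking step can be sketched as follows: split on whitespace so that no chunk's UTF-8 encoding exceeds 5000 bytes, then call detect_sentiment once per chunk (the language code and client setup are assumptions; a single word longer than the limit would still need special handling):

```python
MAX_COMPREHEND_BYTES = 5000  # detect_sentiment caps input at 5000 bytes

def chunk_text(text, max_bytes=MAX_COMPREHEND_BYTES):
    """Split text on whitespace into chunks whose UTF-8 size fits the limit."""
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate.encode("utf-8")) > max_bytes and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def start_comprehend_job(text, language_code="hi"):
    """Run sentiment analysis chunk by chunk and collect the responses."""
    import boto3  # requires `pip install boto3` and configured credentials
    comprehend = boto3.client("comprehend")
    return [comprehend.detect_sentiment(Text=chunk, LanguageCode=language_code)
            for chunk in chunk_text(text)]
```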
Result:
Sentiment analysis for each selected speech. The output has four numerical scores, each with a sentiment label, i.e., positive, negative, neutral, and mixed.
Reference:
Google images
https://docs.aws.amazon.com/pt_br/comprehend/latest/dg/guidelines-and-limits.html