I have talked about Armedia’s Legal Case Management Module so many times now, and every time I finish writing a blog post, I feel like I have left something out. Going through my recent blog posts, I’ve realized that I haven’t spent much time explaining the Artificial Intelligence (AI) component in our Legal Case Management Module.
Set aside the Hollywood image of AI as rebel robots with a human-like appearance. AI is, in fact, an excellent Case Management assistant for any Legal Department.
Why? Because, multimedia transcription.
Let me explain.
Imagine a legal case. A courtroom, witness interviews, cross-examinations, police testimonials. Everything gets recorded. Everything gets typed in by the stenographer. Everything except people’s emotions when they say something that can be crucial for the case.
Now, imagine the legal workers going through stenography recordings to find specific keywords. Tedious, at best.
Now imagine how much better it would be if you could just take the video recordings of all the hearings and simply extract the text, make it searchable, and make it clickable… so you can find every single mention of a word and jump to the exact spot in the exact video file where that word is mentioned.
This is why Amazon Transcribe is so widely used. For us in the US, it’s perhaps the best place to go for multimedia transcription.
Let’s see why.
Why Amazon Transcribe?
AWS has been one of the leaders in the industry for more than 12 years. They work to keep pace with the evolution of language and continually improve their Transcribe platform.
Besides the speed the platform offers, AWS delivers some of the most accurate multimedia transcripts out there, if not the most accurate.
AWS does all this because Amazon is focused on providing three key deliverables: Simplicity, Capacity, and Accuracy.
Let’s cover each one in more detail.
1. Simplicity
Amazon Transcribe makes speech-to-text conversion very simple. It does not require any complicated programming. Just upload the multimedia content you want to convert into text to Amazon S3 and the platform will return ready-to-use text.
Basically, all you need to do is upload the desired multimedia file to your S3 bucket, select a language, and wait for the multimedia transcription job to complete, which doesn’t take long thanks to all the processing power of AWS.
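If you would rather script those steps than click through the console, here is a minimal sketch using boto3, the AWS SDK for Python. The bucket, key, and job names are placeholders I made up for illustration, and the two helper functions are my own. Only `upload_file` and `start_transcription_job` are SDK calls, and running them assumes you have AWS credentials configured.

```python
def build_job_request(job_name, bucket, key, language_code="en-US"):
    """Build the keyword arguments for Transcribe's start_transcription_job call."""
    media_format = key.rsplit(".", 1)[-1].lower()  # e.g. "hearing.mp4" -> "mp4"
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
    }


def transcribe_file(local_path, job_name, bucket, key, language_code="en-US"):
    """Upload a recording to S3 and start a transcription job (needs AWS credentials)."""
    import boto3  # imported here so the request builder above works without the SDK

    boto3.client("s3").upload_file(local_path, bucket, key)
    request = build_job_request(job_name, bucket, key, language_code)
    return boto3.client("transcribe").start_transcription_job(**request)


# Hypothetical names, for illustration only.
request = build_job_request("hearing-2018-06-01", "my-case-media", "hearings/hearing.mp4")
print(request["Media"]["MediaFileUri"])
```

The job runs asynchronously, so a real integration would also poll for completion (or react to a notification) before fetching the transcript.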
It’s really that simple. But for some, it may be a bit too simple. I’m not a huge fan of their UI. I’d definitely bring the look and feel a bit more up to date. But it gets the job done.
2. Capacity
Amazon Transcribe is designed to transcribe multimedia in various formats and at varying levels of audio quality.
By now, AWS has managed to enable their Transcribe platform to process multimedia content:
- From telephony to studio-clarity recordings (8-48 kHz sample rate)
- Long audio and video recordings, up to 2 hours long
- In two different languages: English and Spanish
- In WAV, mp3, mp4, FLAC formats
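Before sending a file off, it can save a failed job to check it against the limits in this list. The sketch below is my own illustration, not part of any AWS SDK. The numbers come straight from the list above, while the language codes `en-US` and `es-US` are my assumption about how English and Spanish are identified.

```python
# Limits taken from the list above; the language codes are an assumption.
SUPPORTED_FORMATS = {"wav", "mp3", "mp4", "flac"}
SUPPORTED_LANGUAGES = {"en-US", "es-US"}
MIN_SAMPLE_RATE_HZ = 8_000
MAX_SAMPLE_RATE_HZ = 48_000
MAX_DURATION_SECONDS = 2 * 60 * 60  # 2 hours


def check_media(filename, language_code, sample_rate_hz, duration_seconds):
    """Return a list of problems; an empty list means the file fits the limits."""
    problems = []
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if language_code not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {language_code}")
    if not MIN_SAMPLE_RATE_HZ <= sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        problems.append(f"sample rate out of range: {sample_rate_hz} Hz")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"recording too long: {duration_seconds} s")
    return problems


print(check_media("interview.wav", "en-US", 16_000, 3_600))
print(check_media("interview.avi", "fr-FR", 4_000, 3 * 3_600))
```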
One thing that hurts AWS is their language coverage. There’s a lot to be done here, but at least for English-speaking needs, AWS is really, really good. Their Spanish coverage is good as well. So just by looking at what they offer (English and Spanish), AWS seems a bit too US-oriented compared to other vendors.
3. Accuracy
As I’m approaching Amazon Transcribe as a component-user from ArkCase, this third point is the most important one for me, so I’ll spend a bit more time on it.
The key deliverable for Amazon Transcribe is ready-to-use text that does not require further editing. But in cases where AWS is not certain, it gives you a Confidence Score, which is usually pretty high.
From an ArkCase perspective, we get a simple interface to edit the transcript until it reaches 100% accuracy. The cool thing is that any time you change the transcript somehow, you’re training AWS to get better. So any time you alter a text, or punctuation, or a timestamp, you’re artificially increasing AWS’s intelligence.
To achieve this, Amazon Transcribe has special features that contribute to the high accuracy and production of ready-to-use multimedia transcriptions.
Let’s take a look at some of them:
- Punctuation
Amazon Transcribe is capable of formatting the text automatically and adding appropriate punctuation to the text as it goes. This way the platform is producing an intelligible output which is ready-to-use without further editing. I really get excited thinking about the level of analysis that goes into punctuation. The vocal pauses, the context of the pause, intonation of words etc. All this is factored in for your text to be properly punctuated. Fascinating.
- Confidence Score
Amazon Transcribe also provides a confidence score (shown as a percentage) which tells you how confident the platform is in the transcription. As I mentioned before, any time you make a change to the suggested transcript, you help AWS return texts with a higher Confidence Score.
- Possible Alternatives
The platform also gives you an opportunity to make some alterations in cases where you want to. This usually happens when a word is difficult to understand. People pronounce words differently, and audio quality may be too poor for proper machine understanding. Each time you accept an alternative, you make AWS smarter.
- Timestamp Generation
Thanks to deep learning technologies, Amazon Transcribe automatically produces time-stamped transcripts. This means that the platform generates timestamps for each word, AND for each scene, with text contextualization in mind. This feature of Amazon Transcribe increases the searchability of the transcribed media file.
- Custom Vocabulary
Amazon Transcribe allows users to create their own list of custom vocabulary. This feature enables the user to expand the speech recognition of the platform. This way you’re giving the platform more information on how to process the speech in your multimedia file. This is very important for legal cases in general, and for cases where audio files come from technical interviews on a very specific topic. The more domain-specific vocabulary you’ve trained in, the better AWS is at returning highly accurate multimedia transcriptions. You can create, update, or delete vocabulary files.
- Multiple Speakers
Amazon Transcribe can easily identify 2-10 different speakers in a multimedia file. The platform can easily recognize when the speaker changes and accordingly attribute the transcribed text to different speakers.
This feature also enables the users to select the number of speakers they need to be identified in the multimedia file.
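To make the confidence scores and timestamps from this list a bit more concrete, here is a small sketch that reads a transcript and answers two questions: where is a word mentioned, and which words is the platform unsure about? The field names follow the JSON layout Transcribe returns as I understand it (an `items` list whose entries carry `start_time`, `end_time`, and `alternatives`). The sample data and the helper names are invented for illustration.

```python
def _word(content, start, end, confidence):
    """Build one spoken-word item in the assumed Transcribe JSON shape."""
    return {"type": "pronunciation", "start_time": start, "end_time": end,
            "alternatives": [{"content": content, "confidence": confidence}]}


# Punctuation items carry no timestamps.
_PERIOD = {"type": "punctuation", "alternatives": [{"content": ".", "confidence": "0.0"}]}

# Invented sample: "The contract was signed. The contract expired."
SAMPLE = {"results": {"items": [
    _word("The", "0.00", "0.21", "0.99"),
    _word("contract", "0.21", "0.80", "0.98"),
    _word("was", "0.80", "0.95", "0.99"),
    _word("signed", "0.95", "1.40", "0.62"),
    _PERIOD,
    _word("The", "2.10", "2.25", "0.99"),
    _word("contract", "2.25", "2.85", "0.97"),
    _word("expired", "2.85", "3.50", "0.91"),
    _PERIOD,
]}}


def spoken_words(transcript):
    """Flatten the transcript into dicts with word, start, end, and confidence."""
    words = []
    for item in transcript["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # skip punctuation, which has no timing
        best = item["alternatives"][0]
        words.append({"word": best["content"],
                      "start": float(item["start_time"]),
                      "end": float(item["end_time"]),
                      "confidence": float(best["confidence"])})
    return words


def find_mentions(words, keyword):
    """Return the start time (in seconds) of every mention of keyword."""
    return [w["start"] for w in words if w["word"].lower() == keyword.lower()]


def low_confidence(words, threshold=0.8):
    """Return the words whose confidence falls below the threshold."""
    return [w["word"] for w in words if w["confidence"] < threshold]


words = spoken_words(SAMPLE)
print(find_mentions(words, "contract"))
print(low_confidence(words))
```

The start times are what a video player needs in order to jump to the exact spot, and the low-confidence list is a natural place to start a manual review.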
Overall, the Accuracy side of things is pretty well covered, so, well done Amazon. What I find most useful is the ability to build topic-specific vocabularies. As most of our use of ArkCase and Amazon Transcribe is in the Legal department, having a trained vocabulary helps a lot in quickly and easily getting the Confidence Score into the high 90s.
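For the curious, here is roughly what building a topic-specific vocabulary could look like with boto3. Treat it as a sketch under assumptions: the vocabulary name and the legal terms are made up, the helper functions are my own, and `create_vocabulary` is the only SDK call (it needs AWS credentials to run).

```python
def build_vocabulary_request(name, terms, language_code="en-US"):
    """Build the keyword arguments for Transcribe's create_vocabulary call."""
    # Trim whitespace, drop empty entries, and remove duplicates.
    phrases = sorted({term.strip() for term in terms if term.strip()})
    return {"VocabularyName": name, "LanguageCode": language_code, "Phrases": phrases}


def create_vocabulary(name, terms, language_code="en-US"):
    """Create the custom vocabulary in AWS (needs AWS credentials)."""
    import boto3  # imported here so the request builder above works without the SDK

    request = build_vocabulary_request(name, terms, language_code)
    return boto3.client("transcribe").create_vocabulary(**request)


# Hypothetical vocabulary name and terms, for illustration only.
request = build_vocabulary_request("legal-terms", ["subpoena", " estoppel ", "subpoena", ""])
print(request["Phrases"])
```

A transcription job can then reference the vocabulary by name through the `Settings` parameter of `start_transcription_job`, for example `Settings={"VocabularyName": "legal-terms"}`.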
Amazon Transcribe As A Part Of Armedia Legal Module
Over the years, you have probably come across the need to increase the accessibility and searchability of your multimedia content. Whether it is a simple customer call, a recording of a meeting, a judicial hearing, etc., there is a need for large organizations to derive value from stored multimedia content. And, as the amount of data stored in audio and video formats increases every day, so does the need for a platform like Amazon Transcribe.
Our team has recognized the need to convert multimedia content into text in legal organizations, and because of that, we’ve decided to integrate this platform as a part of our Legal Module.
Thanks to the integration with AWS Transcribe, Armedia Legal Module can now help organizations derive value from multimedia content and as such enable them to:
- Make multimedia content much more accessible and discoverable,
- Improve search capabilities of multimedia content,
- Analyze customer multimedia data,
- Automate subtitle creation…
With these capabilities, every legal organization can now organize (and find) multimedia content quickly and securely.
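Subtitle creation is a nice example of how little work is left once every word has a timestamp. The sketch below groups timed words into SubRip (.srt) cues. It is my own illustration of the idea, not how the Armedia Legal Module does it, and the sample words and the seven-words-per-cue default are arbitrary choices.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(words, max_words=7):
    """Turn (text, start_seconds, end_seconds) tuples into SRT subtitle text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]  # first word's start, last word's end
        text = " ".join(word[0] for word in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues


# Invented sample words with their start and end times in seconds.
timed_words = [("The", 0.0, 0.21), ("contract", 0.21, 0.8),
               ("was", 0.8, 0.95), ("signed", 0.95, 1.4)]
print(to_srt(timed_words, max_words=2))
```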
Final Thoughts
We made Amazon Transcribe part of our Legal Module so that many organizations can benefit from this AI-powered multimedia transcription service.
This integration enabled our team to add a reliable speech-to-text capability to our Armedia Legal Module. Now, legal agencies have full search functionality for multimedia files.
So, if you are in the Legal industry and have many multimedia formats that need to be securely transcribed, feel free to contact us.
And, don’t forget to share this post on social media. Maybe some of your colleagues need a reliable, secure multimedia transcription service too.