In this second installment in his series of articles on Adobe's latest post-production powerhouse, CS4 Production Premium, veteran editor David Roth Weiss takes a closer look at Adobe Speech Search, a new speech recognition and transcription feature in both Premiere Pro CS4 and Soundbooth CS4. Speech Search is one of the most exciting and eagerly anticipated new features to hit the editing world in a long time. So, if you've been wondering about it, read on, as David explains what Speech Search is all about, how it works, and whether it truly makes the grade.
LET ME INTRODUCE YOU
Unless you've been in total isolation for the last few months, you probably know that Creative Suite 4 (CS4) is Adobe's latest release. CS4 includes impressive updates to just about every application across Adobe's entire product line, which encompasses many of the most popular and widely used applications for Web, print, and video creation. My focus in this series is Adobe's Production Premium, one of the most comprehensive suites of video applications on the market, if not the most comprehensive, with solutions for video and audio editing, still and motion graphics, visual effects, interactive media, and DVD, Blu-ray Disc, and mobile authoring.
At the core of Production Premium are the latest versions of Premiere Pro CS4, After Effects CS4, Encore CS4, Photoshop CS4, Soundbooth CS4, and Flash CS4, each of which features a new, modernized, and streamlined interface, along with scores of useful new features.
However, to many video professionals, nothing in CS4 is quite as exciting as Adobe's new Speech Search capabilities. Accessible from within either Premiere Pro or Soundbooth, Speech Search is an awesome new feature with incredible potential, and there's really nothing else like it. It has drawn a lot of attention, and it has a lot of people talking.
THE METADATA THING
When you log video in your NLE, you're entering metadata into the system. Logging adds useful information, in the form of searchable descriptive text and index points (in and out points), to a database in your NLE. That information can then be organized and searched in many ways, making it easier to locate the bits and pieces of picture, sound, and dialogue you've identified as the most useful among all your video assets. Adobe has raised the bar by figuring out how to turn the information in recorded dialogue tracks into useful metadata.
CUT TO THE CHASE
Adobe Speech Search has been referred to as "transcription in a box," but that doesn't really do it, or Adobe, the justice they deserve. As you'll see, Speech Search is much more than that. With Speech Search, Adobe has figured out exciting new ways to make use of the information contained within the dialogue tracks of recorded audio and video files.
Rather than relying solely on descriptive notes that must be laboriously typed into the logs of non-linear editing applications after the fact, Speech Search uses voice recognition software to analyze the voice tracks of audio or video files, generates a speech-to-text transcription, and then embeds that transcription directly into the original media file, where it's stored as searchable metadata precisely indexed to the video.
And unlike descriptive logging information, which loses all its value outside the non-linear editing application in which it was entered, the embedded metadata from Speech Search isn't stored in the relational database of any particular application. Instead, the Speech Search transcription is a permanent record, retained within the media file itself and accessible to any XMP-enabled application. Plus, because it's very simple data, the embedded transcription adds only a few kilobytes to a media file.
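Conceptually, a transcript that is "precisely indexed to the video" pairs each recognized word with its position in the media, which is what makes it searchable. The sketch below illustrates that idea under a simple assumption (each word carries a start and end time in seconds). It is not Adobe's actual storage format, which lives in the file's XMP metadata and isn't covered here; `TimedWord` and `search_transcript` are hypothetical names, and the sample words and times are made up.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One recognized word and where it falls in the clip (hypothetical layout)."""
    text: str     # the word as the recognizer heard it
    start: float  # seconds from the head of the clip
    end: float    # seconds from the head of the clip


def search_transcript(words, query):
    """Return the start time of every word matching query, ignoring case."""
    target = query.lower()
    return [w.start for w in words if w.text.lower() == target]


# A tiny stand-in for a transcript; the words and times are invented.
transcript = [
    TimedWord("we", 12.0, 12.2),
    TimedWord("shot", 12.2, 12.6),
    TimedWord("the", 12.6, 12.7),
    TimedWord("film", 12.7, 13.1),
    TimedWord("in", 13.1, 13.2),
    TimedWord("Chicago", 13.2, 13.8),
    TimedWord("Film", 40.5, 40.9),
]

print(search_transcript(transcript, "film"))  # [12.7, 40.5]
```

Because every hit comes back as a time rather than just a line of text, an editor (or any application that can read the data) can jump straight to that moment in the clip.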
HOW DOES IT WORK?
With just the click of a single button, from within either Premiere Pro or Soundbooth, the magic begins. That's all it takes to start the automatic transcription of dialogue from a video or audio file into indexed, searchable text. The process typically runs at very close to real time, so how long you wait depends on the length of your file; at the end, you'll have a transcript like the one below.
In this particular case, I had Premiere Pro transcribe a clear, well-recorded one-on-one interview lasting just under 44 minutes. Right on target, the transcription took just about 44 minutes to process. Even better, the entire process runs in the background, so you can keep right on working in Premiere Pro while Speech Search works its magic.
Once the transcription is complete, just click a word in the transcript window (the transcript is now part of the clip's metadata), and Premiere Pro instantly cues to the corresponding point in the video. Mark that as your in-point. Then click a word farther into the transcribed text that represents a good place to end the clip, and mark that as your out-point. Now perform an insert or overwrite edit, and the corresponding video clip is cut into the sequence.
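The click-to-mark workflow boils down to a lookup: the first clicked word supplies the in-point and the second supplies the out-point. The sketch below shows that idea, assuming each word is stored as a hypothetical `(text, start, end)` tuple in seconds; `mark_clip` and `to_timecode` are illustrative names, not Premiere Pro functions. The timecode helper assumes a whole-number frame rate and ignores drop-frame timecode, so it would not be exact for 29.97 fps NTSC material.

```python
def mark_clip(words, in_index, out_index):
    """Turn two clicked words into in and out points, in seconds.

    words is a list of (text, start, end) tuples, one per recognized word.
    The in-point is the start of the first word; the out-point is the end
    of the last word, so the final word isn't clipped.
    """
    if out_index < in_index:
        raise ValueError("out-point word comes before in-point word")
    return words[in_index][1], words[out_index][2]


def to_timecode(seconds, fps=30):
    """Format seconds as HH:MM:SS:FF non-drop-frame timecode at a whole-number fps."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    return (f"{total_seconds // 3600:02d}:{total_seconds // 60 % 60:02d}:"
            f"{total_seconds % 60:02d}:{frames:02d}")


# Invented sample words from somewhere in an interview.
words = [
    ("so", 100.0, 100.3),
    ("the", 100.3, 100.4),
    ("idea", 100.4, 100.9),
    ("was", 100.9, 101.2),
    ("simple", 101.2, 101.8),
]

in_point, out_point = mark_clip(words, 0, 4)
print(to_timecode(in_point), to_timecode(out_point))  # 00:01:40:00 00:01:41:24
```

Using the end of the out-point word, rather than its start, is a design choice that keeps the last word of the sound bite intact.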
Speech Search is especially valuable for documentary filmmakers, who are constantly extracting sound bites from lengthy on-camera interviews. There's no need to scrub through an interview at slow speed while monitoring the audio, as has traditionally been done, so it's much faster and easier to locate, select, and edit sound bites to the timeline. The Speech Search advantage is twofold: you edit faster, and you wind up with searchable, indexed video too. That's all good. However, as you'll see, it's not all a bed of roses.
THE GOOD & THE NOT SO GOOD
Creating a 44-minute transcript automatically, at no extra cost and without tying up your computer, is quite a trick, especially compared with a professional transcript, which could easily run over $100. And, as I've pointed out, the transcription itself is only a small part of the wonder. However, Speech Search is not perfect at this point, and that's important to know.
Unlike a costly, professionally typed transcript, which typically has a very high level of accuracy, often approaching 99%, Speech Search is only about 75% accurate on average at this time, as Adobe freely admits. I'm not sure that 75% figure is really on the money: accuracy is lower in some cases and higher in others. But I can tell you that with a less-than-ideal voice recording, that number can and will drop substantially. Still, don't be too quick to let the statistics sway you; Speech Search is very useful in spite of its imperfections.
If you follow along with the test below, which I created for this article from an excerpt of a very well-recorded one-on-one interview, you'll see that the transcript contains significant errors, some of which are just completely wacky. However, this is only the first release of the software, and the transcripts created in Speech Search aren't meant to be legal records; they're tools to speed up the editing process. Product managers at Adobe say they are working closely with the vendor that supplies the voice recognition software to increase the accuracy of the transcripts, and they have suggested that accuracy will improve as the product matures.
Also, for the record, I selected a transcript option that's designed to identify and number the different people speaking in the recording. It's a great idea, but unfortunately it's another aspect of Speech Search that still needs work. You'll notice that it identifies thirty-nine different speakers in this short excerpt, when in fact there were only two. Clearly this feature isn't working as it should, but like most of the issues in this first release of Speech Search, it's not a deal breaker, at least not in my opinion.
Adobe Speech Search is just one of many fresh ideas to come from the creative minds behind the ongoing evolution of Adobe CS4 Production Premium. While it clearly needs further development, Speech Search is a novel use of computer technology that will continue to improve, and ultimately it will save enormous amounts of time, energy, and money in post-production in the years to come.
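Adobe doesn't say exactly how its 75% figure is measured. One common way speech-recognition output is scored is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the machine transcript into a correct one, divided by the number of words in the correct one, with accuracy often quoted as 1 - WER. The sketch below computes WER with a word-level edit distance; it assumes, without confirmation, that Adobe's figure is roughly comparable, and `word_error_rate` is a hypothetical helper, not part of any Adobe product. The sample sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # d[i][j] = fewest edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)


reference = "we shot the whole film in twelve days"
hypothesis = "we shot a whole film in twelve daze"
print(1 - word_error_rate(reference, hypothesis))  # 0.75
```

In this made-up example, two wrong words out of eight already yield 75% accuracy, which gives a feel for how many errors a transcript at that level contains.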
I give Adobe Speech Search 4 Cows, and I'm looking forward to a very bright future for this technology.