
Embedding a Speech Transcript in a Flash Video with Adobe CS5



www.hurwicz.com
Eastsound Washington USA
CreativeCOW.net. All rights reserved.


In this tutorial, Creative COW Leader Michael Hurwicz shows you how to create a scrolling transcript synced with a Flash video, using Adobe Creative Suite 5 (Story, OnLocation, Premiere Pro, Soundbooth and Flash Professional).





Embedding Speech Transcription into Flash Video with Adobe Creative Suite 5

by Michael Hurwicz

This article accompanies a Creative COW video tutorial showing how to embed a speech transcript in a Flash video project using Adobe Creative Suite 5. You start by writing a "reference script" in Adobe Story, scriptwriting software that is free (for now, anyway) and is provided as an online service but can also be downloaded to run offline under Adobe AIR. Then you embed that script into the video using Adobe OnLocation. The speech analysis is performed in Premiere Pro. Finally, you use Adobe Soundbooth to ferry the speech transcription metadata into your Flash project.

This workflow evolved from an earlier workflow developed for CS4, which required fewer applications, but provided a less accurate speech transcription. (See below for details of this evolution.)

[Image: Tim Siglin]

The workflow described in the video tutorial looks like this:

  • Based on the existing video, manually create a script using Adobe Story.
  • Import the script into OnLocation, and embed the script in the video.
  • In Premiere Pro, perform speech analysis on the video with the embedded script. This results in the "perfect" speech analysis metadata – embedded in the video file in the XMP format, however. (This step does involve the Media Encoder, by the way.)
  • Bring the video file into Soundbooth. If you're in "Edit Audio to Video" workspace mode, you should see the speech analysis on the left. Then 1) export the embedded XMP as XML (File > Export > Speech Analysis), and 2) import markers using the file you just exported (File > Import > Markers).
  • Save an FLV out of Soundbooth (File > Save As). It may look like you'll be saving an F4V file, but in the "Export Settings" dialogue, you can change to FLV.

You can use the FLV file for your Flash Professional project.

It's true, this really shouldn't be called automatic transcription at all. More like automatic script-to-video synchronization or alignment. Whatever you call it, it's still a time-saving feature, because it allows you to get an excellent embedded transcript without having to use the horrible speech analysis editing interface in Premiere Pro or Soundbooth. Instead, you get to use the really quite nice editing interface in Adobe Story, which has all sorts of helpful features to make your job easier.

See the next section for details of creating the Flash Professional project. ActionScript code follows. Also consult the working example, cs5_speech_transcript.fla.

=========================================================================

Creating the Flash Professional CS5 "Scrolling Transcript" Application

The basis of the Flash application is the FLVPlayback component, a video player that comes with the Flash authoring environment. The other main visual element is a TextArea component to hold the scrolling transcript.

There are four basic phases to creating the Flash project:

  • Create a new ActionScript 3.0 project.
  • Populate the Library with the necessary components.
  • Copy and paste the ActionScript code.
  • Publish.

1.  Create a new project.
  • Select File > New and select the top type (ActionScript 3.0).
  • Select File > Save As, enter a file name, and click Save.
  • Go to Modify > Document and set the dimensions a bit larger than your video. (I chose 800x600 for a 656x480 video.)

2.  Populate the Library.
  • If the Library is not already visible, select Window > Library to bring up the Library window. This step is not really necessary, but it allows you to confirm that the Library is currently empty.
  • If the Components window is not already visible, select Window > Components to bring it up.
  • From the Components window, drag an FLVPlayback component (from the Video section) and a TextArea component (from the User Interface section) anywhere onto the Stage. Press Ctrl-Enter to test the movie. This will create the SkinUnderPlaySeekMute.swf skin file on your disk.

    Then delete both components from the Stage. You will see that the Library is now populated.


3.  Copy and paste the ActionScript code.
  • If the Actions panel is not already visible, select Window > Actions to bring it up.
  • Copy the ActionScript code and paste it into the Actions panel. (The ActionScript is in the next section below, or you can get it from the sample project.)
  • Change the name and location of the FLV file in the code. (The line you’re looking for starts with flvPlayer.source. Look for three asterisks in the line above it.)

4.  Publish
  • Go to File > Publish Settings. On the Formats tab, make sure that both Flash and HTML are selected. On the Flash tab, select Flash Player 9 or 10 and ActionScript 3.0. (Flash Player versions prior to 9 do not support ActionScript 3.0, so they won't work with the code provided in this tutorial.)
  • Click Publish.

That’s it! Deploy your SWF (including SkinUnderPlaySeekMute.swf), HTML, and FLV files, and you should have a working application.
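
If the player comes up empty after deployment, the usual suspect is the path in the flvPlayer.source line. As a rough debugging aid, here is a minimal sketch that reports a failed load in the transcript area. It assumes the flvPlayer and ta variables from the code in the next section. The function name stateListener is hypothetical and is not part of the sample project.

```actionscript
import fl.video.VideoEvent;
import fl.video.VideoState;

// Hypothetical debugging aid, not part of cs5_speech_transcript.fla.
// Writes a message into the TextArea if the FLV cannot be loaded.
function stateListener(event:VideoEvent):void
{
    if (event.state == VideoState.CONNECTION_ERROR)
    {
        ta.text = "Could not load " + flvPlayer.source
            + ". Check the file name and location.";
    }
}

flvPlayer.addEventListener(VideoEvent.STATE_CHANGE, stateListener);
```

The message goes into the TextArea rather than trace() because trace output is only visible inside the authoring environment, not in a deployed SWF.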

(Note: This application was tested successfully with Windows 7 64-bit, on an HP Z600 Workstation with 12GB of RAM.)

=========================================================================

// ActionScript code for Creative COW article:
// Embedding Speech Transcription into Flash Video with Adobe Creative Suite 5
// by Michael Hurwicz
import fl.video.*;
import fl.controls.*;

// keep the TextArea scrolled to the bottom as text appears
function autoscroll():void
{
    setTimeout(function():void{ta.verticalScrollPosition = ta.maxVerticalScrollPosition;},100);
}

// add new text to the bottom of the TextArea and scroll
function cuePointListener(event1:MetadataEvent):void
{
    trace("!!!");
    //trace("Elapsed time in seconds: " + flvPlayer.playheadTime);
    // with CS4 the text was in event1.info.name
    // things are more complex with CS5

    // first level: every property of the cue point object
    for (var key:String in event1.info)
    {
        trace(key + ": " + event1.info[key]);
        // second level: the transcript text is stored under "data"
        for (var key2:String in event1.info[key])
        {
            trace(key2 + ": " + event1.info[key][key2]);
            if (key2 == "data")
            {
                trace("data found");
                ta.text+=(" "+event1.info[key][key2]);
                autoscroll();
            }
        }
    }
}

// show that metadata is available (for debugging)
function receivedListener(event1:MetadataEvent):void
{
trace("### metadata available ###");
}

// create the video player
var flvPlayer:FLVPlayback = new FLVPlayback();
flvPlayer.width = 656;
flvPlayer.height = 480;
flvPlayer.bufferTime = 60;
// flvPlayer.fullScreenTakeOver = false;
//The next line assumes you have copied the skin file to the local directory
flvPlayer.skin = "SkinUnderPlaySeekMute.swf";
// *** the following line is the one you want to change to your own FLV file *** //
flvPlayer.source = "tim_siglin_red_carpet.flv";
addChild(flvPlayer);

// listen for cue points and metadata availability
flvPlayer.addEventListener(MetadataEvent.CUE_POINT, cuePointListener);
flvPlayer.addEventListener(MetadataEvent.METADATA_RECEIVED, receivedListener);

// create the TextArea and format the text;
var textAreaFormat:TextFormat = new TextFormat();
textAreaFormat.size = 18;
textAreaFormat.italic = false;
var ta:TextArea = new TextArea();
ta.x = 650;
ta.y = 10;
ta.width = 150;
ta.height = 300;
ta.setStyle("textFormat", textAreaFormat);
ta.text = "";
addChild(ta);

=========================================================================

Background/History

Horrible, Powerful Speech Transcription

Both Adobe Premiere Pro and Adobe Soundbooth can do what Adobe now terms "speech analysis," automatically analyzing the spoken words in a video and creating a text transcript. (See my April, 2009, article, "Creating Automatic Transcripts in Flash Video Using Adobe CS4.") There are two main weak points in that process: 1) the accuracy of the speech recognition and 2) the ease of editing the transcript.

For normal speech – even speech that sounds quite clear to you and me – the transcription (if you're not using the new workflow which I'm about to describe) is often laughably, divertingly bad. Less than 50% accuracy is quite typical. (The software tries its best, however, which is where the entertainment value comes in. For instance, in the video used for this article, Premiere Pro CS5 – with no reference script to guide it – put the following words in Tim Siglin's mouth: "Get sexy and in Ensenada with a friend that he cannot said terrible acts so there it seems he knows are thinking that yesterday I had to visit this tag and man the woman ..." Which may be true, but it's not what Tim said.)

As for the editing function (in Premiere Pro or Soundbooth), you edit one word at a time, and you have to double-click a word before you can edit it. (If you need to edit several consecutive words, you can tab from one to the next.) And to shift the timing of a word, you have to right-click it first and then select "Merge With Next Word" or "Merge With Previous Word" from a context menu. The same is true for deleting or copying a word. "Cumbersome" is too kind a word for it. It's horrible.

However, what you end up with is potentially powerful: an XML file that gives you text and timing for every spoken word in the video. This could be used, for instance, to create subtitles for a Flash video project (which is basically what I did with CS4 last year) or to create a searchable index that actually takes you to the point in the video where the search item occurs (which Adobe has done via Encore CS5, DVD / Blu-ray authoring software).

In the Flash authoring environment (Flash Professional), your imagination and your ActionScripting skills are the only limits to what you can do with the embedded transcript. You could ring a bell every time the "magic word" for the day comes up, display text ads based on the topics being discussed in the video, display links to related information, or enforce parental controls based on the language used.
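
To make the "magic word" idea concrete, here is a minimal sketch. The word list and the function names (normalizeWord, isMagicWord, checkWord) are hypothetical and are not part of the sample project. The sketch assumes you call checkWord with each transcript word as it arrives from a cue point, and that a flvPlayer variable like the one in the code above exists.

```actionscript
// Hypothetical word list; substitute your own.
var magicWords:Array = ["streaming", "camera"];

// Lower-case a transcript word and strip anything that is not an
// unaccented letter, a digit or an apostrophe, so that "Camera,"
// matches "camera".
function normalizeWord(word:String):String
{
    return word.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// True if the normalized word is on the list.
function isMagicWord(word:String):Boolean
{
    return magicWords.indexOf(normalizeWord(word)) != -1;
}

// Call this with each word as it arrives from a cue point.
function checkWord(word:String):void
{
    if (isMagicWord(word))
    {
        // Replace the trace with a sound, an ad, a link, and so on.
        trace("Magic word at " + flvPlayer.playheadTime + " seconds: " + word);
    }
}
```

Note that the normalization strips accented characters too, so a transcript in a language other than English would need a different pattern.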

A Cool, Boring, Typical Reference Script Scenario

It's this flexibility that fueled my interest in developing a methodology for getting the speech transcript into a form that could be accessed in the Flash authoring environment via ActionScript. Specifically, that means that each word in the speech transcript has to be associated with a "cue point," a marker in the Flash video (FLV) file which can be used for navigation or to trigger any event that can be programmed via ActionScript.
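
As an illustration of using cue points for navigation, here is a minimal sketch of a word index that can jump the player back to the first place a word was spoken. All of the names (wordIndex, indexKey, indexWord, seekToWord) are hypothetical and are not part of the sample project. It assumes an FLVPlayback instance named flvPlayer, and that you call indexWord from your cue point listener with the word and the cue point time (event1.info.time).

```actionscript
// Maps a prefixed, lower-case word to an Array of times in seconds.
// The "w:" prefix keeps words such as "constructor" from colliding
// with properties that every Object already has.
var wordIndex:Object = {};

function indexKey(word:String):String
{
    return "w:" + word.toLowerCase();
}

// Record that a word was spoken at the given time.
function indexWord(word:String, time:Number):void
{
    var key:String = indexKey(word);
    if (wordIndex[key] == undefined)
    {
        wordIndex[key] = [];
    }
    wordIndex[key].push(time);
}

// Jump to the first recorded occurrence of a word.
// Returns false if the word has not been indexed.
function seekToWord(word:String):Boolean
{
    var key:String = indexKey(word);
    if (wordIndex[key] == undefined)
    {
        return false;
    }
    flvPlayer.seek(wordIndex[key][0]);
    return true;
}
```

Because the index is filled as cue points fire, it only knows about words that have already played. Searching ahead would require reading the cue points up front, which this sketch does not attempt.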

So, when I started investigating Adobe Creative Suite 5, though I was suitably impressed by the "big" new features for video, like increased performance, tapeless workflows, and sharing with FCP and Avid users, I was equally curious whether they had done anything to improve speech transcription accuracy and editing, or integration of transcription data into Flash. In addition, since the speech transcript is metadata embedded in the video file, I was interested to see how speech transcription might fit in with Adobe's new metadata-driven "script-to-screen" workflow.

It turns out that CS5 offers one significant new feature relating to automatic transcription: the "reference script," a script that Premiere Pro (or Soundbooth) can be guided by while doing the transcription. It is very much a "script-to-screen" feature.

One typical scenario for using the reference script is as follows:

  • Write a script using Adobe Story
  • Import the script into OnLocation (direct-to-disk recording, logging, and monitoring software) as metadata to help you organize and manage your shoot.
  • Enrich the metadata in OnLocation (for instance, by marking good shots and in/out points).
  • Import the enriched metadata from OnLocation into Premiere Pro to guide you during the video editing process.
  • Perform automatic speech transcription in Premiere Pro using the Story script as a reference.
  • Use Encore to embed the speech transcription text in a "Web DVD" (a video in the Flash SWF format).

When you play the Web DVD in the Flash DVD Player, the embedded text is searchable. So, if you're watching a cooking video, you type "salad" in the Flash DVD Player search box, and it takes you to the points in the video where that word is spoken. It's really pretty cool. But it's also boring, because it's just too easy.

Adobe has a video on this, if you're interested.

Getting the Same Old Result with More Work (Whoopee)

My interest was in a different workflow, starting with an existing video and ending in an ActionScript-based Flash Professional project. Initially, I thought that because I was starting with an existing video clip, I would have no need for OnLocation. I was very, very wrong. I also thought I would have no need for Encore, since my goal was to create a project in Flash Professional. That turned out to be correct.

As I mentioned above, for a Flash Professional project, you want to create a Flash video file (FLV) with embedded cue points that you can access via ActionScript. In the past, I have used the Adobe Media Encoder to embed cue points in FLVs, and then used Flash Professional to create an FLV player. With that approach, I can use ActionScript to determine what happens when a cue point is detected during playback.

An obstacle to combining my approach with the typical "script-to-screen" workflow is that Flash Professional and the Adobe Media Encoder both want to work with cue points, while the "script-to-screen" workflow in other Adobe applications (Story, OnLocation, Premiere Pro, Encore and the Flash DVD Player) is based on data stored in the XMP (Extensible Metadata Platform) format. However, this is not a show-stopper: As I showed in the article mentioned above, Adobe Soundbooth can import embedded XMP speech transcription metadata and export an XML cue point file that the Adobe Media Encoder can use to embed cue points in an FLV. Those cue points can then be referenced via ActionScript in a player created with Flash Professional.

One thing that threw me for a minor loop was that the format of the XML cue point file created by Soundbooth has changed from CS4 to CS5: it's gotten more complex. Reflecting that, the cue points themselves are more complex, and the ActionScript that looks at them has to be a bit more sophisticated. In my ActionScript code, the cue point metadata object is called event1.info, and the code for accessing the transcription text in CS4 was simply event1.info.name. Couldn't be much simpler. With CS5, the text is buried a couple of levels down in the event1.info object, and the ActionScript code has to unwrap those layers to get at the text. (See the cuePointListener function in the ActionScript accompanying this article to see how that's done.)
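
As a rough illustration of that unwrapping, here is a minimal sketch of a helper that digs the text out of a cue point however many levels down it sits. The names collectCueText and cuePointTextListener are hypothetical and are not in the sample project. The only thing the sketch assumes about the CS5 structure is what the listener in this article relies on: that the text is stored under a key named "data". It also assumes the ta and autoscroll definitions from the article's code.

```actionscript
import fl.video.MetadataEvent;

// Hypothetical helper, not part of cs5_speech_transcript.fla.
// Walks a cue point info object and gathers every value stored
// under a key named "data", at any depth.
function collectCueText(node:Object, found:Array = null):Array
{
    if (found == null)
    {
        found = [];
    }
    for (var key:String in node)
    {
        var value:* = node[key];
        if (key == "data")
        {
            found.push(String(value));
        }
        else if (value != null)
        {
            // A String or Number has nothing to enumerate, so the
            // recursion simply stops there.
            collectCueText(value, found);
        }
    }
    return found;
}

// Alternative listener built on the helper.
function cuePointTextListener(event1:MetadataEvent):void
{
    var words:Array = collectCueText(event1.info);
    if (words.length > 0)
    {
        ta.text += " " + words.join(" ");
        autoscroll();
    }
}
```

Unlike the two-level loop in cuePointListener, this version would also pick up a "data" key at the top level of event1.info or deeper than two levels. Register one listener or the other for MetadataEvent.CUE_POINT, not both, or each word will appear twice.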

By overcoming this complication, I was able to duplicate last year's results. (Whoopee.)

Bring in the Reference Script, Please!

The workflow I developed for CS4 does still work with CS5 (with a minor ActionScript code change described in the previous section), and it could still be a useful tool, I guess, even though I only hit 41% accuracy using last year's approach with the video I was testing for this article (the first minute or so of Tim Siglin's "Red Carpet" interview from Streaming Media East 2010).

However, by bringing in a reference script, things started to get a little more interesting. I had been told by a generally reliable informant (Karl Soulé, Adobe Technical Evangelist), that even a plain text file with some key phrases from the video would be used by the speech analysis software to decode those phrases. That sounded pretty cool: No need to make a complete transcript; any text file that happens to have a bunch of the same phrases used in the video could be helpful. Of course, how helpful it is depends on the particular video and the particular text file.

I created a 54-word text file which I thought addressed the areas that the speech analysis software had the most trouble with. For my efforts, I got a mere four-point improvement, from 41% accuracy to 45%. This was disappointing: the whole video has only about 220 words in it, and the number of words correctly identified increased by a mere five, from 87 to 92. Not great for a 54-word reference "script".

So, I tried creating a complete transcription of the video in Adobe Story. That yielded 82% accuracy. I was somewhat pleased and somewhat puzzled. I had doubled my accuracy. But why didn't I get at least 95% accuracy from a 95% accurate script? (The 5% inaccuracy, by the way, is because I didn't transcribe "ums" or dysfluencies like repetitions of words. This made for a more readable transcription but was slightly less faithful to the actual spoken word.)

There were various reasons, I think, for this imperfect performance. In a couple of places, both people were talking at the same time. While the human brain handles this easily enough, speech transcription software doesn't. In other places, the software tried to make sense of a non-meaningful sound, like an "um". Then there were the mysterious failures, where, even with the precise words available to it, and no obvious complicating factors, the software failed to recognize certain phrases. For instance, near the end, Tim says "the Z18 or the ZX3". Even after being told, via the Story script, to expect this, the software translated it as "to see you one day for this the X ray". There were several other surprising deficits in the speech analysis, including loss of most of the punctuation. See the second page of this article for details.

Anyway, the 82%-accurate workflow in CS5 looked like this:

  • Based on the existing video, manually create a script. (I used Adobe Story, but unless you're embedding with OnLocation, as described in the next section, I don't think it matters what editor you use.)
  • Use the script as a reference for automatic speech transcription in Premiere Pro. The speech analysis metadata is automatically embedded as XMP metadata in the video file. (If you note the timestamp on the file before and after, you'll see it changes.)
  • Bring that video file into Soundbooth, and then 1) export the embedded XMP as XML (File > Export > Speech Analysis), 2) import markers into Soundbooth using the file you just exported (File > Import > Markers) and 3) re-export markers (File > Export > Markers). I told you the workflow was a little arcane. But the result is an XML file (named "Flash Cue Data.xml" by default) that you can use in the Adobe Media Encoder. This is a good thing, as demonstrated in the next step.
  • Embed cue points in an FLV using the Adobe Media Encoder and the aforementioned XML cue point file.
  • Create a project in Flash Professional that uses ActionScript to work with the embedded cue points in the FLV.

(There's a link at the end of this article to step-by-step instructions for creating the Flash application.) Again, this is basically the same workflow I described last year for CS4, with the addition of a reference script.

In a way, of course, this approach fails to deliver on the promise of automatic speech transcription: You have to do the initial transcribing manually. But, let's face it, the unaided software really isn't very good at recognizing words anyway. On the other hand, it is excellent at determining the timing of words once they've been recognized (a function sometimes referred to as "script alignment" as opposed to speech recognition). This approach uses the software for what it's good at, and the human brain for what it's good at. This also could be the approach of choice if you already have good transcripts of your videos (they don't have to be Adobe Story scripts) and you just want to embed them in the videos. The accuracy won't be perfect. You will lose a lot, in my experience. But it's not much effort, either, if you already have the transcripts.

(That being said, there are hybrid approaches in which you let the software take a first stab at decoding the words, either with no reference script or with a quick-and-dirty reference script, and then use the results of that decoding to create a new improved reference script, after which you follow the same workflow shown above. Given that the software can often hit 30% - 50% accuracy with no help at all, this approach could save you a lot of typing when creating an initial transcript. The minimal reference script, in which you give the software just a few difficult key phrases, is an intriguing approach to me. Though, as I have said, it didn't really help me much for the particular clip I was working with.)

When I first saw a "perfect" transcription (as perfect as my reference script, anyway) appear as embedded metadata in Premiere Pro, I was thrilled. I assumed that this perfection would transfer, via Soundbooth and the Adobe Media Encoder, into the cue points of my FLV.

Not true. When I tried to use the XML file from Soundbooth in the Media Encoder, it worked very imperfectly, if at all. I pored over the XML, but couldn't see anything wrong with it (or anything significantly different from earlier, pre-OnLocation XML files, which had worked). I tried several other things, which I will spare you, some of which involved contact between my head and the wall.

So, still not satisfied with 82% accuracy from a 95% accurate script, I re-examined my assumptions, scanned some forums, Googled "what am I doing wrong?" and even (gulp) read some help files. What I discovered (thank you Curt Wrigley, Community Professional on Adobe Forums) was that by bringing OnLocation into the act, I was able to make the embedded transcription exactly match the Adobe Story script.

In addition, I discovered that I got consistently good results using Soundbooth, rather than the Media Encoder, for the final process of creating an FLV with embedded cue points. Why this works better, I don't know. But I am gladdened by the fact that it does.

 

Testing environment:

HP Z600 Workstation, 12GB RAM, Dual Processor 2.93GHz
NVIDIA Quadro FX 3800 Graphics Adapter
Windows 7 64-bit

 

 

 


Comments

Re: Embedding a Speech Transcript in a Flash Video with Adobe CS5
by Michael Hurwicz
There is a sample astx at this URL you could download:

http://greendept.com/astx/

I believe this is the astx file I used in this tutorial.
Re: Embedding a Speech Transcript in a Flash Video with Adobe CS5
by Michael Hurwicz
@Ace
.astx files are standard XML formatted documents ( http://forums.adobe.com/thread/707301 ) so theoretically you could manually construct an .astx file and continue from there. Back in Sept. 2010, someone on the Adobe Story team mentioned making a spec public (see forum posts at above URL) but I'm not aware that this has happened yet.
Re: Embedding a Speech Transcript in a Flash Video with Adobe CS5
by Ace Roj
Hi Michael,

I was wondering if there's another way to get the script done without using Adobe Story. I won't have internet access unfortunately to create the script file but I have access to the whole CS5 Suite.

This tutorial was very helpful otherwise!

Thanks,

Ace
Re: Embedding a Speech Transcript in a Flash Video with Adobe CS5
by Michael Hurwicz
Hi Amit -

I copy-and-pasted the code in from a text file. Sorry if that was confusing. The code is in the FLA that you can download with the project files (cs5_speech_transcript.fla). I also included it in the article accompanying the tutorial.

Hope this helps!

Mike
Re: Embedding a Speech Transcript in a Flash Video with Adobe CS5
by amit levi
Hi Michael
Seeing your tutorial got me a little confused due to the
AS file you brought there (at 12:30 tc).
Where did it come from, and how?

Thanks
Amit Levi
Re: Embedding a Speech Transcript in a Flash Video with Adobe CS5
by Abraham Chaffin
Great work Michael! Thank you for such a thorough look into how to perform this process.

Abraham


© 2016 CreativeCOW.net All Rights Reserved