Metadata & The Future of Filmmaking
Editor's note: Although this interview was first published in 2008, we've found that the ideas explored here are becoming more relevant by the day, as large parts of the future that David Stump ASC imagined at the time are coming to pass — while other parts of it seem more urgent than ever. This conversation with Creative COW Editor-in-Chief Tim Wilson and longtime Creative COW host and camera expert Gary Adcock is also one of the most engaging looks you'll find at the technical side of filmmaking, seen through the eyes of one of the industry's most respected VFX cinematographers and industry thought leaders.
Pictures and sound are data. Information about them is metadata — the data about the data.
Metadata can begin with information as simple as reel name, clip name, date, duration. However, with new cameras skipping video and film as we’ve known them and recording straight to digital files, the potential complexity of the metadata skyrockets.
This is why metadata collection is moving closer and closer to the beginning of image capture, to lenses, cameras, even cranes.
Dave Stump is the chair of the Camera subcommittee of the American Society of Cinematographers, and co-chair of the Metadata subcommittee. His message to Hollywood is that camera issues and metadata issues must be addressed at the same time.
Creative COW’s Gary Adcock assists Dave on these two committees, and told us about a presentation that he and Dave gave during NAB 2008 to illustrate metadata. Dave held up a photograph, and asked if anyone in the audience could figure out who it was. After some guessing, someone in the audience suggested looking at the back of the photo to see if a name was written there.
Dave said, “Ah, you mean check the metadata.”
(On the back of the photograph is written "Earl Stump, 1918." It's Dave's grandfather.)
As his “day job” Dave has served as the visual effects director of photography and VFX supervisor for dozens of films, as diverse as “X-men” and “X2,” “Batman Forever,” “Stand by Me,” “Free Willy,” and 2008’s James Bond film, “Quantum of Solace.”
Regardless of a film’s scale or genre, Dave’s task is the same: enabling the realistic combination of camera footage with CGI. Until very recently, much of that work was done by hand, guided by informed guesswork, hoping to match camera position, focal length, focus and more — typically all of them in motion at once over the course of a shot.
In 2000, Dave was part of a team that received a Technical Achievement Award from the Academy of Motion Picture Arts and Sciences, for the development of advanced camera data capture systems, which he describes below. We’ve come a long way since then, in no small measure thanks to the concerted efforts of Dave and his colleagues.
As he tells it, his primary goal in that ongoing effort was simply to explain what metadata is, and why it matters.
Dave Stump: Privately, my secondary goal was to shame the proprietary sense of everyone in the manufacturing community who builds our tools. Because everyone who builds a machine, everyone who builds a computer-driven device, everybody who uses metadata, builds their own metadata scheme, and no two of them talk to each other.
You know the saying, “Standards are great. That’s why we have so many of them.” If no two standards can talk to each other, there’s no uniformity to the metadata. It becomes meaningless.
You put the film on a scanner, you save it as data. Some of it you send out to a visual effects house; some of it you run through an Avid or a Final Cut system. Whoever is working on it at one visual effects house puts it in a Shake system, or some of it goes into Maya. Some of it goes into an Autodesk Flame, some of it goes into Inferno, some of it goes into Matador.
All of these systems will bring in a DPX file or Cineon file. Sure, we know what this is, but the data in it that would have told you when and where it was created, who it belongs to, how it was named in accordance with the standards set for that particular movie, what the original colorscape of it was, what the original camera settings were, or how quickly it was panning from left to right in degrees per frame -- all of that information is discarded the moment the file is fed into a new machine. Thrown in the trash.
“What do I need that for?” [someone asks.] “I’m just here to do some compositing.”
So when you get back a file, all that information has been decimated. And there is no reason why we can't all agree on the value of data like that, and agree to do no harm to it.
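The "do no harm" principle could be sketched in a few lines of Python. Everything here is illustrative: the field names and the stand-in pixel operation are assumptions, not the actual DPX header layout or any vendor's pipeline.

```python
# Sketch of "do no harm": a processing step that touches only pixel
# data and passes the metadata header through untouched, instead of
# throwing it in the trash. Field names are invented for illustration.

def composite_step(image: dict) -> dict:
    """Process pixels; return a new image with its metadata preserved."""
    processed_pixels = [p * 0.9 for p in image["pixels"]]  # stand-in operation
    return {"pixels": processed_pixels, "metadata": dict(image["metadata"])}

frame = {
    "pixels": [0.1, 0.5, 1.0],
    "metadata": {"clip": "shbw_0410", "created": "2008-04-12",
                 "pan_rate_deg_per_frame": 0.5},
}

out = composite_step(frame)
print(out["metadata"]["clip"])  # the header survives the machine
```

The point is not the pixel math; it is that every tool in the chain would copy the header forward rather than strip it.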
Naming is vastly more important than we think it is, because that's how you find things. That's the first thing you look for in databases. "Go look for a file called 'The Buddy White Story.'" We started off naming the third and fourth characters of the string 'bw.'
If the last place you sent the file stripped the name and used its own naming convention, some UNIX string of numbers or a random number or a date, then the "bw" is gone. Now you can't find it with the computer!
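The naming-convention search Dave describes could be sketched like this. The convention itself, and the clip names, are assumptions for illustration: characters three and four of the clip name carry a project code, "bw" for "The Buddy White Story."

```python
# Minimal sketch of a convention-based search: find every clip whose
# name carries the project code in positions 3-4. A vendor that strips
# and renames the file makes the clip invisible to this search.

def belongs_to_project(filename: str, code: str = "bw") -> bool:
    """True if characters 3-4 of the clip name match the project code."""
    stem = filename.rsplit(".", 1)[0]
    return len(stem) >= 4 and stem[2:4].lower() == code

clips = [
    "shbw_0410_v2.dpx",  # follows the convention
    "1783492017.dpx",    # renamed by a vendor: the "bw" is gone
]

found = [c for c in clips if belongs_to_project(c)]
print(found)  # ['shbw_0410_v2.dpx']
```

The second clip may be the exact same picture data, but once the name is stripped, the computer can no longer connect it to the show.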
That's 260 fields of metadata that ought to be included in every picture the camera makes. Just for starters. Just for that camera alone. And that doesn't even include the main menu criteria that also ought to be there. There are criteria that ought to be there that Sony hasn't even thought to put in yet.
The problem is that so few of the people who are part of this process have sat down and agreed on how the data ought to come out. Most of them want to build the machines where the data comes out themselves, and fit them into another proprietary box which you have to buy from them. So the monetary interest in being the only solution for metadata prevents the universalization of standards.
And, excuse me, that’s what standards mean! Something that’s open source and universal. When you say “our standard,” it’s no longer a standard.
Yet in the grand scheme of things, that’s a minuscule amount of data to collect while shooting. You only have to remember to ask, “O’Connor, the next time you build a pan head, we want it with a plug for a data recorder.” Or “Panavision, do you have a GPS set that you can build into the base plate?”
GPS apparently takes very little real estate because it’s there in my iPhone sitting on my desk.
Gary: I look at it from the post side. Cooke Optics has this little box, the “/i dataLink.”
It records focus, zoom and all that from the lens, and then everything from the camera too. It records all that to this little SD card.
Now you have the actual data. Instead of having to recreate it, you can do motion matching and everything in VFX long before the footage itself actually gets there. There’s not somebody waiting for the footage, and then starting to do all this work manually for weeks and weeks on end.
Or conversely, you could take VFX information from files that have already been created, and run queries against master shots or something that's already been approved. Everything gets more and more efficient down the line. You can streamline the costs, the redos, and everything else further down the food chain, saving money in the long run.
Instead, for a shot that used to be a Boujou problem, you create a sync frame, like the bloop on the slate. Now comes the rest of the data: here’s the center shutter open pulse, here’s the pan, tilt, focus, zoom, f-stop, dolly, boom — synchronized with every frame of the film that you shot.
The artist who would have spent six weeks tracking this out by hand, and reverse engineering camera position and focal length anecdotally or from someone’s handwritten notes, can now simply take the metadata file, plug it in and start doing the work. The real work.
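The sync-frame idea can be sketched in a few lines. Everything here is hypothetical: the sample numbers, field names, and values are invented to show the mechanism, not any real recorder's format.

```python
# Sketch of sync-frame alignment: a data recorder logs pan/tilt/focus
# samples, one per frame. One sample coincides with the bloop on the
# slate; from there, every picture frame maps to one metadata record,
# so a tracking artist looks the values up instead of guessing them.

metadata_log = [
    {"sample": 100, "pan_deg": 0.0, "tilt_deg": 2.0, "focus_ft": 6.0},
    {"sample": 101, "pan_deg": 0.5, "tilt_deg": 2.1, "focus_ft": 6.2},
    {"sample": 102, "pan_deg": 1.0, "tilt_deg": 2.2, "focus_ft": 6.4},
]

SYNC_SAMPLE = 100  # metadata sample recorded at the bloop
SYNC_FRAME = 1     # picture frame where the bloop appears

def metadata_for_frame(frame: int) -> dict:
    """Return the metadata record synchronized with a picture frame."""
    offset = frame - SYNC_FRAME
    index = (SYNC_SAMPLE + offset) - metadata_log[0]["sample"]
    return metadata_log[index]

print(metadata_for_frame(2)["pan_deg"])  # 0.5
```

One shared sync point replaces six weeks of reverse engineering: every channel the set recorded is addressable by frame number.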
This is the way that I love to frame the discussion, as an invitation to the producers and the studios who want to save money. You know, we can all stand around and haggle over 50 cents an hour for every employee on the staff and you can feel like you’ve saved some money.
Or we can automate those people’s work, get it done in a week’s less time or a month’s less time, and then save some real money.
Everyone asks, well, who’s going to pay for developing all of this new automated metadata collection? I say, we already pay for it anyway. How often do you buy computers and cameras and lenses? We renew and replenish this stuff on a daily basis. At least ask manufacturers for what you want in the updates, rather than just taking what you’re handed.
Tim: I'd like to get into some specifics. I know you started working with Viper in its early days, right?
Dave: Yes. And when it was introduced, there was only one recorder, called the Director's Friend, which sort of disappeared immediately, and then there was nothing to record to.
How many recorders are there now that can do not only 4:4:4 but, for that matter, 16-bit TIFF? Just by virtue of the community having made demands. I won't take the credit for the fact that maybe a dozen machines like this exist, but I will take credit for having spoken into that vacuum very early on.
And I can tell you what the looks are that you get when you do that.
Dave: One of the obstructions to automating the motion picture workplace is that we don’t have a tradition of metadata on set. We have a tradition of what I call “metapaper.”
For example, script supervisors for the most part take a paper copy of the script, and note vast quantities of metadata in real time just by watching the movie being filmed: script changes, which actors are in each shot, and so on. And they notate that using lines and squiggles and arrows all over the typed script, with handwritten notes to elaborate. They accumulate vast quantities of paper that people have to keep in notebooks.
The first assistant and the second assistant, all the cameramen, the loader — these people keep vast amounts of paper notes too. If you want to know what lens they were shooting with, or if you want to know what filters were on the camera or what settings they shot with, you have to dig out that notebook and find the page that you want, and hopefully it’s in the right place.
Now you want to find out the tilt angle for a particular CGI shot, approximately in degrees: the best the visual effects people could do was stand there looking at the raised crane, 20 feet in the air, and try to guess what the tilt angle was for the given shot. Visual effects people have data wranglers to keep vast amounts of their own paper notes.
And you have to find that notebook and dig it out -- sometimes the notebooks for a production aren’t even all in the same place!
All of this metapaper exists separately and independently of the images themselves!
Once you don’t have to have those notebooks stacked in shelves, it becomes a downhill rush to automate all metadata coming to the editors. It’s a small step from there to attach the metadata to the picture files themselves, and to preserve that information as it passes from machine to machine in post.
FROM BATMAN TO ARTICULATE DEMANDS
In the earliest days of live action motion control and data capture, we had a shot that started in a macro closeup, then boomed up to 60 feet in the sky. The question everyone asked was, how in the world are we ever going to focus this thing?
We ended up attaching an encoder to the crane arm, so that for any position of the arm swinging up we had a numerical value for that position. We then wrote a lookup table, or translation table, as an if/then equation: if the arm has boomed up 6 feet, then the focus should be set at 6 feet. If the arm has boomed up 12 feet, then the focus should be set at 12 feet.

I’m oversimplifying, but it’s easy to put a motor on a focuser.
Once you write that lookup table, and you swing the arm, and the arm data drives the focuser, there’s no mistake to be made. You have the numbers. It’s just an equation.
And while there was a motor involved in that task, ultimately we realized that that was all you ever needed, and you could figure out the rest.
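The if/then lookup Dave describes could be sketched as a simple interpolation table. The calibration points here are invented for illustration; a real rig would be calibrated against the actual crane geometry.

```python
# Sketch of the crane-arm lookup table: encoder readings give the arm
# height, and a translation table maps height to focus distance, which
# drives the focus motor. Calibration values are illustrative.

calibration = [  # (arm height in feet, focus distance in feet)
    (0.0, 3.0),
    (6.0, 6.0),
    (12.0, 12.0),
    (60.0, 60.0),
]

def focus_for_height(height: float) -> float:
    """Linearly interpolate focus distance from the arm-height table."""
    for (h0, f0), (h1, f1) in zip(calibration, calibration[1:]):
        if h0 <= height <= h1:
            t = (height - h0) / (h1 - h0)
            return f0 + t * (f1 - f0)
    return calibration[-1][1]  # clamp beyond the last calibration point

print(focus_for_height(9.0))  # 9.0
```

As Dave says, once the table is written there's no mistake to be made: the arm data drives the focuser, and it's just an equation.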
But if someone is actually focusing the camera, and someone's actually pushing a dolly, and someone's actually hand-filtering the camera, that doesn't mean you can't record it.
It turns out that you can record anything that you can measure. So for “Batman Forever” I built a little kit, and Panavision, to their credit, built me three encoded PanaHeads that had differential encoders on primary axles, recording pan and tilt, and converting that to degrees, and saving that data.
Then I put a little puck wheel on a dolly. As it rolls, it measures tracking distance, usually to a precision finer than a 16th of an inch.
For swinging the arm of a crane, the same thing: you put an azimuth encoder on the chain of a Titan crane, or you put inclinometer encoders on the side of the arm. When you read how many degrees of tilt the arm is going through, you know exactly what height the crane is at.
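The tilt-to-height conversion is plain trigonometry. The arm length and pivot height below are assumed values for illustration, not a real Titan crane's geometry.

```python
import math

# Sketch of reading camera height from an inclinometer on the arm:
# height above ground = pivot height + arm length * sin(tilt angle).
# ARM_LENGTH_FT and PIVOT_HEIGHT_FT are illustrative assumptions.

ARM_LENGTH_FT = 30.0    # pivot to camera platform
PIVOT_HEIGHT_FT = 6.0   # height of the arm's pivot above the ground

def camera_height(tilt_degrees: float) -> float:
    """Camera height above ground for a given arm tilt angle."""
    return PIVOT_HEIGHT_FT + ARM_LENGTH_FT * math.sin(math.radians(tilt_degrees))

print(round(camera_height(30.0), 1))  # 21.0
```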
But we discovered there was inherent noise in those pendulum encoders. For example, if you start booming up and pushing the dolly at the same time, it generates an inertial noise -- a lurch as the movement begins.
I also discovered that if you put a pendulum encoder on the right side of the arm, where it moves, and another pendulum encoder on the left side of the chassis, where it doesn't tilt up, the second encoder gives you a noise graph recording only the inertial jolt. Subtract that noise from the arm reading, and you have pure arm movement. And that becomes extraordinarily valuable.
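The two-encoder subtraction is just a sample-by-sample difference. The readings below are invented values to show the arithmetic, not data from a real rig.

```python
# Sketch of pendulum-encoder noise cancellation: the encoder on the
# tilting arm reads arm movement plus inertial jolt; the encoder on
# the non-tilting chassis reads the jolt alone. Subtracting the two,
# sample by sample, leaves pure arm movement. Values are illustrative.

arm_reading     = [0.0, 2.3, 4.6, 7.1, 9.2]   # degrees: movement + jolt
chassis_reading = [0.0, 0.3, 0.6, 1.1, 0.2]   # degrees: jolt only

pure_arm = [round(a - c, 1) for a, c in zip(arm_reading, chassis_reading)]
print(pure_arm)  # [0.0, 2.0, 4.0, 6.0, 9.0]
```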
So we were able to record all these axes of movement, unobtrusively. There was a little extra wiring on the dolly that we ran through a nice little cable harness, on down to an RS-422 line connected to a computer sitting off to the side.
That was very liberating to me, very freeing, just realizing that the community can make demands of manufacturers.
And now, Panavision have a data port out of every Technocrane they own. You can walk up and plug a data capture system into the base of a Technocrane and record every move for every frame. I wrote the connector standard for them, so I know. [Laughs]
Tim: Can you also collect metadata from non-Panavision cameras and their lenses on the crane?
Dave: Yes! If the camera and lens and head send out data, it will pass through the crane. So you can put an Arriflex camera on that crane and record all the data.
Tim: Now you’re talking!
Dave: You know, Panavision actually got involved very early on with putting encoders in their lenses. The guys at Fujinon also developed a system to output data from their lenses for George Lucas to use on the first digital Star Wars movie. Arri have taken a somewhat proprietary approach to packaging their data. But they’re starting to see the logic of open source.
So there have been baby steps, but the Cooke /i lenses are the first committed, open source invitation to everyone to embrace gathering metadata from lenses. If you look on the Cooke Optics website, you can download a PDF file. “Here is the standard, here are the connectors, here is how it’s wired, here’s how the data comes out. Do with it what you will. It’s open source.”
They completely have the right idea. It’s up to us in the community to demand that the rest of the imaging chain deliver data recorded with the images themselves as they’re gathered on set, in ways that everybody else can use.
Tim: It sounds like you’ve done an awful lot in terms of moving film production into the future with on-set metadata. So, what do you want to do now?
Dave: The problem is that in order to capture metadata, you have to agree on how to name it, what it means and where to put it. There are over 2,000 fields of metadata defined in the industry dictionary. What we did in committee was apply for and receive an ASC node in the metadata dictionary, so that we could define the fields of metadata that we felt were important on set, for inclusion in the metadata header.
We wanted to be able to assure that our work and visual effects work could be automated. And the only way to ensure that is to include it in the dictionary.
Dave: Once we have metadata everywhere, everyone will look around in shock and awe and ask each other, “How did we ever make movies without this stuff?”
On-set metadata collection will become as ubiquitous as the walkie-talkie. You know, how did we make movies before we had walkie-talkies? Well, we shouted and stood on the side of the mountain and sent semaphore to the guy on the next hill. We sent smoke signals! Fire a gun — that means “GO!”
Well, when on-set metadata becomes useful and ubiquitous, we’ll be saying the same thing about it then. Instead of waiting for all the pieces of paper from the script supervisor and everyone else on the set to arrive in an envelope at the production office each night, we can have digital metadata, collected automatically on-set, delivered even as we’re shooting.
You know, if you can turn the focus barrel of a lens into data, you ought to be able to turn the meaning of a script supervisor's wavy line on a tablet computer into the proper kind of data as well. In fact, that should be a trivial task compared to encoding a lens.
The amount of information in today’s physical metadata — script notes, camera movements, camera settings — is trivial, insignificant in size compared to the actual picture or sound data we’re already collecting. But getting it attached to the picture and sound data is NOT trivial. And it won’t happen unless you ask for it.
The question is being asked. The answers are being provided. It just takes time for the herd to move in that direction. So, every chance that I get, I speak to the herd, and I speak to the possibility of what we could be doing.
The tools of metadata can and will enable authorship of images, control of look management, efficiency in visual effects and editorial, and make better movies while saving the producers and studios money!