QuickTime is often described as a "media creation" API, and that means a lot more than just the ability to edit your audio and video and export it to an arbitrary format. This month I'd like to take the term very literally and show you how to create your movies in Java, one frame at a time, without depending on a pre-existing movie.
To do that, we need to take another look at the format of a QuickTime movie. In "Parsing and Writing the QuickTime File Format," we saw how structures called "atoms" represented this format. For today, let's strip away those details and look at the big picture:
A movie contains metadata (creation and modification time, current selection, preferred volume and rate, etc.) and zero or more tracks.
A track contains metadata (creation and modification time, playback quality), exactly one media object, and an edit list describing which parts of the media are to be used.
A media object contains a data reference that indicates where the audio, video, or other data actually is (in the movie file, in another file, on the network, etc.); information about which QuickTime "media handler" can load, save, and play the data; and a structure called a "sample table" to represent where the sample for a given time can be found in the data.
Graphically, this can be seen as a movie where the references are all to external sources (files, URLs, and other movies), as shown in Figure 1, or a "flattened" one, in which the data is all contained within the same .mov file as the movie's structure, as shown in Figure 2. Either way, the movie is the structure that represents where the samples are, how they're arranged, and what to do with them.
Figure 1. A movie with external sources
Figure 2. A movie with internal sources
By "samples," we mean what is to be seen or heard at some instant of time, in the smallest amount of time relevant to that kind of media. For example, imagine a format where we have totally uncompressed video (equivalent to, say, North American television) and uncompressed CD-quality audio. The video, by our definition, is 30 frames per second, so there are 30 video samples in one second. CD-quality audio is 44.1 kHz, meaning there are 44,100 audio samples in a second.
QuickTime, interestingly, realizes that a player would generally like its data to be organized with regard to time. For example, you don't want to have a file with all of the video data first and then all the audio data, since playing back would require jumping back and forth between the two, and the read/write head on your hard drive would scream in agony. It's easier to mix them, so that the video data for a certain time and the audio data for that time are in the same place. In QuickTime's worldview, this is a process of "chunking": the media data combines video, audio, and any other data into one stream (a long run of bytes), with "chunks" of audio, video, and other samples grouped by time. It's up to the media object to manage several tables, like a time-to-sample table and a sample-to-chunk table, to allow it to find the samples at playback time.
Fortunately, you as a developer aren't responsible for all of that bookkeeping, but it's good to understand how it works.
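To make the bookkeeping a little more concrete, here is a minimal sketch of how a time-to-sample lookup could work. It is a simplification for illustration, not QuickTime's actual data structure or code: the class name TimeToSampleSketch, the Run type, and the sampleIndexAt() method are all made up for this example, and it assumes every run has a positive sample duration.

```java
import java.util.List;

/** Simplified illustration of a time-to-sample table; not QuickTime's real implementation. */
final class TimeToSampleSketch {

    /** A run of consecutive samples that all share one duration, in media time units. */
    static final class Run {
        final int sampleCount;
        final int sampleDuration; // assumed to be greater than zero

        Run(int sampleCount, int sampleDuration) {
            this.sampleCount = sampleCount;
            this.sampleDuration = sampleDuration;
        }
    }

    /** Returns the zero-based index of the sample shown at mediaTime, or -1 if out of range. */
    static int sampleIndexAt(List<Run> table, long mediaTime) {
        if (mediaTime < 0) {
            return -1;
        }
        long runStart = 0;        // media time at which the current run begins
        int firstSampleOfRun = 0; // index of the first sample in the current run
        for (Run run : table) {
            long runLength = (long) run.sampleCount * run.sampleDuration;
            if (mediaTime < runStart + runLength) {
                return firstSampleOfRun + (int) ((mediaTime - runStart) / run.sampleDuration);
            }
            runStart += runLength;
            firstSampleOfRun += run.sampleCount;
        }
        return -1; // the requested time is past the end of the media
    }

    public static void main(String[] args) {
        // With a time scale of 100: three one-second samples, then two half-second samples.
        List<Run> table = List.of(new Run(3, 100), new Run(2, 50));
        System.out.println(sampleIndexAt(table, 250)); // falls in the third sample
        System.out.println(sampleIndexAt(table, 360)); // falls in the last half-second sample
    }
}
```

Storing runs rather than one entry per sample keeps the table tiny when most samples share a duration, which is the common case for fixed-rate audio and video.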
Getting back to the point, to make a movie from scratch, we need to do the following:
You may have noticed in the diagrams above that our hypothetical movie contains not just an audio and a video track, but also a "text track." This is exactly what it sounds like: a time-based collection of text, commonly used for providing captions to QuickTime movies. More technically, it is a track where the media samples are ordinary text strings. This is a good place to start with creating our own media, since it doesn't require knowing anything about images or sounds.
Download the source code for this example.
The MakeTextTrack sample application creates a movie with a single text track. It starts by creating an empty movie file to write to:
Movie movie = Movie.createMovieFile(file,
    StdQTConstants.kMoviePlayer,
    StdQTConstants.createMovieFileDeleteCurFile |
    StdQTConstants.createMovieFileDontCreateResFile);
Next, it creates an empty text track and a text media object, which it will eventually insert into the track:
Track textTrack = movie.addTrack (TEXT_TRACK_WIDTH, TEXT_TRACK_HEIGHT, TEXT_TRACK_VOLUME);
TextMedia textMedia = new TextMedia(textTrack, timeScale);
The last argument is a time scale for the media. Movies, tracks, and media all have their own time scale, which is the number of time units that pass in one second. For a movie, this value defaults to 600, which has the advantage of being an even multiple of many common frame-rates: 30 (NTSC video), 25 (PAL and SECAM video), and 24 (film). Dean Perry of Abstract Plane also reminds me it's an even multiple of the 60 "ticks" per second that older Macintoshes used for timekeeping. However, you're free to use and abuse the time scales as you see fit. I arbitrarily chose a value of 100 for my media, so my sample durations are measured in hundredths of a second.
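The arithmetic behind time scales is easy to check in plain Java. The following sketch needs no QuickTime at all; the class TimeScaleSketch and its two methods are names invented for this illustration. It shows why 600 is so convenient and what happens with my arbitrary choice of 100.

```java
/** Plain-Java illustration of time scale arithmetic; not part of QTJ. */
final class TimeScaleSketch {

    /** Time units per frame, or -1 if the frame rate does not divide the time scale evenly. */
    static int unitsPerFrame(int timeScale, int framesPerSecond) {
        return (timeScale % framesPerSecond == 0) ? timeScale / framesPerSecond : -1;
    }

    /** Converts a duration in time units to seconds for the given time scale. */
    static double toSeconds(long duration, int timeScale) {
        return (double) duration / timeScale;
    }

    public static void main(String[] args) {
        int[] rates = {30, 25, 24, 60};
        for (int fps : rates) {
            System.out.println(fps + " fps at time scale 600: "
                    + unitsPerFrame(600, fps) + " units per frame");
        }
        // A time scale of 100 cannot represent a 30 fps frame exactly.
        System.out.println("30 fps at time scale 100: " + unitsPerFrame(100, 30));
        // A duration of 100 at time scale 100 is one second; 50 is half a second.
        System.out.println(toSeconds(100, 100) + " and " + toSeconds(50, 100) + " seconds");
    }
}
```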
Next, we tell the new Media object that we intend to do some editing:
textMedia.beginEdits();
We then get the media handler object, required in this case because it has a method for creating new text samples:
TextMediaHandler handler = textMedia.getTextHandler();
and we create a rectangle that will be used in every sample to describe the shape that the text is to be rendered into when played back:
QDRect textBox = new QDRect(0, 0, TEXT_TRACK_WIDTH, TEXT_TRACK_HEIGHT);
We're finally ready to start adding samples. The sample application uses a static array of Strings, getting a QuickTime-compatible QTPointer to each one and passing that as the first argument to the TextMediaHandler.addTextSample() method. Here's how that call looks:
handler.addTextSample (msgPoint, 0, 12, 0,
    QDColor.yellow, QDColor.black,
    QDConstants.teJustCenter, textBox,
    0, 0, 0, 0,
    QDColor.white, 100);
Obviously, this method has a lot of parameters. In order, they are:
QTPointerRef text: a pointer to the string.
int fontNumber: an integer to indicate the font. A generic default can always be used (the call above passes 0), or you can look up the ID for a font name.
int fontSize: the font size, in points.
int textFace: a style, such as bold, italics, etc., as defined by constants in QDConstants.
QDColor textColor: the foreground color, expressed as a QDColor.
QDColor backColor: the background color.
int textJustification: the right/left/center justification. Possible values are in QDConstants.
QDRect textBox: a QDRect rectangle describing the box in which the text is to be displayed.
int displayFlags: zero or more behavior flags, logically OR'd together, describing behavior such as clipping or scaling the text when displayed over other video, etc. These flags are in StdQTConstants, and a list of supported flags is documented for the native TextMediaAddTextSample function.
int scrollDelay: a time to delay between scrolls if scrolling flags such as dfScrollOut are set. Not useful in this app, with its short samples, but potentially useful elsewhere.
int hiliteStart: the index of the first character of text to highlight (select), if any.
int hiliteEnd: the index of the last character of text to highlight.
QDColor rgbHiliteColor: the color of the highlight, if any.
int duration: the duration of this sample, expressed in the media's time scale.
The duration is interesting for a couple of reasons. First, it's expressed in terms of the media's time scale. In our case, the time scale is 100 and the duration is 100, so the sample is exactly one second long. Of course, we could have half-second samples by using a duration of 50, or any sample length that can be expressed as a fraction of duration over time scale. Moreover, despite the commonness of fixed frame rates in audio and video (30 fps video, 44.1 kHz sound, etc.), QuickTime requires no such thing -- each sample can be of an arbitrary duration, different from the sample before or after it.
Wrapping up the application, once the loop is done adding samples, we inform the Media that we're done editing:
textMedia.endEdits();
and insert this media into the text track:
textTrack.insertMedia (0, // trackStart
    0, // mediaTime
    textMedia.getDuration(), // mediaDuration
    1); // mediaRate
after which we save the file to disk as texttrack.mov, in the current directory.
To compile and run the sample code, make sure you've worked through any versioning or classpath issues as covered in our re-introduction to QTJ a few months back. When you're done, the result will look something like this (assuming you have the QT plug-in):
One of the nice things to notice is that we picked up word-wrap automatically, without hand-coding line-breaks.
CLAUDE: I'd like to point out that this tape has not been tampered with or edited in any way. It even has a timecode on it, and those are very hard to fake.
JUDGE: For the benefit of the court, would you please explain "timecode"?
CLAUDE: Just because I don't know what it is ... doesn't mean I'm lying.
from the movie Strange Brew
Download the source code for this example.
Actually, Claude, you are lying, and timecodes -- which are just a system for encoding the current time in a movie -- are very easy to fake. In fact, the next example will add a timecode to any QuickTime movie.
To do this, we'll add a text track with timecode-like samples to the existing tracks in a movie:
Open a movie. Note that this has to be a real QuickTime movie, not just some other format that QuickTime can open, such as AVI or MPEG-4.
Add a text track for the timecode text, set to use the bottom center of the movie's display.
Write text samples every 1/30th of a second, in a typical timecode format (hh:mm:ss:ff).
Flatten the movie out to a new disk file.
We start off by getting a movie from a file rather than creating a new one. The movie already has some sizing information from its existing video track, which will help us later.
This time, I've used a time scale of 30 for the text media, which will correspond with the idea of having timestamps every 1/30th of a second. That means every sample will have a duration of 1. Of course, we could have accomplished the same thing by using a time scale of 60 and samples of duration 2, or a time scale of 600 and 20-unit samples, and so on.
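With one sample per 1/30th of a second, the text for each sample is just the sample's index rendered as hh:mm:ss:ff. Here is one way that formatting could be sketched in plain Java. The TimecodeSketch class and its timecode() method are names I made up for illustration, and this is a simple non-drop-frame count, not necessarily what the sample application does.

```java
/** Formats a frame count as an hh:mm:ss:ff string; a simple non-drop-frame sketch. */
final class TimecodeSketch {

    static String timecode(long frameNumber, int framesPerSecond) {
        long frames = frameNumber % framesPerSecond;       // ff: frames within the second
        long totalSeconds = frameNumber / framesPerSecond;
        long seconds = totalSeconds % 60;                  // ss
        long minutes = (totalSeconds / 60) % 60;           // mm
        long hours = totalSeconds / 3600;                  // hh
        return String.format("%02d:%02d:%02d:%02d", hours, minutes, seconds, frames);
    }

    public static void main(String[] args) {
        System.out.println(timecode(0, 30));      // first sample of the movie
        System.out.println(timecode(29, 30));     // last sample of the first second
        System.out.println(timecode(30, 30));     // first sample of the second second
        System.out.println(timecode(108015, 30)); // one hour and fifteen frames in
    }
}
```

Each string produced this way would become one text sample with a duration of 1 in the 30-units-per-second media.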
What's interesting is what we don't have to do, namely care what the time scale or the frame rate of our video and audio is. Just as you can have audio and video at more or less arbitrary frame rates, freely changeable independent of one another, we can have 30 text samples a second regardless of the video's frame rate. Granted, this can't be truly accurate if the video's frame rate isn't 30 fps or something reasonably divisible, but that's not the point. The key is that for any given time in the movie, there is one appropriate sample in each track that QuickTime will retrieve for us, whether that's one of thousands of audio samples that fly by every second, one frame of video, or one of our text samples.
On the other hand, we do have to worry about where our text will be placed over the video. When the (0,0)-based coordinates of the text frames are mapped into the movie's display space, we get a timecode at (0,0), which is not what we want, as shown in Figure 3.
Figure 3. The added timecode
To place the caption box in a specific place relative to the other tracks in the movie, we can use a transformation matrix. In QuickTime, this is a 3x3 mathematical construct that maps points from one space into another. In our case, we need to map from a rectangle whose upper left corner is at (0,0) to a rectangle that is centered along the bottom of the movie's space. We do this by calling setMatrix() on our text track, with a Matrix object that describes the spatial transformation we want QuickTime to perform.
The formula for matrix transformations is shown in Figure 4. Don't run away. It's not that scary, at least not in practice.
Figure 4. The formula for matrix transformations
The formula means that, given a point (x,y), we get the new coordinates (x',y') by applying matrix multiplication. The transformation can be expressed more simply as a pair of formulas:
x' = ax + cy + tx
y' = bx + dy + ty
This buys us the ability to specify operations that move, rotate, and scale your source, all with one object. A full discussion of the possibilities is available on Apple's developer site.
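If the pair of formulas still looks abstract, it may help to see it as a few lines of plain Java. This sketch does not use QuickTime's Matrix class; MatrixFormulaSketch and transform() are hypothetical names, and the numbers in main() are arbitrary.

```java
/** Applies x' = ax + cy + tx and y' = bx + dy + ty to a single point. */
final class MatrixFormulaSketch {

    static double[] transform(double a, double b, double c, double d,
                              double tx, double ty, double x, double y) {
        double newX = a * x + c * y + tx;
        double newY = b * x + d * y + ty;
        return new double[] {newX, newY};
    }

    public static void main(String[] args) {
        // A pure move: a and d are 1, b and c are 0, so only tx and ty matter.
        double[] moved = transform(1, 0, 0, 1, 80, 200, 0, 0);
        System.out.println(moved[0] + ", " + moved[1]);

        // A pure scale by 2: a and d are 2, and nothing is translated.
        double[] scaled = transform(2, 0, 0, 2, 0, 0, 10, 5);
        System.out.println(scaled[0] + ", " + scaled[1]);
    }
}
```

With a and d set to 1 and b and c set to 0, the formulas collapse to x' = x + tx and y' = y + ty, which is all a simple move requires.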
For our purposes, we only need to specify a move to a pair of coordinates we call boxLeft and boxTop, which are then used to create a QDRect object called toBox. We can then create a Matrix that represents the moving of pixels from the textBox, with an upper left corner of (0,0), to toBox, with upper left corner of (boxLeft, boxTop). Setting this as the text track's matrix causes QuickTime to use the matrix when drawing the text frames at playback time:
Matrix transformMatrix = new Matrix();
QDRect toBox = new QDRect (boxLeft, boxTop, TEXT_TRACK_WIDTH, TEXT_TRACK_HEIGHT);
transformMatrix.map (textBox, toBox);
textTrack.setMatrix (transformMatrix);
If you read the docs, you'll notice that the tx and ty values are the only ones used for moving pixels; i.e., for translating between coordinate spaces. So we could replace the map() call with:
transformMatrix.setTx (boxLeft);
transformMatrix.setTy (boxTop);
Either way, this puts the text box in its proper location relative to the rest of the movie, as seen in Figure 5.
Figure 5. A better timestamp location
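For a box centered along the bottom of the movie, the two coordinates come from simple arithmetic on the movie's size and the text box's size. The sketch below shows one plausible way to compute them. The class BottomCenterSketch, the method bottomCenterOrigin(), and the parameter names movieWidth and movieHeight are assumptions made for this illustration, as are the dimensions in main().

```java
/** Computes the upper left corner of a box centered along the bottom of a movie. */
final class BottomCenterSketch {

    /** Returns {boxLeft, boxTop} for a box of the given size inside the movie's bounds. */
    static int[] bottomCenterOrigin(int movieWidth, int movieHeight,
                                    int boxWidth, int boxHeight) {
        int boxLeft = (movieWidth - boxWidth) / 2; // equal margins on both sides
        int boxTop = movieHeight - boxHeight;      // box's bottom edge meets the movie's
        return new int[] {boxLeft, boxTop};
    }

    public static void main(String[] args) {
        // Hypothetical sizes: a 320x240 movie and a 120x24 text box.
        int[] origin = bottomCenterOrigin(320, 240, 120, 24);
        System.out.println("boxLeft=" + origin[0] + ", boxTop=" + origin[1]);
    }
}
```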
The Matrix class provides several methods that allow you to define matrices that can perform scaling and rotation operations, all without you having to do your own trigonometry. For example, adding this rather silly call rotates our timecode counter-clockwise by 45 degrees, centered on the top left corner:
transformMatrix.rotate (315, boxLeft, boxTop);
The result looks amusing as a screenshot, but is more impressive (or just plain goofy) when played as an accurate, running timecode for the movie, as shown in Figure 6.
Figure 6. A rotated timestamp
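For the curious, here is roughly the trigonometry the rotation saves us from writing. This is a plain-Java sketch, not QuickTime's implementation; RotationSketch and rotateAbout() are invented names. It assumes screen coordinates with y increasing downward and positive angles turning clockwise on screen, an assumption that is at least consistent with a 315-degree rotation looking like 45 degrees counter-clockwise.

```java
/** Rotates a point about an anchor in screen coordinates (y grows downward). */
final class RotationSketch {

    /** Positive angles turn clockwise on screen under the assumptions stated above. */
    static double[] rotateAbout(double degrees, double anchorX, double anchorY,
                                double x, double y) {
        double radians = Math.toRadians(degrees);
        double dx = x - anchorX; // offset of the point from the anchor
        double dy = y - anchorY;
        double newX = anchorX + dx * Math.cos(radians) - dy * Math.sin(radians);
        double newY = anchorY + dx * Math.sin(radians) + dy * Math.cos(radians);
        return new double[] {newX, newY};
    }

    public static void main(String[] args) {
        // A point 100 pixels to the right of a hypothetical anchor at (100, 216).
        double[] p = rotateAbout(315, 100, 216, 200, 216);
        // The y value shrinks, so on screen the point swings upward (counter-clockwise).
        System.out.println(p[0] + ", " + p[1]);
    }
}
```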
Overall, the code for this example is fairly similar to that of the first one. Again we create a text Track and accompanying TextMedia, which we populate with samples. The call to addTextSample has a few differences to superimpose the text onto the video:
fontSize is 14, for better readability.
textFace is QDConstants.bold, for better readability.
StdQTConstants.dfKeyedText is given as the displayFlags.
This use of dfKeyedText produces a chromakey effect, replacing the background color (QDColor.black, in our case) with the pixels from the video underneath. So the black box surrounding the text becomes invisible, and we just see the text on top of the video.
As before, the resulting movie is flatten()ed out to a file, this time called timecoded.mov, which you can open in QuickTime.
Having done this simple little timecode with a text track, it should be noted that QuickTime offers a real "timecode track" as one of the many track types it supports. It is much more involved than is necessary for this tutorial, but if you have professional needs, check out TimeCodeMedia and the related classes in QTJ.
Now that we've done some simple text tracks, the next step is to get into the good stuff: writing out video tracks from scratch. In our next article, we'll do just that, borrowing an image-to-movie effect from our favorite Civil War documentarian.
Chris Adamson is an author, editor, and developer specializing in iPhone and Mac.
Copyright © 2009 O'Reilly Media, Inc.