This is the second of a two-part series on creating QuickTime movies "from scratch" in Java. By that, I mean we're creating our own media data, piece by piece, to assemble the movie. Doing things at this low level is tricky, but I hope you'll agree after this installment that it's remarkably powerful.
Part 1 began with the structure of a QuickTime movie as a collection of tracks, each of which has exactly one Media object that in turn references media data that can be in the movie file, in another file, or out on the network. The Media has tables that indicate how to find specific "samples," individual pieces of audio, video, text, or other content to be rendered at a specific time in the movie. Part 1 used easy-to-create text tracks to show how to build up a Media structure, first by creating a simple all-text movie and then by adding textual "time-code" samples as a new text track in an existing movie.
In this part, we'll move on to creating video tracks from scratch, building up a video media object by adding graphic samples.
The goal of this article's sample code is to take a graphic file and make a movie out of it by "moving" around the image — you may have seen this concept in iMovie, where Apple calls it the "Ken Burns Effect," after the director who used it extensively in PBS' The Civil War and other documentaries. There is also a shareware application called Photo to Movie that does much the same thing.
Download the source code for the examples.
We can make this work because of the concept of persistence of vision, which says that the human eye perceives a series of images, alternated sufficiently quickly, as motion. To do an image-to-movie effect, we show slightly different parts of the picture in each distinct image or "frame," creating the illusion of moving from one part of the picture to another.
In creating text tracks, the approach was to:
Create a track and add a Media object to it.
Get a MediaHandler and use that to add samples to the Media.
The same approach generally works for video, except that the VisualMediaHandler doesn't do anything for us. Instead, we need to create a compression sequence, or CSequence, to prepare samples, encoded and compressed with a codec supported by QuickTime. We'll then add these samples directly to the Media.
The CSequence class has a method called compressFrame, which is what we need to generate samples. Its signature is:
public CompressedFrameInfo compressFrame(QDGraphics src, QDRect srcRect, int flags, RawEncodedImage data) throws StdQTException
That doesn't look too bad. We just need a QDGraphics as the source of our image, a rectangle describing what part of the image to use, some behavior flags, and a RawEncodedImage buffer into which to put the compressed frame.
"So what's a QDGraphics?" you might be wondering. The name is presumably meant to evoke thoughts of the AWT's Graphics class, and the two are remarkably similar: each represents a drawing surface, either on-screen or off-screen, containing methods for drawing lines, circles, arcs, polygons, and text strings.
One clever thing that QDGraphics does under the covers is to offer an isolation layer that hides whether the drawing surface is on-screen or off-screen (unless you specifically ask) and what native structures (such as a GWorld) are involved. One odd side effect of this arrangement is that while there are many getGWorld() methods throughout the QTJ API, there's no GWorld class to return, so you get a QDGraphics instead.
In fact, the GraphicsImporter offers a getGWorld(), and if you guessed that this class offers a way to get an image into QuickTime, you're right. So now we have some idea of how we're going to connect the dots to make a movie from an image:
A GraphicsImporter can read an image file.
The GraphicsImporter has a getGWorld() that returns a QDGraphics.
A QDGraphics can go to compressFrame.
The result of compressFrame can be added to our Media.
One strategy for getting the frames is to:
Get starting and ending rectangles, where a rectangle is a QDRect representing an upper-left corner point and width by height dimensions.
Calculate a series of intermediate rectangles that take us from the startRect to the endRect.
For each of these intermediate rectangles, call compressFrame to make a frame from that portion of the original image. Add each frame as a sample.
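The "intermediate rectangles" step is just linear interpolation between the two endpoints. Here is a minimal sketch of that idea in plain Java; the Rect and RectInterpolator classes are hypothetical stand-ins of my own (Rect plays the role of QDRect so the snippet runs without QuickTime installed) and are not taken from the sample code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for QDRect: upper-left corner plus width and height. */
class Rect {
    final int x, y, width, height;

    Rect(int x, int y, int width, int height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

class RectInterpolator {
    /**
     * Returns `frames` rectangles that move linearly from start to end.
     * The first rectangle equals start and the last equals end.
     */
    static List<Rect> interpolate(Rect start, Rect end, int frames) {
        if (frames < 2) {
            throw new IllegalArgumentException("need at least two frames");
        }
        List<Rect> rects = new ArrayList<>();
        for (int i = 0; i < frames; i++) {
            // t runs from 0.0 (first frame) to 1.0 (last frame)
            double t = (double) i / (frames - 1);
            rects.add(new Rect(
                lerp(start.x, end.x, t),
                lerp(start.y, end.y, t),
                lerp(start.width, end.width, t),
                lerp(start.height, end.height, t)));
        }
        return rects;
    }

    /** Linear interpolation between two ints, rounded to the nearest pixel. */
    private static int lerp(int from, int to, double t) {
        return (int) Math.round(from + (to - from) * t);
    }
}
```

In a QTJ program, each interpolated rectangle would be turned into a QDRect before being handed to the drawing and compression calls.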
If you have QuickTime 5 or better, you can see the result here.
This strategy works, but it is limited by the size of the original image. This is pretty much a fatal flaw. If the image is only slightly larger than the movie size (i.e., the size of the rectangles), there isn't much room to move around. If it's smaller than our movie, then it won't work at all. On the other hand, if the image is much larger than our desired movie dimensions, then we might not be able to get the parts of the picture we want — it's not very useful if we can't get someone's entire face in the movie, and instead settle for a shot that moves from their nose to their chin.
Scaling the image would be a nice improvement, but we can actually do better than that. If we could scale each fromRect, then we could "zoom" in or out of the picture by using progressively larger or smaller source regions. But how do we do this?
Part 1 demonstrated how QuickTime's Matrix class could be used to define a spatial transformation. Mainly, we used it to move text located at (0,0) to a point at the bottom of a movie, but look at the javadocs and you'll see some other intriguing methods.
The key to our improved strategy is a method called rect() that combines a coordinate mapping with a scaling operation. This allows us to use any source rectangle and scale it to the size of the frames we're compressing for the movie.
To make this work, the sample code creates an offscreen QDGraphics and tells the GraphicsImporter to use this for its draw()s. The new QDGraphics's dimensions are the same as those of the frames we intend to compress. That means its bounds are a QDRect with upper-left corner 0,0 and constant dimensions VIDEO_TRACK_WIDTH by VIDEO_TRACK_HEIGHT (which I've set to 360 by 240, but you're welcome to change in the code). For each fromRect, we create a Matrix to map from fromRect's QDRect to our offscreen QDGraphics's bounds.
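To see what a rectangle-to-rectangle mapping involves, it helps to write out the arithmetic: a scale factor per axis plus a translation. The sketch below is my own plain-Java illustration of that math, not QuickTime's implementation; the RectMapping class and its method names are hypothetical.

```java
/** Hypothetical illustration of mapping one rectangle onto another (scale plus translate). */
class RectMapping {
    final double scaleX, scaleY, translateX, translateY;

    RectMapping(double fromX, double fromY, double fromWidth, double fromHeight,
                double toX, double toY, double toWidth, double toHeight) {
        // Scale so the source width and height become the destination width and height
        scaleX = toWidth / fromWidth;
        scaleY = toHeight / fromHeight;
        // Translate so the scaled source corner lands on the destination corner
        translateX = toX - fromX * scaleX;
        translateY = toY - fromY * scaleY;
    }

    double mapX(double x) {
        return x * scaleX + translateX;
    }

    double mapY(double y) {
        return y * scaleY + translateY;
    }
}
```

For example, mapping an 800 by 600 source region at (400,390) onto a 360 by 240 buffer at (0,0) sends the source's upper-left corner to (0,0) and its lower-right corner (1200,990) to (360,240). The two scale factors differ here (0.45 and 0.4), so the picture is slightly stretched unless the source rectangles share the buffer's aspect ratio.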
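The revised process looks like this:
Get starting and ending rectangles, where a rectangle is a QDRect representing an upper-left corner point and width by height dimensions.
Calculate a series of intermediate rectangles that take us from the startRect to the endRect.
For each of these intermediate fromRects, use a Matrix to scale the rectangle into the bounds of an offscreen QDGraphics, draw it into the QDGraphics, and then call compressFrame to make a frame from the offscreen QDGraphics. Add each frame as a sample.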
Given that strategy, let's step through the code that makes it all work. We'll skip over creating the movie itself, which we covered last time. Similarly, creating and adding the video track and VideoMedia is a very straightforward analogue to last article's text-track code.
If this is your first time compiling and running QuickTime for Java code, see my earlier article, "A Gentle Re-Introduction to QuickTime for Java," for information on how to work out CLASSPATH and Java versioning issues.
To get things started in this example, we need to know the source image file, as well as the starting and ending rectangles that define the movie we are to make. The sample code expects a makecsequence.properties file to be in the current directory, with entries that look something like this:
file=/Users/cadamson/Pictures/keagy/DSC01763.jpg
start.x=545
start.y=370
start.width=1500
start.height=1125
end.x=400
end.y=390
end.width=800
end.height=600
If this file is absent, the user will be queried for an image file at runtime, and the rectangles will be chosen randomly.
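Reading entries like these takes only the standard java.util.Properties class. The following is a minimal sketch of one way to do it, assuming the key names shown above; the RectProperties class and its methods are hypothetical helpers of mine, and the sample code may well parse the file differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for reading rectangle entries such as start.x or end.width. */
class RectProperties {
    /** Parses properties-format text; a FileReader could be substituted for a real file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns {x, y, width, height} for the given prefix, e.g. "start" or "end". */
    static int[] readRect(Properties props, String prefix) {
        String[] keys = {"x", "y", "width", "height"};
        int[] rect = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            String name = prefix + "." + keys[i];
            String value = props.getProperty(name);
            if (value == null) {
                throw new IllegalArgumentException("missing property: " + name);
            }
            rect[i] = Integer.parseInt(value.trim());
        }
        return rect;
    }
}
```

The four ints returned for "start" and "end" would then feed the QDRect constructors for the starting and ending rectangles.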
Given a QTFile for the image file, creating the GraphicsImporter is quite straightforward:
GraphicsImporter importer = new GraphicsImporter (imgFile);
Next, we create the offscreen QDGraphics and tell the GraphicsImporter to use it for its drawing:
QDGraphics gw = new QDGraphics (new QDRect (0, 0, VIDEO_TRACK_WIDTH, VIDEO_TRACK_HEIGHT));
importer.setGWorld (gw, null);
Notice that I inadvertently called the variable gw, as in "GWorld". The use of that term in the API and Apple's docs is hard to escape.
One thing we have to prepare early is a block of memory big enough to hold the largest possible frame that the chosen video compressor could create. To do this, we call a getMaxCompressionSize() method, allocate a block of memory of that size (as referenced by a QTHandle), and "lock" the handle so it can't move while we're working with it. Finally, we can create a RawEncodedImage object with this buffer:
int rawImageSize = QTImage.getMaxCompressionSize (gw,
    gRect,
    gw.getPixMap().getPixelSize(),
    StdQTConstants.codecNormalQuality,
    CODEC_TYPE,
    CodecComponent.anyCodec);
QTHandle imageHandle = new QTHandle (rawImageSize, true);
imageHandle.lock();
RawEncodedImage compressedImage = RawEncodedImage.fromQTHandle(imageHandle);
CODEC_TYPE is a constant defined early in the sample code. It is an int that indicates which QuickTime-supported compression scheme we've chosen to use, "codec" being the term for a scheme by which video is encoded and decoded. Many of these are provided as constants in the StdQTConstants class. Among the popular choices are:
kCinepakCodecType. Cinepak is a widely supported codec dating back to the early 90s. However, its image quality and compression ratios aren't very compelling anymore.
kSorensonCodecType. Sorenson Video pretty much replaced Cinepak for a lot of QuickTime users with its higher quality and great compression.
kH263CodecType. H.263 is a codec originally designed for videoconferencing but widely used in other environments. It is also supported by Windows Media Player and the Java Media Framework, and is a simple form of MPEG-4 video.
kAnimationCodecType. A compressor meant for use with synthetic images. Apple's demo code uses this a lot, but that's because their sample apps create their own image data. Our photo doesn't compress well with the Animation codec, so only use it here if you want to be shocked by how big the resulting file is (hint: make sure you have at least 15MB free!).
There are more supported codecs than QTJ lets on, but you have to look in the native API's ImageCompression.h to find them. Two great choices are:
Sorenson 3. A newer version of the Sorenson codec, Sorenson 3 is available in QuickTime 5 and up. The identifier for this codec is SVQ3, so to create the int that QuickTime wants, we take the bottom eight bits of each character (we pretend that we've gone back in time and Unicode doesn't exist) and pack them into four bytes. The resulting int value is 0x53565133.
MPEG-4. You can use MPEG-4 video in a regular QuickTime container, with the caveat that only QuickTime will be able to read it; to create a real .mp4 file, you'd need to use a MovieExporter, as shown in the article on the QuickTime File Format. For our current purposes, the codec type of MPEG-4 video is mp4v, which translates to the int 0x6D703476.
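The character-packing trick described for SVQ3 and mp4v is easy to get wrong by hand, so here is a small sketch that does it in code. The FourCharCode class is a hypothetical helper of my own, not part of QTJ; it simply keeps the low eight bits of each character and shifts them into place, first character in the high byte.

```java
/** Hypothetical helper that packs a four-character code such as "SVQ3" into an int. */
class FourCharCode {
    static int toInt(String code) {
        if (code.length() != 4) {
            throw new IllegalArgumentException("expected exactly four characters: " + code);
        }
        int result = 0;
        for (int i = 0; i < 4; i++) {
            // Keep only the bottom eight bits of each char, first char in the high byte
            result = (result << 8) | (code.charAt(i) & 0xFF);
        }
        return result;
    }

    /** Reverses toInt, which is handy when printing a codec type for debugging. */
    static String toCode(int value) {
        StringBuilder sb = new StringBuilder(4);
        for (int shift = 24; shift >= 0; shift -= 8) {
            sb.append((char) ((value >> shift) & 0xFF));
        }
        return sb.toString();
    }
}
```

With this helper, FourCharCode.toInt("SVQ3") yields 0x53565133 and FourCharCode.toInt("mp4v") yields 0x6D703476, either of which could be assigned to a constant like CODEC_TYPE.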
The next thing we do is to create a CSequence. This object provides us the ability to compress frames. We have to call this with each frame to compress, in order, and there's an interesting reason for this. If we were using a compression scheme meant for single images, such as JPEG, we could do the images in any order, since each frame would have all of the information it needed to be decompressed and rendered. This is generally not true of video compression schemes, which often use "temporal compression": techniques to compress data by eliminating redundant information between frames, such as an unchanging background. Because of this approach, decoding a given frame might depend on information from one or more previous frames, which is why we have to do our compression through an object that understands that we're working with a series of images.
The CSequence constructor looks like this:
CSequence seq = new CSequence (gw,
    gRect,
    gw.getPixMap().getPixelSize(),
    CODEC_TYPE,
    CodecComponent.bestFidelityCodec,
    StdQTConstants.codecNormalQuality,
    StdQTConstants.codecNormalQuality,
    KEY_FRAME_RATE,
    null,
    0);
These arguments are, in order:
QDGraphics src: the QDGraphics from which to get image data. In our case, the offscreen GWorld into which we draw.
QDRect srcRect: the portion of the src to use. In our case, the whole thing.
int colorDepth: an int indicating the depth (4-bit color, 32-bit color, etc.) at which the frames are likely to be viewed. Pass 0 to let the Image Compression Manager choose for you. More info lives in the docs for the native function.
int cType: the codec type, as described above.
CodecComponent codec: often used to request a specific behavior of the given codec, such as the bestFidelityCodec requested here.
int spatialQuality: a quality setting for the images, from codecMinQuality, through low, normal, and high, up to codecMaxQuality and, for codecs that allow it, codecLosslessQuality.
int temporalQuality: the quality setting for inter-frame compression, with values as above.
int keyFrameRate: the maximum number of frames allowed between "key frames," which are the frames that have all of the information they need, and that may be needed for multiple subsequent frames to decompress.
ColorTable clut: a custom color lookup table, often set to null to let QuickTime use the table from the source image.
int flags: one or more behavior flags, logically OR'd together. One interesting option is codecFlagWasCompressed, which hints that the source image was previously compressed and gives the codec a chance to compensate for the artifacting and other image degradation that occurs when an image has been compressed with a lossy codec (like JPEG).
Once we've created the CSequence, we get an ImageDescription object, which we'll need later when adding samples to the VideoMedia.
Now we can start the loop to draw, compress, and add frames. We calculate a rectangle, fromRect, inside of the original image. This will be the source of this frame. Next, we create a Matrix that maps and scales from its original location and size to the offscreen buffer's location and size; in other words, a rectangle at (0,0) with dimensions VIDEO_TRACK_WIDTH by VIDEO_TRACK_HEIGHT. GraphicsImporter.draw() then performs the scaled drawing of the region into the offscreen QDGraphics:
Matrix drawMatrix = new Matrix();
drawMatrix.rect (fromRect, gRect);
importer.setMatrix (drawMatrix);
importer.draw();
Next, we compress the image that was drawn into the offscreen QDGraphics:
CompressedFrameInfo cfInfo = seq.compressFrame (gw,
    gRect,
    StdQTConstants.codecFlagUpdatePrevious,
    compressedImage);
The arguments to this call are:
QDGraphics src: the source image to compress.
QDRect rect: what portion of that image to use.
int flags: behavior flags. Among the most useful is codecFlagUpdatePrevious, which is used for codecs that use temporal compression. Another interesting option not needed here is codecFlagLiveGrab, which you'd use if you were generating images from a live source, possibly image capture, and needed to compress frames as quickly as possible. In the typical QuickTime style, the desired flags are mathematically OR'd together.
RawEncodedImage data: the RawEncodedImage into which the compressed frame will be written. This is the object we made sure was big enough with that getMaxCompressionSize() call.
The compressFrame call returns a CompressedFrameInfo object, which has an important method called getSimilarity(). This value represents how similar the compressed image is to the one compressed just before it. A value of 255 means the images are identical. 0 means the compressed frame is to be a "key frame," meaning it has all the image data it needs, it does not depend on other frames, and other frames may depend on it. Other values simply represent image difference, where low values mean low similarity.
With the frame now compressed into the RawEncodedImage, we can add a sample to the VideoMedia, with the addSample() method inherited from the Media class:
videoMedia.addSample (imageHandle,
    0,
    cfInfo.getDataSize(),
    20,
    imgDesc,
    1,
    (syncSample ? 0 : StdQTConstants.mediaSampleNotSync));
The arguments to this method are:
QTHandleRef data: a reference to the sample data; in this case, to the imageHandle that backs our RawEncodedImage.
int dataOffset: an offset into the data. This is 0 in our case, since we're using all of the RawEncodedImage that was populated by compressFrame.
int dataSize: the number of bytes of data, starting at dataOffset, to use. Again, we're using the whole compressed frame, whose size comes from cfInfo.getDataSize().
int durationPerSample: how long this sample lasts, expressed in units of the media's timescale. Since our timescale is 600, a duration of 20 equals 1/30th of a second.
SampleDescription sampleDesc: an object that tells the media what to do with the sample data being passed in. This is why we got an ImageDescription from the CSequence earlier.
int numberOfSamples: the number of samples provided by this call. For video, this is typically one frame. For other kinds of media, there are some performance considerations described in the native docs.
int sampleFlags: behavior flags. The interesting value here is whether or not this is a "key frame," also known in QuickTime as a "sync sample." We set the mediaSampleNotSync flag if our earlier call to CompressedFrameInfo.getSimilarity() returned non-zero. Note that failing to set this flag correctly is a popular cause of movies that "blur" when scrubbed or played from any point other than the first frame, as explained in an Apple tech note.
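Two of these arguments involve small calculations that are worth spelling out: the per-sample duration and the sync-sample flag. The sketch below shows both in plain Java. The SampleBookkeeping class is a hypothetical helper of mine, and MEDIA_SAMPLE_NOT_SYNC is a local stand-in whose value of 1 is my assumption about the native constant; a real program should use StdQTConstants.mediaSampleNotSync instead.

```java
/** Hypothetical helper for the duration and flag arguments passed to addSample. */
class SampleBookkeeping {
    // Assumption: local stand-in for StdQTConstants.mediaSampleNotSync
    static final int MEDIA_SAMPLE_NOT_SYNC = 1;

    /** Duration of one frame in media timescale units, e.g. 600 / 30 = 20. */
    static int durationForFrameRate(int timeScale, int framesPerSecond) {
        if (framesPerSecond <= 0 || timeScale % framesPerSecond != 0) {
            throw new IllegalArgumentException(
                "timescale " + timeScale + " is not evenly divisible by " + framesPerSecond);
        }
        return timeScale / framesPerSecond;
    }

    /** A similarity of 0 marks a key frame (sync sample); anything else does not. */
    static boolean isSyncSample(int similarity) {
        return similarity == 0;
    }

    /** Flags for addSample: 0 for a sync sample, the not-sync flag otherwise. */
    static int sampleFlags(int similarity) {
        return isSyncSample(similarity) ? 0 : MEDIA_SAMPLE_NOT_SYNC;
    }
}
```

The divisibility check reflects why 600 is a convenient timescale: it divides evenly by common frame rates such as 24, 25, and 30.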
Once the loop finishes, we do the same clean-up tasks as with the text-track samples in Part 1 -- declare that we're done editing and insert the media into the video track:
videoMedia.endEdits();
videoTrack.insertMedia (0, // trackStart
    0, // mediaTime
    videoMedia.getDuration(), // mediaDuration
    1); // mediaRate
Finally, we save the movie to disk, exactly as before.
Here, for those with QuickTime 5 or 6, is a videotrack.mov movie produced by the sample code. If you recompile and re-run the code with different codecs and different sizes, you'll see some fairly dramatic differences in file size and image quality. I've used 160x120 to keep the file size small, in order to avoid abusing O'Reilly's bandwidth, and the compression artifacts here are more visible than in the 320x240 version.
Also remember that while we just copied a scaled section of an image into the offscreen buffer, you can do any kind of imaging with this buffer before compressing it into a frame. For example, you could use the drawing commands in the QDGraphics class, or use the QTImageDrawer to draw into the QuickTime world with Java 2D Graphics methods. With some bit-munging, you might even find a way to render 3D graphics from JOGL into QuickTime ... anyone up for rendering Finding Nemo directly into a QuickTime movie?
This completes our tour of QuickTime media structures, in which we've gone from the high-level view of what makes up a movie to the low-level mucking around with individual samples. This is a little "closer to the metal" than QTJ usually requires, but if you believe in keeping simple tasks easy and complex tasks possible, this has been an example of the latter.
Chris Adamson is an author, editor, and developer specializing in iPhone and Mac.
Copyright © 2009 O'Reilly Media, Inc.