A number of folks have written asking for more information on the process of making screencasts. Nowadays, my weapon of choice is Camtasia Studio. (Perhaps I should make a screencast showing how I use the product. But like the demonstration of Windows Media Encoder 9 I did in the February column, a Camtasia demo would be of interest only to Windows users. And since Camtasia costs $300 more than the free Windows Media Encoder, the appeal of such a tutorial would be even more limited.)
I've also been exploring different--and even costlier--options of late. A major challenge has been to synchronize audio narration with screen activity. So I picked up a copy of Adobe Premiere Pro and, in search of more precise control over my narrated videos, I've begun to climb its learning curve.
In general, screencasting is a three-step process: capture of audio and video, editing, and production of a compressed deliverable. Camtasia combines all three functions in a single, integrated application, but in principle they're separable. I can imagine using Camtasia (or an equivalent) for capture, Premiere (or an equivalent) for editing, and Camtasia (or an equivalent) to produce a compressed .SWF file.
All this, of course, is purely academic if you don't run Windows or aren't in a position to license commercial software in order to make screencasts. So in this column I'll focus on basic strategies that transcend specific tools. If you're on Windows, you might be using free stuff: Windows Media Encoder, or WME, for capture and Windows Movie Maker for editing. I haven't found a free capture tool for the Mac, but I have used WME to record Mac screen activity by way of a remote VNC session. For native video capture on the Mac I'm told that the relatively inexpensive Snapz Pro works well. Of course, iMovie, bundled with Mac OS X, is a capable low-end video editor. On either platform (and on Linux as well), Audacity is the cheapskate's weapon of choice for recording and editing audio narration.
In these scenarios you'll wind up producing either Windows Media (.WMV) or QuickTime (.MOV) files. Neither affords the simplicity of Flash (.SWF), the most universally accessible delivery format. But that may not matter crucially. Both Windows Media and QuickTime can yield compact, progressively downloadable files. And in each case there are freely available, cross-platform options: Windows Media Player for Mac OS X and QuickTime for Windows. Here's one way to think about the tradeoffs. If people like your screencasts so much that they demand more seamless playback, that's a good problem to have. You'll know that an investment in tools is justified. Meanwhile, focus on creating compelling content. To that end, here are some of the guidelines I've developed for myself.
Although you can capture your entire screen, you almost certainly don't want to. Even with the best compression, output files can weigh in at well over one megabyte per minute. Extraneous screen real estate is just costly overhead. And if the captured screen is larger than the playback window, things get really awkward for the viewer. Use the same rules that guide your delivery of any other kind of web content. In my case, I've concluded that 1024 by 768 is the hard limit, but if I can tell the story in 800 by 600, that's even better.
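The back-of-envelope math here is worth making explicit. The figures below are illustrative, not measurements from any particular codec, but they show why capture area is the first thing to economize on:

```python
# Rough screencast size arithmetic (illustrative numbers, not codec measurements).

def megabytes_per_minute(kbps: float) -> float:
    """Convert a video bitrate in kilobits/second to megabytes/minute."""
    return kbps * 1000 / 8 * 60 / 1_000_000

# Even a modest 150 kbps screen-capture stream exceeds a megabyte per minute:
print(megabytes_per_minute(150))  # → 1.125

# Pixel budget: dropping from 1024x768 to 800x600 cuts the captured area --
# and, roughly, the encoding cost -- by about 39 percent.
full_area, small_area = 1024 * 768, 800 * 600
print(round(1 - small_area / full_area, 2))  # → 0.39
```

The second calculation is the reason "if I can tell the story in 800 by 600, that's even better": the savings compound over every frame of the capture.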
Whatever you decide, be sure to factor in the player's scrollbars and buttons. Screencasts are literal, pixel-for-pixel representations of screen activity, so a captured browser window played back at full size can be indistinguishable from the real thing. When what you thought was just a web page starts working autonomously--and talking to you as it does so--it can be quite alarming!
It may sometimes be necessary to maximize the window containing your subject application, but avoid that if you can. Usually, I find it's possible to size the window smaller. Beyond shrinking the output file and averting playback conflicts, this can be a great way to tighten the visual focus and thus sharpen the impact of the screencast. In order to maintain focus, you may need to pan around inside that smaller window. That's OK. You can leave those transitions on the cutting room floor.
Here's a principle that also applies to ordinary static screenshots: Lose all unnecessary chrome. If your subject application is running in a browser, viewers probably don't need to see the title bar, toolbar, status bar, or scrollbars. The address window is relevant if you'll refer to the URLs displayed in it, otherwise not. Similarly, the linkbar is relevant if you're demonstrating bookmarklets, otherwise not. In general, whatever doesn't help you tell your story is just baggage. Dump it and focus on the story.
When the subject involves multiple applications, and/or multiple windows popping up within a single application, you'll want to set your capture tool for a rectangular region of the screen rather than for a specific window. Then run through the sequence, sizing everything to fit inside that rectangle.
When you've got a short story to tell, a single scene may suffice. You can do a lot in ninety seconds of narrated video. You might need a couple of takes, but you can probably create something that's directly usable without requiring post-production. As you attempt longer and more complex screencasts, though, it gets harder to avoid editing.
If you don't have a video editor that's compatible with your capture tool, clearly you won't be doing any editing at all. That needn't be a showstopper, though. You can tell a story in scenes by creating a series of short screencasts and presenting them on a web page.
If you do have a video editor, it's tempting to capture an entire session in a single pass. But even in that case it's probably a good idea to capture a series of modular chunks. Just because you can carve scenes from a single large file doesn't mean you should. Working a scene at a time can help you think about each scene's role in the larger production. And depending on your tools and work style, it may be more convenient to combine a set of small clips than to subdivide a single large one.
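One free, cross-platform way to combine modular clips is ffmpeg's concat demuxer. The sketch below only constructs the command and its clip-list file (the clip names are hypothetical); actually running it requires a reasonably recent ffmpeg build:

```python
# Sketch: stitching a set of small clips into one file with ffmpeg's
# concat demuxer. We build the command rather than run it; the clip
# filenames are made up for illustration.
from pathlib import Path

def concat_command(clips: list[str], output: str, list_file: str = "clips.txt") -> list[str]:
    """Write the concat list file and return the ffmpeg command to run."""
    Path(list_file).write_text("".join(f"file '{c}'\n" for c in clips))
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]

cmd = concat_command(["scene1.avi", "scene2.avi", "scene3.avi"], "screencast.avi")
print(" ".join(cmd))
```

Because `-c copy` concatenates without re-encoding, the clips must share the same codec and dimensions, which is one more argument for deciding on your capture rectangle before recording the first scene.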
Note that multiple takes can be challenging when the plot involves state-changing interactions. If you visit a link in the browser, for example, it's going to be a different color in the next take--unless you clear your browser's memory of visited links between takes. When I made the del.icio.us screencast, I kept having to remove items I'd added to del.icio.us, so that I could add them again. And of course some actions are irreversible, like creating a New York Times account as seen in the single sign-on screencast. There's no general solution to this problem; I just strive for as much continuity as time and circumstances will permit.
Capture tools record every little detail of software interaction. A lot of this micro-behavior--wandering around with the mouse pointer, noodling with menus, moving and resizing windows--is (or should be) of great interest to interaction designers. It's quite instructive to see how you're constantly engaged in learning and self-correction, even while driving an application you think you know well. But unless the point of your story is to reveal these low-level and largely unconscious activities, nobody needs or wants to see them. If you ditch the false starts and random hovering, you'll accelerate and intensify your scene.
If you're doing single unedited scenes, the only way to eliminate this wasted motion is to run through a series of practice takes, observing and refining your micro-technique. If you're editing scenes, though, you can cut out the slack, and I recommend that you do. For example, it might take six seconds to type something into a text field. Slicing out the middle four seconds might seem pointless, but it's easily done and the effects really add up.
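The arithmetic behind this tightening is simple but worth seeing. Given the segments of a raw scene you decide to keep, the edited length is just their sum; the timings below (apart from the six-second typing shot) are invented for illustration:

```python
def tightened_length(keep: list[tuple[float, float]]) -> float:
    """Total duration in seconds of the kept (start, end) segments."""
    return sum(end - start for start, end in keep)

# The six-second typing shot, keeping only the first and last second:
typing = [(0.0, 1.0), (5.0, 6.0)]
print(tightened_length(typing))  # → 2.0

# Across a whole scene the savings add up (segment timings invented):
scene = [(0.0, 95.0), (101.0, 230.0), (236.0, 312.0)]
print(480.0 - tightened_length(scene))  # seconds trimmed from an 8-minute raw capture → 180.0
```

A handful of four-second slices per minute is all it takes to turn an eight-minute capture into a five-minute screencast.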
With this kind of tightening, you can squeeze an eight-minute screencast down to five minutes. That's the difference between one that people will think is too long, and one they'll judge to be just right. Note that this isn't simply a question of whether viewers can spare the extra three minutes. If the five-minute version preserves all of the essential action, it will be far more impactful than the equivalent eight-minute version.
Composing the audio narration and synchronizing it with the video is, for me, the hardest part of the job. If you have prior experience with voice recording--I didn't--that should help. But even so you're likely to find that syncing your voice with the action onscreen is a real challenge.
For short unedited scenes, you can do multiple takes until you get it right, or as close to right as is possible. For longer productions, though, I've adopted a very different work style. Initially I don't even try to narrate the scenes; I just capture them as video, from which I trim all the fat. Then I dictate the audio for each scene in short segments. I save these sound clips in files, load them into the video editor, and arrange them to coincide with the onscreen action.
What happens next is a kind of two-way negotiation between the video and audio tracks. In some cases I'll extend a frame of video to cover a crucial bit of narration. In other cases I'll rerecord a snippet of audio so that it covers some crucial action onscreen. It's tedious to trade files between Audacity and a video editor, and that's one reason I'm investigating more robust video editors with fully integrated audio editing. But the shoestring approach is the only one I've used so far, and clearly it's viable.
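That two-way negotiation boils down to comparing durations. A sketch, with wholly invented scene and clip timings: for each scene, positive slack means the narration fits; negative slack means you either extend a video frame or re-record a shorter audio snippet.

```python
def fit_narration(scenes: list[tuple[float, float]], clip_lengths: list[float]) -> list[float]:
    """For each (scene_start, scene_length) and its narration clip's length,
    return the slack in seconds: positive fits, negative overruns."""
    return [scene_len - clip_len
            for (_, scene_len), clip_len in zip(scenes, clip_lengths)]

scenes = [(0.0, 12.0), (12.0, 9.0), (21.0, 15.0)]  # hypothetical scene timings
clips = [11.5, 10.0, 14.0]                          # hypothetical narration lengths
print(fit_narration(scenes, clips))  # → [0.5, -1.0, 1.0]
```

Here the second scene overruns by a second, so it's the one that needs either a frozen frame of video or a tighter retake of the narration.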
Of course this approach presumes that you're the sole narrator. That's not always true. Some of the screencasts I've done are conversations recorded over remote screensharing sessions. Those situations present fewer editing options. I usually record the audio and video tracks together, with the screen capture tool's built-in audio recorder, and then edit them as a single track. It's quick and easy to excise unwanted segments. It's also possible, I'm sure, to pull the tracks apart, edit them separately, and resynchronize them. But you'd need a really good reason to try.
It's exciting to make a screencast, and you'll want to share it with the world right away. But first watch it carefully, from beginning to end, more than once. Continuity problems can creep in during the editing process. There's also a real danger of exposing confidential data--either on your own computer or, if you're recording a remote session, someone else's.
When I made the JotSpot screencast, for example, I noticed only after publishing it that Joe Kraus had inadvertently revealed his cell phone number while demonstrating JotSpot's email integration feature. I immediately published a new version that omitted that detail, but it was a scary moment.
If I incorporate a more powerful video editor into my routine, I'll report back on that in a future column. Meanwhile, though, I'm curious to see what else can be done with free and inexpensive tools. I'm also eager for new ones to emerge. It would be great to have a screencasting equivalent to Audacity: free and open source, cross-platform, capable of handling all the basics.
One of the advantages of coining a word is that you can track the progress of its associated meme. Last fall, in collaboration with readers of my blog, I settled on the word screencast. A couple of months ago it drew 200 Google hits; today the number is 60,000. Screencasting may never have the mainstream appeal of podcasting, a word coined not long before that now draws 8 million Google hits. But the meme is spreading and I can't wait to see where it goes next.
Jon Udell is an author, information architect, software developer, and new media innovator.
Copyright © 2009 O'Reilly Media, Inc.