Visual Basic for Applications (VBA), the language used for scripting Microsoft
Word, isn't really known for its string-processing capabilities. You won't even
find regular expression support in VBA (though you can bolt it on by referencing
RegExp object, something I discuss in my book, Word Hacks). But
sometimes some string hacking is a quick and convenient way to solve a problem,
and the string functions VBA does provide are often up to the task.
I often write macros that look for paragraphs styled with one of several different
styles, such as any paragraph that's styled as Heading 1, List
Bullet, or Body Text Indent 3. A traditional approach to
this would be to build a simple array of those three styles, and then test
each paragraph's style for membership in the array, as the following code shows.
Note that an underscore (
_ ) is the line-continuation character in
VBA, indicating an optional line break.
Sub LookForSomeParas() Dim vParasToFind() As Variant Dim k As Integer Dim bIsInList As Boolean Dim para As Paragraph vParasToFind = Array("Heading 1", _ "List Bullet", _ "Body Text Indent 3") For Each para In ActiveDocument.Paragraphs bIsInList = False For k = 0 To UBound(vParasToFind) If para.Style = vParasToFind(k) Then bIsInList = True Exit For End If Next k If bIsInList Then ' Do stuff to paragraph End If Next para End Sub
A different method, and one that'll have you typing less than half the lines of code, uses a string to store the list of styles, as this snippet shows:
Dim sParasToFind as String sParasToFind = _ "/Heading 1/List Bullet/Body Text Indent 3/"
To use this kind of list, a macro would visit each paragraph, then check the
list to see whether the paragraph's style is in it, using the built-in VBA
function. If the string doesn't appear in this string list,
To be sure you don't get a false match when one style's name is actually part of another's (accidentally flagging Body Text when you're looking for Body Text Indent 3, for instance), a delimiter is included to mark the beginning and end of each entry in the string list.
Here's the complete macro:
Sub LookForSomeParasUsingAStringList() Dim sParasToFind As String Dim para As Paragraph sParasToFind = _ "/Heading 1/List Bullet/Body Text Indent 3/" For Each para In ActiveDocument.Paragraphs If InStr(sParasToFind, _ "/" & para.Style & "/") <> 0 Then ' Do stuff to para here End If Next para End Sub
While this may not be the best choice in every situation, it's a handy hack for quick-and-dirty list-membership tests, and as a bonus, it's more than twice as fast as the previous method.
I've used a slash (
/ ) as the delimiter in this example, because
it seems like a logical choice that most people can quickly recognize as a
separator. But in the case of testing names of Word styles, it's actually not
the best choice. Why? Well,
/ is a perfectly valid character for use in a style
name, which could cause incorrect results for the membership test. When using
this technique, it's best to choose a character that won't appear anywhere
in any of the entries in the list, or any of the items you'll be
checking for membership in that list. A better choice for this particular example
would be a semicolon, which Word doesn't allow as a style name. Making that
change results in the following:
Sub LookForSomeParasUsingString() Dim sParasToFind As String Dim para As Paragraph sParasToFind = _ ";Heading 1;List Bullet;Body Text Indent 3;" For Each para In ActiveDocument.Paragraphs If InStr(sParasToFind, _ ";" & para.Style & ";") <> 0 Then ' Do stuff to para here End If Next para End Sub
These string lists are useful in other situations, and I'll come back to them in the last section of this article. The next section discusses how to make up for the string-processing shortcomings of Word on the Macintosh and in Word 97.
Anyone who's tried to develop VBA macros for use with Word for Macintosh knows that it's a very different landscape than Word for Windows. It can be maddening to work around all the minor (and major) bugs and glitches in VBA on Word for the Macintosh. But at a company like O'Reilly, unlike in a more rigid corporate environment, we need to be able to accommodate authors and editors working on a variety of platforms and a variety of versions: Word 97, 2000, 2002, and 2003 for Windows; Word X for Mac OS X; and for a few users, even Word on Linux, using Crossover Office.
One of the biggest gotchas when working with Word for the Mac (and Word 97 for Windows, for that matter) is working around some conspicuously absent string functions.
With Word 2000, Microsoft included several important string-processing functions
with VBA, including two very useful ones common in other scripting languages,
Split turns a string into an array, separating the string
at a given delimiter, by default a space.
concatenating an array of strings into a single string and separating the smaller
strings with a given delimiter, also a space by default.
But now it’s 2004, and Word VBA on a Macintosh still doesn’t include those functions. And of course, anyone still plugging away on Word 97 (I know more than a few) can’t run macros that use those functions either.
In addition to being invaluable tools for general use in writing Word macros, these functions are an important part of one solution to another Mac VBA problem, discussed in the next section, along with those string lists I discussed above.
As an example of how handy that
Split function can be, let’s take a
look at style aliases in a document. A style alias is an additional name you
can give to a style. For example, given the style Heading 1,
you could add an alias, h1, so that you can just type h1 into
the Styles pull-down menu on the Formatting toolbar to apply the Heading 1 style.
In that case, the Heading 1 style would be listed as Heading 1,h1.
Style aliases are added to a style name by putting a comma at the end of the style name, followed by an alias. You can even add multiple aliases, such as Heading 1,HeadA,h1.
It’s often helpful to remove the aliases from all the styles in a document,
particularly before exporting the document to another program, such as QuarkXPress
or FrameMaker, which may not interpret them correctly. The
Split function makes
it very easy to do, as the following macro shows:
Sub RemoveAllStyleAliases Dim sty As Style For Each sty In ActiveDocument.Styles sty.NameLocal = Split(sty.NameLocal, ",")(0) Next sty End Sub
The above code uses the
Split function to get only the part of the style name
before the first comma (which is the first element in the array returned by
Split, accessed by its offset,
0), and it’s a handy utility function
to include in a distributed template. But if some of those people you’re distributing
it to are running Word on a Mac or are still on Word 97, they’ll be
greeted with compilation errors when they try to use the macro.
So what’s the solution? Create your own versions of those important
string functions. Here’s one version of a
Split function written using
VBA that will work on a Mac or with Word 97:
Function Split(ByVal strIn As String, _ Optional ByVal strDelim As String = " ", _ Optional ByVal lCount As Long = -1) _ As Variant Dim vOut() As Variant Dim strSubString As String Dim k As Integer Dim lDelimPos As Long k = 0 lDelimPos = InStr(strIn, strDelim) Do While (lDelimPos) ' Get everything to the left of the delimiter strSubString = Left(strIn, lDelimPos - 1) ' Make the return array one element larger ReDim Preserve vOut(k) ' Add the new element vOut(k) = strSubString k = k + 1 If lCount <> -1 And k = lCount Then Split = vOut Exit Function End If ' Only interested in what's right of delimiter strIn = Right(strIn, (Len(strIn) - _ (lDelimPos + Len(strDelim) - 1))) ' See if delimiter occurs again lDelimPos = InStr(strIn, strDelim) Loop ' No more delimiters in string. ' Add what's left as last element ReDim Preserve vOut(k) vOut(k) = strIn Split = vOut End Function
And here’s a
Join function, again written using code that will work
with Word on a Mac or with Word 97:
Function Join(ByVal vIn As Variant, _ Optional ByVal strDelim As String = " ") _ As String Dim strOut As String Dim k As Long Dim lUpperBound As Long lUpperBound = UBound(vIn) For k = LBound(vIn) To (lUpperBound - 1) strOut = strOut & vIn(k) & strDelim Next k ' Don't want to add delimiter after last element strOut = strOut & vIn(lUpperBound) Join = strOut End Function
Fortunately, you don’t really have to write all of your own from scratch.
Several web sites offer versions of these and other useful VB6 string functions
InStrRev) that you can use in your own macros. One such
site is http://www.freevbcode.com/ShowCode.asp?ID=17.
Note that if you want to use the string functions from that particular site
with Word on a Mac, you’ll need to remove the
Optional bCompare argument,
and all the related code, from each one that uses it.
Split function also comes in handy when working with string lists, as
These string lists act like pseudo-arrays, without the extra code needed to build up a proper array. The lists still behave a lot like arrays, and can even be accessed by offset. For example, given the list:
sBunnies = "/flopsy/mopsy/cottontail/"
you can get to
mopsy by using the
Split function, as the following snippet
MsgBox Split(Mid(sBunnies, _ 2, Len(sBunnies) - 2), "/")(1)
Mid function is needed here to trim off the leading and trailing
sBunnies. Once that's done, you can just use
Split to get at the elements of
the pseudo-array by offset. And once you've used the
Split function, your list
actually becomes a "real" array, because that's what
variant of type
Array. Just remember that the array returned by
has an offset of 0.
Like many distributed custom Word templates, the one we use at O'Reilly includes a significant amount of validation code. These procedures are used to help ensure that authors stick to a certain group of styles, to control which styles are available based on which book series the author is writing for, and to control certain template features, like context-sensitive formatting, which I discuss in my book.
All of this validation means that the template needs to "know" quite
a bit about which styles meet certain criteria. That information isn't part of the built-in Word Styles collection, of course, so it's stored in a separate,
custom-built class. The class is created dynamically when it's needed, using information
stored in a
Collection, kind of like the following simplified example:
Sub CollectionDemo() Dim colStyles As Collection Dim col As Collection Set colStyles = New Collection Set col = New Collection col.Add Key:="AllowedInTemplate", Item:=True col.Add Key:="UsesSmartQuotes", Item:=True colStyles.Add Key:="Heading 1", Item:=col MsgBox colStyles("Heading 1")("UsesSmartQuotes") ' Above MsgBox Displays "True" End Sub
The styles collection is then used to initialize the custom class, which allows other macros in the template to query that class for information on a particular style, such as whether paragraphs using that style should use "smart" (curly) or "straight" quotation marks.
When the custom class needs to contain information on the 100 or so styles used in the template, and each style has a dozen or so of these custom properties, it requires a very lengthy collection assignment statement, on the order of 1,000 lines or more. Not too much fun to type out, but Word VBA for Windows handles it just fine. But try a collection assignment that long on a Mac, and you get a cryptic error message. As best I could deduce, there was a limit to the number of lines of code I could include in a single procedure with VBA on a Mac. Breaking up the single, massive collection assignment into several smaller routines eliminated the error, but it felt like a kludge.
The fix I finally settled on—perhaps still a kludge, but at least a more interesting one—was to use a string list, like the ones described in the first section, "Using Strings for Simple Lists." All of the data about the styles is kept in a single string list, like this:
Dim sList As String sList = sList & ";;Heading 1;;True;;True;;" sList = sList & ";;Heading 2;;True;;False;;" sList = sList & ";;Body Text;;False;;True;;"
Note that when presented like this, our string list begins to strongly resemble a table, which is the goal.
With this method, the data about each style takes up just one line of code,
instead of the several needed to add it to the collection directly, like in
CollectionDemo example above. So how do you get the data from a "table" like
this into a collection? By parsing it with the
Split function, as shown in
the following macro:
Sub StringListToLoadCollectionDemo() Dim sDataTable As String Dim colStyles As Collection Dim col As Collection Dim vDataRows As Variant Dim v As Variant Dim vDataFields As Variant Set colStyles = New Collection ' Only need one line of code needed per style sDataTable = sDataTable & _ ";;Heading 1;;True;;True;;" sDataTable = sDataTable & _ ";;Heading 2;;True;;False;;" sDataTable = sDataTable & _ ";;Body Text;;False;;True;;" ' All the code below stays the same, ' regardless of the number ' of syles added to the "data table" above. ' Remove the leading and trailing ";;" ' from the big string, otherwise first ' and last row of the "data table" will ' have one extraneous pair of semicolons, ' either at beginning or end sDataTable = Mid(sDataTable, 3, _ Len(sDataTable) - 4) ' Parse the "data table" vDataRows = Split(sDataTable, ";;;;") For Each v In vDataRows vDataFields = Split(v, ";;") Set col = New Collection col.Add Key:="AllowedInTemplate", _ Item:=vDataFields(1) col.Add Key:="UsesSmartQuotes", _ Item:=vDataFields(2) colStyles.Add Key:=vDataFields(0), _ Item:=col Set col = Nothing Next v MsgBox colStyles("Heading 1")("UsesSmartQuotes") ' Above MsgBox Displays "True" End Sub
With this method, adding a new "field" of data about a particular
style requires adding only one line of code, within the collection-assignment
For Each loop. So as was the case with the string lists discussed in the first
method will save you some typing. Note also that, as with the bunny rabbit
example in the previous section, you need to trim off the leading
and trailing separator.
This certainly isn't anyone's idea of an ideal solution, but it is a quick
and dirty way to compress a lengthy collection assignment to the point that
Word on a Mac will at least process it. The big irony here is that the key
to the fix here, the
Split function, isn't included by default in Word VBA
on a Mac.
While string processing with VBA probably won't ever be as good as some other (let's face it—most other) scripting languages, the string functions VBA does offer can sometimes provide interesting ways to reduce coding time, improve code performance, and even present solutions to cross-platform development obstacles.
Andrew Savikas is the CEO at Safari Books Online. He sits on several boards, including the Book Industry Study Group, and is on advisory boards for Bookshare and the University of Michigan Press.
Previously, Andrew led the digital publishing and ebook program and strategy for O'Reilly Media as VP of Digital Initiatives, and was a Program Chair for the Tools of Change for Publishing conference.
Return to the Windows DevCenter
Copyright © 2009 O'Reilly Media, Inc.