- UniCycling in Vim
- XMLUnicoding in Emacs
A while back I tried out a Unicode-aware editor called Mined. I found out about it from reading Ed Trager’s A Quick Primer On Unicode and Software Internationalization Under Linux and UNIX, which I wrote a short item about earlier.
One thing I really liked about Mined was its support for automatically inserting “smart quotes” while you type. When I saw it, I thought: Hey, wouldn’t it be nice if my main editor of choice, Vim, let me do that? Well, now it does…
UniCycling in Vim
It’s called UniCycle because it cycles through different Unicode characters as you type them. It’s similar to the “Smart Quote” feature in Word, except it’s easier to get back to a dumb quote if that’s what you really want: just hit the quote key again and it’ll cycle to the next character.
It works with hyphens (turning them into en and em dashes), periods (turning them into horizontal ellipses), apostrophes (turning them into left or right single quotation marks), and quotes (turning them into left or right double quotation marks).
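Here is a minimal Python sketch of the cycling behavior described above. It is not UniCycle’s actual code, since UniCycle is a Vim script. The names `CYCLES`, `type_key`, and `type_keys` are hypothetical. Only the double-quote order (left, straight, right) comes from this post. The apostrophe and hyphen orders are assumptions, and the sketch ignores any surrounding context the real plugin might take into account.

```python
# Minimal sketch of "cycle on repeated key press" input, loosely modelled on
# UniCycle's described behavior. Not the plugin's real code: the cycle orders
# for "'" and "-" are assumptions, and no surrounding context is considered.

CYCLES = {
    '"': ["\u201c", '"', "\u201d"],  # LEFT DOUBLE, straight, RIGHT DOUBLE
    "'": ["\u2018", "'", "\u2019"],  # assumed order: left single, straight, right single
    "-": ["-", "\u2013", "\u2014"],  # assumed order: hyphen, en dash, em dash
}
ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS


def type_key(buffer: str, key: str) -> str:
    """Return the buffer after typing one key at its end (hypothetical helper)."""
    cycle = CYCLES.get(key)
    if cycle is not None:
        if buffer and buffer[-1] in cycle:
            # Same key pressed again: swap the last character for the next
            # entry in its cycle, wrapping around after the last one.
            nxt = cycle[(cycle.index(buffer[-1]) + 1) % len(cycle)]
            return buffer[:-1] + nxt
        # First press: insert the first character of the cycle.
        return buffer + cycle[0]
    buffer += key
    if buffer.endswith("..."):
        # Three periods in a row collapse into a single ellipsis character.
        buffer = buffer[:-3] + ELLIPSIS
    return buffer


def type_keys(keys: str) -> str:
    """Type a whole string key by key, starting from an empty buffer."""
    buffer = ""
    for key in keys:
        buffer = type_key(buffer, key)
    return buffer
```

For example, typing `"` three times in a row leaves a single RIGHT DOUBLE QUOTATION MARK, and a fourth press wraps back to the left one. Because each cycle is one list, adding another family of characters is just a matter of adding a dictionary entry.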
How to install UniCycle
The script now has its own page at the Vim.org site, and you can download it from there. As with other Vim scripts, you can install it just by dropping it into your Vim plugin directory, which is ~/.vim/plugin by default. If you don’t know whether you have such a directory, you don’t need to check; you can just do a quick install like this:
mkdir -p ~/.vim/plugin && cd ~/.vim/plugin && wget -O unicycle.vim 'http://www.vim.org/scripts/download_script.php?src_id=4689'
Some short (optional) config
There are no additional install or config steps required: once you’ve installed UniCycle in your ~/.vim/plugin directory, Vim will load it automatically each time it starts up. That said, there are a couple of things you might want to add to your ~/.vimrc file to make UniCycle work better.
" Turn UniCycle on by default for all XML and XSLT files
autocmd FileType xml,xslt UniCycleOn
" Make the Vim command line 2 lines high so that we can see secret
" messages emitted by UniCycle
set cmdheight=2
As for the cmdheight=2 part, I’ll say more about that in a minute.
Make sure vim starts in a UTF-8 environment
Before you start up Vim and give UniCycle a try, make sure to launch it in a UTF-8-ready way. Otherwise, it’s not going to work the way you would expect.
There are a couple of ways to launch Vim in a UTF-8-ready way:
A. Gvim way
Use gvim instead of vim, and start it up like this:
That will launch Gvim in a separate X-Window and you’ll be all ready to go.
B. Unicode X-terminal way
Start up a Unicode-enabled terminal such as mlterm or xterm, and then run the vim command there. For example, this starts xterm in UTF-8 mode with a Unicode-capable font:
LC_CTYPE=en_US.UTF-8 xterm -u8 -fn '-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1'
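Before launching Vim, you may want to confirm that your environment actually selects a UTF-8 locale. The following Python sketch applies the POSIX rule for the character-type category: the first non-empty value among LC_ALL, LC_CTYPE, and LANG wins, and the default is the C locale. The function names are hypothetical. The sketch only inspects environment variable names, so it cannot tell whether that locale is installed or whether your terminal itself is in UTF-8 mode. Terminal mode is what xterm’s -u8 option controls.

```python
import os
from typing import Mapping, Optional

# POSIX order for the character-type category: the first non-empty value
# among these variables decides the locale; if none is set, it is "C".
CTYPE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def ctype_locale(env: Mapping[str, str]) -> str:
    """Return the locale name LC_CTYPE would resolve to (hypothetical helper)."""
    for var in CTYPE_VARS:
        value = env.get(var)
        if value:
            return value
    return "C"


def is_utf8_locale(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if the resolved locale name has a UTF-8 codeset, e.g. en_US.UTF-8."""
    if env is None:
        env = os.environ
    name = ctype_locale(env)
    # The codeset is the part after "." and before any "@modifier".
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    print("UTF-8 locale" if is_utf8_locale() else "not a UTF-8 locale")
```

Run as a script, it reports on the current environment. In a shell started with LC_CTYPE=en_US.UTF-8 and no LC_ALL overriding it, it should report a UTF-8 locale.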
After you’ve started Gvim (or Vim in your Unicode-enabled terminal), open a new or existing *.xsl file, hit i to get into insert mode, and type a quotation mark (") character. If you did the “Some short (optional) config” step above, you should now see a curly left quotation mark. You should also see a message in the Vim “command-line” (at the bottom of the frame) saying “LEFT DOUBLE QUOTATION MARK”. That’s what the cmdheight=2 line in your ~/.vimrc file is for: it expands the command line so that you can see these messages. Hit " again, and you’ll see a message saying just “QUOTATION MARK”. Hit it one more time, and you’ll see “RIGHT DOUBLE QUOTATION MARK”.
If that all works as expected, try typing an apostrophe or dash. You’ll see that Vim (uni)cycles through character choices for them just as it does for the quotation mark. Then try typing three periods in a row, and you’ll see Vim replace them with a real ellipsis character.
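The messages UniCycle shows appear to be the characters’ official Unicode names. Python’s standard unicodedata module can look up the code point that goes with each name. This short sketch lists the characters this post says UniCycle produces (the `describe` helper is hypothetical).

```python
import unicodedata

# Characters this post says UniCycle can produce.
CHARS = ["\u201c", '"', "\u201d", "\u2018", "\u2019", "\u2013", "\u2014", "\u2026"]


def describe(ch: str) -> str:
    """Format a character as its code point plus its official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in CHARS:
    print(describe(ch))
```

The first three lines it prints are U+201C LEFT DOUBLE QUOTATION MARK, U+0022 QUOTATION MARK, and U+201D RIGHT DOUBLE QUOTATION MARK. These match the three messages you see when pressing the quote key repeatedly.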
If you didn’t do the “Some short (optional) config” step above, or if you did but you’re still not seeing the behavior described above, manually type the :UniCycleOn and :set cmdheight=2 commands and then try again.
You should now see the “LEFT DOUBLE QUOTATION MARK” message. But if you see weird boxes, spaces, or garbage characters where you expect curly quotation marks, it probably means you are not actually running Vim in a UTF-8-ready way.
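The garbage-character symptom has a simple cause. A curly quote is a single character but takes three bytes in UTF-8, and software that assumes a single-byte encoding shows each byte as a separate character. This Python sketch reproduces the effect, using Windows-1252 purely as an example of such an encoding. What you actually see depends on your terminal and font.

```python
# A curly left quote is one character, but three bytes once encoded as UTF-8.
left_quote = "\u201c"
raw = left_quote.encode("utf-8")

# A program that assumes a single-byte encoding (Windows-1252 here, as one
# example) shows each of those bytes as a separate character.
garbled = raw.decode("cp1252")

print(raw)      # b'\xe2\x80\x9c'
print(garbled)  # â€œ
```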
If you see the message but the quotation marks that appear don’t look very curly, it probably just means your default font doesn’t have good glyphs for curly quotes. Either switch to a different font in your X terminal or, if you are using Gvim, change the font by typing :set guifont=Monospace\ 13 (or whatever font and size you want to try).
If it all works out, you’ll end up with an easy way to type curly quotes, em/en dashes, and ellipses in docs you edit in Vim. If it doesn’t, well, you can always consider switching to Emacs.
XMLUnicoding in Emacs
Vim is my main editor of choice, but there are some things for which Emacs currently provides a better editing environment. For example, there currently is no way to do context-sensitive validated editing in Vim. But there is a way to do it in Emacs. A very good way: using James Clark’s nXML mode.
nXML is a mighty piece of work. It’s hard to imagine now how I ever did any XML editing without it. As good as it is, though, when I first started using it to edit UTF-8-encoded documents, I found myself thinking: Hey, now I can work with a document format that allows real (Unicode) special characters instead of some ASCII escape code or entity standing in for them. Wouldn’t it be great to have an easy way to enter those characters directly, especially curly quotes and em/en dashes?
While I was just sitting around dreaming about it, Norm Walsh was actually doing something about it: cooking up some Emacs Lisp to make it work. The result is a package he named XMLUnicode.
Around the time when Norm released XMLUnicode, he also wrote up a blog entry about it, describing the variety of ways it gives you to enter special characters.
But to describe it briefly: it lets you enter smart quotes, em/en dashes, and ellipses in a way very similar to what UniCycle does, plus more.
(In fact, I guess that it’s a little odd to describe it that way, since it was around for quite a while before UniCycle and was actually, I believe, a big part of the inspiration for UniCycle.)
How to install XMLUnicode (and nXML mode)
Before installing XMLUnicode, you’ll probably first want to install nXML. It may already be packaged for your distro, so check first. For example, on a Debian system, you can install it with this command:
sudo apt-get install nxml-mode
To install it manually, you need to put it somewhere in your Emacs load path. If you have root access on the system where you want to install it, the appropriate place is probably /usr/local/share/emacs/site-lisp. So do something like this:
cd /usr/local/share/emacs/site-lisp/ && sudo wget http://www.thaiopensource.com/download/nxml-mode-20041004.tar.gz && sudo tar xvfz nxml-mode-20041004.tar.gz
Install XMLUnicode itself with a similar set of commands:
cd /usr/local/share/emacs/site-lisp && sudo wget http://nwalsh.com/emacs/xmlchars/xmlunicode.el && sudo wget http://nwalsh.com/emacs/xmlchars/unichars.el
Getting nXML and XMLUnicode set up and available within Emacs takes a little more work than getting UniCycle working in Vim, but not too much more.
Some (non-optional) configuration
To configure nXML, and to configure XMLUnicode for use within nXML mode, add the following to your .emacs startup file.
;;; nXML setup
;; load autoloads for nXML mode
(load "rng-auto.el")
;; auto-start nXML mode for *.xml and *.xsl files
(setq auto-mode-alist
      (append (list (cons "\\.xml\\'" 'nxml-mode)) auto-mode-alist))
(setq auto-mode-alist
      (append (list (cons "\\.xsl\\'" 'nxml-mode)) auto-mode-alist))
;;; end of nXML setup

;;; xmlunicode.el setup
;; The xmlunicode.el code relies on some Common Lisp functions,
;; so you need to make sure the Common Lisp package is loaded
;; before loading xmlunicode.el
(require 'cl)
;; location where unichars.el file is installed; needs to be
;; specified before xmlunicode is loaded
(setq unicode-character-list-file
      "/usr/local/share/emacs/site-lisp/unichars.el")
(load "xmlunicode")
;; Set up xmlunicode for use within nXML mode
(defun bind-nxml-mode-keys ()
  (set-language-environment "utf-8")
  (define-key nxml-mode-map "\"" 'unicode-smart-double-quote)
  (define-key nxml-mode-map "'" 'unicode-smart-single-quote)
  (define-key nxml-mode-map "-" 'unicode-smart-hyphen)
  (define-key nxml-mode-map "." 'unicode-smart-period)
  ;; display UniChar menu when in nXML mode
  (define-key nxml-mode-map [menu-bar unichar]
    (cons "UniChar" unicode-character-menu-map))
  ;; set input method to "xml" (xmlunicode) when in nXML mode
  (set-input-method 'xml))
;; run the key setup above each time nXML mode starts
(add-hook 'nxml-mode-hook 'bind-nxml-mode-keys)
;;; End of xmlunicode setup
After you’ve started Emacs, visit a new or existing file with a .xml extension (foo.xml or whatever), and type a quotation mark (") character. You should now see a curly left quotation mark. Hit " again, and you’ll see a regular straight quotation mark. Hit it one more time, and you’ll see a curly right quotation mark.
If that all works as expected, try typing an apostrophe or dash. You’ll see that Emacs cycles through character choices for them just as it does for the quotation mark. Then try typing three periods in a row, and you’ll see Emacs replace them with a real ellipsis character.
You’ll also notice that your Emacs now has a UniChar menu that you can use to insert a variety of other special characters. And that’s not the only additional feature XMLUnicode provides for inserting special characters; read its docs to find out more.
If the quotation marks that appear don’t look so curly, it probably just means your default font doesn’t have good glyphs for curly quotes. So try switching to a different font in your Emacs.
If it all works out, you’ll end up with an easy way to type curly quotes, em/en dashes, and ellipses in any UTF-8-encoded docs you edit in Emacs. You’ll also get a menu and some additional commands for easily adding other special characters. If it doesn’t work out, well, you can always consider switching to Vim and using UniCycle.
Other methods for entering special characters in your favorite text editor?