 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Using the Subversion Client API, Part 1

by Garrett Rooney
04/24/2003

Subversion, as you probably already know, is a version control system written from scratch to replace CVS, the most popular open source version control system. While there are many reasons to choose Subversion, one of the most interesting is that Subversion has been designed and implemented as a collection of reusable libraries, written in C. This allows your programs to use the same functionality found in the command line Subversion client without having to call out to the command line client to execute commands or parse its output. This article briefly reviews the Subversion libraries, explains some of their data structures, and demonstrates the use of the Subversion client APIs in other programs.

Getting Started

Before you can jump into the code, you need to install Subversion. This article was written with release 0.20.0 of Subversion in mind. It would be best if you had that version. The installation instructions are available in the INSTALL file. If you don't like to compile your own software, you can try a binary distribution of Subversion.

If you're using an older version of Subversion, it's a good idea to upgrade at least to 0.20.0. If you're using a more current version, just watch your step. The Subversion project has not yet released version 1.0, the APIs are not yet fixed, and things may change. To get a good idea of the changes between 0.20.0 and the version you have, look at the CHANGES file in the Subversion tarball, specifically the "Developer-visible changes" sections. The general concepts discussed in this article will still apply to any version of Subversion.

Once you've installed Subversion, you'll need to become familiar with its general use. This article assumes some basic knowledge of how Subversion works. If you've never used it before, take a break and learn how before reading any farther. Some good resources for this are Rafael Garcia-Suarez's great articles on Single User Subversion and Multiuser Subversion.

What's The Point Anyway?

You may be thinking that using the Subversion libraries directly will add a bit of complexity to your life as a software developer. You'll need to make your build process find the proper libraries and pass the correct flags to your compiler to link to them, not to mention learning a whole new API--there's a lot to do! I'd be surprised if you haven't at least thought about giving up now and simply writing a little wrapper library that calls the Subversion command line client.

What do you actually gain from using the libraries directly? Other than the efficiency gained by avoiding the overhead of starting processes for every action, you gain something much more important: correctness. You get access to all the information the client API can give you, rather than the limited subset of that information that the command line client can give you. The command line client is a fantastic program, but it was written to be a good generic command line tool, not to do whatever it is you've got in mind for your application. You'll likely do a better, more complete job by using the Subversion APIs directly.

The Basic Building Blocks

Like most other software systems, Subversion is built on a number of smaller bits of code. In order to use Subversion's API well, you'll need to understand its underlying libraries.

APR

To ensure maximum portability across a wide number of operating systems, Subversion is built on APR, the Apache Group's portability layer. The APR developers use doxygen markup to comment their code, so their API documentation is available online. In addition to abstracting away the various platform-specific bits of functionality Subversion needs, APR provides a set of basic data structures such as hash tables and memory pools. We'll cover memory pools first, as they're probably less familiar.

Before we get into any APR data types, you'll have to learn how to initialize and shut down APR. Simply call apr_initialize before calling any APR (or Subversion) functions and use atexit or some other means to arrange for apr_terminate to be called at shutdown.

Rather than manually allocating and deallocating memory with malloc and free, Subversion uses APR's memory pools to manage memory. Create a pool using svn_pool_create, which is actually a thin wrapper around apr_pool_create, with a simpler interface and a few Subversion debugging tricks. Allocate memory from the pool with functions like apr_palloc and apr_pcalloc. You don't need to worry about freeing the memory. When you're done with everything you allocated out of that pool, just destroy the pool with svn_pool_destroy (again, a thin wrapper around apr_pool_destroy). It will free the memory for you.

This is kind of cool, since you only need to worry about freeing memory once, but it's nothing to write home about. The real benefit comes when you take advantage of chaining pools together in a hierarchy. You can create subpools inside your main pool (or inside other subpools, ad infinitum), and clear them with svn_pool_clear. This lets you avoid making the operating system allocate more memory for you, and can give a nice performance boost in some situations.
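To make the parent and subpool relationship concrete, here is a minimal sketch of the ownership idea in plain C. Everything in it (toy_pool_t, toy_pool_create, toy_pcalloc, toy_pool_clear, toy_pool_destroy) is a hypothetical stand-in invented for illustration, not APR's implementation. In particular, this toy hands memory straight back with free when a pool is cleared, whereas the point of the real pools is that a cleared pool can reuse its memory without going back to the operating system.

```c
#include <stdlib.h>

/* Hypothetical toy model of a pool hierarchy; NOT the APR implementation. */
typedef struct toy_block
{
  struct toy_block *next;
  void *mem;                      /* memory handed out to the caller */
} toy_block_t;

typedef struct toy_pool
{
  struct toy_pool *parent;        /* owning pool, or NULL for a root pool */
  struct toy_pool *first_child;   /* head of the list of subpools */
  struct toy_pool *next_sibling;  /* next subpool of the same parent */
  toy_block_t *blocks;            /* every allocation made from this pool */
  int live_blocks;                /* bookkeeping for the demonstration only */
} toy_pool_t;

/* Create a pool; a non-NULL PARENT makes the new pool a subpool of it. */
toy_pool_t *
toy_pool_create (toy_pool_t *parent)
{
  toy_pool_t *pool = calloc (1, sizeof (*pool));
  if (pool == NULL)
    return NULL;
  pool->parent = parent;
  if (parent)
    {
      pool->next_sibling = parent->first_child;
      parent->first_child = pool;
    }
  return pool;
}

/* Allocate SIZE zeroed bytes that live until POOL is cleared or destroyed. */
void *
toy_pcalloc (toy_pool_t *pool, size_t size)
{
  toy_block_t *block = malloc (sizeof (*block));
  if (block == NULL)
    return NULL;
  block->mem = calloc (1, size);
  if (block->mem == NULL)
    {
      free (block);
      return NULL;
    }
  block->next = pool->blocks;
  pool->blocks = block;
  pool->live_blocks++;
  return block->mem;
}

void toy_pool_destroy (toy_pool_t *pool);

/* Release everything allocated from POOL, and destroy all of its subpools,
 * but keep POOL itself usable. */
void
toy_pool_clear (toy_pool_t *pool)
{
  /* destroying a child unlinks it, so first_child advances each time. */
  while (pool->first_child)
    toy_pool_destroy (pool->first_child);

  while (pool->blocks)
    {
      toy_block_t *next = pool->blocks->next;
      free (pool->blocks->mem);
      free (pool->blocks);
      pool->blocks = next;
    }
  pool->live_blocks = 0;
}

/* Clear POOL, unlink it from its parent, and free the pool itself. */
void
toy_pool_destroy (toy_pool_t *pool)
{
  toy_pool_clear (pool);
  if (pool->parent)
    {
      toy_pool_t **link = &pool->parent->first_child;
      while (*link != pool)
        link = &(*link)->next_sibling;
      *link = pool->next_sibling;
    }
  free (pool);
}
```

In this model, clearing a subpool leaves the parent's allocations alone, while destroying the root pool releases every subpool beneath it in one call. That is the property that lets you clean up a whole tree of work from a single place.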

Unfortunately, you need to be careful with pools, because you can easily get into situations where a pool is growing without bound as you allocate from it within a loop. To avoid this situation, you have to use common sense. Create a subpool before the loop and clear it each time through the loop. To avoid losing access to things you allocate inside the loop, duplicate them into the parent pool--like this:

char **
function (int iterations, apr_pool_t *pool)
{
  /* allocate an iteration pool. */
  apr_pool_t *subpool = svn_pool_create (pool);

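  /* allocate some memory to hold our results: one pointer per iteration,
   * plus a NULL terminator (apr_pcalloc zeroes the memory for us). */
  char ** array = apr_pcalloc (pool, (iterations + 1) * sizeof (char *));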
  int i;

  for (i = 0; i < iterations; ++i)
    {
      char * result = some_function_that_takes_a_pool (i, subpool);

      /* duplicate the result into our main pool. */
      array[i] = apr_pstrdup (pool, result);

      svn_pool_clear (subpool);
    }

  /* clean up after ourselves. */
  svn_pool_destroy (subpool);

  /* return our results, safely allocated in our main pool. */
  return array;
}

As long as you're careful with how you use pools, you'll find that they greatly simplify the logic of your code. You can stop worrying about memory management and start concentrating on what your code actually does.

Another common APR data type used in Subversion is the apr_hash_t. This is just a standard hash table, designed to work with APR pools. It uses void pointers for its keys and values, so you can stick whatever you want in it, as long as you're careful to remember the type so you can cast the contents appropriately when you retrieve values.
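The "remember the type" caveat is easier to see in code. The sketch below does not use APR at all: toy_map_t, toy_map_set, toy_map_get, and toy_lookup_revision are hypothetical names, and the container is a tiny linear-search table rather than a real hash table. It only illustrates what it means to store values as void pointers and cast them back on retrieval.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAP_MAX 8

/* Hypothetical stand-in for an untyped table; values are plain void *. */
typedef struct toy_map
{
  const char *keys[TOY_MAP_MAX];
  void *values[TOY_MAP_MAX];
  size_t count;
} toy_map_t;

/* Store VALUE under KEY, replacing any existing entry.
 * Returns 0 on success, -1 if the table is full. */
int
toy_map_set (toy_map_t *map, const char *key, void *value)
{
  size_t i;

  for (i = 0; i < map->count; ++i)
    if (strcmp (map->keys[i], key) == 0)
      {
        map->values[i] = value;
        return 0;
      }

  if (map->count == TOY_MAP_MAX)
    return -1;

  map->keys[map->count] = key;
  map->values[map->count] = value;
  map->count++;
  return 0;
}

/* Return the value stored under KEY, or NULL if there is none. */
void *
toy_map_get (const toy_map_t *map, const char *key)
{
  size_t i;

  for (i = 0; i < map->count; ++i)
    if (strcmp (map->keys[i], key) == 0)
      return map->values[i];

  return NULL;
}

/* The table cannot tell us what it holds.  By convention in this sketch,
 * "revision" maps to a long * and "author" maps to a char *; reading
 * either one back as the wrong type would compile but misbehave. */
long
toy_lookup_revision (const toy_map_t *map)
{
  const long *revision = toy_map_get (map, "revision");
  return revision ? *revision : -1;
}
```

The compiler happily converts a void pointer to any object pointer type in C, so nothing stops you from reading "author" as a long. Wrapping each lookup in a small typed accessor, as toy_lookup_revision does, keeps that knowledge in one place.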

Error Handling

In addition to the various data structures it inherits from APR, Subversion has several fundamental data types. The most important of these is svn_error_t, used everywhere in the Subversion API.

Rather than returning a generic error code to indicate that a function has failed, Subversion uses its own "exception" object, svn_error_t, as the return value for all its functions that can fail. If a Subversion API function succeeds, it returns the value SVN_NO_ERROR (which is actually 0, to simplify error checking). If it fails, it returns a new svn_error_t. Each svn_error_t contains an apr_status_t--either the return value of the underlying APR function that failed or a Subversion-specific error code.

All Subversion error codes are defined in svn_error_codes.h. There is also a const char * that describes what precisely went wrong, a pointer to another error (as svn_error_ts can be chained together), and the pool that the error was allocated from. When Subversion returns an error, you need to handle it, usually with svn_error_clear, in order to free the memory associated with the error and any other errors in its chain. All of the other error-handling functions are declared and documented in svn_error.h. Here's an example of a function that handles an svn_error_t.

void
handle_error (svn_error_t *error)
{
  svn_error_t *itr = error;

  while (itr)
    {
      char buffer[256] = { 0 };

      /* report on the current link in the chain, not just the first one. */
      printf ("source error code: %d (%s)\n",
              itr->apr_err,
              svn_strerror (itr->apr_err, buffer, sizeof (buffer)));

      printf ("error description: %s\n",
              itr->message ? itr->message : "(no message)");

      if (itr->child)
        printf ("\n");

      itr = itr->child;
    }

  svn_error_clear (error);
}

You might notice that many function calls are wrapped in the SVN_ERR macro. This is just a quick way of saying that if the function returns SVN_NO_ERROR, we should continue on, but if it returns anything else, we return the error to our calling function, propagating the error up the call stack to be handled elsewhere. As long as your functions also return svn_error_t *s, you can use this macro.
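If the early-return behavior seems abstract, the following sketch shows one way such a macro could be written. It is a toy model under stated assumptions: toy_error_t, TOY_NO_ERROR, TOY_ERR, and the step functions are all hypothetical names made up for this example. The real macro is defined in svn_error.h and may differ in detail.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for an error object. */
typedef struct toy_error
{
  int code;
  const char *message;
  struct toy_error *child;
} toy_error_t;

#define TOY_NO_ERROR NULL

/* Evaluate EXPR exactly once.  If it produced an error, return that error
 * from the enclosing function; otherwise fall through to the next line. */
#define TOY_ERR(expr)                        \
  do {                                       \
    toy_error_t *toy_err_temp = (expr);      \
    if (toy_err_temp)                        \
      return toy_err_temp;                   \
  } while (0)

/* A canned failure and a counter, so we can observe what actually ran. */
toy_error_t toy_failure = { 1, "step two failed", NULL };
int steps_run = 0;

toy_error_t *
step_ok (void)
{
  steps_run++;
  return TOY_NO_ERROR;
}

toy_error_t *
step_fails (void)
{
  steps_run++;
  return &toy_failure;
}

/* Because this function also returns an error pointer, it can use the
 * macro to pass failures up to its own caller. */
toy_error_t *
run_steps (void)
{
  TOY_ERR (step_ok ());
  TOY_ERR (step_fails ());   /* run_steps returns from here */
  TOY_ERR (step_ok ());      /* never reached */
  return TOY_NO_ERROR;
}
```

Calling run_steps returns the error from the second step and leaves steps_run at 2, because the third step is never executed. The do/while wrapper is a common C idiom that lets the macro be used like a single statement, including inside an unbraced if.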

So How Do I Use This Stuff?

All right, enough background, on to the actual Subversion client API.

Subversion is broken into several libraries. From the client developer's point of view, the most important is libsvn_client, which holds the functions used to implement the various commands you've seen in the command line client. This library is a wrapper around the underlying libraries which manage access to the working copy (your checked out copy of the contents of the repository) and to the repository (via a number of possible paths).

Some Common Characteristics

All the functions in this library share some common characteristics. First, they all assume textual data (paths, URLs, raw text like a log entry or a username, etc.) is UTF-8 encoded and uses newlines for line endings. This provides consistency between clients and avoids the need for cumbersome and unnecessary locale and line ending data tagging. To convert your data into UTF-8, use the functions in svn_utf.h. For the purposes of this article, we will assume all input is in ASCII and will avoid conversion to UTF-8.
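Skipping the conversion is only safe because ASCII is a strict subset of UTF-8: any string made up entirely of 7-bit bytes is already valid UTF-8. If you want to check that assumption before handing a string to the library, a small helper along these lines would do. Note that toy_is_ascii is a hypothetical name for this sketch and is not part of Subversion or svn_utf.h.

```c
/* Return 1 if every byte of STR is 7-bit ASCII (and therefore already
 * valid UTF-8), 0 otherwise.  Hypothetical helper, not a Subversion API. */
int
toy_is_ascii (const char *str)
{
  const unsigned char *p = (const unsigned char *) str;

  for (; *p != '\0'; ++p)
    if (*p > 0x7F)
      return 0;

  return 1;
}
```

A path like "trunk/index.html" passes this check. A string containing any byte above 0x7F does not, and would need to go through the conversion functions in svn_utf.h unless you already know it is UTF-8.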

Second, each function takes an apr_pool_t pointer to use for memory allocation.

Third, each function takes a pointer to a structure called svn_client_ctx_t. This "client context" serves as a container for several different things that are used across many libsvn_client functions. For example, all the Subversion functions that commit a change to the repository require the client to provide them with a log message. To do this, they use a callback function and baton that are stored in the client context. Similarly, many of the functions need to provide progress notification to the calling application and, eventually, to the user. The library functions call a notification callback-baton pair that are passed in via the context. The client context also caches configuration options, so the libraries don't need to read them in whenever they require them.
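The callback-and-baton arrangement may be unfamiliar, so here is a minimal sketch of the pattern on its own. None of these names come from Subversion: toy_ctx_t, toy_notify_func_t, toy_process_paths, counter_baton_t, and count_notification are hypothetical, and the real notification callback takes more arguments than this one.

```c
#include <stddef.h>

/* Hypothetical callback type: the library passes back the caller's baton
 * untouched, along with whatever it wants to report. */
typedef void (*toy_notify_func_t) (void *baton, const char *path);

/* Hypothetical context: a callback and the baton that goes with it. */
typedef struct toy_ctx
{
  toy_notify_func_t notify_func;
  void *notify_baton;
} toy_ctx_t;

/* "Library" side: knows nothing about the application's data.  It just
 * invokes the callback, if one was supplied, for each path it handles. */
void
toy_process_paths (const char **paths, size_t n_paths, const toy_ctx_t *ctx)
{
  size_t i;

  for (i = 0; i < n_paths; ++i)
    if (ctx->notify_func)
      ctx->notify_func (ctx->notify_baton, paths[i]);
}

/* "Application" side: the baton carries state between callback calls. */
typedef struct counter_baton
{
  int seen;
  const char *last_path;
} counter_baton_t;

void
count_notification (void *baton, const char *path)
{
  counter_baton_t *counter = baton;

  counter->seen++;
  counter->last_path = path;
}
```

To use it, the application fills in a counter_baton_t, stores its address and count_notification in a toy_ctx_t, and passes that context to toy_process_paths. The baton is what lets a plain C function pointer behave like a closure, without resorting to global variables.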

To use the rest of libsvn_client, you will have to fill in a bare minimum of the context. Here's an example of how to do it:

svn_error_t *
set_up_client_ctx (svn_client_ctx_t **ctx, apr_pool_t *pool)
{
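  /* allocate our context, using apr_pcalloc to ensure it is zeroed out.
   * note the double dereference: we want the size of the structure itself,
   * not the size of a pointer to it. */
  *ctx = apr_pcalloc (pool, sizeof (**ctx));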

  /* read in the client's config options. */
  SVN_ERR (svn_config_get_config(&(*ctx)->config, pool));

  /* set up an authentication baton.  details of how to use libsvn_auth are 
   * beyond the scope of this article, but for more details on its use you 
   * can read the code for the subversion command line client and the 
   * comments in svn_auth.h. */
  {
    svn_auth_baton_t *auth_baton;

    apr_array_header_t *providers
      = apr_array_make (pool, 1, sizeof (svn_auth_provider_object_t *));

    svn_auth_provider_object_t *username_wc_provider 
      = apr_pcalloc (pool, sizeof(*username_wc_provider));

    svn_wc_get_username_provider 
      (&(username_wc_provider->vtable),
       &(username_wc_provider->provider_baton), pool);

    *(svn_auth_provider_object_t **)apr_array_push (providers) 
      = username_wc_provider;

    svn_auth_open (&auth_baton, providers, pool);

    (*ctx)->auth_baton = auth_baton;
  }

  return SVN_NO_ERROR;
}

The comments in svn_client.h provide more details on the contents of svn_client_ctx_t.

Compiling and Linking

All you need now to start using the Subversion libraries are a few details on how to compile and link against them. For all the examples in this article, you will have to link against libsvn_client-1, libsvn_auth-1, and libsvn_subr-1. The header files are located in $(PREFIX)/include/subversion-1, where $(PREFIX) is either the path you specified for --prefix when configuring Subversion or /usr/local by default. You should also include the output of svn-config --includes --cflags --libs in your compile and link lines.

Eventually, you should be able to forget about manually including Subversion's include and libs, allowing svn-config to take care of the details. But for now you will have to do it yourself. Here's the Makefile I used when preparing this article, which should get you far enough along to get things working.

PREFIX=/Users/rooneg/Hacking/article

CC=cc

CFLAGS=`$(PREFIX)/bin/svn-config --cflags` -Wall

INCLUDES=`$(PREFIX)/bin/svn-config --includes` -I$(PREFIX)/include/subversion-1

LIBS=`$(PREFIX)/bin/svn-config --libs` -L$(PREFIX)/lib -lsvn_subr-1 \
     -lsvn_auth-1 -lsvn_client-1 -lsvn_wc-1

.c.o:
        $(CC) $(CFLAGS) $(INCLUDES) -c $<

basic-client: basic-client.o
        $(CC) $(CFLAGS) $(LIBS) basic-client.o -o $@

clean:
        rm -rf *.o
        rm -rf basic-client

A Practical Example

Now you're ready to write an actual application that uses libsvn_client. Let's say your company makes a web application. You store everything in a Subversion repository and can simply check out some parts of your source onto the web server and things just work. Suppose also that several of your web developers are unskilled in using version control tools. To simplify life for them, you're writing an application to let them deploy a site from the tree to the server, query what versions of each file they have there, and update to new versions from the repository.

You will need to use at least three functions from libsvn_client. svn_client_checkout deploys the site for the first time to a new server. svn_client_status checks the versions deployed on a given server. svn_client_update deploys new versions of the site to an existing install on the server. We'll look at each function in turn.

As you might guess, svn_client_checkout implements the svn checkout command. It takes as arguments the URL to check out from the repository, a path that will become the root of the new working copy it creates, and a revision that indicates which version of the URL you want to check out. There's also a boolean flag that indicates if the checkout should recurse into subdirectories inside the URL. Besides these specific arguments, the normal libsvn_client function arguments apply: a client context and a pool. If you want to provide feedback to your user as the checkout takes place, you can provide a notification callback and baton inside the context to be called each time something happens. Here's an example of how your application could use it:

void
deploy_notification_callback (void *baton,
                              const char *path,
                              svn_wc_notify_action_t action,
                              svn_node_kind_t kind,
                              const char *mime_type,
                              svn_wc_notify_state_t content_state,
                              svn_wc_notify_state_t prop_state,
                              svn_revnum_t revision)
{
  printf ("deploying %s\n", path);
}

void
deploy_new_site (const char *repos_url,
                 const char *target_path,
                 svn_client_ctx_t *ctx,
                 apr_pool_t *pool)
{
  svn_opt_revision_t revision = { 0 };
  svn_error_t *err;

  /* set up our notification callback.  our callback doesn't use a baton, so 
   * we can just leave that blank. */
  ctx->notify_func = deploy_notification_callback;

  /* grab the most recent version of the website. */
  revision.kind = svn_opt_revision_head;

  err = svn_client_checkout (repos_url,
                             target_path,
                             &revision,
                             TRUE, /* yes, we want to recurse into the URL */
                             ctx,
                             pool);
  if (err)
    handle_error (err);
  else
    printf ("deployment succeeded.\n");
}

Now that your application can deploy a new website, it needs to be able to query the deployed version to find out which versions of each file are there. This needs svn_client_status, the routine that implements the core of the svn status command. svn_client_status is a bit more complicated than svn_client_checkout, as there are more variations.

If the "descend" argument is TRUE, it recurses down a path in a working copy, filling in an apr_hash_t with keys that contain each entry's path and values that are svn_wc_status_ts. Otherwise, it just reads the entries in the top level of the directory structure.

To check the status of the entry in the working copy against that of the repository, you can pass TRUE as the "update" flag. The svn_wc_status_ts in the hash will have their repos_text_status and repos_prop_status members filled in appropriately. This will also fill in the youngest argument with the number of the most current revision in the repository.

Use the "get_all" argument to switch between fetching all entries in the working copy in the hash or only the "interesting" entries (either locally modified or out of date compared to the repository). If you don't want the svn:ignore property to control which entries are seen, pass TRUE for the "no_ignore" argument. As with svn_client_checkout, any notification callback will be called, along with the context's notification baton, for each entry placed in the hash. The following example uses svn_client_status to print the revision numbers of everything in the deployed site.

void
print_revisions (const char *deployed_site,
                 svn_client_ctx_t *ctx,
                 apr_pool_t *pool)
{
  apr_hash_t *statuses;
  svn_error_t *err;

  err = svn_client_status (&statuses,
                           NULL,
                           deployed_site,
                           TRUE,  /* descend into subdirs */
                           TRUE,  /* get all entries */
                           FALSE, /* don't hit repos for out of dateness info */
                           FALSE, /* respect svn:ignore */
                           ctx,
                           pool);
  if (err)
    {
      handle_error (err);
      return;
    }

  /* loop over the hash entries and print them out */
  {
    apr_hash_index_t *hi;

    for (hi = apr_hash_first (pool, statuses); hi; hi = apr_hash_next (hi))
      {
        const svn_wc_status_t *status;
        const char *path;
        const void *key;
        void *value;

        apr_hash_this (hi, &key, NULL, &value);

        status = value;
        path = key;

        if (status->entry)
          printf ("%s is at revision %" SVN_REVNUM_T_FMT "\n",
                  path, status->entry->revision);
        else
          printf ("%s is not under revision control\n", path);
      }
  }
}

The final feature for your application is the ability to update a deployed site to a newer version, using svn update. As you might suspect, the libsvn_client function for this is svn_client_update. Fortunately, this is much simpler than svn_client_status. Pass it the path to the deployed site, an svn_opt_revision_t identifying the version to which to update, and a flag to allow or disallow recursion into subdirectories. As usual, it takes pool and context arguments, and any notification callback in the context will be called for each updated entry. Let's see how this can be used to update our deployed website to the latest revision in the repository.

void
update_notification_callback (void *baton,
                              const char *path,
                              svn_wc_notify_action_t action,
                              svn_node_kind_t kind,
                              const char *mime_type,
                              svn_wc_notify_state_t content_state,
                              svn_wc_notify_state_t prop_state,
                              svn_revnum_t revision)
{
  if (action == svn_wc_notify_update_completed)
    printf ("Updated %s to revision %" SVN_REVNUM_T_FMT "\n", path, revision);
}

void
update_deployed_site (const char *deployed_site,
                      svn_client_ctx_t *ctx,
                      apr_pool_t *pool)
{
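  svn_opt_revision_t revision = { 0 };
  svn_error_t *err;

  /* update to the most recent version in the repository. */
  revision.kind = svn_opt_revision_head;

  ctx->notify_func = update_notification_callback;

  /* pass the address of the revision structure, as with the checkout. */
  err = svn_client_update (deployed_site, &revision, TRUE, ctx, pool);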
  if (err)
    {
      handle_error (err);
      return;
    }
}

Conclusion

There you have it, a simple set of functions that take the existing functionality of libsvn_client and apply it to your specific problem. Due to the design of Subversion, you can do this much more flexibly than by wrapping up an existing command line client. My next article will look at how to extend this to provide the ability to edit the deployed files and to commit the changes back into the repository, using the rest of libsvn_client.

Garrett Rooney is a software developer at FactSet Research Systems, where he works on real-time market data.



Copyright © 2009 O'Reilly Media, Inc.