Serialization in .NET, Part 1

by Dan Frumin
01/26/2004

Overview

For many years, research scientists have promised us that memory will become unlimited and persistent. Unfortunately, this memory nirvana has not come to pass. Considering that most applications are meant to be used more than once, this places the requirement of persistence on the developer. Even if memory were to become persistent, the need to exchange data across multiple applications or computers would again place the developer in a position to implement some form of persistence mechanism. Serialization of data using built-in .NET support makes persistence easy and reusable. In this article, we will review the support available for serialization and look at a couple of scenarios for using it.

Introduction to Serialization

So, what is serialization? Semantically, serialization is the act of publishing or producing an item in the form of a series of information bits. This is a fancy way of saying that serialization is taking some data structure and pumping it out into a stream of bytes that we can then use. A concrete example will help us out; let's say that we have a data structure in memory representing an employee. The data structure for the class has the following form:

public class Employee
{
  public string FirstName;
  public string LastName;
  public DateTime StartDate;
  public int Age;
  private int EmpID;
  ...
}

For the purposes of our application, we'll further assume that the EmpID is being set randomly by the constructor and is only used for internal processing. The need for a private member will become evident in the examples that follow.

At some point, our application will likely need to persist this data structure to disk, at a minimum so that the user can close the application and reopen it. Traditionally, we would write some code that goes through the list of employees and adds a record for each employee to a database. When we want the data back, we read the records from the database into memory. In a sense, we serialized the data structure for each employee into a series of bytes in the database and then deserialized the bytes back into a data structure at a future time. This is a bit of a simplification, but I suspect you get the point.

Now that we know what serialization is, let's consider what it's good for. Most uses of serialization fall into two categories: persistence and data interchange. Persistence allows us to store the information on some non-volatile mechanism for future use. This includes multiple uses of our application, archiving, and so on. Data interchange is a bit more versatile in its uses. If our application takes the form of an N-tier solution, it will need to transfer information from client to server, likely using a network protocol such as TCP. To achieve this we would serialize the data structure into a series of bytes that we can transfer over the network. Another use of serialization for data interchange is the use of XML serialization to allow our application to share data with another application altogether. As you can see, serialization is a part of many different solutions within our application.

Before .NET, all approaches to serialization required custom code. For example, we could take a single instance of the employee data structure and manually generate the string for an XML document, which we could then save to the disk. Luckily for us, the Microsoft .NET team decided to save us the hassle of doing this work. Unfortunately, since the .NET team is composed of several sub-teams, we find that there are two distinct solutions to the problem. The first solution to the problem exists within the System.Runtime.Serialization namespace and consists of a generic solution to serialization. This is the more powerful of the two solutions, as it is more generalized, customizable, and extensible. The second solution is specific to XML serialization and is implemented in the System.Xml.Serialization namespace. Both implement serialization techniques, albeit with some minor differences.
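To make the "custom code" approach concrete, here is a minimal sketch of what hand-written XML generation for an employee might look like. The ManualXml class and its ToXml method are hypothetical names invented for this illustration, and the element layout is an assumption rather than a format defined anywhere in this article.

```csharp
using System;
using System.Security;
using System.Text;

public class Employee
{
    public string FirstName;
    public string LastName;
    public DateTime StartDate;
    public int Age;
}

// Hypothetical helper: hand-rolled serialization, written member by member.
public static class ManualXml
{
    public static string ToXml(Employee e)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("<Employee>");
        // SecurityElement.Escape replaces characters such as & and < with XML entities.
        sb.Append("<FirstName>").Append(SecurityElement.Escape(e.FirstName)).Append("</FirstName>");
        sb.Append("<LastName>").Append(SecurityElement.Escape(e.LastName)).Append("</LastName>");
        // The "s" format yields a sortable timestamp such as 2002-06-23T00:00:00.
        sb.Append("<StartDate>").Append(e.StartDate.ToString("s")).Append("</StartDate>");
        sb.Append("<Age>").Append(e.Age).Append("</Age>");
        sb.Append("</Employee>");
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Employee emp = new Employee();
        emp.FirstName = "John";
        emp.LastName = "Smith";
        emp.StartDate = new DateTime(2002, 6, 23);
        emp.Age = 25;
        Console.WriteLine(ManualXml.ToXml(emp));
    }
}
```

Every new member of the class means another line in ToXml, plus matching parsing code on the reading side, which is exactly the maintenance burden the built-in serializers remove.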

The XmlSerializer Class

We'll start by looking at the XmlSerializer class, since it's the easier of the two to apply and debug. When instantiating a new XmlSerializer, you need to tell it which type it will serialize. We pass that information in the constructor. After that, it's a matter of calling the Serialize method, which has a number of overloads, including one that takes a TextWriter and allows us to work with the resulting string in memory. Here is some sample code:

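// requires: using System.IO; using System.Xml.Serialization;
XmlSerializer xs = new XmlSerializer(typeof(Employee));
StringWriter sw = new StringWriter();
xs.Serialize(sw, emp);  // emp is an existing Employee instance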

At this point, the StringWriter has an XML document. If we output it, we see it looks like this:

<?xml version="1.0" encoding="utf-16"?>
<Employee xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <FirstName>John</FirstName>
  <LastName>Smith</LastName>
  <StartDate>2002-06-23T00:00:00.0000000-07:00</StartDate>
  <Age>25</Age>
</Employee>

You should note a few things about this output. First, it is a well-formed XML document, which means we can pass it around to other applications or to our own application across a network path. Second, all output was made in the form of elements, named after the class or member they represent. Third, and lastly, only public members were exported to the XML document. This last consideration is quite relevant, as it prevents us from fully recreating the state of this object at a future or remote instantiation of the application. If the EmpID member cannot be recreated each time, then we must persist it along with the public members. We'll address this issue by using the alternative serialization mechanism.
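For readers who want to try this themselves, the fragment above can be fleshed out into a small console program along the following lines. This is a sketch under a few assumptions: the XmlDemo class, its ToXml helper, and the GetEmpID accessor are hypothetical names added for the example, and the constructor body is only one way to set the ID "randomly" as described earlier.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Employee
{
    public string FirstName;
    public string LastName;
    public DateTime StartDate;
    public int Age;
    private int EmpID;

    // XmlSerializer needs a public parameterless constructor.
    public Employee()
    {
        // Assumption: one possible way to assign the ID randomly.
        EmpID = new Random().Next();
    }

    // Hypothetical accessor, present only so the example can inspect the ID.
    public int GetEmpID() { return EmpID; }
}

public static class XmlDemo
{
    // Serializes an Employee into an XML string held in memory.
    public static string ToXml(Employee emp)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Employee));
        using (StringWriter sw = new StringWriter())
        {
            xs.Serialize(sw, emp);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        Employee emp = new Employee();
        emp.FirstName = "John";
        emp.LastName = "Smith";
        emp.StartDate = new DateTime(2002, 6, 23);
        emp.Age = 25;

        string xml = ToXml(emp);
        Console.WriteLine(xml);
        // The private EmpID field does not appear anywhere in the document.
        Console.WriteLine("Contains EmpID: " + xml.Contains("EmpID"));
    }
}
```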

The utility of the XmlSerializer class should be clear. Only three lines of code were required to give us a mechanism that allows us to interchange data with other applications using the XML format. That's pretty impressive. But what if the other application uses a different XML schema than we do? Or maybe the other application doesn't care to see certain elements? We can adjust our output using a few attributes, best seen in the code sample below:

public class Employee
{
  // use an attribute 
  [XmlAttribute] 
  public string FirstName;

  // use an attribute with a custom name
  [XmlAttribute("FamilyName")] 
  public string LastName;

  // do not output this data
  [XmlIgnore]
  public DateTime StartDate;

  // use an element with a custom name
  [XmlElement("EmployeeAge")]
  public int Age;

  private int EmpID;
  ...
}

Which results in the following output:

<?xml version="1.0" encoding="utf-16"?>
<Employee xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          FirstName="John" FamilyName="Smith">
  <EmployeeAge>25</EmployeeAge>
</Employee>

As you can see, the attributes applied to our class definition were used by the XmlSerializer to adjust the output. The XmlSerializer is indeed very useful, and offers several other capabilities, including the ability to deserialize a class from an XML document. Exploring those is left up to the reader, as we will now turn our attention to the more generic serialization solution available in .NET.
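As a starting point for that exploration, here is a minimal sketch of a round trip through the XmlSerializer using the attributed class above. The XmlRoundTrip class and its ToXml and FromXml helpers are hypothetical names chosen for this illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Employee
{
    [XmlAttribute] public string FirstName;
    [XmlAttribute("FamilyName")] public string LastName;
    [XmlIgnore] public DateTime StartDate;
    [XmlElement("EmployeeAge")] public int Age;
}

public static class XmlRoundTrip
{
    public static string ToXml(Employee emp)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Employee));
        using (StringWriter sw = new StringWriter())
        {
            xs.Serialize(sw, emp);
            return sw.ToString();
        }
    }

    // Deserialize reads from any TextReader; the result is cast back to Employee.
    public static Employee FromXml(string xml)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Employee));
        using (StringReader sr = new StringReader(xml))
        {
            return (Employee) xs.Deserialize(sr);
        }
    }

    public static void Main()
    {
        Employee emp = new Employee();
        emp.FirstName = "John";
        emp.LastName = "Smith";
        emp.StartDate = new DateTime(2002, 6, 23);
        emp.Age = 25;

        Employee copy = FromXml(ToXml(emp));
        Console.WriteLine(copy.FirstName + " " + copy.LastName + ", " + copy.Age);
        // StartDate was marked XmlIgnore, so it comes back as the default DateTime.
        Console.WriteLine(copy.StartDate == DateTime.MinValue);
    }
}
```

Note that anything excluded from the output, such as the ignored StartDate, is simply absent after the round trip, so the copy is only as complete as the XML document.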

.NET Formatters

As mentioned above, .NET offers a different solution for serialization within the System.Runtime.Serialization namespace. This is the more generic approach; as such, it requires a bit more code, but offers significantly expanded capabilities. This implementation is based on a generic IFormatter interface that formatters of different kinds implement. .NET ships with built-in formatters for binary streams (a series of bytes) and for SOAP messages. Should you ever need to, you can create your own custom formatter.

In order to use a formatter, we must mark the class using the Serializable attribute. That informs the formatter that it can go ahead and attempt to serialize the class. After that, we use a bit of code to get at our output. Here's a sample using the BinaryFormatter, which resides in the System.Runtime.Serialization.Formatters.Binary namespace:

[Serializable]
public class Employee
{ ... }

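// requires: using System.IO; using System.Runtime.Serialization.Formatters.Binary;
BinaryFormatter bf = new BinaryFormatter();
using (FileStream fs = new FileStream("output.bin", FileMode.Create))
{
  bf.Serialize(fs, emp);  // the stream is closed even if Serialize throws
}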

As you might notice, the BinaryFormatter uses a stream for its output. In this case I used a FileStream, but I just as easily could have used a MemoryStream if I needed to work with the byte output immediately (for example, to send it over a TCP connection). If there were fields of the class that we didn't want serialized, we could mark them with the NonSerialized attribute, which is analogous to the XmlIgnore attribute.

At this point, it makes sense for us to look at a quick sample of deserialization. Both the XmlSerializer and the IFormatter interface support deserialization, with slightly different code. Here's a sample using the BinaryFormatter to deserialize the output from our previous sample:

Employee emp2;
using (FileStream input = new FileStream("output.bin", FileMode.Open))
{
  emp2 = (Employee) bf.Deserialize(input);  // bf is the formatter created above
}

As you can see, it's pretty easy to bring the object back into your application (for example, the next time the user starts the application). One interesting difference between the formatters and the XmlSerializer is that the formatters output a complete copy of the object, including all of the private members. This is useful for persistence, or where you need to recreate a complete state at a future or remote instance of the application.
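To see the private-member behavior for ourselves, a round trip through a MemoryStream might look like the sketch below. A few assumptions are worth stating loudly: the BinaryRoundTrip class, the Clone helper, the two accessors, and the cachedLabel field are all hypothetical names added for the example, and the code targets the .NET Framework. BinaryFormatter is marked obsolete in .NET 5 and later and no longer works in .NET 9, so this will not run unchanged on current runtimes.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Employee
{
    public string FirstName;
    private int EmpID;

    // Hypothetical field, excluded from serialization for illustration.
    [NonSerialized]
    private string cachedLabel;

    public Employee(string firstName, int empId)
    {
        FirstName = firstName;
        EmpID = empId;
        cachedLabel = firstName + "#" + empId;
    }

    // Hypothetical accessors so the example can inspect private state.
    public int GetEmpID() { return EmpID; }
    public string GetCachedLabel() { return cachedLabel; }
}

public static class BinaryRoundTrip
{
    // Serializes to memory and immediately deserializes a copy.
    // Assumes a runtime where BinaryFormatter is still available.
    public static Employee Clone(Employee emp)
    {
        BinaryFormatter bf = new BinaryFormatter();
        using (MemoryStream ms = new MemoryStream())
        {
            bf.Serialize(ms, emp);
            ms.Position = 0;  // rewind before reading back
            return (Employee) bf.Deserialize(ms);
        }
    }

    public static void Main()
    {
        Employee copy = Clone(new Employee("John", 1234));
        Console.WriteLine(copy.GetEmpID());                 // private field survives
        Console.WriteLine(copy.GetCachedLabel() == null);   // NonSerialized field does not
    }
}
```

Unlike the XmlSerializer, the formatter does not need a public parameterless constructor, which is why the class can get away with a single two-argument constructor here.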

The System.Runtime.Serialization.Formatters.Soap namespace offers the SoapFormatter, which generates SOAP-compatible messages as its output. This particular formatter is useful for remoting applications that are based on the SOAP protocol. The SoapFormatter offers the developer levels of control similar to the XmlSerializer in defining the schema for the output. A series of attributes is offered to the developer for this customization, though we won't cover them in this article.

Extended Serialization

So far, we've used serialization only on very simple classes, with limited data types. The obvious question that follows is, "How far can this take us?" The answer is surprisingly comforting, largely because the built-in support for serialization is quite robust. However, it does require a little bit of familiarization on our part to get used to what it will and won't do.

The most interesting aspect of serialization is that it implements dynamic navigation of object graphs. That is to say, it will navigate through sub-objects wherever possible to serialize the complete set. Let's assume that our Employee object has a reference to another Employee named Manager. If we try to serialize the object, we will receive the following output from the XmlSerializer:

<?xml version="1.0" encoding="utf-16"?>
<Employee xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <FirstName>John</FirstName>
  <LastName>Smith</LastName>
  <StartDate>2002-06-23T00:00:00.0000000-07:00</StartDate>
  <Age>25</Age>
  <Manager>
    <FirstName>Betty</FirstName>
    <LastName>Jones</LastName>
    <StartDate>1999-02-12T00:00:00.0000000-08:00</StartDate>
    <Age>32</Age>
  </Manager>
</Employee>

Both the XmlSerializer and the BinaryFormatter are able to handle this type of nested object graph. Each will try to serialize the next object in the graph according to its particular needs. For example, the XmlSerializer will respect the XmlIgnore and other available attributes in sub-objects in the graph. The BinaryFormatter, on the other hand, requires that the class of every object in the graph be marked with the Serializable attribute. Most, but not all, of the core classes in the .NET Framework carry this attribute.
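A small program that produces a nested document like the one above could be sketched as follows. The GraphDemo class and its ToXml helper are hypothetical names, and the Employee class is trimmed to three members to keep the example short.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Employee
{
    public string FirstName;
    public string LastName;
    // Reference to another Employee; left null when there is no manager.
    public Employee Manager;
}

public static class GraphDemo
{
    public static string ToXml(Employee emp)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Employee));
        using (StringWriter sw = new StringWriter())
        {
            // The serializer follows the Manager reference on its own.
            xs.Serialize(sw, emp);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        Employee boss = new Employee();
        boss.FirstName = "Betty";
        boss.LastName = "Jones";

        Employee emp = new Employee();
        emp.FirstName = "John";
        emp.LastName = "Smith";
        emp.Manager = boss;

        Console.WriteLine(ToXml(emp));
    }
}
```

One caveat worth keeping in mind: the XmlSerializer walks a tree, not an arbitrary graph, so a cycle (for example, a manager who points back at the employee) causes serialization to fail rather than loop.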

Unfortunately, the XmlSerializer is a bit more limited in its support for serialization of core classes. There are several objects that it cannot serialize, including, but not limited to, any classes that implement the IDictionary interface (e.g., Hashtable). This can be somewhat limiting and may require custom code on your part. The BinaryFormatter is able to handle many of these classes without any difficulty.
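One possible shape for that custom code is to copy the dictionary into an array of simple key/value objects and serialize the array instead. The sketch below assumes string keys and int values; the Entry class and the DictionaryXml helper are hypothetical names invented for this example, not part of the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper type: one key/value pair in a serializable form.
public class Entry
{
    public string Key;
    public int Value;
}

public static class DictionaryXml
{
    // Copies the Hashtable contents into an array the XmlSerializer accepts.
    public static Entry[] ToEntries(Hashtable table)
    {
        List<Entry> list = new List<Entry>();
        foreach (DictionaryEntry de in table)
        {
            Entry e = new Entry();
            e.Key = (string) de.Key;   // assumes string keys
            e.Value = (int) de.Value;  // assumes int values
            list.Add(e);
        }
        return list.ToArray();
    }

    public static string ToXml(Hashtable table)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Entry[]));
        using (StringWriter sw = new StringWriter())
        {
            xs.Serialize(sw, ToEntries(table));
            return sw.ToString();
        }
    }

    public static void Main()
    {
        Hashtable ages = new Hashtable();
        ages["John"] = 25;
        ages["Betty"] = 32;
        Console.WriteLine(ToXml(ages));
    }
}
```

Reading the data back is the mirror image: deserialize an Entry[] and add each pair to a fresh Hashtable. The order of the entries in the output follows the Hashtable's enumeration order, which is not guaranteed.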

But what happens when even the BinaryFormatter can't handle our objects? Sadly, not all of the classes within the .NET Framework are marked as serializable or implement the ISerializable interface. A list of which classes implement the ISerializable interface is available in this MSDN article. In these cases, we can implement the ISerializable interface on our own classes and manually serialize and deserialize the class contents. But this is something we will cover in a separate article.
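Until then, a rough sketch of the general shape of an ISerializable implementation may be useful. It is not a preview of the follow-up article: the member names stored in the SerializationInfo, the GetEmpID accessor, and the FromInfo helper are assumptions made for this illustration, and the Main method drives the two halves by hand instead of going through a formatter.

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Employee : ISerializable
{
    public string FirstName;
    private int EmpID;

    public Employee(string firstName, int empId)
    {
        FirstName = firstName;
        EmpID = empId;
    }

    // Deserialization constructor: a formatter calls this to rebuild the object.
    protected Employee(SerializationInfo info, StreamingContext context)
    {
        FirstName = info.GetString("FirstName");
        EmpID = info.GetInt32("EmpID");
    }

    // Serialization side: we decide exactly which values are written, and under what names.
    public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("FirstName", FirstName);
        info.AddValue("EmpID", EmpID);
    }

    // Hypothetical helpers, for demonstration only.
    public int GetEmpID() { return EmpID; }
    public static Employee FromInfo(SerializationInfo info)
    {
        return new Employee(info, new StreamingContext(StreamingContextStates.All));
    }
}

public static class Program
{
    public static void Main()
    {
        Employee emp = new Employee("John", 1234);

        // Stand in for a formatter: collect the values, then rebuild from them.
        SerializationInfo info = new SerializationInfo(typeof(Employee), new FormatterConverter());
        emp.GetObjectData(info, new StreamingContext(StreamingContextStates.All));

        Employee copy = Employee.FromInfo(info);
        Console.WriteLine(copy.FirstName + " " + copy.GetEmpID());
    }
}
```

In real use a formatter creates the SerializationInfo and invokes both members for us; building it by hand here simply keeps the example independent of any particular formatter.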

Summary

By now, you've seen the basics of serialization and deserialization. You've seen how you can use serialization for persistence, data interchange, and in some cases, even debugging the internal contents of your classes. All of these are valuable applications for a powerful tool provided to us by the Microsoft .NET team. I hope this article allows you to maximize the value of these tools in your own solutions.

Dan Frumin is a long-time technology executive, with over 10 years of experience in the industry.

