If you have a background in database work, you have undoubtedly run into the rules of normalization. In particular, you probably internalized the rule that you should not store derived data in the database. For example, if your database includes an Order Details table containing columns for Price and Quantity, you should not put an Item Total column in that table. The item total can always be calculated from the price and quantity when it's needed. By not storing the calculated column, you avoid having to update it whenever the data in one of the other columns changes.
But even though ADO.NET's DataSet class is sometimes described as a "relational
database in memory," it's important to remember that there are differences between
databases and DataSets. In this particular case, the
designed to manage calculated columns. In this article I will show you the syntax
for building calculated columns in your own DataSets, and discuss some of the
pros and cons of using such columns.
To demonstrate calculated DataColumns in action, I'll build a very simple C# Windows Forms application. The user interface consists entirely of a DataGrid control named dgMain. If you'd like to follow along, start by putting this code into the form's Load event handler:
// Connect to the database SqlConnection cnn = new SqlConnection(); cnn.ConnectionString = "Data Source=\"(local)\";" + "Initial Catalog=Northwind;Integrated Security=SSPI"; // Set up the DataSet to hold all the data DataSet dsMain = new DataSet(); // Load customers SqlCommand cmdCustomers = cnn.CreateCommand(); cmdCustomers.CommandType = System.Data.CommandType.Text; cmdCustomers.CommandText="SELECT * FROM Customers"; SqlDataAdapter daCustomers = new SqlDataAdapter(); daCustomers.SelectCommand = cmdCustomers; daCustomers.Fill(dsMain, "Customers"); // Load orders SqlCommand cmdOrders = cnn.CreateCommand(); cmdOrders.CommandType = System.Data.CommandType.Text; cmdOrders.CommandText="SELECT * FROM Orders"; SqlDataAdapter daOrders = new SqlDataAdapter(); daOrders.SelectCommand = cmdOrders; daOrders.Fill(dsMain, "Orders"); // Load order details SqlCommand cmdOrderDetails = cnn.CreateCommand(); cmdOrderDetails.CommandType = System.Data.CommandType.Text; cmdOrderDetails.CommandText="SELECT * FROM [Order Details]"; SqlDataAdapter daOrderDetails = new SqlDataAdapter(); daOrderDetails.SelectCommand = cmdOrderDetails; daOrderDetails.Fill(dsMain, "OrderDetails"); // Relate tables dsMain.Relations.Add(new DataRelation("relCustOrders", dsMain.Tables["Customers"].Columns["CustomerID"], dsMain.Tables["Orders"].Columns["CustomerID"])); dsMain.Relations.Add(new DataRelation("relOrdersDetails", dsMain.Tables["Orders"].Columns["OrderID"], dsMain.Tables["OrderDetails"].Columns["OrderID"])); // Calculations will go here // Bind data to the user interface dgMain.DataSource = dsMain; dgMain.DataMember = "Customers";
Assuming you've done work with ADO.NET, this code should look very
familiar. It connects to the Northwind database on the local SQL Server, and
then proceeds to use
SqlDataAdapter objects to retrieve
information from three tables. It then builds relations between the tables and
displays the result on the user interface. Assuming you've got SQL
Server on the same computer as .NET, running the project should give you a
DataGrid full of data, as shown in Figure 1.
|Figure 1. Sample data on a DataGrid|
You can drill down in the data by clicking on the + signs and the hyperlinks, which appear when you click on them. Figure 2 shows the result of drilling down twice to reach the order detail level. You can see that the data from the Northwind sample database includes Price, Quantity, and Discount columns for each line item on the order. Note that the header of the DataGrid provides information on the navigation path down to these details.
|Figure 2. Line item detail on a DataGrid|
Adding a calculated column to the DataSet requires only two lines of code. Insert these lines at the placeholder comment in the main code:
// Add a calculated column to the order details dsMain.Tables["OrderDetails"].Columns.Add("ItemTotal", typeof(Double)); dsMain.Tables["OrderDetails"].Columns["ItemTotal"].Expression = "UnitPrice * Quantity * (1 - Discount)";
The first line of code here adds a new DataColumn named ItemTotal to the OrderDetails DataTable in the DataSet. The Add method takes two arguments in this particular overload: the name of the new column and the type of data that it will contain. The second line of code then uses the Expression property of the new column to tell ADO.NET how to compute the values that this column will contain. In this particular case, the column values are calculated by using other columns in the same DataTable together with standard mathematical functions. Figure 3 shows the results: the last column in the DataGrid displays the results of the calculation for each row.
|Figure 3. Calculated column on a DataGrid|
Another handy feature of ADO.NET is the ability to calculate an aggregate column. To see this in action, add a second calculation to the source code directly after the first:
// Add a calculated column to the orders dsMain.Tables["Orders"].Columns.Add("OrderTotal", typeof(Double), "Sum(Child.ItemTotal)");
In this case, I've used a different overload of the Add method; this one
takes the column name, data type, and expression as three arguments, so I don't
need to supply the expression separately.
Child is a special
keyword for the ADO.NET expression service. It refers to the child DataTable
related to the current DataTable. So, the expression calculates
the sum of the ItemTotal column (which is itself a calculated column) of all
rows in the OrderDetails DataTable related to the current row in the Orders
Finally, I can add an overall total column to the Customers DataTable, which shows the total of all orders for the customer, in this way:
// Add a calculated column to the customers dsMain.Tables["Customers"].Columns.Add("CustomerTotal", typeof(Double), "Sum(Child(relCustOrders).OrderTotal)");
When a DataTable contains more than one relationship, you need to specify precisely the relation that should be used in determining the child table. This is not necessary in this case because the Customers DataTable has only a single related child table.
Although I have added only a single calculated DataColumn to each DataTable in this example, you can have multiple calculated DataColumns in a single DataTable. You need to be careful, though, not to create circular references (in which one column depends on another column, which in turn depends on the first column).
DataColumns support a fairly rich syntax for expressions. The table below shows the major pieces of this syntax.
|ColumnName||Refer to columns by name. If the column name contains a special character, enclose the name in square brackets.|
|50 or 50.0 or 5E1||Numeric constants can be represented as integers, floating point, or in scientific notation.|
|#9/2/1959#||Date constants should be quoted with pound signs.|
|'Polygon'||String constants should be quoted with single quotes.|
|AND, OR, NOT||Boolean operators|
|<, >, <=, >=, <>, =, IN, LIKE||Comparison operators|
|+, -, *, /, %||Arithmetic operators|
|+||String concatenation operator|
|* or %||Wildcards for string comparison|
|Child.ColumnName or Child(RelationName).ColumnName||Column in a child table|
|Parent.ColumnName||Column in a parent table|
|Sum(), Avg(), Min(), Max(), StDev(), Var()||Aggregate functions|
|CONVERT(expression, type)||Convert an expression to a .NET type|
|LEN(string)||Length of a string|
|ISNULL(expression, replacement)||Returns the expression if it isn't Null, otherwise returns the replacement|
|IIF(expression, truepart, falsepart)||Returns truepart or falsepart depending on whether the expression is true or false|
|TRIM(expression)||Removes leading and trailing blanks|
|SUBSTRING(expression, start, length)||Returns length number of characters from the specified starting point|
To reset the expression value, assign an empty string or a null value to it. If you reset an expression and there is a default value specified for the column, then the default will propagate into all of the previously calculated rows.
Now that you know about this nifty tool, when should you use it? In making the decision, you need to consider your application's user interface, potential performance penalties, and bugs.
On the user interface front, remember that the calculation is being done for all rows in the affected DataTable, whether anyone looks at them or not. Thus, if you're displaying only a few rows out of a very large set of data, it may be more useful to calculate derived values when you need them, instead of in advance. This will avoid loading up memory and processor time with calculations that are never used. On the other hand, if you're using a grid interface, it makes sense to calculate the values for all rows that will be seen in the grid, as I did in this article.
Performance is a tricky thing because you need to worry about both real performance and perceived performance. Suppose you end up displaying only 10 percent of the rows in your data, but you need to display the results of a time-consuming calculation with each row. Should you avoid the calculated DataColumn and instead perform the calculations at display time? Not necessarily. If you use a calculated DataColumn, the application's startup time may be longer, but individual rows will then display quickly because all of the calculations will be performed in advance. This is a situation where perceived performance is more important than actual performance, arguing for the use of the calculated DataColumn.
Finally, I did find one piece of unexpected behavior (that I personally would classify as a bug) when working up the example for this article. Fire up the form, drill down to the order detail level, and change the quantity value for an existing row of data. You'll find that the ItemTotal column for the row is recalculated as soon as you leave the row, but that the values for the OrderTotal and CustomerTotal (as displayed in the header rows) remain unchanged. Navigate back to the parent row (using the left-pointing arrow at the upper right of the DataGrid), or Alt-tab to another application and back (forcing a screen repaint) and the OrderTotal will be updated. But try as I might, I can't find a way to force the CustomerTotal to be updated without writing code. It appears that automatic recalculation only goes up one level the hierarchy.
DataColumn.Expression Documentation (for complete syntax of Expressions).
Mike Gunderloy is the lead developer for Larkware and author of numerous books and articles on programming topics.
Return to ONDotnet.com
Copyright © 2009 O'Reilly Media, Inc.