Windows DevCenter    
 Published on Windows DevCenter (http://www.windowsdevcenter.com/)
 See this if you're having trouble printing code examples


Liberty on Beta 2

Just-In-Time Data Loading For DataGrids

by Jesse Liberty
12/20/2005

In my previous column I said that I will be working on an application and discussing it as I go. Well, it turns out that real customers and real life haven't actually gotten the project past the "I Have A Dream" stage, so it will be a few weeks before I have much that is interesting to report (though in the meantime you may want to take a look at the web services technology from Amazon that I'll be using).

While I've been non-productive on that project, a client posed the following problem: she has a database with 2 million records and wants to display these records in a data grid, but does not want to load them all into memory from the database. She wants them loaded "just in time," maximizing performance and minimizing memory load on the web server. After all, the vast majority of users will need to see only a small fraction of the records.

It turns out that the new DataGridView has a property VirtualMode (which defaults to false), that provides the capabilities needed. Microsoft has an excellent walk-through on this topic, and this article will explore a direct solution to this problem as well as showing you how to quickly create an application to create a very large edition of some Northwind tables to test with.

In my next article, I'll show how you can bypass this nifty approach and use a more generic approach that will work with any data grid (using the Infragistics grid as an alternative, for example).

Related Reading

Visual C# 2005: A Developer's Notebook
By Jesse Liberty

Creating the Virtual Grid

Create a new Windows Application and name it JIT Data. Use your language of choice; for this column I'll write in C# as I've been writing a lot of VB lately.

Drag a DataGridView control onto your form, and name it dgvJIT. Also add a label control named lblMsg and set its text to "Ready." (While you are at it, widen the form, widen the grid, set the title of the form to "JIT Data" and change the class file name for the form to JITData.cs.)

Your next step is to change the VirtualMode property of the grid from false to true. This can be done programmatically, but it is easier to just set it in the properties window. The bad news is that you can no longer bind a data source to the grid, but the good news is that you can easily create a cache to hold a page or two of data, and a simple class to fetch data as needed.

There is a bit of a chicken and egg phenomenon here[1], because your class needs a member variable of type Cache, and the constructor needs an instance of DataRetriever so it is tempting to stop and create those classes. However, to understand how those classes work and what they are for, it is better to create them in the order they are used. To cut this knot, you'll stub out these two classes, providing just enough to move forward, and you'll come back and flesh them out as you need them.

Even to stub out the DataRetriever you want to indicate that it implements the IDataPageRetriever interface that Microsoft suggests you define, so let's take a small step back and define that interface first. To do so, right-click on the project and choose Add...New item, stare at the dialog box for a while wondering why the templates are not in alphabetical order, and then choose Interface and name your new file IDataPageRetriever.cs, filling it with this definition:

using System.Data;

namespace JITDataColumn
{
   interface IDataPageRetriever
   {
      DataTable SupplyPageOfData(
         int lowerPageBoundary, int rowsPerPage );
   }
}

Next, stub out your DataRetriever class and its constructor:

public class DataRetriever : IDataPageRetriever
{
  // member variables here
   public DataRetriever( string database, string table )
   {
        //  Properiesy called by JITData Constructor
      public DataColumnCollection Columns { get { return new DataColumnCollection(); } } 
        public int RowCount { get { return 0; } }
   }

Finally, stub out the Cache class and its constructor:

class Cache
{
    public Cache( IDataPageRetriever dataSupplier, int rowsPerPage )

You are now (finally!) ready to create the JITData constructor:

public JITData()
{
   InitializeComponent();   
   const int PageSize = 10;
   DataRetriever dr = new DataRetriever();
   memoryCache = new Cache( dr, PageSize );
   foreach ( System.Data.DataColumn column in dr.Columns )
   {
      grid.Columns.Add( column.ColumnName, column.ColumnName );
   }
   grid.RowCount = dr.RowCount;
}

As you can see, you create a DataRetriever, and then pass that instance (along with the page size) to the constructor for the Cache. You then ask the DataRetriever for its Columns property (described below and stubbed out for now) and iterate through the collection, adding columns to the grid. Finally, you ask the DataRetriever for the RowCount to assign to the grid.

DataGridView Events

The only event you need to handle for the grid is CellValueNeeded, which will be called every time the grid wants to draw a cell for any reason (typically because a row has scrolled into view).

private void dgvJIt_CellValueNeeded( 
   object sender, DataGridViewCellValueEventArgs e )
{
   e.Value = memoryCache.RetrieveElement( e.RowIndex, e.ColumnIndex );
}

You handle the event by calling the RetrieveElement method on your Cache, passing in the row and column. As far as the grid is concerned, it is up to the cache to supply the data to put in the cell; how it does so is fully encapsulated in the methods of the Cache class. The only interface to that class, from the application, is through this single method: RetrieveElement.

The Cache Class

The Cache class is pretty big and most of the code is "housekeeping," so I won't show it all here. Feel free to download the full source code, and I'll just review the highlights here.

Note: the Cache and DataRetriever classes are only minimally changed from the Microsoft walk-through available in the MSDN documentation.

The Cache class has a static member variable RowsPerPage and a nested struct DataPage. The DataPage struct represents (surprise!) a page of data, and has three private members: a reference to a DataTable and two integers, representing the lowest index value and the highest index value, respectively.

The key to understanding the cache is in the RetrieveElement method, which takes as arguments the row and column whose value you want to retrieve:

public string RetrieveElement( int rowIndex, int columnIndex )
{
   string element = null;

   if ( IfPageCached_ThenSetElement( rowIndex, columnIndex, ref element ) )
   {
      return element;
   }
   else
   {
      return RetrieveData_CacheIt_ThenReturnElement(
          rowIndex, columnIndex );
   }
}

RetrieveElement calls the private method IfPageCached_ThenSetElement which looks to see if the element is in either of the Cache's two cached pages. It returns true if the element is cached. (The element itself is returned as a reference parameter.)

private bool IfPageCached_ThenSetElement( int rowIndex,
    int columnIndex, ref string element )
{
   if ( IsRowCachedInPage( 0, rowIndex ) )
   {
      element = cachePages[0].table
          .Rows[rowIndex % RowsPerPage][columnIndex].ToString();
      return true;
   }
   else if ( IsRowCachedInPage( 1, rowIndex ) )
   {
      element = cachePages[1].table
          .Rows[rowIndex % RowsPerPage][columnIndex].ToString();
      return true;
   }

   return false;
}

If IfPageCached_ThenSetElement returns false, then the element is not in the cache, so RetrieveElement calls RetrieveData_CacheIt_ThenReturnElement, which asks the DataRetriever object (stored by the Cache when it was constructed) to get a new page of data, caches the new page, and then calls RetrieveElement to return the newly cached data.

 private string RetrieveData_CacheIt_ThenReturnElement(
     int rowIndex, int columnIndex )
 {
    // Tell the DataRetriever to give back a page of data
    DataTable table = dataSupply.SupplyPageOfData(
        DataPage.MapToLowerBoundary( rowIndex ), RowsPerPage );

    // Replace the cached page furthest from the requested cell
    // with a new page containing the newly retrieved data.
    cachePages[GetIndexToUnusedPage( rowIndex )] = new DataPage( table, rowIndex );

    return RetrieveElement( rowIndex, columnIndex );
 }

The DataRetriever

It is the job of the DataRetriever to get the data. Interaction with the underlying database is fully encapsulated in the DataRetriever, which has three interesting properties:

Property Purpose
RowCount Computes the number of rows in the entire data set
Columns Returns a DataColumnCollection of all the columns in the table retrieved by the query
CommaSeparatedListOfColumns Returns a string that represents a comma-separated list of all the column names

The only method in DataRetriever is SupplyPageOfData, whose job is to retrieve one page's worth of data from the database and to return it as a table.

public DataTable SupplyPageOfData( int lowerPageBoundary, int rowsPerPage )
{

   if ( columnToSortBy == null )
   {
      columnToSortBy = this.Columns[0].ColumnName;
   }

   // Retrieve the specified number of rows from the database, starting
   // with the row specified by the lowerPageBoundary parameter.
   command.CommandText = "Select Top " + rowsPerPage + " " +
       CommaSeparatedListOfColumnNames + " From " + tableName +
       " WHERE " + columnToSortBy + " NOT IN (SELECT TOP " +
       lowerPageBoundary + " " + columnToSortBy + " From " +
       tableName + " Order By " + columnToSortBy +
       ") Order By " + columnToSortBy;
   adapter.SelectCommand = command;
   DataTable table = new DataTable();
   table.Locale = System.Globalization.CultureInfo.InvariantCulture;
   adapter.Fill( table );
   return table;
}

Note that by taking this level of control of how your data is retrieved, you now must work at the ADO.NET level, and create DataTables and interact with Connection objects -- details that are normally encapsulated within the .NET 2.0 Data controls. ADO.NET is still the big hammer, there when you need it, but keep your thumbs out of the way!

Choosing the Database

Before testing the application, return to the JitDataDesign and add a group box with the text "Which DB?" inside of which you'll place two radio nuttons, named rbNorthwind and rbNWBig respectively. Set the checked value of the first radio button to true, as shown in Figure 1.

Which DB Buttons
Figure 1. Setting up the "Which DB?" radio buttons

You'll see how to create NWBig below; for now, you'll keep Northwind checked.

Return to the constructor for the page and add two strings, one to hold the database choice based on the user's selection of buttons, and the other to hold the table name, which (for now) you'll hardwire to Orders. (Creating a UI to change tables is left as an exercise for the reader, along with creating a UI to change the number of pages of data stored in the cache!)

const int PageSize = 10;
string theDatabase = this.rbNorthWind.Checked ? "Northwind" : "NWBig";
string theTable = "Orders";
DataRetriever dr = new DataRetriever( theDatabase, theTable );

Testing the JIT approach

To test the JIT approach, we'd like to know which row is being displayed and when the cache has to go back to the dataReader for data. To do so, you'll add one line to the CellValueNeeded handler:

lblMsg.Text = "Row " + String.Format( "{0:N0}", e.RowIndex ) + " of " + String.Format( "{0:N0}", this.dataGridView1.RowCount );

That will show you the row, but finding out when the cache asks for data is a bit trickier. The cleanest way to do so is to create an event for the Cache class:

public delegate void RetrievedDataHandler( object dataSource, RetrievingDataEventArgs e );
public event RetrievedDataHandler OnDataRetrieved;

This event uses the RetrievingDataEventArgs class that you will also define:

public class RetrievingDataEventArgs : EventArgs
{
   public readonly int rowID;
   public RetrievingDataEventArgs( int rowID )
   {
      this.rowID = rowID;
   }
}

When the Cache has to ask for data, it fires the event:

 private string RetrieveData_CacheIt_ThenReturnElement(
     int rowIndex, int columnIndex )
 {
    // fire the event indicating that we had to get data for the cache
    if ( OnDataRetrieved != null )
    {
       RetrievingDataEventArgs args = new RetrievingDataEventArgs( rowIndex );
       OnDataRetrieved( this, args );
    }

The JITData class can register to receive the event just after it creates its Cache:

memoryCache = new Cache( dr, PageSize );
memoryCache.OnDataRetrieved += new Cache.RetrievedDataHandler( memoryCache_OnDataRetrieved );

To keep track of these events, add a new label next to the radio buttons, named lblCacheMsg and set its text to "Using Cache." The event handler will fill this label with the information about which row triggered the event:

void memoryCache_OnDataRetrieved( object dataSource, Cache.RetrievingDataEventArgs e )
{
   lblMsg.Text = "Retrieved data for row " + e.rowID;
}

Each time data is retrieved from the database the event is fired, and the label is updated, as shown in Figure 2.

Catching the Event
Figure 2. Catching the event

Testing with a Very Large Database

If you have this working with Northwind, you'll want to create a new database with lots of records, to prove to yourself that this kind of caching can be very efficient. To do so, I've whipped together a quick-and-dirty utility program to create additional records in a copy of the Northwind database. Here's how you do it. First, make a copy of Northwind (the easiest way is to backup the database and then restore it to a new name, e.g., NWBig.) Then create a new Windows Project named BigNWBuilder (or just download the finished project).

Rename the form to BigNWBuilder.cs, set its text property to "Big NW Builder" and set the form size to 225,275. Add the controls as indicated in Figure 3.

Builder Form
Figure 3. Adding controls to the Builder form

Note that the text box txtStartingNum and the label lblStartingNum both should have their Enabled and Visible properties set to false in the properties window.

For this quick-and-dirty utility, the tables that the program updates (Customers and Orders) are hard-wired. You are free, of course, to add user-interface elements to enable the user to populate other tables.

When the user clicks Go the starting number is retrieved from the database by checking how many records are already in the Customers table. That value is then placed in txtStartingNum which is then made visible along with its associated label. The number of records to add (retrieved from the UI) and the starting number are then passed to the UpdateDB method. The job UpdateDB is to insert values into the Customers table, and then to add from 1 to 5 orders for each of the newly added customers.

One tricky aspect to all this is that in Northwind (and thus in BigNW), the customer ID is a five-character string, which must be unique. To create these IDs, we'll generate a five-letter string by converting a counter to base 26 (where a = 0, b = 1, etc.). This can (and will) still generate IDs that conflict with existing records, but it will not create its own duplicates as it goes.

To keep the code simple we set the companyName to Company_XXXX and the contactName to Mr. JohnXXXX, where the final four characters are the newly generated company ID. We set the address to be the company ID with Main Street appended. All the other fields are set to hard-coded harmless values.

customerID = TranslateToBaseTwentySix( idNumber );
companyName = "Company_" + customerID;
contactName = "Mr. John " + customerID;
address = idNumber + " Main Street";

All of this is then inserted into the Customers table and errors are entered into the list box lbErrors.

try
{
   int numRowsAffected = command.ExecuteNonQuery();
}
catch (Exception dataException)
{
   this.lbErrors.Items.Add("Cust: " + idNumber + ": " +
      dataException.Message );
   this.lbErrors.SelectedIndex = lbErrors.Items.Count - 1;
   Application.DoEvents();
}

Once the customer is added, a pseudo-random number between 1 and 5 is generated, and that many orders are created for the customer:

try
{
   // add between 1 and 5 orders per user
   Random ran = new Random();
   int numOrders = ran.Next(1,6);
   for (int i = 0; i < numOrders; i++)
   {
      // set the shipping cost between $1 and $100
      int shipCost = ran.Next( 1, 101 );
      command.CommandText = "Insert orders values ('" +
         customerID + "',1" + ", GetDate(), GetDate(), GetDate(),1," + shipCost +
         ", 'UPS', '" + address + "','" + city + "','" + state +
         "','" + zip + "','" + country + "')";
      int numRowsAffected = command.ExecuteNonQuery();
   }     // end for
}        // end try

Every 500 records, we update the UI, which both keeps the user informed, and provides an opportunity for the stop button to work (DoEvents yields the thread).

if ( idNumber % 500 == 0 )
{
   this.lblStatus.Text = customerID + " (" + idNumber.ToString("N0") + ")";
   Application.DoEvents();
}

Generating Five-Letter Customer IDs

Translating the record number to base 26 requires the following steps (remembering that we want the value of record 1 to be 'aaaab' (where a = zero):

1. Determine the values of each of the five places.

const int theBase =        26;
const int squared =        theBase * theBase;
const int cubed =          squared * theBase;
const int fourthPower =    cubed * theBase;

2. Initialize the five characters to 'a' (zero).

char char5 = 'a', char4 = 'a', char3 = 'a', char2 = 'a', char1 = 'a';

3. Divide the original number by the fourth power and use the result (if greater than zero) as an offset into the array of characters. Decrement the working value by the amount you've now assigned.

fourthPowerQuotient = workingValue / fourthPower;
if ( fourthPowerQuotient > 0 )
{
   char1 = letters[fourthPowerQuotient];
   workingValue -= fourthPowerQuotient * fourthPower
}

4. Repeat these steps for the cubedQuotient, the SquaredQuotient and the BaseQuotient. Finally, set the fifth character to letter at the offset of the remaining value.

char5 = letters[workingValue];

5. Return the five characters as a concatenated string.

return char1.ToString() + char2.ToString() +
   char3.ToString() + char4.ToString() + char5.ToString();

Hey! Presto! A base 26 string that represents the record number.

I use this application to put 100,000 or more records into the database and then test that the JIT access displays the data quickly and efficiently (which it does).

Next Steps

The only downside to the solution shown here is that we are bypassing the standard data binding. This is fine, but if you want your users to be able to substitute a different kind of grid (e.g., an Infragistics UltraGrid [2]), this approach won't work well. (Infragistics has its own solution to this, as you can imagine.) What is needed is a generic solution that decouples getting the data from populating the grid.

In my next column, I'll show a more generic solution that will work with any grid, at the cost of doing much more work tracking the data (and caching it if you want) yourself.


Footnotes


[1] This is a silly aphorism; of course the egg had to come first, because mutations must be genetic, not acquired, and speciation is an outcome of mutation. Return to text

[2] I will use the Infragistics UltraGrid for two good reasons. First, it is enormously popular and many of my clients use it. Second, they gave me a really nice free evaluation package that works with VS 2005.

Jesse Liberty is a senior program manager for Microsoft Silverlight where he is responsible for the creation of tutorials, videos and other content to facilitate the learning and use of Silverlight. Jesse is well known in the industry in part because of his many bestselling books, including O'Reilly Media's Programming .NET 3.5, Programming C# 3.0, Learning ASP.NET with AJAX and the soon to be published Programming Silverlight.


Read more Liberty on Beta 2 columns.

Return to WindowsDevCenter.com

Copyright © 2009 O'Reilly Media, Inc.