
## Manipulating Fixed-Width Integer Data Types

by Michael Barr, author of Programming Embedded Systems in C and C++
10/07/2003
In the process of manipulating memory-mapped I/O registers, embedded programmers who use C or C++ often require fixed-size integer data types that the language standards have traditionally not provided. Here's a new look at this old problem, complete with a final solution to the issue of naming fixed-width integer data types.

Computer programmers don't always care much how wide an integer is when held by the processor. For example, when we write,
``for (int i=0; i < N; i++) { ... }``

we generally expect our compiler to generate the most efficient code possible, whether that makes the loop counter an 8-bit, 16-bit, 32-bit, or even 64-bit quantity.

So long as it's wide enough to hold the maximum value, in this case N, we'd like the processor to be used as efficiently as possible. And that's precisely what the ISO C and C++ standards leave room for: a plain `int` has the natural size suggested by the target architecture, subject only to a guaranteed minimum of 16 bits. Because integer sizes vary from processor to processor, and the standards leave the final choice to the compiler writer, the code above may result in a 32-bit integer with one compiler but a 16-bit integer with another, even if the very same processor is targeted.
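To see what a particular compiler has chosen, a small sketch like the one below can report the widths of the basic integer types. The names `bits_in`, `short_bits`, `int_bits`, and `long_bits` are hypothetical helpers; the only facts they rely on are that `CHAR_BIT` (from limits.h) gives the number of bits in a `char` and that `sizeof` counts in units of `char`.

```c
#include <limits.h>   /* CHAR_BIT: number of bits in a char */
#include <stddef.h>   /* size_t */

/* Hypothetical helper: convert a sizeof() result into a width in bits.
   This counts storage bits, which on unusual targets may include padding. */
#define bits_in(type)  ((size_t) (sizeof(type) * CHAR_BIT))

/* Widths chosen by *this* compiler for *this* target. The standard only
   guarantees minimums: short and int at least 16 bits, long at least 32. */
size_t short_bits(void) { return bits_in(short); }
size_t int_bits(void)   { return bits_in(int);   }
size_t long_bits(void)  { return bits_in(long);  }
```

With a typical 32- or 64-bit desktop compiler, `int_bits()` returns 32, whereas many compilers for 8- and 16-bit microcontrollers return 16.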

But there are many other programming situations in which integer size does matter. Embedded programming, in particular, often involves considerable manipulation of integer data of fixed widths. The most obvious example of this is the use of memory-mapped I/O to read and write peripheral control and status registers. In the process of using memory-mapped I/O it is quite common to write code like that in Example 1.

Example 1. Memory-mapped I/O example

``````
typedef struct
{
    unsigned int  count;        // Current count register;  offset 0x00.
    unsigned int  max;          // Maximum count register;  offset 0x02.
    unsigned int  _reserved;    // Unused 16-bit register;  offset 0x04.
    unsigned int  flags;        // Control/status register; offset 0x06.

} Counter;

Counter volatile * const  pCounter = (Counter volatile *) 0x10000000;  // Chip base address.

...

pCounter->max   = 5000;   // Count from 0 to 5000, then interrupt.
pCounter->flags |= GO;    // Start the timer (GO is a bit mask defined elsewhere).

...

if (pCounter->flags & DONE)   // Test the DONE bit without writing to the register.
{
    // Handle the expired timer.
}

...
``````

In this example we first declare a struct that represents the registers in a timer/counter chip. We then declare a pointer to a data structure of that type, and initialize it to point at the memory address assigned (by the hardware designer) to the counter chip. After that setup, we can use the pointer to read and write the registers within the chip.

The obvious advantage of implementing memory-mapped I/O this way is that the compiler automates the calculation of the offsets of the individual registers within the chip. The compiler also automatically adds each register's offset to the base address. Because this arithmetic is done at compile time, a readable line of code like:

``pCounter->max = 5000;``

will be executed just as efficiently as the more cryptic:

``*((unsigned int volatile *) 0x10000002) = 5000;``

But what if you port this code to a new compiler or target processor? Will the new compiler agree with the old one that those unsigned integers are two bytes wide and calculate the offsets appropriately? How can you tell the new compiler more precisely what you want in this case? Other integer names like `short` and `long` won't help, since the sizes of those types are also left up to the compiler writer, subject to a small set of restrictions relative to the size they've chosen for `int`.
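One way to answer these questions for a given compiler is to ask it directly with the standard `offsetof` macro from stddef.h. The sketch below repeats the struct from Example 1 and reports where the compiler actually placed two of the registers; the function names are hypothetical. It assumes, as virtually every compiler does in practice, that no padding is inserted between consecutive members of the same type.

```c
#include <stddef.h>   /* offsetof, size_t */

/* The register block from Example 1, unchanged: every register is a plain
   unsigned int, so its width is whatever this compiler chose for int. */
typedef struct
{
    unsigned int  count;
    unsigned int  max;
    unsigned int  _reserved;
    unsigned int  flags;
} Counter;

/* Byte offset the compiler assigns to 'max'. The hardware expects 0x02;
   in practice the compiler produces sizeof(unsigned int). */
size_t counter_max_offset(void)
{
    return offsetof(Counter, max);
}

/* Byte offset of 'flags'. The hardware expects 0x06. */
size_t counter_flags_offset(void)
{
    return offsetof(Counter, flags);
}
```

On a compiler with 16-bit `int`, the results match the hardware (0x02 and 0x06); with 32-bit `int` they come out as 0x04 and 0x0C, so every access after `count` reaches the wrong register.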

### Standard Names

There's nothing new about this problem, of course. And solutions have been around for decades too. The approach that's generally taken to solve the problem is to define a new data type for each fixed-size integer that you will use in your programs. For example, we might define:

``typedef unsigned int  uint16;``

in a header file and then declare each of the registers in the struct as `uint16`. By using `char`, `short`, `long`, and compiler-specific knowledge, you can easily define both signed and unsigned 8-, 16-, and 32-bit integer data types. And if the compiler or target processor does later change, only the typedefs need be modified to correct all of the fixed integer size requests throughout what might be a very large set of source files.

This is a good solution and one that's widely used. The problem, however, has always been that there is little agreement on the names for these fixed-size integer typedefs. To date, I've seen all of the following names used for signed 16-bit integers in production code and coding standards: `INT16`, `int16`, `INT16S`, and `INT2`, the latter scheme placing the emphasis on the number of bytes rather than bits. I'm sure there are other names in use elsewhere.

In hindsight, it sure would have been nice if the authors of the C standard had defined some standard names and made compiler providers responsible for providing the appropriate typedef for each fixed-size integer type in a library header file. Alternatively, of course, the C standard could have specified (as Java does) that each of the types `short`, `int`, and `long` has a standard width on all platforms; but that can have an impact on performance, particularly on 8-bit processors that must implement 16- and 32-bit additions in multi-instruction sequences.

Interestingly, it turns out that the authors of a 1999 update to the ISO C standard (hereafter "C99") did the former. It seems ISO has finally put the weight of its standard behind a preferred set of names for signed and unsigned fixed-size integer data types. The newly defined type names are as follows:

``````
8-bit:  int8_t   uint8_t
16-bit: int16_t  uint16_t
32-bit: int32_t  uint32_t
64-bit: int64_t  uint64_t
``````

According to the updated standard, these typedefs (along with some others) are to be defined by compiler vendors in the new header file stdint.h. Strictly speaking, each exact-width type is required only when the target actually has an integer type of that width with no padding bits (and, for the signed types, a two's complement representation), which covers virtually every mainstream processor. If you are already using a C99-compliant compiler, declaring a fixed-size variable or register is now as easy as using one of the new type names. But even if you don't have an updated compiler, the inclusion of these names in the C99 standard suggests it's a good time to update your coding standards and practices.
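For example, the register block from Example 1 might be rewritten with the new names as in the sketch below. Each register is now exactly 16 bits wide on any compiler that provides `uint16_t`, so the offsets in the comments hold in practice regardless of how wide that compiler makes `int`. The helper `counter_start` and its `go_mask` parameter are hypothetical, standing in for the GO bit of the original example.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint16_t (C99) */

/* Example 1's register block, rewritten with C99 fixed-width types. */
typedef struct
{
    uint16_t  count;        // Current count register;  offset 0x00.
    uint16_t  max;          // Maximum count register;  offset 0x02.
    uint16_t  _reserved;    // Unused 16-bit register;  offset 0x04.
    uint16_t  flags;        // Control/status register; offset 0x06.
} Counter;

/* Hypothetical helper: program the limit, then set the start bit(s).
   'go_mask' stands in for the GO bit mask of the original example. */
void counter_start(Counter volatile *regs, uint16_t limit, uint16_t go_mask)
{
    regs->max    = limit;
    regs->flags |= go_mask;
}
```

Because the helper takes a pointer, it can be exercised against an ordinary `Counter` in RAM on a host PC before it is ever pointed at the real chip.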

Love them or hate them, at least these new names are part of an accepted international standard for C programmers. In the future, it will be far easier to port C programs that require fixed-size integers to other compilers and platforms as a direct result. In addition, modules that are reused or sold with source can be more easily understood when they conform to standard naming and typing conventions like this.

Of course, if you don't have a C99-compliant compiler yet, you'll still have to write your own set of typedefs, using compiler-specific knowledge of the `char`, `short`, and `long` primitive widths. I recommend putting these typedefs in a header file of your own design, and adding the anonymous union declaration shown in Example 2 to a linked source module to check their sizes; that is, to gently "remind" whoever might someday have the task of porting your code.
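A minimal sketch of such a header follows. The file name my_stdint.h is hypothetical. Rather than hard-coding one compiler's sizes, it tests the limits in limits.h to choose among `char`, `short`, `int`, and `long`, and stops the build with `#error` on a target where no primitive type matches. It covers only the 8-, 16-, and 32-bit types and is intended only for compilers that lack stdint.h; it should not be mixed with a real one.

```c
/* my_stdint.h: hypothetical fallback for compilers without <stdint.h>. */
#ifndef MY_STDINT_H
#define MY_STDINT_H

#include <limits.h>   /* UCHAR_MAX, USHRT_MAX, UINT_MAX, ULONG_MAX */

/* 8-bit types: only possible when a char is exactly 8 bits. */
#if UCHAR_MAX == 0xFF
typedef signed char     int8_t;
typedef unsigned char   uint8_t;
#else
#error "No 8-bit integer type available"
#endif

/* 16-bit types: prefer short, fall back to int (common on 8/16-bit CPUs). */
#if USHRT_MAX == 0xFFFF
typedef short           int16_t;
typedef unsigned short  uint16_t;
#elif UINT_MAX == 0xFFFF
typedef int             int16_t;
typedef unsigned int    uint16_t;
#else
#error "No 16-bit integer type available"
#endif

/* 32-bit types: prefer int, fall back to long. */
#if UINT_MAX == 0xFFFFFFFF
typedef int             int32_t;
typedef unsigned int    uint32_t;
#elif ULONG_MAX == 0xFFFFFFFF
typedef long            int32_t;
typedef unsigned long   uint32_t;
#else
#error "No 32-bit integer type available"
#endif

#endif /* MY_STDINT_H */
```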

Example 2. An anonymous union allows the compiler to report typedef errors automatically

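``````
#include "my_stdint.h"   // Hypothetical name for your own typedef header.

// Each array has size 1 if its typedef has the expected size, and size -1
// (a compile-time error) if it does not. A size of 0 is not used for the
// failure case because some compilers, GCC among them, accept zero-length
// arrays as an extension and would let a wrong typedef slip through.
static union
{
    char    int8_t_incorrect[(sizeof(  int8_t) == 1) ? 1 : -1];
    char   uint8_t_incorrect[(sizeof( uint8_t) == 1) ? 1 : -1];

    char   int16_t_incorrect[(sizeof( int16_t) == 2) ? 1 : -1];
    char  uint16_t_incorrect[(sizeof(uint16_t) == 2) ? 1 : -1];

    char   int32_t_incorrect[(sizeof( int32_t) == 4) ? 1 : -1];
    char  uint32_t_incorrect[(sizeof(uint32_t) == 4) ? 1 : -1];
};
``````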
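On a compiler that supports C11 (a later revision of the standard than the one discussed here), the same checks can be written more directly with the `_Static_assert` keyword, which also lets you supply the error message. This is an alternative to the union technique above, not a replacement for it on older compilers; like Example 2, it assumes 8-bit bytes.

```c
#include <stdint.h>   /* Or your own header defining these typedefs. */

/* C11 and later: each line fails to compile, printing its message,
   if the corresponding typedef has the wrong size. */
_Static_assert(sizeof(int8_t)   == 1, "int8_t must be 1 byte");
_Static_assert(sizeof(uint8_t)  == 1, "uint8_t must be 1 byte");
_Static_assert(sizeof(int16_t)  == 2, "int16_t must be 2 bytes");
_Static_assert(sizeof(uint16_t) == 2, "uint16_t must be 2 bytes");
_Static_assert(sizeof(int32_t)  == 4, "int32_t must be 4 bytes");
_Static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");
```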

### Conclusion

The new C99 standard has made a number of minor improvements to the C programming language. Among these, the most important for embedded programmers to understand are the contents of the new stdint.h header file. There's lots more there than I've had room to mention in this brief article. Further research is left as an exercise for the reader.
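As a small taste of those extras, the sketch below uses three of them: a "fast" type for a loop counter like the one at the start of this article, a "least" type for a lookup table, and the `INT16_MIN`/`INT16_MAX` limit macros. The functions `sum_below`, `table_entry`, and `clamp_to_int16` are hypothetical helpers for illustration.

```c
#include <stdint.h>   /* uint_fast16_t, uint_least8_t, INT16_MAX, ... */

/* "Fast" types: at least the given width, but whatever width the compiler
   considers fastest on this target (possibly wider). */
uint_fast32_t sum_below(uint_fast16_t n)
{
    uint_fast32_t total = 0;
    for (uint_fast16_t i = 0; i < n; i++)
    {
        total += i;
    }
    return total;
}

/* "Least" types: the smallest type with at least the given width,
   useful for tables where memory matters more than speed. */
static const uint_least8_t small_table[4] = { 1, 2, 4, 8 };

uint_least8_t table_entry(int index)
{
    return small_table[index & 3];   /* Mask keeps the index in range. */
}

/* Limit macros: INT16_MIN and INT16_MAX give the range of int16_t. */
int16_t clamp_to_int16(int32_t value)
{
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return (int16_t) value;
}
```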

Michael Barr is a leading authority on the design of embedded computer systems. He has provided expert testimony in court, appeared on the PBS show "American Business Review", and been quoted in newspaper articles. Barr is also the author of more than forty technical articles, co-author of the "Embedded Systems Dictionary", and founder of Embedipedia.net.

O'Reilly & Associates published Programming Embedded Systems in C and C++ in January 1999.