2013-05-07

## Ranged integer types and bounds checking

There’s a discussion going on in the Rust mailing list about ranged integer types and whether they’re useful for safety. Some of the points raised have been about the performance costs of dynamic bounds checks and the overhead of code for static bounds checks. I thought I’d write a post with my experiences with overfow checking in Gecko and with programming languages that can do overflow checking at compile time.

## Gecko

There have been places in Gecko (the core Firefox engine) where overflow detection at compile time would be useful. An example is in the WAV audio format reading code. In this C++ code we have the following code:

```
static const int64_t BLOCK_SIZE = 4096;
int64_t readSize = std::min(BLOCK_SIZE, remaining);
int64_t frames = readSize / mFrameSize;
const size_t bufferSize = static_cast<size_t>(frames * mChannels);
nsAutoArrayPtr<AudioDataValue> sampleBuffer(new AudioDataValue[bufferSize]);
```

Here `frames * mChannels`

can overflow with the result of the memory buffer allocated following it being of the wrong size and a memory access error occurring later. We can do a run time check that it won’t overflow but this is error prone if someone changes `BLOCK_SIZE`

to be of a size that causes overflow, or the size of `AudioDataValue`

changes to do the same. It’ll take an edge case in a test to find the problem. Better is to ensure that it will never overflow at compile time. The `PR_STATIC_ASSERT`

construct can do this for us:

```
PR_STATIC_ASSERT(uint64_t(BLOCK_SIZE) < UINT_MAX / sizeof(AudioDataValue) / MAX_CHANNELS);
const size_t bufferSize = static_cast<size_t>(frames * mChannels);
nsAutoArrayPtr<AudioDataValue> sampleBuffer(new AudioDataValue[bufferSize]);
```

Now if `BLOCK_SIZE`

, `AudioDataValue`

or `MAX_CHANNELS`

changes it will complain if the result overflows at compile time. Because C++ doesn’t have dependent types and constraint solving, we don’t have the ability to statically confirm that `frames * mChannels`

avoids overflow automatically. We have to encode how those values are calculated in the `PR_STATIC_ASSERT`

but it’s better than nothing.

Similar code follows to check `readSize`

:

```
PR_STATIC_ASSERT(uint64_t(BLOCK_SIZE) < UINT_MAX / sizeof(char));
nsAutoArrayPtr<char> dataBuffer(new char[static_cast<size_t>(readSize)]);
```

We encode the rule that `BLOCK_SIZE`

must not be greater than `UINT_MAX`

knowing that `readSize`

is capped at `BLOCK_SIZE`

.

Other uses of `PR_STATIC_ASSERT`

are in that same file and various other places in the codebase. For example, Bug 623998. This was caused by an overflow in the implementation of the C++ `operator new`

so we used `PR_STATIC_ASSERT`

to ensure that the argument passed to `operator new`

couldn’t overflow based on analysis of the constraints of the arguments we pass to it.

## ATS

The ATS programming language has a dependent type system that allows expressing ranges for integers at the static level. Unfortunately the current implementation uses machine integers in the constraint solver and this prevents easily expressing constraints involving overflow.

The Applied Operating System, an operating system kernel written in ATS, uses a patched ATS that uses libgmp for constraint solving and it declares types such overflow causes a type error. For example, the following code will not typecheck:

```
fun test () = {
val a = ubyte1_of 250
val c = ubyte1_of 6
val d = a + b
}
```

The type checker recognizes that `a`

is the exact unsigned byte value of `250`

and that `c`

is the exact type value of unsigned byte `6`

. Adding these together causing a `byte`

greater than `255`

which is a type error.

Constraints on types are propogated as expected. For example:

```
fun add_me {a,b: Ubyte} (a: ubyte a, b: ubyte b): ubyte (a+b) = a + b
fun test () = {
val x = add_me (ubyte1_of 200, ubyte1_of 10)
}
```

Here the arguments to `add_me`

cannot be determined statically (assume `test`

and `add_me`

are compiled seperately so the compiler does not know the exact values at compile time) and the type checker will complain about the `a+b`

since it can’t prove that any unsigned byte added to another unsigned byte is less than the maximum value of an unsigned byte.

A solution is to add to the static system to say that the arguments to `add_me`

add up to less than 255 and will not overflow and make callers check the range before calling:

```
fun add_me {a,b: Ubyte | a+b <= 255 } (a: ubyte a, b: ubyte b): ubyte (a+b)
= a + b
fun test () = {
val x = add_me (ubyte1_of 200, ubyte1_of 10)
}
```

With judicious use of constraints you can then push out the range check from the inner loops to the callers at the outmost region to reduce overhead. Or you can add dynamic checks close to the inner loops to reduce the burden on callers. The good thing is the compiler tells you when you get it wrong.

## ATS2

ATS2 is a work in progress with improvements and additions to ATS. You’ll need a recent version of ATS to build and then follow the instructions to get a working ATS2.

One of the improvements in ATS2 is allowing using libgmp to deal with constraints so range checks can be done to detect overflow. Hongwei Xi posted to the mailing list an example of using this to detect integer overflow in bsearch at compile time. This file can be typechecked with:

`patsopt -tc -d doc/EXAMPLE/MISC/arith_overflow.dats`

The `arith_overflow`

example starts off by declaring an abstract type, `intb`

to represent bounded integers:

`abst@ype intb (i: int) // bounded integers`

The type is indexed over a sort `int`

. This is a dependent type that represents the value of the bounded integer. Cast functions are provided to allow converting between a normal integer and a bounded integer:

```
castfn intb2int {i:int} (i: intb i): int (i)
castfn int2intb {i:int | isintb(i)} (i: int i): intb (i)
```

The `int2intb`

cast function has a contraint on its type index saying that the given integer is a bounded int via the `isintb`

call. This is defined as:

```
stacst INTMIN : int and INTMAX : int
stadef isintb (i:int): bool = (INTMIN <= i && i <= INTMAX)
praxi lemma_INTMINMAX (): [INTMIN < ~0X7FFF ; INTMAX >= 0x7FFF] void
```

The way this works is two sorts constants are declared, INTMIN and INTMAX. The definition `isintb`

constrains the index `i`

to be between these two values. The values are not yet defined so all the type system knows for now is that `isintb`

returns true if the given number is between two other numbers.

The axiom (introduced with `praxi`

and is equivalent to a `prfun`

or proof function) `lemma_INTMINMAX`

states that INTMIN is < -0x&FFFF and INTMAX >= 0x7FFF. Once this proof function is called it tells the constraint system what the `intb`

is bounded by.

Basic mathematic operations are provided for a bounded integer and the standard mathematical operators are overloaded with these. For example, adding two bounded integers:

```
fun add_intb_intb {i,j:int | isintb(i+j)} (i: intb (i), j: intb (j)):<> intb (i+j)
overload + with add_intb_intb
```

This is similar to the `add_me`

function shown previously using the modified ATS compiler. It adds two bounded ints with the contraint that the result must also be a bounded int. Because the type system uses `libgmp`

for the contraints it is able to ensure that the type system itself doesn’t overflow when doing this check.

With these definitions for bounded ints in place it’s possible to write code that will not typecheck if overflow occurs. The example in `arith_overflow`

implements `bsearch`

. Within the `bsearch`

implementation it has the follow code:

`val m = (l + r) / 2`

Here `l`

and `r`

are bounded ints. Adding `l`

and `r`

can overflow and this will fail to compile. Changing the code to do effectively the same thing but avoiding adding the two numbers to prevent overflow results in compiling because the type checker can prove no overflow occurs:

`val m = l + half (r - l)`

Where `half`

is defined to return the number divided by two:

```
fun half_intb {i:nat} (i: intb (i)):<> intb (ndiv(i,2))
overload half with half_intb
```

## Summary

I think overflow checking at the type level is a very useful feature. It helps find important bugs early as people hardly ever check for overflow. When they do it’s at runtime and the code is often not exercised until an issue actually occurs so may not handle things correctly.

For Mozilla C++ code we can use `PR_STATIC_ASSERT`

(or preferably the recently introduced MOZ_STATIC_ASSERT) to help get some of the benefits of compile time overflow checking.