2013-05-07
Ranged integer types and bounds checking
There's a discussion going on in the Rust mailing list about ranged integer types and whether they're useful for safety. Some of the points raised have been about the performance costs of dynamic bounds checks and the overhead of code for static bounds checks. I thought I'd write a post with my experiences with overfow checking in Gecko and with programming languages that can do overflow checking at compile time.
Gecko
There have been places in Gecko (the core Firefox engine) where overflow detection at compile time would be useful. An example is in the WAV audio format reading code. In this C++ code we have the following code:
static const int64_t BLOCK_SIZE = 4096;
int64_t readSize = std::min(BLOCK_SIZE, remaining);
int64_t frames = readSize / mFrameSize;
const size_t bufferSize = static_cast<size_t>(frames * mChannels);
nsAutoArrayPtr<AudioDataValue> sampleBuffer(new AudioDataValue[bufferSize]);
Here frames * mChannels
can overflow with the result of the memory buffer allocated following it being of the wrong size and a memory access error occurring later. We can do a run time check that it won't overflow but this is error prone if someone changes BLOCK_SIZE
to be of a size that causes overflow, or the size of AudioDataValue
changes to do the same. It'll take an edge case in a test to find the problem. Better is to ensure that it will never overflow at compile time. The PR_STATIC_ASSERT
construct can do this for us:
PR_STATIC_ASSERT(uint64_t(BLOCK_SIZE) < UINT_MAX / sizeof(AudioDataValue) / MAX_CHANNELS);
const size_t bufferSize = static_cast<size_t>(frames * mChannels);
nsAutoArrayPtr<AudioDataValue> sampleBuffer(new AudioDataValue[bufferSize]);
Now if BLOCK_SIZE
, AudioDataValue
or MAX_CHANNELS
changes it will complain if the result overflows at compile time. Because C++ doesn't have dependent types and constraint solving, we don't have the ability to statically confirm that frames * mChannels
avoids overflow automatically. We have to encode how those values are calculated in the PR_STATIC_ASSERT
but it's better than nothing.
Similar code follows to check readSize
:
PR_STATIC_ASSERT(uint64_t(BLOCK_SIZE) < UINT_MAX / sizeof(char));
nsAutoArrayPtr<char> dataBuffer(new char[static_cast<size_t>(readSize)]);
We encode the rule that BLOCK_SIZE
must not be greater than UINT_MAX
knowing that readSize
is capped at BLOCK_SIZE
.
Other uses of PR_STATIC_ASSERT
are in that same file and various other places in the codebase. For example, Bug 623998. This was caused by an overflow in the implementation of the C++ operator new
so we used PR_STATIC_ASSERT
to ensure that the argument passed to operator new
couldn't overflow based on analysis of the constraints of the arguments we pass to it.
ATS
The ATS programming language has a dependent type system that allows expressing ranges for integers at the static level. Unfortunately the current implementation uses machine integers in the constraint solver and this prevents easily expressing constraints involving overflow.
The Applied Operating System, an operating system kernel written in ATS, uses a patched ATS that uses libgmp for constraint solving and it declares types such overflow causes a type error. For example, the following code will not typecheck:
fun test () = {
val a = ubyte1_of 250
val c = ubyte1_of 6
val d = a + b
}
The type checker recognizes that a
is the exact unsigned byte value of 250
and that c
is the exact type value of unsigned byte 6
. Adding these together causing a byte
greater than 255
which is a type error.
Constraints on types are propogated as expected. For example:
fun add_me {a,b: Ubyte} (a: ubyte a, b: ubyte b): ubyte (a+b) = a + b
fun test () = {
val x = add_me (ubyte1_of 200, ubyte1_of 10)
}
Here the arguments to add_me
cannot be determined statically (assume test
and add_me
are compiled seperately so the compiler does not know the exact values at compile time) and the type checker will complain about the a+b
since it can't prove that any unsigned byte added to another unsigned byte is less than the maximum value of an unsigned byte.
A solution is to add to the static system to say that the arguments to add_me
add up to less than 255 and will not overflow and make callers check the range before calling:
fun add_me {a,b: Ubyte | a+b <= 255 } (a: ubyte a, b: ubyte b): ubyte (a+b)
= a + b
fun test () = {
val x = add_me (ubyte1_of 200, ubyte1_of 10)
}
With judicious use of constraints you can then push out the range check from the inner loops to the callers at the outmost region to reduce overhead. Or you can add dynamic checks close to the inner loops to reduce the burden on callers. The good thing is the compiler tells you when you get it wrong.
ATS2
ATS2 is a work in progress with improvements and additions to ATS. You'll need a recent version of ATS to build and then follow the instructions to get a working ATS2.
One of the improvements in ATS2 is allowing using libgmp to deal with constraints so range checks can be done to detect overflow. Hongwei Xi posted to the mailing list an example of using this to detect integer overflow in bsearch at compile time. This file can be typechecked with:
patsopt -tc -d doc/EXAMPLE/MISC/arith_overflow.dats
The arith_overflow
example starts off by declaring an abstract type, intb
to represent bounded integers:
abst@ype intb (i: int) // bounded integers
The type is indexed over a sort int
. This is a dependent type that represents the value of the bounded integer. Cast functions are provided to allow converting between a normal integer and a bounded integer:
castfn intb2int {i:int} (i: intb i): int (i)
castfn int2intb {i:int | isintb(i)} (i: int i): intb (i)
The int2intb
cast function has a contraint on its type index saying that the given integer is a bounded int via the isintb
call. This is defined as:
stacst INTMIN : int and INTMAX : int
stadef isintb (i:int): bool = (INTMIN <= i && i <= INTMAX)
praxi lemma_INTMINMAX (): [INTMIN < ~0X7FFF ; INTMAX >= 0x7FFF] void
The way this works is two sorts constants are declared, INTMIN and INTMAX. The definition isintb
constrains the index i
to be between these two values. The values are not yet defined so all the type system knows for now is that isintb
returns true if the given number is between two other numbers.
The axiom (introduced with praxi
and is equivalent to a prfun
or proof function) lemma_INTMINMAX
states that INTMIN is < -0x&FFFF and INTMAX >= 0x7FFF. Once this proof function is called it tells the constraint system what the intb
is bounded by.
Basic mathematic operations are provided for a bounded integer and the standard mathematical operators are overloaded with these. For example, adding two bounded integers:
fun add_intb_intb {i,j:int | isintb(i+j)} (i: intb (i), j: intb (j)):<> intb (i+j)
overload + with add_intb_intb
This is similar to the add_me
function shown previously using the modified ATS compiler. It adds two bounded ints with the contraint that the result must also be a bounded int. Because the type system uses libgmp
for the contraints it is able to ensure that the type system itself doesn't overflow when doing this check.
With these definitions for bounded ints in place it's possible to write code that will not typecheck if overflow occurs. The example in arith_overflow
implements bsearch
. Within the bsearch
implementation it has the follow code:
val m = (l + r) / 2
Here l
and r
are bounded ints. Adding l
and r
can overflow and this will fail to compile. Changing the code to do effectively the same thing but avoiding adding the two numbers to prevent overflow results in compiling because the type checker can prove no overflow occurs:
val m = l + half (r - l)
Where half
is defined to return the number divided by two:
fun half_intb {i:nat} (i: intb (i)):<> intb (ndiv(i,2))
overload half with half_intb
Summary
I think overflow checking at the type level is a very useful feature. It helps find important bugs early as people hardly ever check for overflow. When they do it's at runtime and the code is often not exercised until an issue actually occurs so may not handle things correctly.
For Mozilla C++ code we can use PR_STATIC_ASSERT
(or preferably the recently introduced MOZ_STATIC_ASSERT) to help get some of the benefits of compile time overflow checking.