Variable length arrays (VLAs) are used to allocate an array on the stack whose number of elements is known only at runtime, for example:
#include <string.h> // memcpy

void merge(int* tab, int size, int first) {
    int res[size]; // allocate buffer for merging on the stack
    /* Code that merges two sorted ranges [tab, tab + first) and [tab + first, tab + size)
     * into the res buffer.
     * ....
     */
    // copy merged data back into the input
    memcpy(tab, res, size * sizeof(int));
}
The syntax follows that of an array with a static size (where the size is a constant), and the value placed in the square brackets
([size] in the above example) determines the size of the array. For both static and runtime-sized arrays, the size is required to be
greater than zero. However, in the case of VLAs, the compiler cannot check this in general, and any program that creates such an array
with a size that is zero or negative has undefined behavior.
void example() {
    int s = -1;
    int vla[s]; // program compiles, but has undefined behavior
    int arr[-1]; // program is ill-formed
}
This defect may also manifest when the variable used as the size is not initialized. An uninitialized variable may hold zero, a negative
value, or any other value, and that value may differ each time the program executes.
void uninitialized() {
    int s; // uninitialized, the value is indeterminate
    int vla[s]; // program compiles, but has undefined behavior if the value read from s is negative or zero
}
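One way to avoid both situations above is to check the size before the declaration of the VLA is reached. The following is a minimal sketch, not a definitive fix: the function name copy_through_vla and the limit MAX_VLA_ELEMS are hypothetical names introduced for illustration, and the upper bound is an extra assumption added to keep the stack usage predictable.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical upper bound chosen for this sketch; adjust to the available stack. */
#define MAX_VLA_ELEMS 1024

/* Copies size ints from src to dst through a temporary VLA.
 * Returns false, without creating the VLA, when size is not in [1, MAX_VLA_ELEMS]. */
bool copy_through_vla(int *dst, const int *src, int size) {
    if (size <= 0 || size > MAX_VLA_ELEMS) {
        return false; /* the VLA declaration below is never reached */
    }
    int tmp[size]; /* size is now known to be positive and bounded */
    memcpy(tmp, src, (size_t)size * sizeof(int));
    memcpy(dst, tmp, (size_t)size * sizeof(int));
    return true;
}
```

Because the size is a parameter that is validated on entry, the function never declares the array with a zero, negative, or unexpectedly large size, and callers can react to the rejected value.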
A non-positive VLA size may also result from using a value that the program loads from a file or another external resource, without
validating it first. Such values are usually referred to as tainted.
void loadFromInput() {
    int size = -1;
    scanf("%d", &size); // loads size from input
    char bytes[size]; // size may be negative
    /* ... */
}
The above code leads to undefined behavior when the value of size loaded from the input is not greater than zero. This may
happen due to source data corruption, an accidental user mistake, or malicious action.
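A tainted value can be validated before it is used as a size. The sketch below assumes the input has already been read as text. The helper parse_vla_size and the limit MAX_INPUT_SIZE are hypothetical names introduced for illustration, and the upper limit is an assumption, not a requirement stated above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical upper limit for sizes accepted from external input. */
#define MAX_INPUT_SIZE 4096

/* Parses text as a decimal size. Stores the result in *out and returns true
 * only when the whole string is a number in [1, MAX_INPUT_SIZE]. */
bool parse_vla_size(const char *text, int *out) {
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return false; /* no digits at all, or trailing characters */
    }
    if (errno == ERANGE) {
        return false; /* value does not fit in a long */
    }
    if (value <= 0 || value > MAX_INPUT_SIZE) {
        return false; /* zero, negative, or too large for this sketch */
    }
    *out = (int)value;
    return true;
}
```

A function such as loadFromInput could read a line, remove the trailing newline, call this helper, and declare the array only when it returns true. Note that this sketch rejects a string that still ends with a newline character.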
What is the potential impact?
Creating a variable length array with a size less than or equal to zero leads to undefined behavior. This means the compiler is no longer
bound by the language standard, and your program has no meaning assigned to it.
In practice, this can have a wide range of effects, including:
- crashes, in particular segmentation faults, when the program accesses memory that it is not allowed to,
- memory corruption and data loss, when the program overwrites bytes that store data or executable code,
- stack overflow, when a negative size is interpreted as a large positive number.
Furthermore, when the VLA size depends on user input, this can introduce vulnerabilities into the program.
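The stack overflow case can be illustrated with ordinary unsigned arithmetic. What a compiler actually does with a negative VLA size is undefined, so the following is only a sketch of one plausible mechanism. The function naive_byte_count is a hypothetical name introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte count that a naive (size_t)n * elem_size computation would produce. */
size_t naive_byte_count(int n, size_t elem_size) {
    /* Converting a negative int to size_t is well defined:
     * the value wraps modulo SIZE_MAX + 1. */
    return (size_t)n * elem_size;
}
```

With n equal to -1 and elem_size equal to 1, the result is SIZE_MAX, far more than any stack can hold. This is why a small negative size can behave like a request for an enormous buffer.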