Creating a Variable Length Array (VLA) with a size that is not greater than zero leads to undefined behavior.
Why is this an issue?
Variable length arrays are used to allocate a buffer on the stack whose number of elements is known only at runtime, for example:
void merge(int* tab, int size, int first) {
  int res[size]; // allocate buffer for merging on stack
  /* Code that merges two sorted ranges [tab, tab + first) and [tab + first, tab + size)
   * into res buffer.
   * ....
   */
  // copy merged data into input
  memcpy(tab, res, size * sizeof(int));
}
The syntax follows the one used by an array with static size (where size is a constant), and the value placed in the square brackets
([size] in the above example) determines the size of the array. For both static and runtime-sized arrays, the size is required to be
greater than zero. However, in the case of VLAs, this cannot be checked by the compiler, and any program that creates such an array with a size that
is zero or negative has undefined behavior.
void example() {
  int s = -1;
  int vla[s]; // program compiles, but has undefined behavior
  int arr[-1]; // program is ill-formed
}
This defect may also manifest when the variable used as the size is not initialized. An uninitialized variable has an indeterminate value: it may be
zero, negative, or anything else, and it may differ each time the program executes.
void uninitialized() {
  int s; // uninitialized, the value is not determined
  int vla[s]; // program compiles, but has undefined behavior if the value read from s is negative or zero
}
A non-positive VLA size may also result from using a value that the program loads from a file or another external resource as the size
without validating it first. Such values are usually referred to as being tainted.
void loadFromInput() {
  int size = -1;
  scanf("%d", &size); // loads size from input
  char bytes[size]; // size may be negative
  /* ... */
}
The above code will lead to undefined behavior when the size loaded from the input is not greater than zero. This may
happen due to source data corruption, accidental user mistake, or malicious action.
What is the potential impact?
Creating a variable length array with a size smaller than or equal to zero leads to undefined behavior. This means the compiler is not bound by the
language standard anymore, and your program has no meaning assigned to it.
In practice, this can have a wide range of effects, including:
- crashes, in particular segmentation faults, when the program accesses memory that it is not allowed to,
- memory corruption and data loss, when the program overwrites bytes responsible for storing data or executable code,
- stack overflow, when a negative size is interpreted as a large positive number.
Furthermore, when the VLA size depends on user input, this can lead to vulnerabilities in the program.
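The last effect in the list can be illustrated without creating a VLA at all. The sketch below assumes an implementation that converts the element count to size_t before computing the number of bytes to reserve; the standard does not mandate this (the behavior is simply undefined), and the helper name vla_bytes_if_unsigned is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the byte count that would be requested if a signed
 * element count were converted to size_t before the multiplication.
 * No VLA is created here, so this function itself has defined behavior. */
size_t vla_bytes_if_unsigned(int count, size_t elem_size) {
    /* Conversion to size_t and the multiplication both wrap modulo SIZE_MAX + 1. */
    return (size_t)count * elem_size;
}
```

Under that assumption, a count of -1 with one-byte elements becomes a request for SIZE_MAX bytes, far more than any stack can provide.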
How to fix it
Ensure that the size of the variable length array is always greater than zero. If the value depends on user input (is tainted),
prefer to allocate the memory on the heap or, if that is not possible, include appropriate checks to ensure that the value is positive and bounded before
using it as the size of a VLA.
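As a minimal sketch of the second option (keeping the VLA but checking the size first), the code below rejects zero, negative, and oversized values before the array is declared. The names MAX_VLA_ELEMENTS, is_valid_vla_size, and sum_first are hypothetical, and the bound of 1024 is an arbitrary illustrative value that would need to be chosen from the stack budget of the actual target.

```c
#include <stdbool.h>

/* Hypothetical upper bound on the element count; pick it per target. */
enum { MAX_VLA_ELEMENTS = 1024 };

/* Returns true only when n is acceptable as a VLA element count. */
bool is_valid_vla_size(int n) {
    return n > 0 && n <= MAX_VLA_ELEMENTS;
}

/* Sums the integers 1..n using a VLA as scratch space.
 * Returns -1 when n is rejected by the size check. */
long sum_first(int n) {
    if (!is_valid_vla_size(n)) {
        return -1; /* zero, negative, or too large: never reach the VLA */
    }
    int scratch[n]; /* n is known to be in [1, MAX_VLA_ELEMENTS] here */
    long total = 0;
    for (int i = 0; i < n; ++i) {
        scratch[i] = i + 1;
        total += scratch[i];
    }
    return total;
}
```

The check is placed before the declaration on purpose: once the declaration of a VLA with an invalid size is reached, the behavior is already undefined, so validating afterwards is too late.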
Code examples
Noncompliant code example
A zero-size VLA is created due to an off-by-one error.
void zeroSize() {
  for (int i = 0; i <= 5; ++i) {
    int buffer[5 - i]; // Noncompliant: buffer size is zero for last iteration
    /* ... */
  }
}
Compliant solution
void zeroSize() {
  for (int i = 0; i < 5; ++i) {
    int buffer[5 - i]; // Compliant
    /* ... */
  }
}
Noncompliant code example
The size is not initialized in one of the code paths.
int countGrapheneClusters(int codeUnits, int bytesInCodeUnit) {
  int size;
  if (bytesInCodeUnit > 0) {
    size = codeUnits * bytesInCodeUnit;
  }
  char buffer[size]; // Noncompliant: size may not be initialized
  // ....
  return 0;
}
Compliant solution
int countGrapheneClusters(int codeUnits, int bytesInCodeUnit) {
  int size;
  if (bytesInCodeUnit > 0) {
    size = codeUnits * bytesInCodeUnit;
  } else {
    return -1;
  }
  char buffer[size]; // Compliant: size is always initialized
  // ....
  return 0;
}
Noncompliant code example
int taintedSize() {
  int size = 0;
  scanf("%d", &size); // size value is loaded from input, value is tainted here
  char bytes[size]; // Noncompliant: size may be negative
  /* ... */
  return 0;
}
Compliant solution
Allocate memory on the heap instead of using a variable length array.
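int taintedSize() {
  int size = 0;
  scanf("%d", &size);
  if (size <= 0 || size > 1000) {
    return -1;
  }
  char* bytes = (char*)malloc(size); // Compliant: uses heap allocation
  if (bytes == NULL) {
    return -1; // allocation failure is reported and can be handled
  }
  /* ... */
  free(bytes);
  return 0;
}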
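When the size arrives as text (for example, a command-line argument or a field read from a file), the same validation can be done with strtol, which additionally makes it possible to detect trailing garbage and values that do not fit in the integer type. The following is a sketch only: the helper name parse_size and its convention of returning -1 on failure are assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses text as a buffer size in [1, max_size].
 * Returns the size on success, or -1 if the text is not a valid size. */
int parse_size(const char *text, int max_size) {
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return -1; /* no digits at all, or characters left after the number */
    }
    if (errno == ERANGE || value <= 0 || value > max_size) {
        return -1; /* does not fit in long, is non-positive, or is too large */
    }
    return (int)value; /* safe: 0 < value <= max_size <= INT_MAX */
}
```

Because any trailing character is rejected, a caller that reads a whole line would need to strip the newline before passing the text to this helper.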
Going the extra mile
Variable length arrays are allocated on the stack, so in situations when a large value of the size is used, creating such an array may lead to
stack overflow and undefined behavior:
void largeVLA() {
  int s = INT_MAX;
  int vla[s][100]; // requires allocating INT_MAX * 100 ints on the stack
}
In addition, the language does not provide a way to query available stack space, nor the possibility of reporting failure in the creation of such
an array.
When applicable, it is recommended to replace the VLA with heap-allocated memory. In contrast to VLAs, heap allocation functions report
failure when sufficient memory cannot be provided, by returning NULL or throwing an exception (in C++). Furthermore, the C++ standard
library provides containers like std::vector that manage the heap-allocated memory.
Moreover, since C11 the language standard makes support for VLAs optional (an implementation that does not support them defines the
__STDC_NO_VLA__ macro), and the C++ standard has never supported them; however, they are commonly accepted as an extension.
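Code that has to build on implementations without VLA support can test the macro at compile time and fall back to the heap. The sketch below is one possible arrangement, not a prescribed pattern: the function name copy_via_scratch and the bound of 256 elements are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Copies n ints from src to dst through a temporary buffer.
 * Returns 0 on success, -1 on an invalid size or allocation failure. */
int copy_via_scratch(int *dst, const int *src, int n) {
    if (n <= 0 || n > 256) { /* 256 is an arbitrary illustrative bound */
        return -1;
    }
#ifdef __STDC_NO_VLA__
    /* The implementation declares that VLAs are unavailable: use the heap. */
    int *scratch = malloc((size_t)n * sizeof *scratch);
    if (scratch == NULL) {
        return -1; /* heap allocation can report failure */
    }
    memcpy(scratch, src, (size_t)n * sizeof *scratch);
    memcpy(dst, scratch, (size_t)n * sizeof *scratch);
    free(scratch);
#else
    int scratch[n]; /* n is known to be in [1, 256] here */
    memcpy(scratch, src, sizeof scratch);
    memcpy(dst, scratch, sizeof scratch);
#endif
    return 0;
}
```

Both branches share the same size check, so the observable behavior does not depend on which one the compiler selects.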