TensorFlow
The result of reduction operations (e.g. tf.math.reduce_sum, tf.math.reduce_std, torch.sum, torch.mean, etc.) depends heavily on the shape of the tensor provided.
import tensorflow as tf
x = tf.constant([[1, 1, 1], [1, 1, 1]])
tf.math.reduce_sum(x)
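# -> 6: all elements are added together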
In the example above, the reduction of the two-dimensional tensor returns the value 6, as all the elements are added together. By default, TensorFlow’s reduction operations are applied across all axes. When an axis is specified, the result is completely different.
import tensorflow as tf
x = tf.constant([[1, 1, 1], [1, 1, 1]])
tf.math.reduce_sum(x, axis=0)
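# -> [2, 2, 2]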
Here the result is [2, 2, 2], as the reduction is applied only along axis 0.
TensorFlow’s default behavior can be confusing, especially when reducing arrays of different shapes. Consider the following example:
import tensorflow as tf
x = tf.constant([[1], [2]])
y = tf.constant([1, 2])
tf.math.reduce_sum(x + y)
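# -> 12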
Here the result is 12 instead of the 6 that might be expected. This is because implicit broadcasting reshapes the first array to [[1, 1], [2, 2]] and the y array [1, 2] to [[1, 2], [1, 2]]; adding them gives [[2, 3], [3, 4]]. As the reduction happens across all dimensions, the result is 2 + 3 + 3 + 4 = 12. Looking at the example alone, it is not clear whether this was intentional or the user made a mistake.
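To see what actually happened, one can inspect the intermediate tensor before reducing (a minimal sketch that simply prints the broadcast sum):
import tensorflow as tf
x = tf.constant([[1], [2]])  # shape (2, 1)
y = tf.constant([1, 2])      # shape (2,)
print(x + y)                 # broadcast to shape (2, 2): [[2, 3], [3, 4]]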
This is why a good practice is to always specify the axis on which to perform the reduction.
For example:
import tensorflow as tf
x = tf.constant([[1], [2]])
y = tf.constant([1, 2])
tf.math.reduce_sum(x + y, axis=0)
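# -> [5, 7]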
In the example above, specifying the axis clarifies the intent, as the result is now [5, 7]. If the intent was to reduce across all dimensions, the user should provide the list of axes, axis=[0, 1], or clearly state that the default behavior should be applied with axis=None.
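Both explicit forms, reusing the tensors from the previous example, produce the same value as the default:
import tensorflow as tf
x = tf.constant([[1], [2]])
y = tf.constant([1, 2])
tf.math.reduce_sum(x + y, axis=[0, 1])  # -> 12
tf.math.reduce_sum(x + y, axis=None)    # -> 12, same as the default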
The PyTorch equivalent
The same behavior occurs in PyTorch, but the argument is called dim instead of axis.
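A minimal sketch of the same example in PyTorch:
import torch
x = torch.tensor([[1], [2]])
y = torch.tensor([1, 2])
torch.sum(x + y)         # reduces across all dimensions: tensor(12)
torch.sum(x + y, dim=0)  # reduces along dim 0: tensor([5, 7])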