The values
attribute and the to_numpy()
method in pandas both provide a way to return a NumPy representation of the
DataFrame
. However, there are some reasons why the to_numpy()
method is recommended over the values
attribute:
- Future Compatibility: The
values
attribute is considered a legacy feature, while the to_numpy()
is
the recommended method to extract data and is considered more future-proof.
- Data type consistency: If the
DataFrame
has columns with different data types, NumPy will choose a common data
type that can hold all the data. This may lead to loss of information, unexpected type conversions, or increased memory usage. The
to_numpy()
allows you to select the common type manually, passing the dtype
argument.
- View vs Copy: The
values
attribute can return a view or a copy of the data depending on whether the data needs to
be transposed. This can lead to confusion when modifying the extracted data. On the other hand, to_numpy()
has copy
argument allowing to force it always to return a new NumPy array, ensuring that any changes you make won’t affect the original
DataFrame
.
- Missing values control: The
to_numpy()
allows to specify the default value used for missing values in the
DataFrame
, while the values
will always use numpy.nan
for missing values.