Passing inplace=True to a pandas DataFrame method means that the method modifies the DataFrame in place, rather than returning a
new object (the call itself then returns None):
df.an_operation(inplace=True)
When inplace is False (which is the default behavior), a new object is returned instead:
df2 = df.an_operation(inplace=False)
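A minimal sketch of the two call styles, using dropna as a stand-in for the placeholder an_operation and a small made-up frame. With the default inplace=False, df is left untouched and the result must be captured; with inplace=True, df itself changes and the call returns None.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value
df = pd.DataFrame({'a': [1.0, np.nan, 3.0]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged
df2 = df.dropna()
print(len(df), len(df2))  # 3 2

# inplace=True: df itself is modified and the method returns None
ret = df.dropna(inplace=True)
print(ret, len(df))  # None 2
```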
Generally speaking, the motivation for modifying an object in place is to improve efficiency by avoiding a copy of the original
object. Unfortunately, many methods that accept the inplace keyword either cannot operate in place at all, or make a copy as part of the
operations they perform, regardless of the value of inplace. For example, the following methods can never operate
in place:
- drop (dropping rows)
- dropna
- drop_duplicates
- sort_values
- sort_index
- eval
- query
Because of this, inplace=True cannot be relied upon to improve efficiency.
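One way to observe this for sort_values is sketched below with made-up data. It captures the column's values before sorting; assuming to_numpy() returns a view of the frame's internal data (typical for a single numeric column, but not guaranteed and version-dependent), a true in-place reorder would show up in the captured array. Instead, the captured array keeps its original order and shares no memory with the sorted column, because the sorted result is written to newly allocated arrays.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 1, 2]})

# Capture the column's values before sorting (often a view of the
# underlying data, though pandas does not guarantee this)
before = df['a'].to_numpy()

df.sort_values('a', inplace=True)

# The sorted values live in new memory; the old buffer was not reordered
print(before.tolist())                               # [3, 1, 2]
print(df['a'].tolist())                              # [1, 2, 3]
print(np.shares_memory(before, df['a'].to_numpy()))  # False
```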
Additionally, using inplace=True may trigger a SettingWithCopyWarning and make the overall intention of the code unclear.
In the following example, modifying df2 will not modify the original df DataFrame, and a warning will be raised (the exact warning depends on the pandas version):
import pandas as pd
df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
# replace() acts on the Series df2['b'], not directly on df2
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
In general, side effects such as object mutation can be a source of subtle bugs, and explicit reassignment is considered safer.
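A sketch of the explicit-reassignment style applied to the same made-up frame: taking an explicit copy of the filtered rows and reassigning the replaced column makes it clear that only df2 changes, and avoids calling an inplace method on df2['b'].

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Make the intent explicit: df2 is an independent copy of the filtered rows
df2 = df[df['a'] > 1].copy()

# Reassign the column instead of mutating df2['b'] with inplace=True
df2['b'] = df2['b'].replace({'x': 'abc'})

print(df2['b'].tolist())  # ['abc', 'y']
print(df['b'].tolist())   # ['x', 'y', 'z']
```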
When intermediate results are not needed, method chaining is a more explicit alternative to the inplace parameter. For instance, one
may write:
df.drop('City', axis=1, inplace=True)
df.sort_values('Name', inplace=True)
df.reset_index(drop=True, inplace=True)
Through method chaining, this previous example may be rewritten as:
result = df.drop('City', axis=1).sort_values('Name').reset_index(drop=True)
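The equivalence of the two forms can be checked with a small sketch; the Name, City and Age values below are invented for illustration. The in-place sequence is run on a copy so that both versions start from the same data, and the chained version leaves the original df untouched.

```python
import pandas as pd

# Hypothetical data with the columns used above
df = pd.DataFrame({
    'Name': ['Carol', 'Alice', 'Bob'],
    'City': ['Oslo', 'Lima', 'Pune'],
    'Age': [35, 30, 25],
})

# In-place version, applied to a copy so df itself is preserved
df_inplace = df.copy()
df_inplace.drop('City', axis=1, inplace=True)
df_inplace.sort_values('Name', inplace=True)
df_inplace.reset_index(drop=True, inplace=True)

# Method-chaining version: each step returns a new DataFrame
result = df.drop('City', axis=1).sort_values('Name').reset_index(drop=True)

pd.testing.assert_frame_equal(result, df_inplace)  # both forms agree
print(result['Name'].tolist())  # ['Alice', 'Bob', 'Carol']
print(list(df.columns))         # ['Name', 'City', 'Age']
```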
For these reasons, it is recommended to avoid inplace=True in favor of more explicit and less error-prone
alternatives.