PySpark offers powerful APIs for working with pandas DataFrames in a distributed environment. While the integration between PySpark and pandas is largely seamless, there are some caveats to keep in mind.
The pandas API on Spark reserves certain column names for internal purposes. These names have both a leading and a trailing double underscore (`__`). When naming or renaming columns, avoid such reserved names: operations on columns with these names are not guaranteed to yield the expected results.
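One way to guard against this is to validate candidate column names before assigning them. The helper below is a hypothetical sketch (it is not part of PySpark) that flags any name matching the reserved dunder pattern described above.

```python
import re

# Matches names with both a leading and a trailing double underscore,
# the pattern the pandas API on Spark reserves for internal columns.
_RESERVED_PATTERN = re.compile(r"^__.*__$")


def is_reserved_column_name(name: str) -> bool:
    """Return True if `name` looks like a reserved internal column name."""
    return bool(_RESERVED_PATTERN.match(name))


def find_reserved_columns(names):
    """Return the subset of `names` that should be renamed before use."""
    return [n for n in names if is_reserved_column_name(n)]
```

For example, `find_reserved_columns(["id", "__foo__", "value"])` returns `["__foo__"]`, so `__foo__` would need to be renamed before being used as a column name.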