In PySpark, a window defines a set of rows related to the current row, enabling calculations such as running totals or rankings across those rows. Windows support complex analysis by allowing computations over partitions of the data while preserving each individual row.
Depending on the operation you want to compute, you need to define a frame for the window. A frame specifies the range of rows used in each computation. If you don't define one, a default frame is applied.
Which default frame is used depends on whether ordering is defined. When ordering is not defined, an unbounded window frame
(rowFrame, unboundedPreceding, unboundedFollowing)
is used by default, so every computation sees the entire partition. When ordering is defined, a growing window frame
(rangeFrame, unboundedPreceding, currentRow)
is used by default, so each computation sees the partition from its start up to the current row, including any peer rows that tie on the ordering expression.
This can lead to unexpected results if the default frame is not what you intended. To avoid confusion and guarantee the results you expect, define the frame explicitly whenever you use a window function.