A list of columns can be provided to the subset argument of PySpark’s DataFrame.dropDuplicates method. The method then
considers only the columns in the subset argument when evaluating whether a row is a duplicate. It is also possible to use all
columns of the DataFrame by passing None to the subset argument or leaving it empty (as None is
the default value). However, when an empty list is provided to the subset argument, dropDuplicates does not perform any meaningful
deduplication: with no columns to compare, every row counts as a duplicate of every other, so all rows except one are removed. This can lead to unexpected results and potentially incorrect data analysis. This rule ensures
that DataFrame.dropDuplicates is used correctly, either by specifying at least one column or by omitting the subset argument entirely.
This rule also raises issues on DataFrame.drop_duplicates and DataFrame.dropDuplicatesWithinWatermark.
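
The three cases described above can be illustrated with a small plain-Python model. This is only a sketch of the semantics, not Spark's implementation: the helper drop_duplicates_model and the sample rows are hypothetical, and the model always keeps the first row it sees, whereas Spark does not guarantee which row survives. In PySpark itself the corresponding calls would be df.dropDuplicates(["name"]), df.dropDuplicates() and df.dropDuplicates([]).

```python
def drop_duplicates_model(rows, subset=None):
    """Toy model of subset-based deduplication over a list of dict rows.

    Hypothetical helper for illustration only. It keeps the first row seen
    for each distinct key, where the key is the tuple of values in the
    `subset` columns (all columns when `subset` is None).
    """
    if not rows:
        return []
    columns = list(rows[0].keys()) if subset is None else list(subset)
    seen = set()
    kept = []
    for row in rows:
        # With an empty subset the key is always (), so every row after
        # the first looks like a duplicate.
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up sample data
rows = [
    {"name": "Alice", "age": 30},
    {"name": "Alice", "age": 31},
    {"name": "Bob", "age": 30},
    {"name": "Bob", "age": 30},
]

by_name = drop_duplicates_model(rows, subset=["name"])  # one row per name
all_columns = drop_duplicates_model(rows)                # only exact duplicates removed
empty_subset = drop_duplicates_model(rows, subset=[])    # collapses to a single row
```

In this model, the empty list is the only input that discards rows which differ in every column, which is the situation the rule is meant to catch.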