Using withColumn multiple times can lead to inefficient code, as each call creates a new Spark logical plan. withColumns allows adding or modifying multiple columns in a single operation, improving performance.
What is the potential impact?
Creating a new column can be a costly operation, as Spark has to loop over every row to compute the new column's value.
Exceptions
withColumn can be used multiple times sequentially on a DataFrame when computing consecutive columns requires the presence of the previous ones. In this case, consecutive withColumn calls are a solution.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([[1, 2], [2, 3]], ["id", "value"])

# Compliant: "cubic_value" depends on "squared_value"
df_with_new_cols = (
    df.withColumn("squared_value", col("value") * col("value"))
      .withColumn("cubic_value", col("squared_value") * col("value"))
)