When initializing a new SparkContext in PySpark, it is essential to specify both the master URL and the application name. The master URL determines the cluster to connect to, while the application name identifies your application on that cluster. Because creating a SparkSession can implicitly create a SparkContext, these parameters should also be set when creating a new SparkSession.
Failing to set these parameters can lead to unexpected behavior, such as connecting to an unintended cluster or having difficulty identifying your application in the Spark UI.
A good default master URL for local development is local[*], which uses all available cores on your machine. Alternatively, you can use local[n], where n is the number of cores you want to allocate. In production environments, however, you should specify the actual cluster URL (e.g., spark://host:port or yarn).
Exceptions
When using PySpark with AWS Glue, the master and name parameters are usually not set, since AWS Glue manages these configurations automatically. Because of this, the rule does not raise an issue if awsglue has been imported.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
sc = SparkContext() # Compliant: used in the context of awsglue code
glueContext = GlueContext(sc)