The PySpark API offers multiple ways of performing aggregation. When performing aggregations, data is usually shuffled between partitions. This shuffling is needed to compute the result correctly, but it comes at a cost that can impact performance, because shuffling moves data over the network between Spark tasks.
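As a quick illustration (a minimal sketch, assuming a local SparkSession; the DataFrame and its column names are made up for the example), the shuffle shows up as an Exchange node in the physical plan of a grouped aggregation:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical tiny DataFrame; any grouped aggregation over it needs a shuffle.
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])

# The printed physical plan contains an Exchange (shuffle) node that moves
# data with the same key onto the same partition before the final aggregation.
df.groupBy("key").sum("value").explain()
```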
There are, however, cases where some aggregation methods are more efficient than others. For example, when RDD.groupByKey is used in conjunction with RDD.mapValues and the function passed to RDD.mapValues is commutative and associative, it is preferable to use RDD.reduceByKey instead. The performance gain from RDD.reduceByKey comes from reducing the amount of data that needs to be moved between PySpark tasks: RDD.reduceByKey combines the values for each key within a partition before sending the data over the network for further reduction. When using RDD.groupByKey with RDD.mapValues, on the other hand, the reduction is only done after the data has been moved around the cluster, which slows down the computation by transferring a larger amount of data over the network.
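A minimal sketch comparing the two approaches on a word-count style sum (the sample pairs are made up for the example; both variants produce the same result, but only RDD.reduceByKey performs the partial, per-partition reduction before the shuffle):

```python
from operator import add

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Hypothetical sample data: (key, value) pairs spread across partitions.
pairs = sc.parallelize([("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1)])

# Slower: groupByKey shuffles every (key, value) pair across the network,
# and the values are only summed after the shuffle.
grouped = pairs.groupByKey().mapValues(sum)

# Faster: reduceByKey sums values within each partition first, so only one
# partial sum per key per partition is sent over the network.
reduced = pairs.reduceByKey(add)

assert sorted(grouped.collect()) == sorted(reduced.collect())  # [('a', 3), ('b', 2)]
```

Note that this substitution only applies because addition is commutative and associative; if the function passed to RDD.mapValues needs to see all values for a key at once, RDD.groupByKey remains the appropriate choice.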