UNION is a convenient syntax to combine the results of two or more SQL statements because it helps you cut a complex problem into
multiple simple SQL statements. But when it comes to execution, using UNION is debatable.
First, it may be possible to fuse two simple SQL statements into a bigger one that will run faster. Second, UNION is significantly
less performant compared to UNION ALL because it removes duplicated entries and runS an internal DISTINCT to achieve
this.
UNION ALL does not remove duplicates and returns all the results from the queries. It performs faster in most cases compared to
UNION. Nevertheless, the quantity of data returned by UNION ALL can be significantly larger than with UNION. On
a slow network, the performance gain of using UNION ALL instead of UNION can be negated by the time lost in the larger data
transfer.
This rule raises an issue on each UNION. It’s up to the developer to challenge its use and see if there is a better way to rewrite
without UNION.
Noncompliant code example
-- case #1
SELECT EMAIL FROM EMPLOYEES WHERE COUNTRY = 'FR'
UNION -- Noncompliant
SELECT EMAIL FROM EMPLOYEES WHERE COUNTRY = 'CH'
-- case #2
-- if you care about not having duplicated entries, then UNION is the good choice
SELECT EMAIL FROM EMPLOYEES
UNION -- Noncompliant
SELECT EMAIL FROM CUSTOMERS
Compliant solution
-- case #1
SELECT EMAIL FROM EMPLOYEES WHERE COUNTRY = 'FR' OR COUNTRY = 'CH'
-- case #2
-- if you don't care about duplicated entries in the results of this UNION, then UNION ALL should be preferred
SELECT EMAIL FROM EMPLOYEES
UNION ALL
SELECT EMAIL FROM CUSTOMERS