What is the difference between UNION and UNION ALL in SQL?

Though both UNION and UNION ALL are used to combine the results of two SELECT queries, the main difference between them is that UNION doesn't include duplicate records but UNION ALL does. Another difference is that UNION ALL is faster than UNION, though it may look slower because it returns more data, which takes more time to travel over the network. The difference between UNION and UNION ALL is one of the tricky SQL questions, especially for developers who have never used these keywords. Since the UNION clause is not as common as the SELECT clause in SQL, it's usually asked in a telephonic round of programming interviews to check whether the candidate is comfortable with SQL or not. It's in the same league as questions like clustered vs non-clustered index or primary vs unique key. UNION is very different from other SQL commands because it operates on rows of data rather than on columns.

Anyway, the answer to this question is simple: though both UNION and UNION ALL are used to combine the results of two separate SQL queries on the same or different tables, UNION does not keep duplicate records (a row is considered a duplicate if the values of all its columns are the same), while UNION ALL does.

Since you mostly don't want duplicate rows, UNION is preferred over UNION ALL in reporting and application development. By the way, keep in mind that UNION ALL performs better than UNION because it doesn't have to remove duplicates, so there is no extra work.

These keywords are well supported by all major databases, e.g. Oracle, Microsoft SQL Server, MySQL and PostgreSQL. Another thing to keep in mind is the amount of data returned by UNION ALL; if your database server is far away and you have limited bandwidth, UNION ALL may appear slower than UNION because of the number of duplicates it returns.

The cost of transferring duplicate rows can exceed the query execution benefit in many cases. We will see a couple of examples of UNION and UNION ALL in SQL, but before that, a few things to keep in mind. In order to combine the results of two queries, they must contain the same number of columns.

For example, if one query contains 3 columns and the other contains 4 columns, then you cannot use UNION or UNION ALL. This is because rows are compared column by column, by position: a row is considered a duplicate only when every column has the same value, irrespective of the names of the columns themselves.
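
The column-count rule above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for your own server; the tables t3 and t4 and the helper try_query are hypothetical names made up for illustration, and the exact error text will differ on Oracle, SQL Server, MySQL or PostgreSQL.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for any SQL database.
# t3 and t4 are hypothetical tables with 3 and 4 columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t3 (a INTEGER, b INTEGER, c INTEGER);
    CREATE TABLE t4 (a INTEGER, b INTEGER, c INTEGER, d INTEGER);
    INSERT INTO t3 VALUES (1, 2, 3);
    INSERT INTO t4 VALUES (1, 2, 3, 4);
""")


def try_query(sql):
    """Return the rows of a query, or the error message if it is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


# 3 columns on the left, 4 on the right: the database rejects the query.
mismatch = try_query("SELECT a, b, c FROM t3 UNION SELECT a, b, c, d FROM t4")

# 3 columns on both sides: accepted, even though the column names differ.
matched = try_query("SELECT a, b, c FROM t3 UNION ALL SELECT b, c, d FROM t4")

print(mismatch)
print(sorted(matched))
```

The second query works because only the number and position of the columns matter, not their names.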




UNION and UNION ALL Example in SQL Server

Let's see one simple example of UNION and UNION ALL; it will show you not only how they work but also where you can use them. This example is from my sample database, and the screenshot is from SQL Server Management Studio 2014. We have two tables, Employee and Customer. To demonstrate UNION and UNION ALL, I have kept the same persons as both employee and customer, so you will see the same ids in emp_id and customer_id, and the same names as well.

If you look at the results of the first two SELECT queries, you will see that the first query returns two rows and the second query returns three rows, two of which have exactly the same data as the first query. The key thing to note is that the column names are different in the two result sets: the first has emp_id and emp_name, while the second has customer_id and customer_name. Most importantly, both result sets have exactly two columns. This is a must in order to combine them using the UNION and UNION ALL keywords.

The third query is an example of how to use the UNION clause in SQL. You can see that the combined result has just three rows, all unique; the duplicate rows from the second result set were not included. This is much like a union in set theory, where the final result contains the elements of both sets, each only once. The fourth query shows how to use UNION ALL. Its result contains five rows, two from the first query and three from the second. It has not removed the duplicate rows from the second query, which is why you see Ken and Bob twice.

This example teaches us the core concept that UNION doesn't depend upon column names but upon the data. You can combine the results of as many queries as you like, as long as the number of columns is the same in all of them and the data is from the same set.
Difference between UNION ALL and UNION in SQL SERVER
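
For readers who cannot see the screenshot, here is a rough reconstruction of the four queries. It is only a sketch under stated assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, the id values and the third customer name ('Amy') are made up, and only the table names, column names and the names Ken and Bob come from the description above.

```python
import sqlite3

# Assumption: in-memory SQLite instead of SQL Server; ids and 'Amy' are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (emp_id INTEGER, emp_name TEXT);
    CREATE TABLE Customer (customer_id INTEGER, customer_name TEXT);
    INSERT INTO Employee VALUES (1, 'Ken'), (2, 'Bob');
    INSERT INTO Customer VALUES (1, 'Ken'), (2, 'Bob'), (3, 'Amy');
""")

# First and second queries: the two tables on their own.
employees = conn.execute("SELECT emp_id, emp_name FROM Employee").fetchall()
customers = conn.execute("SELECT customer_id, customer_name FROM Customer").fetchall()

# Third query: UNION drops the rows that appear in both result sets.
union_rows = conn.execute("""
    SELECT emp_id, emp_name FROM Employee
    UNION
    SELECT customer_id, customer_name FROM Customer
    ORDER BY 1, 2
""").fetchall()

# Fourth query: UNION ALL keeps every row, duplicates included.
union_all_rows = conn.execute("""
    SELECT emp_id, emp_name FROM Employee
    UNION ALL
    SELECT customer_id, customer_name FROM Customer
    ORDER BY 1, 2
""").fetchall()

print(len(employees), len(customers))  # 2 3
print(union_rows)      # [(1, 'Ken'), (2, 'Bob'), (3, 'Amy')]
print(union_all_rows)  # [(1, 'Ken'), (1, 'Ken'), (2, 'Bob'), (2, 'Bob'), (3, 'Amy')]
```

The ORDER BY is there only to make the printed order predictable; neither UNION nor UNION ALL guarantees an order by itself.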


Regarding performance, you need to run UNION and UNION ALL against a large database containing millions of rows. There you can monitor how much time each takes and compare them. Theoretically, UNION ALL should take less time to execute but more time to transfer data to the client.
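
A small timing harness along these lines can be a starting point for such a comparison. This is only a sketch with several assumptions: it uses an in-memory SQLite database via Python's sqlite3 module, the tables a and b and the helper timed are hypothetical, and the row count is far smaller than a real test would use. Because the rows are counted inside the database, it measures execution time only and says nothing about network transfer, so expect different numbers on a real server.

```python
import sqlite3
import time

# Assumption: in-memory SQLite; a and b are hypothetical single-column tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")

rows = [(i,) for i in range(100_000)]
conn.executemany("INSERT INTO a VALUES (?)", rows)
conn.executemany("INSERT INTO b VALUES (?)", rows)  # every row in b duplicates one in a


def timed(sql):
    """Run a COUNT(*) query and return (row count, elapsed seconds)."""
    start = time.perf_counter()
    count = conn.execute(sql).fetchone()[0]
    return count, time.perf_counter() - start


union_count, union_secs = timed(
    "SELECT COUNT(*) FROM (SELECT id FROM a UNION SELECT id FROM b)"
)
union_all_count, union_all_secs = timed(
    "SELECT COUNT(*) FROM (SELECT id FROM a UNION ALL SELECT id FROM b)"
)

print(f"UNION:     {union_count} rows in {union_secs:.4f} s")
print(f"UNION ALL: {union_all_count} rows in {union_all_secs:.4f} s")
```

To include transfer cost in the measurement, you would fetch the full result set over the real network connection instead of counting rows on the server.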


Difference between UNION and UNION ALL command in SQL

Now that we know how UNION and UNION ALL work and have some background from the above examples, let's summarise the similarities and differences between them for quick revision:

1) Both UNION and UNION ALL are used to combine the results of two separate SQL queries, which could be on the same table or on different tables, but the data should be of the same kind. E.g. if product_id is used in two tables, say Product and Order, then two SQL queries which pull product_id from these two tables can be combined using UNION or UNION ALL.

2) The key difference between UNION and UNION ALL is that the former removes duplicates but the latter keeps them. In other words, UNION is equal to running DISTINCT on the output of UNION ALL. For example, if product_id 10 is returned by both SQL queries, then it will appear only once if you use UNION and twice if you use UNION ALL.

3) Due to the above difference, the query execution time of UNION ALL is smaller than that of UNION, which means the former runs faster than the latter. So if you want faster output and don't care about duplicates, use UNION ALL.

4) Keep in mind that the benefit gained by not removing duplicates can easily be wiped out by transferring more data over a low-bandwidth network connection. That's why in practice UNION ALL sometimes appears slower than UNION: it returns a lot of data with duplicates, which requires more time to travel from the database server to the client machine. So evaluate the performance of UNION and UNION ALL case by case.

5) Another thing worth noting while using UNION and UNION ALL is that all queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists. For example, if the result of query 1 has three columns and the result of query 2 has two columns, then you cannot combine them using the UNION command.
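
Points 1 and 2 can be checked with a short sketch built on the product_id example. It assumes an in-memory SQLite database through Python's sqlite3 module; the single-column table layouts and the ids 20 and 30 are made up for illustration, and the Order table is quoted because ORDER is a reserved word.

```python
import sqlite3

# Assumption: in-memory SQLite; table layouts and ids 20 and 30 are invented.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE Product (product_id INTEGER);
    CREATE TABLE "Order" (product_id INTEGER);
    INSERT INTO Product VALUES (10), (20);
    INSERT INTO "Order" VALUES (10), (30);
''')

# product_id 10 is returned by both queries, so UNION shows it once.
union_ids = conn.execute('''
    SELECT product_id FROM Product
    UNION
    SELECT product_id FROM "Order"
    ORDER BY 1
''').fetchall()

# UNION ALL shows product_id 10 twice.
union_all_ids = conn.execute('''
    SELECT product_id FROM Product
    UNION ALL
    SELECT product_id FROM "Order"
    ORDER BY 1
''').fetchall()

# DISTINCT over the UNION ALL output gives the same rows as UNION.
distinct_ids = conn.execute('''
    SELECT DISTINCT product_id FROM (
        SELECT product_id FROM Product
        UNION ALL
        SELECT product_id FROM "Order"
    )
    ORDER BY 1
''').fetchall()

print(union_ids)      # [(10,), (20,), (30,)]
print(union_all_ids)  # [(10,), (10,), (20,), (30,)]
print(distinct_ids == union_ids)  # True
```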


That's all on the difference between the UNION and UNION ALL commands in SQL. UNION is one of the useful commands for combining the results of two SELECT queries when they contain the same kind of data. There are many practical scenarios where UNION is very useful, for example when you need to create a list out of different tables containing data from the same set. The main difference between UNION and UNION ALL is about duplicates: the former removes them while the latter keeps them. The other differences between them, in performance and network bandwidth usage, can easily be derived from this one. Also keep in mind that both are well supported by the big three databases, i.e. MySQL, Oracle and SQL Server. Let us know if you have been asked this question in your SQL interview.

2 comments:

  1. I am going to explain a use of the UNION operator in a SQL Server database with a real-life scenario and example.

    The UNION operator is used to combine the result-set of two or more SELECT statements.

    Notice that each SELECT statement within the UNION must have the same number of columns. The columns must also have similar data types. Also, the columns in each SELECT statement must be in the same order.

    Before moving on to the UNION query, let us understand our scenario. Suppose we have three tables (Product, Customer and Order) in our database and we want to see the number of orders by month name, by weekday and by three-hour intervals of the day.
    For the full implementation, refer here: http://www.mindstick.com/Articles/adeb48fa-85cc-4671-8b2e-86adfb34406d/Union%20Example%20in%20SQL%20Server

  2. UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not. UNION and UNION ALL should work on all SQL servers.
