Difference between Correlated and Non-Correlated Subquery in SQL

The correlated subquery is one of the tricky concepts of SQL. It's similar to recursion in programming which many programmers struggle to understand, but like recursion, it also offers the unique capability to solve many SQL query based problems e.g. second highest salary problem where you need to compare one row of the table to another row. It gives you a different kind of power. The main difference between a regular, non-correlated and correlated subquery in SQL is in their working, a regular subquery just run once and return a value or a set of values which is used by outer query, but correlated subquery runs for each row returned by the outer query because the output of the whole query is based upon comparing the data returned by one row to the all other rows of the table. That's why it is also very slow and generally avoided until you don't know any other way to solve the problem.

One of the most popular examples of the correlated subquery is about finding the second highest salary of an employee in a given table. Even though there are multiple ways to solve this problem e.g you can use window functions like row_number or rank but using regular subquery to solve this problem is the easiest way.

Btw, even though   you can solve this problem by using a regular query it become tricky when Interviewer extend this problem to find the Nth highest salary then you just can't go with regular subquery because of unlimited nesting.

Correlated subquery solves this problem elegantly as shown here. It compares data returned by outer query e.g. salary and compares with other salaries to find out exactly how many salaries are higher than this salary.

Difference between Correlated and Regular Subquery in SQL

The difference between correlated and regular subquery is also a frequently asked SQL interview question. Mostly asked on telephonic interview where they cannot ask you to solve query and check the fundamentals and theoretical concepts.

In this article, I am going to compare correlated subquery with the regular one of different parameters e.g. their working, speed, performance, and dependency. I'll also give you a good example of correlated subquery e.g. the Nth highest salary problem and explain how exactly it solves the problem.

So, if interviewer ask you to find the 4th highest salary then there can only be at most 4 salary which are equal to or greater than the 4th highest salary.  This is just an example, you can use correlated subquery to solve many such problems in the world of data and SQL.

In short, here are the main difference between correlated and non-correlated subqueries in SQL

1) Working 
A non-correlated subquery is executed only once and its result can be swapped back for a query, on the other hand, a correlated subquery executed multiple times, precisely once for each row returned by the outer query.

For example, following query is an example of non-correlated subquery:

SELECT MAX(Salary) FROM Employee 
WHERE Salary NOT IN ( SELECT MAX(Salary) FROM Employee)

Here the subquery is SELECT MAX(Salary) from Employee, you can execute and substitute the result of that query e.g. if subquery return 10000 then outer query is reduced to

SELECT MAX(Salary) from Employee where Salary NOT IN (10000). 

This is not possible with a correlated subquery, which needs to be executed multiple times as shown below:

SELECT e.Name, e.Salary FROM Employee e
WHERE 2 = (
SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= e.salary)

In this example, the subquery is SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= e.salary, you cannot swap it's value for the outer query because it needs to be executed for each employee.

Let's say the first row of employee has salary 5000, in this case, e.salary will be 500 and subquery will be

SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= 5000

and subquery will find how many salaries are higher than 5000 if count return 2 then it's the second highest salary. This logic needs to be executed for each row outer query will process.

2) Dependency
A correlated subquery depends upon the outer query and cannot execute in isolation, but a regular or non-correlated subquery doesn't depend on the outer query and can execute in isolation.

From the above example, you can see that a correlated subquery i.e. SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= e.salary depends upon outer query because it needs the value of e.salary, which comes from table listed on outer query.

On the other hand, regular subquery, SELECT MAX(Salary) FROM Employee doesn't depends upon the outer query and can be executed in isolation or independently of the outer query. You can read a good SQL book to learn this more detail e.g. Head First SQL by Lynn Beighley.

Difference between Correlated and Non-Correlated Subquery in SQL

3) Speed and Performance
 A correlated subquery is much slower than the non-correlated subquery because in former, the inner query executes for each row of the outer query. This means if your table has n rows then whole processing will take the n * n = n^2 time, as compared to 2n times taken by a non-correlated subquery.

This happens because to execute non-correlated subquery you need to examine just n rows of the table and similar to execute the outer query you need to examine n rows, so in total n + n = 2n rows.

This is the reason you should be very careful using a correlated subquery with large tables e.g. tables with millions of rows because that can take a long time and could potentially block other jobs and queries from accessing the table.

In many cases, you can replace correlated subquery with inner join which would result in better performance. For example, to find all employees whose salary is greater than the average salary of the department you can write following correlated subquery:

SELECT e.id, e.name
FROM Employee e
WHERE salary > (
SELECT AVG(salary)
FROM Employee p
WHERE p.department = e.department)

Now, you can convert this correlated subquery to a JOIN based query for better performance as shown below:

SELECT e.id, e.name
(SELECT department, AVG(salary) AS department_average
FROM Employee
GROUP BY department) AS t ON e.department = t.department
WHERE e.salary > t.department_average;

If you want to learn more about how changing a query from subquery to join result in better performance and other small changes which gives big performance boost, I strongly suggest you to read a good book on SQL query and performance optimization e.g. SQL Performance Explained by Markus Winand.
Best book to learn SQL index and performance

That's all about the difference between correlated and non-correlated subquery in SQL. You have learned that correlated subquery is executed for each row returned by an outer query, which makes it very slow, but at the same time gives it the power to compare one row of the table to other rows of the table. That's why sometimes only solution possible was only by using a correlated subquery.

On the other hand regular or non-correlated subquery return a result which is then used by the outer query. It only executed one time and not for every row returned by the outer query, hence it is faster than a correlated subquery.

Other SQL Interview Questions article you may like to explore
  • How to find the second highest salary in MySQL? (solution)
  • What is the difference between UNION and UNION ALL in SQL? (answer)
  • The difference between TRUNCATE and DELETE in SQL? (answer)
  • The difference between self and equi-join in SQL? (answer)
  • What is the difference between View and Materialized View in Oracle database? (answer)
  • How to find second highest salary in Oracle using ROW_NUMBER? (answer)
  • The difference between WHERE and HAVING clause in SQL? (answer)
  • The difference between LEFT and RIGHT OUTER JOIN in SQL? (answer)
  • How to find duplicate records in a table? (query)

Thanks for reading this article so far. If you like this SQL Interview question and explanation then please share with your friends and colleagues. If you have any question, suggestion or feedback then please drop a comment and I'll try to answer it.

P.S. - If you are looking for online training/course to learn SQL from scratch, I suggest you joining Introduction to SQL by Jon Flanders. It's one of the best sourse to learn SQL fundamentals e.g. join, subquery, aggregate functions, window functions, gropuing data, advanced filtering and SQL query optimization.

No comments:

Post a Comment