select id from
customer c
join on
(select customer_id,count(*) as num_orders from orders group by customer_id) co
where c.id = co.customer_id and co.num_orders > 2
To convert this to an SQL correlated subquery, the join is removed and the subquery gets moved to the where clause. The conditions on the join get pushed down into the subquery. Since they refer to the columns from the outer query (on customer), this is what makes the subquery a correlated subquery:
select id,forename,surname from
customer c
where (select count(*) from orders o where o.customer_id = c.id) > 2
In HQL, this becomes:
select id,forename,surname from
Customer c
where (select count(*) from Order o where o.customer.id = c.id) > 2
Now suppose we want the most recent order for each customer. In the most basic variation, where you just want to get the timestamp of the most recent order for each customer id, you get both columns out of your grouping query:
select max(timestamp),customer_id from orders group by customer_id
In the more general case, where you want to get further details of either the customer or the order, you have to join this query to the customer or orders table respectively. For example, to get the order id and total as well:
select o.id,o.timestamp,o.order_total from
orders o
join
(select max(timestamp) as max_ts,customer_id from orders group by customer_id) as x
on o.customer_id = x.customer_id and o.timestamp = x.max_ts
To convert this to a correlated subquery, the subquery gets moved from the join into the where clause, and the additional condition on the join becomes part of the subquery where clause. Note that the group by can be removed, since the restriction on customer_id has become part of the correlated subquery where clause.
select o1.id,o1.timestamp,o1.order_total from
orders o1
where o1.timestamp in
(select max(timestamp) from orders o2 where o1.customer_id = o2.customer_id)
In HQL this would become:
select o1.id,o1.timestamp,o1.orderTotal from
Order o1
where o1.timestamp in
(select max(timestamp) from Order o2 where o1.customer.id = o2.customer.id)
Suppose we want to add some details of the customer as well. In sql we would join this to the customer table:
select c.id as cust_id,c.forename,c.surname,o1.id,o1.timestamp,o1.order_total
from orders o1
join customer c on o1.customer_id = c.id
where o1.timestamp in
(select max(timestamp) from orders o2 where o1.customer_id = o2.customer_id)
In HQL, we simply navigate the association from order to customer to get the customer info:
select o1.customer.id,o1.customer.forename,o1.customer.surname,
o1.id,o1.timestamp,o1.orderTotal
from Order o1 where o1.timestamp in
(select max(timestamp) from Order o2 where o1.customer.id = o2.customer.id)
So, are these queries equivalent to the corresponding join queries, and should you always rewrite them in this format? The answer to the first question is that technically, a subquery could get a different execution plan at the database. In theory, evaluating a correlated subquery could mean that the database evaluates the correlated subquery separately for every single row in the outer query. Clearly, this would be a very expensive operation. However, for correlated subqueries that are actually equivalent to a join query, a good database query optimiser will recognise this, and follow the same execution plan as it would do for the join query. i.e. it will perform a join operation to get a result set that contains all of the rows in the subquery as a single operation, rather than evaluating the query once for every row in the outer query. Certainly, if I run queries like the above on SQL Server and look at the execution plan, I can see it is equivalent to the corresponding join query. What about the second question? Even if you can rewrite these sorts of query, should you? Well, this probably comes down to the normal questions surrounding Hibernate - the Hibernate version of the query won't need updating if you change the tables or columns in the database, whereas the SQL version would. If you want a query that returns objects, the Hibernate version will make this easier for you. As a general rule, the Hibernate versions are going to be easier to use and maintain.
This article is part of a series on Hibernate. You might be in interested in other articles in the series:
Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations
-
Recent Posts
Recent Comments
- sherif sadek on Understanding Hibernate session flushing
- Koushik Paul on Hibernate query limitations and correlated sub queries
- Joseph Albert on Showing unused TestNG tests
- Iván on Maven offline build fails to resolve artifacts in your local repository
- Peter on Comparing two MySQL databases
Archives
- January 2023
- July 2022
- September 2021
- August 2021
- July 2021
- June 2021
- May 2021
- December 2020
- June 2020
- May 2020
- January 2019
- November 2018
- September 2017
- June 2017
- September 2016
- August 2016
- February 2016
- July 2015
- November 2014
- October 2014
- August 2014
- April 2014
- February 2014
- December 2013
- November 2013
- May 2013
- February 2013
- January 2013
- October 2012
- September 2012
- July 2012
- April 2012
- February 2012
- January 2012
- November 2011
- October 2011
- August 2011
- July 2011
- June 2011
- May 2011
- April 2011
- March 2011
- February 2011
- January 2011
- November 2010
- October 2010
Categories
Meta
1 Response to Hibernate query limitations and correlated sub queries