Distinct, order by and Hibernate HQL

Recently someone posted a question on stackoverflow explaining that they were having a problem with a Hibernate HQL query that involved both order by and a distinct in the select. The query was across five linked entities – a many to many association, mapped by an entity called PreferenceDateETL, which links the Preference and DateETL entities, and then two properties on the join entity – Employee and Corporation. The person asking the question explained that they wanted to write a query like the following:

select distinct pd.preference
from PreferenceDateETL pd
where pd.corporation.id=:corporationId
and pd.preference.employee.deleted=false
and pd.deleted=false
and pd.preference.deleted=false
and pd.dateETL.localDate>=:startDM
and pd.dateETL.localDate<=:endDM and pd.preference.approvalStatus!=:approvalStatus order by pd.preference.dateCreated

However, when trying to write this, he got an exception saying that the date created is not in the select list. What is going on here and how can it be fixed? Well, the order by clause is always evaluated last in a database query, so you can only order on columns you have selected. Although it appears as though the above query will have all of the columns, in fact, Hibernate ends up having two instances of the preference table in its query, and the order by clause refers to a different one to the one in the select, hence the exception. The generated SQL has the form:

select
distinct preference1_.id as id1_76_,
....things...
from
preference_date_etl preference0_
inner join
preference preference1_
on preference0_.preference_id=preference1_.id cross
join
preference preference2_ cross
join
employee employee3_ cross
join
date_etl dateetl5_
where
...things...
order by
preference2_.date_created

The problem with this query is really that using distinct is unnecessary. It is required in the above form of the query, because the query has been written to pick out instances of the many to many entity that we are interested in, then from those, navigate to the preference objects they link to. By definition, because the many to many entities are representing a many to many relationship, this can give you a list that includes duplicates. You can avoid this by restructuring the query so that the query on the many to many entities becomes a subquery. i.e. you pick out the many to many entities you are interested in, and then say "now get me all of the preference objects that are in the list of preference objects referred to by the PreferenceDateETL objects". This query doesn't have any duplicates, hence you do not need the distinct, and the order by clause will work correctly. Here is the HQL:

from Preference p where p in
(select pd.preference from PreferenceDateETL pd
where pd.corporation.id=:corporationId
and pd.deleted=false
and pd.dateETL.localDate>=:startDM
and pd.dateETL.localDate<=:endDM) and p.employee.deleted=false and p.deleted=false and p.approvalStatus != :approvalStatus order by p.dateCreated

If you want to read the original question, it is here:

http://stackoverflow.com/questions/25545569/hibernate-using-multiple-joins-on-same-table

This article is part of a series of articles on Hibernate. Others in the series include:

Hibernate query limitations and correlated sub-queries
Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations

This entry was posted in Hibernate and tagged . Bookmark the permalink.

1 Response to Distinct, order by and Hibernate HQL

Leave a Reply

Your email address will not be published. Required fields are marked *

HTML tags are not allowed.

517,978 Spambots Blocked by Simple Comments