Distinct, order by and Hibernate HQL

Recently someone posted a question on stackoverflow explaining that they were having a problem with a Hibernate HQL query that involved both order by and a distinct in the select. The query was across five linked entities – a many to many association, mapped by an entity called PreferenceDateETL, which links the Preference and DateETL entities, and then two properties on the join entity – Employee and Corporation. The person asking the question explained that they wanted to write a query like the following:

select distinct pd.preference 
from PreferenceDateETL pd 
    where pd.corporation.id=:corporationId 
    and pd.preference.employee.deleted=false 
    and pd.deleted=false 
    and pd.preference.deleted=false 
    and pd.dateETL.localDate>=:startDM 
    and pd.dateETL.localDate<=:endDM 
    and pd.preference.approvalStatus!=:approvalStatus 
order by pd.preference.dateCreated

However, when trying to write this, he got an exception saying that the date created is not in the select list. What is going on here and how can it be fixed? Well, the order by clause is always evaluated last in a database query, so you can only order on columns you have selected. Although it appears as though the above query will have all of the columns, in fact, Hibernate ends up having two instances of the preference table in its query, and the order by clause refers to a different one to the one in the select, hence the exception. The generated SQL has the form:

select
        distinct preference1_.id as id1_76_,
        ....things...
    from
        preference_date_etl preference0_ 
    inner join
        preference preference1_ 
            on preference0_.preference_id=preference1_.id cross 
    join
        preference preference2_ cross 
    join
        employee employee3_ cross 
    join
        date_etl dateetl5_ 
    where
        ...things... 
    order by
        preference2_.date_created

The problem with this query is really that using distinct is unnecessary. It is required in the above form of the query, because the query has been written to pick out instances of the many to many entity that we are interested in, then from those, navigate to the preference objects they link to. By definition, because the many to many entities are representing a many to many relationship, this can give you a list that includes duplicates. You can avoid this by restructuring the query so that the query on the many to many entities becomes a subquery. i.e. you pick out the many to many entities you are interested in, and then say “now get me all of the preference objects that are in the list of preference objects referred to by the PreferenceDateETL objects”. This query doesn’t have any duplicates, hence you do not need the distinct, and the order by clause will work correctly. Here is the HQL:

from Preference p where p in 
  (select pd.preference from PreferenceDateETL pd
  where pd.corporation.id=:corporationId
  and pd.deleted=false
  and pd.dateETL.localDate>=:startDM
  and pd.dateETL.localDate<=:endDM)
and p.employee.deleted=false
and p.deleted=false
and p.approvalStatus != :approvalStatus
order by p.dateCreated

If you want to read the original question, it is here:

http://stackoverflow.com/questions/25545569/hibernate-using-multiple-joins-on-same-table

This article is part of a series of articles on Hibernate. Others in the series include:

Hibernate query limitations and correlated sub-queries
Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations

This entry was posted in Hibernate and tagged . Bookmark the permalink.

One Response to Distinct, order by and Hibernate HQL

  1. Alexei Darias says:

    Thank you very much, this saved my day.

Leave a Reply

Your email address will not be published. Required fields are marked *

502,303 Spambots Blocked by Simple Comments

HTML tags are not allowed.