Understanding Hibernate session flushing

When you interact with Hibernate, you do so via a Hibernate session. Hibernate sessions are flushed to the database in three situations:

  • When you commit a (Hibernate) transaction.
  • Before you run a query.
  • When you call session.flush().

However, it is extremely important to understand the second of these situations – session flushing prior to running a query. This does not happen before every query! Remember, the purpose of the Hibernate session is to minimise the number of writes to the database, so it will avoid flushing to the database if it thinks that it isn’t needed. Hibernate will try to work out if the objects currently in your session are relevant to the query that is running, and it will only flush them if it thinks they are.

You can easily see the effects of this behaviour. For example, suppose I have Customer and Order objects. A customer has a set of orders. To begin with, I’ll just create the orders without any reference to the customers:

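Order order1 = new Order();
Order order2 = new Order();
Order order3 = new Order();
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
// save each order so that it is persisted
session.save(order1);
session.save(order2);
session.save(order3);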

At this stage, the three objects will be saved to the database. Now let’s create a customer and save them:

Customer customer = new Customer();
session.save(customer);

Okay, now the customer is in the database. Let’s now create the link between the customer and their orders:


Now the Hibernate session is dirty, meaning that the objects in the session have changes that haven’t been persisted to the database yet. If we run an SQL query, Hibernate doesn’t know to flush the session, and the results you get back do not reflect the current states of the objects:

Query query = session.createSQLQuery("select id,customer_id from orders");
List results = query.list();

Here are the results of this query:

Hibernate: select id,customer_id from orders

Even running a Hibernate query in HQL or with a criteria query won’t cause the session to flush if Hibernate doesn’t think the query concerns the objects that are dirty. For example:

Query query = session.createQuery("from Customer");

This won’t cause a session flush because it is the Order objects that are dirty. By contrast, a Hibernate query involving these objects will cause a session flush. For example:

Query query = session.createQuery("from Order");
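
To make the flush decision more concrete, here is a toy model of it in plain Java. It is only a sketch of the idea described above, not Hibernate's implementation: the AutoFlushSketch class, its methods and the table names are all made up for illustration. The pretend session remembers which tables have unflushed changes, and a query triggers a flush only if one of the tables it is known to read is among them. The native SQL query is modelled as a query with no known tables, which matches the behaviour seen above where Hibernate doesn't know to flush for it.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: NOT Hibernate's code. All names here are hypothetical.
class AutoFlushSketch {
    // Tables that have pending changes not yet written to the database.
    private final Set<String> dirtyTables = new HashSet<>();
    private int flushCount = 0;

    void markDirty(String table) {
        dirtyTables.add(table);
    }

    // Runs a pretend query that is known to read the given tables.
    // Returns true if the session had to be flushed first.
    boolean runQuery(String... tablesRead) {
        for (String table : tablesRead) {
            if (dirtyTables.contains(table)) {
                flush();
                return true;
            }
        }
        return false;
    }

    void flush() {
        dirtyTables.clear();
        flushCount++;
    }

    int getFlushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        AutoFlushSketch session = new AutoFlushSketch();
        session.markDirty("orders");                       // the orders have unsaved changes
        System.out.println(session.runQuery());            // native SQL, tables unknown: false
        System.out.println(session.runQuery("customer"));  // like "from Customer": false
        System.out.println(session.runQuery("orders"));    // like "from Order": true
    }
}
```

If you rely on a native SQL query seeing your latest changes, the simplest option from the list at the top of this article is to call session.flush() yourself before running it.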

If you want to see this code running, it is available from github:


This article is part of a series on Hibernate. You might be interested in some of the others:

Hibernate query limitations and correlated sub-queries
Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations


Hibernate query limitations and correlated sub queries

It’s well known that one of the key limitations of writing queries in Hibernate, using either HQL or criteria, is that you cannot write queries that join to a subquery in the from clause. What is less well known is that some simple queries of this kind can be rewritten as correlated subqueries, which makes them runnable by Hibernate. Let’s look at an example. Imagine we have a customer table and an orders table. The orders table has a foreign key to the customer table. Suppose you want to get a report of all customers that have three or more orders. If you were to write this in SQL using a join, you would write:

select c.id from 
customer c 
    join 
(select customer_id,count(*) as num_orders from orders group by customer_id) co 
    on c.id = co.customer_id 
where co.num_orders > 2

To convert this to an SQL correlated subquery, the join is removed and the subquery gets moved to the where clause. The conditions on the join get pushed down into the subquery. Since they refer to the columns from the outer query (on customer), this is what makes the subquery a correlated subquery:

select id,forename,surname from
customer c 
where (select count(*) from orders o where o.customer_id = c.id) > 2

In HQL, this becomes:

select id,forename,surname from 
Customer c 
where (select count(*) from Order o where o.customer.id = c.id) > 2
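
If it helps to see why the two forms return the same rows, here is a small in-memory sketch in plain Java. It is not how a database or Hibernate evaluates anything; the class name, method names and sample data are all invented for illustration. The first method mirrors the join to a grouped subquery, the second mirrors the correlated subquery that counts orders once per customer row.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for the customer and orders tables.
class CorrelatedCountSketch {

    // Join style: build the grouped (customer_id, num_orders) result once,
    // then match each customer against it.
    static List<Integer> viaGroupedJoin(List<Integer> customerIds, List<Integer> orderCustomerIds) {
        Map<Integer, Long> numOrders = orderCustomerIds.stream()
                .collect(Collectors.groupingBy(id -> id, Collectors.counting()));
        return customerIds.stream()
                .filter(id -> numOrders.getOrDefault(id, 0L) > 2)
                .collect(Collectors.toList());
    }

    // Correlated style: for each customer row, count that customer's orders.
    static List<Integer> viaCorrelatedSubquery(List<Integer> customerIds, List<Integer> orderCustomerIds) {
        return customerIds.stream()
                .filter(id -> orderCustomerIds.stream().filter(id::equals).count() > 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> customers = Arrays.asList(1, 2, 3);
        // One entry per order, holding the customer_id of that order.
        List<Integer> orders = Arrays.asList(1, 1, 1, 2, 2, 3, 3, 3, 3);
        System.out.println(viaGroupedJoin(customers, orders));        // [1, 3]
        System.out.println(viaCorrelatedSubquery(customers, orders)); // [1, 3]
    }
}
```

The correlated version walks the order list once per customer, which is the naive evaluation strategy that a good query optimiser avoids.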

Now suppose we want the most recent order for each customer. In the most basic variation, where you just want to get the timestamp of the most recent order for each customer id, you get both columns out of your grouping query:

select max(timestamp),customer_id from orders group by customer_id

In the more general case, where you want to get further details of either the customer or the order, you have to join this query to the customer or orders table respectively. For example, to get the order id and total as well:

select o.id,o.timestamp,o.order_total from 
orders o 
join 
(select max(timestamp) as max_ts,customer_id from orders group by customer_id) as x 
on o.customer_id = x.customer_id and o.timestamp = x.max_ts

To convert this to a correlated subquery, the subquery gets moved from the join into the where clause, and the additional condition on the join becomes part of the subquery where clause. Note that the group by can be removed, since the restriction on customer_id has become part of the correlated subquery where clause.

select o1.id,o1.timestamp,o1.order_total from 
orders o1 
where o1.timestamp in 
(select max(timestamp) from orders o2 where o1.customer_id = o2.customer_id)

In HQL this would become:

select o1.id,o1.timestamp,o1.orderTotal from
Order o1
where o1.timestamp in 
(select max(timestamp) from Order o2 where o1.customer.id = o2.customer.id)
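
The reason the group by can be dropped is easier to see in a small in-memory sketch. The plain Java below is only an illustration with invented names and sample data: for each order, the inner loop plays the role of the correlated subquery, and because it only looks at orders for the same customer, the maximum it finds is already per customer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LatestOrderSketch {

    // Hypothetical, minimal stand-in for a row of the orders table.
    static final class OrderRow {
        final int id;
        final int customerId;
        final long timestamp;

        OrderRow(int id, int customerId, long timestamp) {
            this.id = id;
            this.customerId = customerId;
            this.timestamp = timestamp;
        }
    }

    // Mirrors: where o1.timestamp in
    //   (select max(timestamp) from orders o2 where o1.customer_id = o2.customer_id)
    static List<Integer> latestOrderIds(List<OrderRow> orders) {
        List<Integer> result = new ArrayList<>();
        for (OrderRow o1 : orders) {
            // The "subquery": restricted to o1's customer, so no group by is needed.
            long maxTimestamp = Long.MIN_VALUE;
            for (OrderRow o2 : orders) {
                if (o2.customerId == o1.customerId) {
                    maxTimestamp = Math.max(maxTimestamp, o2.timestamp);
                }
            }
            if (o1.timestamp == maxTimestamp) {
                result.add(o1.id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<OrderRow> orders = Arrays.asList(
                new OrderRow(10, 1, 100L),
                new OrderRow(11, 1, 300L),
                new OrderRow(12, 2, 200L),
                new OrderRow(13, 2, 150L));
        System.out.println(latestOrderIds(orders)); // [11, 12]
    }
}
```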

Suppose we want to add some details of the customer as well. In SQL we would join this to the customer table:

select c.id as cust_id,c.forename,c.surname,o1.id,o1.timestamp,o1.order_total 
from orders o1
join customer c on o1.customer_id = c.id
where o1.timestamp in 
(select max(timestamp) from orders o2 where o1.customer_id = o2.customer_id)

In HQL, we simply navigate the association from order to customer to get the customer info:

select o1.customer.id,o1.customer.forename,o1.customer.surname,
o1.id,o1.timestamp,o1.orderTotal
from Order o1 where o1.timestamp in 
  (select max(timestamp) from Order o2 where o1.customer.id = o2.customer.id)

So, are these queries equivalent to the corresponding join queries, and should you always rewrite them in this format? The answer to the first question is that technically, a subquery could get a different execution plan at the database. In theory, evaluating a correlated subquery could mean that the database evaluates the subquery separately for every single row in the outer query. Clearly, this would be a very expensive operation. However, for correlated subqueries that are actually equivalent to a join query, a good database query optimiser will recognise this, and follow the same execution plan as it would for the join query. That is, it will perform a join operation to get a result set that contains all of the rows in the subquery as a single operation, rather than evaluating the query once for every row in the outer query. Certainly, if I run queries like the above on SQL Server and look at the execution plan, I can see it is equivalent to the corresponding join query.

What about the second question? Even if you can rewrite these sorts of query, should you? Well, this probably comes down to the normal questions surrounding Hibernate – the Hibernate version of the query won’t need updating if you change the tables or columns in the database (only the mapping will), whereas the SQL version would. If you want a query that returns objects, the Hibernate version will make this easier for you. As a general rule, the Hibernate versions are going to be easier to use and maintain.

This article is part of a series on Hibernate. You might be interested in other articles in the series:

Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations


Bespoke join conditions with Hibernate JoinFormula

Recently someone posted a question on stackoverflow asking how to deal with a database join, where the foreign key could reside in one of two different columns. This situation is sometimes found in a legacy database schema, where someone has chosen to use two different columns, because they want them to represent different sorts of relationship, but relationships that join to the same kind of entity. In the example on stackoverflow, a customer has to link to a UserAccount object, but this could either be their own, using a foreign key in the “accountId” column, or a partner account, with the foreign key in the “partnerId” column. How do you deal with this?

The answer is to use a join formula to pick out the non-null column and use it to join:

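@ManyToOne // assumed: the association is mapped as many-to-one
@JoinColumnsOrFormulas({
	@JoinColumnOrFormula(formula = @JoinFormula(
		value = "case when accountId is not null then accountId " +
			"when partnerId is not null then partnerId end",
		referencedColumnName = "id")) })
public UserAccount getUserAccount() {
	return userAccount;
}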

If you want to see the full working example, it is on github:


The Customer class shows the join formula and you can see a test which demonstrates it in the Tests class, in the joinFormula() test method.
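
To spell out what the case expression in the formula evaluates to, here is the same decision written as a plain Java method. This is only an illustration of the logic; the class and method names are invented, and the real selection happens in SQL inside the database.

```java
// Hypothetical plain-Java rendering of the SQL case expression in the formula.
class JoinFormulaSketch {

    // Returns the key used for the join: accountId if present, otherwise
    // partnerId, otherwise null (a SQL case with no matching branch yields null).
    static Long joinKey(Long accountId, Long partnerId) {
        if (accountId != null) {
            return accountId;
        }
        if (partnerId != null) {
            return partnerId;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(joinKey(7L, null));   // 7
        System.out.println(joinKey(null, 9L));   // 9
        System.out.println(joinKey(null, null)); // null
    }
}
```

Note that if both columns are populated, the accountId wins, because it is tested first.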


Hibernate example 4 – many to many associations

I’ve put some code on Github that shows three ways of modelling many to many associations with JPA / Hibernate:

  1. Using a join table that is not mapped as either an entity or an embedded component type.
  2. Mapping the join table as an entity in its own right, so you can add properties to it.
  3. Using a collection of embedded components, to simplify the management of the persistence of the association.

The code is available from:


Or the direct download link for the zip is:


I’ll explain how each of these ways of modelling the association works.

Using a join table not mapped as an entity or component

If you simply want a many to many association, without any properties attached to the association, then this is a good choice. You use a join table, and you can navigate from either end of the association. In my example, the Category and Item classes have this association, and you can see it mapped in the Category class:

	@ManyToMany(cascade = CascadeType.ALL)
	@JoinTable(
		name = "CATEGORY_ITEM",
		joinColumns = { @JoinColumn(name = "CATEGORY_ID")},
		inverseJoinColumns = {@JoinColumn(name = "ITEM_ID")}
	)
	public Set<Item> getItems() {
		return items;
	}

Since the join table isn’t an entity, it doesn’t have a corresponding Java class.

Mapping the join table as an entity in its own right

If you want to add properties to the association, then you have to map the join table as an entity. Typical examples might be the username of the person who created the association, or the time it was created. In this case, the join table is represented by a class which has a primary key composed of the two foreign keys to the entities it is linking. I won’t show all of the code here, just the start of the class:

@Entity
public class CategoryItemRelationship {
	@EmbeddedId
	private Id id = new Id();
	@ManyToOne
	@JoinColumn(name = "CATEGORY_ID", insertable = false, updatable = false)
	private Category2 category;
	@ManyToOne
	@JoinColumn(name = "ITEM_ID", insertable = false, updatable = false)
	private Item2 item;
	private Date dateAdded;

	@Embeddable
	public static class Id implements Serializable {
		@Column(name = "CATEGORY_ID")
		private Long categoryId;
		@Column(name = "ITEM_ID")
		private Long itemId;
		public Id() {}
		public Id(Long categoryId, Long itemId) {
			this.categoryId = categoryId;
			this.itemId = itemId;
		}
		// equals() and hashCode() omitted
	}
	// rest of the class omitted
}

Because the id is a composite, it is represented by a static inner class and mapped using the @EmbeddedId annotation. You can see that the database columns for the category and item foreign keys are updated by this static inner Id class. Hence when the item and category instance variables in the outer class are mapped as a join, they are marked as not insertable or updatable. Similarly, when we make the association bi-directional, so that we can navigate from the Item2 and Category2 classes across the association, we have to declare that the association is mapped and controlled by the intermediate CategoryItemRelationship class. For example, in the Category2 class:

	@OneToMany(mappedBy = "category", cascade = CascadeType.ALL)
	public Set<CategoryItemRelationship> getCategoryItemRelationships() {
		return categoryItemRelationships;
	}
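
The excerpt of CategoryItemRelationship shown earlier stops before the end of the inner Id class. JPA expects a composite key class to be Serializable and to define equals() and hashCode() in terms of the key fields, so the rest of that class will normally include them. The following is a plain Java sketch of that idea, with the annotations left out so that it runs without Hibernate. The CompositeIdSketch name is invented, and this is not the code from the github project.

```java
import java.io.Serializable;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical composite key: equal when both foreign key values are equal.
class CompositeIdSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Long categoryId;
    private final Long itemId;

    CompositeIdSketch(Long categoryId, Long itemId) {
        this.categoryId = categoryId;
        this.itemId = itemId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CompositeIdSketch)) {
            return false;
        }
        CompositeIdSketch that = (CompositeIdSketch) other;
        return Objects.equals(categoryId, that.categoryId)
                && Objects.equals(itemId, that.itemId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(categoryId, itemId);
    }

    public static void main(String[] args) {
        Set<CompositeIdSketch> ids = new HashSet<>();
        ids.add(new CompositeIdSketch(1L, 2L));
        ids.add(new CompositeIdSketch(1L, 2L)); // same key values, so not added again
        ids.add(new CompositeIdSketch(1L, 3L));
        System.out.println(ids.size()); // 2
    }
}
```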

Using a collection of components

This is a less common choice. When you use a component, it does not have its own lifecycle. Rather, it is owned by an entity. It also does not have an id, so you cannot retrieve it from the database using an id. (Obviously in database terms a component stored in its own table will have a primary key, but this is never represented in the component class as an id field.) Finally, because a component can only have one owning entity (since its lifecycle is tied to the owning entity), you cannot enable bidirectionality on an association mapped using components. However, using a component won’t require configuring any cascade settings, so could be an appropriate choice if you have an association that you think will generally be accessed and administered from one side. In my example, this component is the CategorizedItem class:

@Embeddable
public class CategorizedItem {
	private String username;
	private Date dateAdded = new Date();
	private Item3 item;
	private Category3 category;
	public CategorizedItem(String username,
		Category3 category,
		Item3 item) {
		this.username = username;
		this.category = category;
		this.item = item;
	}
	// rest of the class omitted
}

You can see that because it is a component rather than an entity, it is given the @Embeddable annotation rather than @Entity. Since it does not need an identifier, it doesn’t require the static inner Id class used by the entity approach. At a database level, the primary key of the categorized item table will be a composite of all columns. To link the Category3 class to the CategorizedItem, we use the @ElementCollection and @CollectionTable annotations:

@Entity
public class Category3 {
	private Long id;
	private String name;
	private Set<CategorizedItem> categorizedItems = new HashSet<CategorizedItem>();

	@ElementCollection
	@CollectionTable(name = "CATEGORIZED_ITEMS", joinColumns = @JoinColumn(name = "CATEGORY_ID"))
	public Set<CategorizedItem> getCategorizedItems() {
		return categorizedItems;
	}
	public void setCategorizedItems(Set<CategorizedItem> categorizedItems) {
		this.categorizedItems = categorizedItems;
	}
	// id and name accessors omitted
}

Hibernate example 3 – one to one associations

Just put a Hibernate example of the three different ways to create one-to-one associations on github:


It contains a Customer entity, which has one-to-one associations to a UserProfile, MarketingPreferences and a Wishlist.

  1. Foreign key relationship. Customer has a foreign key to UserProfile.
  2. Shared primary key. The MarketingPreferences entity is set to use the same primary key as the customer, so its primary key is also a foreign key to the customer.
  3. Join table. Customer and wishlist are linked by a join table, appropriate if the association is optional.

The code is built using maven, so you run the tests with:

mvn test

I’ll quickly explain how each type of association is mapped.

The relationship between Customer and UserProfile is a foreign key, which is the default for a one to one association, so you simply need the @OneToOne annotation in the Customer class:

	private UserProfile userProfile;
	@OneToOne(cascade = CascadeType.ALL)
	public UserProfile getUserProfile() {
		return userProfile;
	}

You can see I’ve also set a cascade on this mapping, so that if you create a new customer and user profile, the user profile will be saved automatically, but it isn’t related to how the mapping works.

Then, if you want to make the association bidirectional, in the UserProfile class, add a Customer instance variable, but specify that the association is mapped and controlled by the userProfile instance variable in the Customer class. This is the standard set up for bi-directional associations – you must not define the association on both sides, as this would actually create two associations. The code in UserProfile is:

	@OneToOne(mappedBy = "userProfile")
	public Customer getCustomer() {
		return customer;
	}

For MarketingPreferences, the primary key always takes the same value as the primary key of its corresponding Customer instance. Hence the primary key for MarketingPreferences is also a foreign key to the Customer. In the MarketingPreferences class this is mapped as:

	private Long id;
	private Customer customer;
	@Id
	public Long getId() {
		return id;
	}
	@OneToOne
	@MapsId
	public Customer getCustomer() {
		return customer;
	}

In most classes, the id would be generated by the database, so the @Id annotation would be accompanied by @GeneratedValue. Here, the id must be obtained from the associated Customer class. We have an instance variable of type Customer, and the @MapsId annotation tells Hibernate that the foreign key for it should not be put in a new database column, but instead is to be used as the primary key.

To make this bidirectional, on the Customer side, you must tell Hibernate to use the primary key of MarketingPreferences when it joins the database tables:

	@OneToOne(cascade = CascadeType.ALL)
	@PrimaryKeyJoinColumn
	public MarketingPreferences getMarketingPreferences() {
		return marketingPreferences;
	}

Finally there is the association between Customer and Wishlist, which is mapped by an intermediate join table. This approach is useful for two reasons. Firstly, when the association is optional – although you could have a null database foreign key, this isn’t best practice. Secondly, if you want to add properties to the association, the join table gives you a place to put them. The mapping in the Customer class is:

	@OneToOne(cascade = CascadeType.ALL)
	@JoinTable( name = "CUSTOMER_WISHLIST", 
		joinColumns = @JoinColumn( name = "CUSTOMER_ID"),
		inverseJoinColumns = @JoinColumn( name = "WISHLIST_ID"))
	public Wishlist getWishlist() {
		return wishlist;
	}

To make the association bidirectional, add a Customer instance variable to the Wishlist, and specify that the mapping is done by the Customer:

	@OneToOne(mappedBy = "wishlist")
	public Customer getCustomer() {
		return customer;
	}

For more info about one to one associations, check out chapter 8 of the Hibernate docs:


Hibernate example 2 – one-to-many associations

Just put an example of how to use one-to-many associations with Hibernate on Github:


It shows two examples of a one-to-many relationship between entities:

  1. Unordered one-to-many between Customer and Address.
  2. Ordered one-to-many between Customer and PaymentCard.

The unordered association uses the normal setup where the “many” side of the association owns the relationship and is responsible for updating the database, so in the Address class, it declares the ManyToOne association:

public class Address {
    private Customer customer;
    @ManyToOne
    public Customer getCustomer() {
        return customer;
    }
    // more code here...
}

The Customer defers to this mapping, so in the Customer class you have:

    private Set<Address> addresses = new HashSet<Address>();
    @OneToMany(mappedBy = "customer", cascade = CascadeType.ALL)
    public Set<Address> getAddresses() {
        return addresses;
    }

By contrast, the ordered mapping between the Customer and the PaymentCard needs to be owned by the Customer, because only the customer knows the ordering of the cards in the list. Hence this is mapped in the Customer class:

    private List<PaymentCard> paymentCards = new ArrayList<PaymentCard>();
    @OneToMany(cascade = CascadeType.ALL)
    @JoinColumn(name = "CUSTOMER_ID", nullable = false)
    @OrderColumn(name = "CARD_INDEX")
    // Order column wasn't in JPA 1, so you'd previously have had: 
    // @org.hibernate.annotations.IndexColumn(name = "CARD_INDEX")
    public List<PaymentCard> getPaymentCards() {
        return paymentCards;
    }

Then the PaymentCard just defers to this mapping:

private Customer customer;
@ManyToOne
@JoinColumn(name = "CUSTOMER_ID", nullable = false, insertable = false, updatable = false)
public Customer getCustomer() {
    return customer;
}

If you want more info about one-to-many associations, check out Chapter 8 of the Hibernate docs:



Hibernate example 1 – inheritance and polymorphism

Have just put an example on Github of how to handle inheritance and polymorphism with Hibernate:


It demonstrates the four ways of dealing with inheritance and polymorphism:

  1. Implicit polymorphism – no explicit mapping of the inheritance, but Hibernate can support polymorphic queries because it understands the class hierarchy.
  2. Table per concrete subclass – abstract class at the top of the hierarchy does not correspond to a database table, meaning that any properties in the superclass are duplicated into each subclass table, and polymorphic queries have to use unions.
  3. Table per class. All classes, including abstract ones, get a database table, meaning that no fields are duplicated. Better from a normalization point of view, but queries can still be slow because they have to perform joins.
  4. Single table for entire class hierarchy. Fast, because no joins or unions required for any queries, but not good from a database normalization perspective, because subclass columns have to be nullable.

The project uses maven, TestNG and HSQLDB. Hence, to run the tests, use:

mvn test

For more information on Hibernate and polymorphism, see chapter ten of the Hibernate docs:



Using database transactions with Apache Camel

Just put an example of using database transactions with Apache Camel on github:


It is based on the example given at the start of Chapter 12 of “Camel In Action”, but using database transactions, rather than a JMS transaction. It has two routebuilders:

  • TransactionlessJDBCRouteBuilder
  • TransactedJDBCRouteBuilder

There are four testcases:

  1. TransactionlessJDBCTest.transactionlessJDBCTest_noError – will succeed, and shows that transactionless routebuilder inserts two records if there are no exceptions.
  2. TransactionlessJDBCTest.transactionlessJDBCTest_withConnectionProblem – will fail, showing that when the second insert fails, the first one is not rolled back.
  3. TransactedJDBCTest.transactedJDBCTest_noError – will succeed, showing two records inserted.
  4. TransactedJDBCTest.transactedJDBCTest_withConnectionProblem – will succeed, showing that the first insert is rolled back if the second one fails.

The project is built with Maven, so to run it, you need Maven on your path, and then you can run:

mvn test

Of course you can run the tests individually if you want by using the -Dtest parameter with maven. e.g.

mvn test -Dtest=TransactionlessJDBCTest


Testing Camel routes with Spring OSGi properties

If you are writing Camel routes using Spring to deploy to ServiceMix, you can’t use regular Spring properties, since ServiceMix is an OSGi container. Instead, you have to use OSGi properties. But if you try to start up your Spring context in a unit test, it will fail, since there is no OSGi environment. How can you get round this? One solution is for your tests to load the spring xml as an xml file before they instantiate the spring context. That way, they can replace the OSGi references with a regular spring property loader. Let’s see what this looks like.

In my Spring file I’ve got the OSGix namespace:


Then an OSGix property loader, and a regular property placeholder that references it:

<osgix:cm-properties id="props" persistent-id="claimsExport"/>
<context:property-placeholder properties-ref="props"/>

In my test code, I have the following helper method:

public static String replaceOSGiPropertyLoader(String springXmlLocation) 
    throws ParserConfigurationException, IOException, 
    SAXException, XPathExpressionException, TransformerException {
    log.info("Removing Spring osgix property loader");
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    File springFile = new File(springXmlLocation);
    Document doc = builder.parse(new InputSource(new FileInputStream(springFile)));
    // find the osgix properties bean
    XPath xpath = XPathFactory.newInstance().newXPath();
    Node osgiProps = (Node) xpath.evaluate("/beans/cm-properties",doc, XPathConstants.NODE);
    // remove it
    // first get the root node (the document element, so a leading comment is skipped)
    Node beanNode = doc.getDocumentElement();
    // remove the child
    beanNode.removeChild(osgiProps);
    // now adjust the spring properties bean to load the props directly from the file
    Element propertyPlaceholder = (Element) xpath.evaluate("/beans/property-placeholder",doc, XPathConstants.NODE);
    // remove the ref to the osgi bean
    propertyPlaceholder.removeAttribute("properties-ref");
    // add a reference directly to the file that has the properties in
    // (assumed location: point this at wherever your claimsExport.cfg lives)
    propertyPlaceholder.setAttribute("location", "classpath:claimsExport.cfg");
    // now write out the resultant xml
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    StringWriter writer = new StringWriter();
    transformer.transform(new DOMSource(doc),new StreamResult(writer));
    // return the string representation of the spring xml file
    return writer.toString();
}

The OSGix property loader loads properties from a file with a name that matches its persistent id. In this case, the file is called claimsExport.cfg. You can see that the helper method simply updates the Spring property placeholder to load the properties directly from that file.

You can use the code as:

String springXml = replaceOSGiPropertyLoader("your-path/app-context.xml");
springContext = new GenericXmlApplicationContext(new ByteArrayResource(springXml.getBytes("utf-8")));
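
If you want to try the rewriting step on its own, without a Spring file on disk, here is a self-contained sketch that applies the same idea to an in-memory string. It is a variation on the helper above rather than the same code: the class name, the cut-down sample XML and the classpath:claimsExport.cfg location are all assumptions made for the example, and it parses with namespace awareness switched on and matches elements by local name, so that the osgix: and context: prefixes don't matter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class OsgiPropertyRewriteSketch {

    // Hypothetical, cut-down Spring file containing just the two beans discussed.
    static final String SAMPLE_XML =
            "<beans xmlns=\"http://www.springframework.org/schema/beans\""
            + " xmlns:context=\"http://www.springframework.org/schema/context\""
            + " xmlns:osgix=\"http://www.springframework.org/schema/osgi-compendium\">"
            + "<osgix:cm-properties id=\"props\" persistent-id=\"claimsExport\"/>"
            + "<context:property-placeholder properties-ref=\"props\"/>"
            + "</beans>";

    static String rewrite(String springXml, String propertiesLocation) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(springXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Find the osgix property loader and remove it from its parent.
            Node osgiProps = (Node) xpath.evaluate(
                    "/*[local-name()='beans']/*[local-name()='cm-properties']",
                    doc, XPathConstants.NODE);
            osgiProps.getParentNode().removeChild(osgiProps);

            // Point the property placeholder at the properties file instead.
            Element placeholder = (Element) xpath.evaluate(
                    "/*[local-name()='beans']/*[local-name()='property-placeholder']",
                    doc, XPathConstants.NODE);
            placeholder.removeAttributeNS(null, "properties-ref");
            placeholder.setAttributeNS(null, "location", propertiesLocation);

            // Serialise the modified document back to a string.
            StringWriter writer = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite Spring XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite(SAMPLE_XML, "classpath:claimsExport.cfg"));
    }
}
```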

Thanks to Ben Oday’s blog for telling me about the spring osgix property loader.


Practical Scala – processing XML

This is the second article in my series Practical Scala. The first article covered the basics of Scala syntax and then moved on to file IO and regular expressions. This article will show you how to read and write XML. Although in a sense this article builds on the previous one, I start with a very basic code example, so you should still be able to follow even if you haven’t read the last article.

Defining XML

Let’s dive right in with a simple example. As before, if you haven’t got a Scala environment installed yet, I recommend the Eclipse plugin for Scala. This is a trivial example, that creates some XML and prints out some information about it:

object XmlExample {
  def main(args: Array[String]): Unit = {
    val someXml = <books><book title="The Woman in White"><author>Wilkie Collins</author></book><book title="Great Expectations"><author>Charles Dickens</author></book></books>
    println("The xml object is of type: " + someXml.getClass())
  }
}

If you run this, you should get the output:

The xml object is of type: class scala.xml.Elem

What’s going on here? Well, if you haven’t seen the Scala xml syntax before, at first sight it might look like we have defined a string of xml, but look more closely and you’ll see that it isn’t a string – there are no quotation marks around it. Because processing XML is such a common task, support for it has been built into the Scala language. The first line creates some xml, and the second line prints out the type of the object to reveal that it is a scala.xml.Elem. This is the main concrete class that is used to represent XML elements. However, its parent class – the abstract Node class – and its parent – NodeSeq – are also important classes, since much library code will operate on instances of Node or NodeSeq.

What about loading XML from a file? It’s very simple. Cut and paste the xml fragment in the code into a file – I’ve called mine books.xml and put it in the root of my Eclipse project – then update the code to the following:

import scala.xml.XML
object XmlExample {
  def main(args: Array[String]): Unit = {
    val someXml = XML.loadFile("books.xml")
    println("The xml object is of type: " + someXml.getClass())
  }
}

You should be able to rerun this and verify that the file is loaded correctly. The XML class also has additional methods for loading from URLs, input streams, readers and strings.

What about converting XML back to a string or file? Well, if you don’t care about character encoding, you can convert XML to a string just by calling the toString method. However, if you want to specify the encoding, you can call XML.saveFull.

Querying XML with XPath – sort of

Okay, we’ve seen how to define XML or load it. How do we query it? Well, Scala supports a subset of the XPath query language. You can use \\ and \ to search for nodes, similar to XPath. If you’ve used XPath, you’ll know that it actually uses // and /. Why does Scala use backslashes rather than forward slashes? The answer is because in Scala, two forward slashes start a comment! Hence backslashes have to be used instead. However they operate in a similar way to the corresponding XPath notation – a single backslash searches the direct children of the node it is applied to, while a double backslash searches the node and all of its descendants. Add the following lines to the example:

    val test1 = someXml \\ "author"
    println("test1: " + test1)

When you run the code you should see:

The xml object is of type: class scala.xml.Elem
test1: <author>Wilkie Collins</author><author>Charles Dickens</author>

As in XPath, the \\ operator has searched the entire tree for nodes of type <author> and returned a list of matching nodes. This is very easy, let’s try another example. Let’s try searching for a node that has an attribute with a specified value. Add the following two lines to your code:

    val test2 = someXml \\ "book[@title='The Woman in White']"
    println("test2: " +test2)

Let’s see what we get when we run this:

The xml object is of type: class scala.xml.Elem
test1: <author>Wilkie Collins</author><author>Charles Dickens</author>
test2: 

Weird, it doesn’t seem to have found the node, why is this? Unfortunately, the answer is that Scala only supports a very limited subset of the XPath notation – pretty much just the \\ and \ operators. In order to search for nodes with specified attributes, we’re going to have to break out of this XPath notation and use some standard Scala. However, this is a nice little introduction to how collections can be processed in functional languages. Here is one way:

    val test2 = (someXml \\ "book").filter(node => node.attribute("title")
		.exists(title => title.text == "The Woman in White"))
    println("test2: " +test2)

This looks a bit complex – what is going on here? Well, our XPath style operator returns a list of book nodes. We then call the filter method on this list. When you call the filter method on a collection, you pass in a function that takes an item of the type that is in the collection, and performs a test on it that returns a boolean. This function is applied to each item in the collection in turn, so what is returned is a collection which only contains the elements for which the condition is true. The => is the Scala syntax for defining a function. The function parameters go on the left of the => and the function body on the right. In the above example, I’ve called the parameter “node”. You can see that we haven’t had to define the type of the parameter, the Scala compiler has inferred it, but you can include the parameter types if you think it makes the code clearer. (In fact, when you only have a single parameter, you don’t even need to name it, you can just use the underscore character _ to represent it in your function body, and omit the => operator entirely. However, I didn’t want this example to be too idiomatic.)

So what function do we pass into the filter method? Well, we want to pick out the node which has the title attribute “The Woman in White”, so we call the attribute method with the parameter “title”. This returns an Option holding the value of the “title” attribute, or None if the node has no such attribute. Then we use another useful method – exists – which Option shares with the collection classes. This is similar to the filter method, in that you give it a function that performs a boolean test on each item, but unlike filter, exists simply returns true as soon as it has found a single item that passes the test. In this case, we get the text value of the attribute and check if it equals “The Woman in White” using the standard string comparison operator ==.
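
If you do need a real XPath predicate, one alternative is the XPath engine that ships with the JDK, which Scala code can call because it runs on the JVM. The sketch below is written in Java rather than Scala and is only an illustration: the XPathAttributeSketch class and its authorOf method are invented names, and for brevity the title is spliced straight into the expression, which assumes it contains no quote characters.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathAttributeSketch {

    // The same books document as in the Scala example.
    static final String BOOKS_XML =
            "<books>"
            + "<book title=\"The Woman in White\"><author>Wilkie Collins</author></book>"
            + "<book title=\"Great Expectations\"><author>Charles Dickens</author></book>"
            + "</books>";

    // Returns the author of the book with the given title,
    // or an empty string if no book matches.
    static String authorOf(String title) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Assumes the title contains no single quotes.
        String expression = "//book[@title='" + title + "']/author";
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(BOOKS_XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(authorOf("The Woman in White")); // Wilkie Collins
        System.out.println(authorOf("Great Expectations")); // Charles Dickens
    }
}
```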

Querying XML with pattern matching

You can also use Scala’s pattern matching syntax to query XML:

val authorInfo = <author>Charles Dickens</author>
authorInfo match {
   case <author>{a}</author> => println(a)
}

You should be able to run this and confirm you get “Charles Dickens” as the output. The curly brackets allow you to put Scala code inside XML literals. In this scenario, all we want to do is bind the contents of the match to a variable, so we’re not really putting any complex logic in there – just the name of the variable we want to bind the author name to, which in this case is “a”. If this case matches, we then print it out. Whilst this example works, more complex matches tend to end up with very ugly and complex Scala syntax, so you’re probably better off using normal Scala methods such as the filter and exists methods we used above.

Converting between objects and XML

Converting objects to XML simply requires that you implement the toXML method. Typically you will implement the method by using XML literals, with curly brackets to insert the variable values that you want to be output. For example:

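class Customer(custId : Int, firstName : String, lastName : String) {
	def toXML = {
		<customer><custId>{custId}</custId><firstName>{firstName}</firstName><lastName>{lastName}</lastName></customer>
	}
	override def toString = "Cust: " + custId + " " + firstName + " " + lastName
}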

You’ll see that in this example, I’ve used the concise way of defining the class variables and a constructor at the same time. To convert from the serialized form to objects, you can use the operators we looked at earlier. For example:

val customerXml = <customer><custId>123</custId><firstName>Hedley</firstName><lastName>Proctor</lastName></customer>
val customer = new Customer(
   (customerXml \ "custId").text.toInt,
   (customerXml \ "firstName").text,
   (customerXml \ "lastName").text)


In this article you’ve seen how to:

  • Define XML using XML literals
  • Load and save XML to strings and files
  • Query XML with XPath like operators
  • Query XML with pattern matching
  • Convert objects to and from XML

For more information about Scala’s XML support you might like to check out:

Programming in Scala, chapter 26

Working with Scala’s XML support – a great blog post from Daniel Spiewak which goes into more detail about some of the quirks and limitations of Scala’s XML library, especially for pattern matching.
