Practical Scala – processing XML

This is the second article in my series Practical Scala. The first article covered the basics of Scala syntax and then moved on to file IO and regular expressions. This article will show you how to read and write XML. Although in a sense this article builds on the previous one, I start with a very basic code example, so you should still be able to follow even if you haven’t read the last article.

Defining XML

Let’s dive right in with a simple example. As before, if you haven’t got a Scala environment installed yet, I recommend the Eclipse plugin for Scala. This trivial example creates some XML and prints out some information about it:

object XmlExample {
 
  def main(args: Array[String]): Unit = {
    val someXml = <books><book title="The Woman in White"><author>Wilkie Collins</author></book><book title="Great Expectations"><author>Charles Dickens</author></book></books>
    println("The xml object is of type: " + someXml.getClass())
  }
 
}

If you run this, you should get the output:
The xml object is of type: class scala.xml.Elem
What’s going on here? Well, if you haven’t seen the Scala XML syntax before, at first sight it might look like we are defining a string of XML, but look more closely and you’ll see that it isn’t a string – there are no quotation marks around it. Because processing XML is such a common task, literal syntax for it has been built into the Scala language (the supporting classes live in the scala.xml package, which since Scala 2.11 ships as the separate scala-xml module). The first line of main creates some XML, and the second line prints out the type of the object to reveal that it is a scala.xml.Elem. This is the main concrete class that is used to represent XML elements. However, its parent class – the abstract Node class – and its parent – NodeSeq – are also important, since much library code operates on instances of Node or NodeSeq.
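To make the class hierarchy a little more concrete, here is a minimal sketch. The object name NodeHierarchyExample and the value names are my own, chosen for illustration; label, child and text are standard members of the scala.xml classes.

```scala
import scala.xml.{Elem, Node, NodeSeq}

object NodeHierarchyExample {
  // An XML literal has the static type Elem.
  val book: Elem = <book title="Great Expectations"><author>Charles Dickens</author></book>

  // Because Elem extends Node, and Node extends NodeSeq,
  // the same value can be used wherever either parent type is expected.
  val asNode: Node = book
  val asSeq: NodeSeq = book

  def main(args: Array[String]): Unit = {
    println(book.label)        // the element name
    println(asNode.child.size) // the number of direct child nodes
    println(asSeq.text)        // the concatenated text content
  }
}
```

This is why methods that accept a NodeSeq can be handed a single element without any conversion.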

What about loading XML from a file? It’s very simple. Cut and paste the xml fragment in the code into a file – I’ve called mine books.xml and put it in the root of my Eclipse project – then update the code to the following:

import scala.xml.XML
 
object XmlExample {
 
  def main(args: Array[String]): Unit = {
    val someXml = XML.loadFile("books.xml")
    println("The xml object is of type: " + someXml.getClass())
  }
 
}

You should be able to rerun this and verify that the file is loaded correctly. The XML object also has methods for loading from URLs, input streams, readers and strings.
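As a quick illustration of one of those other loaders, here is a small sketch that parses XML held in an ordinary string using XML.loadString. The object name LoadStringExample and the sample string are just assumptions for the example.

```scala
import scala.xml.{Elem, XML}

object LoadStringExample {
  // The XML arrives as a plain String, for example from a network call.
  val raw: String =
    "<books><book title=\"Great Expectations\"><author>Charles Dickens</author></book></books>"

  // loadString parses the text and returns the root element.
  val parsed: Elem = XML.loadString(raw)

  def main(args: Array[String]): Unit = {
    println("Root element: " + parsed.label)
    println("Number of books: " + (parsed \ "book").size)
  }
}
```

Unlike an XML literal, the string is only checked when it is parsed at runtime, so malformed input shows up as an exception rather than a compile error.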

What about converting XML back to a string or file? Well, if you don’t care about character encoding, you can convert XML to a string just by calling the toString method. However, if you want to write a file with a specific encoding, you can call XML.save, which takes the encoding as a parameter (very old releases of the library called this method XML.saveFull).
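Here is a rough sketch of writing XML to a file with an explicit encoding and reading it back. It assumes a version of the library where XML.save accepts the file name, the node, the encoding and a flag for the XML declaration, in that order. The object name SaveExample and the temporary file are my own choices, so adjust the path to suit your project.

```scala
import java.io.File
import scala.xml.{Elem, XML}

object SaveExample {
  val books: Elem =
    <books><book title="Great Expectations"><author>Charles Dickens</author></book></books>

  // Use a temporary file (an assumption for this sketch) so nothing real is overwritten.
  val target: File = File.createTempFile("books", ".xml")

  // Arguments: file name, node to write, encoding, whether to emit the <?xml ...?> declaration.
  XML.save(target.getPath, books, "UTF-8", true)

  // Read the file back to check that it round-trips.
  val reloaded: Elem = XML.loadFile(target)

  def main(args: Array[String]): Unit =
    println("Reloaded from " + target.getPath + ": " + reloaded)
}
```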

Querying XML with XPath – sort of

Okay, we’ve seen how to define XML or load it. How do we query it? Well, Scala supports a subset of the XPath query language. You can use \\ and \ to search for nodes, similar to XPath. If you’ve used XPath, you’ll know that it actually uses // and /. Why does Scala use backslashes rather than forward slashes? Because in Scala, two forward slashes start a comment! Hence backslashes have to be used instead. They behave much like the corresponding XPath steps – a single backslash selects matching direct children of the node it is applied to, while a double backslash searches the node and all of its descendants. Add the following lines to the example:

    val test1 = someXml \\ "author"
    println("test1: " + test1)

When you run the code you should see:

The xml object is of type: class scala.xml.Elem
test1: <author>Wilkie Collins</author><author>Charles Dickens</author>

As in XPath, the \\ operator has searched the entire tree for <author> elements and returned a sequence of the matching nodes (a NodeSeq). That was easy, so let’s try something harder: searching for a node that has an attribute with a specified value. Add the following two lines to your code:

    val test2 = someXml \\ "book[@title='The Woman in White']"
    println("test2: " + test2)

Let’s see what we get when we run this:

The xml object is of type: class scala.xml.Elem
test1: <author>Wilkie Collins</author><author>Charles Dickens</author>
test2:

Weird, it doesn’t seem to have found the node. Why is this? Unfortunately, the answer is that Scala only supports a very limited subset of the XPath notation – pretty much just the \\ and \ operators, plus "@name" to select an attribute and "_" as a wildcard. Predicates in square brackets are not understood. In order to search for nodes with specified attributes, we’re going to have to break out of this XPath notation and use some standard Scala. However, this is a nice little introduction to how collections can be processed in functional languages. Here is one way:

    val test2 = (someXml \\ "book").filter(node =>
      node.attribute("title").exists(title => title.text == "The Woman in White"))
    println("test2: " + test2)

This looks a bit complex – what is going on here? Well, our XPath style operator returns a sequence of book nodes. We then call the filter method on this sequence. When you call the filter method on a collection, you pass in a function that takes an item of the type that is in the collection, and performs a test on it that returns a boolean. This function is applied to each item in the collection in turn, so what is returned is a collection which only contains the elements for which the condition is true. The => is the Scala syntax for defining a function. The function parameters go on the left of the => and the function body on the right. In the above example, I’ve called the parameter “node”. You can see that we haven’t had to define the type of the parameter, because the Scala compiler has inferred it, but you can include the parameter types if you think it makes the code clearer. (In fact, when the parameter is used only once in the function body, you don’t even need to name it: you can use the underscore character _ to represent it and omit the => entirely. However, I didn’t want this example to be too idiomatic.)
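For comparison, here is a sketch of the same filter written twice: once with the parameter type spelled out, and once with the underscore shorthand. The object name FilterSyntaxExample and the value names are only for illustration.

```scala
import scala.xml.{Elem, Node, NodeSeq}

object FilterSyntaxExample {
  val someXml: Elem =
    <books><book title="The Woman in White"><author>Wilkie Collins</author></book><book title="Great Expectations"><author>Charles Dickens</author></book></books>

  // Long form: the parameter is named and its type is given explicitly.
  val explicit: NodeSeq = (someXml \\ "book").filter((node: Node) =>
    node.attribute("title").exists(title => title.text == "The Woman in White"))

  // Short form: each underscore stands for the single parameter of its function.
  val shorthand: NodeSeq =
    (someXml \\ "book").filter(_.attribute("title").exists(_.text == "The Woman in White"))

  def main(args: Array[String]): Unit = {
    println("explicit:  " + explicit)
    println("shorthand: " + shorthand)
  }
}
```

Both versions select the same node. Which one reads better is largely a matter of taste, although the named form tends to be kinder to readers who are new to Scala.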

So what function do we pass into the filter method? Well, we want to pick out the node which has the title attribute “The Woman in White”, so we call the attribute method with the parameter “title”. This returns an Option: it holds the value of the “title” attribute (as a sequence of nodes) if the node has one, and is empty otherwise. Then we use another useful method – exists. This is similar to filter, in that you give it a function that performs a boolean test, but instead of returning the matching items it simply returns true as soon as it finds one that passes the test. On an Option there is at most one item to check. In this case, we get the text value of the attribute and check if it equals “The Woman in White” using the standard comparison operator ==.
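It may help to see the two operators side by side, together with the "@" form for reading attributes, which gives a slightly shorter alternative to the attribute method. This is only a sketch, and the object name PathOperatorsExample and its value names are my own.

```scala
import scala.xml.{Elem, NodeSeq}

object PathOperatorsExample {
  val someXml: Elem =
    <books><book title="The Woman in White"><author>Wilkie Collins</author></book><book title="Great Expectations"><author>Charles Dickens</author></book></books>

  // \ only looks at direct children: <author> is nested inside <book>, so nothing matches.
  val directAuthors: NodeSeq = someXml \ "author"

  // \\ searches every level of the tree, so both authors are found.
  val allAuthors: NodeSeq = someXml \\ "author"

  // Applied to a single node, "@title" selects that node's title attribute.
  val titles: List[String] =
    (someXml \ "book").map(book => (book \ "@title").text).toList

  // The same idea gives another way to filter on an attribute value.
  val womanInWhite: NodeSeq =
    (someXml \ "book").filter(book => (book \ "@title").text == "The Woman in White")

  def main(args: Array[String]): Unit = {
    println("direct authors: " + directAuthors.size)
    println("all authors:    " + allAuthors.size)
    println("titles:         " + titles)
    println("match:          " + womanInWhite)
  }
}
```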

Querying XML with pattern matching

You can also use Scala’s pattern matching syntax to query XML:

val authorInfo = <author>Charles Dickens</author>
authorInfo match {
   case <author>{a}</author> => println(a)
}

You should be able to run this and confirm you get “Charles Dickens” as the output. In ordinary XML literals, curly brackets allow you to embed Scala code; inside a pattern, they mark the part of the match that we want to bind to a variable. So we’re not putting any complex logic in there – just the name of the variable we want to bind the author name to, which in this case is “a”. If this case matches, we then print it out. Whilst this example works, more complex matches tend to end up with very ugly and complex Scala syntax, so you’re probably better off using normal Scala methods such as the filter and exists methods we used above.
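To give a flavour of what slightly larger matches look like, here is a sketch with several cases, including a fallback so that unexpected elements don’t cause a MatchError. The describe function and the PatternMatchExample object are hypothetical names for this example. The {children @ _*} form binds all of the child nodes of an element, however many there are.

```scala
import scala.xml.Node

object PatternMatchExample {
  // Returns a short description of the node, depending on which pattern matches.
  def describe(node: Node): String = node match {
    // Matches an <author> element with exactly one child node, bound to name.
    case <author>{name}</author> => "author: " + name.text
    // Matches a <book> element with any number of children, bound to children.
    case <book>{children @ _*}</book> => "book with " + children.length + " child node(s)"
    // Fallback for anything else.
    case other => "unrecognised element: " + other.label
  }

  def main(args: Array[String]): Unit = {
    println(describe(<author>Charles Dickens</author>))
    println(describe(<book title="Great Expectations"><author>Charles Dickens</author></book>))
    println(describe(<publisher/>))
  }
}
```

Note that the patterns match on the element name and its children only, which is one reason attribute-based queries are easier to write with filter.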

Converting between objects and XML

Converting objects to XML needs no special interface: by convention, you simply write a toXML method on the class. Typically you will implement the method by using XML literals, with curly brackets to insert the variable values that you want to be output. For example:

class Customer(custId: Int, firstName: String, lastName: String) {

  // Serialise this customer with an XML literal; {...} inserts the field values.
  def toXML =
    <customer>
      <custId>{custId}</custId>
      <firstName>{firstName}</firstName>
      <lastName>{lastName}</lastName>
    </customer>

  override def toString = "Cust: " + custId + " " + firstName + " " + lastName
}

You’ll see that in this example, I’ve used the concise syntax that defines the class fields and a constructor at the same time. To convert from the serialized form back to objects, you can use the operators we looked at earlier. For example:

val customerXml = <customer><custId>123</custId><firstName>Hedley</firstName><lastName>Proctor</lastName></customer>
val customer = new Customer(
   (customerXml \ "custId").text.toInt,
   (customerXml \ "firstName").text,
   (customerXml \ "lastName").text)
println(customer)
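Putting the two directions together, one possible arrangement is to keep the parsing code in a companion object next to toXML, so that a value can be round-tripped. The sketch below uses a hypothetical case class called CustomerRecord (so that equality comes for free) and a hypothetical fromXML method. Neither name comes from the Scala library.

```scala
import scala.xml.{Elem, Node}

// CustomerRecord is a stand-in for the Customer class above, written as a case class.
final case class CustomerRecord(custId: Int, firstName: String, lastName: String) {
  // Object to XML, using an XML literal with embedded field values.
  def toXML: Elem =
    <customer>
      <custId>{custId}</custId>
      <firstName>{firstName}</firstName>
      <lastName>{lastName}</lastName>
    </customer>
}

object CustomerRecord {
  // XML to object, using the \ operator to pick out each child element.
  def fromXML(node: Node): CustomerRecord =
    CustomerRecord(
      (node \ "custId").text.toInt,
      (node \ "firstName").text,
      (node \ "lastName").text)
}

object RoundTripExample {
  val original: CustomerRecord = CustomerRecord(123, "Hedley", "Proctor")
  val restored: CustomerRecord = CustomerRecord.fromXML(original.toXML)

  def main(args: Array[String]): Unit =
    println("Round trip preserved the customer: " + (restored == original))
}
```

Bear in mind that toInt will throw an exception if the custId element is missing or not numeric, so real code would want some validation around it.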

Summary

In this article you’ve seen how to:

  • Define XML using XML literals
  • Load and save XML to strings and files
  • Query XML with XPath like operators
  • Query XML with pattern matching
  • Convert objects to and from XML

For more information about Scala’s XML support you might like to check out:

Programming in Scala, chapter 26

Working with Scala’s XML support – a great blog post from Daniel Spiewak which goes into more detail about some of the quirks and limitations of Scala’s XML library, especially for pattern matching.
