Practical Scala – processing XML

This is the second article in my series Practical Scala. The first article covered the basics of Scala syntax and then moved on to file IO and regular expressions. This article will show you how to read and write XML. Although in a sense this article builds on the previous one, I start with a very basic code example, so you should still be able to follow even if you haven’t read the last article.

Defining XML

Let’s dive right in with a simple example. As before, if you haven’t got a Scala environment installed yet, I recommend the Eclipse plugin for Scala. Here is a trivial example that creates some XML and prints out some information about it:

object XmlExample {
 
  def main(args: Array[String]): Unit = {
    val someXml = <books><book title="The Woman in White"><author>Wilkie Collins</author></book><book title="Great Expectations"><author>Charles Dickens</author></book></books>
    println("The xml object is of type: " + someXml.getClass())
  }
 
}

If you run this, you should get the output:
The xml object is of type: class scala.xml.Elem
What’s going on here? If you haven’t seen the Scala XML syntax before, at first sight it might look like we are defining a string of XML, but look more closely and you’ll see that it isn’t a string: there are no quotation marks around it. Because processing XML is such a common task, support for it has been built into the Scala language. (From Scala 2.11 onwards the supporting classes live in the separate scala-xml module, so you may need to add it as a dependency.) The first line of the main method creates some XML, and the second line prints out the type of the object to reveal that it is a scala.xml.Elem. This is the main concrete class that is used to represent XML elements. However, its parent class, the abstract Node class, and Node’s own parent, NodeSeq, are also important, since much library code operates on instances of Node or NodeSeq.

What about loading XML from a file? It’s very simple. Cut and paste the xml fragment in the code into a file – I’ve called mine books.xml and put it in the root of my Eclipse project – then update the code to the following:

import scala.xml.XML
 
object XmlExample {
 
  def main(args: Array[String]): Unit = {
    val someXml = XML.loadFile("books.xml")
    println("The xml object is of type: " + someXml.getClass())
  }
 
}

You should be able to rerun this and verify that the file is loaded correctly. The XML class also has additional methods for loading from URLs, input streams, readers and strings.

What about converting XML back to a string or file? If you don’t care about character encoding, you can convert XML to a string just by calling the toString method. However, if you want to write it to a file with a specific encoding, you can call XML.save (older versions of the library also offered XML.saveFull, which was later deprecated in favour of save).
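Here is a minimal sketch of both approaches. It assumes the scala-xml classes are on your classpath and uses XML.save, the current equivalent of saveFull. The output file name books-out.xml and the saveAndReload helper are hypothetical names chosen for illustration. The last two arguments to XML.save control whether an XML declaration is written and which doctype, if any, is included:

```scala
import scala.xml.{Elem, XML}

object XmlSaveExample {

  // Hypothetical helper: writes the node to a file as UTF-8, with an XML
  // declaration and no doctype, then loads the file back in again.
  def saveAndReload(node: Elem, fileName: String): Elem = {
    XML.save(fileName, node, "UTF-8", true, null)
    XML.loadFile(fileName)
  }

  def main(args: Array[String]): Unit = {
    val someXml = <books><book title="Great Expectations"><author>Charles Dickens</author></book></books>

    // toString gives you the markup, with no declaration or encoding information
    println(someXml.toString)

    // books-out.xml is just an example file name
    val reloaded = saveAndReload(someXml, "books-out.xml")
    println((reloaded \\ "author").text)
  }
}
```

If you open the saved file you should find an XML declaration naming the encoding on the first line, followed by the markup itself.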

Querying XML with XPath – sort of

Okay, we’ve seen how to define XML or load it. How do we query it? Well, Scala supports a small subset of the XPath query language. You can use \\ and \ to search for nodes, similar to XPath. If you’ve used XPath, you’ll know that it actually uses // and /. Why does Scala use backslashes rather than forward slashes? The answer is that in Scala, two forward slashes start a comment! Hence backslashes have to be used instead. However, they operate in a similar way to the corresponding XPath notation: a single backslash searches only the direct children of the node it is applied to, while a double backslash searches all of its descendants, at any depth. Add the following lines to the example:

    val test1 = someXml \\ "author"
    println("test1: " + test1)

When you run the code you should see:

The xml object is of type: class scala.xml.Elem
test1: <author>Wilkie Collins</author><author>Charles Dickens</author>

As in XPath, the \\ operator has searched the entire tree for <author> nodes and returned a sequence of matching nodes. That was easy, so let’s try another example: searching for a node that has an attribute with a specified value. Add the following two lines to your code:

    val test2 = someXml \\ "book[@title='The Woman in White']"
    println("test2: " +test2)

Let’s see what we get when we run this:

The xml object is of type: class scala.xml.Elem
test1: <author>Wilkie Collins</author><author>Charles Dickens</author>
test2:

Weird, it doesn’t seem to have found the node, why is this? Unfortunately, the answer is that Scala only supports a very limited subset of the XPath notation – pretty much just the \\ and \ operators. In order to search for nodes with specified attributes, we’re going to have to break out of this XPath notation and use some standard Scala. However, this is a nice little introduction to how collections can be processed in functional languages. Here is one way:

    val test2 = (someXml \\ "book").filter(node => node.attribute("title")
		.exists(title => title.text == "The Woman in White"))
    println("test2: " +test2)

This looks a bit complex – what is going on here? Well, our XPath style operator returns a list of book nodes. We then call the filter method on this list. When you call the filter method on a collection, you pass in a function that takes an item of the type that is in the collection, and performs a test on it that returns a boolean. This function is applied to each item in the collection in turn, so what is returned is a collection which only contains the elements for which the condition is true. The => is the Scala syntax for defining a function. The function parameters go on the left of the => and the function body on the right. In the above example, I’ve called the parameter “node”. You can see that we haven’t had to define the type of the parameter, the Scala compiler has inferred it, but you can include the parameter types if you think it makes the code clearer. (In fact, when you only have a single parameter, you don’t even need to name it, you can just use the underscore character _ to represent it in your function body, and omit the => operator entirely. However, I didn’t want this example to be too idiomatic.)

So what function do we pass into the filter method? We want to pick out the node which has the title attribute “The Woman in White”, so we call the attribute method with the parameter “title”. This returns an Option (a container holding zero or one values), which holds the value of the “title” attribute if the node has one and is empty otherwise. Then we use another useful collection method: exists. This is similar to the filter method, in that you give it a function that performs a boolean test on each item in the collection, but unlike filter, exists simply returns true as soon as it has found a single item that passes the test. In this case, we get the text value of the attribute and check if it equals “The Woman in White” using the standard comparison operator ==.
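To pull the querying ideas together, here is a small sketch that you can run on its own, assuming the scala-xml classes are on your classpath. It shows that \ only looks at direct children whereas \\ looks at every descendant, that a name starting with @ selects an attribute, and how the filter from above looks with the underscore shorthand. The object and value names are just illustrative:

```scala
object XmlQueryExample {

  val someXml = <books><book title="The Woman in White"><author>Wilkie Collins</author></book><book title="Great Expectations"><author>Charles Dickens</author></book></books>

  // \ only looks at the direct children of <books>, so no authors are found
  val directAuthors = someXml \ "author"

  // \\ looks at every descendant, so both authors are found
  val allAuthors = someXml \\ "author"

  // a name starting with @ selects an attribute rather than a child element
  val titles = (someXml \ "book").map(book => (book \ "@title").text)

  // the attribute filter from above, written with the underscore shorthand
  val womanInWhite = (someXml \\ "book")
    .filter(_.attribute("title").exists(_.text == "The Woman in White"))

  def main(args: Array[String]): Unit = {
    println("direct authors found: " + directAuthors.length)
    println("all authors found: " + allAuthors.length)
    println("titles: " + titles.mkString(", "))
    println("author of the filtered book: " + (womanInWhite \ "author").text)
  }
}
```

Whether you prefer the underscore form or a named parameter is largely a matter of taste. The named version is longer but can be easier to read when functions are nested, as they are here.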

Querying XML with pattern matching

You can also use Scala’s pattern matching syntax to query XML:

val authorInfo = <author>Charles Dickens</author>
authorInfo match {
   case <author>{a}</author> => println(a)
}

You should be able to run this and confirm you get “Charles Dickens” as the output. The curly brackets allow you to put Scala code inside XML literals. In this scenario, all we want to do is bind the contents of the match to a variable, so we’re not really putting any complex logic in there, just the name of the variable we want to bind the author name to, which in this case is “a”. If this case matches, we then print it out. Whilst this example works, more complex matches tend to end up with very ugly and complex Scala syntax, so you’re probably better off using normal Scala methods such as the filter and exists methods we used above.

Converting between objects and XML

Scala doesn’t have a special interface for converting objects to XML; a common convention is simply to give your class a toXML method. Typically you will implement the method by using XML literals, with curly brackets to insert the variable values that you want to be output. For example:

class Customer(custId : Int, firstName : String, lastName : String) {
 
	def toXML = {
	  <customer>
	  <custId>{custId}</custId>
	  <firstName>{firstName}</firstName>
	  <lastName>{lastName}</lastName>
	  </customer>
	}  
 
	override def toString = "Cust: " + custId + " " + firstName + " " + lastName
}

You’ll see that in this example, I’ve used the concise way of defining the class variables and a constructor at the same time. To convert from the serialized form back to objects, you can use the operators we looked at earlier. For example:

val customerXml = <customer><custId>123</custId><firstName>Hedley</firstName><lastName>Proctor</lastName></customer>
val customer = new Customer(
   (customerXml \ "custId").text.toInt,
   (customerXml \ "firstName").text,
   (customerXml \ "lastName").text)
println(customer)
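If you do this conversion in more than one place, it can be tidier to keep it next to the class. The following is a sketch of one way to do that, using a companion object as a home for the factory method. The name fromXML is my own choice for illustration rather than anything required by the library, and the example assumes the scala-xml classes are available:

```scala
import scala.xml.Node

class Customer(val custId: Int, val firstName: String, val lastName: String) {

  def toXML =
    <customer>
      <custId>{custId}</custId>
      <firstName>{firstName}</firstName>
      <lastName>{lastName}</lastName>
    </customer>

  override def toString = "Cust: " + custId + " " + firstName + " " + lastName
}

// Companion object: fromXML is a hypothetical name for the factory method
object Customer {
  def fromXML(node: Node): Customer =
    new Customer(
      (node \ "custId").text.toInt,
      (node \ "firstName").text,
      (node \ "lastName").text)
}

object CustomerRoundTrip {
  def main(args: Array[String]): Unit = {
    val original = new Customer(123, "Hedley", "Proctor")
    // convert to XML and straight back again
    val copy = Customer.fromXML(original.toXML)
    println(copy) // prints "Cust: 123 Hedley Proctor"
  }
}
```

Note that toInt will throw a NumberFormatException if the custId element is missing or doesn’t contain a number, so real code would probably want some validation around it.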

Summary

In this article you’ve seen how to:

  • Define XML using XML literals
  • Load and save XML to strings and files
  • Query XML with XPath like operators
  • Query XML with pattern matching
  • Convert objects to and from XML

For more information about Scala’s XML support you might like to check out:

Programming in Scala, chapter 26

Working with Scala’s XML support – a great blog post from Daniel Spiewak which goes into more detail about some of the quirks and limitations of Scala’s XML library, especially for pattern matching.


Practical Scala – file IO and regular expressions

Scala is a great language but learning it can seem like you’re battling with too many new concepts to be able to get anything done. The purpose of this article is to show that even with a few lines of Scala, you can start to do productive tasks. After reading this article you should be able to write small automation jobs that involve reading and writing text files, and use regular expressions. However, along the way, it will introduce a number of Scala concepts. (You could call this method of teaching, the “Karate Kid” method…)

I suggest you use the Eclipse Scala plugin for this tutorial, as it’s probably the easiest way to compile and run your first Scala code. Once you have it installed, go to New -> Scala object. Whoa….hang on a minute here….an object? Surely that should be a class right? Well, actually no. Scala makes extensive use of singleton objects. A singleton object can be defined in the same file as the corresponding class, in which case it is called a “companion object”, or it can be defined without a corresponding class, in which case it is a “standalone” singleton. Scala does not have static methods, so singleton objects are generally where you will put code that would have been in a static method in Java. In this case, we want to write a main method to start our application, so we’ll create a standalone singleton. Type in the following:

import scala.io.Source
 
object FileReader {
 
  def main(args: Array[String]): Unit = {  
    val file = Source.fromFile("/scala_fileio/file-to-read.txt")
    file.getLines().foreach( line => println(line))
  }
 
}

We can learn a lot of Scala syntax just from this example:

  • Methods are declared with the “def” keyword.
  • Unlike Java, variable names always come before their types, separated by a colon, which can be seen in the parameter to the main method – args : Array[String].
  • Type parameterization uses square brackets rather than the angle brackets seen in Java.
  • Method return values come after the parameter list, separated by a colon. In the above example, the method return type is Unit, essentially the same as Java’s void.

You can see that I’ve created a dummy file to read and saved it as /scala_fileio/file-to-read.txt. Obviously just adjust this line to point to a dummy file on your system. Then, if you run the code from Eclipse, you should see each line of the file being printed out. In my case I get:

intro
more stuff
last line

So what’s going on with the weird syntax for reading the file? The argument to foreach, line => println(line), is a function literal (an anonymous function). The foreach method calls it once for each line of the file, passing the line in as the parameter, and in this case the function just prints it. When a function like this also uses variables defined outside its own body, it is called a closure. Scala is far more functional than Java, and as you write more Scala, you’ll find that closures and functions allow you to write code that is both more concise and more flexible than the Java equivalent. The file class is called “Source” because the original Scala implementation was written alongside the Scala compiler, and when compiling, each file is a piece of source.
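To make the terminology a little more concrete, here is a small sketch that doesn’t need a file at all. The first method passes in a function that only uses its own parameter, whereas the second passes in a function that also uses a value defined outside it, which is what is usually meant by a closure. The object and method names are just illustrative:

```scala
object ClosureExample {

  // The function passed to map uses nothing but its own parameter
  def shout(lines: List[String]): List[String] =
    lines.map(line => line.toUpperCase)

  // Here the function also uses prefix, which is defined outside the
  // function body, so it is a closure
  def withPrefix(lines: List[String], prefix: String): List[String] =
    lines.map(line => prefix + line)

  def main(args: Array[String]): Unit = {
    val lines = List("intro", "more stuff", "last line")
    shout(lines).foreach(line => println(line))
    withPrefix(lines, "> ").foreach(line => println(line))
  }
}
```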

Let’s extend this example to show how to write to a file. As we iterate over this file, let’s output it to a second file, with asterisks before and after the text. Update the code to:

import scala.io.Source
import java.io.File
import java.io.FileWriter
import java.io.BufferedWriter
 
object FileReader {
 
  def main(args: Array[String]): Unit = {
    val file = Source.fromFile("/scala_fileio/file-to-read.txt")
    val outputFile = new File("/scala_fileio/output.txt")
    val writer = new BufferedWriter(new FileWriter(outputFile))
    // use curly brackets {} to tell Scala that it's now a multi-line statement!
    file.getLines().foreach{ line => 
      println(line)
      writer.write("***" + line + "***")
      writer.newLine()
    }
    writer.flush()
    writer.close()
  }
 
}

You can see here that we’re just using the Java file IO classes to write the file. This is the easiest way to write the code, although Scala does have an add on library called Scala IO which gives you some more Scala-ish file writing classes. You can also see that you need to change the foreach call to use curly brackets to tell the Scala compiler that we are now passing in a multi-line statement rather than a single line.

What if we wanted to print out the line number on each line? In Java, you’d need to maintain a separate counter to keep track of the line number. In Scala, you can use the zip method. A zip method takes two lists and iterates over each one to create a new list. Each element in the output list is a pair composed of the elements at that position from the two input lists. In this scenario, we can use a variant of the zip method, called zipWithIndex. It iterates over a single list, and for each position in the list, it gives you both the element and the index. We’ll get rid of the call to foreach and just use a normal Scala for loop, that iterates over the pairs of values produced by the call to zipWithIndex:

import scala.io.Source
import java.io.File
import java.io.FileWriter
import java.io.BufferedWriter
 
object FileReader {
 
  def main(args: Array[String]): Unit = {
    val file = Source.fromFile("/scala_fileio/file-to-read.txt")
    val outputFile = new File("/scala_fileio/output.txt")
    val writer = new BufferedWriter(new FileWriter(outputFile))
    for ( (line,index) <- file.getLines().zipWithIndex){ 
      println(line)
      writer.write("Line " + (index+1) + ": " + line)
      writer.newLine()
    }
    writer.flush()
    writer.close()
  }
 
}

Since the index values start at zero, we add one to the index value to get each line number. This is done inside brackets to avoid it being done as a string concatenation.
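If you’d like to see what zip and zipWithIndex actually produce, the following sketch runs them on a couple of small in-memory lists rather than a file. The numbered method is a hypothetical helper that mirrors the for loop above, but builds a list of strings instead of writing them out:

```scala
object ZipExample {

  val lines = List("intro", "more stuff", "last line")
  val numbers = List(1, 2, 3)

  // zip pairs up the elements at the same position in two lists
  val zipped = lines.zip(numbers)

  // zipWithIndex pairs each element with its zero-based position
  val indexed = lines.zipWithIndex

  // Same shape as the for loop above, but yielding a new list
  def numbered(input: List[String]): List[String] =
    for ((line, index) <- input.zipWithIndex)
      yield "Line " + (index + 1) + ": " + line

  def main(args: Array[String]): Unit = {
    println(zipped)  // List((intro,1), (more stuff,2), (last line,3))
    println(indexed) // List((intro,0), (more stuff,1), (last line,2))
    numbered(lines).foreach(line => println(line))
  }
}
```

Adding the yield keyword turns the for loop into an expression that returns a collection, which is handy when you want to do something else with the results rather than print them straight away.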

Okay, let’s move on to some regular expressions. Let’s update the input file to have some more interesting input, similar to what you might find in a log file, with date and time at the beginning of each line:

22-08-2012 08:30:45 intro
23-09-2012 14:21:46 more stuff
24-09-2012 18:21:47 java.lang.NullPointerException, caused by java.text.ParseException, invalid date format

Let’s suppose we want to find all lines that were printed in September. I’m using a British date format, so the month is the middle section of the date. Hence the pattern we want to look for is any two digits, followed by a hyphen, followed by 09 for September. Update the FileReader code to:

import scala.io.Source
import java.io.File
import java.io.FileWriter
import java.io.BufferedWriter
import scala.util.matching.Regex
 
object FileReader {
 
  def main(args: Array[String]): Unit = {
    val file = Source.fromFile("/scala_fileio/file-to-read.txt")
    val regex = new Regex("\\d\\d-09")
    for ( line <- file.getLines()){ 
    	regex.findFirstIn(line) match {
    	  case Some(septemberDate) => println("Found a log line from September: " 
    	      + line + " The matching part of the string was: " + septemberDate)
    	  case None => println("This line doesn't match")
    	}
    }
  }
 
}

If you run this code you should get the following output:

This line doesn't match
Found a log line from September: 23-09-2012 14:21:46 more stuff The matching part of the string was: 23-09
Found a log line from September: 24-09-2012 18:21:47 java.lang.NullPointerException, caused by java.text.ParseException, invalid date format The matching part of the string was: 24-09

What is the code doing? Well, we’re creating a regex to match against each line of the file. But we’re also using a couple of new pieces of Scala syntax:

  1. Pattern matching (case classes)
  2. The Scala Option class, and its subclasses, Some and None

The match / case syntax is an example of a very widely used piece of Scala, called pattern matching. Don’t be confused: it is a separate concept from regular expressions. You can think of it as a very advanced form of a switch statement. Whereas in Java you can only switch on numbers, characters, enums and strings (from Java 7), in Scala you can also match against objects, matching on their class and the values of their instance variables.
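As a quick illustration of matching on more than simple values, here is a sketch that matches on a literal, on the class of an object, and on the fields of a case class. The LogLine class and the describe method are made up for this example:

```scala
object MatchExample {

  // A case class gets pattern matching support for free
  case class LogLine(level: String, message: String)

  def describe(x: Any): String = x match {
    case 0                     => "the number zero"
    case s: String             => "a string of length " + s.length
    case LogLine("ERROR", msg) => "an error: " + msg
    case LogLine(level, _)     => "a log line at level " + level
    case _                     => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(describe(0))
    println(describe("hello"))
    println(describe(LogLine("ERROR", "invalid date format")))
    println(describe(LogLine("INFO", "intro")))
    println(describe(3.14))
  }
}
```

The cases are tried in order from top to bottom, which is why the more specific LogLine("ERROR", msg) case comes before the general one.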

In this example, the regex search can either find a match or not find one. It uses another common piece of Scala to report this: returning either Some or None. This is a mechanism within Scala to avoid NullPointerExceptions. In Java, if a method call could return null and you forget to put a null check in your code, you could get a NullPointerException. In Scala, methods that might have no result to return typically return an object of type Option instead. The Option class has two subtypes, called Some and None. If Some is returned, it is a container that holds the actual return object. If None is returned, you don’t have a return object.

This mechanism guards against null pointers because the return type makes the possibility of a missing value explicit, and the usual way to get at the returned object is to perform a pattern match. The object will only be extracted from the Some container once the return has been checked and found to be a Some object. You can see from the above code that the syntax for matching against the Some object is to say Some(variableName). If the match succeeds, the returned object is bound to that variable name, and you can use it on the right hand side of the match statement. In the above example, we bind it to a variable called septemberDate and print it out. As is standard with regular expressions, it only contains part of the line: the specific part that matched the regex.
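Regular expressions aren’t the only place you’ll meet Option. Looking up a key in a Map is another everyday example, as the following sketch shows. The map contents and the describe method are invented for illustration:

```scala
object OptionExample {

  val capitals = Map("France" -> "Paris", "Japan" -> "Tokyo")

  // Map.get returns an Option: Some(value) if the key is present, None if not
  def describe(country: String): String = capitals.get(country) match {
    case Some(city) => "The capital of " + country + " is " + city
    case None       => "No capital known for " + country
  }

  def main(args: Array[String]): Unit = {
    println(describe("France"))
    println(describe("Atlantis"))
    // getOrElse is a shortcut for when you just want a default value
    println(capitals.get("Atlantis").getOrElse("unknown"))
  }
}
```

For simple cases like the last line, getOrElse is often more concise than a full pattern match, but the match is clearer when the two branches do quite different things.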

Let’s try an example which has multiple matches on a single line. We’ll extract the names of the exceptions on the third line. A basic pattern is to look for word characters, then a dot, then word characters, then a dot, then more word characters ending with “Exception”. (Obviously this wouldn’t work for all exception names, but it is sufficient for this example.) Update the code to:

import scala.io.Source
import java.io.File
import java.io.FileWriter
import java.io.BufferedWriter
import scala.util.matching.Regex
 
object FileReader {
 
  def main(args: Array[String]): Unit = {
    val file = Source.fromFile("/scala_fileio/file-to-read.txt")
    val regex = new Regex("\\w+\\.\\w+\\.\\w*Exception")
    for ( line <- file.getLines()){ 
    	for (m <- regex.findAllIn(line)) { 
    		println("Found a log line with an exception: " 
    	      + line + " The matching part of the string was: " + m)
    	}
    }
  }
 
}

If you run this you should get the output:

Found a log line with an exception: 24-09-2012 18:21:47 java.lang.NullPointerException, caused by java.text.ParseException, invalid date format The matching part of the string was: java.lang.NullPointerException
Found a log line with an exception: 24-09-2012 18:21:47 java.lang.NullPointerException, caused by java.text.ParseException, invalid date format The matching part of the string was: java.text.ParseException

Now, this code works, but we can simplify it. The Scala for construct is far more powerful than Java’s. You can iterate over multiple variables within a single for loop, so you can update the for loop to:

    for ( line <- file.getLines(); m <- regex.findAllIn(line)) {

You should be able to rerun this and find you get the same result.
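The same two-generator for loop also works with yield, if you’d rather collect the matches than print them. Here is a sketch along those lines that works on an in-memory list of lines, so you can try it without a file. The exceptionsIn method is a hypothetical helper:

```scala
import scala.util.matching.Regex

object NestedForExample {

  val regex = new Regex("\\w+\\.\\w+\\.\\w*Exception")

  // For every line, and for every match within that line, yield the match
  def exceptionsIn(lines: List[String]): List[String] =
    for (line <- lines; m <- regex.findAllIn(line)) yield m

  def main(args: Array[String]): Unit = {
    val lines = List(
      "22-08-2012 08:30:45 intro",
      "24-09-2012 18:21:47 java.lang.NullPointerException, caused by java.text.ParseException, invalid date format")
    exceptionsIn(lines).foreach(name => println(name))
  }
}
```

Lines with no matches simply contribute nothing to the result, so there is no need for a separate check.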

Summary

In this article you’ve learnt:

  • How to write a Scala object with a main method.
  • How Scala uses singletons, and the difference between standalone singletons and companion objects.
  • How to read a file using the Scala Source class.
  • How to iterate over a file using getLines() and the foreach method.
  • How to iterate over a file with line numbers by using the zipWithIndex method.
  • How to use existing Java classes from Scala to write to a file.
  • How to write a basic for loop in Scala
  • How to use the Regex class to create a regular expression.
  • The basics of how Scala pattern matching works.
  • How Scala avoids NullPointerExceptions with the Option, Some and None classes.

If you’d like to see some more examples of pattern matching and how the Option class works, see:
Why Java developers should be learning Scala

If you’d like to experiment with the Scala IO library, see:
http://jesseeichar.github.com/scala-io-doc/0.4.1-seq/index.html#!/overview

If you want a more detailed explanation of the concepts touched upon in this article, the first edition of “Programming in Scala” is available free online:
http://www.artima.com/pins1ed/


Lift controllers example

I’ve recently put together a basic Lift example, based on an e-Commerce theme. It contains:

  • Product listing
  • Basket
  • Checkout
  • Order confirmation

It shows the following techniques:

  • How to write forms and process the response
  • How to submit forms using Lift’s form.ajax helper – used to submit the first two parts of the checkout
  • Using a session variable – the basket
  • Logging with slf4s

You can get the code from Github:

https://github.com/hedleyproctor/amur-lift-ecommerce-example

If you don’t have a Git client installed, simply click on the “Zip” button to get the code as a zip file. Then change into the base directory and do the following:

  1. Type sbt to start the simple build tool. This will download the jar files needed by sbt itself.
  2. Once the sbt shell has started, type update to download the jar files needed by the application.
  3. Compile the code with compile.
  4. Start jetty with jetty-run.

The app will then be available on http://localhost:8080. You should be able to go to the product listing page, put a product in your basket, then go to the checkout, enter your delivery address, choose a shipping option, enter your billing details and be taken to the order confirmation page.


Note: If you are very new to Lift and want an even simpler example, with a more detailed tutorial on how it works, you might like to look at Building your first Lift app with sbt.


Effective Selenium testing

Selenium is the de facto standard for testing web applications. In this article I’m going to cover a number of techniques for improving your Selenium tests. The article is suitable for you if:

  1. You already know the basics of Selenium and are happy to write tests in Java (or one of the other Selenium API languages).
  2. You want to progress from having a few simple tests to building a large test suite for a complex web application with javascript and ajax.

The techniques I’ll cover are:

  1. Organising your tests – using the page controller design pattern
  2. Understanding the Selenium 2 webdriver and implicit waits
  3. Detecting when web elements are available
  4. Knowing when ajax calls have finished
  5. Taking screenshots for failing tests

The code snippets are in Java, but the Selenium API is available for a wide range of other languages (Ruby, Python, C#, PHP and Perl).

Organising your tests – using the page controller design pattern

If you write a large number of tests, when the application under test changes, you could face a maintenance nightmare. How do you avoid this? The answer is the page controller design pattern (often known as the page object pattern). This means that you have a Java class for each page of the application, which knows how to control that web page, i.e. how to identify the elements on the page, how to select them and so on. Your test classes don’t know anything about the web page. They don’t include any HTML ids or other selectors; they simply invoke methods on your page controller. If that web page changes, you only need to update your code in a single place. Code that used to look like this:

driver.findElement(By.id("checkoutButtonCartPage")).click();
driver.findElement(By.id("homeDeliveryButton")).click();
driver.findElement(By.id("postCodeEntry")).sendKeys("AB12 3CD");
driver.findElement(By.id("postCodeSubmitButton")).click();

becomes:

cartPageController.goToCheckout();
checkoutPageController.selectHomeDelivery();
checkoutPageController.postCodeLookup("AB12 3CD");

Understanding the Selenium 2 webdriver and implicit waits

One of the tricky aspects of Selenium testing is knowing when pages have loaded and when elements have appeared on pages. In Selenium 2 the webdriver includes some helpful functionality in this area, but it is important to understand what it does and doesn’t do. When you call the findElement method on the webdriver with an implicit wait configured, it doesn’t fail immediately if it cannot find the element on the page. Rather, it keeps polling the page (roughly every 500ms) until it reaches its timeout period. The default timeout is zero, so you need to set it yourself by calling webdriver.manage().timeouts().implicitlyWait.

However, misunderstanding the webdriver polling functionality can cause confusion. The web driver findElement method returns a web element as soon as it finds it in the browser DOM, but that doesn’t necessarily mean that you can interact with that web element. All elements have a display property. If the current CSS rules are not displaying that element, you can’t interact with it. The web driver won’t time out. Rather, it will find the element and return it to your code, which will promptly fail with an exception that explains you can’t interact with the element.

A common example of this is a web page where pressing a button triggers a form to appear. Usually this form will already be in the web page; it is just hidden by CSS. Hence, if your Selenium test presses the button to make the form appear and then tries to interact with the form too quickly, the CSS won’t have been switched over to display the form by the time the web driver locates it, and your test will fail. Thankfully, Selenium has additional functionality that can help us, which I’ll explain in the next section.

Detecting when web elements are available

We’ve just seen that it isn’t enough for a web element to be present in the browser DOM for us to interact with it, it needs to be visible as well. How do we detect this with Selenium? Well, the web driver allows you to poll for a specific condition to become true, by using the WebDriverWait class. It includes a number of standard conditions, of which visibility is one. This makes it easy to code up a helper method that will only return an element when it is visible:

public WebElement getWhenVisible(By locator, int timeout) {
	WebDriverWait wait = new WebDriverWait(driver, timeout);
	return wait.until(ExpectedConditions.visibilityOfElementLocated(locator));
}

Is this enough? Well, not necessarily. Interactive web elements can also be enabled or disabled, e.g. if you want to show a checkbox but not allow the user to change it, you mark it as disabled. If you want to be certain that you can click an element, Selenium has another standard condition for this which you can make use of, e.g.

public void clickWhenReady(By locator, int timeout) {
	WebDriverWait wait = new WebDriverWait(driver, timeout);
	WebElement element = wait.until(ExpectedConditions.elementToBeClickable(locator));
	element.click();
}

For the full range of expected conditions, refer to the Selenium javadoc:
ExpectedConditions

Knowing when Ajax calls have finished

If your test triggers an ajax call, you don’t want to carry on until that call has finished, but how do you know? You might be okay just to use one of the wait conditions above, but this isn’t a very clean approach, and it only works for ajax calls that result in changes to the browser DOM, i.e. it won’t work for calls that simply send data to the server without any changes in the browser HTML. Wouldn’t it be nice to be more certain when the call was finished? If you are using jQuery to make your ajax calls, you can do so by exploiting the fact that most web driver implementations can run javascript. jQuery keeps a count of how many ajax calls are active in its jQuery.active variable. Here’s an example of a helper method that waits for all active ajax calls to finish:

public void waitForAjax(int timeoutInSeconds) {
	System.out.println("Checking active ajax calls by calling jquery.active");
	try {
		if (driver instanceof JavascriptExecutor) {
			JavascriptExecutor jsDriver = (JavascriptExecutor) driver;

			for (int i = 0; i < timeoutInSeconds; i++) {
				Object numberOfAjaxConnections = jsDriver.executeScript("return jQuery.active");
				// return should be a number
				if (numberOfAjaxConnections instanceof Long) {
					Long n = (Long) numberOfAjaxConnections;
					System.out.println("Number of active jquery ajax calls: " + n);
					if (n.longValue() == 0L) {
						break;
					}
				}
				Thread.sleep(1000);
			}
		} else {
			System.out.println("Web driver: " + driver + " cannot execute javascript");
		}
	} catch (InterruptedException e) {
		System.out.println(e);
	}
}

Of course, this example could be rewritten to use the WebDriverWait format if you wish.
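The loop above is really just “check a condition, sleep, check again, give up after a timeout”. As a minimal sketch of that idea on its own, independent of Selenium and written in Scala, you could factor the polling out into a small helper. The Poll object, the waitUntil method and its parameter names are all hypothetical, and this is not how WebDriverWait itself is implemented:

```scala
object Poll {

  // Evaluates the condition repeatedly until it is true or the timeout has
  // passed. Returns true if the condition was met, false if we gave up.
  def waitUntil(timeoutMillis: Long, pollIntervalMillis: Long = 500)(condition: => Boolean): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMillis
    var satisfied = condition
    while (!satisfied && System.currentTimeMillis() < deadline) {
      Thread.sleep(pollIntervalMillis)
      satisfied = condition
    }
    satisfied
  }

  def main(args: Array[String]): Unit = {
    val start = System.currentTimeMillis()
    // Stand-in condition: becomes true once 300ms have passed
    val finished = waitUntil(timeoutMillis = 2000, pollIntervalMillis = 100) {
      System.currentTimeMillis() - start >= 300
    }
    println("Condition met before the timeout: " + finished)
  }
}
```

Returning a boolean rather than silently carrying on lets the calling test decide whether a timeout should be treated as a failure.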

Taking screenshots for failing tests

One of the golden rules of good testing is that you should be able to diagnose why a test failure has occurred without rerunning the test. For Selenium, as well as having good assertions and debug output within the tests, it is very useful to take a screenshot when a failure occurs. If you are using JUnit for your tests, a neat way of doing this is to use a JUnit rule to take the screenshot. A good write up of how to do this is here:

http://blogs.steeplesoft.com/2012/01/grabbing-screenshots-of-failed-selenium-tests/

Summary

In this article we’ve seen:

  • How to organise your test suite by using the page controller design pattern
  • How the Selenium web driver works and how to write tests for web applications with javascript and ajax
  • How to take screenshots for failing tests

For more information about the web driver and testing design patterns, see the Selenium docs:
http://seleniumhq.org/docs/index.html
If you’re interested in using CruiseControl to automate your tests:
Automating Selenium testing with TestNG, Ant and CruiseControl
If you’d like to learn about using XPath for complex element location:
Writing XPath selectors for Selenium tests


Why Java developers should be learning Scala

Over the past fifteen years Java has been a phenomenally popular programming language, but it is starting to show its age and programmers are increasingly looking at more modern languages. The purpose of this article is to explain why Scala is the most likely successor to Java and how it can make you more productive. Rather than simply listing the features that Scala has, I’ve included a number of comparisons between Java and Scala code, to demonstrate how the different Scala language features enable you to implement the same functionality more quickly in Scala than Java.

Since Scala compiles to Java bytecode and runs on the JVM, programs written in Scala can benefit from the huge amount of library code already written in Java. However, by using Scala you get the following benefits:

  • Mandatory boilerplate code is gone – no getters and setters, no checked exceptions.
  • More powerful constructs that allow you to do more with less code, such as case classes, Option and tuples.
  • More powerful code reuse – the elements of code that you can reuse are smaller. Rather than classes with single inheritance, you have traits and functions.

I’ll work through these points in turn, giving examples of each.

No getters and setters

In Java, a class to represent a person with a name and age might be:

public class Person{
	private String firstName = null;
	private String lastName = null;
	private int age = 0;
 
	public void setFirstName(String firstName) {
		this.firstName = firstName;
	}
 
	public String getFirstName() {
		return firstName;
	}
 
	public void setLastName(String lastName) {
		this.lastName = lastName;
	}
 
	public String getLastName() {
		return lastName;
	}
 
	public void setAge(int age) {
		this.age = age;
	}
 
	public int getAge() {
		return age;
	}
}

In Scala, most likely you would write this class as:

class Person {
	var firstName = ""
	var lastName = ""
	var age = 0
}

The variables in this class are public. If you come from a Java background this sounds worrying – doesn’t it mean that if we ever need getters and setters with additional code in them we’ll have to change all of the code that uses this class? Not in Scala. This is because Scala has a very flexible method syntax, so we can write a method that looks the same as accessing the variables directly. An example would be:

class Person {
	var firstName = ""
	var lastName = ""
	private var theAge = 0
 
	// getter: callers write p.age
	def age = theAge
 
	// setter: callers write p.age = 33
	def age_= (newAge : Int) : Unit = {
		if (newAge > 0) theAge = newAge
	}
}

In this example, I’ve renamed the variable to theAge and made it private, but written a getter and setter. The getter method is called age so you can write p.age to get the age, just like before. The setter is called age_=. The _= suffix is a naming convention that the compiler recognises: when a class defines a getter called age and a method called age_=, an assignment to p.age is rewritten as a call to p.age_=. This means that when you write:

val p = new Person()
p.age = 33

you are actually invoking the new setter method.
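
As a minimal sketch of this behaviour, the following wraps the Person class from above in an object (the name PropertyExample is just for illustration) and shows that an assignment goes through the setter, including its validation:

```scala
object PropertyExample {
  class Person {
    var firstName = ""
    var lastName = ""
    private var theAge = 0

    // Getter: lets callers write p.age
    def age: Int = theAge

    // Setter: the compiler rewrites `p.age = x` to `p.age_=(x)`
    def age_=(newAge: Int): Unit = {
      if (newAge > 0) theAge = newAge
    }
  }

  def main(args: Array[String]): Unit = {
    val p = new Person()
    p.age = 33      // calls age_=(33)
    p.age = -5      // rejected by the setter, so the age is unchanged
    println(p.age)  // prints 33
  }
}
```

Code that used the original public variable does not need to change when the class switches to this version.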

No checked exceptions

When Java was invented, it seemed like a good idea to force developers to deal with certain possible error conditions, which led to the concept of checked exceptions. Scala has removed these. If you want to catch an exception, you can do so, but you’re not forced to insert try/catch statements throughout your code if you don’t want to.
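
To illustrate, here is a small sketch using java.io.FileReader, whose constructor declares the checked FileNotFoundException in Java. The object and method names are my own, chosen for the example:

```scala
import java.io.{FileNotFoundException, FileReader}

object ExceptionExample {
  // In Java this method would need `throws FileNotFoundException` or a try/catch.
  // Scala requires neither.
  def open(path: String): FileReader = new FileReader(path)

  // Catching is still possible when you want it, using pattern matching on the exception type.
  def canOpen(path: String): Boolean =
    try {
      open(path).close()
      true
    } catch {
      case _: FileNotFoundException => false
    }
}
```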

Case classes

Pattern matching in Scala is like an enhanced version of the Java switch statement, and case classes are classes that the compiler automatically equips to take part in it. They are usually small classes that hold the data you want to match on. Unlike switch, a pattern match can understand different object types and extract data from them. Consider a scenario in which you are iterating over a tree structure that represents an organisation chart for a company. The nodes in the tree are either of type Group or Employee. If you find a Group node, you want to print out the name of the group and the size. If you find an employee, you want to print out their name and job title. In Java, your code would look something like:

if (node instanceof Group) {
	Group g = (Group)node;
	System.out.println("Group name: " + g.getName() + " Size: " + g.getSize());
}
else if (node instanceof Employee) {
	Employee e = (Employee)node;
	System.out.println("Name: " + e.getName() + " Job: " + e.getJob());
}

In Scala this would be:

node match {
	case g: Group => println("Group name: " + g.name + " Size: " + g.size)
	case e: Employee => println("Name: " + e.name + " Job: " + e.job)
}

In this example I’ve used a “typed pattern” match, which avoids the type casts required in Java. If this was the only thing pattern matching could do, it wouldn’t be that impressive, but it can do much more. There are several different sorts of pattern matching, the most powerful of which is probably a “constructor pattern”. By matching against the constructor for a class, you can nest additional pattern matches against the values that have been passed into that constructor. These patterns can themselves be constructor matches, allowing you to match as deeply as you want. Continuing the example above, suppose that in addition to a manager, some groups also have a project manager. You want to find all groups that have a manager who is in salary band 10 and a project manager who is in salary band 9. In Java you’ll need something like:

if (node instanceof Group) {
	Group g = (Group)node;
	Manager m = g.getManager();
	ProjectManager pm = g.getProjectManager();
	if (m.getJobBand() == 10 && pm != null && pm.getJobBand() == 9) {
		System.out.println("Group: " + g.getName());
	}
}

In Scala, with the appropriate case classes, this would be:

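// assumes case classes Group(name, manager, projectManager),
// Manager(jobBand) and ProjectManager(jobBand)
node match {
  case Group(name, Manager(10), ProjectManager(9)) 
    => println("Group: " + name)
  case _ => // not a group we are interested in
}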
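
For completeness, here is a self-contained sketch of the case classes such a match relies on. The field names and the Node trait are assumptions made for the example, not a fixed design:

```scala
object OrgChartExample {
  sealed trait Node
  case class Manager(jobBand: Int)
  case class ProjectManager(jobBand: Int)
  case class Employee(name: String, job: String) extends Node
  case class Group(name: String, manager: Manager, projectManager: ProjectManager) extends Node

  def describe(node: Node): String = node match {
    // Constructor pattern: checks the nested job bands and binds the group name
    case Group(name, Manager(10), ProjectManager(9)) => "Senior group: " + name
    // Typed pattern: any other group
    case g: Group => "Group name: " + g.name
    // Constructor pattern that simply extracts both fields
    case Employee(name, job) => "Name: " + name + " Job: " + job
  }
}
```

Because Node is sealed, the compiler can warn if a match forgets one of the node types.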


Option

In Java, it can be painful having to perform a != null check each time you get a variable that might be null. For example:

List<Item> items = shoppingBasket.getItems();
if (items != null) {
	for (Item i : items) {
		// process each item
	}
}
else {
	System.out.println("No items in shopping basket.");
}

The Scala standard library provides a class called Option, which has two subclasses, Some and None. Some is a container that wraps whatever class you are using. The basic pattern is that methods that could return null in Java return an Option, which will be either Some or None. Then calling code can use pattern matching on the returned value:

val i = shoppingBasket.items
i match {
	case Some(items) => items.foreach(item => println(item)) // process each item
	case None => println("No items in shopping basket")
}

With the Java code, you can forget to insert the != null check, which can then lead to a NullPointerException at runtime, but with the Scala code, this isn’t possible.
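
A runnable sketch of the same idea follows. ShoppingBasket and Item are hypothetical classes invented for the example:

```scala
object OptionExample {
  case class Item(name: String)

  // Hypothetical basket: items is None when nothing has been added
  class ShoppingBasket(val items: Option[List[Item]])

  def summary(basket: ShoppingBasket): String = basket.items match {
    case Some(items) => items.map(_.name).mkString(", ")
    case None => "No items in shopping basket"
  }
}
```

The return type Option[List[Item]] tells the caller that the value may be absent, and the list itself is only reachable by unwrapping the Option.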

Tuples

How many times have you written a method in Java, only to find that you really want to return two things from the method, not one? In Java the standard way to fix this is to create a small class that just contains the return values, but then you are bloating your code by having a class when all you really need to do is specify that the method returns multiple things. Scala has exactly this concept with tuples. A tuple is common in functional languages and is simply a fixed-size group of values that can have different types. It is written using parentheses, so a tuple composed of the integer 5 and string “hello” would be written:

(5,"hello")

If you want to return multiple values from a method, you simply pass them back as a tuple like this.
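
As a small sketch (the method name minAndMax is made up for the example), a method that returns two values might look like this:

```scala
object TupleExample {
  // Returns the smallest and largest value as a pair.
  // Assumes a non-empty list: min and max throw on an empty one.
  def minAndMax(values: List[Int]): (Int, Int) = (values.min, values.max)

  def main(args: Array[String]): Unit = {
    // The caller can unpack the tuple straight into two names
    val (smallest, largest) = minAndMax(List(3, 1, 4))
    println(smallest + " to " + largest)
  }
}
```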

Traits

In an effort to avoid the problems of multiple inheritance as it was defined in C++, Java eschewed multiple inheritance entirely. This can make reusing code from two places very difficult. In Scala you can only inherit from a single class but you can also mix in as many “traits” as you want. A “trait” can be thought of as similar to an abstract class in Java.

class MyQueue extends BasicIntQueue with Incrementing with Filtering

In C++, the above sort of statement could result in ambiguity as to which method to invoke. If Incrementing and Filtering both inherit from the same base class A, you have the “diamond problem” whereby there are two instances of class A. C++ addresses this by giving you the virtual keyword which ensures that there is only a single instance of A. In Scala, traits can extend other traits or classes, but Scala always has a defined order in which methods must be invoked, by using a linearization algorithm (similar to other languages such as Python). This means you get the code reuse benefits of inheriting from multiple places without the problems caused by non-virtual inheritance.
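
The class declaration above can be made concrete with the following sketch, which follows the stackable-trait queue example from “Programming in Scala”. The bodies of the classes and traits are my assumption about what BasicIntQueue, Incrementing and Filtering do:

```scala
import scala.collection.mutable.ArrayBuffer

object TraitExample {
  abstract class IntQueue {
    def get(): Int
    def put(x: Int): Unit
  }

  class BasicIntQueue extends IntQueue {
    private val buf = new ArrayBuffer[Int]
    def get(): Int = buf.remove(0)
    def put(x: Int): Unit = { buf += x }
  }

  // Adds one to every value before passing it on
  trait Incrementing extends IntQueue {
    abstract override def put(x: Int): Unit = super.put(x + 1)
  }

  // Drops negative values
  trait Filtering extends IntQueue {
    abstract override def put(x: Int): Unit = if (x >= 0) super.put(x)
  }

  // Linearization order for put: Filtering, then Incrementing, then BasicIntQueue
  class MyQueue extends BasicIntQueue with Incrementing with Filtering
}
```

The trait furthest to the right is called first, so a value is filtered before it is incremented. Swapping the order of the two traits would increment first and filter second.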

Functions as closures

In Java, if you want to allow callers to pass code into your class to be invoked, you have to create an interface or concrete class before writing a callback method. Suppose that you have a class which contains a collection of Person objects. You want to write a method that will iterate over all of the Person objects and run some code that has been passed in, which will produce a summary of each Person as a String. In Java you would first have to declare an interface:

public interface PersonSummariser {
	public String summarise(Person p);
}

Then you can write your callback method, specifying that code to be passed in must implement this interface:

public void summarisePeople(PersonSummariser summariser)

You’ve been forced to write an interface, and the person using your class has been forced to create a class (at best they might be able to create an anonymous class so they don’t need a full class definition), just to pass in code that could be as short as a single line. In Scala, this would be handled by passing in a function, often loosely called a closure. Strictly speaking, a closure is a function that captures (“closes over”) variables from the scope in which it was defined, so that no variable it uses is left unbound when it runs. The Scala method definition would be:

def summarisePeople(s : Person => String)

Here we have written a method that accepts a function s, which takes a single parameter of type Person, and returns a value of type String. No need to create any additional interfaces or classes.
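
Here is a sketch of how such a method might be implemented and called. The Team class and the choice to return the summaries as a list are assumptions for the example:

```scala
object SummaryExample {
  case class Person(firstName: String, lastName: String)

  class Team(people: List[Person]) {
    // s can be any function from Person to String
    def summarisePeople(s: Person => String): List[String] = people.map(s)
  }

  def main(args: Array[String]): Unit = {
    val team = new Team(List(Person("Ada", "Lovelace"), Person("Alan", "Turing")))
    val title = "Dr "
    // The function literal closes over the local variable `title`
    team.summarisePeople(p => title + p.lastName).foreach(println)
  }
}
```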

Closures are used extensively to perform operations on collections in functional languages. Here are just a few of the methods Scala provides in its collection classes which allow you to pass in a function to perform various operations:

  • map – transform a collection of type A to another collection of type B by applying a function to each element
  • filter – reduce the collection by filtering out all elements that don’t meet a boolean condition
  • foldLeft – combine the elements of the collection in turn with a running result, starting from an initial value e.g. sum the squares of every integer in a list
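
A short sketch of the three methods applied to a list of integers (the value names are only illustrative):

```scala
object CollectionExample {
  val numbers = List(1, 2, 3, 4)

  // map: List[Int] becomes List[String]
  val labels: List[String] = numbers.map(n => "#" + n)

  // filter: keep only the even numbers
  val evens: List[Int] = numbers.filter(_ % 2 == 0)

  // foldLeft: start at 0 and add the square of each element to the running total
  val sumOfSquares: Int = numbers.foldLeft(0)((acc, n) => acc + n * n)
}
```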

Standalone functions

In the example above, if you declare the Scala summarise function inline, you are creating a closure. But functions are first class entities in Scala, so you can define them independently and reuse them wherever you want. Suppose you had multiple classes holding Person objects, such as OrganisationChart, Company, Team and so on. If you wanted to define a function to print out the Person objects that you could pass into any method with the same signature, you could do so, anywhere in your code:

def summarise(p : Person) : String = p.firstName + " " + p.lastName

In fact, you don’t need to declare that the return type on the above method is String, as the Scala compiler will infer it, but I added it for clarity. No longer is a class the smallest element of reuse you have, you can define individual functions and pass them around as you wish.

Currying

Suppose you’re writing code that calculates economic statistics for countries. You have a function that takes a population size and a GDP value. What if you wanted to invoke this multiple times with a fixed population size but differing GDP values? You might expect to have to repeat the first argument whenever you use the function:

calculateStats(pop1, 2000)
calculateStats(pop1, 10000)

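In fact, you can create a new version of the function in which all but one of the parameters have already been supplied. This is often loosely called “currying” the function; strictly speaking it is partial application, and currying proper means turning a function of several parameters into a chain of functions that each take one. In Scala it looks like this: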

val cs = calculateStats(pop1, _ : Double)

Here we have supplied a value for the first parameter, but used the underscore to show that we’re not supplying a value for the second parameter. You can then invoke this new “cs” method to calculate statistics specifically for countries of a specified size. At first this might not seem that powerful – surely we’re just saving ourselves a bit of typing? However, consider that in Scala, a method parameter doesn’t have to be a simple object, it can itself be a function. This makes it very easy to write code that is both powerful and flexible. You can write functions that perform specific tasks and combine them however you want.
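
The sketch below puts the two forms side by side. The body of calculateStats (GDP per head) is a placeholder of my own, since the real calculation isn’t specified:

```scala
object CurryExample {
  // Placeholder statistic: GDP per head
  def calculateStats(population: Double, gdp: Double): Double = gdp / population

  val pop1 = 1000.0

  // Partial application: fix the population and leave the GDP open
  val cs: Double => Double = calculateStats(pop1, _: Double)

  // Currying: a chain of functions that each take a single parameter
  val curried: Double => Double => Double =
    population => gdp => calculateStats(population, gdp)

  // Supplying the first argument of the curried form gives the same kind of function as cs
  val csCurried: Double => Double = curried(pop1)
}
```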

Summary

If you haven’t used Scala before, hopefully this article has persuaded you that it’s worth investigating. We’ve seen that:

  • It doesn’t require all of the boilerplate code that is needed in Java, such as getters, setters and checked exceptions.
  • It has powerful constructs that allow you to do more with less code, such as case classes and tuples.
  • It gives you better code reuse with traits and functions.

It’s worthwhile explaining why I haven’t mentioned a couple of things that Scala is known for – actors and parser combinators. Actors are a powerful mechanism for multi-threaded programming that avoid some of the problems of locks. Parser combination is a way of writing language parsers by combining lots of small parsers, rather than writing (or more likely generating) a single parser from a BNF grammar. Whilst both of these topics are interesting, I’m not sure either of them is necessarily indicative of the power of the Scala language. Each of them can be implemented in Java using an appropriate library – Kilim for actors and jparsec for parser combinators. By contrast, the topics I’ve covered above show things that have to be implemented within a language itself and cannot be provided by library code.

Hang on – what about languages like Ruby, Groovy or Clojure?

All of these languages are good, powerful languages that can make you more productive. However, you can’t necessarily learn all four. Why should you choose Scala over the others?

The feature set of Ruby, Groovy and Scala is broadly the same. They have all done away with getters and setters, and checked exceptions. They all have more functional concepts than Java, such as closures and first class functions. They all offer multiple inheritance via traits (mixins). However, both Ruby and Groovy are dynamically typed scripting languages, so whilst they are good for small tasks such as automation, they don’t lend themselves to constructing large enterprise applications as well as Scala and Clojure. Scala has a very powerful type system and compiler so many bugs can be found at compile time. Scala is a hybrid of object oriented and functional concepts, so its syntax is broadly object oriented, whereas Clojure is a Lisp variant and hence uses parenthesised prefix notation (S-expressions), which is a very different syntax. Finally, Scala does have some concepts which don’t really appear in the other languages, of which the most obvious example is case classes, which offer a very powerful syntax for matching objects by type and extracting data from them.

Okay, you got me. How do I find out more about Scala?

If you want a comprehensive overview of the entire language, the first edition of the book “Programming in Scala” is available free online:
Programming in Scala
In particular, some of the topics I’ve mentioned above are:
Traits
Case classes
The Eclipse Scala IDE is available from:
http://scala-ide.org/
Daniel Spiewak’s blog has numerous good posts on Scala, such as:
Functional currying in Scala
The Option pattern


Generating Javascript with Scala and Lift

One of the ideas that has become popular in recent years is the concept of writing complex browser GUIs entirely in your server side language, and generating the required javascript. In Java both the Google Web Toolkit and ZK allow you to do this. You get a number of advantages from this approach:

  • Your server side language is at a higher level of abstraction than coding in javascript and hence more productive.
  • You can get some type checking done when your code is compiled.
  • You can reuse functions you’ve already written as part of your core application, and the framework will translate them to javascript as required, rather than having to rewrite them in javascript yourself.

In this article I’ll introduce the functionality that the Scala Lift framework offers for generating javascript from Scala. I’m not assuming any prior knowledge of the Lift framework so you should be able to work through the examples even if you’ve never used Lift before. However, if you’d prefer to learn the basics of Lift and sbt first, you might want to read my previous tutorial: Building your first Lift app with sbt.

Download Lift

If you’re new to Lift, download it from: http://liftweb.net/download
As the instructions say, if you change into the scala_28/lift_basic directory and run
sbt update ~jetty-run
then an empty Lift application should start.

Add a form and validation Javascript

To begin with, we’ll add a form to the main page of the app and attach some javascript to it. Open up src/main/webapp/index.html and add a simple form into the body of the page:

<div class="lift:RegistrationController?form=post">
  First name: <input id="first_name"><br>
  Surname: <input id="last_name"><br>
  E-mail: <input id="email"><br>
  <input type="submit" value="Submit">
</div>

Now we need to create the snippet that will process this form. The class on the form indicates that it will be processed by a snippet called “RegistrationController” so go into src/main/scala/code/snippet and create RegistrationController.scala:

package code.snippet
import scala.xml.NodeSeq
import net.liftweb.util._
import Helpers._
import net.liftweb.http._
import net.liftweb.http.js.JsCmds._
import net.liftweb.http.js.JE._
 
class RegistrationController  {
 
	private val whence = S.referer openOr "/"
 
	def render = {
	  "type=submit" #> SHtml.submit("Register", process, 
	    "onclick" -> JsIf(JsEq(ValById("first_name"), ""), Alert("alert") & JsReturn(false)).toJsCmd)
	}
 
	private def process() = {
	  S.redirectTo(whence)
	}
 
}

Let’s break down what this snippet is doing. As with all snippets, it implements the render method to transform the html in the page template. It is using a Lift CSS selector transform (the #>) which finds the html element with the “type” attribute set to “submit”, and replaces this with a submit button generated by calling the SHtml.submit method. The first parameter to the submit method is the name of the button, the second is the function to be called when the form is sent to the server. The third parameter is a list of html attributes that the button element should have. In this case we are just creating a single attribute, the onclick one, which will invoke our javascript validation – if the first_name parameter is empty, show a pop up alert box. We call the toJsCmd method on the javascript expression to turn it into a command, because that can be implicitly converted into the html attribute that the submit method requires. You should be able to verify that this alert is shown if you try to submit the form without a first name.

Let’s extend the example slightly to show how we can edit the browser DOM. Let’s add a second button to the html template next to the first:

First name: <input id="first_name"><br>
Surname: <input id="last_name"><br>
E-mail: <input id="email"><br>
<input type="submit" value="Submit">
<input type="button" value = "Populate form"/>

In our Scala code, we’ll add some javascript to this button to populate the first name field with a default value when you click the button:

def render = {
  "type=submit" #> SHtml.submit("Register", process, 
     "onclick" -> JsIf(JsEq(ValById("first_name"), ""), Alert("alert") & JsReturn(false)).toJsCmd) &
  // SetValById is already in scope via the JsCmds._ import
  "type=button" #> SHtml.ajaxButton("Populate form", () => SetValById("first_name", "John"))
}

What is this Scala doing? Well, we begin by using an & to chain our first CSS selector to a second one. The second CSS selector finds the html element with the “type” set to “button” and replaces it with a button created by calling the SHtml.ajaxButton method. This method takes two parameters: the first is the text to put on the button, and the second is a function to invoke when the button is pressed. We use the standard Scala syntax of => to declare a function that takes no parameters and, when invoked, finds the first_name element in the DOM and sets the value to “John”.

Wrap up

In this article we’ve looked at a couple of simple examples of how to generate javascript on the server, using Scala code. The basic mechanism is to create the javascript using Lift’s JsCmds and JE classes, then attach the javascript to your html using standard CSS selectors and the helper methods in the SHtml object. For more detailed examples, you can check out chapters 10 and 11 in the “Exploring Lift” book:
http://exploring.liftweb.net/master/index-10.html


Using Java to download a file that needs authentication

You can easily download files using Java by making a URLConnection. However, if you need to log in before accessing the file, how can you do this automatically? If you automate the login, how will the code that makes the URLConnection be able to make use of the logged in session? Well, if the logged in session uses a cookie, you can simply extract the cookie and resend it in your code that makes the URLConnection. Here is an example where I’ve used the Selenium web testing tool to do the login. The code is actually Scala, but you can easily convert it to Java:

  import java.io.File
  import java.net.URL
  import java.nio.file.{Files, StandardCopyOption}
  import org.openqa.selenium.By
  import org.openqa.selenium.firefox.FirefoxDriver
  import scala.collection.JavaConverters._ // on Scala 2.13+ use scala.jdk.CollectionConverters._

  def getFileViaSelenium(): Unit = {
    println("Logging in via Selenium")
    val driver = new FirefoxDriver()
    driver.get("https://www.someurl.com/login")
    driver.findElement(By.id("username")).clear()
    driver.findElement(By.id("username")).sendKeys("John Smith")
    driver.findElement(By.id("password")).clear()
    driver.findElement(By.id("password")).sendKeys("password")
    driver.findElement(By.name("commit")).click()

    // now get the cookies set by the login and build a Cookie header from them
    val seleniumCookies = driver.manage().getCookies().asScala
    val cookieString = new StringBuilder()
    for (cookie <- seleniumCookies) {
      println("Cookie value: " + cookie.getValue())
      cookieString.append(cookie.getName())
      cookieString.append("=")
      cookieString.append(cookie.getValue())
      cookieString.append("; ")
    }

    println("Getting file")

    val url = new URL("https://www.someurl.com/somefile.csv")
    val con = url.openConnection()
    // resend the cookies with this request
    con.setRequestProperty("Cookie", cookieString.toString())
    val contentType = con.getContentType()
    println("Content type: " + contentType)
    val in = con.getInputStream()
    // save file, overwriting any existing copy
    val fileOut = new File("C:\\my_download.csv")
    Files.copy(in, fileOut.toPath, StandardCopyOption.REPLACE_EXISTING)
    in.close()
    println("File saved")
  }
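
The cookie handling is the key step, so here it is on its own as a small sketch that needs neither Selenium nor a network connection. The helper name cookieHeader is my own; it takes plain name/value pairs rather than Selenium Cookie objects:

```scala
object CookieHeaderExample {
  // Joins name/value pairs into the value of a Cookie request header,
  // e.g. "session=abc; lang=en"
  def cookieHeader(cookies: Seq[(String, String)]): String =
    cookies.map { case (name, value) => name + "=" + value }.mkString("; ")
}
```

Using mkString avoids the trailing “; ” that the loop-based version leaves on the end of the header.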

Organising Eclipse static imports

By default Eclipse “Organise imports” doesn’t deal with static imports, so if you are using JUnit and want to write something like assertEquals(), it won’t be imported. However, you can add static imports to your Java preferences to get them available via the quick assist (CTRL+1). Details on stack overflow:

http://stackoverflow.com/questions/288861/eclipse-optimize-imports-to-include-static-imports


Getting the last build date from CruiseControl

Imagine you have a part of your build process that is a time consuming operation that only needs to take place when a particular file has been updated. How would you get the time of the last successful build? Well, CruiseControl stores a log file for each build, so one way to do it is to list the contents of the log directory and locate the latest file. Here is the Ant syntax:

   <target name="last-build">
     <exec executable="sh" outputproperty="last_build_timestamp">
       <arg value="-c" />
       <arg value="find ../../logs/projectname -name 'log*build.*.xml' | sort | tail -n 1 | sed 's/.*\([0-9]\{14\}\).*/\1/'" />
     </exec>
     <echo message="Last CruiseControl build was: ${last_build_timestamp}"/>
   </target>

Obviously this uses a shell command so it will only run on unix / linux. You need to be careful when passing a sequence of pipe joined commands to the shell from Ant. You can’t use the name of the first command as the executable, as the remainder of the command will be fed in as a parameter, which isn’t correct. Instead, simply specify the shell as the executable and use -c to specify the full command line that you want to execute. The breakdown of the above command is:

  • Find all of the log files which start with log and have the word ‘build’ in. Unsuccessful builds will have a log file, but it won’t have the word ‘build’ in. In the example above, I’m assuming that your ant script is in [CRUISE_CONTROL_ROOT]/projects/projectname and the logs are in [CRUISE_CONTROL_ROOT]/logs/projectname, so the find command has to go up two directories before going into the log directory.
  • Sort the list of successful build files alphabetically.
  • Get the last file name – the latest build.
  • Use the sed stream editor to extract the latest build timestamp. A sed substitution only replaces the part of the line that the regex matches and passes everything else through unchanged. Hence to remove parts of a string you need to explicitly match them, then ensure that the replacement part of the sed command does not print them out. In the above example, the sed command does the following:
    • .* – match the first part of the file name, but don’t capture the characters
    • \([0-9]\{14\}\) – match the 14 digit timestamp and store it in capturing group 1
    • .* – match the remainder of the file name, but don’t capture the characters
    • Then the replace part of the sed command simply prints out the captured timestamp using \1
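
If you would rather do the same extraction in code, the sort, tail and sed steps can be sketched in Scala as follows. The file names in the usage note are made-up examples of the pattern the find command looks for:

```scala
object TimestampExample {
  // Same idea as the sed expression: skip leading text, capture 14 digits, skip the rest
  private val Timestamp = """.*([0-9]{14}).*""".r

  // Sorts the file names, takes the last one and extracts its timestamp, if it has one
  def lastBuildTimestamp(logFiles: Seq[String]): Option[String] =
    logFiles.sorted.lastOption.collect { case Timestamp(ts) => ts }
}
```

For example, given logs/log20120101000000Lbuild.1.xml and logs/log20120301093000Lbuild.3.xml, the result would be Some("20120301093000"), and an empty list gives None.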

Comparing two MySQL databases

When doing development, you’ll often have multiple versions of a database on different environments and sometimes it would be useful to be able to check the differences between the data. Did you know you can use the Toad database tool to compare two databases? Simply go to Tools -> Compare -> Data:

The next screen will ask you to map the tables from the first database to those in the second. If you are comparing databases with the same schema, you can simply select “Map all”. Then Toad will analyse the differences and give you a report which tells you how many rows are the same in each table, how many have been removed and how many added:
