Using Optional in Java 8

“The introduction of null references was my billion dollar mistake” – Tony Hoare

Optional is a (typed) container object. It may contain a single object, or it may be empty. It makes the possible absence of a value explicit, which helps you avoid NullPointerExceptions. In this article I’m going to work through a number of examples of how to use Optional. I’ll include code snippets, but all the source code is available on my github at:

https://github.com/hedleyproctor/java8-examples

Let’s get started. My examples use the domain of the insurance industry, since that’s the industry I work in. Suppose we have a service that allows you to find an insurance claim based on its id. Prior to Java 8, the method signature of this would be as follows:

public Claim find(Long id)

What’s wrong with this? Well, you don’t know if it could ever return null. Will it? Or will it return a default value? If you want to use any fields of the returned object, you are forced to insert null checks, like this:

Claim claim = claimService.find(id);
if (claim != null) {
  productType = claim.getProductType();
}

If you forget the null check, you may get the dreaded NullPointerException. The purpose of Optional is to allow your method signature to tell the caller that the method may not return an object, and make it easier to avoid null pointers. With an Optional, the method call looks like this:

Optional<Claim> optionalClaim = claimService.findById(15L);

The “functional” way to interact with an Optional is not to directly unbox it, but rather to invoke one of the functional methods. e.g.

optionalClaim.ifPresent(claim -> System.out.println("Found claim. Id: " + claim.getId()));

Now, the clever thing is that if we want to use any fields of the returned object, we no longer need to write an explicit null check. Instead, the Optional class has a method called “map”. You call map on an Optional<T>, passing it a lambda or method reference that takes a parameter of type T and returns something of type U. It then does the following:

  • If the Optional is empty, just returns an empty Optional.
  • If the Optional has an object inside, invokes the function you have passed it on that object, and wraps the return result in an Optional.

This means that if we want to extract the productType of the claim, as before, we can now write the following:

Optional<Claim.PRODUCT_TYPE> optionalProductType =
                 claimService.findById(15L)
                 .map(Claim::getProductType);

Much better! Let’s look at some more variations. Firstly, if you want to provide a default value, you can chain another call to orElse on the end:

Claim.PRODUCT_TYPE myProductType =
                claimService.findById(15L)
                .map(Claim::getProductType)
                .orElse(Claim.PRODUCT_TYPE.MOTOR);

You can even call a supplier function to return the default value if needed:

Claim.PRODUCT_TYPE myProductType2 =
                claimService.findById(15L)
                .map(Claim::getProductType)
                .orElseGet(claimService::getDefaultProductType);

Now, suppose you want to call map with a function, but that function already wraps its response in an Optional. Imagine we want to pass our Optional<Claim> to the following:

public Optional<AuditLog> findAuditLog(Claim claim)

What’s the problem here? Well, remember what the contract of map is. If you give it an Optional with something inside, it passes that to the method you’ve given it, AND THEN WRAPS THE RETURNED OBJECT IN AN OPTIONAL. Yikes! The findAuditLog method returns an Optional (that may or may not have an AuditLog object) but then map would wrap this in a second Optional! We don’t want this, so what is the solution? The answer is that Optional has another method called flatMap. flatMap does not wrap the returned value in an Optional, so we can now write the following:

Optional<AuditLog> auditLogOptional = 
                 claimService.findById(15L)
                .flatMap(claimService::findAuditLog);

Optional also has a filter method. Again, it is null safe, so you can safely invoke it on an Optional that might be empty, like this:

Optional<Claim> optionalMotorClaim = 
                claimService.findById(15L)
                .filter(claim -> Claim.PRODUCT_TYPE.MOTOR.equals(claim.getProductType()));

If you really do need to get the value out of an Optional, you can do so, as follows:

if (optionalClaim.isPresent()) {
    Claim myClaim = optionalClaim.get();
    // do stuff with claim
}

Note that you should ALWAYS call isPresent() prior to calling get(), as get() will throw a NoSuchElementException if you invoke it on an empty Optional. Most of the time, calling ifPresent and passing a lambda will be sufficient for processing your Optional, but extracting the value will be necessary if you need to do something that isn’t convenient inside a lambda, such as throwing a checked exception.
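
If the absent case really is an error, you can also use orElseThrow, which takes a supplier for the exception. A minimal sketch (ClaimNotFoundException is a hypothetical exception class, not part of the example project):

Claim claim = claimService.findById(15L)
                .orElseThrow(() -> new ClaimNotFoundException("No claim found with id 15"));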

Finally, a side note about one limitation of Optional and Stream in Java 8. At the moment it is a bit convoluted to map a Stream<Optional<Claim>> in order to extract the values. You have to do the following:

Stream<Claim> claimsLoadedById = claimIdSet.stream()
                                                .map(claimService::findById)
                                                .filter(Optional::isPresent)
                                                .map(Optional::get);

In Java 9, this has been simplified to:

Stream<Claim> claimsLoadedById = claimIdSet.stream()
                                      .map(claimService::findById)
                                      .flatMap(Optional::stream);

Summary

In this article I’ve introduced Optional and given a number of examples of how to use it. To make effective use of Optional you should:

  • Use it as the return type for methods that can validly not return an object
  • Chain calls to map, flatMap and filter on a returned Optional to avoid nested null pointer checks

This article is part of a series on Java 8. You might be interested in the other articles:

Java 8 Streams Tutorial
Yet another Java 8 custom collector example

I also recommend the book Java 8 In Action.


Maven offline build fails to resolve artifacts in your local repository

Recently I’ve been trying to set up a new machine with a maven build that can work offline. My first instinct was to do the following:

  1. Configure maven with a ~/.m2/settings.xml file with our set of Nexus repos (we use six or seven locally hosted Nexus repos)
  2. Run an online build to cache all the artifacts in the local maven repo
  3. Delete the ~/.m2/settings.xml file with the repo definitions in
  4. Run an offline build with -o and confirm it works

Much to my surprise, this process failed with a bunch of errors like the following:

[ERROR] Plugin org.apache.maven.plugins:maven-resources-plugin:2.7 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-resources-plugin:jar:2.7: The repository system is offline but the artifact org.apache.maven.plugins:maven-resources-plugin:pom:2.7 is not available in the local repository.

I couldn’t really see what was going on here. The missing artifacts were all definitely in the local repo. I ended up downloading the Maven source and debugging into it. The problem is that when Maven downloads a file from a remote repo, it stores a file called _maven.repositories alongside the artifact in the local cache, which records where the artifact was obtained from. The file format is like this:

#NOTE: This is an internal implementation file, its format can be changed without prior notice.
#Tue Jun 23 14:39:00 BST 2015
maven-jaxb2-plugin-project-0.12.3.pom>Nexus=

When trying to resolve an artifact, if the artifact is found locally, maven then attempts to determine if it is a locally installed artifact, or something cached from a remote download. The problem I was seeing is that if it finds a _maven.repositories file with the name of a repo that is not in your settings.xml, it throws an exception! To me, either Maven should permit this artifact to be used, or if the maven developers really don’t want that to happen, the wording of the exception should make clear what is actually going on. e.g. “I found file XYZ.jar in the local repo, but the _maven.repositories file tells me it was downloaded from a repo called MyRepo which isn’t configured for the current build, therefore I’m not using it”.

For now, if you want your offline build to work, you have two options:

  1. Download your proprietary jars from your Nexus repo like I did, but don’t delete your settings.xml
  2. Install your proprietary jars manually, so there is no _maven.repositories file to confuse maven

Java 8 Streams Tutorial

In this tutorial, I’m going to start by explaining some of the basics of streams. Viz:

  • What streams are
  • Terminal and non-terminal operations
  • Their “lazy” nature
  • Their read-once nature
  • Why they were introduced i.e. how they enable easy parallel operations

Then I’m going to work through examples of four key stream operations:

  • Filter
  • Map
  • FlatMap
  • Collect

I’m going to include plenty of code snippets, but note that you can get all the source over on my github:

https://github.com/hedleyproctor/java8-examples

Introduction to streams

To obtain a stream, you call the new stream() method that has been added to the Collection interface.

Stream operations can be divided into two types:

Intermediate operations, that return a stream:

  • filter
  • skip
  • limit
  • map
  • flatMap
  • distinct
  • sorted

Terminal operations, that return some kind of result:

  • anyMatch – boolean
  • noneMatch – boolean
  • allMatch – boolean
  • findAny – Optional
  • findFirst – Optional
  • forEach – void, e.g. print
  • collect
  • reduce

The idea behind streams is that you can build up a pipeline of operations by calling multiple intermediate operations, and then finally a terminal operation to obtain a result.

It’s important to note that streams differ from collections in two important ways:

Firstly, unlike a collection, which is essentially a set of data in memory, stream elements are only produced one at a time, as you iterate over the stream. This is referred to as the “lazy” nature of streams. Imagine you have a large dataset of a million elements in memory and you create a stream backed by this dataset. If every time you called an intermediate operation, the entire dataset was iterated, this would be hugely inefficient. Rather, you can think of the intermediate operations as recording that an operation needs to be performed, but deferring the actual execution of that operation until you call a terminal method. At this point, the stream is iterated, and each intermediate operation is evaluated.
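
To see this laziness in action, you can add a peek call that prints each element as it passes through the pipeline; nothing is printed until a terminal operation runs. A minimal sketch (not from the example project):

Stream<String> pipeline = Stream.of("motor", "household", "travel")
        .peek(s -> System.out.println("processing " + s))   // just records the operation, prints nothing yet
        .filter(s -> s.length() > 5);

// only now, when the terminal operation runs, are elements pulled through the pipeline and printed
long count = pipeline.count();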

Secondly, you can only read from a stream once. This differs from e.g. Scala, in which you can read a stream as many times as you like. There is a great stackoverflow answer from one of the stream API designers that explains why this design choice was taken, but it is a bit of a monster, so I’ll summarise it:

  • You can use streams for things other than collections, that genuinely are read once. e.g. read a file with BufferedReader, which has a lines() method returning a Stream<String>.
  • Allowing two types of stream, one lazy and the other not, creates its own problems. e.g.
    • In Scala you can have bugs if your code attempts to read a stream twice when in fact it has been passed a once-off stream implementation.
    • Collection classes optimise some operations by storing / caching data. e.g. calling size() on a collection returns a cached size value. Calling size() on a filtered collection would take O(n) time, as it would have to apply the filter to the collection.
    • If you pass round a lazy stream and use it multiple times, each time you operate on it, the entire set of operations need to be evaluated.

There is a link to the answer at the bottom of this article if you want to read it.
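
The read-once behaviour is also easy to demonstrate: once a terminal operation has been called on a stream, using it again throws an IllegalStateException. A minimal sketch (not from the example project):

Stream<String> names = Arrays.asList("motor", "household", "travel").stream();
names.forEach(System.out::println);   // first terminal operation, fine
long count = names.count();           // second use throws IllegalStateException:
                                      // "stream has already been operated upon or closed"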

Why were streams introduced?

To me, the advantages of streams can be summed up as three points:

  1. Using functional style methods is clearer. i.e. if you use a filter method, someone can see at a glance what you are doing
  2. Because streams are lazy, you can pass streams around between methods, classes or code modules, and apply operations to the stream, without anything being evaluated until you need it to be.
  3. Stream processing can be done in parallel, by calling the parallelStream method. i.e. because you aren’t manually iterating over a collection and performing arbitrary actions, but instead calling well defined methods on the stream such as map and filter, the stream classes know how to split themselves up into separate streams to be processed on different processors, and then recombine the results (see the sketch below).
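
As a minimal sketch (assuming a collection of Claim objects like the ones used in the examples later in this article), summing the payments across all claims in parallel is just a matter of asking for a parallel stream:

double totalPaymentsInParallel = claims.parallelStream()     // the work is split across cores
        .mapToDouble(Claim::getTotalPayments)
        .sum();                                               // partial results are recombined for you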

The third reason is really the driver. With the advent of “big data”, the ability to perform data processing operations on massive data sets is hugely important, and to do this efficiently, you will want your code to be able to make use of multiple cores / processors. Streams provide a way of doing that which means you don’t have to write complex code to split your input up, send it to multiple places, wait for the results and then recombine them. The stream implementation handles this for you. However, this article is meant as an introduction to streams, so I don’t want to go into too much detail as to how this works. Rather, let’s start looking at some actual stream operations.

Stream operations

To give my code examples, I’m going to use examples from two domains:

  1. Insurance – this is the domain I work in. Here we have insurance claims, which could be of different types (e.g. motor, household), have jobs attached to the them (e.g. motor repair, solicitor, loss adjuster) and payments made.
  2. Restaurant menu – this is what Java 8 In Action uses for its examples.

Filter

I think filter is a great operation to start with: it’s a very common thing to do and a nice intro to stream syntax. In my code examples, if you open the StreamExamples class and find the filter method, you can see the syntax for filtering a collection of claims to motor claims only:

Stream<Claim> motorClaims = claims.stream().filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR));

The filter method takes a lambda expression, which accepts an object of the type used in your stream, in this case a Claim object, and returns a boolean. Any element for which this check is true is included in the filtered stream and elements for which the check returns false are excluded. In this case, we simply check if the type of the claim is MOTOR. As this is an intermediate operation, the return type is Stream. As explained above, at this point, the filter hasn’t actually been evaluated. It will only be evaluated when a terminal operation is added. Before we do that, let’s look at a couple more simple examples of filter.

We could filter on payments over 1000:

Stream<Claim> paymentsOver1000 = claims.stream().filter(claim -> claim.getTotalPayments() > 1000);

Or claims with 2 or more jobs:

Stream<Claim> twoOrMore = claims.stream().filter(claim -> claim.getJobs().size() >= 2);

Map

The map operation means “map” in the mathematical sense – that of mapping one value to another. Suppose we have a stream of Claim objects, but what we actually need is the claim ids. Just map the stream like this:

Stream<Long> claimIds = claims.stream().map(claim -> claim.getId());

As you can see, the map operation takes a lambda expression that accepts an object of the type used in your stream, in this case a Claim, and converts it to another type. In fact, in this example, you don’t even need to write the full lambda expression, you can use a method reference:

Stream<Long> claimIds2 = claims.stream().map(Claim::getId);

Now that we have seen two different intermediate operations, let’s look at how to build a pipeline by applying the operations one after another. If we want to get the ids of all motor claims, we can write the following:

Stream<Long> motorClaimIds = claims.stream()
                .filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR))
                .map(Claim::getId);

I’d recommend writing your pipelines with each operation on a separate line like this. Not only does it make the code more readable, but if there is a fatal exception during your stream processing, the line number will take you straight to the failing operation.

Note that you don’t just have to “extract” values during a map operation, you can also create new objects. For example, you might convert from domain objects to DTOs, like this:

Stream<ClaimDTO> claimDTOs = claims.stream().map(claim -> new ClaimDTO(claim.getId(), claim.getTotalPayments()));

FlatMap

Suppose you want to get a stream or collection of all of the jobs attached to a list of claims. You might start with a map operation, like this:

claims.stream().map(Claim::getJobs)

However, there is a problem here. Calling getJobs() on a claim returns a Set of Job objects. So we now have a stream composed of Sets, whereas we want a stream of Job objects. This is where flatMap comes in. You pass it a function that turns each element into a stream (here, Set::stream turns each Set of jobs into a stream of jobs), and flatMap “collapses” all of those streams into a single stream of the individual objects. Hence, to get a stream of all the jobs, we write:

Stream<Job> jobs = claims.stream().map(Claim::getJobs).flatMap(Set::stream);

Again, we can pipeline a number of operations here, for example by filtering the stream before mapping the values. Taking an example from the food / menu domain, here’s how to get side orders available for dishes with over 750 calories:

Stream<SideOrder> sideOrdersOver750 = menu.stream().filter(dish -> dish.getCalories() > 750).map(Dish::getSideOrders).flatMap(Set::stream);

Collect

The three operations we have covered so far are all intermediate operations. They operate on a stream and return a stream. When you want to convert your stream back into a collection, you will want to call the collect method. There are a large number of variations as to how you collect, and this choice can be a bit bewildering at first, so I want to show a good number of examples here to help you get familiar with what is available to you.

Firstly, let’s start with the simplest possible collect operations, to a set, list or map. Here is what you could do if you want a stream of motor claims collected to one of these types:

Set<Claim> motorClaimSet = claims.stream().
                                    filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR)).
                                    collect(Collectors.toSet());
 
List<Claim> motorClaimList = claims.stream().
                                    filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR)).
                                    collect(Collectors.toList());
// to a map (grouping by unique key)
Map<Long,Claim> motorClaimMap =  claims.stream().
                                        filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR)).
                                        collect(Collectors.toMap(Claim::getId, Function.<Claim>identity()));

In the map example, the key, claim id, is unique. What happens if your key is not unique? Collectors.toMap will throw an IllegalStateException when it meets a duplicate key (unless you give it a merge function; see the sketch after the next example), so for a non-unique key you use groupingBy instead. Your map values are then not individual objects, but lists of the objects that share that key. For example:

Map<Claim.PRODUCT_TYPE,List<Claim>> claimsByType = claims.stream().collect(groupingBy(Claim::getProductType));
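
As an aside, if you want single values rather than lists, the three argument overload of Collectors.toMap lets you supply that merge function to decide which of two colliding values to keep. A minimal sketch, keeping whichever claim has the larger total payments:

Map<Claim.PRODUCT_TYPE,Claim> biggestClaimByType = claims.stream()
        .collect(Collectors.toMap(Claim::getProductType,
                                  Function.<Claim>identity(),
                                  // merge function: called when two claims share a product type
                                  (c1, c2) -> c1.getTotalPayments() >= c2.getTotalPayments() ? c1 : c2));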

In the groupingBy example above, the grouping is single level, but grouping can be multi-level. Not only that, but the grouping keys don’t have to be attributes of the objects: you can dynamically create the key values as part of the grouping. Consider grouping by product type, and then by whether the claim’s total payments are over £1000:

Map<Claim.PRODUCT_TYPE,Map<String,List<Claim>>> claimsByTypeAndPayment = 
 claims.stream()
 .collect(
   groupingBy(Claim::getProductType,
      groupingBy(claim -> {
         if (claim.getTotalPayments() > 1000) {
              return "HIGH";
         }
         else {
              return "LOW";
         }
        })
      ));

Note that the result of your grouping doesn’t have to be the objects in your stream. You may want to extract a value from them. In the menu domain, suppose I want to group side orders by type, and get a list of the calories of the side orders in each type. In this case you will want to operate on a stream of SideOrder objects, but use the two argument groupingBy method with a mapping collector, so that the calorie value is extracted rather than the SideOrder objects themselves being collected:

Map<SideOrder.Type,List<Integer>> sideOrderCalories = 
    menu.stream()
    .map(Dish::getSideOrders)
    .flatMap(Set::stream)
    .collect(groupingBy(SideOrder::getType, mapping(SideOrder::getCalories, toList())));

Sometimes you only want to split the elements into two groups. Because this is a common operation, there is a special convenience collector called partitioningBy:

Map<Boolean,List<Dish>> veggieAndNonVeggie = menu.stream().collect(partitioningBy(Dish::isVegetarian));

Sometimes you want to sum or average numerical values from your stream:

int totalCalories = menu.stream().collect(summingInt(Dish::getCalories));
double totalPayments = claims.stream().collect(summingDouble(Claim::getTotalPayments));
double averagePayment = claims.stream().collect(averagingDouble(Claim::getTotalPayments));

The above syntax is fine if you are only obtaining one value. However, if you want both a sum and an average, say, you shouldn’t evaluate each one separately, as this will iterate over the stream multiple times. Instead, you should use a summarizing collector, which computes the count, sum, min, max and average in a single pass:

DoubleSummaryStatistics paymentStats = 
  claims.stream().collect(summarizingDouble(Claim::getTotalPayments));
totalPayments = paymentStats.getSum();
averagePayment = paymentStats.getAverage();

My final example is something that has been missing from Java for a while. How often have you needed to concatenate a collection of strings, only to have to resort to using Apache Commons to do it! No more. Now you can use the joining() collector:

String claimIdListAsCommaSeparatedString = claims.stream().map(claim -> claim.getId().toString()).collect(joining(","));

Note that if you don’t specify a separator, the default is that none will be used.
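
joining also has a three argument overload that takes a prefix and suffix as well as the delimiter, which is handy when you want bracketed output. A small sketch:

String claimIdListInBrackets = claims.stream()
        .map(claim -> claim.getId().toString())
        .collect(joining(", ", "[", "]"));    // e.g. [1, 2, 3]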

Summary

I hope this has been a useful introduction to streams and how to use them. We’ve covered what streams are, their lazy nature, their once-off nature and why they enable easier parallel processing. Then we looked at the most common stream operations: filter, map, flatMap and collect.

For collecting, if you want to know how to write your own custom collector, see my example:

Yet another Java 8 custom collector example

If you are interested in the background to the stream API design choices, see:
Why are Java streams once off?
Why doesn’t java.util.Collection implement the new stream interface?

Finally, for more details on both Java 8 in general, and functional programming, I’d strongly recommend Java 8 In Action.


Yet another Java 8 custom collector example

Java 8 introduces a number of functional programming techniques to the language. Collections can be turned into streams which allows you to perform standard functional operations on them, such as filtering, mapping, reducing and collecting. In this post I’m going to give a quick example of writing a custom collector. Firstly, what is collecting? I would probably define it as “taking a collection / stream and forming it into a particular collection / data structure”. Java 8 has numerous helper methods and classes for standard collection operations, and you should use them when they apply. e.g. you can use a groupingBy collector to process a stream and group the elements by a property, producing a map, keyed off of that property, in which each value is a list of elements with that property. However, for more complex collecting, you will need to write your own collector. There are five methods in a collector:

  • supplier – returns a function that takes no arguments and returns an empty instance of the collection class you want to put your collected elements into. e.g. if you are ultimately collecting your elements into a set, the supplier function will return an empty set.
  • accumulator – returns a function that takes two arguments: the first is the collection that you are building up, the second is the element being processed. The accumulator function processes each element into the target collection.
  • finisher – returns a function that allows you to perform a final transformation on your collection, if required. In many cases, you won’t need to transform the collection any further, so you will just use an identity function here.
  • combiner – only required for parallel processing of your stream. If you envisage running this operation across multiple processors / cores, then your combiner contains the logic to combine results from each parallel operation.
  • characteristics – allows you to specify the characteristics of the collector so that it can be invoked safely and optimally. e.g. specifying Characteristics.IDENTITY_FINISH lets Java know that because you aren’t performing a final transformation, it doesn’t even need to invoke your finisher function.

Okay, let’s do a trivial example. I work in insurance, so I’ll create an example in this domain. Suppose I have a stream of insurance claims. The claims may be of different types, such as motor, household etc. I want to produce a map containing one example claim for each of a set of specified claim types. This needs a supplier function that gives me an empty map to start with, and an accumulator function that gets each claim’s type and, if it is one of the required types and the map doesn’t already contain an example of that type, adds the claim to the map. The finisher can be the identity function. This is what it looks like:

import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
 
public class ClaimProductTypeCollector<T extends Claim> implements Collector<T,Map,Map> {
 
    private Set<Claim.PRODUCT_TYPE> requiredTypes = new HashSet<>();
 
    public Set<Claim.PRODUCT_TYPE> getRequiredTypes() {
        return requiredTypes;
    }
 
    @Override
    public Supplier<Map> supplier() {
        return () -> new HashMap<>();
    }
 
    @Override
    public BiConsumer<Map,T> accumulator() {
        return (map,claim) -> {
            // only keep the first example claim seen for each of the required product types
            if (requiredTypes.contains(claim.getProductType()) && map.get(claim.getProductType()) == null) {
                map.put(claim.getProductType(),claim);
            }
        };
    }
 
 
    @Override
    public Function<Map, Map> finisher() {
        return Function.identity();
    }
 
    @Override
    public BinaryOperator<Map> combiner() {
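        // returning null is safe only for sequential streams; the combiner is never
        // invoked unless the stream is processed in parallel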
        return null;
    }
 
    @Override
    public Set<Characteristics> characteristics() {
        return Collections.singleton(Characteristics.IDENTITY_FINISH);
    }
}

If you want to type this in and get it working as an example, here is what the claim class looks like:

public class Claim {
 
    public enum PRODUCT_TYPE { MOTOR, HOUSEHOLD, TRAVEL}
 
    private PRODUCT_TYPE productType;
 
    public Claim(PRODUCT_TYPE productType) {
        this.productType = productType;
    }
 
    public PRODUCT_TYPE getProductType() {
        return productType;
    }
 
    public void setProductType(PRODUCT_TYPE productType) {
        this.productType = productType;
    }
 
}

Then you can test it with:

Set<Claim> claims = new HashSet<>();
claims.add(new Claim(Claim.PRODUCT_TYPE.MOTOR));
claims.add(new Claim(Claim.PRODUCT_TYPE.MOTOR));
claims.add(new Claim(Claim.PRODUCT_TYPE.MOTOR));
 
claims.add(new Claim(Claim.PRODUCT_TYPE.HOUSEHOLD));
claims.add(new Claim(Claim.PRODUCT_TYPE.HOUSEHOLD));
 
ClaimProductTypeCollector<Claim> claimProductTypeCollector = new ClaimProductTypeCollector<>();
claimProductTypeCollector.getRequiredTypes().add(Claim.PRODUCT_TYPE.MOTOR);
claimProductTypeCollector.getRequiredTypes().add(Claim.PRODUCT_TYPE.HOUSEHOLD);
Map oneClaimPerProductType = claims.stream().collect(claimProductTypeCollector);

For more info on Java 8, I strongly recommend the book “Java 8 In Action”. You can get the eBook directly from the publishers Manning:

https://www.manning.com/books/java-8-in-action


Finding slow queries in SQL Server

When you want to optimise your app, one of the most important things is to find your slowest database queries, to see if indexes are missing, or the queries should be rewritten. SQL Server stores statistics for its queries in the dynamic management view dm_exec_query_stats. In the book “TSQL Querying” by Itzik Ben-Gan, Itzik suggests the following as a good query to extract and format the information. This query sums the total elapsed time for queries of the same type:

select top 10
     max(query) as sample_query,
     sum(execution_count) as cnt,
     sum(total_worker_time) as cpu,
     sum(total_physical_reads) as reads,
     sum(total_logical_reads) as logical_reads,
     sum(total_elapsed_time) as duration
from (select qs.*,
             substring(st.text, (qs.statement_start_offset/2) + 1,
                 ((case qs.statement_end_offset when -1
                   then datalength(st.text)
                   else qs.statement_end_offset end
                   - qs.statement_start_offset)/2) + 1)
             as query
      from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
cross apply sys.dm_exec_plan_attributes(qs.plan_handle) as pa
where pa.attribute = 'dbid'
and pa.value= db_id('your-database')
) as d
group by query_hash
order by duration desc

If you look on the web, you will sometimes see variations on this query that attempt to filter by calling the db_name() function on the dbid returned by dm_exec_sql_text. However, this only works for stored procedures: for an ad hoc SQL statement the dbid is null, so the database name comes back null too. This makes sense, because running the same statement against different databases, even if they have the same tables, could result in completely different query plans. Hence it is the query plan, not the statement text, that is linked to the database. As the query above shows, you can safely filter by checking the dbid stored in the plan attributes.


Writing a DSL with Scala Parser Combinators

Scala parser combinators provide a very easy way to construct your own domain specific languages (DSLs). Let’s look at a simple example. Suppose that you work on a large application which produces various report files on a scheduled basis. You decide that you want to give the users of your application the ability to control what directories different file types are moved to, and to send e-mail alerts when specific files are generated. You could do this by allowing a power user to type the instructions in using little language such as the following:

if ($FILENAME contains "finance") {
	MOVE "finance"
	EMAIL "finance-team@somecompany.com"
}
else if ($FILENAME endsWith "xlsx") {
	MOVE "spreadsheets"
}
else if ($FILENAME startsWith "daily") {
	MOVE "daily-reports"
	EMAIL "report-team@somecompany.com"
}
else {
	MOVE "additional"
}

In this scenario, I’m assuming that this routine is invoked once for each file that is output, and the name of the file is put into the variable $FILENAME.

So how do Scala parser combinators work and how do they compare to the traditional way of constructing a DSL? Before looking at the Scala approach, it is worth clarifying that whenever you talk about a DSL, you need to be clear whether you are talking about an internal or external DSL. An internal DSL is one that is simply written in a normal programming language, but the objects and methods provided have been created to give the appearance of a domain specific language. This tends not to be done much in Java, since Java does not have a very flexible language syntax. It works better in languages such as Ruby or Scala where you have more freedom about how you define methods, such as being able to use spaces and symbols in the names, and being able to change the order of objects and the methods you are invoking on them. An external DSL is one where the code is written in its own format, and this code is parsed to understand its meaning. Traditionally this is done by writing the rules of the language in a formal grammar, usually Backus-Naur Form (BNF). The BNF grammar is then read in by a parser generator such as ANTLR or JavaCC, which generates a parser capable of understanding the language. This approach works well, but does have problems. One problem is debugging. If something goes wrong, a developer may have to debug through auto-generated code. A second annoyance is that whenever you update your grammar, you have to run the parser generation step, rather than just compiling some code. This is where parser combinators come in, as an alternative approach.

So, how do you use Scala parser combinators to parse a small language like the one above? In short, rather than writing the BNF grammar into a grammar file, and then generating a parser, you write a Scala class in which each BNF rule becomes a method. The Scala parser combinator library provides parsers, matchers and methods that correspond to all of the elements in a grammar. Let’s look at how this would work for the example above. I’ll start off with a few code snippets, and then give the full class, so you can run it yourself. If you haven’t run any Scala code before, using Eclipse with the Scala plugin is probably the easiest way to get started. Please note that although Scala parser combinators used to be included in the standard Scala library, from 2.11 onwards they are a separate download from http://www.scala-lang.org/.

To give your class access to the parser combinator code, the easiest thing to do is extend one of the parser traits, so I’ll start with this:

import scala.util.parsing.combinator.JavaTokenParsers
 
class FileHandler extends JavaTokenParsers {
 
}

Now we need to work on the grammar rules that will define the language. The easiest statements to define first will be the MOVE and EMAIL commands. They are simply the name of the command, followed by a string. Hence we could start by defining a string, and these two rules:

def string : Parser[Any] = """".*"""".r
def email : Parser[Any] = "EMAIL"~string	
def move : Parser[Any] = "MOVE"~string

Here, I have used the .r method on a string to define a regular expression saying that in our little language, a string is a double quote, followed by any characters, followed by another double quote. The whole thing is enclosed in triple double quotes because, in Scala, this defines a raw string literal, so you do not have to escape special characters in the string. For me, this is more readable than escaping the special characters, because the triple quotes are at the start and end of the string, whereas when you escape special characters, you end up with backslashes intermingled with the regular expression you are trying to write, and it can be easy to mistype the regex. Once we have defined a string, we can define the email and move rules by saying that they are the word EMAIL or MOVE respectively, followed by a string. The tilde symbol ~ is used whenever you want to combine two parsers sequentially. Note that we don’t have to worry about whitespace; the tilde will take care of that.

Now we can start to define the if else statement. The if statement uses an expression, which tests the $FILENAME variable with various operators. In a larger language, we might need to define the $FILENAME as a variable. Here, because I know this is the only “variable” in the language, I’m not going to bother doing that, I’ll just write it into the expression rule:

def operator : Parser[Any] = "contains" | "endsWith" | "startsWith"
def expr : Parser[Any] = "$FILENAME"~operator~string

Here we have used the pipe symbol to say that an operator is any one of the three strings specified, and an expression is the $FILENAME, an operator and a string.

To build up the rules for the if statement, I’m going to say that:

  • We need the concept of a block – which is a pair of curly braces, with code in between.
  • A single if clause is the word “if”, followed by an expression in brackets, followed by a block.
  • An else if clause is the word “else”, followed by an if clause.
  • An else clause is the word “else”, just followed by a block.
  • The entire if statement is an if clause, optionally followed by one or more “if-else” clauses, optionally followed by an “else”.

One way to embody this in code is:

def block : Parser[Any] = "{"~rep(email | move)~"}"
def ifClause :Parser[Any] = "if"~"("~expr~")"~block
def elseIfClause : Parser[Any] = "else"~ifClause
def elseClause : Parser[Any] = "else"~block
def ifElse : Parser[Any] = ifClause~opt(rep(elseIfClause))~opt(elseClause)

Here you can see some additional parser combinator syntax – the rep method denotes repetition, the opt method denotes an optional element. The only thing that remains now is to define the “top level” rule, which will be the starting point for parsing code written in our little language. Based on the above language sample, a sensible choice for this rule is just to say that text written using this language will be composed of a list of commands, using any of the three commands MOVE, EMAIL or an if else statement:

def commandList = rep(email | move | ifElse)

So now the entire class looks like this:

import scala.util.parsing.combinator.JavaTokenParsers
 
class FileHandler extends JavaTokenParsers {
  	def string : Parser[Any] = """".*"""".r
	def email : Parser[Any] = "EMAIL"~string
	def move : Parser[Any] = "MOVE"~string 
 
	def operator : Parser[Any] = "contains" | "endsWith" | "startsWith"
	def expr : Parser[Any] = "$FILENAME"~operator~string
	def block : Parser[Any] = "{"~rep(email | move)~"}"
	def ifClause :Parser[Any] = "if"~"("~expr~")"~block
	def elseIfClause : Parser[Any] = "else"~ifClause
	def elseClause : Parser[Any] = "else"~block
	def ifElse : Parser[Any] = ifClause~opt(rep(elseIfClause))~opt(elseClause)
 
	def commandList = rep(email | move | ifElse)
}

Now we can test it. Create a file with the original language sample in it. I’ve called mine file-handler.txt. Then we can create a parser by subclassing our parser code, then pointing it at this file as input, and invoking our top level method “commandList”:

import java.io.FileReader
 
object FileHandlerTest extends FileHandler {
 
  def main(args: Array[String]): Unit = {
    val reader = new FileReader("/path-to-your-file/file-handler.txt")
    println(parseAll(commandList, reader))
  }
 
}

The output will show the code being parsed (line breaks added for clarity):

parsed: List(( 
((((((if~()~(($FILENAME~contains)~"finance"))~))
~(({~List((MOVE~"finance"), (EMAIL~"finance-team@somecompany.com")))~}))
~Some(List((else~((((if~()~(($FILENAME~endsWith)~"xlsx"))~))
~(({~List((MOVE~"spreadsheets")))~}))), 
(else~((((if~()~(($FILENAME~startsWith)~"daily"))~))
~(({~List((MOVE~"daily-reports"), 
(EMAIL~"report-team@somecompany.com")))~}))))))
~Some((else~(({~List((MOVE~"additional")))~})))))

Note that with the parser as written above, the language does not support nesting, because a block is defined to be composed of either MOVE or EMAIL statements. i.e. it cannot contain a nested “if” statement. The generic way to change your grammar rules to support nesting is to change the definition of a block to recursively point back to your top level rule, which in our case is the one called “commandList”:

def block : Parser[Any] = "{"~commandList~"}"

You might like to make this change, then update your example input to include a nested if statement, and confirm that it can be parsed correctly.

For more information on Scala parser combinators, check out chapter 31 of “Programming in Scala”, available online here:

http://www.artima.com/pins1ed/combinator-parsing.html


Optimistic locking and versioning with Hibernate

Just put a small example of Hibernate optimistic locking and versioning on github:

https://github.com/hedleyproctor/HibernateVersioningExample

Hibernate uses an optimistic locking strategy, based on versioning. This means it tries to avoid taking a lock on data and instead relies on having a version attribute for each entity (assuming you have configured one). When it performs an update, it includes the version attribute in the where clause of the update. If the database row has already been updated by another thread, then no rows will be updated. Since most databases return the number of rows affected by an update statement, Hibernate can check if this value is zero and then throw an exception.

To enable optimistic versioning, simply add a version attribute to each entity. You can choose either a timestamp or a number, but a number may be safer as using a timestamp could fail if multiple computers are updating data and their clocks are out of sync. To add a version attribute, just annotate with @Version.
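
For example, a minimal sketch of a versioned entity (the class and field names here are illustrative, not taken from the github example):

@Entity
public class Customer {

    @Id
    @GeneratedValue
    private Long id;

    // Hibernate increments this on every update and includes it in the update's where clause
    @Version
    private long version;

    // other fields, getters and setters omitted
}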

To demonstrate that attempting to update data with an out of date version does indeed throw an exception, in the example, I’ve created two threads. The first one is given a reference to the data, but it waits until the second thread has updated the data before attempting its own update. You will see that this throws a stale object exception. You can also see from the SQL that the where clause includes the version number.

This article is part of a series on Hibernate. You might be interested in other articles in the series:

Hibernate query limitations and correlated sub-queries
Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations


Distinct, order by and Hibernate HQL

Recently someone posted a question on stackoverflow explaining that they were having a problem with a Hibernate HQL query that involved both order by and a distinct in the select. The query was across five linked entities – a many to many association, mapped by an entity called PreferenceDateETL, which links the Preference and DateETL entities, and then two properties on the join entity – Employee and Corporation. The person asking the question explained that they wanted to write a query like the following:

select distinct pd.preference 
from PreferenceDateETL pd 
    where pd.corporation.id=:corporationId 
    and pd.preference.employee.deleted=false 
    and pd.deleted=false 
    and pd.preference.deleted=false 
    and pd.dateETL.localDate>=:startDM 
    and pd.dateETL.localDate<=:endDM 
    and pd.preference.approvalStatus!=:approvalStatus 
order by pd.preference.dateCreated

However, when trying to write this, he got an exception saying that the date created column is not in the select list. What is going on here and how can it be fixed? Well, when a query uses distinct, you can only order by columns that appear in the select list, because the ordering is applied after the duplicate rows have been removed. Although it appears as though the above query selects everything it orders by, in fact Hibernate ends up with two instances of the preference table in its SQL, and the order by clause refers to a different one from the one used in the select, hence the exception. The generated SQL has the form:

select
        distinct preference1_.id as id1_76_,
        ....things...
    from
        preference_date_etl preference0_ 
    inner join
        preference preference1_ 
            on preference0_.preference_id=preference1_.id cross 
    join
        preference preference2_ cross 
    join
        employee employee3_ cross 
    join
        date_etl dateetl5_ 
    where
        ...things... 
    order by
        preference2_.date_created

The problem with this query is really that using distinct is unnecessary. It is required in the above form of the query, because the query has been written to pick out instances of the many to many entity that we are interested in, then from those, navigate to the preference objects they link to. By definition, because the many to many entities are representing a many to many relationship, this can give you a list that includes duplicates. You can avoid this by restructuring the query so that the query on the many to many entities becomes a subquery. i.e. you pick out the many to many entities you are interested in, and then say “now get me all of the preference objects that are in the list of preference objects referred to by the PreferenceDateETL objects”. This query doesn’t have any duplicates, hence you do not need the distinct, and the order by clause will work correctly. Here is the HQL:

from Preference p where p in 
  (select pd.preference from PreferenceDateETL pd
  where pd.corporation.id=:corporationId
  and pd.deleted=false
  and pd.dateETL.localDate>=:startDM
  and pd.dateETL.localDate<=:endDM)
and p.employee.deleted=false
and p.deleted=false
and p.approvalStatus != :approvalStatus
order by p.dateCreated

If you want to read the original question, it is here:

http://stackoverflow.com/questions/25545569/hibernate-using-multiple-joins-on-same-table

This article is part of a series of articles on Hibernate. Others in the series include:

Hibernate query limitations and correlated sub-queries
Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations


Understanding the Hibernate session cache

When you use Hibernate, you do so via a session. It is well known that the session acts as a first level cache, but sometimes this can cause confusing behaviour. Let’s look at a simple example. Suppose I have Customer and Order objects. An order has a reference to the customer who made the order, and similarly, a customer has a set of orders.

Suppose I create a single customer with three associated orders as follows:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
 
Customer customer = new Customer();
customer.setForename("Hedley");
customer.setSurname("Proctor");
session.save(customer);
 
Order order1 = new Order();
order1.setOrderTotal(29.99);
order1.setCustomer(customer);
 
Order order2 = new Order();
order2.setOrderTotal(8.99);
order2.setCustomer(customer);
 
Order order3 = new Order();
order3.setOrderTotal(15.99);
order3.setCustomer(customer);
 
session.save(order1);
session.save(order2);
session.save(order3);
 
session.flush();

As you can see, I’ve called flush() on the session, so the data has been sent to the database. Now let’s write a query that will pick out this customer, and assert that they have three orders:

Criteria criteria = session.createCriteria(Customer.class);
criteria.add(Restrictions.eq("forename","Hedley"));
criteria.add(Restrictions.eq("surname","Proctor"));
List results = criteria.list();
Customer customer1 = (Customer) results.get(0);
assertEquals(customer1.getOrders().size(),3);

This assertion will fail! Why? Because the criteria query just returns the Customer object that is already in the session. In the database the orders table really does have the correct foreign keys to the customer, but the in-memory Customer’s orders collection is empty, because we set the customer on each Order and never added the orders to the customer’s collection. Since Hibernate has already flushed everything, it doesn’t consider the session dirty, so it happily hands back the stale Customer instance from the session. What should you do to prevent confusion like this? The answer suggested by Java Persistence with Hibernate is that when you have a bidirectional association you should always have a helper method that sets both sides of the association, so you will never get this discrepancy between the session objects and the contents of the database.
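
A sketch of such a helper method on the Customer class (assuming its orders field is a Set<Order>):

public void addOrder(Order order) {
    // keep both sides of the bidirectional association in sync
    orders.add(order);
    order.setCustomer(this);
}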

This article is part of a series on Hibernate. You might be interested in some of the others:

Hibernate query limitations and correlated sub-queries
Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations


Understanding Hibernate session flushing

When you interact with Hibernate, you do so via a Hibernate session. Hibernate sessions are flushed to the database in three situations:

  • When you commit a (Hibernate) transaction.
  • Before you run a query.
  • When you call session.flush().

However, it is extremely important to understand the second of these situations – session flushing prior to running a query. This does not happen before every query! Remember, the purpose of the Hibernate session is to minimise the number of writes to the database, so it will avoid flushing to the database if it thinks that it isn’t needed. Hibernate will try to work out if the objects currently in your session are relevant to the query that is running, and it will only flush them if it thinks they are. You can easily see the effects of this behaviour. For example, suppose I have Customer and Order objects. A customer has a set of orders. To begin with, I’ll just create the orders without any reference to the customers:

Order order1 = new Order();
order1.setOrderTotal(29.99);
 
Order order2 = new Order();
order2.setOrderTotal(8.99);
 
Order order3 = new Order();
order3.setOrderTotal(15.99);
 
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
 
session.save(order1);
session.save(order2);
session.save(order3);

At this stage, the three objects will be saved to the database. Now let’s create a customer and save them:

Customer customer = new Customer();
customer.setForename("Hedley");
customer.setSurname("Proctor");
session.save(customer);

Okay, now the customer is in the database. Let’s now create the link between the customer and their orders:

order1.setCustomer(customer);
order2.setCustomer(customer);
order3.setCustomer(customer);

Now the Hibernate session is dirty, meaning that the objects in the session have changes that haven’t been persisted to the database yet. If we run an SQL query, Hibernate doesn’t know to flush the session, and the results you get back do not reflect the current states of the objects:

Query query = session.createSQLQuery("select id,customer_id from orders");
List results = query.list();

Here are the results of this query:

Hibernate: select id,customer_id from orders
1,null
2,null
3,null

Even running a Hibernate query in HQL or with a criteria won’t cause the session to flush if Hibernate doesn’t think the query concerns the objects that are dirty. e.g.

Query query = session.createQuery("from Customer");

This won’t cause a session flush, because it is the Order objects that are dirty. By contrast, a Hibernate query involving the Order objects will cause a session flush. e.g.

Query query = session.createQuery("from Order");

If you want to see this code running, it is available from github:

https://github.com/hedleyproctor/hibernate-mapping-and-query-examples

This article is part of a series on Hibernate. You might be interested in some of the others:

Hibernate query limitations and correlated sub-queries
Bespoke join conditions with Hibernate JoinFormula
Inheritance and polymorphism
One-to-many associations
One-to-one associations
Many-to-many associations
