Using live templates in IntelliJ

If you need to write repeated text in IntelliJ then you can use its live templating function to help you. Suppose I’m writing a Liquibase script that will be composed of many similar changesets:

<changeSet id="MapSalvageCategoryAToMIAFTR" author="proctorh">
    <preConditions onFail="MARK_RAN">
        ... 'rd_salvage_category' AND REF_DATA_CODE='CATEGORYA' AND EXTERNAL_SYSTEM_ID = (SELECT ID FROM ...
    </preConditions>
    ...
</changeSet>
I want to repeat these inserts, but with different values for the ref data code, and what it is being mapped to.

  • Select the code and go to Tools -> Save as live template.
  • Choose the abbreviation for the template.
  • Edit the template to insert variables where required.
<changeSet id="MapSalvageCategoryAToMIAFTR" author="proctorh">
    <preConditions onFail="MARK_RAN">
        ... 'rd_salvage_category' AND REF_DATA_CODE='$refdata$' AND EXTERNAL_SYSTEM_ID = (SELECT ID FROM ...
    </preConditions>
    ...
</changeSet>

Now you can repeat the code block by doing:

  • Ctrl / Command + J to bring up the Insert template menu. The caret will be positioned on the first variable. Type the variable value, then press return to go to the next variable.

Posted in IntelliJ | Tagged | Leave a comment

IntelliJ Hints and Tips

Most useful keyboard shortcuts (these are for Mac):

  • CMD + O Open class
  • CMD + SHIFT + O Open file
  • CMD + F Find in file
  • CMD + R Replace in file
  • CMD + SHIFT + F Find in path
  • CMD + SHIFT + R Replace in path
  • ALT + ENTER Quick fix (for problem under cursor)
  • CTRL + ALT + O Organise imports (remove unused)
  • CMD + N Generate (getters, setters, toString etc)
  • CTRL + I Implement interfaces
  • ALT + CMD + L Format file

Full list:


For efficient editing, also useful to be aware of:

  • Right click -> Refactoring options, such as rename, extract constant, extract method
  • New -> Scratch file – allows you to create a temporary file of any kind – text, xml, sql etc.
  • Regular expression search and replace
  • Block / multi column editing. Hold down ALT and drag to select an area with the mouse. You’ll get a cursor on each line and anything you type will be repeated on all lines. Or hold down SHIFT + ALT to click and put multiple cursors anywhere.
  • String manipulation plugin:
  • Zero width character plugin:

Code Swapping

  • When deploying a WAR file, deploy the exploded version to make it easier to recompile and repackage changes into it.
  • Configure “Build artifact” in the run config.
  • While debugging and changing a single class, right-click and choose “Compile”. This causes the class to be reloaded.
  • While debugging and changing multiple classes, use the Build Project button (Ctrl + F9 / Cmd + F9). This causes all affected classes to be reloaded.
  • For non-Java files such as XML, right click and select “Package file”. This moves the file over to the exploded target location.
  • Standard hot code swap supports changing code within a single method, but it does not support many other changes, e.g. changing method signatures, or updating a Spring context (adding, changing or removing beans).

Debugger tips

  • Evaluate expression – allows you to evaluate an expression in the current context, i.e. access variables, collections etc. You can write the expression on one line or switch to the multi-line option.
  • Catch on exception – if you don’t know where in the code you need to stop, but you know that an exception is being thrown, this will pause execution at that point. You can specify which sort of exception you are interested in.
  • Conditional breakpoint – e.g. if you are in a loop which is executing 5000 times, this will allow you to stop when a specific expression is true. Make sure your expression is null safe though!
  • “Drop Frame” – you can right-click on any stack frame in the debugger and drop back to that point in the execution, no need to rerun.
  • “Disabled until the selected breakpoint is hit” – means that you can have one breakpoint depend on another.

Posted in IntelliJ, Java | Tagged , | Leave a comment

Data migration in SQL Server

Recently I’ve had to write a data migration for SQL Server to split a large table (28 million rows) into separate tables. Some notes here on my thoughts…

Firstly, SQL Server has INSERT…SELECT syntax which allows you to copy from one table to another. It seems like any solution will be based around using this.

Secondly, my assumption is that for a large migration, we’ll need to run in batches, with a transaction for each batch, as it will take too long to run in a single transaction.

One first idea was to write something like this, and run it inside a loop, breaking out when no more rows were being copied:

insert into TARGET_TABLE (ID,
  ...other fields here)
select top 100000 source.ID,
  ...other fields here
from source_db.dbo.source_table source
left outer join TARGET_TABLE x
    on x.ID = source.ID
where source.item_type = 'REQUIRED_TYPE'
and x.ID is null

SELECT @rows_processed = @@rowcount
However, testing this with millions of rows suggests it is taking too long to perform the left join, as the time to do the join increases with every batch, as we add rows to the target table. As the target table is a new table, and hence has no rows to begin with, and we have an integer primary key, I ended up changing the where condition on the INSERT..SELECT to the following:
where source.item_type = 'REQUIRED_TYPE'
and source.ID > (select isnull(max(ID), 0) from TARGET_TABLE)
ORDER BY source.ID
This means there is no join, just identifying the max id. Because we are ordering by the id and that is the primary key, there is no sorting required. In my testing, this took around 1 min 20 seconds to copy 3 million rows, compared to around 20 minutes for the join based approach.

I also had to migrate audit table data. This is interesting for two reasons. Firstly, the audit tables don’t have a single primary key, but rather a composite one. Secondly, the target tables already have some data, so this is more of a merge than a copy. For this reason I ended up still using a left join for the audit data migration.

I experimented with the batch size. A size of 10,000 took 28 minutes to copy 3 million rows, whereas a batch size of 100,000 took 18 minutes. This makes sense if the join is taking a long time, as we’ll be reducing the number of times we do the join by a factor of 10.

I suspect the performance of the audit migration could be improved by changing the implementation so that the join is only performed once. e.g. perform an initial join to work out which rows need to be copied, and store that either in a physical table or a table variable, ordered by the same composite primary key. Then use a cursor or OFFSET/FETCH to get the next batch of ids that need to be copied, and use that to join with the source table to get the rows to copy. However my current implementation is fast enough, so this is one to investigate in a future migration perhaps.

Finally it is interesting to note that in my first implementation, I used some count(*) queries so that I could report progress in the logs. However with millions of rows, the time taken to perform these queries becomes non-trivial if you are doing a count with every batch. In my testing it could take 5-6 seconds to count with around 3 million rows, so doing that for 100 batches would mean 10 minutes just performing a count.


Insert select:
Posted in Databases and SQL, SQL Server | Tagged , | Leave a comment

log4j in maven tests

If you want to specify log4j configuration when running tests via Maven, you can do so by updating the Maven surefire plugin configuration to point at a specific log4j configuration file. For log4j v2 the appropriate system property is called log4j.configurationFile. e.g.
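A surefire configuration of this shape does the job (the module and file names here are placeholders, not the original post's values):

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <systemPropertyVariables>
            <!-- path is relative to the module root, where surefire runs the tests -->
            <log4j.configurationFile>src/test/resources/mymodule-log4j2.xml</log4j.configurationFile>
        </systemPropertyVariables>
    </configuration>
</plugin>
```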
You can see from the snippet that, since the tests run from the top level of the current module, the path to the file needs to include the path to the test resources folder. Alternatively, log4j.configurationFile also accepts a fully qualified URL of the form file://. The other thing you will notice is that I haven’t used the default name of “log4j2.xml” as the file name. This is because in a large maven project it can be confusing to have many files with the same name, so I prefer to put the module name into the file name to make things clearer.
Posted in Java, Maven | Tagged , | Leave a comment

Generating code with JavaPoet

Why write code when you can generate it? There are lots of situations when it makes more sense to generate. In this article I’m going to work through an example of how to use JavaPoet and Apache BeanUtils to write a class that will generate domain to DTO conversion code.

In our app, due to the gradual removal of our old way of doing things, we have a lot of code that does the following:

domain object -> legacy DTO object -> new DTO object

The legacy DTO objects are no longer needed, so now we would like to delete them. Really we want the code to convert from the domain object directly to the new DTO object. When doing this sort of conversion, you always face a choice. You could just code a generic converter class, which understands all of the data conversions that you need to perform, and uses runtime reflection, simply iterating over all of the properties and converting each one. However, one major problem with this is that it is very fragile – you cannot search for usages of getters or setters in your IDE, and if someone changes or removes a property, you will end up with a runtime failure, not a build or test failure. For this reason, we want to use plain old Java code to do the conversion. However, we don’t want to write it by hand, so it makes sense to use JavaPoet to generate it. JavaPoet makes code generation very easy. Let me show how I used it in this scenario.

Firstly, download JavaPoet or add it to your Maven dependencies. In my case, both the domain and DTO classes are Java beans (i.e. they have properties, and each property has a getter and setter), so rather than using raw reflection, I can use the Apache BeanUtils classes to make it easier to read these properties. My Maven setup therefore includes both JavaPoet and Apache BeanUtils:
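The dependency declarations look something like this (the version numbers are illustrative; pick the latest available):

```xml
<dependency>
    <groupId>com.squareup</groupId>
    <artifactId>javapoet</artifactId>
    <version>1.13.0</version>
</dependency>
<dependency>
    <groupId>commons-beanutils</groupId>
    <artifactId>commons-beanutils</artifactId>
    <version>1.9.4</version>
</dependency>
```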

Now let’s start solving the problem at hand. Firstly, we need to map between the properties in the DTO and the domain class, and also keep a record of any properties that exist in the DTO, but cannot be found in the domain class, so we can put warnings in the generated code to say that the properties need to be manually checked. To begin with, I’ll create a mini helper class to return a map of the properties, and any missing ones:
class PropertyInfo {
    Map<PropertyDescriptor, PropertyDescriptor> propertyDescriptorMap;
    List<String> missingProperties;

    public PropertyInfo(Map<PropertyDescriptor, PropertyDescriptor> propertyDescriptorMap, List<String> missingProperties) {
        this.propertyDescriptorMap = propertyDescriptorMap;
        this.missingProperties = missingProperties;
    }
}
Now we can write a method using Apache BeanUtils that iterates over the properties and matches them on their names:
PropertyInfo getPropertyMapping(Class source, Class target) {
    // iterate over each property / field to generate a list of properties we can deal with, and ones we cannot
    Map<PropertyDescriptor, PropertyDescriptor> propertyDescriptorMap = new HashMap<>();
    // store properties needing to be populated in target, not found in source
    List<String> missingProperties = new ArrayList<>();

    // (uses a static import of java.util.stream.Collectors.toMap)
    Map<String, PropertyDescriptor> sourcePropertiesByName = Arrays.stream(PropertyUtils.getPropertyDescriptors(source))
            .collect(toMap(PropertyDescriptor::getName, Function.<PropertyDescriptor>identity()));
    System.out.println("Source class has: " + sourcePropertiesByName.size() + " properties");

    PropertyDescriptor[] targetProperties = PropertyUtils.getPropertyDescriptors(target);
    System.out.println("Target class has: " + targetProperties.length + " properties");

    // only do declared properties for now i.e. don't go up to superclasses.
    // navigating up to superclasses would create problems as it would go all the way up to java.lang.Object
    Set<String> declaredTargetFields = new HashSet<>();
    for (Field declaredField : target.getDeclaredFields()) {
        declaredTargetFields.add(declaredField.getName());
    }
    System.out.println("Target has: " + declaredTargetFields.size() + " fields declared in class itself");

    for (PropertyDescriptor targetPropertyDescriptor : targetProperties) {
        String targetPropertyName = targetPropertyDescriptor.getName();
        System.out.println("Processing property: " + targetPropertyName);

        if (declaredTargetFields.contains(targetPropertyName)) {
            PropertyDescriptor sourcePropertyDescriptor = sourcePropertiesByName.get(targetPropertyName);
            if (sourcePropertyDescriptor != null) {
                System.out.println("Found mapping for " + targetPropertyName);
                propertyDescriptorMap.put(sourcePropertyDescriptor, targetPropertyDescriptor);
            } else {
                System.out.println("WARNING - cannot find property " + targetPropertyName + " in source");
                missingProperties.add(targetPropertyName);
            }
        } else {
            System.out.println("Skipping property: " + targetPropertyName + " as declared in superclass");
        }
    }
    return new PropertyInfo(propertyDescriptorMap, missingProperties);
}
Great, now we have enough info to generate our converter. Our conversion method will accept a domain object, and return a DTO, so the method signature will look like this:
public DTOClassName toDTO(DomainClassName domainClassParameter)
How do we do this in JavaPoet? Well, firstly, let’s work out the parameter name. For some domain class names, we just need to take the class name and convert the first letter to lowercase. For some of the domain classes I am using, the class name ends in “Impl”, which I’d like to remove. So my logic to work out the parameter name is this:
String domainClassName = domainClass.getSimpleName();
String domainClassParameterName = domainClassName.substring(0, 1).toLowerCase() + domainClassName.substring(1);
if (domainClassParameterName.endsWith("Impl")) {
     domainClassParameterName = domainClassParameterName.substring(0, domainClassParameterName.length() - 4);
}
Now we can use JavaPoet to generate the method signature, using the MethodSpec.Builder class:
MethodSpec.Builder toDTOMethodBuilder = MethodSpec.methodBuilder("toDTO")
    .addModifiers(Modifier.PUBLIC)
    .returns(dtoClass)
    .addParameter(domainClass, domainClassParameterName);
Next, we need to create a new instance of our DTO object, like this:
DTOClass dto = new DTOClass();
In JavaPoet, you use $T to indicate a type, then supply that type, like this:
toDTOMethodBuilder.addStatement("$T dto = new $T()", dtoClass, dtoClass);
Note that we have to supply the class twice here, as we have used the $T type marker twice in our statement. Why bother using this $T marker? What is wrong with just manually inserting the class name? Well, by using $T, JavaPoet understands that we are giving it a reference to a class, and it can then take care of the import for you! No need to manually keep track of what classes you need to import in your generated code, and whether you have already added an import, JavaPoet will do all that for you!

Now we can simply iterate over the sets of matched properties, and write the conversion code. The easiest case is of course where the property type is the same in both source and target. If every property type was the same, the code would be:

for (PropertyDescriptor domainClassProperty : domainToDTOPropertyMap.keySet()) {

   String domainClassPropertyName = domainClassProperty.getName();
   System.out.println("Processing property: " + domainClassPropertyName);

   PropertyDescriptor dtoPropertyDescriptor = domainToDTOPropertyMap.get(domainClassProperty);
   Method domainClassReadMethod = domainClassProperty.getReadMethod();
   String dtoWriteMethodName = dtoPropertyDescriptor.getWriteMethod().getName();
   final String getProperty = domainClassParameterName + "." + domainClassReadMethod.getName() + "()";

   toDTOMethodBuilder.addStatement("dto." + dtoWriteMethodName + "(" + getProperty + ")");
}
In the more general case, you need to map between different types for the properties, so you will end up with a series of if statements checking the types:
   if (Some.class.equals(domainClassProperty.getPropertyType()) &&
           Other.class.equals(dtoPropertyDescriptor.getPropertyType())) {
        // write code to convert from Some.class to Other.class
   }
   // if you have properties that might be a subclass, or an implementation of an interface, use "isAssignableFrom"
   else if (Some2.class.isAssignableFrom(domainClassProperty.getPropertyType()) &&
           Other2.class.isAssignableFrom(dtoPropertyDescriptor.getPropertyType())) {
       // write code to convert from Some2 class (or subclass) to Other2 class (or subclass)
   }
   else {
       toDTOMethodBuilder.addStatement("dto." + dtoWriteMethodName + "(" + getProperty + ")");
   }
Having done all the properties, we should report on any properties that couldn’t be mapped, so a developer can check these manually:
for (String property : missingProperties) {
    // in early versions of JavaPoet, use addStatement. In later versions, use addComment
    toDTOMethodBuilder.addStatement("// TODO deal with property: " + property);
}
Now we simply add the return statement, and call build() on the builder to generate the method spec:
toDTOMethodBuilder.addStatement("return dto");
To put the conversion method into a Java class and write it out, we do the following:
TypeSpec converterClass = TypeSpec.classBuilder(converterClassName)
        .addModifiers(Modifier.PUBLIC)
        .addMethod(toDTOMethodBuilder.build())
        .build();

JavaFile javaFile = JavaFile.builder(converterPackage, converterClass).indent("    ").build();
javaFile.writeTo(new File("/path/to/chosen/directory"));
All done! Now a developer can simply run this code to generate the conversion code, whenever they need to convert from a domain object to DTO. If desired, you can write similar code to generate an accompanying unit test.

For more examples of JavaPoet syntax, check out the readme on the github project:

Posted in Java, Uncategorized | Tagged | Leave a comment

Writing a custom spliterator in Java 8

In this article I’m going to give two examples of writing a custom spliterator in Java 8. What is a spliterator and why would you need to write your own? Well, a spliterator is used by the Java streams code when you call stream() on a collection or other object. The two most important methods in the Spliterator interface are as follows:
  • boolean tryAdvance(Consumer<? super T> action);
  • Spliterator<T> trySplit();
Any custom spliterator must implement tryAdvance. It is this method which is invoked to get each element of a stream to process. The trySplit method only needs to be implemented if you are going to create a parallel stream. It is invoked to split the stream into sections which can be safely processed in parallel.

Now let’s run through a few examples. I’m going to include all of the code snippets inline, but if you want a working example, all of the source code is available on my github:

To begin with, let’s consider a case where you don’t need a parallel stream, but you do have a custom class for which you want to write a spliterator. Suppose I work on an application that processes html data, and I decide that to get test data for my application, I could just scrape random pages off the web. I could use a library like jsoup to get pages, and for each page, put any links on a list, so that if I need to get another page, I can just retrieve the next link. A simple implementation could look like this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.LinkedList;
import java.util.Queue;

public class WebPageProvider {

    private Queue<String> urls = new LinkedList<String>();

    public WebPageProvider() {
        urls.add("https://en.wikipedia.org/"); // any seed page will do
    }

    public Document getPage() {
        org.jsoup.nodes.Document doc = null;

        while (doc == null) {
            String nextPageURL = urls.remove();
            System.out.println("Next page: " + nextPageURL);
            try {
                doc = Jsoup.connect(nextPageURL).get();
            } catch (IOException e) {
                // we'll try the next one on our list
            }
        }

        // get links and put on our queue
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            String newURL = link.attr("abs:href");
            urls.add(newURL);
        }
        return new Document(doc);
    }
}

Now, what I’d really like to be able to do is to use all of the useful stream methods to provide different sorts of test data. For example, suppose I just wanted images. I could map each web page to the list of images on the page, then call flatMap to flatten the stream of List objects back to a stream of Image objects, like this:

StreamSupport.stream(new WebPageSpliterator(new WebPageProvider()), false)
        .map(Document::getImages)
        .flatMap(List::stream)

Or perhaps filter to only include documents with five or more images:

StreamSupport.stream(new WebPageSpliterator(new WebPageProvider()), false)
                                                        .filter(doc -> doc.getImages().size() >= 5)

Seems useful, so how do we implement the spliterator? Well, it’s pretty trivial:

import java.util.Spliterator;
import java.util.function.Consumer;

public class WebPageSpliterator implements Spliterator<Document> {
    private WebPageProvider webPageProvider;

    public WebPageSpliterator(WebPageProvider webPageProvider) {
        this.webPageProvider = webPageProvider;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Document> action) {
        action.accept(webPageProvider.getPage());
        return true;
    }

    @Override
    public Spliterator<Document> trySplit() {
        return null;
    }

    @Override
    public long estimateSize() {
        return 0;
    }

    @Override
    public int characteristics() {
        return 0;
    }
}
You can see that all we’ve had to do is implement the tryAdvance method. Since the backing provider can provide an infinite number of web pages (assuming pages keep linking to other pages) there is no complex logic needed inside this method. It simply calls the accept method of the Consumer code passed into it (Consumer is a Java 8 functional interface, allowing callers to pass in a lambda) and then returns true, to signify that more pages can be returned if required.
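The “accept, then return true” contract is easy to see in isolation with a tiny self-contained spliterator (a hypothetical CountingSpliterator, not part of the article’s repo, with no jsoup dependency):

```java
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// A minimal infinite spliterator: like WebPageSpliterator, tryAdvance always
// hands the consumer one more element and returns true.
public class CountingSpliterator implements Spliterator<Integer> {
    private int next = 0;

    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        action.accept(next++); // give the stream the next element
        return true;           // there is always another element available
    }

    @Override
    public Spliterator<Integer> trySplit() { return null; } // sequential only

    @Override
    public long estimateSize() { return Long.MAX_VALUE; }

    @Override
    public int characteristics() { return ORDERED; }

    public static void main(String[] args) {
        // limit() is what stops the otherwise endless stream
        List<Integer> first3 = StreamSupport.stream(new CountingSpliterator(), false)
                .limit(3)
                .collect(Collectors.toList());
        System.out.println(first3); // [0, 1, 2]
    }
}
```

Note that without the limit() the stream would never terminate, exactly as with the web page example.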

Now let’s consider a more complex example involving parallel processing. When would you need to write a custom spliterator for parallel processing? Well, one situation is when you have a stream of objects, but the stream has an internal ordering or structure, meaning that a naive split of the stream at a random point might not produce sections that can validly be processed in parallel. In my github repo, I’ve given two separate examples of this type of scenario. In one, you have a character stream, which actually represents a custom record format. i.e. you need to split the stream at the record boundaries. In the other, you have a stream of Payment objects, but really these are grouped into payment batches, and you must split the stream at a payment batch boundary. Let’s look at this example. The payment batch test data is created like this:

    private List<Payment> createSampleData() {
        List<Payment> paymentList = new ArrayList<>();
        for (int i=0; i<1000; i++) {
            paymentList.add(new Payment(10,"A"));
            paymentList.add(new Payment(20,"A"));
            paymentList.add(new Payment(30,"A"));
            // total = 60

            paymentList.add(new Payment(20,"B"));
            paymentList.add(new Payment(30,"B"));
            paymentList.add(new Payment(40,"B"));
            paymentList.add(new Payment(50,"B"));
            paymentList.add(new Payment(60,"B"));
            // total = 200

            paymentList.add(new Payment(30,"C"));
            paymentList.add(new Payment(30,"C"));
            paymentList.add(new Payment(20,"C"));
            // total = 80
        }
        return paymentList;
    }

We want to total each batch. You can see that if you did this in parallel, but didn’t split on the batch boundaries, you would get the wrong totals, because you would count more batches than actually exist. e.g. by splitting the second batch into two. We can verify this, and then implement a custom spliterator and check that with the custom spliterator, the totals are correct. First, let’s create a collector to count up the totals:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class PaymentBatchTotaller
    implements Collector<Payment,PaymentBatchTotaller.Accumulator,Map<String,Double>> {

    public static class Total {
        public double amount;
        public int numberOfBatches;
    }

    public static class Accumulator {
        Map<String,Total> totalsByCategory = new HashMap<>();
        String currentPaymentCategory;
    }

    @Override
    public Supplier<Accumulator> supplier() {
        return Accumulator::new;
    }

    @Override
    public BiConsumer<Accumulator,Payment> accumulator() {
        return (accumulator,payment) -> {
            // store this amount
            Total batchTotalForThisCategory = accumulator.totalsByCategory.get(payment.getCategory());
            if (batchTotalForThisCategory == null) {
                batchTotalForThisCategory = new Total();
                accumulator.totalsByCategory.put(payment.getCategory(), batchTotalForThisCategory);
            }
            batchTotalForThisCategory.amount += payment.getAmount();

            // if this was start of a new batch, increment the counter
            if (!payment.getCategory().equals(accumulator.currentPaymentCategory)) {
                batchTotalForThisCategory.numberOfBatches += 1;
                accumulator.currentPaymentCategory = payment.getCategory();
            }
        };
    }

    @Override
    public BinaryOperator<Accumulator> combiner() {
        return (accumulator1,accumulator2) -> {
            for (String category : accumulator1.totalsByCategory.keySet()) {
                Total total2 = accumulator2.totalsByCategory.get(category);
                if (total2 == null) {
                    accumulator2.totalsByCategory.put(category, accumulator1.totalsByCategory.get(category));
                } else {
                    Total total1 = accumulator1.totalsByCategory.get(category);
                    total2.amount += total1.amount;
                    total2.numberOfBatches += total1.numberOfBatches;
                }
            }
            return accumulator2;
        };
    }

    @Override
    public Function<Accumulator, Map<String, Double>> finisher() {
        return (accumulator) -> {
            Map<String,Double> results = new HashMap<>();
            for (Map.Entry<String,Total> entry : accumulator.totalsByCategory.entrySet()) {
                String category = entry.getKey();
                Total total = entry.getValue();
                double averageForBatchInThisCategory = total.amount / total.numberOfBatches;
                results.put(category, averageForBatchInThisCategory);
            }
            return results;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();
    }
}

You can see that this collector keeps totals for each payment batch category, along with the number of batches in that category, then the finisher method divides each total by the number of batches in that category to get the average batch size. (If you aren’t familiar with custom collectors, you might like to read my previous article Yet another Java 8 custom collector example.)
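If a full collector class feels heavyweight, the same four pieces (supplier, accumulator, combiner, finisher) can also be wired up with the stdlib Collector.of factory. Here is a small hypothetical example, averaging string lengths rather than payments, just to show the shape:

```java
import java.util.Arrays;
import java.util.stream.Collector;

public class AverageLengthExample {
    // supplier / accumulator / combiner / finisher, in the same order as the
    // Collector interface methods implemented by the class above
    static final Collector<String, int[], Double> AVERAGE_LENGTH = Collector.of(
            () -> new int[2],                                    // supplier: [totalLength, count]
            (acc, s) -> { acc[0] += s.length(); acc[1]++; },     // accumulator
            (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; }, // combiner (used by parallel streams)
            acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]  // finisher: compute the average
    );

    public static void main(String[] args) {
        double avg = Arrays.asList("a", "bb", "ccc").stream().collect(AVERAGE_LENGTH);
        System.out.println(avg); // 2.0
    }
}
```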

If we run a test with a naive split of the stream, the totals will be wrong:

List<Payment> payments = createSampleData();

// won't work in parallel!
Map<String,Double> averageTotalsPerBatchAndCategory = payments.parallelStream().collect(new PaymentBatchTotaller());

for (Map.Entry<String,Double> total : averageTotalsPerBatchAndCategory.entrySet()) {
    if (total.getKey().equals("A")) {
        System.out.println("A: " + total.getValue() + " (correct average would be 60)");
    } else if (total.getKey().equals("B")) {
        System.out.println("B: " + total.getValue() + " (correct average would be 200)");
    } else {
        System.out.println("C: " + total.getValue() + " (correct average would be 80)");
    }
}

To begin with, our spliterator must keep hold of its backing list, and will need to keep track of its current and end positions in the list:

public class PaymentBatchSpliterator implements Spliterator<Payment> {

    private List<Payment> paymentList;
    private int current;
    private int last;  // inclusive

    public PaymentBatchSpliterator(List<Payment> payments) {
        this.paymentList = payments;
        last = paymentList.size() - 1;
    }

    // used by trySplit to hand the second half of the list to a new spliterator
    private PaymentBatchSpliterator(List<Payment> payments, int current, int last) {
        this.paymentList = payments;
        this.current = current;
        this.last = last;
    }

The implementation of tryAdvance is fairly simple. Providing we aren’t at the end of the list yet, we need to call accept on the Consumer code passed in, then increment our current counter and return true:

public boolean tryAdvance(Consumer<? super Payment> action) {
    if (current <= last) {
        action.accept(paymentList.get(current));
        current++;
        return true;
    }
    return false;
}

Now we come to the real logic, the implementation of trySplit. We can implement this by saying: generate a possible split position, half way along the list, then check if it is a boundary between payment batches, if not, move forward until it is. The code looks like this:

    public Spliterator<Payment> trySplit() {
        if ((last - current) < 100) {
            return null;
        }

        // first stab at finding a split position
        int splitPosition = current + (last - current) / 2;
        // if the categories are the same, we can't split here, as we are in the middle of a batch
        String categoryBeforeSplit = paymentList.get(splitPosition-1).getCategory();
        String categoryAfterSplit = paymentList.get(splitPosition).getCategory();

        // keep moving forward until we reach a split between categories
        while (categoryBeforeSplit.equals(categoryAfterSplit)) {
            splitPosition++;
            categoryBeforeSplit = categoryAfterSplit;
            categoryAfterSplit = paymentList.get(splitPosition).getCategory();
        }

        // safe to create a new spliterator
        PaymentBatchSpliterator secondHalf = new PaymentBatchSpliterator(paymentList, splitPosition, last);
        // reset our own last value
        last = splitPosition - 1;

        return secondHalf;
    }

Finally there is one little detail not to be missed. We must implement the estimateSize() method. Why? Well, this is called internally by the stream code to check if it needs to do any more splitting – if you don’t implement it, your stream will never be split! The implementation is trivial:

    public long estimateSize() {
        return last - current;
    }

Finally we can test this by using the spliterator in our test code when we count the totals:

        Map<String,Double> averageTotalsPerBatchAndCategory =
                StreamSupport.stream(new PaymentBatchSpliterator(payments), true).collect(new PaymentBatchTotaller());

This will generate the correct totals. If you want to look at the character stream example, please check the github repo. You might also be interested in some of my other blog posts on Java 8: Streams tutorial Using Optional in Java 8

Posted in Java, Uncategorized | Leave a comment

Using Optional in Java 8

“The introduction of null references was my billion dollar mistake” – Tony Hoare

Optional is a (typed) container object. It may contain a single object, or it may be empty. It allows you to avoid null pointer exceptions. In this article I’m going to work through a number of examples of how to use Optional. I’ll include code snippets, but all the source code is available on my github at:

Let’s get started. My examples use the domain of the insurance industry, since that’s the industry I work in. Suppose we have a service that allows you to find an insurance claim based on its id. Prior to Java 8, the method signature of this would be as follows:
public Claim find(Long id)
What’s wrong with this? Well, you don’t know if it could ever return null. Will it? Or will it return a default value? If you want to use any fields of the returned object, you are forced to insert null checks, like this:
Claim claim = claimService.find(id);
if (claim != null) {
  productType = claim.getProductType();
}
If you forget the null check, you may get the dreaded NullPointerException. The purpose of Optional is to allow your method signature to tell the caller that the method may not return an object, and make it easier to avoid null pointers. With an Optional, the method call looks like this:
Optional<Claim> optionalClaim = claimService.findById(15L);
The “functional” way to interact with an Optional is not to directly unbox it, but rather to invoke one of the functional methods. e.g.
optionalClaim.ifPresent(claim -> System.out.println("Found claim. Id: " + claim.getId()));
Now, the clever thing is that if we want to use any fields of the returned object, we no longer need to write an explicit null check. Instead, the Optional class has a method called “map”. You invoke map on an Optional<T>, passing it a lambda or method reference that takes a parameter of type T and returns something of type U. It then does the following:
  • If the Optional is empty, just returns an empty Optional.
  • If the Optional has an object inside, invokes the function you have passed it on that object, and wraps the return result in an Optional.
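These two cases can be verified with a short, self-contained sketch (the string values here are arbitrary, chosen just to illustrate the behaviour):

```java
import java.util.Optional;

public class OptionalMapDemo {
    public static void main(String[] args) {
        // Case 1: empty Optional - the mapping function is never invoked,
        // and the result is simply another empty Optional
        Optional<String> empty = Optional.empty();
        Optional<Integer> stillEmpty =;
        System.out.println(stillEmpty.isPresent()); // false

        // Case 2: Optional with a value - the function is applied to the
        // value, and the result is wrapped in a new Optional
        Optional<String> present = Optional.of("MOTOR");
        Optional<Integer> length =;
        System.out.println(length.get()); // 5
    }
}
```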
This means that if we want to extract the productType of the claim, as before, we can now write the following:
Optional<Claim.PRODUCT_TYPE> optionalProductType =;
Much better! Let’s look at some more variations. Firstly, if you want to provide a default value, you can chain another call to orElse on the end:
Claim.PRODUCT_TYPE myProductType =;
You can even call a supplier function to return the default value if needed:
Claim.PRODUCT_TYPE myProductType2 =
                .orElseGet(() -> Claim.PRODUCT_TYPE.MOTOR);
Now, suppose you want to call map with a function, but that function already wraps its response in an Optional. Imagine we want to pass our Optional Claim to the following:
public Optional<AuditLog> findAuditLog(Claim claim)
What’s the problem here? Well, remember what the contract of map is. If you give it an Optional with something inside, it passes that to the method you’ve given it, AND THEN WRAPS THE RETURNED OBJECT IN AN OPTIONAL. Yikes! The findAuditLog method returns an Optional (that may or may not have an AuditLog object) but then map would wrap this in a second Optional! We don’t want this, so what is the solution? The answer is that Optional has another method called flatMap. flatMap does not wrap the returned value in an Optional, so we can now write the following:
Optional<AuditLog> auditLogOptional = optionalClaim.flatMap(auditService::findAuditLog);
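To see the difference between map and flatMap concretely, here is a minimal self-contained sketch. The findAuditLog method here is a hypothetical stand-in (using Strings rather than real Claim and AuditLog classes) for any method that already returns an Optional:

```java
import java.util.Optional;

public class MapVersusFlatMap {
    // Hypothetical stand-in for a service method that already returns an Optional
    static Optional<String> findAuditLog(String claimId) {
        return Optional.of("audit-log-for-" + claimId);
    }

    public static void main(String[] args) {
        Optional<String> optionalClaim = Optional.of("claim-15");

        // map re-wraps the result, so we end up with a nested Optional
        Optional<Optional<String>> nested =;
        System.out.println(nested.get().get()); // audit-log-for-claim-15

        // flatMap does not re-wrap, so we keep a single level
        Optional<String> flat = optionalClaim.flatMap(MapVersusFlatMap::findAuditLog);
        System.out.println(flat.get()); // audit-log-for-claim-15
    }
}
```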
Optional also has a filter method. Again, it is null safe, so you can safely invoke it on an Optional that might be empty, like this:
Optional<Claim> optionalMotorClaim = optionalClaim
                .filter(claim -> Claim.PRODUCT_TYPE.MOTOR.equals(claim.getProductType()));
If you really do need to get the value out of an Optional, you can do so, as follows:
if (optionalClaim.isPresent()) {
            Claim myClaim = optionalClaim.get();
            // do stuff with claim
}
Note that you should ALWAYS call isPresent() prior to calling get(), as get() will throw an exception if you invoke it on an empty Optional. Most of the time, calling ifPresent and passing a lambda will be sufficient for processing your Optional, but extracting the value will be necessary if you need to do stuff that isn’t allowed inside a lambda, such as throwing a checked exception.

Finally, a side note about one limitation of Optional and Stream in Java 8. At the moment it is a bit convoluted to map a Stream<Optional<Claim>> down to a Stream<Claim> of the contained values. You have to do the following:
Stream<Claim> claimsLoadedById =
                .map(claimService::findById)
                .filter(Optional::isPresent)
                .map(Optional::get);
In Java 9, this has been simplified to:
Stream<Claim> claimsLoadedById =
                .map(claimService::findById)
                .flatMap(Optional::stream);
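One more option worth knowing, for the case discussed above where you would otherwise reach for isPresent() and get() just to throw an exception: orElseThrow lets you supply your own exception for the empty case, with no explicit check. A minimal sketch:

```java
import java.util.Optional;

public class OrElseThrowDemo {
    public static void main(String[] args) {
        Optional<String> optionalClaim = Optional.empty();
        try {
            // Throws the supplied exception if the Optional is empty,
            // otherwise returns the contained value
            String claim = optionalClaim.orElseThrow(
                    () -> new IllegalStateException("Claim not found"));
            System.out.println(claim);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Claim not found
        }
    }
}
```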


In this article I’ve introduced Optional and given a number of examples of how to use it. To make effective use of Optional you should:
  • Use it as the return type for methods that can validly not return an object
  • Chain calls to map, flatMap and filter on a returned Optional to avoid nested null pointer checks
This article is part of a series on Java 8. You might be interested in the other articles: Java 8 Streams Tutorial and Yet another Java 8 custom collector example. I also recommend the book Java 8 In Action.
Posted in Java | Tagged | Leave a comment

Maven offline build fails to resolve artifacts in your local repository

Recently I’ve been trying to set up a new machine with a maven build that can work offline. My first instinct was to do the following:

  1. Configure maven with a ~/.m2/settings.xml file with our set of Nexus repos (we use six or seven locally hosted Nexus repos)
  2. Run an online build to cache all the artifacts in the local maven repo
  3. Delete the ~/.m2/settings.xml file with the repo definitions in
  4. Run an offline build with -o and confirm it works

Much to my surprise, this process failed with a bunch of errors like the following:

[ERROR] Plugin org.apache.maven.plugins:maven-resources-plugin:2.7 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-resources-plugin:jar:2.7: The repository system is offline but the artifact org.apache.maven.plugins:maven-resources-plugin:pom:2.7 is not available in the local repository.

I couldn’t really see what was going on here. The missing artifacts were all definitely in the local repo. I ended up downloading the Maven source and debugging into it. The problem is that when Maven downloads a file from a remote repo, it stores a file called _maven.repositories along with the artifact in the local cache, that says where it was obtained from. The file format is like this:

#NOTE: This is an internal implementation file, its format can be changed without prior notice.
#Tue Jun 23 14:39:00 BST 2015
my-artifact-1.0.jar>MyRepo=

When trying to resolve an artifact, if the artifact is found locally, maven then attempts to determine if it is a locally installed artifact, or something cached from a remote download. The problem I was seeing is that if it finds a _maven.repositories file with the name of a repo that is not in your settings.xml, it throws an exception! To me, either Maven should permit this artifact to be used, or if the maven developers really don’t want that to happen, the wording of the exception should make clear what is actually going on. e.g. “I found file XYZ.jar in the local repo, but the _maven.repositories file tells me it was downloaded from a repo called MyRepo which isn’t configured for the current build, therefore I’m not using it”.

For now, if you want your offline build to work, you have two options:

  1. Download your proprietary jars from your Nexus repo like I did, but don’t delete your settings.xml
  2. Install your proprietary jars manually, so there is no _maven.repositories file to confuse maven
Posted in Maven | Tagged | 1 Comment

Java 8 Streams Tutorial

In this tutorial, I’m going to start by explaining some of the basics of streams. Viz:
  • What streams are
  • Terminal and non-terminal operations
  • Their “lazy” nature
  • Their read-once nature
  • Why they were introduced i.e. how they enable easy parallel operations
Then I’m going to work through examples of four key stream operations:
  • Filter
  • Map
  • Flatmap
  • Collect
I’m going to include plenty of code snippets, but note that you can get all the source over on my github:

Introduction to streams

To obtain a stream, you call the new stream() method that has been added to the Collection interface.
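For example (using a simple List of Strings as a stand-in for the claim collections used later in this post):

```java
import java.util.Arrays;
import java.util.List;

public class ObtainingAStream {
    public static void main(String[] args) {
        List<String> claims = Arrays.asList("claim-1", "claim-2");

        // Any Collection can produce a sequential stream...
        Stream<String> stream =;
        System.out.println(stream.count()); // 2

        // ...or a parallel one
        System.out.println(claims.parallelStream().count()); // 2
    }
}
```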

Stream operations can be divided into two types:

Intermediate operations, that return a stream:

  • filter
  • skip
  • limit
  • map
  • flatMap
  • distinct
  • sorted

Terminal operations, that return some kind of result

  • anyMatch – boolean
  • noneMatch – boolean
  • allMatch – boolean
  • findAny – Optional
  • findFirst – Optional
  • forEach – void, e.g. print
  • collect
  • reduce

The idea behind streams is that you can build up a pipeline of operations by calling multiple intermediate operations, and then finally a terminal operation to obtain a result.

It’s important to note that streams have two important differences compared to collections.

Firstly, unlike a collection, which is essentially a set of data in memory, stream elements are only produced one at a time, as you iterate over the stream. This is referred to as the “lazy” nature of streams. Imagine you have a large dataset of a million elements in memory and you create a stream backed by this dataset. If every time you called an intermediate operation the entire dataset was iterated, this would be hugely inefficient. Rather, you can think of the intermediate operations as recording that an operation needs to be performed, but deferring the actual execution of that operation until you call a terminal method. At this point, the stream is iterated, and each intermediate operation is evaluated.

Secondly, you can only read from a stream once. This differs from e.g. Scala, in which you can read a stream as many times as you like. There is a great stackoverflow answer from one of the stream API designers that explains why this design choice was taken, but it is a bit of a monster, so I’ll summarise it:

  • You can use streams for things other than collections, that genuinely are read once. e.g. reading a file with BufferedReader, which has a lines() method returning Stream<String>.
  • Allowing two types of stream, one lazy and the other not, creates its own problems. e.g.
    • In Scala you can have bugs if your code attempts to read a stream twice when in fact it has been passed a once-off stream implementation.
    • Collection classes optimise some operations by storing / caching data. e.g. calling size() on a collection returns a cached size value. Calling size() on a filtered collection would take O(n) time, as it would have to apply the filter to the collection.
    • If you pass round a lazy stream and use it multiple times, each time you operate on it, the entire set of operations need to be evaluated.
There is a link to the answer at the bottom of this article if you want to read it.
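Both properties are easy to observe in a few lines. In this sketch, peek prints nothing until the terminal count() operation runs, and a second terminal operation on the same stream throws an IllegalStateException:

```java
import java.util.Arrays;
import java.util.List;

public class LazyAndReadOnceDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);

        // Intermediate operations are only recorded - peek prints nothing yet
        Stream<Integer> pipeline =
                .peek(n -> System.out.println("processing " + n))
                .filter(n -> n > 1);
        System.out.println("pipeline built, nothing processed yet");

        // The terminal operation triggers evaluation of the whole pipeline
        System.out.println("count = " + pipeline.count());

        // Streams are read-once: a second terminal operation fails
        try {
            pipeline.count();
        } catch (IllegalStateException e) {
            System.out.println("stream has already been consumed");
        }
    }
}
```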

Why were streams introduced?

To me, the advantages of streams can be summed up as three points:
  1. Using functional style methods is clearer. i.e. if you use a filter method, someone can see at a glance what you are doing
  2. Because streams are lazy, you can pass streams around between methods, classes or code modules, and apply operations to the stream, without anything being evaluated until you need it to be.
  3. Streams processing can be done in parallel, by calling the parallelStream method. i.e. because you aren’t manually iterating over a collection and performing arbitrary actions, but instead calling well defined methods on the stream such as map and filter, the stream classes know how to split themselves up into separate streams to be processed on different processors, and then recombine the results.
The third reason is really the driver. With the advent of “big data”, the ability to perform data processing operations on massive data sets is hugely important, and to do this efficiently, you will want your code to be able to make use of multiple cores / processors. Streams provide a way of doing that which means you don’t have to write complex code to split your input up, send it to multiple places, wait for the results and then recombine them. The stream implementation handles this for you. However, this article is meant as an introduction to streams, so I don’t want to go into too much detail as to how this works. Rather, let’s start looking at some actual stream operations.
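As a minimal illustration of the parallel point (the payment amounts here are made up), switching from stream() to parallelStream() is the only change needed; the splitting and recombining is handled for you:

```java
import java.util.List;

public class ParallelStreamDemo {
    public static void main(String[] args) {
        // 100 dummy payment amounts: 1.0, 2.0, ..., 100.0
        List<Double> payments = IntStream.rangeClosed(1, 100)
                .mapToObj(i -> (double) i)
                .collect(;

        // Sequential sum
        double total =;

        // Parallel sum: same pipeline, work distributed across cores
        double parallelTotal = payments.parallelStream()
                .mapToDouble(Double::doubleValue)
                .sum();

        System.out.println(total);          // 5050.0
        System.out.println(parallelTotal);  // 5050.0
    }
}
```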

Stream operations

To give my code examples, I’m going to use examples from two domains:
  1. Insurance – this is the domain I work in. Here we have insurance claims, which could be of different types (e.g. motor, household), have jobs attached to them (e.g. motor repair, solicitor, loss adjuster) and payments made.
  2. Restaurant menu – this is what Java 8 In Action uses for its examples.


Filter

I think filter is a great operation to start with: it’s a very common thing to do and a nice intro to stream syntax. In my code examples, if you open the StreamExamples class and find the filter method, you can see the syntax for filtering a collection of claims down to motor claims only:
Stream<Claim> motorClaims = -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR));
The filter method takes a lambda expression, which accepts an object of the type used in your stream, in this case a Claim object, and returns a boolean. Any element for which this check is true is included in the filtered stream and elements for which the check returns false are excluded. In this case, we simply check if the type of the claim is MOTOR. As this is an intermediate operation, the return type is Stream. As explained above, at this point, the filter hasn’t actually been evaluated. It will only be evaluated when a terminal operation is added. Before we do that, let’s look at a couple more simple examples of filter. We could filter on payments over 1000:
Stream<Claim> paymentsOver1000 = -> claim.getTotalPayments() > 1000);
Or claims with 2 or more jobs:
Stream<Claim> twoOrMore = -> claim.getJobs().size() >= 2);


Map

The map operation means “map” in the mathematical sense – that of mapping one value to another. Suppose we have a stream of Claim objects, but what we need is claim ids? Just map the stream like this:
Stream<Long> claimIds = -> claim.getId());
As you can see, the map operation takes a lambda expression that accepts an object of the type used in your stream, in this case a Claim, and converts it to another type. In fact, in this example, you don’t even need to write the full lambda expression, you can use a method reference:
Stream<Long> claimIds2 =;
Now that we have seen two different intermediate operations, let’s look at how to build a pipeline by applying the operations one after another. If we want to get the ids of all motor claims, we can write the following:
Stream<Long> motorClaimIds =
                .filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR))
                .map(Claim::getId);
I’d recommend writing your pipelines with each operation on a separate line like this. Not only does it make the code more readable, but if there is a fatal exception during your stream processing, the line number will take you straight to the failing operation.

Note that you don’t just have to “extract” values during a map operation, you can also create new objects. For example, you might convert from domain objects to DTOs, like this:

Stream<ClaimDTO> claimDTOs = -> new ClaimDTO(claim.getId(), claim.getTotalPayments()));


FlatMap

Suppose you want to get a stream or collection of all of the jobs attached to a list of claims. You might start with a map operation, like this:
Stream<Set<Job>> jobSets =;
However, there is a problem here. Calling getJobs() on a claim returns a Set of Job objects. So we now have a stream composed of Sets, whereas we want a stream of Job objects. This is where flatMap comes in. It takes a stream composed of Sets or another collection type, and “collapses” it down to a stream of the objects in the collections. Hence, to get a stream of all the jobs, we write:
Stream<Job> jobs =;
Again, we can pipeline a number of operations here, for example by filtering the stream before mapping the values. Taking an example from the food / menu domain, here’s how to get side orders available for dishes with over 750 calories:
Stream<SideOrder> sideOrdersOver750 =
                .filter(dish -> dish.getCalories() > 750)
                .map(Dish::getSideOrders)
                .flatMap(Set::stream);


Collect

The three operations we have covered so far are all intermediate operations. They operate on a stream and return a stream. When you want to convert your stream back into a collection, you will want to call the collect method. There are a large number of variations as to how you collect, and this choice can be a bit bewildering at first, so I want to show a good number of examples here to help you get familiar with what is available to you.

Firstly, let’s start with the simplest possible collect operations, to a set, list or map. Here is what you could do if you want a stream of motor claims collected to one of these types:

// to a set
Set<Claim> motorClaimSet =
                                    .filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR))
                                    .collect(Collectors.toSet());

// to a list
List<Claim> motorClaimList =
                                    .filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR))
                                    .collect(Collectors.toList());

// to a map (grouping by unique key)
Map<Long,Claim> motorClaimMap =
                                        .filter(claim -> claim.getProductType().equals(Claim.PRODUCT_TYPE.MOTOR))
                                        .collect(Collectors.toMap(Claim::getId, Function.<Claim>identity()));
In the map example, the key of claim id is unique. What happens if you map by a non-unique key? The answer is that your map values won’t be individual objects, but rather lists of the objects that share that non-unique key. For example:
Map<Claim.PRODUCT_TYPE,List<Claim>> claimsByType =;
You can see here that we are using the groupingBy method. Grouping can be multi-level, however. Not only that, but the grouping keys don’t have to be attributes of the objects; you can dynamically create the key values as part of the grouping. Consider grouping by product type, and then by whether total payments exceed £1000:
Map<Claim.PRODUCT_TYPE,Map<String,List<Claim>>> claimsByTypeAndPayment =
                groupingBy(claim -> {
                    if (claim.getTotalPayments() > 1000) {
                        return "HIGH";
                    } else {
                        return "LOW";
                    }
                })));
Note that the result of your grouping doesn’t have to be the objects in your stream. You may want to extract a value from them. In the menu domain, suppose I want to group side orders by type, and get a list of the calories for each of the side orders in each type. In this case you will want to operate on a stream of SideOrder objects, but use the two parameter groupingBy method to specify to extract the calorie value, rather than collecting the SideOrder objects themselves:
Map<SideOrder.Type,List<Integer>> sideOrderCalories =
    .collect(groupingBy(SideOrder::getType, mapping(SideOrder::getCalories, toList())));
Sometimes you want to group into only two groups. Because this is a common operation, it has a special convenience collector, partitioningBy:
Map<Boolean,List<Dish>> veggieAndNonVeggie =;
Sometimes you want to sum or average numerical values from your stream:
int totalCalories =;
double totalPayments =;
double averagePayment =;
The above syntax is fine if you are only obtaining one value. However, if you want both a sum and an average, say, you shouldn’t evaluate each one separately – this would iterate over the stream multiple times. Instead, you should use a summarizing collector:
DoubleSummaryStatistics paymentStats =;
totalPayments = paymentStats.getSum();
averagePayment = paymentStats.getAverage();
My final example is something that has been missing from Java for a while. How often have you needed to concatenate a collection of strings, only to have to resort to using Apache Commons to do it! No more. Now you can use the joining() collector:
String claimIdListAsCommaSeparatedString = -> claim.getId().toString()).collect(joining(","));
Note that if you don’t specify a separator, the default is that none will be used.
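For reference, a quick sketch of the three variants of joining:

```java
import;

public class JoiningDemo {
    public static void main(String[] args) {
        // No arguments: elements are concatenated with no separator
        System.out.println("1", "2", "3").collect(Collectors.joining()));
        // prints 123

        // One argument: a separator between elements
        System.out.println("1", "2", "3").collect(Collectors.joining(",")));
        // prints 1,2,3

        // Three arguments: separator, prefix and suffix
        System.out.println("1", "2", "3").collect(Collectors.joining(",", "[", "]")));
        // prints [1,2,3]
    }
}
```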


I hope this has been a useful introduction to streams and how to use them. We’ve covered what streams are, their lazy nature, their once-off nature and why they enable easier parallel processing. Then we looked at the most common stream operations: filter, map, flatMap and collect. For collecting, if you want to know how to write your own custom collector, see my example: Yet another Java 8 custom collector example.

If you are interested in the background to the stream API design choices, see: Why are Java streams once off? and Why doesn’t java.util.Collection implement the new stream interface? Finally, for more details on both Java 8 in general and functional programming, I’d strongly recommend Java 8 In Action.
Posted in Java | Tagged , | Leave a comment

Yet another Java 8 custom collector example

Java 8 introduces a number of functional programming techniques to the language. Collections can be turned into streams, which allows you to perform standard functional operations on them, such as filtering, mapping, reducing and collecting. In this post I’m going to give a quick example of writing a custom collector.

Firstly, what is collecting? I would probably define it as “taking a collection / stream and forming it into a particular collection / data structure”. Java 8 has numerous helper methods and classes for standard collection operations, and you should use them when they apply. e.g. you can use a groupingBy collector to process a stream and group the elements by a property, producing a map, keyed off of that property, in which each value is a list of elements with that property. However, for more complex collecting, you will need to write your own collector. There are five methods in a collector:
  • supplier – returns a function, that takes no arguments, and returns an empty instance of the collection class you want to put your collected elements into. e.g. if you are ultimately collecting your elements into a set, the supplier function will return an empty set.
  • accumulator – returns a function that takes two arguments, the first is the collection that you are building up, the second is the element being processed. The accumulator function processes each element into the target collection.
  • finisher – returns a function that allows you to perform a final transformation on your collection, if required. In many cases, you won’t need to transform the collection any further, so you will just use an identity function here.
  • combiner – only required for parallel processing of your stream. If you envisage running this operation across multiple processors / cores, then your combiner contains the logic to combine results from each parallel operation.
  • characteristics – allows you to specify the characteristics of the collector so that it can be invoked safely and optimally. e.g. specifying Characteristics.IDENTITY_FINISH lets Java know that because you aren’t performing a final transformation, it doesn’t even need to invoke your finisher function.
Okay, let’s do a trivial example. I work in insurance, so I’ll create an example in this domain. Suppose I have a stream of insurance claims. The claims may be of different types, such as motor, household etc. I want to produce a map with one example claim in, for each of a list of specified claim types. This needs a supplier function that gives me an empty map to start with, and an accumulator function that simply gets the claim type, and if the map doesn’t already contain an example claim of this type, adds it in. The finisher can be the identity function. This is what it looks like:
import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import;

public class ClaimProductTypeCollector<T extends Claim> implements Collector<T,Map,Map> {

    private Set<Claim.PRODUCT_TYPE> requiredTypes = new HashSet<>();

    public Set<Claim.PRODUCT_TYPE> getRequiredTypes() {
        return requiredTypes;
    }

    public Supplier<Map> supplier() {
        return () -> new HashMap<>();
    }

    public BiConsumer<Map,T> accumulator() {
        return (map,claim) -> {
            if (map.get(claim.getProductType()) == null) {
                map.put(claim.getProductType(), claim);
            }
        };
    }

    public Function<Map, Map> finisher() {
        return Function.identity();
    }

    public BinaryOperator<Map> combiner() {
        return null;
    }

    public Set<Characteristics> characteristics() {
        return Collections.singleton(Characteristics.IDENTITY_FINISH);
    }
}
If you want to type this in and get it working as an example, here is what the claim class looks like:
public class Claim {

    public enum PRODUCT_TYPE { MOTOR, HOUSEHOLD }

    private PRODUCT_TYPE productType;

    public Claim(PRODUCT_TYPE productType) {
        this.productType = productType;
    }

    public PRODUCT_TYPE getProductType() {
        return productType;
    }

    public void setProductType(PRODUCT_TYPE productType) {
        this.productType = productType;
    }
}
Then you can test it with:
Set<Claim> claims = new HashSet<>();
claims.add(new Claim(Claim.PRODUCT_TYPE.MOTOR));
claims.add(new Claim(Claim.PRODUCT_TYPE.MOTOR));
claims.add(new Claim(Claim.PRODUCT_TYPE.MOTOR));

claims.add(new Claim(Claim.PRODUCT_TYPE.HOUSEHOLD));
claims.add(new Claim(Claim.PRODUCT_TYPE.HOUSEHOLD));

ClaimProductTypeCollector<Claim> claimProductTypeCollector = new ClaimProductTypeCollector();
Map oneClaimPerProductType =;
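As an aside: for this particular “keep the first example per key” requirement, you can get the same result without a custom collector, using the three-argument form of Collectors.toMap with a merge function. This sketch uses plain strings in place of the Claim class, just to show the shape:

```java
import java.util.Arrays;
import java.util.Map;

public class ToMapAlternative {
    public static void main(String[] args) {
        // Each "claim" is a string of the form TYPE-id
        Map<String, String> onePerType = Arrays.asList("MOTOR-1", "MOTOR-2", "HOUSEHOLD-1")
                .stream()
                .collect(Collectors.toMap(
                        s -> s.split("-")[0],     // key: the product type
                        s -> s,                   // value: the claim itself
                        (first, second) -> first  // duplicate key: keep the first claim
                ));

        System.out.println(onePerType.size());        // 2
        System.out.println(onePerType.get("MOTOR"));  // MOTOR-1
    }
}
```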
For more info on Java 8, I strongly recommend the book “Java 8 In Action”. You can get the eBook directly from the publisher, Manning:
Posted in Java | Tagged | Leave a comment