Gradle Release Plugin

Gradle has a release plugin that mimics the Maven release plugin behaviour. i.e. you specify a snapshot version in your build and the plugin can update the version to a released version, commit and push that.

The plugin is not bundled / part of Gradle core, but available here:

https://github.com/researchgate/gradle-release

Configuration

To use it, set a snapshot version in your gradle.properties file, like:

version=1.1-SNAPSHOT

You build.gradle file config should be as follows:


plugins {
    id 'java'
    id 'net.researchgate.release' version '2.8.1'
    id 'maven-publish'
}

group 'com.something'

repositories {
    mavenLocal()
    mavenCentral()
    maven {
        url = uri('http://your-maven-repo:8080/repository/your-repo-url/')
    }
}

java {
    withSourcesJar()
}

publishing {
    publications {
        maven(MavenPublication) {
            from components.java
        }
    }

    repositories {
        // define our repo for publishing.
        // Note that you do not need to include maven local here to enable publishing of snapshots
        // to the local repo
        maven {
            name = "myRepo"
            def releasesRepoUrl = "http://myRepoURL:8080/repository/releases"
            def snapshotsRepoUrl = "http://myRepoURL:8080/repository/snapshots"
            url = version.endsWith('SNAPSHOT') ? snapshotsRepoUrl : releasesRepoUrl
            allowInsecureProtocol = true
            // credentials are not stored in the project, put them in gradle.properties on your build server
            credentials(PasswordCredentials)
        }
    }
}

// publish every build to local maven, to enable local testing
publishToMavenLocal.dependsOn(check)
build.dependsOn(publishToMavenLocal)
// publish release builds to remote repo
afterReleaseBuild.dependsOn publish

Diasabling git repo caching on Bamboo

If you use the Bamboo CI server, you can get this problem. By default, Bamboo will create a cached version of a git repository on a build agent. This seems to cause a release to fail as the cache appears to be created as a non-bare git repo. (Why!?) You will get an error like this:
build   12-Jul-2023 21:25:01    > Task :CPA-GJAP-JOB1:preTagCommit FAILED
build   12-Jul-2023 21:25:01    Running [git, push, --porcelain, origin, master] produced an error: [remote: error: refusing to update checked out branch: refs/heads/master       
build   12-Jul-2023 21:25:01    remote: error: By default, updating the current branch in a non-bare repository       
build   12-Jul-2023 21:25:01    remote: is denied, because it will make the index and work tree inconsistent       
build   12-Jul-2023 21:25:01    remote: with what you pushed, and will require 'git reset --hard' to match       
build   12-Jul-2023 21:25:01    remote: the work tree to HEAD.       
build   12-Jul-2023 21:25:01    remote:
build   12-Jul-2023 21:25:01    remote: You can set the 'receive.denyCurrentBranch' configuration variable       
build   12-Jul-2023 21:25:01    remote: to 'ignore' or 'warn' in the remote repository to allow pushing into       
build   12-Jul-2023 21:25:01    remote: its current branch; however, this is not recommended unless you       
build   12-Jul-2023 21:25:01    remote: arranged to update its work tree to match what you pushed in some       
build   12-Jul-2023 21:25:01    remote: other way.       
build   12-Jul-2023 21:25:01    remote:
build   12-Jul-2023 21:25:01    remote: To squelch this message and still keep the default behaviour, set       
build   12-Jul-2023 21:25:01    remote: 'receive.denyCurrentBranch' configuration variable to 'refuse'.       
build   12-Jul-2023 21:25:01    error: failed to push some refs to 'file:///opt/atlassian/bamboo_home/local-working-dir/_git-repositories-cache/75a29889fc2f629c383438079d8c939799ad3383']

You need to go into the Bamboo admin settings for that repository (not the build plan) and deselect “Enable repository caching on agents”.

Avoiding an infinite build loop with Bamboo

A second gotcha when using Bamboo! Bamboo is hard coded to understand and ignore Maven release plugin commits. This is not the case with the Gradle release plugin! Hence without additional configuration you will get an infinite loop. You can fix this by configuring the repository settings to ignore Gradle release plugin commits.

https://stackoverflow.com/questions/48692732/gradle-release-plugin-release-task-on-bamboo-cause-infinite-loop

You need to update the Bamboo admin settings for the repository, not to trigger builds when the commit message matches:

.*?Gradle Release Plugin.*?

initScmAdapter FAILED :Current Git branch is master not main

From v3 of the plugin onwards, the default git branch has changed from master to main. If you are still using master, you can configure this:
release {
    git {
        requireBranch.set('master')
    }
}
For other Gradle posts, see:
Dependencies and Configurations in Gradle
Using test fixtures in Gradle and Maven
Code coverage with Gradle and Jacoco
Posted in Gradle, Java | Tagged , | Leave a comment

Dependencies and Configurations in Gradle

What is a Gradle configuration?

In Maven dependencies can be assigned to a given scope:
  • compile – available on all classpaths and propagated to dependent projects
  • provided – will be provided by your container
  • runtime
  • test
Gradle has a much broader concept called a configuration.

https://docs.gradle.org/current/userguide/declaring_dependencies.html

  • Unlike in Maven, in Gradle it is easy to declare your own configurations.
  • A configuration is a name which you can assign one or more files to.
  • Configurations have producers, which assign files to them, and consumers, which use them.

Different configurations are defined in different plugins so if you try to use a configuration and get an error, confirm you have the plugin available in your module. e.g. https://docs.gradle.org/current/userguide/java_library_plugin.html

Common configurations:

Configuration Description
api Will be on compile classpath for this module, and for all subsequent modules
implementation Will be on compile classpath for this module, and runtime classpath for whole app, but NOT on compile classpath for later modules. This prevents developers from accidentally coding against a library that is used internally in one module.
compileOnly
runtimeOnly
testImplementation
testApi
testFixturesImplementation Used when defining a test fixture. See Using test fixtures in Gradle and Maven
testFixturesApi
testFixtures When consuming a test from another module. See Using test fixtures in Gradle and Maven
annotationProcessor For annotation processing in the compile phase.

Dependency resolution strategy and automatic dependency upgrading

Gradle performs optimistic dependency upgrading. This can cause confusion because you may have a version of a dependency explicitly specified, but if a transitive dependency is a higher version, Gradle will take the higher version. This is not what Maven would do – it would always take the explicitly specified version.

However in Gradle you can override the default dependency resolution strategy. There are multiple ways to do this:

Global fail if any conflict

configurations.all {
    resolutionStrategy {
        failOnVersionConflict()
    }
}

Custom resolution

e.g. fix version
configurations.all {
    resolutionStrategy.eachDependency { DependencyResolveDetails details ->
        if (details.requested.group == 'io.projectreactor' && details.requested.name == 'reactor-core' && details.requested.version == '3.3.0.M1') {
            details.useVersion '3.2.10.RELEASE'
            details.because 'the original version which comes with spring-integration:5.2.0.M2 is no longer available'
        }
See: https://docs.gradle.org/current/userguide/resolution_strategy_tuning.html
Posted in Gradle, Java | Tagged , | Leave a comment

Type conversion and NoTypeConversionAvailableException in Apache Camel

In Camel, type conversion of a message body can be done using the tag:

<convertBodyTo type="com.something.SomeClass"/>

Camel docs on type converter:

https://camel.apache.org/manual/type-converter.html

Type converters are loaded from the classpath, so the imports of your OSGI module will affect what converters are available. Under the covers the way the type converters are loaded is by the DefaultTypeConverter class. You can put a breakpoint in its doInit method to see them being loaded. Or if you want to turn on logging for it, the fully qualified class name is:

org.apache.camel.impl.converter.DefaultTypeConverter

Camel has the class

org.apache.camel.support.SimpleTypeConverter

This is not actually a converter itself, rather you instantiate an instance of this with a class (or lambda) that implements the ConversionMethod interface.

To see logging of the type converters being loaded, add the following to your log file:

log4j.logger.org.apache.camel.impl.converter.BaseTypeConverterRegistry=DEBUG
log4j.logger.org.apache.camel.impl.converter.CoreTypeConverterRegistry=TRACE

When I debugging this, the type converter loader classes I saw were:

org.apache.camel.component.mail.MailConvertersLoader – loads 1 fallback
org.apache.camel.component.cxf.converter.CxfConverterLoader – loads 1 fallback
org.apache.camel.component.cxf.converter.CxfPayloadConverterLoader – loads 1 fallback
org.apache.camel.converter.stream.StreamCacheBulkConverterLoader
org.apache.camel.attachment.AttachmentConverterLoader
org.apache.camel.spring.converter.ResourceConverterLoader
org.apache.camel.converter.jaxb.FallbackTypeConverterLoader – loads 1 fallback
org.apache.camel.converter.jaxp.CamelXmlJaxpBulkConverterLoader
org.apache.camel.component.activemq.converter.ActiveMQConverterLoader
org.apache.camel.component.activemq.converter.ActiveMQMessageConverterLoader
org.apache.camel.component.jackson.converter.JacksonTypeConvertersLoader – loads 1 fallback
org.apache.camel.converter.CamelBaseBulkConverterLoader
org.apache.camel.component.file.GenericFileConverterLoader – loads 1 fallback
org.apache.camel.component.http.HttpEntityConverterLoader
org.apache.camel.http.common.HttpConverterLoader
org.apache.camel.component.jetty.JettyConverterLoader – loads 1 fallback

Converter classes

To understand and debug type conversion problems, it is important to understand the different converters and what conversions they support.
JAXB converters are used for converting between objects and XML:

org.apache.camel.converter.jaxb.FallbackTypeConverter – this converts between objects and XML. For unmarshalling, it accepts any Object type and unmarshals it to a JAXB Java class. If you look at the internal code, you will see the object must be convertible to either InputStream, Reader or Source.

CXF converters for web services:

org.apache.camel.component.cxf.converter.CxfPayloadConverter – outgoing

org.apache.camel.component.cxf.converter.CxfConverter – incoming. Note that this only operates on input of type org.apache.cxf.message.MessageContentsList, which is what is returned from a CXF call. Once this initial conversion has taken place and the message body has been changed to a different type, any further conversions done on the message will not go through this converter.

Common type conversion problems

Problem Solution
Converter class is not available at all Library is missing from POM
Converter works in integration tests, fails when module is run in Karaf Import missing from POM – any library jars provided by Karaf will be imported, not packaged into the module.
JAXB conversion failing Multiple possible reasons:

  1. Input type is not convertible. May need to convert to String before doing real conversion.
  2. Java class has not been generated at all. Check pom setup for the CXF code gen or maven JAXB plugin.
  3. Java class is not annotated as a root class. This can be fixed with a custom JAXB binding.
Posted in Camel, Java | Tagged , | Leave a comment

Using test fixtures in Gradle and Maven

In Maven if you want to reuse text fixtures from one module in another module, you use the jar plugin and build a test jar. You have to specify what classes and other resources to include. Gradle has a dedicated plugin to handle this. Apply the plugin:
plugins {
  id ’java-test-fixtures'
}
Create a directory src/testFixtures/java for your classes. You can add these test fixtures to the classpath of another module as follows:
testImplementation(testFixtures(project(':some-module:some-sub-module')))
For more info, see the Gradle docs on java testing: https://docs.gradle.org/current/userguide/java_testing.html

If you still need to support the Maven build at the same time, you must make two changes, to make both resources and java classes available to your test jar build. Resources can be specified in the top level of your build config:

<build>
    <testResources>
        <testResource>
            <directory>src/test/resources</directory>
        </testResource>
        <testResource>
            <directory>src/testFixtures/resources</directory>
        </testResource>
    </testResources>
To support a new test source location, you must use the Maven build helper plugin:
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>build-helper-maven-plugin</artifactId>
    <executions>
        <execution>
            <id>add-gradle-testfixtures</id>
            <phase>generate-test-sources</phase>
            <goals>
                <goal>add-test-source</goal>
            </goals>
            <configuration>
                <sources>
                    <source>src/testFixtures/java</source>
                </sources>
            </configuration>
        </execution>
    </executions>
</plugin>
Posted in Gradle, Java | Tagged , | Leave a comment

Memory leak with Spring integration test context caching

I’ve recently had to debug another out of memory with our Spring integration tests. The tests are configured to create a heapdump on out of memory so I loaded it up with the Eclipse Memory Analyser and this is what I saw:

This shows that there are six instances of the Hibernate session factory, taking 200Mb each. For our tests, there should be two. Spring caches the test context, and we have two different contexts, so there should be once instance of each. The above output suggests new contexts were being created for some tests, but why? I looked through the log output and then opened the tests that caused a new Spring context to be created, and discovered they were configured like this:

@ContextConfiguration(locations = {"classpath:main-test-context.xml", "classpath:another-test-context.xml"})
@Rollback
@Test(groups = "integration")
public class SomeIntegrationTest {

This test uses the main test context, but adds some more beans from a second context file. The problem is that because it is using a single @ContextConfiguration annotation, this is treated as a new context, so the entire thing is created from scratch. To achieve reuse of the existing context, you need to define a context hierarchy, like this:

@ContextHierarchy({
    @ContextConfiguration("classpath:main-test-context.xml"), 
    @ContextConfiguration"classpath:another-test-context.xml")
})
@Rollback
@Test(groups = "integration")
public class SomeIntegrationTest {

Note that this is different however – you are no longer getting a context which is merge of the two sets of beans, but rather a parent and child context. The consquences being:

  1. If you have a Spring property bean in the parent context, beans in the child context will not get injected with those properties. You will need to add another property bean in the child context and point it at the relevant property files.
  2. If you override beans in the child context and they are intended to be injected into beans defined in the parent context, this will no longer work. I fixed this by updating the relevant tests to manually set the overridden beans into the relevant classes in an @BeforeClass method, then reset them back in an @AfterClass method.

We also had one test defined using a wildcard asterisk in the context name, like this:

@ContextConfiguration(locations = {"classpath*:main-test-context.xml"})
@Rollback
@Test(groups = "integration")
public class SomeIntegrationTest {

This was also treated as a new context!

After making these changes, I was able to verify that only two session factories were created, so the contexts were being retrieved from the cache correctly.

Posted in Java, Spring, Testing | Tagged , , | Leave a comment

Enforcing Spring role based security with a custom PMD rule

Spring makes it easy to have role based security with its @PreAuthorize annotation. But how do you make sure that developers remember to add a security annotation? I like using PMD to do static analysis checks on our code and one of the great things about static analysis tools like PMD is that as well as using the built in rules, you can write your own rules tailored to your software. In this case, I’d like a rule that does the following:
  • Checks if a Java class is known to be an http endpoint.
  • If so, checks if it is on a whitelist of non-secure endpoints.
  • If not, creates a PMD rule violation if there is no @PreAuthorize on the appropriate method.
In our case, we use GWT (Google Web Toolkit) for most of our UI. The requests actually go to a single endpoint, but we use the command pattern to send the command to an appropriate handler. Thus it is the execute method on the handler that needs to be secured. I came up with the following code:
import net.sourceforge.pmd.RuleContext;
import net.sourceforge.pmd.lang.java.ast.ASTClassOrInterfaceDeclaration;
import net.sourceforge.pmd.lang.java.ast.ASTMethodDeclaration;
import net.sourceforge.pmd.lang.java.ast.ASTPackageDeclaration;
import net.sourceforge.pmd.lang.java.rule.AbstractJavaRule;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

/** Looks for http endpoints which do not have an @PreAuthorize annotation and are not listed in
 * a whitelist file of unsecured endpoints.
 */
public class UnsecuredHttpEndpointRule extends AbstractJavaRule {

    private static final String UNSECURED_HTTP_ENDPOINT_MESSAGE =
            "HTTP endpoint class missing @PreAuthorize annotation with role and not in unsecured endpoints whitelist";

    private static Set<String> UNSECURED_ENDPOINT_WHITELIST = new HashSet<>();
    private String packageName;

    static {
        try (BufferedReader fileReader = new BufferedReader(
                new InputStreamReader(UnsecuredHttpEndpointRule.class.getClassLoader()
                        .getResourceAsStream("unsecured-http-endpoints-whitelist.txt")));) {
            String line = fileReader.readLine();
            while (line != null) {
                if (!(line.startsWith("//") || line.startsWith("#"))) {
                    UNSECURED_ENDPOINT_WHITELIST.add(line);
                }
                line = fileReader.readLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public Object visit(ASTPackageDeclaration node, Object data) {
        packageName = node.getName();
        return super.visit(node, data);
    }

    @Override
    public Object visit(ASTClassOrInterfaceDeclaration node, Object data) {
        String className = node.getQualifiedName().getClassSimpleName();
        String fullyQualifiedClassName = packageName != null ? packageName + "." + className : className;
        if (className.endsWith("Handler")
                && !UNSECURED_ENDPOINT_WHITELIST.contains(fullyQualifiedClassName)) {
            return super.visit(node, data);
        }
        // if the class is not an http endpoint, or is on the whitelist, there is no need
        // to carry on down the abstract syntax tree, so just return null
        return null;
    }

    @Override
    public Object visit(ASTMethodDeclaration node, Object data) {
        if (node.getName().equals("execute") && !node.isAnnotationPresent("PreAuthorize")) {
            addViolationWithMessage(data, node, UNSECURED_HTTP_ENDPOINT_MESSAGE);
        }
        return super.visit(node, data);
    }

    @Override
    public void end(RuleContext ctx) {
        packageName = null;
    }


}

The code loads a whitelist of unsecured classes. You can see I’ve done this in a static block as there may be multiple instances of a PMD rule instantiated as PMD could be using multiple threads (see the PMD docs on Java rules at Writing Java Rules).
Posted in Java, PMD, Spring, Static Analysis | Tagged , , , | Leave a comment

Hibernate Best Practices – using sequences for bulk inserts

For many database tables, an auto increment column is used for the primary key. This is fine for inserting single values, but if you want to bulk insert hundreds or thousands of values and you need to refer to the new id columns for inserting other data, this is inefficient as Hibernate has to perform each one at a time to get the generated id. You can avoid this by using a database sequence to generate the ids. Then Hibernate can get a set of, say, 500 ids, and use them in newly created objects and other objects that link to them, and safely bulk insert them. Sample code:
@Entity
@Table(name = "my_entity")
public class MyEntity {
    @Id
    @GeneratedValue(generator = "my_entity_generator")
    @GenericGenerator(
            name = "my_entity_generator",
            strategy = "enhanced-sequence",
            parameters = {
                    @org.hibernate.annotations.Parameter(name="sequence_name", value = "my_entity_seq"),
                    @org.hibernate.annotations.Parameter(name="increment_size", value = "500"),
                    @org.hibernate.annotations.Parameter(name="optimizer", value = "pooled-lo")
            })
    private Long id;
}
Posted in Hibernate, Java, Uncategorized | Tagged , | Leave a comment

Freeing space in SQL Server

When running my local development environment, I frequently need to free up space in SQL Server. How is this done?

Firstly, ff there are databases you don’t need, delete them with:

ALTER DATABASE your_db SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE your_db

Secondly, if you know there are particular tables you don’t need the data for, you can truncate them. The truncate command is much faster than delete, as delete operates row by row, and also writes to the transaction log.

After deleting data, you must shrink the database files to reclaim the space. Start by checking the names of the database and the transaction log:

select file_name(1),file_name(2)

You need to shrink the transaction log first, as the act of shrinking a database actually writes to the transaction log, so could fail if the log is full!

DBCC SHRINKFILE ('your_database_log')

Then the database itself. Check size with:

SELECT DB_NAME() AS DbName,
    name AS FileName,
    type_desc,
    size/128.0 AS CurrentSizeMB, 
    size/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS INT)/128.0 AS FreeSpaceMB
FROM sys.database_files
WHERE type IN (0,1)

Then shrink:

DBCC SHRINKFILE ('your_database')
Posted in SQL Server | Tagged , | Leave a comment

Code coverage with Gradle and Jacoco

Recently I’ve been trying to configure code coverage with Gradle and Jacoco. I started on the official docs page here:

https://docs.gradle.org/current/userguide/jacoco_plugin.html

However, this page wasn’t that helpful. One key requirement I have is that I want to be able to enable and disable Jacoco. For many Gradle plugins you can specify “apply false” when you add them to your Gradle build file, like this:

plugins {
    id 'java-platform'
    id 'maven-publish'
    id 'org.jetbrains.gradle.plugin.idea-ext' version '1.0'
    id 'org.owasp.dependencycheck' version '5.3.2.1' apply false
    id 'com.github.spotbugs' version '4.5.0' apply false
    id 'com.intershop.gradle.jaxb' version '4.4.2' apply false
}

This doesn’t work for the Jacoco plugin! The Jacoco plugin is a “built in” one, which means that as soon as it is on the classpath, it will instrument your class files, even if you don’t specify to run the Jacoco coverage report. Thankfully I found a good blog post here:

https://poetengineer.postach.io/post/how-to-conditionally-enable-disable-jacoco-in-gradle-build

I largely followed those steps to add a separate jacoco.gradle file, with my contents being:

apply plugin: 'jacoco'

jacoco {
    toolVersion = '0.8.6'
}

test {
    jacoco {
        // we have to exclude some classes from byte code instrumentation
        // the standard instrumentation adds a boolean array to a class called $jacocoData
        excludes += ['**/SomeClass', 
                '**/AnotherClass'
        ]
    }
}

Then in the main build.gradle file I enabled this on required sub-projects with:

ext.jacocoEnabled = System.properties.getProperty("jacocoEnabled") ?: "false"

subprojects {
    if (some logic here) {
        // only apply JaCoCo code coverage if the -PjacocoEnabled flag has been set.
        if (jacocoEnabled.toBoolean()) {
            apply from: rootProject.file("jacoco.gradle")
        }

After this I was able to run Jacoco and get a coverage report with:

./gradlew build jacocoTestReport -DjacocoEnabled

You may need to exclude some files from instrumentation. e.g. tests / code that use reflection to inspect objects may not run correctly if the code is instrumented. I found it very useful to be able to rerun a single test in a single sub-module after updating the excludes list in the jacoco.gradle file. You can do that with a command like this:

./gradlew :module-name:sub-module-name:clean :module-name:sub-module-name:test --tests="com.orgname.YourTest" -PjacocoEnabled=true
Posted in Gradle | Tagged | Leave a comment

Java XML processing with JAXB and special characters

When processing XML, it is important to understand that there are two kinds of special characters:
  • Characters that are permitted, but must either be escaped or inside a CDATA block.
  • Control characters that are not permitted at all.

Characters that fall into the first category are:

  • & – can be replaced by &amp;
  • < - can be replaced by &lt;
  • > – can be replaced by &gt;
  • ” – can be replaced by &quot;
  • ‘ – can be replaced by &apos;

If you are creating XML with Java and JAXB, you do not need to do anything to escape these characters, it will be done automatically.

What about control characters that are not allowed at all? The specification is here: https://www.w3.org/TR/REC-xml/#NT-Char

The range of valid characters is defined as:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] 
This means that all of the first 32 ASCII characters are prohibited, except:
  • 9 = horizontal tab
  • xA = 10 = line feed
  • xD = 13 = carriage return

Bizarrely, JAXB will allow you to generate XML even when your input contains control characters. Any control character will hit the default escape mechanism and be converted to its hex representation. e.g. control character 2 becomes:

&#2;

This means you can generate XML with JAXB that you cannot read in with JAXB, as these characters will cause a hard failure!!! How can you deal with this? If you Google this problem, you will find a number of posts about writing your own escape handler. For example: https://stackoverflow.com/questions/4435934/handling-xml-escape-characters-e-g-quotes-using-jaxb-marshaller

I investigated this and found two important points:

Firstly, I found two implementations of JAXB on my classpath, both in the com.sun package. One was the “internal” one, the other was the Reference Implementation (RI). This is important because when trying to override the escape handler, you need specify the correct package in the property name, and obviously extend the correct class. If you aren’t picking up the RI by default, you can force this by setting a startup property:

javax.xml.bind.context.factory=com.sun.xml.bind.v2.ContextFactory

Secondly, custom escape handlers won’t be invoked at all if you call the method

marshal( Object jaxbElement, javax.xml.transform.Result result )
While the most common use case for creating XML involves calling marshal, we are also applying an XSLT, so we call the above method. This uses a completely different writer which does not use the standard escape handler! Hence I had to look for another place to intercept the character stream. We were creating a transformer like this:
SAXTransformerFactory stf = new com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl();
TransformerHandler transformerHandler = stf.newTransformerHandler();

We cannot alter the real transformer, but we can wrap it in another transformer, that can strip the control characters, so what I wrote is below. In this implementation, I have only blocked the low value ASCII control characters, as those are the ones actually appearing in our input data. It would easy to extend the method to block the high range characters if you want to.

import org.xml.sax.*;

import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.TransformerHandler;
import java.util.HashSet;
import java.util.Set;

public class ControlStrippingTransformerHandler implements TransformerHandler {

    private TransformerHandler transformerHandler;

    public ControlStrippingTransformerHandler(TransformerHandler transformerHandler) {
        this.transformerHandler = transformerHandler;
    }

    // this is the only method we need to override!!
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // remove dodgy control characters
        // for performance, we only want to create a new array if there are characters to strip
        // we only want to initialise the set of characters if there are any
        Set<Integer> charactersToStrip = null;
        for (int i = start; i < start + length; i++) {
            if (isControlCharacter(ch[i])) {
                if (charactersToStrip == null) {
                    charactersToStrip = new HashSet<>();
                }
                charactersToStrip.add(i);
            }
        }
        if (charactersToStrip != null && charactersToStrip.size() > 0) {
            // this array only needs to be the specific section of the input string
            // at most this array will be 255 characters in length, since the
            // input SAX class has a 256 character array
            char[] newArray = new char[length - charactersToStrip.size()];
            int newArrayIndex = 0;
            for (int i = start; i < start + length; i++) {
                if (!charactersToStrip.contains(i)) {
                    newArray[newArrayIndex] = ch[i];
                    newArrayIndex++;
                }
            }
            ch = newArray;
            start = 0;
            length = length - charactersToStrip.size();
        }

        transformerHandler.characters(ch, start, length);
    }

    private boolean isControlCharacter(char c) {
        if (c < 9 || c == 11 || c == 12 || (c > 13 && c < 32)) {
            return true;
        }
        return false;
    }

    // other methods just call the wrapped transformer methods
}
Posted in Java, XML | Tagged , | Leave a comment