A first look at Spring Batch

October 10, 2008 · Posted in Java 

Spring-Batch is a rather new project within the Spring portfolio. It addresses a large field within computing, although not a mainstream one in Java. A lot of corporate computing is handled by batch processing, with many business transactions based on file input picked up from FTP drop zones and the like.

Back in 2003, I built a batch-oriented system that could deal with FLV (Cobol) files, assemble transaction data from a database, generate reports in various formats and push them out over FTP, HTTPS or mail. One offspring of that project is my library for reading and writing FLV files.

It is therefore a déjà vu to reconnect to the ideas and principles behind Spring-Batch. As is common for Spring projects, it solves more than one design problem and provides a smorgasbord of solutions. The only drawback is the lack of introductory reading material, which makes the learning curve steeper than it needs to be. So, let’s fill that gap.

A little bit of theory

Spring-Batch can be subdivided into two areas, which you can use separately: the first is item handling and the second is batch execution.

Item Handling

Let’s start by discussing item handling. This means reading and interpreting file or database contents and writing them to a file or database. In this area SpringBatch really shines. During interpretation you typically want to create business objects, then operate on and transform them. SpringBatch comes with support for flat files, structured files and database access.

A flat file is either a CSV (Character Separated Values) or an FLV (Fixed Length Values) file. A structured file is, for example, XML. Typically you combine a tokenizer with a mapper that produces business objects, and the same pair works in the other direction for writing. A tokenizer understands the file format and shields the rest of the application from it, making it easy to swap file formats. This is more common than you might expect, because transaction data suppliers often deliver in various obscure file formats.
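The split between tokenizer and mapper can be sketched in plain Java. This is only an illustration of the idea, not the SpringBatch API: the names LineTokenizer, FieldMapper, SeparatedTokenizer, FixedLengthTokenizer and Payment are made up for this sketch, and the CSV handling ignores quoting and escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TokenizerSketch {

    /** Hypothetical interface: only the tokenizer knows the file format. */
    interface LineTokenizer {
        List<String> tokenize(String line);
    }

    /** Hypothetical interface: turns the fields of one line into a business object. */
    interface FieldMapper<T> {
        T map(List<String> fields);
    }

    /** CSV flavour: fields separated by one character (no quoting or escaping). */
    static class SeparatedTokenizer implements LineTokenizer {
        private final String separatorRegex;

        SeparatedTokenizer(char separator) {
            this.separatorRegex = Pattern.quote(String.valueOf(separator));
        }

        public List<String> tokenize(String line) {
            List<String> fields = new ArrayList<String>();
            for (String field : line.split(separatorRegex, -1)) {
                fields.add(field.trim());
            }
            return fields;
        }
    }

    /** FLV flavour: each field occupies a fixed number of columns. */
    static class FixedLengthTokenizer implements LineTokenizer {
        private final int[] widths;

        FixedLengthTokenizer(int... widths) {
            this.widths = widths;
        }

        public List<String> tokenize(String line) {
            List<String> fields = new ArrayList<String>();
            int pos = 0;
            for (int width : widths) {
                int from = Math.min(pos, line.length());
                int to = Math.min(pos + width, line.length());
                fields.add(line.substring(from, to).trim());
                pos += width;
            }
            return fields;
        }
    }

    /** Example business object, invented for this sketch. */
    static class Payment {
        final String account;
        final long amount;

        Payment(String account, long amount) {
            this.account = account;
            this.amount = amount;
        }
    }

    /** The mapper stays the same whichever tokenizer produced the fields. */
    static final FieldMapper<Payment> PAYMENT_MAPPER = new FieldMapper<Payment>() {
        public Payment map(List<String> fields) {
            return new Payment(fields.get(0), Long.parseLong(fields.get(1)));
        }
    };

    static Payment parse(LineTokenizer tokenizer, String line) {
        return PAYMENT_MAPPER.map(tokenizer.tokenize(line));
    }

    public static void main(String[] args) {
        Payment fromCsv = parse(new SeparatedTokenizer(';'), "ACC-1;1250");
        Payment fromFlv = parse(new FixedLengthTokenizer(10, 7), "ACC-1     0001250");
        System.out.println(fromCsv.account + " " + fromCsv.amount);
        System.out.println(fromFlv.account + " " + fromFlv.amount);
    }
}
```

Swapping the file format only means swapping the tokenizer passed to parse; the mapper and the business object are untouched.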

Item handling is the easy-to-understand part of SpringBatch. And, as I said above, you can use it as is without touching the other part: batch execution.

Batch Execution

For batch execution in general, and within SpringBatch in particular, you design a solution around the concept of a job, which is a named sequence of steps. A step is a chunk of work, typically processing an input file.

One important requirement for batch processing is operational monitoring and management, which in practice means the ability to track individual step instances and, if needed, restart a step (or job) and continue from where it left off. Traditional logging is not sufficient to fulfill this requirement, so batch processing is surrounded by lots of tracking logic, and the execution progress is tracked and persisted to a database.
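To make the restart idea concrete, here is a minimal plain-Java sketch. It is not how SpringBatch implements it: the method runStep, the step name and the failAt parameter are invented for illustration, and a HashMap stands in for the database table that would hold the progress.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestartSketch {

    /**
     * Processes the input from the last recorded position onwards.
     * The checkpoints map is a stand-in for a database table; failAt
     * simulates a crash at that index (use -1 for no failure).
     */
    static void runStep(String stepName, List<String> input, List<String> output,
                        Map<String, Integer> checkpoints, int failAt) {
        Integer saved = checkpoints.get(stepName);
        int start = (saved == null) ? 0 : saved.intValue();
        for (int i = start; i < input.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated failure at item " + i);
            }
            output.add(input.get(i).toUpperCase());
            checkpoints.put(stepName, i + 1); // record that item i is done
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d");
        List<String> output = new ArrayList<String>();
        Map<String, Integer> checkpoints = new HashMap<String, Integer>();
        try {
            runStep("demoStep", input, output, checkpoints, 2); // first attempt fails at index 2
        } catch (IllegalStateException e) {
            System.out.println("failed, checkpoint = " + checkpoints.get("demoStep"));
        }
        runStep("demoStep", input, output, checkpoints, -1); // restart continues at index 2
        System.out.println(output);
    }
}
```

Because the position is recorded outside the running process, the second call skips the items that were already handled instead of starting over.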

With this as background, it is easier to understand and approach SpringBatch. It also becomes easier to decide whether you need both components or whether the item handling part is sufficient. For every non-trivial batch application, you will end up with a loop over the input data anyway, so why not give the batch execution part a chance as well? “Nuf talking, show me the code”

The Job Model

You organize a SpringBatch application into one or more jobs. If you have more than one job, they typically share a lot of functionality when it comes to step and item processing logic. The required infrastructure is a job repository, a job launcher (runner) and a transaction manager. The latter is needed for committing chunks of work.

This is the intended way of working - the model. On the other hand, at least during investigation and early development (and maybe later as well), you have no need for transaction management and persisted execution tracking. The good news is you can fake it, which is exactly what we will do below.

Hello SpringBatch

The (first) SpringBatch application will be a minimal ‘hello world’ application, just to demonstrate what is required to get something up and running. I will use Maven, because it hides all the tedious tasks of managing all 3rd party libraries SpringBatch depends on.

Writer

The initial version contains only one single very small Java class. It prints out its input using Log4j. Let’s start with this one, so we can move on to the interesting parts.

package com.ribomation.tutorial;

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.FlushFailedException;
import org.springframework.batch.item.ClearFailedException;
import org.apache.log4j.Logger;

/** Writes every item it receives to the log at INFO level. */
public class LogWriter implements ItemWriter {
    private final Logger log = Logger.getLogger(this.getClass());

    public void write(Object item) throws Exception {
        log.info(item);
    }

    // Nothing is buffered, so there is nothing to flush or clear.
    public void flush() throws FlushFailedException { }
    public void clear() throws ClearFailedException { }
}

Reader

The writer above serves as the end-point of a chain of components that starts with a reader, which in our case reads items from a list. Here is the list definition as given in a Spring beans configuration file.

    <bean id="reader" class="org.springframework.batch.item.support.ListItemReader">
        <constructor-arg>
            <list>
                <value>Hello</value>
                <value>Spring</value>
                <value>Batch</value>
            </list>
        </constructor-arg>
    </bean>

    <bean id="writer" class="com.ribomation.tutorial.LogWriter"/>

You can see the input side is a Reader and the output side is a Writer. For every non-toy application these abstractions are tied to files and/or databases. But leave that out for the moment. So what happens in between?

Step

A step is a chunk of work, for example reading the items from the list one at a time and sending them to the writer. As I said above, SpringBatch supports a heavy-weight execution model intended to track step instance executions and support restarts. For this reason, the configuration of a trivial step is more complex than expected. You will need a transaction manager and a job repository, in addition to the two more obvious components, the reader and the writer. Here is our Spring snippet:

    <bean id="helloStep" class="org.springframework.batch.core.step.item.SimpleStepFactoryBean">
        <property name="transactionManager" ref="tm"/>
        <property name="jobRepository" ref="jobRepository"/>
        <property name="itemReader" ref="reader"/>
        <property name="itemWriter" ref="writer"/>
    </bean>

You can see it uses a factory bean to create the actual step behind the scenes. There are several intricate ways to create a step, but this will do for the moment.

Job

A job is a sequence of steps, which means each step is run to completion before the next is started. (Support for concurrent execution of steps is around the corner.) The easiest way to create a job is to (re-)use a SimpleJob. In our case, it has just one single step.

    <bean id="helloJob" class="org.springframework.batch.core.job.SimpleJob">
        <property name="jobRepository" ref="jobRepository"/>
        <property name="steps">
            <list>
                <ref bean="helloStep"/>
            </list>
        </property>
    </bean>
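Conceptually, a sequential job boils down to a loop over its steps. The plain-Java sketch below only illustrates that idea and is not SimpleJob's actual implementation: the Step interface and runJob method are invented here, and stopping the job at the first failing step is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SequentialJobSketch {

    /** Hypothetical stand-in for a step: one chunk of work. */
    interface Step {
        void execute() throws Exception;
    }

    /** Runs the named steps in order and returns the names of those that completed. */
    static List<String> runJob(Map<String, Step> steps) {
        List<String> completed = new ArrayList<String>();
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            try {
                entry.getValue().execute(); // runs to completion before the next step starts
            } catch (Exception e) {
                break; // assumption of this sketch: a failed step stops the job
            }
            completed.add(entry.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        Map<String, Step> steps = new LinkedHashMap<String, Step>(); // keeps insertion order
        steps.put("load", () -> System.out.println("loading"));
        steps.put("report", () -> System.out.println("reporting"));
        System.out.println(runJob(steps));
    }
}
```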

The listings above capture all our application logic. What remains is the batch and build execution infrastructure.

Repo, Launcher and TM

In this toy application we do not need persistent execution tracking support and will use fake components. The job repository will store its job and steps in a hash map and the transaction manager used will be just empty (view its source code).

    <bean id="tm" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>

    <bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="tm"/>
    </bean>

    <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository"/>
    </bean>

That’s it. This is all the Spring wiring needed. What remains is the Maven POM.

Maven POM

I will not digress into Maven here, just leave it as is. The POM lists the required dependencies and adds a few nice-to-have plugins. For example, the dependency plugin assembles all 3rd party JAR files into a sub-directory, and the jar plugin sets the class path to this lib directory and points out the main entry point, so we can run the artifact from the command line. The main class is CommandLineJobRunner, which is a small boot-strapper that loads a Spring beans configuration and kicks off the job launcher.

Without further ado, here it is.

<?xml version="1.0" encoding="iso-8859-1"?>
<project
        xmlns="http://maven.apache.org/POM/4.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <name>HelloSpringBatch</name>
    <groupId>com.ribomation.tutorial</groupId>
    <artifactId>${project.name}</artifactId>
    <packaging>jar</packaging>
    <version>1.0</version>

    <properties>
        <javaVersion>1.5</javaVersion>
        <springBatchVersion>1.1.1.RELEASE</springBatchVersion>
        <springDaoVersion>2.0.8</springDaoVersion>
        <springVersion>2.5.5</springVersion>
        <log4jVersion>1.2.14</log4jVersion>
        <appClass>org.springframework.batch.core.launch.support.CommandLineJobRunner</appClass>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>${springVersion}</version>
        </dependency>

        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-beans</artifactId>
            <version>${springVersion}</version>
        </dependency>

        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-dao</artifactId>
            <version>${springDaoVersion}</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-core</artifactId>
            <version>${springBatchVersion}</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-infrastructure</artifactId>
            <version>${springBatchVersion}</version>
        </dependency>

        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>${log4jVersion}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-idea-plugin</artifactId>
                <configuration>
                    <jdkLevel>${javaVersion}</jdkLevel>
                    <downloadSources>true</downloadSources>
                    <downloadJavadocs>true</downloadJavadocs>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${javaVersion}</source>
                    <target>${javaVersion}</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <index>true</index>
                        <manifest>
                            <mainClass>${appClass}</mainClass>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/lib</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

Log4j

We will also need a minimal log4j configuration file (log4j.properties), which resides together with the hello-spring-batch.xml configuration file in the src/main/resources directory of our Maven project.

log4j.rootLogger=info, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss} %5p [%c{1}] %m%n

log4j.logger.org.springframework=warn
log4j.logger.org.springframework.batch=info
log4j.logger.com.ribomation.tutorial=debug

Compilation and Execution

Build the maven application using

mvn package

Run it using the command below

java -jar target\HelloSpringBatch-1.0.jar hello-spring-batch.xml helloJob

As you can see, we run the executable JAR file, with two required command line parameters. The first points to the spring beans file in the class path, and the second parameter is the name of the job to run. The output of the execution looks like this

13:17:26  INFO [SimpleJobLauncher] No TaskExecutor has been set, defaulting to synchronous executor.
13:17:26  INFO [SimpleStepFactoryBean] Setting commit interval to default value (1)
13:17:26  INFO [SimpleJobLauncher] Job: [SimpleJob: [name=helloJob]] launched with the following parameters: [{}{}{}{}]
13:17:26  INFO [LogWriter] Hello
13:17:26  INFO [LogWriter] Spring
13:17:26  INFO [LogWriter] Batch
13:17:26  INFO [SimpleJobLauncher] Job: [SimpleJob: [name=helloJob]] completed successfully with the following parameters: [{}{}{}{}]

Let’s take one step back and review our (toy) application. SpringBatch has taken care of all plumbing code to iterate over an input source and invoke various components leaving us to concentrate on the core business - in our case just to print it out to the console.

A minor variation

Before I close this posting, let’s add one small variation. In the initial version of our hello application, we just let the data (items) pass on to the output writer. This is clearly not realistic - if we for the moment ignore the list input and stuff. Typically, the items need to be processed and/or transformed in some way. One possibility is to attach an item transformer to the writer.

Item Transformer

The class below takes care of transforming the input item (a string) into another item (an upper case string).

package com.ribomation.tutorial;
import org.springframework.batch.item.transform.ItemTransformer;

public class UpperCaseTransformer implements ItemTransformer {
    public Object transform(Object item) throws Exception {
        return item.toString().toUpperCase();
    }
}

The next step is to attach the transformer to the Writer, using an ItemTransformerItemWriter, which is a delegating writer combined with a transformer invoker.

    <bean id="transformingWriter" class="org.springframework.batch.item.transform.ItemTransformerItemWriter">
        <property name="itemTransformer">
            <bean class="com.ribomation.tutorial.UpperCaseTransformer"/>
        </property>
        <property name="delegate" ref="writer"/>
    </bean>

Complicated? No, not really. It first invokes the transformer object, and then the writer sending it the transformed item. The only remaining task is to update the step definition, to use the transformingWriter. If you allow me, I leave that as an exercise for you. The only difference in the output is the item strings are now in upper case.

16:16:36  INFO [LogWriter] HELLO
16:16:36  INFO [LogWriter] SPRING
16:16:36  INFO [LogWriter] BATCH
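The delegation performed by the transforming writer can be sketched in a few lines of plain Java. The Transformer, Writer and TransformingWriter types below are simplified, hypothetical stand-ins written for this illustration (without checked exceptions); they are not the SpringBatch classes themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TransformingWriterSketch {

    /** Hypothetical, simplified stand-ins for the transformer and writer roles. */
    interface Transformer {
        Object transform(Object item);
    }

    interface Writer {
        void write(Object item);
    }

    /** First transforms the item, then hands the result to the delegate writer. */
    static class TransformingWriter implements Writer {
        private final Transformer transformer;
        private final Writer delegate;

        TransformingWriter(Transformer transformer, Writer delegate) {
            this.transformer = transformer;
            this.delegate = delegate;
        }

        public void write(Object item) {
            delegate.write(transformer.transform(item));
        }
    }

    /** Pushes the items through an upper-casing writer and returns what reached the delegate. */
    static List<Object> writeAll(List<String> items) {
        final List<Object> sink = new ArrayList<Object>();
        Writer collecting = new Writer() {
            public void write(Object item) {
                sink.add(item);
            }
        };
        Transformer upperCase = new Transformer() {
            public Object transform(Object item) {
                return item.toString().toUpperCase();
            }
        };
        Writer writer = new TransformingWriter(upperCase, collecting);
        for (String item : items) {
            writer.write(item);
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(writeAll(Arrays.asList("Hello", "Spring", "Batch")));
    }
}
```

The delegate never learns that a transformation took place, which is why the LogWriter from the first version can be reused unchanged.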

Comments

4 Responses to “A first look at Spring Batch”

  1. anup on November 6th, 2008 12:07

    Thanks for the simple example which explores the power of spring batch in a very concise way.
    regards
    anup

  2. Jose Henrique on February 12th, 2009 22:13

    I’ve read your 2 posts you put here about spring batch and they helped me a lot.
    Congrats!
    Unfortunately the most samples we find on the Internet are not so clear. I’m just trying to put a batch I coded to work, but I even don’t know how to do that.
    Basically I need to read a binary file, process it by replacing some bytes and write the result out putting it in other new file (in binary format yet).
    My doubt is: which class “reader” should I use in order to read a binary file? Or even in text format, which class I should declare in my job.xml?!
    Thanks in advance..

  3. jens on February 12th, 2009 23:33

    Actually, I’m not convinced SpringBatch is the right tool for the job. The reason is that all support classes for reading / processing / writing are dealing with text files in various formats.

    Is the binary data organized in - for example - (fixed sized) records? Are the bytes you need to replace, interpretable in the context of some high-level structure? If that is the case, then it might be an idea to read binary data, convert it into higher-level objects (some form of records), update the objects and then write them out to another binary format or the same format.

    On the other hand, if nothing of this makes sense it’s better to write a dedicated program.

    You might of course still use Spring (Core), assembling the various modules together BinaryFileReader / BinaryTransformer / BinaryFileWriter.

  4. Jose Henrique on February 13th, 2009 18:20

    Alright! Thanks for your help!
    Your answer was pretty important for us. Based on it, we decided to develop a dedicated program as you mentioned, even because we are in a tight schedule, I’m affraid there’s no time to implement Spring Batch classes handling this type of file.
    Kind regards…
