Spring Batch Integration

By Artur Yolchyan

Artur is a Senior Software Engineer at AUTO1 Group.


Intro

In this article, we will take a look at Spring Batch and how it can be used. We will walk through various configurations and build an application that reads from a CSV file and writes into a database with good performance.

I used the following: Java 11, Spring 5+, Spring Boot 2 and a Maven-based project.

Create a Project

First, we need to create a Spring Boot 2 project. We recommend doing so via Spring Initializr, a useful tool for generating Spring projects with the required dependencies and configuration.

Dependencies

We need the dependencies below to run and test the project:

<dependency> 
 <groupId>org.springframework.boot</groupId>
 <artifactId>spring-boot-starter</artifactId>
</dependency>
   
<dependency>
 <groupId>org.springframework.boot</groupId>
 <artifactId>spring-boot-starter-test</artifactId>
 <scope>test</scope>
 <exclusions>
    <exclusion>
     <groupId>org.junit.vintage</groupId>
      <artifactId>junit-vintage-engine</artifactId>
    </exclusion>
 </exclusions>
</dependency>

<dependency>
 <groupId>org.springframework.boot</groupId>
 <artifactId>spring-boot-starter-batch</artifactId>
</dependency>

<dependency>
 <groupId>org.springframework.boot</groupId>
 <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>    

<dependency>
 <groupId>org.projectlombok</groupId>
 <artifactId>lombok</artifactId>
 <version>1.18.10</version>
 <scope>provided</scope>
</dependency>

<dependency>
 <groupId>com.h2database</groupId>
 <artifactId>h2</artifactId>
</dependency>
    

<dependency>
 <groupId>org.springframework.batch</groupId>
 <artifactId>spring-batch-test</artifactId>
 <scope>test</scope>
</dependency>

<dependency>
 <groupId>org.hamcrest</groupId>
 <artifactId>hamcrest-all</artifactId>
 <version>1.3</version>
 <scope>test</scope>
</dependency>

The spring-boot-starter-batch dependency includes all the configuration needed to run a Spring Batch application. Lombok is a helper dependency that keeps the code shorter and cleaner. H2 is used as an in-memory database. spring-boot-starter-test and spring-batch-test are included for testing.

Book Class

Let’s create a model class Book, which will represent a book. It serves purely as a model and will be used throughout our Spring Batch implementation.

@Data
public class Book {
    private String title;
    private String description;
    private String author;
}

Configuration

Let’s create a class called SpringBatchConfiguration; we will add all the required configuration here. First, let’s annotate the class with @Configuration so that we can declare beans, and with @EnableBatchProcessing to enable Spring Batch processing. Additionally, we add @RequiredArgsConstructor from Lombok, which generates a constructor whose parameters are the class’s final fields; Spring injects these fields as beans through that constructor. Our class will look like this:

@Configuration
@EnableBatchProcessing
@RequiredArgsConstructor
public class SpringBatchConfiguration {
    // fields and bean definitions go here
}

Now, in the SpringBatchConfiguration class, let’s add the fields that need to be injected through the constructor:

private final JobBuilderFactory jobBuilderFactory;
private final StepBuilderFactory stepBuilderFactory;

JobBuilderFactory and StepBuilderFactory are declared as Spring beans by @EnableBatchProcessing, so we can inject them into any class we want.

File Reader

Now, we need to declare and initialize the Spring beans that configure the batch process. The first bean will be responsible for reading the file line by line. Spring Batch provides a default class for this: FlatFileItemReader. Similarly, Spring ships default reader classes for relational databases, MongoDB and so on. If none of these fit, you can also implement your own reader.

Let’s now see what FlatFileItemReader injection will look like.

@Bean
@StepScope
public FlatFileItemReader<Book> bookReader(@Value("#{jobParameters['filePath']}") final String filePath) {
    return new FlatFileItemReaderBuilder<Book>()
        .name("bookItemReader")
        .resource(new ClassPathResource(filePath))
        .delimited()
        .names(new String[]{"title", "description", "author"})
        .fieldSetMapper(new BeanWrapperFieldSetMapper<Book>() {{
            setTargetType(Book.class);
        }})
        .build();
}

We configure the reader to read data from the given file path, which should point to a CSV file whose rows contain title, description and author, in that order.

@StepScope means the bean is created freshly for each step execution, so the file path can be resolved dynamically from the job parameters when the Spring Batch job is launched. We will see how to pass the file path later in this article.
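
For illustration, the first rows of a hypothetical sample-data.csv on the classpath could look like this (the book data here is made up):

The Trial,A novel about a bureaucratic nightmare,Franz Kafka
Dead Souls,A satirical novel about buying souls,Nikolai Gogol
War and Peace,A novel about Russian society in the Napoleonic era,Leo Tolstoy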

Item Writer

Now, let’s create a writer bean, which takes the data and writes it into the relational database.

@Bean
public JdbcBatchItemWriter<Book> writer(final DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Book>()
        .itemPreparedStatementSetter((book, preparedStatement) -> {
            preparedStatement.setString(1, book.getTitle());
            preparedStatement.setString(2, book.getDescription());
            preparedStatement.setString(3, book.getAuthor());
        })
        .sql("INSERT INTO books (title, description, author_full_name) VALUES (?, ?, ?)")
        .dataSource(dataSource)
        .build();
}

In the writer bean, we set an itemPreparedStatementSetter, which fills the ? placeholders of the prepared statement and thus determines which Book property is inserted into each table column. Note that the column name author_full_name matches the authorFullName field of the BookEntity shown later.

Step Configuration

Now, let’s configure a step which will be executed in the batch process. In our case, the configuration will look like this:

@Bean
public Step step1(final ItemWriter<Book> writer, final ItemReader<Book> reader) {
    return stepBuilderFactory.get("step1")
        .<Book, Book> chunk(100)
        .reader(reader)
        .processor((ItemProcessor<Book, Book>) book -> book)
        .writer(writer)
        .build();
}

In our scenario we have only one step, but it is also possible to configure multiple steps. Here, we create a Spring Batch step named step1 and set the reader and writer we declared as Spring beans earlier.

We set the chunk size to 100, which means that items are read, processed and written in chunks of 100, each committed in a single transaction. We can make this configurable too, as sketched below. The processor converts the current object into the one the writer should receive; in our case, it simply passes the same object through.
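
If we wanted the chunk size to be configurable, the step bean could read it from a property. Here is a minimal sketch; the batch.chunk-size property name is an assumption, not part of the original code:

@Bean
public Step step1(final ItemWriter<Book> writer,
                  final ItemReader<Book> reader,
                  // batch.chunk-size is a hypothetical property, defaulting to 100
                  @Value("${batch.chunk-size:100}") final int chunkSize) {
    return stepBuilderFactory.get("step1")
        .<Book, Book>chunk(chunkSize)
        .reader(reader)
        .processor((ItemProcessor<Book, Book>) book -> book)
        .writer(writer)
        .build();
}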

Job Configuration

Last but not least, let’s configure the job which should be executed.

@Bean
public Job bookReaderJob(final Step step1) {
    return jobBuilderFactory.get("bookReaderJob")
        .incrementer(new RunIdIncrementer())
        .flow(step1)
        .end()
        .build();
}

Here, we create a job named bookReaderJob. We add a RunIdIncrementer, which is an id generator for the metadata tables that Spring Batch uses to store details about job executions; every time the job runs, details about the execution are saved there. The metadata lives in tables such as BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION and BATCH_STEP_EXECUTION; to check the schema structure, take a look at the schema-mysql.sql script that ships with Spring Batch. It is for MySQL, but scripts for other databases are available too.

Additionally, we add the flow in which the job should be executed. Currently our flow contains only one step, so we add just that; chaining further steps is sketched below.
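
If the job had more steps, the flow could chain them with next(). A minimal sketch, assuming a hypothetical second step bean step2:

@Bean
public Job multiStepJob(final Step step1, final Step step2) {
    return jobBuilderFactory.get("multiStepJob")
        .incrementer(new RunIdIncrementer())
        .flow(step1)
        // step2 is a hypothetical second step, executed after step1 completes
        .next(step2)
        .end()
        .build();
}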

Please also add spring.batch.job.enabled=false to your properties file, so that the Spring Batch job is not executed automatically, without any parameters, when the application starts.
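
A minimal application.properties for this setup could then look like this:

spring.batch.job.enabled=false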

Execution

To execute the job, we need to declare a JobLauncher and launch the job with it. To do so, we create the class below:

@Component
@RequiredArgsConstructor
public class SpringBatchExecutor {  
    private final JobLauncher jobLauncher;
    private final Job job;  

    @SneakyThrows
    public void execute(final String filePath) {
        JobParameters parameters = new JobParametersBuilder()
            .addString("filePath", filePath)
            .toJobParameters();  
        jobLauncher.run(job, parameters);
    }  
}

Now we can call the execute method from wherever we want, passing the path of the file from which the data should be read.
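
For example, a runner like the one below (a hypothetical class, not part of the original sample) could trigger the job on application startup:

@Component
@RequiredArgsConstructor
public class BatchRunner implements CommandLineRunner {

    private final SpringBatchExecutor springBatchExecutor;

    @Override
    public void run(final String... args) {
        // launch the batch job with a CSV file from the classpath
        springBatchExecutor.execute("sample-data.csv");
    }
}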

Additional Classes

There are a few additional classes which you will need in order to execute the code.

Create a class BookEntity so that Spring Data JPA will automatically create the books table for you. Then you can create a repository.

@Data
@Entity
@Table(name = "books")
public class BookEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String title;
    private String description;
    private String authorFullName;
}

Then create the interface BookRepository, extending JpaRepository. For the moment, we will need it only for testing.

@Repository
public interface BookRepository extends JpaRepository<BookEntity, Long> {
} 

Testing

No one likes code that is not tested. So, let’s write a few test cases for our classes.

Here is a code sample for test cases:

@Slf4j
@SpringBootTest
// recreate the context (and the in-memory DB) after each test, so every test starts with an empty table
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
class SpringBatchSampleApplicationIntegrationTest {

    @Autowired
    private SpringBatchExecutor springBatchExecutor;
    
    @Autowired
    private BookRepository bookRepository;

    @Test
    public void testExecution() {
        long initialCount = bookRepository.count();
        assertThat(initialCount, equalTo(0L));
        springBatchExecutor.execute("sample-data.csv");
        long count = bookRepository.count();
        assertThat(count, equalTo(7L));
    } 

    @Test
    public void testLargeData() {
        long startTime = System.currentTimeMillis();
        long initialCount = bookRepository.count();
        assertThat(initialCount, equalTo(0L));
        springBatchExecutor.execute("large-data.csv");
        long count = bookRepository.count();
        assertThat(count, equalTo(60000L));
        long endTime = System.currentTimeMillis();
        log.info("executed in miles: {}", endTime - startTime);
    }
}

The second test executes in ≈ 3500 ms on a machine with a 2.2 GHz Intel Core i7 and 16 GB RAM. It processes a 60K-line CSV file and saves the data into a relational database.

You can check out the working sample code on GitHub.
