
Spring Batch processing

Spring Batch is one of the core modules of the Spring framework, and using Spring Batch you can create a robust batch processing system.

Batch processing is a technique that processes data in large groups instead of one element at a time, so you can process a high volume of data with minimal human interaction.

For example:

Conversion of data from a .csv file to a database, or vice versa.

Hi everyone, welcome to Java Techie. In this tutorial we'll understand what Spring Batch is and its architecture, and we are going to develop an application using Spring Batch that will process a huge volume of data in a fraction of the time. Before we jump into the implementation, let's have a quick look at Spring Batch and its architectural flow.

Basically, Spring Batch is one of the core modules of the Spring framework, and using Spring Batch you can create a robust batch processing system. Now you might ask: what is batch processing? Batch processing is a technique that processes data in a large group instead of a single element of data, so you can process a high volume of data with minimal human interaction.

Now let's understand the exact use case, or when we need a batch processing system. In short, whenever you want to transfer a huge amount of data from a source to a destination, that is when you need Spring Batch. For example, let's say you want to design a billing analysis system. You have the billing information in CSV format and you want to dump that CSV into a database. Here the source is your CSV file and the destination is your database. Now let's assume the batch processing technique is not there: then you need to insert each and every row of that CSV file into the database by writing insert statements, which is really a painful job, isn't it? For this kind of scenario it's good to use batch processing, so that your job will be faster and you can save time.

Similarly, you can think of another use case, report generation. Let's say every day you want to export either a CSV or an Excel report by fetching data from the database. Here the database will be your source and the CSV or Excel file will be your destination. This can also be done quickly if you are using a batch processing technique.

There can be any number of use cases, but for this example we'll demonstrate the first scenario, where we are going to upload a large CSV file to the database within seconds using the TaskExecutor framework. Now let's try to understand the Spring Batch core components and their flow of execution.

The first key component in the Spring Batch architecture is the JobLauncher. JobLauncher is an interface used to launch Spring Batch jobs; you can also think of it as the entry point to initiate any job in Spring Batch. It has a method called run, which triggers the Job object, or Job component. So once the JobLauncher calls the run method, it immediately kicks off another component, the Job. A job can be defined as the work to be executed using Spring Batch; this work might involve a simple or a complex task.

Once the JobLauncher launches a job, it immediately calls another component called the JobRepository. The JobRepository helps maintain the state of the job, whether it succeeded or failed. Suppose a Spring Batch job was running and an error occurs: how does Spring Batch know that an error has occurred and that the job needs to be rerun? We need to save the state of the jobs, and further executions should take this into consideration. State management is an important aspect when processing a large volume of data, and it is achieved using the Spring Batch JobRepository.

Next, the Job component talks to another component of Spring Batch, the Step. A step is nothing but a combination of three other components: ItemReader, ItemProcessor and ItemWriter. The ItemReader reads the data from the source; in our scenario the source is our CSV file. The ItemProcessor processes the data: if you want to do any operation in between reading and writing, you can use the ItemProcessor. The ItemWriter writes the data to the destination; in our case the destination is the database, because you want to read the CSV and dump it into the database. I believe this is all clear to you.

A job can also have multiple steps, and each step has its own item reader, processor and writer. This is the overall architecture of Spring Batch, and it gives you context about all the core components of Spring Batch.
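As a rough plain-Java sketch of the reader, processor and writer flow described above: the three interfaces here are simplified, hypothetical stand-ins defined only for this example, not the real Spring Batch types, and the chunk handling is deliberately reduced to the bare idea of writing items in groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins for the three roles inside a step.
// They are local to this sketch and are not the real Spring Batch interfaces.
interface SketchReader<T> { T read(); }                  // returns null when the source is exhausted
interface SketchProcessor<I, O> { O process(I item); }   // transforms one item
interface SketchWriter<T> { void write(List<T> chunk); } // receives a whole chunk at once

class ChunkStepSketch {

    // Reads items one at a time, processes each, and hands them to the writer
    // in groups of at most chunkSize. Returns the number of items written.
    static <I, O> int runStep(SketchReader<I> reader, SketchProcessor<I, O> processor,
                              SketchWriter<O> writer, int chunkSize) {
        int written = 0;
        List<O> chunk = new ArrayList<O>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) {
                writer.write(new ArrayList<O>(chunk));
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) { // flush the last, partially filled chunk
            writer.write(new ArrayList<O>(chunk));
            written += chunk.size();
        }
        return written;
    }

    // Demo helper: runs a step over in-memory rows and records the size of
    // every chunk the writer receives.
    static List<Integer> chunkSizes(List<String> rows, int chunkSize) {
        final Iterator<String> source = rows.iterator();
        final List<Integer> sizes = new ArrayList<Integer>();
        ChunkStepSketch.<String, String>runStep(
                () -> source.hasNext() ? source.next() : null, // reader
                row -> row.toUpperCase(),                      // processor
                chunk -> sizes.add(chunk.size()),              // writer
                chunkSize);
        return sizes;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(chunkSizes(rows, 2)); // prints [2, 2, 1]
    }
}
```

The point of the sketch is only the division of labour: the reader knows about the source, the writer knows about the destination, and the step in the middle decides how many items travel together.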
Now let's quickly create a new Spring Boot project to demonstrate this scenario. Without any further delay, let's get started.

Create a new project: click on File, click on New Project, then click Next. Give the group ID as com.javatechie, then I will specify the artifact ID as batch-processing-demo; the project name I will keep the same. Then I will change the JDK version to JDK 8 and give the package name as com.javatechie.spring.batch. Now click Next.

Let me add all the required dependencies. We are using the latest version, Spring Boot 2.6.7. I will add the Lombok dependency, then the Web dependency, then the Spring Batch dependency. Since I want to insert my CSV data into a database, I also need MySQL and the JPA dependency, Spring Data JPA. I believe that's fine.

Now click on Next, then click on Finish. It will take a few seconds, as it may download all the latest dependencies. Here the application imported successfully; it downloaded the latest version of all the JARs.

Now, as per our requirement, let me show you the CSV file that we want to dump into our database using Spring Batch. If I open it, this is the CSV file, with the fields id, first name, last name, email, gender, contact number, country and date of birth. I have 1000 rows here. I want to save this CSV file to the DB within a fraction of a second using the TaskExecutor and the Spring Batch framework.

For this CSV file I need to create an entity class, because I need to save it to my database. So I will go to src, then main, and create a couple of packages here. I will create a package called entity, then a package called config, then another package called repository, and then a package called controller.

To map this CSV information to an object, I need to create an object that will be stored in my DB, and for that I need an entity class. So I'll go to the entity package and create a class called Customer. Inside this Customer class I need to add a few fields, but before that let me annotate it. This is my table, so I'll specify @Entity, and I will also specify the @Table annotation and give the name of the table; you can give something like CUSTOMERS_INFO. Then I need to add the fields: id, first name, last name, email, gender, contact number, country and dob. Whatever columns are available in my CSV file, I want to save all that information to the DB, so I define every header as a field, that is, a column in my table. Let me import these annotations. The id field is my primary key.

So I'll just add @Id to it. Next, I need to create a DAO class, or repository class. I will create a Java class; it will be an interface, so I will name it CustomerRepository. Then I will extend it from JpaRepository, and give the entity here, Customer, and then the data type of your primary key, which is Integer.

Now I have created my entity and my repository. Next I need to make the connection from my application to the database, and for that I need to add all the data source related properties in my application.properties. So add the data source driver class, URL, username and password, then show SQL, Hibernate DDL auto, the server port and the dialect. And here, for the Spring Batch initialize schema property, I specify always. I also want to disable running the job at startup. On application startup I don't want to run the batch job; I want to run the batch job only when I trigger it from my controller, which is why I set it to false.

Next, I need to add the customers.csv file inside the resources folder. You can keep it outside, but let me copy it to the resources folder. I'll go to my desktop, copy the CSV file, and paste it inside the resources folder. You can see here the id, first name and so on; all the thousand rows are there in my CSV file. You can create more rows in the CSV file to test your API, but with a thousand rows I can show you how it gets done within seconds; it doesn't take much more than a second.

So we have the entity, we have the repository, and we added the CSV file that we want to map to this entity and store in the DB. In the next step we need to create the Spring Batch components, the Spring Batch config. If you remember, as per the architecture, we need to create an ItemReader, which will read the data from my CSV file; we also need to write an ItemProcessor, which will process the data in between reading and writing; and we need to create an ItemWriter component, which will write the data to the database. Once we create all three components we need to give them to the Step, and once we have created the Step we need to give it to the Job object. These are the components we need to configure in our Spring Batch code.

So I will create a new Java class here and name it SpringBatchConfig. I will annotate it with @Configuration, and you also need to enable batch processing. There is an annotation for that, @EnableBatchProcessing; it tells Spring Boot that for this particular application the user wants to enable batch processing.

Now I will inject two factory classes, for the job and the step. If you remember the flow diagram, we have the Step and the Job. To create the Step there is a StepBuilderFactory, and to create the Job there is a JobBuilderFactory, so I just need to inject those two here. I'll add a private JobBuilderFactory and a private StepBuilderFactory. In the item writer I want to save the data to my database, so I will also inject the repository here, CustomerRepository.

You can annotate each of them with @Autowired, but since I have these three fields I will use constructor injection instead, with @AllArgsConstructor. If you have more than one constructor, then you may need to put @Autowired on one of them; since I don't want any constructor apart from the one with these three attributes, I can just define @AllArgsConstructor, and Spring will inject these three beans into this bean.

Next, as per the flow diagram, we need to create the reader, processor and writer objects. First I will create the reader object. Let me zoom in. I will create a reader bean, which will be a FlatFileItemReader or something like that.
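Going back to the application.properties settings listed earlier, they might look roughly like the fragment below. The video only names the kinds of properties, so every value here is an assumption: the MySQL host and schema name, the credentials, the ddl-auto mode and the dialect are placeholders to adjust for your own setup.

```properties
# Assumed connection details: change host, schema name and credentials
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3306/javatechie
spring.datasource.username=root
spring.datasource.password=change-me

spring.jpa.show-sql=true
# Assumed mode: let Hibernate create or update the customer table
spring.jpa.hibernate.ddl-auto=update
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL8Dialect

server.port=9191

# Create the Spring Batch metadata tables on startup
spring.batch.jdbc.initialize-schema=always
# Do not run jobs automatically at startup; the controller triggers them
spring.batch.job.enabled=false
```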

Yes, FlatFileItemReader. Here I'll provide the generic type as Customer. Then you can give the method name as reader or something like that, and I'll define this as a @Bean. Since we want to read from a CSV file, there is a class given by Spring Batch for that, FlatFileItemReader, and you can simply use this class to read the information from your source.

So I'll create an object of it: FlatFileItemReader itemReader = new FlatFileItemReader. You can pass the generic type here, Customer; I can specify it on this side as well. Now in this itemReader I need to tell it where my file is located, so I will call itemReader.setResource and give it a new FileSystemResource with the path of my file. My file is present inside src/main/resources, and the file name is customers.csv. I also need to set a name on this item reader; you can give any name here, so itemReader.setName, and I'll give something like csvReader.

Then you need to tell the item reader, while reading my CSV file, to skip the first line, because that is the header. I don't want to save that information to my database; I want the second line onwards. So we tell it to skip the first line: itemReader.setLinesToSkip(1).

Next, you also need to define a line mapper. We will understand what the line mapper is in a moment; let me create the method first. I'll call lineMapper and then create that method. Inside it, create the object of the line mapper. There is a class called DefaultLineMapper, so I'll create an object of it, and you can also define the generic type, Customer: lineMapper = new DefaultLineMapper.

Now, about this line mapper: the CSV file, as I showed you, holds comma separated values. We need to tell the line mapper that the delimiter we are using is a comma, so that it extracts the values and maps them to the object, which is my Customer object. That is what we need to define inside the line mapper: how to read the CSV file and how to map the data from the CSV file to the Customer object. That is the job of this lineMapper method.

So I will define a DelimitedLineTokenizer; just create an object of it. Then set the delimiter that we are using, which is the comma: setDelimiter with a comma. You can set strict as well: setStrict(false). Now we need to tell it what the headers are, so that it splits on the comma and maps to the object. For that I need to call setNames on the DelimitedLineTokenizer. These are the names I have, which are the headers: id, first name, last name, email, gender, contact number, country and date of birth. It is that simple, guys. I'll just rename the variable to lineTokenizer, which will be more meaningful. Just paste it.

Okay, so this line tokenizer will read the CSV file with its comma separated values, and these are the headers we set here. In the next step we need to map this information to the object. There is a class in Spring for that, something like bean wrapper field mapper; yes, BeanWrapperFieldSetMapper.

Okay, so I will give it the generic type; this will map the CSV fields to the Customer object. I'll write fieldSetMapper = new BeanWrapperFieldSetMapper. Then just specify the target type: setTargetType, which is nothing but Customer.class. So we have the line tokenizer and we have the field set mapper. The line tokenizer will extract the values from the CSV file, and the field set mapper will map those values to the target class, which is Customer. Now you need to provide both objects to the line mapper: lineMapper.setLineTokenizer, then lineMapper.setFieldSetMapper with the fieldSetMapper object. Finally, I will return this lineMapper. That's it.

So we created the reader object. As per the flow diagram, the next object is the item processor; if you want an item processor you can create one, so let me create an item processor class. Wait, there is a compilation error; let me check.
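To make the tokenizer and field set mapper idea concrete, here is a small plain-Java sketch of the same two-stage mapping: split a line into named values, then copy the named values onto an object. It does not use the Spring Batch classes; the column spellings, the cut-down Customer class and the sample rows are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LineMapperSketch {

    // Column names in header order. The exact spellings are assumptions.
    static final String[] NAMES = {
        "id", "firstName", "lastName", "email", "gender", "contactNo", "country", "dob"
    };

    // Cut-down stand-in for the tutorial's Customer entity (three fields only).
    static class Customer {
        int id;
        String firstName;
        String country;
    }

    // Tokenizer role: split one line on the comma and pair each value with its column name.
    // A plain split does not handle quoted values that contain commas.
    static Map<String, String> tokenize(String line) {
        String[] values = line.split(",", -1);
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (int i = 0; i < NAMES.length && i < values.length; i++) {
            fields.put(NAMES[i], values[i].trim());
        }
        return fields;
    }

    // Field-set-mapper role: copy named values onto the target object.
    static Customer mapFields(Map<String, String> fields) {
        Customer customer = new Customer();
        customer.id = Integer.parseInt(fields.get("id"));
        customer.firstName = fields.get("firstName");
        customer.country = fields.get("country");
        return customer;
    }

    // Reader role: skip the header line(s), then map every remaining line.
    static List<Customer> readAll(List<String> lines, int linesToSkip) {
        List<Customer> customers = new ArrayList<Customer>();
        for (int i = linesToSkip; i < lines.size(); i++) {
            customers.add(mapFields(tokenize(lines.get(i))));
        }
        return customers;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the same column layout as the tutorial's file
        List<String> lines = Arrays.asList(
            "id,firstName,lastName,email,gender,contactNo,country,dob",
            "1,Ana,Lee,ana@example.com,Female,555-0100,United States,1990-01-01",
            "2,Raj,Rao,raj@example.com,Male,555-0101,India,1988-05-12");
        for (Customer c : readAll(lines, 1)) {
            System.out.println(c.id + " " + c.firstName + " " + c.country);
        }
    }
}
```

Keeping the splitting and the object mapping as two separate pieces is what lets you change the delimiter or the target class independently of each other.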

I just need to return the itemReader. The next component I need to create is the item processor. I will create a new Java class, CustomerProcessor or something like that, and it needs to implement ItemProcessor. If you observe, ItemProcessor comes from the batch.item package, and it takes two generic arguments. I need to provide the input object type as Customer and the output type as Customer as well, inbound and outbound. Then I need to implement the method. See here, the argument is of type Customer and the return type is Customer. As of now I am not going to write any logic in the processor; I will just return the same object. Later I will tell you the purpose of this item processor, this CustomerProcessor, and how you can filter out information while processing the data, between reading and writing.

Now I need to configure this CustomerProcessor object in my batch config class. I will go there and write public CustomerProcessor processor, return new CustomerProcessor, and then annotate it with @Bean.

Now we have added the reader and the processor. Next we need to create the item writer component. There is also a class called RepositoryItemWriter, if you are using Spring Data JPA, so I will use that class given by the Spring framework. Let me check the class name: RepositoryItemWriter, yes. Here also you need to define the generic type, which is Customer, and you can name the method writer. Then annotate it with @Bean.

Here you can create the object of RepositoryItemWriter: writer = new RepositoryItemWriter. In this writer object you can set your repository with setRepository. If you remember, we injected the repository here, so I can set this customerRepository; I had better name it properly so that it is more meaningful. Here I am simply saying to the item writer: whatever values you get from the reader, use my repository, the CustomerRepository, to write them. After setRepository, you tell it which method of the CustomerRepository to call: setMethodName, and the method name will be save. So we are telling the writer to use my CustomerRepository's save method to write the CSV data to the database. Then I will simply return the writer.

There are a couple of writer classes, guys. You can use this RepositoryItemWriter or JdbcBatchItemWriter; there are any number of classes given by Spring Batch. If you go to the ItemWriter interface, this class also implements ItemWriter. There are multiple implementations; you can go to the Spring Batch documentation to learn about them.

That's fine. We created the reader, processor and writer. In the reader we tell Spring Batch to read the file from the source. In the processor we don't have any logic as of now. And in the writer we specify that the CSV information should be written to the DB, which is the destination. So we have completed these three pieces: item reader, processor and writer.

In the next step we need to create the Step object and give these three components to it. Let me create the Step object: public Step, and you can give the name step1. Let me add the import statement; the class should come from org.springframework.batch.core. I'll annotate it with @Bean. Since we have the StepBuilderFactory with us, which we injected, we can create the Step object using it. So I can simply write return stepBuilderFactory.get, and you can give any name as your step name; I will just give csv-step.

And you can define the generic types as Customer, Customer. You want to process the data in chunks, so you can define the chunk size as 10, which means process 10 records at a time. That is how we define the chunk. Next you provide your reader, processor and writer. For the reader, I have created the bean for it; then the processor, I also created a bean for it.

Then next, dot writer; we also created a bean for the writer object. Now you can simply build it. This is clear, right? As per the flow you understood, we provide the reader, processor and writer objects to the step, and that is what I am doing here: I created the Step object, I am giving the reader, processor and writer to it, and finally I am building the Step object.

In the next step you need to give that Step object to the Job object. So I will create another bean, of type Job. Just create it: public Job, and you can give it any name; I'll just call it job.
I'll just import this; it should come from batch.core. You can give it a meaningful name, runJob or something like that. Just as we injected the StepBuilderFactory, we also have the JobBuilderFactory, and I can use it to create the Job object. So I'll write return jobBuilderFactory.get with the job name; I will give the job name as importCustomers or something like that.

Then here I need to give the step, step1, which is the bean I created here. You can give any number of steps; as I explained with the flow diagram, a job can have multiple steps. If I type here, let me show you: dot flow, where you give the step, let us say step1. Similarly, if you create another Step object, after dot flow you can see there is another method called next; if you have another Step object you can give it there. Since I have only one step, I don't have step chaining here, so I can remove this. I have a single flow and I just want to execute that. Then you can simply call end and build on that flow.

So I believe the flow diagram you understood is clear now: create three components and give them to the step, then create the Step object and give it to the job. Now we need to give this Job object to the JobLauncher in our controller, so that we can trigger the job ourselves by hitting an endpoint. So we have the configuration ready here.
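The idea that a job is an ordered chain of steps can be sketched in plain Java as below. This is a simplified model with a hypothetical SketchStep class, not the Spring Batch builder API; it assumes the simple sequential case, where a failing step stops the job and the later steps do not run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JobFlowSketch {

    // Hypothetical stand-in for a step: a name plus some work to run.
    static class SketchStep {
        final String name;
        final Runnable work;
        SketchStep(String name, Runnable work) { this.name = name; this.work = work; }
    }

    // Runs the steps in the given order and stops at the first failure.
    // Returns a log of "name:STATUS" entries, ending with the job's own status.
    static List<String> runJob(String jobName, List<SketchStep> steps) {
        List<String> log = new ArrayList<String>();
        for (SketchStep step : steps) {
            try {
                step.work.run();
                log.add(step.name + ":COMPLETED");
            } catch (RuntimeException e) {
                log.add(step.name + ":FAILED");
                log.add(jobName + ":FAILED");
                return log; // later steps are not executed
            }
        }
        log.add(jobName + ":COMPLETED");
        return log;
    }

    public static void main(String[] args) {
        List<SketchStep> steps = Arrays.asList(
            new SketchStep("csv-step", () -> System.out.println("importing rows")),
            new SketchStep("second-step", () -> System.out.println("follow-up work")));
        System.out.println(runJob("importCustomers", steps));
    }
}
```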

Now let me go to the controller. I will create a class called, let us say, CustomerController, or you can name it JobController. Annotate it as a @RestController. Then you can define a root URL with @RequestMapping: /job or /jobs, something like that.

Here I'll write a method that will trigger the job. For that I need a JobLauncher object, and I also need the Job object. Let me import these. You can use @Autowired, or, if you don't want to autowire and you have a single constructor, you can add these two attributes as constructor arguments and you don't need to specify @Autowired. But for now let me add @Autowired.

Then I will write an endpoint: public void, and you can call it something like start, or give it a more meaningful name, importCsvToDbJob or something like that. You annotate it here; I'll make it a @PostMapping.

Now here I need a JobParameters object, because to trigger a job using jobLauncher.run I need to pass the Job object as well as the job parameters. So I'll write JobParameters jobParameters equal to, and there is a class called JobParametersBuilder: new JobParametersBuilder, then dot addLong, where you give a key. I will use the timestamp, with startAt as the key and System.currentTimeMillis as the value. Then I'll call toJobParameters.
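The video does not spell out why a timestamp is passed as the parameter. In Spring Batch a job instance is identified by the job name together with its identifying parameters, and an instance that has already completed is not run again with the same parameters, so a value that changes on every request makes each launch a new instance. The sketch below models only that bookkeeping in plain Java; the class and its launch method are hypothetical and are not the JobLauncher API.

```java
import java.util.HashSet;
import java.util.Set;

class JobInstanceSketch {

    // Remembers which (job name + parameters) combinations have already completed.
    // A hypothetical in-memory stand-in for the bookkeeping a job repository does.
    private final Set<String> completedInstances = new HashSet<String>();

    // Returns true if the job ran, false if the same instance had already completed.
    boolean launch(String jobName, String paramKey, long paramValue) {
        String instanceKey = jobName + "?" + paramKey + "=" + paramValue;
        if (completedInstances.contains(instanceKey)) {
            return false; // same name and same parameters: treated as already done
        }
        // ... the job's steps would run here ...
        completedInstances.add(instanceKey);
        return true;
    }

    public static void main(String[] args) {
        JobInstanceSketch launcher = new JobInstanceSketch();
        System.out.println(launcher.launch("importCustomers", "startAt", 1000L)); // true
        System.out.println(launcher.launch("importCustomers", "startAt", 1000L)); // false
        System.out.println(launcher.launch("importCustomers", "startAt", System.currentTimeMillis())); // true
    }
}
```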

That's fine. Now I need to use the jobLauncher.run method to trigger the job, and I need to give it two arguments, the job and the job parameters. It throws exceptions that you need to handle using try/catch, so More Actions, Surround With try/catch. You can see there are multiple exceptions; I can keep them in a single catch block, so collapse them into a single catch statement. That is fine.

Now we have the launcher that will trigger the job, so I need to define the endpoint path here as well: /jobs/importCustomers. This is my URL.

So we have the config. Let me cross verify: in SpringBatchConfig we created the reader, processor, writer, step and job objects. Then we have the processor, the entity, the repository and the JobController. That's fine. Now let me run our application. I will go to the main class and run it.

Okay, it seems there is some error: the JobController requires a bean of type Job. Go to the JobController, and we have already injected the Job object there. Now let me verify in SpringBatchConfig: we created the Step bean, but for the Job we didn't create a bean, so we missed adding @Bean there. Now I believe we are good. Let me start the application again; it will take a few seconds. If you observe, it started on port 9191.

Now if you go and check in your database, let me go to javatechie, which is the schema I use, you can see a couple more tables added by Spring Batch: BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_CONTEXT, BATCH_JOB_EXECUTION_PARAMS, its sequence, BATCH_JOB_INSTANCE, then BATCH_STEP_EXECUTION, BATCH_STEP_EXECUTION_CONTEXT and its sequence. You also have the CUSTOMERS_INFO table.
So let me show you: select * from BATCH_JOB_EXECUTION_PARAMS. Actually, I want to show you the BATCH_JOB_EXECUTION table, so let me change this and run it. You can see there is no job execution in it, because we haven't started our batch job yet. There is no job instance ID, no create time, no exit code, no exit message; nothing is there. Similarly, I'll check BATCH_STEP_EXECUTION: all empty. Then BATCH_JOB_INSTANCE: everything is empty. Then I will check my customer table. What is the table name? Let me go to my entity: CUSTOMERS_INFO. I will go here and change it. There is no entry here either.

Now we are going to run our batch job using our endpoint. This is the endpoint: /jobs/importCustomers. I'll open Postman and copy this URL. This is a POST request, and I can trigger it directly from Postman. Let me clear the console and go to Postman: port 9191, /jobs, and this is the endpoint. I'll send the request.

Okay, there is some error. I believe we missed adding the getters and setters in our entity. Yes, that is the problem. Just add @Data, @AllArgsConstructor and @NoArgsConstructor. That is why it was not able to find the getter and setter methods; you can see the error, invalid setter method. Let me restart my application. The application started on port 9191. Now go to Postman, but before that let me clear the console. Go to Postman and trigger the request. You can see here that the insert statements are still going on, because it will process 1000 rows from the CSV file. And finally, here is the step name, csv-step, executed in 6 seconds 920 milliseconds, and here is the job name, importCustomers, completed with the following parameters: startAt, which is System.currentTimeMillis, and the status is COMPLETED. It took just under seven seconds to process 1000 rows from the CSV to the database.

Now if you go and check in your DB, I will run select * from CUSTOMERS_INFO. You can see here that it added the records sequentially, because if you observe, the ids are in order: 1, 2, 3, 4, 5, 6, up to 10 and beyond. There is no gap and there are no shuffled records; it is in sequence. If that is the case, then if I process thousands of records, or one lakh (100,000) records, it might take five to ten minutes or more, we never know. Then there is no sense in using Spring Batch. You might ask me, what is the advantage of using Spring Batch? I could do the same thing manually, one row after another. It is automated using Spring Batch, but it is still taking time.

By default Spring Batch is synchronous; it is not asynchronous. So you need to tell Spring Batch to move the rows from the source to the destination concurrently. For that you need to define your custom TaskExecutor, where you can set the concurrency level. How can we do that? Go to the SpringBatchConfig class, and here you can define a method, public TaskExecutor taskExecutor. Inside it you create a SimpleAsyncTaskExecutor object: taskExecutor = new SimpleAsyncTaskExecutor. Here you can set the concurrency limit: setConcurrencyLimit(10). Since I have only a thousand records, I want 10 threads to execute in parallel, concurrently. Finally, return this taskExecutor.

Now we need to tell the step, while doing the reading, processing and writing, to use this task executor. So I will add it here: just use the task executor I am giving you, with the concurrency limit of 10.

Now I'll delete the data from this table, otherwise it won't allow the insert, because I am going to upload the same CSV file again, the ids will be duplicates, and we might get a constraint violation exception. So let me clear the data from the table: delete from CUSTOMERS_INFO.
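As a rough plain-Java illustration of what a concurrency limit of 10 means, the sketch below hands 1000 row ids to a pool of 10 worker threads. It uses a fixed thread pool from java.util.concurrent as an approximation, not SimpleAsyncTaskExecutor itself, and it treats each row as its own task, which is a simplification of how a real step hands out work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConcurrentRowsSketch {

    // Handles row ids 1..rowCount on at most `threads` worker threads and
    // returns the ids in the order in which the workers finished them.
    static List<Integer> process(int rowCount, int threads) {
        final ConcurrentLinkedQueue<Integer> written = new ConcurrentLinkedQueue<Integer>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int id = 1; id <= rowCount; id++) {
            final int rowId = id;
            pool.execute(() -> written.add(rowId)); // stand-in for "process and save one row"
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new ArrayList<Integer>(written);
    }

    public static void main(String[] args) {
        List<Integer> order = process(1000, 10);
        System.out.println("rows handled: " + order.size());
        System.out.println("first ids: " + order.subList(0, 5)); // order can differ between runs
    }
}
```

Every row is still handled exactly once; what you give up is any guarantee about the order in which rows reach the destination.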

Yes, but before that I want to show you something. If you now check BATCH_JOB_EXECUTION, you can find the job execution ids, and these are the version, exit message, last updated and so on. If you check BATCH_STEP_EXECUTION, you can see the entries here, and this is the step name. The first time, we got an error; that is what this exit message is. Let me check it: yes, a parsing error at this line, because we hadn't added the getters and setters. That is why the status of that one is FAILED and the last one is COMPLETED. Then if you check BATCH_JOB_INSTANCE, you can see this.

These tables were added by Spring Batch to track the state of your job: whether your job succeeded or not, how many records were processed, how many records failed; you can get all that information from these tables. In real projects, while implementing Spring Batch, you need to work with these tables: you need to check the job id and its status, how many rows succeeded and how many failed.

Now let's move to the code and restart the application. We deleted the rows from the table, we have the fresh CSV file with a thousand rows, and we'll process it with 10 threads; that is why we added the task executor. Now we want to see how Spring Batch executes concurrently, so that we get better performance. Let me rerun this: stop and rerun. Here we got an error.

Let me check the SpringBatchConfig taskExecutor. Yes, it is because we didn't define it as a bean; just add @Bean. Now let me cross verify once: taskExecutor, setConcurrencyLimit, and here I want to return the object, not the method, so you can remove the parentheses. Now let me start the application again. The application started again on port 9191.

Let me verify the records in my table first. I believe we deleted them, but let's cross verify once again: there are no records. Now go to Postman, but before that let me clear the console. Go to Postman and send the request, and we'll see the total time. You can see here that the job is now done in about three seconds, 3.13 seconds.

Now if I go and check in my DB, select * from this table, you can see that the first customer ids are no longer 1, 2, 3; there is no order now. Ten threads are now concurrently processing rows from the CSV file, so we never know which thread will take which row. That is why you are getting the records out of sequence here: 788, and if you go down, 366, then further down, 192. This is how it executes concurrently.

Now I'll remove everything again and rerun it: delete from this table. Now I'll trigger it again and send the request. It takes two seconds to complete. If you go here and check, this time it started with 949; again it started from near the end. We cannot predict it: threads will never give you an exact, expected order. It depends on the thread scheduler, on which thread gets a chance to execute; that is why you are getting the records out of sequence. That is fine. What concurrency limit you want to set, 10, 20, 25 or whatever, depends on your requirement, on your business.

So we understand how we can process multiple records, or a large set of records, in a fraction of the time; that is the main motto of this tutorial. But in this example we don't have control over the threads; we don't know which thread will take which record. If you want to take control, if you want to take ownership of the threads and tell thread 1 to execute rows 1 to 10, or thread 2 to execute rows 100 to 200, if you want to set your own ranges per thread, you can do that using Spring Batch partitioning. I will cover that in my next tutorial. With Spring Batch partitioning you have control over the threads: you tell Spring Batch to give the first hundred rows to thread 1, the next hundred to thread 2, and so on. You can configure it like that.

But before closing this session, I will give you a small example of the processor, because we wrote this Spring Batch processor but we are not really using it. Let me open my table: select * from CUSTOMERS_INFO. If you observe, I have countries like Bangladesh, China, Iran. I'll filter on United States and check the count: select count(*) from this table where the country is United States gives 25.

Let us say you have a requirement to process only the customers whose country is United States. How can I add this filter? If you want to do any processing, validation or filtering in between reading and writing, you can use the processor. I want to keep only the customers whose country is United States, so I will go to my CustomerProcessor and write the statement: if customer.getCountry().equals, and I'll copy the exact name, United States, then return that customer object; or else return null.

Now it does not write every record. The processor gets each record and checks whether its country is United States. If the record satisfies this condition, that is, if the customer belongs to the United States, then only will it return that Customer object, so that it goes to the writer and the writer saves it. Otherwise it just returns null. Let's verify that; let me restart it.
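The filtering rule just described, return the object to keep it and return null to drop it, can be sketched in plain Java like this. The Customer class here is a cut-down stand-in with made-up sample data, and keepForWriter is a hypothetical helper that plays the part of the step deciding what reaches the writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterProcessorSketch {

    // Cut-down stand-in for the tutorial's Customer entity.
    static class Customer {
        final int id;
        final String country;
        Customer(int id, String country) { this.id = id; this.country = country; }
    }

    // Processor role: return the item to keep it, or null to drop it.
    static Customer process(Customer customer) {
        return "United States".equals(customer.country) ? customer : null;
    }

    // Step role: only the non-null results reach the writer.
    static List<Customer> keepForWriter(List<Customer> readItems) {
        List<Customer> toWrite = new ArrayList<Customer>();
        for (Customer item : readItems) {
            Customer result = process(item);
            if (result != null) {
                toWrite.add(result);
            }
        }
        return toWrite;
    }

    public static void main(String[] args) {
        List<Customer> read = Arrays.asList(
            new Customer(1, "United States"),
            new Customer(2, "China"),
            new Customer(3, "United States"));
        System.out.println(keepForWriter(read).size()); // prints 2
    }
}
```

Writing the comparison as "United States".equals(customer.country), with the constant first, is a small design choice that avoids a NullPointerException when a row has no country value.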
Meanwhile, let me delete the rows in the DB. Let's wait for it to complete; the application restarted. Now go to Postman, but before that let me clear the console. Go to Postman and send the request. It took about 1.5 seconds (1534 milliseconds), because there are far fewer records to write, since the processor filtered the records based on the country. Now if you go and check in your DB you will find 25 records in total. Rather than the count, let me show you the data: select * from CUSTOMERS_INFO. You can see all the records; it is still executing concurrently, because the order is not in sequence, and you will find only the records whose country is United States. The total record count is 25, because that is the filter we added in the CustomerProcessor.

This is just one sample filter I added; it depends on your use case. You can filter on any condition based on the object, the information you are getting from the source.

So this is how you can process a large volume of data within a fraction of the time using Spring Batch. You can add your own task executor to make it faster: rather than executing in a single thread, in a synchronous way, you can make it asynchronous using the task executor so that you get better performance. That is what we saw here.

Do let me know in the comment section if you want to know more about Spring Batch partitioning with an example. That's all about this particular video, guys. Thanks for watching, and see you soon with a new concept.
