Tadaya Tsuyukubo
twitter: @ttddyy
http://bit.ly/ttddyy_spring-batch-intro
Agenda
- background
- architecture
- demo
- more concept
- summary
Batch Process?
- bulk process
- long running process
- mostly sequential
- one-time, daily, monthly, yearly, ...
"The lack of a standard, reusable batch architecture has resulted in the proliferation of many one-off, in-house solutions developed within client enterprise IT functions." - spring batch documentation
Spring Batch
- Accenture + SpringSource
  o Accenture: industry knowledge & experience
  o SpringSource: tech, Spring programming model
- Batch process infrastructure
  o transaction management, skip, repeat, job execution, etc.
  o POJO based
  o not a scheduler
- no need for a special environment
- flexibility of storage and algorithm
- reuse of existing Java libraries
Architecture decision: case by case
- mix both: a pre-process job feeding Hadoop, e.g. transform logs and push them to HDFS (Flume?)
Guideline, by size of data & computation:
- small to medium: Spring Batch
- very large: Hadoop or grid frameworks
Basic Architecture
ItemReader:
retrieve input data from a data source (file, database, queue, etc.)
ItemProcessor:
transform input data to output data
API (pseudo)
interface ItemReader<T> {
    T read();
}

interface ItemWriter<T> {
    void write(List<? extends T> items);
}
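How the three roles fit together can be sketched in plain Java. This is only an illustration: the interfaces below are hypothetical stand-ins that mirror the pseudo API above (they are not the Spring Batch types), and `ReadProcessWriteSketch`, `run`, and `demo` are made-up names. It assumes the reader signals end of input by returning null.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins mirroring the pseudo API; NOT the Spring Batch interfaces.
class ReadProcessWriteSketch {

    interface ItemReader<T> { T read(); }                       // assumed: null means "no more input"
    interface ItemProcessor<I, O> { O process(I item); }
    interface ItemWriter<T> { void write(List<? extends T> items); }

    // Reads every item, transforms it, and hands all results to the writer in one call.
    static <I, O> void run(ItemReader<I> reader, ItemProcessor<I, O> processor, ItemWriter<O> writer) {
        List<O> out = new ArrayList<>();
        for (I item = reader.read(); item != null; item = reader.read()) {
            out.add(processor.process(item));
        }
        writer.write(out);
    }

    // Wires an in-memory source and sink together and returns what was written.
    static List<String> demo() {
        Iterator<String> source = List.of("alice", "bob").iterator();
        List<String> sink = new ArrayList<>();

        ItemReader<String> reader = () -> source.hasNext() ? source.next() : null;
        ItemProcessor<String, String> processor = String::toUpperCase;
        ItemWriter<String> writer = sink::addAll;

        run(reader, processor, writer);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [ALICE, BOB]
    }
}
```

The business logic lives in three small objects; the loop that drives them is the part a batch framework would supply.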
ItemReaders, ItemWriters
DataSource:
- Flat File
- XML
- Database
- Message
- etc.
ItemReader:
- FlatFileItemReader
- JdbcCursorItemReader
- JdbcPagingItemReader
- HibernateCursorItemReader
- IbatisPagingItemReader
- JmsItemReader
- etc.
ItemWriter:
- FlatFileItemWriter
- HibernateItemWriter
- JdbcBatchItemWriter
- JpaItemWriter
- StaxEventItemWriter
- etc.
Demo
[Source] http://github.com/ttddyy/demo - spring-batch-intro
[Samples]
- SimpleApp
  o reader, processor, writer
- FlatfileApp
  o read from CSV, passing parameter, late binding, step scope
Step
Step: chunk, tasklet
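A chunk-oriented step reads and processes items one at a time but writes them in groups, whereas a tasklet step runs one unit of custom work. The chunk loop can be sketched in plain Java as below; `ChunkStepSketch`, `runChunks`, and the multiply-by-ten "processing" are hypothetical, and no Spring Batch classes are used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of chunk-oriented processing; not Spring Batch code.
class ChunkStepSketch {

    // Reads and processes items one by one, "writing" them in chunks of commitInterval items.
    // Returns the chunks in the order they were written.
    static List<List<Integer>> runChunks(Iterator<Integer> input, int commitInterval) {
        List<List<Integer>> written = new ArrayList<>();
        List<Integer> chunk = new ArrayList<>();
        while (input.hasNext()) {
            chunk.add(input.next() * 10);            // stand-in for the "process" phase
            if (chunk.size() == commitInterval) {
                written.add(chunk);                  // write the whole chunk; a real step would commit here
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);                      // final, possibly smaller, chunk
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(runChunks(List.of(1, 2, 3, 4, 5).iterator(), 2));
        // prints [[10, 20], [30, 40], [50]]
    }
}
```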
Job
- group of steps
- represents the entire batch process
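The idea of a job as an ordered group of steps can be sketched as follows. `JobSketch`, its `Step` interface, and the status strings are hypothetical simplifications, and the sketch assumes the simplest flow: steps run in sequence and the job stops at the first failing step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a job as an ordered group of steps; not Spring Batch code.
class JobSketch {

    interface Step { void execute() throws Exception; }

    // Runs the steps in insertion order, recording each outcome in the log.
    // Stops at the first failing step and reports the job as FAILED.
    static String run(Map<String, Step> steps, List<String> log) {
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            try {
                entry.getValue().execute();
                log.add(entry.getKey() + ":COMPLETED");
            } catch (Exception e) {
                log.add(entry.getKey() + ":FAILED");
                return "FAILED";
            }
        }
        return "COMPLETED";
    }

    public static void main(String[] args) {
        Map<String, Step> steps = new LinkedHashMap<>();
        steps.put("load", () -> {});
        steps.put("transform", () -> { throw new IllegalStateException("bad record"); });
        steps.put("export", () -> {});   // never reached in this run

        List<String> log = new ArrayList<>();
        System.out.println(run(steps, log) + " " + log);
        // prints FAILED [load:COMPLETED, transform:FAILED]
    }
}
```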
Job (cont.)
Job execution
JobLauncher, JobRepository
JobLauncher
- simple API to run job
- TaskExecutor
  o Synchronous
  o Asynchronous
JobRepository
- store job status/result
- database, in-memory
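The split between launching a job and recording its status can be sketched with the JDK's `Executor`, which here plays the role the slide gives to the TaskExecutor. `LauncherSketch`, `launch`, the `repository` map, and the status strings are hypothetical; the map stands in for an in-memory job repository.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a launcher plus an in-memory status store; not Spring Batch code.
class LauncherSketch {

    // In-memory "repository": job name -> last known status.
    static final Map<String, String> repository = new ConcurrentHashMap<>();

    // Records STARTED, then runs the job on the given executor and records the outcome.
    static void launch(String jobName, Runnable job, Executor executor) {
        repository.put(jobName, "STARTED");
        executor.execute(() -> {
            try {
                job.run();
                repository.put(jobName, "COMPLETED");
            } catch (RuntimeException e) {
                repository.put(jobName, "FAILED");
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        // Synchronous: the executor runs the job on the caller's thread.
        launch("sync-job", () -> {}, Runnable::run);

        // Asynchronous: launch() returns before the job has finished.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        launch("async-job", () -> {}, pool);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);

        System.out.println(repository);
    }
}
```

Swapping the executor changes how the job runs without touching the job itself, which is the point of keeping the launcher API small.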
more topics...
- parameter passing (JobParameters, ExecutionContext)
- "step" scope
- chunk processing
  o commit interval, repeat policy, skip policy
- listeners
  o ItemReadListener, ItemProcessListener, ItemWriteListener
  o StepExecutionListener, ChunkListener, SkipListener
- scaling
  o multi-thread, parallel, remote, partitioning
- web admin
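As one illustration of these topics, the idea behind a skip policy can be sketched in plain Java: tolerate a limited number of bad items instead of failing on the first one. `SkipSketch`, `parseAll`, and the `skipLimit` parameter are hypothetical, and the exact semantics shown (fail once more than `skipLimit` items are bad) are an assumption of this sketch rather than a statement about Spring Batch internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of skip-on-error with a limit; not Spring Batch code.
class SkipSketch {

    // Parses each line as an integer. Unparseable lines are recorded in `skipped`
    // and ignored until more than skipLimit of them have been seen, at which point
    // the whole run fails.
    static List<Integer> parseAll(List<String> lines, int skipLimit, List<String> skipped) {
        List<Integer> parsed = new ArrayList<>();
        for (String line : lines) {
            try {
                parsed.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                if (skipped.size() >= skipLimit) {
                    throw new IllegalStateException("skip limit exceeded at: " + line, e);
                }
                skipped.add(line);   // a skip listener would be notified at this point
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        List<Integer> parsed = parseAll(List.of("1", "x", "3"), 1, skipped);
        System.out.println(parsed + " skipped=" + skipped);
        // prints [1, 3] skipped=[x]
    }
}
```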
Summary
Spring Batch provides infrastructure
- focus on business logic
- POJO programming with DI
Lightweight
- easily embedded into existing applications
- reuse existing libraries
Simple, easy, and powerful
Reference
Spring Batch
- Project: http://static.springsource.org/spring-batch/
- Documentation: http://static.springsource.org/spring-batch/reference/html/index.html