Easy Batch and Spring Batch fundamentally try to solve the same problem: how to efficiently process large amounts of data in batch mode. But they are conceptually different at several levels:
Job structure:
A job in Spring Batch is a collection of steps. A step can be a simple task (a tasklet) or chunk-oriented. In Easy Batch, there is no concept of a step. A job in Easy Batch is similar to a Spring Batch job with a single chunk-oriented step (and an in-memory job repository).
Job definition:
Spring Batch provides a DSL to define the execution flow of steps within a job. In Easy Batch, there is no such DSL. Creating a workflow of jobs is left to an external workflow engine like Easy Flows.
Job execution:
A Spring Batch job can have multiple job instances (identified by identifying job parameters). Each job instance may in turn have multiple executions. In Easy Batch, there are no such concepts of job instances or job executions. Jobs are Callable objects that can be executed with a JobExecutor or an ExecutorService (a small example follows below).
In Spring Batch, the execution state of jobs is persisted by default in a job repository, which can be map-based (in memory) or backed by a database. This is not the case for Easy Batch: by default, jobs are executed without persisting their state in a persistent store (but it is possible to do so using listeners).
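As a small illustration of the last point, here is a sketch of submitting a job to a plain ExecutorService. It uses the Easy Batch 5.x API (org.jeasy.batch packages); the exact class names and generics may differ across versions, so treat this as an assumption rather than the canonical API:

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.jeasy.batch.core.job.Job;
import org.jeasy.batch.core.job.JobBuilder;
import org.jeasy.batch.core.job.JobReport;
import org.jeasy.batch.core.reader.IterableRecordReader;
import org.jeasy.batch.core.writer.StandardOutputRecordWriter;

public class SubmitJobExample {

    public static void main(String[] args) throws Exception {
        // A trivial job that copies two records to the standard output.
        Job job = new JobBuilder<String, String>()
                .named("hello-job")
                .reader(new IterableRecordReader<>(Arrays.asList("hello", "world")))
                .writer(new StandardOutputRecordWriter<>())
                .build();

        // A Job is a Callable<JobReport>, so any ExecutorService can run it.
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        Future<JobReport> report = executorService.submit(job);
        System.out.println(report.get());
        executorService.shutdown();
    }
}
```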
This quick comparison should give you an overview of conceptual differences. There is nothing wrong with both frameworks, they just have different design choices and defaults:
Spring Batch is designed for large scale jobs. Restarting such jobs from scratch is not efficient. Hence, persisting the job state by default to restart it where it left off in case of failure makes perfect sense.
Easy Batch is targeted at small and simple ETL jobs. These jobs are in most cases idempotent (or can at least be designed to be idempotent). Such jobs can be restarted from scratch without any problem if they fail. The design choice of not persisting the job state by default makes sense in these cases.
Based on these conceptual differences, comparing Easy Batch and Spring Batch across the board would be unfair. The only comparison that makes sense is the use case where an Easy Batch job is compared to a Spring Batch job with a single chunk-oriented step and an in-memory job repository. And this is what I'm going to use in this post.
The goal of this post is to compare both frameworks in terms of features with a practical example. Since I am the author of Easy Batch, you may think the comparison will be biased. It will not be! I am going to be objective; I am always a constructive person, and I am a big fan of the Spring Framework and all related projects. My goal is not to say that Easy Batch is better than Spring Batch or vice versa; the goal is to say in which situation it is better to use one framework over the other. If you want the short answer to which framework is better, here it is: Spring Batch is better! (See my opinion in the conclusion of this post.)
The use case will be reading some tweets from a flat file and printing them out in uppercase to the standard output. The data source is the following tweets.csv file:
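A hypothetical sample of the file (the content here is illustrative, not the original data):

```
1,foo,easy batch rocks! #EasyBatch
2,bar,@foo I do confirm :-)
```

Records will be mapped to the following domain object. This is a sketch; the field names (id, user, message) are assumptions matching the sample above:

```java
// Hypothetical Tweet domain object; field names are assumed, not taken from the original listing.
public class Tweet {

    private Long id;
    private String user;
    private String message;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getUser() { return user; }
    public void setUser(String user) { this.user = user; }

    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }

    @Override
    public String toString() {
        return "Tweet{id=" + id + ", user='" + user + "', message='" + message + "'}";
    }
}
```

Easy Batch implementation:

First, let's create a processor to transform tweets to uppercase. The sketch below uses the Easy Batch 5.x API (org.jeasy.batch packages); the API of the version available when this post was written may differ:

```java
import org.jeasy.batch.core.processor.RecordProcessor;
import org.jeasy.batch.core.record.Record;

// Uppercases the message of each tweet; the record itself is passed through unchanged.
public class TweetProcessor implements RecordProcessor<Tweet, Tweet> {

    @Override
    public Record<Tweet> processRecord(Record<Tweet> record) {
        Tweet tweet = record.getPayload();
        tweet.setMessage(tweet.getMessage().toUpperCase());
        return record;
    }
}
```

Then, configure a job and run the application (again a 5.x-style sketch; class names like FlatFileRecordReader, DelimitedRecordMapper and StandardOutputRecordWriter are the current module names and are assumptions here):

```java
import java.nio.file.Paths;

import org.jeasy.batch.core.job.Job;
import org.jeasy.batch.core.job.JobBuilder;
import org.jeasy.batch.core.job.JobExecutor;
import org.jeasy.batch.core.writer.StandardOutputRecordWriter;
import org.jeasy.batch.flatfile.DelimitedRecordMapper;
import org.jeasy.batch.flatfile.FlatFileRecordReader;

public class EasyBatchLauncher {

    public static void main(String[] args) {
        // Read tweets.csv, map each line to a Tweet, uppercase it and print it out.
        Job job = new JobBuilder<String, Tweet>()
                .reader(new FlatFileRecordReader(Paths.get("tweets.csv")))
                .mapper(new DelimitedRecordMapper<>(Tweet.class, "id", "user", "message"))
                .processor(new TweetProcessor())
                .writer(new StandardOutputRecordWriter<>())
                .build();

        JobExecutor jobExecutor = new JobExecutor();
        jobExecutor.execute(job);
        jobExecutor.shutdown();
    }
}
```

Spring Batch implementation:

First, let's create a processor to transform tweets to uppercase (a sketch of the equivalent ItemProcessor):

```java
import org.springframework.batch.item.ItemProcessor;

// Same transformation as the Easy Batch processor above.
public class TweetProcessor implements ItemProcessor<Tweet, Tweet> {

    @Override
    public Tweet process(Tweet tweet) {
        tweet.setMessage(tweet.getMessage().toUpperCase());
        return tweet;
    }
}
```

Then, create a writer (mandatory for a chunk-oriented step, even though we only print to the console):

```java
import java.util.List;

import org.springframework.batch.item.ItemWriter;

// Prints each processed tweet to the standard output.
public class TweetWriter implements ItemWriter<Tweet> {

    @Override
    public void write(List<? extends Tweet> items) {
        items.forEach(System.out::println);
    }
}
```

And finally, configure the application. The original example targeted Spring Batch 3.x (most likely with XML configuration); the sketch below is an equivalent Java configuration, not the original listing:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing // without a DataSource, Spring Batch 4.x falls back to an in-memory job repository
public class JobConfiguration {

    @Bean
    public FlatFileItemReader<Tweet> reader() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames("id", "user", "message");

        BeanWrapperFieldSetMapper<Tweet> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Tweet.class);

        DefaultLineMapper<Tweet> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<Tweet> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource("tweets.csv"));
        reader.setLineMapper(lineMapper);
        return reader;
    }

    @Bean
    public Step step(StepBuilderFactory steps) {
        return steps.get("step")
                .<Tweet, Tweet>chunk(10) // the commit-interval, required by the chunk-oriented step
                .reader(reader())
                .processor(new TweetProcessor())
                .writer(new TweetWriter())
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobs, Step step) {
        return jobs.get("job").start(step).build();
    }
}
```

Here is the class to launch the application with Spring Batch (again a sketch; the original example may have used CommandLineJobRunner or an XML-based bootstrap):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class SpringBatchLauncher {

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(JobConfiguration.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }
}
```

Comparison: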
As you can see, Spring Batch still requires configuring some technical stuff you might not really need, which is not the case for Easy Batch. And this is exactly what many people have been complaining about. Here are some examples:
"Spring Batch application grows pretty quick and involves configuring a lot of stuff that, at the outset, it just doesn't seem like you should need to configure. A "job repository" to track the status and history of job executions, which itself requires a data source - just to get started? Wow, that's a bit heavy handed"
"I got a little overwhelmed by the complexity and amount of configuration needed for even a simple example"
"What should we think of the Spring Batch solution? Complex. Obviously, it looks more complicated than the simple approaches. This is typical of a framework: the learning curve is steeper."
"il faut configurer le composant qui permet de lancer un batch, le « jobLauncher ». Simple, mais on voit que l’on a besoin d’un « jobRepository » qui permet de suivre et de reprendre l’avancement des tâches.
On voit que l’on a besoin d’un transaction manager. Cette propriété est obligatoire, ce qui est à mon sens dommage pour les cas simples comme le nôtre où nous n’utilisons pas les transactions."
Most of these posts are quite recent; a couple of them seem to be outdated, but the point still holds for the latest version of Spring Batch (v3.0.3 as of writing this post).
EDIT 02/10/2017: Most people are complaining about the complexity of configuration of Spring Batch jobs (I am not one of them, read more about this later in this post). This complexity is not relevant anymore today thanks to the amazing Spring Boot project and the @EnableBatchProcessing annotation.
These reactions from the community can be summarized in 3 points:
Steep learning curve
Complex configuration
Mandatory components that you have to configure but probably don't need
Personally, a steep learning curve is not a problem if it is worth it (and it is for Spring Batch!). Complex configuration is also a point that I can accept. But my concern is being forced to configure components I might not need:
If my application does not require transactions, why do I need to configure a transaction manager?
If my application does not need retry on failure or job history, why do I need to configure a Job Repository (even in memory)?
If my application does not write anything, why do I need to specify a writer?
If my application does not need chunk processing, why do I need to specify a commit-interval?
There is certainly a good reason for each of these components and I have tried to answer these questions according to my understanding of the framework's internals:
If my application does not require transactions, why do I need to configure a transaction manager?
If my application does not need retry on failure or job history, why do I need to configure a Job Repository (even in memory)?
Spring Batch persists the state of the job in a database to be able to restart it where it left off in case of failure. To persist the state of the job, a transaction manager and a job repository are required. But these could have been made optional by default for the case where there is no requirement to restart the job on failure.
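For illustration, here is a sketch of the kind of infrastructure one ends up declaring even for a job that uses no transactions and never needs to be restarted: an in-memory job repository backed by a "resourceless" transaction manager. The class names are from Spring Batch 3.x/4.x (they have since been deprecated), and the configuration style is an assumption, not a listing from this post:

```java
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean;
import org.springframework.batch.support.transaction.ResourcelessTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class InMemoryBatchInfrastructure {

    // A transaction manager that does nothing, declared only because the framework requires one.
    @Bean
    public ResourcelessTransactionManager transactionManager() {
        return new ResourcelessTransactionManager();
    }

    // A map-based (in-memory) job repository, declared only because the framework requires one.
    @Bean
    public JobRepository jobRepository(ResourcelessTransactionManager transactionManager) throws Exception {
        MapJobRepositoryFactoryBean factory = new MapJobRepositoryFactoryBean(transactionManager);
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}
```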
If my application does not write anything, why do I need to specify a writer?
What I am referring to is, for example, a batch job that counts the number of invalid records in a flat file. In this case, we don't write data anywhere (unless you count assigning a value to a variable as some kind of writing). Of course, one can use a NoOpItemWriter, but again, this is configuring a component we don't need.
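To make this concrete, here is the kind of no-op writer one has to declare just to satisfy the mandatory writer of a chunk-oriented step. As far as I know, Spring Batch core does not ship such a class, so this is a hand-rolled sketch:

```java
import java.util.List;

import org.springframework.batch.item.ItemWriter;

// A do-nothing writer, declared only because the chunk-oriented step requires one.
public class NoOpItemWriter<T> implements ItemWriter<T> {

    @Override
    public void write(List<? extends T> items) {
        // intentionally empty: the job only counts invalid records, nothing is written
    }
}
```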
If my application does not need chunk processing, why do I need to specify a commit-interval?
Coming from the Unix world, I am used to tools that are record-wise (sed, awk and friends). In some situations, we don't really need chunk processing. If I take the same example of a batch job that counts the number of invalid records in a flat file, providing a commit-interval makes no sense. But it makes perfect sense in other situations! If the job is to persist data to a database, it is wise to use chunk processing and specify a reasonable commit-interval for performance reasons (can you imagine committing a transaction for each record?). So, like the other components, the chunk-processing model could have been made optional by default (probably by providing a record-wise step implementation in addition to the chunk-oriented one and giving the user the choice).
To summarize, there is nothing wrong with these components, but I differ with the choice of defaults. Spring Batch is well suited for use cases where you really need advanced features like retry on failure, remote chunking, flows, etc. When such advanced features are not needed, in-house solutions are usually created from scratch (I have seen a lot of them). And this is where Easy Batch comes into play, as a middle ground between Spring Batch and the "Do It Yourself" way:
Easy Batch is probably easier to learn, configure and use, but this does not make it suitable for all use cases (which was not the goal in the first place). Here is a side by side comparison of features between both frameworks:
| Feature | Spring Batch | Easy Batch |
| --- | --- | --- |
| Learning curve | Steep | Small |
| POJO-based development | Yes | Yes |
| Parallel processing | Yes | Yes |
| Asynchronous processing | Yes | Yes |
| Real-time monitoring | Yes | Yes |
| Job configuration | Java, XML, Annotations | Java |
| Transaction management | Declarative, Programmatic | Declarative, Programmatic |
| Chunk processing | Yes | Yes |
| Chunk scanning | Yes | Yes |
| Fault-tolerance features | Yes | Yes |
| Job meta-data persistence | Yes | No |
| Remote job administration | Yes | No |
| Remote partitioning | Yes | No |
| Remote chunking | Yes | No |
| Implements JSR 352 | Yes | No |
There is no doubt that Spring Batch is ahead of Easy Batch in terms of features. But this comes at a cost: a complex configuration and a steep learning curve. As always, it is a trade-off: you can't have your cake and eat it too 😄 The goal of Easy Batch is to keep the framework small and easy, but at the same time extensible and flexible, with smart defaults that cover the majority of use cases.
Conclusion:
I hope this post gives you some insights on both frameworks to help you choose which one to use in which situation. But in the end, the choice should be pragmatic: choose the right tool for the right job! If your application requires advanced features like retry on failure, remoting or flows, then go for Spring Batch (or another implementation of JSR 352). If you don't need all this advanced stuff, then Easy Batch can be very handy to simplify your batch application development (a real world example can be found here).
Let me conclude with my honest opinion about both frameworks: Spring Batch is better! Easy Batch is easier. Spring Batch is made by smart people working full time on the framework. Easy Batch, on the other hand, is made by an open source hacker working on the framework during nights and weekends (with the help of some great contributors). It is not in the same league. Easy Batch will always be the little brother of Spring Batch 😄