Processing a large dataset in a plain for-loop leads to OOM errors and long-running transactions, and leaves no way to restart after a failure. Spring Batch provides chunk-oriented processing, restartability, and parallel steps.
JOptimize Team
When a business asks you to "send an email to all 2 million users" or "recalculate the scores for every order in the last year", the naive implementation is a service method that loops over everything. This approach: loads everything into memory (OOM), runs in one transaction (no recovery on failure), has no progress tracking, and blocks the application for hours.
Spring Batch is the right tool for this problem.
Job → Step(s) → Chunk (Read → Process → Write)

Job: A complete batch process (e.g., "monthly billing")
Step: One phase of a job (e.g., "read orders", "generate invoices", "send emails")
Chunk: Read N items, process them, write them, and repeat until the input is exhausted
Chunk-oriented processing is the key: read 100 rows, process them, write them, commit — then start the next chunk. If the job fails at chunk 5000, it restarts from chunk 5000, not from the beginning.
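The commit-per-chunk idea can be sketched in plain Java. This is an illustration only, not Spring Batch's implementation: the class `ChunkLoopSketch`, the `failAtIndex` parameter (a simulated failure), and the in-memory `sink` are all hypothetical. In Spring Batch the checkpoint is stored in the job repository rather than returned from a method.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of chunked processing with a restart checkpoint (not Spring Batch code). */
class ChunkLoopSketch {

    /**
     * Processes items starting at startIndex in chunks of chunkSize.
     * A chunk is appended to sink only when every item in it succeeded ("commit").
     * Returns the index after the last committed chunk, which is the restart point.
     * failAtIndex simulates a failure at that item index; pass -1 for no failure.
     */
    static int run(List<Integer> items, int startIndex, int chunkSize,
                   List<Integer> sink, int failAtIndex) {
        int committed = startIndex;
        while (committed < items.size()) {
            int end = Math.min(committed + chunkSize, items.size());
            List<Integer> chunk = new ArrayList<>();
            for (int i = committed; i < end; i++) {
                if (i == failAtIndex) {
                    return committed; // failure: the half-built chunk is discarded (rolled back)
                }
                chunk.add(items.get(i) * 10); // "process" step
            }
            sink.addAll(chunk); // "write" step, committed together with the checkpoint
            committed = end;
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7);
        List<Integer> sink = new ArrayList<>();

        int checkpoint = run(items, 0, 3, sink, 4);  // fails inside the second chunk
        System.out.println("checkpoint=" + checkpoint + " sink=" + sink);

        int done = run(items, checkpoint, 3, sink, -1); // restart from the checkpoint
        System.out.println("done=" + done + " sink=" + sink);
    }
}
```

In this run the first call commits only the first chunk and returns 3; the restart then picks up at index 3, so no item is written twice and none is lost.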
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
# application.properties

# Don't launch jobs on startup; trigger them explicitly.
# (Comments must sit on their own line: .properties files have no inline comments.)
spring.batch.job.enabled=false

# Create batch metadata tables
spring.batch.jdbc.initialize-schema=always
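import javax.sql.DataSource;

import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.transaction.PlatformTransactionManager;

// No @EnableBatchProcessing here: with Spring Boot 3 / Spring Batch 5 (the API used below),
// adding it makes Boot's batch auto-configuration back off, so the spring.batch.* properties
// would no longer be applied.
@Configuration
@RequiredArgsConstructor
public class OrderProcessingJobConfig {

    private final JobRepository jobRepository;
    private final PlatformTransactionManager transactionManager;
    private final DataSource dataSource;

    @Bean
    public Job orderProcessingJob(Step processOrdersStep) {
        return new JobBuilder("orderProcessingJob", jobRepository)
            .start(processOrdersStep)
            .build();
    }

    @Bean
    public Step processOrdersStep(
            JdbcCursorItemReader<Order> orderReader,
            OrderItemProcessor processor,
            JdbcBatchItemWriter<ProcessedOrder> orderWriter) {
        return new StepBuilder("processOrdersStep", jobRepository)
            .<Order, ProcessedOrder>chunk(500, transactionManager) // 500 items per chunk (one transaction each)
            .reader(orderReader)
            .processor(processor)
            .writer(orderWriter)
            .faultTolerant()
            .skip(InvalidOrderException.class) // Skip bad records
            .skipLimit(100)                    // But no more than 100; beyond that the step fails
            .build();
    }

    @Bean
    public JdbcCursorItemReader<Order> orderReader() {
        return new JdbcCursorItemReaderBuilder<Order>()
            .name("orderReader")
            .dataSource(dataSource)
            .sql("SELECT * FROM orders WHERE status = 'PENDING' ORDER BY id")
            .rowMapper(new BeanPropertyRowMapper<>(Order.class))
            .fetchSize(500)   // Hint to the driver: stream 500 rows at a time instead of buffering everything
            // The writer flips status away from PENDING, so the query itself tracks progress.
            // Saving the reader's row count as well would make a restart jump over unprocessed rows.
            .saveState(false)
            .build();
    }

    @Bean
    public JdbcBatchItemWriter<ProcessedOrder> orderWriter() {
        return new JdbcBatchItemWriterBuilder<ProcessedOrder>()
            .dataSource(dataSource)
            .sql("UPDATE orders SET status = :status, processed_at = :processedAt WHERE id = :id")
            .beanMapped() // Bind :status, :processedAt, :id from ProcessedOrder properties
            .build();
    }
}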
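import java.math.BigDecimal;
import java.time.LocalDateTime;

import lombok.RequiredArgsConstructor;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component
@RequiredArgsConstructor // Generates the constructor that injects the final pricingService field
public class OrderItemProcessor implements ItemProcessor<Order, ProcessedOrder> {

    private final PricingService pricingService;

    @Override
    public ProcessedOrder process(Order order) throws Exception {
        if (order.getTotal().compareTo(BigDecimal.ZERO) <= 0) {
            // Skipped (not retried) due to .skip(InvalidOrderException.class) on the step
            throw new InvalidOrderException("Order with zero or negative total: " + order.getId());
        }

        BigDecimal taxAmount = pricingService.calculateTax(order);

        return new ProcessedOrder(
            order.getId(),
            order.getTotal().add(taxAmount),
            OrderStatus.PROCESSED,
            LocalDateTime.now()
        );
    }
}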
Return null from process() to filter out an item (it won't be written).
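The difference between filtering (returning null) and skipping (throwing a skippable exception, counted against a limit) can be sketched without Spring. This is a simplified model under stated assumptions, not the framework's code: `SkipAndFilterSketch`, `InvalidItemException`, and `processAll` are hypothetical names, and the real framework applies the skip limit per step and fails the step with its own exception type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Plain-Java model of "filter on null" versus "skip on exception, up to a limit". */
class SkipAndFilterSketch {

    /** Stand-in for a skippable exception such as InvalidOrderException. */
    static class InvalidItemException extends RuntimeException {
        InvalidItemException(String message) {
            super(message);
        }
    }

    /**
     * Runs the processor over all items and returns what would be handed to the writer.
     * A null result filters the item silently. An InvalidItemException skips the item
     * and counts against skipLimit; exceeding the limit aborts the whole run.
     */
    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor, int skipLimit) {
        List<O> toWrite = new ArrayList<>();
        int skipped = 0;
        for (I item : items) {
            try {
                O result = processor.apply(item);
                if (result != null) {
                    toWrite.add(result); // only non-null results reach the writer
                }
            } catch (InvalidItemException e) {
                skipped++;
                if (skipped > skipLimit) {
                    throw new IllegalStateException("Skip limit of " + skipLimit + " exceeded", e);
                }
            }
        }
        return toWrite;
    }

    public static void main(String[] args) {
        Function<Integer, Integer> processor = n -> {
            if (n < 0) {
                throw new InvalidItemException("negative value: " + n);
            }
            return n == 0 ? null : n * 2; // zero is filtered, not an error
        };
        // 0 is filtered, -1 is skipped (1 skip allowed), 5 and 8 are written
        System.out.println(processAll(List.of(5, 0, -1, 8), processor, 1));
    }
}
```

Filtering suits items that are valid but irrelevant to this job; skipping suits items that are broken, where the count of broken items should be bounded and visible.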
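// Additional imports for the split flow
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Job parallelProcessingJob(
        Step readOrdersStep,
        Step sendEmailsStep,
        Step generateReportsStep) {

    // Steps 1 and 2 run in parallel, then step 3 runs after both complete
    Flow parallelFlow = new FlowBuilder<SimpleFlow>("parallelFlow")
        .split(new SimpleAsyncTaskExecutor()) // One new thread per parallel flow
        .add(
            new FlowBuilder<SimpleFlow>("ordersFlow").start(readOrdersStep).build(),
            new FlowBuilder<SimpleFlow>("emailsFlow").start(sendEmailsStep).build()
        )
        .build();

    return new JobBuilder("parallelProcessingJob", jobRepository)
        .start(parallelFlow)
        .next(generateReportsStep) // Runs after both parallel steps complete
        .end()                     // Closes the flow definition (synonym for build() on the flow builder)
        .build();                  // Builds the Job
}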
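The ordering that the split flow above is meant to guarantee (two branches concurrently, then a third step once both have finished) can be modelled with `CompletableFuture`. This is only an analogy in plain Java, not how Spring Batch schedules flows; `SplitFlowSketch` and the step names in the log are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

/** Plain-Java analogy for "run two flows in parallel, then continue with a third step". */
class SplitFlowSketch {

    static List<String> run() {
        List<String> log = new CopyOnWriteArrayList<>(); // thread-safe: both branches append to it

        // The two branches start concurrently; their relative order is not defined.
        CompletableFuture<Void> orders = CompletableFuture.runAsync(() -> log.add("orders"));
        CompletableFuture<Void> emails = CompletableFuture.runAsync(() -> log.add("emails"));

        // Join point: block until both branches have completed.
        CompletableFuture.allOf(orders, emails).join();

        // Only now does the follow-up step run.
        log.add("reports");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Whichever branch finishes first, "reports" is always the last entry, which is the property the `.next(generateReportsStep)` call relies on.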
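// Additional imports for the partitioned step
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.PartitionHandler;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Bean
public Step partitionedOrderStep(
        PartitionHandler partitionHandler,
        Partitioner orderPartitioner) {
    return new StepBuilder("partitionedOrderStep", jobRepository)
        .partitioner("workerStep", orderPartitioner)
        .partitionHandler(partitionHandler)
        .build();
}

// OrderRepository is assumed here: any bean exposing findMinId() and findMaxId() will do.
@Bean
public Partitioner orderPartitioner(OrderRepository orderRepo) {
    return gridSize -> {
        // Split order IDs into gridSize contiguous ranges
        Map<String, ExecutionContext> partitions = new HashMap<>();
        long minId = orderRepo.findMinId();
        long maxId = orderRepo.findMaxId();
        long rangeSize = (maxId - minId) / gridSize + 1;

        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putLong("minId", minId + (i * rangeSize));
            context.putLong("maxId", minId + ((i + 1) * rangeSize) - 1);
            partitions.put("partition" + i, context);
        }
        return partitions;
    };
}

// workerStep (not shown) is an ordinary chunk step whose reader limits itself
// to the minId..maxId range found in its step execution context.
@Bean
public PartitionHandler partitionHandler(Step workerStep) {
    // setTaskExecutor expects a Spring TaskExecutor, not a java.util.concurrent.ExecutorService
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(8);
    taskExecutor.setMaxPoolSize(8);
    taskExecutor.initialize();

    TaskExecutorPartitionHandler handler = new TaskExecutorPartitionHandler();
    handler.setTaskExecutor(taskExecutor);
    handler.setStep(workerStep);
    handler.setGridSize(8); // 8 parallel partitions
    return handler;
}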
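With 8 partitions on a 10M-row table, each partition processes ~1.25M rows, assuming order IDs are spread evenly across the range. As long as the database and the thread pool can sustain 8 concurrent workers, total time ≈ time for the slowest partition.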
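The range arithmetic behind that figure can be checked in isolation. The sketch below reuses the same formula as the partitioner, with one assumed refinement: the upper bound is clamped to maxId. `PartitionRangeSketch` and `split` are hypothetical names, and each range is returned as a two-element `long[]` of {minId, maxId}.

```java
import java.util.ArrayList;
import java.util.List;

/** Computes contiguous ID ranges for a given grid size (plain Java, no Spring). */
class PartitionRangeSketch {

    /** Returns gridSize ranges as {from, to} pairs, both bounds inclusive. */
    static List<long[]> split(long minId, long maxId, int gridSize) {
        long rangeSize = (maxId - minId) / gridSize + 1;
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < gridSize; i++) {
            long from = minId + i * rangeSize;
            // Clamp so the last range never runs past maxId.
            // On very small tables trailing ranges can be empty (from > to).
            long to = Math.min(from + rangeSize - 1, maxId);
            ranges.add(new long[] {from, to});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : split(1, 10_000_000, 8)) {
            System.out.println(range[0] + " .. " + range[1]);
        }
    }
}
```

For IDs 1 through 10,000,000 and a grid size of 8, rangeSize works out to 1,250,000, so the first range is 1 .. 1250000 and the last is 8750001 .. 10000000. If IDs have large gaps, ranges of equal width will hold unequal row counts, and the busiest partition determines the total run time.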
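import java.time.LocalDateTime;

import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// @Scheduled only fires if scheduling is enabled (@EnableScheduling on a configuration class).
@Component
@RequiredArgsConstructor
public class BatchJobScheduler {

    private final JobLauncher jobLauncher;
    private final Job orderProcessingJob;

    @Scheduled(cron = "0 0 2 * * *") // Every day at 2 AM
    public void runDailyProcessing() throws Exception {
        // Unique parameter per run: every launch creates a new job instance.
        // To restart a failed run, relaunch it with that run's original parameters.
        JobParameters params = new JobParametersBuilder()
            .addLocalDateTime("runAt", LocalDateTime.now())
            .toJobParameters();

        jobLauncher.run(orderProcessingJob, params);
    }
}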
fetchSize on the reader: without it, depending on the JDBC driver, JdbcCursorItemReader may buffer the entire result set in memory before processing starts; set fetchSize explicitly, typically to the chunk size.
skip() for known validation errors; for transient errors (timeouts, deadlocks), prefer retry() with a retry limit.
spring.batch.job.enabled defaults to true, which launches jobs every time the app starts; disable it and trigger jobs explicitly.

Spring Batch processes large datasets correctly via chunk-oriented processing: read N items, process them, write them, commit, then move to the next chunk. This bounds memory usage, enables restart from the point of failure, supports skip/retry policies for bad records, and can be parallelized via partitioning. For any batch operation larger than a few thousand records, Spring Batch is the correct tool.
JOptimize flags findAll() calls in scheduled tasks, unbounded loops processing database records, and single-transaction bulk operations that should be Spring Batch jobs.