Spring Batch: Process Millions of Records Without Running Out of Memory (2026)

When a business asks you to "send an email to all 2 million users" or "recalculate the scores for every order in the last year", the naive implementation is a service method that loops over everything. That approach loads every record into memory (risking an OutOfMemoryError), runs in a single transaction (so a failure loses all progress), has no progress tracking, and can tie up the application for hours.

Spring Batch is the right tool for this problem.

Core Concepts

Job → Step(s) → Chunk (Read → Process → Write)

Job:    A complete batch process (e.g., "monthly billing")
Step:   One phase of a job (e.g., "read orders", "generate invoices", "send emails")
Chunk:  Read N items, process them, write them — repeat until exhausted

Chunk-oriented processing is the key: read 100 rows, process them, write them, commit, then start the next chunk. If the job fails at chunk 5000, the first 4,999 chunks are already committed, so a restart of the same job instance (same job parameters) resumes at chunk 5000, not from the beginning.
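To make the restart behaviour concrete, here is a minimal plain-Java sketch of the chunk loop. It is not how Spring Batch is implemented internally: the class ChunkLoopSketch, its run method, and the in-memory list standing in for the database are all hypothetical, chosen only to show why committing per chunk lets a restart pick up where the last commit left off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of chunk-oriented processing; not Spring Batch code. */
final class ChunkLoopSketch {

    /** Stand-in for the batch metadata tables: how many items have been committed. */
    static int committedCount = 0;

    /** Stand-in for the target table. */
    static final List<String> written = new ArrayList<>();

    /**
     * Processes source in chunks, starting after the last committed item.
     * failAtChunk simulates a crash while writing that chunk (use -1 for no failure).
     */
    static void run(List<Integer> source, int chunkSize,
                    Function<Integer, String> processor, int failAtChunk) {
        int chunkIndex = committedCount / chunkSize;
        while (committedCount < source.size()) {
            int end = Math.min(committedCount + chunkSize, source.size());

            // Read + process: only one chunk is held in memory at a time.
            List<String> buffer = new ArrayList<>();
            for (Integer item : source.subList(committedCount, end)) {
                buffer.add(processor.apply(item));
            }

            if (chunkIndex == failAtChunk) {
                // Nothing from this chunk was written, so the "transaction" rolls back.
                throw new IllegalStateException("simulated failure in chunk " + chunkIndex);
            }

            // Write + commit: the output and the restart position advance together.
            written.addAll(buffer);
            committedCount = end;
            chunkIndex++;
        }
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            source.add(i);
        }

        try {
            run(source, 3, i -> "order-" + i, 2);   // fails while handling the third chunk
        } catch (IllegalStateException e) {
            System.out.println("Failed after committing " + committedCount + " items");
        }

        run(source, 3, i -> "order-" + i, -1);      // restart resumes at item 7
        System.out.println("Total written: " + written.size());
    }
}
```

The second call to run does not reprocess items 1 to 6, because the committed position survived the failure. In Spring Batch that position lives in the job repository tables rather than in a static field.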

Setup

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>

# application.properties
# Don't launch jobs automatically on startup; trigger them explicitly
spring.batch.job.enabled=false
# Create the batch metadata tables
spring.batch.jdbc.initialize-schema=always

A Complete Job: Process Orders

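// @EnableBatchProcessing is omitted on purpose: with Spring Boot 3 it makes Boot's batch
// auto-configuration (including the spring.batch.* properties above) back off
@Configuration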
@RequiredArgsConstructor
public class OrderProcessingJobConfig {

    private final JobRepository jobRepository;
    private final PlatformTransactionManager transactionManager;
    private final DataSource dataSource;

    @Bean
    public Job orderProcessingJob(Step processOrdersStep) {
        return new JobBuilder("orderProcessingJob", jobRepository)
            .start(processOrdersStep)
            .build();
    }

    @Bean
    public Step processOrdersStep(
            JdbcCursorItemReader<Order> orderReader,
            OrderItemProcessor processor,
            JdbcBatchItemWriter<ProcessedOrder> orderWriter) {

        return new StepBuilder("processOrdersStep", jobRepository)
            .<Order, ProcessedOrder>chunk(500, transactionManager)  // 500 items per chunk
            .reader(orderReader)
            .processor(processor)
            .writer(orderWriter)
            .faultTolerant()
                .skip(InvalidOrderException.class)   // Skip bad records
                .skipLimit(100)                       // But no more than 100
            .build();
    }

    @Bean
    public JdbcCursorItemReader<Order> orderReader() {
        return new JdbcCursorItemReaderBuilder<Order>()
            .name("orderReader")
            .dataSource(dataSource)
            .sql("SELECT * FROM orders WHERE status = 'PENDING' ORDER BY id")
            .rowMapper(new BeanPropertyRowMapper<>(Order.class))
            .fetchSize(500)   // Hint to the JDBC driver: fetch about 500 rows per round trip instead of everything at once
            .build();
    }

    @Bean
    public JdbcBatchItemWriter<ProcessedOrder> orderWriter() {
        return new JdbcBatchItemWriterBuilder<ProcessedOrder>()
            .dataSource(dataSource)
            .sql("UPDATE orders SET status = :status, processed_at = :processedAt WHERE id = :id")
            .beanMapped()
            .build();
    }
}

Item Processor

@Component
@RequiredArgsConstructor   // Lombok generates the constructor that injects pricingService
public class OrderItemProcessor implements ItemProcessor<Order, ProcessedOrder> {

    private final PricingService pricingService;

    @Override
    public ProcessedOrder process(Order order) throws Exception {
        if (order.getTotal().compareTo(BigDecimal.ZERO) <= 0) {
            // Skipped (not retried) because the step declares .skip(InvalidOrderException.class)
            throw new InvalidOrderException("Non-positive order total: " + order.getId());
        }

        BigDecimal taxAmount = pricingService.calculateTax(order);
        return new ProcessedOrder(
            order.getId(),
            order.getTotal().add(taxAmount),
            OrderStatus.PROCESSED,
            LocalDateTime.now()
        );
    }
}

Return null from process() to filter out an item (it won't be written).
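Filtering and skipping are easy to confuse, so here is a small plain-Java model of the two outcomes. It is a sketch under loud assumptions: SkipFilterSketch, Result, and process are hypothetical names, IllegalArgumentException stands in for InvalidOrderException, and the real framework's rollback-and-rescan behaviour on a skip is deliberately left out. It only shows how null results, thrown exceptions, and the skip limit are counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical model of skip and filter outcomes; not Spring Batch's real control flow. */
final class SkipFilterSketch {

    /** Outcome counters, loosely mirroring the per-step counts Spring Batch records. */
    static final class Result {
        final List<String> written = new ArrayList<>();
        int filtered = 0;
        int skipped = 0;
    }

    static Result process(List<Integer> totals, Function<Integer, String> processor, int skipLimit) {
        Result result = new Result();
        for (Integer total : totals) {
            try {
                String output = processor.apply(total);
                if (output == null) {
                    result.filtered++;              // null means "drop quietly", not an error
                } else {
                    result.written.add(output);
                }
            } catch (IllegalArgumentException e) {  // stands in for InvalidOrderException
                result.skipped++;
                if (result.skipped > skipLimit) {
                    throw new IllegalStateException("skip limit of " + skipLimit + " exceeded", e);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Function<Integer, String> processor = total -> {
            if (total <= 0) {
                throw new IllegalArgumentException("Non-positive order total");
            }
            if (total < 10) {
                return null;                        // e.g. too small to be worth processing
            }
            return "processed:" + total;
        };

        Result result = process(List.of(50, 0, 5, 120, -3, 75), processor, 100);
        System.out.println("written=" + result.written.size()
            + " filtered=" + result.filtered
            + " skipped=" + result.skipped);
    }
}
```

Filtered items never count against the skip limit; only exceptions of a skippable type do, and one skip beyond the limit fails the step.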

Parallel Steps

@Bean
public Job parallelProcessingJob(
        Step readOrdersStep,
        Step sendEmailsStep,
        Step generateReportsStep) {

    // Steps 1 and 2 run in parallel, then step 3 runs after both complete
    Flow parallelFlow = new FlowBuilder<SimpleFlow>("parallelFlow")
        .split(new SimpleAsyncTaskExecutor())
        .add(
            new FlowBuilder<SimpleFlow>("ordersFlow").start(readOrdersStep).build(),
            new FlowBuilder<SimpleFlow>("emailsFlow").start(sendEmailsStep).build()
        )
        .build();

    return new JobBuilder("parallelProcessingJob", jobRepository)
        .start(parallelFlow)
        .next(generateReportsStep)   // Runs after both parallel steps complete
        .end()                       // Finish the flow definition
        .build();                    // Build the Job
}

Partitioned Processing: Scale Across Threads

@Bean
public Step partitionedOrderStep(
        PartitionHandler partitionHandler,
        Partitioner orderPartitioner) {

    return new StepBuilder("partitionedOrderStep", jobRepository)
        .partitioner("workerStep", orderPartitioner)
        .partitionHandler(partitionHandler)
        .build();
}

@Bean
public Partitioner orderPartitioner() {
    return gridSize -> {
        // Split order IDs into gridSize ranges (orderRepo is a repository injected elsewhere, not shown)
        Map<String, ExecutionContext> partitions = new HashMap<>();
        long minId = orderRepo.findMinId();
        long maxId = orderRepo.findMaxId();
        long rangeSize = (maxId - minId) / gridSize + 1;

        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putLong("minId", minId + (i * rangeSize));
            context.putLong("maxId", minId + ((i + 1) * rangeSize) - 1);
            partitions.put("partition" + i, context);
        }
        return partitions;
    };
}

@Bean
public PartitionHandler partitionHandler(Step workerStep) {
    TaskExecutorPartitionHandler handler = new TaskExecutorPartitionHandler();
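    // setTaskExecutor expects a Spring TaskExecutor, not a java.util.concurrent.ExecutorService
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(8);
    taskExecutor.setMaxPoolSize(8);
    taskExecutor.initialize();
    handler.setTaskExecutor(taskExecutor);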
    handler.setStep(workerStep);
    handler.setGridSize(8);   // 8 parallel partitions
    return handler;
}

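With 8 partitions on a 10M-row table, each partition processes roughly 1.25M rows, provided the IDs are spread evenly across the range. In the best case total time approaches the time for one partition; in practice the database, the connection pool, and the number of CPU cores limit the speedup. The worker step's reader must be step-scoped (@StepScope) and take minId and maxId from the step execution context, for example via #{stepExecutionContext['minId']}, so that each partition queries only its own range.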
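The range arithmetic in the partitioner is worth checking by hand. The plain-Java sketch below reuses the same formula and prints the eight ranges for IDs 1 to 10,000,000. PartitionRangeSketch, Range, and split are hypothetical names, and the sketch adds one thing the partitioner above does not do: it clamps each range's upper bound to the real maximum ID.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical worked example of the ID-range arithmetic; not Spring Batch code. */
final class PartitionRangeSketch {

    /** Inclusive ID range assigned to one partition. */
    record Range(long minId, long maxId) { }

    static List<Range> split(long minId, long maxId, int gridSize) {
        // Same formula as the partitioner: integer division, plus one to cover the remainder.
        long rangeSize = (maxId - minId) / gridSize + 1;
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < gridSize; i++) {
            long start = minId + i * rangeSize;
            // Clamp so no partition claims IDs beyond the real maximum.
            long end = Math.min(start + rangeSize - 1, maxId);
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range range : split(1, 10_000_000, 8)) {
            System.out.println(range.minId() + " .. " + range.maxId());
        }
    }
}
```

Here rangeSize works out to 1,250,000, so the ranges run 1 to 1,250,000, 1,250,001 to 2,500,000, and so on up to 10,000,000. If the IDs have large gaps, equal ID ranges will not contain equal row counts, and some partitions will finish well before others.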

Scheduling and Triggering Jobs

@Component
@RequiredArgsConstructor
public class BatchJobScheduler {

    private final JobLauncher jobLauncher;
    private final Job orderProcessingJob;

    @Scheduled(cron = "0 0 2 * * *")  // Every day at 2 AM
    public void runDailyProcessing() throws Exception {
        JobParameters params = new JobParametersBuilder()
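            // Unique param per run: every launch becomes a new job instance.
            // To restart a failed run instead, relaunch with that run's original parameters.
            .addLocalDateTime("runAt", LocalDateTime.now())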
            .toJobParameters();

        jobLauncher.run(orderProcessingJob, params);
    }
}

Common Mistakes to Avoid

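Not setting fetchSize on the reader: without it, how many rows are buffered is left to the JDBC driver, and some drivers (MySQL Connector/J by default, for example) pull the entire result set into memory; set fetchSize explicitly, typically to the chunk size, and check your driver's documentation, because some drivers need extra settings before they stream
Too-large chunk size: a chunk of 10,000 rows holds a transaction open for longer, and if the write fails, all 10,000 are rolled back; 100-500 is a common starting point, to be tuned by measurement
No skip/retry policy: one bad record stops the entire job; configure .skip() for known validation errors and .retry() for transient ones
Running jobs on application startup: spring.batch.job.enabled defaults to true, so Spring Boot launches the job every time the app starts (with several jobs defined, Boot 3 requires spring.batch.job.name to pick one); disable it and trigger jobs explicitly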
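
The chunk-size trade-off above comes down to simple arithmetic: larger chunks mean fewer commits, but more work thrown away when one write fails. The helper below is a hypothetical illustration (ChunkSizeMath and commits are not part of Spring Batch) using the 2 million records from the introduction.

```java
/** Hypothetical arithmetic helper for comparing chunk sizes; not part of Spring Batch. */
final class ChunkSizeMath {

    /** Number of transactions (commits) needed to process totalItems. */
    static long commits(long totalItems, int chunkSize) {
        return (totalItems + chunkSize - 1) / chunkSize;   // ceiling division
    }

    public static void main(String[] args) {
        long totalItems = 2_000_000L;
        for (int chunkSize : new int[] {100, 500, 10_000}) {
            System.out.println("chunk=" + chunkSize
                + " commits=" + commits(totalItems, chunkSize)
                + " maxRowsRolledBackPerFailure=" + chunkSize);
        }
    }
}
```

For 2 million rows this gives 20,000 commits at a chunk size of 100, 4,000 at 500, and 200 at 10,000. Which point on that curve is best depends on how expensive a commit is in your database and how often writes fail, so treat the numbers as a starting point for measurement.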

Summary

Spring Batch handles large datasets through chunk-oriented processing: read N items, process them, write them, commit, then move to the next chunk. This bounds memory usage, lets a failed job restart from its last committed chunk, supports skip/retry policies for bad records, and can be parallelized via partitioning. For batch operations beyond a few thousand records, Spring Batch is usually the right tool.

Detect Batch Processing Anti-Patterns

JOptimize flags findAll() calls in scheduled tasks, unbounded loops processing database records, and single-transaction bulk operations that should be Spring Batch jobs.
