API Rate Limiting in Spring Boot: Protect Your Services from Abuse (2026)

Every public API eventually gets abused. A scraper hammers your product listing endpoint at 500 requests per second. A bug in a client application sends the same request in an infinite loop. A free-tier user discovers they can run a script that costs your infrastructure $500 a day. Without rate limiting, these scenarios take down your service or run up your cloud bill.

But rate limiting done wrong creates its own problems: legitimate users get blocked, SLAs are violated, and your customer support queue fills with "why is your API returning 429?" tickets. Good rate limiting is granular, transparent, and tuned to real usage patterns.

Choosing the Right Algorithm

There are three common rate limiting algorithms, each with different behavior:

Token Bucket (recommended for most APIs): Users accumulate tokens at a steady rate up to a maximum bucket size. Each request consumes a token. If the bucket is empty, the request is rejected. This allows short bursts (consuming saved tokens) while maintaining an average rate limit. It's the most user-friendly algorithm.
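
To make the mechanics concrete, here is a minimal in-memory sketch of a token bucket. It is not how Bucket4j is implemented; the class name `SimpleTokenBucket` and its constructor parameters are made up for illustration, and the clock is injected only so the behaviour can be checked deterministically.

```java
import java.util.function.LongSupplier;

/** Illustrative token bucket: starts full, refills one token per fixed interval. */
final class SimpleTokenBucket {
    private final long capacity;
    private final long nanosPerToken;
    private final LongSupplier nanoClock;
    private long tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(long capacity, long refillTokens, long refillPeriodNanos, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.nanosPerToken = Math.max(1, refillPeriodNanos / refillTokens);
        this.nanoClock = nanoClock;
        this.tokens = capacity;                      // a new client may burst immediately
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Consumes one token if available; returns false when the bucket is empty. */
    synchronized boolean tryConsume() {
        refill();
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    private void refill() {
        long elapsed = nanoClock.getAsLong() - lastRefillNanos;
        long newTokens = elapsed / nanosPerToken;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            // Advance only by whole token intervals so the remainder counts toward the next token
            lastRefillNanos += newTokens * nanosPerToken;
        }
    }
}
```

With a capacity of 3 and a refill of 60 tokens per minute, a client can send three requests back to back, is then rejected, and earns one more request for every second it waits.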

Fixed Window Counter: Count requests in a fixed time window (e.g., 100 requests per minute). Simple to implement and explain, but has an edge case: a user can make 100 requests at second 59, and 100 more at second 61 — getting 200 requests in 2 seconds while technically staying within the limit.
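
The boundary problem is easy to reproduce. The sketch below is a hypothetical single-client counter (the name `FixedWindowCounter` is illustrative) that takes the current time as a parameter, so the two bursts on either side of the window boundary can be replayed exactly.

```java
/** Illustrative fixed window counter for one client; time is passed in explicitly. */
final class FixedWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long currentWindow = -1;
    private int count;

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;   // index of the window this instant falls into
        if (window != currentWindow) {            // a new window starts with a fresh counter
            currentWindow = window;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }
}
```

With a limit of 100 per 60,000 ms, 100 calls at t = 59,000 ms and 100 more at t = 61,000 ms are all accepted, because the counter resets at t = 60,000 ms.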

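Sliding Window: Counts requests over a rolling window that ends at the current moment, either by keeping a log of request timestamps or by weighting the previous fixed window's count. This eliminates the fixed window burst problem. It is more accurate, but costs more memory or computation per request.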
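
One way to implement a sliding window is a timestamp log, sketched below under the assumption of a single client and an explicitly supplied clock value. `SlidingWindowLog` is an illustrative name; the approach stores up to `limit` timestamps per client, which is where the extra cost comes from.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sliding window log: remembers the timestamps of accepted requests. */
final class SlidingWindowLog {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLog(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Forget requests that are no longer inside the window (now - window, now]
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(nowMillis);
            return true;
        }
        return false;
    }
}
```

Replaying the same scenario as for the fixed window (100 requests at t = 59,000 ms, then more at t = 61,000 ms) shows the second burst being rejected until the first one has aged out of the window.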

For most Spring Boot APIs, the token bucket algorithm (implemented by Bucket4j) provides the best balance of user experience and protection.

Setup: Bucket4j with Redis

Bucket4j is the de facto rate limiting library for Java. Combined with Redis for distributed state, it works correctly across multiple application instances:

<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-redis</artifactId>
    <version>8.10.1</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

Implementing Rate Limiting as a Filter

The cleanest way to add rate limiting is as a servlet filter that intercepts all requests before they reach controllers:

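import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import lombok.RequiredArgsConstructor;
import org.springframework.core.Ordered;
import org.springframework.core.annotation.Order;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
@RequiredArgsConstructor
// Must run after the filter that sets the "userId" request attribute. With
// Ordered.HIGHEST_PRECEDENCE this filter would run before authentication and
// never see an authenticated user. Adjust the value to your auth filter's order.
@Order(Ordered.LOWEST_PRECEDENCE)
public class RateLimitFilter extends OncePerRequestFilter {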

    private final RateLimitService rateLimitService;

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws IOException, ServletException {

        String clientKey = resolveClientKey(request);
        RateLimitResult result = rateLimitService.tryConsume(clientKey);

        // Always include rate limit headers — clients need this information
        response.addHeader("X-RateLimit-Limit", String.valueOf(result.limit()));
        response.addHeader("X-RateLimit-Remaining", String.valueOf(result.remaining()));
        response.addHeader("X-RateLimit-Reset", String.valueOf(result.resetTimeEpochSeconds()));

        if (result.allowed()) {
            chain.doFilter(request, response);
        } else {
            response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
            response.setContentType(MediaType.APPLICATION_JSON_VALUE);
            response.addHeader("Retry-After", String.valueOf(result.retryAfterSeconds()));
            response.getWriter().write("""
                {
                  "error": "RATE_LIMIT_EXCEEDED",
                  "message": "Too many requests. Please wait %d seconds before retrying.",
                  "retryAfter": %d
                }""".formatted(result.retryAfterSeconds(), result.retryAfterSeconds()));
        }
    }

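    private String resolveClientKey(HttpServletRequest request) {
        // Priority: authenticated user ID > API key > IP address
        String userId = (String) request.getAttribute("userId");
        if (userId != null) return "user:" + userId;

        // Assumes the key was validated upstream; otherwise a caller can send
        // a different made-up key on every request and get a fresh bucket each time
        String apiKey = request.getHeader("X-API-Key");
        if (apiKey != null && !apiKey.isBlank()) return "apikey:" + apiKey;

        return "ip:" + getClientIp(request);
    }

    // Uses the socket address. Behind a reverse proxy you control, read the
    // forwarded address instead; never trust X-Forwarded-For from arbitrary clients.
    private String getClientIp(HttpServletRequest request) {
        return request.getRemoteAddr();
    }
}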
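
The filter falls back to the client IP, and how that IP is obtained matters once a load balancer sits in front of the application. The following is a sketch of one conservative policy, assuming exactly one trusted proxy hop that appends the address it saw to `X-Forwarded-For`. The class `ClientIpResolver` and its parameters are hypothetical, not part of Spring or Bucket4j.

```java
/** Illustrative client IP resolution for a deployment with a single trusted proxy. */
final class ClientIpResolver {
    private ClientIpResolver() {}

    /**
     * @param forwardedFor       raw X-Forwarded-For header value, may be null
     * @param remoteAddr         address of the socket peer
     * @param peerIsTrustedProxy true only if remoteAddr belongs to a proxy you operate
     */
    static String resolve(String forwardedFor, String remoteAddr, boolean peerIsTrustedProxy) {
        if (!peerIsTrustedProxy || forwardedFor == null || forwardedFor.isBlank()) {
            return remoteAddr;
        }
        // Proxies append to the header, so the right-most entry was written by the
        // trusted proxy; everything to its left is supplied by the client and can be forged.
        String last = forwardedFor.substring(forwardedFor.lastIndexOf(',') + 1).trim();
        return last.isEmpty() ? remoteAddr : last;
    }
}
```

Taking the left-most entry instead would let a caller pick its own rate limit key simply by sending a forged header.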

The Rate Limit Service

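import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.BucketConfiguration;
import io.github.bucket4j.ConsumptionProbe;
import io.github.bucket4j.distributed.proxy.ProxyManager;
import java.time.Duration;
import java.time.Instant;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class RateLimitService {

    // A Redis-backed ProxyManager bean (for example Bucket4j's Lettuce or Redisson
    // integration), built once at startup rather than on every request
    private final ProxyManager<String> buckets;
    private final RateLimitConfig config;

    public RateLimitResult tryConsume(String clientKey) {
        // Determine the plan for this client
        RateLimitPlan plan = config.getPlanForClient(clientKey);

        // Get the client's bucket; the supplier is only used if the bucket does not exist yet
        Bucket bucket = buckets.builder().build(clientKey, () -> BucketConfiguration.builder()
            .addLimit(Bandwidth.builder()
                .capacity(plan.burstCapacity())
                .refillGreedy(plan.requestsPerMinute(), Duration.ofMinutes(1))
                .build())
            .build());

        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        return new RateLimitResult(
            probe.isConsumed(),
            plan.burstCapacity(),
            probe.getRemainingTokens(),
            ceilSeconds(probe.getNanosToWaitForRefill()),
            Instant.now().getEpochSecond() + ceilSeconds(probe.getNanosToWaitForReset()));
    }

    // Round up: plain integer division would report a 0.4 s wait as "retry after 0 seconds"
    private static long ceilSeconds(long nanos) {
        return (nanos + 999_999_999L) / 1_000_000_000L;
    }
}

// Components match the accessors that RateLimitFilter calls
record RateLimitResult(boolean allowed, long limit, long remaining,
                       long retryAfterSeconds, long resetTimeEpochSeconds) {}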

Tiered Rate Limits by Plan

Real-world APIs have different limits for different users. A free user shouldn't have the same limits as an enterprise customer:

public enum RateLimitPlan {
    FREE(60, 20),           // 60 req/min, burst of 20
    STANDARD(300, 60),      // 300 req/min, burst of 60
    PROFESSIONAL(1000, 200), // 1000 req/min, burst of 200
    ENTERPRISE(10000, 2000); // 10000 req/min, burst of 2000

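    private final int requestsPerMinute;
    private final int burstCapacity;

    RateLimitPlan(int requestsPerMinute, int burstCapacity) {
        this.requestsPerMinute = requestsPerMinute;
        this.burstCapacity = burstCapacity;
    }

    // Accessors used by RateLimitService
    public int requestsPerMinute() { return requestsPerMinute; }
    public int burstCapacity() { return burstCapacity; }
}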

@Service
@RequiredArgsConstructor
public class RateLimitConfig {

    private final UserSubscriptionRepository subscriptionRepo;
    private final Cache<String, RateLimitPlan> planCache;  // Cache plan lookups

    public RateLimitPlan getPlanForClient(String clientKey) {
        if (!clientKey.startsWith("user:")) {
            // IP and API-key clients get the free tier here. If your API keys are
            // tied to subscriptions, resolve the key's plan at this point instead.
            return RateLimitPlan.FREE;
        }

        // planCache should expire entries after about 5 minutes to avoid a DB lookup per request
        return planCache.get(clientKey, key -> {
            String userId = key.substring("user:".length());
            return subscriptionRepo.findPlanByUserId(userId)
                .map(RateLimitPlan::valueOf)
                .orElse(RateLimitPlan.FREE);
        });
    }
}
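
The plan lookup depends on a cache whose entries expire after a few minutes. In practice a library cache such as Caffeine is the usual choice; the standard-library sketch below only illustrates the expire-after-write behaviour that `planCache` is assumed to have. `TtlCache` and its injected clock are hypothetical names for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Illustrative expire-after-write cache with a get(key, loader) method. */
final class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis;

    TtlCache(long ttlMillis, LongSupplier clockMillis) {
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns the cached value, calling the loader only if the entry is missing or expired. */
    V get(K key, Function<K, V> loader) {
        long now = clockMillis.getAsLong();
        Entry<V> entry = entries.compute(key, (k, existing) ->
            existing != null && existing.expiresAtMillis() > now
                ? existing
                : new Entry<>(loader.apply(k), now + ttlMillis));
        return entry.value();
    }
}
```

The trade-off is staleness: with a 5 minute TTL, a user who upgrades their plan may keep the old limits for up to 5 minutes unless the entry is evicted explicitly.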

Endpoint-Specific Limits

Some endpoints need stricter limits than others. Authentication endpoints are a classic example — brute force attacks target them specifically:

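@Component
@RequiredArgsConstructor
public class EndpointRateLimitFilter extends OncePerRequestFilter {

    // Per-endpoint limit: (requests per minute, burst capacity)
    record SpecificPlan(int requestsPerMinute, int burstCapacity) {}

    // A Bucket4j ProxyManager configured once for your Redis client
    private final ProxyManager<String> buckets;

    private final Map<String, SpecificPlan> endpointLimits = Map.of(
        "/api/v1/auth/login", new SpecificPlan(5, 10),     // 5 req/min per IP: brute force protection
        "/api/v1/auth/register", new SpecificPlan(3, 5),   // 3 registrations per minute
        "/api/v1/search", new SpecificPlan(30, 60)         // Search is expensive
    );

    @Override
    protected void doFilterInternal(HttpServletRequest req,
                                    HttpServletResponse res,
                                    FilterChain chain) throws IOException, ServletException {
        String path = req.getRequestURI();
        SpecificPlan endpointPlan = endpointLimits.get(path);

        if (endpointPlan != null) {
            // Apply endpoint-specific limit on top of the global limit
            String key = "endpoint:" + path + ":" + req.getRemoteAddr();
            if (!tryConsume(key, endpointPlan)) {
                res.setStatus(429);
                // With greedy refill, one token arrives every 60 / requestsPerMinute seconds
                int rpm = endpointPlan.requestsPerMinute();
                res.setHeader("Retry-After", String.valueOf((60 + rpm - 1) / rpm));
                return;
            }
        }
        chain.doFilter(req, res);
    }

    private boolean tryConsume(String key, SpecificPlan plan) {
        return buckets.builder().build(key, () -> BucketConfiguration.builder()
                .addLimit(Bandwidth.builder()
                    .capacity(plan.burstCapacity())
                    .refillGreedy(plan.requestsPerMinute(), Duration.ofMinutes(1))
                    .build())
                .build())
            .tryConsume(1);
    }
}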
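
An exact-match lookup on the raw request URI can miss requests that reach the same controller through a slightly different path, for example when the application has a context path or when trailing-slash matching is enabled in your MVC configuration. The sketch below shows one possible normalization step before the lookup. `EndpointLimits` and `Limit` are illustrative names, and the set of rules is an assumption to adapt to your own routing setup.

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative endpoint limit table with path normalization before lookup. */
final class EndpointLimits {
    record Limit(int requestsPerMinute, int burstCapacity) {}

    private final Map<String, Limit> limits;

    EndpointLimits(Map<String, Limit> limits) {
        this.limits = Map.copyOf(limits);
    }

    /** Strips the servlet context path and trailing slashes, then looks up the limit. */
    Optional<Limit> lookup(String contextPath, String requestUri) {
        String path = requestUri;
        if (!contextPath.isEmpty() && path.startsWith(contextPath)) {
            path = path.substring(contextPath.length());
        }
        while (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return Optional.ofNullable(limits.get(path));
    }

    /** Same key layout as the filter: one bucket per endpoint and client IP. */
    static String bucketKey(String path, String clientIp) {
        return "endpoint:" + path + ":" + clientIp;
    }
}
```

In a servlet filter the two inputs would come from `request.getContextPath()` and `request.getRequestURI()`.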

Spring Cloud Gateway: Rate Limiting at the Edge

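If you use Spring Cloud Gateway, rate limiting belongs at the gateway level, before requests even reach your microservices. The built-in RequestRateLimiter filter keeps its token buckets in Redis, so the gateway needs spring-boot-starter-data-redis-reactive on the classpath: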

# application.yml — Spring Cloud Gateway
spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/api/v1/orders/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100    # tokens/second
                redis-rate-limiter.burstCapacity: 200    # max burst
                redis-rate-limiter.requestedTokens: 1
                key-resolver: "#{@userKeyResolver}"      # Spring bean

@Bean
public KeyResolver userKeyResolver() {
    // Note: every unauthenticated request shares the single "anonymous" bucket
    return exchange -> exchange.getPrincipal()
        .map(Principal::getName)
        .defaultIfEmpty("anonymous");
}
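
To reason about what `replenishRate` and `burstCapacity` mean together, a little arithmetic helps. The sketch assumes the limiter refills at `replenishRate` tokens per second up to `burstCapacity`, as the configuration comments describe. `GatewayBucketMath` is a hypothetical helper, not a Spring Cloud Gateway class.

```java
/** Illustrative arithmetic for a bucket refilled at replenishRate tokens per second. */
final class GatewayBucketMath {
    private GatewayBucketMath() {}

    /** Tokens available after elapsedSeconds, starting from currentTokens. */
    static long tokensAfter(long currentTokens, long elapsedSeconds,
                            long replenishRate, long burstCapacity) {
        return Math.min(burstCapacity, currentTokens + elapsedSeconds * replenishRate);
    }

    /** Whole seconds, rounded up, until an empty bucket is full again. */
    static long secondsToRefill(long replenishRate, long burstCapacity) {
        return (burstCapacity + replenishRate - 1) / replenishRate;
    }
}
```

With the values from the route above, a client that drains all 200 tokens in one burst gets 100 back after one second and has a full bucket again after two, so the sustained rate stays at 100 requests per second.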

Monitoring Rate Limit Violations

Rate limit violations are signal — they tell you which clients are hitting limits, which endpoints are under pressure, and whether your limits are calibrated correctly:

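@Component
@RequiredArgsConstructor  // generates the constructor that injects the final MeterRegistry field
public class RateLimitMetrics {

    private final MeterRegistry registry;

    public void recordViolation(String clientKey, String endpoint) {
        Counter.builder("api.rate_limit.violations")
            .tag("client_type", clientKey.split(":")[0])  // user, apikey, ip
            .tag("endpoint", endpoint)  // pass the route pattern, not the raw URI, to keep tag cardinality bounded
            .register(registry)
            .increment();
    }
}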

A sudden spike in rate limit violations from legitimate users means your limits are too tight. Sustained violations from a single IP usually mean a misbehaving client or someone trying to abuse the API. Both cases are worth monitoring and alerting on.

Common Mistakes to Avoid

Rate limiting only by IP address — proxies and NAT can share an IP across many legitimate users; always prefer user ID or API key when available
Not including rate limit headers — clients that don't know they're being rate limited will retry immediately, making the problem worse; always include X-RateLimit-Remaining and Retry-After
Storing rate limit state in application memory — works for a single instance but breaks with horizontal scaling, because each instance enforces the limit separately; use a shared store such as Redis
Same limits for all endpoints — authentication endpoints need much stricter limits than regular API calls; tune per-endpoint limits separately
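
On the client side, honoring `Retry-After` is what stops the immediate-retry loop described in the list above. The header may carry either a number of seconds or an HTTP date, so a client needs to handle both. The helper below is a sketch of that parsing. `RetryAfter.parse` and its fallback parameter are illustrative, not part of any library.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Illustrative parser for the Retry-After response header. */
final class RetryAfter {
    private RetryAfter() {}

    /** Returns how long to wait, or the fallback if the header is absent or unparseable. */
    static Duration parse(String headerValue, Instant now, Duration fallback) {
        if (headerValue == null || headerValue.isBlank()) {
            return fallback;
        }
        String value = headerValue.trim();
        try {
            // Form 1: delay in seconds, e.g. "30"
            return Duration.ofSeconds(Math.max(0, Long.parseLong(value)));
        } catch (NumberFormatException notSeconds) {
            // Not a number, try the date form next
        }
        try {
            // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            Instant retryAt = ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            Duration wait = Duration.between(now, retryAt);
            return wait.isNegative() ? Duration.ZERO : wait;
        } catch (DateTimeParseException unparseable) {
            return fallback;
        }
    }
}
```

A caller would sleep for the returned duration, ideally with a little random jitter added, before sending the request again.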

Summary

Effective API rate limiting in Spring Boot uses Bucket4j with Redis for distributed state, the token bucket algorithm for user-friendly burst allowances, tiered limits by subscription plan, and proper response headers so clients can self-regulate. Monitor violations with Micrometer to keep limits calibrated to real usage.
