An unprotected API is one burst of traffic away from going down. Rate limiting protects your service from abuse, runaway clients, and accidental DDoS. Here's how to implement it properly.
JOptimize Team
Every public API eventually gets abused. A scraper hammers your product listing endpoint at 500 requests per second. A bug in a client application sends the same request in an infinite loop. A free-tier user discovers they can run a script that costs your infrastructure $500 a day. Without rate limiting, these scenarios take down your service or run up your cloud bill.
But rate limiting done wrong creates its own problems: legitimate users get blocked, SLAs are violated, and your customer support queue fills with "why is your API returning 429?" tickets. Good rate limiting is granular, transparent, and tuned to real usage patterns.
There are three common rate limiting algorithms, each with different behavior:
Token Bucket (recommended for most APIs): Users accumulate tokens at a steady rate up to a maximum bucket size. Each request consumes a token. If the bucket is empty, the request is rejected. This allows short bursts (consuming saved tokens) while maintaining an average rate limit. It's the most user-friendly algorithm.
Fixed Window Counter: Count requests in a fixed time window (e.g., 100 requests per minute). Simple to implement and explain, but has an edge case: a user can make 100 requests at second 59, and 100 more at second 61 — getting 200 requests in 2 seconds while technically staying within the limit.
Sliding Window: Averages the request rate over a rolling window, eliminating the fixed window burst problem. More accurate but slightly more expensive to compute.
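The boundary burst described above is easy to reproduce. The following is a minimal in-memory sketch, not production code: `FixedWindowLimiter` and `SlidingWindowLimiter` are hypothetical names, timestamps are passed in explicitly so the behavior is deterministic, and the sliding limiter uses the log variant (it stores one timestamp per accepted request) rather than the averaging variant described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Fixed window: one counter per window, reset whenever a new window starts. */
class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long currentWindow = -1;
    private int count;

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;
        if (window != currentWindow) { // a new window started: forget the old count
            currentWindow = window;
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
}

/** Sliding window (log variant): remembers the timestamp of each accepted request. */
class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> accepted = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drop timestamps that have fallen out of the rolling window
        while (!accepted.isEmpty() && accepted.peekFirst() <= nowMillis - windowMillis) {
            accepted.pollFirst();
        }
        if (accepted.size() >= limit) {
            return false;
        }
        accepted.addLast(nowMillis);
        return true;
    }
}
```

With a limit of 100 per minute, sending 100 requests at second 59 and 100 more at second 61 gets all 200 through the fixed window limiter, while the sliding limiter rejects the second batch. The price is memory: the log variant keeps up to `limit` timestamps per client.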
For most Spring Boot APIs, the token bucket algorithm (implemented by Bucket4j) provides the best balance of user experience and protection.
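To make the token bucket concrete before bringing in a library, here is a minimal single-node sketch. It is not how Bucket4j is implemented; `TokenBucket` is a hypothetical class, and the current time is passed in as a parameter (an assumption made purely so the behavior is easy to test).

```java
/** Minimal single-node token bucket; the caller supplies the current time in nanoseconds. */
class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long refillTokens, long refillPeriodNanos, long nowNanos) {
        this.capacity = capacity;
        this.refillPerNano = (double) refillTokens / refillPeriodNanos;
        this.tokens = capacity;        // start full so a new client can burst immediately
        this.lastRefillNanos = nowNanos;
    }

    synchronized boolean tryConsume(long nowNanos) {
        // Credit the tokens earned since the last call, capped at the bucket size
        long elapsed = Math.max(0, nowNanos - lastRefillNanos);
        tokens = Math.min(capacity, tokens + elapsed * refillPerNano);
        lastRefillNanos = nowNanos;

        if (tokens < 1.0) {
            return false;              // bucket is empty: reject the request
        }
        tokens -= 1.0;
        return true;
    }
}
```

A bucket created with `new TokenBucket(20, 60, 60_000_000_000L, System.nanoTime())` allows a burst of 20 requests and then roughly one request per second. The two numbers are independent: capacity sets how large a burst can be, the refill rate sets the long-run average.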
Bucket4j is the de facto rate limiting library for Java. Combined with Redis for distributed state, it works correctly across multiple application instances:
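    <dependency>
        <groupId>com.bucket4j</groupId>
        <artifactId>bucket4j-redis</artifactId>
        <version>8.10.1</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <!-- spring-boot-starter-data-redis ships the Lettuce client by default.
         The service code in this article uses Redisson, which needs its own dependency. -->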
The cleanest way to add rate limiting is as a servlet filter that intercepts all requests before they reach controllers:
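    import jakarta.servlet.FilterChain;   // javax.servlet.* on Spring Boot 2
    import jakarta.servlet.ServletException;
    import jakarta.servlet.http.HttpServletRequest;
    import jakarta.servlet.http.HttpServletResponse;
    import java.io.IOException;
    import lombok.RequiredArgsConstructor;
    import org.springframework.core.Ordered;
    import org.springframework.core.annotation.Order;
    import org.springframework.http.HttpStatus;
    import org.springframework.http.MediaType;
    import org.springframework.stereotype.Component;
    import org.springframework.web.filter.OncePerRequestFilter;

    @Component
    @RequiredArgsConstructor
    // Runs before other filters. Whatever sets the "userId" attribute must run even
    // earlier; otherwise every request falls back to the API key or IP address.
    @Order(Ordered.HIGHEST_PRECEDENCE)
    public class RateLimitFilter extends OncePerRequestFilter {

        private final RateLimitService rateLimitService;

        @Override
        protected void doFilterInternal(HttpServletRequest request,
                                        HttpServletResponse response,
                                        FilterChain chain) throws IOException, ServletException {
            String clientKey = resolveClientKey(request);
            RateLimitResult result = rateLimitService.tryConsume(clientKey);

            // Always include rate limit headers: clients need this information
            response.addHeader("X-RateLimit-Limit", String.valueOf(result.limit()));
            response.addHeader("X-RateLimit-Remaining", String.valueOf(result.remaining()));
            response.addHeader("X-RateLimit-Reset", String.valueOf(result.resetTimeEpochSeconds()));

            if (result.allowed()) {
                chain.doFilter(request, response);
            } else {
                response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
                response.setContentType(MediaType.APPLICATION_JSON_VALUE);
                response.addHeader("Retry-After", String.valueOf(result.retryAfterSeconds()));
                response.getWriter().write("""
                    {
                      "error": "RATE_LIMIT_EXCEEDED",
                      "message": "Too many requests. Please wait %d seconds before retrying.",
                      "retryAfter": %d
                    }""".formatted(result.retryAfterSeconds(), result.retryAfterSeconds()));
            }
        }

        private String resolveClientKey(HttpServletRequest request) {
            // Priority: authenticated user ID > API key > IP address
            String userId = (String) request.getAttribute("userId");
            if (userId != null) return "user:" + userId;

            String apiKey = request.getHeader("X-API-Key");
            if (apiKey != null) return "apikey:" + apiKey;

            return "ip:" + getClientIp(request);
        }

        private String getClientIp(HttpServletRequest request) {
            // Simplest option; behind a proxy or load balancer this is the proxy's address
            return request.getRemoteAddr();
        }
    }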
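The filter falls back to the client IP through a `getClientIp` helper. Behind a load balancer or reverse proxy, the socket address belongs to the proxy, so every anonymous client would share one bucket. A minimal sketch of one way to handle this follows; `ClientIp.resolve` and its `behindTrustedProxy` flag are hypothetical, and the logic assumes your proxy overwrites or sanitizes `X-Forwarded-For`. If the proxy only appends to the header, the left-most entry can be spoofed by the client.

```java
class ClientIp {
    /**
     * @param forwardedFor       value of the X-Forwarded-For header, or null if absent
     * @param remoteAddr         socket peer address, e.g. request.getRemoteAddr()
     * @param behindTrustedProxy true only if all traffic arrives through a proxy you control
     */
    static String resolve(String forwardedFor, String remoteAddr, boolean behindTrustedProxy) {
        if (behindTrustedProxy && forwardedFor != null) {
            // Header format is "client, proxy1, proxy2": take the left-most entry
            int comma = forwardedFor.indexOf(',');
            String first = (comma < 0 ? forwardedFor : forwardedFor.substring(0, comma)).trim();
            if (!first.isEmpty()) {
                return first;
            }
        }
        // No usable header, or the header cannot be trusted: use the socket address
        return remoteAddr;
    }
}
```

Ignoring the header when the application is directly reachable matters: otherwise any client can pick a fresh rate limit bucket per request just by sending a different `X-Forwarded-For` value.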
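    import io.github.bucket4j.Bandwidth;
    import io.github.bucket4j.Bucket;
    import io.github.bucket4j.BucketConfiguration;
    import io.github.bucket4j.ConsumptionProbe;
    import io.github.bucket4j.distributed.proxy.ProxyManager;
    import java.time.Duration;
    import java.time.Instant;
    import lombok.RequiredArgsConstructor;
    import org.springframework.stereotype.Service;

    // Accessor names match what RateLimitFilter reads
    public record RateLimitResult(boolean allowed, long limit, long remaining,
                                  long resetTimeEpochSeconds, long retryAfterSeconds) {}

    @Service
    @RequiredArgsConstructor
    public class RateLimitService {

        private static final long NANOS_PER_SECOND = 1_000_000_000L;

        // Built once at startup and shared (for example a Redisson-backed Bucket4j
        // ProxyManager exposed as a bean), not re-created on every request
        private final ProxyManager<String> buckets;
        private final RateLimitConfig config;

        public RateLimitResult tryConsume(String clientKey) {
            // Determine the plan for this client
            RateLimitPlan plan = config.getPlanForClient(clientKey);

            // Create the bucket on first use, or attach to the existing one in Redis
            Bucket bucket = buckets.builder().build(clientKey, () ->
                BucketConfiguration.builder()
                    .addLimit(Bandwidth.builder()
                        .capacity(plan.burstCapacity())
                        .refillGreedy(plan.requestsPerMinute(), Duration.ofMinutes(1))
                        .build())
                    .build());

            ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

            // Round up: plain integer division would turn a 400 ms wait into "Retry-After: 0"
            long retryAfter = (probe.getNanosToWaitForRefill() + NANOS_PER_SECOND - 1) / NANOS_PER_SECOND;
            // Time until the bucket is full again, as an epoch second
            long resetEpoch = Instant.now().getEpochSecond()
                + (probe.getNanosToWaitForReset() + NANOS_PER_SECOND - 1) / NANOS_PER_SECOND;

            return new RateLimitResult(
                probe.isConsumed(),
                plan.burstCapacity(),
                probe.getRemainingTokens(),
                resetEpoch,
                retryAfter);
        }
    }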
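One detail in the nanoseconds-to-seconds conversion deserves care: integer division truncates, so a wait of 400 ms becomes `Retry-After: 0` and a well-behaved client retries immediately, only to be rejected again. A minimal sketch of rounding up instead, using a hypothetical `RetryAfter` helper class:

```java
class RetryAfter {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Rounds up to whole seconds so a client that waits this long finds a token available. */
    static long seconds(long nanosToWait) {
        if (nanosToWait <= 0) {
            return 0;
        }
        return (nanosToWait + NANOS_PER_SECOND - 1) / NANOS_PER_SECOND;
    }

    /** Epoch second suitable for an X-RateLimit-Reset style header. */
    static long resetEpochSeconds(long nowEpochSeconds, long nanosToWait) {
        return nowEpochSeconds + seconds(nanosToWait);
    }
}
```

For example, `RetryAfter.seconds(400_000_000L)` returns 1 rather than 0, and a wait of exactly one second still returns 1.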
Real-world APIs have different limits for different users. A free user shouldn't have the same limits as an enterprise customer:
    public enum RateLimitPlan {
        FREE(60, 20),              // 60 req/min, burst of 20
        STANDARD(300, 60),         // 300 req/min, burst of 60
        PROFESSIONAL(1000, 200),   // 1000 req/min, burst of 200
        ENTERPRISE(10000, 2000);   // 10000 req/min, burst of 2000

        private final int requestsPerMinute;
        private final int burstCapacity;

        RateLimitPlan(int requestsPerMinute, int burstCapacity) {
            this.requestsPerMinute = requestsPerMinute;
            this.burstCapacity = burstCapacity;
        }

        // Accessors used by RateLimitService
        public int requestsPerMinute() { return requestsPerMinute; }
        public int burstCapacity() { return burstCapacity; }
    }

    @Service
    @RequiredArgsConstructor
    public class RateLimitConfig {

        private final UserSubscriptionRepository subscriptionRepo;
        // Caches plan lookups (e.g. a Caffeine cache configured to expire entries
        // after 5 minutes) to avoid a DB query per request
        private final Cache<String, RateLimitPlan> planCache;

        public RateLimitPlan getPlanForClient(String clientKey) {
            if (clientKey.startsWith("ip:")) {
                return RateLimitPlan.FREE; // Unauthenticated = free tier
            }
            return planCache.get(clientKey, key -> {
                // Only "user:" keys map to a subscription; API keys would need their own lookup
                if (!key.startsWith("user:")) {
                    return RateLimitPlan.FREE;
                }
                String userId = key.substring("user:".length());
                return subscriptionRepo.findPlanByUserId(userId)
                    .map(RateLimitPlan::valueOf)
                    .orElse(RateLimitPlan.FREE);
            });
        }
    }
Some endpoints need stricter limits than others. Authentication endpoints are a classic example — brute force attacks target them specifically:
    // Assumed shape: (requests per minute, burst capacity)
    record SpecificPlan(int requestsPerMinute, int burstCapacity) {}

    @Component
    @RequiredArgsConstructor
    public class EndpointRateLimitFilter extends OncePerRequestFilter {

        // The value type must be SpecificPlan to match the entries below
        private final Map<String, SpecificPlan> endpointLimits = Map.of(
            "/api/v1/auth/login", new SpecificPlan(5, 10),     // 5 req/min per IP: brute force protection
            "/api/v1/auth/register", new SpecificPlan(3, 5),   // 3 registrations per minute
            "/api/v1/search", new SpecificPlan(30, 60)         // Search is expensive
        );

        @Override
        protected void doFilterInternal(HttpServletRequest req,
                                        HttpServletResponse res,
                                        FilterChain chain) throws IOException, ServletException {
            // Exact match on the raw URI; the context path, if any, is included
            String path = req.getRequestURI();
            SpecificPlan endpointPlan = endpointLimits.get(path);

            if (endpointPlan != null) {
                // Apply endpoint-specific limit on top of the global limit
                String key = "endpoint:" + path + ":" + getClientIp(req);
                // tryConsume(key, plan) and getClientIp(req) are helpers not shown here;
                // tryConsume is assumed to take one token from the bucket stored under key
                if (!tryConsume(key, endpointPlan)) {
                    res.setStatus(429);
                    return;
                }
            }
            chain.doFilter(req, res);
        }
    }
If you use Spring Cloud Gateway, rate limiting belongs at the gateway level — before requests even reach your microservices:
    # application.yml (Spring Cloud Gateway)
    spring:
      cloud:
        gateway:
          routes:
            - id: order-service
              uri: lb://order-service
              predicates:
                - Path=/api/v1/orders/**
              filters:
                - name: RequestRateLimiter
                  args:
                    redis-rate-limiter.replenishRate: 100   # tokens/second
                    redis-rate-limiter.burstCapacity: 200   # max burst
                    redis-rate-limiter.requestedTokens: 1
                    key-resolver: "#{@userKeyResolver}"     # Spring bean
    @Bean
    public KeyResolver userKeyResolver() {
        return exchange -> exchange.getPrincipal()
            .map(Principal::getName)
            .defaultIfEmpty("anonymous");
    }
Rate limit violations are a signal: they tell you which clients are hitting limits, which endpoints are under pressure, and whether your limits are calibrated correctly:
    @Component
    @RequiredArgsConstructor
    public class RateLimitMetrics {

        private final MeterRegistry registry;

        public void recordViolation(String clientKey, String endpoint) {
            Counter.builder("api.rate_limit.violations")
                .tag("client_type", clientKey.split(":")[0])   // user, apikey, ip
                // Pass the route template rather than the raw path to keep tag cardinality bounded
                .tag("endpoint", endpoint)
                .register(registry)
                .increment();
        }
    }
A sudden spike in rate limit violations from legitimate users means your limits are too tight. Sustained violations from a single IP mean someone is trying to abuse the API. Both cases are worth monitoring and alerting on.
Effective API rate limiting in Spring Boot uses Bucket4j with Redis for distributed state, the token bucket algorithm for user-friendly burst allowances, tiered limits by subscription plan, and proper response headers (such as X-RateLimit-Remaining and Retry-After) so clients can self-regulate. Monitor violations with Micrometer to keep limits calibrated to real usage.
Rate limiting protects you from external abuse. JOptimize protects you from internal performance issues — N+1 queries, missing indexes, and over-fetching that slow down every request regardless of rate limits.