Rate Limiter Low Level Design — Token Bucket, Sliding Window LLD Guide

Rate Limiter Low Level Design is a critical interview topic for platform engineering, API gateway, and backend roles. It tests your ability to model multiple algorithms — Token Bucket, Sliding Window Log, Fixed Window Counter — behind a clean interface using the Strategy pattern, and your understanding of distributed state and thread safety. This guide covers the complete rate limiter LLD solution with Java code.

Why Interviewers Ask Rate Limiter LLD

Can you abstract multiple algorithms behind a single interface?
Do you understand the trade-offs between Token Bucket vs Sliding Window?
Can you make it thread-safe without a global lock?
Can you extend it for distributed rate limiting across multiple servers?
Do you think about per-user vs global rate limits?

Functional Requirements

Allow or deny incoming requests based on a rate limit rule
Support per-user, per-IP, and global rate limits
Support multiple algorithms: Token Bucket, Fixed Window, Sliding Window Log
Rate limit rules configurable at runtime (limit, window duration)
Return remaining quota and retry-after time in the response

Non-Functional Requirements

Decisions in under 5ms — must not add significant latency to requests
Thread-safe — multiple request threads checking limits concurrently
For distributed: consistent rate limiting across multiple application instances

Core Entities

RateLimiter — main entry point; delegates to a RateLimitStrategy
RateLimitStrategy — interface; TokenBucket, FixedWindow, SlidingWindowLog implement it
RateLimitRule — limit (max requests), windowDuration, identifier (userId/IP)
RateLimitResult — allowed: boolean, remaining: int, retryAfterMs: long
BucketStore — stores per-user state (token count, window timestamps)

Text-Based Class Diagram

RateLimiter
+-- strategy: RateLimitStrategy
+-- isAllowed(userId, rule): RateLimitResult

RateLimitStrategy (interface)
+-- isAllowed(userId, rule): RateLimitResult

TokenBucketStrategy    implements RateLimitStrategy
FixedWindowStrategy    implements RateLimitStrategy
SlidingWindowStrategy  implements RateLimitStrategy

RateLimitRule
+-- maxRequests: int
+-- windowDurationMs: long
+-- identifier: String

RateLimitResult
+-- allowed: boolean
+-- remaining: int
+-- retryAfterMs: long

Token Bucket Algorithm — Java

public class TokenBucketStrategy implements RateLimitStrategy {
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    @Override
    public synchronized RateLimitResult isAllowed(String userId, RateLimitRule rule) {
        Bucket bucket = buckets.computeIfAbsent(userId,
            k -> new Bucket(rule.getMaxRequests(), rule.getWindowDurationMs()));
        bucket.refill();
        if (bucket.getTokens() >= 1) {
            bucket.consume();
            return new RateLimitResult(true, (int) bucket.getTokens(), 0);
        }
        return new RateLimitResult(false, 0, bucket.getRetryAfterMs());
    }

    private static class Bucket {
        private double tokens;
        private final double maxTokens;
        private final double refillRatePerMs;
        private long lastRefillTime;

        Bucket(int maxTokens, long windowMs) {
            this.maxTokens = maxTokens;
            this.tokens = maxTokens;
            this.refillRatePerMs = (double) maxTokens / windowMs;
            this.lastRefillTime = System.currentTimeMillis();
        }

        void refill() {
            long now = System.currentTimeMillis();
            double tokensToAdd = (now - lastRefillTime) * refillRatePerMs;
            tokens = Math.min(maxTokens, tokens + tokensToAdd);
            lastRefillTime = now;
        }

        void consume() { tokens -= 1; }
        double getTokens() { return tokens; }
        long getRetryAfterMs() { return (long) ((1 - tokens) / refillRatePerMs); }
    }
}

Fixed Window Counter

public class FixedWindowStrategy implements RateLimitStrategy {
    private final Map<String, WindowCounter> counters = new ConcurrentHashMap<>();

    @Override
    public synchronized RateLimitResult isAllowed(String userId, RateLimitRule rule) {
        WindowCounter counter = counters.computeIfAbsent(userId,
            k -> new WindowCounter(rule.getWindowDurationMs()));
        long now = System.currentTimeMillis();
        counter.resetIfExpired(now, rule.getWindowDurationMs());

        if (counter.getCount() < rule.getMaxRequests()) {
            counter.increment();
            return new RateLimitResult(true, rule.getMaxRequests() - counter.getCount(), 0);
        }
        long retryAfter = counter.getWindowStart() + rule.getWindowDurationMs() - now;
        return new RateLimitResult(false, 0, retryAfter);
    }

    private static class WindowCounter {
        private int count = 0;
        private long windowStart;

        WindowCounter(long windowMs) { this.windowStart = System.currentTimeMillis(); }
        void resetIfExpired(long now, long windowMs) {
            if (now - windowStart >= windowMs) { count = 0; windowStart = now; }
        }
        void increment() { count++; }
        int getCount() { return count; }
        long getWindowStart() { return windowStart; }
    }
}

Sliding Window Log Algorithm

public class SlidingWindowLogStrategy implements RateLimitStrategy {
    private final Map<String, Deque<Long>> requestLogs = new ConcurrentHashMap<>();

    @Override
    public synchronized RateLimitResult isAllowed(String userId, RateLimitRule rule) {
        long now = System.currentTimeMillis();
        long windowStart = now - rule.getWindowDurationMs();

        Deque<Long> log = requestLogs.computeIfAbsent(userId, k -> new ArrayDeque<>());

        // Remove timestamps outside the window
        while (!log.isEmpty() && log.peekFirst() <= windowStart) log.pollFirst();

        if (log.size() < rule.getMaxRequests()) {
            log.addLast(now);
            return new RateLimitResult(true, rule.getMaxRequests() - log.size(), 0);
        }
        long retryAfter = log.peekFirst() + rule.getWindowDurationMs() - now;
        return new RateLimitResult(false, 0, retryAfter);
    }
}

Algorithm Comparison

Token Bucket: Allows bursts up to bucket size. Smooth refill. Best for APIs that need burst tolerance. Most widely used in practice (AWS API Gateway, Stripe).
Fixed Window: Simple. But allows 2x the limit at window boundaries (last request of window N + first requests of window N+1). Use for non-critical limits.
Sliding Window Log: Most accurate. No boundary problem. But stores a timestamp per request — high memory for bursty traffic. Use when precision matters.
Sliding Window Counter: Hybrid — Fixed Window with a weighted count from the previous window. Approximates sliding window with O(1) memory.

Common Follow-Up Questions

"How do you rate limit across multiple servers?" — Store state in Redis with atomic operations (INCR for fixed window, sorted sets for sliding window log). Each server checks Redis instead of local memory.
"How do you handle Redis failure?" — Fail open (allow requests) or fail closed (deny requests) based on the use case. Most APIs fail open with a fallback local limiter.
"How do you support per-endpoint rate limits?" — The key becomes userId:endpoint instead of just userId. RateLimitRule carries the endpoint as part of the identifier.

FAQ — Rate Limiter Low Level Design

What is the best algorithm for rate limiting?

Token Bucket is the most practical — it allows bursts, has a smooth refill, and is O(1) time and O(1) space per user. Sliding Window Log is the most accurate but O(n) space per user where n is the request count per window.

How does Token Bucket work?

A bucket holds tokens up to a maximum capacity. Tokens refill at a constant rate. Each request consumes one token. If the bucket is empty, the request is denied. Burst requests are handled by the accumulated tokens.

How do you make a rate limiter distributed?

Store bucket state in Redis using atomic operations. For Token Bucket, use a Lua script to atomically check tokens and consume one. For Fixed Window, use Redis INCR with an expiry. For Sliding Window, use a Redis sorted set with timestamps as scores.