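Concurrent Programming

Covers high-frequency interview topics: synchronized, lock mechanisms, AQS, thread pools, concurrency utilities, ThreadLocal, CompletableFuture, and more.

Each question includes answers in both Chinese and English, code examples, common misconceptions, and the connection to risk control.

Related pages: JVM | Redis | Real-Time Risk Control Engine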

Q1. synchronized 和 ReentrantLock 的本质区别是什么？线上服务在高并发下应该怎么选？

EN: Compare synchronized and ReentrantLock. When would you choose one over the other?

难度： ★★★ | 出现频率： 极高（阿里、美团、字节）

Key Terms: synchronized (同步锁), ReentrantLock (可重入锁), AQS (抽象队列同步器), Lock Escalation (锁升级), Condition (条件变量)

答案要点：

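Implementation: synchronized is a JVM-level keyword. Synchronized blocks compile to the monitorenter/monitorexit bytecode instructions (synchronized methods are marked with the ACC_SYNCHRONIZED flag instead), and the lock state is recorded in the Mark Word of the object header. ReentrantLock is an API-level implementation in java.util.concurrent.locks, built on AQS (AbstractQueuedSynchronizer)
Interruptibility: a thread blocked on synchronized cannot respond to interruption and can only wait for the lock to be released; ReentrantLock offers lockInterruptibly(), which throws InterruptedException when the waiting thread is interrupted
Fairness: synchronized is non-fair only; ReentrantLock can be made fair with new ReentrantLock(true), but a fair lock usually has noticeably lower throughput than a non-fair one (the size of the gap depends on contention and hardware, so measure it rather than assume a fixed percentage)
Condition variables: synchronized only has wait/notify on a single wait set per monitor; ReentrantLock supports multiple Condition objects, enabling targeted wake-ups (for example, waking producers and consumers separately in a producer-consumer model)
Reentrancy: both are reentrant, but by different mechanisms: synchronized is counted by the JVM, ReentrantLock by the AQS state field
Release: synchronized is released automatically at the bytecode level (an exception-table entry guarantees monitorexit even when an exception is thrown); ReentrantLock must be released manually with unlock() in finally, otherwise the lock stays held and every other thread waiting for it blocks forever
Performance: since JDK 6, synchronized has been optimized with the biased → lightweight → heavyweight lock upgrade path (biased locking was later disabled by default in JDK 15), so under low contention its performance is very close to ReentrantLock; under high contention the extra control ReentrantLock offers is the bigger advantage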

代码示例：


// synchronized：自动释放，不需要手动管理
public class SyncCounter {
    private int count = 0;
    public synchronized void increment() {
        count++; // 异常时 JVM 自动释放 monitor
    }
}

// ReentrantLock：必须在 finally 中释放
public class LockCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count = 0;
    public void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // 忘记写这行 = 定时炸弹
        }
    }
}

// ReentrantLock 的独有能力：可中断获取锁 + 超时获取
public boolean trySendWithTimeout() throws InterruptedException {
    if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {
        try {
            return doSend();
        } finally {
            lock.unlock();
        }
    }
    return false; // 超时未获取到锁，走降级逻辑
}
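
The "multiple Condition" point above is easiest to see in a bounded buffer. The following is a minimal sketch, not a production queue: BoundedBuffer and BoundedBufferDemo are hypothetical names introduced here for illustration. One ReentrantLock owns two wait queues, so a producer wakes only a consumer and a consumer wakes only a producer.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bounded buffer: one lock, two separate wait queues.
class BoundedBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();  // producers wait here
    private final Condition notEmpty = lock.newCondition(); // consumers wait here
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await();   // releases the lock while waiting
            }
            items.add(item);
            notEmpty.signal();     // wake one consumer, never a producer
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();
            }
            T item = items.remove();
            notFull.signal();      // wake one producer, never a consumer
            return item;
        } finally {
            lock.unlock();
        }
    }
}

class BoundedBufferDemo {
    // One producer pushes 0..4 through a capacity-2 buffer; the caller consumes them.
    static int sumOfTaken() {
        BoundedBuffer<Integer> buffer = new BoundedBuffer<>(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) buffer.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < 5; i++) sum += buffer.take();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

With wait/notify on a single monitor, the same buffer would need notifyAll() to avoid waking the wrong kind of thread; the two Condition objects make that unnecessary.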

常见误区：

❌ 认为 synchronized 一定比 ReentrantLock 慢 → ✅ JDK 6+ 的锁升级机制已经大幅缩小差距，低竞争场景下 synchronized 甚至更快（因为无 JVM 层面的额外对象分配）
❌ 忘记在 finally 中 unlock，或 try 块范围不够大 → ✅ lock() 必须在 try 之前调用，确保异常发生在 lock 之后也能在 finally 中释放
❌ 误以为公平锁一定"更好" → ✅ 公平锁通过 FIFO 队列保证顺序，但线程上下文切换开销显著增大，吞吐量通常低于非公平锁
❌ Assuming synchronized is always slower than ReentrantLock → ✅ JDK 6+ lock escalation has narrowed the gap significantly; in low-contention scenarios synchronized can even be faster (no extra JVM-level object allocation)
❌ Forgetting to unlock in finally, or making the try block too narrow → ✅ lock() must be called before try to ensure the lock is released in finally even if an exception occurs after acquisition
❌ Assuming fair locks are always "better" → ✅ Fair locks guarantee FIFO ordering but incur significantly more context switches; throughput is typically lower than non-fair locks

延伸追问：

synchronized 锁升级过程中，偏向锁在什么场景下会退化为轻量级锁？偏向锁在 JDK 15 中为什么默认禁用了？
如果用 ReentrantLock 实现一个读写分离的缓存，应该用 ReentrantReadWriteLock 还是 StampedLock？两者在读多写少场景的性能差异有多大？
ReentrantLock 的 tryLock(0, TimeUnit.SECONDS) 和 tryLock() 有什么区别？
Under what scenarios does a biased lock degrade to a lightweight lock? Why was biased locking disabled by default in JDK 15?
If implementing a read-write cache with ReentrantLock, should you use ReentrantReadWriteLock or StampedLock? How large is the performance gap in read-heavy scenarios?
What is the difference between ReentrantLock's tryLock(0, TimeUnit.SECONDS) and tryLock()?

风控关联：

风控引擎中同一笔交易的规则串行执行需要加锁保护共享状态（如规则计分器），高并发场景下锁选择直接影响吞吐量
风控网关限流场景常用 tryLock + 超时机制做快速失败
Risk engines need locks to protect shared state during sequential rule execution for a single transaction (e.g., rule scorers); lock choice directly impacts throughput under high concurrency
Risk gateway rate-limiting often uses tryLock with timeouts for fast-fail semantics

English Answer：

synchronized is a JVM-level keyword implemented through monitorenter and monitorexit bytecode instructions. The lock metadata is stored in the object's Mark Word. ReentrantLock is an API-level lock from java.util.concurrent.locks, and its core implementation is based on AQS, or AbstractQueuedSynchronizer.
In terms of interruptibility, a thread blocked on synchronized cannot respond to interruption while waiting for the monitor; it must wait until the lock is released. ReentrantLock provides lockInterruptibly(), so the waiting thread can respond to InterruptedException. It also provides tryLock() and timed tryLock, which are useful for fast-fail or degradation logic.
In terms of fairness, synchronized is non-fair only. ReentrantLock can be constructed as a fair lock with new ReentrantLock(true), but fair locks usually reduce throughput because they increase scheduling and context-switch overhead. In high-throughput services, non-fair locks are often preferred unless ordering is a strict requirement.
In terms of condition variables, synchronized only has wait/notify/notifyAll on a single monitor wait set. ReentrantLock can create multiple Condition instances, so it can wake specific groups of waiting threads, such as producers and consumers separately.
Both locks are reentrant. synchronized tracks reentry at the JVM monitor level, while ReentrantLock uses the AQS state field as the hold count. The release behavior is different: synchronized is automatically released by the JVM, even when an exception occurs, while ReentrantLock must be released manually in a finally block; otherwise it can cause deadlock.
Since JDK 6, synchronized has been heavily optimized through biased locking, lightweight locking, and heavyweight locking, so it is not necessarily slower in low-contention scenarios. My selection rule is: use synchronized for simple, local critical sections; use ReentrantLock when I need interruptible locking, timeout-based acquisition, multiple conditions, fair ordering, or more explicit control under high contention.

Q2. volatile 能保证线程安全吗？它在什么场景下够用，什么场景下不够用？

EN: What does the volatile keyword do? Is it enough for thread safety?

难度： ★★★ | 出现频率： 极高（阿里、美团、字节、腾讯）

Key Terms: volatile (易变关键字), Visibility (可见性), Memory Barrier (内存屏障), Happens-Before (先行发生), DCL (双重检查锁定)

答案要点：

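volatile has two semantics: (1) Visibility: once a thread writes a volatile variable, any thread that subsequently reads it sees that write. The "flush to main memory / reload from main memory" picture is the Java Memory Model's abstraction; on real hardware the caches stay coherent, and the JVM only has to keep the value out of registers and stop the store from lingering in the store buffer, so volatile does not bypass the CPU cache. (2) Ordering: the compiler and CPU may not reorder other reads and writes across a volatile access in ways that break its acquire/release semantics; this is enforced by inserting memory barriers
No atomicity: with volatile int count, count++ is not thread-safe, because count++ is really three steps (read → modify → write) and volatile cannot make the three steps atomic
Sufficient for: status flags (e.g. volatile boolean running = true), safe publication of a reference that is written once (e.g. the instance field in a DCL singleton), and values with a single writer and many readers
Not sufficient for: compound operations (i++), counters that need atomicity, and check-then-act sequences
Low-level implementation: on x86, HotSpot typically follows a volatile write with a lock-prefixed instruction such as lock addl $0x0, (%rsp), which acts as a StoreLoad barrier: it drains the store buffer so the write becomes visible through cache coherence. It does not force a write-back to DRAM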

代码示例：


// volatile 够用：状态标志位
public class TaskRunner {
    private volatile boolean running = true;
    public void stop() { running = false; } // seen by the next read of running in any thread
    public void run() {
        while (running) { // without volatile the JIT may hoist the read out of the loop and spin forever
            doWork();
        }
    }
}

// volatile 不够用：复合操作
public class Counter {
    private volatile int count = 0;
    public void increment() {
        count++; // 非原子操作！多线程下会丢计数
    }
    // 正确做法：AtomicInteger 或 synchronized
}

// volatile 经典应用：DCL 单例
public class Singleton {
    private static volatile Singleton instance; // 必须 volatile
    public static Singleton getInstance() {
        if (instance == null) {               // 第一次检查（无锁）
            synchronized (Singleton.class) {
                if (instance == null) {       // 第二次检查（有锁）
                    instance = new Singleton(); // volatile 防止构造函数指令重排
                }
            }
        }
        return instance;
    }
}
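
For the "volatile is not enough" counter above, the sketch below shows the two usual replacements side by side. SafeCounters is a hypothetical class name introduced here; the thread and iteration counts are arbitrary.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

class SafeCounters {
    // Starts `threads` threads; each increments both counters `perThread` times.
    // Returns {AtomicInteger total, LongAdder total}.
    static long[] run(int threads, int perThread) {
        AtomicInteger atomic = new AtomicInteger();
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    atomic.incrementAndGet(); // atomic read-modify-write
                    adder.increment();        // contention spread across internal cells
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { atomic.get(), adder.sum() };
    }
}
```

Both totals equal threads × perThread once every worker has been joined. AtomicInteger is the simpler choice when the current value is read often; LongAdder tends to pay off when many threads write and the sum is read rarely.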

常见误区：

❌ 认为 volatile 修饰的变量做 ++ 操作是线程安全的 → ✅ 这是面试中最常见的错误认知，++ 是 read-modify-write 三步操作
❌ 认为 volatile 能替代 synchronized → ✅ volatile 只保证单次读/写的可见性和有序性，不保证复合操作的原子性
❌ 混淆 happens-before 关系 → ✅ volatile 写 happens-before 后续的 volatile 读，但不是所有操作都有 happens-before 关系
❌ Assuming volatile int makes ++ thread-safe → ✅ This is the most common misconception — ++ is a read-modify-write sequence requiring atomicity, not just visibility
❌ Assuming volatile can replace synchronized → ✅ volatile only guarantees visibility and ordering for single reads/writes, not atomicity for compound operations
❌ Misunderstanding happens-before relationships → ✅ A volatile write happens-before a subsequent volatile read, but not all operations have a happens-before relationship
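
The happens-before point can be illustrated with a small publication sketch. Publication is a hypothetical class introduced here; it assumes exactly one writer and uses a busy-wait only to keep the example short.

```java
class Publication {
    private int payload;            // plain field, deliberately not volatile
    private volatile boolean ready; // volatile guard

    void publish(int value) {
        payload = value; // (1) plain write
        ready = true;    // (2) volatile write; (1) is ordered before it
    }

    int awaitPayload() {
        while (!ready) {     // (3) volatile read that eventually observes (2)
            Thread.yield();  // busy-wait, acceptable only in a demo
        }
        return payload;      // (4) guaranteed to see the value written in (1)
    }

    static int demo() {
        Publication p = new Publication();
        Thread writer = new Thread(() -> p.publish(42));
        writer.start();
        return p.awaitPayload();
    }
}
```

The guarantee comes from the chain (1) → (2) → (3) → (4): program order within each thread plus the volatile write/read pair between them. If ready were a plain field there would be no such edge, and the reader could spin forever or return 0.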

延伸追问：

DCL 单例中如果不加 volatile，具体会发生什么？对象初始化的指令重排是怎样的？
volatile 和 AtomicIntegerFieldUpdater 有什么关系？后者在什么场景下优于 AtomicInteger？
在 ARM 架构（如 Apple Silicon）上，volatile 的内存屏障实现和 x86 有什么不同？对性能的影响呢？
What exactly happens if you omit volatile in DCL singleton? How does instruction reordering of object initialization work?
How does volatile relate to AtomicIntegerFieldUpdater? In which scenarios is the latter preferable to AtomicInteger?
How does the volatile memory barrier implementation differ on ARM (e.g., Apple Silicon) vs. x86? What are the performance implications?

风控关联：

风控引擎中规则热加载开关（volatile boolean ruleEnabled）、降级标志位等场景广泛使用 volatile
风控计数器（如滑动窗口限流计数）必须用 AtomicInteger 或 LongAdder，不能用 volatile int
Risk engines widely use volatile for rule hot-reload switches (volatile boolean ruleEnabled) and circuit-breaker flags
Risk counters (e.g., sliding window rate limiting) must use AtomicInteger or LongAdder — never volatile int

English Answer：

volatile provides two main semantics. The first is visibility: after one thread writes to a volatile variable, other threads reading that same variable must see the latest value. Conceptually, the write is flushed to main memory and reads reload from main memory rather than relying on stale CPU cache or register values. The second is ordering: volatile reads and writes introduce memory-barrier effects, so the compiler and CPU cannot freely reorder operations across those volatile accesses.
volatile does not provide atomicity for compound operations. For example, volatile int count; count++ is still not thread-safe because count++ is a read-modify-write sequence. Multiple threads can read the same old value, compute the same new value, and overwrite each other. For counters, use AtomicInteger, AtomicLong, LongAdder, or a lock depending on contention and consistency needs.
volatile is enough when the operation is a simple state publication or a single read/write pattern. Typical examples are a stop flag such as volatile boolean running, a degradation switch, or the instance field in a double-checked-locking singleton. In DCL, volatile prevents the object reference assignment from being reordered before the constructor finishes.
volatile is not enough for compound operations, check-then-act logic, multiple related variables, or invariants that must be updated atomically. In those cases, you need synchronized blocks, explicit locks, atomic classes, or a higher-level concurrency design.
At the low level, volatile is implemented with memory barriers. On x86, volatile writes commonly rely on locked instructions or StoreLoad-barrier-like effects to ensure visibility and ordering. On weaker memory models such as ARM, the JVM must emit the appropriate barriers as well. The exact instruction differs by architecture, but the Java-level guarantee is the same.

Q3. AQS 的核心设计原理是什么？如何用 AQS 实现一个不可重入的互斥锁？

EN: Explain how AQS (AbstractQueuedSynchronizer) works. How would you implement a non-reentrant mutex with it?

难度： ★★★★ | 出现频率： 高（阿里、字节、美团）

Key Terms: AQS (抽象队列同步器), CLH Queue (CLH 队列), CAS (比较并交换), state (同步状态), Template Method (模板方法)

答案要点：

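Core data structures: AQS maintains a volatile int state (the synchronization state) and a FIFO doubly linked wait queue, a variant of the CLH (Craig, Landin, and Hagersten) queue, tracked by head and tail nodes
The meaning of state depends on the synchronizer: in ReentrantLock, state=0 means unlocked and state>0 is the hold count (number of reentrant acquisitions); in Semaphore it is the number of remaining permits; in CountDownLatch it is the remaining count
Two modes: exclusive (e.g. ReentrantLock) and shared (e.g. Semaphore, CountDownLatch)
Core flow (exclusive lock as the example):

- tryAcquire() attempts to CAS state from 0 to 1; on success the thread holds the lock
- On failure, the current thread is wrapped in a Node, enqueued, and suspended with LockSupport.park()
- When the owner releases, tryRelease() brings state back to 0, and unparkSuccessor() then wakes the successor node so it can retry

Template method pattern: AQS defines hook methods such as tryAcquire/tryRelease/tryAcquireShared/tryReleaseShared; a subclass implements only the hooks it needs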

代码示例：


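// A non-reentrant mutex built on AQS
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
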
public class SimpleMutex implements Lock {
    private final Sync sync = new Sync();

    // 静态内部类继承 AQS
    private static class Sync extends AbstractQueuedSynchronizer {
        // 尝试获取锁：CAS 将 state 从 0 改为 1
        @Override
        protected boolean tryAcquire(int arg) {
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false; // 不可重入：已持有锁的线程再来也返回 false
        }

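        // Try to release: only the owning thread may set state back to 0
        @Override
        protected boolean tryRelease(int arg) {
            if (getState() == 0 || getExclusiveOwnerThread() != Thread.currentThread()) {
                throw new IllegalMonitorStateException("lock not held by current thread");
            }
            setExclusiveOwnerThread(null);
            setState(0); // volatile write; makes the critical section visible to the next owner
            return true;
        }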

        @Override
        protected boolean isHeldExclusively() {
            return getExclusiveOwnerThread() == Thread.currentThread();
        }
    }

    @Override public void lock() { sync.acquire(1); }
    @Override public void unlock() { sync.release(1); }
    @Override public boolean tryLock() { return sync.tryAcquire(1); }
    @Override public void lockInterruptibly() throws InterruptedException { sync.acquireInterruptibly(1); }
    @Override public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        return sync.tryAcquireNanos(1, unit.toNanos(time));
    }
    @Override public Condition newCondition() { throw new UnsupportedOperationException(); }
}
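
The mutex above uses exclusive mode. For contrast, here is a minimal sketch of shared mode, modelled loosely on the boolean latch in the AQS class documentation. OneShotLatch is a hypothetical name, and the convention that state 0 means closed and 1 means open is an assumption of this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class OneShotLatch {
    private static final class Sync extends AbstractQueuedSynchronizer {
        boolean isOpen() { return getState() == 1; }

        // Shared acquire: a negative result means "enqueue and park",
        // a positive result means "pass, and later waiters may pass too".
        @Override
        protected int tryAcquireShared(int ignored) {
            return isOpen() ? 1 : -1;
        }

        // Shared release: returning true tells AQS to wake the queued waiters.
        @Override
        protected boolean tryReleaseShared(int ignored) {
            setState(1);
            return true;
        }
    }

    private final Sync sync = new Sync();

    boolean isOpen() { return sync.isOpen(); }
    void await() throws InterruptedException { sync.acquireSharedInterruptibly(1); }
    void open() { sync.releaseShared(1); }

    // Starts `waiters` threads that block on the latch, opens it, and
    // returns how many threads got through.
    static int demo(int waiters) {
        OneShotLatch latch = new OneShotLatch();
        AtomicInteger passed = new AtomicInteger();
        Thread[] threads = new Thread[waiters];
        for (int i = 0; i < waiters; i++) {
            threads[i] = new Thread(() -> {
                try {
                    latch.await();
                    passed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        latch.open();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return passed.get();
    }
}
```

Only the two hooks differ from the mutex; queuing, parking, and wake-up propagation all come from AQS, which is the template-method split described above.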

常见误区：

❌ Assuming AQS uses a plain FIFO queue → ✅ It is a CLH variant; in the JDK 8 implementation, for example, the SIGNAL status that says a waiter needs waking is stored on its predecessor node, not on the waiter itself
❌ Mixing setState(0) and compareAndSetState(0, 1) freely → ✅ The former is a plain volatile write, safe only while the lock is held; the latter is a CAS, required when threads compete to acquire
❌ Overlooking setExclusiveOwnerThread → ✅ The owner thread is what lets a synchronizer answer "does the current thread hold this lock?": reentrant locks use it to detect reentry, and isHeldExclusively() and ownership checks at unlock time rely on it too

延伸追问：

AQS 的取消节点（CANCELLED 状态）是如何被清理的？为什么取消的节点不会被立即移除？
ReentrantLock 的公平锁 tryAcquire 实现中为什么先检查 hasQueuedPredecessors()？如果去掉这个检查会怎样？
AQS 在高并发下的瓶颈在哪里？如何优化？
How are CANCELLED nodes cleaned up in AQS? Why aren't they removed immediately?
Why does the fair-lock tryAcquire in ReentrantLock check hasQueuedPredecessors() first? What happens if that check is removed?
Where is the bottleneck in AQS under high concurrency? How can it be optimized?

风控关联：

Foundational knowledge: understanding AQS is a prerequisite for understanding the AQS-based synchronizers in java.util.concurrent (ReentrantLock, Semaphore, CountDownLatch, ReentrantReadWriteLock). Not every utility in the package is built on AQS directly; CyclicBarrier, for example, is built on ReentrantLock and Condition

English Answer：

AQS is the base framework behind many classes in java.util.concurrent, such as ReentrantLock, Semaphore, CountDownLatch, and ReentrantReadWriteLock. Its core state is a volatile int state plus a CLH-variant FIFO wait queue represented by head and tail nodes.
The meaning of state is defined by each synchronizer. In ReentrantLock, state == 0 means unlocked, and a positive value means locked, usually with the value representing the reentry count. In Semaphore, state represents remaining permits. In CountDownLatch, state is the remaining count.
AQS supports exclusive mode and shared mode. Exclusive mode is used by locks such as ReentrantLock, where only one thread can own the synchronizer. Shared mode is used by synchronizers such as Semaphore and CountDownLatch, where multiple threads may pass depending on the state.
For an exclusive lock, acquisition usually calls tryAcquire(). If a CAS changes state from 0 to 1, the thread acquires the lock. If it fails, AQS wraps the current thread in a queue node, appends it to the wait queue, and parks the thread with LockSupport.park(). When the owner releases the lock, tryRelease() sets the state back to 0 and AQS unparks the successor node so it can retry acquisition.
AQS uses the template method pattern. The framework implements queuing, parking, unparking, cancellation, and interrupt handling, while subclasses implement hooks such as tryAcquire, tryRelease, tryAcquireShared, and tryReleaseShared.
To implement a non-reentrant mutex, I would create a static Sync class extending AQS. tryAcquire uses compareAndSetState(0, 1) and sets the exclusive owner thread on success. If the current owner tries to acquire again, it still returns false because the lock is intentionally non-reentrant. tryRelease checks that the state is not 0, clears the owner thread, and sets state back to 0. The public Lock methods delegate to AQS methods such as acquire, release, and tryAcquireNanos.

Q4. 线程池 ThreadPoolExecutor 的 7 个核心参数分别是什么？提交一个任务后线程池的执行流程是怎样的？

EN: Explain ThreadPoolExecutor parameters. What's the task submission flow?

难度： ★★★ | 出现频率： 极高（阿里、美团、字节、腾讯、京东）

Key Terms: ThreadPoolExecutor (线程池执行器), corePoolSize (核心线程数), workQueue (工作队列), RejectedExecutionHandler (拒绝策略), Backpressure (背压)

答案要点：

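The 7 core parameters:

- corePoolSize: number of core threads, kept alive even when idle (unless allowCoreThreadTimeOut is set)
- maximumPoolSize: upper bound on total threads, core plus non-core
- keepAliveTime: how long an idle non-core thread survives
- unit: time unit of keepAliveTime
- workQueue: queue of waiting tasks (e.g. ArrayBlockingQueue, LinkedBlockingQueue, SynchronousQueue)
- threadFactory: creates worker threads; used to customize thread names, daemon status, and so on
- handler: rejection policy (RejectedExecutionHandler)

Task submission flow:

- Thread count < corePoolSize → create a core thread and run the task on it directly
- Thread count >= corePoolSize → offer the task to workQueue
- workQueue full and thread count < maximumPoolSize → create a non-core thread to run the task
- workQueue full and thread count >= maximumPoolSize → apply the rejection policy

The 4 built-in rejection policies:

- AbortPolicy (default): throws RejectedExecutionException
- CallerRunsPolicy: the submitting thread runs the task itself (which smooths out peaks)
- DiscardPolicy: silently drops the task, no exception
- DiscardOldestPolicy: drops the oldest task at the head of the queue, then resubmits the current task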

代码示例：


// 生产级线程池配置示例
ThreadPoolExecutor executor = new ThreadPoolExecutor(
    8,                                  // corePoolSize：CPU 密集型按 CPU 核数
    32,                                 // maximumPoolSize：IO 密集型可适当放大
    60, TimeUnit.SECONDS,               // 非核心线程空闲 60s 回收
    new LinkedBlockingQueue<>(1000),     // 有界队列，容量 1000
    new ThreadFactory() {                // 自定义线程名，便于排查问题
        private final AtomicInteger counter = new AtomicInteger(0);
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "risk-engine-worker-" + counter.incrementAndGet());
            t.setDaemon(false);
            return t;
        }
    },
    new ThreadPoolExecutor.CallerRunsPolicy() // 拒绝时由调用线程执行，起到背压效果
);

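// Dynamic tuning (Meituan practice): adjust core/max at runtime
// (LinkedBlockingQueue has a fixed capacity; resizing the queue needs a custom resizable queue)
executor.setMaximumPoolSize(64); // raise max first so core never exceeds max
executor.setCorePoolSize(16);
// Note: when the new core size is larger, extra workers are started only as needed to run tasks already in the queue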
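
The four-step submission flow can be observed directly with a deliberately tiny pool. This is a sketch under stated assumptions: SubmissionFlowDemo is a hypothetical class, and the sizes (core 1, max 2, queue capacity 1) are chosen only so that every branch is hit by four submissions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SubmissionFlowDemo {
    // Returns {pool size after task 1, queue size after task 2,
    //          pool size after task 3, 1 if task 4 was rejected}.
    static int[] run() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 2, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(1),
            new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch gate = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                gate.await(); // keep the worker busy until the demo is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int[] seen = new int[4];
        try {
            pool.execute(blocked);            // below corePoolSize: new core thread
            seen[0] = pool.getPoolSize();
            pool.execute(blocked);            // core thread busy: task is queued
            seen[1] = pool.getQueue().size();
            pool.execute(blocked);            // queue full, below max: non-core thread
            seen[2] = pool.getPoolSize();
            try {
                pool.execute(blocked);        // queue full, at max: rejection policy
            } catch (RejectedExecutionException e) {
                seen[3] = 1;
            }
        } finally {
            gate.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return seen;
    }
}
```

Note the order: the queue fills before any non-core thread is created, which is why a very large or unbounded queue means maximumPoolSize never comes into play.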

常见误区：

❌ 用 Executors.newFixedThreadPool() / newCachedThreadPool() 创建线程池 → ✅ 前者使用无界队列可能导致 OOM；后者最大线程数为 Integer.MAX_VALUE 同样可能 OOM。阿里 Java 开发规范明确禁止
❌ 认为 CallerRunsPolicy 会阻塞线程池的工作线程 → ✅ 实际阻塞的是提交任务的线程（通常是业务线程），这是一种背压（backpressure）机制
❌ 认为核心线程永远不会销毁 → ✅ allowCoreThreadTimeOut(true) 可以让核心线程也超时回收
❌ Using Executors.newFixedThreadPool() / newCachedThreadPool() to create thread pools → ✅ The former uses an unbounded queue (OOM risk); the latter allows Integer.MAX_VALUE threads (also OOM risk). Explicitly prohibited by Alibaba Java coding standards
❌ Thinking CallerRunsPolicy blocks pool worker threads → ✅ It actually blocks the submitting thread (usually a business thread) — this is a backpressure mechanism
❌ Assuming core threads are never destroyed → ✅ allowCoreThreadTimeOut(true) lets core threads also expire after idle timeout
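
To make the CallerRunsPolicy point concrete, the sketch below saturates a one-thread pool and records which thread ends up running the rejected task. CallerRunsDemo is a hypothetical class name, and the pool sizes are assumptions chosen to force a rejection on the third submission.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class CallerRunsDemo {
    // Returns the name of the thread that executed the rejected task.
    static String rejectedTaskThread() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(1),
            new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch gate = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        AtomicReference<String> ranOn = new AtomicReference<>();
        try {
            pool.execute(blocked); // occupies the only worker
            pool.execute(blocked); // fills the queue
            // Saturated: the policy runs this task synchronously on the submitter.
            pool.execute(() -> ranOn.set(Thread.currentThread().getName()));
        } finally {
            gate.countDown();
            pool.shutdown();
        }
        return ranOn.get();
    }
}
```

Because execute() does not return until the task has run, the submitter cannot produce more work in the meantime; that pause is the backpressure.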

延伸追问：

如何设计一个线程池的动态调参方案？
线程池中的线程是如何创建和复用的？核心线程的"保活"机制是什么？
如果线程池中一个任务抛出未捕获异常，线程会怎样？如何处理？
How would you design a dynamic thread pool tuning scheme?
How are threads in a pool created and reused? What is the keep-alive mechanism for core threads?
What happens when a task throws an uncaught exception in a thread pool? How should you handle it?

风控关联：

风控引擎的规则执行通常由线程池驱动。需要根据 QPS 峰值合理设置线程池参数——core 太小导致任务排队延迟增大（影响风控响应时效），queue 太大导致任务堆积（可能触发超时）
建议使用有界队列 + CallerRunsPolicy 做背压保护
Risk engine rule execution is typically thread-pool-driven. Pool parameters must be tuned for peak QPS — too few core threads increase queuing latency (impacting risk response time), too large a queue causes task accumulation (risking timeouts)
Use bounded queues + CallerRunsPolicy for backpressure protection

English Answer：

ThreadPoolExecutor has seven core parameters. corePoolSize is the number of core threads, which normally stay alive even when idle unless allowCoreThreadTimeOut is enabled. maximumPoolSize is the upper bound for total threads. keepAliveTime and unit define how long non-core idle threads can live. workQueue stores tasks waiting to run. threadFactory controls how worker threads are created, named, and configured. handler is the rejection policy used when the pool is saturated.
The task submission flow is deterministic. When a task is submitted and the current worker count is below corePoolSize, the pool creates a core thread to run it. If core threads are already enough, the task is offered to the work queue. If the queue is full and the worker count is still below maximumPoolSize, the pool creates a non-core thread. If the queue is full and the pool has reached maximumPoolSize, the rejection policy is triggered.
The four built-in rejection policies are AbortPolicy, CallerRunsPolicy, DiscardPolicy, and DiscardOldestPolicy. AbortPolicy throws RejectedExecutionException and is the default. CallerRunsPolicy lets the submitting thread run the task, which creates natural backpressure. DiscardPolicy silently drops the task. DiscardOldestPolicy drops the oldest queued task and retries submitting the current one.
In production, I avoid Executors.newFixedThreadPool() and newCachedThreadPool() because the former uses an unbounded queue and the latter can create an extremely large number of threads. I prefer explicitly defining a bounded queue, meaningful thread names, metrics, and a rejection strategy. For risk-control workloads, a bounded queue plus CallerRunsPolicy is often safer because it slows the caller before the system collapses.
Pool sizing depends on workload type. CPU-bound tasks are usually close to CPU core count, while IO-bound tasks can use more threads because many threads wait on IO. The final values should be validated by latency, queue length, rejection count, CPU utilization, and timeout metrics rather than chosen by formula alone.

Q5. CountDownLatch、CyclicBarrier、Semaphore 三者的区别和适用场景分别是什么？

EN: Compare CountDownLatch, CyclicBarrier, and Semaphore.

难度： ★★★ | 出现频率： 高（阿里、美团、字节）

Key Terms: CountDownLatch (倒计数器), CyclicBarrier (循环栅栏), Semaphore (信号量), Synchronization Barrier (同步屏障), Permit (许可)

答案要点：

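CountDownLatch: a one-shot countdown; one or more threads wait for other threads to finish. The count can only decrease and cannot be reset, so the latch is spent once it reaches zero. Typical use: the main thread waits for all subtasks to finish, then aggregates the results
CyclicBarrier: a reusable barrier; a group of threads wait for each other and proceed together once all have arrived. The barrier resets itself automatically after each trip, so it can be reused round after round; calling reset() manually is a different operation that breaks the current generation, and threads already waiting receive BrokenBarrierException. A barrier action is supported via CyclicBarrier(int parties, Runnable barrierAction). Typical use: sharded parallel computation, per-round synchronization in iterative algorithms
Semaphore: controls how many threads may access a shared resource at the same time. acquire takes a permit (count minus 1), release returns one (count plus 1). Typical use: rate limiting, database connection pools, resource pools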

代码示例：


// CountDownLatch：主线程等待 N 个子任务完成
public class BatchRiskCheck {
    public List<Result> batchCheck(List<Order> orders) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(orders.size());
        List<Result> results = new CopyOnWriteArrayList<>();

        for (Order order : orders) {
            executor.submit(() -> {
                try {
                    results.add(riskCheck(order));
                } finally {
                    latch.countDown(); // 必须在 finally 中调用
                }
            });
        }
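        boolean finished = latch.await(5, TimeUnit.SECONDS); // false means the wait timed out
        if (!finished) {
            // timed out: results is only partial here, so decide whether to degrade or fail
        }
        return results;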
    }
}

// CyclicBarrier：多线程分片处理，每轮同步
public class ParallelAggregator {
    public void aggregate() {
        CyclicBarrier barrier = new CyclicBarrier(3, () -> {
            mergeResults(); // 所有线程到达后执行合并
        });
        for (int i = 0; i < 3; i++) {
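            executor.submit(() -> {
                try {
                    for (int round = 0; round < 10; round++) {
                        computePartial(); // each thread computes its own shard
                        barrier.await();  // wait for the other threads to finish this round
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // another party was interrupted or timed out; abandon the remaining rounds
                }
            });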
        }
    }
}

// Semaphore：限制并发访问数
public class DbConnectionPool {
    private final Semaphore permits = new Semaphore(20); // 最多 20 个并发连接

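    public Connection acquire() throws InterruptedException {
        permits.acquire();
        try {
            return createConnection();
        } catch (RuntimeException e) {
            permits.release(); // no connection was handed out, so give the permit back
            throw e;
        }
    }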

    public void release(Connection conn) {
        closeConnection(conn);
        permits.release();
    }
}
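
For the CyclicBarrier point (the barrier re-arms itself after every trip), here is a small self-contained sketch. BarrierRounds is a hypothetical class; the barrier action simply counts how many rounds completed.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierRounds {
    // `parties` threads meet at the same barrier `rounds` times.
    // Returns how many times the barrier action ran.
    static int run(int parties, int rounds) {
        AtomicInteger completedRounds = new AtomicInteger();
        // The barrier action runs once per trip, before the waiting threads are released.
        CyclicBarrier barrier = new CyclicBarrier(parties, completedRounds::incrementAndGet);
        Thread[] workers = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        barrier.await(); // same barrier object, reused every round
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // another party failed; stop participating
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return completedRounds.get();
    }
}
```

No reset() call appears anywhere: reuse is automatic. A CountDownLatch would need a fresh instance per round.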

常见误区：

❌ 在 CountDownLatch 中忘记在 finally 中调用 countDown() → ✅ 如果子任务抛异常，latch 永远不会归零，主线程会永远阻塞
❌ 混用 CountDownLatch 和 CyclicBarrier → ✅ 前者是"一个线程等多个线程"，后者是"多个线程互等"
❌ Semaphore 的 release 没有对应 acquire → ✅ 可能导致许可数超过初始值（Semaphore 不强制 acquire/release 配对）
❌ Forgetting to call countDown() in finally with CountDownLatch → ✅ If a subtask throws, the latch never reaches zero and the main thread blocks forever
❌ Confusing CountDownLatch with CyclicBarrier → ✅ The former is "one thread waits for many"; the latter is "many threads wait for each other"
❌ Calling Semaphore release() without a matching acquire() → ✅ This can cause permits to exceed the initial value (Semaphore does not enforce acquire/release pairing)
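
The unmatched-release pitfall takes only a few lines to reproduce. PermitDrift is a hypothetical class; the second method shows one common guard, releasing only when the acquire actually succeeded.

```java
import java.util.concurrent.Semaphore;

class PermitDrift {
    // release() without a prior acquire() silently grows the permit count.
    static int unmatchedRelease() {
        Semaphore permits = new Semaphore(2);
        permits.release();
        return permits.availablePermits();
    }

    // Guarded pattern: remember whether the permit was obtained.
    static int guardedRelease() {
        Semaphore permits = new Semaphore(2);
        boolean acquired = permits.tryAcquire();
        try {
            // ... use the protected resource here ...
        } finally {
            if (acquired) {
                permits.release();
            }
        }
        return permits.availablePermits();
    }
}
```

The first method ends with three permits although the semaphore was created with two; the second ends back at two.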

延伸追问：

CountDownLatch 的 await(timeout) 超时后，子任务还在跑怎么办？如何优雅取消？
CyclicBarrier 的 reset() 在有线程等待时调用会怎样？
如何用 Semaphore 实现一个简单的限流器？和 Guava RateLimiter 的区别是什么？
What happens to still-running subtasks after CountDownLatch await(timeout) expires? How to cancel them gracefully?
What happens if CyclicBarrier reset() is called while threads are waiting?
How would you implement a simple rate limiter with Semaphore? How does it differ from Guava RateLimiter?

风控关联：

风控批量检查场景（如批量审核交易）用 CountDownLatch 并行执行后汇总
风控网关限流可用 Semaphore 做粗粒度的并发控制
规则引擎多阶段执行（数据采集 → 规则计算 → 决策输出）可用 CyclicBarrier 同步
Batch risk checks (e.g., bulk transaction review) use CountDownLatch to fan out and aggregate results in parallel
Risk gateway rate limiting can use Semaphore for coarse-grained concurrency control
Multi-phase rule engine execution (data collection → rule computation → decision output) can be synchronized with CyclicBarrier

English Answer：

CountDownLatch is a one-shot countdown synchronizer. One or more threads call await() and wait until other threads call countDown() enough times to reduce the count to zero. The count can only decrease and cannot be reset. A typical use case is a main thread waiting for multiple subtasks, service initialization steps, or batch risk checks to finish before aggregating results.
CyclicBarrier is a reusable barrier for a group of threads. Each thread calls await() at the barrier point; when the number of waiting threads reaches the configured parties count, all threads continue together. It can reset automatically after each round and supports a barrier action callback. It is suitable for iterative algorithms, parallel phased computation, or multi-stage processing where all workers must synchronize after each round.
Semaphore controls concurrent access to a limited resource. acquire() consumes a permit, and release() returns a permit. It is useful for rate limiting, connection pools, resource pools, and coarse-grained concurrency control. A semaphore with one permit can behave like a mutex, but its main value is allowing N concurrent holders.
The common distinction is: CountDownLatch is "one thread waits for many tasks"; CyclicBarrier is "many threads wait for each other"; Semaphore is "limit the number of concurrent users of a resource." Correct cleanup matters: countDown() should be in finally, and Semaphore.release() should only be called after a successful acquire() to avoid increasing permits beyond the intended capacity.

Q6. ThreadLocal 为什么会导致内存泄漏？如何正确使用和清理？

EN: How does ThreadLocal work? What causes memory leaks?

难度： ★★★★ | 出现频率： 高（阿里、字节、美团、腾讯）

Key Terms: ThreadLocal (线程本地变量), ThreadLocalMap (线程本地映射), WeakReference (弱引用), Memory Leak (内存泄漏), remove() (清理方法)

答案要点：

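Data structure: every Thread holds a ThreadLocalMap (a field of Thread); each entry's key is a weak reference to the ThreadLocal (WeakReference<ThreadLocal>), and the value is a strong reference
Leak path: the ThreadLocal object is collected (the weakly referenced key is cleared by GC) → the Entry's key becomes null → the value is still strongly referenced by the Entry → the Entry by the ThreadLocalMap → the ThreadLocalMap by the Thread → the value cannot be reclaimed until the Thread ends
Worse with thread pools: pooled threads are reused and typically live as long as the pool, so values in ThreadLocalMap accumulate unless they are cleaned up manually
Correct usage:

- Pair every set() with a remove(), and call remove() in finally
- Use the try-finally pattern to guarantee cleanup
- In web applications, clean up in the Filter, or in the interceptor's afterCompletion

JDK self-healing: ThreadLocalMap clears entries whose key is null (expungeStaleEntry) during get/set/remove, but this cannot be relied on, because it only runs when an operation happens to probe nearby slots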

代码示例：


// 错误用法：忘记 remove
public class BadExample {
    private static final ThreadLocal<UserContext> ctx = new ThreadLocal<>();
    public void handle(Request req) {
        ctx.set(new UserContext(req.getUserId()));
        doSomething(); // 如果抛异常或分支逻辑忘记清理
        ctx.remove();  // 这行可能不会执行
    }
}

// Correct usage: a holder with an explicit clear(); callers invoke it in finally (see the Filter below)
public class UserContextHolder {
    private static final ThreadLocal<UserContext> CTX = new ThreadLocal<>();

    public static void set(UserContext ctx) { CTX.set(ctx); }
    public static UserContext get() { return CTX.get(); }

    public static void clear() { CTX.remove(); }
}

// 在 Filter 中使用（Web 场景最佳实践）
public class UserContextFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        try {
            UserContextHolder.set(buildContext((HttpServletRequest) req));
            chain.doFilter(req, resp);
        } finally {
            UserContextHolder.clear(); // 无论是否异常都清理
        }
    }
}
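
What goes wrong without remove() is easy to reproduce on a single pooled thread. The sketch below is illustrative only: ContextLeakDemo and the "user-A" value are hypothetical, and Executors.newSingleThreadExecutor() is used purely to keep the demo short.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ContextLeakDemo {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    // Runs two tasks back to back on the same worker thread and returns
    // what the second task finds in the ThreadLocal.
    static String secondTaskSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                USER.set("user-A");
                try {
                    // ... handle the first request ...
                } finally {
                    if (cleanUp) {
                        USER.remove();
                    }
                }
            };
            pool.submit(firstRequest).get();        // wait for the first task
            Callable<String> secondRequest = USER::get;
            return pool.submit(secondRequest).get(); // same thread, later task
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without cleanup the second task reads the first task's value; with cleanup it reads null. In a request-handling service that difference is exactly the cross-user contamination risk.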

常见误区：

❌ Assuming ThreadLocal is a universal thread-safety tool → ✅ Its purpose is thread confinement, not thread synchronization
❌ Assuming the JDK auto-cleans all leaks → ✅ expungeStaleEntry only cleans some stale entries; it is not a full table scan
❌ Using ThreadLocal.withInitial() in thread pools without calling remove() → ✅ The initial value is created lazily once per thread (on the first get(), or the first get() after remove()), not on every get(); on a pooled thread that value, and anything later set(), persists across tasks until remove() is called

延伸追问：

ThreadLocalMap 的 hash 冲突是怎么解决的？
Netty 的 FastThreadLocal 和 JDK ThreadLocal 有什么区别？为什么更快？
在 Dubbo / Spring Cloud 的链路追踪中，traceId 是怎么跨线程传递的？
How does ThreadLocalMap resolve hash collisions?
How does Netty's FastThreadLocal differ from the JDK ThreadLocal? Why is it faster?
How is traceId propagated across threads in Dubbo / Spring Cloud distributed tracing?

风控关联：

风控引擎中通常用 ThreadLocal 存储请求上下文（交易信息、用户画像、风控会话 ID），在整个规则链路中透传
在线程池环境下必须确保每次请求结束后调用 remove()，否则会串上下文（A 用户的风控请求看到 B 用户的数据），这是严重的安全隐患
Risk engines typically use ThreadLocal to store request context (transaction details, user profiles, risk session IDs) and propagate them across the rule chain
In thread pool environments, remove() must be called after each request — otherwise context cross-contamination occurs (User A's risk request sees User B's data), which is a critical security vulnerability

English Answer：

Each Thread object contains a ThreadLocalMap. The key in that map is a weak reference to the ThreadLocal object, while the value is a strong reference to the stored object. This design gives each thread its own isolated copy of a variable, so it is useful for thread-confined request context, user context, trace IDs, and similar data.
The memory-leak path is subtle. If the ThreadLocal object itself becomes unreachable, its weak-reference key can be garbage-collected and become null. However, the value is still strongly referenced by the map entry. The entry is referenced by ThreadLocalMap, the map is referenced by the Thread, and in a thread pool the thread may live for the entire lifetime of the application. As a result, the value cannot be reclaimed.
This is more dangerous in thread pools because worker threads are reused across requests. If a request sets a ThreadLocal value and does not remove it, the value may accumulate and may even be visible to a later task running on the same thread, causing context contamination.
The correct pattern is to pair every set() with remove(), preferably in a finally block. In web applications, cleanup should happen in a Filter, Interceptor, or afterCompletion hook. A holder utility should expose an explicit clear() method and make cleanup part of the request lifecycle.
The JDK has a partial self-cleaning mechanism: ThreadLocalMap may clean stale entries when get, set, or remove touches nearby slots. But it is not a full-table background cleanup mechanism, so production code must not rely on it.
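The cross-thread propagation question above can be illustrated with a small wrapper. This is a minimal sketch, not how Dubbo or Spring Cloud implement it: `TraceContext`, `wrap`, and `TraceContextDemo` are hypothetical names. The wrapper captures the value on the submitting thread, installs it on the worker, and restores the worker's previous state in `finally` so nothing leaks into the next task. `InheritableThreadLocal` does not help with pooled threads because it copies values only when a thread is created.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder for a per-request trace ID (illustrative name).
final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { TRACE_ID.set(traceId); }
    static String get() { return TRACE_ID.get(); }
    static void clear() { TRACE_ID.remove(); }

    // Captures the submitting thread's value and installs it around the task.
    static Runnable wrap(Runnable task) {
        String captured = get();              // read on the submitting thread
        return () -> {
            String previous = get();          // whatever the worker held before
            set(captured);
            try {
                task.run();
            } finally {
                // Restore the worker's prior state so nothing leaks to the next task.
                if (previous == null) clear(); else set(previous);
            }
        };
    }
}

final class TraceContextDemo {
    // Returns {value seen inside the wrapped task, value seen by the next plain task}.
    static String[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        String[] seen = new String[2];
        TraceContext.set("trace-123");
        try {
            pool.submit(TraceContext.wrap(() -> { seen[0] = TraceContext.get(); })).get();
            // Same worker thread, unwrapped task: it must not observe the old value.
            pool.submit(() -> { seen[1] = TraceContext.get(); }).get();
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            TraceContext.clear();             // pair set() with remove() on the caller too
            pool.shutdown();
        }
    }
}
```

In practice the wrapping is usually applied once, in a decorating executor, rather than at every submit call.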

Q7. What is CAS? What problems does it have, and how does Java solve the ABA problem?

Difficulty: ★★★★ | Frequency: High (Alibaba, ByteDance, Meituan)

Key Terms: CAS (compare-and-swap), ABA problem, AtomicStampedReference (stamped atomic reference), LongAdder (striped long accumulator), Unsafe (low-level operations class)

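Key points:

CAS (compare-and-swap) is an atomic operation on three operands: the memory value V, the expected value A, and the new value B. V is updated to B only if V == A; otherwise nothing is written. The whole step is atomic at the CPU-instruction level (for example, lock cmpxchg on x86).
CAS has three well-known problems:

- ABA problem: the value goes A → B → A, so a later CAS sees A and succeeds even though the state changed in between. In pointer-based structures such as stacks and linked lists this can corrupt the structure or lose data.
- Spin overhead: a failed CAS is usually retried in a loop, which burns CPU under high contention.
- Single-variable atomicity: one CAS covers one variable. Updating several fields atomically requires a lock or an immutable object swapped through AtomicReference.

Solutions to the ABA problem:

- AtomicStampedReference: CAS on a (reference, stamp) pair. The caller supplies a new stamp on every update, typically the old one plus one, and the comparison checks both the reference and the stamp.
- AtomicMarkableReference: CAS on a (reference, boolean mark) pair. It fits cases that only need a one-bit flag such as "logically deleted"; because the mark can flip back, it does not rule out ABA in general.

CAS implementation in Java: up to JDK 8 the atomic classes call sun.misc.Unsafe.compareAndSwapInt/Long/Object. These are native methods that HotSpot treats as intrinsics, so JIT-compiled code emits the CPU's atomic instruction directly instead of making a JNI call. Since JDK 9 the supported API for application code is VarHandle.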

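Code example:


import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicStampedReference;
import java.util.concurrent.atomic.LongAdder;

// Reproducing ABA (the two threads are shown inline for readability)
AtomicInteger value = new AtomicInteger(100);
// Thread 1
int old = value.get();                    // 100
// Thread 2
value.compareAndSet(100, 200);            // A → B
value.compareAndSet(200, 100);            // B → A
// Thread 1 resumes
value.compareAndSet(old, 300);            // succeeds, although the value changed in between

// Fix: AtomicStampedReference (references are compared with ==, stamps with ==)
AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 1);
int[] stampHolder = new int[1];
String current = ref.get(stampHolder);    // current="A", stamp=1
// Another thread modifies the reference
ref.compareAndSet("A", "B", 1, 2);        // stamp 1→2
ref.compareAndSet("B", "A", 2, 3);        // stamp 2→3
// The current thread attempts its CAS
boolean success = ref.compareAndSet(current, "C", stampHolder[0], stampHolder[0] + 1);
// Fails: the stamp is now 3, but this thread still holds 1

// LongAdder: relieves the CAS-retry bottleneck of AtomicLong under high concurrency
LongAdder counter = new LongAdder();
counter.increment();  // spreads contention across an internal Cell array
long total = counter.sum(); // adds up the base value and all Cells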

Common pitfalls:

❌ Assuming ABA matters in every scenario → ✅ For a plain counter (only the current value matters, not its history), ABA is harmless
❌ Confusing AtomicStampedReference with AtomicMarkableReference → ✅ The former versions each update with an integer stamp; the latter carries only a boolean mark (for example "logically deleted")
❌ Assuming LongAdder fixes every AtomicLong problem → ✅ LongAdder's sum() is not a strongly consistent snapshot (Cells may change during the traversal), and it has no compareAndSet

Follow-up questions:

What's the difference between LongAdder and LongAccumulator? What is LongAdder's Cell array expansion strategy?
After Unsafe was restricted in JDK 9+, how does Java implement CAS?
In lock-free queues such as ConcurrentLinkedQueue, how does CAS keep the linked-list operations correct?

Risk-control relevance:

Risk rate-limiting counters should prefer LongAdder over AtomicLong under high QPS to avoid CPU spikes from CAS retries.
Risk idempotency checks can use AtomicStampedReference as a versioned optimistic lock.

English Answer：

CAS means Compare-And-Swap. It compares a memory value V with an expected value A; only if they are equal does it update the memory value to a new value B. The whole operation is atomic at the CPU-instruction level, such as cmpxchg on x86. Java atomic classes and many lock-free data structures rely on CAS.
CAS has three common problems. The first is the ABA problem: a value changes from A to B and then back to A, so a later CAS sees A and succeeds even though the state was modified in between. This matters in pointer-based structures such as stacks and linked lists, where the history of changes affects correctness. The second problem is spin overhead: failed CAS operations often retry in a loop, and under high contention this can burn CPU. The third problem is that CAS naturally protects one variable at a time; if multiple fields must change atomically, you need a lock or an immutable object wrapped in AtomicReference.
Java solves ABA with versioned or marked references. AtomicStampedReference stores a reference plus an integer stamp, and every successful update also changes the stamp. The CAS compares both the value and the stamp, so A -> B -> A is still detected because the stamp changed. AtomicMarkableReference stores a boolean mark and is suitable when we only need to know whether the reference has been logically changed or deleted.
Java historically exposed CAS through Unsafe methods such as compareAndSwapInt, compareAndSwapLong, and compareAndSwapObject, which eventually map to native atomic CPU instructions. On newer JDKs, higher-level APIs such as VarHandle provide safer access to similar memory and atomic operations.
For high-QPS counters, AtomicLong may suffer from heavy CAS contention, so LongAdder spreads updates across multiple cells and sums them later. The trade-off is that sum() is not a strongly consistent instantaneous value and LongAdder does not support compare-and-set semantics.
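The CAS retry loop and the VarHandle API mentioned above can be sketched as follows. This is a minimal illustration assuming JDK 9 or later; `MaxTracker` is a hypothetical class that records the largest sample seen without taking a lock. ABA is harmless here because only the current value matters.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class: tracks the maximum value seen, updated lock-free.
final class MaxTracker {
    private static final VarHandle MAX;
    static {
        try {
            // Looks up a handle to the private field "max" of this class.
            MAX = MethodHandles.lookup().findVarHandle(MaxTracker.class, "max", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile long max = Long.MIN_VALUE;

    void record(long sample) {
        while (true) {
            long current = (long) MAX.getVolatile(this);
            if (sample <= current) {
                return;                       // nothing to update
            }
            if (MAX.compareAndSet(this, current, sample)) {
                return;                       // our CAS won
            }
            // Another thread changed the value first; re-read and retry.
        }
    }

    long max() { return max; }
}
```

For a plain field like this, `AtomicLong.accumulateAndGet(sample, Math::max)` expresses the same loop with less code; VarHandle is mainly useful when the field must stay inline in an existing class.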

Q8. What are the core usages and common pitfalls of CompletableFuture? How do you orchestrate multiple tasks with it?

Difficulty: ★★★★ | Frequency: High (Alibaba, ByteDance, Meituan)

Key Terms: CompletableFuture (async composition), thenApply (synchronous transform), thenCompose (async flattening), allOf (wait for all), ForkJoinPool (fork/join pool), exceptionally (exception recovery)

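Key points:

Core capabilities: chained asynchronous composition (thenApply / thenAccept / thenCompose), combining multiple asynchronous tasks (allOf / anyOf), exception handling (exceptionally / handle / whenComplete), and explicit executor selection.
Key methods by category:

- Transform: thenApply (synchronous transform), thenCompose (asynchronous flattening, similar to flatMap)
- Consume: thenAccept (consumes the result, returns nothing), thenRun (ignores the result and just runs)
- Combine: thenCombine (merges two results), allOf (waits for all), anyOf (completes when any one completes)
- Exceptions: exceptionally (recovers from a failure with a default value), handle (sees both the result and the exception)

Executor pitfall: when no executor is passed, the *Async methods run on ForkJoinPool.commonPool(), a JVM-wide shared pool whose default parallelism is the number of CPU cores minus 1 (at least 1). Under IO-heavy load its few threads are quickly occupied by blocked calls, so tasks queue up, and tasks that wait on other tasks in the same pool can even deadlock.
Exception pitfall: an exception thrown inside thenApply / thenAccept is not thrown on the caller's thread. It completes the returned future exceptionally and skips the downstream thenApply-style stages. If nothing calls join() / get() and no exceptionally / handle / whenComplete stage is attached, the failure goes unnoticed.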

Code example:


// Risk-control scenario: call several data sources in parallel, then decide.
// The executor is a long-lived field shared by all requests; creating a pool
// inside the method would leak threads on every call. The size 10 is illustrative.
private final ExecutorService riskPool = Executors.newFixedThreadPool(10);

public CompletableFuture<RiskDecision> evaluateRisk(Order order) {
    CompletableFuture<UserProfile> userFuture = CompletableFuture
        .supplyAsync(() -> userService.getProfile(order.getUserId()), riskPool)
        .exceptionally(ex -> UserProfile.defaultProfile()); // fallback

    CompletableFuture<List<HistoryOrder>> historyFuture = CompletableFuture
        .supplyAsync(() -> orderService.getHistory(order.getUserId()), riskPool)
        .exceptionally(ex -> Collections.emptyList());

    CompletableFuture<DeviceFingerprint> deviceFuture = CompletableFuture
        .supplyAsync(() -> deviceService.getFingerprint(order.getDeviceId()), riskPool)
        .exceptionally(ex -> DeviceFingerprint.unknown());

    // The three lookups run in parallel; aggregate once all have completed
    return CompletableFuture.allOf(userFuture, historyFuture, deviceFuture)
        .thenApplyAsync(v -> {
            RiskContext ctx = new RiskContext(
                userFuture.join(),      // join() does not block: allOf guarantees completion
                historyFuture.join(),
                deviceFuture.join()
            );
            return ruleEngine.evaluate(ctx);
        }, riskPool)
        .handle((decision, ex) -> {
            if (ex != null) {
                log.error("Risk evaluation failed", ex);
                return RiskDecision.decline("System error, declined by default");
            }
            return decision;
        });
}

Common pitfalls:

❌ Not passing a custom executor and exhausting the commonPool → ✅ IO-bound calls such as supplyAsync(() -> callRemoteApi()) all land on commonPool, whose (CPU cores - 1) threads fill up almost immediately; pass a dedicated executor
❌ Doing blocking work inside thenApply → ✅ thenApply takes no executor: it runs on the thread that completes the previous stage, or on the calling thread if that stage is already complete. Blocking there stalls that thread; use thenApplyAsync with a dedicated executor for blocking work
❌ Thinking an earlier exceptionally catches downstream exceptions → ✅ exceptionally only sees failures from stages before it; if a later thenApply throws, it is not caught

Follow-up questions:

What's the difference between thenApply and thenApplyAsync? When must you use the async version?
Given three tasks A, B, and C, where A must finish first, B and C then run in parallel, and their results are finally combined, how would you write it?
With Virtual Threads in JDK 21, is CompletableFuture still necessary?

Risk-control relevance:

The data-collection phase of a risk engine fans out to multiple external services (user profiles, blacklists, device fingerprints, transaction history); orchestrating them in parallel with CompletableFuture.allOf reduces total latency to roughly that of the slowest call.
A dedicated thread pool is mandatory: never let these tasks share the commonPool with other business logic.

English Answer：

CompletableFuture is Java's main API for asynchronous composition. It supports chaining transformations with thenApply, consuming results with thenAccept, flattening asynchronous dependencies with thenCompose, combining independent results with thenCombine, waiting for multiple futures with allOf, racing tasks with anyOf, and handling errors with exceptionally, handle, or whenComplete.
The method choice matters. thenApply transforms a completed result synchronously in the continuation path. thenCompose is used when the next step itself returns another CompletableFuture, avoiding nested futures. thenCombine combines two independent futures. allOf is useful for fan-out/fan-in workflows, where multiple calls run in parallel and the final step uses join() only after all futures have completed.
The most common production pitfall is not specifying an executor. If no executor is passed, async methods use ForkJoinPool.commonPool(), which is a global shared pool with limited parallelism. IO-heavy tasks such as remote calls can exhaust it quickly and affect unrelated business logic. For risk-control feature fetching, user profile lookup, blacklist lookup, and device fingerprint lookup should use a dedicated thread pool.
Another pitfall is exception handling. Exceptions thrown in a stage are captured inside the future and will not automatically appear in the caller's thread. The chain must use exceptionally or handle to provide fallback values, log errors, and prevent one failed external dependency from breaking the whole decision path.
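How a failure travels through a chain can be seen in a small, self-contained sketch. `ExceptionFlowDemo` is an illustrative name, and the default common pool is used only to keep the example short. A failure in the first stage skips `thenApply`; it either reaches `handle`, or surfaces from `join()` wrapped in a `CompletionException`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class ExceptionFlowDemo {
    // The failing first stage skips thenApply and reaches handle().
    static String withFallback() {
        return CompletableFuture
            .<Integer>supplyAsync(() -> { throw new IllegalStateException("upstream down"); })
            .thenApply(n -> "score=" + n)                       // skipped on failure
            .handle((value, ex) -> ex == null ? value : "fallback")
            .join();
    }

    // Without a handler, join() rethrows the failure wrapped in CompletionException.
    static String causeMessageWithoutHandler() {
        try {
            CompletableFuture
                .<Integer>supplyAsync(() -> { throw new IllegalStateException("upstream down"); })
                .thenApply(n -> "score=" + n)
                .join();
            return "no exception";
        } catch (CompletionException e) {
            return e.getCause().getMessage();                   // the original exception
        }
    }
}
```

If neither `join()` nor a handler stage is ever reached, the exception stays inside the future, which is what "swallowed" means in practice.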
CompletableFuture is useful when the task graph is explicit and results must be composed. Even with virtual threads in JDK 21, it still has value for declarative dependency composition, timeouts, racing, aggregation, and non-blocking APIs. Virtual threads mainly make blocking-style concurrency cheaper; they do not replace all composition patterns.
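One of the follow-up questions asks how to run A first, then B and C in parallel, and finally combine B and C. A minimal sketch, with trivial arithmetic standing in for real work and `FanOutAfterA` as a hypothetical name: both B and C are registered as dependents of A, and `thenCombine` joins them.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FanOutAfterA {
    static int run(ExecutorService pool) {
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 10, pool);
        // B and C both depend on A and are scheduled independently once A completes.
        CompletableFuture<Integer> b = a.thenApplyAsync(x -> x + 1, pool);
        CompletableFuture<Integer> c = a.thenApplyAsync(x -> x * 2, pool);
        // Combine the two results when both are available.
        return b.thenCombine(c, Integer::sum).join();
    }

    static int demo() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            return run(pool);
        } finally {
            pool.shutdown();
        }
    }
}
```

If B or C themselves return a CompletableFuture (for example an async client call), `thenComposeAsync` would replace `thenApplyAsync` to avoid a nested future.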

Q9. What is StampedLock? What advantages does it have over ReentrantReadWriteLock?

Difficulty: ★★★★ | Frequency: Medium-high (Alibaba, ByteDance)

Key Terms: StampedLock, Optimistic Read, Read-Write Lock, Stamp Validation, CLH (queue lock)

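Key points:

Core idea: StampedLock, introduced in JDK 8, is a lock that adds an optimistic read mode. It offers three modes: write lock (writeLock), pessimistic read lock (readLock), and optimistic read (tryOptimisticRead).
Optimistic read: no lock is taken. The reader obtains a stamp (a version number), copies the data, and then calls validate(stamp). If the stamp is still valid, no write happened in between and the copy can be used; otherwise the reader falls back to a pessimistic read lock and reads again.
Compared with ReentrantReadWriteLock:

- While any ReentrantReadWriteLock read lock is held, writers are blocked, so in read-heavy workloads writer threads can starve
- A StampedLock optimistic read takes no lock and never blocks writers, which greatly improves throughput in read-heavy, write-rare workloads
- StampedLock is not reentrant and does not support Condition

Performance: when reads far outnumber writes (for example 100:1), optimistic reads are often quoted as reaching 5-10x the throughput of ReentrantReadWriteLock, because the read path performs no write to the lock state. Treat this as an order-of-magnitude claim and benchmark your own workload.
Caveats: StampedLock is not reentrant, so code holding the lock must not call anything that acquires the same lock again. Avoid long blocking calls such as Thread.sleep() or LockSupport.park() while holding it, and use readLockInterruptibly() / writeLockInterruptibly() when a waiting thread must respond to interruption.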

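Code example:


import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

// A read-mostly cache guarded by StampedLock
public class RiskRuleCache {
    private final StampedLock lock = new StampedLock();
    private volatile Map<String, Rule> rules = new HashMap<>();

    // Optimistic read: no lock is acquired on the fast path
    public Rule getRule(String ruleId) {
        long stamp = lock.tryOptimisticRead(); // obtain an optimistic-read stamp
        Map<String, Rule> snapshot = rules;    // copy the reference
        if (!lock.validate(stamp)) {           // validate the stamp
            // A write intervened: fall back to a pessimistic read lock
            stamp = lock.readLock();
            try {
                snapshot = rules;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return snapshot.get(ruleId);           // the published map is never mutated afterwards
    }

    // Write lock: replace the rule set
    public void refreshRules(Map<String, Rule> newRules) {
        long stamp = lock.writeLock();
        try {
            rules = new HashMap<>(newRules); // the volatile write makes the new map visible
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}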

Common pitfalls:

❌ Assuming optimistic read means "read without any check at all" → ✅ You must still call validate(stamp); using the data without validation can expose an inconsistent state
❌ Retrying with a write lock right after an optimistic read fails → ✅ Fall back to a pessimistic read lock first; upgrade to a write lock only when a modification is needed
❌ Performing blocking operations while holding a StampedLock → ✅ Blocking inside the critical section keeps other threads from acquiring the lock; keep critical sections short

Follow-up questions:

What does tryConvertToWriteLock(stamp) do? What's its advantage over releasing the read lock and acquiring a write lock separately?
Why is StampedLock non-reentrant? What were the design considerations?
In which scenarios does StampedLock perform worse than ReentrantReadWriteLock?

Risk-control relevance:

Risk rule caching is a classic read-heavy, write-rare scenario (rules rarely change after loading, but every request reads them).
StampedLock's optimistic read delivers very high read throughput without blocking rule hot-reloads.

English Answer：

StampedLock is a read-write synchronization tool introduced in JDK 8. It provides three modes: write lock, pessimistic read lock, and optimistic read. The optimistic read mode returns a stamp rather than taking a traditional read lock.
In optimistic read mode, the reader obtains a stamp through tryOptimisticRead(), reads the data without blocking writers, and then calls validate(stamp). If validation succeeds, no write occurred during the read and the data can be used. If validation fails, the reader must fall back to a pessimistic read lock and read again.
Compared with ReentrantReadWriteLock, the biggest advantage is that optimistic reads do not block writes and have very low overhead. In read-heavy and write-rare workloads, such as rule cache reads, configuration snapshots, or mostly immutable coordinate data, this can provide much higher throughput. A normal read lock in ReentrantReadWriteLock still participates in lock coordination and can delay writers when many readers are active.
The trade-offs are important. StampedLock is not reentrant, does not support Condition, and is easier to misuse. You must validate optimistic reads before using the result. If validation fails, you usually acquire a pessimistic read lock, not a write lock, unless modification is required. You should also avoid long blocking operations inside the lock.
I would choose StampedLock for simple read-mostly data structures where optimistic validation is easy and critical sections are short. I would choose ReentrantReadWriteLock when reentrancy, conditions, simpler semantics, or more conventional lock behavior is needed.
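Validation matters most when a reader needs several fields to be mutually consistent. The sketch below follows the well-known coordinate example from the StampedLock class documentation; `Point` and its method names are illustrative. It shows the optimistic-read fallback and the read-to-write upgrade with `tryConvertToWriteLock`.

```java
import java.util.concurrent.locks.StampedLock;

final class Point {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();
        double cx = x, cy = y;                 // copy both fields to locals first
        if (!lock.validate(stamp)) {           // a write intervened; the copies may not match
            stamp = lock.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(cx, cy);
    }

    // Conditional write: upgrade from read to write without releasing when possible.
    boolean moveIfAtOrigin(double newX, double newY) {
        long stamp = lock.readLock();
        try {
            while (x == 0.0 && y == 0.0) {
                long writeStamp = lock.tryConvertToWriteLock(stamp);
                if (writeStamp != 0L) {        // conversion succeeded
                    stamp = writeStamp;
                    x = newX;
                    y = newY;
                    return true;
                }
                lock.unlockRead(stamp);        // conversion failed: take the write lock
                stamp = lock.writeLock();      // and re-check the condition in the loop
            }
            return false;
        } finally {
            lock.unlock(stamp);                // releases whichever mode the stamp holds
        }
    }
}
```

The advantage of the conversion is that, when it succeeds, no other writer can slip in between the check and the update.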

Q10. How does ForkJoinPool differ from a regular ThreadPoolExecutor? When should you use ForkJoinPool?

Difficulty: ★★★★ | Frequency: High (Alibaba, ByteDance, Meituan)

Key Terms: ForkJoinPool (fork/join pool), Work Stealing, RecursiveTask, Divide-and-Conquer, CommonPool (shared common pool)

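Key points:

Core difference:

- ThreadPoolExecutor uses a shared-queue model: all workers take tasks from a single BlockingQueue
- ForkJoinPool uses work stealing: each worker has its own double-ended queue (deque). A worker pushes and pops tasks at one end of its own deque (LIFO), while idle workers steal from the opposite end of other workers' deques (FIFO), which usually holds the oldest and largest tasks

Advantages of work stealing:

- Less contention: each worker mostly operates on its own deque and competes only when stealing
- Load balancing: tasks queued on a busy worker are stolen by idle workers, which balances load automatically
- Fits divide-and-conquer: a large task forks subtasks into the worker's own deque, and other workers can steal and run them in parallel

Good fit: divide-and-conquer tasks (merge sort, parallel sum, recursive computation) and CPU-bound computation.
Poor fit: IO-bound tasks (blocking IO parks worker threads, so fewer threads are left to run or steal work) and tasks with complex inter-dependencies.
Relationship to CompletableFuture: CompletableFuture's *Async methods use ForkJoinPool.commonPool() by default.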

Code example:


import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel risk-score computation with ForkJoinPool
public class ParallelRiskScorer extends RecursiveTask<Double> {
    private static final int THRESHOLD = 100;
    private final List<RiskFactor> factors;
    private final int start, end;

    public ParallelRiskScorer(List<RiskFactor> factors, int start, int end) {
        this.factors = factors;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Double compute() {
        if (end - start <= THRESHOLD) {
            // Small task: compute directly
            return factors.subList(start, end).stream()
                .mapToDouble(RiskFactor::score)
                .sum();
        }
        // Large task: split in two
        int mid = (start + end) >>> 1;
        ParallelRiskScorer left = new ParallelRiskScorer(factors, start, mid);
        ParallelRiskScorer right = new ParallelRiskScorer(factors, mid, end);
        left.fork();              // schedule the left half asynchronously
        double rightResult = right.compute(); // run the right half on the current thread
        double leftResult = left.join();      // wait for the left half
        return leftResult + rightResult;
    }

    public static double score(List<RiskFactor> factors) {
        // A pool per call keeps the example self-contained; production code
        // would reuse one long-lived dedicated pool instead.
        ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
        try {
            return pool.invoke(new ParallelRiskScorer(factors, 0, factors.size()));
        } finally {
            pool.shutdown();
        }
    }
}

Common pitfalls:

❌ Running IO-blocking tasks in ForkJoinPool → ✅ Work stealing depends on threads staying busy computing; IO blocking leads to very low thread utilization
❌ Forking both subtasks in RecursiveTask.compute() and then joining both → ✅ Run one subtask on the current thread via compute() and fork only the other
❌ Using ForkJoinPool.commonPool() for all parallel tasks → ✅ The commonPool is globally shared, so unrelated business domains interfere with each other

Follow-up questions:

What is the default parallelism of ForkJoinPool.commonPool()? How do you change it?
How does Work Stealing in ForkJoinPool ensure thread safety when stealing tasks?
How do JDK 21 virtual threads relate to ForkJoinPool?

Risk-control relevance:

Batch risk assessment (for example, parallel scoring of a batch of transactions) can use ForkJoinPool for divide-and-conquer computation.
IO-intensive calls to external services must not run in ForkJoinPool; use a separate thread pool for them.

English Answer：

ThreadPoolExecutor and ForkJoinPool use different scheduling models. ThreadPoolExecutor usually uses a shared work queue: workers take tasks from the queue and execute them. ForkJoinPool uses work stealing: each worker has its own deque, executes tasks from its own deque, and idle workers steal tasks from other workers' deques.
Work stealing reduces contention because most operations are local to a worker's own deque. It also balances load automatically: if one worker creates many subtasks and another worker becomes idle, the idle worker can steal some work. This model is especially suitable for divide-and-conquer algorithms where a large task recursively splits into smaller tasks.
Good use cases for ForkJoinPool include recursive CPU-bound computation, parallel sorting, parallel aggregation, and carefully controlled parallel streams. A RecursiveTask or RecursiveAction typically forks one subtask and computes the other in the current thread, then joins the forked subtask, which avoids unnecessary scheduling overhead.
It is not a good fit for IO-heavy blocking tasks. If workers block on network or database calls, the work-stealing model cannot keep CPU utilization high and the pool may stall. CompletableFuture and parallel streams use ForkJoinPool.commonPool() by default, so using it for all business tasks can cause cross-service interference.
For normal web request handling, external service calls, and business task queues, I would use a dedicated ThreadPoolExecutor with a bounded queue and clear rejection policy. I would use a dedicated ForkJoinPool only for CPU-bound recursive computation or batch scoring where the workload naturally splits into independent subtasks.
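The follow-up about common-pool parallelism can be checked directly. A small sketch, with `PoolParallelism` as a hypothetical helper: the common pool's parallelism is fixed at JVM startup (it can be overridden with the system property `java.util.concurrent.ForkJoinPool.common.parallelism`), whereas a dedicated pool takes its parallelism from the constructor.

```java
import java.util.concurrent.ForkJoinPool;

final class PoolParallelism {
    // Parallelism of the JVM-wide common pool, as configured at startup.
    static int common() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    // Parallelism of a dedicated pool is whatever the constructor was given.
    static int dedicated(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the common-pool setting is process-wide, sizing a dedicated pool per workload is usually the safer way to isolate CPU-bound batch jobs.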

Q11. What are the happens-before rules of the Java Memory Model (JMM)? How do you determine whether concurrent code is thread-safe?

Difficulty: ★★★★ | Frequency: High (Alibaba, ByteDance, Meituan)

Key Terms: JMM (Java Memory Model), Happens-Before, Memory Barrier, Data Race, volatile

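Key points:

The eight happens-before rules (memorization version):

- Program order rule: within one thread, an earlier action happens-before a later action
- Monitor lock rule: an unlock happens-before every subsequent lock on the same monitor
- volatile rule: a volatile write happens-before every subsequent volatile read of the same variable
- Thread start rule: Thread.start() happens-before every action in the started thread
- Thread termination rule: every action in a thread happens-before another thread's Thread.join() on it returns
- Interrupt rule: a call to interrupt() happens-before the interrupted thread detects the interrupt
- Finalizer rule: the end of an object's constructor happens-before the start of its finalize()
- Transitivity: if A happens-before B and B happens-before C, then A happens-before C

How to judge thread safety:

- Find the shared variables (variables read and written by more than one thread)
- Check whether a happens-before relationship covers every read/write path
- If not, there is a data race and the code is not thread-safe

Classic case: in double-checked locking (DCL), if instance is not volatile, the writes that initialize the object's fields may be reordered after the write that publishes the reference, so another thread can observe a non-null but not fully initialized object.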

Code example:


// Happens-before analysis: is the following code safe?
public class SafePublication {
    private int x;
    private volatile boolean initialized = false;

    public void init() {
        x = 42;                       // (1) program order: x = 42 hb the volatile write
        initialized = true;            // (2) volatile write
    }

    public int getX() {
        if (initialized) {             // (3) volatile read
            return x;                  // (4) program order: volatile read hb return x
        }
        return 0;
    }
    // Analysis: (1) hb (2) [program order]; (2) hb (3) [volatile rule, when (3) reads the value written by (2)]; (3) hb (4) [program order]
    // Transitivity: (1) hb (4) → a reader that sees initialized == true also sees x == 42
}

Common pitfalls:

❌ Equating happens-before with wall-clock ordering → ✅ Happens-before is a partial order that defines which writes a read is guaranteed to see; it is not the actual execution order
❌ Assuming one volatile field covers all shared variables → ✅ You must verify that the happens-before chain covers every read/write path of every shared variable
❌ Thinking synchronized only guarantees atomicity → ✅ synchronized guarantees atomicity, visibility, and ordering together

Follow-up questions:

What is the initialization safety guarantee for final fields? Why don't final fields need volatile?
What is the difference between lazySet (Unsafe.putOrderedXxx) and a volatile write? When can lazySet replace a volatile write?
What are the similarities and differences between the Java Memory Model and the C++ Memory Model?

Risk-control relevance:

Foundational knowledge: understanding the JMM is a prerequisite for writing correct lock-free concurrent code, and for understanding the semantics of volatile, synchronized, and final.

English Answer：

In the Java Memory Model, happens-before is a visibility and ordering guarantee, not simply chronological order. If action A happens-before action B, then B must be able to see the effects of A, and the JVM must preserve the ordering required by the memory model.
The core happens-before rules include the program order rule, where earlier operations in the same thread happen-before later operations; the monitor lock rule, where an unlock happens-before a later lock on the same monitor; the volatile rule, where a volatile write happens-before a later volatile read of the same variable; the thread start rule, where Thread.start() happens-before all actions in the started thread; and the thread termination rule, where all actions in a thread happen-before another thread successfully returns from join().
Other important rules include the interrupt rule, where interrupt() happens-before the interrupted thread detects the interrupt; the finalizer rule, where constructor completion happens-before finalize(); and transitivity, where if A happens-before B and B happens-before C, then A happens-before C.
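The thread start and termination rules can be seen in a tiny example that uses no volatile and no lock. This is a sketch with a hypothetical class name, `JoinVisibility`: the write made by the worker is guaranteed to be visible to the caller only because `join()` has returned.

```java
final class JoinVisibility {
    static int run() {
        int[] box = new int[1];                          // plain, non-volatile storage
        Thread worker = new Thread(() -> { box[0] = 42; });
        worker.start();                                  // start() hb every action in worker
        try {
            worker.join();                               // all worker actions hb join() returning
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return box[0];                                   // guaranteed to read 42
    }
}
```

Reading `box[0]` before `join()` returns would be a data race, since no happens-before edge would connect the write and the read.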
To judge whether concurrent code is thread-safe, I first identify all shared variables that are read or written by multiple threads. Then I check whether every read/write path is protected by a happens-before relationship, such as the same lock, a volatile publication, thread start/join, immutable publication, or another safe publication mechanism. If a shared variable can be read and written without such a relationship, there is a data race.
Double-checked locking is the classic example. Without volatile, assignment of the object reference may be reordered before the constructor fully initializes the object, so another thread can observe a non-null but partially initialized instance. Marking the field volatile creates the necessary ordering and visibility guarantees.
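A minimal sketch of correct double-checked locking follows; `RuleRegistry` and its field are illustrative names. The `volatile` modifier on `instance` is what creates the happens-before edge between the constructor's writes and a later unsynchronized read of the reference.

```java
final class RuleRegistry {
    // volatile is required: it orders the constructor's writes before publication.
    private static volatile RuleRegistry instance;

    private final int ruleCount;

    private RuleRegistry() {
        this.ruleCount = 42;                   // stands in for expensive initialization
    }

    static RuleRegistry getInstance() {
        RuleRegistry local = instance;         // single volatile read on the fast path
        if (local == null) {
            synchronized (RuleRegistry.class) {
                local = instance;              // re-check under the lock
                if (local == null) {
                    local = new RuleRegistry();
                    instance = local;          // volatile write publishes the object
                }
            }
        }
        return local;
    }

    int ruleCount() { return ruleCount; }
}
```

When lazy initialization of a static singleton is all that is needed, the initialization-on-demand holder idiom (a nested static class holding the instance) achieves the same result without explicit locking.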

Q12. How would you design a high-performance concurrent rate limiter? From Semaphore to token bucket, how would you implement it?

Difficulty: ★★★★★ | Frequency: High (Alibaba P7, Meituan L9, ByteDance 3-1)

Key Terms: Token Bucket, Leaky Bucket, LongAdder (striped long accumulator), Rate Limiter, Redis Lua (Redis Lua scripting)

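Key points:

Rate-limiting algorithms compared:

- Fixed-window counter: simple, but suffers from boundary bursts (up to twice the limit around a window boundary)
- Sliding-window counter: splits the window into smaller sub-windows, which softens the burst but still has limited precision
- Leaky bucket: processes requests at a constant rate; good for smoothing traffic, but allows no bursts
- Token bucket: generates tokens at a constant rate and allows bursts up to the tokens accumulated in the bucket; it is among the most widely used schemes in industry

High-performance implementation points:

- Lock-free counting: use LongAdder / AtomicLong instead of synchronized counters (LongAdder has no compareAndSet, so check-and-decrement state needs AtomicLong)
- Lazy computation: compute the available tokens on each request instead of running a timer thread that refills the bucket
- Warm-up: Guava RateLimiter supports a warm-up mode that ramps the rate up gradually after a cold start
- Distributed extension: in-memory counters for single-node limiting; Redis plus an atomic Lua script for cluster-wide limiting

Core idea of Guava RateLimiter: it keeps the permits saved up so far and the earliest time the next permit can be granted (storedPermits and nextFreeTicketMicros). Each acquire() computes how long the caller must wait and sleeps for that long; tryAcquire() returns false immediately if a permit is not available within its timeout.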

Code example:


import java.util.concurrent.atomic.AtomicLong;

// Simplified token-bucket limiter (single node, lock-free)
public class SimpleTokenBucket {
    private final long capacity;         // bucket capacity
    private final long refillTokens;     // tokens added per interval
    private final long refillIntervalMs; // refill interval in milliseconds (must be > 0)
    private final AtomicLong availableTokens;
    private final AtomicLong lastRefillTime;

    public SimpleTokenBucket(long capacity, long refillTokens, long refillIntervalMs) {
        this.capacity = capacity;
        this.refillTokens = refillTokens;
        this.refillIntervalMs = refillIntervalMs;
        this.availableTokens = new AtomicLong(capacity);
        this.lastRefillTime = new AtomicLong(System.currentTimeMillis());
    }

    public boolean tryAcquire() {
        refill(); // lazy refill on the request path
        while (true) {
            long current = availableTokens.get();
            if (current <= 0) return false;
            if (availableTokens.compareAndSet(current, current - 1)) {
                return true;
            }
        }
    }

    private void refill() {
        long now = System.currentTimeMillis();
        long lastTime = lastRefillTime.get();
        long intervals = (now - lastTime) / refillIntervalMs;
        if (intervals <= 0) return;
        // Advance by whole intervals only, so leftover time counts toward the next refill.
        // Only the thread that wins this CAS credits the tokens.
        if (lastRefillTime.compareAndSet(lastTime, lastTime + intervals * refillIntervalMs)) {
            long added = intervals * refillTokens;
            // Atomic update: a plain get() + set() could overwrite concurrent decrements
            availableTokens.updateAndGet(cur -> Math.min(capacity, cur + added));
        }
    }
}

// Production option: use Guava's RateLimiter directly
// (com.google.common.util.concurrent.RateLimiter)
RateLimiter limiter = RateLimiter.create(1000); // 1000 permits per second
if (limiter.tryAcquire()) {
    processRequest();
} else {
    rejectRequest(); // fail fast
}

Common pitfalls:

❌ Using Thread.sleep() for rate limiting → ✅ That is waiting, not limiting: the sleeping request still ties up a thread
❌ Relying on Redis INCR plus a separate EXPIRE for distributed rate limiting → ✅ The two commands are not atomic together; a failure or race between them can leave a key without a TTL or let traffic exceed the limit, so wrap them in a Lua script
❌ Assuming token bucket and leaky bucket are interchangeable → ✅ Token bucket allows bursts, leaky bucket does not; the choice depends on business requirements

Follow-up questions:

How would you implement a cluster-level distributed rate limiter?
How does a token bucket behave under burst traffic, and how do you cap the maximum burst?
How does Sentinel (Alibaba open source) differ from Guava RateLimiter in its rate limiting implementation?

Risk-control relevance:

Risk gateways must implement rate limiting to prevent burst traffic from overwhelming the rule engine. Production systems typically use a token bucket (allowing reasonable bursts) combined with Redis for cluster-level limiting.
The risk-specific nuance: a rate-limited request should return a fallback decision (default pass or default reject) rather than a bare HTTP 429.

English Answer：

The common rate-limiting algorithms have different trade-offs. A fixed-window counter is simple but has a boundary burst problem: traffic at the end of one window and the beginning of the next can effectively double the allowed rate. A sliding-window counter divides the window into smaller buckets and reduces this burst, but it still has precision and storage trade-offs.
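The boundary-burst problem is easiest to see next to a limiter that does not have it. The sketch below is a sliding-window log, a stricter relative of the sliding-window counter described above: it remembers the timestamp of every accepted request instead of per-bucket counts. `SlidingWindowLimiter` is a hypothetical class; the clock is passed in as a parameter to keep it deterministic, and `synchronized` is used for brevity rather than performance.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window log: at most `limit` accepted requests in any window of `windowMillis`.
final class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> accepted = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drop timestamps that have fallen out of the window ending at nowMillis.
        while (!accepted.isEmpty() && nowMillis - accepted.peekFirst() >= windowMillis) {
            accepted.pollFirst();
        }
        if (accepted.size() >= limit) {
            return false;
        }
        accepted.addLast(nowMillis);
        return true;
    }
}
```

With a limit of 2 per 1000 ms, requests at 900 ms and 999 ms are accepted and a request at 1000 ms is rejected, whereas a fixed window that resets at 1000 ms would accept it. The cost is memory proportional to the limit, which is why counters over sub-windows are the usual compromise.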
A leaky bucket processes requests at a constant rate, so it smooths traffic well but does not allow bursts. A token bucket generates tokens at a constant rate and allows unused tokens to accumulate up to a capacity, so it supports controlled bursts. This is why token bucket is one of the most common production choices.
For high performance on a single machine, I would avoid a coarse synchronized implementation. Counters can use AtomicLong or LongAdder depending on whether exact immediate reads are required. Token refill should be lazy: instead of running a timer thread to add tokens, each request calculates how many tokens should have been generated since the last refill and updates the state with CAS.
Guava RateLimiter is based on the idea of stored permits and the next time a permit can be issued, such as storedPermits and nextFreeTicketMicros. acquire() may wait, while tryAcquire() can fail fast. In request-path services, I usually prefer fast-fail or bounded waiting instead of sleeping business threads for a long time.
For cluster-level limiting, Redis plus Lua is a common solution because checking quota, incrementing counters, and setting TTL must be atomic. Plain INCR plus separate EXPIRE has race conditions. In a risk gateway, a rate-limited request should return a defined fallback decision, such as default pass, default reject, or manual review, rather than only returning HTTP 429.
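The question also mentions Semaphore. A Semaphore bounds the number of requests in flight (concurrency) rather than requests per second (rate), which makes it a natural first line of defense in front of a token bucket. A minimal sketch, with `ConcurrencyLimiter` as a hypothetical class that returns a caller-supplied fallback instead of blocking:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Bounds the number of concurrent calls; excess calls get the fallback immediately.
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> action, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;              // fail fast instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release();            // always return the permit
        }
    }
}
```

In a risk gateway the fallback would be the predefined degraded decision, so a limited request still receives a well-defined answer.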

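Related

JVM: lock escalation relies on the Mark Word in the object header, so understanding synchronized requires understanding JVM object layout
Redis: distributed rate limiting and distributed locks extend concurrency control to distributed systems
Real-time risk engine: concurrent programming is the foundation of risk-engine performance tuning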