JVM

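Aimed at Java backend developers with 3-5 years of experience. Covers frequently asked topics: the memory model, GC algorithms and how to choose among them, class loading, and tuning in practice.
Each question comes with bilingual (Chinese and English) answers, code examples, common misconceptions, and its relevance to risk control.
Related pages: Concurrent Programming · MySQL · Real-Time Risk Control Engine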

Q1. 描述 JVM 内存模型,各区域存什么?哪些线程共享?哪些线程私有?

EN: Describe the JVM memory model. What is stored in each area? Which areas are shared and which are thread-private?

难度: ★★★★ | 出现频率: 极高(阿里、美团、字节、蚂蚁)

Key Terms: Heap (堆), Stack (栈), Method Area (方法区), Metaspace (元空间), Program Counter (程序计数器), GC roots (GC 根), generational (分代)

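Key points:

  1. Heap: shared by all threads; stores object instances and arrays. It is the main area managed by GC. Classic collectors split it into a young generation (Eden + S0/S1) and an old generation. G1, the default since JDK 9, keeps the generations logically but lays the heap out as regions rather than two contiguous areas.
  2. VM stack: thread-private. Each method invocation creates a stack frame (local variable table, operand stack, dynamic linking, return address). Excessive call depth raises StackOverflowError; failure to allocate stack memory raises OutOfMemoryError.
  3. Native method stack: thread-private; serves native methods.
  4. Program counter (PC register): thread-private; holds the address of the bytecode instruction currently being executed. It is the only area for which the JVM specification defines no OutOfMemoryError.
  5. Method area / Metaspace: shared by all threads. HotSpot implemented it as PermGen (permanent generation) before JDK 8 and as Metaspace, backed by native memory, from JDK 8 on. It stores class metadata and the runtime constant pool; since JDK 7 the string constant pool and static variables live in the heap. Metaspace has no upper limit by default, so it can exhaust physical memory.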

常见误区:

  • ❌ 字符串常量池在 Metaspace 里 → ✅ JDK 7 起字符串常量池已移到堆中,类元信息才在 Metaspace
  • ❌ Metaspace 不会 OOM → ✅ Metaspace 使用本地内存,不设 -XX:MaxMetaspaceSize 可能导致物理内存耗尽
  • ❌ 程序计数器也会 OOM → ✅ 程序计数器是 JVM 规范中唯一不会发生 OOM 的区域
  • ❌ The string constant pool lives in Metaspace → ✅ Since JDK 7, the string constant pool has been moved to the heap; only class metadata resides in Metaspace
  • ❌ Metaspace cannot run out of memory → ✅ Metaspace uses native memory — without -XX:MaxMetaspaceSize, it can exhaust physical memory
  • ❌ The program counter can also cause OOM → ✅ The PC register is the only JVM area that, by specification, never triggers OutOfMemoryError

延伸追问:

  • 一个对象从创建到回收经历了什么?
  • 为什么要从永久代迁移到元空间?
  • 如何排查 Metaspace OOM?
  • What is the lifecycle of an object from creation to garbage collection?
  • Why was PermGen replaced by Metaspace in JDK 8?
  • How do you troubleshoot a Metaspace OOM in production?

风控关联:

  • 风控服务常加载大量规则类(Drools 编译生成的 class),Metaspace 调优很关键,需要设置合理的 -XX:MaxMetaspaceSize 防止规则热更新时吃光物理内存
  • Risk control services often load a large number of rule classes (e.g., Drools-compiled bytecode). Metaspace tuning is critical — setting a reasonable -XX:MaxMetaspaceSize prevents hot rule reloading from exhausting native memory
  • 关联 实时风控引擎

English Answer:

  1. The JVM runtime data areas are usually explained as the heap, VM stack, native method stack, program counter, and method area/metaspace. The heap is shared by all threads and stores object instances and arrays. It is the main area managed by garbage collection. In classic generational collectors it is divided into the young generation, including Eden and Survivor spaces, and the old generation. With G1, the heap is organized into regions, so the physical layout is no longer a strict contiguous young/old split, even though the generational concept still exists logically.
  2. The VM stack is private to each thread. Every method invocation creates a stack frame, which contains the local variable table, operand stack, dynamic linking information, and return address. If the call depth is too large, the thread may throw StackOverflowError; if stack memory cannot be allocated, it may throw OutOfMemoryError.
  3. The native method stack is also thread-private. It serves native methods invoked through JNI or other native interfaces, while the program counter is thread-private and records the bytecode instruction currently being executed by the thread. The program counter is the only JVM runtime area that the JVM specification says will not throw OutOfMemoryError.
  4. The method area was implemented as PermGen before JDK 8 and as Metaspace since JDK 8. Metaspace uses native memory and stores class metadata, runtime constant-pool-related metadata, and static variable metadata. Because Metaspace is native memory and may be effectively unbounded by default, production services should set a reasonable -XX:MaxMetaspaceSize, especially when frameworks dynamically generate classes.
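
The split between thread-private stacks and shared memory pools can be observed from inside a running JVM. The following is a minimal sketch, not a diagnostic tool: the class name RuntimeAreasDemo and its methods are made up for illustration, and the pool names printed depend on the JVM vendor and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class RuntimeAreasDemo {
    private static int depth;

    private static void recurse() {
        depth++;      // each call pushes one more frame onto this thread's VM stack
        recurse();
    }

    // Recurses until the current thread's stack is exhausted and returns the depth reached.
    static int maxRecursionDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // the VM stack is thread-private, so only this thread is affected
        }
        return depth;
    }

    // Lists the memory pools the running JVM exposes (heap and non-heap).
    static List<String> memoryPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getType() + ": " + pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("depth before StackOverflowError: " + maxRecursionDepth());
        memoryPoolNames().forEach(System.out::println);
    }
}
```

Running it with a different -Xss value changes the reported depth, which is a quick way to see that stack size is a per-thread setting independent of -Xmx.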

Q2. CMS、G1、ZGC 有什么区别?生产环境怎么选?

EN: Compare CMS, G1, and ZGC garbage collectors. How do you choose among them in production?

难度: ★★★★★ | 出现频率: 极高(阿里、美团、字节、蚂蚁)

Key Terms: STW (Stop-The-World), concurrent mark-sweep (并发标记清除), region-based (基于 Region), SATB (快照标记), colored pointers (染色指针), read barrier (读屏障)

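Key points:

  1. CMS (Concurrent Mark-Sweep): deprecated in JDK 9, removed in JDK 14. Concurrent marking plus concurrent sweeping gives low latency, but with drawbacks: floating garbage (garbage created during the concurrent phases waits for the next GC), fragmentation (mark-sweep does not compact), and CPU sensitivity (the concurrent phases compete for CPU).
  2. G1 (Garbage-First): default since JDK 9. Region-based layout (Eden/Survivor/Old/Humongous); reclaims the regions with the most garbage first. Balances latency and throughput with a configurable pause target (-XX:MaxGCPauseMillis). Uses SATB (Snapshot-At-The-Beginning) for concurrent marking.
  3. ZGC: introduced as experimental in JDK 11, production-ready in JDK 15. Uses colored pointers and load barriers for concurrent compaction; STW < 1 ms (usually < 0.1 ms). Suited to very large heaps (TB scale) and ultra-low-latency workloads.

Dimension | CMS | G1 | ZGC
STW pause | tens to hundreds of ms | tens of ms to ~200 ms | < 1 ms
Heap size | < 32 GB | 4 GB to 64 GB | TB scale
Fragmentation | yes (no compaction) | no (region copying) | no (concurrent compaction)
Typical use | legacy systems | general purpose | ultra-low latency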

常见误区:

  • ❌ G1 不会 Full GC → ✅ G1 在极端情况下(如 Evacuation Failure)仍会退化为 Full GC(JDK 10 之前为单线程,JDK 10 起为并行),需要监控和调优避免
  • ❌ ZGC 适合所有场景 → ✅ ZGC 的吞吐量略低于 G1,小堆场景下 G1 更合适
  • ❌ CMS 还在生产中使用 → ✅ CMS 在 JDK 14 已被正式移除,新项目不应再使用
  • ❌ G1 never triggers Full GC → ✅ G1 can still fall back to a Full GC under extreme conditions such as evacuation failure (single-threaded before JDK 10, parallel since JDK 10); monitoring and tuning are required to avoid it
  • ❌ ZGC is suitable for all scenarios → ✅ ZGC has slightly lower throughput than G1; for small heaps, G1 is generally the better choice
  • ❌ CMS is still viable in production → ✅ CMS was formally removed in JDK 14 — new projects should not use it

延伸追问:

  • G1 的 Mixed GC 触发条件是什么?
  • ZGC 的 colored pointers 原理是什么?
  • 如何调优 G1 避免 Evacuation Failure?
  • What triggers G1's Mixed GC, and how does it differ from a Young GC?
  • How do colored pointers work in ZGC, and why do they enable sub-millisecond pauses?
  • How do you tune G1 to prevent Evacuation Failure?

风控关联:

  • 实时风控对延迟敏感(SLA < 50ms),GC 调优直接影响决策响应时间。风控服务建议 G1 或 ZGC
  • Real-time risk control is latency-sensitive (SLA < 50ms). GC tuning directly impacts decision response times. G1 or ZGC is recommended for risk control services
  • 关联 实时风控引擎

English Answer:

  1. CMS, G1, and ZGC are all collectors designed to reduce pause time, but they make different trade-offs. CMS, or Concurrent Mark-Sweep, performs most marking and sweeping concurrently with application threads. It was deprecated in JDK 9 and removed in JDK 14. Its advantage is lower latency than traditional stop-the-world collectors, but its drawbacks are significant: it produces floating garbage during concurrent phases, it does not compact memory and therefore suffers from fragmentation, and it is sensitive to CPU pressure because concurrent GC threads compete with business threads.
  2. G1, or Garbage-First, has been the default collector since JDK 9. It divides the heap into regions, including Eden, Survivor, Old, and Humongous regions, and prioritizes reclaiming regions with the highest garbage ratio. G1 balances throughput and latency and allows a target pause time through -XX:MaxGCPauseMillis. Its concurrent marking is based on SATB, or Snapshot-At-The-Beginning. It is a good general-purpose choice for medium and large heaps where predictable pauses matter.
  3. ZGC was introduced experimentally in JDK 11 and became production-ready in JDK 15. It uses colored pointers and load barriers to support concurrent marking, relocation, and reference remapping, so stop-the-world pauses are usually below one millisecond and often far below that. It is suitable for very large heaps and ultra-low-latency services, but it may trade away some throughput compared with G1, so it is not automatically the best choice for small heaps.
  4. In production, I would choose based on heap size, latency SLA, JDK version, and throughput requirements. New projects should not choose CMS. For most Java backend services, G1 is the default and safest choice. For a real-time risk-control or trading service with a strict latency target and a large heap, I would evaluate ZGC with pressure tests and GC logs before rollout.
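
Before comparing collectors it helps to confirm which one a service is actually running. A small sketch using the standard java.lang.management API follows; the class name ActiveCollectors is hypothetical, and the bean names it prints (for example the young and old collectors of G1) vary by JDK version and by the -XX:+Use...GC flag chosen.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // One line per collector MXBean: its name, collection count, and accumulated time.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Launching the same class with -XX:+UseG1GC, -XX:+UseParallelGC, or -XX:+UseZGC (where available) shows different bean names, which is a cheap sanity check before a load test.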

Q3. 类加载机制是什么?双亲委派模型能被打破吗?

EN: Explain the class loading mechanism. Can the parent delegation model be broken? Give examples.

难度: ★★★★ | 出现频率: 极高(阿里、美团、字节、京东)

Key Terms: Bootstrap ClassLoader (启动类加载器), Extension ClassLoader (扩展类加载器), Application ClassLoader (应用类加载器), parent delegation (双亲委派), hot deploy (热部署), SPI (服务提供者接口)

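Key points:

  1. Parent delegation: a class loader first delegates a load request to its parent and only tries to load the class itself when the parent cannot find it. This keeps core classes such as java.lang.Object from being replaced and avoids loading the same class twice.
  2. Scenarios that break parent delegation:
     - SPI (e.g. JDBC, JNDI): core classes (rt.jar in JDK 8 and earlier) must load vendor implementations, but the Bootstrap ClassLoader cannot see classes on the classpath. The fix is the thread context class loader.
     - Tomcat: each web app has its own ClassLoader that loads WEB-INF/classes and WEB-INF/lib first, which isolates applications from one another.
     - OSGi / modular systems: loading relationships form a graph rather than a tree.
     - Hot deployment: a custom ClassLoader reloads the modified classes. (Arthas redefine reaches a similar goal through the Instrumentation API rather than a new ClassLoader.)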

代码示例:


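// Simplified sketch of the child-first lookup Tomcat uses to break parent delegation
@Override
public Class<?> loadClass(String name) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
        // 1. Return the class if this loader has already loaded it
        Class<?> clazz = findLoadedClass(name);
        if (clazz != null) return clazz;

        // 2. Java core classes still go through parent delegation
        if (name.startsWith("java.")) {
            return super.loadClass(name); // delegate to parent
        }

        // 3. Try this loader's own classpath first (this is what breaks parent delegation).
        //    findClass signals a miss by throwing, not by returning null.
        try {
            return findClass(name);
        } catch (ClassNotFoundException e) {
            // 4. Not found locally: fall back to the parent
            return super.loadClass(name);
        }
    }
}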

常见误区:

  • ❌ 双亲委派绝对不能被打破 → ✅ SPI、Tomcat、OSGi 等场景都打破了双亲委派,这是一种有目的的设计
  • ❌ 打破双亲委派就是不安全 → ✅ Tomcat 对核心类(java.*)仍走双亲委派,只在业务类上打破,保证安全性
  • ❌ Breaking parent delegation is inherently unsafe → ✅ Tomcat still delegates loading of core classes (java.*) to the parent classloader; it only overrides business class loading to achieve application isolation
  • ❌ The parent delegation model can never be broken → ✅ SPI, Tomcat, and OSGi all break parent delegation intentionally as a design decision

延伸追问:

  • Arthas 怎么实现热更新?
  • Spring Boot 的 LaunchedURLClassLoader 和双亲委派的关系?
  • 如何实现一个支持热部署的自定义 ClassLoader?
  • How does Arthas implement hot-swapping of classes at runtime?
  • What is the relationship between Spring Boot's LaunchedURLClassLoader and the parent delegation model?
  • How would you implement a custom ClassLoader that supports hot deployment?

风控关联:

  • Drools 规则引擎编译生成大量 class,需要理解类加载机制做规则热更新,避免 ClassLoader 泄漏导致 Metaspace 膨胀
  • The Drools rule engine compiles and generates a large number of classes. Understanding class loading is essential for hot rule updates and preventing ClassLoader leaks that cause Metaspace bloat
  • 关联 实时风控引擎

English Answer:

  1. Class loading follows the parent delegation model by default. When a classloader receives a request to load a class, it first delegates the request to its parent. Only when the parent cannot find the class does the child classloader try to load it itself. This design ensures that core JDK classes such as java.lang.Object are loaded by the Bootstrap ClassLoader and cannot be replaced by application code. It also avoids duplicate loading of the same core classes.
  2. Parent delegation can be intentionally broken in specific scenarios. One classic example is SPI, such as JDBC and JNDI. JDK core classes may need to discover and load vendor implementations from the application classpath, but the Bootstrap ClassLoader cannot see those classes. The solution is the thread context classloader, which allows higher-level application classes to be loaded from a lower-level context.
  3. Tomcat is another common example. Each web application has its own classloader, and Tomcat lets it load WEB-INF/classes and WEB-INF/lib first so that different applications can use different library versions. However, Tomcat still delegates core java.* classes to the parent classloader, so breaking delegation is controlled rather than arbitrary.
  4. OSGi and modular systems use more flexible classloading relationships instead of a pure parent-child tree. Hot deployment also relies on custom classloaders: modified classes are loaded by a new classloader, and the old classloader must be released so its classes can be unloaded and Metaspace does not leak.
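
The delegation chain described above can be printed from any application. This is an illustrative sketch only; LoaderChain is a made-up class name, and the loader class names in the output differ between JDK 8 (extension loader) and JDK 9+ (platform loader).

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Walks from the given loader up through its parents; null stands for the bootstrap loader.
    static List<String> chainOf(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap (null)");
        return chain;
    }

    public static void main(String[] args) {
        // Core classes are loaded by the bootstrap loader, which Java code sees as null.
        System.out.println("String loaded by: " + String.class.getClassLoader());
        System.out.println("Application chain: " + chainOf(LoaderChain.class.getClassLoader()));
        // SPI lookups such as ServiceLoader use this loader to reach application classes.
        System.out.println("Context loader: " + Thread.currentThread().getContextClassLoader());
    }
}
```

Inside a web container the context loader is typically the web application's own loader, which is what lets JDK-level SPI code find drivers packaged under WEB-INF/lib.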

Q4. 线上服务频繁 Full GC,你怎么排查?

EN: Your production service is experiencing frequent Full GC. Walk me through your troubleshooting process.

难度: ★★★★★ | 出现频率: 极高(阿里、美团、字节、蚂蚁)

Key Terms: jstat (GC 统计工具), jmap (堆内存工具), jstack (线程栈工具), Arthas (Java 诊断工具), heap dump (堆转储), MAT (内存分析工具), GC log (GC 日志), promotion failure (晋升失败)

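Key points:

  1. Confirm the symptom: run jstat -gcutil <pid> 1000 to watch GC frequency and the utilization of each area; check the GC log (-Xlog:gc*).
  2. Locate the cause (common ones):
     - Memory leak: old-generation usage keeps growing and does not drop → jmap -histo:live <pid> for the object distribution → jmap -dump:live,format=b,file=heap.hprof <pid> to export a heap dump → analyze the Dominator Tree in MAT to find the GC root reference chain.
     - Large objects allocated directly in the old generation: objects larger than -XX:PretenureSizeThreshold go straight to the old generation (this flag applies to Serial/ParNew; G1 instead treats objects of at least half a region as Humongous).
     - Metaspace full: heavy dynamic class generation (e.g. CGLIB proxies, Drools rules); reaching -XX:MaxMetaspaceSize triggers Full GC.
     - System.gc() called: may be triggered when NIO ByteBuffer.allocateDirect allocates direct memory.
     - Promotion failure: during Young GC the old generation does not have enough space for the objects being promoted.
  3. Temporary mitigation: -XX:+DisableExplicitGC to disable System.gc(); increase -Xmx; restart.
  4. Root fix: fix the leaking code; tune GC parameters; upgrade the GC algorithm.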

代码示例:


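# Quick triage command chain
jstat -gcutil <pid> 1000 5          # watch the GC trend (5 samples, 1 s apart)
jmap -histo:live <pid> | head -20   # top 20 classes by instance size; :live forces a Full GC first
jcmd <pid> GC.heap_info             # heap overview
# Online diagnosis with Arthas
java -jar arthas-boot.jar
dashboard          # overall panel
heapdump           # heap dump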

常见误区:

  • ❌ Full GC 一定是内存泄漏 → ✅ Full GC 还可能由 Metaspace 满、System.gc() 调用、Promotion Failure 等原因触发
  • ❌ 出现 Full GC 就立刻重启 → ✅ 应先用 jstat 和 GC 日志定位原因,盲目重启可能掩盖问题
  • ❌ -XX:+DisableExplicitGC 能解决所有 Full GC → ✅ 它只能禁止 System.gc() 触发的 Full GC,对内存泄漏等真实原因无效
  • ❌ Frequent Full GC always means a memory leak → ✅ Full GC can also be triggered by Metaspace exhaustion, System.gc() calls, or Promotion Failure
  • ❌ You should immediately restart when Full GC occurs → ✅ Always diagnose with jstat and GC logs first; blindly restarting may mask the root cause
  • ❌ -XX:+DisableExplicitGC solves all Full GC problems → ✅ It only prevents System.gc()-triggered Full GCs, not those caused by memory leaks or other real issues

延伸追问:

  • 如何区分是内存泄漏还是内存分配过快?
  • Young GC 频繁但 Full GC 不频繁,说明什么?
  • 线上无法导出堆转储时怎么排查?
  • How do you distinguish between a memory leak and excessively fast memory allocation?
  • What does it indicate when Young GC is frequent but Full GC is rare?
  • How would you troubleshoot when you cannot take a heap dump in production?

风控关联:

  • 风控服务的高频特征计算可能导致大量临时对象,需要关注 Young GC 频率。Drools 规则热更新可能导致 Metaspace 膨胀引发 Full GC
  • High-frequency feature computation in risk control services may generate a large volume of temporary objects, requiring attention to Young GC frequency. Drools rule hot-reloading can cause Metaspace bloat that triggers Full GC
  • 关联 实时风控引擎

English Answer:

  1. I would first confirm the symptom instead of guessing. I would run jstat -gcutil <pid> 1000 to observe Young GC and Full GC frequency, old generation usage, metaspace usage, and whether memory drops after GC. I would also check GC logs, using -Xlog:gc* on JDK 9+ or the corresponding GC log flags on JDK 8, to identify the trigger and pause pattern.
  2. Then I would locate the cause. If old generation usage keeps rising and does not drop after Full GC, I would suspect a memory leak. I would use jmap -histo:live <pid> to inspect large object classes, dump the heap with jmap -dump:live,format=b,file=heap.hprof <pid>, and analyze it in MAT, especially the Dominator Tree and GC root reference chains. If large objects are allocated directly into the old generation, I would check object size, allocation path, and -XX:PretenureSizeThreshold-related behavior.
  3. I would also check non-heap causes. If Metaspace is full, dynamic class generation from CGLIB, proxies, Groovy, or Drools rules may be the cause, and I would inspect classloader retention. If explicit System.gc() is being called, sometimes through direct buffer allocation or third-party libraries, I would verify it from logs and consider -XX:+DisableExplicitGC as a temporary mitigation. Promotion failure is another possibility when objects surviving Young GC cannot be promoted safely.
  4. For mitigation, I would separate emergency actions from root fixes. Temporary actions may include disabling explicit GC, increasing heap or metaspace limits, reducing traffic, or restarting after collecting evidence. The permanent fix is to remove the leak, reduce object allocation, tune GC parameters, adjust object lifecycle, or upgrade the collector based on measured data.
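
When attaching jstat is not possible, a rough equivalent of its GC counters can be sampled in-process. The sketch below is an assumption-laden illustration: GcSampler and its method names are invented here, and it reports only totals across all collectors rather than the per-generation columns jstat shows.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class GcSampler {
    // Total number of collections reported by all collectors so far.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 means "undefined" for this collector
            if (count > 0) total += count;
        }
        return total;
    }

    // Heap usage as a fraction of the maximum heap, or -1 if the maximum is undefined.
    static double heapUsedRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax() <= 0 ? -1.0 : (double) heap.getUsed() / heap.getMax();
    }

    public static void main(String[] args) throws InterruptedException {
        long previous = totalGcCount();
        for (int i = 0; i < 3; i++) {
            Thread.sleep(1000);
            long current = totalGcCount();
            System.out.printf("gcDelta=%d heapUsed=%.1f%%%n",
                    current - previous, heapUsedRatio() * 100);
            previous = current;
        }
    }
}
```

A heap ratio that stays high right after collections points toward a leak, while a ratio that drops sharply after each collection but refills quickly points toward a high allocation rate.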

Q5. JVM 调优有哪些关键参数?你的调优思路是什么?

EN: What are the key JVM tuning parameters? Walk me through your tuning methodology.

难度: ★★★★ | 出现频率: 高(阿里、美团、字节)

Key Terms: -Xms (初始堆大小), -Xmx (最大堆大小), -Xmn (年轻代大小), MetaspaceSize (元空间大小), MaxGCPauseMillis (最大 GC 停顿时间), GC logging (GC 日志), adaptive sizing (自适应调整)

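Key points:

  1. Heap size: -Xms = -Xmx (avoids GC jitter from dynamic heap resizing). Typically 50%-70% of physical memory.
  2. Young generation: -Xmn or -XX:NewRatio. Too small → frequent Young GC; too large → not enough old-generation space.
  3. Metaspace: -XX:MetaspaceSize = -XX:MaxMetaspaceSize (avoids Full GC triggered by Metaspace resizing).
  4. GC selection:
     - Low-latency services (risk control / API gateway): G1 with -XX:MaxGCPauseMillis=50
     - Big data / batch processing: Parallel GC (throughput first)
     - Ultra-low latency (trading systems): ZGC
  5. GC logging: JDK 8: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log; JDK 9+: -Xlog:gc*:file=gc.log:time,uptime,level,tags
  6. Tuning approach: define the goal (latency vs. throughput) → enable GC logs → load test and observe → adjust parameters → verify.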

常见误区:

  • ❌ -Xms 和 -Xmx 设不同值可以动态节省内存 → ✅ 动态扩容会触发 Full GC 和堆内存抖动,生产环境必须设为相同值
  • ❌ 堆越大越好 → ✅ 堆过大会导致 GC 停顿时间变长,需要根据 GC 算法合理设置
  • ❌ Setting different values for Xms and Xmx saves memory dynamically → ✅ Dynamic heap resizing triggers Full GC and heap memory jitter; production environments must set them to the same value
  • ❌ A bigger heap is always better → ✅ An oversized heap leads to longer GC pauses; size should be chosen based on the GC algorithm used

延伸追问:

  • 你的服务 OOM 了但 GC 日志没有记录,可能是什么原因?
  • 如何判断一个服务需要 GC 调优?
  • Your service hit OOM but there's no GC log — what could be the cause?
  • How do you determine whether a service needs GC tuning?

风控关联:

  • 风控服务 SLA 通常 < 50ms,GC 停顿直接影响决策延迟。建议 G1 + -XX:MaxGCPauseMillis=50
  • Risk control services typically have an SLA of < 50ms. GC pauses directly impact decision latency. G1 with -XX:MaxGCPauseMillis=50 is recommended
  • 关联 实时风控引擎

English Answer:

  1. JVM tuning should start with a clear objective: are we optimizing for latency, throughput, memory footprint, or stability? After that I enable GC logs and runtime metrics first, then tune with pressure-test evidence instead of changing parameters blindly.
  2. For heap sizing, I usually set -Xms equal to -Xmx in production to avoid dynamic heap resizing and the Full GC or memory jitter it may cause. The heap is often set to about 50% to 70% of physical memory, but the final value depends on container limits, off-heap memory, thread stacks, direct buffers, and metaspace usage. The young generation can be tuned with -Xmn or -XX:NewRatio: if it is too small, Young GC becomes frequent; if it is too large, old generation space may become insufficient.
  3. For Metaspace, I prefer setting both -XX:MetaspaceSize and -XX:MaxMetaspaceSize to reasonable values so class metadata growth is visible and controlled. This is especially important for services that use dynamic proxies, rule engines, or hot deployment.
  4. GC selection depends on workload. For latency-sensitive API gateways or risk-control services, G1 with a pause target such as -XX:MaxGCPauseMillis=50 is a practical default. For batch jobs that care more about throughput, Parallel GC may be acceptable. For ultra-low-latency systems with large heaps, ZGC is worth evaluating. The workflow is: define the target, enable logs, run a representative load test, observe allocation rate and pause distribution, adjust one group of parameters at a time, and verify again.
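
Whether the intended flags actually reached the JVM is worth verifying, since wrapper scripts and container defaults often override them. A minimal sketch, assuming nothing beyond the standard management API (HeapSettingsCheck is a hypothetical name):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

class HeapSettingsCheck {
    private static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // Initial heap size in bytes (what -Xms controls); -1 if undefined.
    static long initHeapBytes() { return heap().getInit(); }

    // Maximum heap size in bytes (what -Xmx controls); -1 if undefined.
    static long maxHeapBytes() { return heap().getMax(); }

    // True when the heap is effectively fixed, i.e. initial size equals maximum size.
    static boolean fixedHeap() {
        return initHeapBytes() > 0 && initHeapBytes() == maxHeapBytes();
    }

    // The JVM options this process was started with, e.g. -Xmx4g or -XX:+UseG1GC.
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("init=" + initHeapBytes() + " max=" + maxHeapBytes()
                + " fixed=" + fixedHeap());
        jvmArgs().forEach(System.out::println);
    }
}
```

Logging this once at startup gives a record of the effective settings to compare against the GC log when a tuning change does not behave as expected.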

Q6. 说说你对逃逸分析的理解?JIT 如何利用它做优化?

EN: What is escape analysis? How does the JIT compiler leverage it for optimization?

难度: ★★★★ | 出现频率: 中高(字节、美团、蚂蚁)

Key Terms: escape analysis (逃逸分析), scalar replacement (标量替换), stack allocation (栈上分配), lock elision (锁消除), JIT (即时编译器), C1/C2 compiler (C1/C2 编译器)

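Key points:

  1. Escape analysis: the JIT analyzes whether an object's scope "escapes" the method or the thread.
  2. Three optimizations:
     - Stack allocation: the object does not escape the method → it can be allocated on the stack and reclaimed when the method returns, with no GC involved. (HotSpot does not place whole objects on the stack; it gets the effect through scalar replacement.)
     - Scalar replacement: the object does not escape → it is decomposed into scalars (primitive fields), so even the stack allocation is skipped.
     - Lock elision: the object does not escape the thread → synchronized and similar lock operations can be removed safely.
  3. JIT tiered compilation: C1 (client compiler) compiles quickly; C2 (server compiler) optimizes deeply. Escape analysis runs in C2.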

代码示例:


// Lock elision example: the synchronized in StringBuffer can be removed after escape analysis
public String concat(String a, String b) {
    StringBuffer sb = new StringBuffer(); // sb does not escape
    sb.append(a);
    sb.append(b);
    return sb.toString();
    // the JIT may elide the synchronized on StringBuffer.append()
}

常见误区:

  • ❌ 逃逸分析能优化所有对象分配 → ✅ 只有不逃逸出方法/线程的对象才能被优化,逃逸对象仍需在堆上分配
  • ❌ 栈上分配意味着对象在物理栈内存上 → ✅ 标量替换才是更常见的优化,直接拆解为基本类型变量,连对象都不创建
  • ❌ Escape analysis can optimize all object allocations → ✅ Only non-escaping objects (those that don't escape the method/thread) can be optimized; escaping objects still require heap allocation
  • ❌ Stack allocation means the object physically resides on the stack → ✅ Scalar replacement is the more common optimization — it decomposes objects into primitive fields, not even creating the object at all

延伸追问:

  • -XX:-DoEscapeAnalysis 关闭逃逸分析会有什么影响?
  • 为什么有时候手动消除对象分配比依赖逃逸分析更可靠?
  • What happens when you disable escape analysis with -XX:-DoEscapeAnalysis?
  • Why is it sometimes more reliable to manually eliminate object allocations than to rely on escape analysis?

风控关联:

  • 风控特征计算中大量使用临时对象(Feature DTO、Map 等),逃逸分析可减少 GC 压力。但 C2 编译需要足够的热点代码触发,冷启动阶段的优化有限
  • Risk control feature computation generates many temporary objects (Feature DTOs, Maps, etc.). Escape analysis can reduce GC pressure, but C2 compilation requires sufficiently hot code paths — optimization is limited during cold-start phases
  • 关联 实时风控引擎

English Answer:

  1. Escape analysis is an optimization performed by the JIT compiler, mainly in the C2 compiler, to determine whether an object escapes the current method or thread. If an object is only used inside a method and no reference is returned, stored in a field, or passed to code that may retain it, the object is considered non-escaping. If it is visible to other threads, it escapes the thread.
  2. The JIT can use this analysis for three major optimizations. First, a non-escaping object can theoretically be allocated on the stack, so it is reclaimed automatically when the method returns and does not add pressure to the heap. Second, scalar replacement can break the object into primitive fields or local variables, which is even more common because the object allocation may disappear entirely. Third, if a locked object does not escape the thread, the JIT can remove unnecessary synchronized operations through lock elision.
  3. Tiered compilation matters here. C1 compiles quickly and provides profiling data, while C2 performs deeper optimizations after code becomes hot enough. Therefore escape-analysis benefits are more obvious on stable hot paths and less obvious during cold start.
  4. In practice, I would not assume every temporary object will be optimized away. Objects that escape through return values, fields, collections, lambdas, logging, or unknown method calls still need heap allocation. For high-QPS services, I would combine code-level allocation reduction with JIT-friendly code, then verify the effect through allocation profiling and GC metrics.
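
Scalar replacement is easiest to picture with a small value object that never leaves a loop body. The sketch below is illustrative only: ScalarReplacementDemo and Point are made-up names, and whether C2 actually removes the allocation depends on inlining and warm-up, so the comments describe what the compiler may do rather than a guarantee.

```java
class ScalarReplacementDemo {
    // Small value holder; the instances created below never leave the loop body.
    static final class Point {
        final long x;
        final long y;
        Point(long x, long y) { this.x = x; this.y = y; }
    }

    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);   // not returned, not stored, not passed on
            sum += p.x * p.x + p.y * p.y;    // C2 may replace p with two local longs
        }
        return sum;
    }

    public static void main(String[] args) {
        long result = 0;
        for (int round = 0; round < 20_000; round++) {   // warm up so the method gets hot
            result = sumOfSquares(1_000);
        }
        System.out.println(result);
    }
}
```

Comparing allocation profiles of a run with default settings against one with -XX:-DoEscapeAnalysis is one way to check whether the optimization applies to a given hot path.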

Q7. 什么是 JVM 的类加载生命周期?一个类在什么条件下会被卸载?

EN: Describe the class lifecycle in JVM. Under what conditions can a class be unloaded?

难度: ★★★ | 出现频率: 中高(阿里、美团)

Key Terms: loading (加载), linking (链接), verification (验证), preparation (准备), resolution (解析), initialization (初始化), unloading (卸载), classloader GC root (类加载器 GC 根)

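Key points:

  1. Lifecycle: loading → linking (verification → preparation → resolution) → initialization → use → unloading.
  2. Initialization triggers (5 kinds):
     - the new / getstatic / putstatic / invokestatic instructions
     - reflective calls such as Class.forName()
     - initializing a subclass, which initializes its superclass first
     - the main class (the one containing the main method)
     - MethodHandle / VarHandle resolution
  3. Unloading conditions (all must hold):
     - every instance of the class has been garbage-collected
     - the ClassLoader that loaded the class has been garbage-collected
     - the class's java.lang.Class object is not referenced anywhere
  4. These conditions are hard to meet in practice; in particular, the Bootstrap ClassLoader is never collected.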

常见误区:

  • ❌ 类不再使用就会被卸载 → ✅ 类卸载需要同时满足三个严格条件,实际中很少发生,尤其是系统类加载器加载的类几乎不会被卸载
  • ❌ 调用 System.gc() 就能卸载类 → ✅ GC 只是必要条件之一,ClassLoader 和 Class 对象的引用也必须全部断开
  • ❌ A class is unloaded as soon as it is no longer used → ✅ Unloading requires all three strict conditions to be met simultaneously; in practice it rarely happens, especially for classes loaded by the system classloader
  • ❌ Calling System.gc() will unload classes → ✅ GC is only one necessary condition — references to the ClassLoader and Class object must also be fully released

延伸追问:

  • 为什么 BootstrapClassLoader 加载的类永远不会被卸载?
  • 框架做热部署时如何保证旧的类被正确卸载?
  • Why can classes loaded by the BootstrapClassLoader never be unloaded?
  • How do frameworks ensure old classes are properly unloaded during hot deployment?

风控关联:

  • 风控规则引擎(如 Drools)频繁热更新规则时,旧的 ClassLoader 如果不被正确释放,会导致 Metaspace 泄漏。需要理解类卸载条件设计合理的 ClassLoader 回收策略
  • When a risk control rule engine (e.g., Drools) frequently hot-updates rules, failure to properly release old ClassLoaders causes Metaspace leaks. Understanding class unloading conditions is essential for designing sound ClassLoader reclamation strategies
  • 关联 实时风控引擎

English Answer:

  1. A JVM class goes through loading, linking, initialization, usage, and unloading. Linking itself includes verification, preparation, and resolution. Loading reads the class bytes and creates the Class object. Verification checks bytecode safety. Preparation allocates and initializes static fields with default values. Resolution converts symbolic references into direct references when needed. Initialization executes class initialization logic such as static variable assignments and static blocks.
  2. Class initialization is triggered by active use, including new, getstatic, putstatic, and invokestatic; reflective calls such as Class.forName(); initialization of a subclass, which triggers parent initialization first; the main class that contains the main method; and MethodHandle or VarHandle resolution.
  3. A class can be unloaded only when three strict conditions are all satisfied. All instances of the class have been garbage-collected, the classloader that loaded the class has been garbage-collected, and the corresponding java.lang.Class object is no longer referenced anywhere. Missing any one of these conditions prevents unloading.
  4. In practice, class unloading is uncommon for classes loaded by the application or bootstrap classloader, because those classloaders usually live as long as the JVM. Hot-deploy frameworks therefore create custom classloaders for each deployment or rule version and must release all references to the old classloader; otherwise Metaspace will keep growing.
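
The "active use" rule has one well-known exception that interviewers like to probe: reading a compile-time constant does not initialize the class that declares it, because javac inlines the value. A small sketch with hypothetical names (InitTriggerDemo, Holder):

```java
class InitTriggerDemo {
    private static boolean holderInitialized = false;

    static final class Holder {
        static final int CONSTANT = 42;   // compile-time constant, inlined at the use site
        static int counter = 7;           // ordinary static field, read via getstatic
        static {
            holderInitialized = true;     // runs once, when Holder is initialized
        }
    }

    // Does not touch Holder at run time: the value 42 is compiled into this method.
    static int readConstant() { return Holder.CONSTANT; }

    // Executes getstatic on Holder.counter, which triggers Holder's initialization.
    static int readField() { return Holder.counter; }

    static boolean isHolderInitialized() { return holderInitialized; }
}
```

Calling readConstant() first and then isHolderInitialized() returns false; only after readField() does the flag flip, which matches the getstatic trigger listed in the key points.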

Q8. Arthas 你用过哪些功能?线上 OOM 怎么用它排查?

EN: Which Arthas features have you used? How would you use Arthas to troubleshoot an OOM issue in production?

难度: ★★★★ | 出现频率: 高(阿里、美团、字节)

Key Terms: Arthas (Java 诊断工具), dashboard (仪表盘), thread (线程), heapdump (堆转储), profiler (性能分析器), watch (方法观察), trace (调用链追踪), sc (类搜索), jad (反编译)

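Key points:

  1. Common commands:
     - dashboard: global view (threads / memory / GC)
     - thread -n 3: the 3 threads using the most CPU
     - thread -b: find the thread that is blocking other threads (helps with deadlocks and lock contention)
     - heapdump --live /tmp/dump.hprof: heap dump (live objects only)
     - watch com.xxx.Service method '{params, returnObj}': observe method arguments and return value
     - trace com.xxx.Service method: latency along the method's call path
     - sc -d com.xxx.Class: class details (ClassLoader etc.)
     - jad com.xxx.Class: decompile the loaded class to view its code online
     - profiler start / profiler stop: generate a flame graph
  2. OOM troubleshooting flow:
     - dashboard to confirm the memory area (heap / metaspace)
     - heapdump to export a heap dump → analyze in MAT
     - or vmtool --action getInstances --className java.lang.String --limit 10 to inspect instances online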

常见误区:

  • ❌ Arthas 只能排查 OOM 问题 → ✅ Arthas 还能做性能分析(火焰图)、方法级耗时追踪、热更新代码等
  • ❌ heapdump 会严重影响线上服务 → ✅ --live 参数只 dump 存活对象,且 Arthas 的 heapdump 比 jmap -dump 对服务的影响更可控,但仍建议低峰期操作
  • ❌ Arthas can only troubleshoot OOM issues → ✅ Arthas also supports performance profiling (flame graphs), method-level latency tracing, and hot code patching
  • ❌ heapdump severely impacts production services → ✅ The --live flag only dumps live objects, and Arthas's heapdump is more controlled than jmap -dump, though low-traffic periods are still recommended

延伸追问:

  • Arthas 的 trace 命令对性能有什么影响?
  • 如何在不重启服务的情况下用 Arthas 修复一个紧急 bug?
  • What performance impact does Arthas's trace command have?
  • How do you use Arthas to patch an urgent bug without restarting the service?

风控关联:

  • 风控服务在线排查必备工具。规则引擎动态加载的 class 泄漏用 sc -d 查看 ClassLoader 数量
  • Arthas is an essential tool for online troubleshooting of risk control services. Use sc -d to inspect ClassLoader counts when investigating class leaks from dynamically loaded rule engine classes
  • 关联 实时风控引擎

English Answer:

  1. I use Arthas as an online diagnostics tool for Java services. dashboard gives a global view of threads, memory, GC, and runtime status. thread -n 3 shows the top CPU-consuming threads, and thread -b helps identify deadlocks. watch can observe method parameters, return values, and exceptions; trace shows method-level call latency; sc -d inspects loaded classes and their classloaders; jad decompiles loaded classes; and profiler start/stop can generate flame graphs for performance analysis.
  2. For an online OOM issue, I would first use dashboard to identify whether the pressure is in heap, metaspace, or another memory area. If it is a heap problem, I would use heapdump --live /tmp/dump.hprof to dump live objects and then analyze the dump in MAT. I would focus on dominant retainers, large object graphs, and GC root paths. If I cannot safely dump immediately, I may use Arthas commands such as vmtool --action getInstances --className ... --limit ... to inspect suspicious instances online.
  3. If I suspect metaspace or classloader leakage, I would use sc -d to check class information and classloader distribution. This is useful for services that dynamically generate classes through proxies, rule engines, or script engines.
  4. Arthas is not only for OOM. It is also useful for slow method diagnosis, live parameter observation, deadlock analysis, and urgent production verification. However, commands such as trace, watch, and heap dump can affect performance, so I would scope them narrowly and run heavier operations during a low-traffic window when possible.
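
Some of what Arthas shows for threads is also reachable through the JDK's own management API, which helps when no agent can be attached. The sketch below is a JDK-level analogue and not Arthas code; ThreadDiagnostics is a hypothetical name, and it covers only deadlock detection, not the broader "blocking thread" search.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class ThreadDiagnostics {
    // Describes threads deadlocked on monitors or ownable synchronizers; empty if none.
    static List<String> deadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();   // null when no deadlock exists
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) {                          // a thread may have exited meanwhile
                result.add(info.getThreadName() + " waiting on " + info.getLockName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = deadlockedThreads();
        System.out.println(found.isEmpty() ? "no deadlock detected" : found.toString());
    }
}
```

Exposing such a check behind an internal health endpoint is one possible design, though it should be rate-limited because thread inspection is not free on a busy service.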

Q9. 说说 JVM 中的三色标记法?CMS 和 G1 分别怎么处理并发标记时的对象变更?

EN: Explain the tri-color marking algorithm. How do CMS and G1 handle object mutations during concurrent marking?

难度: ★★★★★ | 出现频率: 高(字节、蚂蚁、美团)

Key Terms: tri-color marking (三色标记法), white/gray/black (白色/灰色/黑色), SATB (快照标记), incremental update (增量更新), write barrier (写屏障)

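Key points:

  1. Tri-color marking: white (not visited) → gray (visited, references not fully scanned) → black (visited, references fully scanned). White objects are reclaimed when marking ends.
  2. Missed-marking problem: during concurrent marking, if a black object gains a reference to a white object and the references from gray objects to that white object are removed → the white object is reclaimed by mistake.
  3. CMS: incremental update. The write barrier records new "black → white" references and re-marks those black objects gray. Drawback: may leave floating garbage.
  4. G1: SATB (Snapshot-At-The-Beginning). The write barrier records the old reference being removed, and the objects those references point to are processed in the remark phase. More efficient, but may retain more floating garbage.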

常见误区:

  • ❌ 三色标记法是某种 GC 算法 → ✅ 三色标记法是一种可达性分析的标记策略,CMS、G1、ZGC 都使用它,只是处理并发标记时的策略不同
  • ❌ SATB 不会产生浮动垃圾 → ✅ SATB 通过保留标记开始时的快照引用,会保留一些实际已不可达的对象到下一次 GC
  • ❌ Tri-color marking is a specific GC algorithm → ✅ Tri-color marking is a reachability analysis marking strategy used by CMS, G1, and ZGC — they differ only in how they handle concurrent mutations
  • ❌ SATB does not produce floating garbage → ✅ SATB preserves a snapshot of references at the start of marking, retaining some actually unreachable objects until the next GC cycle

延伸追问:

  • SATB 为什么比 Incremental Update 效率更高?
  • ZGC 怎么处理并发标记时的引用变更?
  • Why is SATB more efficient than Incremental Update?
  • How does ZGC handle reference mutations during concurrent marking?

风控关联:

  • 理解三色标记和写屏障机制有助于选择和调优风控服务的 GC 策略。风控服务请求量大、对象分配频繁,SATB 的写屏障开销需要纳入性能考量
  • Understanding tri-color marking and write barrier mechanisms helps in selecting and tuning GC strategies for risk control services. Risk control services have high request volumes and frequent object allocations, so SATB write barrier overhead must be factored into performance considerations
  • 关联 实时风控引擎

English Answer:

  1. Tri-color marking is a reachability marking strategy rather than a standalone garbage collector. White means an object has not been visited yet, gray means the object has been visited but its references have not been fully scanned, and black means the object and all references reachable from it have been processed. At the end of marking, remaining white objects are considered unreachable and can be reclaimed.
  2. The main risk during concurrent marking is missed marking. A live white object can be missed if two things happen at the same time: a black object creates a new reference to the white object, and a gray object deletes its old reference to that same white object. The marker may never scan the new reference from the black object, and the old path from the gray object disappears, so a live object could be incorrectly reclaimed.
  3. CMS handles this with incremental update. A write barrier records new references from black objects to white objects and makes the relevant black objects gray again, so the marker can rescan them. This preserves correctness but may still leave floating garbage, because objects that become unreachable during concurrent marking may have to wait until the next GC cycle.
  4. G1 uses SATB, or Snapshot-At-The-Beginning. Its write barrier records old references before they are overwritten or deleted, so the collector can preserve a logical snapshot of the object graph as it existed when marking began. This keeps the final remark work small, because the collector drains the recorded old references instead of rescanning modified objects, but it can retain more floating garbage because objects reachable at snapshot time are kept even if they become unreachable later.
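
The missed-marking scenario and the SATB fix described above can be sketched with a toy object graph. This is a minimal illustration, not how G1 is implemented: the class name `TriColorSatbDemo`, the `Node` type, and all method names are hypothetical, and the barrier shades the old target immediately, whereas a real collector buffers such references and processes them later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of tri-color marking with an optional SATB-style pre-write barrier. */
public class TriColorSatbDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private final Deque<Node> grayQueue = new ArrayDeque<>();
    private final boolean satbEnabled;

    TriColorSatbDemo(boolean satbEnabled) { this.satbEnabled = satbEnabled; }

    /** WHITE -> GRAY: the object is known to be live but not yet scanned. */
    void shade(Node n) {
        if (n.color == Color.WHITE) {
            n.color = Color.GRAY;
            grayQueue.add(n);
        }
    }

    /** Scans one gray object: shades its direct references, then turns it BLACK. */
    boolean scanOne() {
        Node n = grayQueue.poll();
        if (n == null) return false;
        for (Node ref : n.refs) shade(ref);
        n.color = Color.BLACK;
        return true;
    }

    /** Mutator deletes a reference; with SATB the old target is kept alive first. */
    void removeRef(Node from, Node oldTarget) {
        if (satbEnabled) shade(oldTarget); // simplified pre-write barrier
        from.refs.remove(oldTarget);
    }

    /** Runs the A -> B -> C scenario and returns the final color of C. */
    public static String colorOfCAfterMarking(boolean satbEnabled) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.refs.add(b);
        b.refs.add(c);

        TriColorSatbDemo gc = new TriColorSatbDemo(satbEnabled);
        gc.shade(a);   // A is the only root
        gc.scanOne();  // A becomes BLACK, B becomes GRAY, C is still WHITE

        // Concurrent mutation: black A gains a reference to C, gray B drops its reference.
        a.refs.add(c);
        gc.removeRef(b, c);

        while (gc.scanOne()) { /* finish marking */ }
        return c.color.name();
    }

    public static void main(String[] args) {
        System.out.println("no barrier:   C is " + colorOfCAfterMarking(false)); // WHITE (live object lost)
        System.out.println("SATB barrier: C is " + colorOfCAfterMarking(true));  // BLACK (kept alive)
    }
}
```

Without the barrier, C stays white even though A still references it, which is exactly the missed-marking case. With the barrier, C is kept for this cycle regardless of whether it is still reachable, which is also why SATB can retain floating garbage.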

Q10. JDK 9+ 模块化系统(JPMS)对类加载有什么影响?Spring Boot 兼容吗?

EN: How does the JPMS module system affect class loading? Is Spring Boot compatible with it?

难度: ★★★★ | 出现频率: 中高(字节、阿里)

Key Terms: module-info.java (模块描述文件), module path (模块路径), classpath (类路径), unnamed module (未命名模块), automatic module (自动模块), --add-opens (模块开放参数)

答案要点:

  1. JPMS:通过 module-info.java 声明模块的 exports 和 requires,实现强封装
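  2. 影响:对 JDK 内部非公开成员的深度反射(setAccessible)被限制,例如反射调用 java.lang.ClassLoader#defineClass。JDK 16 起默认强封装,JDK 17 起 --illegal-access 选项不再生效,此时需要 --add-opens 参数。sun.misc.Unsafe 位于 jdk.unsupported 模块,目前仍可访问
  3. Spring Boot 兼容性:Spring Boot 3.x 基于 Spring Framework 6,需要 JDK 17+。应用通常以 classpath 方式运行在 unnamed module 中,classpath scanning 正常工作;只有当框架或依赖库需要深度反射 JDK 内部成员时,才需要按需添加 --add-opens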
  4. 实际影响:多数应用不需要写 module-info.java,按 classpath 模式运行即可

常见误区:

  • ❌ 升级到 JDK 9+ 就必须写 module-info.java → ✅ 绝大多数应用仍按 classpath 模式运行在 unnamed module,无需编写 module-info
  • ❌ Spring Boot 无法在 JPMS 下运行 → ✅ Spring Boot 3.x 完全支持 JDK 17+,默认以 classpath(unnamed module)方式运行;仅在依赖库深度反射 JDK 内部成员时才需按需配置 --add-opens
  • ❌ Upgrading to JDK 9+ requires writing module-info.java → ✅ The vast majority of applications still run in classpath mode as unnamed modules without needing module-info
  • ❌ Spring Boot cannot run under JPMS → ✅ Spring Boot 3.x fully supports JDK 17+ and normally runs on the classpath as part of the unnamed module; --add-opens flags are needed only when a dependency performs deep reflection into JDK internals

延伸追问:

  • --add-opens 和 --add-exports 有什么区别?
  • 如果一个项目想完全模块化,迁移步骤是什么?
  • What is the difference between --add-opens and --add-exports?
  • What are the migration steps to fully modularize a project?

风控关联:

  • 风控服务依赖的第三方库(如 Drools、FastJSON)可能使用反射访问内部 API,升级 JDK 版本时需要关注 JPMS 带来的模块访问限制,提前配置 --add-opens 参数
  • Third-party libraries used by risk control services (e.g., Drools, FastJSON) may use reflection to access internal APIs. When upgrading JDK versions, pay attention to JPMS module access restrictions and configure --add-opens flags in advance
  • 关联 实时风控引擎

English Answer:

  1. JPMS, introduced in JDK 9, adds a module system based on module-info.java. A module explicitly declares what it requires and what packages it exports. This gives Java stronger encapsulation than the traditional classpath model, because code outside a module cannot freely access non-exported packages.
  2. The biggest practical impact is on reflection and internal APIs. Deep reflection into non-public members of JDK internals is blocked by default (strong encapsulation has been the default since JDK 16), so frameworks or legacy libraries may need JVM flags such as --add-opens or --add-exports. --add-exports makes the public types and members of a package accessible at compile time and run time, while --add-opens additionally permits deep reflection (setAccessible) on its non-public members at run time.
  3. Spring Boot is compatible with modern JDKs. Spring Boot 3.x is based on Spring Framework 6 and requires JDK 17+. In most applications it still runs on the traditional classpath as an unnamed module, and classpath scanning works normally in that mode. When a framework or library needs reflective access blocked by JPMS, the common solution is to configure the necessary --add-opens flags.
  4. In real projects, upgrading to JDK 9+ does not mean every application must write module-info.java. Most Spring Boot services can continue using the classpath model. Full modularization is a separate migration decision and requires checking dependencies, reflection usage, exported packages, and build configuration carefully.
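
To make the difference between "exported" and "open" concrete, the following sketch queries the `java.lang.Module` API and attempts a deep-reflection call. The class name `ModuleAccessDemo` and its method names are hypothetical, and the choice of `ClassLoader#defineClass` as the reflection target is only an illustrative assumption; results for the "open" checks depend on the JDK version and on any --add-opens flags passed at startup.

```java
import java.lang.reflect.InaccessibleObjectException;
import java.lang.reflect.Method;

/** Inspects how JPMS treats code that runs on the classpath. */
public class ModuleAccessDemo {

    /** Classes loaded from the classpath belong to an unnamed module. */
    public static boolean runsInUnnamedModule() {
        return !ModuleAccessDemo.class.getModule().isNamed();
    }

    /** java.base exports java.lang to everyone, so its public API is always usable. */
    public static boolean javaLangIsExported() {
        return String.class.getModule().isExported("java.lang");
    }

    /** Whether java.lang is open for deep reflection to this class's module. */
    public static boolean javaLangIsOpenToUs() {
        Module javaBase = String.class.getModule();
        return javaBase.isOpen("java.lang", ModuleAccessDemo.class.getModule());
    }

    /** Tries the kind of deep reflection that legacy libraries rely on. */
    public static boolean canDeepReflectIntoJavaLang() {
        try {
            Method m = ClassLoader.class.getDeclaredMethod(
                    "defineClass", String.class, byte[].class, int.class, int.class);
            m.setAccessible(true); // throws if java.lang is not open to our module
            return true;
        } catch (InaccessibleObjectException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("module of String:    " + String.class.getModule().getName());
        System.out.println("unnamed module:      " + runsInUnnamedModule());
        System.out.println("java.lang exported:  " + javaLangIsExported());
        System.out.println("java.lang open:      " + javaLangIsOpenToUs());
        System.out.println("deep reflection ok:  " + canDeepReflectIntoJavaLang());
    }
}
```

Running it once plainly and once with `--add-opens java.base/java.lang=ALL-UNNAMED` should flip the last two lines on JDK 17+, while the "exported" line stays true in both runs.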

关联

  • 并发编程 — 锁机制与 JVM 对象头密切相关
  • MySQL — 数据库连接池的 JVM 内存占用
  • 实时风控引擎 — GC 调优直接影响风控决策延迟