MySQL 面试题
覆盖 InnoDB 存储引擎、索引原理、事务与锁、SQL 优化、分库分表等高频考点。
每道题包含中英双语答案、代码示例、常见误区和风控关联。
相关页面: [[Redis]] | [[分布式系统]] | [[Java基础]]
Q1. InnoDB 为什么选择 B+ 树作为索引结构?和 B 树、Hash 索引对比有什么优劣?
EN: Why does InnoDB choose B+ tree as its index structure? What are the advantages and disadvantages compared to B tree and Hash indexes?
难度: ★★★★ | 出现频率: 极高(阿里、美团、字节)
Key Terms: B+ Tree (B+ 树), B Tree (B 树), Hash Index (哈希索引), Fan-out (扇出), Range Query (范围查询)
答案要点:
- B+ 树 vs B 树:
- B+ 树非叶子节点只存 key,不存 data,单个页(默认 16KB)能容纳更多 key,树更矮(通常 3-4 层即可表示千万级数据) - B+ 树所有数据都在叶子节点,叶子节点通过双向链表连接,范围查询效率极高(只需顺序扫描链表) - B 树每个节点都存 data,导致 fan-out 小,树更高,随机 IO 更多 - B+ 树查询路径长度一致,性能稳定(always 到叶子节点)
- B+ 树 vs Hash 索引:
- Hash 索引等值查询 O(1) 极快,但不支持范围查询、排序、最左匹配 - Hash 不支持部分列匹配(联合索引只能全列匹配) - Hash 索引无法利用索引排序(ORDER BY) - InnoDB 的自适应哈希索引(AHI)是自动优化的,非显式索引
- 为什么不用跳表 / 红黑树:
- 红黑树是二叉树,树高远大于 B+ 树,IO 次数多 - 跳表在内存数据库(如 Redis)中效果好,但磁盘场景下 B+ 树的页对齐更利于减少 IO
- 层数计算:
- 假设非叶子节点每个 key+pointer 约 12 字节,16KB 页约存 1365 个 pointer - 3 层 B+ 树:1365 × 1365 × 100(每页约 100 行)≈ 1.86 亿行
常见误区:
- ❌ B+ 树查询一定比 Hash 快 → ✅ 等值场景 Hash 更快,B+ 树胜在范围查询和排序
- ❌ B 树和 B+ 树的叶子节点特性相同 → ✅ B+ 树叶子节点通过双向链表连接,B 树不是
- ❌ 索引层数和数据量是线性关系 → ✅ 实际是对数关系,3-4 层即可支撑千万~亿级数据
- ❌ B+ tree queries are always faster than Hash lookups → ✅ Hash is faster for exact-match queries; B+ tree excels at range queries and sorting
- ❌ B tree and B+ tree have the same leaf node characteristics → ✅ B+ tree leaf nodes are linked via a doubly-linked list; B tree nodes are not
- ❌ Index depth scales linearly with data volume → ✅ The relationship is logarithmic — 3-4 levels suffice for hundreds of millions of rows
延伸追问:
- 一棵高度为 3 的 B+ 树大概能存多少行数据?如何计算?
- InnoDB 一个页的大小是多少?可以修改吗?
- 自适应哈希索引(AHI)在什么场景下会生效?什么情况下应该关闭?
- How many rows can a 3-level B+ tree approximately hold? Walk through the calculation.
- What is the default InnoDB page size, and can it be changed?
- Under what conditions does the Adaptive Hash Index (AHI) take effect, and when should it be disabled?
风控关联:
- 风控系统的交易表通常有千万~亿级行数据,B+ 树 3-4 层即可命中,保证单次查询在毫秒级。理解索引层数对查询延迟的影响,是评估实时风控 SLA 的基础。
- Risk control system transaction tables typically contain hundreds of millions of rows. A 3-4 level B+ tree ensures single-digit millisecond query latency. Understanding how index depth affects query latency is fundamental to evaluating real-time risk control SLAs.
English Answer:
- InnoDB chooses B+ tree because it matches disk-page storage and range-query workloads. Compared with a B tree, a B+ tree stores only keys and child pointers in non-leaf nodes, while actual row data or primary-key references are stored in leaf nodes. Because internal nodes do not store full row data, each 16KB InnoDB page can hold more keys and pointers, giving the tree a higher fan-out and a lower height. In practice, three to four levels can support tens or hundreds of millions of rows.
- B+ tree leaf nodes are linked in key order, typically through a doubly linked list. This makes range queries efficient: after locating the first matching leaf node, InnoDB can scan leaf pages sequentially. A B tree may store data in both internal and leaf nodes, so fan-out is smaller, the tree tends to be taller, and range scanning is less naturally sequential.
- Compared with a Hash index, a B+ tree is not the fastest for pure equality lookup, because Hash lookup can be O(1) in ideal cases. But Hash indexes do not support range queries, ordered scans,
ORDER BY, or leftmost-prefix matching on composite indexes. InnoDB's Adaptive Hash Index is an internal optimization built automatically for hot access patterns; it is not the same as a user-defined explicit hash index. - Red-black trees and skip lists are less suitable for InnoDB's disk-based storage. A red-black tree is a binary tree and has much greater height, causing more random IO. Skip lists work well in memory systems such as Redis, but B+ trees align better with fixed-size database pages and minimize disk reads.
- A rough capacity calculation illustrates the point. If each key plus pointer is about 12 bytes, a 16KB page can store about 1,365 pointers. With three levels, the tree can address roughly
1365 * 1365 * 100, or about 186 million rows if each leaf page stores around 100 records. This is why index depth grows logarithmically rather than linearly with data volume.
Q2. 聚簇索引和非聚簇索引(二级索引)有什么区别?什么是回表?
EN: What are the differences between clustered index and non-clustered index (secondary index)? What is a bookmark lookup (table lookup)?
难度: ★★★★ | 出现频率: 极高(阿里、美团、字节、京东)
Key Terms: Clustered Index (聚簇索引), Secondary Index (二级索引), Bookmark Lookup (回表), Primary Key (主键), Covering Index (覆盖索引)
答案要点:
- 聚簇索引(Clustered Index):
- InnoDB 的聚簇索引就是主键索引,叶子节点存储完整的行数据 - 一张表只能有一个聚簇索引(数据按主键顺序物理存储) - 如果没有定义主键,InnoDB 选择第一个唯一非空索引;都没有则自动生成 6 字节隐藏的 DB_ROW_ID
- 非聚簇索引(Secondary Index / 二级索引):
- 叶子节点存储主键值而非完整行数据 - 查询时如果需要的列不在二级索引中,需要通过主键值回到聚簇索引查找完整行 → 回表
- 回表的代价:
- 一次二级索引查询 = 二级索引 B+ 树查找 + 聚簇索引 B+ 树查找 - 如果扫描大量行且需要回表,代价可能比全表扫描还高(优化器可能放弃索引选择全扫)
- 主键选择建议:
- 推荐自增主键:顺序写入,页分裂少,二级索引叶子存的主键值也小 - 不推荐 UUID:随机写入导致频繁页分裂,且 36 字节占用空间远大于 BIGINT 的 8 字节
代码示例:
-- 创建表
CREATE TABLE trade_order (
id BIGINT AUTO_INCREMENT PRIMARY KEY, -- 聚簇索引
order_no VARCHAR(32) NOT NULL,
user_id BIGINT NOT NULL,
amount DECIMAL(12,2),
status TINYINT,
created_at DATETIME,
UNIQUE KEY uk_order_no (order_no), -- 二级唯一索引
KEY idx_user_id (user_id) -- 二级普通索引
) ENGINE=InnoDB;
-- 走二级索引,不需要回表(覆盖索引)
SELECT user_id FROM trade_order WHERE user_id = 10086;
-- 走二级索引 idx_user_id,但需要回表取 amount
SELECT user_id, amount FROM trade_order WHERE user_id = 10086;
常见误区:
- ❌ 所有索引都存储完整行数据 → ✅ 只有聚簇索引(主键索引)存储完整行数据,二级索引只存主键值
- ❌ 非聚簇索引一定比全表扫描快 → ✅ 取决于回表次数和表的行数,回表代价高时优化器可能选择全表扫描
- ❌ InnoDB 可以有多个聚簇索引 → ✅ 一张表只能有一个聚簇索引(即主键)
- ❌ All indexes store complete row data → ✅ Only the clustered index (primary key) stores full rows; secondary indexes store only the primary key value
- ❌ A non-clustered index is always faster than a full table scan → ✅ It depends on the number of bookmark lookups and table size — the optimizer may choose a full scan when lookup cost is high
- ❌ InnoDB can have multiple clustered indexes → ✅ A table can only have one clustered index (the primary key)
延伸追问:
- 为什么不建议用 UUID 做主键?对二级索引有什么影响?
- 联合主键
(a, b)的聚簇索引,二级索引叶子存的是什么? - InnoDB 中唯一索引和普通索引在查询和写入时的区别?
- Why is UUID not recommended as a primary key? How does it affect secondary indexes?
- For a composite primary key
(a, b), what does the secondary index leaf node store? - What is the difference between a unique index and a regular index in InnoDB for both reads and writes?
风控关联:
- 风控交易表使用自增主键保证写入性能,同时
user_id二级索引支持按用户查询交易记录。理解回表机制有助于设计覆盖索引,避免高频风控查询的性能陷阱。 - Risk control transaction tables use auto-increment primary keys for write performance, with
user_idsecondary indexes to support per-user queries. Understanding bookmark lookups helps design covering indexes and avoid performance pitfalls in high-frequency risk control queries.
English Answer:
- In InnoDB, the clustered index is the primary-key index, and its leaf nodes store the complete row data. Because table data is organized by the clustered index order, a table can have only one clustered index. If a table has no explicit primary key, InnoDB chooses the first unique non-null index; if none exists, it creates a hidden 6-byte
DB_ROW_ID. - A secondary index, also called a non-clustered index, does not store the full row in its leaf nodes. Its leaf nodes store the indexed columns plus the primary-key value. Therefore, if a query needs columns that are not contained in the secondary index, InnoDB first finds the primary key in the secondary-index B+ tree and then uses that primary key to look up the full row in the clustered-index B+ tree. This second lookup is called a bookmark lookup, or 回表.
- The cost of bookmark lookup is important. One secondary-index query may require two B+ tree traversals: one on the secondary index and one on the clustered index. If the query scans many secondary-index entries and each one needs a bookmark lookup, the cost can exceed a full table scan, and the optimizer may decide not to use the index.
- A covering index avoids bookmark lookup when all columns required by the query are already in the index. For example,
SELECT user_id FROM trade_order WHERE user_id = ?can be served directly by an index onuser_id, whileSELECT user_id, amount ...needs a lookup unlessamountis also included in the index. - For primary-key design, an auto-increment
BIGINTis usually recommended because it writes sequentially, reduces page splits, and keeps secondary-index leaf entries small. UUID primary keys are usually a bad default for InnoDB because random insertion causes page splits and the large key value is copied into every secondary index.
Q3. 什么是覆盖索引、索引下推(ICP)和最左匹配原则?
EN: What are covering index, index condition pushdown (ICP), and the leftmost prefix rule?
难度: ★★★★★ | 出现频率: 极高(阿里、美团、字节、快手)
Key Terms: Covering Index (覆盖索引), Index Condition Pushdown (索引下推), Leftmost Prefix Rule (最左匹配原则), Composite Index (联合索引), Extra (额外信息)
答案要点:
- 最左匹配原则(Leftmost Prefix Rule):
- 联合索引 (a, b, c) 相当于创建了 (a), (a, b), (a, b, c) 三个索引 - 查询条件必须从最左列开始匹配,遇到范围查询(>, <, BETWEEN, LIKE)后停止匹配 - 索引列顺序不影响 WHERE 匹配(优化器会重排),但影响索引的存储结构
```sql INDEX idx_abc (a, b, c)
-- 能用索引:a | a,b | a,b,c | a,c(a 列能用) WHERE a = 1 AND b = 2 AND c = 3 -- 全部匹配 ✓ WHERE a = 1 AND b = 2 -- 匹配 a,b ✓ WHERE a = 1 AND c = 3 -- 只匹配 a(c 无法跳过 b)✓(部分) WHERE b = 2 AND c = 3 -- 不匹配 ✗ WHERE a = 1 AND b > 2 AND c = 3 -- 匹配 a,b(c 在范围后无法用)✓(部分) ```
- 覆盖索引(Covering Index):
- 查询的所有列都在索引中,不需要回表 - EXPLAIN 中 Extra 显示 Using index - 是提升查询性能的重要手段
```sql INDEX idx_user_status_time (user_id, status, created_at)
-- 覆盖索引:查询列全在索引中 SELECT status, created_at FROM trade_order WHERE user_id = 10086 AND status = 1; -- Extra: Using index ```
- 索引下推(Index Condition Pushdown, ICP, MySQL 5.6+):
- 在存储引擎层对索引中包含的列进行条件过滤,减少回表次数 - 没有ICP:存储引擎根据索引找到主键 → 全部回表 → Server层过滤 - 有ICP:存储引擎在索引中先过滤不满足条件的 → 只对满足条件的主键回表 - EXPLAIN 中 Extra 显示 Using index condition
```sql INDEX idx_user_status_time (user_id, status, created_at)
-- name 列不在索引中,但 status 在索引中 -- ICP:在索引中先用 status 过滤,减少回表 SELECT * FROM trade_order WHERE user_id = 10086 AND status = 1 AND name LIKE '%风险%'; ```
常见误区:
- ❌
WHERE a = 1 AND c = 3完全不能命中联合索引(a, b, c)→ ✅ a 列仍能用索引定位 - ❌ 索引列顺序不影响查询性能 → ✅ 顺序影响索引的选择性和存储效率
- ❌ 覆盖索引和索引下推是同一概念 → ✅ 覆盖索引是"不需要回表",索引下推是"减少回表次数"
- ❌
WHERE a = 1 AND c = 3cannot use a composite index(a, b, c)at all → ✅ Column a can still be used for index lookup - ❌ Index column order does not affect query performance → ✅ Order affects selectivity and storage efficiency of the index
- ❌ Covering index and index condition pushdown are the same concept → ✅ Covering index means "no bookmark lookup needed"; ICP means "fewer bookmark lookups"
延伸追问:
- 联合索引
(a, b, c),WHERE a IN (1,2) AND b = 3 AND c = 4能用到索引的哪些列? - ORDER BY 能否利用索引排序?什么条件下可以避免 filesort?
- MySQL 8.0 的降序索引(Descending Index)解决了什么问题?
- For composite index
(a, b, c), which columns can be used byWHERE a IN (1,2) AND b = 3 AND c = 4? - Can ORDER BY leverage index sorting? Under what conditions can filesort be avoided?
- What problem does MySQL 8.0's Descending Index solve?
风控关联:
- 风控查询频繁使用
user_id + event_type + created_at的联合索引。理解最左匹配和覆盖索引,可以设计出既支持实时查询又避免回表的高效索引组合,降低 P99 延迟。 - Risk control queries frequently use a composite index on
user_id + event_type + created_at. Understanding the leftmost prefix rule and covering indexes enables designing efficient index combinations that support real-time queries without bookmark lookups, reducing P99 latency.
English Answer:
- The leftmost prefix rule means a composite index
(a, b, c)is ordered first bya, then byb, then byc. It can support predicates ona,(a, b), and(a, b, c). A query on(b, c)alone cannot use the index effectively because it skips the leftmost column. ForWHERE a = 1 AND c = 3, MySQL can still use theapart of the index, but it cannot jump overbto usecfor precise index positioning. - Range conditions affect how many index columns can be used for search. With
WHERE a = 1 AND b > 2 AND c = 3, MySQL can useaandbfor the index range, butccan no longer be used for continuing the ordered index match after the range condition. The optimizer may reorder equivalentWHEREpredicates logically, but the physical column order of the composite index still determines the searchable prefix. - A covering index means all columns required by the query, including selected columns and filter/order columns, are contained in the index. In this case, InnoDB does not need to perform bookmark lookups into the clustered index. In
EXPLAIN, this usually appears asUsing index. It is one of the most effective optimizations for high-frequency read queries. - Index Condition Pushdown, or ICP, is different from covering index. ICP lets the storage engine evaluate conditions on columns that are available in the index before returning rows to the MySQL Server layer. Without ICP, the engine may find primary keys through the index, fetch full rows, and then let the Server layer filter. With ICP, rows that fail index-available conditions are filtered earlier, reducing bookmark lookups.
EXPLAINshows this asUsing index condition. - In risk-control queries, a composite index such as
(user_id, event_type, created_at)should be designed around the most common access path. If the query can be covered by the index, P99 latency can drop because InnoDB avoids repeated random lookups into the clustered index.
Q4. MySQL 的四种事务隔离级别分别解决什么问题?InnoDB 的 MVCC 是如何实现的?
EN: What do the four transaction isolation levels in MySQL solve? How is InnoDB's MVCC implemented?
难度: ★★★★★ | 出现频率: 极高(阿里、美团、字节、腾讯)
Key Terms: Isolation Level (隔离级别), MVCC (多版本并发控制), Undo Log (回滚日志), Read View (读视图), Dirty Read (脏读), Phantom Read (幻读)
答案要点:
- 四种隔离级别:
| 隔离级别 | 脏读 | 不可重复读 | 幻读 | 实现方式 |
|---|---|---|---|---|
| READ UNCOMMITTED | 可能 | 可能 | 可能 | 无特殊处理 |
| READ COMMITTED (RC) | 不会 | 可能 | 可能 | MVCC(每次读生成新 Read View) |
| REPEATABLE READ (RR) | 不会 | 不会 | 可能 | MVCC(事务首次读生成 Read View) |
| SERIALIZABLE | 不会 | 不会 | 不会 | 所有读加共享锁 |
- InnoDB 默认 RR,通过 MVCC + 间隙锁在很大程度上避免幻读
- MVCC 核心机制(基于 Undo Log + Read View):
- 隐藏列:每行数据有两个隐藏列 DB_TRX_ID(最后修改的事务ID)、DB_ROLL_PTR(回滚指针指向 Undo Log) - Undo Log 版本链:通过 DB_ROLL_PTR 将数据的多个版本串成链表 - Read View:事务开始时创建的一致性快照,包含: - m_ids:创建 Read View 时所有活跃(未提交)事务 ID 列表 - min_trx_id:m_ids 中最小的事务 ID - max_trx_id:下一个将要分配的事务 ID(当前最大事务 ID + 1) - creator_trx_id:创建该 Read View 的事务 ID - 可见性判断规则: - DB_TRX_ID == creator_trx_id → 自己修改的,可见 - DB_TRX_ID < min_trx_id → 在 Read View 创建前已提交,可见 - DB_TRX_ID >= max_trx_id → 在 Read View 创建后才开始的事务,不可见 - min_trx_id <= DB_TRX_ID < max_trx_id → 检查是否在 m_ids 中:在则不可见(未提交),不在则可见(已提交)
- RC vs RR 的 MVCC 区别:
- RC:每次 SELECT 都生成新的 Read View → 能看到其他事务已提交的最新数据 - RR:只在事务第一次 SELECT 时生成 Read View → 整个事务期间看到的数据一致
代码示例:
-- 查看当前隔离级别
SELECT @@transaction_isolation;
-- 设置隔离级别
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- 查看正在运行的事务
SELECT * FROM information_schema.INNODB_TRX;
常见误区:
- ❌ RR 隔离级别完全杜绝幻读 → ✅ 快照读通过 MVCC 避免,但当前读需要间隙锁配合
- ❌ RC 和 RR 的 Read View 生成时机相同 → ✅ RC 每次 SELECT 生成新的,RR 只在事务首次 SELECT 生成
- ❌ MVCC 能完全替代锁 → ✅ 写操作仍然需要锁机制,MVCC 只解决读的并发问题
- ❌ RR isolation level completely eliminates phantom reads → ✅ Snapshot reads avoid phantoms via MVCC, but current reads still require gap locks
- ❌ RC and RR generate Read Views at the same time → ✅ RC creates a new Read View per SELECT; RR creates one only at the first SELECT in the transaction
- ❌ MVCC can completely replace locks → ✅ Write operations still require locking; MVCC only solves read concurrency issues
延伸追问:
- RR 级别下,事务 A 先 SELECT(快照读),事务 B INSERT 一条新行并提交,事务 A 再 SELECT 能看到吗?如果用
SELECT ... FOR UPDATE呢? - MVCC 能否解决 Lost Update 问题?怎么解决?
- 为什么互联网公司很多选择 RC 而不是默认的 RR?
- Under RR, if transaction A does a snapshot SELECT, transaction B INSERTs and commits a new row, will transaction A see it on a second SELECT? What about with
SELECT ... FOR UPDATE? - Can MVCC solve the Lost Update problem? How should it be handled?
- Why do many internet companies choose RC instead of the default RR isolation level?
风控关联:
- 风控系统的事务处理涉及账户余额更新、风险决策记录等,需要保证一致性。理解 MVCC 机制有助于排查风控决策中的数据一致性问题,避免因隔离级别选择不当导致的误判。
- Risk control system transactions involve account balance updates and risk decision records that require consistency. Understanding MVCC helps troubleshoot data consistency issues in risk decisions and avoid false judgments caused by improper isolation level selection.
English Answer:
- MySQL supports four isolation levels.
READ UNCOMMITTEDmay allow dirty reads, non-repeatable reads, and phantom reads.READ COMMITTEDprevents dirty reads, but differentSELECTstatements in the same transaction may see different committed versions, so non-repeatable reads can still happen.REPEATABLE READis InnoDB's default; it prevents non-repeatable reads for snapshot reads by keeping a consistent Read View for the transaction.SERIALIZABLEis the strongest level and prevents these anomalies by adding shared locks to reads, but it has the highest concurrency cost. - InnoDB implements MVCC through hidden columns, Undo Log version chains, and Read Views. Each row has hidden metadata such as
DB_TRX_ID, which records the transaction that last modified the row, andDB_ROLL_PTR, which points to the previous version in the Undo Log. Multiple old versions form a version chain. - A Read View is a consistency snapshot. It records the active uncommitted transaction IDs at the time the snapshot is created, including
m_ids,min_trx_id,max_trx_id, andcreator_trx_id. Visibility is decided by comparing the row version'sDB_TRX_IDwith these values. The creator's own changes are visible; versions committed beforemin_trx_idare visible; versions created aftermax_trx_idare invisible; versions between them depend on whether their transaction ID is still active inm_ids. - The key difference between RC and RR is Read View creation time. Under RC, every consistent
SELECTcreates a new Read View, so it can see data committed by other transactions before that statement starts. Under RR, the first consistent read in the transaction creates the Read View and later snapshot reads reuse it, so the transaction sees a stable snapshot. - MVCC mainly solves read concurrency; it does not replace locks for writes. InnoDB also uses next-key locks and gap locks under RR for current reads such as
SELECT ... FOR UPDATEto reduce phantom-read problems. For account updates and risk-decision writes, choosing the isolation level incorrectly can cause consistency bugs or unnecessary lock contention.
Q5. InnoDB 的行锁、间隙锁和临键锁分别是什么?在什么场景下会触发?
EN: What are InnoDB's record locks, gap locks, and next-key locks? Under what scenarios are they triggered?
难度: ★★★★★ | 出现频率: 高(阿里、美团、字节)
Key Terms: Record Lock (行锁/记录锁), Gap Lock (间隙锁), Next-Key Lock (临键锁), Deadlock (死锁), Lock Escalation (锁升级)
答案要点:
- 行锁(Record Lock):
- 锁定索引记录本身(注意:不是锁定数据行,InnoDB 的锁都是加在索引上的) - 精确匹配到一条存在的记录时加行锁 - 没有索引时退化为表锁(这是经典坑点)
- 间隙锁(Gap Lock):
- 锁定索引记录之间的间隙,防止其他事务在间隙中插入新记录 - 只在 RR 隔离级别下生效(RC 下没有间隙锁) - 间隙锁之间不冲突,冲突的是插入操作 - 目的:解决幻读问题(当前读场景)
- 临键锁(Next-Key Lock):
- = 行锁 + 间隙锁,锁定一条记录及其前面的间隙 - InnoDB 在 RR 级别下的默认加锁方式 - 是左开右闭区间 (上一个记录, 当前记录]
- 加锁规则(总结,MySQL 8.0):
- 加锁的基本单位是 Next-Key Lock - 查找过程中访问到的对象才会加锁 - 等值查询中,唯一索引匹配到记录时,Next-Key Lock 退化为行锁 - 等值查询中,最后一个不满足条件的值,Next-Key Lock 退化为间隙锁 - 范围查询中,会对扫过的范围加 Next-Key Lock
代码示例:
-- 表结构:id 主键,age 有索引,name 无索引
-- 数据:id=5(age=10), id=10(age=20), id=15(age=30)
-- 事务 A
BEGIN;
SELECT * FROM t WHERE age = 20 FOR UPDATE;
-- 加锁:id=10 的行锁 + (10, 20) 间隙锁 + (20, 30) 间隙锁
-- 即 Next-Key Lock (10, 20] + Gap Lock (20, 30)
-- 事务 B(会阻塞)
INSERT INTO t VALUES (8, 15); -- 被间隙锁 (10, 20) 阻塞
INSERT INTO t VALUES (12, 25); -- 被间隙锁 (20, 30) 阻塞
INSERT INTO t VALUES (3, 5); -- 不阻塞,在 (10, 20) 之外
常见误区:
- ❌ InnoDB 的行锁是锁数据行 → ✅ InnoDB 的锁加在索引记录上,没有索引时退化为表锁
- ❌ 没有命中索引的 UPDATE 只锁一行 → ✅ 没有索引会锁全表
- ❌ 间隙锁会阻塞读操作 → ✅ 间隙锁只阻塞 INSERT,不阻塞读
- ❌ 所有隔离级别都有间隙锁 → ✅ 间隙锁只在 RR 隔离级别下存在,RC 下不存在
- ❌ InnoDB record locks lock data rows → ✅ InnoDB locks are placed on index records; without an index, they escalate to table locks
- ❌ An UPDATE that misses the index only locks one row → ✅ Without an index, the entire table is locked
- ❌ Gap locks block read operations → ✅ Gap locks only block INSERTs, not reads
- ❌ All isolation levels have gap locks → ✅ Gap locks only exist under the RR isolation level, not under RC
延伸追问:
UPDATE t SET name = 'x' WHERE name = 'y'(name 无索引)会加什么锁?- 如何排查线上死锁?
SHOW ENGINE INNODB STATUS的死锁日志怎么看? - 如何减少死锁发生的概率?
- What locks does
UPDATE t SET name = 'x' WHERE name = 'y'acquire whennamehas no index? - How do you troubleshoot a production deadlock? How do you read the deadlock log from
SHOW ENGINE INNODB STATUS? - How can you reduce the probability of deadlocks?
风控关联:
- 风控系统中账户风控常涉及
UPDATE ... WHERE user_id = ? AND status = ?等操作,理解加锁规则可以避免因索引设计不当导致的锁升级(行锁 → 表锁)和死锁问题。高并发交易场景下,锁的粒度直接影响吞吐量。 - Risk control account operations frequently involve
UPDATE ... WHERE user_id = ? AND status = ?. Understanding locking rules prevents lock escalation (row lock to table lock) and deadlocks caused by poor index design. Lock granularity directly impacts throughput in high-concurrency transaction scenarios.
English Answer:
- InnoDB locks index records, not abstract table rows. A Record Lock locks a specific index record. When a query precisely matches an existing record through an index, InnoDB can lock that index record. If a write or current-read query does not use an index, InnoDB may scan and lock far more records, and in practice it can behave like a table-level lock from the application's point of view.
- A Gap Lock locks the gap between index records and prevents other transactions from inserting new records into that gap. Gap locks are mainly used under the
REPEATABLE READisolation level to prevent phantom rows in current-read scenarios. Gap locks do not block ordinary reads; they block inserts into the protected range. Gap locks between transactions can coexist, but insert-intention operations conflict with them. - A Next-Key Lock is a Record Lock plus the gap before that record. It locks a left-open, right-closed interval such as
(previous record, current record]. Under RR, InnoDB's default locking unit for many range scans is Next-Key Lock, because it protects both existing records and possible new records in the range. - MySQL 8.0 locking rules can be summarized as follows. The basic unit is Next-Key Lock. Only objects accessed during the search are locked. For equality lookup on a unique index that finds a matching record, the lock can degrade to a Record Lock. For equality lookup where the final searched value does not satisfy the condition, the next-key lock may degrade to a Gap Lock. Range queries lock the scanned ranges with Next-Key Locks.
- This matters in production because bad indexes enlarge lock ranges. For example,
UPDATE ... WHERE user_id = ? AND status = ?should have a matching index; otherwise a high-concurrency risk-control table may experience unnecessary blocking or deadlocks. Deadlock analysis should inspectSHOW ENGINE INNODB STATUS, the executed SQL, and the indexes used by those SQL statements.
Q6. 如何使用 EXPLAIN 分析 SQL 执行计划?各个字段分别代表什么含义?
EN: How do you use EXPLAIN to analyze SQL execution plans? What does each field mean?
难度: ★★★★ | 出现频率: 极高(阿里、美团、字节、京东)
Key Terms: EXPLAIN (执行计划), type (访问类型), key_len (索引长度), Extra (额外信息), Using index (使用覆盖索引), Filesort (文件排序)
答案要点:
- EXPLAIN 核心字段解读:
| 字段 | 含义 | 关注点 |
|---|---|---|
type |
访问类型 | 从优到差:system > const > eq_ref > ref > range > index > ALL |
key |
实际使用的索引 | 为 NULL 表示没走索引 |
key_len |
使用索引的字节数 | 判断联合索引用了几列 |
rows |
预估扫描行数 | 越小越好 |
filtered |
过滤比例 | 存储引擎返回的行在 Server 层过滤后剩余的百分比 |
Extra |
额外信息 | 重点看 Using index / Using filesort / Using temporary / Using index condition |
- type 字段详解(重点掌握):
- const:主键/唯一索引等值查询,最多一行 - eq_ref:JOIN 中被驱动表通过主键/唯一索引等值匹配 - ref:非唯一索引等值查询 - range:索引范围扫描(BETWEEN, >, <, IN) - index:全索引扫描(比 ALL 好,但仍扫描全部索引叶节点) - ALL:全表扫描 → 必须优化
- Extra 字段关键信息:
- Using index:覆盖索引,不回表,好 - Using index condition:索引下推,较好 - Using where:Server 层过滤,需关注扫描行数 - Using filesort:无法利用索引排序,需要额外排序操作 → 需要优化 - Using temporary:使用了临时表 → 需要优化 - Using join buffer (Block Nested Loop):JOIN 没有走索引 → 需要优化
代码示例:
EXPLAIN SELECT order_no, amount
FROM trade_order
WHERE user_id = 10086
AND status = 1
ORDER BY created_at DESC
LIMIT 20;
-- 理想结果:
-- type: ref
-- key: idx_user_status_time
-- key_len: 12(user_id 8字节 + status 1字节 + NULL 标志位等)
-- Extra: Using index condition; Backward index scan
常见误区:
- ❌ 只看
key是否用了索引就够了 → ✅ 还必须关注rows和Extra,走了索引但扫描大量行可能更慢 - ❌
type = index是好结果 → ✅ 实际是全索引扫描,仍需优化 - ❌ 忽略
key_len的价值 → ✅key_len可以判断联合索引使用了前几列 - ❌ Checking only whether
keyuses an index is sufficient → ✅ You must also examinerowsandExtra— an index scan over many rows may be slower than a full scan - ❌
type = indexis a good result → ✅ It actually means a full index scan, which still needs optimization - ❌ Overlooking the value of
key_len→ ✅key_lenreveals how many columns of a composite index are actually being used
延伸追问:
type = ref和type = range的区别?IN 查询是哪种 type?- 如何判断一个联合索引被使用了前几列?
- MySQL 8.0 的
EXPLAIN ANALYZE和普通 EXPLAIN 有什么区别? - What is the difference between
type = refandtype = range? Which type does an IN query produce? - How do you determine how many leading columns of a composite index are being used?
- What is the difference between MySQL 8.0's
EXPLAIN ANALYZEand a regular EXPLAIN?
风控关联:
- 风控实时查询对延迟极其敏感(通常 P99 < 100ms),通过 EXPLAIN 分析执行计划是 SQL 调优的第一步。发现
Using filesort或Using temporary通常意味着需要优化索引设计。 - Real-time risk control queries are extremely latency-sensitive (typically P99 < 100ms). EXPLAIN analysis is the first step in SQL tuning. Discovering
Using filesortorUsing temporaryusually indicates the need for index design optimization.
English Answer:
EXPLAINshows the optimizer's execution plan for a SQL statement. I first look attype,key,key_len,rows,filtered, andExtra.typedescribes the access method, roughly from good to bad:system,const,eq_ref,ref,range,index, andALL.ALLmeans full table scan and usually needs attention.indexmeans full index scan, which is better than full table scan but still scans the whole index.keyshows the index actually chosen by the optimizer. If it isNULL, no index was used.key_lenshows how many bytes of the index are used and is useful for determining how many leading columns of a composite index were matched.rowsis the estimated number of rows scanned, andfilteredestimates the percentage remaining after Server-layer filtering.Extracontains critical hints.Using indexmeans a covering index is used and no bookmark lookup is needed.Using index conditionmeans Index Condition Pushdown is used.Using wheremeans there is Server-layer filtering, which may be acceptable or problematic depending onrows.Using filesortmeans MySQL cannot use index order and must perform an extra sort.Using temporarymeans a temporary table is used.Using join bufferoften indicates a join without a suitable index.- I do not judge a plan only by whether
keyis non-null. An index can still be bad if it scans too many rows, fails to cover the query, causes many bookmark lookups, or requires filesort and temporary tables. For MySQL 8.0,EXPLAIN ANALYZEcan compare estimates with actual runtime, which is more reliable for diagnosing optimizer misestimation. - For latency-sensitive risk-control SQL, the ideal plan usually has a selective access type such as
reforrange, a suitable composite index, lowrows, and noUsing filesortorUsing temporaryon the hot path.
Q7. 一条慢 SQL 你会如何排查和优化?完整的排查流程是什么?
EN: How would you troubleshoot and optimize a slow SQL query? What is the complete troubleshooting workflow?
难度: ★★★★★ | 出现频率: 极高(阿里、美团、字节、快手)
Key Terms: Slow Query Log (慢查询日志), EXPLAIN (执行计划), Optimizer Trace (优化器追踪), Profile (性能剖析), Deep Pagination (深分页)
答案要点:
完整排查流程:
- 发现慢查询:
```sql -- 开启慢查询日志 SET GLOBAL slow_query_log = ON; SET GLOBAL long_query_time = 1; -- 超过 1 秒记录 SET GLOBAL log_queries_not_using_indexes = ON;
-- 查看慢查询配置 SHOW VARIABLES LIKE 'slow_query%'; SHOW VARIABLES LIKE 'long_query_time';
-- 使用 mysqldumpslow 分析 `` `bash mysqldumpslow -s t -t 10 /var/lib/mysql/slow.log ``
- 分析执行计划:
``sql EXPLAIN SELECT ...; -- 重点关注:type, key, rows, Extra ``
- 查看优化器选择:
``sql -- MySQL 8.0:查看优化器 trace SET optimizer_trace = 'enabled=on'; EXPLAIN SELECT ...; SELECT * FROM information_schema.OPTIMIZER_TRACE\G SET optimizer_trace = 'enabled=off'; ``
- 排查 Profile:
``sql SET profiling = ON; SELECT ...; -- 执行慢 SQL SHOW PROFILES; SHOW PROFILE FOR QUERY 1; -- 查看各阶段耗时 ``
- 优化策略(按优先级):
- 索引优化:添加/调整索引,利用覆盖索引、索引下推 - SQL 改写:避免 SELECT *、减少子查询改 JOIN、拆分复杂查询 - 分页优化:深分页 LIMIT 1000000, 10 改用 WHERE id > last_id LIMIT 10 - 数据优化:归档历史数据、分区表 - 架构优化:引入缓存(Redis)、读写分离
深分页优化示例:
-- 慢查询(深分页,扫描 100 万行后取 10 行)
SELECT * FROM trade_order ORDER BY id LIMIT 1000000, 10;
-- 优化方案 1:游标分页(推荐)
SELECT * FROM trade_order WHERE id > 1000000 ORDER BY id LIMIT 10;
-- 优化方案 2:子查询延迟关联
SELECT t.* FROM trade_order t
INNER JOIN (SELECT id FROM trade_order ORDER BY id LIMIT 1000000, 10) tmp
ON t.id = tmp.id;
常见误区:
- ❌ 发现慢 SQL 就加索引 → ✅ 可能优化器估算错误而选择错误的索引,需要先分析执行计划
- ❌ 忽略 LIMIT 深分页的性能问题 → ✅ 深分页会扫描大量行后丢弃,应使用游标分页
- ❌ 只关注执行时间 → ✅ 还需关注锁等待时间和统计信息准确性(
ANALYZE TABLE更新) - ❌ Just add an index when you find a slow query → ✅ The optimizer may estimate incorrectly and choose the wrong index; always analyze the execution plan first
- ❌ Ignoring the performance impact of deep LIMIT pagination → ✅ Deep pagination scans and discards a large number of rows; use cursor-based pagination instead
- ❌ Focusing only on execution time → ✅ You should also check lock wait time and statistics accuracy (update with
ANALYZE TABLE)
延伸追问:
- 优化器选择了错误的索引,如何处理?(FORCE INDEX / 优化索引统计信息)
- 慢查询日志文件太大怎么办?(pt-query-digest 工具)
- 线上紧急慢 SQL 如何快速止血?(Kill 连接 + 应用层降级)
- How do you handle the optimizer choosing the wrong index? (FORCE INDEX / index statistics optimization)
- What should you do when the slow query log file grows too large? (pt-query-digest tool)
- How do you quickly mitigate an urgent slow SQL in production? (Kill connection + application-level degradation)
风控关联:
- 风控系统存在大量实时查询(按用户查交易、按设备查历史),深分页场景常见。风控平台通常集成慢查询告警,P99 延迟超标触发自动降级。掌握慢 SQL 排查是风控后端开发的基本功。
- Risk control systems have numerous real-time queries (per-user transactions, per-device history), making deep pagination common. Risk control platforms typically integrate slow query alerts, with automatic degradation triggered when P99 latency exceeds thresholds. Mastering slow SQL troubleshooting is a fundamental skill for risk control backend developers.
English Answer:
- I would start by identifying the slow SQL from evidence, not from guesswork. Enable the slow query log, set an appropriate
long_query_time, and if needed enablelog_queries_not_using_indexes. Then use tools such asmysqldumpsloworpt-query-digestto aggregate the most expensive query patterns by total time, average time, and execution count. - Next, I analyze the execution plan with
EXPLAIN, focusing ontype,key,key_len,rows,filtered, andExtra. I check whether the query uses the expected index, whether the composite index prefix is fully used, whether it scans too many rows, and whetherUsing filesort,Using temporary, or excessive bookmark lookups appear. - If the optimizer chooses an unexpected index, I inspect optimizer reasoning. In MySQL 8.0,
optimizer_tracecan show cost estimates and index selection. I also check statistics freshness and may runANALYZE TABLEif statistics are stale. If necessary,FORCE INDEXcan be a temporary mitigation, but it should not replace proper index and SQL design. - Then I locate where time is spent. Profiling or
EXPLAIN ANALYZEcan reveal whether the cost is scanning, sorting, locking, network transfer, or waiting. Lock wait time and transaction conflicts must be checked because a SQL statement can be slow due to blocking even if its plan is reasonable. - Optimization should follow a priority order. First, add or adjust indexes, especially composite and covering indexes. Second, rewrite SQL: avoid
SELECT *, reduce unnecessary subqueries, split overly complex queries, and ensure joins use indexed columns. Third, fix deep pagination by using cursor pagination such asWHERE id > last_id LIMIT 10, or delayed join when necessary. Fourth, archive cold data, partition tables, or separate hot and historical tables. Finally, consider architectural changes such as Redis caching, read-write splitting, or OLAP storage for analytical queries. - For urgent production incidents, the mitigation path may include killing the offending query, temporarily disabling a feature, adding application-level degradation, or routing traffic away. After stabilization, the root cause should still be fixed through plan analysis and regression testing.
Q8. 分库分表的方案有哪些?ShardingSphere 的架构和适用场景是什么?
EN: What are the common database sharding strategies? What is ShardingSphere's architecture and when should it be used?
难度: ★★★★★ | 出现频率: 高(阿里、美团、字节)
Key Terms: Sharding (分片), ShardingSphere (分片中间件), Sharding Key (分片键), Distributed Transaction (分布式事务), Snowflake ID (雪花算法 ID)
答案要点:
- 什么时候需要分库分表:
- 单表数据量超过 2000 万行(非绝对,与行大小有关) - 单库 QPS 超过 5000-10000 - 单表数据文件超过 20GB
- 分表策略:
| 策略 | 方式 | 适用场景 | 优缺点 |
|---|---|---|---|
| 垂直分表 | 按列拆分(热点列/大字段拆出) | 宽表优化 | 减少IO,但需JOIN |
| 水平分表 | 按行拆分(相同结构的多张表) | 大数据量 | 核心方案 |
| 垂直分库 | 按业务拆分到不同库 | 微服务 | 服务解耦 |
| 水平分库 | 相同结构的多库 | 高并发+大数据 | 核心方案 |
- 分片键(Sharding Key)选择:
- 选择高基数、查询频繁的字段(如 user_id、order_id) - 避免选择会变更的字段 - 避免选择导致数据倾斜的字段(如 status 只有几个值)
- ShardingSphere 架构:
- ShardingSphere-JDBC:客户端代理,以 JDBC Jar 包形式嵌入应用 - 优点:无中间件,性能好,无单点 - 缺点:与应用耦合,升级需重启 - ShardingSphere-Proxy:独立部署的数据库代理 - 优点:对应用透明,支持异构语言 - 缺点:多一跳网络开销,有单点风险
- 分片算法:
- 精确分片:=, IN → 哈希取模 - 范围分片:BETWEEN, >, < → 按时间或 ID 范围 - 一致性哈希:扩缩容时迁移数据少
- 分库分表后的挑战:
- 跨库 JOIN → 代码层组装 / 冗余字段 / 宽表 - 分布式事务 → Seata / 本地消息表 - 全局唯一 ID → 雪花算法 - 跨库分页 → 各分片查询后合并排序 - 聚合查询 → 各分片聚合后二次聚合
代码示例:
# application-sharding.yml
sharding:
tables:
trade_order:
actual-data-nodes: ds_${0..3}.trade_order_${0..15}
database-strategy:
standard:
sharding-column: user_id
sharding-algorithm-name: trade_order_db_mod
table-strategy:
standard:
sharding-column: user_id
sharding-algorithm-name: trade_order_table_mod
sharding-algorithms:
trade_order_db_mod:
type: MOD
props:
sharding-count: 4
trade_order_table_mod:
type: MOD
props:
sharding-count: 16
常见误区:
- ❌ 分库分表是万能方案 → ✅ 引入大量复杂度(跨库JOIN、分布式事务),非必要不分
- ❌ 按
create_time分片就能解决数据分布问题 → ✅ 可能导致热点全在最新分片,数据倾斜 - ❌ 忽略分库分表后的全局排序和聚合问题 → ✅ 跨库排序和聚合需要各分片查询后合并
- ❌ Database sharding is a universal solution → ✅ It introduces significant complexity (cross-shard JOINs, distributed transactions) — avoid it unless truly necessary
- ❌ Sharding by
create_timesolves data distribution → ✅ It may cause hotspots concentrated in the latest shard, leading to data skew - ❌ Ignoring global sorting and aggregation after sharding → ✅ Cross-shard sorting and aggregation require querying all shards and merging results
延伸追问:
- 如何实现不停机在线扩容(分片数翻倍)?
- 跨库事务如何保证一致性?对比 TCC 和 Saga 方案。
- 为什么要先尝试分区表(Partition)再考虑分库分表?
- How do you implement zero-downtime online capacity expansion (doubling shard count)?
- How do cross-shard transactions guarantee consistency? Compare TCC and Saga patterns.
- Why should you try table partitioning before resorting to database sharding?
风控关联:
- 风控系统的交易表通常以
user_id为分片键,按用户维度分片。分片后跨用户聚合查询(如全局风险统计)需要单独的聚合层或 OLAP 引擎。理解分片策略对风控查询模式的影响,是设计风控数据架构的关键。 - Risk control transaction tables typically use
user_idas the sharding key for per-user partitioning. After sharding, cross-user aggregation queries (e.g., global risk statistics) require a separate aggregation layer or OLAP engine. Understanding how sharding strategies affect risk control query patterns is critical for designing risk control data architectures.
English Answer:
- Sharding is considered when a single table or single database becomes a real bottleneck, for example a table exceeding roughly 20 million rows, a single database reaching 5,000 to 10,000 QPS, or a table file growing beyond a manageable size such as 20GB. These thresholds are not absolute; row size, query pattern, hardware, and SLA matter. Because sharding introduces major complexity, I would first evaluate indexing, archiving, partitioning, and read/write separation.
- There are four common splitting strategies. Vertical table splitting separates columns, often moving large or cold fields out of a hot table. Vertical database splitting separates business domains into different databases, which is common in microservice architecture. Horizontal table splitting divides rows into multiple tables with the same schema. Horizontal database splitting distributes rows across multiple databases to handle both data volume and write throughput.
- The sharding key is critical. It should have high cardinality, be frequently used in queries, and be stable. Fields such as
user_idororder_idare common candidates. Low-cardinality fields such asstatuscause data skew, and mutable fields are dangerous because changing them may require moving data between shards. - ShardingSphere has two main deployment modes. ShardingSphere-JDBC is embedded in the application as a client-side JDBC layer. It has good performance and no extra proxy hop, but it couples sharding logic with the application and upgrades require application releases. ShardingSphere-Proxy is an independent database proxy. It is transparent to applications and supports heterogeneous languages, but adds a network hop and must be deployed highly available.
- After sharding, many hard problems appear: cross-shard joins need application-side assembly, redundant fields, or wide tables; distributed transactions may need Seata, TCC, Saga, or local-message-table patterns; global IDs often use Snowflake-style IDs; cross-shard pagination requires each shard to query and then merge-sort; aggregate queries require per-shard aggregation followed by secondary aggregation.
- For risk-control transaction tables, sharding by
user_idsupports per-user queries well, but global risk statistics should usually go to a separate aggregation layer or OLAP system. Sharding should be designed around query patterns, not just data volume.
Q9. MySQL 主从复制原理是什么?如何实现读写分离?主从延迟如何处理?
EN: What is the principle of MySQL master-slave replication? How to implement read-write splitting? How to handle replication lag?
难度: ★★★★ | 出现频率: 高(阿里、美团、字节)
Key Terms: Binlog (二进制日志), Relay Log (中继日志), GTID (全局事务标识), Read-Write Splitting (读写分离), Semi-Sync Replication (半同步复制)
答案要点:
- 主从复制原理(基于 Binlog):
- 主库:将数据变更写入 Binary Log(Binlog) - 从库 IO Thread:连接主库,拉取 Binlog 写入本地 Relay Log - 从库 SQL Thread:读取 Relay Log,回放 SQL 语句
- Binlog 格式:
- STATEMENT:记录 SQL 语句(日志小,但某些函数如 NOW() 导致不一致) - ROW:记录行变更(日志大,但数据一致性好,推荐) - MIXED:混合模式(普通语句用 STATEMENT,不确定函数用 ROW)
- 读写分离方案:
- 代码层:应用通过注解/AOP 区分读写数据源(如 ShardingSphere 的读写分离功能) - 代理层:通过中间件(MyCat / ShardingSphere-Proxy)自动路由 - 注意:写后读一致性问题 → 强制走主库 / 因果一致性方案
- 主从延迟的常见原因与处理:
| 原因 | 处理方式 |
|---|---|
| 大事务(大批量 UPDATE/DELETE) | 拆分为小批次 |
| 从库机器性能差 | 升级从库硬件 |
| 单 SQL Thread 回放慢 | 开启多线程复制(MTS) |
| 网络延迟 | 同机房部署、使用 GTID |
| 从库执行复杂查询 | 使用专用分析从库 |
```sql -- 查看主从延迟 SHOW SLAVE STATUS\G -- Seconds_Behind_Master: 延迟秒数
-- MySQL 5.7+ 开启多线程复制 STOP SLAVE; SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'; SET GLOBAL slave_parallel_workers = 8; START SLAVE; ```
- GTID(Global Transaction ID):
- 格式:server_uuid:transaction_id(如 3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5) - 优点:自动定位复制位点,主从切换更简单 - 推荐:新部署环境都应开启 GTID
常见误区:
- ❌ 主从复制是同步的 → ✅ 默认是异步复制,存在延迟窗口
- ❌ 读写分离后写入立即可读 → ✅ 主从延迟导致可能读不到刚写入的数据
- ❌ Binlog 和 Redo Log 是同一个东西 → ✅ Binlog 用于复制(Server 层),Redo Log 用于崩溃恢复(引擎层)
- ❌ Master-slave replication is synchronous → ✅ By default it is asynchronous, with a replication lag window
- ❌ After write-read splitting, writes are immediately readable → ✅ Replication lag means you may not read the data just written
- ❌ Binlog and Redo Log are the same thing → ✅ Binlog is for replication (server layer), Redo Log is for crash recovery (engine layer)
延伸追问:
- 写后立即读如何保证读到主库数据?(方案:强制路由主库 / 等主从同步 / 缓存标记)
- 主库宕机如何快速切换?MHA / Orchestrator 的区别?
- 半同步复制(Semi-Sync)的原理和适用场景?
- How do you ensure reading your own writes after read-write splitting? (Solutions: force-route to master / wait for sync / cache marker)
- How do you quickly fail over when the master goes down? What are the differences between MHA and Orchestrator?
- What is the principle and use case for semi-synchronous replication?
风控关联:
- 风控系统对数据一致性要求高:写入风控决策后必须立即能读到,否则可能重复放行风险交易。常见方案是风控核心链路直接走主库,非实时查询走从库。主从切换的 RTO 和 RPO 直接影响风控 SLA。
- Risk control systems demand high data consistency: risk decisions must be immediately readable after writing, or risky transactions may be repeatedly approved. A common pattern is routing the risk control core path directly to the master, with non-real-time queries going to replicas. Master failover RTO and RPO directly impact risk control SLAs.
English Answer:
- MySQL replication is based on Binlog. The master writes committed data changes to the Binary Log. The replica's IO thread connects to the master, pulls Binlog events, and writes them into its local Relay Log. The replica's SQL thread, or multiple worker threads if multi-threaded replication is enabled, reads the Relay Log and replays the changes.
- Binlog has three common formats.
STATEMENTrecords SQL statements and produces smaller logs, but non-deterministic functions such asNOW()can cause inconsistency.ROWrecords row-level changes, produces larger logs, but is more accurate and is generally recommended.MIXEDuses statement logging for ordinary cases and row logging for unsafe cases. - Read-write splitting can be done in the application layer through annotations, AOP, or routing data sources, or in a proxy layer such as MyCat or ShardingSphere-Proxy. The main caveat is read-your-writes consistency. After a write, an immediate read from a replica may not see the new data because replication is usually asynchronous. Core paths can force reads to the master, wait for replication, or use a consistency marker.
- Replication lag has several causes: large transactions that take a long time to replay, underpowered replicas, single-threaded SQL replay, network latency, and heavy analytical queries running on replicas. Mitigations include splitting large updates or deletes into small batches, improving replica hardware, enabling multi-threaded replication with logical-clock parallelism, placing replicas in the same region, and separating analytical replicas from online-read replicas.
- GTID, or Global Transaction ID, identifies transactions as
server_uuid:transaction_id. It simplifies failover and replication-position management because a replica can identify exactly which transactions it has executed. New deployments should generally enable GTID. - For risk-control systems, the core decision path often reads and writes strongly related data. Those reads should usually go to the master or use a causal-consistency strategy, while non-real-time reports can use replicas. Failover RTO and RPO directly affect risk-control SLA.
Q10. 线上大表需要加字段或加索引,如何避免锁表?Online DDL 的原理是什么?
EN: How to add columns or indexes to a large production table without locking? What is the principle of Online DDL?
难度: ★★★★ | 出现频率: 中高(阿里、美团)
Key Terms: Online DDL (在线 DDL), ALGORITHM (算法), pt-online-schema-change (Percona 在线表变更工具), gh-ost (GitHub 在线表变更工具), Instant ADD Column (即时加列)
答案要点:
- DDL 的问题:
- 传统 DDL(ALTER TABLE)会锁表或长时间阻塞 DML - 大表 DDL 可能需要数小时,期间影响业务写入
- MySQL 5.6+ Online DDL:
- 支持在线执行 DDL,不阻塞 DML(大部分操作) - 原理: - 建立新临时表(新结构) - 增量拷贝旧表数据到新表,同时记录 DML 变更到 Online Log - 回放 Online Log 中的增量变更 - 最后短暂锁表(毫秒级),替换旧表
```sql -- Online DDL(默认 ALGORITHM=INPLACE, LOCK=NONE) ALTER TABLE trade_order ADD COLUMN risk_level TINYINT, ALGORITHM=INPLACE, LOCK=NONE;
-- 查看 DDL 进度(MySQL 8.0) SELECT * FROM performance_schema.setup_actors; -- 通过 performance_schema.processlist 监控 ```
- 不是所有 DDL 都支持 Online:
- 支持:加索引、加列(非主键)、改列类型(某些情况) - 不支持(仍会锁表):改主键、改列类型(某些)、ALGORITHM=COPY
- 第三方工具:
- pt-online-schema-change(Percona Toolkit) - 创建新表 → 创建触发器捕获变更 → 分批拷贝数据 → 交换表名 - 优点:控制拷贝速度,减少主库压力 - 缺点:触发器有额外开销,外键处理复杂
- gh-ost(GitHub) - 基于 Binlog 而非触发器捕获变更 - 优点:无触发器开销,可随时暂停/恢复 - 缺点:需要额外的从库或 Binlog 解析
- 生产操作建议:
- 在低峰期执行 - 先在从库执行验证 - 使用 --max-load 参数控制拷贝速度 - 提前测试 DDL 的锁类型和耗时
代码示例:
# pt-online-schema-change 示例
pt-online-schema-change \
--alter "ADD COLUMN risk_score DECIMAL(5,2)" \
--host=127.0.0.1 --port=3306 \
--user=admin --password=xxx \
D=risk_db,t=trade_order \
--max-load=Threads_running=50 \
--critical-load=Threads_running=100 \
--chunk-size=1000 \
--execute
# gh-ost 示例
gh-ost \
--host=127.0.0.1 --port=3306 \
--user=admin --password=xxx \
--database=risk_db --table=trade_order \
--alter="ADD COLUMN risk_score DECIMAL(5,2)" \
--max-load=Threads_running=50 \
--execute
常见误区:
- ❌ 所有
ALTER TABLE都是 Online DDL → ✅ 部分操作仍需 COPY 算法,会锁表 - ❌ Online DDL 完全无锁 → ✅ 最后 rename 阶段仍有极短暂锁
- ❌ 直接在大表上执行 DDL 不做测试 → ✅ 应先在从库验证,低峰期执行
- ❌ All
ALTER TABLEoperations are Online DDL → ✅ Some operations still require the COPY algorithm and will lock the table - ❌ Online DDL is completely lock-free → ✅ The final rename phase still involves a very brief lock
- ❌ Running DDL directly on a large table without testing → ✅ Always validate on a replica first and execute during low-traffic periods
延伸追问:
- 加字段和加索引的 Online DDL 有什么区别?
- Online DDL 期间如果 DML 量很大,Online Log 放不下了怎么办?
- MySQL 8.0 的 Instant ADD Column 原理是什么?什么条件下可以用?
- What is the difference between adding a column and adding an index via Online DDL?
- What happens if the Online Log fills up during a high-DML-volume Online DDL?
- How does MySQL 8.0's Instant ADD Column work? Under what conditions can it be used?
风控关联:
- 风控系统需要频繁迭代风控模型和策略,经常需要给交易表添加风控相关字段(如风险等级、决策结果)。使用 gh-ost 在线变更表结构,可以在不影响实时风控决策的前提下完成 DDL 操作。
- Risk control systems frequently iterate on models and strategies, often requiring new risk-related columns (e.g., risk level, decision result) on transaction tables. Using gh-ost for online schema changes allows DDL operations without impacting real-time risk control decisions.
English Answer:
- The problem with traditional
ALTER TABLEon a large table is that it may lock the table or block DML for a long time. A schema change on a hot transaction table can run for hours and directly affect writes, so production DDL must be planned, tested, throttled, and monitored. - MySQL 5.6+ Online DDL supports many operations without blocking normal DML for most of the operation. Conceptually, for operations that require rebuilding, MySQL creates a new structure, copies old data incrementally, records concurrent DML changes in an Online Log, replays those changes, and finally takes a short metadata lock to switch to the new structure. The final phase is brief, but not completely lock-free.
- Not all DDL operations are online. Adding secondary indexes and adding non-primary-key columns are often supported online, depending on version and exact syntax. Changing a primary key, changing some column types, or operations that require
ALGORITHM=COPYmay still rebuild and lock heavily. Therefore I would explicitly specify or verifyALGORITHMandLOCK, and test on a replica before running on production. - Third-party online schema change tools provide more control.
pt-online-schema-changecreates a shadow table, uses triggers to capture DML changes, copies data in chunks, and swaps table names. Its advantage is mature throttling; its downsides include trigger overhead and foreign-key complexity.gh-ostreads Binlog instead of using triggers, supports pause/resume, and is often friendlier for large online migrations, but it requires Binlog parsing and more operational setup. - MySQL 8.0 Instant ADD Column can add certain columns by changing metadata without rebuilding the table, under specific constraints. It is much faster, but it does not apply to every type of column change. In production, I still validate the actual algorithm and lock behavior before execution.
- My operating checklist is: run during low traffic, verify on a replica or staging copy, check whether the operation is online or instant, use throttling such as
--max-load, monitor replication lag and metadata locks, and prepare rollback or pause steps.
Q11. InnoDB 的 Redo Log、Undo Log 和 Binlog 分别是什么?WAL 机制如何保证持久性?
EN: What are InnoDB's Redo Log, Undo Log, and Binlog? How does the WAL mechanism ensure durability?
难度: ★★★★★ | 出现频率: 高(阿里、美团、字节)
Key Terms: Redo Log (重做日志), Undo Log (回滚日志), Binlog (二进制日志), WAL (预写式日志), Two-Phase Commit (两阶段提交), Checkpoint (检查点)
答案要点:
- 三种日志对比:
| 日志 | 存储位置 | 作用 | 写入时机 |
|---|---|---|---|
| Redo Log | InnoDB 引擎层 | 崩溃恢复,保证持久性(Durability) | 事务提交时写 |
| Undo Log | InnoDB 引擎层 | 事务回滚 + MVCC 版本链 | 数据修改前写 |
| Binlog | MySQL Server 层 | 主从复制 + 数据恢复 | 事务提交时写 |
- WAL(Write-Ahead Logging)机制:
- 先写日志(Redo Log),再写数据页(磁盘) - 数据页的修改先在 Buffer Pool 中完成(Dirty Page) - 通过后台线程异步刷盘(Checkpoint 机制) - 崩溃后通过 Redo Log 重放未持久化的修改
- 一条 UPDATE 语句的完整写入流程:
1. 读取数据页到 Buffer Pool(如不在内存中) 2. 写 Undo Log(记录旧值,用于回滚和 MVCC) 3. 更新 Buffer Pool 中的数据页(标记为 Dirty Page) 4. 写 Redo Log(prepare 阶段) 5. 写 Binlog 6. 写 Redo Log(commit 阶段)→ 两阶段提交 7. 返回客户端成功
- 两阶段提交(2PC):
- 保证 Redo Log 和 Binlog 的一致性 - 如果只有 Redo Log 没有 Binlog → 主从数据不一致 - 如果只有 Binlog 没有 Redo Log → 崩溃恢复后数据丢失 - 通过 prepare → 写 Binlog → commit 三个步骤保证
- Redo Log 关键参数:
```sql -- 查看 Redo Log 配置 SHOW VARIABLES LIKE 'innodb_log_file%'; -- 文件大小 SHOW VARIABLES LIKE 'innodb_log_files_in_group'; -- 文件数量 SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; -- 刷盘策略
-- innodb_flush_log_at_trx_commit 取值: -- 0:每秒刷盘(最多丢 1 秒数据) -- 1:每次提交刷盘(最安全,默认) -- 2:每次提交写 OS cache,每秒 fsync(折中) ```
常见误区:
- ❌ Redo Log 和 Binlog 是同一个东西 → ✅ Redo Log 在引擎层用于崩溃恢复,Binlog 在 Server 层用于复制
- ❌ 事务提交后数据已经写到磁盘 → ✅ 实际可能还在 Buffer Pool(Dirty Page),由 Redo Log 保证持久性
- ❌ 忽略
sync_binlog参数对数据安全的影响 → ✅sync_binlog = 0可能导致 Binlog 丢失 - ❌ Redo Log and Binlog are the same thing → ✅ Redo Log is at the engine layer for crash recovery; Binlog is at the server layer for replication
- ❌ After transaction commit, data is already on disk → ✅ It may still be in the Buffer Pool as a Dirty Page; durability is guaranteed by the Redo Log
- ❌ Ignoring the impact of
sync_binlogon data safety → ✅sync_binlog = 0may cause Binlog loss on crash
延伸追问:
innodb_flush_log_at_trx_commit = 2和sync_binlog = 0的组合安全吗?- Redo Log 写满了会怎样?(触发 Checkpoint,刷脏页)
- 为什么 Binlog 不能替代 Redo Log?(Binlog 是逻辑日志,恢复效率低且不支持随机写入)
- Is the combination of
innodb_flush_log_at_trx_commit = 2andsync_binlog = 0safe? - What happens when the Redo Log fills up? (Triggers Checkpoint, flushes dirty pages)
- Why can't Binlog replace Redo Log? (Binlog is a logical log with lower recovery efficiency and no random write support)
风控关联:
- 风控系统对数据持久性要求极高(交易风控决策不可丢失),生产环境必须设置
innodb_flush_log_at_trx_commit = 1和sync_binlog = 1。理解 WAL 机制有助于评估风控数据丢失风险和主从切换的 RPO。 - Risk control systems demand extremely high data durability (transaction risk decisions must not be lost). Production environments must set
innodb_flush_log_at_trx_commit = 1andsync_binlog = 1. Understanding WAL helps evaluate the risk of data loss and the RPO during master failover.
English Answer:
- Redo Log, Undo Log, and Binlog solve different problems. Redo Log is an InnoDB engine-layer physical log used for crash recovery and durability. Undo Log is also in the engine layer and stores old versions for transaction rollback and MVCC version chains. Binlog is a MySQL Server-layer logical log used for replication and point-in-time recovery.
- WAL means Write-Ahead Logging: write the log before flushing the data page. InnoDB modifies pages in the Buffer Pool first, marking them as dirty pages. Those dirty pages are flushed to disk later by background checkpoint mechanisms. If MySQL crashes before the dirty pages are flushed, InnoDB uses Redo Log to replay changes and recover committed data.
- A typical
UPDATEflow is: read the page into the Buffer Pool if needed; write Undo Log to preserve the old value; modify the page in the Buffer Pool and mark it dirty; write Redo Log in the prepare phase; write Binlog; write Redo Log commit; then return success to the client. Undo supports rollback and snapshot reads, while Redo guarantees crash durability. - Two-phase commit between Redo Log and Binlog is necessary to keep crash recovery and replication consistent. If Redo exists but Binlog does not, the master may recover a change that replicas never receive. If Binlog exists but Redo does not, replicas may apply a change that the master loses after crash recovery. The prepare -> Binlog -> commit sequence handles these crash windows.
- Important durability parameters include
innodb_flush_log_at_trx_commitandsync_binlog.innodb_flush_log_at_trx_commit = 1flushes Redo Log on every commit and is safest.sync_binlog = 1flushes Binlog on every commit. For risk-control decisions that must not be lost, production should prefer these safer settings, accepting the performance cost. - Binlog cannot simply replace Redo Log because Binlog is logical, generated at the Server layer, and not optimized for page-level crash recovery. Redo Log is physical and circular, designed for efficient recovery of dirty pages.
Q12. MySQL 中的 COUNT 查询为什么慢?COUNT(*)、COUNT(1)、COUNT(列名) 有什么区别?
EN: Why is COUNT query slow in MySQL? What are the differences between COUNT(*), COUNT(1), and COUNT(column)?
难度: ★★★ | 出现频率: 中高(美团、京东)
Key Terms: COUNT(*) (统计总行数), COUNT(1) (等价 COUNT(*)), MVCC (多版本并发控制), Secondary Index (二级索引), Parallel Query (并行查询)
答案要点:
- 三种 COUNT 的区别:
| 语句 | 含义 | 是否统计 NULL |
|---|---|---|
COUNT(*) |
统计总行数 | 是(NULL 行也算) |
COUNT(1) |
等价于 COUNT(*) |
是 |
COUNT(列名) |
统计该列非 NULL 的行数 | 否(NULL 行不计) |
- COUNT(*) 和 COUNT(1) 完全等价,InnoDB 内部实现一样 - InnoDB 不会缓存总行数(不像 MyISAM),因为 MVCC 下不同事务看到的行数不同
- 为什么 COUNT 慢:
- InnoDB 的 COUNT 需要遍历索引或数据行,因为不同事务看到的版本不同 - 优化器会选择最小的二级索引来遍历(减少 IO) - 如果没有二级索引,则遍历聚簇索引(全表扫描)
- 优化方案:
```sql -- 方案 1:使用覆盖索引减少遍历量 CREATE INDEX idx_status ON trade_order(status); SELECT COUNT(*) FROM trade_order WHERE status = 1; -- → Extra: Using index
-- 方案 2:维护计数表(推荐,高并发场景) CREATE TABLE table_counts ( table_name VARCHAR(64) PRIMARY KEY, row_count BIGINT NOT NULL ); -- 通过触发器或应用层维护
-- 方案 3:使用 Redis 缓存计数 -- SET trade_order:count 10000000 -- 适合精度要求不高的场景
-- 方案 4:估算(不需要精确值) EXPLAIN SELECT COUNT(*) FROM trade_order; -- 从 rows 字段获取估算值 ```
- MySQL 8.0 并行查询:
``sql -- 设置并行查询线程数 SET SESSION innodb_parallel_read_threads = 4; SELECT COUNT(*) FROM trade_order; -- 并行扫描 ``
常见误区:
- ❌
COUNT(1)比COUNT(*)快 → ✅ 两者完全等价,InnoDB 内部实现一样 - ❌ MyISAM 的
COUNT(*)和 InnoDB 一样快 → ✅ MyISAM 有行数缓存,InnoDB 没有 - ❌ 加索引就能解决所有 COUNT 性能问题 → ✅ 仍需遍历索引叶节点,大数据量下依然慢
- ❌
COUNT(1)is faster thanCOUNT(*)→ ✅ They are completely equivalent; InnoDB implements them identically internally - ❌ MyISAM's
COUNT(*)is as fast as InnoDB's → ✅ MyISAM caches row counts; InnoDB does not - ❌ Adding an index solves all COUNT performance issues → ✅ It still requires traversing index leaf nodes; COUNT remains slow on large datasets
延伸追问:
COUNT(DISTINCT col)的执行计划是什么样的?- 如何实现精确的实时在线用户数统计?
- 为什么 InnoDB 不能像 MyISAM 一样缓存行数?
- What does the execution plan for
COUNT(DISTINCT col)look like? - How would you implement accurate real-time online user count tracking?
- Why can't InnoDB cache row counts like MyISAM does?
风控关联:
- 风控系统需要统计在线交易量、异常交易数等实时指标。直接
COUNT(*)在大表上性能不可接受,通常通过维护计数表或使用 Redis 实时统计,配合定时任务校正偏差。 - Risk control systems need to track real-time metrics such as online transaction volume and anomaly counts. Direct
COUNT(*)on large tables is unacceptable for performance; the usual approach is maintaining a count table or using Redis for real-time statistics, combined with periodic tasks to correct drift.
English Answer:
COUNT(*)andCOUNT(1)are equivalent in InnoDB. Both count all rows visible to the current transaction, including rows where some columns areNULL.COUNT(column)is different: it counts only rows where that specific column is notNULL.- InnoDB does not cache an exact total row count the way MyISAM can, because MVCC means different transactions may see different visible row versions. One transaction's snapshot may include rows that another transaction cannot see, so a single global cached count would be incorrect.
COUNT(*)can be slow because InnoDB must traverse an index to count visible rows. The optimizer typically chooses the smallest available secondary index because scanning a narrow index costs less IO than scanning the clustered index. If there is no suitable secondary index, it may need to scan the clustered index.- Optimization depends on the query. If there is a
WHEREcondition, create a covering index that supports the filter and count path, such as an index onstatusforCOUNT(*) WHERE status = 1. For high-concurrency exact counters, maintain a separate count table and update it transactionally or through reliable application logic. For approximate or real-time dashboard values, Redis counters or periodic correction jobs may be better. If exactness is not required, estimates fromEXPLAINor metadata may be sufficient. - MySQL 8.0 can use parallel read threads in some scenarios, but it does not change the fundamental fact that exact counting over large visible datasets requires scanning. Therefore direct
COUNT(*)on a hot large risk-control transaction table should not be on the critical request path.
关联
- [[Redis]] — 缓存层与 MySQL 的配合、缓存一致性、分布式锁
- [[分布式系统]] — 分布式事务、CAP 与数据一致性、分库分表后的全局问题
- [[../../06-Risk-Control/wiki/特征平台|特征平台]] — 特征存储的索引设计、实时特征查询的 SQL 优化