evening breeze / Taizo
1415 字
7 分钟
MySQL窗口函数
学习目标
- 掌握窗口函数的语法与使用场景
- 熟悉排名/偏移/聚合类窗口函数(ROW_NUMBER、RANK、LAG、LEAD、SUM OVER等)
- 理解窗口帧(FRAME)的含义与性能影响
- 能使用窗口函数解决常见业务需求(排名、同环比、累计值、分组TopN)
学习计划
- 窗口函数基础与语法
- 排名类函数:ROW_NUMBER / RANK / DENSE_RANK / NTILE
- 偏移类函数:LAG / LEAD / FIRST_VALUE / LAST_VALUE / NTH_VALUE
- 聚合窗口:SUM / AVG / COUNT / MAX / MIN OVER
- 窗口帧(FRAME)与ROWS/RANGE区别
- 实战示例:电商销售场景
- 注意事项与性能优化
- 练习题与扩展
1. 窗口函数基础与语法
- 窗口函数定义:在不折叠行的前提下,对一组相关行进行计算并将结果返回到每一行。
- 与普通聚合的区别:聚合函数(GROUP BY)会聚合成更少的行;窗口函数保留每一行。
- 基本语法:
<window_function>() OVER ( [PARTITION BY <expr_list>] [ORDER BY <order_list>] [<frame_clause>])- 位置限制:仅能用于SELECT与ORDER BY子句,不能直接用于WHERE,需要借助子查询/CTE。
2. 排名类函数
2.1 ROW_NUMBER
为分区内的每行分配唯一且连续的序号。
SELECT user_id, amount, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rkFROM sales;2.2 RANK
同分并列,排名会跳跃(1,2,2,4)。
SELECT user_id, amount, RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rkFROM sales;2.3 DENSE_RANK
同分并列但不跳跃(1,2,2,3)。
SELECT user_id, amount, DENSE_RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rkFROM sales;2.4 NTILE(n)
将分区内的行尽量均匀地分到n个桶中。
SELECT user_id, amount, NTILE(4) OVER (PARTITION BY user_id ORDER BY amount DESC) AS bucketFROM sales;3. 偏移类函数
3.1 LAG/LEAD
- LAG:取当前行之前第k行的值(默认k=1)
- LEAD:取当前行之后第k行的值
SELECT user_id, order_date, amount, LAG(amount, 1, 0) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_amount, LEAD(amount, 1, 0) OVER (PARTITION BY user_id ORDER BY order_date) AS next_amountFROM sales;3.2 FIRST_VALUE / LAST_VALUE
取分区内(或指定帧)首/尾值。注意默认帧会随ORDER BY移动,常与显式帧一起使用。
SELECT user_id, order_date, amount, FIRST_VALUE(amount) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS first_amt, LAST_VALUE(amount) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_amtFROM sales;3.3 NTH_VALUE
取分区内第n个值(随帧而变)。
SELECT user_id, order_date, amount, NTH_VALUE(amount, 3) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS third_amtFROM sales;4. 聚合窗口
在窗口上执行SUM/AVG/COUNT/MAX/MIN等,常用于累计、滑动窗口。
4.1 累计值(running total)
SELECT user_id, order_date, amount, SUM(amount) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_amountFROM sales;4.2 滑动窗口(近N行)
SELECT user_id, order_date, amount, AVG(amount) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS avg_3_rowsFROM sales;4.3 同比/环比(与上一期比较)
SELECT user_id, order_month, amount, LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month) AS prev_amount, amount - LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month) AS mom_diff, CASE WHEN LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month) = 0 THEN NULL ELSE ROUND( (amount - LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month)) / LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month) * 100, 2) END AS mom_rateFROM monthly_sales;5. 窗口帧(FRAME)与ROWS/RANGE
- ROWS:按物理行计数,严格依赖行数。
- RANGE:按排序键的逻辑范围(相同排序值为同一范围)。对数值/日期等更直观,但在MySQL中常见实现限制较多。
- 常见帧写法:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(从分区首到当前行)ROWS BETWEEN N PRECEDING AND CURRENT ROW(近N行含当前行)ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING(整个分区)
- 注意:未显式指定帧时,很多函数会采用默认帧(受ORDER BY影响),在FIRST_VALUE/LAST_VALUE等场景建议显式帧以避免“尾值漂移”。
6. 实战示例:电商销售
6.1 示例表结构
-- 销售明细CREATE TABLE sales ( order_id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL, order_date DATE NOT NULL, amount DECIMAL(10,2) NOT NULL);6.2 需求1:用户内订单金额排名并取Top3
WITH ranked AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rk FROM sales)SELECT * FROM ranked WHERE rk <= 3;6.3 需求2:计算用户累计消费与上/下单金额
SELECT user_id, order_date, amount, SUM(amount) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cum_amount, LAG(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_amount, LEAD(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS next_amountFROM sales;6.4 需求3:按月环比增长率
SELECT user_id, order_month, amount, LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month) AS prev_amount, ROUND( CASE WHEN LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month) = 0 THEN NULL ELSE (amount - LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month)) / LAG(amount) OVER (PARTITION BY user_id ORDER BY order_month) * 100 END, 2) AS mom_rateFROM monthly_sales;7. 注意事项与性能优化
- 不能用于WHERE:将窗口计算放在子查询/CTE,再在外层过滤。
- 索引优化:为
PARTITION BY与ORDER BY涉及的列建立合适索引,减少排序/临时表开销。 - 帧大小:
UNBOUNDED帧可能扫描大量数据;滑动窗口选择合适N。 - 内存与临时表:大窗口计算可能落磁盘;关注
tmp_table_size、max_heap_table_size。 - 版本要求:需MySQL 8.0+(5.7不支持窗口函数)。
- 与聚合混用:先窗口后聚合或先聚合后窗口,注意语义与行数变化。
8. 练习题与扩展
- 计算每位用户最近3单的平均客单价。
- 统计各品类Top5订单,并给出每单相对该品类均值的差值。
- 在同一查询中同时返回累计金额、上/下单金额、分位桶(NTILE(4))。
- 尝试用窗口函数重写传统子查询实现的“分组取TopN”。
总结
- 窗口函数在保留明细行的同时进行分组内计算,是现代SQL分析的核心能力。
- 掌握排名、偏移、聚合与帧控制,可以覆盖大多数业务分析需求。
- 重视索引与帧控制,避免无界窗口造成的性能问题。