row_number函数,使用rn=1过滤后,数据字段值发生改变

Viewed 83

执行sql内层res表查询 结果如下

在res表外加上rn=1后 select 不变 结果如下

相同sql在spark执行 数据字段值不会发生改变

sql如下:

select * from (
			SELECT
				t.lz_claim_date,
				t.zy_claim_date,
				t.lz_flag,
				t.id,
				t.borrow_id,
				sum( capital_amount ) over ( PARTITION BY t.borrow_id ) AS capital_amount,
				0 AS rate_amount,
				round( sum( bb.capital_amount ) over ( PARTITION BY t.borrow_id ) * 10.6 / 36500 * 3, 2 ) AS overdue_amount,
				row_number() over ( PARTITION BY t.lz_claim_date, t.zy_claim_date, t.lz_flag, t.id, t.borrow_id order by 1) rn 
			FROM
				(
				SELECT
					z.lz_claim_date,
					z.zy_claim_date,
					z.lz_flag,
					min( bill_id ) AS id,
					borrow_id 
				FROM
					tmp.js_detail_zyxj_dc_bill_p0 z 
				WHERE
					z.lz_flag = 2 
				GROUP BY
					z.lz_claim_date,
					z.zy_claim_date,
					z.lz_flag,
					borrow_id 
				) t
				LEFT JOIN ods.lt_lt_borrow_bill_new bb ON bb.id > t.id 
				AND bb.borrow_id = t.borrow_id 
			WHERE
				bb.id IS NOT NULL  and borrow_id = 2138376
			) res 
		WHERE
			res.rn = 1
3 Answers

因为你有SUM OVER,可能是关联翻倍了,然后用row_number over,过滤掉了一条,所以就没翻倍了。
我认为是这个地方的问题:

FROM (
SELECT xxx
group by XXX
)
LEFT JOIN xxx
这个地方,同一个id出现了两条数,
你可以把row_number 改成rank,看看是不是就是950了。

这么写的话结果是正常的:
select * from (select *,row_number() over ( PARTITION BY res.lz_claim_date, res.zy_claim_date, res.lz_flag, res.id, res.borrow_id ) rn from (
SELECT
t.lz_claim_date,
t.zy_claim_date,
t.lz_flag,
t.id,
t.borrow_id,
sum( capital_amount ) over ( PARTITION BY t.borrow_id ) AS capital_amount,
0 AS rate_amount,
round( sum( bb.capital_amount ) over ( PARTITION BY t.borrow_id ) * 10.6 / 36500 * 3, 2 ) AS overdue_amount
FROM
(
SELECT
z.lz_claim_date,
z.zy_claim_date,
z.lz_flag,
min( bill_id ) AS id,
borrow_id
FROM
dm.js_detail_zyxj_dc_bill_p0 z
WHERE
z.lz_flag = 2
GROUP BY
z.lz_claim_date,
z.zy_claim_date,
z.lz_flag,
borrow_id
) t
LEFT JOIN ods.lt_lt_borrow_bill_new bb ON bb.id > t.id
AND bb.borrow_id = t.borrow_id
WHERE
bb.id IS NOT NULL and t.borrow_id = 2138376
) res) res1
WHERE
res1.rn = 1
结果图:
image.png

这条sql本身是存在语义歧义的,所以出现不被期望的值是正常的,其实这个中间结果并不属于同一粒度的数据。所以Doris的计算结果没有问题,问题来自sql本身,感谢社区同学的宝贵建议