dense_rank函数排序问题

Viewed 4

我的需求是查询源表为debtor_id, channel_distribution, order_no相同的分配一个相同的序号order_id,SQL如下:

insert overwrite table target_data(
    id, order_id, debtor_id, channel_distribution, order_no, goods_no, sales_qty
)
select
    row_number() over (order by t.debtor_id, t.channel_distribution, t.order_no, t.goods_no) as id,
    dense_rank() over (order by t.debtor_id, t.channel_distribution, t.order_no) as order_id,
    t.debtor_id, t.channel_distribution, t.order_no, t.goods_no, sales_qty
from source_data

源表source_data数据量有700多万,执行sql插入target_data后再验证数据的时候发行dense_rank排序不太对,相同debtor_id, channel_distribution, order_no的数据有多个order_id。而且每次重新插入,多个order_id分组的都在变。

select debtor_id, order_no, channel_distribution, group_concat(distinct cast(order_id as string)) as order_ids from target_data
group by debtor_id, order_no, channel_distribution
having count(distinct order_id) > 1

1 Answers

按照这个描述来看是查询结果不一致是吧,这个能提供一个最小复现成本的case吗?
或者提供下查询结果不一致的对比的 profile 和 explain。我们进一步分析下的。