我的需求是查询源表为debtor_id, channel_distribution, order_no相同的分配一个相同的序号order_id,SQL如下:
insert overwrite table target_data(
id, order_id, debtor_id, channel_distribution, order_no, goods_no, sales_qty
)
select
row_number() over (order by t.debtor_id, t.channel_distribution, t.order_no, t.goods_no) as id,
dense_rank() over (order by t.debtor_id, t.channel_distribution, t.order_no) as order_id,
t.debtor_id, t.channel_distribution, t.order_no, t.goods_no, sales_qty
from source_data
源表source_data数据量有700多万,执行sql插入target_data后再验证数据的时候发行dense_rank排序不太对,相同debtor_id, channel_distribution, order_no的数据有多个order_id。而且每次重新插入,多个order_id分组的都在变。
select debtor_id, order_no, channel_distribution, group_concat(distinct cast(order_id as string)) as order_ids from target_data
group by debtor_id, order_no, channel_distribution
having count(distinct order_id) > 1