关于单机模式的join问题

Viewed 41

你好,关于3.0x版本,doris进行单机部署,join时会出现broadcast join,这种join对于单机来说影响性能,我想关闭broadcast join,设置参数auto_broadcast_join_threshold=-1,后重新创建会话,性能没有提升,查看了一下执行计划,仍然使用的broadcast join。
当我了解使用hint方式后,进行了强制走shuffle。修改后的sql为:
select
sum(IFNULL(arej.reject_qty,0)) as sum_reject_qty,
sum( (arr.rec_qty + IFNULL(arej.reject_qty,0)) ) as sum_arrival_qty,
sum( arr.rec_qty ) as sum_rec_qty ,
sum( truncate(arr.rec_qty / arr.big_pack_qty,0) ) as sum_whole_piece_qty,
sum( arr.rec_qty % arr.big_pack_qty ) as sum_scattered_qty
from ib_tt_asn a
inner join [shuffle] ib_tt_asn_dt ad on ad.asn_id = a.id
left join [shuffle] order_tt_in_order_extend ioe on a.in_order_id = ioe.in_order_id
inner join [shuffle] (select asn_id,
asn_dt_id,
sku_id,
batch_no,
production_batch,
production_date,
invalid_date,
sterilization_batch,
sterilization_date,
big_pack_qty,
sum(rec_qty) as rec_qty,
max(rec_user_name) as rec_user_name,
max(rec_time) as rec_time,
max(ship_temp) as ship_temp,
max(arrive_temp) as arrive_temp
from ib_tt_asn_rec_record
group by asn_id,asn_dt_id,sku_id,batch_no,production_batch,production_date,invalid_date,sterilization_batch,
sterilization_date,big_pack_qty) as arr on arr.asn_id = ad.asn_id and arr.asn_dt_id = ad.id
inner join [shuffle] base_tm_sku bts on ad.sku_id = bts.id
left join [shuffle] base_tm_sku_gsp tsg on tsg.sku_id = bts.id
left join [shuffle] base_tm_mfg mfg on mfg.id = bts.mfg_id
left join [shuffle] base_tm_mfg fbzqy on fbzqy.id = bts.pack_factory_id
left join [shuffle] base_tm_sku_category tsc on tsc.id = bts.sku_category_id
left join [shuffle] base_tm_sku_category big on big.id = tsc.big_category_id
left join [shuffle] base_tm_dic_dt ysfs on ysfs.company_code = a.company_code and ysfs.dic_code = a.arrive_type and
ysfs.dic_type_code = 'SH_YSFS'
left join [shuffle] base_tm_dic_dt ysgj on ysgj.company_code = a.company_code and ysgj.dic_code = a.ship_tool and
ysgj.dic_type_code = 'SH_YSGJ'
left join [shuffle] (select arej.asn_id,sku_id,
arej.production_batch,
arej.production_date,
arej.invalid_date,
sum(arej.reject_qty) reject_qty,
max(jsyy.dic_name) as reject_reason
from ib_tt_asn_reject_record arej
left join [shuffle] base_tm_dic_dt jsyy on jsyy.company_code = arej.company_code and jsyy.dic_code = arej.reject_reason
and jsyy.dic_type_code = 'SH_JSYY'
group by asn_id,sku_id,production_batch,production_date,invalid_date) arej on arej.asn_id = arr.asn_id and
arej.sku_id = arr.sku_id and arej.production_batch = arr.production_batch and arej.production_date =
arr.production_date and arej.invalid_date = arr.invalid_date
这种方式确实提升了不少性能。但当部署在应用中。druid数据源解析会报错。
关于这种问题,请问一下doris是否可以彻底关闭broadcast join。实在没有的话请问强制走shuffle是否有其他方式。感觉回答。

2 Answers

auto_broadcast_join_threshold参数已经废弃了。
如果当前使用3.0版本的doris的话,建议使用set broadcast_row_count_limit=-1;尝试彻底关闭broadcast join。

你可以看看profile,是不是因为没指定策略的时候,plan的时间比较长。