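Background: today we upgraded the cluster from 2.0 to 2.1. After the upgrade, the BE nodes frequently restart on their own. Our investigation indicates that certain complex queries over nested views (a view that selects from another view) cause the BE to hit a SIGSEGV, which shuts the node down.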
Test statement:
SELECT
DISTINCT 'ec' AS `ec`,
`a`.`sku` AS `seller_sku`,
'US' AS `marketplace_id`,
`a`.`product_title` AS `fnsku英文标题`
FROM
`default_cluster:hyy`.`view_walmart_all_order` a
INNER JOIN (
SELECT
`sku` AS `sku`,
max(`workdate`) AS `workdate`
FROM
`default_cluster:hyy`.`view_walmart_all_order`
GROUP BY
`sku`) a1 ON
`a`.`sku` = `a1`.`sku`
AND `a`.`workdate` = `a1`.`workdate`
BE.out from the node that was killed:
*** Query id: 1345f2f0c91e472a-a2cc9c491f993037 ***
*** tablet id: 0 ***
*** Aborted at 1711352493 (unix time) try "date -d @1711352493" if you are using GNU date ***
*** Current BE git commitID: 91efb6a43d ***
*** SIGSEGV unknown detail explain (@0x0) received by PID 891071 (TID 891325 OR 0x7f79d19f7640) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:417
1# 0x00007F7B0800042F in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
3# 0x00007F7B07FF90FC in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
4# 0x00007F7B0C5DB520 in /lib/x86_64-linux-gnu/libc.so.6
5# doris::vectorized::VExprContext::execute(doris::vectorized::Block*, int*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exprs/vexpr_context.cpp:50
6# doris::pipeline::JoinProbeLocalState<doris::pipeline::HashJoinSharedState, doris::pipeline::HashJoinProbeLocalState>::_build_output_block(doris::vectorized::Block*, doris::vectorized::Block*, bool) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/exec/join_probe_operator.cpp:127
7# doris::pipeline::HashJoinProbeLocalState::filter_data_and_build_output(doris::RuntimeState*, doris::vectorized::Block*, bool*, doris::vectorized::Block*, bool) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:433
8# doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:364
9# doris::pipeline::StatefulOperatorX<doris::pipeline::HashJoinProbeLocalState>::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/pipeline_x/operator.cpp:459
10# doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/pipeline_x/operator.cpp:210
11# doris::pipeline::StatefulOperatorX<doris::pipeline::DistinctStreamingAggLocalState>::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/pipeline_x/operator.cpp:444
12# doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/pipeline_x/operator.cpp:210
13# doris::pipeline::PipelineXTask::execute(bool*) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/pipeline_x/pipeline_x_task.cpp:274
14# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /home/zcp/repo_center/doris_release/doris/be/src/pipeline/task_scheduler.cpp:334
15# doris::ThreadPool::dispatch_thread() in /opt/apache-doris/be/lib/doris_be
16# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:499
17# 0x00007F7B0C62DAC3 in /lib/x86_64-linux-gnu/libc.so.6
18# 0x00007F7B0C6BF850 in /lib/x86_64-linux-gnu/libc.so.6
This problem did not occur on 2.0. How should we troubleshoot and fix it? Thank you!