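Cluster configuration: 5 nodes (64 cores, 384 GB memory). BE: 5, FE: 3 (co-located with the BEs).
Version: 2.0.3
While cluster load was low, all 5 BEs went offline at the same time.
The be.out log shows nothing abnormal for that time window.
Below is the filtered be.INFO output. Every node reports RPC timeouts when connecting to the other nodes, but it is unclear why all of them went offline.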
W0408 10:52:38.007452 1575172 status.h:393] meet error status: [INTERNAL_ERROR]RuntimeFilter::join_rpc meet rpc error, msg=[E1008]Reached timeout=1000ms @10.188.29.204:8060.
0# doris::IRuntimeFilter::join_rpc() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
1# doris::VRuntimeFilterSlots::finish_publish() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
2# doris::vectorized::HashJoinNode::~HashJoinNode() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:335
3# doris::ObjectPool::add<doris::vectorized::HashJoinNode>(doris::vectorized::HashJoinNode*)::{lambda(void*)#1}::__invoke(void*) at /root/src/doris-2.0/be/src/common/object_pool.h:40
4# doris::RuntimeState::~RuntimeState() at /root/src/doris-2.0/be/src/common/object_pool.h:0
5# doris::pipeline::PipelineFragmentContext::~PipelineFragmentContext() at /root/src/doris-2.0/be/src/runtime/runtime_state.h:58
6# doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/atomicity.h:98
7# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /root/src/doris-2.0/be/src/pipeline/task_scheduler.cpp:0
8# doris::ThreadPool::dispatch_thread() at /root/src/doris-2.0/be/src/util/threadpool.cpp:0
9# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
10# start_thread
11# __clone
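
To narrow this down, it may help to check whether the timeouts all point at one BE or are spread across every node. Below is a minimal Python sketch, assuming the be.INFO lines carry the same `[E1008]Reached timeout=...ms @ip:port` fragment as the warning above. The function name `count_timeout_targets` is hypothetical and not part of Doris.

```python
import re
from collections import Counter

# Matches the timeout fragment seen in the warning line above, e.g.
# "[E1008]Reached timeout=1000ms @10.188.29.204:8060"
TIMEOUT_RE = re.compile(r"\[E1008\]Reached timeout=(\d+)ms @(\d+\.\d+\.\d+\.\d+):(\d+)")


def count_timeout_targets(lines):
    """Count timeout log lines per target 'ip:port' (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            _timeout_ms, ip, port = match.groups()
            counts[f"{ip}:{port}"] += 1
    return counts


# Example with the warning line from this post.
sample = (
    "W0408 10:52:38.007452 1575172 status.h:393] meet error status: "
    "[INTERNAL_ERROR]RuntimeFilter::join_rpc meet rpc error, "
    "msg=[E1008]Reached timeout=1000ms @10.188.29.204:8060."
)
for target, hits in count_timeout_targets([sample]).most_common():
    print(target, hits)
```

Running the same function over the lines of each node's be.INFO would show whether one address dominates the timeouts or whether every BE is timing out against every other BE.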