All BE processes in the cluster disappeared at the same time


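Cluster configuration: 5 nodes (64 cores, 384 GB memory). BE: 5, FE: 3 (co-located with the BEs).
Version: 2.0.3
While the cluster was under light load, all 5 BEs went offline at the same time.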
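be.out shows no anomalies for that time window.
Below is the filtered be.INFO output. On every node the log shows RPC timeouts when connecting to the other nodes, but it is unclear why all of the BEs went offline.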

W0408 10:52:38.007452 1575172 status.h:393] meet error status: [INTERNAL_ERROR]RuntimeFilter::join_rpc meet rpc error, msg=[E1008]Reached timeout=1000ms @10.188.29.204:8060.

	0#  doris::IRuntimeFilter::join_rpc() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
	1#  doris::VRuntimeFilterSlots::finish_publish() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
	2#  doris::vectorized::HashJoinNode::~HashJoinNode() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:335
	3#  doris::ObjectPool::add<doris::vectorized::HashJoinNode>(doris::vectorized::HashJoinNode*)::{lambda(void*)#1}::__invoke(void*) at /root/src/doris-2.0/be/src/common/object_pool.h:40
	4#  doris::RuntimeState::~RuntimeState() at /root/src/doris-2.0/be/src/common/object_pool.h:0
	5#  doris::pipeline::PipelineFragmentContext::~PipelineFragmentContext() at /root/src/doris-2.0/be/src/runtime/runtime_state.h:58
	6#  doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/atomicity.h:98
	7#  doris::pipeline::TaskScheduler::_do_work(unsigned long) at /root/src/doris-2.0/be/src/pipeline/task_scheduler.cpp:0
	8#  doris::ThreadPool::dispatch_thread() at /root/src/doris-2.0/be/src/util/threadpool.cpp:0
	9#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
	10# start_thread
	11# __clone
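
To check whether the timeouts point at one peer or pile up in a particular minute, the warnings in be.INFO can be tallied per target address. This is a minimal sketch in Python, assuming every relevant line has the same shape as the warning above (glog prefix, then `[E1008]Reached timeout=...ms @ip:port`). The function name `count_rpc_timeouts` and the regular expression are hypothetical helpers written for this post, not part of Doris.

```python
import re
from collections import Counter

# Assumed line shape, copied from the warning above:
# W0408 10:52:38.007452 1575172 status.h:393] ... msg=[E1008]Reached timeout=1000ms @10.188.29.204:8060.
TIMEOUT_RE = re.compile(
    r"^W(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ \d+ \S+\] "
    r".*\[E1008\]Reached timeout=(?P<timeout_ms>\d+)ms @(?P<peer>[\d.]+:\d+)"
)


def count_rpc_timeouts(lines):
    """Count E1008 timeout warnings per (MMDD HH:MM, peer address)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            # Truncate the timestamp to the minute so bursts are easy to spot.
            minute = f"{m['date']} {m['time'][:5]}"
            counts[(minute, m["peer"])] += 1
    return counts
```

It can be run against a log file with `count_rpc_timeouts(open("be.INFO", errors="replace"))`. Comparing the result across the five nodes shows whether each BE timed out against all of its peers or only against some of them.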