[Solved] A memory leak bug in Doris 2.1.2 rc04


Version: Doris 2.1.2 rc4
Symptom:
In a workload that runs both Stream Load and queries, Doris memory usage spikes periodically, as shown below:
[Figure: Doris memory usage over time, showing the periodic spikes]
Every spike is accompanied by the following errors:

I20240626 10:47:40.612373 64650 topic_subscriber.cpp:43] begin handle topic info
I20240626 10:47:40.612512 64650 workload_group_listener.cpp:56] update workload group finish, tg info=TG[id = 1, name = normal, cpu_share = 1024, memory_limit = 2.12 GB, enable_memory_overcommit = true, version = 0, cpu_hard_limit = -1, scan_thread_num = 48, max_remote_scan_thread_num = 48, min_remote_scan_thread_num = 48, spill_low_watermark=50, spill_high_watermark=80], enable_cpu_hard_limit=false, cgroup cpu_shares=0, cgroup cpu_hard_limit=0, enable_cgroup_cpu_soft_limit=true, cgroup home path=
I20240626 10:47:40.612573 64650 cgroup_cpu_ctl.cpp:32] doris cgroup cpu path is not specify, path=
I20240626 10:47:40.612600 64650 workload_group_manager.cpp:123] init workload group mgr cpu ctl failed, [INTERNAL_ERROR]doris cgroup cpu path  is not specify.
I20240626 10:47:40.612641 64650 workload_group_manager.cpp:134] finish clear unused workload group, time cost: 0ms, deleted group size:0
I20240626 10:47:40.612660 64650 topic_subscriber.cpp:48] handle topic WORKLOAD_GROUP successfully
I20240626 10:47:40.612695 64650 workload_sched_policy_listener.cpp:73] [workload_schedule]finish update workload schedule policy, size=0
I20240626 10:47:40.612726 64650 topic_subscriber.cpp:48] handle topic WORKLOAD_SCHED_POLICY successfully
I20240626 10:47:41.303283  2128 memtable_memory_limiter.cpp:224] reached hard limit, process mem: 5.96 GB (without allocator cache: 5.47 GB), load mem: 0, memtable writers num: 0 (active: 0, write: 0, flush: 0)
I20240626 10:47:42.306810  2128 memtable_memory_limiter.cpp:224] reached hard limit, process mem: 5.96 GB (without allocator cache: 5.47 GB), load mem: 0, memtable writers num: 0 (active: 0, write: 0, flush: 0)
I20240626 10:47:43.307247  2128 memtable_memory_limiter.cpp:224] reached hard limit, process mem: 5.96 GB (without allocator cache: 5.47 GB), load mem: 0, memtable writers num: 0 (active: 0, write: 0, flush: 0)
I20240626 10:47:44.147195  1668 fold_constant_executor.cpp:79] fold_query_id: eddc138cd1be42c5-b33b676eba24f832
I20240626 10:47:44.147658  1668 fold_constant_executor.cpp:134] finish fold_query_id: eddc138cd1be42c5-b33b676eba24f832
I20240626 10:47:44.150179  1679 fold_constant_executor.cpp:79] fold_query_id: eddc138cd1be42c5-b33b676eba24f832
W20240626 10:47:44.150436  1679 function.h:279] function return type check failed, function_name=substring, expect_return_type=String, real_return_type=Nullable(String), input_arguments=String, Int32, Int32
W20240626 10:47:44.150539  1679 status.h:380] meet error status: [INTERNAL_ERROR]Function substring get failed, expr is VectorizedFnCall[substring](arguments=String, Int32, Int32,return=String) and return type is String.
	0#  doris::vectorized::VectorizedFnCall::prepare(doris::RuntimeState*, doris::RowDescriptor const&, doris::vectorized::VExprContext*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
	1#  doris::vectorized::VExprContext::prepare(doris::RuntimeState*, doris::RowDescriptor const&) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:345
	2#  doris::Status doris::FoldConstantExecutor::_prepare_and_open<doris::vectorized::VExprContext>(doris::vectorized::VExprContext*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:335
	3#  doris::FoldConstantExecutor::fold_constant_vexpr(doris::TFoldConstantParams const&, doris::PConstantExprResult*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
	4#  doris::PInternalServiceImpl::_fold_constant_expr(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PConstantExprResult*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
	5#  std::_Function_handler<void (), doris::PInternalServiceImpl::fold_constant_expr(google::protobuf::RpcController*, doris::PConstantExprRequest const*, doris::PConstantExprResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:1332
	6#  doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
	7#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
	8#  ?
	9#  ?
W20240626 10:47:44.150632  1679 internal_service.cpp:1351] exec fold constant expr failed, errmsg=[INTERNAL_ERROR]Function substring get failed, expr is VectorizedFnCall[substring](arguments=String, Int32, Int32,return=String) and return type is String.
	0#  doris::vectorized::VectorizedFnCall::prepare(doris::RuntimeState*, doris::RowDescriptor const&, doris::vectorized::VExprContext*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
	1#  doris::vectorized::VExprContext::prepare(doris::RuntimeState*, doris::RowDescriptor const&) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:345
	2#  doris::Status doris::FoldConstantExecutor::_prepare_and_open<doris::vectorized::VExprContext>(doris::vectorized::VExprContext*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:335
	3#  doris::FoldConstantExecutor::fold_constant_vexpr(doris::TFoldConstantParams const&, doris::PConstantExprResult*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
	4#  doris::PInternalServiceImpl::_fold_constant_expr(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PConstantExprResult*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
	5#  std::_Function_handler<void (), doris::PInternalServiceImpl::fold_constant_expr(google::protobuf::RpcController*, doris::PConstantExprRequest const*, doris::PConstantExprResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:1332
	6#  doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
	7#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
	8#  ?
	9#  ?
 .and query_id_is: eddc138cd1be42c5-b33b676eba24f832
I20240626 10:47:44.152595  1651 fold_constant_executor.cpp:79] fold_query_id: eddc138cd1be42c5-b33b676eba24f832
W20240626 10:47:44.152791  1651 function.h:279] function return type check failed, function_name=substring, expect_return_type=String, real_return_type=Nullable(String), input_arguments=String, Int32, Int32
W20240626 10:47:44.153013  1651 status.h:380] meet error status: [INTERNAL_ERROR]Function substring get failed, expr is VectorizedFnCall[substring](arguments=String, Int32, Int32,return=String) and return type is String.
	0#  doris::vectorized::VectorizedFnCall::prepare(doris::RuntimeState*, doris::RowDescriptor const&, doris::vectorized::VExprContext*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
	1#  doris::vectorized::VExprContext::prepare(doris::RuntimeState*, doris::RowDescriptor const&) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:345
	2#  doris::Status doris::FoldConstantExecutor::_prepare_and_open<doris::vectorized::VExprContext>(doris::vectorized::VExprContext*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:335
	3#  doris::FoldConstantExecutor::fold_constant_vexpr(doris::TFoldConstantParams const&, doris::PConstantExprResult*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
	4#  doris::PInternalServiceImpl::_fold_constant_expr(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PConstantExprResult*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:449
	5#  std::_Function_handler<void (), doris::PInternalServiceImpl::fold_constant_expr(google::protobuf::RpcController*, doris::PConstantExprRequest const*, doris::PConstantExprResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:1332
	6#  doris::WorkThreadPool<false>::work_thread(int) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
	7#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
	8#  ?
	9#  ?
W20240626 10:47:44.153098  1651 internal_service.cpp:1351] exec fold constant expr failed, errmsg=[INTERNAL_ERROR]Function substring get failed, expr is VectorizedFnCall[substring](arguments=String, Int32, Int32,return=String) and return type is String.

The error message:
status.h:380] meet error status: [INTERNAL_ERROR]Function substring get failed, expr is VectorizedFnCall[substring](arguments=String, Int32, Int32,return=String) and return type is String.
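The function.h:279 warning that precedes this error shows the mismatch behind it: the expression arrives planned as substring(String, Int32, Int32) returning String, while the BE derives Nullable(String), so VectorizedFnCall::prepare rejects it. The sketch below models only that comparison. SimpleType, be_derived_substring_type and check_return_type are hypothetical names, and the rule "substring always yields Nullable(String)" is an assumption read off the log, not Doris's actual type-derivation code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SimpleType:
    """Hypothetical, simplified type descriptor (not Doris's DataType classes)."""
    name: str
    nullable: bool = False

    def __str__(self) -> str:
        return f"Nullable({self.name})" if self.nullable else self.name


def be_derived_substring_type(arg_types: List[SimpleType]) -> SimpleType:
    # Assumption: the BE derives Nullable(String) for substring regardless of
    # the argument types, matching real_return_type in the log.
    return SimpleType("String", nullable=True)


def check_return_type(fn: str, expected: SimpleType, args: List[SimpleType]) -> Optional[str]:
    """Return an error message if the planned type differs from the BE-derived one."""
    real = be_derived_substring_type(args)
    if real == expected:
        return None
    return (f"function return type check failed, function_name={fn}, "
            f"expect_return_type={expected}, real_return_type={real}, "
            f"input_arguments={', '.join(str(a) for a in args)}")


args = [SimpleType("String"), SimpleType("Int32"), SimpleType("Int32")]
print(check_return_type("substring", SimpleType("String"), args))
```

With these inputs the printed message has the same text as the function.h:279 warning; planning the call as Nullable(String) makes the check pass.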

Following the error message, it is raised at line 73 of https://github.com/apache/doris/blob/master/be/src/runtime/fold_constant_executor.cpp, during constant folding.
The core failing method is PInternalService::_fold_constant_expr: https://github.com/apache/doris/blob/2c3a96ed97bef5c9653424fdc61b7b40ff8119a3/be/src/service/internal_service.cpp#L1468
The error is raised in VectorizedFnCall::prepare: https://github.com/apache/doris/blob/master/be/src/vec/exprs/vectorized_fn_call.cpp#L60
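For context, constant folding evaluates a sub-expression whose inputs are all literals once, at planning time, and replaces it with its result. Here the folding request reaches the BE over RPC (the fold_constant_expr frames in the stack trace) and fails inside VectorizedFnCall::prepare before anything is evaluated. The toy folder below shows only the rewrite itself; the Literal/Column/Call classes and the simplified substring (1-based, positive start only) are illustrative assumptions, not Doris code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Literal:
    value: object


@dataclass
class Column:
    name: str


@dataclass
class Call:
    fn: str
    args: List["Expr"]


Expr = Union[Literal, Column, Call]


def _substring(s: str, pos: int, length: int) -> str:
    # Simplified semantics (assumption): 1-based start, positive pos only.
    return s[pos - 1: pos - 1 + length]


FUNCTIONS: Dict[str, Callable[..., object]] = {"substring": _substring}


def fold_constants(expr: Expr) -> Expr:
    """Replace every call whose arguments are all literals with its computed value."""
    if not isinstance(expr, Call):
        return expr
    args = [fold_constants(a) for a in expr.args]
    if all(isinstance(a, Literal) for a in args):
        return Literal(FUNCTIONS[expr.fn](*(a.value for a in args)))
    return Call(expr.fn, args)


print(fold_constants(Call("substring", [Literal("doris-2.1.2"), Literal(1), Literal(5)])))
# Literal(value='doris')
```

A call that references a column, such as substring(k1, 1, 5), is left in place and evaluated per row at execution time instead.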

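My guess is that Stream Load or certain queries trigger a memory-management bug in the constant-folding path, causing a memory leak; after repeated leaks the BE crashes and restarts. Could the Doris team please take a look?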
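The hypothesis above describes a general failure mode: memory accounted on entry to a request but released only on the success path, so every failed request leaves some behind. The sketch below illustrates that pattern and its usual fix; it is not Doris's code and does not confirm that this is the actual cause. MemTracker, prepare, fold_leaky, fold_safe and tracked are hypothetical names, and the fix shown releases in a finally block via contextlib.contextmanager.

```python
from contextlib import contextmanager
from typing import Iterator


class MemTracker:
    """Hypothetical memory-accounting counter (not Doris's MemTracker API)."""

    def __init__(self) -> None:
        self.consumed = 0

    def consume(self, n: int) -> None:
        self.consumed += n

    def release(self, n: int) -> None:
        self.consumed -= n


def prepare(expr_ok: bool) -> None:
    """Hypothetical prepare step that fails the way the log shows."""
    if not expr_ok:
        raise RuntimeError("[INTERNAL_ERROR]Function substring get failed")


def fold_leaky(tracker: MemTracker, expr_ok: bool) -> bool:
    tracker.consume(1024)  # account for per-request state
    try:
        prepare(expr_ok)
    except RuntimeError:
        return False  # early return: the 1024 bytes are never released
    tracker.release(1024)
    return True


@contextmanager
def tracked(tracker: MemTracker, n: int) -> Iterator[None]:
    tracker.consume(n)
    try:
        yield
    finally:
        tracker.release(n)  # runs on success, early return and exceptions alike


def fold_safe(tracker: MemTracker, expr_ok: bool) -> bool:
    with tracked(tracker, 1024):
        try:
            prepare(expr_ok)
        except RuntimeError:
            return False
    return True


leaky, safe = MemTracker(), MemTracker()
for _ in range(100):  # 100 failing fold requests
    fold_leaky(leaky, expr_ok=False)
    fold_safe(safe, expr_ok=False)
print(leaky.consumed, safe.consumed)  # 102400 0
```

Tying the release to scope exit rather than to the success path is what keeps repeated failures from accumulating.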
