After a job is submitted from the Doris web UI and the cg receives it, the cg pod goes into CrashLoopBackOff


The compute group pod log shows the following error:

RuntimeLogger I20241211 17:08:54.730472   862 pipeline_fragment_context.cpp:259] PipelineFragmentContext::prepare|query_id=cf418ead4f7b484c-8d77c76f4425694f|fragment_id=1|pthread_id=140254051419712
RuntimeLogger I20241211 17:08:54.730751   862 pipeline_fragment_context.cpp:259] PipelineFragmentContext::prepare|query_id=cf418ead4f7b484c-8d77c76f4425694f|fragment_id=0|pthread_id=140254051419712
RuntimeLogger I20241211 17:08:54.732961   331 pipeline_task.cpp:386] no workload group for query cf418ead4f7b484c-8d77c76f4425694f
RuntimeLogger I20241211 17:08:54.732975   344 vtablet_writer.cpp:128] init new node for instance 0, incremantal:0
RuntimeLogger I20241211 17:08:54.733075   344 dns_cache.cpp:61] update hostname bnq-prod-disaggregated-cluster-cg1-1.bnq-prod-disaggregated-cluster-cg1.default.svc.cluster.local's ip to 10.189.12.47
RuntimeLogger I20241211 17:08:54.733232   331 fragment_mgr.cpp:644] Removing query cf418ead4f7b484c-8d77c76f4425694f instance cf418ead4f7b484c-8d77c76f44256950
RuntimeLogger F20241211 17:08:54.733673   864 storage_engine.cpp:120] Check failed: _type == Type::CLOUD ( vs. )
*** Check failure stack trace: ***
    @     0x55b26d141366  google::LogMessage::SendToLog()
    @     0x55b26d13ddb0  google::LogMessage::Flush()
    @     0x55b26d141ba9  google::LogMessageFatal::~LogMessageFatal()
    @     0x55b262f2200a  doris::BaseStorageEngine::to_cloud()
    @     0x55b26319c1f7  doris::LoadChannel::open()
    @     0x55b263196f00  doris::LoadChannelMgr::open()
    @     0x55b2632e2c4d  std::_Function_handler<>::_M_invoke()
    @     0x55b2632fec4b  doris::WorkThreadPool<>::work_thread()
    @     0x55b270059da0  execute_native_thread_routine
    @     0x7f90fad35ac3  (unknown)
    @     0x7f90fadc7850  (unknown)
    @              (nil)  (unknown)
*** Query id: cf418ead4f7b484c-8d77c76f4425694f ***
*** is nereids: 0 ***
*** tablet id: 0 ***
*** Aborted at 1733908134 (unix time) try "date -d @1733908134" if you are using GNU date ***
*** Current BE git commitID: c21b9f5bce ***
*** SIGABRT unknown detail explain (@0xa3) received by PID 163 (TID 864 OR 0x7f8f6fe9a640) from PID 163; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
 1# 0x00007F90FACE3520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill in /lib/x86_64-linux-gnu/libc.so.6
 3# raise in /lib/x86_64-linux-gnu/libc.so.6
 4# abort in /lib/x86_64-linux-gnu/libc.so.6
 5# 0x000055B26D14BC3D in /opt/apache-doris/be/lib/doris_be
 6# 0x000055B26D13E27A in /opt/apache-doris/be/lib/doris_be
 7# google::LogMessage::SendToLog() in /opt/apache-doris/be/lib/doris_be
 8# google::LogMessage::Flush() in /opt/apache-doris/be/lib/doris_be
 9# google::LogMessageFatal::~LogMessageFatal() in /opt/apache-doris/be/lib/doris_be
10# doris::BaseStorageEngine::to_cloud() in /opt/apache-doris/be/lib/doris_be
11# doris::LoadChannel::open(doris::PTabletWriterOpenRequest const&) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/load_channel.cpp:118
12# doris::LoadChannelMgr::open(doris::PTabletWriterOpenRequest const&) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/load_channel_mgr.cpp:104
13# std::_Function_handler<void (), doris::PInternalService::tablet_writer_open(google::protobuf::RpcController*, doris::PTabletWriterOpenRequest const*, doris::PTabletWriterOpenResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
14# doris::WorkThreadPool<false>::work_thread(int) at /home/zcp/repo_center/doris_release/doris/be/src/util/work_thread_pool.hpp:159
15# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
16# 0x00007F90FAD35AC3 in /lib/x86_64-linux-gnu/libc.so.6
17# 0x00007F90FADC7850 in /lib/x86_64-linux-gnu/libc.so.6

/opt/apache-doris/be/bin/start_be.sh: line 429:   163 Aborted                 (core dumped) ${LIMIT:+${LIMIT}} "${DORIS_HOME}/lib/doris_be" "$@" 2>&1 < /dev/null
4 Answers

I have the same problem: every INSERT INTO operation fails with the error below:
RuntimeLogger F20241209 05:42:16.291368 862 storage_engine.cpp:120] Check failed: _type == Type::CLOUD ( vs. )
*** Check failure stack trace: ***
@ 0x5587f5bfe366 google::LogMessage::SendToLog()
@ 0x5587f5bfadb0 google::LogMessage::Flush()
@ 0x5587f5bfeba9 google::LogMessageFatal::~LogMessageFatal()
@ 0x5587eb9df00a doris::BaseStorageEngine::to_cloud()
@ 0x5587ebc591f7 doris::LoadChannel::open()
@ 0x5587ebc53f00 doris::LoadChannelMgr::open()
@ 0x5587ebd9fc4d std::_Function_handler<>::_M_invoke()
@ 0x5587ebdbbc4b doris::WorkThreadPool<>::work_thread()
@ 0x5587f8b16da0 execute_native_thread_routine
@ 0x7fca9ec0aac3 (unknown)
@ 0x7fca9ec9c850 (unknown)
@ (nil) (unknown)

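I tried adding this setting to be.conf: deploy_mode = cloud
After that the problem no longer occurred. You could give it a try.
However, the example in the documentation at https://doris.apache.org/zh-CN/docs/3.0/install/cluster-deployment/k8s-deploy/compute-storage-decoupled/config-cg does not emphasize that this setting is required.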
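
As a rough sketch of the check described above, the following Python snippet parses the text of a be.conf file and reports whether deploy_mode is set to cloud. The helper names (parse_conf, is_cloud_mode) and the sample contents are hypothetical and only for illustration; it assumes the file uses simple `key = value` lines with `#` comments.

```python
def parse_conf(text):
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def is_cloud_mode(text):
    """Return True if deploy_mode is explicitly set to cloud."""
    return parse_conf(text).get("deploy_mode", "").lower() == "cloud"


# Hypothetical sample contents; a real be.conf will contain other settings.
sample_missing = "# be.conf\nbe_port = 9060\n"
sample_fixed = sample_missing + "deploy_mode = cloud\n"

print(is_cloud_mode(sample_missing))
print(is_cloud_mode(sample_fixed))
```

To check a real file, read its contents and pass the text to is_cloud_mode; where be.conf lives inside the pod depends on the deployment.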

Look up this SQL in the audit log by its query ID, cf418ead4f7b484c-8d77c76f4425694f, and check whether it reproduces reliably.
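
One way to do that lookup is to scan the audit log for lines containing the query ID. This is a minimal sketch, assuming the audit log is a plain-text file with one record per line; the function name and the sample lines are made up for illustration and are not real Doris output.

```python
def find_query_lines(lines, query_id):
    """Return the lines that mention the given query ID."""
    return [line.rstrip("\n") for line in lines if query_id in line]


query_id = "cf418ead4f7b484c-8d77c76f4425694f"

# Made-up sample records; real audit log lines have a different layout.
sample_log = [
    "record A query=00000000-0000 stmt=select 1\n",
    f"record B query={query_id} stmt=insert into ...\n",
]

for hit in find_query_lines(sample_log, query_id):
    print(hit)
```

For a real file, an open file object can be passed in place of sample_log, since the function only iterates over lines.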

I found it. I ran it twice, and each time the cg pod crashed with a similar error.

ize=1, after wg size=1
RuntimeLogger I20241211 18:12:06.616665  1480 topic_subscriber.cpp:50] [topic_publish]finish handle topic WORKLOAD_GROUP
RuntimeLogger I20241211 18:12:06.616669  1480 topic_subscriber.cpp:46] [topic_publish]begin handle topic WORKLOAD_SCHED_POLICY, size=0
RuntimeLogger I20241211 18:12:06.616672  1480 workload_sched_policy_listener.cpp:79] [workload_schedule]finish update workload schedule policy, size=0
RuntimeLogger I20241211 18:12:06.616676  1480 topic_subscriber.cpp:50] [topic_publish]finish handle topic WORKLOAD_SCHED_POLICY
RuntimeLogger I20241211 18:12:07.033452  1363 daemon.cpp:240] os physical memory 16.00 GB. process memory used 1.03 GB(= 1.12 GB[vm/rss] - 85.20 MB[tc/jemalloc_cache] + 0[reserved] + 0B[waiting_refresh]), limit 14.40 GB, soft limit 12.96 GB. sys available memory 15.28 GB(= 15.28 GB[proc/available] - 0[reserved] - 0B[waiting_refresh]), low water mark 819.20 MB, warning water mark 1.60 GB.
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
 1# 0x00007FABE5382520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill in /lib/x86_64-linux-gnu/libc.so.6
 3# raise in /lib/x86_64-linux-gnu/libc.so.6
 4# abort in /lib/x86_64-linux-gnu/libc.so.6
 5# 0x0000555908D92C3D in /opt/apache-doris/be/lib/doris_be
 6# 0x0000555908D8527A in /opt/apache-doris/be/lib/doris_be
 7# google::LogMessage::SendToLog() in /opt/apache-doris/be/lib/doris_be
 8# google::LogMessage::Flush() in /opt/apache-doris/be/lib/doris_be
 9# google::LogMessageFatal::~LogMessageFatal() in /opt/apache-doris/be/lib/doris_be
10# doris::BaseStorageEngine::to_cloud() in /opt/apache-doris/be/lib/doris_be
11# doris::LoadChannel::open(doris::PTabletWriterOpenRequest const&) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/load_channel.cpp:118
12# doris::LoadChannelMgr::open(doris::PTabletWriterOpenRequest const&) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/load_channel_mgr.cpp:104
13# std::_Function_handler<void (), doris::PInternalService::tablet_writer_open(google::protobuf::RpcController*, doris::PTabletWriterOpenRequest const*, doris::PTabletWriterOpenResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
14# doris::WorkThreadPool<false>::work_thread(int) at /home/zcp/repo_center/doris_release/doris/be/src/util/work_thread_pool.hpp:159
15# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
16# 0x00007FABE53D4AC3 in /lib/x86_64-linux-gnu/libc.so.6
17# 0x00007FABE5466850 in /lib/x86_64-linux-gnu/libc.so.6

/opt/apache-doris/be/bin/start_be.sh: line 429:   164 Aborted                 (core dumped) ${LIMIT:+${LIMIT}} "${DORIS_HOME}/lib/doris_be" "$@" 2>&1 < /dev/null