Key | Value |
---|---|
Operator version | selectdb/doris.k8s-operator:1.6.0 |
Chart version | 1.6.0 |
FE image version | selectdb/doris.fe-ubuntu:2.1.5 |
BE image version | selectdb/doris.be-ubuntu:2.1.5 |
Since around 22:00 on 2024-09-03, the BE has been restarting repeatedly and cannot start. The most recent `kubectl logs` output is:
RuntimeLogger I20240904 09:30:34.215209 2607 workload_group_manager.cpp:144] [topic_publish_wg]finish clear unused workload group, time cost: 0ms, deleted group size:0, before wg size=1, after wg size=1
RuntimeLogger I20240904 09:30:34.215219 2607 topic_subscriber.cpp:50] [topic_publish]finish handle topic WORKLOAD_GROUP
RuntimeLogger I20240904 09:30:34.215229 2607 topic_subscriber.cpp:46] [topic_publish]begin handle topic WORKLOAD_SCHED_POLICY, size=0
RuntimeLogger I20240904 09:30:34.215238 2607 workload_sched_policy_listener.cpp:79] [workload_schedule]finish update workload schedule policy, size=0
RuntimeLogger I20240904 09:30:34.215246 2607 topic_subscriber.cpp:50] [topic_publish]finish handle topic WORKLOAD_SCHED_POLICY
RuntimeLogger I20240904 09:30:36.217680 1568 wal_manager.cpp:481] Scheduled(every 10s) WAL info: [/opt/apache-doris/be/storage/wal: limit 154314736435 Bytes, used 0 Bytes, estimated wal bytes 0 Bytes, available 154314736435 Bytes.];
RuntimeLogger I20240904 09:30:36.941766 1672 storage_engine.cpp:1086] start to delete unused rowset, size: 5
RuntimeLogger I20240904 09:30:36.941839 1672 storage_engine.cpp:1120] collected 5 unused rowsets to remove, skipped 0 rowsets due to use count > 1, skipped 0 rowsets due to don't need to delete file, skipped 0 rowsets due to delayed expired timestamp.
RuntimeLogger I20240904 09:30:36.941855 1672 beta_rowset.cpp:217] deleting /opt/apache-doris/be/storage/data/595/525327/2114729018/02000000001a3783274c10f262f8d6410f7bc32de70de99f_0.dat
RuntimeLogger I20240904 09:30:36.945242 1672 beta_rowset.cpp:217] deleting /opt/apache-doris/be/storage/data/579/525263/2114729018/02000000001a3636274c10f262f8d6410f7bc32de70de99f_0.dat
RuntimeLogger I20240904 09:30:36.949038 1672 beta_rowset.cpp:217] deleting /opt/apache-doris/be/storage/data/585/525287/2114729018/02000000001a36c5274c10f262f8d6410f7bc32de70de99f_0.dat
RuntimeLogger I20240904 09:30:36.952517 1672 beta_rowset.cpp:217] deleting /opt/apache-doris/be/storage/data/581/525271/2114729018/02000000001a36b3274c10f262f8d6410f7bc32de70de99f_0.dat
RuntimeLogger I20240904 09:30:36.957330 1672 storage_engine.cpp:1136] removed all collected unused rowsets
RuntimeLogger I20240904 09:30:44.096683 1679 compaction.cpp:350] start cumulative compaction. tablet=537648, output_version=[5226-5230], permits: 5
terminate called after throwing an instance of 'std::out_of_range'
what(): _Map_base::at
*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 537648 ***
*** Aborted at 1725442244 (unix time) try "date -d @1725442244" if you are using GNU date ***
*** Current BE git commitID: d5a02e095d ***
*** SIGABRT unknown detail explain (@0x471) received by PID 1137 (TID 1679 OR 0x7fccb6d1d640) from PID 1137; stack trace: ***
RuntimeLogger I20240904 09:30:45.606619 2478 daemon.cpp:220] os physical memory 61.44 GB. process memory used 1.75 GB(= 1.89 GB[vm/rss] - 146.52 MB[tc/jemalloc_cache] + 0[reserved] + 0B[waiting_refresh]), limit 55.29 GB, soft limit 49.76 GB. sys available memory 58.57 GB(= 58.57 GB[proc/available] - 0[reserved] - 0B[waiting_refresh]), low water mark 3.07 GB, warning water mark 6.14 GB.
RuntimeLogger I20240904 09:30:46.234295 1568 wal_manager.cpp:481] Scheduled(every 10s) WAL info: [/opt/apache-doris/be/storage/wal: limit 154329987072 Bytes, used 0 Bytes, estimated wal bytes 0 Bytes, available 154329987072 Bytes.];
RuntimeLogger I20240904 09:30:46.993786 1705 olap_server.cpp:1172] cooldown producer get tablet num: 1
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
1# 0x00007FCECBF09520 in /lib/x86_64-linux-gnu/libc.so.6
2# pthread_kill in /lib/x86_64-linux-gnu/libc.so.6
3# raise in /lib/x86_64-linux-gnu/libc.so.6
4# abort in /lib/x86_64-linux-gnu/libc.so.6
5# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
6# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
7# 0x000055F7E9D91A31 in /opt/apache-doris/be/lib/doris_be
8# 0x000055F7E9D91B84 in /opt/apache-doris/be/lib/doris_be
9# std::__throw_out_of_range(char const*) at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:86
10# doris::TabletSchema::column_by_uid(int) const at /home/zcp/repo_center/doris_release/doris/be/src/olap/tablet_schema.cpp:1245
11# doris::Compaction::construct_output_rowset_writer(doris::RowsetWriterContext&, bool) at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:887
12# doris::Compaction::do_compaction_impl(long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:355
13# doris::Compaction::do_compaction(long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:136
14# doris::CumulativeCompaction::execute_compact_impl() at /home/zcp/repo_center/doris_release/doris/be/src/olap/cumulative_compaction.cpp:79
15# doris::Compaction::execute_compact() at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:118
16# doris::Tablet::execute_compaction(doris::Compaction&) at /home/zcp/repo_center/doris_release/doris/be/src/olap/tablet.cpp:2053
17# std::_Function_handler<void (), doris::StorageEngine::_submit_compaction_task(std::shared_ptr<doris::Tablet>, doris::CompactionType, bool)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
18# doris::ThreadPool::dispatch_thread() in /opt/apache-doris/be/lib/doris_be
19# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:499
20# 0x00007FCECBF5BAC3 in /lib/x86_64-linux-gnu/libc.so.6
21# 0x00007FCECBFED850 in /lib/x86_64-linux-gnu/libc.so.6
/opt/apache-doris/be/bin/start_be.sh: line 378: 1137 Aborted (core dumped) ${LIMIT:+${LIMIT}} "${DORIS_HOME}/lib/doris_be" "$@" 2>&1 < /dev/null
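Someone in the official community group suspected that the CPU does not support AVX1. I then switched to an AMD64 machine that does support it, but the problem persists.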
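
To make it easier to compare restarts, here is a minimal sketch that pulls the key fields out of the crash banner in the log above (exception type, tablet id, abort time, BE commit). The function name `parse_crash_banner` and the regular expressions are my own assumptions based only on the lines shown in this log; they are not part of Doris and may need adjusting for other versions.

```python
import re
from datetime import datetime, timezone

# Excerpt copied verbatim from the crash banner in the log above.
CRASH_BANNER = """\
terminate called after throwing an instance of 'std::out_of_range'
what(): _Map_base::at
*** tablet id: 537648 ***
*** Aborted at 1725442244 (unix time) try "date -d @1725442244" if you are using GNU date ***
*** Current BE git commitID: d5a02e095d ***
"""


def parse_crash_banner(text):
    """Extract exception type, tablet id, abort time and commit from a crash banner.

    Hypothetical helper: the patterns only reflect the banner format seen in this log.
    """
    patterns = {
        "exception": r"throwing an instance of '([^']+)'",
        "tablet_id": r"\*\*\* tablet id: (\d+) \*\*\*",
        "aborted_at": r"\*\*\* Aborted at (\d+) \(unix time\)",
        "commit": r"git commitID: (\w+)",
    }
    info = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        info[key] = match.group(1) if match else None
    if info["aborted_at"] is not None:
        # Convert epoch seconds to an aware UTC datetime.
        info["aborted_at"] = datetime.fromtimestamp(
            int(info["aborted_at"]), tz=timezone.utc
        )
    return info


info = parse_crash_banner(CRASH_BANNER)
print(info["exception"], info["tablet_id"], info["aborted_at"].isoformat())
```

For this banner the abort time converts to 09:30:44 UTC on 2024-09-04, and the tablet id is 537648, the same tablet named in the `start cumulative compaction` line logged at 09:30:44 just before the abort.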