[Solved] BE crashed for an unknown reason


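A BE node crashed today. According to be.out, the crash happened while executing Query id: c77e277b1e774a9b-b48384b97de2f858. Could someone help me figure out the cause?
Version: 2.0.3
Stack trace from be.out: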
start time: Mon Mar 25 08:37:26 UTC 2024
INFO: java_cmd /data/doris/java8/bin/java
INFO: jdk_version 8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/doris/be/lib/java_extensions/preload-extensions/preload-extensions-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/doris/be/lib/java_extensions/java-udf/java-udf-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/doris/be/lib/hadoop_hdfs/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /data/doris/be/lib/hadoop_hdfs/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
*** Query id: c77e277b1e774a9b-b48384b97de2f858 ***
*** tablet id: 0 ***
*** Aborted at 1715475700 (unix time) try "date -d @1715475700" if you are using GNU date ***
*** Current BE git commitID: 37d31a5 ***
*** SIGSEGV address not mapped to object (@0x10) received by PID 29812 (TID 30181 OR 0x7fa9c1655700) from PID 16; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/src/doris-2.0/be/src/common/signal_handler.h:417
1# os::Linux::chained_handler(int, siginfo*, void*) in /data/doris/java8/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /data/doris/java8/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo*, void*) in /data/doris/java8/jre/lib/amd64/server/libjvm.so
4# 0x00007FAAAE010D10 in /lib64/libc.so.6
5# doris::VersionGraph::capture_consistent_versions(doris::Version const&, std::vector<doris::Version, std::allocator >) const at /root/src/doris-2.0/be/src/olap/version_graph.cpp:591
6# doris::TimestampedVersionTracker::capture_consistent_versions(doris::Version const&, std::vector<doris::Version, std::allocator >) const at /root/src/doris-2.0/be/src/olap/version_graph.cpp:330
7# doris::Tablet::capture_consistent_versions(doris::Version const&, std::vector<doris::Version, std::allocator >, bool) const at /root/src/doris-2.0/be/src/olap/tablet.cpp:858
8# doris::Tablet::capture_rs_readers(doris::Version const&, std::vector<doris::RowSetSplits, std::allocator >) const at /root/src/doris-2.0/be/src/olap/tablet.cpp:958
9# doris::vectorized::NewOlapScanner::init() at /root/src/doris-2.0/be/src/vec/exec/scan/new_olap_scanner.cpp:190
10# doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr) at /root/src/doris-2.0/be/src/vec/exec/scan/scanner_scheduler.cpp:338
11# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::{lambda()#3}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
12# doris::WorkThreadPool::work_thread(int) at /root/src/doris-2.0/be/src/util/work_thread_pool.hpp:160
13# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
14# start_thread in /lib64/libpthread.so.0
15# clone in /lib64/libc.so.6

fe.audit.log:
2024-05-12 01:02:10,027 [query] |Client=xxx:58748|User=datasync|Db=biz_chatting|State=ERR|ErrorCode=1105|ErrorMessage=errCode = 2, detailMessage = There exists unhealthy backend. backend 10004 is down|Time(ms)=30006|ScanBytes=0|ScanRows=0|ReturnRows=0|StmtId=12849419|QueryId=c77e277b1e774a9b-b48384b97de2f858|IsQuery=false|isNereids=false|feIp=xxxx|Stmt=delete from active_broadcaster_daily_statistic where create_time >= UNIX_TIMESTAMP('2024-05-11') * 1000 and create_time < (UNIX_TIMESTAMP('2024-05-11') + 86400) * 1000;|CpuTimeMS=0|SqlHash=12f2fb980a621594d4fa9e5059e11855|peakMemoryBytes=0|SqlDigest=|TraceId=|WorkloadGroup=|FuzzyVariables=
2024-05-12 01:02:10,027 [slow_query] |Client=xxxx:58748|User=datasync|Db=biz_chatting|State=ERR|ErrorCode=1105|ErrorMessage=errCode = 2, detailMessage = There exists unhealthy backend. backend 10004 is down|Time(ms)=30006|ScanBytes=0|ScanRows=0|ReturnRows=0|StmtId=12849419|QueryId=c77e277b1e774a9b-b48384b97de2f858|IsQuery=false|isNereids=false|feIp=xxxx|Stmt=delete from active_broadcaster_daily_statistic where create_time >= UNIX_TIMESTAMP('2024-05-11') * 1000 and create_time < (UNIX_TIMESTAMP('2024-05-11') + 86400) * 1000;|CpuTimeMS=0|SqlHash=12f2fb980a621594d4fa9e5059e11855|peakMemoryBytes=0|SqlDigest=|TraceId=|WorkloadGroup=|FuzzyVariables=
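
To check that the crash and the failed delete statement line up in time, the abort timestamp from be.out can be converted and compared with the audit-log entry. This is only a minimal sketch: it assumes the FE audit log writes its timestamps in UTC (be.out prints its start time in UTC, but the FE time zone is not stated here), and the variable names are my own.

```python
from datetime import datetime, timedelta, timezone

# From be.out: "*** Aborted at 1715475700 (unix time) ***"
abort_unix = 1715475700
abort_utc = datetime.fromtimestamp(abort_unix, tz=timezone.utc)

# From fe.audit.log: entry logged at 2024-05-12 01:02:10,027 with Time(ms)=30006.
# Assumption: the audit-log timestamp is in UTC.
logged_at = datetime.strptime(
    "2024-05-12 01:02:10,027", "%Y-%m-%d %H:%M:%S,%f"
).replace(tzinfo=timezone.utc)
stmt_start = logged_at - timedelta(milliseconds=30006)

print("BE aborted at   :", abort_utc.isoformat())
print("statement began :", stmt_start.isoformat(timespec="milliseconds"))
print("gap (seconds)   :", (abort_utc - stmt_start).total_seconds())
```

Under that assumption the two moments fall within the same second, which would fit the delete statement being the request that hit the crashing scan path, with the FE reporting the backend as down about 30 seconds later.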

1 Answer