2024-04-04 12:18 Received a Kafka consumer lag alert. The Routine Load job status was normal, but the job was not consuming Kafka messages.
Host monitoring of the abnormal BE node
fe.log contains many "failed to get latest offsets" warnings:
2024-04-04 12:18:01,655 WARN (Routine load task scheduler|48) [KafkaUtil.getLatestOffsets():212] failed to get latest offsets.
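To see when the FE started failing and whether the failures are continuous, the warnings can be bucketed by minute. This is a minimal sketch, not a Doris tool; it assumes every fe.log line follows the format of the sample line above (timestamp with milliseconds, level, thread, source location, message) and that the file can be read as UTF-8.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed line format, taken from the sample above:
# "2024-04-04 12:18:01,655 WARN (...) [KafkaUtil.getLatestOffsets():212] failed to get latest offsets."
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} WARN .*failed to get latest offsets"
)


def count_offset_failures(lines: Iterable[str]) -> Counter:
    """Count 'failed to get latest offsets' warnings per minute (key: 'YYYY-MM-DD HH:MM')."""
    counts: Counter = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_offset_failures.py /path/to/fe.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for minute, n in sorted(count_offset_failures(f).items()):
            print(f"{minute}  {n}")
```

A minute-level histogram makes it easy to line up the start of the FE warnings with the BE events recorded later in this log.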
be.out
start time: Thu Apr 4 21:24:01 CST 2024
INFO: java_cmd /usr/lib/jvm/java/bin/java
INFO: jdk_version 8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/doris/be/lib/java_extensions/preload-extensions/preload-extensions-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/doris/be/lib/java_extensions/java-udf/java-udf-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/doris/be/lib/hadoop_hdfs/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
*** Query id: 0-0 ***
*** tablet id: 0 ***
*** Aborted at 1712241970 (unix time) try "date -d @1712241970" if you are using GNU date ***
*** Current BE git commitID: 91efb6a43d ***
*** SIGSEGV unknown detail explain (@0x0) received by PID 1450 (TID 3459 OR 0x7fd463adc700) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:417
1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo_t*, void*) in /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
4# 0x00007FE0C001B400 in /lib64/libc.so.6
5# __GI___pthread_mutex_lock in /lib64/libpthread.so.0
6# std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<doris::PBackendService_Stub> >* phmap::priv::parallel_hash_set<8ul, phmap::priv::raw_hash_set, std::mutex, phmap::priv::FlatHashMapPolicy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<doris::PBackendService_Stub> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<doris::PBackendService_Stub> > > >::find_ptr<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, phmap::LockableBaseImpl<std::mutex>::WriteLock>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, phmap::LockableBaseImpl<std::mutex>::WriteLock&) at /home/zcp/repo_center/doris_release/doris/thirdparty/installed/include/parallel_hashmap/phmap.h:3736
7# bool phmap::priv::parallel_hash_set<8ul, phmap::priv::raw_hash_set, std::mutex, phmap::priv::FlatHashMapPolicy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<doris::PBackendService_Stub> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<doris::PBackendService_Stub> > > >::modify_if_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, doris::BrpcClientCache<doris::PBackendService_Stub>::get_client(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(auto:1 const&)#1}&, phmap::LockableBaseImpl<std::mutex>::WriteLock>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::BrpcClientCache<doris::PBackendService_Stub>::get_client(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(auto:1 const&)#1}&) in /usr/local/doris/be/lib/doris_be
8# doris::BrpcClientCache<doris::PBackendService_Stub>::get_client(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /home/zcp/repo_center/doris_release/doris/be/src/util/brpc_client_cache.h:95
9# doris::BrpcClientCache<doris::PBackendService_Stub>::get_client(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) at /home/zcp/repo_center/doris_release/doris/be/src/util/brpc_client_cache.h:90
10# doris::CheckRPCChannelAction::handle(doris::HttpRequest*) at /home/zcp/repo_center/doris_release/doris/be/src/http/action/check_rpc_channel_action.cpp:85
11# 0x000055F83C432E37 in /usr/local/doris/be/lib/doris_be
12# bufferevent_run_readcb_ in /usr/local/doris/be/lib/doris_be
13# 0x000055F83C435053 in /usr/local/doris/be/lib/doris_be
14# 0x000055F83C41BFB9 in /usr/local/doris/be/lib/doris_be
15# 0x000055F83C41C637 in /usr/local/doris/be/lib/doris_be
16# 0x000055F83C41EC68 in /usr/local/doris/be/lib/doris_be
17# std::_Function_handler<void (), doris::EvHttpServer::start()::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
18# doris::ThreadPool::dispatch_thread() in /usr/local/doris/be/lib/doris_be
19# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:499
20# start_thread in /lib64/libpthread.so.0
21# clone in /lib64/libc.so.6
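The crash header suggests converting the abort time with "date -d @1712241970". The same conversion in Python, assuming that "CST" in be.out means China Standard Time (UTC+8):

```python
from datetime import datetime, timedelta, timezone

# China Standard Time (UTC+8); assumed to be the "CST" printed in be.out.
CST = timezone(timedelta(hours=8), "CST")


def abort_time_cst(unix_seconds: int) -> str:
    """Render a be.out 'Aborted at <unix time>' value as CST wall-clock time."""
    return datetime.fromtimestamp(unix_seconds, tz=CST).strftime("%Y-%m-%d %H:%M:%S %Z")


print(abort_time_cst(1712241970))  # 2024-04-04 22:46:10 CST
```

This places the SIGSEGV at 2024-04-04 22:46:10 CST. That is about 82 minutes after the 21:24:01 start time printed at the top of this be.out excerpt and about 13 minutes before the 22:59:02 "backend ... does not exist or not alive" error recorded below, which suggests that this crash belongs to the process started at 21:24:01.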
2024-04-04 21:23:16 The business application got RPC errors when accessing the Doris cluster:
Application error 2024-04-04 21:23:16 [-][-][-][error][application] ..........query error: PDOStatement::execute(): SQLSTATE[HY000]: General error: 1105 RpcException, msg: send fragments failed. io.grpc.StatusRuntimeException: UNAVAILABLE: io exception, host: .....
Application error 2024-04-04 22:59:02 [-][-][-][error][application] .......error: PDOStatement::execute(): SQLSTATE[HY000]: General error: 1105 errCode = 2, detailMessage = tablet 1100807 has no queryable replicas. err: replica 1100809's backend 10146 does not exist or not alive, replica 1100808's backend 10092 does not exist or not alive
Application error 2024-04-04 23:09:44 [-][-][-][error][application] .....query error: PDOStatement::execute(): SQLSTATE[HY000]: General error: 1105 errCode = 2, detailMessage = (.....)[CANCELLED]failed to send brpc when exchange, error=Host is down, error_text=[E112]Not connected to ....:8060 yet, server_id=908 [R1][E112]Not connected to ....:8060 yet, server_id=908 [R2][E112]Not connected to ....:8060 yet, server_id=908 [R3][E112]Not connected to ....:8060 yet, server_id=908 [R4][E112]Not connected to ....:8060 yet, server_id=908 [R5][E1
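The "[E112]Not connected to ....:8060 yet" errors show that the brpc channel to port 8060 on the target BE could not be established, which is consistent with that BE being down after the crash above. A quick way to check whether a BE's brpc port accepts TCP connections is sketched here; the function name is hypothetical, and the default port 8060 is taken from the error message rather than read from the cluster configuration.

```python
import socket


def brpc_port_reachable(host: str, port: int = 8060, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only checks that something is listening on the port; it does not
    speak the brpc protocol or verify that the BE is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, host unreachable, and timeouts.
        return False
```

For example, brpc_port_reachable("<be-host>") returns False while the BE process is down. A successful connect only proves the port is listening, so the node's status on the FE side should still be checked separately.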