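bug1: Queries frequently fail with timeout errors, but I cannot find the specific cause.
bug2: The BE nodes restart frequently. GC logs are printed on every restart, and be.out repeatedly reports the errors shown further down.
Additional information for bug2: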
**Process 79150 (doris_be) of user 1000 killed by SIGABRT - dumping core**
**Executable '/data/doris/be/lib/doris_be' doesn't belong to any package and ProcessUnpackaged is set to 'no'**
What is actually causing this? Please help investigate. Thanks.
**bug1 error log:**
W20241231 22:07:57.708006 25363 status.h:413] meet error status: [TIMEOUT]Query tiemout
0# doris::ResultBufferMgr::cancel_thread() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/result_buffer_mgr.cpp:210
1# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
2# start_thread
3# __clone
W20241231 22:07:57.708715 25363 status.h:413] meet error status: [TIMEOUT]Query tiemout
0# doris::ResultBufferMgr::cancel_thread() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/result_buffer_mgr.cpp:210
1# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
2# start_thread
3# __clone
W20241231 22:07:57.708735 25363 status.h:413] meet error status: [TIMEOUT]Query tiemout
0# doris::ResultBufferMgr::cancel_thread() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/result_buffer_mgr.cpp:210
1# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
2# start_thread
3# __clone
W20241231 22:07:57.708747 25363 status.h:413] meet error status: [TIMEOUT]Query tiemout
0# doris::ResultBufferMgr::cancel_thread() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/result_buffer_mgr.cpp:210
1# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
2# start_thread
3# __clone
W20241231 22:07:57.708760 25363 status.h:413] meet error status: [TIMEOUT]Query tiemout
0# doris::ResultBufferMgr::cancel_thread() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/result_buffer_mgr.cpp:210
1# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
2# start_thread
3# __clone
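To check whether the timeouts arrive in bursts, the warning lines above can be bucketed by timestamp. This is a minimal Python sketch, assuming the glog-style prefix seen in the excerpt (`W20241231 22:07:57.708006 25363 status.h:413]`). The function name `count_timeouts_per_second` is hypothetical and not part of Doris.

```python
import re
from collections import Counter

# Matches the glog-style prefix seen in the excerpt, e.g.
# "W20241231 22:07:57.708006 25363 status.h:413] meet error status: [TIMEOUT]..."
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def count_timeouts_per_second(lines):
    """Count log lines containing "[TIMEOUT]", keyed by "YYYYMMDD HH:MM:SS".

    Stack-frame lines such as "0# doris::..." do not match the prefix
    and are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and "[TIMEOUT]" in m.group("msg"):
            counts[f"{m.group('date')} {m.group('time')}"] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the excerpt above.
    sample = [
        "W20241231 22:07:57.708006 25363 status.h:413] meet error status: [TIMEOUT]Query tiemout",
        "0# doris::ResultBufferMgr::cancel_thread() at /home/zcp/repo_center/doris_release/doris/be/src/runtime/result_buffer_mgr.cpp:210",
        "W20241231 22:07:57.708715 25363 status.h:413] meet error status: [TIMEOUT]Query tiemout",
    ]
    for second, n in sorted(count_timeouts_per_second(sample).items()):
        print(second, n)
```

Feeding the full warning log through this function should show whether the timeouts cluster around the BE restarts or are spread evenly over time.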
**bug2 errors:**
be.out reports the following error:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 0 ***
*** Aborted at 1735748165 (unix time) try "date -d @1735748165" if you are using GNU date ***
*** Current BE git commitID: 443e87e203 ***
*** SIGABRT unknown detail explain (@0x3e80000b553) received by PID 46419 (TID 47956 OR 0x7f54bc285700) from PID 46419; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
1# 0x00007F58398E7400 in /lib64/libc.so.6
2# __GI_raise in /lib64/libc.so.6
3# abort in /lib64/libc.so.6
4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6# 0x000055FC5BD799C1 in /data/doris/be/lib/doris_be
7# 0x000055FC5BD79B14 in /data/doris/be/lib/doris_be
8# std::__throw_system_error(int) at ../../../../../libstdc++-v3/src/c++11/system_error.cc:338
9# 0x000055FC5BE4509D in /data/doris/be/lib/doris_be
10# std::thread::thread<void ()(std::shared_ptr), std::shared_ptr&, void>(void (*&&)(std::shared_ptr), std::shared_ptr&) in /data/doris/be/lib/doris_be
11# apache::thrift::concurrency::Thread::start() in /data/doris/be/lib/doris_be
12# apache::thrift::server::TThreadedServer::onClientConnected(std::shared_ptr const&) in /data/doris/be/lib/doris_be
13# apache::thrift::server::TServerFramework::newlyConnectedClient(std::shared_ptr const&) in /data/doris/be/lib/doris_be
14# apache::thrift::server::TServerFramework::serve() in /data/doris/be/lib/doris_be
15# apache::thrift::server::TThreadedServer::serve() in /data/doris/be/lib/doris_be
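In the trace above, the exception is thrown from the `std::thread` constructor, and "Resource temporarily unavailable" is the Linux message for `EAGAIN`. `pthread_create` returns `EAGAIN` when a thread cannot be created because of insufficient resources or a system-imposed limit, so one possible explanation (not confirmed here) is that the BE hit a thread or process limit. The sketch below compares the thread count of a process with its "Max processes" limit. It assumes the standard Linux `/proc/<pid>/status` and `/proc/<pid>/limits` layouts; the helper names are hypothetical.

```python
import re
import sys


def parse_thread_count(status_text):
    """Return the "Threads:" value from /proc/<pid>/status text, or None."""
    m = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def parse_max_processes(limits_text):
    """Return (soft, hard) "Max processes" from /proc/<pid>/limits text.

    Each value is an int or the string "unlimited". Returns None if the
    row is missing.
    """
    def convert(value):
        return value if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            # Columns are separated by runs of two or more spaces.
            fields = re.split(r"\s{2,}", line.strip())
            return convert(fields[1]), convert(fields[2])
    return None


def read_proc(pid, name):
    """Read /proc/<pid>/<name> as text (Linux only)."""
    with open(f"/proc/{pid}/{name}") as f:
        return f.read()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_threads.py <pid of doris_be>
    pid = sys.argv[1]
    print("threads:", parse_thread_count(read_proc(pid, "status")))
    print("max processes (soft, hard):",
          parse_max_processes(read_proc(pid, "limits")))
```

Note that the "Max processes" limit (`RLIMIT_NPROC`) applies to all threads owned by the same user, not just one process, and that `kernel.threads-max`, `kernel.pid_max`, and a cgroup `pids.max` can also cap thread creation. A thread count well below the per-process limit therefore does not rule this cause out.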
be.gc contents:
Java HotSpot(TM) 64-Bit Server VM (25.261-b12) for linux-amd64 JRE (1.8.0_261-b12), built on Jun 17 2020 23:41:40 by "java_re" with gcc 7.3.0
Memory: 4k page, physical 527752888k(195506092k free), swap 0k(0k free)
CommandLine flags: -XX:-CriticalJNINatives -XX:InitialHeapSize=1073741824 -XX:MaxHeapSize=1073741824 -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
1.909: [GC (Metadata GC Threshold) 157304K->17327K(1005056K), 0.0401801 secs]
1.949: [Full GC (Metadata GC Threshold) 17327K->15242K(721408K), 0.0453656 secs]
2.685: [GC (Allocation Failure) 277386K->18877K(721408K), 0.0070888 secs]
2.930: [GC (Allocation Failure) 281021K->16338K(721408K), 0.0036951 secs]
3.146: [GC (Allocation Failure) 278482K->15858K(721408K), 0.0037470 secs]
be.out reports the following error:
doris_be: rdkafka_broker.c:5756: rd_kafka_broker_add_logical: Assertion `rkb && "failed to create broker thread"' failed.
*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 0 ***
*** Aborted at 1735748235 (unix time) try "date -d @1735748235" if you are using GNU date ***
*** Current BE git commitID: 443e87e203 ***
*** SIGABRT unknown detail explain (@0x3e80001352e) received by PID 79150 (TID 80703 OR 0x7fca1f3d3700) from PID 79150; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
1# 0x00007FCDB027B400 in /lib64/libc.so.6
2# __GI_raise in /lib64/libc.so.6
3# abort in /lib64/libc.so.6
4# __assert_fail_base in /lib64/libc.so.6
5# 0x00007FCDB0274252 in /lib64/libc.so.6
6# 0x0000558137791FDE in /data/doris/be/lib/doris_be
7# rd_kafka_cgrp_new in /data/doris/be/lib/doris_be
8# rd_kafka_new in /data/doris/be/lib/doris_be
9# RdKafka::KafkaConsumer::create(RdKafka::Conf const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&) in /data/doris/be/lib/doris_be
10# doris::KafkaDataConsumer::init(std::shared_ptr) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/routine_load/data_consumer.cpp:143
11# doris::DataConsumerPool::get_consumer(std::shared_ptr, std::shared_ptr) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/routine_load/data_consumer_pool.cpp:71
12# doris::RoutineLoadTaskExecutor::get_kafka_latest_offsets_for_partitions(doris::PKafkaMetaProxyRequest const&, std::vector<doris::PIntegerPair, std::allocator >, int) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/routine_load/routine_load_task_executor.cpp:169
13# std::_Function_handler<void (), doris::PInternalServiceImpl::get_info(google::protobuf::RpcController*, doris::PProxyRequest const*, doris::PProxyResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
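Both abort banners carry a Unix timestamp (`1735748165` and `1735748235`), which makes it easy to line the crashes up against the other logs. As a small sketch, the conversion suggested by the `date -d @...` hint can also be done in Python. The function name `abort_time_utc` is hypothetical.

```python
from datetime import datetime, timezone


def abort_time_utc(unix_seconds):
    """Convert the "Aborted at <n> (unix time)" value to an aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


first = abort_time_utc(1735748165)   # crash with std::system_error
second = abort_time_utc(1735748235)  # crash with the rdkafka assertion

print(first.isoformat())                  # 2025-01-01T16:16:05+00:00
print(second.isoformat())                 # 2025-01-01T16:17:15+00:00
print((second - first).total_seconds())   # 70.0
```

The two aborts are 70 seconds apart and come from different PIDs (46419 and 79150). Both fail while creating a thread, the first in the Thrift server and the second in librdkafka ("failed to create broker thread").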
be.gc contents:
Java HotSpot(TM) 64-Bit Server VM (25.261-b12) for linux-amd64 JRE (1.8.0_261-b12), built on Jun 17 2020 23:41:40 by "java_re" with gcc 7.3.0
Memory: 4k page, physical 527752888k(263590796k free), swap 0k(0k free)
CommandLine flags: -XX:-CriticalJNINatives -XX:InitialHeapSize=1073741824 -XX:MaxHeapSize=1073741824 -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
1.252: [GC (Metadata GC Threshold) 157307K->16956K(1005056K), 0.0464714 secs]
1.299: [Full GC (Metadata GC Threshold) 16956K->15241K(701952K), 0.0394015 secs]
2.004: [GC (Allocation Failure) 277385K->18609K(701952K), 0.0069175 secs]
2.253: [GC (Allocation Failure) 280753K->16377K(701952K), 0.0114080 secs]
2.474: [GC (Allocation Failure) 278521K->16185K(701952K), 0.0039811 secs]
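To judge whether the GC activity itself is significant, the be.gc lines can be summarized. This is a minimal sketch that assumes the `-XX:+PrintGC -XX:+PrintGCTimeStamps` line format shown in the excerpts; `parse_gc_lines` and `total_pause_seconds` are hypothetical helper names.

```python
import re

# Matches lines such as:
# "1.949: [Full GC (Metadata GC Threshold) 17327K->15242K(721408K), 0.0453656 secs]"
GC_LINE = re.compile(
    r"^(?P<uptime>\d+\.\d+): \[(?P<kind>Full GC|GC) \((?P<cause>[^)]+)\) "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<pause>\d+\.\d+) secs\]$"
)


def parse_gc_lines(lines):
    """Return one dict per recognized GC line; other lines are ignored."""
    events = []
    for line in lines:
        m = GC_LINE.match(line.strip())
        if m:
            events.append({
                "uptime_s": float(m.group("uptime")),
                "kind": m.group("kind"),
                "cause": m.group("cause"),
                "before_kb": int(m.group("before")),
                "after_kb": int(m.group("after")),
                "heap_kb": int(m.group("total")),
                "pause_s": float(m.group("pause")),
            })
    return events


def total_pause_seconds(events):
    """Sum of all GC pause durations, in seconds."""
    return sum(e["pause_s"] for e in events)


if __name__ == "__main__":
    # Lines copied from the first be.gc excerpt.
    sample = [
        "1.909: [GC (Metadata GC Threshold) 157304K->17327K(1005056K), 0.0401801 secs]",
        "1.949: [Full GC (Metadata GC Threshold) 17327K->15242K(721408K), 0.0453656 secs]",
        "2.685: [GC (Allocation Failure) 277386K->18877K(721408K), 0.0070888 secs]",
    ]
    events = parse_gc_lines(sample)
    print(len(events), "events, total pause", total_pause_seconds(events), "s")
```

In both excerpts the entries fall within the first few seconds of JVM uptime and every pause is under 50 ms, so the GC output may simply reflect the embedded JVM starting up after each BE restart rather than a cause of the restarts.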