After upgrading Doris from 1.1.4 to 1.2.6, the BE crashes frequently

The error output in be.out from one of the crashes is as follows:

 *** Query id: 1bf0818e5fa1484f-a3142b65c6d387c6 ***
*** Aborted at 1714210262 (unix time) try "date -d @1714210262" if you are using GNU date ***
*** Current BE git commitID: Unknown ***
*** SIGSEGV unkown detail explain (@0x0) received by PID 1166003 (TID 0x7f4ee3c2d700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:420
 1# os::Linux::chained_handler(int, siginfo*, void*) in /opt/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /opt/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /opt/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
 4# 0x00007F502EE20090 in /lib/x86_64-linux-gnu/libc.so.6
 5# void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField<doris::segment_v2::ColumnIndexMetaPB>::TypeHandler>() at /var/local/thirdparty/installed/include/google/protobuf/repeated_field.h:1662
 6# void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField<doris::segment_v2::ColumnMetaPB>::TypeHandler>() at /var/local/thirdparty/installed/include/google/protobuf/repeated_field.h:1662
 7# doris::segment_v2::Segment::~Segment() at /root/doris/be/src/olap/rowset/segment_v2/segment.cpp:86
 8# std::_Sp_counted_ptr<doris::segment_v2::Segment*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:348
 9# doris::SegmentLoader::_insert(doris::SegmentLoader::CacheKey const&, doris::SegmentLoader::CacheValue&, doris::SegmentCacheHandle*)::{lambda(doris::CacheKey const&, void*)#1}::_FUN(doris::CacheKey const&, void*) at /root/doris/be/src/olap/segment_loader.cpp:54
10# doris::LRUHandle::free() at /root/doris/be/src/olap/lru_cache.h:263
11# doris::LRUCache::insert(doris::CacheKey const&, unsigned int, void*, unsigned long, void (*)(doris::CacheKey const&, void*), doris::MemTrackerLimiter*, doris::CachePriority, unsigned long) at /root/doris/be/src/olap/lru_cache.cpp:336
12# doris::ShardedLRUCache::insert(doris::CacheKey const&, void*, unsigned long, void (*)(doris::CacheKey const&, void*), doris::CachePriority, unsigned long) at /root/doris/be/src/olap/lru_cache.cpp:489
13# doris::SegmentLoader::_insert(doris::SegmentLoader::CacheKey const&, doris::SegmentLoader::CacheValue&, doris::SegmentCacheHandle*) at /root/doris/be/src/olap/segment_loader.cpp:61
14# doris::SegmentLoader::load_segments(std::shared_ptr<doris::BetaRowset> const&, doris::SegmentCacheHandle*, bool) at /root/doris/be/src/olap/segment_loader.cpp:87
15# doris::BetaRowsetReader::get_segment_iterators(doris::RowsetReaderContext*, std::vector<doris::RowwiseIterator*, std::allocator<doris::RowwiseIterator*> >*, bool) at /root/doris/be/src/olap/rowset/beta_rowset_reader.cpp:171
16# doris::BetaRowsetReader::init(doris::RowsetReaderContext*) at /root/doris/be/src/olap/rowset/beta_rowset_reader.cpp:201
17# doris::vectorized::BlockReader::_init_collect_iter(doris::TabletReader::ReaderParams const&, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > >*) at /root/doris/be/src/vec/olap/block_reader.cpp:83
18# doris::vectorized::BlockReader::init(doris::TabletReader::ReaderParams const&) at /root/doris/be/src/vec/olap/block_reader.cpp:157
19# doris::vectorized::NewOlapScanner::open(doris::RuntimeState*) at /root/doris/be/src/vec/exec/scan/new_olap_scanner.cpp:117
20# doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, doris::vectorized::VScanner*) at /root/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:205
21# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:543
22# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:455
23# start_thread in /lib/x86_64-linux-gnu/libpthread.so.0
24# __clone in /lib/x86_64-linux-gnu/libc.so.6

I found the SQL for that Query id and ran it on its own; it completes without any problem.

After that, many similar errors like the following appeared:
*** Query id: 0-0 ***
*** Aborted at 1715560839 (unix time) try "date -d @1715560839" if you are using GNU date ***
*** Current BE git commitID: Unknown ***
*** SIGSEGV address not mapped to object (@0x10000000000) received by PID 3530692 (TID 0x7f6a16dd0700) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:420
1# os::Linux::chained_handler(int, siginfo*, void*) in /opt/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /opt/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo*, void*) in /opt/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
4# 0x00007F6CF78DF090 in /lib/x86_64-linux-gnu/libc.so.6
5# void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField::TypeHandler>() at /var/local/thirdparty/installed/include/google/protobuf/repeated_field.h:1662
6# void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField::TypeHandler>() at /var/local/thirdparty/installed/include/google/protobuf/repeated_field.h:1662
7# doris::segment_v2::Segment::~Segment() at /root/doris/be/src/olap/rowset/segment_v2/segment.cpp:86
8# std::_Sp_counted_ptr<doris::segment_v2::Segment*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:348
9# doris::SegmentLoader::_insert(doris::SegmentLoader::CacheKey const&, doris::SegmentLoader::CacheValue&, doris::SegmentCacheHandle*)::{lambda(doris::CacheKey const&, void*)#1}::_FUN(doris::CacheKey const&, void*) at /root/doris/be/src/olap/segment_loader.cpp:54
10# doris::LRUHandle::free() at /root/doris/be/src/olap/lru_cache.h:263
11# doris::LRUCache::prune_if(std::function<bool (void const*)>, bool) at /root/doris/be/src/olap/lru_cache.cpp:427
12# doris::ShardedLRUCache::prune_if(std::function<bool (void const*)>, bool) at /root/doris/be/src/olap/lru_cache.cpp:530
13# doris::SegmentLoader::prune() at /root/doris/be/src/olap/segment_loader.cpp:101
14# doris::StorageEngine::_start_clean_cache() at /root/doris/be/src/olap/storage_engine.cpp:612
15# doris::StorageEngine::_fd_cache_clean_callback() at /root/doris/be/src/olap/olap_server.cpp:174
16# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:455
17# start_thread in /lib/x86_64-linux-gnu/libpthread.so.0
18# __clone in /lib/x86_64-linux-gnu/libc.so.6

Is there a way to fix this without upgrading?

1 Answer

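This looks like a crash caused by the destructor of the segment/schema cache. The segment code was refactored in later versions, so you could try upgrading to the latest 2.0 tag; the latest at the moment is 2.0.9.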
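
One way to check that the two crashes really share a cause is to compare the frames of the two traces. The following is only a rough sketch in Python, not Doris tooling: `parse_frames` and `shared_frames` are hypothetical helper names, and the sample strings contain just a few frames copied from the traces in the question.

```python
import re

# Matches frame lines such as
# " 7# doris::segment_v2::Segment::~Segment() at /root/.../segment.cpp:86"
# and captures the frame number, the symbol, and the location.
FRAME_RE = re.compile(r"^\s*(\d+)#\s+(.*?)\s+(?:at|in)\s+(\S+)\s*$")


def parse_frames(trace):
    """Return the symbol of every stack frame in the trace, in order."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(2))
    return frames


def shared_frames(first, second):
    """Return symbols present in both traces, in the order of the first."""
    seen = set(second)
    return [symbol for symbol in first if symbol in seen]


# A few frames from the first crash (scanner thread loading segments).
TRACE_A = """
 7# doris::segment_v2::Segment::~Segment() at /root/doris/be/src/olap/rowset/segment_v2/segment.cpp:86
10# doris::LRUHandle::free() at /root/doris/be/src/olap/lru_cache.h:263
14# doris::SegmentLoader::load_segments(std::shared_ptr<doris::BetaRowset> const&, doris::SegmentCacheHandle*, bool) at /root/doris/be/src/olap/segment_loader.cpp:87
"""

# A few frames from the later crashes (background cache cleaning thread).
TRACE_B = """
7# doris::segment_v2::Segment::~Segment() at /root/doris/be/src/olap/rowset/segment_v2/segment.cpp:86
10# doris::LRUHandle::free() at /root/doris/be/src/olap/lru_cache.h:263
13# doris::SegmentLoader::prune() at /root/doris/be/src/olap/segment_loader.cpp:101
"""

common = shared_frames(parse_frames(TRACE_A), parse_frames(TRACE_B))
for symbol in common:
    print(symbol)
```

With these samples the entry points differ (`load_segments` on a query thread, `prune` on the cache cleaning thread), but both reach `Segment::~Segment()` through `LRUHandle::free()`. That would also fit the observation that rerunning the SQL on its own does not reproduce the crash: the failure is in freeing a cached segment, not in the query itself.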