[Solved] After enabling data replication, 2 of the 3 Doris BE nodes crashed and could not be restarted

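The Doris cluster has 3 FE nodes and 3 BE nodes, all running CentOS 7. It started normally after the initial installation. Later, after I set a table's replication parameter to 2 or 3 and inserted data into the table, 2 of the 3 BE nodes crashed and could not be restarted. This has now happened twice (the first time, I resolved it by removing the 2 BE nodes and rebuilding them). be.out shows the following error: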

start time: Thu Jun 13 23:47:02 CST 2024
INFO: java_cmd /data1/jdk1.8.0_202//bin/java
INFO: jdk_version 8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data1/doris/be/lib/java_extensions/preload-extensions/preload-extensions-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data1/doris/be/lib/java_extensions/java-udf/java-udf-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data1/doris/be/lib/hadoop_hdfs/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 982312 ***
*** Aborted at 1718293626 (unix time) try "date -d @1718293626" if you are using GNU date ***
*** Current BE git commitID: 2dc65ce356 ***
*** SIGSEGV address not mapped to object (@0x0) received by PID 20425 (TID 21688 OR 0x7fec200fb700) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
1# os::Linux::chained_handler(int, siginfo*, void*) in /data1/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /data1/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo*, void*) in /data1/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
4# 0x00007FEE73361280 in /lib64/libc.so.6
5# doris::segment_v2::SegmentWriter::_full_encode_keys[abi:cxx11](std::vector<doris::KeyCoder const*, std::allocator<doris::KeyCoder const*> > const&, std::vector<doris::vectorized::IOlapColumnDataAccessor*, std::allocator > const&, unsigned long, bool) at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/segment_v2/segment_writer.cpp:913
6# doris::segment_v2::SegmentWriter::_generate_short_key_index(std::vector<doris::vectorized::IOlapColumnDataAccessor*, std::allocator >&, unsigned long, std::vector<unsigned long, std::allocator > const&) at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/segment_v2/segment_writer.cpp:1297
7# doris::segment_v2::SegmentWriter::append_block(doris::vectorized::Block const*, unsigned long, unsigned long) in /data1/doris/be/lib/doris_be
8# doris::VerticalBetaRowsetWriter::add_columns(doris::vectorized::Block const*, std::vector<unsigned int, std::allocator > const&, bool, unsigned int) at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/vertical_beta_rowset_writer.cpp:85
9# doris::Merger::vertical_compact_one_group(std::shared_ptr, doris::ReaderType, std::shared_ptr, bool, std::vector<unsigned int, std::allocator > const&, doris::vectorized::RowSourcesBuffer*, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, doris::RowsetWriter*, long, doris::Merger::Statistics*, std::vector<unsigned int, std::allocator >) in /data1/doris/be/lib/doris_be
10# doris::Merger::vertical_merge_rowsets(std::shared_ptr, doris::ReaderType, std::shared_ptr, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, doris::RowsetWriter*, long, doris::Merger::Statistics*) at /home/zcp/repo_center/doris_release/doris/be/src/olap/merger.cpp:383
11# doris::Compaction::do_compaction_impl(long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:371
12# doris::Compaction::do_compaction(long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:136
13# doris::CumulativeCompaction::execute_compact_impl() at /home/zcp/repo_center/doris_release/doris/be/src/olap/cumulative_compaction.cpp:79
14# doris::Compaction::execute_compact() at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:118
15# doris::Tablet::execute_compaction(doris::Compaction&) at /home/zcp/repo_center/doris_release/doris/be/src/olap/tablet.cpp:1947
16# std::_Function_handler<void (), doris::StorageEngine::_submit_compaction_task(std::shared_ptr, doris::CompactionType, bool)::$_1>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
17# doris::ThreadPool::dispatch_thread() in /data1/doris/be/lib/doris_be
18# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:499
19# start_thread in /lib64/libpthread.so.0
20# __clone in /lib64/libc.so.6
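
A note for reading the trace: the "Aborted at" line is a Unix timestamp, and comparing it with the "start time" line shows how long the BE survived after startup. The sketch below does that conversion in Python. It assumes that "CST" in be.out means China Standard Time (UTC+8); the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in be.out is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))

# From be.out: "start time: Thu Jun 13 23:47:02 CST 2024"
start_time = datetime(2024, 6, 13, 23, 47, 2, tzinfo=CST)

# From be.out: "*** Aborted at 1718293626 (unix time) ..."
abort_time = datetime.fromtimestamp(1718293626, tz=CST)

# Time between process start and the SIGSEGV.
seconds_until_crash = (abort_time - start_time).total_seconds()

print(abort_time.isoformat())   # 2024-06-13T23:47:06+08:00
print(seconds_until_crash)      # 4.0
```

Under that assumption, the BE aborted about 4 seconds after it started, inside the cumulative compaction path shown in frames 5 to 15. That would be consistent with the node crashing again on every restart attempt.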

2 Answers

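Thanks for the reply. I found that I had only checked the JDK version used by the BE nodes and had not checked the JDK version used by the FE nodes.

After I switched the FE nodes to JDK 8 and rebuilt the database, the problem is resolved for now. I will keep monitoring for a while to see whether it recurs.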

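1. Please provide the exact version number. You can query it with the following statement:
select @@version_comment
2. What operation does the sentence below refer to? Please provide the complete statements needed to reproduce it:

"after I set a table's replication parameter to 2 or 3 and inserted data into the table"

3. When this error occurs, set enable_vertical_compaction = false in be.conf and restart, then check whether the BE starts normally.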
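
For step 3, the change could look like the fragment below. This is only a sketch: the install path /data1/doris/be is inferred from the paths in the be.out log above, and the restart commands in the comments assume the standard stop_be.sh and start_be.sh scripts shipped in the BE bin directory.

```
# /data1/doris/be/conf/be.conf  (path assumed from the log above)

# Turn off vertical compaction, the code path that appears in the stack trace.
enable_vertical_compaction = false

# Then restart the BE, for example:
#   /data1/doris/be/bin/stop_be.sh
#   /data1/doris/be/bin/start_be.sh --daemon
```

If the BE stays up with this setting, that points at the vertical compaction path seen in the trace rather than at startup itself.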