After upgrading Doris to 2.0.14, queries fail with an [INTERNAL_ERROR] missed_versions is empty exception


spark-doris-connector version: 1.3.2
Client-side error message:

23-09-2024 09:35:00 CST jobname INFO - Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 105.0 failed 4 times, most recent failure: Lost task 4.3 in stage 105.0 (TID 22324, aidata-node010, executor 10): org.apache.doris.spark.exception.DorisInternalException: Doris server Doris BE{host='<BE host IP>', port=9060} internal failed, status code [INTERNAL_ERROR] error message is [(<BE host IP>)[E-230]missed_versions is empty, spec_version 1687, max_version 1705, tablet_id 19974756]
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:192)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.doris.spark.rdd.ScalaValueReader$$anonfun$13.apply(ScalaValueReader.scala:207)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.doris.spark.rdd.ScalaValueReader$$anonfun$13.apply(ScalaValueReader.scala:207)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.doris.spark.rdd.ScalaValueReader.org$apache$doris$spark$rdd$ScalaValueReader$$lockClient(ScalaValueReader.scala:239)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.doris.spark.rdd.ScalaValueReader.hasNext(ScalaValueReader.scala:207)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.doris.spark.rdd.AbstractDorisRDDIterator.hasNext(AbstractDorisRDDIterator.scala:56)
23-09-2024 09:35:00 CST jobname INFO - 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
23-09-2024 09:35:00 CST jobname INFO - 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.scheduler.Task.run(Task.scala:121)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
23-09-2024 09:35:00 CST jobname INFO - 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
23-09-2024 09:35:00 CST jobname INFO - 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
23-09-2024 09:35:00 CST jobname INFO - 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
23-09-2024 09:35:00 CST jobname INFO - 	at java.lang.Thread.run(Thread.java:748)
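
For context, the failing read goes through the DataFrame API of spark-doris-connector, which fetches a query plan from FE and then scans tablets directly on BE over the thrift port 9060 shown in the error. A minimal sketch of that read path; the database/table name, FE address and credentials are placeholders, not values from the logs above:

import org.apache.spark.sql.SparkSession

// Placeholder connection details, not taken from the job above.
val spark = SparkSession.builder().appName("doris-read-repro").getOrCreate()

val df = spark.read
  .format("doris")
  .option("doris.table.identifier", "demo_db.demo_table")  // hypothetical db.table
  .option("doris.fenodes", "fe_host:8030")                 // FE host and HTTP port
  .option("user", "root")
  .option("password", "")
  .load()

// Any action triggers the BE tablet scan, which is where missed_versions is reported.
df.count()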

Server-side BE error log:

W0923 09:51:08.791533  1332 fragment_mgr.cpp:267] Got error while opening fragment cb4b9ac30f5dc7b5-3afac85f581b89bf, query id: 6f7d334831764735-b1e856afcb25fce4: [E-230]missed_versions is empty, spec_version 1687, max_version 1705, tablet_id 19974756

        0#  doris::Tablet::capture_consistent_versions(doris::Version const&, std::vector<doris::Version, std::allocator<doris::Version> >*, bool, bool) const at /root/apache-doris-2.0.14-src/be/src/common/status.h:357
        1#  doris::Tablet::capture_rs_readers(doris::Version const&, std::vector<doris::RowSetSplits, std::allocator<doris::RowSetSplits> >*, bool) const at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
        2#  doris::vectorized::NewOlapScanner::init() at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
        3#  doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /root/apache-doris-2.0.14-src/be/src/common/status.h:357
        4#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::{lambda()#3}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
        5#  doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
        6#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
        7#  start_thread
        8#  clone
W0923 09:35:07.275390  5954 status.h:396] meet error status: [INTERNAL_ERROR]query_id: c677e192d93a47a5-a5664864ababcd0e, couldn't get a client for TNetworkAddress(hostname=, port=0), reason is [THRIFT_RPC_ERROR]Couldn't open transport for :0 (Could not resolve host for client socket.)

        0#  doris::ThriftClientImpl::open() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
        1#  doris::ThriftClientImpl::open_with_retry(int, int) at /root/apache-doris-2.0.14-src/be/src/common/status.h:357
        2#  doris::ClientCacheHelper::_create_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
        3#  doris::ClientCacheHelper::get_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
        4#  doris::ClientConnection<doris::FrontendServiceClient>::ClientConnection(doris::ClientCache<doris::FrontendServiceClient>*, doris::TNetworkAddress const&, int, doris::Status*, int) at /root/apache-doris-2.0.14-src/be/src/common/status.h:357
        5#  doris::FragmentMgr::coordinator_callback(doris::ReportStatusRequest const&) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
        6#  doris::FragmentExecState::coordinator_callback(doris::Status const&, doris::RuntimeProfile*, doris::RuntimeProfile*, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:244
        7#  doris::PlanFragmentExecutor::send_report(bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
        8#  doris::PlanFragmentExecutor::open() at /root/apache-doris-2.0.14-src/be/src/runtime/plan_fragment_executor.cpp:293
        9#  doris::FragmentExecState::execute() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
        10# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
        11# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
        12# doris::ThreadPool::dispatch_thread() at /root/apache-doris-2.0.14-src/be/src/util/threadpool.cpp:0
        13# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
        14# start_thread
        15# clone


        0#  doris::FragmentMgr::coordinator_callback(doris::ReportStatusRequest const&) at /root/apache-doris-2.0.14-src/be/src/common/status.h:0
        1#  doris::FragmentExecState::coordinator_callback(doris::Status const&, doris::RuntimeProfile*, doris::RuntimeProfile*, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:244
        2#  doris::PlanFragmentExecutor::send_report(bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
        3#  doris::PlanFragmentExecutor::open() at /root/apache-doris-2.0.14-src/be/src/runtime/plan_fragment_executor.cpp:293
        4#  doris::FragmentExecState::execute() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
        5#  doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
        6#  std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
        7#  doris::ThreadPool::dispatch_thread() at /root/apache-doris-2.0.14-src/be/src/util/threadpool.cpp:0
        8#  doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
        9#  start_thread
        10# clone

Tablet status (the SHOW PROC path below is /dbs/<dbId>/<tableId>/partitions/<partitionId>/<indexId>/<tabletId>):

MySQL [(none)]> SHOW PROC '/dbs/172688/634898/partitions/19974739/634899/19974756';
+-----------+-----------+---------+-------------------+------------------+---------------+------------+---------------+----------------+----------+--------+-------+------------+--------------+----------------------+-----------------------------------------------------+-------------------------------------------------------------------+-------------------+----------------+-----------+
| ReplicaId | BackendId | Version | LstSuccessVersion | LstFailedVersion | LstFailedTime | SchemaHash | LocalDataSize | RemoteDataSize | RowCount | State  | IsBad | IsUserDrop | VersionCount | PathHash             | MetaUrl                                             | CompactionStatus                                                  | CooldownReplicaId | CooldownMetaId | QueryHits |
+-----------+-----------+---------+-------------------+------------------+---------------+------------+---------------+----------------+----------+--------+-------+------------+--------------+----------------------+-----------------------------------------------------+-------------------------------------------------------------------+-------------------+----------------+-----------+
| 19974757  | 19709282  | 1742    | 1742              | -1               | NULL          | 1397656088 | 175803        | 0              | 5683     | NORMAL | false | false      | 6            | -6462981860052890346 | http://ip-node10:8045/api/meta/header/19974756  | http://ip-node10:8045/api/compaction/show?tablet_id=19974756  | -1                |                | 0         |
| 19974758  | 17495592  | 1742    | 1742              | -1               | NULL          | 1397656088 | 168617        | 0              | 5685     | NORMAL | false | false      | 3            | 4034116581310510095  | http://ip-node09:8045/api/meta/header/19974756 | http://ip-node09:8045/api/compaction/show?tablet_id=19974756 | -1                |                | 0         |
| 19974759  | 10005     | 1742    | 1742              | -1               | NULL          | 1397656088 | 169200        | 0              | 5685     | NORMAL | false | false      | 3            | -440745932939145627  | http://ip-node06:8045/api/meta/header/19974756 | http://ip-node06:8045/api/compaction/show?tablet_id=19974756 | -1                |                | 0         |
+-----------+-----------+---------+-------------------+------------------+---------------+------------+---------------+----------------+----------+--------+-------+------------+--------------+----------------------+-----------------------------------------------------+-------------------------------------------------------------------+-------------------+----------------+-----------+
3 rows in set (0.00 sec)

I am not sure whether this tablet state affects queries.

2 Answers

But when I run the same query as a plain SQL statement it works; it only fails through the Spark SQL DataFrame read.
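
For comparison, a rough sketch of reading the same table over JDBC through FE's MySQL protocol port, which is the path that succeeds here; host, port, database and table names are placeholders, and the MySQL JDBC driver is assumed to be on the classpath:

// Reading via JDBC goes through FE like a normal MySQL client.
// Reuses the SparkSession `spark` from the sketch above.
val jdbcDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://fe_host:9030/demo_db")  // hypothetical FE address, default MySQL port
  .option("dbtable", "demo_table")
  .option("user", "root")
  .option("password", "")
  .load()
jdbcDf.count()

The two paths differ in that the connector plans the query through FE but then scans tablet versions on BE directly, while the JDBC route executes entirely as a normal Doris query; that difference may be why only the DataFrame read hits the missed_versions error.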