spark-doris-connector version: 1.3.2
Client-side error message:
23-09-2024 09:35:00 CST jobname INFO - Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 105.0 failed 4 times, most recent failure: Lost task 4.3 in stage 105.0 (TID 22324, aidata-node010, executor 10): org.apache.doris.spark.exception.DorisInternalException: Doris server Doris BE{host='BE host IP', port=9060} internal failed, status code [INTERNAL_ERROR] error message is [(BE host IP)[E-230]missed_versions is empty, spec_version 1687, max_version 1705, tablet_id 19974756]
23-09-2024 09:35:00 CST jobname INFO - at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:192)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.doris.spark.rdd.ScalaValueReader$$anonfun$13.apply(ScalaValueReader.scala:207)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.doris.spark.rdd.ScalaValueReader$$anonfun$13.apply(ScalaValueReader.scala:207)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.doris.spark.rdd.ScalaValueReader.org$apache$doris$spark$rdd$ScalaValueReader$$lockClient(ScalaValueReader.scala:239)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.doris.spark.rdd.ScalaValueReader.hasNext(ScalaValueReader.scala:207)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.doris.spark.rdd.AbstractDorisRDDIterator.hasNext(AbstractDorisRDDIterator.scala:56)
23-09-2024 09:35:00 CST jobname INFO - at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
23-09-2024 09:35:00 CST jobname INFO - at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.scheduler.Task.run(Task.scala:121)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
23-09-2024 09:35:00 CST jobname INFO - at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
23-09-2024 09:35:00 CST jobname INFO - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
23-09-2024 09:35:00 CST jobname INFO - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
23-09-2024 09:35:00 CST jobname INFO - at java.lang.Thread.run(Thread.java:748)
Server-side BE error:
W0923 09:51:08.791533 1332 fragment_mgr.cpp:267] Got error while opening fragment cb4b9ac30f5dc7b5-3afac85f581b89bf, query id: 6f7d334831764735-b1e856afcb25fce4: [E-230]missed_versions is empty, spec_version 1687, max_version 1705, tablet_id 19974756
0# doris::Tablet::capture_consistent_versions(doris::Version const&, std::vector<doris::Version, std::allocator<doris::Version> >*, bool, bool) const at /root/apache-doris-2.0.14-src/be/src/common/status.h:357
1# doris::Tablet::capture_rs_readers(doris::Version const&, std::vector<doris::RowSetSplits, std::allocator<doris::RowSetSplits> >*, bool) const at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
2# doris::vectorized::NewOlapScanner::init() at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
3# doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /root/apache-doris-2.0.14-src/be/src/common/status.h:357
4# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::{lambda()#3}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
5# doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
6# execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
7# start_thread
8# clone
W0923 09:35:07.275390 5954 status.h:396] meet error status: [INTERNAL_ERROR]query_id: c677e192d93a47a5-a5664864ababcd0e, couldn't get a client for TNetworkAddress(hostname=, port=0), reason is [THRIFT_RPC_ERROR]Couldn't open transport for :0 (Could not resolve host for client socket.)
0# doris::ThriftClientImpl::open() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
1# doris::ThriftClientImpl::open_with_retry(int, int) at /root/apache-doris-2.0.14-src/be/src/common/status.h:357
2# doris::ClientCacheHelper::_create_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
3# doris::ClientCacheHelper::get_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
4# doris::ClientConnection<doris::FrontendServiceClient>::ClientConnection(doris::ClientCache<doris::FrontendServiceClient>*, doris::TNetworkAddress const&, int, doris::Status*, int) at /root/apache-doris-2.0.14-src/be/src/common/status.h:357
5# doris::FragmentMgr::coordinator_callback(doris::ReportStatusRequest const&) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
6# doris::FragmentExecState::coordinator_callback(doris::Status const&, doris::RuntimeProfile*, doris::RuntimeProfile*, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:244
7# doris::PlanFragmentExecutor::send_report(bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
8# doris::PlanFragmentExecutor::open() at /root/apache-doris-2.0.14-src/be/src/runtime/plan_fragment_executor.cpp:293
9# doris::FragmentExecState::execute() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
10# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
11# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
12# doris::ThreadPool::dispatch_thread() at /root/apache-doris-2.0.14-src/be/src/util/threadpool.cpp:0
13# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
14# start_thread
15# clone
0# doris::FragmentMgr::coordinator_callback(doris::ReportStatusRequest const&) at /root/apache-doris-2.0.14-src/be/src/common/status.h:0
1# doris::FragmentExecState::coordinator_callback(doris::Status const&, doris::RuntimeProfile*, doris::RuntimeProfile*, bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:244
2# doris::PlanFragmentExecutor::send_report(bool) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
3# doris::PlanFragmentExecutor::open() at /root/apache-doris-2.0.14-src/be/src/runtime/plan_fragment_executor.cpp:293
4# doris::FragmentExecState::execute() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:180
5# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/apache-doris-2.0.14-src/be/src/common/status.h:446
6# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
7# doris::ThreadPool::dispatch_thread() at /root/apache-doris-2.0.14-src/be/src/util/threadpool.cpp:0
8# doris::Thread::supervise_thread(void*) at /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562
9# start_thread
10# clone
Tablet status:
MySQL [(none)]> SHOW PROC '/dbs/172688/634898/partitions/19974739/634899/19974756';
+-----------+-----------+---------+-------------------+------------------+---------------+------------+---------------+----------------+----------+--------+-------+------------+--------------+----------------------+-----------------------------------------------------+-------------------------------------------------------------------+-------------------+----------------+-----------+
| ReplicaId | BackendId | Version | LstSuccessVersion | LstFailedVersion | LstFailedTime | SchemaHash | LocalDataSize | RemoteDataSize | RowCount | State | IsBad | IsUserDrop | VersionCount | PathHash | MetaUrl | CompactionStatus | CooldownReplicaId | CooldownMetaId | QueryHits |
+-----------+-----------+---------+-------------------+------------------+---------------+------------+---------------+----------------+----------+--------+-------+------------+--------------+----------------------+-----------------------------------------------------+-------------------------------------------------------------------+-------------------+----------------+-----------+
| 19974757 | 19709282 | 1742 | 1742 | -1 | NULL | 1397656088 | 175803 | 0 | 5683 | NORMAL | false | false | 6 | -6462981860052890346 | http://ip-node10:8045/api/meta/header/19974756 | http://ip-node10:8045/api/compaction/show?tablet_id=19974756 | -1 | | 0 |
| 19974758 | 17495592 | 1742 | 1742 | -1 | NULL | 1397656088 | 168617 | 0 | 5685 | NORMAL | false | false | 3 | 4034116581310510095 | http://ip-node09:8045/api/meta/header/19974756 | http://ip-node09:8045/api/compaction/show?tablet_id=19974756 | -1 | | 0 |
| 19974759 | 10005 | 1742 | 1742 | -1 | NULL | 1397656088 | 169200 | 0 | 5685 | NORMAL | false | false | 3 | -440745932939145627 | http://ip-node06:8045/api/meta/header/19974756 | http://ip-node06:8045/api/compaction/show?tablet_id=19974756 | -1 | | 0 |
+-----------+-----------+---------+-------------------+------------------+---------------+------------+---------------+----------------+----------+--------+-------+------------+--------------+----------------------+-----------------------------------------------------+-------------------------------------------------------------------+-------------------+----------------+-----------+
3 rows in set (0.00 sec)
I'm not sure whether this tablet state affects data queries.
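
To make the version numbers above easier to compare, here is a minimal Python sketch that pulls spec_version, max_version and tablet_id out of the E-230 message and sets them next to the replica versions from the SHOW PROC output. The helper name `parse_version_error` and the hard-coded values are only illustrative (copied by hand from the logs above). The script does not query Doris and is not a diagnosis of the root cause.

```python
import re

# Error text copied from the BE log above.
ERROR = ("[E-230]missed_versions is empty, spec_version 1687, "
         "max_version 1705, tablet_id 19974756")

# ReplicaId -> Version, copied by hand from the SHOW PROC output above.
REPLICA_VERSIONS = {19974757: 1742, 19974758: 1742, 19974759: 1742}

PATTERN = re.compile(r"spec_version (\d+), max_version (\d+), tablet_id (\d+)")


def parse_version_error(message):
    """Return (spec_version, max_version, tablet_id), or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


spec_version, max_version, tablet_id = parse_version_error(ERROR)
print(f"tablet {tablet_id}: requested version {spec_version}, "
      f"max version reported in the error {max_version}")
for replica_id, version in REPLICA_VERSIONS.items():
    # Positive gap: the replica has moved past the version the scan asked for.
    print(f"replica {replica_id}: version {version}, "
          f"gap to requested version {version - spec_version}")
```

With the values above, all three replicas are already at a version newer than the one the failed scan requested, which may be worth mentioning when following up on this error.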