Stream Load returns 503 when using the Spark Doris Connector


I am currently using the Spark Doris Connector to import data into Doris.

While it runs, some Spark tasks fail with the following Stream Load response:

2024-06-18 17:21:36,004 WARN [task-result-getter-2] scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 18) (ad-nadp-dn08.prod.hl.ad.local executor 10): org.apache.doris.spark.exception.StreamLoadException: stream load error, http status:503, response:StreamLoadResponse(503,Service Unavailable,
503 Service Unavailable
No server is available to handle this request.
)
    at org.apache.doris.spark.load.StreamLoader.handleStreamLoadResponse(StreamLoader.scala:492)
    at org.apache.doris.spark.load.StreamLoader.$anonfun$load$1(StreamLoader.scala:102)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at org.apache.doris.spark.load.StreamLoader.load(StreamLoader.scala:99)
    at org.apache.doris.spark.writer.DorisWriter.$anonfun$write$1(DorisWriter.scala:79)
    at org.apache.doris.spark.writer.DorisWriter.$anonfun$doWrite$4(DorisWriter.scala:98)
    at scala.util.Try$.apply(Try.scala:213)
    at org.apache.doris.spark.sql.Utils$.retry(Utils.scala:182)
    at org.apache.doris.spark.writer.DorisWriter.$anonfun$doWrite$3(DorisWriter.scala:97)
    at org.apache.doris.spark.writer.DorisWriter.$anonfun$doWrite$1(DorisWriter.scala:98)
    at org.apache.doris.spark.writer.DorisWriter.$anonfun$doWrite$1$adapted(DorisWriter.scala:93)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1011)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1011)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

The error log on the BE node shows the following:
W0619 07:09:44.932955 1519 status.h:383] meet error status: [INTERNAL_ERROR]cancelled: sender is gone
  0. /root/src/doris-2.0/be/src/common/stack_trace.cpp:302: StackTrace::tryCapture() @ 0x000000000ba1f197 in /opt/apache-doris/be/lib/doris_be
  1. /root/src/doris-2.0/be/src/common/stack_trace.h:0: doris::get_stack_trace[abi:cxx11]() @ 0x000000000ba1d72d in /opt/apache-doris/be/lib/doris_be
  2. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:173: doris::Status doris::Status::Error<true, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&>(int, std::basic_string_view<char, std::char_traits >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&) @ 0x000000000aeca5f2 in /opt/apache-doris/be/lib/doris_be
  3. /root/src/doris-2.0/be/src/common/status.h:0: doris::io::StreamLoadPipe::read_at_impl(unsigned long, doris::Slice, unsigned long*, doris::io::IOContext const*) @ 0x000000000b837b7d in /opt/apache-doris/be/lib/doris_be
  4. /root/src/doris-2.0/be/src/io/fs/stream_load_pipe.cpp:0: non-virtual thunk to doris::io::StreamLoadPipe::read_at_impl(unsigned long, doris::Slice, unsigned long*, doris::io::IOContext const*) @ 0x000000000b837e62 in /opt/apache-doris/be/lib/doris_be
  5. /root/src/doris-2.0/be/src/common/status.h:432: doris::io::FileReader::read_at(unsigned long, doris::Slice, unsigned long*, doris::io::IOContext const*) @ 0x000000000aebc00d in /opt/apache-doris/be/lib/doris_be
  6. /root/src/doris-2.0/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp:373: doris::NewPlainTextLineReader::read_line(unsigned char const**, unsigned long*, bool*, doris::io::IOContext const*) @ 0x000000000dcd2a8f in /opt/apache-doris/be/lib/doris_be
  7. /root/src/doris-2.0/be/src/common/status.h:432: doris::vectorized::CsvReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) @ 0x000000000dcbf603 in /opt/apache-doris/be/lib/doris_be
  8. /root/src/doris-2.0/be/src/common/status.h:432: doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) @ 0x000000000f1c9d3d in /opt/apache-doris/be/lib/doris_be
  9. /root/src/doris-2.0/be/src/vec/exec/scan/vscanner.cpp:0: doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) @ 0x000000000f260c64 in /opt/apache-doris/be/lib/doris_be
  10. /root/src/doris-2.0/be/src/common/status.h:348: doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr) @ 0x000000000f1be1e3 in /opt/apache-doris/be/lib/doris_be
  11. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701: std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::{lambda()#3}>::_M_invoke(std::_Any_data const&) @ 0x000000000f1bf431 in /opt/apache-doris/be/lib/doris_be
  12. /root/src/doris-2.0/be/src/util/threadpool.cpp:0: doris::ThreadPool::dispatch_thread() @ 0x000000000ba5bdaf in /opt/apache-doris/be/lib/doris_be
  13. /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562: doris::Thread::supervise_thread(void*) @ 0x000000000ba51d3c in /opt/apache-doris/be/lib/doris_be
  14. start_thread @ 0x0000000000007ea5 in /usr/lib64/libpthread-2.17.so
  15. clone @ 0x00000000000feb0d in /usr/lib64/libc-2.17.so

Doris version: 2.0.2
spark-doris-connector commit: 6f50c3418d1226380ea39750d232b63a83ac6465
BE nodes: 128 GB memory each
CPU/MEM/IO during the period when the errors occurred

1 Answer

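"cancelled: sender is gone" usually means the client (the side sending the Stream Load data) actively closed the connection. Check the network between the client and the server, for example by using ping to look at the latency.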
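ICMP ping is sometimes blocked between cluster zones, and it does not exercise the HTTP port that Stream Load actually uses. As an alternative (a swapped-in technique, not what the answer literally suggests), here is a minimal sketch that times TCP connection setup from a Spark executor host to the Doris endpoint. The function name `tcp_connect_latency` and its parameters are hypothetical; the host and port you pass are assumptions about your deployment (in a stock install the FE HTTP port is 8030 and the BE webserver port is 8040, but check your `fe.conf`/`be.conf`, and use the load balancer address instead if the connector goes through one).

```python
import socket
import statistics
import time


def tcp_connect_latency(host: str, port: int, attempts: int = 5, timeout: float = 2.0) -> dict:
    """Hypothetical helper: time the TCP handshake to host:port.

    Returns the number of successful and failed attempts and, if any
    attempt succeeded, min/avg/max connect time in milliseconds.
    """
    samples = []
    failures = 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # create_connection returns once the TCP handshake completes;
            # the with-block closes the socket immediately afterwards.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Covers timeouts, refused connections and DNS failures.
            failures += 1
            continue
        samples.append((time.perf_counter() - start) * 1000.0)

    result = {"ok": len(samples), "failed": failures}
    if samples:
        result["min_ms"] = min(samples)
        result["avg_ms"] = statistics.mean(samples)
        result["max_ms"] = max(samples)
    return result
```

Run it from the executor host that reported the failure, e.g. `tcp_connect_latency("<fe-or-lb-host>", 8030, attempts=20)`. Occasional failures or large spikes in `max_ms` point at the network path rather than at Doris itself.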