【已解决】flink sql 执行stream load 报错 broken pipe

Viewed 56

be.INFO

I0327 10:42:30.299463 2287907 olap_server.cpp:1060] cooldown producer get tablet num: 0
I0327 10:42:33.805141 2287994 task_worker_pool.cpp:1071] successfully report TASK|host=10.1.0.83|port=9020
I0327 10:42:36.789175 2288571 heartbeat_server.cpp:61] get heartbeat from FE.host:10.1.0.83, port:9020, cluster id:323961314, counter:31321, BE start time: 1711378634096
I0327 10:42:39.661494 2287286 daemon.cpp:218] OS physical memory 31.34 GB. Process memory usage 6.26 GB, limit 25.07 GB, soft limit 22.56 GB. Sys available memory 17.58 GB, low water mark 1.60 GB, warning water mark 3.20 GB. Refresh interval memory growth 0 B
I0327 10:42:43.075098 2287286 daemon.cpp:218] OS physical memory 31.34 GB. Process memory usage 6.52 GB, limit 25.07 GB, soft limit 22.56 GB. Sys available memory 17.35 GB, low water mark 1.60 GB, warning water mark 3.20 GB. Refresh interval memory growth 0 B
I0327 10:42:44.805984 2287994 task_worker_pool.cpp:1071] successfully report TASK|host=10.1.0.83|port=9020
W0327 10:42:47.968636 2287682 file_reader.cpp:44] [INTERNAL_ERROR]cancelled: sender is gone
W0327 10:42:47.968696 2287682 scanner_scheduler.cpp:348] Scan thread read VScanner failed: [INTERNAL_ERROR]cancelled: sender is gone
W0327 10:42:47.970039 2498935 fragment_mgr.cpp:145] query_id=48483591817fa612-44050ca3b2b61caa, instance_id=48483591817fa612-44050ca3b2b61cab meet error status [INTERNAL_ERROR]cancelled: sender is gone
W0327 10:42:47.970165 2498935 fragment_mgr.cpp:481] report error status: [INTERNAL_ERROR]cancelled: sender is gone to coordinator: TNetworkAddress(hostname=10.1.0.83, port=9020), query id: 48483591817fa612-44050ca3b2b61caa, instance id: 48483591817fa612-44050ca3b2b61cab
W0327 10:42:47.970737 2498935 fragment_mgr.cpp:267] Got error while opening fragment 48483591817fa612-44050ca3b2b61cab, query id: 48483591817fa612-44050ca3b2b61caa: [INTERNAL_ERROR]cancelled: sender is gone
I0327 10:42:47.970782 2498935 plan_fragment_executor.cpp:491] PlanFragmentExecutor::cancel|query_id=TUniqueId(hi=5208471868113135122, lo=4901337666680134826)|instance_id=TUniqueId(hi=5208471868113135122, lo=4901337666680134827)|reason=3|error message=PlanFragmentExecutor open failed, reason: [INTERNAL_ERROR]cancelled: sender is gone
W0327 10:42:47.971343 2498935 status.h:371] meet error status: [CANCELLED]PlanFragmentExecutor open failed, reason: [INTERNAL_ERROR]cancelled: sender is gone
0. /root/src/doris-2.0/be/src/common/stack_trace.cpp:302: StackTrace::tryCapture() @ 0x000000000b8a51a7 in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
1. /root/src/doris-2.0/be/src/common/stack_trace.h:0: doris::get_stack_trace[abi:cxx11]() @ 0x000000000b8a377d in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
2. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187: doris::Status doris::Status::Error<true>(int, std::basic_string_view<char, std::char_traits<char> >) @ 0x000000000ad9807b in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
3. /root/src/doris-2.0/be/src/common/status.h:0: doris::RuntimeState::set_is_cancelled(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) @ 0x000000000b715611 in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
4. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187: doris::PlanFragmentExecutor::cancel(doris::PPlanFragmentCancelReason const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) @ 0x000000000b7138f7 in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
5. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1291: doris::FragmentExecState::cancel(doris::PPlanFragmentCancelReason const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) @ 0x000000000b699e8b in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
6. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360: doris::FragmentExecState::execute() @ 0x000000000b6997be in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
7. /root/src/doris-2.0/be/src/common/status.h:420: doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) @ 0x000000000b69e0bc in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
8. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701: std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) @ 0x000000000b6aa039 in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
9. /root/src/doris-2.0/be/src/util/threadpool.cpp:0: doris::ThreadPool::dispatch_thread() @ 0x000000000b8e193f in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
10. /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562: doris::Thread::supervise_thread(void*) @ 0x000000000b8d79fc in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
11. ? @ 0x00007f9564094ac3 in ?
12. ? @ 0x00007f9564126850 in ?
I0327 10:42:47.970804 2498935 runtime_state.h:167] task is cancelled, st = [CANCELLED]PlanFragmentExecutor open failed, reason: [INTERNAL_ERROR]cancelled: sender is gone
0. /root/src/doris-2.0/be/src/common/stack_trace.cpp:302: StackTrace::tryCapture() @ 0x000000000b8a51a7 in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
1. /root/src/doris-2.0/be/src/common/stack_trace.h:0: doris::get_stack_trace[abi:cxx11]() @ 0x000000000b8a377d in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
2. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187: doris::Status doris::Status::Error<true>(int, std::basic_string_view<char, std::char_traits<char> >) @ 0x000000000ad9807b in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
3. /root/src/doris-2.0/be/src/common/status.h:0: doris::RuntimeState::set_is_cancelled(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) @ 0x000000000b715611 in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
4. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187: doris::PlanFragmentExecutor::cancel(doris::PPlanFragmentCancelReason const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) @ 0x000000000b7138f7 in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
5. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1291: doris::FragmentExecState::cancel(doris::PPlanFragmentCancelReason const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) @ 0x000000000b699e8b in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
6. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360: doris::FragmentExecState::execute() @ 0x000000000b6997be in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
7. /root/src/doris-2.0/be/src/common/status.h:420: doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) @ 0x000000000b69e0bc in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
8. /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701: std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) @ 0x000000000b6aa039 in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
9. /root/src/doris-2.0/be/src/util/threadpool.cpp:0: doris::ThreadPool::dispatch_thread() @ 0x000000000b8e193f in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
10. /var/local/ldb-toolchain/bin/../usr/include/pthread.h:562: doris::Thread::supervise_thread(void*) @ 0x000000000b8d79fc in /home/software/apache-doris-2.0.1-bin-x64/be/lib/doris_be
11. ? @ 0x00007f9564094ac3 in ?
12. ? @ 0x00007f9564126850 in ?
I0327 10:42:47.971446 2498935 exec_node.cpp:200] fragment_instance_id=48483591817fa612-44050ca3b2b61cab closed
W0327 10:42:47.977264 2498935 vtablet_sink.cpp:635] cancel node channel VNodeChannel[70603-10167], load_id=48483591817fa612-44050ca3b2b61caa, txn_id=18142, node=10.1.0.83:8060, error message: [INTERNAL_ERROR]cancelled: sender is gone
I0327 10:42:47.977408 2288506 vtablet_sink.cpp:1121] all node channels are stopped(maybe finished/offending/cancelled), sender thread exit. 48483591817fa612-44050ca3b2b61caa
I0327 10:42:47.977453 2498935 vtablet_sink.cpp:1403] close olap table sink. load_id=48483591817fa612-44050ca3b2b61caa, txn_id=18142, canceled all node channels due to error: false
I0327 10:42:47.977481 2498935 plan_fragment_executor.cpp:564] Close() fragment_instance_id=48483591817fa612-44050ca3b2b61cab
W0327 10:42:47.977502 2498935 stream_load_executor.cpp:107] fragment execute failed, query_id=48483591817fa612-44050ca3b2b61caa, err_msg=[INTERNAL_ERROR]cancelled: sender is gone, id=48483591817fa612-44050ca3b2b61caa, job_id=-1, txn_id=18142, label=TASK-1-5_1538-1365_94
78_0_1, elapse(s)=96
I0327 10:42:48.015709 2288244 load_channel_mgr.cpp:228] load channel has been cancelled: 48483591817fa612-44050ca3b2b61caa
I0327 10:42:48.015729 2288244 load_channel.cpp:46] load channel removed. mem peak usage=3646784817, info=label: LoadChannel#senderIp=10.1.0.83#loadID=48483591817fa612-44050ca3b2b61caa; consumption: 1608735555; peak_consumption: 3646784817; , load_id=48483591817fa612-440
50ca3b2b61caa, is high priority=0, sender_ip=10.1.0.83
I0327 10:42:48.015755 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148621.104722319.7645af80a5b8c70c-32decbf464ad34a3
I0327 10:42:48.015990 2498935 query_context.h:69] Deregister query/load memory tracker, queryId=48483591817fa612-44050ca3b2b61caa, Limit=2.00 GB, CurrUsed=523.12 KB, PeakUsed=124.78 MB
I0327 10:42:48.023365 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148619.104722319.b2480e43c16fed74-e3d64b36730098a0
I0327 10:42:48.025802 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148617.104722319.7e4159ddfe243bd8-1a767209a2a8e9a2
I0327 10:42:48.027663 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148615.104722319.4249238f11f73f6b-33a3972d4c795884
I0327 10:42:48.029505 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148613.104722319.7f49e182877da1cd-22fc60a8b59779b6
I0327 10:42:48.031769 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148611.104722319.ae4377274ea661c4-40128da3f3ec42bf
I0327 10:42:48.033775 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148609.104722319.8a4cfdee81b72d8d-8428a7763a13d983
I0327 10:42:48.035663 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148607.104722319.ec49c4ce3cba0c53-0a6bf8c369b68dac
I0327 10:42:48.037559 2288244 txn_manager.cpp:469] rollback transaction from engine successfully. partition_id: 148602, transaction_id: 18142, tablet: 148605.104722319.fa48173ef99786a5-ab3a8c817b178ab0

2 Answers

stream load 导入数据,在一批次数据量较大时,可能会报错 Broken Pipe
除了 Broken Pipe 外,还可能出现一些其他的奇怪的错误。

这个情况通常出现在开启httpv2后。因为httpv2是使用spring boot实现的http 服务,并且使用tomcat作为默认内置容器。但是tomcat对307转发的处理似乎有些问题,所以后面将内置容器修改为了jetty。此外,在java程序中的 apache http client的版本需要使用4.5.13以后的版本。之前的版本,对转发的处理也存在一些问题。

所以这个问题可以有两种解决方式:

1.关闭httpv2

在fe.conf中添加 enable_http_server_v2=false后重启FE。但是这样无法再使用新版UI界面,并且之后的一些基于httpv2的新接口也无法使用。(正常的导入查询不受影响)。

2.升级

可以升级到 Doris 0.15 及之后的版本,已修复这个问题。

具体参考官方文档数据操作问题Q7

  1. 我的版本是2.0.1
  2. 我个人的解决方式是通过设置flink connector 的sink mode设置为批模式 解决的。