RpcException: send fragments failed


Doris version: 2.1.0

After setting up the cluster, I connect over JDBC (Navicat, MyBatis). Every so often the following error appears, and I don't know what is causing it.

fe.log

Process one query failed because unknown reason:
org.apache.doris.rpc.RpcException: send fragments failed. io.grpc.StatusRuntimeException: UNAVAILABLE: io exception, host: 172.21.172.130
at org.apache.doris.qe.Coordinator.waitPipelineRpc(Coordinator.java:1160) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.Coordinator.sendPipelineCtx(Coordinator.java:991) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.Coordinator.execInternal(Coordinator.java:701) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.Coordinator.exec(Coordinator.java:627) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.sendResult(StmtExecutor.java:1582) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1556) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryWithRetry(StmtExecutor.java:708) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:660) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:493) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:472) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:265) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:183) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:176) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:205) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:258) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_321]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_321]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_321]
Caused by: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:592) ~[guava-32.1.2-jre.jar:?]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:467) ~[guava-32.1.2-jre.jar:?]
at org.apache.doris.qe.Coordinator.waitPipelineRpc(Coordinator.java:1116) ~[doris-fe.jar:1.2-SNAPSHOT]
... 18 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:537) ~[grpc-api-1.60.1.jar:1.60.1]
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:538) ~[grpc-stub-1.60.1.jar:1.60.1]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:574) ~[grpc-core-1.60.1.jar:1.60.1]
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:72) ~[grpc-core-1.60.1.jar:1.60.1]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:742) ~[grpc-core-1.60.1.jar:1.60.1]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723) ~[grpc-core-1.60.1.jar:1.60.1]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.60.1.jar:1.60.1]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.60.1.jar:1.60.1]
... 3 more
Caused by: io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection timed out
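
When reading a trace like this, the last "Caused by" line is usually the most informative. Here it shows a network-level "Connection timed out" under the gRPC UNAVAILABLE status. Below is a minimal sketch for pulling the cause chain out of an fe.log excerpt. The function names `cause_chain` and `root_cause` are made up for illustration and are not part of Doris.

```python
import re

# Matches either the first exception line ("pkg.Class: message")
# or a nested "Caused by: pkg.Class: message" line.
CAUSE_RE = re.compile(
    r"^(?:Caused by: )?([\w.$]+(?:Exception|Error)[\w$]*): (.*)$"
)


def cause_chain(log_text):
    """Return [(exception_class, message), ...], outermost first."""
    chain = []
    for line in log_text.splitlines():
        line = line.strip()
        # Skip stack frames and "... N more" markers.
        if line.startswith("at ") or line.startswith("..."):
            continue
        match = CAUSE_RE.match(line)
        if match:
            chain.append((match.group(1), match.group(2)))
    return chain


def root_cause(log_text):
    """Return the innermost (class, message) pair, or None if nothing matched."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None
```

Applied to the excerpt above, `root_cause` would return the `Errors$NativeIoException` entry with its "Connection timed out" message, which suggests that the target host stopped responding rather than that the query itself was wrong.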

2 Answers

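Was the cluster under resource load (CPU/memory/IO) when the exception occurred?
Do you have Doris Manager installed so you can check the cluster's resource trends?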

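The original Doris cluster was a development environment: 3 FE Followers and 3 BEs.
FE configuration:
2 machines with 8 cores and 16 GB
1 machine with 2 cores and 4 GB, which also ran several other applications; counting buffer memory, its memory usage reached 100%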

I later replaced this with 1 FE Follower and 1 FE Observer, and the problem went away.

After redeploying, I added Prometheus monitoring. CPU and memory have stayed stable, and the problem has not come back.

Cause:
I suspect the FE was short on memory, which led to frequent GC.
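
One rough way to check a GC suspicion like this is to sample the JVM's cumulative GC time twice and see what fraction of wall-clock time went to GC. The sketch below assumes you already have those two readings, for example from the GCT column of `jstat -gcutil <pid>`. The function name and the 10% threshold are illustrative assumptions, not Doris recommendations.

```python
def gc_overhead(gc_seconds_start, gc_seconds_end, interval_seconds):
    """Fraction of the sampling interval that the JVM spent in GC.

    gc_seconds_start / gc_seconds_end: cumulative GC time (seconds)
    read at the beginning and end of the interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = gc_seconds_end - gc_seconds_start
    if delta < 0:
        # A cumulative counter only shrinks if the process restarted.
        raise ValueError("GC counter went backwards; was the FE restarted?")
    return delta / interval_seconds


# Hypothetical threshold: treat more than 10% of time in GC as suspicious.
GC_OVERHEAD_WARN = 0.10


def gc_looks_heavy(gc_seconds_start, gc_seconds_end, interval_seconds):
    """True if the GC share of the interval exceeds the assumed threshold."""
    overhead = gc_overhead(gc_seconds_start, gc_seconds_end, interval_seconds)
    return overhead > GC_OVERHEAD_WARN
```

For instance, readings of 12.0 s and 30.0 s taken 60 s apart give an overhead of 0.3, which would be flagged under this assumed threshold.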

When setting up a Doris environment, it is still worth adding Prometheus and giving the servers sufficient resources.
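
As a small illustration of what such monitoring looks at, here is a sketch that parses Prometheus text-format output and computes a heap usage ratio. It assumes you have already fetched the metrics text from the FE (the FE serves Prometheus-format metrics over its HTTP port, but check your own deployment for the exact URL). The metric keys used as defaults are assumptions and should be verified against the actual output.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {"name{labels}": value}.

    Simplification: assumes each sample line is "<name{labels}> <value>"
    with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        key, _, value = line.rpartition(" ")
        try:
            samples[key] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def heap_usage_ratio(
    samples,
    used_key='jvm_heap_size_bytes{type="used"}',  # assumed metric key
    max_key='jvm_heap_size_bytes{type="max"}',    # assumed metric key
):
    """Return used/max heap as a float, or None if either sample is missing."""
    used = samples.get(used_key)
    maximum = samples.get(max_key)
    if used is None or maximum is None or maximum <= 0:
        return None
    return used / maximum
```

A ratio that stays close to 1.0 over time would be consistent with the memory shortage suspected above, though a dashboard with history is a better tool than a one-off check.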