FE has gone down several times and recovers after a restart. What could be causing this?

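This is a single-node deployment with no cluster. The FE has gone down several times, and each time a restart brings it back. What is likely causing this?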

2024-12-26 07:11:28,472 WARN (mysql-nio-pool-2360|217) [Coordinator.cancel():1308] Query 2092544de0bc4ed0-83f89ce05c52e83d already in abnormal status Status [errorCode=THRIFT_RPC_ERROR, errorMsg=(10.1.1.76)[THRIFT_RPC_ERROR]failed to call frontend service, FE address=192.168.122.1:9020, reason: THRIFT_EAGAIN (timed out)], but received cancel again,so that send cancel to BE again
java.lang.Exception: cancel failed
at org.apache.doris.qe.Coordinator.cancel(Coordinator.java:1310) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeAndSendResult(StmtExecutor.java:2001) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1853) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryWithRetry(StmtExecutor.java:874) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:811) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:607) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.queryRetry(StmtExecutor.java:557) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:547) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:397) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:238) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:194) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:222) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:281) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
2024-12-26 07:11:28,472 WARN (mysql-nio-pool-2360|217) [Coordinator.cancel():1315] Cancel execution of query 2092544de0bc4ed0-83f89ce05c52e83d, this is a outside invoke, cancelReason Status [errorCode=INTERNAL_ERROR, errorMsg=cancel fragment query_id:2092544de0bc4ed0-83f89ce05c52e83d cause (10.1.1.76)[THRIFT_RPC_ERROR]failed to call frontend service, FE address=192.168.122.1:9020, reason: THRIFT_EAGAIN (timed out)]
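
To see whether these RPC timeouts cluster around the outages, one option is to count the log lines mentioning THRIFT_EAGAIN per minute. The following is a minimal sketch, assuming FE log lines start with the "YYYY-MM-DD HH:MM:SS,mmm" timestamp shown in the excerpt; `timeouts_per_minute` and the script name are hypothetical. Note that a single failed query can produce more than one matching line (as it does above), so the counts are a rough signal rather than a count of failed queries.

```python
import re
import sys
from collections import Counter

# Assumes FE log lines start with "YYYY-MM-DD HH:MM:SS,mmm", as in the excerpt above.
TS_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}")


def timeouts_per_minute(lines):
    """Count log lines that mention THRIFT_EAGAIN, bucketed by minute."""
    counts = Counter()
    for line in lines:
        if "THRIFT_EAGAIN" not in line:
            continue
        match = TS_PREFIX.match(line)
        if match:  # skip continuation lines that carry no timestamp
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 count_timeouts.py /path/to/fe.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for minute, n in sorted(timeouts_per_minute(log).items()):
            print(minute, n)
```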

1 Answer

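This is likely caused by resource pressure on the machine. Check CPU, memory, and I/O utilization at the times the FE goes down. (PS: mixed deployment, i.e. co-locating FE, BE, or other services on the same host, is not recommended in production.)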
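
Standard tools such as `top`, `vmstat`, or `iostat` can show CPU, memory, and I/O usage. If nothing is already recording these over time, the following is a minimal Linux-only sketch that reads `/proc/stat` and `/proc/meminfo` and prints one timestamped sample per interval, so spikes can be lined up against the timestamps in the FE log. All function names are hypothetical, and iowait is only a coarse proxy for I/O pressure.

```python
import time
from datetime import datetime


def parse_cpu_line(stat_text):
    """Return user..steal jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = stat_text.splitlines()[0].split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line first")
    # guest/guest_nice (fields 9-10) are already counted in user/nice, so keep the first 8.
    return [int(v) for v in fields[1:9]]


def cpu_usage(before, after):
    """Return (busy %, iowait %) between two snapshots from parse_cpu_line."""
    delta = [a - b for a, b in zip(after, before)]
    total = sum(delta)
    if total <= 0:
        return 0.0, 0.0
    idle, iowait = delta[3], delta[4]
    busy = 100.0 * (total - idle - iowait) / total
    return busy, 100.0 * iowait / total


def mem_available_pct(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal from /proc/meminfo text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key] = int(rest.split()[0])  # values are in kB
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


def sample_forever(interval=5.0):
    """Print one timestamped line per interval until interrupted (Linux only)."""
    with open("/proc/stat") as f:
        prev = parse_cpu_line(f.read())
    while True:
        time.sleep(interval)
        with open("/proc/stat") as f:
            cur = parse_cpu_line(f.read())
        with open("/proc/meminfo") as f:
            mem = mem_available_pct(f.read())
        busy, iowait = cpu_usage(prev, cur)
        # Same date/time layout as the FE log, so samples can be matched by time.
        print(f"{datetime.now():%Y-%m-%d %H:%M:%S} cpu_busy={busy:.1f}% "
              f"iowait={iowait:.1f}% mem_available={mem:.1f}%", flush=True)
        prev = cur
```

Running `sample_forever()` in a background process and redirecting its output to a file keeps a history that can be checked the next time the FE drops.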