数据库经常HANG

Viewed 35

表现为前端连接无响应,监控显示FE无法连接,持续时间一般1-2分钟。日志如下:

2025-02-28 14:14:26,623 WARN (mysql-nio-pool-81734|3589026) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 92a22c83fa924935-828102fc2a7bba92, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81734|3589026) [Coordinator.getNext():1445] Query 92a22c83fa924935-828102fc2a7bba92 failed: (10.132.2.113)[CANCELLED]failed to send brpc when exchange, error=Resource temporarily unavailable, error_text=[E11]Resource temporarily unavailable @10.132.2.117:8060, client: 10.132.2.113, latency = 156867
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81734|3589026) [StmtExecutor.executeAndSendResult():1963] cancel fragment query_id:92a22c83fa924935-828102fc2a7bba92 cause errCode = 2, detailMessage = (10.132.2.113)[CANCELLED]failed to send brpc when exchange, error=Resource temporarily unavailable, error_text=[E11]Resource temporarily unavailable @10.132.2.117:8060, client: 10.132.2.113, latency = 156867
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81734|3589026) [Coordinator.cancel():1563] Query 92a22c83fa924935-828102fc2a7bba92 already in abnormal status Status [errorCode=CANCELLED, errorMsg=(10.132.2.113)[CANCELLED]failed to send brpc when exchange, error=Resource temporarily unavailable, error_text=[E11]Resource temporarily unavailable @10.132.2.117:8060, client: 10.132.2.113, latency = 156867], but received cancel again,so that send cancel to BE again
java.lang.Exception: cancel failed
at org.apache.doris.qe.Coordinator.cancel(Coordinator.java:1565) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeAndSendResult(StmtExecutor.java:1964) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleCacheStmt(StmtExecutor.java:1731) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1804) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryWithRetry(StmtExecutor.java:851) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:805) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:556) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:532) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:337) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:218) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:284) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:312) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:479) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_141]
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81734|3589026) [Coordinator.cancel():1570] Cancel execution of query 92a22c83fa924935-828102fc2a7bba92, this is a outside invoke, cancelReason cancel fragment query_id:92a22c83fa924935-828102fc2a7bba92 cause errCode = 2, detailMessage = (10.132.2.113)[CANCELLED]failed to send brpc when exchange, error=Resource temporarily unavailable, error_text=[E11]Resource temporarily unavailable @10.132.2.117:8060, client: 10.132.2.113, latency = 156867
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81734|3589026) [ResultReceiver.updateCancelReason():203] Query 92a22c83fa924935-828102fc2a7bba92 already has cancel reason: (10.132.2.113)[CANCELLED]failed to send brpc when exchange, error=Resource temporarily unavailable, error_text=[E11]Resource temporarily unavailable @10.132.2.117:8060, client: 10.132.2.113, latency = 156867, new reason cancel fragment query_id:92a22c83fa924935-828102fc2a7bba92 cause errCode = 2, detailMessage = (10.132.2.113)[CANCELLED]failed to send brpc when exchange, error=Resource temporarily unavailable, error_text=[E11]Resource temporarily unavailable @10.132.2.117:8060, client: 10.132.2.113, latency = 156867 will be ignored
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81734|3589026) [ResultReceiver.cancel():228] ResultReceiver of query 92a22c83fa924935-828102fc2a7bba92 cancel failed, typically means the future is finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,624 WARN (mysql-nio-pool-81711|3588549) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 535e4ebf769e4979-820828af44b95a16, cancel after finished
2025-02-28 14:14:26,625 WARN (mysql-nio-pool-81734|3589026) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 92a22c83fa924935-828102fc2a7bba92, cancel after finished
2025-02-28 14:14:26,625 WARN (mysql-nio-pool-81734|3589026) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 92a22c83fa924935-828102fc2a7bba92, cancel after finished
2025-02-28 14:14:26,625 WARN (mysql-nio-pool-81734|3589026) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 92a22c83fa924935-828102fc2a7bba92, cancel after finished
2025-02-28 14:14:26,625 WARN (mysql-nio-pool-81734|3589026) [Coordinator$PipelineExecContext.cancelFragmentInstance():3600] Query 92a22c83fa924935-828102fc2a7bba92, cancel after finished
2025-02-28 14:14:26,625 WARN (backend-rpc-callback-13|1497) [Coordinator$PipelineExecContext$1.onSuccess():3506] Failed to cancel query 59cdb4aebf564eb7-a71aec4369e8c188 instance initiated=true done=false backend: 10041,fragment id=F21, reason: without status
2025-02-28 14:14:26,626 WARN (mysql-nio-pool-81734|3589026) [ConnectProcessor.handleQueryException():464] Process one query failed because.
org.apache.doris.common.UserException: errCode = 2, detailMessage = (10.132.2.113)[CANCELLED]failed to send brpc when exchange, error=Resource temporarily unavailable, error_text=[E11]Resource temporarily unavailable @10.132.2.117:8060, client: 10.132.2.113, latency = 156867
at org.apache.doris.qe.Coordinator.getNext(Coordinator.java:1446) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeAndSendResult(StmtExecutor.java:1882) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleCacheStmt(StmtExecutor.java:1731) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1804) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryWithRetry(StmtExecutor.java:851) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:805) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:556) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:532) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:337) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:218) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:284) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:312) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:479) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_141]
2025-02-28 14:14:26,627 WARN (backend-rpc-callback-8|1492) [Coordinator$PipelineExecContext$1.onSuccess():3506] Failed to cancel query 535e4ebf769e4979-820828af44b95a16 instance initiated=true done=false backend: 10041,fragment id=F15, reason: without status
2025-02-28 14:14:26,627 WARN (backend-rpc-callback-26|1910) [Coordinator$PipelineExecContext$1.onSuccess():3506] Failed to cancel query 92a22c83fa924935-828102fc2a7bba92 instance initiated=true done=false backend: 10041,fragment id=F06, reason: without status
2025-02-28 14:14:26,627 WARN (mysql-nio-pool-81711|3588549) [ConnectProcessor.handleQueryException():464] Process one query failed because.
org.apache.doris.common.UserException: errCode = 2, detailMessage = (10.132.2.113)[CANCELLED]failed to send brpc when exchange, error=Resource temporarily unavailable, error_text=[E11]Resource temporarily unavailable @10.132.2.117:8060, client: 10.132.2.113, latency = 166430
at org.apache.doris.qe.Coordinator.getNext(Coordinator.java:1446) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeAndSendResult(StmtExecutor.java:1882) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleCacheStmt(StmtExecutor.java:1731) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1804) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryWithRetry(StmtExecutor.java:851) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:805) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:556) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:532) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:337) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:218) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:284) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:312) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:479) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_141]
2025-02-28 14:14:26,628 WARN (backend-rpc-callback-27|2497) [Coordinator$PipelineExecContext$1.onSuccess():3506] Failed to cancel query 92a22c83fa924935-828102fc2a7bba92 instance initiated=true done=false backend: 10041,fragment id=F05, reason: without status
2025-02-28 14:14:26,629 WARN (backend-rpc-callback-3|1486) [Coordinator$PipelineExecContext$1.onSuccess():3506] Failed to cancel query 59cdb4aebf564eb7-a71aec4369e8c188 instance initiated=true done=false backend: 10041,fragment id=F22, reason: without status
2025-02-28 14:14:26,631 WARN (backend-rpc-callback-28|2498) [Coordinator$PipelineExecContext$1.onSuccess():3506] Failed to cancel query 92a22c83fa924935-828102fc2a7bba92 instance initiated=true done=false backend: 10041,fragment id=F07, reason: without status
2025-02-28 14:14:26,633 WARN (backend-rpc-callback-12|1495) [Coordinator$PipelineExecContext$1.onSuccess():3506] Failed to cancel query 535e4ebf769e4979-820828af44b95a16 instance initiated=true done=false backend: 10041,fragment id=F18, reason: without status

1 Answers

已设置set global parallel_pipeline_task_num = 5;观察中。。。。。