Doris 1.2.7: high availability failover not working


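My cluster has three followers and two observers.
One day the master died, and the cluster automatically elected a new master.
However, when I sent requests to an observer again, they failed.
The task reported this error: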
org.apache.doris.qe.MasterOpExecutor$ForwardToMasterException: forward to master FE TNetworkAddress(hostname:132.25.169.166, port:920), statement id: 15359978
So I checked the log of the dead node:
2024-11-23 03:04:21,900 WARN (thrift-server-pool-971|211268) [Table.tryWriteLock():184] Failed to try table stg_d_mat_zw_store_export_record's write lock. timeout 10000 MILLISECONDS. Current owner: Thread[thrift-server-pool-947,5,main]
2024-11-23 03:04:21,900 WARN (thrift-server-pool-971|211268) [StmtExecutor.handleInsertStmt():1536] handle insert stmt fail: insert_582943aac15f4014_a528353935e57ebb
org.apache.doris.common.UserException: errCode = 2, detailMessage = get tableList write lock timeout, tableList=(Table [id=72432819, name=stg_d_mat_zw_store_export_record, type=OLAP])
at org.apache.doris.transaction.GlobalTransactionMgr.commitAndPublishTransaction(GlobalTransactionMgr.java:271) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.transaction.GlobalTransactionMgr.commitAndPublishTransaction(GlobalTransactionMgr.java:260) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleInsertStmt(StmtExecutor.java:1524) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe
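It may have died under excessive load. Whatever the cause, a new master had already been elected by then, and the cluster was not down.
Strangely, the SQL tasks I submitted on the observer still failed.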

So I restarted the old master. Its log is as follows:
2024-11-23 11:40:23,972 WARN (UNKNOWN 132.225.169.166_9010_1677118892185(-1)|1) [Env.notifyNewFETypeTransfer():2377] notify new FE type transfer: UNKNOWN
2024-11-23 11:40:24,010 WARN (RepNode 132.225.169.166_9010_1677118892185(-1)|78) [Env.notifyNewFETypeTransfer():2377] notify new FE type transfer: FOLLOWER
2024-11-23 11:40:24,045 WARN (replayer|93) [Env.setCanRead():2348] meta out of date. current time: 1732333224044, synchronized time: 0, has log: false, fe type: UNKNOWN
2024-11-23 11:40:30,395 WARN (replayer|93) [Env.replayJournal():2532] replay journal cost too much time: 1337 replayedJournalId: 524684958
2024-11-23 11:40:30,985 WARN (thrift-server-pool-2|103) [StmtExecutor.execute():589] execute Exception. stmt[34, fd3a4e60757f4e6d-bf8c7fea0c967b22]
org.apache.doris.common.UserException: errCode = 2, detailMessage = The statement has been forwarded to master FE(132.225.169.166) and failed to execute because Master FE is not ready. You may need to check FE's status
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:469) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.proxyExecute(ConnectProcessor.java:644) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.service.FrontendServiceImpl.forward(FrontendServiceImpl.java:553) ~[doris-fe.jar:1.2-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_202]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_202]
at org.apache.doris.service.FeServer.lambda$start$0(FeServer.java:59) ~[doris-fe.jar:1.2-SNAPSHOT]
at com.sun.proxy.$Proxy31.forward(Unknown Source) ~[?:?]
at org.apache.doris.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:1847) ~[fe-common-1.2-SNAPSHOT.jar:1.2-SNAPSHOT]
at org.apache.doris.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:1827) ~[fe-common-1.2-SNAPSHOT.jar:1.2-SNAPSHOT]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[thrift-shade-0.13-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[thrift-shade-0.13-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) ~[thrift-shade-0.13-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
2024-11-23 11:40:30,985 WARN (thrift-server-pool-5|106) [StmtExecutor.execute():589] execute Exception. stmt[35, baaeba60945a4fb6-af35ac9de0ea4743]
org.apache.doris.common.UserException: errCode = 2, detailMessage = The statement has been forwarded to master FE(132.225.169.166) and failed to execute because Master FE is not ready. You may need to check FE's status
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:469) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.proxyExecute(ConnectProcessor.java:644) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.service.FrontendServiceImpl.forward(FrontendServiceImpl.java:553) ~[doris-fe.jar:1.2-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_202]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_202]
at org.apache.doris.service.FeServer.lambda$start$0(FeServer.java:59) ~[doris-fe.jar:1.2-SNAPSHOT]
at com.sun.proxy.$Proxy31.forward(Unknown Source) ~[?:?]
at org.apache.doris.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:1847) ~[fe-common-1.2-SNAPSHOT.jar:1.2-SNAPSHOT]
at org.apache.doris.thrift.FrontendService$Processor$forward.getResult(FrontendService.java:1827) ~[fe-common-1.2-SNAPSHOT.jar:1.2-SNAPSHOT]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[thrift-shade-0.13-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[thrift-shade-0.13-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313) ~[thrift-shade-0.13-1.0.2-SNAPSHOT.jar:1.0.2-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]

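After the restart, the old master had become a follower.
But the observer apparently kept forwarding SQL requests to it, which produced the error "because Master FE is not ready. You may need to check FE's status".
In the end I restarted the observer, and the tasks ran normally again.
In this situation, is the observer unable to recognize the new master on its own?
Has anyone run into this? Any help would be appreciated.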

1 Answer

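By design, the observer should automatically establish mutual trust with the new master FE. I'd suggest upgrading to >=2.0.15 or >=2.1.7 and checking whether the problem still occurs.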
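To confirm whether an observer is still pointing at the old master, one option is to compare what each FE reports in SHOW FRONTENDS. The snippet below is only a rough sketch of that comparison, not an official Doris tool. It assumes you have already collected the SHOW FRONTENDS rows from each FE as dictionaries (for example with any MySQL client), that the rows carry an IsMaster column, and that the host column is named Host or IP depending on the version. The function names and the fe1/fe2/ob1 hosts are made up for illustration. Note also that SHOW FRONTENDS run on a non-master FE may itself be forwarded to the master depending on the forward_to_master session setting, so please verify on your version that you are really seeing the local view.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def reported_master(rows: List[Row]) -> Optional[str]:
    """Return the host that one FE's SHOW FRONTENDS output marks as master."""
    for row in rows:
        if str(row.get("IsMaster", "")).lower() == "true":
            # Assumption: the host column is "Host" or "IP" depending on version.
            return row.get("Host") or row.get("IP")
    return None


def find_stale_frontends(views: Dict[str, List[Row]]) -> Tuple[Optional[str], List[str]]:
    """Compare the master seen by each FE.

    views maps an FE host to the SHOW FRONTENDS rows collected on that FE.
    Returns (master reported by most FEs, FEs that report something else).
    """
    masters = {fe: reported_master(rows) for fe, rows in views.items()}
    counts = Counter(m for m in masters.values() if m is not None)
    if not counts:
        # No FE reports any master at all.
        return None, sorted(masters)
    majority, _ = counts.most_common(1)[0]
    stale = sorted(fe for fe, m in masters.items() if m != majority)
    return majority, stale


if __name__ == "__main__":
    # Hypothetical data: fe1 is the old master, fe2 the newly elected one.
    new_view = [{"Host": "fe1", "IsMaster": "false"}, {"Host": "fe2", "IsMaster": "true"}]
    old_view = [{"IP": "fe1", "IsMaster": "true"}, {"IP": "fe2", "IsMaster": "false"}]
    master, stale = find_stale_frontends({"fe1": new_view, "fe2": new_view, "ob1": old_view})
    print("majority master:", master)
    print("FEs with a different view:", stale)
```

If an observer shows up in the second list after a failover, that would match the symptom described in the question: it is still forwarding statements to the old master.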