[Solved] After configuring a virtual IP, FE fails to start on restart: Replica is configured xxx, HANDSHAKE_ERROR...

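Our environment consists of 3 Linux servers running 3 BEs and 3 FEs.
For load balancing and high availability, we installed nginx + keepalived on these 3 servers and use a virtual IP.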

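After a few days of use, I restarted an FE and it failed to start. The error message is shown below.
10.166.1.137 is this server's own IP; 10.166.1.227 is the virtual IP.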

The fe.conf on 137 is configured as follows:

# actual IP address
priority_networks = 10.166.1.137/24
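
One detail worth checking in this setting is the /24 mask. The sketch below uses Python's standard `ipaddress` module to show which of the two addresses from this post fall inside `10.166.1.137/24`. It assumes the FE chooses its address from the local interfaces that match `priority_networks`. The narrower `/32` entry is shown only for comparison; whether it is the right fix for this cluster is an assumption, not something confirmed in the post.

```python
import ipaddress

# Addresses taken from the post.
real_ip = ipaddress.ip_address("10.166.1.137")
virtual_ip = ipaddress.ip_address("10.166.1.227")

# strict=False accepts a host address with a mask and normalizes it to the network.
configured = ipaddress.ip_network("10.166.1.137/24", strict=False)
print(configured)                      # 10.166.1.0/24
print(real_ip in configured)           # True
print(virtual_ip in configured)        # True: the virtual IP matches as well

# Hypothetical narrower entry that matches only the server's own address.
narrow = ipaddress.ip_network("10.166.1.137/32")
print(real_ip in narrow)               # True
print(virtual_ip in narrow)            # False
```

With the /24 entry, both the real IP and the virtual IP are candidates whenever keepalived has bound 10.166.1.227 to this host, which is consistent with the "Replica is configured to use: 10.166.1.227" part of the error.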

What should I do so that the FE restarts successfully without affecting the existing data, and so that this problem does not happen again?

2024-04-18 17:22:25,911 WARN (UNKNOWN 10.166.1.137_9010_1698371205844(-1)|1) [DorisFE.start():212]
com.sleepycat.je.EnvironmentFailureException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 10.166.1.137_9010_1698371205844(1):/opt/doris/fe/doris-meta/bdb  Feeder: fe_09624f70_4094_40a2_9044_0eb99722d8a1(3). Conflicting hostnames for replica id: 10.166.1.137_9010_1698371205844(1) Feeder thinks it is: 10.166.1.137 Replica is configured to use: 10.166.1.227 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed.
        at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:230) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:848) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:802) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.log.LogManager.getLogEntryHandleNotFound(LogManager.java:956) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.dbi.DiskOrderedScanner.fetchEntry(DiskOrderedScanner.java:2068) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.dbi.DiskOrderedScanner.fetchAndProcessBINs(DiskOrderedScanner.java:1640) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.dbi.DiskOrderedScanner.scanSerial(DiskOrderedScanner.java:789) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:708) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:1510) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.Database.count(Database.java:2042) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:286) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:374) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.persist.EditLog.open(EditLog.java:1138) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.initialize(Env.java:901) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.DorisFE.start(DorisFE.java:164) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.DorisFE.main(DorisFE.java:84) ~[doris-fe.jar:1.2-SNAPSHOT]
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 18.3.12) 10.166.1.137_9010_1698371205844(1):/opt/doris/fe/doris-meta/bdb  Feeder: fe_09624f70_4094_40a2_9044_0eb99722d8a1(3). Conflicting hostnames for replica id: 10.166.1.137_9010_1698371205844(1) Feeder thinks it is: 10.166.1.137 Replica is configured to use: 10.166.1.227 HANDSHAKE_ERROR: Error during the handshake between two nodes. Some validity or compatibility check failed, preventing further communication between the nodes. Environment is invalid and must be closed. Originally thrown by HA thread: RepNode 10.166.1.137_9010_1698371205844(-1)
        at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.verifyMembership(ReplicaFeederHandshake.java:342) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.rep.stream.ReplicaFeederHandshake.execute(ReplicaFeederHandshake.java:267) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:709) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
        at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
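
The key information in this trace is the pair of addresses in the "Conflicting hostnames" message. As a minimal sketch, the two can be pulled out of an fe.log line with a regular expression. The function name `conflicting_hosts` and the pattern are illustrative helpers, not part of Doris or BDB JE.

```python
import re

# Matches the two addresses reported in the BDB JE handshake error text.
PATTERN = re.compile(
    r"Feeder thinks it is: (?P<feeder>\S+) "
    r"Replica is configured to use: (?P<replica>\S+)"
)

def conflicting_hosts(log_line):
    """Return (address the feeder expects, address this replica used), or None."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    return match.group("feeder"), match.group("replica")

# Excerpt of the message from the log above.
sample = (
    "Conflicting hostnames for replica id: 10.166.1.137_9010_1698371205844(1) "
    "Feeder thinks it is: 10.166.1.137 "
    "Replica is configured to use: 10.166.1.227 "
    "HANDSHAKE_ERROR: Error during the handshake between two nodes."
)
print(conflicting_hosts(sample))  # ('10.166.1.137', '10.166.1.227')
```

Here the feeder has this node registered as 10.166.1.137, while the restarting FE presented itself as the virtual IP 10.166.1.227.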


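After I moved the virtual IP to another server, this FE restarted normally. But that is only a temporary workaround. Is there a proper, permanent solution?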

1 Answer