Doris 2.1.7 FE CPU usage stays above 60%


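Something odd: on Doris 2.1.7, the FE process uses more than 60% CPU even when there are no active client connections, as shown in the screenshot below.
image.png
Output of top -H -p <FE pid>:
image.png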

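The FE configuration is as follows:

# Metadata storage location

meta_dir = /data/doris-meta

# Network segment to use on a multi-NIC host

priority_networks = 172.16.0.0/24

# FE memory: the default is 8 GB; the machine has 16 GB in total, so left unchanged for now

# Doris table-name case-sensitivity setting

lower_case_table_names = 1
enable_outfile_to_local=true

# Parameters set to address high memory usage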
wait_timeout = 300
set global enable_auto_analyze = false

4 Answers

The log contains a large number of messages like the following:
image.png

The jstack output is as follows:

jstack 31657 |grep -A 50 7bee
"replayer" #92 daemon prio=5 os_prio=0 tid=0x00007fa65c005000 nid=0x7bee runnable [0x00007fa60affa000]
java.lang.Thread.State: RUNNABLE
at com.sleepycat.je.dbi.DiskOrderedScanner.processBINInternal(DiskOrderedScanner.java:1945)
at com.sleepycat.je.dbi.DiskOrderedScanner.accumulateBINs(DiskOrderedScanner.java:1169)
at com.sleepycat.je.dbi.DiskOrderedScanner.scanSerial(DiskOrderedScanner.java:758)
at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:708)
at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:1510)
at com.sleepycat.je.Database.count(Database.java:2042)
at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalIdInternal(BDBJEJournal.java:414)
at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:379)
at org.apache.doris.persist.EditLog.getMaxJournalId(EditLog.java:136)
at org.apache.doris.catalog.Env.getMaxJournalId(Env.java:4257)
at org.apache.doris.catalog.Env.replayJournal(Env.java:2821)
- locked <0x00000005c5801ca0> (a org.apache.doris.catalog.Env)
at org.apache.doris.catalog.Env$4.runOneCycle(Env.java:2622)
at org.apache.doris.common.util.Daemon.run(Daemon.java:119)

"Thread-38" #50 daemon prio=5 os_prio=0 tid=0x00007fa6c0b53800 nid=0x7bed waiting on condition [0x00007fa60b3fb000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.doris.common.util.Daemon.run(Daemon.java:125)

"Automatic Analyzer" #43 daemon prio=5 os_prio=0 tid=0x00007fa6c0b52800 nid=0x7bec waiting on condition [0x00007fa60b7fc000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.doris.common.util.Daemon.run(Daemon.java:125)

"Statistics Table Cleaner" #41 daemon prio=5 os_prio=0 tid=0x00007fa6c1cf1000 nid=0x7beb waiting on condition [0x00007fa60bbfd000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.doris.common.util.Daemon.run(Daemon.java:125)

"stateListener" #90 daemon prio=5 os_prio=0 tid=0x00007fa6c1cf0000 nid=0x7bea waiting on condition [0x00007fa60bffe000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c5b11c08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
at org.apache.doris.catalog.Env$5.runOneCycle(Env.java:2715)
- locked <0x00000005c3203ff0> (a org.apache.doris.catalog.Env$5)
at org.apache.doris.common.util.Daemon.run(Daemon.java:119)

"ReplayThread" #87 daemon prio=5 os_prio=0 tid=0x00007fa648825000 nid=0x7be9 waiting on condition [0x00007fa614df2000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c3204328> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
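
The 7bee in the grep above is a thread ID in hexadecimal: top -H shows thread IDs in decimal, while jstack prints them as nid=0x... (for example, decimal 31726 is 0x7bee). As a minimal sketch of that lookup, the script below converts a decimal thread ID and pulls the matching block out of a saved jstack dump. The function names and the shortened SAMPLE text are illustrative assumptions, not part of Doris or the JDK.

```python
import re
from typing import Optional

# Shortened excerpt of the dump above, used only for illustration.
SAMPLE = '''"replayer" #92 daemon prio=5 os_prio=0 tid=0x00007fa65c005000 nid=0x7bee runnable [0x00007fa60affa000]
java.lang.Thread.State: RUNNABLE
at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:1510)
at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:379)

"Thread-38" #50 daemon prio=5 os_prio=0 tid=0x00007fa6c0b53800 nid=0x7bed waiting on condition [0x00007fa60b3fb000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
'''


def tid_to_nid(tid: int) -> str:
    """Convert a decimal thread ID (as shown by top -H) to jstack's hex nid."""
    return hex(tid)


def find_thread(jstack_text: str, tid: int) -> Optional[str]:
    """Return the jstack block whose nid matches the decimal tid, or None."""
    nid = tid_to_nid(tid)
    # Thread blocks in a jstack dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", jstack_text):
        if re.search(rf"\bnid={nid}\b", block):
            return block.strip()
    return None


def thread_state(block: str) -> Optional[str]:
    """Extract the java.lang.Thread.State value from one thread block."""
    match = re.search(r"java\.lang\.Thread\.State: (\w+)", block)
    return match.group(1) if match else None
```

For the dump above, find_thread(SAMPLE, 31726) returns the "replayer" block, and thread_state on that block gives RUNNABLE, which matches the thread that is busy in BDBJEJournal.getMaxJournalId.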

How many FE nodes do you have? This looks like it is replaying the metadata journal.
image.png

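I solved the problem myself.
Earlier I had closed the default SSH port 22, so SSH connections to the other machines no longer worked. After I reopened port 22, the machine's CPU usage stabilized, and FE CPU usage is now between 7% and 10%, which is completely normal.
My guess at the cause is that FE metadata synchronization could not proceed normally, so the FE kept retrying. I have not found documentation covering the details, so I am recording this here for reference.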
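
Since the fix came down to a blocked port between machines, a quick TCP reachability check can help rule this out early. The following is only a sketch under assumptions: the helper names are hypothetical, and the ports to test are whatever the cluster actually uses (22 as in this post, and the FE edit_log_port, which I believe defaults to 9010, if metadata sync between FE nodes is suspected).

```python
import socket
from typing import Dict, Iterable, Tuple


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_peers(peers: Iterable[Tuple[str, int]]) -> Dict[Tuple[str, int], bool]:
    """Check each (host, port) pair and report whether it is reachable."""
    return {(host, port): port_reachable(host, port) for host, port in peers}
```

For example, check_peers([("172.16.0.11", 22), ("172.16.0.11", 9010)]) would report both ports for one peer. The address here is a placeholder inside the priority_networks range from the question, not a host mentioned in the thread.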