Doris Manager inspection fails to run (V24.1.4)

2025-01-13 13:29:18.553 [pool-6-thread-16] ERROR com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer - inspection task 最大打开进程数 on node 1 retry reach limits
2025-01-13 13:29:18.553 [pool-6-thread-16] ERROR com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer - failed to run inspection

java.lang.Exception: inspection retry exceed limits.
        at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.inspectNode(ClusterInspectionSynchronizer.java:194)
        at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.startNodeInspections(ClusterInspectionSynchronizer.java:237)
        at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.run(ClusterInspectionSynchronizer.java:146)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException: null
        at java.lang.String.replace(String.java:2240)
        at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.assembleContent(ClusterInspectionSynchronizer.java:219)
        at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.inspectNode(ClusterInspectionSynchronizer.java:170)
        ... 7 common frames omitted
2025-01-13 13:29:18.554 [pool-6-thread-16] INFO  com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer - assembled content: #!/bin/bash
# Copyright 2023 SelectDB, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.

DEPLOY_DIR=/usr/local/doris-2.0.1/apache-doris-2.0.1-bin-x64/fe

pid_file=${DEPLOY_DIR}/bin/fe.pid

if [ ! -f "$pid_file" ]; then
  echo '{"status":"warn","value":"fe pid file not found"}'
  exit 0
fi

# get FE pid
pid=$(cat "$pid_file")

if [ -z $pid ]; then
  echo '{"status":"warn","value":"fe pid is null"}'
  exit 0
fi

# get JVM setting
jvm=$(cat /proc/$pid/cmdline| tr '\0' '\n' | grep -E '\-Xmx' | awk '{print $1}' | sed 's/-Xmx//')
# get FE memory(unit is byte)
memory=$(ps -p $pid -o rss=)

## change jvm to bytes unit
# value part
value=$(echo $jvm | sed 's/[^0-9]*//g')
# unit part (such as g, m, k)
unit=$(echo $jvm | sed 's/[0-9]*//g')

case $unit in
    g|G)
        jvmBytes=$(($value * 1024 * 1024 * 1024))
        ;;
    m|M)
        jvmBytes=$(($value * 1024 * 1024))
        ;;
    k|K)
        jvmBytes=$(($value * 1024))
        ;;
    *)
        jvmBytes=0
        ;;
esac

## calculate memory/jvm ratio
ratio=$(awk "BEGIN { printf \"%.2f\", $memory / $jvmBytes }")

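## Analyze the memory result: if the JVM heap is below 16g, recommend raising it to 16g; if it is above 16g and usage has reached half of the JVM limit, remind the user to increase the JVM memory in time.
## Status notes
## 0: please adjust the JVM size to 16g
## 1: current actual memory is $memory and the JVM setting is $jvm, which exceeds half of the limit; please raise the JVM memory limit to more than twice the current memory usage
## 2: no anomaly

## warn/8g/recommended value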
if (( $(awk "BEGIN { print ($jvmBytes < 17179869184) }") )); then
    echo '{"status":"warn","value":"'$jvm'"}'
elif (( $(awk "BEGIN { print ($ratio >= 0.5) }") )); then
    echo '{"status":"warn","value":"'$jvm'"}'
else
    echo '{"status":"info","value":"'$jvm'"}'
fi
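
A note on the `Caused by` frames in the stack trace above: `String.replace(CharSequence, CharSequence)` throws a `NullPointerException` when either argument is null, so the trace is consistent with `assembleContent` substituting a template placeholder with a value that was null for this node. The sketch below only reproduces that JDK behavior and shows one possible guard. The class name, the `fill` and `fillChecked` helpers, and the `{{DEPLOY_DIR}}` placeholder syntax are hypothetical and are not taken from the Doris Manager source.

```java
public class NullReplaceDemo {

    // Hypothetical helper: substitutes one placeholder in a script template.
    // String.replace dereferences both arguments, so a null value raises
    // NullPointerException inside String.replace itself.
    public static String fill(String template, String placeholder, String value) {
        return template.replace(placeholder, value);
    }

    // Hypothetical null-safe variant: fails early and names the placeholder,
    // which is easier to diagnose than a bare NullPointerException.
    public static String fillChecked(String template, String placeholder, String value) {
        if (value == null) {
            throw new IllegalStateException("no value for placeholder " + placeholder);
        }
        return template.replace(placeholder, value);
    }

    public static void main(String[] args) {
        // Assumed template shape; the real template format is not shown in the log.
        String template = "DEPLOY_DIR={{DEPLOY_DIR}}";

        // Normal case: the placeholder is replaced with the deploy directory.
        System.out.println(fill(template, "{{DEPLOY_DIR}}",
                "/usr/local/doris-2.0.1/apache-doris-2.0.1-bin-x64/fe"));

        // Null value: same exception type as in the "Caused by" section.
        try {
            fill(template, "{{DEPLOY_DIR}}", null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from String.replace");
        }

        // Guarded variant: reports which placeholder had no value.
        try {
            fillChecked(template, "{{DEPLOY_DIR}}", null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Which value was actually null in this case is not visible from the log, so this is an illustration of the failure mode only.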
1 Answer

The issue has been identified and will be fixed soon, in 24.1.5.