2025-01-13 13:29:18.553 [pool-6-thread-16] ERROR com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer - inspection task 最大打开进程数 (max open processes) on node 1 retry reach limits
2025-01-13 13:29:18.553 [pool-6-thread-16] ERROR com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer - failed to run inspection
java.lang.Exception: inspection retry exceed limits.
	at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.inspectNode(ClusterInspectionSynchronizer.java:194)
	at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.startNodeInspections(ClusterInspectionSynchronizer.java:237)
	at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.run(ClusterInspectionSynchronizer.java:146)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException: null
	at java.lang.String.replace(String.java:2240)
	at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.assembleContent(ClusterInspectionSynchronizer.java:219)
	at com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer.inspectNode(ClusterInspectionSynchronizer.java:170)
	... 7 common frames omitted
2025-01-13 13:29:18.554 [pool-6-thread-16] INFO com.selectdb.enterprise.manager.service.impl.ClusterInspectionSynchronizer - assembled content: #!/bin/bash
# Copyright 2023 SelectDB, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
DEPLOY_DIR=/usr/local/doris-2.0.1/apache-doris-2.0.1-bin-x64/fe
pid_file=${DEPLOY_DIR}/bin/fe.pid
if [ ! -f "$pid_file" ]; then
    echo '{"status":"warn","value":"fe pid file not found"}'
    exit 0
fi
# get FE pid
pid=$(cat "$pid_file")
if [ -z "$pid" ]; then
    echo '{"status":"warn","value":"fe pid is null"}'
    exit 0
fi
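# get JVM heap limit (-Xmx); if the flag appears more than once, the JVM uses the last one
jvm=$(tr '\0' '\n' < "/proc/$pid/cmdline" | grep -E '^-Xmx' | tail -n 1 | sed 's/^-Xmx//')
# get FE resident memory: ps reports RSS in KiB, so convert it to bytes
rss_kib=$(ps -p "$pid" -o rss=)
memory=$(( ${rss_kib:-0} * 1024 ))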
## convert the -Xmx value to bytes
# value part
value=$(echo "$jvm" | sed 's/[^0-9]*//g')
# unit part (such as g, m, k)
unit=$(echo "$jvm" | sed 's/[0-9]*//g')
case $unit in
    g|G)
        jvmBytes=$(( value * 1024 * 1024 * 1024 ))
        ;;
    m|M)
        jvmBytes=$(( value * 1024 * 1024 ))
        ;;
    k|K)
        jvmBytes=$(( value * 1024 ))
        ;;
    "")
        # no suffix: the -Xmx value is already in bytes
        jvmBytes=${value:-0}
        ;;
    *)
        jvmBytes=0
        ;;
esac
## calculate memory/jvm ratio (0 when -Xmx could not be parsed, to avoid dividing by zero)
ratio=$(awk -v m="$memory" -v j="$jvmBytes" 'BEGIN { printf "%.2f", (j > 0 ? m / j : 0) }')
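## Analyze the memory result: if the heap limit is below 16g, suggest raising it to 16g; if it is above 16g
## and usage has reached half of the JVM limit, remind the user to increase the JVM memory in time.
## Status notes
## 0: adjust the JVM heap size to 16g
## 1: current actual memory is $memory, JVM setting is $jvm, which exceeds half of the limit; raise the JVM memory limit to more than twice the current memory usage
## 2: no anomaly
## warn/8g/recommended value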
# 17179869184 bytes = 16 GiB
if (( $(awk "BEGIN { print ($jvmBytes < 17179869184) }") )); then
    echo '{"status":"warn","value":"'"$jvm"'"}'
elif (( $(awk "BEGIN { print ($ratio >= 0.5) }") )); then
    echo '{"status":"warn","value":"'"$jvm"'"}'
else
    echo '{"status":"info","value":"'"$jvm"'"}'
fi
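The inspection logic above can be sketched as two reusable Bash functions. The logic parses `-Xmx`, reads the resident memory, and warns when the heap limit is below 16 GiB or when resident memory reaches half of it.

This is a minimal sketch under stated assumptions, not the SelectDB implementation:
- `xmx_to_bytes`, `classify_fe` and `MIN_HEAP_BYTES` are hypothetical names.
- The RSS argument is assumed to be in KiB, as printed by `ps -o rss=`.
- Only k/m/g suffixes and plain byte counts are handled.

```bash
#!/usr/bin/env bash
# Sketch of the FE memory inspection as two reusable functions.
# xmx_to_bytes, classify_fe and MIN_HEAP_BYTES are hypothetical names,
# not part of the SelectDB inspection script.

# 16 GiB: the minimum heap size the inspection recommends.
MIN_HEAP_BYTES=$((16 * 1024 * 1024 * 1024))

# Convert the text after "-Xmx" (e.g. 8g, 4096m, 512k, 1073741824) to bytes.
# Prints 0 when the value is not an integer with an optional k/m/g suffix.
xmx_to_bytes() {
    local spec=$1
    local value=${spec%[gGmMkK]}   # drop one trailing unit letter, if present
    local unit=${spec#"$value"}    # the dropped letter (empty if none)
    case $value in
        '' | *[!0-9]*) echo 0; return ;;
    esac
    case $unit in
        g|G) echo $((10#$value * 1024 * 1024 * 1024)) ;;
        m|M) echo $((10#$value * 1024 * 1024)) ;;
        k|K) echo $((10#$value * 1024)) ;;
        *)   echo $((10#$value)) ;;  # no suffix: the value is already in bytes
    esac
}

# Print "warn" or "info" for a heap limit (bytes) and an RSS value in KiB.
classify_fe() {
    local heap_bytes=$1 rss_kib=$2
    local rss_bytes=$((rss_kib * 1024))
    if (( heap_bytes < MIN_HEAP_BYTES )); then
        echo warn   # heap limit missing, unparseable, or below 16 GiB
    elif (( rss_bytes * 2 >= heap_bytes )); then
        echo warn   # resident memory has reached half of the heap limit
    else
        echo info
    fi
}

# Example:
#   xmx_to_bytes 8g                           -> 8589934592
#   classify_fe "$(xmx_to_bytes 32g)" 4194304 -> info (4 GiB RSS, 32 GiB heap)
```

Comparing `rss_bytes * 2` against the heap limit in integer arithmetic applies the 0.5 threshold exactly. It does not depend on the `%.2f` rounding or a divide-by-zero guard in the `awk` ratio check.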