BE keeps restarting after upgrading a K8s-deployed DDC from 3.0.3 to 3.0.4


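Has anyone run into the BE restarting continuously after upgrading DDC from 3.0.3 to 3.0.4?
The cluster is deployed on K8s. FE and MS both start up fine, and the BE starts normally again when I switch back to 3.0.3.
The error is as follows: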

RuntimeLogger W20250407 12:15:06.792353   684 status.h:424] meet error status: [RUNTIME_ERROR]Could not create thread. (error 11) Resource temporarily unavailable

	0#  doris::Thread::start_thread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&, unsigned long, scoped_refptr<doris::Thread>*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:445
	1#  doris::ThreadPool::create_thread() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:244
	2#  doris::ThreadPool::init() at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:502
	3#  doris::Status doris::ThreadPoolBuilder::build<doris::ThreadPool>(std::unique_ptr<doris::ThreadPool, std::default_delete<doris::ThreadPool> >*) const at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:502
	4#  doris::EvHttpServer::start() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
	5#  doris::HttpService::start() at /home/zcp/repo_center/doris_release/doris/be/src/service/http_service.cpp:0
	6#  main at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:389
	7#  ?
	8#  __libc_start_main
	9#  _start
*** Query id: 0-0 ***
*** is nereids: 0 ***
*** tablet id: 0 ***
*** Aborted at 1743999306 (unix time) try "date -d @1743999306" if you are using GNU date ***
*** Current BE git commitID: 39f9074cec ***
*** SIGSEGV address not mapped to object (@0x8) received by PID 684 (TID 684 OR 0x7fb2013c1a00) from PID 8; stack trace: ***
RuntimeLogger I20250407 12:15:06.815495  1466 wal_manager.cpp:485] Scheduled(every 10s) WAL info: [/opt/apache-doris/be/storage/wal: limit 33838706688 Bytes, used 0 Bytes, estimated wal bytes 0 Bytes, available 33838706688 Bytes.];
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/r

The DDC configuration is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  namespace: hd-dev-doris-v1
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    # For jdk 17, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_17="-Xmx1024m -DlogPath=$LOG_DIR/jni.log -Xlog:gc*:$LOG_DIR/be.gc.log.$CUR_DATE:time,uptime:filecount=10,filesize=50M -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.krb5.debug=true -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-opens=java.management/sun.management=ALL-UNNAMED"
    file_cache_path = [{"path":"/mnt/disk1/doris_cloud/file_cache","total_size":207374182400,"query_limit":207374182400}]
---
apiVersion: disaggregated.cluster.doris.com/v1
kind: DorisDisaggregatedCluster
metadata:
  name: dev-disaggregated-cluster
  namespace: hd-dev-doris-v1
spec:
  metaService:
    image: apache/doris:ms-3.0.3
    envVars:
      - name: TZ
        value: Asia/Shanghai
    requests:
      cpu: 4
      memory: 4Gi
    limits:
      cpu: 4
      memory: 4Gi
    fdb:
      configMapNamespaceName:
        name: fdb-dev-cluster-config
        namespace: hd-dev-doris-v1
  feSpec:
    replicas: 2
    image: apache/doris:fe-3.0.3
    envVars:
      - name: TZ
        value: Asia/Shanghai
    requests:
      cpu: 6
      memory: 80Gi
    limits:
      cpu: 6
      memory: 80Gi
    service:
      type: NodePort
      portMaps:
        - nodePort: 30830
          targetPort: 8030
        - nodePort: 30920
          targetPort: 9020
        - nodePort: 30930
          targetPort: 9030
        - nodePort: 30910
          targetPort: 9010
    persistentVolume:
      persistentVolumeClaimSpec:
        storageClassName: netapp-iscsi
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  computeGroups:
    - uniqueId: cg1
      replicas: 3
      image: apache/doris:be-3.0.3
      envVars:
      - name: TZ
        value: Asia/Shanghai
      requests:
        cpu: 8
        memory: 80Gi
      limits:
        cpu: 8
        memory: 80Gi
      persistentVolume:
        # logNotStore: true
        persistentVolumeClaimSpec:
          storageClassName: ocs-storagecluster-ceph-rbd
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 200Gi
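
A possible starting point for narrowing this down: on Linux, error 11 is EAGAIN, which thread creation returns when a thread or process limit is hit. The sketch below is only a diagnostic helper, not anything from Doris itself. It assumes Python 3 is available inside the BE container and that the cgroup pids controller is mounted at the usual paths (`/sys/fs/cgroup/pids.max` for cgroup v2, `/sys/fs/cgroup/pids/pids.max` for cgroup v1); the function names `read_limit` and `thread_limits` are made up for this example.

```python
import errno
import os
import resource
from pathlib import Path


def read_limit(path):
    """Return the stripped contents of a limit file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def thread_limits():
    """Collect the limits that commonly cap thread creation for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        # Per-user process/thread limit (ulimit -u); -1 means unlimited.
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        # System-wide kernel limits.
        "kernel_threads_max": read_limit("/proc/sys/kernel/threads-max"),
        "kernel_pid_max": read_limit("/proc/sys/kernel/pid_max"),
        # Container-level pids limit (assumed mount paths, may differ per node).
        "cgroup_v2_pids_max": read_limit("/sys/fs/cgroup/pids.max"),
        "cgroup_v2_pids_current": read_limit("/sys/fs/cgroup/pids.current"),
        "cgroup_v1_pids_max": read_limit("/sys/fs/cgroup/pids/pids.max"),
        "cgroup_v1_pids_current": read_limit("/sys/fs/cgroup/pids/pids.current"),
    }


if __name__ == "__main__":
    print("EAGAIN:", errno.EAGAIN, os.strerror(errno.EAGAIN))
    for name, value in thread_limits().items():
        print(f"{name}: {value}")
```

Running it in a BE pod on 3.0.3 (where startup succeeds) and comparing the `pids_current` value against `pids_max` may show whether the container is already close to its limit before the upgrade. A value of `None` only means the file was not readable at that assumed path.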