Hot/cold tiering: cooldown to an HDFS resource fails


Version: 2.1.4

Resource info:
image.png

Relevant be.WARNING log:

W20240722 14:53:26.604739 172536 file_system.cpp:35] NOT_FOUND, No such file or directory), reason: RemoteException: File does not exist: /data/44230502/44230505.0.meta
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:170)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1860)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:697)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

0# doris::io::HdfsFileHandle::init(long) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:0
1# doris::io::FileHandleCache::get_file_handle(hdfs_internal* const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, long, long, bool, doris::io::FileHandleCache::Accessor*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:481
2# doris::io::HdfsFileHandleCache::get_file(std::shared_ptr const&, std::filesystem::__cxx11::path const&, long, long, doris::io::FileHandleCache::Accessor*) at /home/zcp/repo_center/doris_release/doris/be/src/io/fs/hdfs_file_system.cpp:117
3# doris::io::HdfsFileSystem::open_file_internal(std::filesystem::__cxx11::path const&, std::shared_ptr, doris::io::FileReaderOptions const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
4# doris::io::RemoteFileSystem::open_file_impl(std::filesystem::__cxx11::path const&, std::shared_ptr, doris::io::FileReaderOptions const*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:481
5# doris::io::FileSystem::open_file(std::filesystem::__cxx11::path const&, std::shared_ptr, doris::io::FileReaderOptions const) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:377
6# doris::Tablet::_read_cooldown_meta(std::shared_ptr const&, doris::TabletMetaPB*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:360
7# doris::Tablet::_follow_cooldowned_data() at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:481
8# doris::Tablet::cooldown(std::shared_ptr) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:481
9# std::_Function_handler<void (), doris::StorageEngine::_cooldown_tasks_producer_callback()::$_1>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
10# doris::WorkThreadPool::work_thread(int) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
11# execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
12# start_thread
13# __clone
W20240722 14:53:26.604816 172536 olap_server.cpp:1161] failed to cooldown, tablet: 44230502 err: [INTERNAL_ERROR]cannot read cooldown meta
W20240722 14:53:29.995173 10361 fragment_mgr.cpp:432] report error status: TStatus: to coordinator: TNetworkAddress(hostname=10.196.162.16, port=9020), query id: 92ede7f8df454360-9f8e176b2c054ef0, instance id: 0-0

The relevant path exists:
image.png

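I went through the logs again and found that the metadata is still being written under the root directory, and the write fails because of missing permissions. Is there a way to fix this?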
image.png

2 Answers

[Issue status] Recorded
[Issue handling] Under internal investigation; we will update this thread when there is progress.

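Hi, based on the stack trace you posted, you can run show tablet 44230502; to check the state of this tablet. The replica on the BE that reports the error is most likely not the cooldown replica. The error probably means that the cooldown replica has not yet uploaded the latest cooldown meta.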
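
If many tablets are affected, it may help to list which tablet IDs fail and which meta file each one was looking for before running show tablet on them. The following is only a rough sketch in Python, not a Doris tool: the function name summarize_cooldown_failures is made up for illustration, and the regular expressions assume the log lines look exactly like the two W lines quoted in this thread (glog prefix with a thread id, "File does not exist: ..." followed later by "failed to cooldown, tablet: ..." on the same thread). Other versions may log differently.

```python
import re
from collections import defaultdict

# Assumed glog-style prefix, as seen in the quoted log:
# "W20240722 14:53:26.604739 172536 file_system.cpp:35] message"
PREFIX = re.compile(r"^[IWEF]\d{8} \d{2}:\d{2}:\d{2}\.\d+\s+(\d+) \S+:\d+\] (.*)$")
# Message shapes copied from the two warning lines in this thread.
MISSING_FILE = re.compile(r"File does not exist: (\S+)")
COOLDOWN_FAIL = re.compile(r"failed to cooldown, tablet: (\d+) err: (.*)$")


def summarize_cooldown_failures(lines):
    """Pair each 'failed to cooldown' line with the missing remote files
    reported earlier by the same thread (hypothetical helper)."""
    failures = []
    missing_by_thread = defaultdict(list)
    for raw in lines:
        m = PREFIX.match(raw.rstrip("\n"))
        if not m:
            continue  # stack frames and continuation lines carry no prefix
        thread_id, message = m.group(1), m.group(2)
        missing = MISSING_FILE.search(message)
        if missing:
            missing_by_thread[thread_id].append(missing.group(1))
            continue
        failed = COOLDOWN_FAIL.search(message)
        if failed:
            failures.append({
                "tablet_id": int(failed.group(1)),
                "error": failed.group(2),
                "missing_files": missing_by_thread.pop(thread_id, []),
            })
    return failures


if __name__ == "__main__":
    import sys
    # Usage: python this_script.py /path/to/be.WARNING
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for item in summarize_cooldown_failures(f):
            print(item["tablet_id"], item["error"], item["missing_files"])
```

Grouping by thread id is a design choice to avoid mixing up interleaved warnings from different cooldown threads; it is still a heuristic, so treat the output as a list of tablets to inspect, not as a diagnosis.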