HDFS Configured Capacity is less than the raw disk capacity according to the dfsadmin command

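【Question title】: HDFS Configured Capacity is lesser than the original disk capacity as per dfsadmin command
【Posted】: 2016-05-15 13:15:42
【Question description】:

I set up a 2-node cluster in VMware Workstation using Cloudera Manager 5.4.1, with components including HBase, Impala, Hive, Sqoop2, Oozie, ZooKeeper, NameNode, SecondaryNameNode, and YARN. I simulated 3 disk drives on each node: sda for the OS, and sdb and sdc for Hadoop storage.

On each node I allocated 16 GB on sdb1 and 16 GB on sdc1 exclusively for Hadoop storage, so I assumed my total HDFS capacity across both nodes would be 64 GB. However, both the dfsadmin command and the NameNode UI report a Configured Capacity that is smaller than the raw disk size allocated to HDFS. The output of dfsadmin -report and of df -h is shown below. Please help me understand why the Configured Capacity is reported as less than my raw disk size.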

[hduser@node1 ~]$ df -h


Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/vg_node1-LogVol00   40G   15G   23G  39% /
tmpfs                          3.9G   76K  3.9G   1% /dev/shm
/dev/sda1                      388M   39M  329M  11% /boot
/dev/sdb1                       16G  283M   15G   2% /disks/disk1/hdfsstorage/dfs
/dev/sdc1                       16G  428M   15G   3% /disks/disk2/hdfsstorage/dfs
/dev/sdb2                      8.1G  147M  7.9G   2% /disks/disk1/nonhdfsstorage
/dev/sdc2                      8.1G  147M  7.9G   2% /disks/disk2/nonhdfsstorage
cm_processes                   3.9G  5.8M  3.9G   1% /var/run/cloudera-scm-agent/process
[hduser@node1 ~]$


[hduser@node1 zookeeper]$ sudo -u hdfs hdfs dfsadmin -report
[sudo] password for hduser:
Configured Capacity: 47518140008 (44.25 GB)
Present Capacity: 47518140008 (44.25 GB)
DFS Remaining: 46728742571 (43.52 GB)
DFS Used: 789397437 (752.83 MB)
DFS Used%: 1.66%
Under replicated blocks: 385
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.52.111:50010 (node1.example.com)
Hostname: node1.example.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 23759070004 (22.13 GB)
DFS Used: 394702781 (376.42 MB)
Non DFS Used: 0 (0 B)
DFS Remaining: 23364367223 (21.76 GB)
DFS Used%: 1.66%
DFS Remaining%: 98.34%
Configured Cache Capacity: 121634816 (116 MB)
Cache Used: 0 (0 B)
Cache Remaining: 121634816 (116 MB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Sun May 15 18:15:33 IST 2016


Name: 192.168.52.112:50010 (node2.example.com)
Hostname: node2.example.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 23759070004 (22.13 GB)
DFS Used: 394694656 (376.41 MB)
Non DFS Used: 0 (0 B)
DFS Remaining: 23364375348 (21.76 GB)
DFS Used%: 1.66%
DFS Remaining%: 98.34%
Configured Cache Capacity: 523239424 (499 MB)
Cache Used: 0 (0 B)
Cache Remaining: 523239424 (499 MB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Sun May 15 18:15:32 IST 2016

【Question comments】:

Tags: hadoop hdfs hadoop-yarn cloudera

【Solution 1】:

You should check this configuration property:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
  </description>
</property>

The reserved space is not counted as part of the Configured Capacity.

【Comments】:

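Thanks, Walter Su. Yes, the property dfs.datanode.du.reserved is set to 4.25 GB, so I believe 4.25 GB is reserved for each data directory on a given node. Since I have two data directory partitions, the reserved space per node is 8.5 GB, which makes the Configured Capacity on each node 23.5 GB (32 GB - 8.5 GB). I arrived at this formula: Configured Capacity = total disk space allocated to the data directories (dfs.data.dir) - space reserved for non-DFS use (dfs.datanode.du.reserved).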
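The formula in the comment above can be checked against the numbers in the report with a few lines of Python. This is only a sketch: the function name is made up for illustration, the 4.25 GB reserved value is the rounded figure quoted in the comment, and treating the reservation as applied once per volume follows the property's own description. With nominal 16 GB volumes it gives 23.5 GB per node, while dfsadmin reports 22.13 GB. Working backwards from the reported value suggests each filesystem is closer to 15.3 GB usable, which is plausible given that df -h rounds sizes up, but that explanation is an inference, not something the thread confirms.

```python
GIB = 1024 ** 3  # dfsadmin prints "GB" using 1024-based units

def configured_capacity(volume_sizes_bytes, reserved_per_volume_bytes):
    """Sum of (volume size - reserved space), never below zero per volume.

    Hypothetical helper: mirrors the formula from the comment, not HDFS code.
    """
    return sum(max(size - reserved_per_volume_bytes, 0)
               for size in volume_sizes_bytes)

reserved = int(4.25 * GIB)  # assumed value of dfs.datanode.du.reserved

# Nominal case: two volumes of exactly 16 GB each on one node.
nominal = configured_capacity([16 * GIB, 16 * GIB], reserved)
print(nominal / GIB)  # 23.5

# Per-node value actually shown by `hdfs dfsadmin -report` in the question.
reported = 23759070004

# Volume size implied by the reported capacity (two equal volumes assumed).
implied_volume_bytes = (reported + 2 * reserved) / 2
print(round(implied_volume_bytes / GIB, 2))  # 15.31
```

Running df with the -B1 option on the DataNode prints exact byte counts, which would show whether the filesystem size accounts for the remaining difference.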
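The "Non DFS Used" value in my cluster has since grown to 400 MB. It would be a great help if you could tell me what exactly "Non DFS Used" is and how to get rid of it. There are some answers on Stack Overflow, but I still can't understand them.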
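what-exactly-non-dfs-used-means has a good answer. I can think of only two possible approaches: 1. Shut down the processes that are holding file handles to deleted files. 2. Suppose /mnt/disk0/ is your mount point and /mnt/disk0/dfs/ is your configured dataDir. Are you sure there are no other files on the disk, such as /mnt/disk0/otherDir/otherFile? If there are, try deleting them.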
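To make the second suggestion concrete, here is a rough Python sketch. It rests on two assumptions. First, that "Non DFS Used" in Hadoop 2.x releases of this era is derived as Configured Capacity minus DFS Used minus DFS Remaining (floored at zero), which at least matches the report in the question, where the result is 0. Second, that stray files on the same mount as the data directory are what inflate it. Both function names are made up for illustration, and the paths in the usage note are the hypothetical ones from the comment above.

```python
import os

def non_dfs_used(configured_capacity, dfs_used, dfs_remaining):
    """Assumed derivation: whatever is neither DFS data nor free DFS space."""
    return max(configured_capacity - dfs_used - dfs_remaining, 0)

def files_outside_data_dir(mount_point, data_dir):
    """List (path, size) of regular files under mount_point that are not
    inside data_dir, largest first. Symlinks are skipped."""
    data_dir = os.path.abspath(data_dir)
    found = []
    for root, dirs, files in os.walk(mount_point):
        # Prune the HDFS data directory so block files are not reported.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != data_dir]
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue
            try:
                found.append((path, os.path.getsize(path)))
            except OSError:
                # File vanished or is unreadable; ignore it.
                continue
    return sorted(found, key=lambda item: item[1], reverse=True)

# node1 values from the dfsadmin report in the question.
print(non_dfs_used(23759070004, 394702781, 23364367223))  # 0
```

Calling files_outside_data_dir("/mnt/disk0", "/mnt/disk0/dfs") would list candidates such as /mnt/disk0/otherDir/otherFile. The scan cannot see files that were deleted while a process still holds them open, so it covers only the second of the two approaches, and anything it lists should be reviewed before deleting.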