Hadoop YARN Resource Management: The Capacity Scheduler (Yahoo!'s Capacity Scheduler)


　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　Author: 尹正杰 (Yin Zhengjie)

I. Queues and Sub-queues

1>.Overview of YARN Resource Schedulers

　　Recommended reading:
　　　　https://www.cnblogs.com/yinzhengjie/p/13341939.html

2>.Queue Overview

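　　The Capacity Scheduler relies on the concept of queues to control how cluster resources are allocated. A (job) queue is an ordered list of jobs; when a queue is created, it is assigned a share of the cluster's resources.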

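　　User applications are then submitted to this queue to gain access to its resources. A few points to know about queues:
　　　　(1) A queue's capacity can be configured with both a soft limit and a hard limit;
　　　　(2) Applications submitted to a queue run in FIFO order by default;
　　　　(3) By default, applications that have started running in a queue are not preempted; as their tasks complete, any freed resources are allocated to other queues that are running below their allowed capacity (preemption can be enabled separately through yarn.resourcemanager.scheduler.monitor.enable, which is false by default);
　　　　(4) If a queue does not use all of the resources assigned to it, the surplus can be used by other queues in the cluster, which improves overall cluster utilization;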

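　　The Capacity Scheduler supports hierarchical queues, which ensure that an organization's resources (in a multi-tenant setup, "organizations" are the several groups sharing the same cluster) are shared among that organization's own sub-queues first, before other queues are allowed to use them.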

3>.The Default Queue of the Apache Hadoop Capacity Scheduler

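　　Job queues are where everything starts. Queues are defined in "${HADOOP_HOME}/etc/hadoop/capacity-scheduler.xml"; by default this file is located in the "etc/hadoop/" directory under the Hadoop installation directory.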

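　　As shown in the figure below, root is the predefined queue, and every queue created afterwards is treated as a descendant of root (for example, Apache Hadoop ships with a default child queue named "default" under root).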

[Figure: the root queue and its default child queue "default"]

4>.Naming Rules for Capacity Scheduler Queues

　　Every queue is named by its queue path, which shows the queue's position in the hierarchy. The child queues of a queue are configured with the YARN property "yarn.scheduler.capacity.<queue-path>.queues", for example:
　　　　yarn.scheduler.capacity.root.queues
　　　　yarn.scheduler.capacity.root.default.queues
　　　　yarn.scheduler.capacity.root.yinzhengjie.queues
　　　　yarn.scheduler.capacity.root.yinzhengjie.op.queues
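The naming rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of Hadoop: the helper `queues_properties` and the nested-dict representation of the queue tree are assumptions made for the example, and the queue names reuse the ones listed above. Only queues that actually have children need a `.queues` property, so in this tree the leaves `default` and `op` produce none.

```python
from typing import Dict

# Hypothetical representation: queue name -> child queues ({} means a leaf queue).
QueueTree = Dict[str, "QueueTree"]


def queues_properties(tree: QueueTree, parent_path: str = "") -> Dict[str, str]:
    """Return {property name: comma-separated child list} for every queue that has children."""
    props: Dict[str, str] = {}
    for name, children in tree.items():
        path = f"{parent_path}.{name}" if parent_path else name
        if children:
            # Property name = fixed prefix + full queue path + ".queues"
            props[f"yarn.scheduler.capacity.{path}.queues"] = ",".join(children)
            props.update(queues_properties(children, path))
    return props


tree = {"root": {"default": {}, "yinzhengjie": {"op": {}}}}
for key, value in queues_properties(tree).items():
    print(f"{key} = {value}")
# yarn.scheduler.capacity.root.queues = default,yinzhengjie
# yarn.scheduler.capacity.root.yinzhengjie.queues = op
```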

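　　Tip:
　　　　root is always the top-level queue under which all other queues are created. This cannot be changed: if you rename the top-level queue, the YARN cluster throws the exception shown in the figure below at startup. Child queues, on the other hand, are optional.
　　　　Top-level child queues (such as "default" in the figure below) sit directly under root. Sub-queues can also be created under each top-level child queue, so queues can be nested.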

[Figure: the exception YARN throws at startup when the top-level queue is not named root]
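To illustrate the two rules in the tip (exactly one top-level queue, which must be called root, and arbitrary nesting below it), here is a hedged Python sketch that checks the top-level name and lists every full queue path. The function `list_queue_paths` and the tree shape are hypothetical; YARN performs its own validation when the ResourceManager loads capacity-scheduler.xml.

```python
from typing import Dict, List

QueueTree = Dict[str, "QueueTree"]  # queue name -> child queues ({} for a leaf)


def list_queue_paths(tree: QueueTree) -> List[str]:
    """Return every full queue path, after checking that the only top-level queue is 'root'."""
    if list(tree) != ["root"]:
        raise ValueError(f"top-level queue must be 'root', got {list(tree)}")

    paths: List[str] = []

    def walk(node: QueueTree, prefix: str) -> None:
        for name, children in node.items():
            path = f"{prefix}.{name}" if prefix else name
            paths.append(path)
            walk(children, path)  # queues can be nested to any depth

    walk(tree, "")
    return paths


print(list_queue_paths({"root": {"default": {}, "yinzhengjie": {"op": {}}}}))
# ['root', 'root.default', 'root.yinzhengjie', 'root.yinzhengjie.op']
```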

5>.Hierarchical Queues

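　　For finer-grained control over resource allocation, sub-queues (known as hierarchical queues) can also be configured under each queue, so that applications from a particular organization can make full use of all the resources allocated to that organization.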

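　　A queue's surplus or idle resources are made available to other queues only after the resource demands of its own sub-queues have been met.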
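The rule that idle capacity goes to sibling sub-queues before anyone else can be modeled with a deliberately simplified Python sketch. Assumptions: two levels of queues, guarantees and demands expressed as percentages of the cluster, and no maximum-capacity, user limits or container granularity; the organizations and queue names (`eng`, `ops`, `root.eng.dev`, ...) are hypothetical. The real scheduler allocates container by container rather than in a single pass like this.

```python
from typing import Dict

Orgs = Dict[str, Dict[str, float]]  # organization -> {leaf queue path: guaranteed % of the cluster}


def allocate(orgs: Orgs, demand: Dict[str, float]) -> Dict[str, float]:
    """Give each leaf its guarantee, then hand out idle capacity sibling-first."""
    # Step 0: every leaf gets what it asks for, up to its guaranteed capacity.
    alloc = {leaf: min(demand.get(leaf, 0.0), cap)
             for leaves in orgs.values() for leaf, cap in leaves.items()}
    leftover = 0.0
    for leaves in orgs.values():
        # Idle capacity inside this organization: its guarantee minus what its leaves use.
        spare = sum(leaves.values()) - sum(alloc[leaf] for leaf in leaves)
        # Step 1: sibling sub-queues of the same organization are served first.
        for leaf in leaves:
            extra = min(spare, demand.get(leaf, 0.0) - alloc[leaf])
            alloc[leaf] += extra
            spare -= extra
        leftover += spare
    # Step 2: only capacity no sibling needed is offered further; if an organization had
    # anything left over, its own leaves are already satisfied, so this serves other organizations.
    for leaves in orgs.values():
        for leaf in leaves:
            extra = min(leftover, demand.get(leaf, 0.0) - alloc[leaf])
            alloc[leaf] += extra
            leftover -= extra
    return alloc


orgs = {"eng": {"root.eng.dev": 30.0, "root.eng.test": 30.0},
        "ops": {"root.ops.batch": 40.0}}
demand = {"root.eng.dev": 50.0, "root.eng.test": 10.0, "root.ops.batch": 60.0}
print(allocate(orgs, demand))
# {'root.eng.dev': 50.0, 'root.eng.test': 10.0, 'root.ops.batch': 40.0}
```

Here the 20% that `root.eng.test` leaves idle goes to its sibling `root.eng.dev`, not to `root.ops.batch`, even though both want more.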

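　　Besides a queue's quota and maximum capacity, administrators can also control:
　　　　(1) the maximum amount of resources a particular user can use;
　　　　(2) the number of pending tasks per queue (or per user);
　　　　(3) the number of active (or accepted) jobs per queue (or per user);
　　　　(4) capacity guarantees and elasticity;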
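Per-user limits of item (1) are set with the Capacity Scheduler properties `yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent` and `yarn.scheduler.capacity.<queue-path>.user-limit-factor`. The Python below is a simplified model of both, for illustration only: with a minimum-user-limit-percent of 25, two active users are each capped at 50% of the queue, three at about 33%, and four or more at 25%; user-limit-factor is treated as a multiple of the queue's capacity, additionally bounded by its maximum capacity. The function names are hypothetical and the real scheduler applies further constraints.

```python
def per_user_share(active_users: int, minimum_user_limit_percent: float) -> float:
    """Upper bound (% of the queue) for each of `active_users` competing users.

    Simplified reading of minimum-user-limit-percent: resources are split evenly
    among active users, but the per-user bound never drops below the minimum.
    """
    if active_users < 1:
        raise ValueError("need at least one active user")
    return max(100.0 / active_users, minimum_user_limit_percent)


def single_user_ceiling(queue_capacity: float, queue_max_capacity: float,
                        user_limit_factor: float = 1.0) -> float:
    """Most a single user may take (% of the parent), assuming the simplified rule
    min(capacity * user-limit-factor, maximum-capacity)."""
    return min(queue_capacity * user_limit_factor, queue_max_capacity)


for users in range(1, 6):
    # second column: 100.0, 50.0, 33.33, 25.0, 25.0
    print(users, round(per_user_share(users, 25.0), 2))
print(single_user_ceiling(20.0, 60.0))       # 20.0: factor 1 caps a user at the queue's capacity
print(single_user_ceiling(20.0, 60.0, 2.0))  # 40.0
```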

　　The figure below shows a typical example of a hierarchical queue layout in the Capacity Scheduler.

[Figure: a typical hierarchical queue layout in the Capacity Scheduler]

6>.Capacity Guarantees

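　　The main goal of the Capacity Scheduler is to make resource sharing predictable. It achieves this by giving each configured job queue a capacity guarantee: applications submitted to a queue are entitled to that queue's capacity.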

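　　Each queue is assigned a fraction of the cluster's capacity, so every queue has a well-defined share. A queue's capacity can be given a soft limit and an (optional) hard limit: the soft limit is the guaranteed capacity ("yarn.scheduler.capacity.<queue-path>.capacity") and the hard limit is the maximum capacity ("yarn.scheduler.capacity.<queue-path>.maximum-capacity").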
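In percentage terms, each queue's `capacity` is a share of its parent, the capacities of sibling queues must add up to 100 at every level, and a queue's absolute share of the cluster is the product of the percentages along its path (for example, 50% of a 60% parent is 30% of the cluster). The Python sketch below, under those assumptions, computes absolute capacities for a small tree and rejects a level whose children do not sum to 100%. The helper `absolute_capacities`, the tuple-based tree and the `dev` queue are hypothetical, and Hadoop 3.x also allows other ways of expressing capacity (such as absolute resources) that this sketch ignores.

```python
import math
from typing import Dict, Tuple

# Hypothetical representation: queue name -> (capacity in % of the parent, child queues)
Tree = Dict[str, Tuple[float, "Tree"]]


def absolute_capacities(children: Tree, parent_path: str = "root",
                        parent_abs: float = 100.0) -> Dict[str, float]:
    """Return {queue path: % of the whole cluster}; root itself always has 100%."""
    total = sum(cap for cap, _ in children.values())
    if children and not math.isclose(total, 100.0):
        raise ValueError(f"children of {parent_path} sum to {total}%, expected 100%")
    result: Dict[str, float] = {}
    for name, (cap, grandchildren) in children.items():
        path = f"{parent_path}.{name}"
        result[path] = parent_abs * cap / 100.0  # share of parent scaled to the cluster
        result.update(absolute_capacities(grandchildren, path, result[path]))
    return result


tree = {"default": (40.0, {}),
        "yinzhengjie": (60.0, {"op": (50.0, {}), "dev": (50.0, {})})}
print(absolute_capacities(tree))
# {'root.default': 40.0, 'root.yinzhengjie': 60.0, 'root.yinzhengjie.op': 30.0, 'root.yinzhengjie.dev': 30.0}
```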

7>.Queue Elasticity

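　　To make full use of cluster resources, the scheduler also gives queues some elasticity: whenever the cluster has idle resources, a queue can use resources beyond its configured capacity.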

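　　Elasticity here means that, depending on whether resources are available, the cluster can give a queue more (or less) than it was originally configured with. An overloaded job queue can therefore use capacity that other queues leave unused, making the best use of cluster resources.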

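　　Of course, as the workload of other queues grows and they claim their guaranteed capacity, Hadoop reclaims the excess resources lent to the queue (as its running containers finish, or through preemption if it is enabled). To stop a queue from using far more resources than it was allocated, you can set an upper limit on its elasticity, i.e. its maximum capacity.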
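A hedged Python sketch of the resulting ceiling for a top-level queue: it can grow from its guaranteed capacity by borrowing idle capacity, but never beyond its maximum capacity. Following the Hadoop documentation for maximum-capacity, -1 is treated here as 100%. This is a simplified model (it ignores, for instance, that several queues may compete for the same idle capacity), and `elastic_ceiling` is a hypothetical name.

```python
def elastic_ceiling(capacity: float, idle_elsewhere: float,
                    maximum_capacity: float = -1.0) -> float:
    """Most a top-level queue can use right now (% of the cluster), simplified.

    capacity:         guaranteed share (the soft limit)
    idle_elsewhere:   capacity other queues are currently not using
    maximum_capacity: hard limit; -1 is treated as 100%
    """
    hard_cap = 100.0 if maximum_capacity < 0 else maximum_capacity
    return min(hard_cap, capacity + idle_elsewhere)


print(elastic_ceiling(30.0, 50.0))        # 80.0: borrows all idle capacity
print(elastic_ceiling(30.0, 50.0, 60.0))  # 60.0: stopped by maximum-capacity
print(elastic_ceiling(30.0, 0.0, 60.0))   # 30.0: nothing idle, back to its guarantee
```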

8>.Elements of the Capacity Scheduler

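　　Now that we have covered the basic configuration elements of the Capacity Scheduler, let's look at how to set it up in a cluster. Two things need to be done:
　　　　(1) set up the queues;
　　　　(2) configure the queues' capacities;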

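　　In the Capacity Scheduler configuration file (${HADOOP_HOME}/etc/hadoop/capacity-scheduler.xml), the queue is the key unit of scheduling and everything revolves around it. To configure the Capacity Scheduler, you therefore have to configure the queues first.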

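　　The Capacity Scheduler can have multiple queues, and each queue has the following properties:
　　　　(1) a queue name and a full queue path name;
　　　　(2) a list of child queues and applications;
　　　　(3) a list of users and their resource allocation limits;
　　　　(4) the queue's guaranteed capacity and maximum capacity;
　　　　(5) the queue's state (running or stopped);
　　　　(6) access control for the queue, in the form of an Access Control List (ACL);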

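　　All of these properties can be specified in the scheduler's configuration file (${HADOOP_HOME}/etc/hadoop/capacity-scheduler.xml), which is usually located in the "etc/hadoop/" directory under the Hadoop installation directory.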
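As a hedged sketch of what such entries look like, the Python below renders one queue's properties into the `<configuration>/<property>/<name>/<value>` layout used by Hadoop's XML configuration files. The property suffixes (`capacity`, `maximum-capacity`, `state`, `acl_submit_applications`, `acl_administer_queue`) are Capacity Scheduler settings; the queue path, the values and the helper names `queue_properties` / `to_hadoop_xml` are illustrative assumptions. After merging such entries into capacity-scheduler.xml, the ResourceManager picks them up via `yarn rmadmin -refreshQueues`.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def queue_properties(path: str, capacity: float, maximum_capacity: float,
                     state: str, submit_acl: str, admin_acl: str) -> Dict[str, str]:
    """Map one queue's settings to capacity-scheduler.xml property names (hypothetical helper)."""
    prefix = f"yarn.scheduler.capacity.{path}"
    return {
        f"{prefix}.capacity": str(capacity),                  # guaranteed share of the parent (%)
        f"{prefix}.maximum-capacity": str(maximum_capacity),  # elasticity ceiling (%)
        f"{prefix}.state": state,                             # RUNNING or STOPPED
        f"{prefix}.acl_submit_applications": submit_acl,      # who may submit applications
        f"{prefix}.acl_administer_queue": admin_acl,          # who may administer the queue
    }


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render properties as <configuration><property><name/><value/></property>...</configuration>."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(to_hadoop_xml(queue_properties("root.yinzhengjie", 60, 80,
                                     "RUNNING", "yinzhengjie", "yinzhengjie")))
```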

　　Tip:
　　　　As shown below, the "yarn.admin.acl" property in "${HADOOP_HOME}/etc/hadoop/yarn-site.xml" controls who may run "yarn rmadmin -refreshQueues", i.e. who can make the ResourceManager reload an updated "capacity-scheduler.xml".
　　　　　　<property>
　　　　　　　　<name>yarn.admin.acl</name>
　　　　　　　　<value>yinzhengjie</value>
　　　　　　　　<description>ACL that specifies who can administer the YARN cluster. The default value is "*", meaning any user can administer the cluster.</description>
　　　　　　</property>
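Values such as yarn.admin.acl (and the queue ACLs) use Hadoop's ACL string format: a comma-separated list of users, optionally followed by a space and a comma-separated list of groups; "*" allows everyone and a single space allows no one. The Python below is a simplified reading of that format for illustration only, not Hadoop's own ACL implementation; the function `is_allowed` and the group name `hadoopadmins` are hypothetical.

```python
from typing import Iterable


def is_allowed(acl: str, user: str, groups: Iterable[str] = ()) -> bool:
    """Simplified check of a Hadoop-style ACL string "user1,user2 group1,group2"."""
    if acl.strip() == "*":  # "*" means everyone
        return True
    # Everything before the first space is the user list, everything after it the group list.
    # A leading space therefore means "no users, only groups"; a lone space means no one.
    users_part, _, groups_part = acl.partition(" ")
    users = {u.strip() for u in users_part.split(",") if u.strip()}
    acl_groups = {g.strip() for g in groups_part.split(",") if g.strip()}
    return user in users or any(g in acl_groups for g in groups)


print(is_allowed("yinzhengjie", "yinzhengjie"))                   # True
print(is_allowed("yinzhengjie", "root"))                          # False
print(is_allowed(" ", "yinzhengjie"))                             # False
print(is_allowed(" hadoopadmins", "alice", ["hadoopadmins"]))     # True
```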

[root@hadoop101.yinzhengjie.com ~]# yarn rmadmin -help
rmadmin is the command to execute YARN administrative commands.
The full syntax is: 

yarn rmadmin [-refreshQueues] [-refreshNodes [-g|graceful [timeout in seconds] -client|server]] [-refreshNodesResources] [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] [-addToClusterNodeLabels <"label1(exclusive=true),label2(exclusive=false),label3">] [-removeFromClusterNodeLabels <label1,label2,label3>] [-replaceLabelsOnNode <"node1[:port]=label1,label2 node2[:port]=label1"> [-failOnUnknownNodes]] [-directlyAccessNodeLabelStore] [-refreshClusterMaxPriority] [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help [cmd]]
   -refreshQueues: Reload the queues' acls, states and scheduler specific properties. 
        ResourceManager will reload the mapred-queues configuration file.
   -refreshNodes [-g|graceful [timeout in seconds] -client|server]: Refresh the hosts information at the ResourceManager. Here [-g|graceful [timeout in seconds] -client|server] is optional, if we specify the timeout then ResourceManager will wait for timeout before marking the NodeManager as decommissioned. The -client|server indicates if the timeout tracking should be handled by the client or the ResourceManager. The client-side tracking is blocking, while the server-side tracking is not. Omitting the timeout, or a timeout of -1, indicates an infinite timeout. Known Issue: the server-side tracking will immediately decommission if an RM HA failover occurs.
   -refreshNodesResources: Refresh resources of NodeManagers at the ResourceManager.
   -refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings
   -refreshUserToGroupsMappings: Refresh user-to-groups mappings
   -refreshAdminAcls: Refresh acls for administration of ResourceManager
   -refreshServiceAcl: Reload the service-level authorization policy file. 
        ResourceManager will reload the authorization policy file.
   -getGroups [username]: Get the groups which given user belongs to.
   -addToClusterNodeLabels <"label1(exclusive=true),label2(exclusive=false),label3">: add to cluster node labels. Default exclusivity is true
   -removeFromClusterNodeLabels <label1,label2,label3> (label splitted by ","): remove from cluster node labels
   -replaceLabelsOnNode <"node1[:port]=label1,label2 node2[:port]=label1,label2"> [-failOnUnknownNodes] : replace labels on nodes (please note that we do not support specifying multiple labels on a single host for now.)        [-failOnUnknownNodes] is optional, when we set this option, it will fail if specified nodes are unknown.
   -directlyAccessNodeLabelStore: This is DEPRECATED, will be removed in future releases. Directly access node label store, with this option, all node label related operations will not connect RM. Instead, they will access/modify stored node labels directly. By default, it is false (access via RM). AND PLEASE NOTE: if you configured yarn.node-labels.fs-store.root-dir to a local directory (instead of NFS or HDFS), this option will only work when the command run on the machine where RM is running.
   -refreshClusterMaxPriority: Refresh cluster max priority
   -updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): Update resource on specific node.
   -help [cmd]: Displays help for the given command or all commands if none is specified.

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

[root@hadoop101.yinzhengjie.com ~]#
