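I found a "reasonable" way to get it. Obviously, the best way would be for the Spark libraries to expose the ApplicationReport that they are already fetching directly to the launcher application, since they already go to the trouble of setting up delegation tokens and so on. However, this seems unlikely to happen.
This approach is two-pronged. First, it attempts to build a YarnClient itself in order to fetch the ApplicationReport, which holds the authoritative tracking URL. In my experience, however, this can fail (for example, if the job runs in CLUSTER mode with --proxy-user in a Kerberized environment, it will not be able to authenticate to YARN properly). Second, if that fails, it falls back to "guessing" the URL from the YARN web UI address found in the configuration.
In my case, I call this helper method from the driver itself and report the result back to my launcher application. In principle, though, it should work anywhere you have the Hadoop Configuration available (possibly including your launcher application). Obviously, you can use either "prong" of this implementation (or both), depending on your needs and your tolerance for complexity, extra processing, etc.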
// Imports needed by the helper below. LOG is assumed to be an SLF4J logger on the enclosing class.
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;

/**
 * Given a Hadoop {@link org.apache.hadoop.conf.Configuration} and appId, use the YARN API (via a
 * {@link YarnClient} instance) to get the application report, which includes the trackingUrl. If this fails,
 * then as a fallback, it attempts to "guess" the URL by looking at various YARN configuration properties,
 * and assumes that the URL will be something like: <pre>[yarnWebUI:port]/proxy/[appId]</pre>.
 *
 * @param hadoopConf the Hadoop {@link org.apache.hadoop.conf.Configuration}
 * @param appId the YARN application ID
 * @return the app trackingUrl, either retrieved using the {@link YarnClient}, or manually constructed using
 * the fallback approach
 */
public static String getYarnApplicationTrackingUrl(org.apache.hadoop.conf.Configuration hadoopConf, String appId) {
    LOG.debug("Attempting to look up YARN url for applicationId {}", appId);
    YarnClient yarnClient = null;
    try {
        // do not attempt to fail over on authentication error (ex: running with proxy-user and Kerberos)
        // note: this modifies the Configuration instance passed in by the caller
        hadoopConf.set("yarn.client.failover-max-attempts", "0");
        yarnClient = YarnClient.createYarnClient();
        yarnClient.init(hadoopConf);
        yarnClient.start();
        // first prong: ask the ResourceManager for the authoritative tracking URL
        // (newer Hadoop releases also offer ApplicationId.fromString(appId))
        final ApplicationReport report = yarnClient.getApplicationReport(ConverterUtils.toApplicationId(appId));
        return report.getTrackingUrl();
    } catch (YarnException | IOException e) {
        LOG.warn(
            "{} attempting to get report for YARN appId {}; attempting to use manually constructed fallback",
            e.getClass().getSimpleName(),
            appId,
            e
        );
        // second prong: build the proxy URL from the configured ResourceManager web UI address
        String baseYarnWebappUrl;
        String protocol;
        if ("HTTPS_ONLY".equals(hadoopConf.get("yarn.http.policy"))) {
            // YARN is configured to use HTTPS only, hence return the https address
            baseYarnWebappUrl = hadoopConf.get("yarn.resourcemanager.webapp.https.address");
            protocol = "https";
        } else {
            baseYarnWebappUrl = hadoopConf.get("yarn.resourcemanager.webapp.address");
            protocol = "http";
        }
        return String.format("%s://%s/proxy/%s", protocol, baseYarnWebappUrl, appId);
    } finally {
        // always release the client, whether or not the lookup succeeded
        if (yarnClient != null) {
            yarnClient.stop();
        }
    }
}
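
If you only want the fallback "prong", it can be tried out without a cluster or any Hadoop dependency. The following is a minimal sketch, not part of the helper above: the class name TrackingUrlFallback and the method guessTrackingUrl are hypothetical, and a plain Map stands in for the Hadoop Configuration (so, unlike a real Configuration, it has no defaults loaded from the YARN config files). It also assumes you would rather fail loudly than return a URL containing "null" when the address property is missing, which is a choice the original helper does not make.

```java
import java.util.HashMap;
import java.util.Map;

public class TrackingUrlFallback {

    /**
     * Builds "[protocol]://[yarnWebUI:port]/proxy/[appId]" from YARN-style properties.
     * The Map is a stand-in (assumption) for a Hadoop Configuration.
     */
    public static String guessTrackingUrl(Map<String, String> conf, String appId) {
        // Same policy check as the helper: HTTPS_ONLY selects the https address and scheme
        boolean httpsOnly = "HTTPS_ONLY".equals(conf.get("yarn.http.policy"));
        String key = httpsOnly
            ? "yarn.resourcemanager.webapp.https.address"
            : "yarn.resourcemanager.webapp.address";
        String address = conf.get(key);
        if (address == null || address.isEmpty()) {
            // Assumption: refuse to guess rather than emit "http://null/proxy/..."
            throw new IllegalStateException("Missing property: " + key);
        }
        return String.format("%s://%s/proxy/%s", httpsOnly ? "https" : "http", address, appId);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Example values only; use the addresses from your own yarn-site.xml
        conf.put("yarn.resourcemanager.webapp.address", "rm.example.com:8088");
        System.out.println(guessTrackingUrl(conf, "application_1700000000000_0001"));
    }
}
```

Keep in mind that a URL built this way is only a guess: it has the shape of the ResourceManager proxy link, but nothing has confirmed that the application exists or that the web UI is reachable at that address.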