Hadoop Hive - How can I 'add jar' for use with the Hive JDBC client? (Answers)

【Question title】: Hadoop Hive - How can I 'add jar' for use with the Hive JDBC client?
【Posted】: 2012-01-17 17:20:02
【Question description】:

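So, I have HDFS and Hive working together. I also have the JDBC driver for Hive working, so that I can make remote JDBC calls.

Now I have added a Hive user-defined function (UDF). It works fine in the CLI... I even load the jar and the associated function automatically via the .hiverc file. However, I cannot get it to work through the Hive JDBC driver. I thought the driver would also use the .hiverc file (located in /usr/lib/hive/bin/ by default), but it does not seem to. I also tried adding the jar with an 'add jar' SQL command as the very first statement, but no matter where I put the jar file, I get an error in hive.log saying the file cannot be found.

Does anyone know how to do this? I am using the Cloudera Distribution (CDH3u2), which uses Hive-0.7.1.

Thanks in advance.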

【Question discussion】:

Tags: jdbc hadoop hive hdfs

【Solution 1】:

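According to the Hive developer mailing list, the current Hive version (0.9) has no fix for this problem. To work around it, I used a connection factory class that registers the jars and functions every time a connection session is started. The code below works well: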

    package com.rapidminer.operator.bigdata.runner.helpers;

    import java.sql.*;

    /**
     * A Hive connection factory utility.
     * @author Marcelo Beckmann
     */
    public class ConnectionFactory {

        private static ConnectionFactory instance;

        /** Basic attributes to make the connection. */
        public String url = "jdbc:hive://localhost:10000/default";
        public final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

        public static ConnectionFactory getInstance() {
            if (instance == null)
                instance = new ConnectionFactory();
            return instance;
        }

        private ConnectionFactory() {}

        /**
         * Obtains a Hive connection.
         * Warning! To use simultaneous connections from the Thrift server, you must change the
         * Hive metadata server from Derby to another database (MySQL, for example).
         * @return a connection on which the initialization queries have already been run
         * @throws Exception if the driver cannot be loaded or the connection fails
         */
        public Connection getConnection() throws Exception {
            Class.forName(DRIVER);
            Connection connection = DriverManager.getConnection(url, "", "");
            runInitializationQueries(connection);
            return connection;
        }

        /**
         * Runs the initialization queries after the connection is obtained. This is done
         * to work around a known Hive bug (HIVE-657).
         * @throws SQLException if any initialization query fails
         */
        private void runInitializationQueries(Connection connection) throws SQLException {
            Statement stmt = null;
            try {
                // The statement must be created from the connection before any query can run.
                stmt = connection.createStatement();
                //TODO Get the queries from a .hiverc file
                String[] args = new String[3];
                args[0] = "add jar /home/hadoop-user/hive-0.9.0-bin/lib/hive-beckmann-functions.jar";
                args[1] = "create temporary function row_number as 'com.beckmann.hive.RowNumber'";
                args[2] = "create temporary function sequence as 'com.beckmann.hive.Sequence'";
                for (String query : args) {
                    stmt.execute(query);
                }
            } finally {
                if (stmt != null)
                    stmt.close();
            }
        }
    }
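
The TODO in the factory above mentions reading the initialization queries from a .hiverc file instead of hard-coding them. One possible way to do that is sketched below. The class name `InitScript` and its methods are hypothetical and not part of Hive or of the original answer. The sketch assumes the file holds one or more statements terminated by `;`, with optional `--` comment lines, and it splits naively on `;`, so it would break on a semicolon inside a quoted string.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns .hiverc-style text into statements and runs them. */
class InitScript {

    /** Splits the script on ';', skipping blank lines and lines starting with "--". */
    public static List<String> parse(String script) {
        StringBuilder withoutComments = new StringBuilder();
        for (String line : script.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("--")) {
                continue;
            }
            withoutComments.append(trimmed).append(' ');
        }
        List<String> statements = new ArrayList<>();
        for (String part : withoutComments.toString().split(";")) {
            String statement = part.trim();
            if (!statement.isEmpty()) {
                statements.add(statement);
            }
        }
        return statements;
    }

    /** Executes each statement in order on the given connection. */
    public static void run(Connection connection, List<String> statements) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            for (String statement : statements) {
                stmt.execute(statement);
            }
        }
    }
}
```

With something like this, `runInitializationQueries` could read the file contents into a string and call `InitScript.run(connection, InitScript.parse(contents))`. Any path inside an `add jar` line still has to be valid on the machine where the Hive server runs.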

【Discussion】:

【Solution 2】:

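I also connect to Hive with the JDBC driver. I scp my jar onto the cluster's master node, which is also where Hive is installed, and then use the file's absolute path (on the master node) in my add jar command. I issue the add jar command through the JDBC driver just like any other HQL command.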
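
A minimal sketch of this approach, assuming the jar has already been copied to the node where the Hive server runs. The class `RemoteUdfRegistrar`, its method names, and the path and class names in the usage note are hypothetical placeholders, not part of the Hive API. The absolute-path check only guards against accidentally passing a path relative to the client machine.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper that registers a UDF jar through an open Hive JDBC connection. */
class RemoteUdfRegistrar {

    /** Builds the "add jar" command; the path must be absolute on the Hive server host. */
    public static String addJarCommand(String serverSidePath) {
        if (!serverSidePath.startsWith("/")) {
            throw new IllegalArgumentException("Expected an absolute server-side path: " + serverSidePath);
        }
        return "add jar " + serverSidePath;
    }

    /** Builds the "create temporary function" command for one UDF class. */
    public static String createFunctionCommand(String functionName, String className) {
        return "create temporary function " + functionName + " as '" + className + "'";
    }

    /** Sends both commands over JDBC, in order, like any other HQL statement. */
    public static void register(Connection connection, String serverSidePath,
                                String functionName, String className) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute(addJarCommand(serverSidePath));
            stmt.execute(createFunctionCommand(functionName, className));
        }
    }
}
```

A call might look like `RemoteUdfRegistrar.register(connection, "/home/hadoop-user/udfs/my-udfs.jar", "my_func", "com.example.MyFunc")`, where every argument is an illustrative value. Temporary functions are scoped to the session, so the registration has to be repeated on each new connection.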

【Discussion】:

【Solution 3】:

I think the JDBC driver uses Thrift, which means the JAR probably needs to be on the Thrift server (the Hive server you connect to in your connection string) and on the Hive classpath there.

【Discussion】:

Thanks for the quick reply. However, I am not sure where I would do that. I tried adding the jars to the CLASSPATH in the server's hadoop-env.sh file and in its hive-env.sh file. Neither seemed to work.