【Question Title】: How to Cast String to Array of Structs in Hive?
【Posted】: 2018-04-28 07:53:47
【Question Description】:

My table in Hive has the following schema:

DESCRIBE struct_demo;
+-------------------+-------------------------------+
| name              | type                          |
+-------------------+-------------------------------+
| lr_id             | string                        |
| segment_info      | ARRAY<struct<                 |
|                   |   idlpSegmentName:string,     |
|                   |   idlpSegmentValue:string >   |
|                   |      >                        |
|                   |                               |
+-------------------+-------------------------------+

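I created a table in Redshift (or any SQL database) whose rows hold data in a format similar to the Hive type above, but stored as a string.

How do I cast the data when inserting it from Redshift into Hive? More specifically, how do I convert a string into an array of structs?

My SQL table: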

lr_id    |          segment_info
---------|------------------------------------------------------------
1        |      [{"idlpsegmentname":"axciom","idlpsegmentvalue":"200"},{"idlpsegmentname":"people","idlpsegmentvalue":"z"}]

So far I have not found any UDF that meets this requirement.

【Discussion】:

    Tags: hadoop hive amazon-redshift


    【Solution 1】:

    In the end, I found a solution: a custom GenericUDF.

    package hive;
    
    
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.Text;
    
    
    public class UAStructUDF extends GenericUDF {
    private Object[] result;
    
    @Override
    public String getDisplayString(String[] arg0) {
        // Shown in EXPLAIN output, so include the function name and arguments
        return "UAStructUDF(" + String.join(", ", arg0) + ")";
    }
    
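    public static void main(String... args) {
        // Local smoke test with a sample "name,value|name,value" input
        UAStructUDF ua = new UAStructUDF();
        System.out.println(ua.parseUAString("acxiom_cluster,03|aff_celeb_ent,Y"));
    }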
    
    @Override
    public ObjectInspector initialize(ObjectInspector[] arg0) throws UDFArgumentException {
        // Define the field names for the struct<> and their types
        ArrayList<String> structFieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();
        // fill struct field names
        // segmentname
        structFieldNames.add("idlpsegmentname");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // segmentvalue
        structFieldNames.add("idlpsegmentvalue");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        StructObjectInspector si = ObjectInspectorFactory.getStandardStructObjectInspector(structFieldNames,
                structFieldObjectInspectors);
        return ObjectInspectorFactory.getStandardListObjectInspector(si);
        // return si;
    }
    
    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        if (args == null || args.length < 1) {
            throw new HiveException("args is empty");
        }
        if (args[0].get() == null) {
            // A NULL column value yields a NULL array instead of failing the query
            return null;
        }
        Object argObj = args[0].get();
        // get argument
        String argument = null;
        if (argObj instanceof Text) {
            argument = ((Text) argObj).toString();
        } else if (argObj instanceof String) {
            argument = (String) argObj;
        } else {
            throw new HiveException(
                    "Argument is neither a Text nor String, it is a " + argObj.getClass().getCanonicalName());
        }
        // parse UA string and return struct, which is just an array of objects:
        // Object[]
        return parseUAString(argument);
    }
    
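    // Parses "name,value|name,value" into a list of structs. Each struct is an
    // Object[] holding { idlpsegmentname, idlpsegmentvalue }.
    // Note: this expects a pipe/comma-delimited string, not the JSON form
    // shown in the question.
    private Object parseUAString(String argument) {
        List<Object[]> ret = new ArrayList<Object[]>();
        for (String s : argument.split("\\|")) {
            String[] arr = s.split(",", 2);
            if (arr.length < 2) {
                continue; // skip empty or malformed entries
            }
            Object[] o = new Object[2];
            o[0] = new Text(arr[0]);
            o[1] = new Text(arr[1]);
            ret.add(o);
        }
        return ret;
    }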
    }
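
The UDF above splits a pipe/comma-delimited string, while the sample row in the question stores JSON. As a rough sketch of how the JSON form could be turned into the same (name, value) pairs, the helper below uses only the JDK. `SegmentJsonParser` and `parse` are hypothetical names that do not appear in the original answer, and the regex assumes flat objects with exactly the two keys in the order shown in the question and no escaped quotes inside the values. A real JSON library would be the safer choice for anything less regular.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the original answer).
final class SegmentJsonParser {
    // Matches one flat object of the exact shape shown in the question:
    // {"idlpsegmentname":"...","idlpsegmentvalue":"..."}
    // Assumption: fixed key order, no escaped quotes inside values.
    private static final Pattern ENTRY = Pattern.compile(
        "\\{\\s*\"idlpsegmentname\"\\s*:\\s*\"([^\"]*)\"\\s*,"
        + "\\s*\"idlpsegmentvalue\"\\s*:\\s*\"([^\"]*)\"\\s*\\}");

    // Returns one String[2] { name, value } per matched object.
    // A null or non-matching input yields an empty list.
    static List<String[]> parse(String json) {
        List<String[]> out = new ArrayList<>();
        if (json == null) {
            return out;
        }
        Matcher m = ENTRY.matcher(json);
        while (m.find()) {
            out.add(new String[] { m.group(1), m.group(2) });
        }
        return out;
    }

    public static void main(String[] args) {
        String row = "[{\"idlpsegmentname\":\"axciom\",\"idlpsegmentvalue\":\"200\"},"
                   + "{\"idlpsegmentname\":\"people\",\"idlpsegmentvalue\":\"z\"}]";
        for (String[] pair : parse(row)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Inside the UDF, each pair would presumably be wrapped as `new Object[] { new Text(pair[0]), new Text(pair[1]) }`, so that it matches the writable string inspectors declared in `initialize`.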
    

    【Discussion】:
