Apache Gora: where to set the new table name in the reducer

[Question title]: Apache Gora, where to set new table name in reducer
[Posted]: 2019-08-27 03:36:18
[Question]:

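I have an application that is basically an HBase MapReduce job using Apache Gora. Mine is a very simple case: I want to copy the data of one HBase table into a new table. Where do I write the new table name? I have looked at this guide but could not find where to put the new table name. Here is the code snippet: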

/* Mappers are initialized with GoraMapper.initMapperJob() or
 * GoraInputFormat.setInput(). */
GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class,
    LogAnalyticsMapper.class, true);

/* Reducers are initialized with GoraReducer.initReducerJob().
 * If the output is not to be persisted via Gora, any reducer
 * can be used instead. */
GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);

With a plain MapReduce job, this case would be very easy.

[Question comments]:

Tags: java hadoop mapreduce hbase gora

[Answer 1]:

I will point you to the tutorial, but I will also try to clarify things here :)

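The table name is defined in your mapping. Check Table Mappings. You probably have a file called gora-hbase-mapping.xml in which the mapping is defined. It should look something like this: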

<table name="Nameofatable">
...
<class name="blah.blah.EntityA" keyClass="java.lang.Long" table="Nameofatable">

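This is where you configure the table name (if you find it in both places, use the same name in each). There can be several <table> and <class> elements, perhaps one pair for your input and one for your output.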

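After that, you have to instantiate your input and output data stores, inStore and outStore. The tutorial is a bit messy, and the creation of inStore and outStore ended up in the wrong section. You just do something like this: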

inStore = DataStoreFactory.getDataStore(String.class, EntityA.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(Long.class, OtherEntity.class, hadoopConf);

Explained another way:

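You instantiate a data store with DataStoreFactory.getDataStore(key class, entity class, conf).
The requested entity class is looked up in gora-hbase-mapping.xml, as in <class name="blah.blah.EntityA".
Inside that <class> element there is a table= attribute. That is your table name :)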

So: you define one entity with its table name for the input, and another entity with its table name for the output.
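
To make the lookup above concrete, here is a minimal, self-contained sketch. It does not use Gora at all; it only parses a mapping string with the JDK's XML parser and returns the table= attribute of the matching <class> element, which is roughly the resolution described above and not Gora's actual implementation. The table names source_table and target_table, the class MappingLookupSketch, and the helper tableFor are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MappingLookupSketch {

    // Hypothetical mapping: the table names are placeholders, and the
    // entity names follow the blah.blah.* style used in the answer.
    static final String MAPPING =
        "<gora-otd>"
      + "<table name=\"source_table\"/>"
      + "<table name=\"target_table\"/>"
      + "<class name=\"blah.blah.EntityA\" keyClass=\"java.lang.String\" table=\"source_table\"/>"
      + "<class name=\"blah.blah.OtherEntity\" keyClass=\"java.lang.Long\" table=\"target_table\"/>"
      + "</gora-otd>";

    /** Returns the table= attribute of the <class> whose name= matches, or null if none does. */
    static String tableFor(String mappingXml, String entityClassName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(mappingXml.getBytes(StandardCharsets.UTF_8)));
            NodeList classes = doc.getElementsByTagName("class");
            for (int i = 0; i < classes.getLength(); i++) {
                Element c = (Element) classes.item(i);
                if (entityClassName.equals(c.getAttribute("name"))) {
                    return c.getAttribute("table");
                }
            }
            return null; // no <class> entry for this entity
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse mapping", e);
        }
    }

    public static void main(String[] args) {
        // The input entity resolves to the source table, the output entity to the target table.
        System.out.println(tableFor(MAPPING, "blah.blah.EntityA"));     // prints "source_table"
        System.out.println(tableFor(MAPPING, "blah.blah.OtherEntity")); // prints "target_table"
    }
}
```

The point of the sketch is only that the entity class passed to the factory decides the table, so choosing a different output table means choosing a different mapped entity.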

Edit 1:

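If the entity class is the same but the table names differ, the only solution I can think of is to create two classes, Entity1 and Entity2, with the same schema, and to declare two <table> and two <class> elements in your gora-hbase-mapping.xml. Then instantiate the stores like this: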

inStore = DataStoreFactory.getDataStore(String.class, Entity1.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(String.class, Entity2.class, hadoopConf);

It is not very clean, but it should work :\
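
For the two-class workaround, the mapping file might look roughly like the fragment below. This is a sketch that assumes the usual gora-hbase-mapping.xml layout with a <gora-otd> root; the table names source_table and target_table, the column family f, and the content field are hypothetical and would have to match your real schema.

```xml
<gora-otd>
  <!-- Source table (hypothetical name) -->
  <table name="source_table">
    <family name="f"/>
  </table>
  <!-- Destination table (hypothetical name) -->
  <table name="target_table">
    <family name="f"/>
  </table>

  <!-- Entity1 and Entity2 share one schema but point at different tables -->
  <class name="blah.blah.Entity1" keyClass="java.lang.String" table="source_table">
    <field name="content" family="f" qualifier="content"/>
  </class>
  <class name="blah.blah.Entity2" keyClass="java.lang.String" table="target_table">
    <field name="content" family="f" qualifier="content"/>
  </class>
</gora-otd>
```

With a mapping like this, the data store for Entity1 would read from source_table and the data store for Entity2 would write to target_table.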

Edit 2 (not applicable to this question):

If the source and destination tables are the same, there is a version of initReducerJob that allows this. An example is in Nutch's GeneratorJob.java:

StorageUtils.initMapperJob(currentJob, fields, SelectorEntry.class, WebPage.class, GeneratorMapper.class, SelectorEntryPartitioner.class, true);
StorageUtils.initReducerJob(currentJob, GeneratorReducer.class);

[Comments]:

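Thanks, bro. My use case is actually Apache Nutch. I was thinking along the lines you describe. But how do I handle two different tables (source + sink) with the same schema in a single job? In my case, I have to copy an HBase table (created by Nutch) into a new table.
In my case, both entities belong to the same class.
Updated the answer. Basically, treat the input and the output as different classes (they are almost identical, but you still have to create another entity class).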