Setting multiple row prefix filters on an HBase scanner in Java: answers

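【Question title】: Set multiple prefix row filters on a scanner (HBase, Java)
【Posted】: 2016-08-22 12:45:33
【Question】:

I want to create a scanner that returns results matching two prefix filters.
For example, I want all rows whose key starts with the string "x" or with the string "y".
Currently I only know how to do this with a single prefix, like this: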

scan.setRowPrefixFilter(prefixFilter)

【Comments】:

Tags: java hadoop mapreduce hbase

【Answer 1】:

You can't use the setRowPrefixFilter API in this case; you have to use the more general setFilter API, for example:

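import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

// MUST_PASS_ONE is a logical OR: a row is kept if it matches either prefix.
scan.setFilter(
  new FilterList(
    FilterList.Operator.MUST_PASS_ONE,
    new PrefixFilter(Bytes.toBytes("xx")),
    new PrefixFilter(Bytes.toBytes("yy"))
  )
);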

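【Comments】:

A note on the performance of this solution: you are performing a full table scan and passing every row through these filters. In general, this is very inefficient. If the table is large and there are only a few prefixes, running multiple scans with scan.setRowPrefixFilter(prefix) may be faster.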
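
To illustrate why a per-prefix scan can be cheaper: as far as I understand it, setRowPrefixFilter turns the prefix into a start row and a stop row, so the scan only touches the contiguous key range that can match. The sketch below shows that range computation with the plain JDK only. PrefixRange and its methods are hypothetical names made up for this illustration, not HBase API, and the real client may handle edge cases differently.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of HBase): derives the row range covered by a key prefix. */
final class PrefixRange {
    private PrefixRange() {}

    /** The smallest key that can match the prefix is the prefix itself. */
    static byte[] startRow(byte[] prefix) {
        return prefix.clone();
    }

    /**
     * Smallest key that sorts after every key starting with the prefix:
     * drop trailing 0xFF bytes, then increment the last remaining byte.
     * Returns an empty array when no such bound exists (empty prefix or
     * all bytes 0xFF), which this sketch treats as "scan to the end".
     */
    static byte[] stopRow(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // wraps 0x7F to 0x80, which is still larger when compared as unsigned
        return stop;
    }
}
```

For the prefix "x" this gives the half-open range ["x", "y"), so a scan over it never reads rows outside that range, whereas the OR-ed filter list has to look at rows across the table.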

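【Answer 2】:

I just tried it, and it seems you cannot pass a regular expression to the row prefix filter, so I think the solution is to issue two requests using

scan.setRowPrefixFilter(Bytes.toBytes("x"))
scan.setRowPrefixFilter(Bytes.toBytes("y"))

This will give you the rows you need.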

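【Comments】:

If you do this, it will only return keys starting with "y", because you overwrote "x". My goal is to get a single Scan object that returns results for both keys.
Oops, I forgot to add that you should execute each scan separately. Try running one, storing the results, and then appending the results of the second.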
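
The "one scan per prefix, then combine" idea from the comment above can be sketched without a cluster. This is only a simulation under stated assumptions: a sorted in-memory map stands in for the table (HBase also keeps rows sorted by key), and MultiPrefixScan, scanPrefix and scanPrefixes are hypothetical names, not HBase API. In real code each scanPrefix call would be a separate Scan with its own row prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Hypothetical stand-in for scanning a table: a sorted map from row key to value. */
final class MultiPrefixScan {
    private MultiPrefixScan() {}

    /** Mimics one prefix scan: seek to the prefix, then read until keys stop matching. */
    static List<String> scanPrefix(NavigableMap<String, String> table, String prefix) {
        List<String> rows = new ArrayList<>();
        for (String key : table.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            rows.add(key);
        }
        return rows;
    }

    /** Runs one scan per prefix and appends the results in the order the prefixes are given. */
    static List<String> scanPrefixes(NavigableMap<String, String> table, List<String> prefixes) {
        List<String> all = new ArrayList<>();
        for (String prefix : prefixes) {
            all.addAll(scanPrefix(table, prefix));
        }
        return all;
    }
}
```

Note that if one prefix is itself a prefix of another (say "x" and "xy"), this simple concatenation returns the overlapping rows twice, so the caller would need to de-duplicate.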

【Answer 3】:

I have implemented a way to set prefix filters in bulk; maybe it will help you:

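    // Row-key prefixes to match
    List<String> bindCodes = new ArrayList<>();
    bindCodes.add("CM0001");
    bindCodes.add("CE7563");
    bindCodes.add("DR6785");

    Scan scan = new Scan();
    scan.setCaching(50); // number of rows fetched per round trip to the server
    // Select a single column of the family
    scan.addColumn(HTableColumnEnum.GPS_CF_1.getCfName().getBytes(), LOCATION_CREATE_DATE_ARRAY);
    // Select the whole family; note that calling addFamily after addColumn on the
    // same family replaces the column selection, so all of its columns are returned
    scan.addFamily(HTableColumnEnum.GPS_CF_1.getCfName().getBytes());

    // Create a filter list that passes a row if ANY filter matches (logical OR)
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    // Add one PrefixFilter per row-key prefix
    bindCodes.forEach(s -> {
        filterList.addFilter(new PrefixFilter(Bytes.toBytes(s)));
    });

    // Attach the filter list to the scan
    scan.setFilter(filterList);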

【Comments】: