How to completely remove specific input fields in User Defined Java Class in Pentaho

[Question title]: How to completely remove specific input fields in User Defined Java Class in Pentaho
[Posted]: 2020-01-30 09:19:11
[Question description]:

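I don't understand how to completely remove specific input fields when using the User Defined Java Class step in Pentaho Data Integration.

Suppose I have input fields A, B and C. Say I want to concatenate the values of B and C (separated by a space), write the result to C, and leave only the fields named A and C, with no field named B (the real problem is much more complex). I understand how to write the result to field C, but I don't know how to completely remove field B.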

private String outFieldName1;
private String outFieldName2;
private String removeFieldName;

private int outFieldIndex1;
private int outFieldIndex2;
private int removeFieldIndex;

private Object[] inputRow;

private int inputRowMetaSize;
private int outputRowMetaSize;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    inputRow = getRow();
    if (inputRow == null) {
        setOutputDone();
        return false;
    }

    if (first) processMetadata();

    pushOutputRow( get(Fields.In, removeFieldName).getString(inputRow) + " "
                 + get(Fields.In, outFieldName2).getString(inputRow));

    return true;
}

private void processMetadata() throws KettleException {
    outFieldName1 = getParameter("OUT1");
    outFieldName2 = getParameter("OUT2");
    removeFieldName = getParameter("REMOVE");

    outFieldIndex1 = getInputRowMeta().indexOfValue(outFieldName1);
    outFieldIndex2 = getInputRowMeta().indexOfValue(outFieldName2);
    removeFieldIndex = getInputRowMeta().indexOfValue(removeFieldName);

    inputRowMetaSize = data.inputRowMeta.size();
    outputRowMetaSize = data.outputRowMeta.size();

    first=false;
}


private void pushOutputRow(String content) throws KettleException {
    Object[] outRow = RowDataUtil.allocateRowData(outputRowMetaSize);

    for (int fieldN=0; fieldN < inputRow.length; ++fieldN) {
        if(fieldN == outFieldIndex1) {
            outRow[fieldN] = inputRow[fieldN];
        } else if(fieldN == outFieldIndex2) {
            outRow[fieldN] = content;
        } else if(fieldN == removeFieldIndex) {
            // Problem: this only blanks the value of the field;
            // the field itself is still present in the output row.
            outRow[fieldN] = "";
        }

    }

    putRow( data.outputRowMeta, outRow );
}
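
Why writing an empty string does not remove the field can be illustrated outside PDI. The following is a minimal plain-Java model, not Kettle API: the class `BlankVersusRemoveSketch` and its methods are hypothetical names, a list of field names stands in for the row metadata, and an `Object[]` stands in for the row data. It assumes the named field exists in the layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: field names play the role of row metadata, Object[] the row data.
class BlankVersusRemoveSketch {

    // Blanking: the value becomes "", but the row keeps the same number of fields.
    static Object[] blankField(List<String> fieldNames, Object[] row, String name) {
        Object[] out = Arrays.copyOf(row, row.length);
        out[fieldNames.indexOf(name)] = "";
        return out;
    }

    // Removing, part 1: drop the value at the field's position.
    static Object[] removeFieldValue(List<String> fieldNames, Object[] row, String name) {
        int removeIndex = fieldNames.indexOf(name);
        List<Object> out = new ArrayList<>(Arrays.asList(row));
        out.remove(removeIndex); // int argument, so this removes by position
        return out.toArray();
    }

    // Removing, part 2: drop the name from the layout as well.
    static List<String> removeFieldName(List<String> fieldNames, String name) {
        List<String> out = new ArrayList<>(fieldNames);
        out.remove(name); // String argument, so this removes by value
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("A", "B", "C");
        Object[] row = {"a", "b", "c"};

        // Still three fields; B is merely empty.
        System.out.println(names + " " + Arrays.toString(blankField(names, row, "B")));

        // Two fields; B is gone from both the layout and the data.
        System.out.println(removeFieldName(names, "B") + " "
                + Arrays.toString(removeFieldValue(names, row, "B")));
    }
}
```

In PDI terms, the row layout passed to putRow() plays the role of the name list here, so the field keeps appearing downstream for as long as that layout still contains it.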

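[Question comments]:

First, a simple note: between the Transform and Utility step categories there are at least 30 different steps in PDI, and your last resort is 'Modified Java Script Value'. From the looks of it, you can use a combination of the Concat Fields and Select Values steps. You also don't have to remove a column; you can just omit it from the output, because depending on your workload, removing a column can be far more taxing on the transformation than simply omitting it.
Thank you for your reply. I gave a simple example so that you can see what I want. This is my first question here. If necessary, I can upload somewhere the same simple example transformation from which I took the Java code.
My real task is more complex than this. Using the UDJC step saved me more than 10 steps, reduced the amount of data processed, and reduced the number of places where the set of fields has to be changed if necessary. For my task, on large volumes of data, UDJC runs faster and consumes less memory than the transformation written without it.
After my UDJC step I have to use a Select Values step to remove the unnecessary columns. As far as I understand, the Select Values step is also written in Java and somehow does this. I don't understand why copying unnecessary field names to the output and then removing them in the next step (Select Values) would be faster than not passing them at all.
If "omit said column" means that the UDJC step outputs neither the field name nor its value, then that suits me, but I don't understand how to do it.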

Tags: java pentaho-data-integration

[Solution 1]:

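All that is needed is:

Save data.outputRowMeta in a variable of type RowMetaInterface (rowMeta in my case);
Call the rowMeta.removeValueMeta method on it with the name or index of the field to remove;
Use rowMeta instead of getInputRowMeta() to look up the indexes of the output fields and the size of the output row;
Pass rowMeta as the first argument of putRow().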

private String outFieldName1;
private String outFieldName2;
private String removeFieldName;

private int outFieldIndex1;
private int outFieldIndex2;

private Object[] inputRow;

private int inputRowMetaSize;
private int outputRowMetaSize;
private RowMetaInterface rowMeta;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    inputRow = getRow();
    if (inputRow == null) {
        setOutputDone();
        return false;
    }

    if (first) processMetadata();

    pushOutputRow( get(Fields.In, removeFieldName).getString(inputRow) + " "
                 + get(Fields.In, outFieldName2).getString(inputRow));

    return true;
}

private void processMetadata() throws KettleException {
    outFieldName1 = getParameter("OUT1");
    outFieldName2 = getParameter("OUT2");
    removeFieldName = getParameter("REMOVE");

    inputRowMetaSize = data.inputRowMeta.size();
    outputRowMetaSize = data.outputRowMeta.size();

    rowMeta = data.outputRowMeta;
    rowMeta.removeValueMeta(removeFieldName);

    outFieldIndex1 = rowMeta.indexOfValue(outFieldName1);
    outFieldIndex2 = rowMeta.indexOfValue(outFieldName2);

    outputRowMetaSize = rowMeta.size();

    first=false;
}

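private void pushOutputRow(String content) throws KettleException {
    Object[] outRow = RowDataUtil.allocateRowData(outputRowMetaSize);

    // outFieldIndex1 and outFieldIndex2 are positions in the output layout (rowMeta).
    // OUT1 is read through its position in the input layout, which differs from
    // its output position whenever the removed field precedes it.
    int inFieldIndex1 = getInputRowMeta().indexOfValue(outFieldName1);
    outRow[outFieldIndex1] = inputRow[inFieldIndex1];
    outRow[outFieldIndex2] = content;

    putRow( rowMeta, outRow );
}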
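
Removing a field shifts the position of every field that follows it, so a field's index in the input row and its index in the output row can differ. One way to handle this for any field order is to compute, once, which input position feeds each output position. The sketch below is a plain-Java model under that assumption, not Kettle API; `IndexMappingSketch`, `buildSourceIndexes` and `copyRow` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: maps each output position to the input position it copies from.
class IndexMappingSketch {

    // Built once (for example, while handling the first row), then reused for every row.
    static int[] buildSourceIndexes(List<String> inputNames, String removeName) {
        if (!inputNames.contains(removeName)) {
            throw new IllegalArgumentException("Unknown field: " + removeName);
        }
        // Assumes field names are unique, so exactly one field is dropped.
        int[] source = new int[inputNames.size() - 1];
        int outPos = 0;
        for (int inPos = 0; inPos < inputNames.size(); inPos++) {
            if (!inputNames.get(inPos).equals(removeName)) {
                source[outPos++] = inPos;
            }
        }
        return source;
    }

    // Copies the kept values into a row that has the output layout.
    static Object[] copyRow(Object[] inputRow, int[] source) {
        Object[] out = new Object[source.length];
        for (int outPos = 0; outPos < source.length; outPos++) {
            out[outPos] = inputRow[source[outPos]];
        }
        return out;
    }

    public static void main(String[] args) {
        // The removed field B comes first here, so A and C both shift left by one.
        List<String> names = Arrays.asList("B", "A", "C");
        Object[] inputRow = {"b", "a", "c"};

        int[] source = buildSourceIndexes(names, "B");
        Object[] out = copyRow(inputRow, source);

        // Overwrite C, now at output position 1, with the concatenation of B and C.
        out[1] = inputRow[0] + " " + inputRow[2];
        System.out.println(Arrays.toString(out)); // prints [a, b c]
    }
}
```

Keeping the mapping in an array means the per-row work is a plain indexed copy with no name lookups.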

[Comments]: