【Question Title】: How do I get CAS to update a small subset of record properties during a partial update?
【Posted】: 2015-01-22 20:15:48
【Question Description】:

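I'm on Oracle Commerce 11.1, with an application that runs on CAS only (no Forge).

Baseline updates work fine. My problem is with partial updates.

We have an extract file containing the subset of records that need updating. However, the file lists only a small subset of each record's properties (that is, only the properties that actually changed).

When I run a partial update (using the default mechanism that ships with the CAS-only deployment template), it completes successfully, but the updated records contain only the subset of fields supplied in the file; every field that did not change is simply missing. It is as if CAS replaces the existing record (with its full set of properties) with a new record containing only the few properties from the extract file.

For example, suppose one of the records looks like this: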

Record 23
---------
id 23
name Test
inventoryCount 23
buyable 1
imageUrl test.jpg

And suppose the partial extract file has an entry like this:

Record 23
---------
id 23
inventoryCount 10

After the partial update, the result I get is this:

Record 23
---------
id 23
inventoryCount 10

I'd like to know how to make CAS keep the unchanged properties instead of dropping them. I know Forge can do this.

【Comments】:

    Tags: etl endeca


    【Solution 1】:

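    I've confirmed that there is no clear built-in mechanism for this, so I invented my own.

    To summarize how it works: I customized the PartialUpdate beanshell script so that, immediately after the last-mile crawl runs, it calls a custom component I created named DGIDXTransformer (that is, it extends CustomComponent). This class unzips and parses the file produced by the last-mile crawl, which is meant to be fed to DGIDX, and writes out a modified version of it. Specifically, it rewrites all of the update information so that records are updated rather than replaced with the new properties. This is a bit hacky, since the format of the DGIDX input file is undocumented, but based on my research the format is unlikely to change much in future versions of Endeca.

    Here is DGIDXTransformer: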

    package com.chairbender;

    import com.endeca.soleng.eac.toolkit.component.*;
    
    import java.io.*;
    import java.nio.file.Files;
    import java.util.Map;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;
    
    /**
     * Custom component which runs during the PartialUpdate beanshell script. It transforms the DGIDX-compatible input file
     * that CAS produces so that records will be updated instead of replaced.
     *
     * Expects only one property entry called "dgidxInputFileDirectory", specifying the directory to look in to
     * find the file to transform (relative to the config directory).
     *
     * @author chairbender
     */
    public class DGIDXTransformer extends CustomComponent {
        private static final String DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME = "dgidxInputFileDirectory";
        private static final String RECORD_SPEC_PROPERTY_NAME = "record.spec";
    
        /**
         * Does the transformation as specified in the class javadoc.
         */
        public void transformDGIDXInputFileToUpdateInsteadOfReplace() throws Exception {
            //Find the file in the directory
            Map<String, String> properties = getProperties();
            if (null == properties || !properties.containsKey(DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME)) {
                throw new Exception("Missing required property: " + DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME);
            } else {
                File directory = new File(properties.get(DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME));
                File[] gzipFiles = directory.listFiles(new FilenameFilter() {
                    @Override
                    public boolean accept(File dir, String name) {
                        return name.endsWith(".xml.gz");
    
                    }
                });
                if (gzipFiles == null || gzipFiles.length == 0) {
                    throw new Exception("No .xml.gz file found in " + directory.getAbsolutePath());
                } else {
                    File gzipFile = gzipFiles[0];
                    File unzippedFile = unzipFile(gzipFile);
    
                    transformInputFile(unzippedFile, unzippedFile.getAbsolutePath().replace(".xml", "transformed.xml"));
    
                    //delete the extra files in a way that throws an exception if deletion fails
                    Files.delete(gzipFile.toPath());
                    Files.delete(unzippedFile.toPath());
    
                }
            }
    
    
    
        }
    
        /**
         * Gzips the passed file and saves it at the specified location
         * @param toGzip file to gzip
         * @param outputPath where to output the gzipped file
         *
         */
        private void gzipFile(File toGzip, String outputPath) throws IOException {
            byte[] buffer = new byte[1024];
    
            //try-with-resources closes both streams even if the copy fails;
            //closing the GZIPOutputStream also finishes the gzip stream
            try (FileInputStream inputStream = new FileInputStream(toGzip);
                 GZIPOutputStream gzipOutputStream =
                         new GZIPOutputStream(new FileOutputStream(outputPath, false))) {
                int len;
                while ((len = inputStream.read(buffer)) > 0) {
                    gzipOutputStream.write(buffer, 0, len);
                }
            }
        }
    
        /**
         *
         * @param unzippedFile file representing DGIDX input data to transform
         * @param transformedFilePath path where transformed file should go.
         * @return the transformed file
         */
        private File transformInputFile(File unzippedFile, String transformedFilePath) throws IOException {
            File outputFile = new File(transformedFilePath);
    
            //Since the XML and the transformation isn't very complicated, we'll just write it out line by line as we go through the
            //unzipped file line-by-line
            BufferedReader unzippedFileReader = new BufferedReader(new FileReader(unzippedFile));
            BufferedWriter outputFileWriter = new BufferedWriter(new FileWriter(outputFile));
    
            String nextLine;
            while ((nextLine = unzippedFileReader.readLine()) != null) {
                if (nextLine.contains("RECORD_ADD_OR_REPLACE")) {
                    //If the line contains RECORD_ADD_OR_REPLACE, need to change it to RECORD_UPDATE
                    outputFileWriter.write(nextLine.replace("RECORD_ADD_OR_REPLACE","RECORD_UPDATE"));
                } else if (nextLine.contains("<PROP NAME=")) {
                    //This line contains <PROP NAME="...">. Transform the element
                    //unless the property is the record spec, which must be left as-is.
                    String propertyName = nextLine.split("\"")[1];
                    if (!propertyName.equals(RECORD_SPEC_PROPERTY_NAME)) {
                        //Read the property value from the next line
                        String propertyValueLine = unzippedFileReader.readLine();
                        String propertyValue = propertyValueLine.replace("<PVAL>","").replace("</PVAL>","").trim();
    
                        //Now write the PVAL_DELETE and PVAL_ADD entries
                        outputFileWriter.write("<PVAL_DELETE><PROPERTY_NAME NAME=\"" + propertyName + "\"/></PVAL_DELETE>");
                        outputFileWriter.write("<PVAL_ADD><PROP NAME=\"" + propertyName + "\"><PVAL>" + propertyValue + "</PVAL></PROP></PVAL_ADD>");
    
                        //Discard the closing element line of the input file
                        unzippedFileReader.readLine();
                    } else {
                        //This is the record spec, so write it through unchanged.
                        outputFileWriter.write(nextLine);
                    }
                } else {
                    //Just output the line
                    outputFileWriter.write(nextLine);
                }
            }
            unzippedFileReader.close();
            outputFileWriter.close();
            return outputFile;
        }
    
        /**
         *
         * @param gzipFile file to un-gzip. Will create the un-gzipped version in the same directory as gzipFile,
         *                 but without the ".gz" ending.
         * @return the unzipped version of the file.
         */
        private File unzipFile(File gzipFile) throws IOException {
            File outputFile = new File(gzipFile.getAbsolutePath().replace(".gz",""));
    
            //Un-gzip the file in one pass; try-with-resources closes both streams even on failure
            try (GZIPInputStream gzipInputStream =
                         new GZIPInputStream(new FileInputStream(gzipFile));
                 FileOutputStream outputStream = new FileOutputStream(outputFile)) {
                int len;
                byte[] buffer = new byte[1024];
                while ((len = gzipInputStream.read(buffer)) > 0) {
                    outputStream.write(buffer, 0, len);
                }
            }
    
            return outputFile;
        }
    
    
    }
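
    To make the rewrite easier to follow, here is a minimal standalone sketch of the same line-by-line logic applied to an in-memory string. The class name RewriteDemo and the sample input are hypothetical; in particular, the sample only assumes the layout that the code above relies on (a PROP line, a PVAL line, then a closing line), since the real DGIDX input format is undocumented. Unlike the component above, the sketch ends each output line with a newline so the result is easier to read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class; mirrors the logic of transformInputFile on a string.
class RewriteDemo {

    static String rewrite(String input, String recordSpecName) throws IOException {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("RECORD_ADD_OR_REPLACE")) {
                    // Opening and closing tags both become RECORD_UPDATE
                    out.append(line.replace("RECORD_ADD_OR_REPLACE", "RECORD_UPDATE"));
                } else if (line.contains("<PROP NAME=")) {
                    String name = line.split("\"")[1];
                    if (name.equals(recordSpecName)) {
                        // The record spec identifies the record; leave it untouched
                        out.append(line);
                    } else {
                        // Assumed layout: the value is on the next line, then </PROP>
                        String value = reader.readLine()
                                .replace("<PVAL>", "").replace("</PVAL>", "").trim();
                        out.append("<PVAL_DELETE><PROPERTY_NAME NAME=\"").append(name)
                           .append("\"/></PVAL_DELETE>");
                        out.append("<PVAL_ADD><PROP NAME=\"").append(name)
                           .append("\"><PVAL>").append(value)
                           .append("</PVAL></PROP></PVAL_ADD>");
                        reader.readLine(); // discard the closing </PROP> line
                    }
                } else {
                    out.append(line);
                }
                out.append('\n'); // added for readability only
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample input, modelled on record 23 from the question
        String sample = String.join("\n",
                "<RECORD_ADD_OR_REPLACE>",
                "<PROP NAME=\"record.spec\">",
                "<PVAL>23</PVAL>",
                "</PROP>",
                "<PROP NAME=\"inventoryCount\">",
                "<PVAL>10</PVAL>",
                "</PROP>",
                "</RECORD_ADD_OR_REPLACE>");
        System.out.print(rewrite(sample, "record.spec"));
    }
}
```

    Running it on the sample shows the record spec passing through unchanged while inventoryCount becomes a delete-then-add pair inside a RECORD_UPDATE element, which is what lets the other properties of the existing record survive.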
    

    This gets compiled into a JAR file and placed in config/lib/java.

    Here is the custom component definition in DataIngest.xml:

    <custom-component id="DGIDXTransformer" host-id="ITLHost" class="com.chairbender.DGIDXTransformer">
        <properties>
            <property name="dgidxInputFileDirectory" value="../data/cas_output" />
        </properties>
    </custom-component>
    

    Here is the relevant part of the customized PartialUpdate script:

      CAS.runIncrementalCasCrawl("${lastMileCrawlName}");     
      DGIDXTransformer.transformDGIDXInputFileToUpdateInsteadOfReplace();     
      CAS.archiveDvalIdMappingsForCrawlIfChanged("${lastMileCrawlName}");
    
