Revolution/rxMerge and duplication of rows

【Question title】: Revolution/rxMerge and duplication of rows
【Posted】: 2015-11-18 23:35:35
【Question】:

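I am trying to merge two xdf files. Both are subsets of a long table in which IDs repeat, one row per value of a variable.

Let's say I have two columns: id and type

I subset the original xdf table on type = 'type1' to get the first xdf file, and on type = 'type2' to get the second xdf file.

The first xdf file looks like this (there are many different IDs, but the example below shows only one):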

id type1
__ ____
1    5

The second xdf file looks like this (again, many different IDs, but the example below shows only one):

id type2
__ ____
1    3

Then I merge the two xdf files into another xdf file:

rxMerge(file1, file2, outFile = final, autoSort = FALSE, matchVars = 'id', type = 'full', overwrite = TRUE)

I get two records for id = 1:

id type1 type2
__ ____ ______
1    5    NA

1    NA    3

I was expecting:

id type1 type2
__ ____ ______
1    5    3

What am I doing wrong?

【Comments】:

Tags: r revolution-r

【Solution 1】:

Hmm... in RRE 7.4.1, the example you gave works for me:

# Example data
x <- data.frame(id = 1, type1 = 5)
y <- data.frame(id = 1, type2 = 3)

# Creating XDFs for the example data
file1 <- tempfile(fileext = ".xdf")
rxImport(inData = x, outFile = file1)

file2 <- tempfile(fileext = ".xdf")
rxImport(inData = y, outFile = file2)

# Merging into a third XDF
final <- tempfile(fileext = ".xdf")

rxMerge(inData1 = file1, 
        inData2 = file2, 
        outFile = final, 
        autoSort = FALSE, 
        matchVars = 'id',
        type = 'full',
        overwrite = TRUE)

# Check the output
rxDataStep(final)

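So it's hard to tell what is going on. What happens when you set autoSort = TRUE? And which version of RRE are you running? (You can get the version number by loading RevoScaleR and running sessionInfo().)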
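One thing that may be worth checking while you try autoSort = TRUE: as far as I understand the documentation, with autoSort = FALSE rxMerge assumes both inputs are already sorted by the matchVars, so unsorted subsets could plausibly end up as unmatched rows. To see what the full join should look like, here is a minimal base R sketch on a small in-memory sample. It does not use RevoScaleR at all, and the multi-ID values below are made up for illustration (only id = 1 with type1 = 5 and type2 = 3 comes from your question).

```r
# Hypothetical sample data: id = 1 matches the question, the other rows are invented.
# The rows are deliberately NOT sorted by id.
x <- data.frame(id = c(3, 1, 2), type1 = c(7, 5, 6))
y <- data.frame(id = c(1, 2, 4), type2 = c(3, 8, 9))

# Full outer join on id. Base merge() does its own matching and, with the
# default sort = TRUE, orders the result by the key, so input order does not matter.
expected <- merge(x, y, by = "id", all = TRUE)
print(expected)

# Each id should appear exactly once in a correct full join of these inputs.
stopifnot(anyDuplicated(expected$id) == 0)
```

If a sample pulled from your two xdf files gives one row per id here but rxMerge still gives two, that would point towards the sort order of the inputs rather than the data itself.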

【Comments】: