【Question Title】: Using R to process CSV to evaluate if (ColA != ColB) with consideration for ColC
【Posted】: 2015-07-26 11:06:07
【Question】:

I am trying to do a simple string comparison across two columns. Sample of the (mock) data:

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015

The logic I want to apply is:

If From_DeptCode = To_DeptCode 
      then ChangeType="No Change" 
ElseIf From_DeptCode != To_DeptCode AND TransactionType = "Reorg" 
      then ChangeType="Reorg"
Else ChangeType="Transfer"

So my output would look like:

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012,Transfer
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013,No Change
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014,No Change
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011,Reorg
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010,Transfer
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015,No Change

Here is what I have so far:

transfers <- read.csv(file="Transfers.csv", head=TRUE,
    sep=",",colClasses=c(NA,NA,NA,NA,NA,NA,NA,"Date",NA))

At this point, I assume, I would implement my logic:

If From_DeptCode = To_DeptCode 
      then ChangeType="No Change" 
ElseIf From_DeptCode != To_DeptCode AND TransactionType = "Reorg" 
      then ChangeType="Reorg"
Else ChangeType="Transfer"

I assume this is where I would write out my new CSV:

write.csv(transfers, file = "transfersprocessed.csv", row.names = FALSE)

Any suggestions on how to proceed?

Update:

Following @josilber's answer, I ran the following code:

transfers <- read.csv(file="Transfers.csv", head=TRUE, sep=",", colClasses=c(NA,NA,NA,NA,NA,NA,NA,"Date",NA))

dat$ChangeType <- ifelse(dat$From_DeptCode == dat$To_DeptCode, "No Change",ifelse(dat$TransactionType == "Reorg", "Reorg", "Transfer"))

View(transfers)

On the following data:

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015

The ChangeType variable is still "NA".

Is the nested ifelse statement syntax correct? Any idea why ChangeType is not being populated?

【Question Discussion】:

    Tags: r csv string-comparison


    【Solution 1】:

    You can do this with nested ifelse statements:

    dat$ChangeType <- ifelse(dat$From_DeptCode == dat$To_DeptCode, "No Change",
                             ifelse(dat$TransactionType == "Reorg", "Reorg", "Transfer"))
    dat
    #       EMPLID From_DeptCode     FromDept To_DeptCode         To_Dept TransactionTypeCode
    # 1  239583290            21        Sales          43 CustomerService                  10
    # 2 1230495829            21        Sales          21           Sales                  10
    # 3 4059503918            93   Operations          93      Operations                  10
    # 4 3040593021            19 Headquarters          23   International                  11
    # 5 7029406920            15    Marketing          84     Development                  19
    # 6 2039052819            19 Headquarters          19    Headquarters                  10
    #   TransactionType EffectiveDate ChangeType
    # 1       Promotion    12/12/2012   Transfer
    # 2       Promotion      9/1/2013  No Change
    # 3        Demotion    11/18/2014  No Change
    # 4           Reorg    12/13/2011      Reorg
    # 5    Reassignment    01/05/2010   Transfer
    # 6       Promotion     4/15/2015  No Change
    

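    ifelse is passed a vector of TRUE/FALSE values as its first argument; it uses the second argument for the TRUE cases and the third argument for the FALSE cases. For your FALSE case you actually want to run another ifelse, which is why the logic is nested here.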
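To make the element-wise behaviour concrete, here is a minimal sketch on three made-up rows. The vector names from, to and type are hypothetical stand-ins for the From_DeptCode, To_DeptCode and TransactionType columns; they are not part of the original answer.

```r
# Hypothetical stand-ins for three rows of the data frame's columns.
from <- c(21, 21, 19)
to   <- c(43, 21, 23)
type <- c("Promotion", "Promotion", "Reorg")

# The comparison is evaluated for every element at once.
same_dept <- from == to   # FALSE, TRUE, FALSE

# Outer ifelse handles the "same department" case; the inner ifelse
# is only used for the elements where same_dept is FALSE.
change <- ifelse(same_dept, "No Change",
                 ifelse(type == "Reorg", "Reorg", "Transfer"))
change   # Transfer, No Change, Reorg
```

The inner ifelse is still evaluated for every element, but its value is only kept where the outer test is FALSE.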

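    Note that for large data frames this will be much faster than looping through your data and evaluating the nested if statements one row at a time.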
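The code in the question's update reads the file into transfers but assigns the result to dat$ChangeType, so transfers itself is never modified and View(transfers) would keep showing NA. The sketch below applies the same nested ifelse to transfers end to end. It makes a few assumptions that are not in the original posts: the sample rows are passed through read.csv(text = ...) so the example is self-contained (with a real file, file = "Transfers.csv" would be used instead), EMPLID is read as character so the leading zero is kept, and the output goes to a temporary file rather than transfersprocessed.csv.

```r
# Sample rows copied from the question; the ChangeType column has a header
# but no values, so read.csv fills it with NA.
csv_text <- "EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015"

# Assumption: EMPLID is kept as character to preserve leading zeros.
transfers <- read.csv(text = csv_text, header = TRUE,
                      stringsAsFactors = FALSE,
                      colClasses = c(EMPLID = "character"))

# Assign to the same object that was read in (transfers, not dat).
transfers$ChangeType <- ifelse(
  transfers$From_DeptCode == transfers$To_DeptCode, "No Change",
  ifelse(transfers$TransactionType == "Reorg", "Reorg", "Transfer"))

# Assumption: a temporary file is used here; the question writes to
# "transfersprocessed.csv".
out_file <- tempfile(fileext = ".csv")
write.csv(transfers, file = out_file, row.names = FALSE)
```

After this runs, transfers$ChangeType holds Transfer, No Change, No Change, Reorg, Transfer, No Change, matching the expected output in the question.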

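    【Discussion】:

    • Nice solution. It is great to see how R vectorizes this automatically. In this case there is no need to tell R which row to look at. Awesome!
    • What if, instead of printing "No Change" when dat$From_DeptCode == dat$To_DeptCode, I wanted it to skip that row or not print it at all? Would I still use this format?
    • @user1694958 That could be done by first generating the ChangeType variable and then subsetting based on its value (e.g. dat2 <- subset(dat, ChangeType != "No Change")).
    • @user1694958 Your code did not show up in the comment. In any case, it sounds like you are now asking a new question (how to subset a dataset based on some condition). On Stack Overflow it is best to ask that as a separate question rather than attaching it to an existing one.
    • @josilber, your suggestion of nested ifelse statements did not work. ChangeType is still NA. How can I post my code and exact data here without it being recorded as an answer? The comments have a character limit.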
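The subsetting suggested in the comments above can be sketched as follows. The three-row dat below is a hypothetical stand-in that reuses the question's column names; only the subset() call itself comes from the thread.

```r
# Hypothetical three-row stand-in for dat, using the question's column names.
dat <- data.frame(
  From_DeptCode   = c(21, 21, 19),
  To_DeptCode     = c(43, 21, 23),
  TransactionType = c("Promotion", "Promotion", "Reorg"),
  stringsAsFactors = FALSE
)

# Step 1: generate ChangeType with the nested ifelse from the answer.
dat$ChangeType <- ifelse(dat$From_DeptCode == dat$To_DeptCode, "No Change",
                         ifelse(dat$TransactionType == "Reorg", "Reorg", "Transfer"))

# Step 2: keep only the rows whose department actually changed.
dat2 <- subset(dat, ChangeType != "No Change")
dat2$ChangeType   # Transfer, Reorg
```

Here dat2 keeps the first and third rows and drops the row where both department codes are 21.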