【Question title】: How do I read a .txt file into R with different separators and run-on lines?
【Posted】: 2021-03-03 06:31:44
【Question description】:

I have a large .txt file in the following format, holding the date, username, and product review for a large number of users:

    YYYY:MM:D1 @Username1: this is a product review
    YYYY:MM:D1 @Username2: this is also a product review
    YYYY:MM:D1 @Username3: this is also a product review that
    runs to the next line
    YYYY:MM:D1 @Username4: this here is also a product review

I want to extract this into a data frame with 3 columns, like so:

    date/time      username      comment
    yyyy/mm/dd     @Username1    this is a product review   
    yyyy/mm/dd     @Username2    this is also a product review   
    yyyy/mm/dd     @Username3    this is also a product review contained in the same row
    yyyy/mm/dd     @Username4    this here is also a product review

Using the standard base R command

    read.table("filename.txt", fill=TRUE)

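gives me a data frame that treats every word of the product review as a separate column. It also puts the "run-on" part of any review long enough to wrap onto a new line into a row of its own, i.e.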

    V1          V2          V3          V4          V5          
    yy/mm/dd    Username1   this        is          a 
    product     review 
    ...

Any help is appreciated!

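【Comments】:

  • Is the whitespace between columns just a single space character (which would be silly)? If so, you can only import without a separator (i.e., as one column), then strsplit and use regular expressions.
  • The tidyr package also has separate_rows() for exactly the kind of string splitting that @Roland describes.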

Tags: r dataframe text read.table


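【Solution 1】:

You can solve this in a few different ways. One way is to import the data into a single column, and then use tidyr::separate or data.table::tstrsplit to split the column at the appropriate places. Here is an example with tidyr: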

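    # Load tidyr for separate(); it also re-exports the %>% pipe.
    library(tidyr)

    # Use a separator symbol that is unlikely to appear in the file,
    # to read each line into a single column. Turning off quote and comment
    # handling keeps apostrophes and "#" in reviews from breaking the import.
    data <- read.table("filename.txt", sep = "^", quote = "",
                       comment.char = "", stringsAsFactors = FALSE)

    # First split the column at the @-sign, and then at the first ": ".
    # extra = "merge" keeps any later ": " inside the review text.
    data <- data %>%
        separate(V1, into = c("Date", "User"), sep = " @", extra = "merge") %>%
        separate(User, into = c("User", "Review"), sep = ": ", extra = "merge")

    # If you want to add back the @-sign to the usernames:
    data$User <- paste0("@", data$User)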
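
The question also asks about reviews that run onto the next line, which the line-by-line import above leaves as separate rows. One possible way to handle that, sketched here in base R, is to read the raw lines, glue each continuation line onto the record before it, and only then split into columns. The sample `lines` vector, the digit-based date pattern, and the `reviews` name are assumptions made for illustration; they are not taken from the original file.

```r
# Hypothetical sample standing in for: lines <- readLines("filename.txt")
lines <- c(
  "2021:03:01 @Username1: this is a product review",
  "2021:03:01 @Username3: this is also a product review that",
  "runs to the next line",
  "2021:03:02 @Username4: this here is also a product review"
)

# Assumption: a new record starts with a YYYY:MM:DD date followed by " @".
is_start <- grepl("^[0-9]{4}:[0-9]{2}:[0-9]{2} @", lines)

# Number the records, then paste each record's lines together with a space.
record_id <- cumsum(is_start)
records <- unname(vapply(split(trimws(lines), record_id),
                         paste, character(1), collapse = " "))

# Capture date, username and comment; only the first ": " acts as a separator.
pattern <- "^(\\S+) (@[^:]+): (.*)$"
parts <- regmatches(records, regexec(pattern, records))

reviews <- data.frame(
  date     = vapply(parts, `[`, character(1), 2),
  username = vapply(parts, `[`, character(1), 3),
  comment  = vapply(parts, `[`, character(1), 4),
  stringsAsFactors = FALSE
)

# Match the requested yyyy/mm/dd layout.
reviews$date <- gsub(":", "/", reviews$date, fixed = TRUE)
```

Lines that do not match the start pattern are treated as continuations, so a stricter or looser pattern may be needed depending on what the real dates look like.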

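【Comments】:

  • Thanks @MansT! Works beautifully, and a clear explanation.
  • @human12 If the answer was helpful, feel free to accept it by clicking the check mark on the left. Only one answer can be accepted per post. Reference - stackoverflow.com/help/someone-answers
  • I have already upvoted the answer; it just doesn't show because I'm a new user.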