【Question title】: Import data from .txt file to R
【Posted】: 2019-09-04 08:18:51
【Question】:

I am new to R and am trying to import a .txt file into R; see the example below. I would be very grateful for a solution for getting this data in.

Data in the .txt file:

user_14:beneficiary_649,beneficiary_1312,beneficiary_1983,beneficiary_726,beneficiary_759,beneficiary_229,beneficiary_673,
user_13:beneficiary_1928,beneficiary_553,beneficiary_483,beneficiary_1127,beneficiary_2887
user_11:beneficiary_2158,beneficiary_871,beneficiary_1969,beneficiary_1120,beneficiary_185,beneficiary_2180

Expected output in R:

user_14                 user_13                  user_11
beneficiary_649     beneficiary_1928         beneficiary_2158
beneficiary_1312    beneficiary_553          beneficiary_871
beneficiary_1983    beneficiary_483          beneficiary_1969
beneficiary_726     beneficiary_1127         beneficiary_1120
beneficiary_759     beneficiary_2887         beneficiary_185
beneficiary_229                              beneficiary_2180
beneficiary_673

【Comments】:

    Tags: r


    【Solution 1】:

    One possible solution:

    Read the data with readLines, then use tidyverse functions to put everything in its place.

    input <- readLines("text_file.txt") # read the data from the text file
    
    df <- data.frame(input = input, stringsAsFactors = FALSE) # store it in a data.frame
    
    library(tidyr)
    library(dplyr)
    
    df %>% 
      separate(input, into = c("users", "data"), sep = ":") %>%  # split users and rest
      separate_rows(data, sep = ",") %>%  # build rows from data
      group_by(users) %>% # group by needed for creating row numbers per user
      mutate(rowid = row_number()) %>% # add row numbers
      spread(users, data, fill = "") # put data under the users, empty values as "" instead of NA
    
    # A tibble: 7 x 4
      rowid user_11          user_13          user_14         
      <int> <chr>            <chr>            <chr>           
    1     1 beneficiary_2158 beneficiary_1928 beneficiary_649 
    2     2 beneficiary_871  beneficiary_553  beneficiary_1312
    3     3 beneficiary_1969 beneficiary_483  beneficiary_1983
    4     4 beneficiary_1120 beneficiary_1127 beneficiary_726 
    5     5 beneficiary_185  beneficiary_2887 beneficiary_759 
    6     6 beneficiary_2180 ""               beneficiary_229 
    7     7 ""               ""               beneficiary_673 
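
In current tidyr releases spread() is superseded by pivot_wider(). Below is a minimal sketch of the same pipeline using pivot_wider(). It assumes tidyr 1.1.0 or later (for the scalar values_fill) and uses a small inline sample rather than text_file.txt. The filter() step is an addition, not part of the answer above: it drops the empty field that a trailing comma can leave behind.

```r
library(tidyr)
library(dplyr)

# Small inline sample (assumption: same "user:item,item,..." layout as the question)
input <- c(
  "user_14:beneficiary_649,beneficiary_1312,beneficiary_1983,",
  "user_13:beneficiary_1928,beneficiary_553",
  "user_11:beneficiary_2158"
)

wide <- data.frame(input = input, stringsAsFactors = FALSE) %>%
  separate(input, into = c("users", "data"), sep = ":") %>%  # split user from the rest
  separate_rows(data, sep = ",") %>%                         # one row per beneficiary
  filter(data != "") %>%                                     # drop empty field from a trailing comma
  group_by(users) %>%
  mutate(rowid = row_number()) %>%                           # position within each user
  ungroup() %>%
  pivot_wider(names_from = users, values_from = data, values_fill = "")

wide
```

Unlike spread(), pivot_wider() keeps the user columns in order of first appearance, which matches the column order requested in the question.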
    

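    【Comments】:

      【Solution 2】:

      In base R, you can use strsplit to split on : and ,, then find the length of the longest character vector and pad all the others with NA; that same step also transposes the vectors, giving one column per user.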

      tt <- readLines(con=textConnection("user_14:beneficiary_649,beneficiary_1312,beneficiary_1983,beneficiary_726,beneficiary_759,beneficiary_229,beneficiary_673,
      user_13:beneficiary_1928,beneficiary_553,beneficiary_483,beneficiary_1127,beneficiary_2887
      user_11:beneficiary_2158,beneficiary_871,beneficiary_1969,beneficiary_1120,beneficiary_185,beneficiary_2180"))
      
      tt <- strsplit(tt, ":|,")  #Split on : or ,
      ttn <- max(sapply(tt, length))  #Get longest vector
      tt <- sapply(tt, function(x) x[seq_len(ttn)]) #Fill up with NA and give per col
      colnames(tt)  <- tt[1,] #Set colnames from first line
      tt <- tt[-1,]  #Remove first line
      tt
      #     user_14            user_13            user_11           
      #[1,] "beneficiary_649"  "beneficiary_1928" "beneficiary_2158"
      #[2,] "beneficiary_1312" "beneficiary_553"  "beneficiary_871" 
      #[3,] "beneficiary_1983" "beneficiary_483"  "beneficiary_1969"
      #[4,] "beneficiary_726"  "beneficiary_1127" "beneficiary_1120"
      #[5,] "beneficiary_759"  "beneficiary_2887" "beneficiary_185" 
      #[6,] "beneficiary_229"  NA                 "beneficiary_2180"
      #[7,] "beneficiary_673"  NA                 NA                
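
The same idea can be wrapped in a small function that returns a data frame. This is only a sketch: read_user_table is a hypothetical helper name, and the trimming of whitespace and dropping of empty fields are additions that assume stray spaces and trailing commas carry no meaning.

```r
# Hypothetical helper (not from the answer above): split each line on ":" or ",",
# pad every vector with NA to the longest length, and use the first element
# of each line (the user) as the column name.
read_user_table <- function(lines) {
  parts <- strsplit(lines, ":|,")
  parts <- lapply(parts, trimws)                    # remove stray spaces around fields
  parts <- lapply(parts, function(x) x[nzchar(x)])  # drop empty fields
  n <- max(lengths(parts))
  m <- do.call(cbind, lapply(parts, function(x) x[seq_len(n)]))  # indexing past the end gives NA
  out <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(out) <- m[1, ]
  out
}

res <- read_user_table(c(
  "user_14:beneficiary_649,beneficiary_1312,beneficiary_1983,",
  "user_13:beneficiary_1928"
))
res
```

For a file on disk, the call would be read_user_table(readLines("text_file.txt")), using the file name from Solution 1.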
      

      【Comments】:

        【Solution 3】:

        You can use strsplit() to get a list:

        DF <- read.table(stringsAsFactors = FALSE, sep=':', text=
        "user_14:beneficiary_649,beneficiary_1312,beneficiary_1983,beneficiary_726,beneficiary_759,beneficiary_229,beneficiary_673,
        user_13:beneficiary_1928,beneficiary_553,beneficiary_483,beneficiary_1127,beneficiary_2887
        user_11:beneficiary_2158,beneficiary_871,beneficiary_1969,beneficiary_1120,beneficiary_185,beneficiary_2180")
        L <- strsplit(DF$V2, ',')
        names(L) <- DF$V1
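
If the rectangular layout from the question is needed rather than a list, the list can be padded to a common length and converted. This is a minimal sketch with a small made-up list in place of L; it goes one step beyond the answer above.

```r
# Small stand-in for the named list L built above (values are illustrative)
L <- list(
  user_14 = c("beneficiary_649", "beneficiary_1312", "beneficiary_1983"),
  user_13 = c("beneficiary_1928", "beneficiary_553"),
  user_11 = "beneficiary_2158"
)

n <- max(lengths(L))
# Extending a vector's length pads it with NA
padded <- lapply(L, function(x) { length(x) <- n; x })
wide <- as.data.frame(padded, stringsAsFactors = FALSE)
wide
```

Keeping the data as a list is often more convenient when users have very different numbers of beneficiaries; the data frame form mainly helps for display or export.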
        

        【Comments】:

          【Solution 4】:

          All of the answers above are correct, but here is a simpler alternative I found using base R:

          # Just read in your data as a comma separated data frame
          df <- read.table("link to your file", header = FALSE, sep = ',', fill = TRUE, stringsAsFactors = FALSE)  # keep strings as character so strsplit() works
          # The first column will contain both the user and the first "beneficiary" as they are separated by ":", so you need to split its values by ":"
          k <- sapply(df[,1], function(x){strsplit(x, split = ":")[[1]]})
          # Add the corrected first column to your data frame and transpose the data frame to have one column per user
          df <- t(cbind(k[2,], df[,2:ncol(df)]))
          # Provide the "user" as colnames
          colnames(df) <- k[1,]
          # I noticed that some lines in your text have a comma at the end, which introduces NAs. To remove them: 
          df[is.na(df)] <- ""
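
One caveat with fill = TRUE: read.table decides the number of columns from the first five lines only, so a longer line further down can be wrapped onto extra rows. A possible workaround, sketched here with a temporary file and made-up values, is to count the fields first with count.fields() and pass that many col.names.

```r
# Illustrative data: the longest line is the sixth one
sample_lines <- c(
  paste0("user_", 1:5, ":b_", 1:5, ",b_", 11:15),
  "user_6:b_6,b_16,b_26,b_36"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Number of comma-separated fields on the longest line
n_cols <- max(count.fields(path, sep = ","))

# Supplying col.names of that length makes read.table allocate enough columns
df <- read.table(path, header = FALSE, sep = ",", fill = TRUE,
                 col.names = paste0("V", seq_len(n_cols)),
                 stringsAsFactors = FALSE)
dim(df)
```

The rest of the answer (splitting the first column on ":" and transposing) can then be applied to df unchanged.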
          

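          【Comments】:

            【Solution 5】:

            Thank you all for your attempts and suggestions.

            I solved the problem and got exactly the data table I expected. I am sharing the solution below for review, and in case anyone has suggestions for improvement:

            Here is all the data in users.txt: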

            user_14:beneficiary_649,beneficiary_1312,beneficiary_1983,beneficiary_726,beneficiary_759,beneficiary_229,beneficiary_673,beneficiary_2322,beneficiary_2598,beneficiary_1705,beneficiary_2743,beneficiary_220,beneficiary_977,beneficiary_1098,beneficiary_2891,beneficiary_1253,beneficiary_2065,beneficiary_1492,beneficiary_268,beneficiary_1991,beneficiary_684,beneficiary_1493,beneficiary_2294,beneficiary_73,beneficiary_1524,beneficiary_2349,beneficiary_2978,beneficiary_2575,beneficiary_2506,beneficiary_3051,beneficiary_612,beneficiary_617,beneficiary_1748,beneficiary_3031,beneficiary_2431,beneficiary_948,beneficiary_46,beneficiary_469,beneficiary_2047,beneficiary_1461,beneficiary_2549,beneficiary_2539,beneficiary_412,beneficiary_1615,beneficiary_2842,beneficiary_2228,beneficiary_2634,beneficiary_2534,beneficiary_358,beneficiary_1475,beneficiary_146,beneficiary_1971,beneficiary_1411,beneficiary_2395,beneficiary_1047,beneficiary_2062,beneficiary_2373,beneficiary_2328,beneficiary_1669,beneficiary_2986,beneficiary_1040,beneficiary_248,beneficiary_1816,beneficiary_1465,beneficiary_133,beneficiary_2401,beneficiary_2626,beneficiary_1819,beneficiary_2864,beneficiary_1008,beneficiary_1101,beneficiary_2529,beneficiary_1487,beneficiary_787,beneficiary_2595,beneficiary_2947,beneficiary_2808,beneficiary_547,beneficiary_2113,beneficiary_825,beneficiary_396,beneficiary_2321,beneficiary_2512,beneficiary_72,
            beneficiary_90,beneficiary_957,beneficiary_1799,beneficiary_2787,beneficiary_277,beneficiary_2472,beneficiary_194,beneficiary_2521,beneficiary_760,beneficiary_558,beneficiary_2404,beneficiary_763,beneficiary_2466,beneficiary_1881,beneficiary_2483,beneficiary_107,beneficiary_1392,beneficiary_2558,beneficiary_557,beneficiary_1923,beneficiary_322,beneficiary_310,beneficiary_1655,beneficiary_226,beneficiary_527,beneficiary_2542,beneficiary_1372,beneficiary_142,beneficiary_1055,beneficiary_378,beneficiary_296,beneficiary_733,beneficiary_1755,beneficiary_1932,beneficiary_1989,beneficiary_1379,beneficiary_2199,beneficiary_1288,beneficiary_2877,beneficiary_1045,beneficiary_2613,beneficiary_2455,beneficiary_2503,beneficiary_706,beneficiary_1562,beneficiary_1446,beneficiary_247,beneficiary_1020,beneficiary_1250,beneficiary_777,beneficiary_2645,beneficiary_1850,beneficiary_2724,beneficiary_2192,beneficiary_715,beneficiary_1321,beneficiary_201,beneficiary_961,beneficiary_2802,beneficiary_414,beneficiary_1997,beneficiary_2760,beneficiary_82,beneficiary_2746,beneficiary_918,beneficiary_2386,beneficiary_729,beneficiary_3057,beneficiary_491,beneficiary_1190,beneficiary_1561,beneficiary_2744,beneficiary_923,beneficiary_1815,beneficiary_240,beneficiary_2016,beneficiary_2479,beneficiary_1692,beneficiary_1630,beneficiary_2899,beneficiary_965,beneficiary_2675,beneficiary_34,beneficiary_2226,beneficiary_550,beneficiary_1795,beneficiary_981,beneficiary_1934,beneficiary_2579,beneficiary_3012,beneficiary_2366,beneficiary_1684,beneficiary_2107,beneficiary_1249,beneficiary_2574,beneficiary_1447,beneficiary_1052,beneficiary_219,beneficiary_357,beneficiary_2324,beneficiary_2791,beneficiary_2528,beneficiary_1066,beneficiary_2984,beneficiary_2559,beneficiary_767,beneficiary_1031,beneficiary_271,beneficiary_2278,beneficiary_15,beneficiary_463,beneficiary_917,beneficiary_1839,beneficiary_1048,beneficiary_2435,beneficiary_2441,beneficiary_1272,beneficiary_2056,beneficiary_993,beneficiary_371,beneficiary_2582,beneficiary_1476
            user_13:beneficiary_1928,beneficiary_553,beneficiary_483,beneficiary_1127,beneficiary_2887,beneficiary_2184,beneficiary_1694,beneficiary_2276,beneficiary_1961,beneficiary_2994,beneficiary_781,beneficiary_1264,beneficiary_2001,beneficiary_1657,beneficiary_1065,beneficiary_636,beneficiary_1892,beneficiary_1091,beneficiary_2237,beneficiary_205,beneficiary_1699,beneficiary_2023,beneficiary_2767,beneficiary_104,beneficiary_157,beneficiary_1199,beneficiary_493,beneficiary_375,beneficiary_2614,beneficiary_1856,beneficiary_1177,beneficiary_3024,beneficiary_1185,beneficiary_1205,beneficiary_773,beneficiary_1508,beneficiary_2379,beneficiary_433,beneficiary_1801,beneficiary_33,beneficiary_510,beneficiary_2552,beneficiary_575,beneficiary_2492,beneficiary_2839,beneficiary_1033,beneficiary_1396,beneficiary_2281,beneficiary_41,beneficiary_677,beneficiary_2862,beneficiary_652,beneficiary_1582,beneficiary_2422,beneficiary_1599,beneficiary_2844,beneficiary_466,beneficiary_2639,beneficiary_984,beneficiary_407,beneficiary_1097,beneficiary_594,beneficiary_2073,beneficiary_2773,beneficiary_1504,beneficiary_3064,beneficiary_816,beneficiary_577,beneficiary_804,beneficiary_2148,beneficiary_949,beneficiary_2520,beneficiary_443,beneficiary_2453,beneficiary_408,beneficiary_554,beneficiary_754,beneficiary_2960,beneficiary_2344,beneficiary_1497,beneficiary_184,beneficiary_255,beneficiary_542,beneficiary_2004,beneficiary_692,beneficiary_89,beneficiary_1385,beneficiary_1814,beneficiary_2621,beneficiary_670,beneficiary_2022,beneficiary_24,beneficiary_2820,beneficiary_2958,beneficiary_1708,beneficiary_685,beneficiary_1552,beneficiary_420,beneficiary_2168,beneficiary_2209,beneficiary_2189,beneficiary_1474,beneficiary_2253,beneficiary_1159,beneficiary_2210,beneficiary_2537,beneficiary_177,beneficiary_1355,beneficiary_2092,beneficiary_2231,beneficiary_613,beneficiary_2227,beneficiary_520,beneficiary_2139,beneficiary_2742,beneficiary_720,beneficiary_770,beneficiary_1247,beneficiary_717
            user_11:beneficiary_2158,beneficiary_871,beneficiary_1969,beneficiary_1120,beneficiary_185,beneficiary_2180,beneficiary_2120,beneficiary_1832,beneficiary_1470,beneficiary_2689,beneficiary_1679,beneficiary_769,beneficiary_2380,beneficiary_2999,beneficiary_1113,beneficiary_2932,beneficiary_1763,beneficiary_391,beneficiary_2381,beneficiary_650,beneficiary_419,beneficiary_1998,beneficiary_775,beneficiary_2590,beneficiary_2593,beneficiary_2042,beneficiary_2102,beneficiary_1765,beneficiary_1201,beneficiary_332,beneficiary_26,beneficiary_1273,beneficiary_799,beneficiary_79,beneficiary_2099,beneficiary_622,beneficiary_394,beneficiary_2830,beneficiary_934,beneficiary_1170,beneficiary_2297,beneficiary_3009,beneficiary_1278,beneficiary_1573,beneficiary_315,beneficiary_1610,beneficiary_1875,beneficiary_1899,beneficiary_88,beneficiary_560,beneficiary_508,beneficiary_1674,beneficiary_1490,beneficiary_1824,beneficiary_751,beneficiary_2122,beneficiary_936,beneficiary_132,beneficiary_2756,beneficiary_2246,beneficiary_561,beneficiary_2063,beneficiary_2600,beneficiary_2875,beneficiary_2333,beneficiary_3003,beneficiary_381,beneficiary_1528,beneficiary_1733,beneficiary_1316,beneficiary_573,beneficiary_2312,beneficiary_991,beneficiary_202,beneficiary_1858,beneficiary_17,beneficiary_2130,beneficiary_571,beneficiary_1631,beneficiary_2720,beneficiary_2132,beneficiary_1526,beneficiary_232,beneficiary_2444,beneficiary_1721,beneficiary_537,beneficiary_2408,beneficiary_1918,beneficiary_946,beneficiary_300,beneficiary_2049,beneficiary_768,beneficiary_1854,beneficiary_2028,beneficiary_319,beneficiary_1433,beneficiary_343,beneficiary_2897,beneficiary_61,beneficiary_1803,beneficiary_2400,beneficiary_2758,beneficiary_910,beneficiary_7,beneficiary_172,beneficiary_1503,beneficiary_453,beneficiary_69,beneficiary_823,beneficiary_986,beneficiary_2123,beneficiary_802
            user_27:beneficiary_1003,beneficiary_1919,beneficiary_2304,beneficiary_2597,beneficiary_2242,beneficiary_2818,beneficiary_580,beneficiary_305,beneficiary_651,beneficiary_260,beneficiary_2071,beneficiary_1703,beneficiary_3052,beneficiary_2588,beneficiary_2860,beneficiary_2943,beneficiary_1293,beneficiary_2066,beneficiary_2191,beneficiary_1135,beneficiary_2084,beneficiary_994,beneficiary_2658,beneficiary_628,beneficiary_2313,beneficiary_2355,beneficiary_2730,beneficiary_1634,beneficiary_2159,beneficiary_974,beneficiary_3016,beneficiary_678,beneficiary_2665,beneficiary_1325,beneficiary_1598,beneficiary_1985,beneficiary_416,beneficiary_274,beneficiary_369,beneficiary_1802,beneficiary_3054,beneficiary_2648,beneficiary_663,beneficiary_960,beneficiary_2190,beneficiary_476,beneficiary_405,beneficiary_1256,beneficiary_85,beneficiary_1782,beneficiary_2949,beneficiary_947,beneficiary_1384,beneficiary_401,beneficiary_1026,beneficiary_2208,beneficiary_1304,beneficiary_1455,beneficiary_2198,beneficiary_2556,beneficiary_1871,beneficiary_449,beneficiary_1566,beneficiary_52,beneficiary_811,beneficiary_1859,beneficiary_559,beneficiary_1798,beneficiary_1067,beneficiary_494,beneficiary_2908,beneficiary_16,beneficiary_1940,beneficiary_94,beneficiary_2375,beneficiary_842,beneficiary_1976,beneficiary_1424,beneficiary_2221,beneficiary_1794,beneficiary_2982,beneficiary_2640,beneficiary_353,beneficiary_1565,beneficiary_195,beneficiary_1017,beneficiary_1458,beneficiary_1004,beneficiary_820,beneficiary_1187,beneficiary_1716,beneficiary_91,beneficiary_2478,beneficiary_1596,beneficiary_632,beneficiary_2382,beneficiary_1847,beneficiary_1847

            #Get data file from .txt
            usersDbase <- read.csv("C:/Users/users.txt", header=FALSE)
            
            #naming column Automatically with a prefix and auto number
            colnames(usersDbase) <- paste0("Ben", 1:ncol(usersDbase))
            
            #Create AutoID as per row number
            id <- rownames(usersDbase)
            dwithID <- cbind(id=id, usersDbase)
            
            
            #Split one column into two columns
            df1 <- setNames(data.frame(do.call("rbind",strsplit(gsub("\\(|\\)|,","",dwithID$Ben1),split=" "))),c("User","Ben1"))
            
            #pick specific column
            library(dplyr)
            dfc1<-select(df1,'User')
            
            #remove Special Character
            
            library(dplyr)
            dfc11<-dfc1 %>%
              mutate_all(~ gsub("[[:punct:]]", "", .))  # lambda form; funs() is deprecated in dplyr
            
            #Create AutoID as per rownumber
            id <- rownames(dfc11)
            dwithIDS <- cbind(id=id, dfc11)
            
            
            #pick specific column
            library(dplyr)
            dfben1<-select(df1,'Ben1')
            
            
            #Create AutoID as per rownumber
            id <- rownames(dfben1)
            dbenwithIDS <- cbind(id=id, dfben1)
            
            # Merge
            mergeUb1<-merge(dwithIDS, dbenwithIDS, by = "id")  # join on the shared id column
            
            #pick all except column 1
            
            library(dplyr)
            dwithIDAll<-select(dwithID,-"Ben1")
            
            
            #merge 
            
            mergeall<-merge(mergeUb1, dwithIDAll, by = "id")  # join on the shared id column
            
            #pick all except column 1
            
            library(dplyr)
            UsersClean<-select(mergeall,-"id") 
            
            # Transpose data where first column will be taken as header
            
            Finaldf = setNames(data.frame(t(UsersClean[,-1])), UsersClean[,1])
            
            # Remove rowname
            rownames(Finaldf) <- c()
            
            #Remove all temporary data frame and values
            
            rm(dbenwithIDS, df1, dfben1,dfc1,dfc11,dwithID,dwithIDAll,dwithIDS,mergeall,mergeUb1,UsersClean,usersDbase, "id")
            

            【Comments】:
