【Question title】: Split a dataframe into multiple to run a function that only takes two-column dataframes
【Posted】: 2022-12-01 19:55:07
【Question】:

I want to perform a column-wise operation in R on column pairs. The function I actually want to use is not the one shown here, because it would complicate this example.

I have a dataframe:

df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7)
                 ,p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))

and a vector of the same length as the df:

tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

I want to run a function that compares each column of df to the tocompare object. The steps I need to take are:

  1. Make a two-element list. The first element is a two-column dataframe x, in which the first column comes from df and the second column is the tocompare object. The second element is a number, N, which is constant for all iterations of this process: it is the number of rows in df (equal to the length of tocompare), which is 10 in this example. (My actual function needs N to work; I appreciate that it is not needed in this example.)
    library(dplyr)  # provides %>% and select()
    data1 <- list(x = cbind(df %>% select(1), tocompare), N = length(tocompare))
    
    # select(1) is used rather than df[,1] so that the column header is kept
    
  2. Compare the two columns of the first element (called x) of the data1 list. The function that I use in real life is not cor; this simplified example just captures the problem. I wrote my_function in such a way that it needs the data1 object created above.
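    my_function <- function(data1){
      x <- data1[[1]]                # the two-column dataframe
      cr <- cor(x[,1], x[,2])        # compare the two columns
      header <- colnames(x)[1]       # name of the df column being compared
      print(c(header, cr))           # print() also returns the vector invisibly
    }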
    
    cr_df1 <- my_function(data1)
    

    I can do the same for the second df column:

    data2 <- list(x = cbind(df %>% select(2), tocompare), N = length(tocompare))
    cr_df2 <- my_function(data2)
    

    And make a dataframe of final results:

    final_df <- rbind(cr_df1, cr_df2) %>% 
    `rownames<-`(NULL) %>% 
    `colnames<-`(c("p", "R")) %>% 
    as.data.frame()
    

    the output will look like this:

    > final_df 
       p         R
    1 p1 0.7261224
    2 p2 0.6233169
    

    I would like to do this on a dataframe with thousands of columns. The bit I don't know is how to split the single dataframe into multiple two-column dataframes and then run my_function on these many small dataframes to return a single output. I think I could do it with a loop and by transposing the df, but maybe there is a better way (I feel I should try to use map here)?

【Question comments】:

    Tags: r dataframe loops dplyr purrr


    【Solution 1】:

    I think you are overcomplicating things. You can simply do:

    sapply(df, function(i) cor(i, tocompare))
    
    #       p1        p2 
    #0.7261224 0.6233169
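
If the two-column layout from the question (columns p and R) is wanted, the named vector can be reshaped directly. This is a minimal base R sketch, assuming cor is the comparison being made; vapply is swapped in here for sapply only because it checks that each result is a single number.

```r
# Example data from the question
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# One correlation per column; numeric(1) declares the expected result type
res <- vapply(df, function(i) cor(i, tocompare), numeric(1))

# Column names become the p column, the correlations become the R column
final_df <- data.frame(p = names(res), R = unname(res))
final_df
```

Unlike the rbind approach in the question, R stays numeric here rather than being coerced to character.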
    

    【Comments】:

    • Hi Sotos, thanks for your reply. Yes, I could use apply with cor. In fact, I would not even need apply. But I am not actually using cor (as explained in the text); I am using a more complicated function that takes two-column dataframes, and I haven't come up with a way to use apply with it.
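
For a function that really does need the two-element list, one possible approach is to iterate over the column names, build the list for each column, and bind the results. The sketch below is written under assumptions: make_input is a hypothetical helper, and my_function is a stand-in that returns a one-row data frame instead of printing, so that the rows can be bound without coercing R to character. It uses base R only; purrr::map could take the place of lapply in the same way.

```r
# Example data from the question
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# Stand-in for the real function (assumption): takes
# list(x = two-column data frame, N = number) and returns a one-row data frame
my_function <- function(data1) {
  x <- data1[[1]]
  data.frame(p = colnames(x)[1], R = cor(x[, 1], x[, 2]))
}

# Hypothetical helper: build the two-element list for one column name.
# df[col] (single bracket, character index) keeps the column header.
make_input <- function(col) {
  list(x = cbind(df[col], tocompare), N = length(tocompare))
}

# One input list per column, one result per input, then bind the rows
results <- lapply(names(df), function(col) my_function(make_input(col)))
final_df <- do.call(rbind, results)
final_df
```

No transposing or manual splitting is needed, because each two-column data frame is created on the fly inside the loop over column names.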