[Posted]: 2022-12-01 19:55:07
[Question]:
I want to perform a column-wise operation in R on column pairs. The function I actually want to use is not the one shown here, because it would complicate this example.
I have a dataframe:
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7)
,p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
and a vector of the same length as the df:
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)
I want to run a function that compares each column of df to the tocompare object. The steps I need to take are:
- Make a two-element list. The first element is a two-column dataframe x, in which the first column comes from df and the second column is the tocompare object. The second element is a number. (This is needed for my actual function to work; I appreciate that it is not needed in this example.) This number is constant for all iterations of this process (it is the number of rows in df, i.e. the length of tocompare); in this example it is 10.
library(dplyr) # for %>% and select()
data1 <- list(x = cbind(df %>% select(1), tocompare), N = length(tocompare)) # select(1) rather than df[,1], so the column header is kept
- Compare the two columns of the first element (called x) of the data1 list. The function that I use in real life is not cor; this simplified example captures the problem. I wrote my_function in such a way that it needs the data1 object created above.
my_function <- function(data1){
  x <- data1[[1]]
  cr <- cor(x[,1], x[,2])
  header <- colnames(x)[1]
  print(c(header, cr))
}
cr_df1 <- my_function(data1)
I can do the same for the second df column:
data2 <- list(x = cbind(df %>% select(2), tocompare), N = length(tocompare))
cr_df2 <- my_function(data2)
And make a dataframe of the final results:
final_df <- rbind(cr_df1, cr_df2) %>% `rownames<-`(NULL) %>% `colnames<-`(c("p", "R")) %>% as.data.frame()
The output will look like this:
> final_df
   p         R
1 p1 0.7261224
2 p2 0.6233169
I would like to do this on a dataframe with thousands of columns. The bit I don't know is how to split the single dataframe into multiple two-column dataframes and then run my_function on these many small dataframes to return a single output. I think I would be able to do it with a loop and with transposing the df, but maybe there is a better way (I feel I should try to use map here)?
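One possible shape for this, as a minimal base R sketch rather than a definitive answer: build the two-element list once per column name and bind the per-column results together. Two things here are assumptions and not part of the original post: make_input is a hypothetical helper name, and my_function is changed to return a one-row data frame instead of printing a character vector, so R stays numeric.

```r
df <- data.frame(p1 = c(-5, -4, 2, 0, -2, 1, 3, 4, 2, 7),
                 p2 = c(0, 1, 2, 0, -2, 1, 3, 3, 2, 0))
tocompare <- c(0, 0, 2, 0, 2, 4, 16, 12, 6, 9)

# Assumption: same logic as my_function in the question, but it returns
# a one-row data frame instead of printing c(header, cr).
my_function <- function(data1) {
  x <- data1[[1]]
  data.frame(p = colnames(x)[1], R = cor(x[, 1], x[, 2]))
}

# Hypothetical helper: the two-element list for a single column of df.
# df[col] (single bracket, column name) keeps the column header.
make_input <- function(col) {
  list(x = cbind(df[col], tocompare), N = length(tocompare))
}

# One result row per column, stacked into a single data frame.
results <- lapply(names(df), function(col) my_function(make_input(col)))
final_df <- do.call(rbind, results)
```

If purrr is loaded, purrr::map() should be usable in place of lapply() here, since both return a list with one element per column name.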
[Discussion]:
Tags: r dataframe loops dplyr purrr