Hello, here is the code example you asked for. Note that you can reuse it and adapt it to retrieve different aspects of your dataset.
# Reproduces the structure of your dataset (not a copy: sample() draws random values).
# This is the kind of example that is nice to include in your question.
df <- data.frame(Compassion = sample(c(1,0), 5, replace = TRUE),
                 relevance = sample(c(1,0), 5, replace = TRUE),
                 Time = sample(c(1,0), 5, replace = TRUE),
                 Exemplification = sample(c(1,0), 5, replace = TRUE),
                 credit = sample(c(1,0), 5, replace = TRUE),
                 Science = sample(c(1,0), 5, replace = TRUE),
                 Work = sample(c(1,0), 5, replace = TRUE),
                 Action = sample(c(1,0), 5, replace = TRUE),
                 Response = sample(c(1,0), 5, replace = TRUE),
                 efficient = sample(c(1,0), 5, replace = TRUE))
df
# The groups
g1 <- c("Compassion", "relevance", "Time", "Exemplification")
g2 <- c("credit", "Science", "Work")
g3 <- c("Action", "Response", "efficient")
# TRUE/FALSE for each group. Since your data is coded as 0/1, a row sum is an efficient test.
boolG1 <- rowSums(df[g1]) >= 1
boolG2 <- rowSums(df[g2]) >= 1
boolG3 <- rowSums(df[g3]) >= 1
# Extract the rows where at least one group has a row sum greater than 0
df[boolG1 | boolG2 | boolG3,]
# Print the number of rows; change the conditions as needed
sprintf("number of tweets in any of the 3 groups: %d", nrow(df[boolG1 | boolG2 | boolG3, ]))
sprintf("number of tweets in the 1st group: %d", nrow(df[boolG1, ]))
sprintf("number of tweets in the 2nd group: %d", nrow(df[boolG2, ]))
sprintf("number of tweets in the 3rd group: %d", nrow(df[boolG3, ]))
# You can also compute percentages. %.1f is used because the value is not
# guaranteed to be a whole number, and %% prints a literal percent sign.
sprintf("percentage of tweets in any of the 3 groups: %.1f%%",
        nrow(df[boolG1 | boolG2 | boolG3, ]) / nrow(df) * 100)
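If you end up with more than three groups, repeating one line per group gets tedious. Here is a minimal sketch of the same row-sum idea with the groups stored in a named list. The `groups`, `membership`, `counts` and `percentages` names are my own choices for illustration, and `set.seed()` is only there to make the random data reproducible.

```r
# Hypothetical dataset with the same 0/1 coding as above
set.seed(42)
cols <- c("Compassion", "relevance", "Time", "Exemplification",
          "credit", "Science", "Work",
          "Action", "Response", "efficient")
df <- as.data.frame(matrix(sample(c(1, 0), 5 * length(cols), replace = TRUE),
                           nrow = 5, dimnames = list(NULL, cols)))

# All groups in one named list instead of separate vectors
groups <- list(g1 = c("Compassion", "relevance", "Time", "Exemplification"),
               g2 = c("credit", "Science", "Work"),
               g3 = c("Action", "Response", "efficient"))

# One logical column per group: TRUE when at least one word of the group is present
membership <- sapply(groups, function(g) rowSums(df[g]) >= 1)

# Number and percentage of rows belonging to each group
counts <- colSums(membership)
percentages <- colMeans(membership) * 100
data.frame(group = names(groups), count = counts, percentage = percentages)
```

Adding a group then only means adding one entry to the list; the rest of the code stays unchanged.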
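You tried to do this with an if condition, which is fine, but you would need to put it inside a for loop. R is more efficient when computations are vectorized. This article has more information.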
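To make the comparison concrete, here is a small sketch of both approaches on one group. The data is a hypothetical three-column example, and `loopG2` and `vecG2` are names I made up for the illustration.

```r
# Hypothetical data restricted to the second group
set.seed(1)
df <- data.frame(credit = sample(c(1, 0), 5, replace = TRUE),
                 Science = sample(c(1, 0), 5, replace = TRUE),
                 Work = sample(c(1, 0), 5, replace = TRUE))
g2 <- c("credit", "Science", "Work")

# Loop version: one if test per row
loopG2 <- logical(nrow(df))          # starts as all FALSE
for (i in seq_len(nrow(df))) {
  if (sum(df[i, g2]) >= 1) {
    loopG2[i] <- TRUE
  }
}

# Vectorized version: a single call handles every row
# (unname() drops the row names that rowSums() attaches)
vecG2 <- unname(rowSums(df[g2]) >= 1)

# Both approaches should give the same logical vector
identical(loopG2, vecG2)
```

The loop works, but the vectorized line is shorter and, on a large data frame, usually noticeably faster.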
Edit
Here is a small piece of code to represent the dataset as a Venn diagram
library(VennDiagram) # you may need to install this package
venn.diagram(
  x = list(g1 = which(boolG1),
           g2 = which(boolG2),
           g3 = which(boolG3)),
  filename = 'venn_diagramm.tiff' # be aware that this creates a file!
)
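If you want to check the numbers shown in each region of the diagram without creating a file, a base R cross-tabulation of the three logical vectors gives the same counts. This is only a sketch: the logical vectors below are hypothetical stand-ins for `boolG1`, `boolG2` and `boolG3`, and `regions` is a name I chose for the example.

```r
# Hypothetical membership vectors standing in for boolG1, boolG2, boolG3
set.seed(7)
boolG1 <- sample(c(TRUE, FALSE), 5, replace = TRUE)
boolG2 <- sample(c(TRUE, FALSE), 5, replace = TRUE)
boolG3 <- sample(c(TRUE, FALSE), 5, replace = TRUE)

# Fix both levels so that every combination appears, even with a count of 0
lv <- c(FALSE, TRUE)
regions <- table(g1 = factor(boolG1, levels = lv),
                 g2 = factor(boolG2, levels = lv),
                 g3 = factor(boolG3, levels = lv))

# One row per combination of memberships, with its number of tweets in Freq
as.data.frame(regions)
```

The row where g1, g2 and g3 are all FALSE counts the tweets that fall outside every group, which the diagram itself does not display.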