【Question title】: Removing regression lines for data that have 3 or fewer points
【Posted】: 2018-06-14 14:11:10
【Question】:

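How can I remove some regression lines in ggplot2 but keep others, using the stat_smooth / geom_smooth functions?

I am plotting length ~ weight relationships for fish, comparing lakes, years, species, and parasitism.

I can plot regression lines for everything, for example parasitized vs. non-parasitized, but if one group (say, parasitized) has only 2 points, a regression line is still drawn for it, just as for every other group that has 3 or more points.

My question is: how do you plot the data so that regression lines are drawn for groups with 3 or more points, but not for groups with only two points?

I have included the data and an example plot in the question: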

> str(B2_2016)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   8 obs. of  16 variables:
 $ Year             : num  2016 2016 2016 2016 2016 ...
 $ Sample ID        : chr  "b2-ss-01" "b2-ss-03" "b2-ss-05" "b2-ss-06" ...
 $ Species          : chr  "P. pungitius" "P. pungitius" "P. pungitius" "P. pungitius" ...
 $ Total Wt (g)     : num  0.0643 0.923 0.0807 0.1435 0.0292 ...
 $ Total Length (cm): num  2.4 5.2 2.7 3 1.9 2.3 3.6 5.7
 $ Sex              : num  0 0 0 0 0 0 1 1
 $ Age              : chr  "-" "3" "-" "-" ...
 $ Liver Wt (g)     : chr  "-" "4.02E-2" "-" "-" ...
 $ Gonad Wt (g)     : chr  "-" "-" "-" "-" ...
 $ Condition (K)    : num  0.465 0.656 0.41 0.531 0.426 ...
 $ HSI              : chr  "-" "4.3553629469122424" "-" "-" ...
 $ GSI              : chr  "-" "-" "-" "-" ...
 $ Parasites        : num  0 0 0 0 0 0 1 1
 $ P Weight         : num  NA NA NA NA NA ...
 $ Gut Contents     : chr  "-" "Y" "-" "-" ...
 $ S.I.             : chr  "-" "Y" "-" "-" ...        

The x and y axes are log10.

structure(list(Year = c(2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016), 
`Sample ID` = c("b2-ss-01", "b2-ss-03", "b2-ss-05", "b2-ss-06", "b2-ss-07", "b2-ss-08", "b2-ss-02", "b2-ss-04"), 
Species = c("P. pungitius", "P. pungitius", "P. pungitius", "P. pungitius", "P. pungitius", "P. pungitius", "P. pungitius", "P. pungitius"), 
`Total Wt (g)` = c(0.0643, 0.923, 0.0807, 0.1435, 0.0292, 0.0689, 0.13, 1.1902), 
`Total Length (cm)` =    c(2.4, 5.2, 2.7, 3, 1.9, 2.3, 3.6, 5.7), 
Sex = c(0, 0, 0, 0, 0, 0, 1, 1), 
Age = c("-", "3", "-", "-", "-", "-", "2", "3.3"), 
`Liver Wt (g)` = c("-    ","4.02E-2", "-", "-", "-", "-", "8.9999999999999993E-3", "3.3799999999999997E-2"),                                                            
`Gonad Wt (g)` = c("-", "-", "-", "-", "-", "-", "2.3999999999999998E-3", "4.5999999999999999E-3"),                                               
`Condition (K)` = c(0.465133101851852, 0.656434911242603, 0.409998475842097, 0.531481481481481, 0.425718034698936, 0.56628585518205, 0.27863511659808, 0.642680878866912),                                                           
HSI = c("-", "4.3553629469122424", "-", "-", "-", "-", "6.9230769230769225", "2.8398588472525623"), 
GSI = c("-", "-", "-", "-", "-", "-", "1.846153846153846", "0.38648966560241976"), 
Parasites = c(0, 0, 0, 0, 0, 0, 1, 1), 
`P Weight` = c(NA, NA, NA, NA, NA, NA, 0.1918, 0.0586), 
`Gut Contents` = c("-", "Y", "-", "-", "-", "-", "Y", "N"), 
S.I. = c("-", "Y", "-", "-", "-", "-", "Y", "Y")), 
.Names = c("Year", "Sample ID", "Species", "Total Wt (g)", 
"Total Length (cm)", "Sex", "Age", "Liver Wt (g)", "Gonad Wt (g)", 
"Condition (K)", "HSI", "GSI", "Parasites", "P Weight", "Gut Contents", 
"S.I."), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))

So I got these errors...

attach(All_Years_All_Lakes_All_Species)
> SticklesDataF = group_by(All_Years_All_Lakes_All_Species, Species, Year, Parasites, Lake) %>% mutate(n = n()), LogLength = log('Total Length (cm)'), LogWeight = log('Total Wt (g)')) 
Error: unexpected ',' in "SticklesDataF = group_by(All_Years_All_Lakes_All_Species, Species, Year, Parasites, Lake) %>% mutate(n = n()),"

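【Question comments】:

  • Use dplyr, data.table, or similar to flag the groups with 3 or fewer points, then subset the data passed to geom_smooth based on that flag.
  • This implicitly duplicates a recently resolved question: stackoverflow.com/questions/48100957/…. Maybe you should fix your non-reprex question and write your own answer.
  • Sorry, how would I use dplyr to do this, and what do you mean by a flag? I am very new to R.
  • I don't want to remove those data points, I just don't want a regression line drawn through them.
  • For example: library(tidyverse); ggplot(mtcars, aes(hp, mpg, color = factor(cyl))) + geom_point() + geom_smooth(data = mtcars %>% group_by(cyl) %>% filter(n() > 10), method = 'lm')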
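
The flag-then-subset idea from the comments above can be sketched step by step on the built-in mtcars data. This is an illustration rather than any poster's code: the column name n_points and the threshold of 10 are assumptions chosen to fit mtcars.

```r
library(dplyr)
library(ggplot2)

# Flag every row with the size of its group (n_points is an illustrative name).
flagged <- mtcars %>%
  group_by(cyl) %>%
  mutate(n_points = n()) %>%
  ungroup()

# Only groups above the threshold are handed to geom_smooth().
# geom_point() still receives every row, so no points are dropped.
smooth_data <- filter(flagged, n_points > 10)

p <- ggplot(flagged, aes(hp, mpg, color = factor(cyl))) +
  geom_point() +
  geom_smooth(data = smooth_data, method = "lm")
```

In mtcars the 6-cylinder group has only 7 cars, so it keeps its points in the plot but gets no fitted line.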

Tags: r ggplot2 regression


【Solution 1】:

Using the data you shared and calling it df:

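library(dplyr)
library(ggplot2)

# Count the points in each group and log-transform length and weight
df = group_by(df, Species, Year, Parasites) %>%
    mutate(n = n(),
           LogLength = log(`Total Length (cm)`),
           LogWeight = log(`Total Wt (g)`))

ggplot(df, aes(
      x = LogLength,
      y = LogWeight,
      shape = factor(Parasites),
      color = factor(Parasites)
    )) +
    geom_point() + 
    # Fit lines only for groups with more than 3 points; all points are still drawn
    geom_smooth(data = filter(df, n > 3), method = "lm") +
    theme_classic()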

I'll leave label adjustments and the like to you.
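
To see which groups will receive a line before plotting, the group sizes can be tabulated. A minimal sketch, rebuilding only the relevant columns of the data shared in the question; the cutoff n > 3 follows the answer, and switching it to n >= 3 (as the question body describes) is an adjustment left to the reader.

```r
library(dplyr)

# Only the columns needed here, copied from the data in the question.
df <- tibble(
  Species = "P. pungitius",
  Year = 2016,
  Parasites = c(0, 0, 0, 0, 0, 0, 1, 1),
  `Total Wt (g)` = c(0.0643, 0.923, 0.0807, 0.1435, 0.0292, 0.0689, 0.13, 1.1902),
  `Total Length (cm)` = c(2.4, 5.2, 2.7, 3, 1.9, 2.3, 3.6, 5.7)
)

# One row per group, with the number of points in column n.
group_sizes <- count(df, Species, Year, Parasites)

# Groups that would get a regression line under the n > 3 rule.
smooth_groups <- filter(group_sizes, n > 3)
```

With this data the non-parasitized group has 6 points and the parasitized group has 2, so only the former would be fitted.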

【Comments】:

  • Awesome!!! Thank you so much, I'll try it on my data and get back to you, thanks!!!!
  • You may need to add more columns to group_by. Your question says "comparing lakes, years, species, and parasitism", but I don't see a column that looks like "lake".
  • For example, if you want to include lake: df = group_by(df, Species, Year, Parasites, Lake) %>% mutate(n = n()), LogLength = log(`Total Length (cm)`), LogWeight = log(`Total Wt (g)`))
  • So I got this error: attach(All_Years_All_Lakes_All_Species) > SticklesDataF = group_by(All_Years_All_Lakes_All_Species, Species, Year, Parasites, Lake) %>% mutate(n = n()), LogLength = log('Total Length (cm)'), LogWeight = log('Total Wt (g)')) Error: unexpected ',' in "SticklesDataF = group_by(All_Years_All_Lakes_All_Species, Species, Year, Parasites, Lake) %>% mutate(n = n()),"
  • Never use attach, especially not with dplyr. detach (or restart R) and try again. I had an extra parenthesis after n(); I fixed it in the answer.
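
The points raised in this thread (no attach, no stray parenthesis after n(), and a Lake column in the grouping) can be put together in one sketch. The data below is a toy example: the Lake values and the names fish and fish_flagged are made up for illustration and are not from the original post.

```r
library(dplyr)

# Toy data; the Lake values are invented for illustration only.
fish <- tibble(
  Lake = c("A", "A", "A", "A", "B", "B"),
  Species = "P. pungitius",
  Year = 2016,
  Parasites = c(0, 0, 0, 0, 1, 1),
  `Total Length (cm)` = c(2.4, 5.2, 2.7, 3, 3.6, 5.7),
  `Total Wt (g)` = c(0.0643, 0.923, 0.0807, 0.1435, 0.13, 1.1902)
)

# No attach() needed: dplyr verbs look columns up inside the data frame.
# Column names containing spaces need backticks, not quotes:
# log('Total Wt (g)') would be the log of a character string, which fails.
fish_flagged <- fish %>%
  group_by(Species, Year, Parasites, Lake) %>%
  mutate(
    n = n(),
    LogLength = log(`Total Length (cm)`),
    LogWeight = log(`Total Wt (g)`)
  ) %>%
  ungroup()
```

The resulting n column can then be used in filter() to choose which groups are passed to geom_smooth, as in the answer.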