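【Posted】: 2018-07-31 19:55:49
【Problem description】:
I was starting to run some statistics on a data frame that I read in with the read_xls function from the readxl package (version 1.1.0) when I realized that R was not reading one of the columns the way I wanted. In the Excel spreadsheet that column begins with a long run of blank cells, and after some research I believe that is the problem. Below the blank rows the column does contain the numeric values I need for my analysis in R. However, when I read it in with read_xls, the column comes back with class logical and all NAs. After looking through the readxl site, it seems fairly clear that the blanks in the column are the cause. I am still confused about how to fix it, because only this one column has blanks at the start of the data set. I would appreciate any help or guidance. Thank you! The column giving me trouble is Rep_Val_Quantity_Avg.
Data input: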
dput(head(df_trib,10))
structure(list(NJPDES = c("NJ0020206", "NJ0020532", "NJ0021326",
"NJ0022021", "NJ0022985", "NJ0023361", "NJ0023736", "NJ0024015",
"NJ0024031", "NJ0024040"), Facility_Name = c("ALLENTOWN BORO WWTP",
"HARRISON TWP MULLICA HILL WWTP", "MEDFORD LAKES BOROUGH STP",
"SWEDESBORO WTP", "WRIGHTSTOWN BOROUGH STP", "WILLINGBORO WATER POLLUTION CONTROL PLANT",
"PINELANDS WASTEWATER CO", "MOUNT HOLLY WPCF", "ELMWOOD WTP",
"WOODSTREAM STP"), `Monitored Location Designator` = c("001A",
"001A", "001A", "001A", "001A", "001A", "001A", "001A", "001A",
"001A"), Date = structure(c(1372550400, 1372550400, 1372550400,
1372550400, 1372550400, 1372550400, 1372550400, 1372550400, 1372550400,
1372550400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Parameter_Number_DMR = c("00300", "00300", "00300", "00300",
"00300", "00300", "00300", "00300", "00300", "00300"), Parameter = c("Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)"
), Sample_Point_Desc = c("Effluent Gross Value", "Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value"), Rep_Val_Quantity_Avg = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), X__1 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), `Reported Value Quantity Maximum` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), `Quantity Units Description` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), Rep_Val_Con_Min = c("7.2",
NA, "7.65", "6.79", NA, NA, "6", NA, "6.6", NA), Val_Con_AVG = c("7.3",
"8.8", NA, "7.58", "7.5", "7.100", "5", "7.8", "6.6", "7.4"
), Rep_Val_Con_Max = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), valunit = c("MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER"
)), .Names = c("NJPDES", "Facility_Name", "Monitored Location Designator",
"Date", "Parameter_Number_DMR", "Parameter", "Sample_Point_Desc",
"Rep_Val_Quantity_Avg", "X__1", "Reported Value Quantity Maximum",
"Quantity Units Description", "Rep_Val_Con_Min", "Val_Con_AVG",
"Rep_Val_Con_Max", "valunit"), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Code used:
library(readxl)

df_trib <- read_xls(
  "4_Del_Tribs_ DMR data all pull for certain params.xls",
  sheet = "NJEMS DATA", col_names = TRUE,
  # one entry per column, in sheet order
  col_types = c("text", "text", "text", "date", "text", "text", "text",
                "numeric", "numeric", "text", "numeric",
                "numeric", "numeric", "text", "text"))
【Discussion】:
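-
Have you tried using the col_types = option in read_xls and specifying the variable type for each column?
-
Specifically, it is usually easiest to read messy columns in as the catch-all character type.
-
@Dave2e Yes, I tried the col_types = option, but it did not solve the problem. As stated in the question, I think it has to do with the large number of blanks before the first data point in that column. Do you have any suggestions for handling that?
-
Possibly change the guess_max = option. Without a better description of the Excel file and the code you are using, it is hard to reproduce your problem and offer any meaningful help.
-
@Dave2e I have added the code I am using. I am not sure what better description of the Excel file I can give. The column giving me trouble is Rep_Val_Quantity_Avg, and the reason is that its first 1000 rows are blank cells. I just don't understand how to work around this in R, since I am fairly new to the software.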
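-
For anyone who wants to try the two suggestions from these comments (guess_max = and col_types =) without the original workbook, here is a minimal sketch. Everything about the stand-in data is an assumption: the file is a temporary .xlsx built with the writexl package (the question uses an .xls file read with read_xls), the 1500 blank rows and the values below them are made up, and only two columns are included. Whether either option fixes the asker's actual file is not confirmed in this thread.

```r
library(readxl)   # read_excel()
library(writexl)  # write_xlsx(); used here only to build a stand-in workbook

# Hypothetical stand-in data: 1500 blank cells, then ten numeric values.
demo <- data.frame(
  NJPDES = sprintf("NJ%07d", 1:1510),
  Rep_Val_Quantity_Avg = c(rep(NA_real_, 1500), seq(0.5, 5, by = 0.5))
)
path <- tempfile(fileext = ".xlsx")
write_xlsx(demo, path)

# Option 1: let readxl inspect more rows before it settles on a column type.
# 5000 is an arbitrary value chosen to exceed the run of blanks.
by_guess <- read_excel(path, guess_max = 5000)

# Option 2: state the types outright, one entry per column in sheet order.
by_type <- read_excel(path, col_types = c("text", "numeric"))

# Inspect the class of the problem column under each approach.
str(by_guess$Rep_Val_Quantity_Avg)
str(by_type$Rep_Val_Quantity_Avg)
```

With the real file, the same two arguments can be passed to read_xls. If col_types is used, the vector needs one entry per column, so it is worth checking each entry against the sheet's column order.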
Tags: r excel import tidyverse readxl