[Question title]: Printing out a SQL table in an R Sweave PDF
[Posted]: 2015-07-07 20:22:13
[Question]:

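This seems like it should be simple, but I can't find an answer anywhere.

It also seems like it could be solved just as easily with a clever SQL query as with R code.

The table is pulled into the script with the following code: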

dbhandle <- SQLConn_remote(DBName = "DATABASE", ServerName = "SERVER")
Testdf<-sqlQuery(dbhandle, 'select * from TABLENAME
                order by FileName, Number, Category', stringsAsFactors = FALSE)

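I want to print a SQL table in an R Sweave PDF, subject to the following conditions:

  • Print only specific columns. That looks simple to do with sqlQuery, but I have already created a variable in my script called Testdf that holds the whole table, so I would rather just subset it if I can. The reason I'm not content to stop there is that the next condition seems to be beyond what I can do in a query.

  • This is the tricky part. In the example table below, there is a list of file names organized by version number and by the grouping column Number. I want to print the table in the .Rnw file so that it has 3 columns. The first column is the FileName column, the second is a column of all the values where Number == 2, and the last (third) is a column of all the values where Number == 3.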
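
The first condition (keeping only some columns, or only the rows for one Number) can be sketched in base R without touching the query. This is a minimal illustration, not the real data: Testdf_small and its values are a hypothetical miniature of Testdf.

```r
# Hypothetical miniature of Testdf; the real one comes from sqlQuery().
Testdf_small <- data.frame(
  FileName = c("File1", "File1", "File2", "File2"),
  Version  = c("0.01", "0.02", "0.01", "0.02"),
  Value    = c(123, 369, 312, 753),
  Number   = c(1, 2, 1, 2),
  Build    = "Iteration",
  stringsAsFactors = FALSE
)

# Keep only the columns needed for the printed table.
wanted <- Testdf_small[, c("FileName", "Version", "Value")]

# Keep only the rows for one Number, e.g. Number == 2.
number2 <- Testdf_small[Testdf_small$Number == 2, c("FileName", "Value")]
```

Subsetting alone does not put the two versions side by side, which is why the second condition needs a reshape.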

The table looks like this:

|  Name | Version | Category | Value |  Date  | Number |   Build   | Error |
|:-----:|:-------:|:--------:|:-----:|:------:|:------:|:---------:|:-----:|
| File1 | 0.01    | Time     | 123   | 1-1-12 | 1      | Iteration | None  |
| File1 | 0.01    | Size     | 456   | 1-1-12 | 1      | Iteration | None  |
| File1 | 0.01    | Final    | 789   | 1-1-12 | 1      | Iteration | None  |
| File2 | 0.01    | Time     | 312   | 1-1-12 | 1      | Iteration | None  |
| File2 | 0.01    | Size     | 645   | 1-1-12 | 1      | Iteration | None  |
| File2 | 0.01    | Final    | 978   | 1-1-12 | 1      | Iteration | None  |
| File3 | 0.01    | Time     | 741   | 1-1-12 | 1      | Iteration | None  |
| File3 | 0.01    | Size     | 852   | 1-1-12 | 1      | Iteration | None  |
| File3 | 0.01    | Final    | 963   | 1-1-12 | 1      | Iteration | None  |
| File1 | 0.02    | Time     | 369   | 1-1-12 | 2      | Iteration | None  |
| File1 | 0.02    | Size     | 258   | 1-1-12 | 2      | Iteration | None  |
| File1 | 0.02    | Final    | 147   | 1-1-12 | 2      | Iteration | None  |
| File2 | 0.02    | Time     | 753   | 1-1-12 | 2      | Iteration | None  |
| File2 | 0.02    | Size     | 498   | 1-1-12 | 2      | Iteration | None  |
| File2 | 0.02    | Final    | 951   | 1-1-12 | 2      | Iteration | None  |
| File3 | 0.02    | Time     | 753   | 1-1-12 | 2      | Iteration | None  |
| File3 | 0.02    | Size     | 915   | 1-1-12 | 2      | Iteration | None  |
| File3 | 0.02    | Final    | 438   | 1-1-12 | 2      | Iteration | None  |

This is what I would like it to look like:

|  Name | 0.01 | 0.02 |
|:-----:|:----:|:----:|
| File1 | 123  | 369  |
| File1 | 456  | 258  |
| File1 | 789  | 147  |
| File2 | 312  | 753  |
| File2 | 645  | 498  |
| File2 | 978  | 951  |
| File3 | 741  | 753  |
| File3 | 852  | 915  |
| File3 | 963  | 438  |

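The headers of the middle and right columns come from the original Version column. The values in the middle column are all the entries in the Value column that correspond to 0.01 in the Version column and 1 in the Number column. The values in the right column are all the entries in the Value column that correspond to 0.02 in the Version column and 2 in the Number column.

Here is a sample data set for reference, in case you want to replicate it in R: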

rw1 <- c("File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3", "File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3", "File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3")
rw2 <- c("0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03")
rw3 <- c("Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final")
rw4 <- c(123, 456, 789, 312, 645, 978, 741, 852, 963, 369, 258, 147, 753, 498, 951, 753, 915, 438, 978, 741, 852, 963, 369, 258, 147, 753, 498)
rw5 <- c("01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12")
rw6 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)
rw7 <- c("Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Release", "Release", "Release", "Release", "Release", "Release", "Release", "Release", "Release")
rw8 <- c("None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "Cannot Connect to Database", "None", "None", "None", "None", "None", "None", "None", "None")


Testdf = data.frame(rw1, rw2, rw3, rw4, rw5, rw6, rw7, rw8)
colnames(Testdf) <- c("FileName", "Version", "Category", "Value", "Date", "Number", "Build", "Error") 
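
Since the reshape depends on every Version pairing with exactly one Number and having the same number of rows, a quick cross-tabulation is a cheap sanity check. The sketch below rebuilds only the Version and Number columns of the sample data (the data frame name check is an assumption for illustration); on the full data the same call would be table(Testdf$Version, Testdf$Number).

```r
# Rebuild only the Version and Number columns of the sample data.
check <- data.frame(
  Version = rep(c("0.01", "0.02", "0.03"), each = 9),
  Number  = rep(1:3, each = 9)
)

# Cross-tabulate Version against Number. Each Version should land in a
# single Number column, with 9 rows in it.
counts <- table(check$Version, check$Number)
print(counts)
```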

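[Comments]:

  • In your example table there are equal numbers of each Version, and Version and Number correspond. Neither is true in your example R data. I can see how the operation makes sense in the first case, but given the unequal counts in the example R data, I don't think it makes sense at all.
  • @Ista I don't think I follow what you are saying.
  • I'm saying that Testdf1 has 9 rows with Version = 0.01 and Number = 1, 8 rows with Version = 0.02 and Number = 2, 1 row with Version = 0.03 and Number = 2, and 9 rows with Version = 0.03 and Number = 3. Your original example table is much simpler: 9 rows with Version = 0.01 and Number = 1, and 9 rows with Version = 0.02 and Number = 2. On the more complicated Testdf, the operation you describe is not clear.
  • That was a mistake. Every Version and Number iteration should have 9 entries.

Tags: sql sql-server r sql-server-2008 sweave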


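[Solution 1]:

Here is a solution using dplyr and tidyr. Select the relevant variables. Then add an index column so that the data can be spread without running into duplicate identifiers. Then reshape the data with spread, and finally drop the index column.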

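library("dplyr")
library("tidyr")
Testdf %>%
  select(FileName, Version, Value) %>%  # keep only the relevant columns
  group_by(FileName, Version) %>%
  mutate(Index = 1:n()) %>%             # row index within each FileName/Version group
  spread(Version, Value) %>%            # one column per Version
  select(-Index)                        # drop the helper index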

If you can always assume that each file name has 9 values, one for each combination of Version and Category, then this will work:

Testdf %>%
    select(FileName, Category, Version, Value) %>%
    spread(Version, Value) %>%
    select(-Category)
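
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A rough equivalent of the Category-based version, assuming a recent tidyr, might look like the sketch below. Testdf_small is a hypothetical cut-down copy of the sample data (two files, two versions) so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for Testdf: File1 and File2, versions 0.01 and 0.02.
Testdf_small <- data.frame(
  FileName = rep(rep(c("File1", "File2"), each = 3), times = 2),
  Version  = rep(c("0.01", "0.02"), each = 6),
  Category = rep(c("Time", "Size", "Final"), times = 4),
  Value    = c(123, 456, 789, 312, 645, 978,
               369, 258, 147, 753, 498, 951),
  stringsAsFactors = FALSE
)

# FileName and Category together identify a row, so no index column is needed.
wide <- Testdf_small %>%
  select(FileName, Category, Version, Value) %>%
  pivot_wider(names_from = Version, values_from = Value) %>%
  select(-Category)

print(wide)
```

As with spread, this relies on each FileName/Category/Version combination appearing only once.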

If you would rather use data.table, you can do this:

library("data.table")
setDT(Testdf)[, split(Value, Version), by = FileName]

If you want LaTeX output, you can then pipe the result on to xtable::xtable.
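
For the Sweave side, a minimal sketch of the xtable step might look like this. The data frame wide is a hypothetical stand-in for the reshaped result, and digits = 0 and include.rownames = FALSE are presentation choices, not part of the answer above. In the .Rnw file the code would sit inside a chunk opened with <<results=tex, echo=FALSE>>= and closed with @, so that Sweave passes the generated LaTeX through to the PDF.

```r
library(xtable)

# Hypothetical wide table standing in for the reshaped result.
wide <- data.frame(
  FileName = c("File1", "File1", "File1"),
  `0.01`   = c(123, 456, 789),
  `0.02`   = c(369, 258, 147),
  check.names = FALSE  # keep "0.01" and "0.02" as column names
)

# print() on an xtable object writes a LaTeX tabular to standard output.
# capture.output() is used here only so the lines can be inspected;
# inside a Sweave chunk a plain print() call is enough.
tex_lines <- capture.output(
  print(xtable(wide, digits = 0), include.rownames = FALSE)
)
cat(tex_lines, sep = "\n")
```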
