如何从 SQL Server R 存储过程返回列名和汇总值？答案

【问题标题】：How to return Column names and Summary Values from SQL Server R Stored Procedure?如何从 SQL Server R 存储过程返回列名和汇总值？
【发布时间】：2018-06-12 03:21:23
【问题描述】：

我创建了一个使用dplyr 库的存储过程，目的是按StudyID 和ProductNumber 分组。我想返回字段 c1-c8 中每个值的平均值以及相同字段的标准偏差。

我的存储过程如下：

ALTER PROCEDURE [dbo].[spCodeMeans]
    @StudyID INT,
    @StudyID_outer INT OUT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @inquery NVARCHAR(MAX) = N'Select
        c.StudyID, c.RespID, c.ProductNumber, c.ProductSequence, c.BottomScaleValue, 
        c.BottomScaleAnchor, c.TopScaleValue, c.TopScaleAnchor, c.StudyDate,
        c.DayOfWeek, c.A, c.B, c.C, c.D, c.E, c.F,
        c.DependentVarYN, c.VariableAttributeID, c.VarAttributeName, c.[1] as c1, 
        c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6, c.[7] as c7, c.[8] as c8
        from ClosedStudyResponses c
        --Sensory Value Attributes only for mean and standard deviation analytics.
        where VariableAttributeID = 1
        and c.StudyID = 22'
        ;

    BEGIN TRY
        EXEC sp_execute_external_script
                @language = N'R',
                @script = N'
        library(dplyr)
            OutputDataSet <- InputDataSet %>%
                group_by (StudyID, ProductNumber) %>%
                summarise(c1_mean = mean(c1), c2_mean = mean(c2), c3_mean = mean(c3), c4_mean = mean(c4), c5_mean = mean(c5), c6_mean = mean(c6), 
                c7_mean = mean(c7), c8_mean = mean(c8), c1_sd = sd(c1), c2_sd = sd(c2), c3_sd = sd(c3), c4_sd = sd(c4), c5_sd = sd(c5), c6_sd = sd(c6), 
                c7_sd = sd(c7), c8_sd = sd(c8)) %>%
            `colnames<-`(c("StudyID", "ProductNumber","c1_mean","c2_mean","c3_mean","c4_mean","c5_mean","c6_mean","c7_mean",
            "c8_mean","c1_sd","c2_sd","c3_sd","c4_sd","c5_sd","c6_sd","c7_sd","c8_sd"))',
                @input_data_1 = @inquery
    END TRY
    BEGIN CATCH
        THROW;
    END CATCH

结果不包括列名，也不包括字段 c1 到 c8 的平均值和标准差。如何调整我的语法以完成此操作？

更新：根据建议，我修改了存储过程以将结果插入到临时表中。修改后的语法如下：

ALTER PROCEDURE [dbo].[spCodeMeans]
-- Add the parameters for the stored procedure here
@StudyID int,
@StudyID_outer int OUT


AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

--Create temptable to store the outputdataset
Create table #temp_table (
    StudyID int,
    ProductNumber int,
    c1_mean decimal,
    c2_mean decimal,
    c3_mean decimal,
    c4_mean decimal,
    c5_mean decimal,
    c6_mean decimal,
    c7_mean decimal,
    c8_mean decimal,

    c1_sd decimal,
    c2_sd decimal,
    c3_sd decimal,
    c4_sd decimal,
    c5_sd decimal,
    c6_sd decimal,
    c7_sd decimal,
    c8_sd decimal
);

-- Insert statements for procedure here
Declare @inquery nvarchar(max) = N'Select
        c.StudyID, c.RespID, c.ProductNumber, c.ProductSequence, c.BottomScaleValue, 
        c.BottomScaleAnchor, c.TopScaleValue, c.TopScaleAnchor, c.StudyDate,
        c.DayOfWeek, c.A, c.B, c.C, c.D, c.E, c.F,
        c.DependentVarYN, c.VariableAttributeID, c.VarAttributeName, c.[1] as c1, 
        c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6, c.[7] as c7, c.[8] as c8
        from ClosedStudyResponses c
        --Sensory Value Attributes only for mean and standard deviation analytics.
        where VariableAttributeID = 1
        and c.StudyID = 22'
        ;
 BEGIN TRY
        Insert into #temp_table
        exec sp_execute_external_script
        @language = N'R',
        @script = N'
        library(dplyr)
            OutputDataSet <- InputDataSet %>%
                group_by (StudyID, ProductNumber) %>%
                summarise_all(.funs=c(mean, sd)) %>%
                setNames(c("StudyID","ProductNumber",
                paste0("c",1:8, "_mean"),
                paste0("c",1:8, "_sd")))
            ',
@input_data_1 = @inquery,
@output_data_1 = N'OutputDataSet';

END TRY

BEGIN CATCH
    THROW;
END CATCH

Select * from #temp_table;
END

当我尝试运行该过程时，我收到一条错误消息，指出 Procedure expects parameter '@params' of type 'ntext/nchar/nvarchar'. 请注意，我已经将 @inquery 声明为 NVARCHAR(Max)。是否还有其他步骤我忽略了？

更新 #2：我花了一些时间重新处理存储过程，发现输出需要采用 data.frame 的格式。我相应地修改了存储过程的 R 部分。现在我的列名出现了，但没有返回数据、平均值或标准差。当前存储过程如下：

ALTER PROCEDURE [dbo].[spCodeMeans]
-- Add the parameters for the stored procedure here
@StudyID int

AS
BEGIN


-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;



-- Insert statements for procedure here
Declare @sStudy varchar(50)
Set @sStudy = Convert(Varchar(50),@StudyID)
Declare @inquery nvarchar(max) = N'Select
        c.StudyID, c.RespID, c.ProductNumber, c.ProductSequence, c.BottomScaleValue, 
        c.BottomScaleAnchor, c.TopScaleValue, c.TopScaleAnchor, c.StudyDate,
        c.DayOfWeek, c.A, c.B, c.C, c.D, c.E, c.F,
        c.DependentVarYN, c.VariableAttributeID, c.VarAttributeName, c.[1] as c1, 
        c.[2] as c2, c.[3] as c3, c.[4] as c4, c.[5] as c5, c.[6] as c6, c.[7] as c7, c.[8] as c8
        from ClosedStudyResponses c
        --Sensory Value Attributes only for mean and standard deviation analytics.
        where VariableAttributeID = 1
        and c.StudyID =' +@sStudy ;

BEGIN TRY
        --Insert into CodeMeans
        exec sp_execute_external_script
        @language = N'R',
        @script = N'
        library(dplyr)
        codemeans <- function(StudyID){
            res <- InputDataSet %>%
                group_by (StudyID, ProductNumber) %>%
                summarise_all(.funs=c(mean, sd)) %>%
                setNames(c("StudyID","ProductNumber",
                paste0("c",1:8, "_mean"),
                paste0("c",1:8, "_sd")))
            df <- data.frame(res)
            }
            ',
@input_data_1 = @inquery,
@output_data_1_name = N'df',
@params = N'@StudyID int',
@StudyID = @StudyID

END TRY

BEGIN CATCH
    THROW;
END CATCH

Select * from CodeMeans;
END

因此，此时，我正在寻求有关如何从基本查询返回输出以及均值和标准差的指导。

【问题讨论】：

那么输出包括什么？另外，为什么不在 SQL 中运行标准聚合查询呢？为什么使用 R？支持均值和标准差aggregate functions。

标签： r sql-server tsql dplyr

【解决方案1】：

目前，您没有定义输出集，也没有将返回值插入到预定义的表中。考虑使用临时表进行以下调整（根据需要调整以下类型）。还可以考虑使用dplyr::summarise_all() 进行多次聚合调用，setNames 保持管道流动，paste 避免名称重复。

ALTER PROCEDURE [dbo].[spCodeMeans]
-- Add the parameters for the stored procedure here
@StudyID int,
@StudyID_outer int OUT

AS
BEGIN
   -- SET NOCOUNT ON added to prevent extra result sets from
   -- interfering with SELECT statements.
   SET NOCOUNT ON;


   CREATE TABLE #tmp_table -- LOCAL TEMP TABLE (EXISTS ONLY IN SESSION)
   (
        StudyID INTEGER,
        ProductNumber VARCHAR(50),
        c1_mean DECIMAL(10,2), c2_mean DECIMAL(10,2), c3_mean DECIMAL(10,2), c4_mean DECIMAL(10,2),
        c5_mean DECIMAL(10,2), c6_mean DECIMAL(10,2), c7_mean DECIMAL(10,2), c8_mean DECIMAL(10,2),
        c1_sd DECIMAL(10,2), c2_sd DECIMAL(10,2), c3_sd DECIMAL(10,2), c4_sd DECIMAL(10,2),
        c5_sd DECIMAL(10,2), c6_sd DECIMAL(10,2), c7_sd DECIMAL(10,2), c8_sd DECIMAL(10,2)
   );

   DECLARE @inquery nvarchar(max) = -- ...same as before...

   BEGIN TRY
        -- ACTION QUERY
        INSERT INTO @temp_table
        EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
        library(dplyr)
          OutputDataSet <- InputDataSet %>%
            group_by (StudyID, ProductNumber) %>%
            summarise_all(.funs=c(mean, sd)) %>%
            setNames(c("StudyID", "ProductNumber", 
                       paste0("c", 1:8, "_mean"), 
                       paste0("c", 1:8, "_sd")))',
        @input_data_1 = @inquery,
        @output_data_1 = N'OutputDataSet';

   END TRY

   BEGIN CATCH
       THROW;
   END CATCH

   -- RESULTSET QUERY
   SELECT * FROM @temp_table;

END

【讨论】：

感谢您的回答。当我按照上面修改我的存储过程时，我收到过程需要类型为“ntext/nchar/nvarchar”的参数“@params”。还需要哪些更正？
您是否复制了相同的SELECT 字符串，正如此处为@inquery 建议的那样？这个想法是替换-- ...same as before...。
是的。这是相同的查询语法。请参阅修改后的问题。任何指导都是最有帮助的。谢谢！