How to use parallel computing in R? Answers

【Question title】: How to use parallel computing in R?
【Posted】: 2020-03-19 09:51:17
【Question description】:

    library(RODBC)  # assumed source of sqlQuery(); dbhandle is an open connection

    sect <- c("Healthcare", "Basic Materials", "Utilities", "Financial Services",
              "Technology", "Consumer Defensive", "Industrials",
              "Communication Services", "Energy", "Real Estate",
              "Consumer Cyclical", "NULL")

    mcap <- c("3 - Large", "2 - Mid", "1 - Small")

    df_total <- data.frame()
    start <- as.Date("01-01-14", format = "%d-%m-%y")
    end   <- as.Date("18-03-20", format = "%d-%m-%y")
    theDate <- start

    # One query per (date, sector, market cap) combination
    while (theDate <= end) {
      for (sector in sect) {
        for (marketcap1 in mcap) {
          # Build the query text; the original called newquery() without assigning it
          newquery <- sprintf(
            "Select * from table where date='%s' and sector='%s' and marketcap='%s'",
            format(theDate), sector, marketcap1)
          topdemo <- sqlQuery(dbhandle, newquery)
          # Append this combination's rows to the running result
          df_total <- rbind(df_total, data.frame(topdemo))
        }
      }
      theDate <- theDate + 1
    }

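In my actual program I run some SQL calculations rather than a plain "select" query. I need this code to cover 2014 through 2020, but it takes a very long time to execute. Is there any way to reduce the execution time? The database holds many stock prices for each market cap and sector.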

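【Question discussion】:

Yes. Avoid select *. Select only the columns you need.
I actually do a lot of calculations on the data; the "select" statement is only there for reference. The program basically loops over every date, market cap, and sector since 2014 and calculates a few things. How can I reduce the time?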

Tags: sql r sql-server parallel-processing database-management

【Solution 1】:

Run one query instead of all the loops:

    select *
    from table
    where sector in ('Healthcare', 'Basic Materials', 'Utilities',
                     'Financial Services', 'Technology', 'Consumer Defensive',
                     'Industrials', 'Communication Services', 'Energy',
                     'Real Estate', 'Consumer Cyclical', 'NULL'
                    ) and
          marketcap in ('3 - Large', '2 - Mid', '1 - Small') and
          date between '2014-01-01' and '2020-03-18';

Running lots of small queries incurs a lot of overhead; a single query is usually better.
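As a minimal sketch of how that single query could be assembled from the vectors already defined in the question: `sql_in_list` is a hypothetical helper introduced here for illustration, and `table` is the placeholder table name from the question. The final call is commented out because it assumes an open RODBC connection named `dbhandle`.

```r
# Hypothetical helper: turn a character vector into a quoted SQL list,
# e.g. c("a", "b") becomes 'a', 'b'. Embedded single quotes are doubled.
sql_in_list <- function(values) {
  escaped <- gsub("'", "''", values, fixed = TRUE)
  paste0("'", escaped, "'", collapse = ", ")
}

sect <- c("Healthcare", "Basic Materials", "Utilities", "Financial Services",
          "Technology", "Consumer Defensive", "Industrials",
          "Communication Services", "Energy", "Real Estate",
          "Consumer Cyclical", "NULL")
mcap  <- c("3 - Large", "2 - Mid", "1 - Small")
start <- as.Date("2014-01-01")
end   <- as.Date("2020-03-18")

# One query covering every sector, market cap, and date in the range
query <- sprintf(
  "select * from table where sector in (%s) and marketcap in (%s) and date between '%s' and '%s'",
  sql_in_list(sect), sql_in_list(mcap), format(start), format(end)
)

# Assumes an open RODBC connection called dbhandle, as in the question:
# df_total <- sqlQuery(dbhandle, query)
```

This replaces the tens of thousands of round trips made by the nested loops with a single call, at the cost of pulling the full result set into memory at once.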

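That said, you seem to be moving a lot of data. I wonder whether all of that data movement is necessary.

It is curious that you are looping over thousands of dates but not including the date in the query.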

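【Discussion】:

I did include the date in the where clause. I need to go through each day to calculate certain things. I ran the program for 45 minutes and it only got as far as April 2014.
@Theguy . . . Run a single query to load the data into a data frame. If you need to loop over the data, do that in R. Note that if the operations can be expressed in SQL, it is better to do the calculation in the database.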
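The "load once, then loop in R" suggestion might look roughly like the sketch below. The data frame is synthetic and the column names (`date`, `sector`, `marketcap`, `price`) are assumptions, since the question does not show the table schema. `summarise_group` is a hypothetical placeholder for whatever per-day calculation is actually needed.

```r
# Synthetic stand-in for the data frame returned by the single query
# (column names are assumed, not taken from the real table)
df_total <- data.frame(
  date      = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-02")),
  sector    = c("Energy", "Energy", "Technology", "Energy"),
  marketcap = c("3 - Large", "3 - Large", "2 - Mid", "3 - Large"),
  price     = c(10, 20, 5, 30),
  stringsAsFactors = FALSE
)

# One piece per (date, sector, marketcap) combination that actually occurs;
# drop = TRUE discards combinations with no rows
groups <- split(
  df_total,
  list(df_total$date, df_total$sector, df_total$marketcap),
  drop = TRUE
)

# Placeholder calculation: row count and mean price per group
summarise_group <- function(g) {
  data.frame(
    date       = g$date[1],
    sector     = g$sector[1],
    marketcap  = g$marketcap[1],
    n          = nrow(g),
    mean_price = mean(g$price)
  )
}

# Apply the calculation to every group and stack the results
result <- do.call(rbind, lapply(groups, summarise_group))
```

If the per-group work turns out to be expensive, `parallel::mclapply(groups, summarise_group)` is a possible drop-in for `lapply` on Unix-like systems. Whether that helps depends on how costly each group is, and it does nothing for the database round trips, which the single query already removes.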