How do I release heap memory on Apache Drill once a query is complete? Answers

【Question title】: How to release heap memory on Apache Drill once the query is complete?
【Posted】: 2019-09-28 23:55:10
【Question】:

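The problem is simple: every time I run a query on Drill, heap memory keeps accumulating. My heap is 7 GB, but it never gets flushed. Every 15 minutes I have to kill Drill and start it again to clear the heap.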

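Current configuration:

-) I am running Apache Drill on a single node. Queries are executed on Drill through the R package 'sergeant', and the target files are usually Parquet files. The operating system is Windows 7 Enterprise.
-) We first build the query with src_drill and then execute it with drl_con. Building the query first and executing it separately is an architectural choice, because we want the application to be able to switch between query engines such as SQL, Hive, Spark, etc.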

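library(sergeant)
library(dplyr)  # provides tbl()

# Set up the Drill query; collect() is deliberately not called here
ds <- src_drill("localhost")
query <- tbl(ds, "cp.`employee.json`")
# Render the lazy table to SQL (plain assignment, so magrittr's %<>% is not needed)
query <- dbplyr::sql_render(query)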


# using drill con to execute the query
drl_con <- drill_connection("localhost") 
Mapping <- drill_query(drl_con, query, .progress = FALSE)

##  # A tibble: 100 x 16
##     employee_id full_name first_name last_name position_id position_title store_id department_id birth_date hire_date
##     <chr>       <chr>     <chr>      <chr>     <chr>       <chr>          <chr>    <chr>         <chr>      <chr>    
##   1 1           Sheri No… Sheri      Nowmer    1           President      0        1             1961-08-26 1994-12-…
##   2 2           Derrick … Derrick    Whelply   2           VP Country Ma… 0        1             1915-07-03 1994-12-…
##   3 4           Michael … Michael    Spence    2           VP Country Ma… 0        1             1969-06-20 1998-01-…
##   4 5           Maya Gut… Maya       Gutierrez 2           VP Country Ma… 0        1             1951-05-10 1998-01-…
##   5 6           Roberta … Roberta    Damstra   3           VP Informatio… 0        2             1942-10-08 1994-12-…
##   6 7           Rebecca … Rebecca    Kanagaki  4           VP Human Reso… 0        3             1949-03-27 1994-12-…
##   7 8           Kim Brun… Kim        Brunner   11          Store Manager  9        11            1922-08-10 1998-01-…
##   8 9           Brenda B… Brenda     Blumberg  11          Store Manager  21       11            1979-06-23 1998-01-…
##   9 10          Darren S… Darren     Stanz     5           VP Finance     0        5             1949-08-26 1994-12-…
##  10 11          Jonathan… Jonathan   Murraiin  11          Store Manager  1        11            1967-06-20 1998-01-…
##  # … with 90 more rows, and 6 more variables: salary <chr>, supervisor_id <chr>, education_level <chr>,
##  #   marital_status <chr>, gender <chr>, management_role <chr>

Ideally, I would like Drill to garbage-collect its heap memory by itself after every query, but right now that is not happening.

【Comments】:

Tags: heap-memory parquet apache-drill

【Solution 1】:

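Apache Drill has its own memory manager. In Task Manager the process will never appear to release heap memory, but in the background Drill starts reusing that memory once the heap fills up.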

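If you are running into memory problems, you are probably over-committing some other memory parameter, such as the total memory allocated to a single query.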
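To check this from R instead of relying on Task Manager, a minimal sketch along the following lines may help. It assumes the same sergeant connection used in the question. The helper names `drill_option_sql` and `drill_memory_snapshot` are hypothetical and were made up for this illustration. `sys.memory` and `sys.options` are Drill system tables, and `planner.memory.max_query_memory_per_node` is the option that caps memory per query on a node. Column layouts differ between Drill versions, so the queries use `SELECT *`.

```r
# Hypothetical helper: build the SQL that looks up one Drill option by name.
drill_option_sql <- function(option_name) {
  sprintf("SELECT * FROM sys.options WHERE name = '%s'", option_name)
}

# Hypothetical helper: ask Drill itself for its heap and direct memory figures.
# sergeant is only needed when this function is actually called.
drill_memory_snapshot <- function(con) {
  sergeant::drill_query(con, "SELECT * FROM sys.memory", .progress = FALSE)
}

# Usage (assumes a Drillbit is running on localhost):
# drl_con <- sergeant::drill_connection("localhost")
# drill_memory_snapshot(drl_con)
# sergeant::drill_query(
#   drl_con,
#   drill_option_sql("planner.memory.max_query_memory_per_node"),
#   .progress = FALSE
# )
```

If the heap figure reported by `sys.memory` drops back after it approaches the maximum, the heap is being reused as described above, and the per-query limit is the more likely place to look.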

Reclaiming heap memory is not something you should worry about. For details, see: https://books.google.com.au/books?id=-Tp7DwAAQBAJ&printsec=frontcover&dq=apache+drill+nook&hl=en&sa=X&ved=0ahUKEwil7LeJuPzkAhXKZSsKHUDoBw4Q6AEIKjAA#v=onepage&q&f=false

【Comments】: