Deploying Rails with Resque and/or Redis in AWS Elastic Beanstalk

【Question title】: Deploying Rails with Resque and/or Redis in AWS Elastic Beanstalk
【Posted】: 2016-04-10 22:00:23
【Question】:

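I am trying to deploy my Rails application on AWS Elastic Beanstalk with websocket-rails in standalone mode, Resque, and Redis. The server is Ubuntu 14.04 running Ruby 2.2 on Puma.

Everything works in development mode on Puma. The errors I get in production on AWS Elastic Beanstalk appear to be related to Redis.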

Redis::CannotConnectError (Error connecting to Redis on my.domain:6379 (ECONNREFUSED)):
  redis (3.2.0) lib/redis/client.rb:320:in `rescue in establish_connection'
  redis (3.2.0) lib/redis/client.rb:311:in `establish_connection'
  redis (3.2.0) lib/redis/client.rb:91:in `block in connect'
  redis (3.2.0) lib/redis/client.rb:273:in `with_reconnect'
  redis (3.2.0) lib/redis/client.rb:90:in `connect'
  redis (3.2.0) lib/redis/client.rb:337:in `ensure_connected'
  redis (3.2.0) lib/redis/client.rb:204:in `block in process'
  redis (3.2.0) lib/redis/client.rb:286:in `logging'
  redis (3.2.0) lib/redis/client.rb:203:in `process'
  redis (3.2.0) lib/redis/client.rb:109:in `call'
  redis (3.2.0) lib/redis.rb:1874:in `block in hget'
  redis (3.2.0) lib/redis.rb:37:in `block in synchronize'
  /opt/rubies/ruby-2.2.3/lib/ruby/2.2.0/monitor.rb:211:in `mon_synchronize'
  redis (3.2.0) lib/redis.rb:37:in `synchronize'
  redis (3.2.0) lib/redis.rb:1873:in `hget'
  redis-objects (1.2.1) lib/redis/hash_key.rb:29:in `hget'
  /opt/rubies/ruby-2.2.3/lib/ruby/gems/2.2.0/bundler/gems/websocket-rails-cf5d59b671c5/lib/websocket_rails/synchronization.rb:184:in `block in find_user'

Occasionally I also got a Redis::TimeoutError, though I can no longer reproduce it.

I added pre-appdeploy hook scripts for Redis and Resque:

# .ebextensions/redis_server.config
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/14_redis_server.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_ONDECK
      su -c "leader_only redis-server" $EB_CONFIG_APP_USER ||
      echo "Redis server startup failed, skipping."
      true

# .ebextensions/resque_workers.config
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/16_resque_workers.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_ONDECK
      su -c "leader_only TERM_CHILD=1 QUEUES=* rake environment resque:work & rake environment resque:scheduler" $EB_CONFIG_APP_USER ||
      echo "Resque initialization failed, skipping."
      true

I suspect Redis is never actually started on the instance, but I am not sure how to check whether it is.
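One way to check, as a minimal sketch: try to open a TCP connection to the Redis host and port from the instance itself. The helper name `redis_port_open?` is hypothetical, and the sketch uses only Ruby's standard library; it tells you whether something is listening, not whether it is a healthy Redis server.

```ruby
require "socket"

# Returns true if a TCP connection to host:port can be opened within
# `timeout` seconds, false otherwise. (Hypothetical helper name.)
def redis_port_open?(host, port = 6379, timeout: 2)
  # Socket.tcp closes the socket after the block and returns the block value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # ECONNREFUSED, timeouts, and DNS failures all mean "not reachable".
  false
end

# Example (run from a Rails console or a plain Ruby session on the instance):
#   redis_port_open?("my.domain")  # host taken from the error message above
```

An ECONNREFUSED like the one in the trace would make this return false. If the port is open, `redis-cli -h <host> ping` should answer PONG when a Redis server is behind it.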

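What is the correct way to start Redis and other rake tasks (such as Resque) when deploying on Elastic Beanstalk?

Another possible problem is using WebSockets alongside Redis. I read somewhere that nginx.conf needs to be modified to pass the Upgrade header so that WebSockets are allowed, but I am not sure whether that is the direct cause of this problem.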

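Edit:

Redis is now running on ElastiCache. I no longer get any Redis connection errors, but Resque and WebSockets still do not seem to work. I do not think Redis is causing this; it is more likely an isolated problem with Resque and WebSockets.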
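With Redis moved to ElastiCache, both Resque and websocket-rails have to be pointed at the same external endpoint. A minimal sketch of one way to do that follows. It assumes the endpoint is exposed as an environment property named `REDIS_URL` (an assumed name, not from the post), and `redis_settings` is a hypothetical helper.

```ruby
require "uri"

# Split a Redis URL such as "redis://cache.example.com:6379/0"
# (hypothetical endpoint) into host, port, and database number.
def redis_settings(url)
  uri = URI.parse(url)
  {
    host: uri.host,
    port: uri.port || 6379,                  # default Redis port
    db:   uri.path.to_s.sub(%r{\A/}, "").to_i # "/2" -> 2, "" -> 0
  }
end

# Possible use in config/initializers/resque.rb (assumes REDIS_URL is set):
#   Resque.redis = Redis.new(redis_settings(ENV.fetch("REDIS_URL")))
#
# websocket-rails is assumed here to take the same host and port through
# config.redis_options; verify this against the gem version in use.
```

If only Resque needs the endpoint, redis-rb also accepts the URL directly via `Redis.new(url: ...)`, which makes the helper unnecessary.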

I tried using a monit script to keep the Resque scheduler and workers running:

packages:
  yum:
    monit: []

files:
  "/etc/monit.d/resque_worker":
    mode: "000644"
    owner: root
    group: root
    content: |
      check process resque_worker_QUEUE
        with pidfile /var/app/resque_worker_QUEUE.pid
        start program = "/bin/sh -l -c 'cd /var/app/current; nohup rake environment resque:scheduler PIDFILE=/var/app/resque_scheduler.pid >> log/resque_scheduler.log 2>&1' && nohup rake nohup rake environment resque:work TERM_CHILD=1 QUEUE=* VERBOSE=1 PIDFILE=/var/app/resque_worker_QUEUE.pid >> log/resque_worker_QUEUE.log 2>&1'" as uid webapp and gid webapp
        stop program = "/bin/sh -c 'cd /var/app/current && kill -9 $(cat tmp/pids/resque_scheduler.pid) && rm -f /var/app/resque_scheduler.pid && kill -9 $(cat tmp/pids/resque_worker_QUEUE.pid) && rm -f /var/app/resque_worker_QUEUE.pid; exit 0;'"
        if totalmem is greater than 300 MB for 10 cycles then restart  # eating up memory?
        group resque_workers

commands:
  remove_bak:
    command: "rm /etc/monit.d/resque_worker.bak"
    ignoreErrors: true

service:
  sysvinit:
    monit:
      ensureRunning: true
      enabled: true

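This does not seem to work.

It works in development because I run the commands manually and the instance is never destroyed.

I also need to keep a standalone server for WebSockets running on port 3001 (I am using gem 'websocket-rails').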

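【Comments】:

Is there a reason you are running Redis inside Beanstalk rather than using ElastiCache? The problem with running Redis inside the Beanstalk VMs is that Redis will go up and down repeatedly. There will also be more than one of them, which means your connections will go through the load balancer and end up at a random Redis instance. I don't think this will work the way you expect.
I now have a Redis ElastiCache instance up and running, and it no longer seems to produce errors. Is there a way to start Resque workers whenever the EB environment starts and keep them running? Delayed/enqueued Resque jobs don't seem to go through.
You can manage the workers with simple PID management via .pid files. I don't know the specifics of Resque, though.
@Richard What specific problem are you facing now?
I'm not entirely sure how to keep workers and a scheduler alive for Resque on EB, especially since the environment is automatically load-balanced. I've edited my post with the monit script I tried to use.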
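The PID-file approach suggested in the comments can be sketched as follows. This is an illustration only: `process_alive?` is a hypothetical helper, and the pidfile path in the usage comment is the one from the monit script in the question.

```ruby
# Returns true if the PID stored in `pidfile` belongs to a running process.
# (Hypothetical helper; a supervisor like monit does this check for you.)
def process_alive?(pidfile)
  pid = Integer(File.read(pidfile).strip)
  Process.kill(0, pid) # signal 0 only checks existence, it sends nothing
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  # No pidfile, no such process, or a pidfile without a valid number.
  false
rescue Errno::EPERM
  # The process exists but is owned by another user.
  true
end

# Example:
#   process_alive?("/var/app/resque_worker_QUEUE.pid")
```

A check like this only works if the worker is started with `PIDFILE` pointing at the same path the check reads, which is worth comparing against the start and stop commands in the monit script.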

Tags: ruby-on-rails amazon-web-services redis amazon-elastic-beanstalk

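【Solution 1】:

I don't think running queue processes in an Elastic Beanstalk web environment is a good idea. Hosting the queue processes on vanilla EC2 instances makes more sense. Alternatively, you could use Amazon Simple Queue Service (SQS) instead of Resque. That way you don't have to monitor and maintain queue instances, and you get a very scalable solution.

If you use Resque to coordinate background jobs in a Rails >= 4.2 application, take a look at the Active Elastic Job gem. It might solve your problem in an elegant way.

Disclaimer: I am the author of Active Elastic Job.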

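【Discussion】:

I think I need to use Resque because I have to enqueue payment jobs far into the future. If I were to use vanilla EC2 instances, would I handle the cross-communication with REST endpoints? I.e., would I send an HTTP request to the vanilla EC2 instance to enqueue a proxy job, so that when the proxy job runs it sends an HTTP request back to EB to process the actual job?
True, you cannot use SQS queues to schedule jobs far into the future. For periodic jobs, however, I would use a cron daemon such as crond on Linux systems. Worker environments allow running cron jobs in a similar way; search for "periodic tasks" in the worker environment docs. Regarding the vanilla EC2 approach: see the Standalone section in the Resque README. It explains how to run Resque with its web front end.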