Event-driven simulation using a priority queue implemented with a binary heap: answers

【Question title】: Event driven simulation using priority queue implemented with binary heap
【Posted】: 2019-02-25 01:09:29
【Question】:

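I need to simulate the execution of a given set of tasks. This means keeping track of which tasks are active at any given point in time, and removing them from the active list when they finish.

I need to solve this with a priority queue, implemented using a binary heap.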

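The input consists of a set of tasks given in ascending order of start time, each with an associated duration. The first line is the number of tasks, for example

3
2 5
4 23
7 4

This means there are 3 tasks. The first starts at time 2 and ends at 7 (2+5). The second starts at 4 and ends at 27. The third starts at 7 and ends at 11.

We can track the number of active tasks: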

Time       #tasks
0 - 2        0
2 - 4        1
4 - 11       2
11 - 27      1

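I need to find:

The maximum number of active tasks at any given time (2 in this case), and
the average number of active tasks over the entire duration, computed here as: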

[ 0*(2-0) + 1*(4-2) + 2*(11-4) + 1*(27-11) ] / 27
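
Worked through for the example, the expression evaluates as follows. This is plain arithmetic on the table above and assumes nothing beyond it:

```latex
\frac{0\,(2-0) + 1\,(4-2) + 2\,(11-4) + 1\,(27-11)}{27}
  = \frac{0 + 2 + 14 + 16}{27}
  = \frac{32}{27} \approx 1.185
```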

I wrote the following code to read the time values into a struct:

#include "stdio.h"
#include "stdlib.h"

typedef struct
{
    long int start;
    long int end;
    int dur;
} task;

int main()
{
    long int num_tasks;
    scanf("%ld",&num_tasks);
    task *t = new task[num_tasks];
    for (int i=0;i<num_tasks;i++)
    {
        scanf("%ld %d",&t[i].start,&t[i].dur);
        t[i].end = t[i].start + t[i].dur;
    }
}

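I would like to understand how to implement this as a heap-based priority queue and how to get the required output from the heap.

【Comments】:

I wrote the code in C++ using only C libraries, but used the new operator for memory allocation.
If you are asking for a link to a shared software library, then this question is off-topic. If you plan to implement the heap yourself, here is a place to start.
My question is: when implementing a heap for this problem, which attribute should I order by, and when should I insert and delete? In short, a brief piece of pseudocode would help me implement it.
Talking about "C/C++" really does upset people, because C and C++ are two different languages with two different solutions to the same problem. If you want C code, ask for C. For example, C++ has a built-in priority queue, while C does not.
Thanks for the edit! Will keep that in mind.

Tags: c heap priority-queue event-driven event-simulation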

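【Solution 1】:

Since you said pseudocode would be enough for you, I'm taking you at your word. The following is implemented in Ruby, which is practically runnable pseudocode. I've also commented it fairly extensively.

The approach outlined here needs only one priority queue. Your model conceptually revolves around two events: when a task starts and when it ends. A very flexible mechanism for discrete-event implementations is to use a priority queue to store event notices, ordered by the time at which each event fires. Each event is implemented as a separate method/function that performs whatever state transitions are associated with the event, and that can schedule further events by putting event notices on the priority queue. You then need an executive loop that keeps pulling event notices off the priority queue, updates the clock to the time of the current event, and invokes the corresponding event method. For more about this approach, see this paper. The paper implements these concepts in Java, but they can be (and are) implemented in many other languages.

Without further ado, here is a working implementation for your case: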

# User needs to type "gem install simplekit" on the command line to
# snag a copy of this library from the public gem repository
require 'simplekit' # contains a PriorityQueue implementation

# define an "event notice" structure which stores the tag for an event method,
# the time the event should occur, and what arguments are to be passed to it.
EVENT_NOTICE = Struct.new(:event, :time, :args) {
  include Comparable
  def <=>(other)    # define a comparison ordering based on my time vs other's time
    return time - other.time  # comparison of two times is positive, zero, or negative
  end
}

@pq = PriorityQueue.new    # @ makes globally shared (not really, but close enough for illustration purposes)
@num_tasks = 0      # number of tasks currently active
@clock = 0          # current time in the simulation

# define a report method
def report()
  puts "#{@clock}: #{@num_tasks}"  # print current simulation time & num_tasks
end

# define an event for starting a task, that increments the @num_tasks counter
# and uses the @clock and task duration to schedule when this task will end
# by pushing a suitable EVENT_NOTICE onto the priority queue.
def start_task(current_task)
  @num_tasks += 1
  @pq.push(EVENT_NOTICE.new(:end_task, @clock + current_task.duration, nil))
  report()
end

# define an event for ending a task, which decrements the counter
def end_task(_)   # _ means ignore any argument
  @num_tasks -= 1
  report()
end

# define a task as a suitable structure containing start time and duration
task = Struct.new(:start, :duration)

# Create a set of three tasks.  I've wired them in, but they could
# be read in or generated dynamically.
task_set = [task.new(2, 5), task.new(4, 23), task.new(7, 4)]

# Add each of the task's start_task event to the priority queue, ordered
# by time of occurrence (as specified in EVENT_NOTICE)
for t in task_set
  @pq.push(EVENT_NOTICE.new(:start_task, t.start, t))
end

report()
# Keep popping EVENT_NOTICE's off the priority queue until you run out. For
# each notice, update the @clock and invoke the event contained in the notice
until @pq.empty?
  current_event = @pq.pop
  @clock = current_event.time
  send(current_event.event, current_event.args)
end

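I used Ruby because, although it looks like pseudocode, it actually runs, printing the clock time and the number of active tasks at each event.

C implementation

I finally found some time to dust off 20-year-old skills and implement this in C. The structure is very similar to the Ruby version, but there are many more details to manage. I have factored it into a model, a simulation engine, and a heap, to show that the executive loop is separate from the specifics of any particular model. Here is the model implementation itself, which illustrates the "events are functions" orientation of building a model.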

model.c

#include <stdio.h>
#include <stdlib.h>
#include "sim_engine.h"

// define a task as a suitable structure containing start time and duration
typedef struct {
  double start;
  double duration;
} task;

// stamp out new tasks on demand
task* new_task(double start, double duration) {
  task* t = (task*) calloc(1, sizeof(task));
  t->start = start;
  t->duration = duration;
  return t;
}

// create state variables
static int num_tasks;

// provide reporting
void report() {
  // print current simulation time & num_tasks
  printf("%8.3lf: %d\n", sim_time(), num_tasks);
}

// define an event for ending a task, which decrements the counter
void end_task(void* current_task) {
  --num_tasks;
  free(current_task);
  report();
}

// define an event for starting a task, that increments the num_tasks counter
// and uses the task duration to schedule when this task will end.
void start_task(void* current_task) {
  ++num_tasks;
  schedule(end_task, ((task*) current_task)->duration, current_task);
  report();
}

// all event graphs must supply an initialize event to kickstart the process.
void initialize() {
  num_tasks = 0;      // number of tasks currently active
  // Create an initial set of three tasks.  I've wired them in, but they could
  // be read in or generated dynamically.
  task* task_set[] = {
    new_task(2.0, 5.0), new_task(4.0, 23.0), new_task(7.0, 4.0)
  };
  // Add each of the task's start_task event to the priority queue, ordered
  // by time of occurrence.  In general, though, events can be scheduled
  // dynamically from trigger events.
  for(int i = 0; i < 3; ++i) {
    schedule(start_task, task_set[i]->start, task_set[i]);
  }
  report();
}

int main() {
  run_sim();
  return 0;
}

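Note the strong similarity in layout to the Ruby implementation. Apart from having floating-point times, the output is the same as the Ruby version's. (Ruby would also give decimal places if needed, but that was unnecessary for the trial tasks given by the OP.)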
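
The model above reports the task count at each event but does not accumulate the maximum and the time-weighted average that the question asks for. One possible way to collect them is sketched below. The names stats_t, stats_update, and stats_average are hypothetical and not part of the answer's code. The maximum is only recorded for counts that were held over a non-zero interval, so a start and an end that fall at the same instant do not produce a spurious peak, whichever order the queue returns them in.

```c
/* Hypothetical statistics accumulator; none of these names appear in model.c. */
typedef struct {
    double last_time;   /* time of the previous event */
    double area;        /* running sum of count * interval length */
    int    max_active;  /* largest count held over a non-zero interval */
} stats_t;

/* Call first thing in every event, BEFORE the event changes the count. */
void stats_update(stats_t *s, double now, int active_before) {
    double dt = now - s->last_time;
    if (dt > 0.0) {     /* zero-length intervals contribute nothing */
        s->area += active_before * dt;
        if (active_before > s->max_active) s->max_active = active_before;
    }
    s->last_time = now;
}

/* Time-weighted average from time 0 up to the last event seen. */
double stats_average(const stats_t *s) {
    return s->last_time > 0.0 ? s->area / s->last_time : 0.0;
}
```

With a file-scope `static stats_t stats;` in model.c (zero-initialized), the first statement of both start_task and end_task would be `stats_update(&stats, sim_time(), num_tasks);`. For the three sample tasks this yields a maximum of 2 and an average of 32/27.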

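Next come the simulation engine header and implementation. Note that the engine is designed to insulate the model builder from using the priority queue directly. The details are handled by the schedule() front end, which puts things onto the event list, and by the executive loop, which extracts and runs them at the correct points in time.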

sim_engine.h

typedef void (*event_p)(void*);

void initialize();
void schedule(event_p event, double delay, void* args);
void run_sim();
double sim_time();

sim_engine.c

#include <stdlib.h>
#include "sim_engine.h"
#include "heap.h"

typedef struct {
  double time;
  event_p event;
  void* args;
} event_notice;

static heap_t *event_list;
static double sim_clock;

// return the current simulated time on demand
double sim_time() {
  return sim_clock;
}

// schedule the specified event to occur after the specified delay, with args
void schedule(event_p event, double delay, void* args) {
  event_notice* en = (event_notice*) calloc(1, sizeof(event_notice));
  en->time = sim_clock + delay;
  en->event = event;
  en->args = args;
  push(event_list, en->time, en);
}

// simulation executive loop.
void run_sim() {
  event_list = (heap_t *) calloc(1, sizeof(heap_t));
  sim_clock = 0.0;     // initialize time in the simulation

  initialize();

  // Keep popping event_notice's off the priority queue until you run out. For
  // each notice, update the clock, invoke the event contained in the notice,
  // and cleanup.
  while(event_list->len > 0) {
    event_notice* current_event = pop(event_list);
    sim_clock = current_event->time;
    current_event->event(current_event->args);
    free(current_event);
  }
}

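Finally, the priority queue implementation is lifted wholesale from Rosetta Code, refactored, and switched to use void* for the data payload rather than strings.

heap.h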

typedef struct {
    double priority;
    void *data;
} node_t;

typedef struct {
    node_t *nodes;
    int len;
    int size;
} heap_t;

void push(heap_t *h, double priority, void *data);
void *pop(heap_t *h);

heap.c

#include <stdlib.h>
#include "heap.h"

void push(heap_t *h, double priority, void *data) {
    if (h->len + 1 >= h->size) {
        h->size = h->size ? h->size * 2 : 4;
        h->nodes = (node_t *)realloc(h->nodes, h->size * sizeof (node_t));
    }
    int i = h->len + 1;
    int j = i / 2;
    while (i > 1 && h->nodes[j].priority > priority) {
        h->nodes[i] = h->nodes[j];
        i = j;
        j = j / 2;
    }
    h->nodes[i].priority = priority;
    h->nodes[i].data = data;
    h->len++;
}

void *pop(heap_t *h) {
    int i, j, k;
    if (!h->len) {
        return NULL;
    }
    void *data = h->nodes[1].data;

    h->nodes[1] = h->nodes[h->len];

    h->len--;

    i = 1;
    while (i!=h->len+1) {
        k = h->len+1;
        j = 2 * i;
        if (j <= h->len && h->nodes[j].priority < h->nodes[k].priority) {
            k = j;
        }
        if (j + 1 <= h->len && h->nodes[j + 1].priority < h->nodes[k].priority) {
            k = j + 1;
        }
        h->nodes[i] = h->nodes[k];
        i = k;
    }
    return data;
}

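Bottom line: this event-scheduling approach is very flexible, and it is quite straightforward to implement given implementations of a priority queue and a simulation engine. As you can see, the engine itself is really quite trivial.

【Comments】:

【Solution 2】:

This problem can be solved using two heaps, one containing the start times and the other containing the end times. When reading the tasks, add the start time and end time to the two heaps. Then the algorithm goes like this: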

number_of_tasks = 0

while start_heap not empty
    if min_start_time < min_end_time
       pop min_start_time
       number_of_tasks += 1    
    else if min_end_time < min_start_time
       pop min_end_time
       number_of_tasks -= 1
    else 
       pop min_start_time
       pop min_end_time

while end_heap not empty
       pop min_end_time
       number_of_tasks -= 1

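【Comments】:

Just to make sure I understand correctly: min_start_time and min_end_time are the root elements of the heaps, right?
@fereydoon318 Yes, that's right. A heap is usually implemented with an array, so it is easy to peek at the root element because it is the first element of the array.
How do I calculate the average number of tasks being executed with this approach? Is that part an outer loop that runs over the entire duration?
@fereydoon318 Consider the calculation [0*(2-0) + ... To do that calculation you need three pieces of information: the number of running tasks (before number_of_tasks is updated), the time of the current event, and the time of the previous event. So you need two additional variables, one to hold the sum and another to hold the time of the previous event. Then, if you pop from start_heap, the calculation is sum += number_of_tasks * (min_start_time - previous_time). It is similar for end_heap. When popping from both heaps (the else clause), the sum does not need to be updated, provided previous_time is left unchanged as well, because the task count stays the same across that instant.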
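
Putting the pseudocode and the comment above together, a minimal C sketch of the two-heap approach might look like the following. It is an illustration under stated assumptions rather than the answerer's own code: the names minheap, mh_push, mh_pop, result, and simulate are made up here, time is assumed to start at 0, and every duration is assumed to be positive.

```c
#include <stdlib.h>

/* Illustrative 1-based array min-heap of times; all names here are made up. */
typedef struct { long *a; int len, cap; } minheap;

static void mh_push(minheap *h, long v) {
    if (h->len + 1 >= h->cap) {                 /* grow before index len+1 is used */
        h->cap = h->cap ? h->cap * 2 : 8;
        h->a = realloc(h->a, h->cap * sizeof *h->a);
        if (!h->a) abort();
    }
    int i = ++h->len;
    while (i > 1 && h->a[i / 2] > v) {          /* sift up */
        h->a[i] = h->a[i / 2];
        i /= 2;
    }
    h->a[i] = v;
}

static long mh_pop(minheap *h) {                /* caller ensures len > 0 */
    long top = h->a[1], last = h->a[h->len--];
    int i = 1;
    for (;;) {                                  /* sift the old last element down */
        int c = 2 * i;
        if (c > h->len) break;
        if (c + 1 <= h->len && h->a[c + 1] < h->a[c]) c++;
        if (h->a[c] >= last) break;
        h->a[i] = h->a[c];
        i = c;
    }
    h->a[i] = last;
    return top;
}

typedef struct { int max_active; double average; } result;

/* Assumes time starts at 0 and every duration is positive. */
result simulate(const long *start, const long *dur, int n) {
    minheap sh = {0}, eh = {0};
    for (int i = 0; i < n; ++i) {
        mh_push(&sh, start[i]);
        mh_push(&eh, start[i] + dur[i]);
    }
    long prev = 0, sum = 0;     /* previous event time; sum of count * interval */
    int active = 0, max_active = 0;
    while (eh.len > 0) {
        if (sh.len > 0 && sh.a[1] == eh.a[1]) { /* one task starts as another ends */
            mh_pop(&sh);
            mh_pop(&eh);
            continue;                           /* count, sum and prev unchanged */
        }
        int is_start = sh.len > 0 && sh.a[1] < eh.a[1];
        long now = is_start ? mh_pop(&sh) : mh_pop(&eh);
        sum += active * (now - prev);           /* count held since previous event */
        prev = now;
        active += is_start ? 1 : -1;
        if (active > max_active) max_active = active;
    }
    result r = { max_active, prev > 0 ? (double) sum / prev : 0.0 };
    free(sh.a);
    free(eh.a);
    return r;
}
```

Calling simulate with start times {2, 4, 7} and durations {5, 23, 4} gives max_active == 2 and average == 32/27, matching the figures in the question. The loop runs until end_heap is empty, which folds the pseudocode's two loops into one, since every start time is consumed before the last end time.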