Array of structs with dynamic allocation runs very slow in C in comparison to Python

【Question title】: Array of structs with dynamic allocation runs very slow in C in comparison to Python
【Posted】: 2019-09-30 00:44:49
【Question description】:

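I "glued" together a small C program (with the help of other SO users) that maps string labels to integer labels in an edge-list data structure. For example, given the input file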

Mike Andrew
Mike Jane
John Jane

the program outputs

1 2
1 3
4 3

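However, I need to map huge edge-list files, and unfortunately the program runs very slowly compared with the Python alternative. I have pasted both programs, in C and in Python, below. I would appreciate any pointers on how to improve the speed of the C program.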

#include <stdio.h>
#include <stdlib.h>

// Initial number of maximal lines in a file
enum { MAXL = 200};

typedef struct {
    unsigned int first;
    unsigned int second;
} edge;

typedef struct {
    unsigned int hashed;
     char **map;
} hash;


int insertInMap(hash *map, char *entry)
{
  int i =0;
  for (i=0;i<map->hashed;i++)
  {
    if (strcmp(map->map[i],entry) == 0)
    return i+1;
  }
  /* Warning no boundary check is added */
  map->map[map->hashed++] = strdup(entry);   
  return map->hashed;
}

int main() {
  FILE *fp = NULL;
  char node1[30];
  char node2[30];
  int idx = 0;
  int i, n = 0, maxl = MAXL;

  edge *edges;
  hash map;

  edges = malloc(MAXL * sizeof(edge));
  map.map = malloc(MAXL * sizeof(char*));
  map.hashed = 0;

  fp = fopen("./test.txt", "r");

  while (fscanf(fp, "%s %s", &node1, &node2) == 2) {
    if (++n == maxl) { /* if limit reached, realloc lines  */
      void *tmp = realloc (edges, (maxl + 40) * sizeof *edges);
      void *tmp1 = realloc (map.map, (maxl + 80) * sizeof(char*));
      if (!tmp) {     /* validate realloc succeeded */
        fprintf (stderr, "error: realloc - virtual memory exhausted.\n");
        break;      /* on failure, exit with existing data */
      }
      edges = tmp;    /* assign reallocated block to lines */

      map.map = tmp1;
      maxl += 40;     /* update maxl to reflect new size */
    }
    edges[idx].first = insertInMap(&map,node1);
    edges[idx].second = insertInMap(&map,node2);
    idx++;
  }

  fclose(fp);

  for (int i = 0; i < idx; i++) {
    printf("%d -- %d\n", edges[i].first, edges[i].second);
  }


  free(edges);

  return 0;
}

The corresponding Python alternative:

import fileinput

i = 0
cui2int = {}

for line in fileinput.input():    
    (cui1, cui2) = line.split()
    if cui1 in cui2int:
        int1 = cui2int[cui1]
    else:
        i += 1
        cui2int[cui1] = i
        int1 = i

    if cui2 in cui2int:
        int2 = cui2int[cui2]
    else:
        i += 1
        cui2int[cui2] = i
        int2 = i

    print(int1, int2)

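Edit and addition

Below is the modified code, which uses the GLib hash table implementation. Performance has improved, but unfortunately the output is still wrong. It should be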

1 2
1 3
4 3

instead of

0 0
0 1
1 1

Could someone please take a look?

#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
#include <stdint.h>

int main() {
  GHashTable *table;
  table = g_hash_table_new(g_int_hash, g_int_equal);

  FILE *fp = NULL;
  char node1[30];
  char node2[30];

  fp = fopen("./test.txt", "r");
  int i = 0;
  while (fscanf(fp, "%s %s", &node1, &node2) == 2) {
    char *key1 = malloc(sizeof(char)*1024);
    char *key2 = malloc(sizeof(char)*1024);
    uint32_t* value = (uint32_t *)malloc(sizeof(uint32_t));
    key1 = g_strdup(node1);
    key2 = g_strdup(node2);
    *value = i;

    uint32_t *x;
    if (g_hash_table_contains(table, key1)) {
      x = (uint32_t *)g_hash_table_lookup(table, key1);
    } else {
      i++;
      g_hash_table_insert(table, (gpointer)key1, (gpointer)value);
      x = (uint32_t *)value;
    }

    uint32_t *y;
    if (g_hash_table_contains(table, key2)) {
      y = (uint32_t *)g_hash_table_lookup(table, key2);
    } else {
      g_hash_table_insert(table, (gpointer)key2, (gpointer)value);
      y = (uint32_t *)value;
    }
    printf("%d -- %d\n", *x, *y);
  }

  fclose(fp);

  g_hash_table_destroy(table);
  table = NULL;
  return 0;
}

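【Question discussion】:

Because the Python version does not do a linear search for every entry the way the C version does. Despite names like "hash" and "map", the C version is just an array of strings that is searched linearly. That will be slow.
You use the word "hash" in the program, but no hashing actually takes place, only a very slow linear search. You need to find out what "hash" means and implement a hash table.
@m.raynal Sure, but use a hash first.
The code posted for the first C example causes a lot of serious warnings. When compiling, always enable the warnings, then fix them. (For gcc, at a minimum use: -Wall -Wextra -Wconversion -pedantic -std=gnu11.) Note: other compilers use different options to produce the same results.
OT: regarding: uint32_t* value = (uint32_t *)malloc(sizeof(uint32_t)); 1) In C, the heap allocation functions malloc(), calloc(), and realloc() return type void*, which can be assigned to any object pointer. Casting just clutters the code, making it harder to understand, debug, etc. 2) Always check (!=NULL) the returned value to make sure the operation succeeded.

Tags: python c graph-theory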

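【Solution 1】:

Your "hashing" operation in C behaves more like a linked list, with linear-time insertion and lookup. Python's dictionaries, on the other hand, are industrial strength, with O(1) average-case insertion and lookup (the in operator). If you write a hash map from scratch in C, you will need to put a lot of theory into practice before you start approaching the performance of Python's implementation.

In my opinion, the best option is to write the code in C++ if at all possible and use unordered_map. That gives you the best of both worlds: all the work has already been done for you, yet you do not have to compromise on performance.

If you choose (or are stuck with) C, there are plenty of resources on the internet, but I am reluctant to post any links here because I cannot vouch for their quality. It should be an educational exercise.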
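
As a starting point for that exercise, here is a minimal sketch of a string-to-label map in plain C that uses open addressing with linear probing. It is not taken from the question or from any library: the names `label_map`, `slot`, `CAPACITY`, `label_map_init`, `label_map_get` and `label_map_free` are made up for illustration, FNV-1a is just one commonly used string hash, and the capacity is fixed (a production table would grow once it fills up).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed capacity; must be a power of two so that
 * "hash & (CAPACITY - 1)" is a valid slot index. */
enum { CAPACITY = 1 << 20 };

typedef struct {
    char *key;       /* NULL marks an empty slot */
    unsigned label;  /* 1-based integer label assigned at insertion */
} slot;

typedef struct {
    slot *slots;
    unsigned used;   /* number of distinct keys stored so far */
} label_map;

/* 32-bit FNV-1a string hash. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Allocates the slot array and marks every slot empty.
 * Returns 0 on success, -1 if the allocation fails. */
int label_map_init(label_map *m)
{
    m->used = 0;
    m->slots = malloc(CAPACITY * sizeof *m->slots);
    if (m->slots == NULL)
        return -1;
    for (size_t i = 0; i < CAPACITY; i++)
        m->slots[i].key = NULL;
    return 0;
}

/* Returns the label of key, inserting it with the next free label if it
 * is not present yet. Returns 0 if the table is full or malloc fails. */
unsigned label_map_get(label_map *m, const char *key)
{
    assert(m != NULL && m->slots != NULL && key != NULL);
    uint32_t i = fnv1a(key) & (uint32_t)(CAPACITY - 1);
    for (unsigned probes = 0; probes < CAPACITY; probes++) {
        slot *s = &m->slots[i];
        if (s->key == NULL) {                 /* empty slot: insert here */
            size_t len = strlen(key) + 1;
            s->key = malloc(len);
            if (s->key == NULL)
                return 0;
            memcpy(s->key, key, len);
            s->label = ++m->used;
            return s->label;
        }
        if (strcmp(s->key, key) == 0)         /* key already stored */
            return s->label;
        i = (i + 1) & (uint32_t)(CAPACITY - 1);  /* probe the next slot */
    }
    return 0;                                 /* every slot is occupied */
}

/* Releases all stored keys and the slot array. */
void label_map_free(label_map *m)
{
    for (size_t i = 0; i < CAPACITY; i++)
        free(m->slots[i].key);
    free(m->slots);
    m->slots = NULL;
}
```

In the question's reading loop, the two `insertInMap` calls would then become calls such as `label_map_get(&map, node1)`. As long as the table stays sparsely filled, each call is expected to touch only a few slots instead of scanning every label seen so far.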

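【Discussion】:

【Solution 2】:

The two programs use fundamentally different data structures with different time complexities. The Python program uses a dictionary, which is a highly tuned hash table with O(1) amortized performance for lookup and deletion.

So the Python program runs with O(number of words) asymptotic complexity.

Now, the C program you tried to create is really just an array of key-value pairs. Inserting or retrieving a key here takes O(size of the array), because you may have to traverse the array all the way to the end to find a match.

If you do the math, that comes out to O((number of words)²): each of the n words may trigger a scan over up to n stored entries.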
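
The arithmetic behind that bound can be written out as follows. This is a worst-case sketch that assumes all n looked-up labels are distinct, so that every lookup scans the whole array before appending; the symbols T_linear and T_hash are introduced here only for illustration.

```latex
% Worst case: all n lookups are for distinct labels, so lookup number k
% (k = 0, 1, ..., n-1) compares against all k stored labels before inserting.
\[
  T_{\text{linear}}(n) \;=\; \sum_{k=0}^{n-1} k \;=\; \frac{n(n-1)}{2} \;=\; O(n^2),
  \qquad
  T_{\text{hash}}(n) \;=\; n \cdot O(1) \;=\; O(n) \ \text{(expected)}.
\]
```

For example, with n = 10^6 distinct labels the linear version performs roughly 5 × 10^11 string comparisons, while a hash table performs on the order of 10^6 expected-constant-time lookups.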

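C++ has a built-in hash table implementation called unordered_map, which you can use if you have no problem switching to C++. Otherwise, take a look at this question on SO to learn how to write your own hash table in C: What is a hash table and how do you make it in C?

【Discussion】:

【Solution 3】:

The problem with your code is that, despite the names, it is not a working hash table. You walk through the map with a very slow linear search. What you should do instead:

Give the hash table a fixed size. Avoid any realloc-based solution.
Come up with a hash function to determine the table index. There should be plenty of code examples online that work on strings.
Implement a way to store and check entries at an index. This can be done either by storing in the next available table index or by implementing "chaining", where each index is a linked list, and so on.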
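
The three steps above can be sketched roughly as follows, using the chaining variant. This is only an illustration under stated assumptions, not the answerer's code: the names `name_table`, `name_node`, `NBUCKETS`, `table_new`, `table_id` and `table_free` are hypothetical, the bucket count is an arbitrary fixed value, and djb2 is merely one of the simple string hashes commonly found online.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: a fixed table size that is never reallocated (arbitrary value). */
enum { NBUCKETS = 65536 };

typedef struct name_node {
    struct name_node *next;  /* next entry in the same bucket's chain */
    unsigned id;             /* 1-based integer label */
    char name[];             /* C99 flexible array member holding the key */
} name_node;

typedef struct {
    name_node *buckets[NBUCKETS];
    unsigned count;          /* number of distinct names stored */
} name_table;

/* Step 2: a string hash (djb2) used to pick the bucket index. */
unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    for (; *s != '\0'; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Allocates a table with all buckets empty; returns NULL on failure. */
name_table *table_new(void)
{
    name_table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    for (size_t i = 0; i < NBUCKETS; i++)
        t->buckets[i] = NULL;
    t->count = 0;
    return t;
}

/* Step 3: chaining. Returns the id of name, adding it to the front of its
 * bucket's list with the next free id if it is new. Returns 0 if malloc fails. */
unsigned table_id(name_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    name_node **head = &t->buckets[djb2(name) % NBUCKETS];
    for (name_node *n = *head; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n->id;                  /* found in this bucket's chain */

    size_t len = strlen(name) + 1;
    name_node *n = malloc(sizeof *n + len);
    if (n == NULL)
        return 0;
    memcpy(n->name, name, len);
    n->id = ++t->count;
    n->next = *head;                       /* push onto the chain */
    *head = n;
    return n->id;
}

/* Frees every chain and the table itself. */
void table_free(name_table *t)
{
    if (t == NULL)
        return;
    for (size_t i = 0; i < NBUCKETS; i++) {
        name_node *n = t->buckets[i];
        while (n != NULL) {
            name_node *next = n->next;
            free(n);
            n = next;
        }
    }
    free(t);
}
```

Only the bucket array has a fixed size here; the chains still allocate one node per distinct name. With far more names than buckets the chains grow long and lookups slow down again, so the bucket count should be chosen with the expected number of distinct labels in mind.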

【Discussion】: