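【Posted】: 2015-05-21 00:39:59
【Question】:
I have a large static hash map of Unicode code point descriptions. Each hash value leads to a null-terminated array of pointers to elements. Here is my access function: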
extern CodepointInfo ***codepoint_table;
uint32_t fnv1a(uint32_t input) { ... } // hash function
const CodepointInfo CodepointInfo::get(uint32_t codepoint) {
    uint32_t hash = fnv1a(codepoint) % 30000; // Number of buckets of the hashmap
    CodepointInfo **bucket = codepoint_table[hash];
    // Walk the null-terminated bucket until the code point is found
    for(uint32_t i = 0; bucket[i] != nullptr; i++) {
        if(bucket[i]->codepoint == codepoint)
            return *(codepoint_table[hash][i]);
    }
    // Not in the table: return a default "unassigned" record
    return {codepoint, "unassigned", GeneralCategory::UNASSIGNED, 0, BidiClass::L, DecompositionType::NONE, nullptr, -1, nullptr, false, 0, 0, 0};
}
When I try to use it, I get a segfault, so I started debugging it with gdb and got the following output:
Breakpoint 1, nsucs::CodepointInfo::get (codepoint=0) at /home/richard/src/libnsucs/lib/codepoint_info.cc:21
21 uint32_t hash = fnv1a(codepoint) % NSUCS_CODEPOINTTABLE_NUM_BUCKETS;
(gdb) n
22 CodepointInfo **bucket = codepoint_table[hash];
(gdb) n
23 for(uint32_t i = 0; bucket[i] != nullptr; i++) {
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78cbd44 in nsucs::CodepointInfo::get (codepoint=0) at /home/richard/src/libnsucs/lib/codepoint_info.cc:23
23 for(uint32_t i = 0; bucket[i] != nullptr; i++) {
(gdb) print hash
$1 = 18805
(gdb) print codepoint_table[hash]
$2 = (nsucs::CodepointInfo **) 0x7ffff7d61d00 <nsucs::codepoint_table_fragment_18805>
(gdb) print bucket
$3 = (nsucs::CodepointInfo **) 0x0
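Shouldn't codepoint_table[hash] and bucket be equal after one has been assigned to the other? When I replace the uses of bucket with codepoint_table[hash], it still segfaults, but print codepoint_table[hash][i] in gdb produces the correct result.
What is going on here? The binary is not optimized at all.
Edit:
The definition of the CodepointInfo struct: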
struct CodepointInfo {
    static const CodepointInfo get(uint32_t codepoint);
    uint32_t codepoint;
    const char *name;
    uint32_t general_category;
    uint8_t canonical_combining_class;
    BidiClass::Enum bidi_class;
    DecompositionType::Enum decomposition_type;
    uint32_t *decomposition_mapping;
    int8_t decimal_value;
    const char *numeric_value;
    bool bidi_mirrored;
    uint32_t simple_uppercase_mapping;
    uint32_t simple_lowercase_mapping;
    uint32_t simple_titlecase_mapping;
};
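For readers who want to reproduce the layout described in the question, here is a minimal sketch of a table whose buckets are null-terminated arrays of pointers. It is not the asker's code: `DemoInfo`, `demo_table`, `demo_hash`, and `demo_find` are hypothetical names, the hash is a plain modulo standing in for `fnv1a`, and the table is defined as an array because the question does not show how the real `codepoint_table` is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, trimmed-down record; the real CodepointInfo has more fields.
struct DemoInfo {
    uint32_t codepoint;
    const char *name;
};

// Three illustrative records.
static DemoInfo info_a{0x0041, "LATIN CAPITAL LETTER A"};
static DemoInfo info_e{0x0045, "LATIN CAPITAL LETTER E"};
static DemoInfo info_b{0x0042, "LATIN CAPITAL LETTER B"};

constexpr uint32_t kDemoBuckets = 4;

// Stand-in hash (NOT FNV-1a): the code point modulo the bucket count.
constexpr uint32_t demo_hash(uint32_t codepoint) {
    return codepoint % kDemoBuckets;
}

// Each bucket is a null-terminated array of pointers to records.
// 0x41 and 0x45 both hash to bucket 1, 0x42 hashes to bucket 2.
static DemoInfo *bucket0[] = {nullptr};
static DemoInfo *bucket1[] = {&info_a, &info_e, nullptr};
static DemoInfo *bucket2[] = {&info_b, nullptr};
static DemoInfo *bucket3[] = {nullptr};

// The table is an array with one bucket pointer per hash value.
static DemoInfo **demo_table[kDemoBuckets] = {bucket0, bucket1, bucket2, bucket3};

// Returns the matching record, or nullptr if the code point is not stored.
const DemoInfo *demo_find(uint32_t codepoint) {
    DemoInfo **bucket = demo_table[demo_hash(codepoint)];
    for (std::size_t i = 0; bucket[i] != nullptr; ++i) {
        if (bucket[i]->codepoint == codepoint) {
            return bucket[i];
        }
    }
    return nullptr;
}
```

In this sketch an empty bucket still holds a terminating `nullptr`, so the loop condition `bucket[i] != nullptr` is safe for every hash value as long as the bucket pointer itself is valid.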
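【Comments】:
-
Does CodepointInfo have any overloaded assignment operators?
-
No, there are no operator overloads. I'll add the definition to the question.
-
You never change bucket, right? If so, declare it as CodepointInfo ** const.
-
Also, add assert(nullptr != codepoint_table[hash]) before the assignment.
-
Long shot: is there any other code that might change codepoint_table in another thread?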
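The two checks suggested in the comments (a `const` local pointer and an `assert` before the assignment) could be combined roughly as follows. This is only a sketch under assumptions: `DemoInfo` and `checked_find` are hypothetical names, the table is passed in as a parameter instead of being a global, and a plain modulo stands in for `fnv1a`. The last assert is an extra check added here for illustration, not something the commenters proposed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, trimmed-down record; the real CodepointInfo has more fields.
struct DemoInfo {
    uint32_t codepoint;
    const char *name;
};

// Looks up `codepoint` in `table`, which holds `num_buckets` pointers to
// null-terminated buckets. Returns nullptr if the code point is not stored.
const DemoInfo *checked_find(DemoInfo ***table, uint32_t num_buckets,
                             uint32_t codepoint) {
    assert(table != nullptr);
    assert(num_buckets > 0);

    // Stand-in for fnv1a(codepoint) % N in the question.
    const uint32_t hash = codepoint % num_buckets;

    // Check the table entry before copying it.
    assert(nullptr != table[hash]);

    // Top-level const: the local pointer cannot be reassigned afterwards.
    DemoInfo **const bucket = table[hash];

    // Extra check: the copy should still equal the table entry. If it does
    // not, something else is writing to the table between the two reads.
    assert(bucket == table[hash]);

    for (std::size_t i = 0; bucket[i] != nullptr; ++i) {
        if (bucket[i]->codepoint == codepoint) {
            return bucket[i];
        }
    }
    return nullptr;
}
```

These assertions only fire in builds compiled without `NDEBUG`, and they do not fix anything by themselves; they only help narrow down where the value seen by the compiled code starts to differ from what gdb prints.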