[Algorithms] Tree Algorithms: B-Trees

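A B-tree is a balanced search tree designed specifically to be stored on disk. Because disk operations are far slower than random-access memory, analyzing a B-tree's performance means counting not only the computation time spent on dynamic-set operations but also the number of disk accesses performed. B-trees are similar to red-black trees (covered in the next post) but do a better job of reducing the number of disk I/O operations. Many database systems use B-trees or B-tree variants to store information. Imagine how much data a B-tree of height 2 can hold when every node has 1000 keys and 1001 children: more than one billion keys, while only a single node needs to be kept in memory and the other nodes are read from disk when they are needed.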
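To put a number on that capacity, here is a small sketch in C. The helper name `full_btree_keys` is hypothetical and not part of the implementation in this post; it assumes every node is completely full and that the root is at depth 0, and it ignores integer overflow for very large inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of keys in a completely full B-tree in
 * which every node holds keys_per_node keys (and therefore has
 * keys_per_node + 1 children) and whose leaves sit at depth `height`. */
uint64_t full_btree_keys(uint64_t keys_per_node, unsigned height)
{
    assert(keys_per_node >= 1);

    uint64_t fanout = keys_per_node + 1; /* children per internal node */
    uint64_t nodes_at_depth = 1;         /* only the root at depth 0 */
    uint64_t total_nodes = 0;

    /* Sum the node counts level by level: 1 + fanout + fanout^2 + ... */
    for (unsigned depth = 0; depth <= height; ++depth) {
        total_nodes += nodes_at_depth;
        nodes_at_depth *= fanout;
    }
    return total_nodes * keys_per_node;
}
```

With 1000 keys per node and height 2, the tree has 1 + 1001 + 1001² = 1,003,003 nodes, so `full_btree_keys(1000, 2)` evaluates to 1,003,003,000 keys.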

B-trees compared with red-black trees:

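A B-tree node can have many children, from a handful to several thousand, whereas a red-black tree node has only a left and a right child. A B-tree and a red-black tree with n nodes both have height O(lg n), but because a B-tree branches much more widely, its height is usually considerably smaller than that of a red-black tree.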
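The height comparison can be made concrete with the standard bounds from Introduction to Algorithms. They are stated here in terms of the number of keys n >= 1 and the minimum degree t >= 2 (the fixed integer this post calls T in the definition of a B-tree):

```latex
h_{\text{B-tree}} \le \log_t \frac{n+1}{2}
\qquad\text{whereas}\qquad
h_{\text{red-black}} \le 2\lg(n+1)
```

For example, with t = 1001 and n = 10^9 keys, the first bound gives a height of at most 2, while the red-black bound is roughly 60.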

So what exactly is a B-tree? Here is the definition:

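1) Every node has the following fields:
   A) keyNum, the number of keys stored in the node.
   B) The keyNum keys themselves, key[0 ... keyNum - 1], stored in non-decreasing order.
   C) isLeaf, a flag indicating whether the node is a leaf or an internal node.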

2) Every internal node also contains keyNum + 1 pointers to its children, child[i] (i >= 0 && i <= keyNum).

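3) The keys key[i] separate the ranges of keys stored in the subtrees: key[i] is greater than or equal to every key in the subtree to its left (child[i]) and less than or equal to every key in the subtree to its right (child[i + 1]).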

4) All leaves have the same depth, which is the height h of the tree.

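5) There is a lower and an upper bound on the number of keys a node can contain. These bounds are expressed in terms of a fixed integer T >= 2 called the minimum degree of the B-tree.
   A) Every node other than the root must have at least T - 1 keys, so every internal node other than the root has at least T children. If the tree is nonempty, the root must contain at least one key.
   B) Every node may contain at most 2T - 1 keys, so an internal node has at most 2T children. A node is said to be full when it contains exactly 2T - 1 keys.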

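The simplest B-tree is the one with T = 2. Every internal node then has 2, 3, or 4 children, and such a B-tree is also called a 2-3-4 tree. In practice, much larger values of T are used.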
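The per-node rules in the definition can be sketched as a small C check. This is only an illustration under stated assumptions: `SketchNode`, `SKETCH_T`, `SKETCH_N`, `sketch_node_ok`, and `sketch_node_full` are hypothetical names chosen for this example, not the types or functions of the implementation in this post. The field names follow the definition (keyNum, key, child, isLeaf), and keys are assumed to be plain `int` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_T 2                  /* minimum degree T (assumed value) */
#define SKETCH_N (2 * SKETCH_T - 1) /* maximum number of keys per node: 2T - 1 */

/* Hypothetical node layout following items 1) and 2) of the definition. */
typedef struct SketchNode {
    int keyNum;                             /* number of keys stored */
    int key[SKETCH_N];                      /* keys in non-decreasing order */
    struct SketchNode *child[SKETCH_N + 1]; /* used only by internal nodes */
    bool isLeaf;                            /* true for a leaf node */
} SketchNode;

/* Checks the rules that can be verified on one node in isolation:
 * the key-count bounds from item 5) and the ordering from item 1B). */
bool sketch_node_ok(const SketchNode *node, bool is_root)
{
    assert(node != NULL);

    /* The root of a nonempty tree needs only one key; others need T - 1. */
    int min_keys = is_root ? 1 : SKETCH_T - 1;
    if (node->keyNum < min_keys || node->keyNum > SKETCH_N)
        return false;

    for (int i = 1; i < node->keyNum; ++i)
        if (node->key[i - 1] > node->key[i])
            return false;
    return true;
}

/* A node is full when it holds exactly 2T - 1 keys. */
bool sketch_node_full(const SketchNode *node)
{
    assert(node != NULL);
    return node->keyNum == SKETCH_N;
}
```

Rules 3) and 4) relate a node to its subtrees and to the other leaves, so checking them would need a recursive walk over the whole tree, which this sketch leaves out.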

Below is my implementation of a B-tree in which T defaults to 2:

// Minimum degree of the B-tree
#define BTree_T 2

// Maximum number of keys per node: BTree_N = 2 * BTree_T - 1
#define BTree_N (BTree_T * 2 - 1)

pos);

Now for the implementation of the interface:

#define max(a, b) (((a) > (b)) ? (a) : (b))

//#define DEBUG_TREE

#ifdef DEBUG_TREE
#define debug_print(fmt, ...) printf(fmt, ## __VA_ARGS__)
#else
#define debug_print(fmt, ...)
#endif

// Simulate writing a node to disk

void disk_write(BTNode* node)
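The rest of the implementation is not reproduced here, so as an illustration of how a lookup descends through the nodes, here is a minimal sketch of the standard B-tree search procedure from Introduction to Algorithms. Everything in it is an assumption made for this example: `DemoNode`, `DEMO_T`, `DEMO_N`, `demo_disk_read`, `demo_disk_reads`, and `demo_search` are hypothetical names, not the `BTNode`, `disk_write`, or `BTree_search` of this post, and the simulated disk read merely counts accesses because the nodes already live in memory.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_T 2                /* minimum degree (assumed value) */
#define DEMO_N (2 * DEMO_T - 1) /* maximum number of keys per node */

/* Hypothetical node type for this sketch only. */
typedef struct DemoNode {
    int keynum;                         /* number of keys stored */
    int key[DEMO_N];                    /* keys in non-decreasing order */
    struct DemoNode *child[DEMO_N + 1]; /* children of an internal node */
    int isLeaf;                         /* nonzero for a leaf */
} DemoNode;

/* Counts simulated disk reads so the I/O cost of a search is visible. */
int demo_disk_reads = 0;

/* Stand-in for a disk read: only records that an access happened. */
void demo_disk_read(DemoNode *node)
{
    (void)node;
    ++demo_disk_reads;
}

/* Returns the node containing `key` and stores the key's index in *pos,
 * or returns NULL (leaving *pos unchanged) if the key is absent. */
DemoNode *demo_search(DemoNode *node, int key, int *pos)
{
    assert(pos != NULL);

    while (node != NULL) {
        /* Find the first key that is not smaller than the target. */
        int i = 0;
        while (i < node->keynum && key > node->key[i])
            ++i;

        if (i < node->keynum && key == node->key[i]) {
            *pos = i;
            return node;
        }
        if (node->isLeaf)
            return NULL;     /* nowhere left to descend */

        node = node->child[i]; /* subtree between key[i - 1] and key[i] */
        demo_disk_read(node);  /* one simulated disk access per level */
    }
    return NULL;
}
```

Each loop iteration handles one level of the tree, so the number of simulated disk reads is bounded by the height of the tree, which is the property that makes a large T attractive.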

Test code:

#include <stdio.h>    // printf

//==================================================================
//                    Test the B-tree
//==================================================================
void test_BTree_search(BTree tree, int key)
{
    int pos = -1;
    BTNode* node = BTree_search(tree, key, &pos);
    if (node) {
        printf("Found key %c in a %s node (holding %d keys) at index %d\n",
            key, node->isLeaf ? "leaf" : "non-leaf",
            node->keynum, pos);
    }
    else {
        printf("Key %c was not found in the tree\n", key);
    }
}

void test_BTree_remove(BTree* tree, int key)
{
    printf("\nRemoving key %c \n", key);
    BTree_remove(tree, key);
    BTree_print(*tree);
    printf("\n");
}

void test_btree()
{
    int array[] = {
        'G', 'M', 'P', 'X', 'A', 'C', 'D', 'E', 'J', 'K',
        //'N', 'O', 'R', 'S', 'T', 'U', 'V', 'Y', 'Z', 'F'
    };
    // Derive the element count from the initializer; C does not allow
    // initializing an array whose size is a const int variable.
    const int length = sizeof(array) / sizeof(array[0]);

    BTree tree = NULL;
    BTNode* node = NULL;
    int pos = -1;
    int key1 = 'R';        // not in the tree (its row above is commented out).
    int key2 = 'B';        // not in the tree.

    // Create
    BTree_create(&tree, array, length);

    printf("\n=== Create B-tree ===\n");
    BTree_print(tree);
    printf("\n");

    // Search
    test_BTree_search(tree, key1);
    printf("\n");
    test_BTree_search(tree, key2);

    // Insert a key
    printf("\nInserting key %c \n", key2);
    BTree_insert(&tree, key2);
    BTree_print(tree);
    printf("\n");

    test_BTree_search(tree, key2);

    // Remove keys
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'M';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'E';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'G';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'A';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'D';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'K';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'P';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'J';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'C';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    key2 = 'X';
    test_BTree_remove(&tree, key2);
    test_BTree_search(tree, key2);

    // Destroy
    BTree_destory(&tree);
}

Test output:

=== Create B-tree ===
Level 1, 1 node : E
Level 2, 1 node : C
Level 3, 1 node : A
Level 3, 1 node : D
Level 2, 1 node : M
Level 3, 3 node : G J K
Level 3, 2 node : P X

Reading node from disk
Key R was not found in the tree

Reading node from disk
Key B was not found in the tree

Inserting key B
Reading node from disk
Writing node to disk
Level 1, 1 node : E
Level 2, 1 node : C
Level 3, 2 node : A B
Level 3, 1 node : D
Level 2, 1 node : M
Level 3, 3 node : G J K
Level 3, 2 node : P X

Reading node from disk
Found key B in a leaf node (holding 2 keys) at index 1

.......

References:

1. Introduction to Algorithms (《算法导论》)