Ch8 File Organizations & Indexes (Notes + Exercises)

Review:

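·A file organization is a way of arranging the records in a file. In our discussion of the different file organizations, we use a simple cost model that takes the number of disk page I/Os as the cost metric. (Section 8.1)
·We compared three basic file organizations (heap files, sorted files, and hashed files) on the following operations: scan, equality search, range search, insert, and delete. The choice of file organization has a direct effect on the cost of each operation.
·An index is a data structure that speeds up certain operations on a file. The operations involve a search key, which is a set of record fields (in most cases a single field). The elements of an index are called data entries. A data entry can be an actual data record, a <search-key, rid> pair, or a <search-key, rid-list> pair. A file of data records can have several indexes, each with a different search key. (Section 8.3)
·In a clustered index, the order of the records in the file matches the order of the data entries in the index. An index is called dense if it contains at least one data entry for every search key value that appears in the file; otherwise it is called sparse. An index is called a primary index if its search key includes the primary key; otherwise it is called a secondary index. A search key that contains several fields is called a composite key. (Section 8.4)
·SQL-92 does not include statements for managing index structures, so index-related commands differ from one DBMS to another. (Section 8.5)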

Exercises:

Exercise 8.1 What are the main conclusions that you can draw from the discussion of the three file organizations?
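Each of the three file organizations has its own strengths and weaknesses, and none is best for every operation. Which one to use depends on the operations that dominate the workload.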

Exercise 8.2 Consider a delete specified using an equality condition. What is the cost if no record qualifies? What is the cost if the condition is not on a key?
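(Notation from the cost model: B is the number of data pages, R the number of records per page, D the average time to read or write a page, C the average time to process a record, and H the time to apply the hash function.)
For heap files, the whole file must be scanned, so the cost is B(D + RC);
for sorted files, only the first matching record has to be located, so the cost is D log2 B + C log2 B;
for hashed files, the cost is H + D.
If no record qualifies, the cost for hashed files becomes H + D + RC.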
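
The formulas in this answer can be evaluated with a short Python sketch. The function names and the parameter values are hypothetical, chosen only to show the relative magnitudes; the formulas are the ones stated in the answer above.

```python
import math

def heap_scan_cost(B, R, D, C):
    """Heap file: read every page and check every record, B(D + RC)."""
    return B * (D + R * C)

def sorted_search_cost(B, D, C):
    """Sorted file: binary search over the pages, D*log2(B) + C*log2(B)."""
    return D * math.log2(B) + C * math.log2(B)

def hashed_search_cost(H, D, R, C, check_whole_page=False):
    """Hashed file: H + D, plus RC when every record on the page is checked."""
    cost = H + D
    if check_whole_page:
        cost += R * C
    return cost

# Illustrative values only (assumptions, in arbitrary time units).
B, R = 1024, 100
D, C, H = 15.0, 0.125, 0.125

heap = heap_scan_cost(B, R, D, C)
sorted_file = sorted_search_cost(B, D, C)
hashed = hashed_search_cost(H, D, R, C)
hashed_full = hashed_search_cost(H, D, R, C, check_whole_page=True)
print(heap, sorted_file, hashed, hashed_full)
```

With these made-up values the heap file scan is orders of magnitude more expensive than the other two, which is the point of the comparison.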

Exercise 8.3 Which of the three basic file organizations would you choose for a file where the most frequent operations are as follows?
1. Search for records based on a range of field values.
Sorted files are the best choice, because the records in the range are stored contiguously.
2. Perform inserts and scans where the order of records does not matter.
Heap files are the best choice, because a new record can simply be appended and a scan reads every page once.
3. Search for a record based on a particular field value.
Hashed files are the best choice, because hashing the field value leads directly to the page that holds the record.

Exercise 8.4 Explain the difference between each of the following:
1. Primary versus secondary indexes.
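A primary index is an ordered file sorted on the value of the index field. It is usually built on the ordering field of a sorted file, where that field is the primary key.
A secondary index usually has one index entry for each distinct value of a field that is not the ordering field. Because the field values need not be unique, an intermediate bucket is introduced to hold the list of pointers.
A primary index is sparse, whereas a secondary index is dense.
A file has only one primary index, but it can have several secondary indexes.
A primary index can be used to reorganize the data in the file; a secondary index cannot.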
2. Dense versus sparse indexes.
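In a dense index, every search key value in the file has a corresponding index entry.
In a sparse index, index entries are created for only some of the search key values.
A dense index locates a record faster than a sparse index.
A sparse index takes less space and costs less to maintain on inserts and deletes.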
3. Clustered versus unclustered indexes.
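Clustered index: the physical order of the data rows is the same as the logical order of the indexed column values (typically the primary key column).
Unclustered index: the logical order of the index entries differs from the physical order in which the rows are stored on disk.
A table can have only one clustered index, but it can have several unclustered indexes.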
If you were about to create an index on a relation, what considerations would guide your choice with respect to each pair of properties listed above?
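The considerations include whether the field values are unique, the space and time requirements, and which types of operations dominate the workload.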
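
To make the dense versus sparse distinction from item 2 concrete, here is a minimal Python sketch of a lookup through each kind of index. The tiny sorted file, the 1-based (page, slot) rids, and the function names are assumptions for illustration only.

```python
from bisect import bisect_right

# A small hypothetical sorted file: each inner list is one page of key values.
pages = [[11, 12, 18], [19, 19]]

# Dense index: one <key, rid> entry per record; rid = (page, slot), 1-based.
dense = [(key, (p, s))
         for p, page in enumerate(pages, start=1)
         for s, key in enumerate(page, start=1)]

# Sparse index: one entry per page, holding the first key on that page.
sparse = [page[0] for page in pages]

def dense_lookup(key):
    """Answer from the index alone; no data page has to be searched."""
    return [rid for k, rid in dense if k == key]

def sparse_lookup(key):
    """Find the last page whose first key is <= key, then search that page.

    Assumes equal keys never span a page boundary (true for this data).
    """
    p = bisect_right(sparse, key)  # number of pages whose first key <= key
    if p == 0:
        return []  # key is smaller than every key in the file
    return [(p, s) for s, k in enumerate(pages[p - 1], start=1) if k == key]

print(dense_lookup(19), sparse_lookup(18), sparse_lookup(15))
```

The sparse index stores two entries instead of five, at the price of searching inside a data page on every lookup; it also only works because the file is sorted on the key.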

Exercise 8.5 Consider a relation stored as a randomly ordered file for which the only index is an unclustered index on a field called sal. If you want to retrieve all records with sal > 20, is using the index always the best alternative? Explain.
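No, using the index is not always the best choice. Because the index is unclustered, each qualifying data entry may contain a rid that points to a different data page, so retrieving the records can cost as many data page I/Os as there are data entries matching the range query. When many records satisfy sal > 20, a file scan is the better choice.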
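
The trade-off can be sketched numerically. This is a rough model under stated assumptions: the worst case of one data page I/O per matching record, the cost of reading the index itself ignored, and hypothetical function names and file sizes.

```python
def unclustered_index_ios(matching_records):
    """Worst case: every matching data entry points to a different data page."""
    return matching_records

def file_scan_ios(num_pages):
    """A scan reads each data page exactly once."""
    return num_pages

def cheaper_plan(num_pages, records_per_page, selectivity):
    """Return 'index' or 'scan' for a range query matching the given fraction."""
    matching = round(num_pages * records_per_page * selectivity)
    if unclustered_index_ios(matching) < file_scan_ios(num_pages):
        return "index"
    return "scan"

# Hypothetical file: 1000 pages with 100 records each.
print(cheaper_plan(1000, 100, 0.5))     # half of the records qualify
print(cheaper_plan(1000, 100, 0.001))   # one record in a thousand qualifies
```

Under this model the index only wins when the number of matching records is smaller than the number of pages in the file.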

Exercise 8.6 If an index contains data records as 'data entries', is it clustered or unclustered? Dense or sparse?
By definition, it is clustered and dense.

Exercise 8.7 Consider Alternatives (1), (2) and (3) for 'data entries' in an index, as discussed in Section 8.3.1. Are they all suitable for secondary indexes? Explain.
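Not all of them are suitable. An Alternative (1) index uses the actual data records as its data entries. It has to be a primary index with no duplicates. It is not suitable for a secondary index, because we do not want to store copies of the data records.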

Exercise 8.8 Consider the instance of the Students relation shown in Figure 8.7, sorted by age: For the purposes of this question, assume that these tuples are stored in a sorted file in the order shown; the first tuple is in page 1, slot 1; the second tuple is in page 1, slot 2; and so on. Each page can store up to three data records. You can use <page-id, slot> to identify a tuple.
List the data entries in each of the following indexes. If the order of entries is significant, say so and explain why. If such an index cannot be constructed, say so and explain why.

1. A dense index on age using Alternative (1).
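11, 12, 18, 19, 19 (each data entry is the full data record, listed here by its age value). The order of the entries is significant, because it is the same as the order of the data records.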
2. A dense index on age using Alternative (2).
<11,(1,1)>,<12,(1,2)>,<18,(1,3)>,<19,(2,1)>,<19,(2,2)>
3. A dense index on age using Alternative (3).
<11,(1,1)>,<12,(1,2)>,<18,(1,3)>,<19,(2,1),(2,2)>
4. A sparse index on age using Alternative (1).
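Cannot be constructed. With Alternative (1) every data record is itself a data entry, so the index is dense by definition (see Exercise 8.6).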
5. A sparse index on age using Alternative (2).
<11,(1,1)>,<19,(2,1)>
6. A sparse index on age using Alternative (3).
<11,(1,1)>,<19,(2,1),(2,2)>
7. A dense index on gpa using Alternative (1).
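1.8, 2.0, 3.2, 3.4, 3.8 (each data entry is the full data record, ordered by gpa). The order of the entries is significant, because with Alternative (1) the data entries are the data records. Since the file itself is stored in age order (gpa 1.8, 2.0, 3.4, 3.2, 3.8), this index would hold a second copy of the records.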
8. A dense index on gpa using Alternative (2).
<1.8,(1,1)>,<2.0,(1,2)>,<3.2,(2,1)>,<3.4,(1,3)>,<3.8,(2,2)>
(Each page holds at most three records, so the fourth and fifth tuples are on page 2.)
9. A dense index on gpa using Alternative (3).
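<1.8,(1,1)>,<2.0,(1,2)>,<3.2,(2,1)>,<3.4,(1,3)>,<3.8,(2,2)>
(No gpa value is repeated, so every rid-list holds a single rid and the entries are the same as for Alternative (2).)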
10. A sparse index on gpa using Alternative (1).
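Cannot be constructed. An Alternative (1) index contains every data record as a data entry, so it is always dense.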
11. A sparse index on gpa using Alternative (2).
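Cannot be constructed. A sparse index requires the file to be sorted on the search key, and this file is sorted on age, not gpa (3.4 is stored before 3.2).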
12. A sparse index on gpa using Alternative (3).
Cannot be constructed, for the same reason: the file is not sorted on gpa, so a sparse index on gpa is not possible.
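
As a cross-check on the rid arithmetic in this exercise, here is a minimal Python sketch that derives dense Alternative (2) and (3) data entries from the stored order. The (age, gpa) pairs are an assumption reconstructed from the values in the answers above, and the helper names (rid, dense_alt2, dense_alt3, is_sorted_on) are hypothetical.

```python
PAGE_CAPACITY = 3  # each page stores up to three records
AGE, GPA = 0, 1    # field positions within a record

# (age, gpa) in stored order, i.e. sorted by age (assumed pairing).
records = [(11, 1.8), (12, 2.0), (18, 3.4), (19, 3.2), (19, 3.8)]

def rid(i):
    """Map a 0-based position in the file to a 1-based (page, slot)."""
    return (i // PAGE_CAPACITY + 1, i % PAGE_CAPACITY + 1)

def dense_alt2(field):
    """Alternative (2): one <key, rid> entry per record, sorted by key."""
    return sorted((rec[field], rid(i)) for i, rec in enumerate(records))

def dense_alt3(field):
    """Alternative (3): one <key, rid-list> entry per distinct key."""
    groups = {}
    for key, r in dense_alt2(field):
        groups.setdefault(key, []).append(r)
    return list(groups.items())

def is_sorted_on(field):
    """A sparse index on a field presupposes the file is sorted on it."""
    keys = [rec[field] for rec in records]
    return keys == sorted(keys)

print(dense_alt2(AGE))
print(dense_alt3(AGE))
print(dense_alt2(GPA))
print(is_sorted_on(AGE), is_sorted_on(GPA))
```

For this data is_sorted_on(GPA) is False, and a sparse index presupposes a file sorted on the search key.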