First class

Uncommon words

retrieve
Concurrency
extraneous
diamond
tuple

What is a database?

A database system consists of two components: the database (DB) and the database management system (DBMS)
The DB contains the data
The DBMS is software that stores, manages and retrieves the information in the DB

Database Overview

Data Models

database

A database models a real-world enterprise, e.g. CityU

data model

A data model is a formal language for describing data

schema

A schema is a description of a particular collection of data using a particular data model

The most widely used data model is the relational data model

relational data model

The main concept is the relation, essentially a table with rows and columns
Each relation has a schema, e.g. Enrolled(sid: string, cid: string, gpa: real)

Data Abstraction

Physical schema

Relations stored as unordered files.
Index on first column of Students.

Conceptual (or logical) schema

Courses(cid: string, cname:string, credits:integer)
Students(sid: string, name: string, login: string, age: integer, gpa:real)
Enrolled(sid:string, cid:string, grade:string)

External (or view) schema

Course_info(cid:string,enrollment:integer)
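
The three levels above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the index name idx_students_sid, the definition of Course_info as a per-course enrollment count, and the sample rows are all made up here, not taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.executescript("""
-- Conceptual (logical) schema: the three relations from the notes
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Physical schema: an index on the first column of Students (name is assumed)
CREATE INDEX idx_students_sid ON Students(sid);

-- External (view) schema: assumed to count enrollments per course
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# Hypothetical sample data
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "c1", "A"), ("s2", "c1", "B"), ("s1", "c2", "A")])

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(course_info)  # [('c1', 2), ('c2', 1)]
```

A user of the view sees only Course_info and needs no knowledge of the Enrolled relation or of the index.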

Data Independence

Physical data independence

Allows the physical schema to be modified without rewriting application programs
e.g. adding or removing an index or moving a file to a different disk

Logical data independence

Allows the logical schema to be modified without rewriting application programs
e.g. adding an attribute to a relation
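
A small sketch of both kinds of independence, again using sqlite3. The table contents, the index name idx_sid and the helper lookup are assumptions for illustration. The "application" is the single query in QUERY, which is never rewritten while the schema changes underneath it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("s1", "Ann", 3.5), ("s2", "Bob", 3.1)])  # hypothetical rows

QUERY = "SELECT name FROM Students WHERE sid = ?"  # the application program

def lookup(sid):
    """Run the unchanged application query."""
    return conn.execute(QUERY, (sid,)).fetchall()

before = lookup("s2")

# Physical change: add an index
conn.execute("CREATE INDEX idx_sid ON Students(sid)")
after_index = lookup("s2")

# Logical change: add an attribute to the relation
conn.execute("ALTER TABLE Students ADD COLUMN login TEXT")
after_column = lookup("s2")

# Physical change: remove the index again
conn.execute("DROP INDEX idx_sid")
after_drop = lookup("s2")

print(before == after_index == after_column == after_drop)  # True
```

The query keeps working because it names the columns it needs instead of relying on the storage layout or on the full column list.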

Efficient Access

Concurrency Control

Data Integrity

Transactions

A transaction is a sequence of reads and writes to the DB caused by one execution of a user program

Transactions must have the ACID properties:
Atomic: all or nothing
Consistent: the DB must be in a consistent state after the transaction
Isolated: concurrent transactions behave as if they were performed serially
Durable: the effects of a transaction are permanent
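
Atomicity ("all or nothing") can be sketched with sqlite3. The Accounts table, the CHECK constraint and the transfer helper are assumptions for illustration. Using the connection as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts ("
             "owner TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])  # hypothetical accounts
conn.commit()

def transfer(src, dst, amount):
    """Move money as one transaction: both updates happen or neither does."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE owner = ?",
                     (amount, dst))
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE owner = ?",
                     (amount, src))

def balances():
    return dict(conn.execute("SELECT owner, balance FROM Accounts ORDER BY owner"))

try:
    transfer("alice", "bob", 500)  # second update violates the CHECK constraint
except sqlite3.IntegrityError:
    pass
after_failed = balances()
print(after_failed)  # {'alice': 100, 'bob': 50}

transfer("alice", "bob", 30)
after_ok = balances()
print(after_ok)  # {'alice': 70, 'bob': 80}
```

In the failed transfer the credit to bob had already been applied when the debit failed, and the rollback undid it.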

Database Languages

Database Languages

A database language is divided into two parts
Data definition language (DDL)
Data manipulation language (DML)
Structured query language (SQL) is both a DDL and a DML

Data definition language (DDL)

The DDL allows entire databases to be created, and allows integrity constraints to be specified
Domain constraints
Referential integrity
Assertions
Authorization

The DDL is also used to modify existing DB schema
Addition of new tables
Deletion of tables
Addition of attributes

Data manipulation language (DML)

The DML allows users to access or change data in a database

Retrieve information stored in the database
Insert new information into database
Delete information from the database
Modify information stored in the database
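
The DDL and DML operations listed above, sketched in SQL through sqlite3. Table contents are hypothetical, and SQLite is used only because it ships with Python; it does not support every feature on the list (assertions and authorization are not shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# DDL: create tables with a domain constraint and referential integrity
conn.execute("CREATE TABLE Students ("
             "sid TEXT PRIMARY KEY, name TEXT NOT NULL, "
             "age INTEGER CHECK (age > 0))")
conn.execute("CREATE TABLE Enrolled ("
             "sid TEXT REFERENCES Students(sid), cid TEXT, grade TEXT)")
# DDL: add an attribute to an existing table
conn.execute("ALTER TABLE Students ADD COLUMN gpa REAL")

# DML: insert, modify, retrieve
conn.execute("INSERT INTO Students (sid, name, age, gpa) VALUES ('s1', 'Ann', 20, 3.5)")
conn.execute("INSERT INTO Enrolled VALUES ('s1', 'c1', 'A')")
conn.execute("UPDATE Students SET gpa = 3.7 WHERE sid = 's1'")
rows = conn.execute("SELECT name, gpa FROM Students").fetchall()
print(rows)  # [('Ann', 3.7)]

# Referential integrity: enrolling an unknown student is rejected
try:
    conn.execute("INSERT INTO Enrolled VALUES ('s9', 'c1', 'B')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True

# DML: delete
conn.execute("DELETE FROM Enrolled WHERE sid = 's1'")
remaining = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(remaining)  # 0

# DDL: delete a table
conn.execute("DROP TABLE Enrolled")
```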

Database Users

SDSC5003

SDSC5003 Topics

DB specification and implementation
Database design – the relational model and the ER model
Creating and accessing a database
Relational algebra
Creating and querying a DB using SQL
Query optimization, transaction processing, …
Database application development
Spark/Hadoop (more of data processing systems)
Basic programming model

Exercise

Which of the following plays an important role in representing information about the real world in a database?

  1. The data definition language.
  2. The data manipulation language.
  3. The buffer manager. (Frequently retrieved data can be kept in the buffer to speed up retrieval.)
  4. The data model.(correct)

SDSC5003 The Entity Relationship Model

Database Design

Design Overview

5003 Storing and Retrieving Data

Requirements Analysis

Conceptual Database Design

ER Model is used at this stage

Entities

ER Model Basics

Entity

A real-world object distinguishable from other objects. An entity is described (in the DB) using a set of attributes.
(An entity can also be an action, such as an enrollment, which is not an object in the real world.)
(In an ER diagram a rectangle represents an entity set and an ellipse represents an attribute; they are connected by a line.)

Entity Set

A collection of similar entities. E.g., all employees.

All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)

Each entity set has a key.

Each attribute has a domain (the set of all possible attribute values; no value is indicated by NULL).

keys

A key is a set of attributes whose values uniquely identify an entity in an entity set


Types of Keys

Superkey

Any set of attributes whose values uniquely identify an entity in an entity set

Candidate key

A minimal superkey, that is a superkey with no extraneous attributes
A relation can have more than one candidate key

Primary key (must be a candidate key)

The candidate key designated by the database designer to refer to tuples (rows) in the relation (table)

Should never (or very rarely) change

Primary Key

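If {sin} is the only candidate key, how many superkeys do we have?
A superkey must contain sin and may include any subset of the remaining 4 attributes, so the count is C(4,0) + C(4,1) + C(4,2) + C(4,3) + C(4,4) = 2^4 = 16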
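
The count can be checked by enumeration. The four attribute names in others are hypothetical (the original figure is not available); only their number matters.

```python
from itertools import combinations

key = {"sin"}                               # the only candidate key
others = ["name", "age", "dept", "salary"]  # hypothetical remaining attributes

# A superkey is the key plus any subset (of size 0..4) of the other attributes
superkeys = [key | set(extra)
             for r in range(len(others) + 1)
             for extra in combinations(others, r)]

print(len(superkeys))  # 16
```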

Relationships

ER Model Basics (Contd.)

Relationship

Relationship: Association among two or more entities. E.g., Kris works in Pharmacy department.

Relationship Set

Relationship Set: Collection of similar relationships.
An n-ary relationship set R relates n entity sets E1… En; each relationship in R involves entities e1, …, en
The same entity set could participate in different relationship sets, or in different “roles” in the same set.
A relationship set may have descriptive attributes

(in the ER model, we use diamond to represent a relationship)

(The picture shows that a relationship can associate two entities from the same entity set.)

Relation Math

Cartesian or Cross-Products

A tuple <a1,a2,…,an> is just a list with n elements in order
A binary tuple <a,b> is called an ordered pair

In set notation: A x B = {<a,b> | a in A, b in B}.
Example: {1,2,3} x {x,y} = {<1,x>,<1,y>,<2,x>,<2,y>,<3,x>,<3,y>}
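
The same example in Python, where itertools.product computes the cross product and ordered pairs become tuples. The third set C is an assumption added to show an n-fold product.

```python
from itertools import product

A = [1, 2, 3]
B = ["x", "y"]

cross = list(product(A, B))
print(cross)
# [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y'), (3, 'x'), (3, 'y')]

# |A x B| = |A| * |B|
print(len(cross) == len(A) * len(B))  # True

# An n-fold cross product works the same way (C is a made-up third set)
C = [True, False]
triples = list(product(A, B, C))
print(len(triples))  # 12
```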

N-fold cross products

Exercise


The Cross Product

Relationship set = subset of cross product.

Relationships

Relation Types

An employee can work in many departments;
each dept has at most one manager
(an employee is a manager of the department)

(This is a one-to-many relationship type, with the arrow on the many side.)
(The department is the key.)
A key constraint is represented by an arrow →

Why is the arrow on the many side?
Because the many side determines the tuple. Each tuple consists of a value from the one side and a value from the many side; since the relationship is one-to-many, a given many-side value is always paired with the same one-side value.

Relationship Set Primary Keys

The primary key of a relationship depends on the key constraints in the relationship

Many-to-many – all the non-descriptive attributes of the relationship set

One-to-many – the primary key for the many entity

One-to-one – the primary key of either entity
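
A rough way to see these rules on data: the helper is_key below (a made-up name) tests whether a set of attributes has unique values across one instance of a relationship set. It checks a single instance only, so it can refute a key but cannot prove one for the schema. The sample tuples and the attribute names ssn and did are assumptions.

```python
def is_key(relationship, attrs):
    """True if no two tuples in `relationship` agree on all of `attrs`."""
    seen = set()
    for row in relationship:
        projection = tuple(row[a] for a in attrs)
        if projection in seen:
            return False
        seen.add(projection)
    return True

# Many-to-many: an employee works in many departments, a department has many employees
works_in = [{"ssn": "e1", "did": "d1"},
            {"ssn": "e1", "did": "d2"},
            {"ssn": "e2", "did": "d1"}]

# One-to-many: each department (the many side) has at most one manager
manages = [{"ssn": "e1", "did": "d1"},
           {"ssn": "e1", "did": "d2"}]

print(is_key(works_in, ["ssn"]))         # False
print(is_key(works_in, ["did"]))         # False
print(is_key(works_in, ["ssn", "did"]))  # True
print(is_key(manages, ["did"]))          # True
print(is_key(manages, ["ssn"]))          # False
```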

Exercise 2.2


(Because this case records only the most recent offering, Cid values are unique, so the primary key does not include the attribute semester.)

Participation Constraints

Indicate that each entity in an entity set must be involved in at least one relationship
Participation is said to be either total (there is a constraint) or partial (no constraint)
If there is no participation constraint all the entities may still be involved in the relationship
Total participation is indicated by a double line from the relationship to the entity
Or a thick line
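
A participation constraint can be checked against an instance by looking for entities that appear in no relationship. The helper name missing_participants and the sample data are assumptions for illustration.

```python
def missing_participants(entity_keys, relationship, attr):
    """Return the entities that take part in no relationship tuple."""
    involved = {row[attr] for row in relationship}
    return sorted(set(entity_keys) - involved)

departments = ["d1", "d2", "d3"]        # hypothetical department ids
manages = [{"ssn": "e1", "did": "d1"},  # hypothetical Manages instance
           {"ssn": "e2", "did": "d2"}]

unmanaged = missing_participants(departments, manages, "did")
print(unmanaged)  # ['d3']
```

If Departments had total participation in Manages, this instance would violate the constraint because d3 has no manager; with partial participation it is allowed.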


Exercise


Extended ER Model

Weak Entities

A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
(Many Dependents may share the same pname, so a Dependents entity can be identified only by combining its pname with the key of the owning Employees entity.)

The owner entity set and the weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
(The arrow is on the weak entity side.)
The weak entity set must have total participation in this identifying relationship set.
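
One common way to map a weak entity set to tables, sketched with sqlite3. The attribute names ssn and age, the sample rows, and the ON DELETE CASCADE choice are assumptions, not part of the notes: the owner's key becomes part of the weak entity's primary key, and NOT NULL on it expresses total participation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Dependents (
    pname TEXT,
    age   INTEGER,
    ssn   TEXT NOT NULL REFERENCES Employees(ssn) ON DELETE CASCADE,
    PRIMARY KEY (ssn, pname)  -- owner key + partial key
);
""")

conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [("e1", "Kris"), ("e2", "Lee")])
# Two dependents share a pname but belong to different owners
conn.executemany("INSERT INTO Dependents VALUES (?, ?, ?)",
                 [("Sam", 5, "e1"), ("Sam", 7, "e2")])

# Removing the owner removes its dependents too
conn.execute("DELETE FROM Employees WHERE ssn = 'e1'")
left = conn.execute("SELECT pname, ssn FROM Dependents").fetchall()
print(left)  # [('Sam', 'e2')]
```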


ISA (‘is a’) Hierarchies

If we declare A ISA B, every A entity is also considered to be a B entity.
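
ISA behaves much like class inheritance, so a loose analogy in Python may help. The class names and attributes are hypothetical: every HourlyEmployee is also an Employee and inherits its attributes.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    ssn: str
    name: str

@dataclass
class HourlyEmployee(Employee):  # HourlyEmployee ISA Employee
    hourly_wage: float

h = HourlyEmployee(ssn="e1", name="Kris", hourly_wage=25.0)

is_employee = isinstance(h, Employee)
inherited = (h.ssn, h.name)
print(is_employee)  # True
print(inherited)    # ('e1', 'Kris')
```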

Non-Binary Relationships

A relationship does not have to be binary; it can include any number of entity sets

Ternary Relationships

Replace Ternary Relationship

We can replace a ternary relationship with binary relationships
But each person can then have only one role on each team
And the binary relationships are not forced to relate to each other
e.g. Bob may have a manager role, and Bob may work on the Harmony OS project with a team
But the Harmony OS project may not have a manager
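
The loss of information can be sketched with plain sets. The ternary tuples are (employee, role, project); the project name "Atlas" and the developer role are assumptions added for illustration. Splitting into two binary relationships and joining them back on the employee produces combinations that were never in the ternary relationship.

```python
# Ternary relationship: who plays which role on which project
ternary = {("Bob", "manager", "Atlas"),
           ("Bob", "developer", "Harmony OS")}

# Replace it with two binary relationships
has_role = {(e, r) for e, r, p in ternary}
works_on = {(e, p) for e, r, p in ternary}

# Joining the binary relationships on the employee
rejoined = {(e, r, p)
            for e, r in has_role
            for e2, p in works_on
            if e == e2}

spurious = rejoined - ternary
print(sorted(spurious))
# [('Bob', 'developer', 'Atlas'), ('Bob', 'manager', 'Harmony OS')]
```

The binary version cannot tell whether Bob manages Harmony OS, which is exactly the case described above.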

Aggregation

Indicates that a relationship set participates in another relationship set

When should aggregation be used?

When there is a relationship between an entity set and another relationship

Aggregation Example


ER Design Principles

ER Design Principles

Faithfulness

Avoid redundancy

Simplicity

Specify as many constraints as possible

Some constraints cannot be shown in ER diagrams
Example: if a team has more than 10 members, it should have one manager.

Entity vs. Attribute


Entity vs. Relationship!


Binary vs. Ternary Relationships

