Storing and Retrieving Data
- First class
- Uncommon words
- What is database
- Database Overview
- Data Models
- Data Abstraction
- Data Independence
- Efficient Access
- Concurrency Control
- Data Integrity
- Transactions
- Database Languages
- SDSC5003
- Exercise
- SDSC5003 The Entity Relationship Model
- Exercise
- Exercise 2.2
- Exercise
- Extended ER Model
First class
Uncommon words
retrieve
Concurrency
extraneous
diamond
tuple
What is database
A database system consists of two components
Database (DB) and Database Management System (DBMS)
The DB contains the data
The DBMS is software that stores, manages and retrieves the information in the DB
Database Overview
Data Models
database
A database models a real-world enterprise, e.g. CityU
data model
A data model is a formal language for describing data
schema
A schema is a description of a particular collection of data using a particular data model
The most widely used data model is the relational data model
relational data model
The main concept is the relation, essentially a table with rows and columns
Each relation has a schema, e.g. Enrolled(sid: string, cid: string, gpa: real)
Data Abstraction
Physical schema
Relations stored as unordered files.
Index on first column of Students.
Conceptual (or logical) schema
Courses(cid: string, cname: string, credits: integer)
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Enrolled(sid: string, cid: string, grade: string)
External (or view) schema
Course_info(cid: string, enrollment: integer)
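The three schema levels above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the use of SQLite, the index name idx_students_sid, the way the view computes enrollment, and the sample rows are not from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual (logical) schema: the relations themselves.
conn.executescript("""
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
""")

# Physical schema: an index on the first column of Students.
# (The index name is hypothetical.)
conn.execute("CREATE INDEX idx_students_sid ON Students(sid)")

# External (view) schema: Course_info(cid, enrollment), assumed here to
# count the Enrolled rows of each course.
conn.execute("""
CREATE VIEW Course_info AS
SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid
""")

# Sample rows, made up for illustration.
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("s1", "SDSC5003", "A"), ("s2", "SDSC5003", "B"), ("s1", "CS101", "A")],
)

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)
```

A user of the view sees only Course_info and never needs to know how Enrolled is stored or indexed.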
Data Independence
Physical data independence
Allows the physical schema to be modified without rewriting application programs
e.g. adding or removing an index or moving a file to a different disk
Logical data independence
Allows the logical schema to be modified without rewriting application programs
e.g. adding an attribute to a relation
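Both kinds of independence can be tried out in a small sketch, again assuming SQLite through Python's sqlite3 module. The function honor_roll stands in for an "application program"; its name, the index name, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("s1", "Ann", 3.9), ("s2", "Bob", 3.1)],
)

def honor_roll(connection):
    """The 'application program': it names only the columns it needs."""
    rows = connection.execute(
        "SELECT name FROM Students WHERE gpa > 3.5 ORDER BY name"
    )
    return [name for (name,) in rows]

before = honor_roll(conn)

# Physical change: add an index. The query text does not change.
conn.execute("CREATE INDEX idx_students_gpa ON Students(gpa)")
after_index = honor_roll(conn)

# Logical change: add an attribute. The query text still does not change.
conn.execute("ALTER TABLE Students ADD COLUMN login TEXT")
after_column = honor_roll(conn)

print(before, after_index, after_column)
```

The program keeps working because it refers to columns by name rather than relying on how or where the rows are stored.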
Efficient Access
Concurrency Control
Data Integrity
Transactions
A transaction is a sequence of reads and writes to the DB caused by one execution of a user program
Transactions must have the ACID properties:
Atomic: all or nothing
Consistent: the DB must be in a consistent state after the transaction
Isolated: concurrent transactions behave as if they were performed serially
Durable: the effects of a transaction are permanent
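Atomicity ("all or nothing") is the easiest ACID property to demonstrate. The sketch below assumes SQLite via Python's sqlite3 module; the Accounts table, the transfer function, and the amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

def transfer(connection, src, dst, amount):
    """Move money as one transaction: both updates happen, or neither does."""
    # `with connection` commits on success and rolls back on an exception.
    with connection:
        connection.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE owner = ?",
            (amount, src),
        )
        (balance,) = connection.execute(
            "SELECT balance FROM Accounts WHERE owner = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        connection.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE owner = ?",
            (amount, dst),
        )

def balances(connection):
    return dict(connection.execute("SELECT owner, balance FROM Accounts"))

transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)

try:
    transfer(conn, "alice", "bob", 500)   # fails halfway through
except ValueError:
    pass
after_failed = balances(conn)

print(after_ok, after_failed)
```

The failed transfer leaves no trace: the first UPDATE is undone by the rollback, so the balances are the same before and after it.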
Database Languages
A database language is divided into two parts
Data definition language (DDL)
Data manipulation language (DML)
Structured query language (SQL) is both a DDL and a DML
Data definition language (DDL)
The DDL allows entire databases to be created, and allows integrity constraints to be specified
Domain constraints
Referential integrity
Assertions
Authorization
The DDL is also used to modify existing DB schema
Addition of new tables
Deletion of tables
Addition of attributes
Data manipulation language (DML)
The DML allows users to access or change data in a database
Retrieve information stored in the database
Insert new information into database
Delete information from the database
Modify information stored in the database
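The DDL/DML split can be seen in a short SQL session, run here through Python's sqlite3 module. This is a sketch under assumptions: SQLite is only one possible DBMS, the gpa range 0.0 to 4.3 is an invented domain constraint, and the rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# DDL: create tables with a domain constraint and referential integrity.
conn.executescript("""
CREATE TABLE Students (
    sid  TEXT PRIMARY KEY,
    name TEXT,
    gpa  REAL CHECK (gpa BETWEEN 0.0 AND 4.3)
);
CREATE TABLE Enrolled (
    sid   TEXT REFERENCES Students(sid),
    cid   TEXT,
    grade TEXT
);
""")
# DDL: modify an existing schema by adding an attribute.
conn.execute("ALTER TABLE Students ADD COLUMN login TEXT")

# DML: insert, modify, delete, then retrieve.
conn.execute("INSERT INTO Students (sid, name, gpa) VALUES ('s1', 'Ann', 3.9)")
conn.execute("INSERT INTO Students (sid, name, gpa) VALUES ('s2', 'Bob', 3.1)")
conn.execute("UPDATE Students SET gpa = 3.3 WHERE sid = 's2'")
conn.execute("DELETE FROM Students WHERE sid = 's1'")
remaining = conn.execute("SELECT sid, name, gpa FROM Students").fetchall()

# The integrity constraints declared in the DDL reject bad data.
rejected = []
for bad in (
    "INSERT INTO Students (sid, name, gpa) VALUES ('s3', 'Cy', 9.9)",
    "INSERT INTO Enrolled VALUES ('nobody', 'SDSC5003', 'A')",
):
    try:
        conn.execute(bad)
    except sqlite3.IntegrityError:
        rejected.append(bad)

print(remaining, len(rejected))
```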
Database Users
SDSC5003
SDSC5003 Topics
DB specification and implementation
Database design – the relational model and the ER model
Creating and accessing a database
Relational algebra
Creating and querying a DB using SQL
Query optimization, transaction processing, …
Database application development
Spark/Hadoop (more of data processing systems)
Basic programming model
Exercise
Which of the following plays an important role in representing information about the real world in a database?
- The data definition language.
- The data manipulation language.
- The buffer manager. (Some data is retrieved more frequently than the rest; keeping that data in the buffer speeds up retrieval.)
- The data model. (correct)
SDSC5003 The Entity Relationship Model
Database Design
Design Overview
Requirements Analysis
Conceptual Database Design
ER Model is used at this stage
Entities
ER Model Basics
Entity
Real-world object distinguishable from other objects. An entity is described (in DB) using a set of attributes.
(An entity can also be an action, such as an enrollment, that is not a physical object in the real world.)
(In an ER diagram, a rectangle represents an entity set, ellipses represent attributes, and lines connect them.)
Entity Set
A collection of similar entities. E.g., all employees.
All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)
Each entity set has a key.
Each attribute has a domain (the set of all possible attribute values; the absence of a value is indicated by NULL).
keys
A key is a set of attributes whose values uniquely identify an entity in an entity set
Types of Keys
Superkey
Any set of attributes whose values uniquely identify an entity in an entity set
Candidate key
A minimal superkey, that is a superkey with no extraneous attributes
A relation can have more than one candidate key
Primary key (must be a candidate key)
The candidate key designated by the database designer to refer to tuples (rows) in the relation (table)
Should never (or very rarely) change
Primary Key
If {sin} is the only candidate key, how many superkeys do we have?
C(4,0) + C(4,1) + C(4,2) + C(4,3) + C(4,4) = 2^4 = 16 (a superkey is {sin} plus any subset of the 4 remaining attributes)
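The count can be checked by brute force. The sketch below assumes a relation with five attributes; only sin comes from the notes, and the other four names are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical attribute names; only `sin` comes from the notes.
attributes = ["sin", "name", "age", "phone", "address"]
candidate_key = {"sin"}

superkeys = [
    set(subset)
    for size in range(1, len(attributes) + 1)
    for subset in combinations(attributes, size)
    if candidate_key <= set(subset)   # a superkey must contain the candidate key
]

print(len(superkeys))
```

Every attribute other than sin is either in the superkey or not, which is where the 2^4 comes from.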
Relationships
ER Model Basics (Contd.)
Relationship
Relationship: Association among two or more entities. E.g., Kris works in Pharmacy department.
Relationship Set
Relationship Set: Collection of similar relationships.
An n-ary relationship set R relates n entity sets E1… En; each relationship in R involves entities e1, …, en
The same entity set can participate in different relationship sets, or in different “roles” in the same set.
A relationship set may have descriptive attributes
(In an ER diagram, a diamond represents a relationship set.)
(This picture shows that a relationship set can associate an entity set with itself.)
Relation Math
Cartesian or Cross-Products
A tuple <a1,a2,…,an> is just a list with n elements in order
A binary tuple <a,b> is called an ordered pair
In set notation: A x B = {<a,b> | a in A, b in B}.
Example: {1,2,3} x {x,y} = {<1,x>,<1,y>,<2,x>,<2,y>,<3,x>,<3,y>}
N-fold cross products
Exercise
The Cross Product
Relationship set = subset of cross product.
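Cross products and the "relationship set = subset of cross product" idea can be tried with itertools.product from the Python standard library. The employee and department names below are sample data, and the third set in the 3-fold product is arbitrary.

```python
from itertools import product

A = [1, 2, 3]
B = ["x", "y"]

# Binary cross product: every ordered pair <a, b>.
cross = list(product(A, B))

# N-fold cross product: product accepts any number of sets.
triples = list(product(A, B, [True, False]))

# A relationship set is a subset of the cross product of its entity sets.
employees = ["Kris", "Bob"]
departments = ["Pharmacy", "Sales"]
works_in = {("Kris", "Pharmacy")}
is_relationship_set = works_in <= set(product(employees, departments))

print(cross, len(triples), is_relationship_set)
```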
Relationships
Relation Types
An employee can work in many departments;
each dept has at most one manager
(an employee is a manager of the department)
(This is a one-to-many relationship type; the arrow is on the many side.)
(Department is the key.)
Key constraint is represented by an arrow →
Why is the arrow on the many side?
(Because the many side determines the tuple. Each tuple consists of a value from the one side and a value from the many side; since the relationship type is one-to-many, for a given many-side value the one-side value is always the same.)
Relationship Set Primary Keys
The primary key of a relationship depends on the key constraints in the relationship
Many-to-many – all the non-descriptive attributes of the relationship set (that is, the primary keys of all participating entity sets)
One-to-many – the primary key of the entity set on the many side
One-to-one – the primary key of either entity set
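The one-to-many rule can be sketched as a table whose primary key is the key of the many side. The example assumes the Manages relationship between employees and departments discussed above, stored in SQLite; the column names eid, did and since and the rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Manages (
    eid   TEXT NOT NULL,      -- employee, the "one" side
    did   TEXT PRIMARY KEY,   -- department, the "many" side, so it is the key
    since TEXT                -- descriptive attribute
)""")

# One employee may manage many departments.
conn.execute("INSERT INTO Manages VALUES ('e1', 'd1', '2020')")
conn.execute("INSERT INTO Manages VALUES ('e1', 'd2', '2021')")

# A department may not get a second manager.
try:
    conn.execute("INSERT INTO Manages VALUES ('e2', 'd1', '2022')")
    second_manager_allowed = True
except sqlite3.IntegrityError:
    second_manager_allowed = False

row_count = conn.execute("SELECT COUNT(*) FROM Manages").fetchone()[0]
print(second_manager_allowed, row_count)
```

Making did the primary key is exactly what enforces "each dept has at most one manager".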
Exercise 2.2
(Because this case records only the most recent offering, each Cid is unique, so the primary key does not contain the attribute semester.)
Participation Constraints
Indicate that each entity in an entity set must be involved in at least one relationship
Participation is said to be either total (there is a constraint) or partial (no constraint)
If there is no participation constraint, all the entities may still be involved in the relationship
Total participation is indicated by a double line from the relationship to the entity
Or a thick line
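Whether a particular instance satisfies total participation can be checked with plain sets. This is a minimal sketch: the function name participation and the sample data are hypothetical, and it tests one snapshot of the data rather than declaring the constraint in a schema.

```python
def participation(entities, relationship_set, position):
    """Return 'total' if every entity occurs in some relationship, else 'partial'.

    `position` is the index of the entity set inside each relationship tuple.
    """
    involved = {relationship[position] for relationship in relationship_set}
    return "total" if set(entities) <= involved else "partial"

departments = {"d1", "d2", "d3"}
manages = {("e1", "d1"), ("e2", "d2")}           # (employee, department)

before = participation(departments, manages, 1)  # d3 has no manager yet
manages.add(("e1", "d3"))
after = participation(departments, manages, 1)

print(before, after)
```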
Exercise
Extended ER Model
Weak Entities
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
(Because many Dependents may have the same pname, a dependent can be identified only by combining the key of the owning employee with pname.)
Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities).
(The arrow is on the weak entity side.)
Weak entity set must have total participation in this identifying relationship set.
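One common way to map a weak entity set to tables is to combine its partial key with the owner's key. The sketch below assumes SQLite through Python's sqlite3 module; the column names ssn and age, the ON DELETE CASCADE choice, and the rows are illustrative and not from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Dependents (
    pname TEXT NOT NULL,                  -- partial key of the weak entity
    age   INTEGER,
    ssn   TEXT NOT NULL                   -- NOT NULL: total participation
          REFERENCES Employees(ssn) ON DELETE CASCADE,
    PRIMARY KEY (pname, ssn)              -- owner key + partial key
);
INSERT INTO Employees VALUES ('111', 'Kris'), ('222', 'Bob');
INSERT INTO Dependents VALUES ('Sam', 5, '111'), ('Sam', 7, '222');
""")

# Two dependents share a pname but belong to different owners.
sams = conn.execute(
    "SELECT COUNT(*) FROM Dependents WHERE pname = 'Sam'"
).fetchone()[0]

# Deleting an owner removes the dependents that cannot exist without it.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
left = conn.execute("SELECT pname, ssn FROM Dependents").fetchall()

print(sams, left)
```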
ISA (‘is a’) Hierarchies
If we declare A ISA B, every A entity is also considered to be a B entity.
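Class inheritance is a rough programming analogy for ISA. In the sketch below, HourlyEmployee ISA Employee; both class names and all attributes are hypothetical examples, and the analogy says nothing about how a DBMS stores the hierarchy.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    ssn: str
    name: str

@dataclass
class HourlyEmployee(Employee):     # HourlyEmployee ISA Employee
    hourly_wage: float              # attribute specific to the subclass

kris = HourlyEmployee(ssn="111", name="Kris", hourly_wage=80.0)

# Every HourlyEmployee entity is also an Employee entity,
# and it inherits the attributes of Employee.
is_employee = isinstance(kris, Employee)
print(is_employee, kris.name)
```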
Non-Binary Relationships
A relationship does not have to be binary but can include any number of entity sets
Ternary Relationships
Replace Ternary Relationship
We can replace the ternary relationships with binary relationships
But can have only one role on each team
And the relationship pairs are not forced to relate to each other
Bob may have a manager role and Bob may work on the Harmony OS project with a team
But the Harmony OS project may not have a manager
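The point that the binary pairs are not forced to relate to each other can be made concrete: splitting a ternary relationship set into two binary ones and joining them back may yield combinations that were never recorded. The tuple layout (person, role, project) and the Payroll project are assumptions for illustration.

```python
# Ternary relationship set: (person, role, project). Sample data only.
ternary = {
    ("Bob", "manager", "Harmony OS"),
    ("Bob", "developer", "Payroll"),
}

# Replace it with two binary relationship sets.
has_role = {(person, role) for person, role, _ in ternary}
works_on = {(person, project) for person, _, project in ternary}

# Joining the binary sets on person does not recover the original facts.
rejoined = {
    (person, role, project)
    for person, role in has_role
    for other, project in works_on
    if person == other
}
spurious = rejoined - ternary

print(sorted(spurious))
```

The binary sets still know that Bob is a manager and that Bob works on Harmony OS, but no longer which role goes with which project.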
Aggregation
Indicates that a relationship set participates in another relationship set
When should aggregation be used?
When there is a relationship between an entity set and another relationship
Aggregation Example
ER Design Principles
Faithfulness
Avoid redundancy
Simplicity
Specify as many constraints as possible
Some constraints cannot be shown in ER diagrams
Example: if a team has more than 10 members, it should have one manager.
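A constraint like this one usually has to be enforced outside the diagram, for instance in application code. The check below is a minimal sketch; the function name and its two count parameters are hypothetical, and "should have one manager" is read as "exactly one".

```python
def violates_manager_rule(member_count, manager_count):
    """True if a team with more than 10 members does not have exactly one manager."""
    return member_count > 10 and manager_count != 1

small_team_ok = not violates_manager_rule(member_count=8, manager_count=0)
large_team_ok = not violates_manager_rule(member_count=12, manager_count=1)
large_team_bad = violates_manager_rule(member_count=12, manager_count=0)

print(small_team_ok, large_team_ok, large_team_bad)
```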