Lane
Database system

Database system

导航

introduction

purpose of database system

  1. 早期database application建立在file systems上有data redundancy,data isolation,difficulty in accessing data等问题。
  2. integrity problems(完整性问题)
    在数据库中integrity constraints can be stated explicitly.
  3. Atomicity problems(原子性问题)
    在数据库中主要通过恢复功能来解决原子性问题。
  4. concurrent access anomalies(并发访问异常)
    对于并发操作若不采取任何限制,可能会导致数据混乱。
  5. Security problems

数据库系统提供以上问题的解决措施。

Characteristics of Databases

  • data persistence
  • convenience in accessing data
  • data integrity
  • data security
  • concurrency control for multiple users
  • failure recovery

view of data

  • 三级抽象databases(Hide the complexities,enhance the adaptation to changes)
  • 主要通过两层map来体现数据库的物理数据独立性和逻辑数据独立性。
    1. view level
    2. logical level
    3. physical level

data Models

most important:Relational Model

Database languages

Data Definition Langauge(数据定义语言)

  • DDL compiler generates a set of table templates stored in a data dictionary
  • Data dictionary contains metadata

Data Manipulation Language(数据操作语言)

  • SQL:widely used non-procedural language(陈述式语言)

    1
    2
    3
    select name
    from instructor
    where instructor.ID = '12345'

Database Design

  • Entity Relationship Model(实体-联系模型)
  • Normalization Theory

Database Engine

Storage manager

chapter2-Relational Model

Basic Structure

  • 定义:Formally, given sets D1, D2, …. Dn a relation r is a subset of D1 x D2 x … x Dn.Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai $\in$ Di。也就是说虽然我们可视化展示的是一个列表,但是本质是没有顺序的,因为是一个集合。
  • A1,A2….An are attributes
  • R=(A1,A2,..,An) is a relation schema
  • Database shcema – is the logic structure of the database.

Attributes

  1. The set of allowed values for each attribute is called the domain(域) of the attribute.
  2. 通常值是原子的.
  3. 在数据库中为每一个域都规定了相同的空值.

keys

  • 定义:K$\in$R.K is a superkey(超健) of R if values for K are sufficient to identify a unique tuple of each possible relation r(R).
    • Example: {ID} and {ID,name} are both superkeys of instructor.(但是名字不是,因为存在重名的情况)
  • Superkey K is a candidate key(候选健) if K is minimal.
    • Example: {ID} is a candidate key of instructor.
  • One of the candidate keys is selected to be the primary key(主键).
  • Foreign key(外键) constraint from attribute(s) A of relation r1 to the primary key B of relation r2 states that on any database instance, the value of A for each tuple in r1 must also be the value of B for some tuple in r2. (参照完整性,类似于外键限制,但是并不一定局限于主键)

ex.png
解释:由上面的例子,第三排的course中的dept_name就是外键,因为course中的dept_name和department中的dept_name(是主键)是同一个值,所以course中的dept_name就是外键。
waijian
解释:红色箭头表示参照完整性,黑键则表示外键。

Relational Algebra

  • 输入几个关系,通过某个操作得到一个新的关系。
  • Six basic operations:
    • select(横向选择)$\theta$
    • project(投影某几列,并且结果会去重)$\pi$
    • union(对两个关系并起来)
      • 两个关系必须有相同的元数(same number of attributes)
      • attributes的域需要相容
    • set difference -
    • 笛卡尔积(表的乘积) x
    • rename $\rho$ (当对一个关系想做笛卡尔积的时候,可以对其改名再执行笛卡尔积)

Additional Operations

add

  • 自然联结的例子
  • 除法操作
    • 结果里面的元组与除数元组连接一定在被除的关系中

Extended Relational-Algebra-Operations

  • Generalized Projection

  • Aggregate Functions

chapter3-introduction to SQL

Data Definition Langauge

Domain Types in SQL

  • char(n). Fixed length character string, with user-specified length n.(与C语言有差异,因为C语言的字符串的结尾一定为\0,它以字符数组的形式存储,而数据库作为字典直接可以从存储中截取对应长度,不需要额外的标识符)
  • varchar(n). Variable length character strings, with user-specified maximum length n.
  • int. Integer (a finite subset of the integers that is machine-dependent).
  • smallint. Small integer (a machine-dependent subset of the integer domain type).
  • numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.
    → number(3,1) allows 44.5 to be store exactly, but neither 444.5 or 0.32
  • real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.
  • float(n). Floating point number, with user-specified precision of at least n digits.

Built-in Data Types in SQL

  • date
  • time
  • timestamp(data plus time of day)
  • interval(period of time)

Create Table Construct

1
2
3
4
5
6
7
create table instructor (
ID char(5),
name varchar(20) not null,
dept_name varchar(20),
salary numeric(8,2))
insert into instructor values (‘10211’, ’Smith’, ’Biology’, 66000);
insert into instructor values (‘10211’, null, ’Biology’, 66000);//其实这个插入有误,因为已经定义了不能为空
  • 再研究key
  • not null
  • primary key(A1,…,An)
  • foreign key(Am,…An) references r(对于被引用的表如果发生delete或者update,可以进行这四个操作:cascade(那个表的值怎样变,此表的值就怎么变), restrict(限制不变), set null, set default)
1
2
3
4
5
6
7
8
9
Declare ID as the primary key for instructor

create table instructor (
ID char(5),
name varchar(20) not null,
dept_name varchar(20),
salary numeric(8,2),
primary key (ID),
foreign key (dept_name) references department)

Basic Query Structure

The select clause

  • SQL 大小写不分的
  • find the names of all instructors
    • select name
    • from instructor

The from clause

  • The from clause lists the relations involved in the query
    • Corresponds to the Cartesian product operation of the relational algebra

Natural join example

1
2
3
4
5
6
select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID;

select name, course_id
from instructor natural join teaches;

string operations

Ordering the display of tuples

1
2
3
4
5
6
select distinct name
from instructor
order by name desc/asc
我们也可以使用limit来得到选择结果
limit offset,row_count
limit row_count

Aggregate Functions

  • 聚合函数的使用
    • 直接使用聚合函数:适用于对整个表或满足某些条件的数据进行聚合操作。
    • 结合 GROUP BY 使用聚合函数:适用于按某个字段分组后对每组分别进行聚合操作。
    • 因此,是否需要 GROUP BY 取决于具体的业务需求。如果只是对整体数据进行统计,无需分组;如果需要按某些字段分组统计,则必须使用 GROUP BY。
1
2
3
4
5
6
7
8
9
10
11
12
Find the average salary of instructors in the Computer Science department
select avg (salary)
from instructor
where dept_name= ’Comp. Sci.’;
Find the total number of instructors who teach a course in the
Spring 2010 semester
select count (distinct ID)
from teaches
where semester = ’Spring’ and year = 2010
Find the number of tuples in the course relation
select count (*)
from course;
group by clause
  • 当我们使用group by的时候是一个聚合操作,所以最终的的select的结果也是以组的形式存在
  • ex
    group
having clause
1
2
3
4
5
6
7
8
9
predicates in the having clause are applied after the 
formation of groups whereas predicates in the where
clause are applied before forming groups
select dept_name, count (*) as cnt
from instructor
where salary >=100000
group by dept_name
having count (*) > 10
order by cnt;

Nested Subqueries

Set membership

  • 通过in和not in来实现嵌套操作
1
2
3
4
5
6
7
8
9
Find courses offered in Fall 2009 and in Spring 2010


select distinct course_id
from section
where semester = ’Fall’ and year= 2009 and
course_id in (select course_id
from section
where semester = ’Spring’ and year= 2010);

set comparison

  • 利用some和all与给定集合进行比较
1
2
3
4
5
6
7
8
9
10
11
select name
from instructor
where salary > some (select salary
from instructor
where dept_name = ’Biology’);

select name
from instructor
where salary > all (select salary
from instructor
where dept_name = ’Biology’);

test for empty relations

  • 利用exists和not exists
1
2
3
4
5
6
7
8
9
Yet another way of specifying the query “Find all courses taught 
in both the Fall 2009 semester and in the Spring 2010 semester”
select course_id
from section as S
where semester = ’Fall’ and year= 2009 and
exists (select *
from section as T
where semester = ’Spring’ and year= 2010
and S.course_id= T.course_id);

Test for

Modification of the Database

chapter4-intermediate SQL

joined relations

  • join types
    • inner join(默认的笛卡尔积均是inner join)
    • left outer join(from后面的表的所有列均保留,如果没有成功连接上,则置为NULL)
    • right outer join(join后面的表的所有列均保留)
    • full outer join
  • join conditions
    • natural join(自然连接)
    • on condition(条件连接)
    • using(A1,…,An)(等值连接)
  • example
    • 统计每个老师教的课程(但是可能包含没有课程的老师)
      • select id,count(course_id)
      • from instructor natural left outer join teaches
      • group by id;

SQL Data Types and schemas

User-defined types

  • create type construct in SQL creates user-defined type
    • create type Dollars as numeric(12,2) final

domains

  • domain is a user-defined type that specifies a set of values and a set of constraints on the values in the set.domains相比于types,domain可以定义一些约束,比如长度,范围,正则表达式等。
    • craete domain person_name char(20) not null

integrity constraints

Not NULL and Unique constraints

the check clause

views

  • A view provides a mechanism to hide certain data
    from the view of certain users.(虚拟的表,没有实际存在)
  • 好处:简化复杂性,适应能力增加,提供安全权限管理功能。

example views

  • a view of instructors without their salary
    • create view faculty as
      select ID, name, dept_name
      from instructor;
    • 当我们创建好这个view以后,我们可以把这个view当作实体的表进行表的各种操作

views defined using other views

  • 在定义view的基础上,可以再定义一个view
  • create view physics_fall_2009 as
    select course.course_id, sec_id, building, room_number
    from course, section
    where course.course_id = section.course_id
    and course.dept_name = ’Physics’
    and section.semester = ’Fall’
    and section.year = ’2009’;
  • create view physics_fall_2009_watson as
    select course_id, room_number
    from physics_fall_2009
    where building= ’Watson’;
  • 这样的操作其实我们可以在一个语句中实现
    select course_id, room_number
    from physics_fall_2009(已经创建的view)
    where building= ’Watson’;

    select course_id, room_number
    from (select course.course_id, building, room_number
    from course, section
    where course.course_id = section.course_id
    and course.dept_name = ’Physics’
    and section.semester = ’Fall’
    and section.year = ’2009’)(把view创建的语句嵌套在from中)
    where building= ’Watson’;

    select course_id, room_number
    from course, section
    where course.course_id = section.course_id
    and course.dept_name = ’Physics’
    and section.semester = ’Fall’
    and section.year = ’2009’
    and building= ’Watson’;(sql本质运行的一种语句)

update of a view

  • view相当于只是打开一扇窗,所以对view的修改,实际上是对view的表进行修改。当然孙老师说,为了提高查询的高效性,有些时候确实会创建一个临时表(materialized view)。
  • example
    insert into faculty values (’30765’, ’Green’, ’Music’);
    insert into instructor values (’30765’, ’Green’, ’Music’, null);
  • 但是有些时候view的update不能被翻译为对原始表的独立的操作,因为这个view选取的元素可能不是超键。

view and logical data dependence

  • 当我们底层的物理数据发生变化,比如一个表分成了两个表,因为我们用户层面可能用到原来的表,那么我们可以利用现在的两个表生成一个代表原来表的view,对于用户层面没有任何变化,但是底层逻辑发生改变。

Indexes

transactions(事务)

  • 在数据库系统中,事务是指一组操作的集合,这些操作要么全部执行成功,要么全部不执行,以确保数据的一致性和完整性。

transaction properties

  • 原子性(Atomicity):事务是一个不可分割的工作单位,事务中的所有操作要么都做,要么都不做。
  • 一致性(Consistency):事务必须使数据库从一个一致状态变换到另一个一致状态。如果事务执行成功,则数据库处于一致状态;如果事务执行失败,则数据库应恢复到事务开始前的状态。
  • 隔离性(Isolation):多个事务并发执行时,一个事务的执行不应影响其他事务的执行结果。每个事务都应该独立运行,就好像它是唯一正在运行的事务一样。
  • 持久性(Durability):一旦事务提交,它对数据库的更改应该是永久性的,即使系统发生故障。

transaction example

SET AUTOCOMMIT=0;
UPDATE account SET balance=balance -100 WHERE ano=‘1001’;
UPDATE account SET balance=balance+100 WHERE ano=‘1002’;
COMMIT;(上一条事务结束,下一个事务开始)
UPDATE account SET balance=balance -200 WHERE ano=‘1003’;
UPDATE account SET balance=balance+200 WHERE ano=‘1004’;
COMMIT;
UPDATE account SET balance=balance+balance*2.5%;
COMMIT;

transaction operation

  • BEGIN TRANSACTION; 或 START TRANSACTION;:显式地开始一个新的事务。
  • COMMIT;:提交当前事务,使其所有更改永久生效。
  • ROLLBACK;:回滚当前事务,撤销所有未提交的更改,是数据库恢复到事务开始之前的状态。
  • SAVEPOINT savepoint_name;:设置一个保存点,可以在部分回滚时使用。
  • ROLLBACK TO savepoint_name;:回滚到指定的保存点,而不是整个事务。

Autorization(授权)

authorization specification in SQL

  • the grant statement is used to confer authorization:grant (privilege list) on (relation name or view name) to (user list)
  • user list:
    • user-id
    • public,which allows all valid the privilege granted
    • a role(more on this later)

privileges in SQL

grant select on instructor to U1, U2, U3
grant select on department to public
grant update (budget) on department to U1,U2
grant all privileges on department to U1

revoking authorization in SQL

  • the revoke statement is used to revoke authorization:revoke (privilege list) on (relation name or view name) from (user list)

chapter5-advanced SQL

Accessing SQL from a programming language

  • two approaches to access database
    • API
    • Embedded SQL

API

  • ODBC works with C,C++,C#
  • JDBC works with Java
    • Open a connection
    • Create a “statement” object
    • Execute queries using the Statement object to send queries and fetch results
    • Exception mechanism to handle errors
  • Embedded SQL in C
  • SQLC-embedded SQL in Java
  • JPA-OR mapping of Java

JDBC

JDBC code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
//注意:stmt为创建的对象
//Update to database
try {
stmt.executeUpdate(
"insert into instructor values(’77987’, ’Kim’, ’Physics’, 98000)");
} catch (SQLException sqle)
{
System.out.println("Could not insert tuple. " + sqle);
}

//Execute query and fetch and print results
//注意我们返回的是一个集合,是无序的
ResultSet rset = stmt.executeQuery(
"select dept_name, avg (salary)
from instructor
group by dept_name");
while (rset.next()) {
System.out.println(rset.getString("dept_name") + " " +
rset.getFloat(2));
}
Prepared Statement
  • 先生成,类型未知,再利用set语句和update语句进行更新
  • prepared statement也可以避免SQL注入攻击,因为我们提前进行编译,就会防止产生SQL的歧义。
1
2
3
4
5
6
7
PreparedStatement pStmt = conn.prepareStatement(
"insert into instructor values(?,?,?,?)");
pStmt.setString(1, "88877"); pStmt.setString(2, "Perry");
pStmt.setString(3, "Finance"); pStmt.setInt(4, 125000);
pStmt.executeUpdate();
pStmt.setString(1, "88878");
pStmt.executeUpdate();
Metadata
  • JDBC可以利用getMetaData()方法来获取数据库的元数据,比如列名,列类型,列数目等。

ODBC(open database connectivity)

ODBC Code
  • 完全通过函数调用实现
1
2
3
4
5
6
7
8
9
10
11
12
13
int ODBCexample()
{
RETCODE error;
HENV env; /* environment */
HDBC conn; /* database connection */
SQLAllocEnv(&env);
SQLAllocConnect(env, &conn);
SQLConnect(conn,"db.yale.edu", SQL_NTS, "avi", SQL_NTS, "avipasswd", SQL_NTS);
{ …. Do actual work … }
SQLDisconnect(conn);
SQLFreeConnect(conn);
SQLFreeEnv(env);
}

main body

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
char deptname[80];
float salary;
int lenOut1, lenOut2;
HSTMT stmt;
char * sqlquery = "select dept_name, sum (salary)
from instructor
group by dept_name";
SQLAllocStmt(conn, &stmt);
error = SQLExecDirect(stmt, sqlquery, SQL_NTS);
if (error == SQL SUCCESS) {
//将数据库中的列与C语言中的变量绑定
SQLBindCol(stmt, 1, SQL_C_CHAR, deptname , 80, &lenOut1);
SQLBindCol(stmt, 2, SQL_C_FLOAT, &salary, 0 , &lenOut2);
while (SQLFetch(stmt) == SQL_SUCCESS) {
printf (" %s %g\n", deptname, salary);
}
}
SQLFreeStmt(stmt, SQL_DROP)
ODBC prepared statements

Embedded SQL

procedural constructs in SQL

SQL functions

  • 定义函数后再调用
1
2
3
4
5
6
7
8
9
10
11
12
create function dept_count (dept_name varchar(20))
returns integer
begin
declare d_count integer;
select count (* ) into d_count
from instructor
where instructor.dept_name = dept_name
return d_count;
end
select dept_name, budget
from department
where dept_count (dept_name ) > 1

SQL procedures

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
The dept_count function could instead be written as procedure:

create procedure dept_count_proc (in dept_name varchar(20),
out d_count integer)
begin
select count(*) into d_count
from instructor
where instructor.dept_name = dept_count_proc.dept_name
end

Procedures can be invoked either from an SQL procedure or from embedded SQL, using the call statement.

declare d_count integer;
call dept_count_proc( ‘Physics’, d_count);

Procedures and functions can be invoked also from dynamic SQL

Triggers

  • A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.
  • ECA rule
    • Event: A modification to the database(insert,delete,update)
    • Condition: A condition that must be satisfied for the trigger to execute
    • Action: An action to be performed when the condition is satisfied
Trigger example
1
2
3
4
5
6
7
8
9
10
11
12
13
"记录大额交易的表"
account_log(account, amount, datetime)
"记录大额交易的触发器"
create trigger account_trigger after update of account on balance
referencing new row as nrow
referencing old row as orow
for each row
when nrow.balance - orow.balance > =200000 or
orow.balance -nrow.balance >=50000
begin
insert into account_log values (nrow.account-number,
nrow.balance-orow.balance , current_time() )
end
Statement Level Triggers
本文总阅读量
本文作者:Lane
本文链接:https://lakerswillwin.github.io/2025/02/17/database/
版权声明:本文采用 CC BY-NC-SA 3.0 CN 协议进行许可