Distributed Systems Course Summary
- Welcome
- What Distributed Systems Are, and Why They Exist
- Read Replication
- Sharding
- Consistent Hashing
- CAP Theorem
- Distributed Transactions
- Distributed Computation Introduction
- Map Reduce
- Synchronization
- Network Time Protocol
- Vector Clocks
- Distributed Consensus: Paxos
- Messaging Introduction
- Apache Kafka
- Zookeeper
Welcome
What is a Distributed System?
A collection of independent computers that appears to its users as one computer.
Three Characteristics
- The computers operate concurrently
- The computers fail independently
- The computers do not share a global clock
A personal computer is not a distributed system: although it satisfies the first two conditions, it shares a global clock.
What Distributed Systems Are, and Why They Exist
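The author uses a coffee shop as an analogy for distributed systems. A single coffee shop can easily remember each customer's favorite drink and update it when the customer asks for a change. A coffee chain, however, has a much harder time with the same tasks.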
Read Replication
Read replication suits read-heavy, write-light workloads.
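When a customer asks for a change: the original shop updates the customer's information and must then tell the other shops. That notification takes time, and until it arrives the shops hold different information (inconsistency).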
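A minimal sketch of that inconsistency window, assuming a single leader that accepts every write and copies it to its read replicas later (asynchronously). The classes `Leader` and `Follower` and the explicit `replicate()` step are hypothetical names chosen for illustration, not any particular database's API.

```python
from collections import deque


class Follower:
    """Hypothetical read replica: serves reads from its own copy of the data."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Leader:
    """Hypothetical leader: accepts all writes, ships them to followers later."""

    def __init__(self, followers):
        self.data = {}
        self.followers = followers
        self.pending = deque()  # writes not yet copied to the followers

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Deliver every queued write to every follower, in order.
        while self.pending:
            key, value = self.pending.popleft()
            for follower in self.followers:
                follower.data[key] = value


replica = Follower()
leader = Leader([replica])
leader.write("alice", "latte")
leader.replicate()
leader.write("alice", "mocha")  # the customer changes her order
stale = replica.read("alice")   # replication has not run yet
leader.replicate()
fresh = replica.read("alice")
print(stale, fresh)  # latte mocha
```

Until `replicate()` runs, a read served by the replica returns the old drink even though the leader already has the new one.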
Sharding
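Sharding splits the customers into three shards by name, which addresses write-heavy workloads.
The problem:
This works well for a key-value store, but a relational database needs a key it can be sharded on. Otherwise queries become awkward, because a query that does not include the shard key has to touch every shard.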
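A small sketch of sharding by name, assuming a hypothetical split on the first letter of the customer's name (A-H, I-P, Q-Z). The names `SHARD_RANGES`, `shard_for` and `find_by_drink` are made up for illustration. A lookup by name consults one shard; a lookup by drink has to scan all of them.

```python
# Hypothetical split of customers into three shards by first letter of name.
SHARD_RANGES = [("A", "H"), ("I", "P"), ("Q", "Z")]
shards = [dict() for _ in SHARD_RANGES]


def shard_for(name):
    """Return the index of the shard that owns this name (the shard key)."""
    first = name[0].upper()
    for i, (lo, hi) in enumerate(SHARD_RANGES):
        if lo <= first <= hi:
            return i
    raise ValueError(f"no shard for {name!r}")


def put(name, drink):
    shards[shard_for(name)][name] = drink


def get(name):
    # Query by the shard key: only one shard is consulted.
    return shards[shard_for(name)].get(name)


def find_by_drink(drink):
    # Query by a non-key attribute: every shard has to be scanned.
    return sorted(n for shard in shards for n, d in shard.items() if d == drink)


for name, drink in [("Alice", "latte"), ("Mallory", "mocha"), ("Zoe", "latte")]:
    put(name, drink)
print(get("Mallory"), find_by_drink("latte"))  # mocha ['Alice', 'Zoe']
```

The fixed `SHARD_RANGES` list also shows why changing the number of shards later is painful: every range boundary, and the data behind it, has to move.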
Consistent Hashing
Replication brings back the inconsistency problem.
A formula for strong consistency:
R+W>N
N: the number of replicas
R: the number of replicas that agree on a read
W: the number of replicas that successfully take a write
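A quick way to see why R + W > N gives strong consistency: it forces every read quorum to share at least one replica with every write quorum, so a read always reaches a replica that holds the latest write. The sketch below checks that equivalence by brute force; the function names are hypothetical.

```python
from itertools import combinations


def is_strongly_consistent(n, r, w):
    """The quorum rule from the notes: R + W > N."""
    return r + w > n


def quorums_always_overlap(n, r, w):
    """Brute force: does every read set of size r share a replica with
    every write set of size w, out of n replicas?"""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(n, r, w, is_strongly_consistent(n, r, w), quorums_always_overlap(n, r, w))
```

Here (3, 2, 2) and (3, 1, 3) satisfy the rule and always overlap, while (3, 1, 1) and (5, 2, 3) do not: with five replicas, a read of two and a write of three can land on disjoint replicas.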
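The main advantage of consistent hashing is easy scaling. Plain sharding requires deciding the number of shards up front, and changing it later is painful. With consistent hashing, adding or removing a node only moves the keys on that node's part of the ring.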
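A minimal hash-ring sketch, assuming MD5 as the ring hash and one position per node (real systems usually give each node many virtual positions to even out the load; that is left out here). `HashRing` and the shop names are hypothetical. After adding a fourth node, every key that moved went to the new node, and all other keys stayed where they were.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a ring position (stable across runs, unlike hash())."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring with one position per node (no virtual nodes)."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions
        self._owners = {}     # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = ring_hash(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def node_for(self, key):
        """Walk clockwise from the key's position to the first node."""
        pos = ring_hash(key)
        i = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owners[self._positions[i]]


keys = [f"customer-{i}" for i in range(1000)]
ring = HashRing(["shop-a", "shop-b", "shop-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("shop-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key that moved now lives on the new node; all other keys stay put.
print(len(moved), all(after[k] == "shop-d" for k in moved))
```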
CAP Theorem
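Consistency: strong consistency; every read sees the most recent write
Availability: every request receives a response
Partition tolerance: the system keeps working even when the network splits and messages between nodes are lost
The three cannot all be guaranteed at once. Because network partitions cannot be ruled out in practice, P is usually kept, and during a partition either A or C has to be given up.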
Distributed Transactions
ACID
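Atomic: a transaction either completes entirely or has no effect
Consistent: the database is left in a consistent state
Isolated: concurrent transactions do not interfere with each other
Durable: once committed, changes persist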
Response to Failure
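Any step of a distributed transaction can fail. Possible responses:
- Write-off: accept the loss and move on
- Retry: run the failed step again
- Compensating Action: undo the steps that already completed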
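A minimal sketch of the compensating-action response, assuming each step comes paired with a hypothetical "undo" function. If a step fails, the steps that already succeeded are undone newest first. Unlike an atomic rollback, other parties may observe the intermediate state (here, the card was briefly charged). `run_with_compensation` and the coffee-shop steps are made-up names.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order. If an action raises, undo
    the steps that already succeeded, newest first, then re-raise."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):
                undo()
            raise
        done.append(compensate)


log = []


def charge():
    log.append("charge card")


def refund():
    log.append("refund card")


def make_drink():
    raise RuntimeError("out of milk")  # hypothetical failure


def discard_drink():
    log.append("discard drink")


try:
    run_with_compensation([(charge, refund), (make_drink, discard_drink)])
except RuntimeError as err:
    log.append(f"failed: {err}")
print(log)  # ['charge card', 'refund card', 'failed: out of milk']
```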
Why give up atomicity?
Throughput
Distributed Computation Introduction
- Scatter/Gather
- MapReduce
- Hadoop
- Spark
- Storm
Map Reduce
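A single-process sketch of the classic word-count job, to show the shape of the map, shuffle and reduce phases. In Hadoop these phases run across many machines; here they are plain functions with hypothetical names, and the input documents are made up.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)


documents = ["the cat sat", "the dog sat", "The end"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(counts)  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```

Because each mapper only sees its own split and each reducer only sees one key, both phases can be spread over as many machines as there are splits and keys.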
Synchronization
The concept of now in a distributed system is problematic.
- Estimated time: last write wins
- Network Time Protocol
- Vector time: only conveys the order of events, not the actual time
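A minimal last-write-wins sketch, assuming each write carries the wall-clock timestamp of the node that made it, with the node name as a tie-breaker. The `Write` type and the 5-second skew are hypothetical. The example shows why estimated time is risky: a node whose clock runs fast can make an older write win.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float  # wall-clock time as reported by the writing node
    node: str         # used only to break exact timestamp ties


def last_write_wins(a, b):
    """Keep whichever write carries the later (timestamp, node) pair."""
    return max(a, b, key=lambda w: (w.timestamp, w.node))


# Hypothetical: node B's clock runs 5 seconds fast.
first = Write("latte", timestamp=105.0, node="B")   # really happened at t=100
second = Write("mocha", timestamp=102.0, node="A")  # really happened at t=102
print(last_write_wins(first, second).value)  # latte: the older write wins
```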
Network Time Protocol
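A sketch of the offset and round-trip-delay estimate NTP computes from one request/response exchange, using the four timestamps of the NTPv4 on-wire protocol. It assumes the network delay is the same in both directions; real NTP clients also filter many samples, which is not shown. The function name and the example numbers are hypothetical.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t1: client sends request (client clock)
    t2: server receives request (server clock)
    t3: server sends reply (server clock)
    t4: client receives reply (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)         # round trip minus server processing
    return offset, delay


# Hypothetical exchange: server clock is 10 s ahead, each direction takes
# 0.05 s, and the server spends 0.01 s handling the request.
offset, delay = ntp_offset_and_delay(t1=0.00, t2=10.05, t3=10.06, t4=0.11)
print(round(offset, 3), round(delay, 3))  # 10.0 0.1
```

If the two directions take different times, half of the difference shows up as error in the offset, which is why NTP can only estimate the time.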
Vector Clocks
- A means of providing sequence
- Not a means of telling the time
For an example, see Why Vector Clocks Are Easy.
To read later: why-cassandra-doesnt-need-vector-clocks
- Cannot be wrong
- Pushes the complexity to the client
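A minimal vector-clock sketch, assuming clocks are plain dicts from node name to counter (missing entries count as 0). The helper names and the scenario are hypothetical. `compare` gives only an order, never a time; when two versions are concurrent, the clocks cannot decide which one is right, and that is the point where the client has to resolve the conflict.

```python
def increment(clock, node):
    """Record an event on `node` by bumping its own counter."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: a clock that has seen everything a and b have."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Return 'before', 'after', 'equal' or 'concurrent' for clocks a and b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v1 = increment({}, "A")  # order recorded on node A: {A: 1}
v2 = increment(v1, "B")  # updated on B after seeing v1
v3 = increment(v1, "C")  # updated on C after seeing v1, independently of v2
print(compare(v1, v2), compare(v2, v3))  # before concurrent
# The client resolves the conflict and writes back a clock dominating both.
v4 = increment(merge(v2, v3), "A")
print(compare(v2, v4), compare(v3, v4))  # before before
```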
Distributed Consensus: Paxos
Consensus requirements
- Termination: every process eventually decides a value.
- Validity: if all processes propose the same value, then all processes decide that value.
- Integrity: if a process decides a value, then that value must have been proposed by some process.
- Agreement: all processes must decide on the same value.
Happy Path
Better Offer
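A minimal in-memory sketch of single-decree Paxos, assuming no lost messages or crashes and synchronous calls in place of network messages. `Acceptor` and `propose` are hypothetical names. The first call is an uncontested run; the second shows a proposer with a higher proposal number learning that a value was already accepted and adopting it instead of its own; the third shows a stale proposal number being rejected.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0       # highest proposal number promised so far
        self.accepted_n = 0     # number of the last accepted proposal (0 = none)
        self.accepted_v = None  # value of the last accepted proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered prepare was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, value
            return True
        return False


def propose(acceptors, n, value):
    """Run both phases with proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    promises = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, adopt the one with the
    # highest proposal number instead of our own value.
    prior_n, prior_v = max(granted, key=lambda p: p[0])
    if prior_n > 0:
        value = prior_v
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="latte")   # uncontested run
second = propose(acceptors, n=2, value="mocha")  # higher number, adopts "latte"
stale = propose(acceptors, n=1, value="tea")     # number too low, rejected
print(first, second, stale)  # latte latte None
```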
Other Protocols
Raft: simpler to understand
Blockchain: guards against participants that lie
Messaging Introduction
A means of loosely coupling two subsystems (decoupling). Messages are:
- Consumed by subscribers
- Created by producers
- Organized into topics
- Processed by brokers
- Usually persisted over the short term
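A toy in-memory sketch of the roles listed above, assuming synchronous delivery and no message expiry (real brokers deliver asynchronously and keep messages only for a limited time). `Broker` and the topic names are hypothetical. The producer only names a topic and never learns who consumes the message, which is where the decoupling comes from.

```python
from collections import defaultdict


class Broker:
    """Hypothetical broker: every subscriber of a topic receives each
    message published to that topic."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> callbacks
        self.log = defaultdict(list)          # topic -> stored messages (no expiry here)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        self.log[topic].append(message)
        for callback in self.subscribers[topic]:
            callback(message)


broker = Broker()
baristas, loyalty = [], []
broker.subscribe("orders", baristas.append)
broker.subscribe("orders", loyalty.append)
broker.subscribe("refunds", loyalty.append)
broker.publish("orders", "alice: latte")  # the producer does not know the consumers
print(baristas, loyalty)  # ['alice: latte'] ['alice: latte']
```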
Messaging Problems
- What if a topic gets too big for one computer?
- What if a computer is not reliable enough?
- How strongly can we guarantee delivery?
Apache Kafka
Zookeeper
These two chapters are big topics that I plan to study in depth later, so I have not taken notes on them here.