This module will present concepts, architectures and algorithms for data storage, management, and analysis, at a very large scale, especially in distributed settings. The following topics will be covered, each illustrated with a representative system, whose main features will be detailed during lectures:
- Introduction to distributed systems (consistency, availability, and the CAP theorem; ACID vs. BASE)
- Massively distributed (cloud-based) file systems (e.g., HDFS/GFS)
- Modern distributed computing: MapReduce
- Distributed NoSQL databases:
  - Distributed Hash Tables (DHTs)
  - Key-value stores
  - "Big Table"-style systems
  - Graph databases: Neo4j, Pregel
  - Distributed triple stores
  - Document stores: MongoDB
- Data analysis tools in the Amazon cloud
A strong focus will be placed on labs in this class, so that students can gain experience with different existing systems and understand their respective advantages.
- Instructor in charge of the course unit: Fabian Suchanek