DDIP

Designing Data-Intensive Applications

Why data-intensive?

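An application is data-intensive if data is its primary challenge: the quantity of data, its complexity, or the speed at which it changes.

An application is compute-intensive if CPU cycles are the bottleneck.

The book covers only the architecture of data systems. It does not cover deployment, operations, security, or management.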

Outline: parts and chapters

Part 1 Fundamental ideas, data stored on a single machine

  • 1 reliability, scalability, and maintainability: how we think about them and how we achieve them
  • 2 data models and query languages: how they compare and what their use cases are
  • 3 storage engines: how databases arrange data on disk
  • 4 data encoding and schemas

Part 2 Data distributed across multiple machines

  • 5 replication / availability
  • 6 partitioning/sharding
  • 7 transactions
  • 8 problems in distributed systems
  • 9 consistency and consensus

Part 3 Deriving datasets from other datasets

Applications often need to integrate several different databases, caches, and indexes.

  • 10 batch processing approach
  • 11 stream processing
  • 12 putting everything together: approaches for building reliable, scalable, and maintainable applications in the future
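
To make the Part 3 idea of a derived dataset a little more concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `users` records and the `build_city_index` helper are hypothetical names invented for this note, not anything from the book. It derives a secondary index (city to user ids) from a source dataset.

```python
from collections import defaultdict

# Hypothetical source dataset (the "system of record"): user id -> profile.
users = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Lin", "city": "Taipei"},
    3: {"name": "Sam", "city": "London"},
}


def build_city_index(records):
    """Derive a city -> sorted list of user ids index from the source records."""
    index = defaultdict(list)
    for user_id, profile in records.items():
        index[profile["city"]].append(user_id)
    # Return a plain dict so lookups for unknown cities do not create entries.
    return {city: sorted(ids) for city, ids in index.items()}


# The derived dataset: it holds no new information, only a different view.
city_index = build_city_index(users)
print(city_index["London"])
```

The index can be thrown away and rebuilt from `users` at any time, which is what makes it derived data rather than a source of truth. Caches and search indexes follow the same pattern.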