Jan 6, 2016

Scalable Distributed Systems - Introduction

In recent years, a couple of factors have become increasingly important in the design of distributed systems: the scalability and the reliability of the system. Over time I have picked up a few things related to these factors. This post series is an attempt to share my modest knowledge of the scalability aspects.

What is Scalability? 
Simply put, it is the ability of a system to handle increasing load, whether that load comes from adding users, resources, or both. Typically, the scale of a system has 3 dimensions -
  1. The quantity dimension i.e., the number of users, resources, objects, etc. that are part of the system
  2. The distribution dimension i.e., the geographical distribution of servers, services, data, etc.
  3. The administrative dimension i.e., the number of organizations, multi-tenancy, etc.
These dimensions in turn affect a whole host of components that are needed for a distributed system. 

Building a scalable system does not happen by accident. Similarly, a distributed system is not automatically a scalable system. So it is important to consider the effects of scale in these dimensions early on.


Effects of Scale
Now the components that typically get affected by scale in the above 3 dimensions are -
  1. Naming 
  2. Service Registries and Service Discovery
  3. Data at Rest - Management, Storage & Distribution
  4. Data in Motion - Caching & Cache management
  5. Security - Authentication & Authorization
  6. Administration - Deployment & Configuration Management @ Scale
  7. Communication - Group communication
  8. Heterogeneity - Interfaces, Languages & Protocols
  9. User View - Data Organization, Summaries, Visualization @ Scale
  10. Reliability - Availability, Performance, Faults & Fault Tolerance

Techniques
Now the solutions to the above typically involve a common set of techniques like -
  1. Replication (i.e., Services Replication, Data Replication)
  2. Partitioning (i.e., Services Partitioning, Data Partitioning)
  3. Distribution (i.e., Geo Distribution of Services and Data)
  4. Caching (i.e., Cache placement & consistency etc)
  5. Messaging (i.e., decoupling in space, time & synchronization)
  6. Data Organization, Summarization & Visualization 
  7. Automation for Deployment & Config Management
  8. Primitives such as Clocks, Consensus, Coordination and Concurrency Control
  9. CAP theorem & Various tradeoffs
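
To make the first two techniques a little more concrete, here is a minimal sketch of hash-based data partitioning combined with simple replication. The function names (partition_for, replicas_for), the node names, and the "next nodes in the list" placement rule are all assumptions made up for illustration; real systems usually use more elaborate schemes such as consistent hashing.

```python
import hashlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (hypothetical helper).

    hashlib is used instead of the built-in hash() so the result is the
    same across processes and restarts.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(partition: int, nodes: List[str], replication_factor: int) -> List[str]:
    """Pick the nodes that hold copies of a partition (hypothetical helper).

    Assumed placement rule: start at the node indexed by the partition
    number and take the following nodes, wrapping around the list.
    """
    count = min(replication_factor, len(nodes))
    return [nodes[(partition + i) % len(nodes)] for i in range(count)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # illustrative node names
    for key in ["user:1", "user:2", "order:17"]:
        p = partition_for(key, num_partitions=8)
        print(key, "-> partition", p, "-> replicas", replicas_for(p, nodes, 2))
```

Partitioning spreads the keys (and therefore the load) across nodes, while replication keeps more than one copy of each partition so that a single node failure does not make the data unavailable. Note that with plain modulo hashing, changing the number of partitions remaps most keys, which is one reason the later techniques in this list matter.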

Wrap up...
In subsequent posts I plan to cover each of the above components and techniques in more detail. I think a good understanding of these key concepts will be helpful for anyone working on distributed and scalable systems. Feel free to let me know your thoughts in the comments section.