Database Sharding
Introduction
Organizing high-volume data is a constant need. A database stores data as records, and it is important to store them efficiently. This article covers the definition of database sharding, its advantages and disadvantages, and the common methods of sharding.
Database Sharding
Database sharding is an architecture or design pattern that horizontally partitions the records in a database into multiple smaller blocks of data, called shards. For example, in the below image, instead of keeping all the records in one single database instance, they are distributed across multiple shards.
Advantages
Improved query performance
Query time tends to be long when all records are in a single database instance. After sharding, a query that includes the shard key is usually faster because it searches for records only in the specific shard that holds them, rather than in the whole data set.
High reliability
Keeping all records in a single database instance creates a single point of failure: the application cannot fetch any data if that instance goes down. With sharding, the failure of one database instance affects only the records stored on that shard, and the other shards keep serving their data. If a copy of each shard's data is also maintained on another instance, the application can still fetch the data from the copy when an instance fails.
Appropriate distribution of traffic
When all records are managed in a single database instance, every read and write goes to that instance and traffic cannot be distributed. With sharding, reads and writes are spread across multiple shards, so each shard handles only part of the traffic.
Disadvantages
Higher infrastructure cost
The infrastructure cost of maintaining data in multiple shards is higher than that of a traditional single-instance database architecture.
Higher effort in maintenance
The operational and maintenance effort of managing multiple shards is higher than that of managing a single database instance.
Methods Of Sharding
The different methods of sharding are listed below. It is very important to select a suitable method when sharding the data. Improper sharding can lead to issues such as loss of data, slow-running queries, and high resource usage on some shards.
- Range Based Sharding
- Hashed Sharding
- Directory Sharding
Range based sharding
In range based sharding, database records are arranged by ranges of values. Each range of values is assigned a shard key, and records are written to or read from the database based on that shard key.
In the below image, names starting with letters in the range A-M have shard key 1 and names starting with N-Z have shard key 2. The shard key is allocated based on the first letter of the employee's name and the range that letter falls in.
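The A-M / N-Z example can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the `RANGES` table, the `shard_for_name` function, and the sample employee names are all hypothetical, and in-memory lists stand in for real database shards.

```python
# Hypothetical range table: (first letter, last letter, shard key).
RANGES = [("A", "M", 1), ("N", "Z", 2)]


def shard_for_name(name: str) -> int:
    """Return the shard key for an employee name, based on its first letter."""
    first = name.strip()[:1].upper()
    for low, high, shard_key in RANGES:
        if low <= first <= high:
            return shard_key
    # Names that start with anything outside A-Z have no range in this sketch.
    raise ValueError(f"no range covers {name!r}")


# In-memory lists stand in for the two database shards.
shards = {1: [], 2: []}
for employee in ["Alice", "Nikhil", "Maria", "Zoe"]:
    shards[shard_for_name(employee)].append(employee)
```

Here "Alice" and "Maria" land in shard 1 and "Nikhil" and "Zoe" in shard 2. If many more names started with A-M than with N-Z, shard 1 would hold most of the data, which is the kind of imbalance to watch for when choosing the ranges.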
Hashed sharding
In hashed sharding, each database record (row) is assigned a shard key using a hash function. The hash function takes a value from each row, such as an ID, and uses the result to distribute the data among multiple shards. With a good hash function, records are distributed roughly evenly across the shards.
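A rough sketch of hashed sharding follows, assuming four shards and string keys of the form `emp-<n>`. The shard count, the key format, and the `shard_for_key` function are illustrative assumptions. It uses SHA-256 from Python's standard `hashlib` module rather than the built-in `hash()`, because string hashes from `hash()` can differ between Python processes and would route the same key to different shards.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for this sketch


def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


# Count how many of 10,000 hypothetical employee IDs land on each shard.
counts = [0] * NUM_SHARDS
for employee_id in range(10_000):
    counts[shard_for_key(f"emp-{employee_id}")] += 1
```

Each entry in `counts` should come out close to 2,500, which is what "evenly distributed" means in practice. One trade-off of this simple modulo scheme is that changing `NUM_SHARDS` sends most keys to a different shard.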
Directory sharding
In directory sharding, a lookup table maps database records to the physical shards that hold them. Records can be stored in a meaningful way, but the chance of failure is higher in this architecture because every read and write depends on the lookup table.
In the below image, a lookup table arranges employee records based on their departments. Each department has a shard ID, and records are placed in multiple shards based on that shard ID.
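The department lookup can be sketched as a small router. The department names, shard IDs, and the `DirectoryRouter` class are hypothetical, and in-memory lists again stand in for physical shards. Every read and write goes through `shard_for`, which shows why the lookup table is a dependency for the whole system.

```python
# Hypothetical lookup table: department -> shard ID.
DIRECTORY = {"Engineering": 1, "Sales": 2, "HR": 3}


class DirectoryRouter:
    """Routes employee records to shards through a lookup table."""

    def __init__(self, directory):
        self.directory = dict(directory)
        # One in-memory list per shard ID stands in for a physical shard.
        self.shards = {sid: [] for sid in set(self.directory.values())}

    def shard_for(self, department):
        """Return the shard ID for a department, or fail if it is unmapped."""
        try:
            return self.directory[department]
        except KeyError:
            raise LookupError(f"no shard registered for {department!r}") from None

    def write(self, record):
        """Store a record on the shard that its department maps to."""
        self.shards[self.shard_for(record["department"])].append(record)

    def read(self, department):
        """Fetch all records for a department from its shard."""
        shard = self.shards[self.shard_for(department)]
        return [r for r in shard if r["department"] == department]


router = DirectoryRouter(DIRECTORY)
router.write({"name": "Alice", "department": "Engineering"})
router.write({"name": "Bob", "department": "Sales"})
engineering = router.read("Engineering")
```

Moving a department to another shard only requires updating its entry in the lookup table and copying its records, which is the flexibility this method offers in exchange for the extra dependency.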