Database Sharding

Introduction

Organizing high-volume data is a constant need. A database stores data as records, and it is important to store them efficiently. This article covers the definition of database sharding, its advantages and disadvantages, and the methods of sharding.

Database Sharding

Database sharding is an architecture or design pattern that horizontally partitions the records in a database into multiple smaller blocks of data, called shards. For example, in the image below, instead of keeping all the records in one database instance, they are distributed across multiple shards.

Database sharding

Advantages

Improved query performance

Query time can be long if all records are in a single database instance, because every query works against the full data set. After sharding, queries are usually faster because a query searches only the specific shards that can hold the matching records.

High reliability

Maintaining the records in shards increases reliability. If all records are in a single database instance, that instance is a single point of failure, and the application cannot fetch any data when it goes down. With sharding, the failure of one database instance affects only the records on that shard. If a copy of the data is also maintained across multiple shards, the application can still fetch it from another shard when one instance fails.

Appropriate distribution of traffic

When all records are managed in a single database instance, all traffic goes to that instance and cannot be distributed. With sharding, reads and writes are spread across multiple shards, so the traffic is distributed.

Disadvantages

Higher infrastructure cost

The infrastructure cost of maintaining data in multiple shards is higher than that of a traditional single-instance database architecture.

Higher effort in maintenance

Operational and maintenance effort is higher when managing multiple shards than when managing a single database instance.

Methods Of Sharding

The different methods of sharding are listed below. It is very important to select a suitable method when sharding the data. Issues such as data loss, slow queries, and high resource usage among shards can occur if the sharding is not done properly.

  • Range Based Sharding
  • Hashed Sharding
  • Directory Sharding

Range based sharding

In range based sharding, database records are arranged by ranges of values. A shard key is assigned to each range. Records are written to or read from the database based on the shard key, which is in turn determined by the range the value falls in.

In the image below, letters in the range A-M have shard key 1 and letters in the range N-Z have shard key 2. The shard key is allocated based on the first letter of the employee's name and the corresponding range of values.

Range Based Sharding
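
The A-M / N-Z example above can be sketched in a few lines of Python. This is only a minimal illustration, not a production router: the `RANGES` table, the `range_shard_key` function, and the sample employee names are all assumptions made for this sketch, and the in-memory lists stand in for real database instances.

```python
# Hypothetical range table: (first letter low, first letter high, shard key).
RANGES = [("A", "M", 1), ("N", "Z", 2)]


def range_shard_key(name: str) -> int:
    """Return the shard key for an employee name based on its first letter."""
    first = name.strip()[:1].upper()
    for low, high, shard_key in RANGES:
        if low <= first <= high:
            return shard_key
    # Names that do not start with A-Z are not covered by any range.
    raise ValueError(f"no range covers name {name!r}")


# In-memory stand-ins for the two physical shards.
shards = {1: [], 2: []}
for employee in ["Alice", "Nina", "Mark", "Zoe"]:
    shards[range_shard_key(employee)].append(employee)
```

A read follows the same path: compute the shard key from the name, then query only that shard. Note that if most names start with letters in one range, that shard receives most of the data, which is one way a poorly chosen range leads to uneven resource usage.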

Hashed sharding

In hashed sharding, each database record (row) is assigned a shard key by a hash function. The hash function takes information from each row and uses it to distribute the data among multiple shards. With a well-chosen hash function, records are distributed roughly evenly.

Hashed Sharding
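
As a rough sketch of the idea, the following Python snippet hashes a record ID and takes the result modulo the number of shards. The shard count of 4, the `hash_shard_key` function, and the `employee-<n>` ID format are assumptions chosen for illustration; the article does not prescribe a particular hash function.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of shards for this sketch


def hash_shard_key(record_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record ID to a shard index in the range 0..num_shards-1."""
    # SHA-256 gives the same result in every process and on every machine.
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Count how many of 10,000 hypothetical record IDs land on each shard.
counts = [0] * NUM_SHARDS
for i in range(10_000):
    counts[hash_shard_key(f"employee-{i}")] += 1
```

A stable hash from `hashlib` is used here instead of Python's built-in `hash()`, because `hash()` on strings is randomized per process by default and would send the same record to different shards after a restart. One trade-off of the modulo approach is that changing the number of shards changes the shard of most existing records.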

Directory sharding

In directory sharding, a lookup table maps database records to physical shards. Records can be stored in a meaningful way, but the chance of failure is higher in this architecture because every read and write depends on the lookup table.

In the image below, a lookup table arranges employee records based on their departments. Each department has a shard ID, and records are placed in multiple shards based on that shard ID.

Directory Sharding
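
The department-based lookup described above might look like the following Python sketch. The department names, shard IDs, and the `directory_shard_id` and `insert_employee` helpers are hypothetical, and a plain dictionary stands in for the lookup table, which in practice would itself be stored somewhere durable.

```python
# Hypothetical lookup table: department -> shard ID.
DIRECTORY = {"Engineering": 1, "Sales": 2, "HR": 3}

# In-memory stand-ins for the physical shards, keyed by shard ID.
shards = {shard_id: [] for shard_id in DIRECTORY.values()}


def directory_shard_id(department: str) -> int:
    """Look up the shard ID registered for a department."""
    try:
        return DIRECTORY[department]
    except KeyError:
        # Without a directory entry there is no way to locate the shard.
        raise LookupError(
            f"no shard registered for department {department!r}"
        ) from None


def insert_employee(name: str, department: str) -> int:
    """Store an employee record on the shard chosen by the lookup table."""
    shard_id = directory_shard_id(department)
    shards[shard_id].append((name, department))
    return shard_id
```

For example, `insert_employee("Alice", "Engineering")` stores the record on shard 1. Every operation goes through `DIRECTORY` first, which illustrates why the lookup table becomes a dependency for the whole system: if it is unavailable or wrong, records cannot be located.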