Development details of Hashi color quiz entertainment game club system

InfoQ 2022-06-14 14:54:26 阅读数:362

Uniformity  Hash  The algorithm details
Blockchain hash algorithm quiz game development ,hkkf5566, Kejiawei core , Entertainment platform customization , Game club system development
Algorithm principle
The consistency hash algorithm  1997  Proposed by MIT in , It's a special hash algorithm , When removing or adding a server , Be able to change the mapping relationship between existing service requests and processing request servers as little as possible ;
Consistent hashing solves the problem of simple hashing algorithms in distributed hashes (Distributed Hash Table,DHT) Dynamic scaling and other problems in ;
Uniformity  hash  The algorithm is essentially a modular algorithm ;
Hash is a method of compressing data , So as to improve the efficiency of a solution , But because the hash function is limited , Data increase, etc , Hash collision becomes a difficult problem in data compression .
however , It is different from taking the model according to the number of servers , Uniformity  hash  It's a fixed value  2^32  modulus ;
IPv4  The address is  4  Group  8  position  2  Base numbers make up , So use  2^32  You can guarantee every  IP  The address will have a unique mapping ;
① hash  Ring
We can 2^32 A value is abstracted into a ring  ️, The point directly above the circle represents  0, Arrange clockwise , And so on :1、2、3… until 2^32-1, And this is by  2  Of  32  The circle composed of points to the power is collectively referred to as hash Ring ;
②  The server maps to  hash  Ring
When mapping servers , Use hash( The server ip)% 2^32, namely :
Use the server  IP  address  hash  Calculation , Use the hash result to 2^32 modulus , The result must be a  0  To 2^32-1 Integer between ;
And this integer is mapped to  hash  The position on the ring represents a server , In turn node0、node1、node2 Three cache servers are mapped to  hash  On the ring ;
③  object  key  Map to server
In the corresponding  Key  When mapping to a specific server , You need to calculate first  Key  Of  Hash  value :hash(key)% 2^32;
notes : Here  Hash  The function can be mapped to... With the previous calculation server  Hash  The function of the ring is different , As long as the value range and  Hash  The range of the ring is the same ( namely :2^32);
take  Key  Mapping to the server follows the following logic :
From cache objects  key  The position begins , The first server encountered in a clockwise direction , This is the server to which the current object will be cached ;
Suppose we have  "semlinker"、"kakuqo"、"lolo"、"fer"  Four objects , They are abbreviated as  o1、o2、o3  and  o4;
First , Use the hash function to calculate the of this object  hash  value , The range of values is  [0, 2^32-1]:
The mapping relationship of objects in the figure is as follows :
hash(o1) = k1; hash(o2) = k2;
hash(o3) = k3; hash(o4) = k4;
meanwhile  3  Cache servers , Respectively  CS1、CS2  and  CS3:
Then we can see , The mapping relationship between each object and the server is as follows :
K1 => CS1
K4 => CS3
K2 => CS2
K3 => CS1
namely :
The above is consistency  Hash  How it works ;
You can see , Uniformity  Hash  Namely : Put the original single point  Hash  mapping , Transformed into a mapping on a segment of a ring !
Let's take a look at the following server expansion scenarios ;
Server expansion and contraction scenario
①  Fewer servers
hypothesis  CS3  Server failure caused the service to go offline , At this time, it was originally stored in  CS3  Server object  o4, Need to be reassigned to  CS2  The server , Other objects are still stored on the original machine :
At this time, the only data affected is  CS2  and  CS3  Some data between servers !
②  Server increase
If business volume surges , We need to add a server  CS4, After the same  hash  operation , The server ended up in  t1  and  t2  Between servers , The details are shown in the following figure :
here , Only  t1  and  t2  Some objects between servers need to be reassigned ;
In the above example, only  o3  Objects need to be reassigned , That is, it is returned to  CS4  The server ;
As we have said before : If you use a simple mold taking method , When a new server is added, most of the cache may become invalid , After using the consistent hash algorithm , This situation has been greatly improved , Because only a few objects need to be reassigned !
Data skew & Server performance balance problem
elicit questions
In the example given above , Each server is almost evenly distributed to  Hash  On the ring ;
But in the actual scene, it is difficult to choose one  Hash  Function so perfectly hashes each server to  Hash  On the ring ;
here , When the number of server nodes is too small , It is easy to cause data skew due to uneven node distribution ;
As shown in the figure below, most of the cached objects are cached in node-4 Server , This leads to waste of resources of other nodes , Most of the system pressure is concentrated in node-4 Node , Such clusters are very unhealthy :
meanwhile , There's another problem :
Add a new server  CS4  when ,CS4  Only shared  CS1  Server load , The server  CS2  and  CS3  Not because  CS4  Reduce load pressure by adding servers ; If  CS4  The performance of the server is consistent with that of the original server and may even be higher , So this result is not what we expect ;
Virtual node
Answer the above question , We can go through : Virtual nodes are introduced to solve the problem of load imbalance :
That is, each physical server is a group of virtual servers , Place the virtual server on the hash ring , If you want to determine the server of the object , You need to determine the virtual server of the object first , Then the virtual server determines the physical server ;
As shown in the figure below :
In the picture :o1  and  o2  Representing objects ,v1 ~ v6  Represents a virtual server ,s1 ~ s3  Represents the actual physical server ;
Calculation of virtual nodes
Virtual node  hash  The calculation can usually adopt : Of the corresponding node  IP  Address plus number suffix  hash(  The way ;
for instance ,node-1  node  IP  by, Normal calculation node-1 Of  hash  value :
hash( 2^32
Suppose we give  node-1  Set up three virtual nodes ,node-1#1、node-1#2、node-1#3, Carry them on  hash  Take the mold after :
hash( 2^32
hash( 2^32
hash( 2^32
Be careful :
The more virtual nodes are allocated , Map on  hash  The ring will become more uniform , If there are too few nodes, it is difficult to see the effect ;
The introduction of virtual nodes also adds new problems , To do the mapping between virtual nodes and real nodes , object key-> Virtual node -> Conversion between actual nodes ;
Use scenarios
Uniformity  hash  In distributed system, it should be the preferred algorithm to realize load balancing , Its implementation is flexible , It can be implemented on the client , It can also be implemented on middleware , For example, caching middleware, which is widely used in daily life memcached and redis Clusters are useful to it ;
memcached  The cluster is special , Strictly speaking, it can only be regarded as a pseudo cluster , Because its servers can't communicate , The requested distribution route depends entirely on the client to calculate which server the cache object should fall on , And its routing algorithm uses consistency  hash;
also  redis  In the cluster  hash  The concept of slot , Although the implementation is different , But the thought cannot change without its origin , Consistency after reading this article  hash, You understand  redis  The slot is much easier ;
There are many other application scenarios :
RPC frame Dubbo Used to select service providers
Distributed relational database is divided into database and table : Mapping relationship between data and nodes
LVS Load balancing scheduler
Uniformity  Hash  Algorithm implementation
Let's follow the above description , Use  Golang  Achieve a consistency  Hash  Algorithm , This algorithm has some of the following features :
Uniformity  Hash  The core algorithm ;
Support customization  Hash  Algorithm ;
Support custom number of virtual nodes ;
See... For the specific source code :
Let's start to realize !
Structure 、 Errors and constant definitions
①  Structure definition
First, define the data structure of each cache server :
type Host struct {
 // the host id: ip:port
 Name string
 // the load bound of the host
 LoadBound int64
among :
Name: Cache server's  Ip  Address  +  port , Such as :
LoadBound: The cache server is currently processing “ request ” Number of caches , This field contains the consistency of the load boundary value later  Hash Will be used in ;
secondly , Define consistency  Hash  Structure :
// Consistent is an implementation of consistent-hashing-algorithm
type Consistent struct {
 // the number of replicas
 replicaNum int
 // the total loads of all replicas
 totalLoad int64
 // the hash function for keys
 hashFunc func(key string) uint64
 // the map of virtual nodes to hosts
 hostMap map[string]*Host
 // the map of hashed virtual nodes to host name
 replicaHostMap map[uint64]string
 // the hash ring
 sortedHostsHashSet []uint64
 // the hash ring lock
among :
replicaNum: Indicates that each real cache server is in  Hash  Number of virtual nodes in the ring ;
totalLoad: Total cache for all physical servers “ request ” Count ( This field contains the consistency of the load boundary value later  Hash Will be used in );
hashFunc: Calculation  Hash  Ring mapping and  Key  Mapped hash function ;
hostMap: Corresponding to the physical server name  Host  Structure mapping ;
replicaHostMap:Hash  The mapping of the virtual node in the ring to the real cache server name ;
sortedHostsHashSet:Hash  Ring ;
sync.RWMutex: operation  Hash  The read-write lock used in the ring ;
The general structure is shown above , Let's look at some definitions of constants and errors ;
②  Constant and error definitions
Constants are defined as follows :
const (
 // The format of the host replica name
 hostReplicaFormat = `%s%d`
var (
 // the default number of replicas
 defaultReplicaNum = 10
 // the load bound factor
 // ref:
 loadBoundFactor = 0.25
 // the default Hash function for keys
 defaultHashFunc = func(key string) uint64 {
 out := sha512.Sum512([]byte(key))
 return binary.LittleEndian.Uint64(out[:])
respectively :
defaultReplicaNum: By default , Every real physical server is  Hash  Number of virtual nodes in the ring ;
loadBoundFactor: Load boundary factor ( This field contains the consistency of the load boundary value later  Hash Will be used in );
defaultHashFunc: The default hash function , Here's what I'm using  SHA512  Algorithm , And take unsigned int64, This is similar to the above 0~2^32-1 Somewhat different !
hostReplicaFormat: Virtual node name format , The format of the virtual node here is :%s%d, As mentioned above The format is different , But the truth is the same !
There are also some wrong definitions :
var (
 ErrHostAlreadyExists = errors.New("host already exists")
 ErrHostNotFound = errors.New("host not found")
Indicates that the server has been registered , And cache server not found ;
Let's look at the specific method implementation !
register / Unregister cache server
①  Register cache server
The code for registering the cache server is as follows :
func (c *Consistent) RegisterHost(hostName string) error {
 defer c.Unlock()
 if _, ok := c.hostMap[hostName]; ok {
 return ErrHostAlreadyExists
 c.hostMap[hostName] = &Host{
 Name: hostName,
 LoadBound: 0,
 for i := 0; i < c.replicaNum; i++ {
 hashedIdx := c.hashFunc(fmt.Sprintf(hostReplicaFormat, hostName, i))
 c.replicaHostMap[hashedIdx] = hostName
 c.sortedHostsHashSet = append(c.sortedHostsHashSet, hashedIdx)
 // sort hashes in ascending order
 sort.Slice(c.sortedHostsHashSet, func(i int, j int) bool {
 if c.sortedHostsHashSet[i] < c.sortedHostsHashSet[j] {
 return true
 return false
 return nil