Redisのrehash方式

3897 ワード

hash tableは効率的なデータ構造であり、key-valueストレージで広く用いられており、Redisのdictは実際には典型的なhash table実装である.
rehashはhash tableの大きさが需要を満たすことができず、hash衝突後に必要とされる拡張hash tableの操作であるが、実際には通常の方法は確かに追加のhash tableを確立し、元のhash tableのデータを新しいデータに再入力し、新しいhashテーブルを生成することである.
redisのrehashにはlazy rehashingとactive rehashingの2つの方法が含まれています

lazy rehashing:dictを操作するたびにslotのrehash

を実行する

active rehashing:100 msごとに1 ms時間でrehashを行います.

dict実装では主に以下の構造体が用いられるが,実は典型的なチェーンhashである.
1つのdictには2つのhash tableがあり、dictht構造によって管理され、番号は0と1である.
使用は0番hash tableを優先的に使用し、スペースが足りない場合はdictExpandを呼び出してhash tableを拡張します.この場合、1番hash tableをインクリメンタルrehash用に用意します.rehashが完了したら0番をリリースし、1番を0番に保存します.
rehashidxは、ht[0]にrehashを必要とする次の項目のインデックスであり、rehashを必要としない場合は-1に設定されます.すなわち−1の場合はrehashを行わないことを示す.
iteratorsは、現在のdictの反復器数を記録します.主に反復器がある場合にrehashを回避するためです.反復器がある場合にrehashが値の損失や重複を引き起こす可能性があります.
dicthtのtableは配列+ポインタ形式のhashテーブルであり、sizeテーブルhash配列(バケツ)の大きさであり、usedはhashテーブルの要素個数を表し、この2つの値はrehash、resizeプロセスと密接に関連している.sizemaskはsize-1に等しく、hash値を配列にマッピングしやすいようにします.

typedef struct dictEntry {

void *key;

void *val;

struct dictEntry *next;

} dictEntry;

typedef struct dictht {

dictEntry **table;

unsigned long size;//hash    

unsigned long sizemask;//hash     

unsigned long used;//    

} dictht;

typedef struct dict {

dictType *type;

void *privdata;

dictht ht[2];

int rehashidx; /* rehashing not in progress if rehashidx == -1 */

int iterators; /* number of iterators currently running */

} dict;

typedef struct dictIterator {

dict *d;

int table;

int index;

dictEntry *entry, *nextEntry;

} dictIterator;

いつdictが拡張しますか?
データ挿入時にdictKeyIndexが呼び出され、このメソッドでは_が呼び出されます.dictExpandIfNeededは、dictにrehashが必要かどうかを判断し、dictの要素がバケツの個数より大きい場合、dictExpand拡張hashを呼び出す

/* Expand the hash table if needed */


static int _dictExpandIfNeeded(dict *d)


{


/* If the hash table is empty expand it to the intial size,


* if the table is “full” dobule its size. */


if (dictIsRehashing(d)) return DICT_OK;


if (d->ht[0].size == 0)


return dictExpand(d, DICT_HT_INITIAL_SIZE);


if (d->ht[0].used >= d->ht[0].size && dict_can_resize)


return dictExpand(d, ((d->ht[0].size > d->ht[0].used) ?


d->ht[0].size : d->ht[0].used)*2);


return DICT_OK;


}

dictExpandの作業は主にhashテーブルを初期化し、デフォルトでは2倍(単にバケツの2倍ではない)拡大してht[1]に値を割り当て、その後状態をrehashingに変更し、このdictがrehashingを開始する

拡張プロセスの実行方法
rehashは主にdictRehashで完了します.まず、いつrehashが行われるか見てみましょう.
Active rehashing:serverCronでは、バックグラウンドサブスレッドがない場合はincrementallyRehashが呼び出され、最終的にdictRehashMillisecondsが呼び出されます.incrementallyRehashは時間が長く、rehashの個数も多い.ここでは毎回1 millisecond rehash操作を実行する.rehashが完了していない場合は、次のloopで実行を続行します.

/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */


int dictRehashMilliseconds(dict *d, int ms) {


long long start = timeInMilliseconds();


int rehashes = 0;


while(dictRehash(d,100)) {


rehashes += 100;


if (timeInMilliseconds()-start > ms) break;


}


return rehashes;


}

lazy rehashing:_dictRehashStepでもdictRehashが呼び出され、_dictRehashStepは毎回ht[0]からht[1]までの値をrehashするだけですが、_dictRehashStepは、dictGetRandomKey、dictFind、dictGenericDelete、dictAddによって呼び出されるため、dictが削除されるたびに呼び出されるので、rehashのプロセスが加速するに違いありません.
rehashの作り方を見てみましょう.dictRehashは、rehash n個の要素を増分するたびに、自動的にサイズを調整する際にht[1]のサイズが設定されているため、rehashの主なプロセスはht[0]を遍歴してkeyを取得し、そのkeyをht[1]のバケツのサイズで再rehashし、rehashが完了した後にht[0]をht[1]に向け、ht[1]を空にすることである.この過程でrehashidxは非常に重要であり、前回rehashのht[0]の下付き位置を表す.
redisはdictのrehashをバッチで行い,要求をブロックすることなく,比較的優雅に設計されていることがわかる.
ただし、dictFindを呼び出す場合は、2つのdictテーブルをクエリーする必要がある場合があります.唯一の最適化判定は、keyがht[0]に存在せず、rehashing状態でない場合、速度が空に戻ることができる.rehashing状態の場合、ht[0]に値がない場合はht[1]で検索する必要があります.
dictAddの場合、ステータスがrehashingの場合はht[1]に値を挿入し、そうでない場合ht[0]

[Django]Pinterestの作成(7)

Youtube Clone Coding (8. webpack)