Redis key expiration policy source analysis (lazy deletion + periodic sampling deletion)

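Preface
Conceptually, I only knew that Redis frees the memory of expired keys with a combination of lazy deletion and periodic random deletion, but I had never studied how this is implemented. Today I traced the source code and analyzed how lazy deletion and periodic sampling deletion work.
Lazy deletion
Look up the command in the command table in server.c. The handler for GET is getCommand: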

struct redisCommand redisCommandTable[] = {
    {"module",moduleCommand,-2,
     "admin no-script",
     0,NULL,0,0,0,0,0,0},

    {"get",getCommand,2,
     "read-only fast @string",
     0,NULL,1,1,1,0,0,0},
    ......
}

getCommand is defined in t_string.c and simply calls getGenericCommand.

void getCommand(client *c) {
    getGenericCommand(c);
}

int getGenericCommand(client *c) {
    robj *o; 

    if ((o = lookupKeyReadOrReply(c,c->argv[1],shared.null[c->resp])) == NULL)
        return C_OK;

    if (o->type != OBJ_STRING) {
        addReply(c,shared.wrongtypeerr);
        return C_ERR;
    } else {
        addReplyBulk(c,o);
        return C_OK;
    }   
}

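getGenericCommand then calls lookupKeyReadOrReply to look up whether the given key exists. Its first argument is the current client object, and its second is the client's second command argument, argv[1]. Why the second? For the command get why, the first argument (argv[0]) is the command name get and the second (argv[1]) is why, which is the key. Here is the client struct from server.h: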

typedef struct client {
    ......
    int argc;               /* Num of arguments of current command. */
    robj **argv;            /* Arguments of current command. */
    ......
}

lookupKeyReadOrReply calls lookupKeyRead, which in turn calls lookupKeyReadWithFlags. That function calls expireIfNeeded to check for expiry. If the key has expired, expireIfNeeded returns 1 and the server's keyspace_misses counter is updated (for monitoring these operational metrics, see my other article).

/* Lookup a key for read operations, or return NULL if the key is not found
 * in the specified DB.
 *
 * As a side effect of calling this function:
 * 1. A key gets expired if it reached it's TTL.
 * 2. The key last access time is updated.
 * 3. The global keys hits/misses stats are updated (reported in INFO).
 * 4. If keyspace notifications are enabled, a "keymiss" notification is fired.
 *
 * This API should not be used when we write to the key after obtaining
 * the object linked to the key, but only for read only operations.
 *
 * Flags change the behavior of this command:
 *
 *  LOOKUP_NONE (or zero): no special flags are passed.
 *  LOOKUP_NOTOUCH: don't alter the last access time of the key.
 *
 * Note: this function also returns NULL if the key is logically expired
 * but still existing, in case this is a slave, since this API is called only
 * for read operations. Even if the key expiry is master-driven, we can
 * correctly report a key is expired on slaves even if the master is lagging
 * expiring our key via DELs in the replication link. */
robj *lookupKeyReadWithFlags(redisDb *db, robj *key, int flags) {
    robj *val;

    if (expireIfNeeded(db,key) == 1) {
        /* Key expired. If we are in the context of a master, expireIfNeeded()
         * returns 0 only when the key does not exist at all, so it's safe
         * to return NULL ASAP. */
        if (server.masterhost == NULL) {
            server.stat_keyspace_misses++;
            notifyKeyspaceEvent(NOTIFY_KEY_MISS, "keymiss", key, db->id);
            return NULL;
        }

        /* However if we are in the context of a slave, expireIfNeeded() will
         * not really try to expire the key, it only returns information
         * about the "logical" status of the key: key expiring is up to the
         * master in order to have a consistent view of master's data set.
         *
         * However, if the command caller is not the master, and as additional
         * safety measure, the command invoked is a read-only command, we can
         * safely return NULL here, and provide a more consistent behavior
         * to clients accessign expired values in a read-only fashion, that
         * will say the key as non existing.
         *
         * Notably this covers GETs when slaves are used to scale reads. */
        if (server.current_client &&
            server.current_client != server.master &&
            server.current_client->cmd &&
            server.current_client->cmd->flags & CMD_READONLY)
        {
            server.stat_keyspace_misses++;
            notifyKeyspaceEvent(NOTIFY_KEY_MISS, "keymiss", key, db->id);
            return NULL;
        }
    }
    val = lookupKey(db,key,flags);
    if (val == NULL) {
        server.stat_keyspace_misses++;
        notifyKeyspaceEvent(NOTIFY_KEY_MISS, "keymiss", key, db->id);
    }
    else
        server.stat_keyspace_hits++;
    return val;
}

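expireIfNeeded calls keyIsExpired to decide whether the key has expired and returns 0 if it has not. If the current instance is not a master, it returns 1 immediately, because a replica is not allowed to modify the data set and leaves the actual deletion to the master. On a master it increments the expired keys counter (stat_expiredkeys) and propagates the expiry. Finally, server.lazyfree_lazy_expire (the lazyfree-lazy-expire configuration option) selects how the key is freed: if it is disabled, dbSyncDelete deletes the key synchronously; if it is enabled, dbAsyncDelete deletes it asynchronously.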

int expireIfNeeded(redisDb *db, robj *key) {
    if (!keyIsExpired(db,key)) return 0;

    if (server.masterhost != NULL) return 1;

    server.stat_expiredkeys++;
    propagateExpire(db,key,server.lazyfree_lazy_expire);
    notifyKeyspaceEvent(NOTIFY_EXPIRED,
        "expired",key,db->id);
    int retval = server.lazyfree_lazy_expire ? dbAsyncDelete(db,key) :
                                               dbSyncDelete(db,key);
    if (retval) signalModifiedKey(db,key);
    return retval;
}

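Asynchronous deletion calls bioCreateBackgroundJob in bio.c to create a background job (bio = Background I/O). Background jobs are stored in a linked list, and every new job is appended to the tail. Note that in this code base dbAsyncDelete only hands a value to a bio job when freeing it is expensive enough (above LAZYFREE_THRESHOLD); small values are still freed inline.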

void bioCreateBackgroundJob(int type, void *arg1, void *arg2, void *arg3) {
    struct bio_job *job = zmalloc(sizeof(*job));

    job->time = time(NULL);
    job->arg1 = arg1;
    job->arg2 = arg2;
    job->arg3 = arg3;
    pthread_mutex_lock(&bio_mutex[type]);
    listAddNodeTail(bio_jobs[type],job);  /* append the job to the tail of the list */
    bio_pending[type]++;
    pthread_cond_signal(&bio_newjob_cond[type]);
    pthread_mutex_unlock(&bio_mutex[type]);
}
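
To make the "append at the tail" behaviour concrete, here is a minimal sketch of a FIFO job list. It is not the bio.c implementation: the names sketch_job, sketch_queue, sketch_push and sketch_pop are hypothetical, and the mutex, condition variable and worker thread from the excerpt are deliberately left out. It assumes the consumer takes jobs from the head, so the oldest job runs first.

```c
#include <stdlib.h>

/* Hypothetical sketch, not bio.c: a FIFO of jobs kept in a singly linked
 * list. Locking and the worker thread are intentionally omitted. */
typedef struct sketch_job {
    void *arg;                 /* payload to be freed or processed later */
    struct sketch_job *next;
} sketch_job;

typedef struct {
    sketch_job *head, *tail;
    unsigned long pending;     /* plays the role of bio_pending[type] */
} sketch_queue;

/* Append at the tail, like listAddNodeTail() in the excerpt above.
 * Returns 0 on success, -1 if allocation fails. */
int sketch_push(sketch_queue *q, void *arg) {
    sketch_job *job = malloc(sizeof(*job));
    if (!job) return -1;
    job->arg = arg;
    job->next = NULL;
    if (q->tail) q->tail->next = job; else q->head = job;
    q->tail = job;
    q->pending++;
    return 0;
}

/* Take the oldest job from the head; returns NULL when the queue is empty. */
void *sketch_pop(sketch_queue *q) {
    sketch_job *job = q->head;
    if (!job) return NULL;
    void *arg = job->arg;
    q->head = job->next;
    if (!q->head) q->tail = NULL;
    q->pending--;
    free(job);
    return arg;
}
```

Pushing two payloads and popping twice returns them in insertion order, which is the property the background thread relies on.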

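That completes the lazy deletion path. The overall flow is:

1. When a key is accessed, expireIfNeeded first checks whether the key carries an expire time that has already passed.
2. If the key has not expired, lookupKey runs: an existing key returns its value (hit), and a missing key returns null (miss).
3. If the key has expired, the miss counter is updated and null is returned to the client.
4. On a master, server.lazyfree_lazy_expire is checked. If it is off, the key is deleted synchronously and the flow ends.
5. If it is on, the key is deleted asynchronously: a bio background job is created and appended to the tail of the bio job list, and a background thread frees the memory later.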
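
The flow above can be illustrated with a toy keyspace. This is only a sketch under stated assumptions, not Redis code: toy_db, toy_set and toy_get are hypothetical names, the table is a tiny fixed array instead of a dict, and the caller passes the current time explicitly so the behaviour is deterministic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy keyspace, not Redis code: each entry carries an
 * absolute expire time in milliseconds (-1 means "no expire"). */
#define TOY_SLOTS 8

typedef struct {
    const char *key;       /* NULL means the slot is empty */
    const char *val;
    long long expire_ms;   /* absolute deadline, or -1 for "never" */
} toy_entry;

typedef struct {
    toy_entry slots[TOY_SLOTS];
    long hits, misses, expired;
} toy_db;

static toy_entry *toy_find(toy_db *db, const char *key) {
    for (int i = 0; i < TOY_SLOTS; i++)
        if (db->slots[i].key && strcmp(db->slots[i].key, key) == 0)
            return &db->slots[i];
    return NULL;
}

/* Store a key; returns 0 on success, -1 if the table is full. */
int toy_set(toy_db *db, const char *key, const char *val, long long expire_ms) {
    toy_entry *e = toy_find(db, key);
    for (int i = 0; !e && i < TOY_SLOTS; i++)
        if (!db->slots[i].key) e = &db->slots[i];
    if (!e) return -1;
    e->key = key; e->val = val; e->expire_ms = expire_ms;
    return 0;
}

/* Lazy deletion: the expiry check happens only when the key is read. */
const char *toy_get(toy_db *db, const char *key, long long now_ms) {
    toy_entry *e = toy_find(db, key);
    if (e && e->expire_ms >= 0 && now_ms > e->expire_ms) {
        e->key = NULL;     /* reclaim the slot on access */
        db->expired++;
        e = NULL;
    }
    if (!e) { db->misses++; return NULL; }
    db->hits++;
    return e->val;
}
```

A key set with a deadline of 1000 is returned when read at time 500 and reported as missing when read at time 1001, at which point its slot is reclaimed. Nothing happens to an expired key that is never read, which is why the periodic cycle is also needed.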

Periodic sampling deletion
First, let's clarify a few fields of the global redisServer struct and how they change.

struct redisServer {
    ...
    int active_expire_effort;       /* From 1 (default) to 10, active effort. */
    double stat_expired_stale_perc; /* Percentage of keys probably expired */
    ...
}

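active_expire_effort: ranges from 1 to 10 and literally means "active expire effort". Within a cycle, the number of expired keys usually shrinks as time passes, so this value is used to scale the sampling size, the duration of the fast cycle, the CPU share of the slow cycle (a time limit), and the acceptable stale percentage threshold.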

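stat_expired_stale_perc: a running estimate of the share of keys that are logically expired but still held in memory. It is recomputed at the end of every cycle as a weighted average of the previous value and the expired/sampled ratio of the current cycle, and a fast cycle uses it to decide whether it needs to run at all. (The per-database sampling loop itself repeats while the expired/sampled ratio exceeds config_cycle_acceptable_stale.)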

/* Try to expire a few timed out keys. The algorithm used is adaptive and
 * will use few CPU cycles if there are few expiring keys, otherwise
 * it will get more aggressive to avoid that too much memory is used by
 * keys that can be removed from the keyspace.
 *
 * Every expire cycle tests multiple databases: the next call will start
 * again from the next db, with the exception of exists for time limit: in that
 * case we restart again from the last database we were processing. Anyway
 * no more than CRON_DBS_PER_CALL databases are tested at every iteration.
 *
 * The function can perform more or less work, depending on the "type"
 * argument. It can execute a "fast cycle" or a "slow cycle". The slow
 * cycle is the main way we collect expired cycles: this happens with
 * the "server.hz" frequency (usually 10 hertz).
 *
 * However the slow cycle can exit for timeout, since it used too much time.
 * For this reason the function is also invoked to perform a fast cycle
 * at every event loop cycle, in the beforeSleep() function. The fast cycle
 * will try to perform less work, but will do it much more often.
 *
 * The following are the details of the two expire cycles and their stop
 * conditions:
 *
 * If type is ACTIVE_EXPIRE_CYCLE_FAST the function will try to run a
 * "fast" expire cycle that takes no longer than EXPIRE_FAST_CYCLE_DURATION
 * microseconds, and is not repeated again before the same amount of time.
 * The cycle will also refuse to run at all if the latest slow cycle did not
 * terminate because of a time limit condition.
 *
 * If type is ACTIVE_EXPIRE_CYCLE_SLOW, that normal expire cycle is
 * executed, where the time limit is a percentage of the REDIS_HZ period
 * as specified by the ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC define. In the
 * fast cycle, the check of every database is interrupted once the number
 * of already expired keys in the database is estimated to be lower than
 * a given percentage, in order to avoid doing too much work to gain too
 * little memory.
 *
 * The configured expire "effort" will modify the baseline parameters in
 * order to do more work in both the fast and slow expire cycles.
 */

#define ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP 20 /* Keys for each DB loop. */
#define ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1000 /* Microseconds. */
#define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25 /* Max % of CPU to use. */
#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10 /* % of stale keys after which
                                                   we do extra efforts. */
void activeExpireCycle(int type) {
    /* Adjust the running parameters according to the configured expire
     * effort. The default effort is 1, and the maximum configurable effort
     * is 10. */
    unsigned long
    effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
    config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
    config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                                 ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
    config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                                  2*effort,
    config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
                                    effort;

    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Last DB tested. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL;
    long long start = ustime(), timelimit, elapsed;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
        /* Don't start a fast cycle if the previous cycle did not exit
         * for time limit, unless the percentage of estimated stale keys is
         * too high. Also never repeat a fast cycle for the same period
         * as the fast cycle total duration itself. */
        if (!timelimit_exit &&
            server.stat_expired_stale_perc < config_cycle_acceptable_stale)
            return;

        if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;

        last_fast_cycle = start;
    }

    /* We usually should test CRON_DBS_PER_CALL per iteration, with
     * two exceptions:
     *
     * 1) Don't test more DBs than we have.
     * 2) If last time we hit the time limit, we want to scan all DBs
     * in this iteration, as there is work to do in some DB and we don't want
     * expired keys to use memory for too much time. */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;


    /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
     * time per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = config_cycle_fast_duration; /* in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0;
    long total_expired = 0;


    for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
        /* Expired and checked in a single loop. */
        unsigned long expired, sampled;

        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle there are still
         * a big percentage of keys to expire, compared to the number of keys
         * we scanned. The percentage, stored in config_cycle_acceptable_stale
         * is not fixed, but depends on the Redis configured "expire effort". */
        do {
            unsigned long num, slots;
            long long now, ttl_sum;
            int ttl_samples;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = dictSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            slots = dictSlots(db->expires);
            now = mstime();


            /* When there are less than 1% filled slots, sampling the key
             * space is expensive, so stop here waiting for better times...
             * The dictionary will be resized asap. */
            if (num && slots > DICT_HT_INITIAL_SIZE &&
                (num*100/slots < 1)) break;

            /* The main collection cycle. Sample random keys among keys
             * with an expire set, checking for expired ones. */
            expired = 0;
            sampled = 0;
            ttl_sum = 0;
            ttl_samples = 0;

            if (num > config_keys_per_loop)
                num = config_keys_per_loop;

            /* Here we access the low level representation of the hash table
             * for speed concerns: this makes this code coupled with dict.c,
             * but it hardly changed in ten years.
             *
             * Note that certain places of the hash table may be empty,
             * so we want also a stop condition about the number of
             * buckets that we scanned. However scanning for free buckets
             * is very fast: we are in the cache line scanning a sequential
             * array of NULL pointers, so we can scan a lot more buckets
             * than keys in the same time. */
            long max_buckets = num*20;
            long checked_buckets = 0;


            while (sampled < num && checked_buckets < max_buckets) {
                for (int table = 0; table < 2; table++) {
                    if (table == 1 && !dictIsRehashing(db->expires)) break;

                    unsigned long idx = db->expires_cursor;
                    idx &= db->expires->ht[table].sizemask;
                    dictEntry *de = db->expires->ht[table].table[idx];
                    long long ttl;

                    /* Scan the current bucket of the current table. */
                    checked_buckets++;
                    while(de) {
                        /* Get the next entry now since this entry may get
                         * deleted. */
                        dictEntry *e = de;
                        de = de->next;

                        ttl = dictGetSignedIntegerVal(e)-now;
                        if (activeExpireCycleTryExpire(db,e,now)) expired++;
                        if (ttl > 0) {
                            /* We want the average TTL of keys yet
                             * not expired. */
                            ttl_sum += ttl;
                            ttl_samples++;
                        }
                        sampled++;
                    }
                }
                db->expires_cursor++;
            }
            total_expired += expired;
            total_sampled += sampled;


            /* Update the average TTL stats for this database. */
            if (ttl_samples) {
                long long avg_ttl = ttl_sum/ttl_samples;

                /* Do a simple running average with a few samples.
                 * We just use the current estimate with a weight of 2%
                 * and the previous estimate with a weight of 98%. */
                if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
            }

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of milliseconds return to the
             * caller waiting for the other active expire cycle. */
            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                elapsed = ustime()-start;
                if (elapsed > timelimit) {
                    timelimit_exit = 1;
                    server.stat_expired_time_cap_reached_count++;
                    break;
                }
            }
            /* We don't repeat the cycle for the current database if there are
             * an acceptable amount of stale keys (logically expired but yet
             * not reclaimed). */

        } while (sampled == 0 ||
                 (expired*100/sampled) > config_cycle_acceptable_stale);
    }

    elapsed = ustime()-start;
    server.stat_expire_cycle_time_used += elapsed;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}

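Well, is the code a bit of a blur? To be honest, it was for me too when I first read it, haha. I still do not fully understand every detail, but I have grasped the overall logic, so here are the key points.
1. Rescale the active expire effort from 1-10 to 0-9: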

effort = server.active_expire_effort-1

2. Before each cycle starts, the sampling size is adjusted by the formula below (base value 20):

config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP + ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort

3. Before each cycle starts, the duration allowed for a fast cycle is adjusted by the formula below (base value 1000 microseconds):

config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION + ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort

4. Before each cycle starts, the CPU usage threshold of the slow cycle is raised:

config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC + 2*effort

5. Before each cycle starts, the acceptable stale percentage, which decides whether sampling is repeated, is lowered by the formula below:

config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE - effort
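
To see what these four formulas produce, the following sketch evaluates them for a given effort. The constants are copied from the excerpt above; the expire_params struct and the expire_params_for function are hypothetical helpers introduced only for this illustration, and the input is assumed to be in the range 1 to 10.

```c
/* Constants copied from the source excerpt. */
#define ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP 20
#define ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1000
#define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25
#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10

/* Hypothetical helper struct, for illustration only. */
typedef struct {
    unsigned long keys_per_loop;     /* keys sampled per DB loop */
    unsigned long fast_duration_us;  /* fast cycle budget, microseconds */
    unsigned long slow_time_perc;    /* slow cycle CPU share, percent */
    unsigned long acceptable_stale;  /* stale threshold, percent */
} expire_params;

/* Assumes 1 <= active_expire_effort <= 10. Integer division is kept
 * exactly as in the original expressions. */
expire_params expire_params_for(int active_expire_effort) {
    unsigned long effort = (unsigned long)active_expire_effort - 1;
    expire_params p;
    p.keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                      ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort;
    p.fast_duration_us = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                         ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort;
    p.slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC + 2*effort;
    p.acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE - effort;
    return p;
}
```

With the default effort of 1 the parameters are 20 keys, 1000 microseconds, 25% and 10%. With the maximum effort of 10 they become 65 keys, 3250 microseconds, 43% and 1%, so a higher effort samples more keys, spends more time, and tolerates fewer stale keys.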

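6. For a fast cycle: if the previous cycle did not exit because of the time limit, and the estimated global stale percentage is below the adjusted threshold, the cycle returns immediately: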

if (!timelimit_exit && server.stat_expired_stale_perc < config_cycle_acceptable_stale)
    return;

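7. For a fast cycle: if the start time of the last fast cycle plus twice the fast cycle duration is still later than the current time, the cycle returns immediately: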

if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;
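
The two early-return checks in points 6 and 7 can be collected into one pure function, which makes the conditions easier to reason about. This is a sketch: fast_cycle_should_run and its parameter names are hypothetical, and the static variables of the original are passed in as arguments.

```c
/* Hypothetical pure function mirroring the two early-return checks of a
 * fast cycle. Returns 1 if the fast cycle would proceed, 0 otherwise. */
int fast_cycle_should_run(int timelimit_exit,
                          double stale_perc,
                          unsigned long acceptable_stale,
                          long long now_us,
                          long long last_fast_cycle_us,
                          unsigned long fast_duration_us)
{
    /* Point 6: nothing suggests leftover work, so skip. */
    if (!timelimit_exit && stale_perc < acceptable_stale)
        return 0;

    /* Point 7: a fast cycle ran too recently, so skip. */
    if (now_us < last_fast_cycle_us + (long long)fast_duration_us*2)
        return 0;

    return 1;
}
```

For example, with a 1000 microsecond fast duration, a fast cycle that started at time 0 blocks any new fast cycle until time 2000, even if the previous slow cycle hit its time limit.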

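8. From config_cycle_slow_time_perc and server.hz (the frequency at which the server's periodic task runs, not the CPU clock frequency), a time limit in microseconds is computed for the cycle. The elapsed time is checked once every 16 iterations; when it exceeds the limit, the current cycle stops immediately and the number of early exits is recorded. (My understanding is that this bounds the CPU time spent on periodic expiration. If that is wrong, corrections are welcome.)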

    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;

    ...

    /* We can't block forever here even if there are many keys to
    * expire. So after a given amount of milliseconds return to the
    * caller waiting for the other active expire cycle. */
    if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
        elapsed = ustime()-start;
        if (elapsed > timelimit) {
            timelimit_exit = 1;
            server.stat_expired_time_cap_reached_count++;
            break;
        }
    }
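
As a worked example of point 8, the sketch below reproduces the time limit formula and the "every 16 iterations" test as two small functions. The function names are hypothetical, and hz is assumed to be positive (the source comment above mentions that it is usually 10).

```c
/* Hypothetical helper: slow cycle budget in microseconds, using the same
 * integer formula as the excerpt. Assumes hz > 0. */
long long slow_cycle_timelimit_us(unsigned long slow_time_perc, int hz) {
    long long timelimit = (long long)slow_time_perc*1000000/hz/100;
    if (timelimit <= 0) timelimit = 1;
    return timelimit;
}

/* Hypothetical helper: the clock is only read when the low four bits of
 * the iteration counter are zero, i.e. on iterations 16, 32, 48, ... */
int should_check_clock(int iteration) {
    return (iteration & 0xf) == 0;
}
```

With hz = 10, each call has a period of 100 milliseconds. A 25% share therefore gives a budget of 25000 microseconds (25 ms), and the maximum effort (43%) gives 43000 microseconds. Checking the clock only every 16 iterations keeps the cost of ustime() low, at the price of possibly overshooting the limit slightly.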

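9. Each cycle processes the databases in order (0 to server.dbnum-1, that is 0-15 with the default of 16 databases). The static variable current_db records the progress so that the next cycle continues where this one stopped: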


        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;
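
The round-robin selection in point 9 comes down to a counter taken modulo the number of databases. A minimal sketch, where next_db_index is a hypothetical helper and the counter is passed by pointer instead of being a static variable:

```c
/* Hypothetical helper: returns the index of the DB to scan and advances
 * the counter, so an interrupted cycle resumes from the next DB. */
unsigned int next_db_index(unsigned int *current_db, unsigned int dbnum) {
    unsigned int idx = *current_db % dbnum;
    (*current_db)++;
    return idx;
}
```

Starting from a counter of 14 with 16 databases, three successive calls return 14, 15 and 0. The counter itself keeps growing and is only wrapped when it is used.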

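10. Up to config_keys_per_loop keys are sampled from the keys that have an expire set (the redisDb.expires dict) and checked for expiry. In this version the sample is taken by walking the buckets in order with db->expires_cursor, not by picking random keys. Every dict has a second hash table that is used during rehashing; if the expires dict is not currently rehashing, the second table is skipped.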

for (int table = 0; table < 2; table++) {
    if (table == 1 && !dictIsRehashing(db->expires)) break;

    unsigned long idx = db->expires_cursor;
    idx &= db->expires->ht[table].sizemask;
    dictEntry *de = db->expires->ht[table].table[idx];

    ...
}
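
The line idx &= sizemask in point 10 maps an ever-growing cursor onto a bucket index. A small sketch of that mapping, assuming (as in dict.c) that table sizes are powers of two so that sizemask equals size minus one; bucket_index is a hypothetical name:

```c
/* Hypothetical helper: map a cursor to a bucket index.
 * Assumes table_size is a power of two, so (table_size - 1) is a mask
 * with all low bits set. */
unsigned long bucket_index(unsigned long cursor, unsigned long table_size) {
    return cursor & (table_size - 1);
}
```

During a rehash the same cursor value is applied to both tables with their own masks: a cursor of 5 selects bucket 1 in a table of size 4 and bucket 5 in a table of size 8.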

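11. Each sampled key is checked and, if expired, deleted (function activeExpireCycleTryExpire). For keys that have not expired yet, the remaining TTL is summed and counted so that avg_ttl can be computed: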

                    while(de) {
                        /* Get the next entry now since this entry may get
                         * deleted. */
                        dictEntry *e = de;
                        de = de->next;

                        ttl = dictGetSignedIntegerVal(e)-now;
                        if (activeExpireCycleTryExpire(db,e,now)) expired++;
                        if (ttl > 0) {
                            /* We want the average TTL of keys yet
                             * not expired. */
                            ttl_sum += ttl;
                            ttl_samples++;
                        }
                        sampled++;
                    }
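
The sums collected in point 11 feed the avg_ttl update shown in the full function earlier. The sketch below isolates that update as a pure function. update_avg_ttl is a hypothetical name; the arithmetic follows the excerpt, including its integer divisions.

```c
/* Hypothetical helper reproducing the avg_ttl running average:
 * the new sample weighs 2% and the previous estimate 98%. */
long long update_avg_ttl(long long prev_avg, long long ttl_sum,
                         int ttl_samples) {
    if (ttl_samples == 0) return prev_avg;   /* nothing sampled */
    long long avg_ttl = ttl_sum / ttl_samples;
    if (prev_avg == 0) prev_avg = avg_ttl;   /* first estimate */
    return (prev_avg/50)*49 + (avg_ttl/50);
}
```

For instance, with a previous estimate of 1000 ms and two sampled keys whose remaining TTLs sum to 4000 ms, the sample average is 2000 and the new estimate is 980 + 40 = 1020 ms. Because of the integer divisions the result is slightly truncated for values that are not multiples of 50.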

12. If no key was sampled in this pass, or the share of expired keys among the sampled keys is greater than the config_cycle_acceptable_stale threshold, the same database is sampled again:

do {

    ...

} while (sampled == 0 || (expired*100/sampled) > config_cycle_acceptable_stale);
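
The loop condition in point 12 can be checked with concrete numbers. A sketch with a hypothetical function name, using the same integer arithmetic as the source:

```c
/* Hypothetical helper: 1 if the sampling loop would run again for the
 * same database, 0 if it would move on. */
int should_repeat_db(unsigned long expired, unsigned long sampled,
                     unsigned long acceptable_stale) {
    return sampled == 0 || (expired*100/sampled) > acceptable_stale;
}
```

With the default threshold of 10, finding 3 expired keys among 20 sampled (15%) repeats the loop, while 2 expired among 20 (exactly 10%) does not, because the comparison is strictly greater. In the real function the loop can also leave earlier through the break statements for an empty dict, a sparsely filled table, or the time limit.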

13. Finally, the server's stat_expired_stale_perc is updated.
Formula: server.stat_expired_stale_perc = (current_perc*0.05) + (server.stat_expired_stale_perc*0.95)

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
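
The update in point 13 is an exponential moving average in which the current cycle contributes 5%. A minimal sketch as a pure function, where update_stale_perc is a hypothetical name:

```c
/* Hypothetical helper: blend the expired/sampled ratio of this cycle
 * (weight 0.05) into the previous estimate (weight 0.95). */
double update_stale_perc(double prev, long total_expired,
                         long total_sampled) {
    double current_perc;
    if (total_sampled)
        current_perc = (double)total_expired/total_sampled;
    else
        current_perc = 0;
    return (current_perc*0.05) + (prev*0.95);
}
```

Starting from an estimate of 0, a cycle in which 10 of 20 sampled keys were expired moves the estimate to 0.025. A cycle that samples nothing lets the estimate decay by 5%, for example from 0.2 to 0.19. The small weight means a single unusual cycle changes the estimate only a little.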

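So, it is not as hard as you expected, is it? The business logic we write every day may well be more complex than this. My own C is only average, just the basics I learned as an undergraduate, but when I meet a keyword I do not know I look it up, and by calmly reading line by line I can always work it out in the end.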
