MySQL Execution Plan--ファイルソート(file sort)

5315 ワード

MySQLでORDER BY文を処理する場合、クエリがインデックスの規則性を利用できない場合は、追加の操作でデータをソートする必要があります.MySQLには3つのソートアルゴリズムがあります.
1、    (Quick Sort),          ,               ,      ,           ,            ,         。               ,           。
2、    (Quick Sort),              ,          (divide-and-conquer)  (       (divide)             ,  (conquer)               "  "   ,     )。
3、                   ,                    ,          .

 
MySQLには、次の3つのデータのソート方法があります.
1、    (    )
2、    (    )
3、   

ヒープソートはMySQL 5.6に導入されており、LIMIT M OFFSETN操作では、M+N個のレコードを排出する必要があるため、「大頂ヒープ」または「小頂ヒープ」の2種類のヒープソートアルゴリズムを利用して最適化し、すべてのレコードソートに対するリソース消費を回避することができる.
通常のソートと最適化ソートは、レコードを複数のレコードブロックに分割し、各レコードブロックを「クイックソート」アルゴリズムでソートし、ソート後のレコードブロックを「集計ソート」アルゴリズムでマージします.通常のソートと最適化ソートでは、中間ソート結果を格納するために一時ファイルを使用する必要があるため、ファイルソート(FileSort)と呼ばれます.クエリー・プランのextraセクションに「Using filesort」が表示されている場合、クエリーは「通常ソート」または「最適化ソート」を使用していることを示しますが、必ずしもIO操作をもたらすとは限りません.「中間結果セット」が大きい場合にのみ、「中間結果セット」がディスクに書き込まれます.
 
関連パラメータ:
1、  max_length_for_sort_data    FileSort            ,    MySQL 8.0.20      。
2、  sort_buffer_size           buffer  ,    256KB,         "    "     "    "      。
    A)  MySQL 5.7      ,       sort_buffer_size     ,    sort_buffer_size       。
    B)  MySQL 8.0.12      sort_buffer_size    ,             ,    sort_buffer_size         。
3、  read_rnd_buffer_size                  ,    256KB,         "       "   "   "IO  。
4、  tmp_dir             , tmp_dir        tmp_dir           IO    ,      。

 
MySQL公式ドキュメントHow MySQL Does Sorting(filesort)
MySQL has two filesort algorithms for sorting and retrieving results. The original method uses only the ORDER BY columns. The modified method uses not just the ORDER BY columns, but all the columns used in the query.

The optimizer selects which filesort algorithm to use. Prior to MySQL 4.1, it uses the original algorithm. As of MySQL 4.1, it normally uses the modified algorithm except when BLOB or TEXT columns are involved, in which case it uses the original algorithm.

The original filesort algorithm works as follows:

    Read all rows according to key or by table scanning. Rows that do not match the WHERE clause are skipped.
    For each row, store a pair of values in a buffer (the sort key and the row pointer). The size of the buffer is the value of the sort_buffer_size system variable.
    When the buffer gets full, run a qsort (quicksort) on it and store the result in a temporary file. Save a pointer to the sorted block. (If all pairs fit into the sort buffer, no temporary file is created.)
    Repeat the preceding steps until all rows have been read.
    Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file.
    Repeat the following until there are fewer than MERGEBUFF2 (15) blocks left.
    On the last multi-merge, only the pointer to the row (the last part of the sort key) is written to a result file.
    Read the rows in sorted order by using the row pointers in the result file. To optimize this, we read in a big block of row pointers, sort them, and use them to read the rows in sorted order into a row buffer. The size of the buffer is the value of the read_rnd_buffer_size system variable. The code for this step is in the sql/records.cc source file.

One problem with this approach is that it reads rows twice: One time when evaluating the WHERE clause, and again after sorting the pair values. And even if the rows were accessed successively the first time (for example, if a table scan is done), the second time they are accessed randomly. (The sort keys are ordered, but the row positions are not.)

The modified filesort algorithm incorporates an optimization such that it records not only the sort key value and row position, but also the columns required for the query. This avoids reading the rows twice. The modified filesort algorithm works like this:

    Read the rows that match the WHERE clause.
    For each row, record a tuple of values consisting of the sort key value and row position, and also the columns required for the query.
    Sort the tuples by sort key value
    Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time.

Using the modified filesort algorithm, the tuples are longer than the pairs used in the original method, and fewer of them fit in the sort buffer (the size of which is given by sort_buffer_size). As a result, it is possible for the extra I/O to make the modified approach slower, not faster. To avoid a slowdown, the optimization is used only if the total size of the extra columns in the sort tuple does not exceed the value of the max_length_for_sort_data system variable. (A symptom of setting the value of this variable too high is that you should see high disk activity and low CPU activity.)

 
参考資料:https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_sort_buffer_sizehttps://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_read_rnd_buffer_sizehttps://dev.mysql.com/doc/refman/8.0/en/order-by-optimization.html