Routing and Doc Versioning


1. Routing


how does Elasticsearch know where to store docs?
how are documents found once indexed?
=> through routing

  • Routing : process of resolving a shard for a doc

  • elasticsearch uses hashing to select shard
  • changing routing equation is possible
  • shard_num = hash(_routing) % num_primary_shards
  • 2. How Elasticsearch Reads Data

  • coordinating node receives request
  • find which replication goup to look for using routing
  • select shard using Adaptive replica selection
  • node returns response

  • 3. How Elasticsearch Writes Data

  • coordinating node receives request
  • find shard to write using routing
  • data is sent to primary shard
  • primary shard validates data and save
  • data is locally shared among replication group

  • when replication fails, elasticsearch goes through recovery process
  • ex) when pri shard fail while replication
  • new pri is elected, but data between shards are not the same
  • primary terms and sequence numbers are used to resolve

  • Primary terms : way to distinguish between old and new primary shards
  • counter for how many times the pri shard changed
  • primary terms is appended to write operation
  • can check if pri shard changed since request

  • Sequence number : way to track operation
  • counter incremented for each operation
  • appended to write operation
  • pri shard increases sequence number

  • for large indexes, Elasticsearch uses checkpoints
  • replica has local check point
  • replication group has global checkpoint
  • 4. Document Versioning


  • only stores recent document

  • store version number of document

  • internal vs external versioning
  • internal : default
  • external : when maintaining versions outside of elasticsearch ex. RDBMS

  • above versioning is replaced by a better way
  • 4-1. Optimistic Concurrency Control


    prevent old doc overwriting new doc
  • update should fail if data changed, then try again
  • used to be handled by doc version, but now handled using Primary terms and Sequence number
  • POST /{index name}/_update/{doc name}?if_primary_term={X}&if_seq_no={Y}
  • if primary_term and seq_no are not the same, error is thrown
  • error should be handled in application level and retried