node-haystack Episode 5: Volume


This article describes the structure of a volume file.
Layout
A volume is the entity that the service operates on. A volume contains many blocks, and the blocks store the actual data.
Volume
The layout of a volume file looks like this:
| Field | Size | Description |
| --- | --- | --- |
| Magic number of volume | 8 bytes | 0x6b63617453796148 or { 'H', 'a', 'y', 'S', 't', 'a', 'c', 'k' } |
| Block 0 | Varies | |
| Block 1 | Varies | |
| ... | Varies | |
| Block N | Varies | |
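The two spellings of the magic number agree only if the 64-bit value is stored least significant byte first. The following is a minimal Node.js sketch of writing and checking that magic number, assuming little-endian storage; the function names `writeVolumeMagic` and `isVolume` are made up for illustration and are not part of node-haystack.

```javascript
// Sketch only: the names below are illustrative, not node-haystack's API.
const VOLUME_MAGIC = 0x6b63617453796148n; // the 64-bit value from the layout above

// Create the first 8 bytes of a new volume file.
function writeVolumeMagic() {
  const head = Buffer.alloc(8);
  head.writeBigUInt64LE(VOLUME_MAGIC, 0); // assumption: little-endian storage
  return head;
}

// Check that a buffer starts with the volume magic number.
function isVolume(buf) {
  return buf.length >= 8 && buf.readBigUInt64LE(0) === VOLUME_MAGIC;
}

const head = writeVolumeMagic();
console.log(head.toString("latin1")); // the bytes spell "HayStack"
console.log(isVolume(head));          // true
```

Because the magic number occupies exactly 8 bytes, Block 0 starts at offset 8, which is already on an 8-byte boundary.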
Block
The layout of a block looks like this:
| Field | Size | Description |
| --- | --- | --- |
| Magic number of header | 4 bytes | 0x53796148 or { 'H', 'a', 'y', 'S' } |
| Key | 16 bytes | A UUID value |
| Cookie | 4 bytes | Cookie for guarding against attacks |
| Tag | 2 bytes | Optional field for the user |
| Flag | 2 bytes | Whether the block has been removed |
| Data size | 4 bytes | Length of the data, in bytes |
| Checksum | 4 bytes | Parity checksum |
| Data | Varies | The content |
| Padding | Varies | 0 to 7 bytes, so that blocks align on an 8-byte boundary |
| Magic number of footer | 4 bytes | 0x6b636174 or { 't', 'a', 'c', 'k' } |
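As a rough illustration of this layout, here is a Node.js sketch that serializes one block. It rests on several assumptions that the article does not confirm: fields are stored little-endian (as the magic numbers suggest), padding bytes are zero, and padding rounds the whole block up to a multiple of 8 bytes. The function `parity32` is a hypothetical stand-in for the parity checksum, whose real algorithm is not specified here, and `encodeBlock` and `paddingFor` are likewise illustrative names.

```javascript
// Sketch only: names and the checksum routine are illustrative assumptions.
const HEADER_MAGIC = 0x53796148; // bytes 'H','a','y','S' when stored little-endian
const FOOTER_MAGIC = 0x6b636174; // bytes 't','a','c','k' when stored little-endian
const HEADER_SIZE = 36;          // 4 + 16 + 4 + 2 + 2 + 4 + 4
const FOOTER_SIZE = 4;

// Hypothetical stand-in for the parity checksum: XOR of the data taken as
// little-endian 32-bit words. The real algorithm is not given in the article.
function parity32(data) {
  let sum = 0;
  for (let i = 0; i < data.length; i++) {
    sum ^= data[i] << (8 * (i % 4));
  }
  return sum >>> 0; // force an unsigned 32-bit result
}

// Padding needed so that the whole block is a multiple of 8 bytes long.
function paddingFor(dataLength) {
  return (8 - ((HEADER_SIZE + dataLength + FOOTER_SIZE) % 8)) % 8;
}

// Serialize one block. `key` is a 16-byte Buffer holding the UUID and
// `data` is a Buffer holding the content.
function encodeBlock({ key, cookie, tag = 0, flag = 0, data }) {
  if (key.length !== 16) throw new RangeError("key must be 16 bytes");
  const pad = paddingFor(data.length);
  const block = Buffer.alloc(HEADER_SIZE + data.length + pad + FOOTER_SIZE);
  let off = block.writeUInt32LE(HEADER_MAGIC, 0);
  off += key.copy(block, off);
  off = block.writeUInt32LE(cookie, off);
  off = block.writeUInt16LE(tag, off);
  off = block.writeUInt16LE(flag, off);
  off = block.writeUInt32LE(data.length, off);
  off = block.writeUInt32LE(parity32(data), off);
  off += data.copy(block, off);
  off += pad; // Buffer.alloc already zero-filled the padding bytes
  block.writeUInt32LE(FOOTER_MAGIC, off);
  return block;
}

const block = encodeBlock({
  key: Buffer.alloc(16, 0xab),
  cookie: 0x12345678,
  data: Buffer.from("hello"),
});
console.log(block.length); // 48: 36 header + 5 data + 3 padding + 4 footer
```

With a 36-byte header and a 4-byte footer, the fixed part of a block is already 40 bytes, so under these assumptions the padding depends only on the data length modulo 8.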
NOTE: The data size field of a block is a 32-bit integer. Read as unsigned, that caps the data field of a block at 2^32 - 1 bytes, just under 4 GB.
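The data size field is also what lets a reader step from one block to the next. The sketch below parses the block at a given offset and computes where the following block would start. It is not the project's code: `readBlock` is an illustrative name, and the sketch assumes little-endian fields, an unsigned data size, and blocks padded to a multiple of 8 bytes.

```javascript
// Sketch only: names are illustrative; little-endian storage is an assumption.
const HEADER_MAGIC = 0x53796148;
const FOOTER_MAGIC = 0x6b636174;
const HEADER_SIZE = 36;
const FOOTER_SIZE = 4;

// Parse the block that starts at `offset` and report where the next one starts.
function readBlock(volume, offset) {
  if (volume.readUInt32LE(offset) !== HEADER_MAGIC) {
    throw new Error(`bad header magic at offset ${offset}`);
  }
  const size = volume.readUInt32LE(offset + 28); // unsigned: 0 .. 2^32 - 1
  const dataStart = offset + HEADER_SIZE;
  const unpadded = HEADER_SIZE + size + FOOTER_SIZE;
  const total = Math.ceil(unpadded / 8) * 8; // assumption: padded to 8 bytes
  if (volume.readUInt32LE(offset + total - FOOTER_SIZE) !== FOOTER_MAGIC) {
    throw new Error(`bad footer magic at offset ${offset}`);
  }
  return {
    key: volume.subarray(offset + 4, offset + 20).toString("hex"),
    cookie: volume.readUInt32LE(offset + 20),
    tag: volume.readUInt16LE(offset + 24),
    flag: volume.readUInt16LE(offset + 26),
    size,
    checksum: volume.readUInt32LE(offset + 32),
    data: volume.subarray(dataStart, dataStart + size),
    next: offset + total,
  };
}

// A tiny volume: the 8-byte volume magic followed by one empty block.
const demo = Buffer.alloc(8 + 40);
demo.write("HayStack", 0, "latin1");
demo.writeUInt32LE(HEADER_MAGIC, 8);      // all other header fields stay zero
demo.writeUInt32LE(FOOTER_MAGIC, 8 + 36); // footer follows the 36-byte header
console.log(readBlock(demo, 8).next);     // 48
```

Checking the footer magic at the computed position is a cheap way to notice a corrupted size field before trusting the offset of the next block.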
Index
To speed up block lookup, a hash table is used to index the blocks. The key type of the hash table is std::string. The following information about each block is kept in the hash table:
| Field | Size | Description |
| --- | --- | --- |
| Key | 16 bytes | The UUID key |
| Cookie | 4 bytes | |
| Tag | 2 bytes | |
| Flag | 2 bytes | 0x0000 for normal, 0x0001 for removed. Since memory is aligned on at least a 32-bit boundary, keeping this field should be harmless. |
| Size | 4 bytes | |
| Position | 8 bytes | The offset of the beginning of the block in the volume file. |
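To make the index more concrete, here is a small JavaScript sketch in which a `Map` stands in for the C++ hash table. The class name `BlockIndex` and its methods are invented for illustration. Rejecting a lookup whose cookie does not match is an assumption about how the cookie is meant to be used, not something this article states.

```javascript
// Sketch only: a Map stands in for the C++ hash; all names are illustrative.
const FLAG_NORMAL = 0x0000;
const FLAG_REMOVED = 0x0001;

class BlockIndex {
  constructor() {
    this.entries = new Map(); // UUID string -> index entry
  }

  // Record a block. `position` is a BigInt because the field is 8 bytes wide.
  add(key, { cookie, tag = 0, size, position }) {
    this.entries.set(key, { cookie, tag, flag: FLAG_NORMAL, size, position });
  }

  // Return the entry, or undefined when the key is unknown, the block is
  // removed, or the caller's cookie does not match (assumed behaviour).
  find(key, cookie) {
    const entry = this.entries.get(key);
    if (!entry || entry.flag === FLAG_REMOVED || entry.cookie !== cookie) {
      return undefined;
    }
    return entry;
  }

  // Mark a block as removed; the entry stays in the index with its flag set.
  remove(key) {
    const entry = this.entries.get(key);
    if (!entry) return false;
    entry.flag = FLAG_REMOVED;
    return true;
  }
}

const index = new BlockIndex();
const uuid = "0f8fad5b-d9cb-469f-a165-70867728950e";
index.add(uuid, { cookie: 0x1234, size: 5, position: 8n });
console.log(index.find(uuid, 0x1234).position); // 8n
console.log(index.find(uuid, 0x9999));          // undefined
```

With such an index, a read needs only one hash lookup to learn the block's position and size before touching the volume file.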
Cache
Implementing a cache in C++ is painful. JavaScript, a flexible high-level language, is a better choice here, even though it may cost a little performance. The cost is worth it.