回転:Hide data inside pointers(ポインタにデータを隠す)

7137 ワード

この文書では、ポインタ内の未使用のビットを使用してデータを非表示にする方法について説明します.
When we write C code, pointers are everywhere. We can make a little extra use of pointers and sneak in some extra information in them. To pull this trick off, we exploit the natural alignment of data in memory.
Data in memory is not stored at any arbitrary address. The processor always reads memory in chunks at the same size as its word size; and thus for efficiency reasons, the compiler assigns addresses to entities in memory as multiples of their size in bytes. Therefore on a 32 bit processor, a 4-byte int will definitely reside at a memory address that is evenly divisible by 4.
Here, I am going to assume a system where size of int and size of pointer are 4 bytes.
Now let us consider a pointer to an int . As said above, the int can be located at memory addresses 0x1000 or 0x1004 or 0x1008, but never at 0x1001 or 0x1002 or 0x1003 or any other address that is not divisible by 4. Now, any binary number which is a multiple of 4 will end with 00 . This essentially means that for any pointer to an int , its 2 lower order bits are always zero.
Now we have 2 bits which communicate nothing. The trick here is put our data into these 2 bits, use them whenever we want and then remove them before we make any memory access by dereferencing the pointer.
Since bitwise operations on pointers don’t go well with the C standard, We will be storing the pointer as an unsigned int .
The following is a naive snippet of the code for brevity. See my github repo - hide-data-in-ptr for the full code.
void put_data(int *p, unsigned int data)
{
    assert(data < 4);
    *p |= data;
}

unsigned int get_data(unsigned int p)
{
    return (p & 3);
}

void cleanse_pointer(int *p)
{
    *p &= ~3;
}

int main(void)
{
    unsigned int x = 701;
    unsigned int p = (unsigned int) &x;

    printf("Original ptr: %u
", p); put_data(&p, 3); printf("ptr with data: %u
", p); printf("data stored in ptr: %u
", get_data(p)); cleanse_pointer(&p); printf("Cleansed ptr: %u
", p); printf("Dereferencing cleansed ptr: %u
", *(int*)p); return 0; }

This will give the following output:
Original ptr:  3216722220
ptr with data: 3216722223
data stored in ptr: 3
Cleansed ptr:  3216722220
Dereferencing cleansed ptr: 701

We can store any number that can be represented by 2 bits in the pointer. Using put_data() , the last 2 bits of the pointer are set as the data to be stored. This data is accessed using get_data() . Here, all bits except the last 2 bits are overwritten as zeroes there by revealing our hidden data. cleanse_pointer() zeroes out the last 2 bits, making the pointer safe for dereferencing. Note that while some CPUs like Intel will let us access unaligned memory locations, certain others like ARM CPU will fault. So, always remember to keep the pointer pointed to an aligned location before dereferencing.
Is this used anywhere in the real world?
Yes, it is. See the implementation of Red Black Trees in the Linux kernel ( link ).
The node of the tree is defined using:
struct rb_node {
    unsigned long  __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));

Here unsigned long __rb_parent_color stores: 1. the address of the parent node 2. the node’s color.
The color is represented as 0 for Red and 1 for Black. Just like in the earlier example, this data is sneaked into the ‘useless’ bits of the parent pointer.
Now see, how the parent pointer and the color information is accessed:
/* in rbtree.h */
#define rb_parent(r)   ((struct rb_node *)((r)->__rb_parent_color & ~3))/* in rbtree_augmented.h */
#define __rb_color(pc)     ((pc) & 1)
#define rb_color(rb)       __rb_color((rb)->__rb_parent_color)