We need to understand coroutine code generation and how it interacts with the hardware.
Why not always compile your program with optimization turned on (e.g. -O2)?
There are costs: compiling takes much longer with optimization turned up.
General forms of optimizations:
We still do them to communicate with other programmers.
10.1 Sequential Optimizations
Most programs are sequential; even concurrent programs consist largely of sequential code.
So sequential execution is the main target of optimization.
Dependencies result in a partial ordering among a set of statements.
R->R (two reads of the same variable) can be reordered, but the rest (W->R, R->W, W->W on the same variable) cannot.
In the last case (W->W), the first write is not needed, so the compiler elides that code.
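As a rough illustration of these data-dependency cases, the following C++ sketch shows one function per case. The variable x and all function names are made up for this example and do not come from the lecture.

```cpp
#include <cassert>

// R->R: two reads of x; swapping them cannot change the result.
int read_read(int x) {
    int y = x;  // read x
    int z = x;  // read x
    return y + z;
}

// W->R: the read must see the write, so the order is fixed.
int write_read() {
    int x = 0;  // write x
    int y = x;  // read x (depends on the write above)
    return y;
}

// R->W: the read must happen before x is overwritten.
int read_write(int x) {
    int y = x;  // read x
    x = 3;      // write x (must not move above the read)
    return y + x;
}

// W->W: the first write is never read, so an optimizer may elide it.
int write_write() {
    int x = 0;  // write x (dead store: a candidate for elision)
    x = 3;      // write x
    return x;
}

// Whatever the compiler does internally, the sequential results must not change.
void check_examples() {
    assert(read_read(2) == 4);
    assert(write_read() == 0);
    assert(read_write(2) == 5);
    assert(write_write() == 3);
}
```

Calling check_examples() should pass at any optimization level, because the compiler may only reorder or elide statements when the sequential result is preserved.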
Can you change the order of control variables?
So there are rules governing when statements may be reordered.
The compiler may also introduce small amounts of parallelism, but it is not very significant.
10.2 Memory Hierarchy
Paging and caching
Data is duplicated in faster storage to gain performance.
Data is eagerly pulled from the disk and lazily pushed back to the disk.
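The eager-pull, lazy-push idea can be sketched with a toy write-back cache. This is only a model under simplifying assumptions: the "disk" is a std::vector, there is no eviction, and the class name WriteBackCache and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy model: `disk_` stands for slow backing storage and `cache_`
// holds faster duplicates of some of its entries.
class WriteBackCache {
  public:
    explicit WriteBackCache(std::vector<int>& disk) : disk_(disk) {}

    // Eager pull: a miss copies the value from disk into the cache right away.
    int read(std::size_t addr) {
        assert(addr < disk_.size());
        auto it = cache_.find(addr);
        if (it == cache_.end()) {
            it = cache_.emplace(addr, Entry{disk_[addr], false}).first;
        }
        return it->second.value;
    }

    // Lazy push: a write only updates the cached copy and marks it dirty.
    void write(std::size_t addr, int value) {
        assert(addr < disk_.size());
        cache_[addr] = Entry{value, true};
    }

    // Dirty entries reach the disk only when they are flushed.
    void flush() {
        for (auto& [addr, entry] : cache_) {
            if (entry.dirty) {
                disk_[addr] = entry.value;
                entry.dirty = false;
            }
        }
    }

  private:
    struct Entry {
        int value;
        bool dirty;
    };
    std::vector<int>& disk_;
    std::unordered_map<std::size_t, Entry> cache_;
};
```

Between write() and flush() the cache and the disk disagree about the value. This is harmless for a single sequential program but is the root of the trouble once several contexts or processors are involved.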
This setup doesn't work well with concurrent programs.
On a context switch, the carefully built-up state in registers, cache, and memory is destroyed, and it must be saved to make room for the other context.
Nowadays, because RAM is so large, your computer might run for days and have only 3 page changes.
So PAGING IS DEAD.
But you still need to deal with the same problem in the cache.
10.2.2 Cache Coherence
Multi-level caches are used; each level is larger but slower and cheaper.
If my program is loaded on processor p1 and is then context-switched onto p2, the computer needs to set up all the caches again, which is very time consuming.
Memory is shared across processors. If multiple processors access the same memory, each cache may end up holding a different value for the same conceptual variable.
Cache thrashing/False sharing
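False sharing occurs when threads on different processors write to different variables that happen to sit on the same cache line, so the line keeps bouncing between caches. The sketch below contrasts a packed layout with a padded one. It assumes 64-byte cache lines, which is typical on x86-64 but not guaranteed, and all names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Both counters probably share one cache line: each write by one thread
// invalidates the line in the other processor's cache (false sharing).
struct Packed {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Each counter is aligned to its own cache line, so the threads
// no longer invalidate each other's copy.
struct Padded {
    alignas(kCacheLine) std::atomic<long> a{0};
    alignas(kCacheLine) std::atomic<long> b{0};
};

// Two threads increment *different* counters. The result is the same for
// both layouts; only the running time is expected to differ.
template <typename Counters>
long count_in_parallel(long iterations) {
    assert(iterations >= 0);
    Counters c;
    std::thread t1([&c, iterations] {
        for (long i = 0; i < iterations; ++i) {
            c.a.fetch_add(1, std::memory_order_relaxed);
        }
    });
    std::thread t2([&c, iterations] {
        for (long i = 0; i < iterations; ++i) {
            c.b.fetch_add(1, std::memory_order_relaxed);
        }
    });
    t1.join();
    t2.join();
    return c.a.load() + c.b.load();
}
```

Timing count_in_parallel<Packed> against count_in_parallel<Padded> with a large iteration count is one way to observe the effect, although the size of the difference depends on the machine.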
10.3 Concurrent Optimization
In sequential execution, memory ordering is strong: a read always returns the last value written.
In concurrent execution, memory ordering is weak: a read can return a previously written value or a value written in the future.
However, compiler optimizations that are safe for a sequential program may break a concurrent program.
That is, reordering R->W, W->R, or W->W on disjoint variables does not change the result of a sequential program, but it may change the result of a concurrent program.
This may mean something like this:

    // case 1: signalling that the value is ready before it is updated
    value = 123;
    bool ready = true;
    // can be reordered to
    bool ready = true;
    value = 123;

    // case 2: reordering around lock calls
    lock.acquire();
    ...
    lock.release();
    // can be reordered to
    lock.acquire();
    lock.release();
    ...

In uC++ we never had to consider such aggressive reorderings, because uC++ prevents reordering around lock calls.
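The same protection can be sketched in standard C++, using std::mutex as a stand-in for a uC++ lock (an assumption made for this example; the Mailbox type and function names are hypothetical). Acquiring and releasing the mutex prevents the value/ready writes from being moved out of the critical section.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct Mailbox {
    std::mutex lock;
    int value = 0;
    bool ready = false;
};

// Both writes happen inside the critical section, so another thread that
// takes the lock sees either neither of them or both.
void publish(Mailbox& m, int v) {
    std::lock_guard<std::mutex> guard(m.lock);
    m.value = v;
    m.ready = true;
}

// Returns true and copies the value out once it has been published.
bool try_consume(Mailbox& m, int& out) {
    std::lock_guard<std::mutex> guard(m.lock);
    if (!m.ready) return false;
    out = m.value;
    return true;
}

// Runs one producer thread and polls until the value arrives.
int exchange(int v) {
    Mailbox m;
    std::thread producer([&m, v] { publish(m, v); });
    int result = 0;
    while (!try_consume(m, result)) {
        std::this_thread::yield();
    }
    producer.join();
    assert(m.ready);  // safe to read: join() synchronizes with the producer
    return result;
}
```

Polling in a loop keeps the sketch short; a condition variable would avoid the busy waiting.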
In C++ there is a qualifier called volatile.
It forces variables to be loaded into and stored from registers at sequence points.
The volatile qualifier in C++ was not designed for concurrent programming, so it has some weaknesses.
The atomic qualifier (std::atomic in C++11) was; it prevents eliding and disjoint reordering.
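As a minimal sketch of the value/ready case using std::atomic, the flag below is written with release ordering and read with acquire ordering, so the store to value cannot be reordered after the store to ready. The Signal type and function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Signal {
    int value = 0;                   // plain data, published through the flag
    std::atomic<bool> ready{false};  // atomic flag: not elided, not reordered
};

void produce(Signal& s) {
    s.value = 123;
    // Release: all earlier writes become visible before `ready` reads as true.
    s.ready.store(true, std::memory_order_release);
}

int consume(Signal& s) {
    // Acquire: once `ready` reads as true, the write to `value` is visible.
    while (!s.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return s.value;
}

int run_once() {
    Signal s;
    std::thread producer([&s] { produce(s); });
    int seen = consume(s);
    producer.join();
    assert(s.ready.load());
    return seen;
}
```

With a volatile bool in place of the atomic flag, the plain write to value could still be reordered by the compiler or hardware, and the concurrent access would be a data race in C++.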