COMP8320 Tutorial 07 -- week 11, 2011
Transactional Memory and Heterogeneous Multicore
As well as revising Transactional Memory from Lecture 9,
please read the article mentioned below before the tutorial.
If you have any unresolved questions or things you would like explained
from earlier in the course, please have these ready and let your tutor
know at the start of the session.
Afterwards, discuss in small groups the following questions.
- What modifications would be necessary for the T2 in order to support
HTM (as described in lectures)?
- How would the following interleaving of transactions A and B
  work under the STM and HTM models (both `bounded' and Rock)?
What would change if the order of the last 2 steps of A and B were
reversed? Assume that no other (interfering) transactions
occur in the meantime.
Hints: for STM, write down the changes to the (global and
local) timestamps and read/write sets for each transaction at each step.
  For HTM, write down the changes to the cache lines of the
  processor running each transaction (use the notation A$(x) to denote
  the state of the cache line in A holding data item x).
A: begin
A: read x
B: begin
B: read y
B: read x
A: read z
A: write x
A: end
B: write y
B: end
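For the STM case, the interleaving above can be replayed against a minimal TL2-style sketch in Python (the Txn class, the global version clock, and the version table below are illustrative assumptions for a lazy-versioning, commit-time-validating STM, not the exact design from lectures):

```python
# Minimal TL2-style STM sketch: a global version clock, per-location
# version stamps, and per-transaction read/write sets. Writes are
# buffered; commit validates the read set against the global versions.

global_clock = 0
versions = {}              # location -> version stamp (0 if never written)

class Txn:
    def __init__(self, name):
        self.name = name
        self.rv = None         # read (start) timestamp
        self.read_set = set()
        self.write_set = {}    # location -> buffered new value

    def begin(self):
        self.rv = global_clock

    def read(self, loc):
        self.read_set.add(loc)
        # would abort here if loc was written after this txn began
        return 'ok' if versions.get(loc, 0) <= self.rv else 'abort'

    def write(self, loc, val='v'):
        self.write_set[loc] = val     # buffered until commit

    def commit(self):
        global global_clock
        # validate: every location read must be unchanged since rv
        for loc in self.read_set:
            if versions.get(loc, 0) > self.rv:
                return 'abort'
        global_clock += 1
        for loc in self.write_set:
            versions[loc] = global_clock
        return 'commit'

A, B = Txn('A'), Txn('B')
A.begin(); A.read('x')
B.begin(); B.read('y'); B.read('x')
A.read('z'); A.write('x')
resA = A.commit()          # A's read set {x, z} is still valid
print('A:', resA)
B.write('y')
resB = B.commit()          # B read x, which A has since updated
print('B:', resB)
```

Running the sketch shows A committing (bumping x's version stamp) and B failing validation at commit, since x in B's read set was written after B began. Repeating the exercise with the last two steps of A and B reversed is a good check of your understanding.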
- In HillMarty08, a fundamental idea is that you can combine r
`base cores' into a single larger core. This core would have performance
perf(r) times that of a base core, with an indicative
value of perf(r) = sqrt(r).
(i) How would such a speedup be possible, i.e. what kinds of parallelism
would this large core be exploiting? How would you
organize assembling a larger core out of smaller ones?
  What savings could you make (i.e. what parts of the smaller
  cores could you discard)? What extra circuitry would be needed?
(ii) Would perf(r) = sqrt(r) be (as) realistic for
r=64? r=256?
  (iii) Would perf(r) = sqrt(r) be as realistic for a (typical)
  sequential region of code as for a parallel region?
- In this paper, in what sense are the cores heterogeneous?
Is it sensible to run a single application over these cores?
If so, what implications are there for the compiler and the
operating system (e.g. in scheduling)?
- Figure 2 seems to indicate that increasing r
is (almost) always worse for symmetric multicore,
a middle value of r is best for asymmetric multicore,
and increasing r is (almost) always better for
dynamic multicore.
What would be the effect on these results
if part of a sequential region had
perf(r) = 1?
  What would be the effect on the dynamic results if there were a
  finite time required to reconfigure?
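To explore the Figure 2 trends numerically, the Hill-Marty speedup formulas can be coded directly (a sketch assuming perf(r) = sqrt(r); the budget n = 256 BCEs and parallel fraction f = 0.975 are illustrative choices, not values fixed by the paper):

```python
import math

def perf(r):
    # indicative performance of an r-BCE core (Pollack-style rule)
    return math.sqrt(r)

def symmetric(f, n, r):
    # n/r identical cores: sequential part on one core,
    # parallel part spread over all n/r cores
    return 1 / ((1 - f) / perf(r) + f / (perf(r) * (n / r)))

def asymmetric(f, n, r):
    # one r-BCE core plus n - r base cores;
    # the parallel part uses all of them together
    return 1 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

def dynamic(f, n, r):
    # sequential part on the fused r-BCE core,
    # parallel part on n base cores (instant reconfiguration)
    return 1 / ((1 - f) / perf(r) + f / n)

n, f = 256, 0.975
print(" r   sym    asym   dyn")
for r in (1, 4, 16, 64, 256):
    print(f"{r:3d} {symmetric(f, n, r):6.1f} "
          f"{asymmetric(f, n, r):6.1f} {dynamic(f, n, r):6.1f}")
```

Tabulating these reproduces the pattern the question describes: symmetric speedup falls off for large r, asymmetric peaks at an intermediate r, and dynamic improves monotonically with r. Modelling perf(r) = 1 for part of the sequential region, or a fixed reconfiguration cost for the dynamic case, is then a small change to the corresponding formula.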
Last modified: 19/10/2011, 11:28
|