The Australian National University
ANU College of Engineering and Computer Science
School of Computer Science

COMP8320 Tutorial 07 -- week 11, 2011

Transactional Memory and Heterogeneous Multicore

As well as revising the Transactional Memory material from Lecture 9, please read the article mentioned below (HillMarty08) before the tutorial. If you have any unresolved questions, or topics from earlier in the course that you would like explained, please have these ready and let your tutor know at the start of the session.

Afterwards, discuss the following questions in small groups.

  1. What modifications would be necessary for the T2 in order to support HTM (as described in lectures)?

  2. How would the following interleaving of transactions A and B work under the STM and HTM models (both `bounded' HTM and Rock)? What would change if the order of the last two steps of A and B were reversed? Assume that no other (interfering) transactions occur in the meantime.

    Hints: for STM, write down the changes to the (global and local) timestamps and the read/write sets of each transaction at each step; a sketch of this bookkeeping is given after the trace below. For HTM, write down the changes to the cache lines of the processor running each transaction (use the notation A$(x) to denote the state of the cache line in the processor running A that holds data item x).

    A: begin
    A: read x
    B: begin 
    B: read y
    B: read x
    A: read z
    A: write x
    A: end
    B: write y	
    B: end
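
    The following Python sketch is one way to mechanise the STM part of the
    hint. It models a simplified TL2-style STM with a global version clock,
    per-transaction read/write sets, and commit-time validation; the class
    and method names are illustrative only, not the design presented in
    lectures, so treat it as a starting point for your own trace.

      # Simplified TL2-style STM bookkeeping (illustrative, not the lecture design).
      # Each location has a version; a transaction records its read version (rv)
      # at begin, collects read/write sets, and validates its read set at commit.
      class STM:
          def __init__(self):
              self.global_clock = 0
              self.versions = {}     # location -> version of last committed write
              self.memory = {}       # location -> value

          def begin(self, name):
              return {"name": name, "rv": self.global_clock,
                      "read_set": set(), "write_set": {}}

          def read(self, tx, loc):
              if loc in tx["write_set"]:             # read-after-write in same tx
                  return tx["write_set"][loc]
              if self.versions.get(loc, 0) > tx["rv"]:
                  raise Exception(f"{tx['name']}: abort (stale read of {loc})")
              tx["read_set"].add(loc)
              return self.memory.get(loc)

          def write(self, tx, loc, value):
              tx["write_set"][loc] = value           # buffered until commit

          def commit(self, tx):
              for loc in tx["read_set"]:             # commit-time validation
                  if self.versions.get(loc, 0) > tx["rv"]:
                      raise Exception(f"{tx['name']}: abort at commit ({loc} changed)")
              self.global_clock += 1                 # take a new write version
              for loc, value in tx["write_set"].items():
                  self.memory[loc] = value
                  self.versions[loc] = self.global_clock
              print(f"{tx['name']} committed at time {self.global_clock}")

      stm = STM()
      A = stm.begin("A"); stm.read(A, "x")
      B = stm.begin("B"); stm.read(B, "y"); stm.read(B, "x")
      stm.read(A, "z"); stm.write(A, "x", 1); stm.commit(A)
      stm.write(B, "y", 2)
      try:
          stm.commit(B)      # B read x, which A has since committed
      except Exception as e:
          print(e)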
    

  3. In HillMarty08, a fundamental idea is that you can combine r `base cores' into a single larger core. This core would have performance perf(r) times that of a base core, with an indicative value of perf(r) = sqrt(r). (A small numerical sketch of this model is given after part (iii) below.)

    (i) How would such a speedup be possible, i.e. what kinds of parallelism would this large core be exploiting? How would you go about assembling a larger core out of smaller ones? What savings could you make (i.e. what parts of the smaller cores could you discard)? What extra circuitry would be needed?

    (ii) Would perf(r) = sqrt(r) be (as) realistic for r=64? r=256?

    (iii) Would perf(r) = sqrt(r) be as realistic for a (typical) sequential region of code, as for a parallel region?
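
    To make these trade-offs concrete, the sketch below evaluates the
    symmetric, asymmetric, and dynamic speedup models from HillMarty08 for a
    chip budget of n base-core equivalents (BCEs), a parallel fraction f,
    and perf(r) = sqrt(r). Check the formulas against the paper before
    relying on them; the values n = 256 and f = 0.975 are just one of the
    configurations plotted in its figures.

      # Hill & Marty (2008) multicore speedup models (verify against the paper).
      #   n = chip budget in base-core equivalents (BCEs)
      #   r = BCEs spent on each "large" core
      #   f = parallelisable fraction of the work
      import math

      def perf(r):
          return math.sqrt(r)            # indicative perf(r) used in the paper

      def symmetric(f, n, r):
          # n/r identical cores, each with performance perf(r)
          return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

      def asymmetric(f, n, r):
          # one perf(r) core plus (n - r) base cores, all usable in parallel
          return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

      def dynamic(f, n, r):
          # sequential code on one perf(r) core, parallel code on all n BCEs
          return 1.0 / ((1 - f) / perf(r) + f / n)

      n, f = 256, 0.975
      for r in (1, 4, 16, 64, 256):
          print(f"r={r:3d}  sym={symmetric(f, n, r):6.1f}  "
                f"asym={asymmetric(f, n, r):6.1f}  dyn={dynamic(f, n, r):6.1f}")

    Varying r in this script should reproduce the qualitative trends that
    question 5 asks about.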

  4. In this paper, in what sense are the cores heterogeneous? Is it sensible to run a single application over these cores? If so, what implications are there for the compiler and the operating system (e.g. in scheduling)?

  5. Figure 2 seems to indicate that increasing r is (almost) always worse for symmetric multicore, a middle value of r is best for asymmetric multicore, and increasing r is (almost) always better for dynamic multicore.

    What would be the effect on these results if part of a sequential region had perf(r) = 1?

    What would be the effect on the dynamic results if there were a finite time required to reconfigure? (One possible way to model this is sketched below.)
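
    One way to explore the second question numerically is to charge the
    dynamic design a fixed cost each time it switches between its
    one-large-core and many-small-cores configurations. The model below is
    an assumption for exploration only (it is not taken from HillMarty08):
    each reconfiguration costs `overhead' time units on the normalised scale
    where the whole job takes 1 unit on a single base core.

      import math

      def perf(r):
          return math.sqrt(r)

      def dynamic_with_overhead(f, n, r, switches=2, overhead=0.01):
          # Hypothetical extension of the dynamic model: add a fixed
          # reconfiguration cost per switch to the execution time.
          return 1.0 / ((1 - f) / perf(r) + f / n + switches * overhead)

      print(dynamic_with_overhead(0.975, 256, 256))           # with switching cost
      print(dynamic_with_overhead(0.975, 256, 256, 0, 0.0))   # ideal dynamic case

    Try varying `switches' and `overhead' to see how quickly the dynamic
    advantage erodes.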

Last modified: 19/10/2011, 11:28
