CECS Home | ANU Home | Search ANU
The Australian National University
ANU College of Engineering and Computer Science
School of Computer Science
Printer Friendly Version of this Document

UniSAFE

Introduction to Computer Systems

Tutorial Laboratory 08 - Caches, x86 Assembly and
Linking and Loading

Week 11


Objectives

This session has the following aims.
  • To deepen your understanding of caches.
  • To give you a brief introduction to the assembly language of a real machine, in this case x86 assembly.
  • To understand load objects and symbol tables.

Tutorial Questions

The following questions on a separate sheet of paper, with your name and student number clearly written. Please ensure your writing is legible. Hand in to your tutor at the beginning of your tutorial / laboratory session.
  1. What is a cache line, and what lengths are these typically?

  2. Define the terms direct-mapped cache and E-way set-associate cache. What are typical values for E?

  3. Consider a program which repeatedly accesses all elements in an int array of of length N in a cyclic fashion (from first to last element). For example, the computation could be:
    	while (1) {
    	  int s = 0, i;
    	  for (i = 0; i < N ; i++)
    	    s += a[i];
            }
    
    Suppose the first iteration has already been completed. Assume that the cache line size is 4 bytes, the cache holds 216 bytes, and the cache is fully associative (note: this is not a realistic cache!), and the replacement policy is least recently used. Calculate the cache hit rate for (a) N = 1 * 213, (b) N = 2 * 213, (c) N = 3 * 213 and (d) N = 4 * 213. Hint: calculate the relative size of a[] to the cache first.

    Explain whether a random replacement policy would perform better or worse in each case.

    Recall that the cache hit rate is calculated as the percentage of (in this case int) load/store operations that occur when the data is in the cache.

  4. Repeat the above for a direct-mapped cache (note that the replacement policy plays no role in this case).

  5. Suppose we now had a line size of 16 bytes (keep the total number of bytes in the cache the same). How would this affect the rate (consider this for both fully associtive and direct-mapped caches)? Will increasing the line size necessarily mean an improvement in the computer's execution rate?

  6. An assumption in the above is that there is no significant amounts of data access other than to the array. Discuss the extent you expect this to hold on Intel x86 architectures. Hint: imagine writing the above loop in PeANUt; what other memory accesses than the array itself would you have?

Laboratory Exercises

x86 Assembly Language

Below is a simple program that prints Hello World! in x86 assembly language. Cut-and-paste this program into a file called "hello.s" and assemble and link it using the commands below.
# Simple Hello World in x86 assembler 
# To compile and run execute:
#  %  gcc -c hello.s
#  %  ld -s -o hello hello.o
#  %  ./hello


.data
hellostr:
	.string "Hello World!\n"
.text
.globl _start
_start:
	movl	$4,%eax    # write(1,hellostr,13)
	movl	$1,%ebx
	movl	$hellostr,%ecx
	movl	$13,%edx
	int	$0x80

	movl	$1,%eax    # exit(0)
        movl    $0,%ebx
	int	$0x80
It works by making two system calls. The first is a write system call which writes the string to standard out. The second is the exit system call. The parameters of the system call are loaded into the general purpose register (%eax, %ebx, %ecx, and %edx) and then the "int $0x80" instruction is call (like the trap instruction rPeANUt). The %eax holds the system call number (4 for write, and 1 for exit). The "movl source,destination" instruction moves a long word from the source to the destination. Immediate values have start with a $.

Add comments to the above code which states the meaning of each of the parameters in the system calls.

Modify the above program such that it prints "Hello World!" in an infinite loop. For this you will need to use a "jmp" instruction and add a label.

Modify the code again such that in an infinite loop it reads characters for the standard input stream and outputs them directly to the standard output stream. Note that, the system call number for read system call is 3 and it has similar parameters to that of write. Also the return value of the system calls is stored in the register %eax.

Test your program by typing something into the terminal and you should see what you typed in echoed back out.

(option) Modify the code such that it converts all the characters to upper case.

Obtain a copy the arrayloop.c code, compile and run it. The program reads from an array in a loop. The size of the array can be varied, however, the total number of reads done remains constant. If each read took exactly the same amount of time for different sized arrays then you would expects this program to always take the same amount of time. However, from your understanding of caches this will clearly not be the case. Experiment with this program by using the "time" command to create a graph with the array size on the x-axis and the time taken on the y-axis. Does this tell you anything about this system?