Things you should know by now #1!!
As we enter week 4 I have constructed a list of things from the course
that I think you should know by now. I've excluded the MD/python
related stuff. This is a list of things that I think I
could write an exam question around, where an exam
question can vary from simple knowledge of a topic (a low
level, easy type of question) to evaluation of it (high level). (See
Bloom's Taxonomy.)
If you read the following and don't know what I am talking about, try
posting something to the discussion board or come and talk with me.
If it appears that many people do not understand something we can
go over the topic in a Wednesday lecture.
General Background
- Describe two areas where HPC is important, include in your answer
justification for why HPC is required.
- What is Moore's law?
Performance Measurement etc
- Difference between CPU, SYS and Elapsed time. When you would use
which, and how they are influenced by, e.g., timeslicing. Under what
circumstances is CPU+SYS greater than the Elapsed time?
- What resolution and overhead are, how you would measure them, be
able to deduce what each is if given a list of numbers like
5 8 5 5 8 11 1312 14 5 8 etc. Be able to explain what the large number
(1312) in the above is due to.
- An appreciation of what the resolution of a typical timer might be,
and why. Or, given a desire to measure something, what sort of timer
would be best suited to that.
- Given some timing output obtained from the "time" shell command, be
able to interpret it. This includes things like: here are 5 different
time measurements for different problem sizes, what can you say about
the scaling of the problem?
- Be able to read the output from "prof" and interpret, be able to
read the output from "gprof" and construct a calling sequence for the
program with time spent in each routine. Given a profile be able to
suggest where you should start tuning the code, and what maximum
possible speedup you might obtain by improving the performance of a
routine (this is Amdahl's law).
- Have some idea of what a basic block is (we will return to this)
and what information you can get from a basic block profiler (like
tcov used in the lab). Be able to interpret output from tcov and
deduce scaling and other properties of a code.
- Be aware of standard performance benchmarks, be able to give at
least one advantage and one disadvantage compared with use of your own
code as a benchmark.
- Know what MIPS and MFLOPS are. Have some idea of what performance
in terms of MFLOPS you may expect from a current PC, through to a
supercomputer.
- Understand Amdahl's law and its application in a number of
circumstances. This includes in terms of parallel machine, but also
running in a heterogeneous environment, such as 50% of the code running
on a GPU and the rest on the CPU, with different performance for
each part. Be able to go from machine and code characteristics to
predicting performance, but also to go from a table of observed run
timings under different conditions to being able to tell me something
about the code. Understand and be able to articulate the counter
argument about scalability of the code. Appreciate that Amdahl's law
is just a model and be able to describe why reality is likely to
deviate from this model (eg overhead).
- Be familiar with terms such as speedup, efficiency, scalability.
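As a rough illustration of the Amdahl's law and efficiency items above, here is a minimal Python sketch. The function names and the numbers (a GPU that is 10 times faster on its half of the code, 8 processors, 90% parallel code) are hypothetical and chosen only for illustration. It is the idealised model, so it ignores the overheads that make real timings deviate from it.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the original run time
    is made `factor` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def amdahl_limit(fraction):
    """Upper bound on the speedup as the improved part becomes
    infinitely fast (only the unimproved part remains)."""
    if fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction)


def efficiency(speedup, processors):
    """Parallel efficiency: speedup divided by processor count."""
    return speedup / processors


if __name__ == "__main__":
    # Hypothetical heterogeneous case: 50% of the run time moves to a
    # GPU that executes that part 10 times faster than the CPU.
    print("GPU case speedup:", amdahl_speedup(0.5, 10.0))
    print("GPU case limit:  ", amdahl_limit(0.5))

    # Hypothetical parallel case: 90% of the code parallelises
    # perfectly over 8 processors.
    s = amdahl_speedup(0.9, 8)
    print("8-processor speedup:", s, "efficiency:", efficiency(s, 8))
```

Going the other way (from a table of observed timings to a statement about the code) amounts to solving the same formula for `fraction`.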
Floating Point
- Know why floating point is important for scientific computation
- Understand the difference between approximation or truncation
error and rounding error, and be able to give an example of each.
Given some output - like the finite difference with smaller step sizes
- be able to comment on when which source of error is dominant.
- Know the difference between absolute and relative
error. Understand why you should not have a statement like "if (x ==
0.0)" or be very careful about something like "if (x < 1.0e-30)"
- Understand how floating point numbers are represented on a
machine, the base, exponent and precision. That not all machines are
equal, and that there is a choice between how you divide a given
number of bits between precision and range. Appreciate there is a
finite number of FP numbers.
- Understand what a normalized number is, and why we normalize
numbers. What is underflow, overflow, and machine precision. Be able
to write a simple code that could find approximately each of these
three values. Understand what Not a Number (NaN) is and how it may
arise. What is a subnormal number and what it means to allow gradual
underflow. If presented with a "toy" system, be able to identify all
of the above.
- Understand rounding, including round to nearest, round to zero,
round to plus or minus infinity. Appreciate how rounding errors
propagate in arithmetic operations. Understand why cancellation is a
major reason for loss of precision, and be able to illustrate this.
- Know what interval arithmetic is and why it can be beneficial in
determining numerical stability of a code.
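The "simple code" mentioned above for finding machine precision, overflow and underflow could look something like the following Python sketch. It assumes Python floats are IEEE 754 doubles with round to nearest and gradual underflow, which is the case on typical machines. The function names are made up for illustration, and the overflow and underflow values are only found to the nearest power of two.

```python
import math
import sys


def machine_epsilon():
    """Smallest power of two eps such that 1.0 + eps > 1.0."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


def largest_power_of_two():
    """Largest power of two that is still finite (approximate
    overflow threshold, within a factor of two of the true maximum)."""
    x = 1.0
    while not math.isinf(x * 2.0):
        x *= 2.0
    return x


def smallest_positive():
    """Smallest positive power of two before halving gives zero.
    With gradual underflow this is a subnormal number."""
    x = 1.0
    while x / 2.0 > 0.0:
        x /= 2.0
    return x


if __name__ == "__main__":
    print("machine epsilon   :", machine_epsilon())
    print("overflow near     :", largest_power_of_two())
    print("underflow near    :", smallest_positive())
    # For comparison, the values the Python runtime reports:
    print("sys.float_info    :", sys.float_info.epsilon,
          sys.float_info.max, sys.float_info.min)
```

Note that `sys.float_info.min` is the smallest normalized number, so it is larger than the value found by repeated halving whenever subnormal numbers are supported.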
Modern Microprocessor
- Know what CISC and RISC stand for. Be able to give an example
of each type of processor. Be able to list at least 6 of the 8 key
characteristics that distinguish RISC and CISC design.
- Know what pipelining is, give a computer related example of
the various stages. What is required at the hardware level to support
pipelining (different pieces of hardware that can operate at the same
time). Know what is meant by pipeline latency and pipeline length. Which
would you rather have - a 6 stage pipeline with each stage clocked at
2 ns or a 10 stage pipeline with each stage clocked at 1 ns? Why?
What sort of operations disrupt the pipeline.
- How dependent instructions are treated in the pipeline. What sort
of instructions may involve multiple steps in the execution phase, and
why this may be the case (eg the phases involved in adding two
floats). What are pipeline bubbles? What is meant by software
pipelining. Given a list of assembly instructions and some details of
instruction latencies, how you might rearrange them to achieve
software pipelining. What sort of instructions are usually not
pipelined - alternatively why should you write x*0.5 rather than
x/2.0?
- What is meant by a branch delay slot, and why this concept was
introduced. What it means in the context of the assembly code
generated. What is branch prediction. Be able to describe the
operation of a two bit branch predictor.
- What it means to say that the machine has uniform instruction
length and why this is advantageous.
- What it means to say that you have a load/store architecture. Why
this is advantageous. What it means to take an expression like X = Y +
Z and write it in a load/store fashion.
- What is a superscalar architecture. What is required at the
hardware level to support this (multiple functional units). Appreciate
the difference between superscalar and pipelining. What is the typical
superscalar instruction mix supported on a modern processor. Why the
number of simultaneously scheduled instructions is typically limited to
around 4 (as opposed to say 10). The need for the correct instruction mix
to maximize the gain from superscalar architecture.
- What is meant by saying that for RISC systems "the complexity is
in the compiler".
- In what way do RISC systems have multiple register sets?
- What is meant by addressing mode?
- What is meant by out-of-order execution. What this imposes on the
hardware.
- Later ... relate the above to some real processors.
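To make the two bit branch predictor item above more concrete, here is a minimal Python sketch of the usual saturating-counter scheme: states 0 and 1 predict "not taken", states 2 and 3 predict "taken", and each actual outcome moves the counter one step up or down. The function name, the starting state and the branch patterns are assumptions for illustration, and a real processor keeps one such counter per entry in a table of branches rather than a single counter.

```python
def simulate_two_bit(outcomes, state=0):
    """Run a single 2-bit saturating counter over a sequence of
    branch outcomes (True = taken) and return the number of
    correct predictions."""
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2      # states 2, 3 predict taken
        if predicted_taken == taken:
            correct += 1
        # Move one step towards the actual outcome, saturating at 0 and 3.
        if taken:
            state = min(3, state + 1)
        else:
            state = max(0, state - 1)
    return correct


if __name__ == "__main__":
    # Hypothetical loop branch: taken 9 times, then falls through,
    # with the whole loop executed 3 times.
    loop_pattern = ([True] * 9 + [False]) * 3
    print(simulate_two_bit(loop_pattern), "of", len(loop_pattern))

    # Hypothetical branch that alternates taken / not taken.
    alternating = [True, False] * 5
    print(simulate_two_bit(alternating), "of", len(alternating))
```

After the first pass through the loop, the single "not taken" at loop exit only moves the counter from 3 to 2, so the predictor still predicts "taken" when the loop is re-entered. That is the reason for using two bits rather than one.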