COMP2100/2500
Lecture 21: The C Programming Language I
Summary
This is the first of three lectures on the C language. Today's lecture will concentrate on small-scale aspects of C, Lecture 22 will discuss the division of C programs into files and modules, and Lecture 23 will cover pointers, arrays and memory allocation.
The overall aim of the three lectures is to give you enough understanding of C to read and understand C code and, after Lecture 26, to write Java code that interfaces to existing C code.
Outline of today's lecture
Introduce the language.
Walk through a small example.
Explain how to compile C programs.
Describe the syntax and semantics of small-scale language features.
Further Reading:
The C Programming Language, 2nd (ANSI C) edition.
Brian W. Kernighan & Dennis M. Ritchie.
Prentice-Hall, 1988.
Why are we studying C in COMP2100?
So that you can call existing C code from Java programs.
It's in the course description.
It's used in later courses.
Strong employer demand.
If you know C, then many languages will seem familiar, in particular C++ and Perl.
Richard's note: only the first and last reasons seem compelling to me, but they are very compelling.
The History of C
Note that there are three main versions of C. The original language, as described in the first edition of Kernighan and Ritchie's book, is known as K&R C, and is rarely seen these days. The language was standardized first by ANSI and then by ISO as ISO/IEC 9899:1990, Programming Languages - C. We'll call this C90; it is what is described in the book and what most C compilers accept. A new revision of the language was published in 1999 as ISO/IEC 9899:1999, Programming Languages - C. We'll call this C99. We'll be using gcc, which accepts C90 plus some parts of C99 by default, but can accept most of C99 with a compiler option (-std=c99).
Why is C So Popular?
It is small and concise
It is `married' to Unix
It gives system-level control
It has a large user & code base
By 1970s standards C is
portable
expressive
revolutionary
What does C look like?
C syntax is the basis of much of Java, and some C programs can be written in a way that makes them almost impossible to read. But it doesn't have to be like that.
    #include <assert.h>

    /* Find the maximum value of array[0...len-1].
     * PRE: 0 < len <= length of array */
    int maximum(int array[], int len) {
        int i, result;

        assert(0 < len);
        result = array[0];
        for (i=1; i!=len; i=i+1) {
            /* INVARIANT: result is max value of array[0...i-1].
             * VARIANT: len - i */
            if (array[i] > result) {
                result = array[i];
            }
        }
        return result;
    }
Java and C, What's the Difference?
C is missing some features of Java:
Types are not strongly enforced
Not object-oriented
Cannot truly enforce encapsulation of data types
No boolean type
But C does have:
Access to pointers and pointer arithmetic
Access to arbitrary memory addresses
Structure, union, and enumerated types (Java 1.5 has the latter)
A macro preprocessor
Warning: Although there are many similarities, there are also many syntactic differences between the languages.
Hello World in C
    1  #include <stdio.h>
    2  int main(void) {
    3      printf("Hello World\n");
    4      return 0;
    5  }
Note: There is no enclosing class { ... } or similar.
- Line 1
Tell the preprocessor to include the interface to the standard I/O library. This is like an import statement in Java.
- Line 2
The main function returns an integer (its return code) to the operating system and takes no arguments.
- Line 3
Print the string `Hello World', followed by a newline character. Notice that every instruction must end with a semicolon.
- Line 4
Return status (success) to the operating system.
Compiling C
It's a two step process:
Compile the source files to object files.
Link the object files into an executable. (Merge object files, and connect definitions in one to uses in another)
For a single file program we can do both stages in one step.
    gcc -Wall -o foo foo.c
Notes:
- gcc
the GNU C Compiler
- -Wall
show all warnings
- -o foo
output the executable to file foo
- foo.c
the C source code file
For a multi file program we perform the two stages in two steps.
    gcc -c -Wall foo_1.c
    ...
    gcc -c -Wall foo_n.c
    gcc -o foo foo_1.o ... foo_n.o
Notes:
- -c
just compile, don't link
Comments
Characters between /* and the next */ are ignored.
    /* INV: result is maximum value of array[0...i-1].
     * VAR: len - i */
    if (array[i] > result) {
        result = array[i];
    }
Note: Comments don't nest. So if you wanted to ``comment out'' that whole block of code, this won't work:
    /*
    /* INV: result is maximum value of array[0...i-1].
     * VAR: len - i */
    if (array[i] > result) {
        result = array[i];
    }
    */
The // style of comment is not part of C90 but is part of C99. It's best not to use it. (It is possible to construct (bizarre) examples using // that are legal C90 and C99 but with different meanings according to each standard.)
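Because comments don't nest, one common workaround for disabling a block that already contains comments is to wrap it in the preprocessor directives #if 0 ... #endif. The following is a minimal sketch of that idea, not something from the lecture itself; the function count_positive and the disabled fragment inside it are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Count the strictly positive values in array[0...len-1].
 * PRE: 0 <= len <= length of array
 * (Hypothetical example function, not part of the lecture code.) */
int count_positive(int array[], int len)
{
    int i, count;

    assert(0 <= len);
    count = 0;
    for (i = 0; i != len; i = i + 1) {
#if 0
        /* Disabled: this earlier version also counted zeros. */
        if (array[i] >= 0) {
            count = count + 1;
        }
#endif
        if (array[i] > 0) {
            count = count + 1;
        }
    }
    return count;
}
```

Everything between #if 0 and #endif is discarded before compilation, so the comment inside the disabled block causes no trouble.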
Basic C Data Types
char: 8 bit integer
short int, usually written short: small integer
int: default integer
long int, usually written long: long integer
float: small real
double: default real
long double: extra precision real
Comments:
Integer types can be unsigned. E.g. signed char typically means -128 ... 127 (2's complement), unsigned char means 0 ... 255. (Whether plain char is signed or unsigned depends on the implementation.)
Size of types is implementation-dependent, except that, e.g., 1 = sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long). Similarly, it could be that the types float, double, and long double are exactly the same, and they can be totally different - it depends on the combination of C compiler and hardware.
There is no boolean type; you have to use int instead. The convention is that 0 means false and anything else means true. (C99 has _Bool.)
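The points above can be checked on any particular compiler with a small sketch like the one below. The function names sizes_are_ordered and count_true are made up for this illustration; the first tests the size ordering quoted above, and the second uses plain int values as truth values in the C90 style.

```c
#include <assert.h>

/* Return 1 if the size ordering quoted in the notes holds on this
 * implementation, and 0 otherwise. (Hypothetical helper.) */
int sizes_are_ordered(void)
{
    return sizeof(char) == 1
        && sizeof(char) <= sizeof(short)
        && sizeof(short) <= sizeof(int)
        && sizeof(int) <= sizeof(long);
}

/* Count how many of array[0...len-1] are "true", i.e. nonzero.
 * PRE: 0 <= len <= length of array */
int count_true(int array[], int len)
{
    int i, count;

    assert(0 <= len);
    count = 0;
    for (i = 0; i != len; i = i + 1) {
        if (array[i]) {          /* any nonzero int counts as true */
            count = count + 1;
        }
    }
    return count;
}
```

Note that -1 and 5 are both treated as true here; only 0 is false.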
Identifiers
For our purposes, these are the same as Java. The one thing to be aware of is that with some very old compilers, as few as 6 characters may be significant! (So identifiers are often heavily abbreviated.) C99 requires that at least 63 characters be significant.
By convention:
Constants are all upper case, with underscore word separators:
    #define PI 3.14159265358979
Identifiers with an initial underscore are typically reserved for system use (e.g. when writing the internals of operating systems).
Literals
Pointers: NULL is an invalid pointer.
Integers: -1, 0, 1.
Note: These can also be specified in octal (with a leading 0) or hexadecimal (with a leading 0x (or 0X)). For example 16 = 020 = 0x10.
Unsigned integer literals are written with a suffix u (or U); long integer literals are written with a suffix l (or L). For example, 1234ul is an unsigned long integer constant.
Reals: 18.0, 1.8e1.
Note: Such constants are of type double. You can write float and long double constants too by suffixing an f (or F) or l (or L), e.g. 18.0f, 18.0L.
Characters: enclosed in single quotes 'a', 'A', '1', '$', plus special escape sequences: '\n' (newline), '\t' (tab), '\f' (form feed), '\\' (backslash), '\'' (single quote), etc. You can also use the octal and hex (but not decimal) notation similar to the one mentioned above to specify a character by its character code, e.g. '\x41', '\101', and 'A' are three different ways of referring to the same character.
Strings: a sequence of characters enclosed in double quotes on one line:
    "If you want to shoot, \"shoot\", don't talk."
Note: (1) The special character '\0' is used as a string terminator. (2) The double quote marks need to be `escaped' so that the compiler will treat them as just any other character rather than giving it a special meaning.
The C preprocessor (discussed in detail later) concatenates adjacent string literals. So writing
    "abc " "def"
is the same as writing
    "abc def"
Indeed, this is a good way to write a long string constant over multiple lines. (But beware: the maximum allowable length of string constants depends on the compiler!)
Arrays: {0, 1, 2}
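Several of the equivalences listed above can be confirmed with a short sketch. The helper names literals_agree and string_length are invented for this example. The comparison of '\x41' with 'A' is deliberately left out of the code, because it holds only on systems that use ASCII character codes.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the literal equivalences quoted in the notes hold.
 * (Hypothetical helper, for illustration only.) */
int literals_agree(void)
{
    const char *joined = "abc " "def";   /* adjacent literals are merged */

    return 16 == 020                     /* octal */
        && 16 == 0x10                    /* hexadecimal */
        && '\x41' == '\101'              /* same character code, two notations */
        && sizeof(1234ul) == sizeof(unsigned long)
        && strcmp(joined, "abc def") == 0;
}

/* Return the number of characters in s before the '\0' terminator.
 * PRE: s points to a terminated string */
int string_length(const char s[])
{
    int n;

    assert(s != NULL);
    n = 0;
    while (s[n] != '\0') {
        n = n + 1;
    }
    return n;
}
```

The standard library already provides strlen for the second job; string_length is written out only to show the role of the terminator.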
Expressions
One important thing to know about C is that everything is an expression. Every instruction is also an expression that returns a value to its caller. The caller may be non-existent or may choose to ignore the return value, but it is always there. This (as we'll see) can lead to some very strange errors.
Arithmetic Expressions:
For our purposes, these are just like Java.
Increment and Decrement expressions:
These are instructions, but also expressions.
C syntax    What it means
++i         increment i, then return its new value
i++         return the current value of i, then increment it
--i         decrement i, then return its new value
i--         return the current value of i, then decrement it
Exercise: What does this mean?
    a[i]=i++;
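The difference between the prefix and postfix forms is easiest to see by recording both the value of the expression and the value of the variable afterwards. The function increment_demo below is a hypothetical helper written for this illustration; it keeps each increment in its own statement.

```c
#include <assert.h>
#include <stddef.h>

/* Record what ++i and i++ return, and what i holds afterwards.
 *   results[0] = value of ++i      results[1] = i after ++i
 *   results[2] = value of i++      results[3] = i after i++
 * PRE: results has room for 4 ints */
void increment_demo(int start, int results[4])
{
    int i;

    assert(results != NULL);

    i = start;
    results[0] = ++i;   /* increment first, then use the new value */
    results[1] = i;

    i = start;
    results[2] = i++;   /* use the old value, then increment */
    results[3] = i;
}
```

Starting from 5, the prefix form yields 6 and the postfix form yields 5, but in both cases i ends up as 6.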
Relational Expressions
Like Java, including:
a == b is the logical assertion `a is equal to b'.
a != b is the logical assertion `a is not equal to b'.
Note: The expression/instruction a = b assigns to a the value of b (as in Java) and then returns that value. A very common error for Java programmers writing C is to write
    int a, b;
    . . .
    if (a=b) {
        /* do something */
    } else {
        /* do something else */
    }
Exercise: What will happen?
Boolean Expressions
Remember that before C99 there was no boolean type, and very few people use _Bool. The following operators all operate on integers, to increase the potential for confusion.
C syntax    What it means
!a          not a
a && b      a and b
a || b      a or b
Note that a single `&' or a single `|' are also legal operators, but they perform bitwise logical operations on their arguments. This is another potential source of really strange errors in C programs.
Note also that `&&' and `||' are short-circuit operators as in Java.
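A small sketch makes the difference between the logical and bitwise operators concrete, and shows why short-circuit evaluation matters. All three function names below are hypothetical and chosen only for this example.

```c
#include <assert.h>

/* Logical and: the result is always 0 or 1. */
int logical_and(int a, int b)
{
    return a && b;
}

/* Bitwise and: combines the individual bits of a and b. */
int bitwise_and(int a, int b)
{
    return a & b;
}

/* Return 1 if x occurs in array[0...len-1], and 0 otherwise.
 * PRE: 0 <= len <= length of array */
int contains(int array[], int len, int x)
{
    int i;

    assert(0 <= len);
    i = 0;
    /* && short-circuits: array[i] is never read when i == len. */
    while (i != len && array[i] != x) {
        i = i + 1;
    }
    return i != len;
}
```

With a = 1 and b = 2, both operands are "true", so a && b is 1, yet a & b is 0 because the two values share no set bits.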
Special assignment operations
C syntax    What it means
i += j      i = i + j
i -= j      i = i - j
i *= j      i = i * j
i /= j      i = i / j
i %= j      i = i % j
Statements
As in Java, the semicolon is used as a statement terminator. (Some other languages, such as Pascal, use a semicolon as a statement separator.)
It is also possible to combine expressions with the comma operator: in a, b the expression a is evaluated first, then b, and the value of b is the result. The comma binds more tightly than the semicolon. This turns out to be commonly used when writing for loops.
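As an illustration of the comma in a for loop, the sketch below walks two indices towards each other to reverse an array in place. The function reverse is a hypothetical example, not taken from the lecture.

```c
#include <assert.h>

/* Reverse array[0...len-1] in place.
 * PRE: 0 <= len <= length of array */
void reverse(int array[], int len)
{
    int i, j, tmp;

    assert(0 <= len);
    /* The commas let one for loop initialise and update two variables. */
    for (i = 0, j = len - 1; i < j; i = i + 1, j = j - 1) {
        tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }
}
```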
Block Statements
If S1, ... Sn are all statements, then {S1 ... Sn} is a block statement.
Variable Declarations
Variable declarations can be made in any block. In C90, they must occur at the beginning of the block, whereas in C99, as in Java, they can be interspersed with other statements. The variables only exist within that block. (To say that more formally: the lifetime of a variable is the immediately enclosing block.)
The syntax for variable declarations in C is:
Examples:
    int x;
    char c = 'c', d;
Variable declarations may also be made at the top level, i.e. outside the scope of a block. These are called global variables (though that name is a little misleading).
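The following sketch puts the three kinds of declaration side by side: a top-level variable, variables declared at the start of a function body, and a variable declared at the start of an inner block. It is written in C90 style, so every declaration comes before the statements of its block. The names call_count, sum_of_squares and calls_so_far are invented for this example.

```c
#include <assert.h>

int call_count = 0;   /* top-level ("global") variable */

/* Return the sum of the squares of array[0...len-1].
 * PRE: 0 <= len <= length of array */
int sum_of_squares(int array[], int len)
{
    int i, total;     /* C90: declarations come first in the block */

    assert(0 <= len);
    call_count = call_count + 1;
    total = 0;
    for (i = 0; i != len; i = i + 1) {
        int square;   /* exists only inside this inner block */

        square = array[i] * array[i];
        total = total + square;
    }
    /* square cannot be named here; it belongs to the loop body. */
    return total;
}

/* Return how many times sum_of_squares has been called. */
int calls_so_far(void)
{
    return call_count;
}
```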
Conditional Statements
As in Java.
Simple if-then:
    if (expression) statement
If-then-else:
    if (expression) statement1 else statement2
Example:
    if (x > y) z = x; else z = y;
Multi-way Conditionals using if
    if (i > 0) {
        printf("i is positive\n");
    } else if (i < 0) {
        printf("i is negative\n");
    } else {
        printf("i is zero\n");
    }
Multi-way Conditionals using switch
As in Java.
    /* Place uppercase version of low in up. */
    switch (low) {
        case 'a': up = 'A'; break;
        case 'b': up = 'B'; break;
        ...
        case 'z': up = 'Z'; break;
        default: up = low;
    }
Note 1: Case values must be constant expressions (no variables). They are evaluated at compile time.
Note 2: As in Java, those break instructions are necessary, otherwise control drops through to the next case and executes it also. This is a `feature', not an error in the language...
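Falling through from one case to the next is occasionally used on purpose, most often to let several case labels share one piece of code. The sketch below shows that pattern; the function days_in_month is a hypothetical example and ignores leap years.

```c
#include <assert.h>

/* Return the number of days in the given month of a non-leap year.
 * PRE: 1 <= month <= 12 */
int days_in_month(int month)
{
    int days;

    assert(1 <= month && month <= 12);
    switch (month) {
    case 4:
    case 6:
    case 9:
    case 11:            /* cases 4, 6 and 9 drop through to here */
        days = 30;
        break;
    case 2:
        days = 28;
        break;
    default:
        days = 31;
    }
    return days;
}
```

Removing the break after days = 30 would let control continue into case 2 and overwrite the result with 28.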
while Loops
As in Java.
    while (expression) statement
Execute statement while expression is true (that is, nonzero).
There is also a corresponding do loop:
    do statement while (expression);
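The do loop tests its condition after the body, so the body always runs at least once. The sketch below relies on that: counting decimal digits with a do loop gives the right answer for 0, which a plain while loop with the same condition would report as having no digits. The function count_digits is a hypothetical example.

```c
#include <assert.h>

/* Return the number of decimal digits needed to write n.
 * PRE: 0 <= n */
int count_digits(int n)
{
    int digits;

    assert(0 <= n);
    digits = 0;
    do {
        digits = digits + 1;   /* runs at least once, so 0 has 1 digit */
        n = n / 10;
    } while (n != 0);
    return digits;
}
```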
for Loops
As in Java.
    for (statement1; expression; statement2) statement3
An empty expression equates to true, so for(;;) is an infinite loop. Equivalently, you could write while(1).
In C99, as in Java, you can write for (int i = . . .). You can't do this in C90.
The break and continue Statements
Saying break gives an early exit from a loop. (More precisely: it causes an immediate exit from the innermost enclosing loop or switch statement.) Saying continue skips immediately to the end of a loop body. (More precisely: it causes an immediate jump to just before the end of the body of the innermost enclosing loop.)
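Both statements can be seen at work in one loop. In the sketch below, continue skips over negative values and break stops the loop at the first zero. The function sum_positive_until_zero is a hypothetical example.

```c
#include <assert.h>

/* Sum the positive values of array[0...len-1] that appear before
 * the first zero (or before the end, if there is no zero).
 * PRE: 0 <= len <= length of array */
int sum_positive_until_zero(int array[], int len)
{
    int i, total;

    assert(0 <= len);
    total = 0;
    for (i = 0; i != len; i = i + 1) {
        if (array[i] == 0) {
            break;       /* leave the loop altogether */
        }
        if (array[i] < 0) {
            continue;    /* skip to the next iteration; i is still updated */
        }
        total = total + array[i];
    }
    return total;
}
```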
The goto Statement
This causes immediate transfer of control to a labelled location somewhere else in the code. This is almost never OK. Anything more than the most sparing use renders code incomprehensible, unpredictable, impossible to analyse...
The only conceivable acceptable use is in error handling. (Modern languages such as Java use exceptions.)
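For the curious, the error-handling idiom usually looks something like the sketch below: every failure jumps forward to a single cleanup label, so resources are released in one place. This is only one possible shape, not a recommendation from the lecture. It uses pointers and malloc, which are not covered until Lecture 23, and the function dot_of_ramps is a made-up example with no purpose beyond needing two allocations.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill two scratch arrays with 0, 1, ..., n-1 and n-1, ..., 1, 0, and
 * store the sum of their element-wise products in *result.
 * Returns 0 on success, or -1 if memory could not be allocated.
 * PRE: 0 < n */
int dot_of_ramps(int n, long *result)
{
    int *up = NULL;
    int *down = NULL;
    int i;
    int status = -1;             /* assume failure until the work is done */

    assert(0 < n);
    up = malloc(n * sizeof(int));
    if (up == NULL) goto cleanup;
    down = malloc(n * sizeof(int));
    if (down == NULL) goto cleanup;

    *result = 0;
    for (i = 0; i != n; i = i + 1) {
        up[i] = i;
        down[i] = n - 1 - i;
        *result = *result + (long)up[i] * down[i];
    }
    status = 0;

cleanup:
    free(down);                  /* free(NULL) does nothing */
    free(up);
    return status;
}
```

The jumps only ever go forwards, to one label near the end of the function, which keeps the control flow easy to follow.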
Optional: read Knuth's classic 1974 paper Structured programming with go to statements. What do you think?
Copyright © 2005, Jim Grundy & Ian Barnes & Richard Walker, The Australian National University
Version 2005.5, Monday, 2 May 2005, 13:34:29 +1000
Feedback & Queries to
comp2100@cs.anu.edu.au