See references below.
An introduction to the basic concepts of version control systems (VCS); different types of VCS are noted; an introduction to the main operations of the Subversion VCS.
Explain the rationale for using VCS in the software industry.
Illustrate the operating principles of Subversion (svn) as an example of a VCS.
Give enough background to get started on using Subversion in Lab 1.
What is version control?
Version control (in the most general sense) is the practice of uniquely identifying, classifying, and organizing for convenient storage and retrieval all intellectual products of human activity which have more than transient value. In the software industry, version control allows us to store all revisions of everything which is being (electronically) written during development and maintenance of a software system, organised in chunks that are usually called modules. This obviously includes modules of program code (source files), but it also includes test cases, design documents in words and pictures, user documentation, testing, build tool, and other development scripts, automatic code generator input specification files, and development history records (blogs, spreadsheets, wikis, or whatever other tools have been used). In the rest of these notes we will concentrate on source code modules, but these other examples are also very important in software construction and development.
Different versions of a software module arise in at least three ways. Here are the most common of them:
It is desirable for the development team or individual programmer to keep all the past and present versions of all modules, because
a new revision may introduce a worse error than it fixes; you may want to revert to a previous version, and start that development step again;
different customers may have different versions of the system, which can be represented as different branches in the version control system;
the client may try the new version but then decide they liked the old version better;
the system design may change to improve performance, reliability, target platform.
Other, unforeseen reasons may also force us to treat something we have done in the past as usable again. Very often a single module can be reused in a different program altogether, if the module can be found and easily extracted from the system it was originally written for. A version control system makes this possible too.
Keeping track of all these versions requires a tool: a Version Control System (VCS).
A VCS is a software tool that manages an organised digital store called a repository. The repository may be a set of files on the user's computer filesystem, or on a remote computer; the repository and its controlling software may be centralised on one computer or distributed across many computers. In general all VC systems:
Remember which versions of which modules belong together.
Store descriptions of each change made to a module (a change log).
Identify the version of any module, either source or object code.
Cooperate with a build tool to build particular configurations of a system.
Enable collaborative teamwork: the repository can be shared by multiple people, and the VCS ensures two programmers don't destructively interfere with each other's work even if they are working at the same time.
Keep the source code and artifacts which can be generated from it (executables, documentation, etc.) consistent.
The users typically control the VCS through a collection of commands or a standalone VCS application, or more often from within a software development tool. A good software development environment (an IDE, such as Eclipse) and the more powerful program text editors (such as Emacs) have a built-in interface or plugin to integrate with one or more version control systems.
Program source files (this is the essential item).
Scripts which are used for building applications from the source code: Makefile if the Make utility is used, build.xml script for Ant tool, etc.
Test suites (for unit and integration testing) which can also change during the development (and maintenance) of the software module. A unit testing tool such as JUnit uses source code files, but other kinds of tests, such as integration tests and performance tests, may be implemented as scripts and collections of data files.
Metadata which set up configuration parameters (for creating the release CDs, fiducial data against which the tests are performed, etc.). Everything without which the application cannot be built, tested and delivered.
Non-code artifacts which are difficult (long, expensive) to generate, or whose generation is subject to license constraints.
Everything which is not source code but is associated with the code's evolution (design documents, emails, memos, whatever...).
"Easily produced" means things that can be recreated automatically, for example by compiling from stored source code, or by generating PostScript from a text document or diagrams. Not storing such items is important because they are generally larger than their sources (object code and PostScript tend to be much larger than the source code or documents that they are generated from), so we pragmatically trade compiler and word processor CPU time for reduced storage space and (more significantly) reduced network transmission times when the VCS repository is across the network.
Version control systems have a long history in software development. RCS and CVS are old UNIX-based systems that are still in use.
For further information, if interested, look at their manual pages on a Unix system (man cvs, man rcsintro) or at the websites for CVS and RCS.
More recently developed version control systems that are widely used include Subversion, Git, and Mercurial.
The general feature of all VCS is the existence of a repository as the central place where the master (unchangeable) copy of all versions of the project's files is held. VCSs differ in how they store the revisions: they may be stored locally only (RCS), or on a remote server (CVS, Subversion); remote storage may be centralised on a single computer (CVS, SVN) or distributed (Git). They also differ in the way they implement conflict resolution.
For ease of learning in this course we choose to use Subversion as an example. This is not necessarily the best choice for real world or larger scale projects, but it is sufficiently powerful and complex to be a useful tool and an intellectual challenge for this computing course.
This section contains lightly edited extracts from Version Control with Subversion (Copyright © 2002, 2003, 2004, 2005, 2006, 2007, 2008 Ben Collins-Sussman, Brian W. Fitzpatrick, C. Michael Pilato) for the purpose of explanation.
Users (U1, U2, U3,...) who are working on the same project can access the repository at the same time (often using an authentication mechanism), and retrieve (read, check out) the required version of a module for their current work. During the work, the developers can modify the modules that they have checked out. Then they try to deposit (write, check in) them under a new version number back to the repository. If more than one developer accesses the repository simultaneously and retrieves the same version A, a conflict can ensue.
User U1 can modify the module (A to A') and check it in first; then user U2, without knowing that the repository version of the module has changed, modifies the original (working) version of the module (A to A'') and also tries to check it in. If the revisions in A' and A'' do not overlap, both modifications can co-exist and everything is OK. But if they do overlap, the modifications in A' will be lost, and the updated version will become A''. This is undesirable and must be avoided, because it is not what the developers usually intend. Only one of the sets of changes has been saved in the repository, and there are now two inconsistent versions of the file in the working copies and repository.

If this happened, all of User1's work would be lost when overwritten by User2.
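The scenario above can be imitated with a toy model. The following is only a sketch under stated assumptions: the class and function names (NaiveRepository, CheckedRepository, demo_naive, demo_checked) are invented for illustration and are not part of Subversion or any other VCS. It contrasts a repository that blindly accepts every check-in with one that refuses a check-in made from a stale version.

```python
# Toy model of the lost-update problem. All names are illustrative only.

class NaiveRepository:
    """Stores only the latest contents; any check-in overwrites them."""

    def __init__(self, contents):
        self.contents = contents

    def check_out(self):
        return self.contents

    def check_in(self, new_contents):
        self.contents = new_contents


class CheckedRepository:
    """Tracks a version number and refuses check-ins based on an old version."""

    def __init__(self, contents):
        self.contents = contents
        self.version = 1

    def check_out(self):
        return self.contents, self.version

    def check_in(self, new_contents, based_on_version):
        if based_on_version != self.version:
            raise RuntimeError("out of date: update your copy before checking in")
        self.contents = new_contents
        self.version += 1
        return self.version


def demo_naive():
    repo = NaiveRepository("A")
    copy_u1 = repo.check_out()
    copy_u2 = repo.check_out()
    repo.check_in(copy_u1 + "'")    # U1 publishes A'
    repo.check_in(copy_u2 + "''")   # U2 publishes A'', silently replacing A'
    return repo.contents


def demo_checked():
    repo = CheckedRepository("A")
    copy_u1, version_u1 = repo.check_out()
    copy_u2, version_u2 = repo.check_out()
    repo.check_in(copy_u1 + "'", version_u1)
    try:
        repo.check_in(copy_u2 + "''", version_u2)
    except RuntimeError:
        return repo.contents, "U2 rejected"
    return repo.contents, "U2 accepted"
```

In the naive model U1's work disappears without warning; in the checked model U2 is told that the working copy is stale and has to bring it up to date before trying again.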
Different VC systems handle this situation differently.
Strict locking (Lock-Modify-Unlock Solution)
Locking may cause administrative problems. User 1 can lock a file and then forget about it. Meanwhile User 2 is still waiting to edit the file, and their hands are tied. The situation ends up causing a lot of unnecessary delay and wasted time.
Locking may cause unnecessary serialization. What if User 1 is editing the beginning of a text file, and User 2 only needs to edit the end of the same file? These changes don't overlap at all (no merge conflict). Both Users could easily edit the file simultaneously, and no great harm would come, assuming the changes were properly merged together. There's no need for them to take turns in this situation, where locking the whole file is too coarse a level of control.
Locking may create a false sense of security. Consider the situation where User 1 locks and edits file A, while User 2 simultaneously locks and edits file B. But suppose that A and B depend on one another, and the changes made to each are semantically incompatible. This may result in new versions of files A and B that do not work together anymore. The locking system was powerless to prevent the problem, yet it somehow provided a false sense of security.
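The second problem above hinges on whether two sets of changes overlap. As a rough illustration, the sketch below merges two edited copies against their common original, line by line. It is a deliberately simplified three-way merge, not Subversion's algorithm: the function name merge_lines is invented here, and it assumes that edits only replace lines (no insertions or deletions), so all three versions have the same number of lines.

```python
def merge_lines(base, mine, theirs):
    """Three-way merge of equal-length lists of lines.

    Simplifying assumption: edits only replace lines, never insert or
    delete them, so all three lists have the same length.
    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles equal-length files")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:        # both copies agree (unchanged, or the same edit)
            merged.append(m)
        elif m == b:      # only the other user changed this line
            merged.append(t)
        elif t == b:      # only I changed this line
            merged.append(m)
        else:             # both changed it differently: the edits overlap
            merged.append(b)
            conflicts.append(number)
    return merged, conflicts


base = ["header", "body", "footer"]
user1 = ["HEADER", "body", "footer"]   # edits the beginning of the file
user2 = ["header", "body", "FOOTER"]   # edits the end of the file
merged, conflicts = merge_lines(base, user1, user2)
```

Here the two edits touch different lines, so merged contains both changes and conflicts is empty. If both users had changed the same line in different ways, that line number would be reported as a conflict for a human to resolve.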


Subversion stores and retrieves files in directory subtree structures, reflecting part of the user's home filesystem organisation.
The repository file system is a regular file tree.
To get a working copy, you must check out some subtree of the repository. (The term “check out” doesn't mean locking resources, it simply creates a private (working) copy of the project in your working directory.)
Every item stored in the repository has an associated version number. By default, when a copy is committed to the repository the version number increases. An old version of the system can be retrieved by giving its version number, or the date and time before which it was committed.
Version control systems use different ways to number versions. Subversion uses a global revision numbering scheme, whereby the entire repository tree is marked with a uniform revision number. Every file in a particular revision is labeled by the same token (a number, a date, or a keyword) as the label for the whole project directory (even if some files have not been altered during a particular revision and are identical to the file from the previous revisions).
By contrast CVS uses a per-file numbering scheme, which is awkward, and may cause confusion.
The actual storage requirements and network traffic for copies of files are not as bad as this picture suggests. The repository's view of a new version is a virtual view: Subversion does not transmit or store a duplicate of any file that is unchanged. Even if a file is changed, most version control systems transmit and store only a list of changes to files in each version, having been designed with the expectation that changes are relatively small compared to the whole extent of the file. The performance of a version control system implementation may depend heavily on this design decision.
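The idea of a global revision number combined with a "virtual view" can be sketched in a few lines. This is a toy model only, not how Subversion is implemented: the class ToyRepository and its methods are invented for illustration, it stores whole file contents rather than lists of changes, and it ignores deletions.

```python
class ToyRepository:
    """Global revision numbers; each revision stores only the files changed in it."""

    def __init__(self):
        self.revisions = [{}]   # revision 0 is the empty tree

    def commit(self, changed_files):
        """Record a dict {path: contents} of changed files; return the new revision number."""
        self.revisions.append(dict(changed_files))
        return len(self.revisions) - 1

    def cat(self, path, revision=None):
        """Return the contents of path as seen in the given revision (default: latest)."""
        if revision is None:
            revision = len(self.revisions) - 1
        # Walk back to the most recent revision that actually changed the file.
        for number in range(revision, -1, -1):
            if path in self.revisions[number]:
                return self.revisions[number][path]
        raise KeyError(f"{path} does not exist at revision {revision}")

    def stored_copies(self):
        """Count how many file copies are physically stored."""
        return sum(len(changes) for changes in self.revisions)


repo = ToyRepository()
r1 = repo.commit({"a.c": "version 1 of a", "b.c": "version 1 of b"})
r2 = repo.commit({"a.c": "version 2 of a"})   # b.c is untouched
```

Both files are visible in revision 2, yet b.c is stored only once: the repository holds three file copies, not four.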
By default SVN checkout retrieves the latest version. To work with an earlier version the version number is specified in the checkout command.
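As a small sketch, the helper below builds the command line for either case. The function name checkout_command and the URL are hypothetical; the -r option is Subversion's way of naming a revision. The function only constructs the argument list and does not run svn.

```python
def checkout_command(url, revision=None, path=None):
    """Build the argument list for an svn checkout (nothing is executed)."""
    argv = ["svn", "checkout"]
    if revision is not None:
        argv += ["-r", str(revision)]   # ask for an earlier revision
    argv.append(url)
    if path is not None:
        argv.append(path)               # directory to hold the working copy
    return argv


# Hypothetical repository URL, for illustration only.
URL = "http://svn.example.com/repos/project/trunk"
latest = checkout_command(URL)
older = checkout_command(URL, revision=42, path="project-r42")
```

A list built this way could be handed to subprocess.run from the Python standard library on a machine where svn is installed.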
For each file in a working directory, Subversion records two essential pieces of information in the .svn/ administrative area:
what revision your working file is based on (this is called the file's working revision), and
a timestamp recording when the local copy was last updated by the repository.
Given this information, by talking to the repository, Subversion can tell which of the following four states a working file is in (only one state at a time):
The file is unchanged in the working directory, and no changes to that file have been committed to the repository since its working revision. An svn commit of the file will do nothing, and an svn update of the file will do nothing.
The file has been changed in the working directory, and no changes to that file have been committed to the repository since its base revision. There are local changes that have not been committed to the repository, thus an svn commit of the file will succeed in publishing your changes, and an svn update of the file will do nothing.
The file has not been changed in the working directory, but it has been changed in the repository. The file should eventually be updated, to make it current with the public revision. An svn commit of the file will do nothing, and an svn update of the file will fold the latest changes into your working copy.
The file has been changed both in the working directory, and in the repository. An svn commit of the file will fail with an “out-of-date” error. The file should be updated first; an svn update command will attempt to merge the public changes with the local changes. If Subversion can't complete the merge in a plausible way automatically, it leaves it to the user to resolve the conflict.
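The four cases above amount to a small decision table on two questions: has the file been changed locally, and has it been changed in the repository since the working revision? The sketch below encodes that table. The function name, parameter names, and result strings are invented for illustration; Subversion does not expose such a function.

```python
def file_state(locally_modified, working_revision, last_changed_revision):
    """Classify a working file and predict what commit and update would do.

    locally_modified      -- True if the working file has local edits
    working_revision      -- revision the working file is based on
    last_changed_revision -- latest repository revision in which the file changed
    Returns (state, effect of svn commit, effect of svn update).
    """
    out_of_date = last_changed_revision > working_revision
    if not locally_modified and not out_of_date:
        return ("unchanged, current", "commit does nothing", "update does nothing")
    if locally_modified and not out_of_date:
        return ("locally changed, current", "commit publishes changes", "update does nothing")
    if not locally_modified and out_of_date:
        return ("unchanged, out of date", "commit does nothing", "update brings in changes")
    return ("locally changed, out of date", "commit fails: out of date", "update attempts a merge")
```

For example, file_state(True, 4, 6) describes a file edited locally on top of revision 4 while someone else committed a change to it in revision 6: the commit is refused until an update has merged the two sets of changes.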
The commands svn update and svn commit determine the state of the file by comparing the information about the working copy with information in the repository. These commands will take different actions depending on the state, as described above.
To find out the state of the working copy, use the svn status command, which shows the state of any item in your working copy. The status information is derived only from the local situation; it does not compare the working copy with the repository at all. svn status outputs each filename with a code letter beside it. The commonly useful codes are
The command svn status does not normally list the name of a file if it is up to date. Adding the "verbose" switch, as in svn status -v, will cause svn to list the status of all files (a missing code letter means "not modified"), with the version number for those that are under version control. Again, it does not determine whether the file is actually up to date against the repository.
SVN can be used to create side branches of development, which can be kept separate or merged back together at a later time. See the tutorial documentation for details if you are interested.
The main line of development is kept in trunk, from which one can grow branches: lines of development that exist independently of one another, yet still share a common history if you look far enough back in time.
Your assignment projects will be performed through branches which you will grow from the original (prototype) code deposited in the main project trunk.
The SVN documentation has a very clear description of the Basic Work Cycle for using Subversion. This describes what you should do in your working sessions to ensure that your working copy and repository copy remain synchronised and can be used by other team members, if you are working with other people.
It is important to understand this work cycle and how it manages conflict resolution.
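One plausible reading of that cycle, written out as an ordered list of commands, is sketched below. The function name work_cycle and the step descriptions are invented for illustration, and the exact steps in the SVN documentation may differ in detail (for instance, svn revert can be used to undo unwanted local edits before committing). The function only builds the list and does not run svn.

```python
def work_cycle(new_files, log_message):
    """Return (description, argv) pairs for one pass through the work cycle."""
    steps = [("bring the working copy up to date", ["svn", "update"])]
    # ... edit existing files with your usual editor here ...
    for path in new_files:
        steps.append(("put a new file under version control", ["svn", "add", path]))
    steps += [
        ("see which files have changed", ["svn", "status"]),
        ("review the changes themselves", ["svn", "diff"]),
        ("merge in other people's commits", ["svn", "update"]),
        ("publish your changes", ["svn", "commit", "-m", log_message]),
    ]
    return steps


# Hypothetical file name and log message, for illustration only.
for description, argv in work_cycle(["Parser.java"], "Add parser skeleton"):
    print(f"{description}: {' '.join(argv)}")
```

The second svn update just before the commit is what gives Subversion the chance to merge other people's changes into your working copy, or to report a conflict, before your own changes are published.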
svn add PATH...
Add files, directories, or symbolic links.
svn checkout URL[@REV]... [PATH]
Check out a working copy from a repository (creates a working copy in your current directory).
svn commit [PATH...]
Send changes from your working copy to the repository.
svn list [TARGET[@REV]...]
List directory entries in the repository.
svn log URL [PATH...] or svn log [PATH...]
Display commit log messages.
svn merge sourceURL1[@N] sourceURL2[@M] [WCPATH]
svn merge sourceWCPATH1@N sourceWCPATH2@M [WCPATH]
svn merge -r N:M SOURCE[@REV] [WCPATH]
Apply the differences between two sources to a working copy path.
svn revert PATH...
Undo all local edits.
svn status [PATH...]
Print the status of working copy files and directories.
svn update [PATH...]
Update your working copy (bring its items up to date with the most recent versions from the repository).
Subclipse is a plugin for the Eclipse integrated program development environment. Using Subclipse automatically shows some of the states of your files, and makes the work cycle easier. But it is not all automatic, and the human has to be in the loop.
% svn --version
Version control systems are coupled into modern editors and integrated development environments (such as Emacs and Eclipse). Emacs is sensitive to the presence of CVS or RCS directories, and includes commands to check in and check out while you are editing files. Eclipse is sensitive to CVS, and comes with a plugin that directly connects with Subversion; we will try to use this later in the course.
One of the graphical interface tools for Subversion is RapidSVN, http://rapidsvn.tigris.org/. There you can see screenshots of the major operations (viewing the log, performing a commit, etc.). You can download a version (binaries are available for three major platforms), and the source code is available for you to develop. But first learn how to use Subversion on the command line.

Copyright © 2006, 2008, 2010, Alexei Khorev and Chris Johnson, The Australian National University
$Revision: 1.12 $ $Date: 2010/02/22 01:06:48 $ $Author: cwj $
Feedback & Queries to
comp2100@cs.anu.edu.au