COMP2100/2500

Lecture 3: Version Control. Subversion

References

See references below.

Summary

An introduction to the basic concepts of version control systems (VCS);
different types of VCS are noted;
an introduction to the main operations of the Subversion VCS.

Aims


Version Control Systems

What is version control?

Version control (in the most general sense) is the practice of uniquely identifying, classifying, and organising, for convenient storage and retrieval, all intellectual products of human activity that have more than transient value. In the software industry, version control allows us to store all revisions of everything that is written (electronically) during the development and maintenance of a software system, organised in chunks that are usually called modules. This obviously includes modules of program code (source files), but it also includes test cases, design documents in words and pictures, user documentation, testing, build, and other development scripts, input specification files for automatic code generators, and development history records (blogs, spreadsheets, wikis, or whatever other tools have been used). In the rest of these notes we will concentrate on source code modules, but these other examples are also very important in software construction and development.

Different versions of a software module arise in at least three ways. Here are the most common of them:

  1. during software development, many modifications are made to modules, and each of the old versions may be useful, as well as the latest one;
  2. during software maintenance, multiple versions of a software module are produced in response to customer feedback (bug reports etc) and changing specification requirements;
  3. during design and implementation, slight variations of the same software module may be needed at the same time, aimed at different releases, at implementations ported to different platforms, and so on.

It is desirable for the development team or individual programmer to keep all the past and present versions of all modules, because any of the old versions may be needed again, as well as the latest one.

Keeping track of all these versions requires a tool: a Version Control System (VCS).


What Can a Version Control System Do?

A VCS is a software tool that manages an organised digital store called a repository. The repository may be a set of files on the user's own computer filesystem or on a remote computer; the repository and its controlling software may be centralised on one computer or distributed across many computers. In general all VC systems:

Users typically control the VCS through a collection of commands or a standalone VCS application, or more often from within a software development tool. A good software development environment (an IDE, such as Eclipse) and the more powerful program text editors (such as Emacs) have a built-in interface or plugin to integrate with one or more version control systems.


What should developers store in the VCS?

What should not be stored in the VCS?

Anything which can be easily produced in up-to-date form from the items stored under VCS should not generally be stored in the VCS repository.

"Easily produced" means things that can be recreated automatically, for example by compiling stored source code or by generating PostScript from a text document or diagrams. Not storing such items is important because they are generally larger than their sources (object code and PostScript tend to be much larger than the source code or documents that they are generated from), so we pragmatically trade compiler and word processor CPU time for reduced storage space and (more significantly) reduced network transmission times when the VCS repository is across the network.


powerpoint diagrams



Examples of version control systems

Version control systems have a long history in software development. RCS and CVS are older UNIX-based systems that are still in use.

For further information, if you are interested, look at their manual pages on a Unix system (man cvs, man rcsintro) or at the websites for CVS and RCS.

More recently developed version control systems that are widely used include Subversion, Git, and Mercurial.

The general feature of all VCS is the existence of a repository as the central place where the master (unchangeable) copy of all versions of the project's files is held. VCS differ in how they store the revisions: they may be stored locally only (RCS) or on a remote server (CVS, Subversion); remote storage may be centralised on a single computer (CVS, SVN) or distributed (Git). They also differ in the way they implement conflict resolution.

For ease of learning in this course we choose to use Subversion as an example. This is not necessarily the best choice for real world or larger scale projects, but it is sufficiently powerful and complex to be a useful tool and an intellectual challenge for this computing course.


VC Conflict and conflict resolution

Quoted material

This section contains lightly edited extracts from Version Control with Subversion (Copyright © 2002, 2003, 2004, 2005, 2006, 2007, 2008 Ben Collins-Sussman, Brian W. Fitzpatrick, C. Michael Pilato) for the purpose of explanation.

Repository  and its users

Users (U1, U2, U3, ...) who are working on the same project can access the repository at the same time (often through an authentication mechanism) and retrieve (read, check out) the version of a module that they need for their current work. During the work, the developers can modify the modules that they have checked out. Then they try to deposit (write, check in) the result back into the repository under a new version number. If more than one developer retrieves the same version A and then modifies it, a conflict can ensue.

Suppose user U1 modifies the module (A to A') and checks it in first. User U2, not knowing that the module in the repository has changed, modifies the original (working) version (A to A'') and also tries to check it in. If the changes in A' and A'' do not overlap, both modifications can co-exist and everything is OK. But if they do overlap, the modifications in A' will be lost, and the repository version will become A''. This is undesirable and must be avoided: it is not what the developers usually intend. Only one of the sets of changes has been saved in the repository, and the working copies and the repository now hold inconsistent versions of the file.

The following sections expand on the PowerPoint diagrams above.

Two users, no conflict control

If this happened, all of User1's work would be lost when overwritten by User2.
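The lost update can be reproduced with a few lines of Python. This is only a toy sketch: the class NaiveRepository and its methods are hypothetical names invented for illustration, and a file is modelled as a list of lines.

```python
class NaiveRepository:
    """Toy repository with no conflict control: the last check-in wins."""

    def __init__(self, lines):
        self.lines = list(lines)

    def check_out(self):
        # Each user receives a private copy of the current version.
        return list(self.lines)

    def check_in(self, lines):
        # No check is made that the repository changed in the meantime.
        self.lines = list(lines)


repo = NaiveRepository(["alpha", "beta", "gamma"])

copy1 = repo.check_out()     # User 1 reads version A
copy2 = repo.check_out()     # User 2 reads the same version A

copy1[1] = "beta (User 1)"   # A becomes A'
copy2[1] = "beta (User 2)"   # A becomes A''

repo.check_in(copy1)         # repository now holds A'
repo.check_in(copy2)         # repository now holds A''; A' is overwritten

print(repo.lines)
```

After the second check-in the repository contains only User 2's edit, and nothing warned either user that this happened.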

Different VC systems handle this situation differently.

Strict locking (Lock-Modify-Unlock Solution)            

strict locking session

  • Locking may cause administrative problems. User 1 can lock a file and then forget about it. Meanwhile User 2, who is still waiting to edit the file, has his hands tied. The situation ends up causing a lot of unnecessary delay and wasted time.

  • Locking may cause unnecessary serialization. What if User 1 is editing the beginning of a text file, and User 2 only needs to edit the end of the same file? These changes don't overlap at all (no merge conflict). Both Users could easily edit the file simultaneously, and no great harm would come, assuming the changes were properly merged together. There is no need for them to take turns in this situation; locking the whole file is too coarse a level of control.

  • Locking may create a false sense of security. Consider the situation where User 1 locks and edits file A, while User 2 simultaneously locks and edits file B. Suppose that A and B depend on one another, and the changes made to each are semantically incompatible. The result may be new versions of files A and B that no longer work together. The locking system was powerless to prevent the problem, yet it somehow provided a false sense of security.
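The lock-modify-unlock discipline itself is easy to model. The following is a minimal sketch under stated assumptions: LockingRepository and its method names are hypothetical and do not correspond to any real VCS interface.

```python
class LockingRepository:
    """Toy model of strict locking: only the lock holder may check in."""

    def __init__(self):
        self.files = {}   # file name -> content
        self.locks = {}   # file name -> user currently holding the lock

    def lock(self, name, user):
        holder = self.locks.get(name)
        if holder is not None and holder != user:
            return False          # somebody else is editing: wait
        self.locks[name] = user
        return True

    def check_in(self, name, user, content):
        if self.locks.get(name) != user:
            raise PermissionError(f"{user} does not hold the lock on {name}")
        self.files[name] = content

    def unlock(self, name, user):
        if self.locks.get(name) == user:
            del self.locks[name]


repo = LockingRepository()
first = repo.lock("A", "user1")     # User 1 obtains the lock
second = repo.lock("A", "user2")    # User 2 is refused and has to wait
repo.check_in("A", "user1", "A'")
repo.unlock("A", "user1")
third = repo.lock("A", "user2")     # only now can User 2 start editing
print(first, second, third)
```

If User 1 never calls unlock, User 2 waits indefinitely, which is the administrative problem described in the first point above.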



    

Optimistic locking (Copy-Modify-Merge Solution)

No locking scheme

In this model, each User contacts the project repository and creates a personal working copy: a local reflection of the repository's files and directories. Users then work in parallel, modifying their private copies. Finally, the private copies are merged together into a new, final version. The version control system often assists with the merging, but ultimately a human being is responsible for making it happen correctly. Suppose User 2 finishes and commits his changes first, so that the repository file becomes A' (the version that User 2 created). When User 1 later tries to commit his modified copy A'', the repository rejects his modifications, telling him that his file is out of date and that he must perform a merge. If the modifications in A' and A'' do not overlap (they change different parts of the file), the merge goes through without any problem.
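The out-of-date check can be sketched as follows. This is an illustration only, not how Subversion is implemented: the Repository class, OutOfDateError, and the hand-written merge step are all hypothetical.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on a revision older than the latest."""


class Repository:
    def __init__(self, lines):
        self.revisions = [list(lines)]   # list index doubles as revision number

    @property
    def head(self):
        return len(self.revisions) - 1

    def check_out(self):
        # Return the latest revision number together with a private copy.
        return self.head, list(self.revisions[-1])

    def commit(self, base_revision, lines):
        if base_revision != self.head:
            raise OutOfDateError("working copy is out of date; update first")
        self.revisions.append(list(lines))
        return self.head


repo = Repository(["intro", "body", "end"])
base1, copy1 = repo.check_out()    # User 1 gets A
base2, copy2 = repo.check_out()    # User 2 gets A

copy2[0] = "intro (User 2)"
repo.commit(base2, copy2)          # User 2 commits first: A'

copy1[2] = "end (User 1)"          # User 1's version: A''
try:
    repo.commit(base1, copy1)
    rejected = False
except OutOfDateError:
    rejected = True

# User 1 updates; the non-overlapping change from A' is merged by hand here.
base1, latest = repo.check_out()
copy1[0] = latest[0]
new_revision = repo.commit(base1, copy1)
print(rejected, new_revision, repo.revisions[-1])
```

Nothing is lost: the final revision contains both users' edits, and the earlier revisions remain available.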

   

Optimistic locking (continued)

Merger with conflict
If the modifications do overlap, this is a conflict, which needs to be resolved. This has to be done by the developers (humans); the tools (computer) cannot resolve a conflict. When the conflict is detected, the repository prompts User 1 to copy (update) the modified version A' from the repository and to compare his modifications in A'' with those in A'. After negotiation between User 1 and User 2 (which may involve someone else), the conflict is resolved: the special markers that the update process created in the merged file A'+A'' are removed, and the result is a new version of A, which can now be committed into the repository (and will be accepted because the markers are no longer there).

The quality of communication between users (remember PSP!) is crucial here. If the developers communicate poorly, no system will detect a semantic conflict. Strict locking therefore does not prevent such conflicts, but it may (and often does) reduce productivity.
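To make the difference between an automatic merge and a conflict concrete, here is a deliberately simplified three-way merge. It assumes that all three versions have the same number of lines (edits change lines in place), and the function merge3 and its marker text are hypothetical; real tools handle insertions and deletions and use their own marker labels.

```python
def merge3(base, mine, theirs):
    """Line-by-line three-way merge for versions with equal line counts."""
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflict = [], False
    for b, m, t in zip(base, mine, theirs):
        if m == t or t == b:
            merged.append(m)      # same result on both sides, or only mine changed
        elif m == b:
            merged.append(t)      # only theirs changed
        else:
            conflict = True       # both changed the same line differently
            merged += ["<<<<<<< mine", m, "=======", t, ">>>>>>> theirs"]
    return merged, conflict


base = ["a", "b", "c"]                       # version A
mine = ["a", "B1", "c"]                      # version A''
clean, clean_conflict = merge3(base, mine, ["a", "b", "C2"])
marked, marked_conflict = merge3(base, mine, ["a", "B2", "c"])
print(clean, clean_conflict)
print(marked, marked_conflict)
```

In the second call both users changed the middle line, so the merged file contains markers around the two candidates and a human has to choose what the line should be.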


 
 
 
 
 
 

Subversion VCS uses an optimistic locking mechanism.


Directory structures and files

Structure

Subversion stores and retrieves files in directory subtree structures, reflecting part of the user's home filesystem organisation.

The repository file system is a regular file tree

The repository filesystem

To get a working copy, you must check out some subtree of the repository. (The term “check out” doesn't mean locking resources, it simply creates a private (working) copy of the project in your working directory.)

Version numbering

Every item stored in the repository has an associated version number. By default, when a copy is committed to the repository the version number increases. An old version of the system can be retrieved by giving its version number, or the date and time before which it was committed.

Version control systems use different ways to number versions. Subversion uses a global revision numbering scheme, whereby the entire repository tree is marked with a uniform revision number. Every file in a particular revision carries the same label as the whole project directory (a revision number, which can also be selected by a date or a keyword), even if some files have not been altered in that revision and are identical to the files from previous revisions.

By contrast, CVS uses a per-file numbering scheme, which is awkward and may cause confusion.
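A small model may help to picture global revision numbers. The class GlobalRevisionRepo below is a hypothetical sketch that shows only the numbering, not how Subversion actually stores its data.

```python
class GlobalRevisionRepo:
    """Toy model: every commit creates a new numbered snapshot of the whole tree."""

    def __init__(self):
        self.trees = [{}]   # revision 0 is the empty tree

    def commit(self, changes):
        tree = dict(self.trees[-1])   # start from the previous tree
        tree.update(changes)          # apply only the files that changed
        self.trees.append(tree)
        return len(self.trees) - 1    # the new global revision number

    def file_at(self, path, revision):
        return self.trees[revision][path]


repo = GlobalRevisionRepo()
r1 = repo.commit({"a.c": "a version 1", "b.c": "b version 1"})
r2 = repo.commit({"a.c": "a version 2"})   # b.c is not touched

print(r1, r2)
print(repo.file_at("b.c", r2))   # b.c still exists in revision 2, unchanged
```

Asking for revision 2 therefore gives a consistent picture of the whole project, including files that were last modified in revision 1.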

Subversion numbering scheme

The actual storage requirements and network traffic for copies of files are not as bad as this picture suggests. The repository's view of a new version is a virtual view: Subversion does not transmit or store a duplicate of any file that is unchanged. Even if a file is changed, most version control systems transmit and store only a list of changes to files in each version, having been designed with the expectation that changes are relatively small compared to the whole extent of the file. The performance of a version control system implementation may depend heavily on this design decision.
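The saving from storing changes rather than whole files can be seen with Python's standard difflib module. Note the assumption: a unified diff is used here purely as a stand-in for a list of changes; it is not the format that Subversion uses internally.

```python
import difflib

old = [f"line {i}" for i in range(100)]   # a 100-line file
new = list(old)
new[50] = "line 50 changed"               # one small edit

# n=0 asks for no context lines, so the delta holds only the change itself.
delta = list(difflib.unified_diff(old, new, lineterm="", n=0))

for line in delta:
    print(line)
print(len(delta), "delta lines versus", len(new), "lines in the full file")
```

The delta is a handful of lines, however long the file is, as long as the edit stays small.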

Working with versions

By default, svn checkout retrieves the latest version. To work with an earlier version, specify its version number in the checkout command (for example with the -r option).
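As a sketch, the command line for both cases can be assembled like this. The helper checkout_command and the repository URL are hypothetical; only the svn checkout subcommand and its -r option are taken from Subversion itself. The function builds the argument list without running it.

```python
def checkout_command(url, revision=None, path=None):
    """Build the argument list for svn checkout, optionally at a revision."""
    cmd = ["svn", "checkout"]
    if revision is not None:
        cmd += ["-r", str(revision)]   # ask for an earlier revision
    cmd.append(url)
    if path is not None:
        cmd.append(path)               # directory for the working copy
    return cmd


url = "http://svn.example.org/repos/project/trunk"   # hypothetical URL
latest = checkout_command(url)
older = checkout_command(url, revision=42, path="project-r42")
print(latest)
print(older)
```

A list built this way could be passed to subprocess.run on a machine where the svn client is installed.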

Keeping Track of Working Copies in the Repository

For each file in a working directory, Subversion records two essential pieces of information in the .svn/ administrative area:

  • what revision your working file is based on (this is called the file's working revision), and

  • a timestamp recording when the local copy was last updated by the repository.

Given this information, by talking to the repository, Subversion can tell which of the following four states a working file is in (only one state at a time):

state 1: Unchanged, and current

The file is unchanged in the working directory, and no changes to that file have been committed to the repository since its working revision. An svn commit of the file will do nothing, and an svn update of the file will do nothing.

state 2: Locally changed, and current

The file has been changed in the working directory, and no changes to that file have been committed to the repository since its base revision. There are local changes that have not been committed to the repository, thus an svn commit of the file will succeed in publishing your changes, and an svn update of the file will do nothing.

state 3: Unchanged, and out-of-date

The file has not been changed in the working directory, but it has been changed in the repository. The file should eventually be updated, to make it current with the public revision. An svn commit of the file will do nothing, and an svn update of the file will fold the latest changes into your working copy.

state 4: Locally changed, and out-of-date

The file has been changed both in the working directory, and in the repository. An svn commit of the file will fail with an “out-of-date” error. The file should be updated first; an svn update command will attempt to merge the public changes with the local changes. If Subversion can't complete the merge in a plausible way automatically, it leaves it to the user to resolve the conflict.

The commands svn update and svn commit determine the state of the file by comparing the information about the working copy with information in the repository. These commands will take different actions depending on the state, as described above.
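The four states and the behaviour of the two commands can be summarised as a lookup table. The names OUTCOMES and describe are hypothetical; the entries simply restate the descriptions above.

```python
# (locally changed?, out of date?) -> (state, svn commit, svn update)
OUTCOMES = {
    (False, False): (1, "does nothing", "does nothing"),
    (True, False): (2, "publishes your changes", "does nothing"),
    (False, True): (3, "does nothing", "folds in the latest changes"),
    (True, True): (4, "fails: out of date", "attempts a merge"),
}


def describe(locally_changed, out_of_date):
    """Return the state number and what commit and update would do."""
    return OUTCOMES[(locally_changed, out_of_date)]


for changed in (False, True):
    for stale in (False, True):
        state, commit, update = describe(changed, stale)
        print(f"state {state}: commit {commit}; update {update}")
```

Only state 4 can lead to a conflict, because it is the only state in which both the working copy and the repository hold changes to the same file.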

To find out the state of your working copy, use the svn status command, which shows the state of any item in the working copy. By default the status information is derived only from the local situation; it does not compare the working copy with the repository at all. svn status outputs each filename with a code letter beside it. The commonly useful codes are

  ?   the file is not under SVN version control (it has not been svn added)
  M   the file has been locally modified since it was last updated or committed
  A   the file has been added but has not yet been committed; it will not appear in the repository until after svn commit

The command svn status does not normally list the name of a file if it is unmodified. Adding the "verbose" switch, as in svn status -v, causes svn to list the status of all files (a missing code letter means "not modified"), with the version number for those that are under version control. Again, it does not determine whether the file is actually up to date against the repository.
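The code letters lend themselves to simple scripting. The sketch below assumes the plain (non-verbose) output layout, with the code letter in the first column and the file name at the end of the line; the sample text and the helper parse_status are made up for illustration.

```python
STATUS_CODES = {
    "?": "not under version control",
    "M": "locally modified",
    "A": "added, not yet committed",
}


def parse_status(output):
    """Map each file name in plain svn status output to a description."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue                     # skip blank lines
        code = line[0]                   # first column holds the code letter
        path = line[1:].strip()          # assumed: the rest is the file name
        result[path] = STATUS_CODES.get(code, "other")
    return result


# Hypothetical sample, not captured from a real working copy.
sample = "?       notes.txt\nM       src/Main.java\nA       src/Util.java\n"
print(parse_status(sample))
```

Verbose output has extra columns and would need a more careful parser than this one.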

Trunks and branches

SVN can be used to create side branches of development, which can be kept separate or merged back together at a later time. See the tutorial documentation for details if you are interested.

The main line of development is kept in the trunk, from which one can grow branches: lines of development that exist independently of one another, yet still share a common history if you look far enough back in time.
Your assignment projects will be carried out in branches that you will grow from the original (prototype) code deposited in the main project trunk.

Subversion's trunk and branches

 


How do I use Subversion in practice?

The SVN documentation has a very clear description of the Basic Work Cycle for using Subversion.
This describes what you should do in your working sessions to ensure that your working copy and repository copy remain synchronised and can be used by other team members, if you are working with other people.

It is important to understand this work cycle and how it manages conflict resolution.
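One working session might be laid out as the following sequence of commands. This is a rough sketch rather than the official cycle: the helper work_cycle, the file name, and the log message are hypothetical, and the function only builds and prints the command lines without running svn.

```python
import shlex


def work_cycle(new_files, message):
    """Return the svn commands for one working session, in order."""
    steps = [["svn", "update"]]                    # start from the latest revision
    # ... edit files in the working copy here ...
    if new_files:
        steps.append(["svn", "add", *new_files])   # put new files under version control
    steps.append(["svn", "status"])                # review what has changed locally
    steps.append(["svn", "diff"])                  # inspect the changes in detail
    steps.append(["svn", "update"])                # merge in other people's commits
    steps.append(["svn", "commit", "-m", message]) # publish the result
    return steps


session = work_cycle(["Util.java"], "Add utility class")
for step in session:
    print(shlex.join(step))
```

The second svn update just before the commit is where an out-of-date working copy is detected and where any conflict has to be resolved.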

A Summary of Some Subversion Commands

svn add PATH ...

Add files, directories, or symbolic links.

svn checkout URL[@REV]... [PATH]

Check out a working copy from a repository (creates a working copy in your current directory).

svn commit [PATH...]

Send changes from your working copy to the repository.

svn list [TARGET[@REV]...]

List directory entries in the repository.

svn log URL [PATH...] or svn log [PATH...]

Display commit log messages.

svn merge sourceURL1[@N] sourceURL2[@M] [WCPATH]
svn merge sourceWCPATH1@N sourceWCPATH2@M [WCPATH]
svn merge -r N:M SOURCE[@REV] [WCPATH]

Apply the differences between two sources to a working copy path.

svn revert PATH...

Undo all local edits.

svn status [PATH...]

Print the status of working copy files and directories.


svn update [PATH...]

Update your working copy (bring the changes from the most recent version in the repository into it).

Tools: subclipse

Subclipse is a plugin for the Eclipse integrated development environment. Subclipse automatically shows some of the states of your files and makes the work cycle easier. But it is not all automatic, and the human has to be in the loop.


screenshot of subclipse

References

  1. The SVN working cycle: Basic Work Cycle for using Subversion.
  2. The complete Subversion manual: Version Control with Subversion, the online reference book.
    To check which version your client is running, use % svn --version


Advanced topic: GUI front end for Subversion

Version control systems are coupled into modern editors and integrated development environments (such as Emacs and Eclipse). Emacs is sensitive to the presence of CVS or RCS directories, and includes commands to check in and check out while you are editing files. Eclipse is sensitive to CVS, and a plugin connects it directly with Subversion; we will try to use this later in the course.

One of the graphical interface tools for Subversion is RapidSVN, http://rapidsvn.tigris.org/. There you can see screenshots of the major operations (viewing the log, performing a commit, etc.). You can download a version (binaries are available for three major platforms), and the source code is available for you to develop. But first learn how to use Subversion on the command line.

Rapid SVN

____________________________________________________

Copyright © 2006, 2008, 2010, Alexei Khorev and Chris Johnson, The Australian National University
$Revision: 1.12 $ $Date: 2010/02/22 01:06:48 $ $Author: cwj $
Feedback & Queries to comp2100@cs.anu.edu.au