Welcome to my Computer Science Home Page. This page outlines work undertaken as part of a PhD within the Research School of Computer Science at the Australian National University.

Currently this page describes work on the application of natural language processing to contracts. The scope of the proposed research has since been extended to include tools for reading and writing 'legal rules' in general (e.g. in contracts or legislation). The research has grown out of experience in contract drafting and consideration of the practical software tools that drafters need. At present, little is freely available that provides real-world assistance with the drafting task. Contract drafting is a significant function, undertaken by lawyers and others, that helps organisations and individuals define their relationships and implement collaborative projects. While many contracts are completely standard, contract drafting often involves creating new and complex documents, either from scratch or by adapting existing texts. Tools that help drafters detect and address ambiguity and errors in their drafts are limited.

Michael Curtotti

This page will be progressively updated with information and materials relating to this research. For further information, contact Michael DOT Curtotti at ANU DOT EDU DOT AU

Research Papers

Curtotti, M. and McCreath, E. Enhancing the Visualization of Law. 20th Anniversary Conference of Law via the Internet, Cornell University, October 2012.

Curtotti, M. and McCreath, E. A Corpus of Australian Contract Language. Thirteenth International Conference on Artificial Intelligence and the Law, June 2011.

Curtotti, M. and McCreath, E. Corpus Based Classification of Text in Australian Contracts. Australasian Language Technology Association Workshop 2010, p. 18.

The Australian Contract Corpus

The work reported on this site depends heavily on a corpus of Australian contract drafts that were downloaded from the web and compiled. The corpus comprises approximately 1,000,000 words in around 250 text versions of contracts and contract drafts. If you are undertaking research at a research institution and would like to obtain a text copy of the corpus for research purposes, please contact:

michael.curtotti [AT] anu.edu.au

The corpus will be made available only for research, not for re-publication or commercial purposes. Each user is responsible for complying with fair use rights in their jurisdiction. ANU does not claim or hold copyright in the individual documents comprising the corpus.

Citing the corpus

If you use the corpus, or a corpus based on the Australian Contract Corpus, please cite:

Curtotti, M. and McCreath, E. A Corpus of Australian Contract Language. Thirteenth International Conference on Artificial Intelligence and the Law, 2011.

College of Engineering and Computer Science, Australian National University

Research Papers and Associated Materials

This section documents research papers published as part of this research, and provides associated data and tools, where relevant.

Michael Curtotti and Eric McCreath, Enhancing the Visualization of Law, presented at the 2012 Twentieth Anniversary Law via the Internet Conference (LVI2012). See LVI 2012.

Michael Curtotti and Eric McCreath, A Corpus of Australian Contract Language, presented at the 2011 Thirteenth International Conference on Artificial Intelligence and the Law. See proceedings at http://dl.acm.org/citation.cfm?id=2018358&picked=prox

Presentation slides prepared by Eric McCreath.

Michael Curtotti and Eric McCreath, Corpus Based Classification of Text in Australian Contracts, presented at the Australasian Language Technology Association Workshop 2010 (ALTA2010). See ALTA 2010 workshop proceedings.

Associated Data

Data used in the research reported in Corpus Based Classification of Text in Australian Contracts (ALTA2010) is available as an archive at the following link:

Corpus Based Classification Research Data

Research Code

The following code is made available for research use only. If you identify any bugs, please let us know.

Line Tagging File

The following file was used in research on the classification of "lines" in contracts. Each line is tagged according to a classification system that extracts key data and metadata from a contract. The accuracy of the tagging code was assessed against machine-learning-based methods. Line Tagger Python File 
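As a rough illustration of what line-level tagging involves, the sketch below assigns one tag per line using regular expressions. It is not the Line Tagger file itself: the tag names (`blank`, `definition`, `heading`, `clause`, `other`) and the patterns are hypothetical, chosen only to show the shape of the task.

```python
import re

# Hypothetical tag set and patterns for illustration only; the actual
# classification system is defined in the Line Tagger Python File.
DEFINITION = re.compile(r'^\s*"?[A-Z][\w ]*"?\s+(means|includes)\b')
HEADING = re.compile(r"^\s*(\d+(\.\d+)*)\.?\s+[A-Z][A-Za-z ]{0,60}$")
CLAUSE = re.compile(r"^\s*(\d+(\.\d+)+|\([a-z]+\))\s+\S")


def tag_line(line: str) -> str:
    """Assign a single (hypothetical) tag to one line of contract text."""
    text = line.strip()
    if not text:
        return "blank"
    if DEFINITION.match(text):   # e.g. '"Goods" means ...'
        return "definition"
    if HEADING.match(text):      # e.g. '1. Definitions' (no trailing punctuation)
        return "heading"
    if CLAUSE.match(text):       # e.g. '1.1 The Supplier must ...' or '(a) ...'
        return "clause"
    return "other"


def tag_document(text: str) -> list[tuple[str, str]]:
    """Return (tag, line) pairs for every line of a contract."""
    return [(tag_line(line), line) for line in text.splitlines()]
```

Checking the definition pattern first means a numbered definition line is not swallowed by the heading or clause patterns; any real tagger would need a far richer rule set than this.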

Feature Extractor

The following file was used in research on the classification of "lines" in contracts. It extracts the "features" used in machine learning, including line length, line position, part-of-speech frequencies and the occurrence of specific terms that are significant in contracts. The feature extractor takes as input files that have been tagged using the line tagging file above. Feature Extractor Python File 
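A minimal sketch of turning tagged lines into feature dictionaries for a classifier is given below. The feature names and the `SIGNIFICANT_TERMS` list are hypothetical, not the research feature set. Part-of-speech frequencies are omitted to keep the example dependency-free; they could be added with NLTK's `nltk.pos_tag`, which requires a downloaded tagger model.

```python
import re

# Hypothetical list of terms "having significance in contracts";
# the term list used in the research is not reproduced here.
SIGNIFICANT_TERMS = ("means", "shall", "must", "may", "party", "agreement")
WORD = re.compile(r"[A-Za-z']+")


def line_features(line: str, index: int, total: int) -> dict[str, float]:
    """Extract simple numeric features for one line of a contract.

    index is the zero-based line number and total the number of lines in
    the document, so position is normalised to the range [0, 1].
    """
    words = [w.lower() for w in WORD.findall(line)]
    features: dict[str, float] = {
        "length_chars": float(len(line)),
        "length_words": float(len(words)),
        "position": index / (total - 1) if total > 1 else 0.0,
    }
    # One binary feature per significant term (whole-word, case-insensitive).
    for term in SIGNIFICANT_TERMS:
        features[f"has_{term}"] = 1.0 if term in words else 0.0
    return features


def document_features(tagged: list[tuple[str, str]]) -> list[tuple[dict[str, float], str]]:
    """Turn (tag, line) pairs into (features, tag) training examples."""
    total = len(tagged)
    return [(line_features(line, i, total), tag) for i, (tag, line) in enumerate(tagged)]
```

The (features, label) pairs are in the shape expected by simple classifiers such as NLTK's `NaiveBayesClassifier.train`.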

Corpus Utilities

The following file provides a series of corpus utilities (built on top of NLTK and MontyLingua classes), including methods for extracting vocabularies and collocations, plotting word frequencies, and similar tasks. Corpus Utilities Python File 
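For readers unfamiliar with these operations, the following standard-library sketch shows what extracting a vocabulary, word frequencies and collocations can look like. It is a stand-in, not the Corpus Utilities file: the simple regex tokenizer is an assumption, and collocations are ranked by pointwise mutual information (PMI). In NLTK, `nltk.FreqDist` (which offers `.plot()` for frequency plots) and `nltk.collocations.BigramCollocationFinder` with `BigramAssocMeasures.pmi` provide equivalents.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Lower-case text and split it into alphabetic tokens (a simple stand-in
    for an NLTK tokenizer)."""
    return TOKEN.findall(text.lower())


def vocabulary(tokens: list[str]) -> set[str]:
    """The set of distinct word types."""
    return set(tokens)


def word_frequencies(tokens: list[str], n: int = 10) -> list[tuple[str, int]]:
    """The n most frequent words with their counts."""
    return Counter(tokens).most_common(n)


def collocations(tokens: list[str], n: int = 10, min_freq: int = 2) -> list[tuple[str, str]]:
    """Rank adjacent word pairs by PMI, ignoring pairs seen fewer than
    min_freq times (rare pairs otherwise dominate PMI rankings)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def pmi(pair: tuple[str, str]) -> float:
        a, b = pair
        # log2(P(a,b) / (P(a) P(b))), probabilities estimated from raw counts
        return math.log2(bigrams[pair] * total / (unigrams[a] * unigrams[b]))

    candidates = [p for p, c in bigrams.items() if c >= min_freq]
    # Sort by descending PMI, breaking ties alphabetically for stable output.
    return sorted(candidates, key=lambda p: (-pmi(p), p))[:n]
```

Applied to the text "The Supplier must deliver the Goods. The Supplier must invoice the Customer. The Customer must pay.", `collocations` returns `[("supplier", "must"), ("the", "customer"), ("the", "supplier")]`.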

Other Work

Current Work: Current work is focussing on (a) exploring the use of definitions within contracts and applying graphs to identify errors or structural problems in a contract; and (b) enhancing the visualisation of legislation.
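One possible way of applying graphs to contract definitions is sketched below, under stated assumptions; it is not the method used in this research. Each defined term is a node, with an edge to every other defined term its definition uses. A cycle then signals circular definitions, and a term referred to by nothing is a candidate unused definition. The function names are hypothetical, and the definitions are assumed to be already extracted into a `{term: definition text}` dictionary.

```python
import re
from typing import Optional


def mentions(term: str, text: str) -> bool:
    """True if the defined term appears in text as a whole word (case-sensitive)."""
    return re.search(rf"\b{re.escape(term)}\b", text) is not None


def build_definition_graph(definitions: dict[str, str]) -> dict[str, set[str]]:
    """Map each defined term to the other defined terms used in its definition."""
    return {
        term: {other for other in definitions if other != term and mentions(other, body)}
        for term, body in definitions.items()
    }


def unused_terms(graph: dict[str, set[str]], operative_text: str) -> set[str]:
    """Defined terms used neither in the operative text nor in another definition."""
    referenced = set().union(*graph.values())
    return {t for t in graph if t not in referenced and not mentions(t, operative_text)}


def find_cycle(graph: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one chain of circular definitions found by depth-first search, or None."""
    state = {t: "unvisited" for t in graph}
    path: list[str] = []

    def visit(term: str) -> Optional[list[str]]:
        state[term] = "on_path"
        path.append(term)
        for nxt in sorted(graph[term]):
            if state[nxt] == "on_path":  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == "unvisited":
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[term] = "done"
        return None

    for term in sorted(graph):
        if state[term] == "unvisited":
            cycle = visit(term)
            if cycle:
                return cycle
    return None
```

For example, if "Schedule" is defined by reference to "Agreement" and "Agreement" by reference to "Schedule", `find_cycle` reports `["Agreement", "Schedule", "Agreement"]`.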

Disclaimer and Copyright Notice

Materials provided on this site are subject to copyright and are made available for research and academic purposes only, including reproduction of research. Apart from such use, all rights are reserved. Contact the website owner for permission for other uses. Materials are made available without warranty of any kind and with specific notice that the material has been developed solely for research use and has not been designed or tested for use outside its original research context. Copyright in third-party materials, and rights to use third-party materials available through this site, rest with the respective copyright owners; you should make your own enquiries as to fair use rights in your legal jurisdiction.

Last modified: 4 June 2012