Research Plan

Research:

Research topic (Tentative)

Information-Theoretic Foundations of Inductive Reasoning

Overview of research agenda

General Introduction

Inductive reasoning is the fundamental technique we use to make sense of the world. We observe the world, and when we recognise patterns in our observations, we conclude that those patterns may continue to appear in the future. That those patterns represent real objects and processes. When those patterns do continue to appear, we conclude that we have (probably) understood something about the world.

Inductive reasoning is applied in many contexts. We use it in our everyday lives, recognising objects and drawing simple or complex conclusions. All conclusions of science about the world are inductive in nature. Artificial intelligence, machine learning and data mining embody attempts to perform inductive reasoning with computers.

Numerical techniques for inductive reasoning (numerical modelling, machine learning, AI, ...) make many assumptions (and associated beliefs) about underlying processes or phenomena. A key motivator for this research is to make explicit and justify (or not!) the various assumptions (on various levels) that we use. That knowledge can then be used to understand the strength of our conclusions and identify areas (and possibly alternative approaches) where that strength is lacking.

Starting from fundamental principles (such as Occams razor and the belief that there is regularity in the world), I will asses and review the goals of inductive reasoning and the techniques used to achieve them, attempting to identify the (implicit and explicit) assumptions those techniques require.

When attempting to learn from real world data, we choose some models (and their embedded assumptions) that we believe may capture the underlying process(s). Two important questions are : “which models to choose?” (which assumptions to make?) and “how can I gauge the validity of our assumptions given available data?” This research is primarily concerned with the second question (though an understanding of the first is necessary to answer it).

There are two aspects to this question. First, in what sense do I mean validity: Predictive power of the resulting model? Some measure of distance from the “true” process to the chosen model class? Other criteria? Different tasks require different validity measures and each measure is justified by different assumptions and beliefs. There are theoretical limits to what can and can't be obtained from real world data, and feasibly calculated measures require some assumptions about the underlying processes in order to link them to our fundamental principles. The focus here is largely philosophical – an attempt to clearly identify the questions we wish to ask and what we need to believe for those questions to make sense.

The second aspect concerns techniques and approaches to measuring this validity. In a machine learning context, this endeavour is named “model assessment”, and is closely linked with “model selection”[Has09]. Model selection is concerned with selecting the 'best' (set of) models from a wider class, model assessment is concerned with assessing the quality of the selected model. The criteria used for model selection is generally a measure of model quality.

In practical terms, these measures are estimates of 'ideal' quality measures (those that achieve our chosen principles of quality). Generally, we must make some assumptions about the nature of the underlying process before we can say with any confidence that such estimates are good. For example, we hope to model a processes behaviour in all circumstances, and an ideal quality measure takes this into account. However we have no information about its behaviour outside our sampled data – our estimated quality is calculated solely from this data. We must assume that our models can, in some sense, accurately 'reach' unobserved circumstances from our observations. This assumption relates to both our models (a priori assumed) capacity to model the process, and that our data is 'representative'.

These issues have been studied from various viewpoints over a hundred years and more. As a foundation of my research and a starting point for potential novel ideas, I intend to survey treatments of these issues in the research literature and attempt to identify/develop a philosophically sound framework for (primarily numerical) inductive reasoning.

Universal Induction

The Bayesian approach is a well established and arguably sound approach to inductive reasoning. It tells us how we should include our prior beliefs into predictions and extrapolations. It notably fails to inform us about how we should formulate those beliefs in the absence of specific knowledge of a process – how we should construct 'objective' priors.

Addressing this problem, Solomonoff developed [Sol64] developed a theoretically optimal formulation of inductive reasoning. This involved the class of all (computable) models and an objective prior – inductive reasoning with this combination (Solomonoff or Universal induction) will outperform any other inductive system given sufficient time/data. Though universal induction is not computable, it can be approximated and can serve as a gold standard for assessing other approaches. Universal induction elegantly embodies Occams razor.

Using universal induction as a gold standard, I will attempt to develop a universal framework for model assessment and place existing model selection and assessment approaches in this context. The hope is that I will thus provide deeper understanding of the power and limitations of these approaches, and possibly identify novel adjustments and improvements. In particular, I hope to investigate MDL (minimum description length) as a universal criterion for model selection and measure of model quality.

My first ideas concerning model assessment originated in a MaxEnt (maximum entropy) and MEP (maximum entropy production) context. I hope to apply my understanding here in order to develop useful new tools for entropy based modelling.

Summary of Research Tasks and Possible Outcomes

Validity of assumptions and models
Measuring model validity

Potential research outcomes – specific

Milestones that I intend to achieve this year include:

Current Research Tasks

Tasks (in approximate chronological order):

Coursework:

As yet we have not identified courses that would be relevant to the research directions above.

Networking:

Outlined below are my current and intended interactions with the wider research community.

Collaboration:

I have been (and will continue) to discuss my ideas with:

Seminars:

Attendance at NICTA machine learning group research presentations. I may also attend other relevant seminars in Mathematics and/or Computer Science.

Conferences:

ALT/DS 2009 – Algorithmic learning Theory / Discovery Science – Porto, Portugal, 3-5 October 2009

MaxEnt 2010 – Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering – Grenoble, France – July 2010 (tba)

ICML 2010 – International Conference on Machine Learning – Haifa, Israel – June/July 2010

UAI 2010 – Conference on Uncertainty in AI – Haifa, Israel – June/July 2010

COLT 2010 – Conference on Learning Theory – Haifa, Israel – June/July 2010

UICS 2009 – Understanding Intelligent and Complex Systems – Targu Mures, Romania – 22-23 October 2009 – Deadline 31.August.2009

(NIPS – Neural Information Processing Systems – Vancouver, Canada – December 7-12 2009)

Teaching:

Tutoring for “Statisitcal Machine Learning” in first semester. No plans to tutor in second semester 2009.

Funding:

My personal expenses for the next 3 years (given adequate research progress) will be covered by the scholarships I have been awarded (University House Postgraduate Research Scholarship and ANU Supplementary Scholarship).

Expected research expenses:

Due to the largely theoretical direction of my research, I do not expect expenses for materials or other resources. It is possible that in later years, funding may be required for access to computer and/or data resources beyond that which the Computer Science Department can offer. I will pursue this at a later date as required.

Current secure and potential sources of funding for conferences and other travel travel include:

DCS postgraduate research student Professional Development Fund

$1600 per year

Primarily for conference travel, though possibly for study/research outside ANU.

Vice Chancellors HDR student conference travel grant

up to $1,500 per conference

Primarily for conference travel, usually when presenting at an A or A+ conference.

Vice Chancellors HDR student travel grant

up to $5,000

For overseas study/research. Eligible from March 12 2010

Conference volunteer

Reduced or waived conference fees

It is often possible to volunteer to help the smooth running of conferences.



1Here I mean “universal” in sense similar to the “universality” of the Solomonoff prior or Marcus Hutters Universal AI.

[Has09]T. Hastie, R. Tibshirani, & J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction; Ch 7 - Model Assessment and Selection, Springer New York, 2009, 219-259

[Sol64]R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1-22 and 224-254, 1964.