Information-Theoretic Foundations of Inductive Reasoning
Inductive reasoning is the fundamental technique we use to make sense of the world. We observe the world, and when we recognise patterns in our observations, we conclude that those patterns may continue to appear in the future. That those patterns represent real objects and processes. When those patterns do continue to appear, we conclude that we have (probably) understood something about the world.
Inductive reasoning is applied in many contexts. We use it in our everyday lives, recognising objects and drawing simple or complex conclusions. All conclusions of science about the world are inductive in nature. Artificial intelligence, machine learning and data mining embody attempts to perform inductive reasoning with computers.
Numerical techniques for inductive reasoning (numerical modelling, machine learning, AI, ...) make many assumptions (and associated beliefs) about underlying processes or phenomena. A key motivator for this research is to make explicit and justify (or not!) the various assumptions (on various levels) that we use. That knowledge can then be used to understand the strength of our conclusions and identify areas (and possibly alternative approaches) where that strength is lacking.
Starting from fundamental principles (such as Occams razor and the belief that there is regularity in the world), I will asses and review the goals of inductive reasoning and the techniques used to achieve them, attempting to identify the (implicit and explicit) assumptions those techniques require.
When attempting to learn from real world data, we choose some models (and their embedded assumptions) that we believe may capture the underlying process(s). Two important questions are : “which models to choose?” (which assumptions to make?) and “how can I gauge the validity of our assumptions given available data?” This research is primarily concerned with the second question (though an understanding of the first is necessary to answer it).
There are two aspects to this question. First, in what sense do I mean validity: Predictive power of the resulting model? Some measure of distance from the “true” process to the chosen model class? Other criteria? Different tasks require different validity measures and each measure is justified by different assumptions and beliefs. There are theoretical limits to what can and can't be obtained from real world data, and feasibly calculated measures require some assumptions about the underlying processes in order to link them to our fundamental principles. The focus here is largely philosophical – an attempt to clearly identify the questions we wish to ask and what we need to believe for those questions to make sense.
The second aspect concerns techniques and approaches to measuring this validity. In a machine learning context, this endeavour is named “model assessment”, and is closely linked with “model selection”[Has09]. Model selection is concerned with selecting the 'best' (set of) models from a wider class, model assessment is concerned with assessing the quality of the selected model. The criteria used for model selection is generally a measure of model quality.
In practical terms, these measures are estimates of 'ideal' quality measures (those that achieve our chosen principles of quality). Generally, we must make some assumptions about the nature of the underlying process before we can say with any confidence that such estimates are good. For example, we hope to model a processes behaviour in all circumstances, and an ideal quality measure takes this into account. However we have no information about its behaviour outside our sampled data – our estimated quality is calculated solely from this data. We must assume that our models can, in some sense, accurately 'reach' unobserved circumstances from our observations. This assumption relates to both our models (a priori assumed) capacity to model the process, and that our data is 'representative'.
These issues have been studied from various viewpoints over a hundred years and more. As a foundation of my research and a starting point for potential novel ideas, I intend to survey treatments of these issues in the research literature and attempt to identify/develop a philosophically sound framework for (primarily numerical) inductive reasoning.
The Bayesian approach is a well established and arguably sound approach to inductive reasoning. It tells us how we should include our prior beliefs into predictions and extrapolations. It notably fails to inform us about how we should formulate those beliefs in the absence of specific knowledge of a process – how we should construct 'objective' priors.
Addressing this problem, Solomonoff developed [Sol64] developed a theoretically optimal formulation of inductive reasoning. This involved the class of all (computable) models and an objective prior – inductive reasoning with this combination (Solomonoff or Universal induction) will outperform any other inductive system given sufficient time/data. Though universal induction is not computable, it can be approximated and can serve as a gold standard for assessing other approaches. Universal induction elegantly embodies Occams razor.
Using universal induction as a gold standard, I will attempt to develop a universal framework for model assessment and place existing model selection and assessment approaches in this context. The hope is that I will thus provide deeper understanding of the power and limitations of these approaches, and possibly identify novel adjustments and improvements. In particular, I hope to investigate MDL (minimum description length) as a universal criterion for model selection and measure of model quality.
My first ideas concerning model assessment originated in a MaxEnt (maximum entropy) and MEP (maximum entropy production) context. I hope to apply my understanding here in order to develop useful new tools for entropy based modelling.
Identify the contexts in which we wish to perform inductive reasoning and identify the fundamental principles and other assumptions that we make in those contexts.
Review and establish philosophically sound criteria for assessing the validity of a (collection of) models of a process, given some data obtained from that process.
Review and establish potential equivalences between different criteria, and in what conditions they may be equivalent.
Identify the theoretical limits to measuring model quality, the practically calculable approximations to ideal quality measures, and the assumptions required for those approximations to be valid.
Establish a “universal”1 and theoretically sound framework for measuring model validity.
Determine to what extent the MDL (Minimum Description Length) principle achieves this.
Apply this framework to existing model selection techniques such as MDL and cross validation.
Based on this framework, develop new techniques for assessing model validity in an ME (Maximum Entropy) or MEP (Maximum Entropy Production) framework and/or develop improvements of existing techniques.
Milestones that I intend to achieve this year include:
Write and publish a survey paper (see “Current Research Tasks” below).
Establish a firm grounding in information theory and model selection techniques.
Write a detailed literature review and concrete research plan (a PhD requirement).
Tasks (in approximate chronological order):
Background reading (see http://drevicko.wikidot.com/reading-list for details - Please contact me for access to this site: ian.wood@anu.edu.au):
Information theory: A deep understanding of information theoretic concepts and techniques is necessary for this research.
Model selection: A thorough review of existing literature in this area. Knowledge gained in this review will be instrumental in determining further research directions.
AI and Reinforcement Learning: This is the context in which much work on model selection and inference has been developed.
Maximum Entropy and related areas: Necessary for application of fundamental theoretical results in ME and related contexts.
Some concrete problems:
Several small, concrete problems related to universal induction have been identified. Solving or attempting to solve these will help establish my technical problem solving abilities (see below).
We expect such exercises to continue in line with the current background reading topics.
Attempts at larger, apparently more difficult problems will be made. Care will be taken not to allow these attempts to dominate my time. Below are two such problems identified at this time. Please see the attached pdf for more details about these problems.
Can we recover a sequence x from it's probability under the Solomonoff (universal) prior?
Find calculable approximations to the universal prior for Bernoulli sequences.
Potential (publishable) survey-like writing tasks:
Describe one philosophical problem on induction from a Solomonoff perspective.
Survey on model selection
A guide how to apply AIT in Machine Learning
As yet we have not identified courses that would be relevant to the research directions above.
Outlined below are my current and intended interactions with the wider research community.
I have been (and will continue) to discuss my ideas with:
Marcus Hutter (primary supervisor)
Roderick Dewar (advisor – ME/MEP for earth system modelling)
Peter Sunehag (advisor – mathematical expertise)
Jay Larson (CS dept adjunct, Argonne university – climate modelling)
Robert Niven (UNSW@ADFA – theory and applications of ME/MEP)
Attendance at NICTA machine learning group research presentations. I may also attend other relevant seminars in Mathematics and/or Computer Science.
ALT/DS 2009 – Algorithmic learning Theory / Discovery Science – Porto, Portugal, 3-5 October 2009
MaxEnt 2010 – Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering – Grenoble, France – July 2010 (tba)
ICML 2010 – International Conference on Machine Learning – Haifa, Israel – June/July 2010
UAI 2010 – Conference on Uncertainty in AI – Haifa, Israel – June/July 2010
COLT 2010 – Conference on Learning Theory – Haifa, Israel – June/July 2010
UICS 2009 – Understanding Intelligent and Complex Systems – Targu Mures, Romania – 22-23 October 2009 – Deadline 31.August.2009
(NIPS – Neural Information Processing Systems – Vancouver, Canada – December 7-12 2009)
Teaching:
Tutoring for “Statisitcal Machine Learning” in first semester. No plans to tutor in second semester 2009.
Funding:
My personal expenses for the next 3 years (given adequate research progress) will be covered by the scholarships I have been awarded (University House Postgraduate Research Scholarship and ANU Supplementary Scholarship).
Expected research expenses:
Conference Fees and Travel
Possible overseas study and/or collaboration.
Due to the largely theoretical direction of my research, I do not expect expenses for materials or other resources. It is possible that in later years, funding may be required for access to computer and/or data resources beyond that which the Computer Science Department can offer. I will pursue this at a later date as required.
Current secure and potential sources of funding for conferences and other travel travel include:
|
DCS postgraduate research student Professional Development Fund |
$1600 per year |
Primarily for conference travel, though possibly for study/research outside ANU. |
|
Vice Chancellors HDR student conference travel grant |
up to $1,500 per conference |
Primarily for conference travel, usually when presenting at an A or A+ conference. |
|
Vice Chancellors HDR student travel grant |
up to $5,000 |
For overseas study/research. Eligible from March 12 2010 |
|
Conference volunteer |
Reduced or waived conference fees |
It is often possible to volunteer to help the smooth running of conferences. |
1Here I mean “universal” in sense similar to the “universality” of the Solomonoff prior or Marcus Hutters Universal AI.
[Has09]T. Hastie, R. Tibshirani, & J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction; Ch 7 - Model Assessment and Selection, Springer New York, 2009, 219-259
[Sol64]R. J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1-22 and 224-254, 1964.