Lex Weaver and Nigel Tao. The Variance Minimizing Constant Reward Baseline for Gradient-Based Reinforcement Learning. Technical report, Department of Computer Science, Australian National University, May 2001 (being updated). [PDF][Postscript]
Lex Weaver and Nigel Tao. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning. In Uncertainty in Artificial Intelligence: Proceedings of the Seventeenth Conference (2001), University of Washington, Seattle WASHINGTON, August 2-5 2001, Morgan Kaufmann Publishers, San Francisco CALIFORNIA, ISBN 1-55860-800-1, pages 538-545. [PDF][Postscript]
Lex Weaver and Jonathan Baxter. STD(lambda): learning state differences with TD(lambda). In Proceedings of the Post-graduate ADFA Conference on Computer Science 2001 (PACCS'01), ADFA Monographs in Computer Science Series (1), ISBN 0-7317-0507-6, Canberra ACT, July 14 2001, pages 63-70. [PDF] [Postscript]
Nigel Tao, Jonathan Baxter, and Lex Weaver. A Multi-Agent, Policy-Gradient approach to Network Routing. In ICML 2001: 18th International Conference on Machine Learning, ISBN 1558607781, Morgan Kaufmann Publishers, Williamstown MA, July 2001. [PDF] [Postscript]
Jonathan Baxter, Peter Bartlett, and Lex Weaver. Experiments with Infinite-Horizon, Policy-Gradient Estimation. In JAIR (Journal of Artificial Intelligence Research), November 2001, Vol. 15, pages 351-381, ISSN 1076-9757. [PDF][Postscript]
Jonathan Baxter, Andrew Tridgell, and Lex Weaver. Learning to Play Chess Using Temporal-Differences. In MACHINE LEARNING, ISSN 0885-6125, Vol. 40 No. 3, September 2000, pages 243-263. [PDF][Postscript]
Jonathan Baxter, Lex Weaver, and Peter Bartlett. Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments. Technical report, CSL, Australian National University, 1999. [PDF][Postscript]
Lex Weaver and Jonathan Baxter. Reinforcement Learning From State and Temporal Differences. Technical report, Department of Computer Science, Australian National University, May 1999 (updated September 1999). [PDF][Postscript]
Jonathan Baxter, Andrew Tridgell, and Lex Weaver. KnightCap: A chess program that learns by combining TD(lambda) with game-tree search. In MACHINE LEARNING Proceedings of the Fifteenth International Conference (ICML '98), ISBN 1-55860-556-8, ISSN 1049-1910, Madison WISCONSIN, July 24-27 1998, pages 28-36. [CoRR: cs.LG/9901002]
J. Baxter, A. Tridgell, and L. Weaver. Experiments in Parameter Learning using Temporal Differences. In the ICCA JOURNAL (Journal of the International Computer Chess Association), ISSN 0920-234X, Vol. 21 No. 2, June 1998, pages 84-99. (The published version has a different format to the postscript and PDF versions provided here, but the content is the same.) [PDF][Postscript]
Jonathan Baxter, Andrew Tridgell, and Lex Weaver. TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search. Australian Journal of Intelligent Information Processing Systems, ISSN 1321-2133, Vol. 5 No. 1, Autumn 1998, pages 39-43 (invited paper). Also in the Proceedings of the Ninth Australian Conference on Neural Networks (ACNN'98), Brisbane QLD, February 1998, pages 168-172. [PDF][Postscript][CoRR: cs.LG/9901001]
J. Baxter, A. Tridgell, and L. Weaver. KnightCap: A chess program that learns by combining TD(lambda) with minimax search. Technical Report, Department of Systems Engineering, Australian National University, November 1997, 16 pages. [PDF][Postscript]
L. Weaver and T. Bossomaier. Evolution of Neural Networks to Play the Game of Dots-and-Boxes. In Artificial Life V: Poster Presentations, May 16-18 1996, pages 43-50. [PDF][Postscript][CoRR: cs.NE/9809111]
Lex Weaver. Design and Evaluation of Mechanisms for a Multicomputer Object Store. Unpublished Honours thesis. November 1994, 134 pages. [PDF][Postscript][DVI][CoRR: cs.DC/0004010]
Lex Weaver and Chris Johnson. Pre-fetching tree-structured data in distributed memory. In Proceedings of the Third Fujitsu Parallel Computing Workshop, pages P1-L-1 to P1-L11, Kawasaki, Japan, November 1994. Fujitsu Laboratories Ltd. [PDF][Postscript][CoRR: cs.DC/9810002]
Lex Weaver and Andrew Lynes. Sorting Integers on the AP1000. Unpublished project report. May 1994, 23 pages. [PDF][Postscript][CoRR: cs.DC/0004013]
2001 July 14, STD(lambda): learning state differences with TD(lambda), presentation at the Post-graduate ADFA Conference on Computer Science 2001 (PACCS'01), Canberra ACT. [Microsoft Powerpoint][PDF][Postscript]
1998 August 5, Looking into TD(lambda) with Function Approximation, seminar given to the Adaptive Network Laboratory, Department of Computer Science, University of Massachusetts, Amherst MA, USA. [Microsoft Powerpoint (doesn't include graphs)]
1998 February 13, TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search, presentation at the Ninth Australian Conference on Neural Networks (ACNN '98), Brisbane QLD. [Microsoft Powerpoint]
1996 March 22, Evolving Neural Networks with Strategic Intelligence, invited seminar given to the School of Information Technology at Charles Sturt University (CSU) - Bathurst Campus. [Postscript]