Tom Leonard - The Life of a Bayesian Boy    

 
CHAPTER 3: UNIVERSITY COLLEGE LONDON
 

Sir Adrian Smith, FRS; Dennis V. Lindley; Philip Dawid; Tony O'Hagan; Mervyn Stone
 

 

After I’d confounded myself with a first, Anne Mitchell contacted the University of Glasgow, where I could have studied with David Silvey (what a chance missed!). However, after I’d prevaricated about moving to Scotland, she sent me to University College London, on Gower Street. There I was accepted to study for an intensely difficult advanced-level Masters and a Ph.D. in the first ever university Statistics department, created by Karl Pearson in 1911.

         My postgraduate supervisor, Dennis V. Lindley, was regarded as a world leader in Bayesian Statistics. He had been converted to the faith during a sabbatical in Chicago in 1954 by the highly axiomatic and extremely eccentric Jimmie Savage, whose religiously normative approach to expected utility had already been refuted by Maurice Allais.

         Dennis held the premier statistics chair in Britain; when he moved from Aberystwyth in 1967, an observer said that it was as though a Jehovah’s Witness had been elected Pope. He’d succeeded Karl Pearson’s son Egon, after some detailed negotiations with the UCL chancellor, and the indomitable Florence David felt persuaded to make her move to Riverside permanent.

[While poor Florence didn’t even take to the normal distribution, she’d taught traditional applied statistics to a whole generation of UCL students, including Tony O’Hagan, and was well respected. She seems to have been seriously maligned by the Berkeley-school Californians, even though she was highly regarded at Riverside, and there may have been a touch of homophobia in this. She later wrote to George Box, after one of his students had applied to her for an assistant professorship, saying, ‘I had to explain to your student that we don’t have Bayesians at Riverside.’]

         Dennis, a former Cambridge don, took a dim view of over-officious university administrators, and in this highly laudable respect he followed Sir Ronald Fisher, another predecessor at University College. However, Fisher was totally eccentric, and once battered one of the ‘beefeaters’ guarding the quadrangle at UCL for being impolite to a woman companion who was trying to climb through a window. Dennis came across to me as a kindly, though superficially arrogant, man with a velvet glove but no iron fist. He encouraged me to try to solve my Ph.D. problem while studying for my Masters.

         Within a few weeks I was able to extend Dennis’s method for the estimation of several exchangeable means and variances (which he’d extended to M-group regression in Iowa City during a remunerative consultancy for the American College Testing program with Mel Novick) to simultaneous inference and shrinkage estimation for several binomial probabilities. I did this by employing logistic transformations and non-conjugate hierarchical prior distributions, and these devices led to my very first paper in an international journal (Bayesian Methods for Binomial Data, Biometrika 1972).
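The general idea of shrinking several binomial proportions on the logit scale can be illustrated with a minimal sketch. It is not the Biometrika 1972 analysis itself: it assumes a normal approximation for each empirical logit and replaces the fully Bayesian hierarchical prior with an empirical Bayes (method-of-moments) estimate of the prior mean and variance. The function name `logit_shrinkage` and the 0.5 continuity correction are illustrative choices, not part of the original method.

```python
import math

def logit_shrinkage(successes, trials):
    """Shrink several binomial proportions towards a common value on the logit scale.

    Illustrative sketch only: each empirical logit is treated as normal around
    its true logit, and the true logits share a normal prior whose mean and
    variance are estimated from the data (empirical Bayes). Needs >= 2 groups.
    """
    # Empirical logits with a 0.5 continuity correction, and their approximate variances
    y = [math.log((x + 0.5) / (n - x + 0.5)) for x, n in zip(successes, trials)]
    v = [1.0 / (x + 0.5) + 1.0 / (n - x + 0.5) for x, n in zip(successes, trials)]
    k = len(y)
    mu = sum(y) / k
    # Method-of-moments estimate of the between-group variance, floored at a small positive value
    s2 = sum((yi - mu) ** 2 for yi in y) / (k - 1)
    tau2 = max(s2 - sum(v) / k, 1e-6)
    # Posterior mean of each logit: precision-weighted average of its data and the prior mean
    post = [(yi / vi + mu / tau2) / (1.0 / vi + 1.0 / tau2) for yi, vi in zip(y, v)]
    # Transform back to the probability scale
    return [1.0 / (1.0 + math.exp(-t)) for t in post]
```

For instance, `logit_shrinkage([12, 30, 45, 8], [20, 50, 60, 20])` treats four hypothetical colleges’ pass counts. Each estimate is pulled from its raw proportion towards the common mean, and groups carrying less information (larger `v`) are pulled further.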

 
 
Jack Good
 

         My friend Irving Jack Good (with whom I corresponded about Alan Turing and their multinomial shrinkage estimators for cryptanalysis at Bletchley Park, which were instrumental in breaking the Nazi codes) did not believe that my more general Logit/First Stage Multivariate Normal Prior/hierarchical approach to the analysis of categorical data was sufficiently recognised by other Bayesians. Nevertheless, Alan Agresti and several other authors seem to think that it was a pioneering contribution, along with my external examiner Patricia Altham’s novel analysis of measures of association for 2x2 contingency tables. Indeed, many others have followed in my footsteps.

[See items (4) and (5) of CDC section for discussions of Jim Albert’s later contributions, and of my 1978 Imperial College short-course lecture notes]

In my Biometrika 1972 paper, I used my method to calculate shrinkage estimates for the pass rates at several different colleges. Philip J. Smith of the Pacific Halibut Commission applied my approach to estimate the proportions of halibut in several different catches, and implemented various generalisations of my methodology.

[My logit/normal first-stage-prior approach to the analysis of categorical data was reviewed by Leonard and Hsu (1994) in Aspects of Uncertainty: A Tribute to D.V. Lindley (edited by Peter Freeman and Adrian Smith). We also reported an Empirical Bayes analysis, which I’d developed in about 1977, for the simultaneous estimation of the parameters of several multinomial distributions via a multivariate normal prior for the different sets of logits.

        When we applied the methodology to O-level data for 40 London high schools, we discovered that the posterior estimates of the grade rates smoothed the raw proportions in a highly complex fashion.

         Leonard and Novick (Journal of Educational Statistics, 1986) describe a further educational testing study relating to another contingency table, which summarizes their Marine Corps data. Leonard and Hsu (Annals of Statistics, 1992) report an analysis of a portion of the Project Talent American High School Data, where the observations are raw scores.]

I shared my office at University College with Adrian F.M. Smith, a wonderfully inspirational and charismatic Adonis of a man who was to move on to a highly accomplished career [Don at Keble College Oxford, translator of the prestigious works of Bruno De Finetti, many successful Ph.D. students including Michael Goldstein and David Spiegelhalter, Principal of Queen Mary College London, F.R.S., and a couple of top national leadership positions. Knighted in 2010], and with Daruish Haghighi-Talab, a Persian gentleman with an immense black beard, who studied road systems and was to become a Deputy Director of Official Statistics in Iran.

The skylight in our office was often left open. This was to become a bone of contention in 1977, after Peter Freeman had moved in, when an over-diligent university administrator insisted that the skylight should be kept closed. During the kerfuffle that ensued, Dennis Lindley retired at age 54, albeit with a generously increased pension. He toured the world with his wife Joan into his sixties, and a generation of postgraduates missed out on his inspirational guidance.

Six other students studied for the Masters degree at University College at the same time as me, including my Glaswegian friends Ben Torsney and Jim McNicol, and a very nice man from Malaysia. We all took a core measure-theoretic course on Bayesian Inference from Phil Dawid, a brilliant junior lecturer, also out of Imperial College, who was just a year older than me and more recently became a professor at Cambridge.

         I was particularly impressed by Phil’s description of Alan Birnbaum’s Likelihood Principle, its easy justification via the Sufficiency and Conditionality Principles, and the way it sorted the sheep from the goats in statistical methodology. Despite objections by George Barnard and others, I still find the proof of Birnbaum’s 1962 theorem to be extremely convincing and not at all tautologous.

[The Neyman-Fisher factorization theorem is the key to the whole issue. It is this theorem that introduces the concept of likelihood into statistical inference, based purely upon frequency considerations, and Birnbaum applies it to an ingeniously constructed mixed experiment to extend its influence to two simple experiments that investigate the same unknown parameter. Birnbaum has been described as one of the most profound thinkers in Statistics ever, and he was a buddy of Adrian Smith. He was, however, highly introspective, and took his own life in London in 1976. I have always empathised with him, particularly because of the way he was mistreated by other leading psychometricians during the 1960s. He was seriously anti-authoritarian, and there are some parallels between our life stories.]

Phil fully generalised Ericson’s method for Linear Bayes estimation, and our homework was therefore light years ahead of the literature. (He later expressed his irritation at the alternative procedure I used to quickly derive the estimates during the final exam, though that was in a special case.)

         Phil’s parameterization, using degrees of freedom and prior sample sizes, of the conjugate analysis for the linear model with unknown variance was also superbly simple. This parameterization does not appear to have been published until 1986 when J.J. Shiau (one of Grace Wahba’s Ph.D. students) successfully applied it to partial spline models after I’d included it on my Statistics 775 course in Madison.  

         The entire theory of linear splines can of course be regarded as a special case of the very straightforward Gaussian prior Bayesian paradigm, and I’ve never quite understood what all the fuss was about and why we need to explicitly refer to them at all, though it is important not to over-parameterize when sensibly modelling the prior mean value function and covariance kernel.

[An exponential quadratic prior kernel often works better than an autoregressive kernel, since the posterior mean value function can then be infinitely differentiable. See Hsu and Leonard (Biometrika, 1997), where we used a semi-parametric multiple regression and residual analysis to investigate a binary data set that correlated the mortality of mice with time of exposure to NO2 and degree of exposure. This paper confirmed John’s tenure at UCSB. Some of the statistical ideas originated from one of my 1982 MRC technical reports, which described my Bayesian approach to semi-parametric logistic regression. See also Raynor, O’Sullivan and Yandell (JASA, 1985).]
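The point that smoothing arises naturally from a Gaussian prior can be made concrete with a short sketch of the posterior mean value function under a zero-mean Gaussian process prior with an exponential quadratic kernel and Gaussian observation noise. The zero prior mean, the known noise variance and the kernel hyperparameters (`variance`, `length_scale`) are assumptions chosen purely for illustration; in practice they would be modelled or estimated. Because the posterior mean is a weighted sum of exponential quadratic functions, it is infinitely differentiable.

```python
import numpy as np

def sq_exp_kernel(a, b, variance=1.0, length_scale=1.0):
    """Exponential quadratic covariance k(s, t) = variance * exp(-(s - t)^2 / (2 * length_scale^2))."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_obs, y_obs, x_new, noise_var=0.1, **kernel_args):
    """Posterior mean of f(x_new) under a zero-mean Gaussian process prior and Gaussian noise."""
    # Prior covariance of the observations: kernel plus independent noise
    K = sq_exp_kernel(x_obs, x_obs, **kernel_args) + noise_var * np.eye(len(x_obs))
    # Weights of the kernel functions centred at the observed points
    alpha = np.linalg.solve(K, y_obs)
    # Posterior mean: cross-covariance times the weights
    return sq_exp_kernel(x_new, x_obs, **kernel_args) @ alpha
```

With a very small `noise_var` the posterior mean nearly interpolates the data, much as a smoothing spline does for a small smoothing parameter; larger values shrink the fit towards the prior mean of zero.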

I thought that the correspondence between Bayes estimates and smoothing splines was established as early as 1970 by Kimeldorf and Wahba, in a paper in Ann. Math. Statist. that was cited by 468 other authors. One of Grace’s students was much later quite irritating in the way he mimicked my Bayesian density smoothing techniques with non-linear smoothing splines, though he later referenced me quite generously.

         Many published spline techniques employ a cross-validation technique to empirically estimate a smoothing parameter called lambda. However, such techniques usually either mimic or recursively modify Mervyn Stone’s pioneering cross-validation method published in JRSSB (1974, with Discussion) and JRSSB (1977).

All the 1970-71 Masters students at University College were expected to learn advanced probability theory, including convolution semi-groups and domains of attraction, from Feller Volume 2, but Dieter Girmes would come in for weeks on end, wave the book at us, and tell us about his latest statistical consultancy. We therefore had to assimilate Feller largely on our own.

         Dennis Lindley taught me educational testing in the Princeton tradition, Markov decision processes (with stationarity theorems that were later republished out of a Department of Decision Theory in Manchester!) from the book by Sheldon Ross, and Masanao Aoki’s stochastic control theory in all its glory. Mervyn Stone taught an option on Art Dempster-style multivariate analysis, with ellipsoids that looked like spaceships and made me feel like a space cadet. While we didn’t learn any real statistics or Berkeley-style asymptotics, this was a Masters degree to be reckoned with. (Derek Teather and I were awarded distinctions. I only say this to emphasise that I was extremely able at that stage in my life.)

[During Mervyn Stone’s trip to visit the Rev. Thomas Bayes’ grave in Bunhill Cemetery, Moorgate, the caretaker told him that Bayes was responsible for getting rockets to the moon. This was doubtless because of Aoki’s applications of Bayes’ theorem to stochastic control theory.]

 
Thomas Bayes' family grave
 

After I’d solved my Ph.D. research problem, as initially stated, Dennis gave me a much more difficult, and long unsolved, problem, the Bayesian smoothing of histograms. He’d wanted to use an autoregressive prior for several years, but couldn’t decide on which parameterization to use.

         I’m still not quite sure how delighted Dennis really was when I solved the technical problem (including a tricky approximate prior to posterior analysis) that very afternoon, while recovering from several lunchtime beers with my fellow students, by assuming an autoregressive prior process for a set of sequentially chosen multinomial log contrasts. The published version in Biometrika 1973 refers instead to the multivariate logits. Derek Teather has since pointed out that it would have been better to assume a second-order, rather than first-order, process for the logits. In 1986, Marti et al. used similar methodology to develop a procedure for the clinical evaluation of lymphocytosis.
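The histogram smoothing idea can be sketched as a posterior mode calculation: multinomial counts in ordered cells, with a Gaussian prior that penalises differences between adjacent cell logits. This is a simplification rather than the 1973 Biometrika method. A first-order random-walk penalty stands in for a stationary autoregressive prior, the prior variance `tau2` is a fixed, hypothetical value rather than an estimated one, and plain gradient ascent replaces the approximate prior to posterior analysis. Setting `order=2` penalises second differences instead, in the spirit of Derek Teather’s suggestion.

```python
import numpy as np

def smooth_histogram(counts, tau2=0.1, order=1, iters=5000):
    """Posterior mode of histogram cell probabilities under a Gaussian smoothness prior on the logits.

    Simplified sketch: the prior penalises squared differences of adjacent
    logits (order=1) or second differences (order=2); tau2 is fixed.
    """
    n = np.asarray(counts, dtype=float)
    N, k = n.sum(), len(n)
    D = np.diff(np.eye(k), n=order, axis=0)   # difference operator acting on the logits
    P = D.T @ D / tau2                        # prior precision matrix (rank-deficient)
    theta = np.log(n + 0.5)                   # start from continuity-corrected raw logits
    step = 1.0 / (N + 4.0 ** order / tau2)    # inverse curvature bound, so each step increases the log posterior
    for _ in range(iters):
        p = np.exp(theta - theta.max())
        p /= p.sum()                          # current cell probabilities (softmax of the logits)
        grad = n - N * p - P @ theta          # gradient of the log posterior
        theta += step * grad
    p = np.exp(theta - theta.max())
    return p / p.sum()
```

Applied to deliberately jagged hypothetical counts such as `[5, 20, 6, 22, 4, 18, 7, 3]`, the smoothed logits vary far less from cell to cell than the raw log counts do; a smaller `tau2` gives a smoother histogram.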

During the summer of 1971, I visited the American College Testing program (A.C.T.) in Iowa City, stayed for the first time in the Students Union by the river, and wrote several A.C.T. technical reports about my new approach to categorical data analysis.

         While I was there, I met Jim Dickey, who was visiting the University of Iowa from Buffalo. He was deeply philosophical and perceptive, and gave me lots of encouragement. A long friendship would ensue. Jim gave a seminar to the University of Iowa Statistics Department about the scientific reporting of Bayesian posteriors, which he thought should be tracked against the prior, and I was very impressed by Chairman Bob Hogg’s good humour during the discussion.

I was to visit A.C.T. again the next summer. While briefly collaborating with Jim Hickman, who was later to become Dean of Business in Madison, I told him about my new histogram smoothing method, which used an autoregressive prior process for the parameters of interest.  He and Bob Miller later applied similar ideas to Bayesian actuarial graduation in two impressive papers, and I was always very proud of this.

[In June 1972, Jim and his wife took me to see West Branch, Iowa, the birthplace of President Herbert Hoover. And now, in 2012, Jim has sadly passed away in Madison, Wisconsin, aged 79. He was regarded as a great man in the area of the statistics of actuarial science, and his life and career should be celebrated]

In 1984, Bob Miller and his student Bill Fortney were to express similar misgivings about the Lindley-Smith 1972 regression estimates to those that I discuss below. Bob and I sometimes had the same mindset about Bayesians who’re too religious.

During the academic year 1971-2, I was, as a fully fledged Ph.D. student, permitted into our inner sanctum, the University College Statistics academic staff common room, every day for afternoon tea. I found the conversations there to be extremely stimulating, and I still remember Mervyn Stone’s and Phil Dawid’s high-quality humour, along with all of the Bayesian arrogance.

         Egon Pearson, the son of the great Karl Pearson, occasionally appeared among the Bayesians. Neil Please, who organised STATLAB, usually slipped off after a single cup of tea. 

         Egon was an impressively tall and wise-sounding old man with white hair. I also liked Rodney Brooks, who had developed the pioneering Bayesian theory of Experimental Design, and Peter Freeman for his unique perceptions of life in general.

[In 1976 Peter published a discussion paper in JRSSB about Alexander Thom’s megalithic yard. His Bayesian analysis of Thom’s prehistoric stone circle data was very well received.]

Meanwhile, Colin Stevens, an unsung hero who was visiting from I.C.I., was quietly developing a mixed model/ Kalman filter approach to Bayesian forecasting that he later published with the much-more-extrovert Jeff Harrison.

         During that year, I prepared and taught a thirty-hour lecture course at City University, following in the footsteps of Adrian Smith, and I received the princely sum of £140 for my efforts. I also played football in the intramurals and scored a few goals, and played ping pong with Tony O’Hagan. Other friendly Ph.D. students at the time included Abimbola Sylvester Young from Sierra Leone, who was given his middle name by Jane Galbraith because she couldn’t pronounce his Christian name, and who was later a Chief of Statistics in Geneva; and Derek Teather, who was to pursue an outstanding career in medical statistics. Jose Bernardo came later.

         A number of leading international academics visited the department for long periods while I was a student there. These included Jim Zidek, Jim Press and Jim Dickey. Jim Bondar visited informally, and hung his coat in the student common room.

 
 
Jim Zidek
 
 

Tom with his father and stepmother outside his house on Pickford Street, Madison, Wisconsin, in 1986.

 

Tom's inaugural lecture in 1996 as Chair of Statistics at the University of Edinburgh. Vice Chancellor Sir Stewart Sutherland and Dean Geoffrey Boulton are in the foreground.

 

Tom with his co-author John Hsu and John's sons on Blackford Hill, Edinburgh, in 1997. The King's Buildings are in the background.

 

 
 
 
 
  © Thomas Hoskyns Leonard, 2012 - 2013