Tom Leonard - The Life of a Bayesian Boy    

 
CHAPTER 4: THE UNIVERSITY OF WARWICK
 
Sylvia Richardson
 
 
Dr. A. O'Hagan   Dr. Tom Leonard
 

Dear Sir,

I find Dr. Winnifrith's comments (see last issue of "News and Opinion") about the staff/student ratio of the Department of Statistics to be somewhat misleading.  The quotation "there are lies, damned lies, and statistics" arises because of the large number of non-statisticians dabbling with figures when they don't know how to interpret them properly!

      Until 1975 we were one of the "small new developments" which Dr. Winnifrith is happy to excuse in his letter.  After the first intake of MORSE last year we have acquired an official staff/student ratio of 13.2:1 for 5 staff members and "roll-on" is likely to cause this to increase to about 20:1 by 1977/8 unless we obtain new staff appointments.

      We are all naturally unsympathetic towards redundancies in any department. However, universities can only retain their prestige by supporting important new developments rather than pursuing a "lame duck" policy.  New courses can only be developed properly if appointments in fringe or unsuccessful departments are frozen, or if their staff are offered administrative duties to help their overworked colleagues in successful and relevant departments.
 

Tom Leonard
Statistics

 

My very exciting times as a postgraduate student came to an abrupt end, largely because my wife was expecting our first baby. After being wined and dined by Jeff Harrison while he was still at I.C.I. in Cheshire, I was offered a lectureship at the University of Warwick, to start in September 1972. Before taking up the appointment I nevertheless visited Iowa City again in the forlorn hope of persuading A.C.T. to put my methodology into practice.

         However, in June 1972 I learnt from my mother-in-law that my daughter had just been born in Wolverhampton two months prematurely and was surviving in an incubator. I promptly flew back to England, and soon afterwards Professor Harrison invited me to take up my appointment at Warwick straightaway (with a generous starting salary of £1935 per annum, two points above the bottom of the scale). University College London suddenly became little more than a lingering memory, and I was out in the sticks.

[I returned to UCL in December 1973 for my oral, after my external examiner Patricia Altham had scrutinized my thesis. She asked lots of perceptive questions, and Dennis was keen to get her back on the train to Cambridge. My parents were absolutely delighted when I returned to Plymouth at Christmas as the family’s first doctor of philosophy]

Jeff Harrison and Mervyn Stone are doubtless the two most perceptively intelligent people I have ever met. Perhaps Jeff will forgive me for reporting that during my first week at the University of Warwick (it’s built over former farmland on the southern edge of the City of Coventry), he fell into an eight-foot pit while walking across the mudflats of a campus where the white tiles were still falling onto the students and the Napoleonic Chancellor Jack Butterworth treated the junior academic staff like guinea pigs.

         Jeff redeemed himself during his second Statistics lecture at the university, by tossing a coin that landed on its edge on a shiny floor, and he celebrates this extremely unlikely event in his departmental history.  It was described as parapsychological by Alan Vaughan in his book Amazing Coincidences when the author was discussing my 1974 letter to The Times on the same topic. Soon after the coin tossing, Jeff was apprehended by Leamington Police on suspicion of smuggling forged bank notes from Belgium, but it was fortunately a question of mistaken identity and he was quickly released. 

Robin Reed, a fellow undergraduate at Imperial College, and a talented probabilist, though never one to take the credit, was instrumental in helping Jeff and me to found the Statistics Department at Warwick. Robin is still there and, after Jeff’s retirement in 2000, he became the longest serving member in the department. I was saddened to see that Jeff’s official history of the department gives Robin and myself an honourable mention (just after the coin tossing) but scarcely acknowledges anybody else.

      Jim Smith’s postscript to the official history is more generous but still fails to give more than passing credit to Keith Ord, Tony O’Hagan and Mike West each of whom moved on to high-flying careers at other universities (Penn State, Nottingham, and Duke), or to Sylvia Richardson. I always found the over-riding atmosphere in the department to be rather megalomaniacal, almost like a group psychosis.

         Nevertheless, I revelled in Jim’s company when he was a Ph.D. student, I enjoyed playing squash with Tony, and I found Keith and his wife, the American statistician Janice Derr, to be extremely hospitable, and enjoyed playing scrabble with them. Jim wrote an outstanding thesis on Bayesian Catastrophe Theory and the Kalman Filter, nurtured by Jeff Harrison and under the eagle-eyed gaze of the all-consuming pure mathematician Christopher Zeeman (since elevated to greatness), who put Jim and Jeff in touch with the cusps and manifolds developed by René Thom.

One of the problems facing Robin, Jeff and me was that the pure mathematicians regarded themselves as el supremos and all the other, supposedly inferior, branches of mathematics as student options. I therefore suggested formulating an alternative undergraduate degree in mathematics, and we developed a new degree called MORSE (Mathematics, Operational Research, Statistics and Economics) for undergraduates within our department. Robin Reed completed most of the spadework.

         We originally wanted to include a colon instead of a comma after Mathematics, but the pure mathematicians fought this tooth and nail in the Senate in the so-called ‘Battle of the Colon’. I deserve credit for describing MORSE as an integrated single honours degree, and for devising the L-shaped corridor competition for our first pamphlet. I recall interviewing countless numbers of school kids and giving them guided tours around our still-muddy campus. We quickly achieved an intake (our very own) of over thirty students a year, and MORSE and MMORSE have grown from strength to strength ever since.

An insightful student from Colombia called Isaac Dyner completed a Masters by research with me on a Bayesian topic. He is now a Professor of Operations Research at the University of Colombia.

         I published several further papers out of my 1971 Masters and 1973 Ph.D. theses, including:
A. Bayesian Methods for Two-Way Contingency Tables (JRSSB, namely the Journal of the Royal Statistical Society, Series B, 1975), which was published around the same time as Nan Laird’s Harvard Ph.D. thesis addressing an Empirical Bayes approach to the same topic, using her freshly developed version of the EM algorithm. While less general, Nan’s approach included superior, asymptotically consistent estimators for the hyperparameters.

         My paper included an analysis of Karl Pearson’s 14x14 social mobility table concerning the association between the occupations of fathers and their sons. An application of a four-fold exchangeability model led to a reduction to a quasi-independence model, and a convincing fit. Both Dennis Lindley and Henry Daniels were impressed by my practical example, and Irwin Guttman, who attended my 1975 University of London seminar while visiting from Toronto, thinks that it was the best thing that I’ve ever done.

B. A Bayesian Approach to the Linear Model for Unequal Variances (Technometrics, 1975). The key idea here was to use a multivariate normal first stage distribution in the prior assessment for the log-variances. After I taught this methodology to several animal science postgraduates attending my Statistics 775 graduate Bayesian course at Wisconsin during the 1990s, it was applied by Tempelman, Foulley, Gianola and others to animal breeding (Rob Tempelman wrote his thesis on the topic before moving to Michigan State) and it is now an integral component of their literature. The time series special cases for the unreplicated model have been generalised and extended by several authors in econometrics, and my autoregressive process for the log-variances can be used to explain stochastically volatile data.
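The flavour of this log-variance modelling can be conveyed by a minimal sketch. Everything here is hypothetical and simplified: simulated data stand in for a real application, and an exchangeable normal first-stage prior (fitted by empirical Bayes) stands in for the full multivariate normal specification of the paper.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical illustration: m groups of n replicated normal observations
# with heterogeneous variances.
m, n = 12, 8
true_log_var = [random.gauss(0.0, 1.0) for _ in range(m)]
data = [[random.gauss(0.0, math.exp(lv / 2)) for _ in range(n)]
        for lv in true_log_var]

# log s^2 is roughly normal about log sigma^2, with sampling variance
# approximately 2/(n-1) (a standard large-sample approximation).
log_s2 = [math.log(statistics.variance(x)) for x in data]
samp_var = 2.0 / (n - 1)

# Exchangeable normal first-stage prior on the log-variances, with
# hyperparameters estimated from the data (empirical Bayes).
prior_mean = statistics.mean(log_s2)
prior_var = max(statistics.variance(log_s2) - samp_var, 1e-6)

# The posterior means shrink each log sample variance towards the grand mean.
weight = prior_var / (prior_var + samp_var)
shrunk = [prior_mean + weight * (v - prior_mean) for v in log_s2]

assert 0.0 < weight < 1.0
assert statistics.stdev(shrunk) < statistics.stdev(log_s2)
```

Working on the log scale keeps every variance estimate positive, and the shrinkage pulls the noisy group estimates together without collapsing them entirely.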

[I once advised Nick Polson, while we were drinking together, that I discovered stochastic volatility, and he laughed his head off! Nick was also a wonderful gossip.]

 
Nick Polson
 

 C. Some Alternative Approaches to Multi-Parameter Estimation (Biometrika, 1976). This paper quietly corrected my earlier work by incorporating better estimates for my hyperparameters leading to superior estimates for the first stage parameters, thus avoiding the ‘Lindley-Smith collapsing phenomenon’ (see below).

[Perhaps I should call this the Lindley-Smith-Leonard collapsing phenomenon since it had been a feature of four of my previous papers. However, I was simply following orders, and Lindley and Smith have never corrected their estimation routines or retracted the key claims in their 1972 paper. There’s still time to do so, and I think they should. Early in 2012, Dennis published a very lucid letter in RSS News&Notes at age 85 in which he compared Bayesianism with Darwinism]

Similar improvements are suggested in my paper A Bayesian Approach to the Bradley-Terry Model for Paired Comparisons (Biometrics, 1977), which was not part of my thesis work. However, the Biometrics paper was accepted by the editor Foster Cady primarily because he wished to publish Steve Fienberg’s numerical data relating to my ‘dominant and passive squirrel monkeys’ practical example. The approach can also be used to evaluate chess rankings, and the U.S. Chess Federation once showed some interest.
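For readers unfamiliar with the Bradley-Terry model, a minimal sketch may help. The win counts below are entirely hypothetical (not the squirrel-monkey data), and Zermelo's classical maximum-likelihood iteration stands in for the hierarchical Bayesian treatment of the paper.

```python
# Hypothetical paired-comparison data: wins[i][j] = times player i beat j.
wins = [
    [0, 7, 6, 5],
    [3, 0, 6, 4],
    [4, 4, 0, 6],
    [5, 6, 4, 0],
]
m = len(wins)
p = [1.0] * m  # Bradley-Terry strength parameters, identified up to scale

# Zermelo's iteration: p_i <- W_i / sum_j n_ij / (p_i + p_j),
# where W_i is i's total wins and n_ij the games between i and j.
for _ in range(200):
    new_p = []
    for i in range(m):
        w_i = sum(wins[i])
        denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                    for j in range(m) if j != i)
        new_p.append(w_i / denom)
    s = sum(new_p)
    p = [v / s for v in new_p]  # normalise for identifiability

# Rank players by estimated strength; P(i beats j) = p_i / (p_i + p_j).
ranking = sorted(range(m), key=lambda i: -p[i])
assert ranking[0] == 0  # player 0, with the most wins, comes out on top
```

The same machinery underlies chess-style rating systems, which is presumably why the U.S. Chess Federation showed interest.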

         I moreover proposed convincing hierarchical Bayesian estimators, and also preliminary test estimators, in one-way ANOVA and multinomial contexts in my JRSSB (1976, with Keith Ord) and JASA (1977) articles, and obtained novel critical values for the F and chi-squared statistics.

         Mervyn Stone gave me an unexpectedly bruising time as Editor of our JRSSB article while reducing it from a paper to a note. He subsequently contrasted our alternative to the F-test with his cross-validation procedures and with AIC, in his short paper in JRSSB (1977).

         Hirotugu Akaike later advised me that my critical values were just modifications of his magical number 2. Mervyn didn’t like my uniform priors for the variance components, but they seemed pretty convincing to me.

[Dennis Lindley had recently publicly recanted his previous advocacy of improper priors because he feared the Stone, Dawid and Zidek (JRSSB 1973) marginalization paradoxes. However, these paradoxes only occur in pathological situations, so a general denunciation of improper priors wasn’t really appropriate. Bayesian methods based on informative priors do frequently smooth away what the data are trying to tell you.]

My method for the simultaneous estimation of the parameters of several multinomial distributions, which employed a Dirichlet-Dirichlet distribution in the prior assessment, only made it to Communications in Statistics (1977), since I couldn’t justify the posterior approximations well enough to convince JRSSB.

         In my notes in Biometrika (1974) and Biometrika (1976, with Tony O’Hagan) I describe two different non-conjugate Bayesian estimation procedures for the location parameter of a normal distribution, which are readily generalisable to a broad range of models. When I first met up with Glen Meeden of the University of Iowa at Ames in 1978, he promptly identified me as ‘the guy who’d suggested that neat modification to the Bayes estimate for the mean of a normal distribution’.

In 1971, I’d made my first ever contribution to the discussion of a paper read to the Royal Statistical Society (see Lindley and Smith, Bayes Estimates for the Linear Model, JRSSB, 1972) that addressed M-group regression, and also ridge estimators for a single multiple regression. The authors made the phenomenal claim that their shrinkage estimators for M-group regression led to 75% improved efficiency when compared with least squares. Indeed, the arch-frequentist Robin Plackett (erroneously) conceded when proposing the vote of thanks that the Bayesian estimates doubled the amount of information in the data. This was to lead to much wider acceptance of the Bayesian approach, which until then had been regarded as too subjective.

I indicated in my contribution to the discussion that this methodology could be extended to the analysis of binomial, Poisson and multinomial data, using logistic, logarithmic and multivariate logit transformations. I received a pat on my back from my supervisor for my efforts.

However, my career suffered an unfortunately debilitating setback in 1973 when I published a dippy written contribution to Bradley Efron’s and Carl Morris’s paper Combining Several Possibly Related Estimation Problems, which they’d read to the Royal Statistical Society. Indeed, I inadvertently exposed a flaw in the Lindley-Smith 1972 approach (e.g. Dennis’s and Adrian’s M-group regression estimates with unknown variance components collapsed towards each other much too readily).

         Efron and Morris quite rightly jumped on all three of us during their published reply to the discussion and it became evident that the Lindley-Smith claim of 75% improved predictive efficiency was pie in the sky.

 
 
Bradley Efron   Carl Morris
 

It was not until 1984 that I learnt from Mel Novick in Iowa City that this brief written contribution, rather than personality issues (as suggested to me in 1981 by Bernie Silverman), was the real reason that my Ph.D. supervisor turned against me and became my ‘ripple from above’. (Dennis presumably thought that I was trying to expose him and Adrian Smith on purpose. I suppose that I was, in subconscious terms, trying to get at the scientific truth. In those days I always thought backwards). In any case, Dennis’s negative attitude was to severely damage my career right into the 1990s, and even apparently influenced, via an indirect route, the tenure prospects of one of my former Ph.D. students. [Maybe it influenced my career until my early retirement in 2001]

         It was not until 1987, when I chatted with Adrian at an ASA conference in San Francisco, that I was finally able to untangle all the scientific problems surrounding his 1972 approach (see also his 1973 papers in Biometrika and JRSSB). Adrian told me, after we’d downed a few drinks, that when he’d computed his Bayesian estimates as an over-enthusiastic student, he’d usually stopped after the first step of the iterations (these yielded quasi-Empirical Bayes estimates rather than the joint posterior modes), rather than converging to the theoretical solution, since that gave the practitioners what they really wanted.

[One of Adrian’s former Ph.D. students once reported to me, during another drinking bout, that Adrian would blink quite persistently while his numerical iterations or MCMC simulations were wondering whether to converge. While this was doubtlessly just a flight of fancy, it did cause me to wonder a bit about Adrian.

             Gelfand and Smith projected a version of MCMC (Markov Chain Monte Carlo) into the Bayesian literature in 1990 and this led to an enormous cottage industry. MCMC can be excellent for computing the marginal posterior densities of parameters in parsimonious models (but only by taking expectations of appropriate unnormalised conditional posterior densities) but not for calculating the posterior expectations of unbounded functions of the parameters. It has tempted many scientists to make their models far too complex, in which case the convergence can be abysmal. If your model overfits the data, then MCMC is unlikely to converge at all well]
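The 'expectations of appropriate conditional posterior densities' device mentioned above can be illustrated with a tiny Gibbs sampler. The data are hypothetical, and the model deliberately parsimonious: a normal mean and variance under the standard reference prior.

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical data from a normal model with unknown mean and variance.
data = [random.gauss(5.0, 2.0) for _ in range(50)]
n, xbar = len(data), statistics.mean(data)

# Gibbs sampler under the usual reference prior p(mu, sigma2) ~ 1/sigma2:
#   mu | sigma2, y ~ N(xbar, sigma2/n)
#   sigma2 | mu, y ~ Inv-Gamma(n/2, sum (x - mu)^2 / 2)
mu, sigma2 = xbar, statistics.variance(data)
mus, sigma2s = [], []
for t in range(3000):
    mu = random.gauss(xbar, math.sqrt(sigma2 / n))
    shape = n / 2.0
    rate = sum((x - mu) ** 2 for x in data) / 2.0
    sigma2 = rate / random.gammavariate(shape, 1.0)  # inverse-gamma draw
    if t >= 500:                                     # discard burn-in
        mus.append(mu)
        sigma2s.append(sigma2)

# Rao-Blackwellised marginal posterior density of mu: average the full
# conditional normal densities over the simulated sigma2 values.
def marginal_mu_density(m):
    return statistics.mean(
        math.exp(-n * (m - xbar) ** 2 / (2 * s2))
        / math.sqrt(2 * math.pi * s2 / n)
        for s2 in sigma2s)

assert abs(statistics.mean(mus) - xbar) < 0.5
assert marginal_mu_density(xbar) > marginal_mu_density(xbar + 2.0)
```

Averaging conditional densities in this way gives a much smoother marginal density estimate than a histogram of the raw draws, which is the point being made about how MCMC output should (and should not) be used.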

To add grist to the mill, the empirical validation of the Lindley-Smith estimates was flawed. It was performed at A.C.T. in Iowa City by three other authors, using Dennis’s earlier version of the estimates. This required a degrees of freedom prior parameter to be set to an arbitrarily small value.

         According to Paul Jackson (personal communication), who found the situation to be extremely amusing, he [and maybe his co-authors, Novick and Thayer] instead carefully fixed the degrees of freedom to a value that was large enough to preclude the devastating effect of the Lindley-Smith joint posterior mode collapsing phenomenon and to yield excellent apparent predictive efficiency. Therefore the apparent empirical validation of the Lindley-Smith estimates, if they had been properly calculated, was based upon something entirely spurious.

         These aspects were pursued by Irwin Guttman and his findings were published in JASA (Journal of the American Statistical Association), 1996, jointly with Sun, Hsu and myself; we showed, by using a limiting argument in a special case, that the Lindley-Smith estimates possess vastly inferior mean squared error properties [whatever the values of their hyperparameters] when compared with ordinary least squares. Very similar problems hold for classical ridge regression, i.e. the much-hyped Hoerl-Kennard ridge estimator is vastly inferior to least squares.

In 1976, I made a contribution to the discussion of the Harrison-Stevens Royal Statistical Society read paper on Bayesian Forecasting where I described how their approach could be extended to non-linear situations using appropriate parametric transformations. Jeff Harrison paid scant attention to these ideas at the time, but Mike West took up the cudgel with some very sound technical work and published them jointly with Jeff, e.g. in their 1997 book Bayesian Forecasting and Dynamic Models.

         Other Bayesian approaches to the forecasting of non-normal data are described in Ch.5 of my 1999 book Bayesian Methods, co-authored with John Hsu, and I remember developing a multivariate forecasting package for proportions of world sales of fibres that made I.C.I. very happy, for a much-needed £700. It was later neatened up by Trevor Gazard and reported, by Jeff, to the 1977 Royal Statistical Society conference in Manchester, and published in the conference’s proceedings. The forecasts varied according to the specification of a discount parameter, but Jeff took care of that.

Later in 1977, I read a paper to the Royal Statistical Society in London entitled ‘Density Estimation, Stochastic Processes and Prior Information’. The vote of thanks was proposed by Peter Whittle and seconded by Bernie Silverman, and I was taken to dinner afterwards with the top brass of the society by its kindly president Professor John Kingman. The paper was well-received by a number of international scholars, and I felt that I had finally arrived.

 
Bernard Silverman F.R.S.
 

         In my paper I tackled the long-unsolved problem of the non-linear prior-informative smoothing of a univariate density, by using a logistic density transform and an Ornstein-Uhlenbeck Gaussian prior process for its derivative, and this was effectively equivalent to a non-linear smoothing method for a non-homogeneous Poisson process. My assumptions led to a non-linear fourth-order differential equation for the posterior estimates, which I converted into a Fredholm integral equation. The solution was doubtless a spline, and that seemed to please Grace Wahba.
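The logistic density transform itself is easy to demonstrate: however the log-density f is smoothed, g = exp(f)/∫exp(f) is automatically positive and integrates to one. The sketch below is a crude caricature, with hypothetical data and a moving average standing in for the Ornstein-Uhlenbeck prior smoothing.

```python
import math
import random

random.seed(3)

# Hypothetical bimodal sample.
sample = ([random.gauss(-2.0, 0.7) for _ in range(300)] +
          [random.gauss(2.0, 0.7) for _ in range(300)])

# Bin the data on a grid.
lo, hi, k = -5.0, 5.0, 50
h = (hi - lo) / k
counts = [0] * k
for x in sample:
    i = int((x - lo) / h)
    if 0 <= i < k:
        counts[i] += 1

# Smooth on the log scale; a moving average here stands in for the
# Gaussian prior process on f = log-density.
f = [math.log(c + 0.5) for c in counts]
w = 2
f_smooth = [sum(f[max(0, i - w): i + w + 1]) /
            len(f[max(0, i - w): i + w + 1]) for i in range(k)]

# Logistic density transform: g = exp(f) / integral of exp(f) is
# positive and integrates to one by construction.
g = [math.exp(v) for v in f_smooth]
z = sum(g) * h
density = [v / z for v in g]

assert all(d > 0 for d in density)
assert abs(sum(density) * h - 1.0) < 1e-9
```

The attraction of working on the log scale is exactly this: no amount of linear smoothing of f can ever produce a negative density, so the non-linearity is confined to the final normalisation.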

         I described several numerical examples, including an analysis of the flashing-green-man Pelican crossing data, and an analysis of Burch and Parson’s chondrite meteor data, that Jack Good later debated with me in JASA (1982).

         Because of complex measure theoretic problems, I’d used a prior likelihood, rather than a strictly Bayesian approach, and the more religious Bayesians reacted to this technical detail with comical negativity. None of the Bayesian establishment turned up, and I’m still disappointed that they didn’t contribute to the discussion.

[Fully Bayesian versions of this approach were later published in a series of papers (e.g. Biometrika 1991) by Peter J. Lenk of the University of Michigan School of Business, and by Daniel Thorburn of the University of Stockholm, with a variety of applications. Peter earned his tenure on the basis of these ideas. His 1984 Ph.D. thesis 'Bayesian non-parametric predictive distributions' (supervised by Bruce Hill) employed similar assumptions and won the 1985 Savage Prize.]

 
Peter Lenk
 

By this time, the departmental secretary at Warwick was being dissuaded from typing my papers and my name was being omitted from most of the department’s advertising. I was of course very perplexed as to why this was happening, since I had already made immense teaching, administrative and research contributions to the department, and was regarded, e.g. by Jeff Harrison, as a relatively nice person.

Perhaps, in hindsight, it was a case of misdirected homophobia or maybe it was just a ripple from above. 

         I was in any case quite relieved when, following a visit to Warwick by Tom Stroud, his colleague Louis Broekhoven, the Director of the Statlab at Queen’s University, Kingston, Ontario, invited me to work with him and Dr. Jim Low of Kingston General Hospital for the first semester of my sabbatical year (1978-79).

[Louis, who was Irish-Dutch-Canadian, had once worked in a group of postgraduates at UCL grinding out asymptotic expansions for Florence David, and he was a fine applied statistician too. He was promoted to full professor on the strength of his work with me.]

 
 
Isaac Dyner
 
 
 
 
  © Thomas Hoskyns Leonard, 2012 - 2013