Tom Leonard - The Life of a Bayesian Boy    

 
CHAPTER 5: SABBATICAL YEAR (1978-79)
 
   
Queen's University, Kingston, Ontario   The University of Michigan at Ann Arbor   The University College of Wales, Aberystwyth
 
 
Gregynog Hall   Hotel Las Fuentes, Alcossebre
 

I was made very welcome by my hosts at Queen’s University, and soon felt at ease in both social and academic terms. Jim Low provided me with a retrospective data set that reported a number of variables for over two thousand mothers and their babies, the first large data set that I’d ever tried to analyse. The problem was to find the variables that best predicted a low measure of the acidosis in blood samples taken from the umbilical cords of the babies during labour. Babies with their measure of acidosis below a certain threshold were said to suffer from ‘fetal metabolic acidosis’ and this could give rise to serious post-natal problems.

         I tried least squares multiple regression for a couple of months, but this invariably led to a remarkably poor fit, and values of R-squared alarmingly close to zero, as the data were so noisy. When everybody had given up on me, I therefore took the radical step for a Bayesian of resorting to data analysis. I tried splitting up the data into subsets according to different levels of gestational age. For each subset, I was subsequently inspired (divinely, or so I believed at the time) to split the data into three groups, corresponding to low, medium and high measures of acidosis. When I examined scatterplots of the birth-weights, I then discovered that, for each level of gestational age, the birth-weight distributions shifted sideways when moving along the three groups.

         Imagine my excitement! I modelled each of the birth-weight distributions using a skewed normal distribution (originally proposed by Edgeworth (JRSS, 1899), and reinvented by Ralph Bradley and then myself) and maximum likelihood/moment estimators for the three parameters. After an empirical application of Bayes theorem, I was then able to plot the probability of low acidosis against birth-weight at each level of gestational age.

         With the exception of the overweight, overdue babies, the babies at greatest risk were those with low birth-weights, but when considered among babies of similar gestational age. No other variables seemed to affect this conclusion and I therefore refuted the medical folklore that was prevalent at the time; the three main predictors until then were totally useless, at least for the two thousand-or-so babies in that data set.

         About 20% of the babies in the entire data set suffered from fetal metabolic acidosis. I therefore also reported ‘cross-over points’, namely the birth-weights below which my probabilities of low acidosis exceeded 20%. Jim Low took a brief look at the cross-over points and made a remarkable observation by reference to already published critical values for ‘intra-uterine growth retarded’ babies. The critical values were almost identical to my cross-over points, at each level of gestational age! We therefore arrived at a simple and scientifically validated conclusion: the babies at highest risk of fetal metabolic acidosis could be predicted by ultrasound, since these were the intra-uterine growth retarded babies. I’d also learnt that probabilistic prediction, rather than point prediction, can work best for noisy data sets. (Perhaps more time series boffins should take heed of this!)
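For readers who would like to see the arithmetic, here is a minimal sketch of the kind of calculation described above: group-specific birth-weight densities are combined by Bayes theorem to give the probability of the low-acidosis group at each weight, and a cross-over point is then located by bisection. Everything specific in it is an assumption made for illustration: the two-piece form of the skewed normal density, the parameter values, the group shares, and the function names are hypothetical and are not taken from the Kingston data or the published analysis.

```python
import math


def two_piece_normal_pdf(x, mode, sd_left, sd_right):
    """Two-piece normal density: a normal curve with a different spread on each side of its mode."""
    norm = math.sqrt(2.0 / math.pi) / (sd_left + sd_right)
    sd = sd_left if x < mode else sd_right
    return norm * math.exp(-0.5 * ((x - mode) / sd) ** 2)


# Hypothetical values for a single gestational-age band (weights in grams).
# "prior" is the assumed share of babies falling in each acidosis group.
GROUPS = {
    "low": {"prior": 0.20, "mode": 2900.0, "sd_left": 450.0, "sd_right": 400.0},
    "medium": {"prior": 0.50, "mode": 3300.0, "sd_left": 420.0, "sd_right": 440.0},
    "high": {"prior": 0.30, "mode": 3450.0, "sd_left": 400.0, "sd_right": 460.0},
}


def prob_low_acidosis(weight, groups=GROUPS):
    """Bayes theorem: P(low group | weight) from the group shares and the fitted densities."""
    joint = {
        name: g["prior"] * two_piece_normal_pdf(weight, g["mode"], g["sd_left"], g["sd_right"])
        for name, g in groups.items()
    }
    return joint["low"] / sum(joint.values())


def crossover_weight(threshold=0.20, lo=1500.0, hi=4000.0, tol=0.5, groups=GROUPS):
    """Bisection for a weight at which P(low | weight) crosses the threshold.

    Assumes the probability is above the threshold at `lo` and below it at `hi`.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_low_acidosis(mid, groups) > threshold:
            lo = mid  # still above the threshold: the crossing lies at a heavier weight
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    w = crossover_weight()
    print(f"cross-over weight: {w:.0f} g, P(low | w) = {prob_low_acidosis(w):.3f}")
```

In practice the density parameters would be estimated separately for each acidosis group within each gestational-age band, and the calculation repeated band by band.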

Our conclusions were reported by Jim Low to a joint meeting of the American Obstetric and Gynaecological Societies and well-received, and they were published by several of us, with discussion, in the societies’ journal in 1983. When I presented them to a meeting of the Medical Section of the Royal Statistical Society in 1979, I started to receive some recognition in Britain as an all-round statistician. (Granville Tunnicliffe-Wilson, personal communication)

 
Dr. James Low
Obstetrics and Gynaecology, Kingston General Hospital
 

When I was at Queen’s, I gave a number of well-received seminars around Ontario, and I continued my life-long friendship with (the highly-supportive Bayesian) Irwin Guttman while visiting the University of Toronto. He frequently visited his buddy Norman Draper at the University of Wisconsin.

[I visited Irwin at SUNY at Buffalo in 1992. We both helped his student Li Sun develop his thesis work on Laplacian approximations for random effects models, and Irwin later worked with me for several months at the University of Edinburgh] 

I also visited Ann Arbor, Michigan, in 1978 to participate in one of Arnold Zellner’s NSF-NBER Bayesian Inference in Econometrics and Statistics seminars, and I met Steve Fienberg (for the first time since a conference on categorical data analysis in Newcastle in 1975), Morry De Groot, Seymour Geisser, Bruce Hill and several other leading American Bayesians.

         My talk on non-parametric Empirical Bayes and the Efron-Morris baseball batting example was well-received, and Arnold referred to it for long afterwards. My empirical Bayes procedures showed that the batters were divided into two groups, and that was probably because Efron and Morris had taken care to put two different types of batter into the same data set! I later published this analysis in Ann Inst Stat Math (1984).

 
   
Steve Fienberg   Arnold Zellner   Irwin Guttman
 

         My friendship with Bruce Hill was to continue for a number of years even though we disagreed strongly about Bayesian coherence. At the 1979 Valencia conference, he announced during his talk that, ‘I’m looking forward to hearing from Professor Leonard why it’s good to be a sure loser’, but that didn’t deter us.

While I was at Queen’s, I met up with the Dean of Engineering, David Bacon. David was one of George Box’s former students, and he wrote to George telling him about my successful medical data analysis. I was subsequently invited by Gouri Bhattacharyya to move to Madison, Wisconsin, where Norman Draper helped me to apply for my green card.  

         Despite my international successes, I received a torrid reception when I dropped by the University of Warwick early in 1979. I was therefore glad to retreat, albeit highly depressed, to the University College of Wales in Aberystwyth, where I spent the second semester of my sabbatical in the good company of Professor Jim Dickey, his wife Martha and his Welsh-speaking colleagues.

[This was Dennis Lindley’s and Owen Davies’s former stomping ground, and I met Owen several times and went mountain-walking with him in his old age. He was a magnificent applied statistician and experimental design man out of I.C.I., and a predecessor of George Box]

During that period, I stayed in a house previously owned by the historian R.F. Treharne and whiled away much of the time reading the classical detective novels that lined the walls in almost every room.

         I also took time to revise my philosophies of statistics, in the light of my good experiences with the Ontario fetal metabolic acidosis data. It finally dawned upon me that the Bayesian approach is quite incomplete because it requires the mathematical and probabilistic specification of a sampling model, and cannot usually be used to derive suitably meaningful models from the data.

         Moreover, Bayes factor methods for the comparison, or mixing, of several candidate models (or more recently when applied to forensic statistics) are usually quite worthless since they are subject to Lindley’s Paradox (see Dennis’s paper in Biometrika, 1957, that was published after his 1953 paper on unlikelihood but when he was still a diehard frequentist) and other curiously anomalous behaviour, including high sensitivity to perturbations in the prior distribution.
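The behaviour complained about here can be seen in a textbook setting. The sketch below is an illustration under stated assumptions, not a reconstruction of anybody's published analysis: a normal mean is tested with a point null against a normal prior under the alternative, and the function name and the choices of z, n and tau are hypothetical. With the z-statistic held at 1.96 (borderline significant at the 5% level), the Bayes factor in favour of the null grows without limit as the sample size increases, and it also grows as the prior under the alternative is made more diffuse.

```python
import math


def bayes_factor_01(z, n, tau=1.0, sigma=1.0):
    """Bayes factor for H0: theta = 0 against H1: theta ~ N(0, tau^2).

    The data are summarised by z = sqrt(n) * xbar / sigma, with xbar ~ N(theta, sigma^2 / n).
    Values above 1 favour H0.
    """
    r = n * tau ** 2 / sigma ** 2
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z * z * r / (1.0 + r))


if __name__ == "__main__":
    # Same z-statistic throughout; only the sample size and the prior spread change.
    for n in (10, 100, 10_000, 1_000_000):
        print(n, bayes_factor_01(1.96, n), bayes_factor_01(1.96, n, tau=10.0))
```

The second column of the printout shows the sensitivity to the prior: widening tau changes the Bayes factor substantially even though the data are unchanged.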

         If information criteria are used for model comparison, then Akaike’s A.I.C. is more convincing than B.I.C., which is based on a very asymptotic approximation to a Bayes factor. However, A.I.C. is only tenuously Bayesian. An alternative called D.I.C. has more recently been proposed by David Spiegelhalter and others, and this approximates A.I.C. While D.I.C. usually works well, it is not, despite its elegant appearance, strictly speaking Bayesian or justifiable via probabilistic arguments.
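As a reminder of how the two criteria differ, here is a small sketch using the standard definitions, A.I.C. = 2k - 2 log L and B.I.C. = k log n - 2 log L, where k is the number of fitted parameters and n the sample size. The two candidate models and their maximised log-likelihoods are invented purely for illustration.

```python
import math


def aic(log_lik, k):
    """Akaike's criterion: twice the parameter count minus twice the maximised log-likelihood."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """Schwarz's criterion: the penalty per parameter is log(n) rather than 2."""
    return k * math.log(n) - 2.0 * log_lik


if __name__ == "__main__":
    n = 1000  # hypothetical sample size
    # Hypothetical fits: (maximised log-likelihood, number of parameters).
    models = {"small": (-520.0, 3), "large": (-515.0, 6)}
    for name, (ll, k) in models.items():
        print(name, "AIC =", aic(ll, k), "BIC =", round(bic(ll, k, n), 2))
```

With these invented numbers A.I.C. prefers the larger model and B.I.C. the smaller one, because the B.I.C. penalty per parameter exceeds 2 whenever n is 8 or more.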

         It is consequently essential to separate inductive modelling, in relation to the data and the scientific or real-life background, from deductive inference and prediction, conditional on the choice of sampling model. George Box was at the time thinking along similar lines. Bayes would rule the roost if the sampling model were true. But, though some models are useful, most of them are either wrong or potentially inadequate.

Dennis Lindley visited Aberystwyth for several days during that period and gave a seminar. Given my views on parsimonious statistical modelling, we were no longer seeing at all eye to eye, and I concluded that Dennis was keeping his blinkers on. He was indeed at the time advocating L.J. Savage’s amazing philosophy that ‘a model should be as big as an elephant’, which is still misleading the economics profession and runs counter to the more generally accepted concept of parameter parsimony.

[The problem with a model that is too large is that its parameters can’t be well estimated from the data. Dennis always thought that Bayes took care of that. I prefer AIC and Jack Good’s ‘Occam’s razor’.

        In 1983, I advocated the Savage elephant philosophy, on Dennis’s behalf, to the Cincinnati meeting of the American Statistical Association, while reading Dennis’s contribution, in his absence, to the discussion of a paper by Carl Morris. I created further general amusement by describing Americans as both fascists and colonials, also on Dennis’s behalf]

While visiting Aberystwyth, Dennis kindly read a draft of my manuscript on fetal metabolic acidosis and concluded that my methodology amounted to ‘a good piece of data analysis, but not statistics’. He also said that he always declined to consider any data set that couldn’t be analysed using a simple application of Bayes theorem.

 

 

16th August 2013: Please click on TONY O'HAGAN INTERVIEWS DENNIS LINDLEY for a very historical and illuminating YouTube video. This includes an account by Dennis, at age 90, of his time-honoured 'justifications' of the Bayesian paradigm, together with his totally unmerited attitude (since 1973) towards vague priors, including Sir Harold Jeffreys' celebrated invariance priors and Dennis's own student Jose Bernardo's much respected reference priors. I, quite frankly, find most of Dennis's opinions to be at best unfortunate and at worst completely ****** up, particularly in view of the highly paradoxical nature of the Savage axioms and the virtually tautologous properties of the De Finetti axioms as appropriately strengthened by Kraft, Pratt and Seidenberg (Ann. Math. Statist., 1959) and Villegas (Ann. Math. Statist., 1964) [See Fishburn (Statistical Science, 1986) for a discussion of the very complicated strong additivity and monotone continuity assumptions that are needed to imply countable additivity of the subjective probability measure]. His views on model construction demonstrate a lack of awareness of the true nature of applied statistics. He was, however, relatively recently awarded the Guy Medal in Gold by the Royal Statistical Society for his contributions.

Dennis also confirms how he encouraged Florence David to leave UCL for California (he'd previously been a bit more explicit to me about this) and, quite remarkably, says that he tried to arrange the early retirement of two of his colleagues at UCL for not being sufficiently Bayesian!! This was around the time that he was influencing a downturn in my career at the University of Warwick. Dennis's account of his own early retirement does not match what actually happened. According to Adrian Smith, Dennis was encouraged to retire after a fight with the administrators over the skylight in Peter Freeman's office.

 

 
 
24th August 2013: Since studying the Dennis Lindley interview, I have debated the relevance of the Savage and extended De Finetti axioms with Professor Peter Wakker on the ISBA website. As a spin-off of this correspondence, I was contacted by Deborah Mayo, a Professor of Philosophy at Virginia Tech, who has proposed some counterexamples to Allan Birnbaum's 1962 justification of the Likelihood Principle via the Sufficiency Principle and Conditionality Principle. Her work may be accessed by clicking on:
http://errorstatistics.com/2013/07/26/new-version-on-the-birnbaum-argument-for-the-slp-slides-for-jsm-talk/.

I leave it to the readers to decide this controversial issue for themselves. I always thought that Birnbaum's proof was elegantly simple and completely watertight, and it would be quite amusing if I was wrong on this key issue.


26th August 2013:  I have now heard from Peter Wakker that Evans, Fraser, and Monette (Canadian Journal of Statistics, 1986) claim that the Likelihood Principle is a direct consequence of the Conditionality Principle, and that the Sufficiency Principle is not needed at all. Phew! There is clearly lots of room for further discussion. Some serious mathematical issues need to be resolved.

 
26th August 2013:  A RESOLUTION OF AN OLD CONTROVERSY
 
Michael Evans of the University of Toronto has just advised me that the proof of Birnbaum's 1962 theorem is not mathematically watertight. It should be correctly stated as follows:

Theorem: If we accept SP and accept CP, and we accept all the 'equivalences' generated jointly with these principles, then we must accept LP.

He also proves:

Theorem: If we accept CP and we accept all the equivalences generated by CP then we must accept LP.
 
Furthermore, it is unclear how one justifies the additional hypotheses that are required to obtain LP.  Michael believes that Deborah Mayo's counterarguments are appropriate. History has been made!
 
Shucks, Dennis! Where does that put the mathematical foundations of the Bayesian paradigm? Both De Finetti and Birnbaum have misled us with technically unsound proofs. I should have listened to George Barnard in 1981.
 
While Professor Mayo's ongoing campaign against LP would appear to be wild and footloose, she has certainly shaken up the Bayesian Establishment.
 

Deborah Mayo

 
 

While I was at Aberystwyth, I was invited to participate in the annual Statistics at Gregynog, a rare honour. The speakers presented their papers in an archaic building with a croquet lawn, which Bradley Efron once described as ‘that nice country house just outside London’. I met Ralph Bradley, who’d just finished a nineteen-year stint as Head of Statistics at Florida State University. He explained the origins of the skewed normal distribution to me.

During this period, I received my eagerly anticipated offer from Gouri Bhattacharyya, the Chairman of Statistics at Wisconsin. The offer was for one year (1979-1980) as a visiting Associate Professor (for the princely salary of $22,000, which almost tripled my stipend at Warwick), followed by a permanent appointment as soon as my tenure could be finalised. Both appointments were initially half-time in Statistics and half-time in the Mathematics Research Center, which was housed on the edge of the campus in the fourteen-storey WARF building and funded by the US Army.

[MRC had been blown up in Sterling Hall during the Vietnam War when the protesting students were being chased with tear gas around the city]

 
The Queen and Castle pub in Kenilworth
 

I felt bad about leaving my recently-purchased house near Kenilworth Castle, but felt forced to do this because of the bizarre situation at the University of Warwick.

Indeed, in later years the influential Bayesian group there was dismantled when two world-leading Bayesians were denied their well-deserved promotions. It was only Jim Smith’s return from University College London that restored any sanity to the situation.

During my sabbatical semester in Aberystwyth, I was delighted to receive an invitation from Jose Bernardo (an outstanding practical Bayesian if ever there was one) to present a discussion paper during June 1979 at the first of the long series of Bayesian Statistics conferences to be organized by the University of Valencia.

         While I was preparing my Valencia conference paper ‘The roles of inductive modelling and coherence in Bayesian Statistics,’ my only objective was to discern scientific truth, rather than to attack the high priests of the Bayesian establishment. I however clarified in my mind that the De Finetti and Savage axiom systems, which were supposed to justify Bayesian inference and decision-making, were at best tautologous with their specific theoretical conclusions and at worst downright misleading.

         Moreover, to insist that a statistician should be ‘coherent’ and Bayesian, when choosing his sampling model in relation to the scientific background, was totally out of line, as well as completely impractical. The ‘sure thing principle’, which requires a decision-maker to maximise his average long-term expected utility, is absolute bullshit. For example, most mortals need to hedge against catastrophic losses, and others wish to maximise the probability of a certain monetary gain.

[See Leonard and Hsu, Bayesian Methods, 1999, Ch.4]

The first Bayesian Valencia conference, at the Hotel Las Fuentes on the Mediterranean coast between Valencia and Barcelona, was a wonderfully iconic event; for example, I met Jack Good, Art Dempster, Hiro Akaike and George Box for the first time. George and I went swimming in the bay together, and Jeff Harrison and I talked about synchronicity with Jack Good on the end of the pier.

I therefore initially took it as a joke when Adrian advised me that ‘I would be destroyed by the storm that hit me’. I have nevertheless felt disturbed by this warning ever since.

I was impressed by Steve Fienberg’s discussion of Jeff Harrison’s paper on Bayesian Catastrophe Theory and, while Jeff seemed to regard it as a catastrophe, I was glad to renew my friendship with Steve.

[I last met Steve at the 2002 RSS Conference at the University of Plymouth, after my early retirement, but I haven’t been particularly active in Statistics since, apart from helping John Hsu to complete Bayesian Methods in Finance with Rachev et al, and Bishop Brian of Edinburgh with his Diocesan accounts. The conference dinner was held in the great barn at Buckland Abbey on Dartmoor, and that was also the last time I talked to Peter Green. Terry Speed, who was a defence expert witness in the O.J. Simpson case, and I discussed the merits of being honest and emphasising the truth, in the context of DNA profiling. Bruce Weir, the prosecution expert witness during the O.J. Simpson trial, had screwed up on the arithmetic. The British Forensic Science Service could also learn a thing or two from Terry]

The paper that I’d prepared in Aberystwyth was well received at Valencia 1 (e.g. by Jim Dickey, Bill DuMouchel and Jay Kadane), since most of the Bayesians in the audience were also pragmatic statisticians, and I was undeterred when Dennis and a couple of heavily-axiomatized discussants made some unnecessarily angry, and rather puerile, comments regarding my views on the notion of coherence.

         While the paper, which attempted to inject more practicality into applications of the Bayesian paradigm, is seldom cited, I am advised by Mark Steel and Deborah Ashby (personal communication) that it is known to the next generation of Bayesians and has influenced their thinking.

 
 
Mark Steel   Deborah Ashby
 

I drank far too many Cointreaus during the final dinner, while George Box and Herb Solomon were singing ‘Our Theorem is Bayes Theorem’, and I puked the seafood up all over the beach. I was deeply saddened to learn, in hindsight, that I had been ferociously stabbed in the back by two leading Bayesians (not including Adrian), who tried to block my escape route to the University of Wisconsin, presumably so that I could wither away promotionless at Warwick. Dennis has since corroborated this, and Adrian’s dire prophecy was almost correct.

 
Bayesians at Play (archived from Brad Carlin's Collection)
 

         John Deely, a wonderfully honest and perceptive gentleman from Christchurch, has since let a cat out of the bag by advising me that Dennis and Adrian would after that usually ice me out of the conversation, and respond as if I didn’t exist, whenever he tried to talk to them about me. John called this ‘the Tom Leonard mystery’.

 
John Deely
 

Nevertheless, Dennis did express his admiration to John on one occasion about my quick and easy derivation of his, and Adrian’s, M-group regression estimates, that cut out a great deal of extraneous matrix algebra. I’d completed this derivation off the top of my head for Dennis just before the presentation of their paper to the Royal Statistical Society when he couldn’t remember where their formula for the pooled regression came from.

Jim Smith advised me much later that Adrian had told him that he had absolutely nothing against me apart from ‘a slight problem when we were students’. Perhaps this was because I’d admired Adrian so much, as God’s perfect creation. Dennis certainly did so too. It’s not every supervisor who would’ve helped his student so much.

 

 
 
 
 
  © Thomas Hoskyns Leonard, 2012 - 2013