Tom Leonard  The Life of a Bayesian Boy 

CHAPTER 5: SABBATICAL YEAR (1978-79)

Queen's University, Kingston, Ontario

The University of Michigan at Ann Arbor

The University College of Wales, Aberystwyth

Gregynog Hall

Hotel Las Fuentes, Alcossebre

I
was made very welcome by my hosts at Queen’s University, and soon
felt at ease in both social and academic terms. Jim Low provided me
with a retrospective data set that reported a number of variables
for over two thousand mothers and their babies, the first large data
set that I’d ever tried to analyse. The problem was to find the
variables that best predicted a low measure of acidosis in blood
samples taken from the umbilical cords of the babies during labour.
Babies with their measure of acidosis below a certain threshold were
said to suffer from ‘fetal metabolic acidosis’ and this could give
rise to serious postnatal problems.
I tried least squares multiple regression for a couple of
months, but this invariably led to a remarkably poor fit, with values
of R-squared alarmingly close to zero, as the data were so noisy.
When everybody had given up on me, I therefore took the radical step
for a Bayesian of resorting to data analysis. I tried splitting up
the data into subsets according to different levels of gestational
age. For each subset, I was subsequently inspired (divinely, or so I
believed at the time) to split the data into three groups,
corresponding to low, medium and high measures of acidosis. When I
examined scatterplots of the birthweights, I then discovered that,
for each level of gestational age, the birthweight distributions
shifted sideways when moving along the three groups.
Imagine my excitement! I modelled each of the birthweight
distributions using a skewed normal distribution (originally
proposed by Edgeworth (JRSS, 1899), and reinvented by Ralph
Bradley and then myself) and maximum likelihood/moment estimators
for the three parameters. After an empirical application of Bayes
theorem, I was then able to plot the probability of low acidosis
against birthweight at each level of gestational age.
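The empirical application of Bayes theorem described above can be sketched in a few lines. The group parameters and prior proportions below are purely hypothetical, and ordinary normal densities stand in for the skewed normal distribution that was actually used:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical birthweight parameters (mean in kg, sd) and prior proportions
# for the low/medium/high acidosis groups within one gestational-age subset.
groups = {
    "low":    {"mu": 2.4, "sigma": 0.5, "prior": 0.20},
    "medium": {"mu": 3.1, "sigma": 0.5, "prior": 0.50},
    "high":   {"mu": 3.5, "sigma": 0.5, "prior": 0.30},
}

def prob_low_acidosis(weight):
    """Posterior probability of the 'low' acidosis group given birthweight,
    by Bayes theorem: prior * density, renormalised over the three groups."""
    joint = {g: p["prior"] * normal_pdf(weight, p["mu"], p["sigma"])
             for g, p in groups.items()}
    return joint["low"] / sum(joint.values())

# The probability of low acidosis falls as birthweight rises:
print(prob_low_acidosis(2.0))
print(prob_low_acidosis(3.5))
```

A 'crossover point' is then simply the birthweight at which this probability crosses the 20% threshold, which could be located by bisection over the weight.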
With the exception of the overweight, overdue babies, the
babies at greatest risk were those with low birthweights, when
compared with other babies of similar gestational age. No other
variables seemed to affect this conclusion and I therefore refuted
the medical folklore that was prevalent at the time; the three main
predictors until then were totally useless, at least for the
two-thousand-or-so babies in that data set.
About 20% of the babies in the entire data set suffered
from fetal metabolic acidosis. I therefore also reported ‘crossover
points’, namely the birthweights below which my probabilities of
low acidosis exceeded 20%. Jim Low took a brief look at the
crossover points and made a remarkable observation by reference to
already published critical values for ‘intrauterine growth
retarded’ babies. The critical values were almost identical to my
crossover points, at each level of gestational age! We therefore
arrived at a simple and scientifically validated conclusion; the
babies at highest risk of fetal metabolic acidosis could be
predicted by ultrasound since these were the intrauterine growth
retarded babies. I’d also learnt that probabilistic prediction,
rather than point prediction, can work best for noisy data sets
(perhaps more time-series boffins should take heed of this!).
Our conclusions were reported by Jim Low to a joint meeting of the
American Obstetric and Gynaecological Societies, where they were
well-received, and they were published by several of us, with
discussion, in the societies’ journal in 1983. When I presented them
to a meeting of the Medical Section of the Royal Statistical Society
in 1979, I started to receive some recognition in Britain as an
all-round statistician. (Granville Tunnicliffe-Wilson, personal
communication)




Dr. James Low
Obstetrics and Gynaecology, Kingston
General Hospital


When I was at Queen’s, I gave a number of well-received seminars
around Ontario, and I continued my lifelong friendship with (the
highly-supportive Bayesian) Irwin Guttman while visiting the
University of Toronto. He frequently visited his buddy Norman
Draper at the University of Wisconsin.
[I
visited Irwin at SUNY at Buffalo in 1992. We both helped his student
Li Sun develop his thesis work on Laplacian approximations for
random effects models, and Irwin later worked with me for several
months at the University of Edinburgh]
I also visited Ann Arbor, Michigan, in 1978 to participate in one of
Arnold Zellner’s NSF-NBER Seminars on Bayesian Inference in
Econometrics and Statistics, and I met Steve Fienberg (for the first
time since a conference on categorical data analysis in Newcastle in
1975), Morrie DeGroot, Seymour Geisser, Bruce Hill and several other
leading American Bayesians.
My talk on nonparametric empirical Bayes and the Efron-Morris
baseball batting example was well-received, and Arnold referred to
it for long afterwards. My empirical Bayes procedures showed that
the batters were divided into two groups, and that was probably
because Efron and Morris had taken care to put two different types
of batter into the same data set! I later published this analysis in
Ann Inst Stat Math (1984).












Steve Fienberg 

Arnold Zellner 

Irwin Guttman 


My friendship with Bruce Hill was to continue for a number
of years even though we disagreed strongly about Bayesian coherence.
At the 1979 Valencia conference, he announced during his talk that,
‘I’m looking forward to hearing from Professor Leonard why it’s good
to be a sure loser’, but that didn’t deter us.
While I was at Queen’s, I met up with the Dean of Engineering, David
Bacon. David was one of George Box’s former students, and he wrote
to George telling him about my successful medical data analysis. I
was subsequently invited by Gouri Bhattacharyya to move to Madison,
Wisconsin, where Norman Draper helped me to apply for my green card.
Despite my international successes, I received a torrid
reception when I dropped by the University of Warwick early in 1979.
I was therefore glad to retreat, albeit highly depressed, to the
University College of Wales in Aberystwyth, where I spent the second
semester of my sabbatical in the good company of Professor Jim
Dickey, his wife Martha and his Welsh-speaking colleagues.
[This was Dennis Lindley’s and Owen Davies’s former stomping ground,
and I met Owen several times and went mountain walking with him in
his old age. He was a magnificent applied statistician and
experimental design man out of I.C.I., and a predecessor of George
Box]
During that period, I stayed in a house previously owned by the
historian R.F. Treharne and whiled away much of the time reading the
classical detective novels that lined the walls in almost every
room.
I also took time to revise my philosophies of statistics,
in the light of my good experiences with the Ontario fetal metabolic
acidosis data. It finally dawned upon me that the Bayesian approach
is quite incomplete because it requires the mathematical and
probabilistic specification of a sampling model, and cannot usually
be used to derive suitably meaningful models from the data.
Moreover, Bayes factor methods for the comparison, or
mixing, of several candidate models (or more recently when applied
to forensic statistics) are usually quite worthless since they are
subject to Lindley’s Paradox (see Dennis’s paper in Biometrika,
1957, which was published after his 1953 paper on unlikelihood but
when he was still a die-hard frequentist) and other curiously
anomalous behaviour, including high sensitivity to perturbations in
the prior distribution.
If information criteria are used for model comparison, then
Akaike’s A.I.C. is more convincing than B.I.C., which is based on a
highly asymptotic approximation to a Bayes factor. However, A.I.C. is
only tenuously Bayesian. An alternative called D.I.C. has more
recently been proposed by David Spiegelhalter and others, and this
approximates A.I.C. While D.I.C. usually works well, it is not,
despite its elegant appearance, strictly speaking Bayesian or
justifiable via probabilistic arguments.
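By way of illustration (this example is mine, not from the original analysis), the two criteria can be compared on a simple Gaussian model fitted to an entirely hypothetical sample; the only difference lies in how heavily the k fitted parameters are penalised:

```python
import math
import random

def gaussian_loglik(data, mu, sigma):
    """Log-likelihood of an i.i.d. normal sample."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

def aic(loglik, k):
    """Akaike's criterion: 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Schwarz's criterion: k log n - 2 log L, an asymptotic
    approximation to minus twice a log Bayes factor."""
    return k * math.log(n) - 2 * loglik

random.seed(42)
data = [random.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical sample
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
ll = gaussian_loglik(data, mu_hat, sigma_hat)

print("AIC:", aic(ll, 2))     # k = 2 fitted parameters (mu, sigma)
print("BIC:", bic(ll, 2, n))  # penalises each parameter by log n, not 2
```

For any n > 7, BIC's penalty exceeds AIC's, so BIC favours smaller models ever more strongly as the sample grows.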
It is consequently essential to separate inductive
modelling, in relation to the data and the scientific or reallife
background, from deductive inference and prediction,
conditional on the choice of sampling model. George Box was at the
time thinking along similar lines. Bayes would rule the roost if the
sampling model were true. But, though some models are useful, most
of them are either wrong or potentially inadequate.
Dennis Lindley visited Aberystwyth for several days during that
period and gave a seminar. Given my views on parsimonious
statistical modelling, we were no longer seeing eye to eye at all,
and I concluded that Dennis was keeping his blinkers on. He was
indeed at the time advocating L.J. Savage’s amazing philosophy that
‘a model should be as big as an elephant’, which is still misleading
the economics profession and runs counter to the more generally
accepted concept of parameter parsimony.
[The problem with a model that is too large is that its
parameters can’t be well estimated from the data. Dennis always
thought that Bayes took care of that. I prefer AIC and Jack Good’s
‘Occam’s razor’.
In 1983, I advocated the Savage elephant philosophy, on
Dennis’s behalf, to the Cincinnati meeting of the American
Statistical Association, while reading Dennis’s contribution, in his
absence, to the discussion of a paper by Carl Morris. I created
further general amusement by describing Americans as both fascists
and colonials, also on Dennis’s behalf]
While visiting Aberystwyth, Dennis kindly read a draft of my
manuscript on fetal metabolic acidosis and concluded that my
methodology amounted to ‘a good piece of data analysis, but not
statistics’. He also said that he always declined to consider any
data set that couldn’t be analysed using a simple application of
Bayes theorem.



16th August 2013: Please click on TONY O'HAGAN INTERVIEWS DENNIS
LINDLEY for a very historical and illuminating YouTube video. This
includes an account, at age 90, by Dennis of his time-honoured
'justifications' of the
Bayesian paradigm, together with his totally unmerited
attitude (since 1973) towards vague priors, including Sir
Harold Jeffreys' celebrated invariance priors and Dennis's own
student Jose Bernardo's much respected reference priors. I,
quite frankly, find most of Dennis's opinions to be at best
unfortunate and at worst completely ****** up, particularly in
view of the highly paradoxical nature of the Savage axioms and
the virtually tautologous properties of the De Finetti axioms as appropriately strengthened by Kraft,
Pratt and Seidenberg (Ann. Math. Statist., 1959) and Villegas
(Ann. Math. Statist., 1964) [See Fishburn (Statistical Science,
1986) for a discussion of the very complicated strong
additivity and monotone continuity assumptions that are needed
to imply countable additivity of the subjective probability
measure]. His views on model construction demonstrate a lack of
awareness of the true nature of applied statistics. He was
however relatively recently awarded the Guy Medal in Gold by
the Royal Statistical Society for his contributions.
Dennis also confirms how he encouraged
Florence David to leave UCL for California (he'd previously
been a bit more explicit to me about this) and, quite
remarkably, says that he tried to arrange the early retirement
of two of his colleagues at UCL for not being sufficiently
Bayesian!! This was around the time that he was influencing a
downturn in my career at the University of Warwick. Dennis's
account of his own early retirement does not match what
actually happened. According to Adrian Smith, Dennis was
encouraged to retire after a fight with the administrators
over the skylight in Peter Freeman's office.





24th August 2013: Since
studying the Dennis Lindley interview, I have debated the
relevance of the Savage and extended De Finetti axioms with
Professor Peter Wakker on the ISBA website. As a spin-off of
this correspondence, I was contacted by Deborah Mayo, a
Professor of Philosophy at Virginia Tech, who has proposed some
counterexamples to Allan Birnbaum's 1962 justification of the
Likelihood Principle via the Sufficiency Principle and
Conditionality Principle. Her work may be accessed by clicking
on:
http://errorstatistics.com/2013/07/26/newversiononthebirnbaumargumentfortheslpslidesforjsmtalk/.
I leave it to the readers to decide this controversial issue for
themselves. I always thought that Birnbaum's proof was elegantly
simple and completely watertight, and it would be quite amusing
if I was wrong on this key issue.
26th August 2013: I have now heard from Peter Wakker
that Evans, Fraser, and Monette (Canadian Journal of
Statistics, 1986) claim that the Likelihood Principle is a
direct consequence of the Conditionality Principle, and that
the Sufficiency Principle is not needed at all. Phew! There is
clearly lots of room for further discussion. Some serious
mathematical issues need to be resolved.
26th August 2013: A RESOLUTION OF AN OLD
CONTROVERSY
Michael Evans of the University of
Toronto has just advised me that the proof of Birnbaum's
1962 theorem is not mathematically watertight. It should be
correctly stated as follows:
Theorem: If we accept SP and accept CP, and we accept all
the 'equivalences' generated jointly with these principles,
then we must accept LP.
He also proves:
Theorem: If we accept CP and we accept all the
equivalences generated by CP then we must accept LP.
Furthermore, it is unclear how one
justifies the additional hypotheses that are required to
obtain LP. Michael believes that Deborah Mayo's
counterarguments are appropriate. History has been made!
Shucks, Dennis! Where does that put the
mathematical foundations of the Bayesian paradigm? Both De
Finetti and Birnbaum have misled us with technically unsound
proofs. I should have listened to George Barnard in 1981.
While Professor Mayo's ongoing campaign
against LP would appear to be wild and footloose, she has
certainly shaken up the Bayesian Establishment.




Deborah Mayo 



While I was at Aberystwyth, I was invited to participate in the
annual Statistics at Gregynog conference, a rare honour. The speakers
presented
their papers in an archaic building with a croquet lawn, which
Bradley Efron once described as ‘that nice country house just
outside London’. I met Ralph Bradley, who’d just finished a
nineteen-year stint as Head of Statistics at Florida State
University. He explained the origins of the skewed normal
distribution to me.
During this period, I received my eagerly anticipated offer from
Gouri Bhattacharyya, the Chairman of Statistics at Wisconsin: one
year (1979-1980) as a visiting Associate Professor (for the princely
salary of $22,000, which almost tripled my stipend at Warwick),
followed by a permanent appointment as soon as my tenure could be
finalised. Both appointments were initially half-time in Statistics
and half-time in the Mathematics Research Center, which was housed on
the edge of the campus in the fourteen-storey WARF building and
funded by the US Army.
[MRC had been blown up in Sterling Hall during the Vietnam War, when
protesting students were being chased with tear gas around the city]




The Queen and Castle pub in Kenilworth 

I felt bad about leaving my recently-purchased house near Kenilworth
Castle, but felt forced to do so because of the bizarre situation
at the University of Warwick.
Indeed, in later years the influential Bayesian group there was
dismantled when two world-leading Bayesians were denied their
well-deserved promotions. It was only Jim Smith’s return from
University College London that restored any sanity to the situation.
During my sabbatical semester in Aberystwyth, I was delighted to
receive an invitation from Jose Bernardo (an outstanding practical
Bayesian if ever there was one) to present a discussion paper during
June 1979 at the first of the long series of Bayesian Statistics
conferences to be organized by the University of Valencia.
While I was preparing my Valencia conference paper ‘The roles
of inductive modelling and coherence in Bayesian Statistics,’ my
only objective was to discern scientific truth, rather than to
attack the high priests of the Bayesian establishment. I however
clarified in my mind that the De Finetti and Savage axiom systems,
which were supposed to justify Bayesian inference and
decision-making, were at best tautologous with their specific
theoretical conclusions and at worst downright misleading.
Moreover, to insist that a statistician should be
‘coherent’ and Bayesian, when choosing his sampling model in
relation to the scientific background, was totally out of line, as
well as completely impractical. The ‘sure thing principle’, which
requires a decision-maker to maximise his average long-term expected
utility, is absolute bullshit. For example, most mortals need to
hedge against catastrophic losses and others wish to maximise the
probability of a certain monetary gain.
[See Leonard and Hsu, Bayesian Methods, 1999, Ch.4]
The
first Bayesian Valencia conference, at the Hotel Las Fuentes on the
Mediterranean coast between Valencia and Barcelona, was a
wonderfully iconic event; I for example met Jack Good, Art Dempster,
Hiro Akaike and George Box for the first time. George and I went
swimming in the bay together, and Jeff Harrison and I talked about
synchronicity with Jack Good on the end of the pier.
I therefore initially took it as a joke when Adrian advised me that
I ‘would be destroyed by the storm that hit me’. I have nevertheless
felt disturbed by this warning ever since.
I
was impressed by Steve Fienberg’s discussion of Jeff Harrison’s
paper on Bayesian Catastrophe Theory and, while Jeff seemed to
regard it as a catastrophe, I was glad to renew my friendship with
Steve.
[I
last met Steve at the 2002 RSS Conference at the University of
Plymouth, after my early retirement, but I haven’t been particularly
active in Statistics since, apart from helping John Hsu to complete
Bayesian Methods in Finance with Rachev et al, and
Bishop Brian of Edinburgh with his Diocesan accounts. The conference
dinner was held in the great barn at Buckland Abbey on Dartmoor, and
that was also the last time I talked to Peter Green. Terry Speed,
who was a defence expert witness in the O.J. Simpson case, and I
discussed the merits of being honest and emphasising the truth, in
the context of DNA profiling. Bruce Weir, the prosecution expert
witness during the O.J. Simpson trial, had screwed up on the
arithmetic. The British Forensic Science Service could also learn a
thing or two from Terry]
The
paper that I’d prepared in Aberystwyth was well received at Valencia
1 (e.g. by Jim Dickey, Bill DuMouchel and Jay Kadane), since most of
the Bayesians in the audience were also pragmatic statisticians, and
I was undeterred when Dennis and a couple of heavily-axiomatized
discussants made some unnecessarily angry, and rather puerile,
comments regarding my views on the notion of coherence.
While the paper, which attempted to inject more practicality
into applications of the Bayesian paradigm, is seldom cited, I am
advised by Mark Steel and Deborah Ashby (personal communication)
that it is known to the next generation of Bayesians and has
influenced their thinking.








Mark Steel 

Deborah Ashby 


I drank far too many Cointreaus during the final dinner, while George
Box and Herb Solomon were singing ‘Our Theorem is Bayes Theorem’,
and I puked the seafood up all over the beach. I was deeply saddened
to learn, in hindsight, that I had been ferociously stabbed in the
back by two leading Bayesians (not including Adrian), who tried to
block my escape route to the University of Wisconsin, presumably so
that I could wither away promotionless at Warwick. Dennis has since
corroborated this, and Adrian’s dire prophecy was almost correct.




Bayesians at Play (archived from Brad
Carlin's Collection) 

John Deely, a wonderfully honest and perceptive gentleman
from Christchurch, has since let the cat out of the bag by advising me
that Dennis and Adrian would after that usually ice me out of the
conversation, and respond as if I didn’t exist, whenever he tried to
talk to them about me. John called this ‘the Tom Leonard mystery’. 



John Deely 

Nevertheless, Dennis did express his admiration to John on one
occasion about my quick and easy derivation of his, and Adrian’s,
M-group regression estimates, which cut out a great deal of
extraneous matrix algebra. I’d completed this derivation off the top
of my head for Dennis just before the presentation of their paper to
the Royal Statistical Society when he couldn’t remember where their
formula for the pooled regression came from.
Jim
Smith advised me much later that Adrian had told him that he had
absolutely nothing against me apart from ‘a slight problem when we
were students’. Perhaps this was because I’d admired Adrian so much,
as God’s perfect creation. Dennis certainly did so too. It’s not
every supervisor who would’ve helped his student so much. 

