In the aftermath of Dennis
Lindley’s 2013 YouTube video interview by Tony O’Hagan, it became
abundantly apparent to me, following detailed correspondence with
Peter Wakker, Deborah Mayo and Michael Evans, that Bayesian
Statistics can no longer be justifiably motivated by the Axioms of
Subjective Probability, or the Axioms of Utility and Modified
Utility, or by the way the Sufficiency and Conditionality Principles
had been previously thought to imply our all-sacred Likelihood
Principle. This puts our paradigm into a totally different, and
quite refreshing, ballpark. Maybe axiomatized coherence, incoherence
and sure-loser principles will soon be relics, strange artefacts of
the past. Deck the halls with balls of holly!
The wide diversity of novel
Bayesian theoretical methods developed and practical situations
analysed during the current century are well-represented in the
eleven published papers which have received the Mitchell prize
(awarded by ISBA) since the turn of the century. They include:
2002 (Genetics): Jonathan Pritchard, Matthew Stephens, and Peter Donnelly
Inference of Population
Structure using Multilocus Genotype Data
2003 (Medicine) Jeff
Morris, Marina Vannucci, Phil Brown, and Ray Carroll
Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis
2006 (Biology) Ben
Redelings and Marc Suchard
Joint Bayesian Estimation
of alignment and phylogeny
2007 (Sociology) Tian
Zheng, Matthew Salganik, and Andrew Gelman
How many people do you
know in prison? Using overdispersion in count data to estimate
social structure in networks.
2010 (Astronomy) Ian
Vernon, Michael Goldstein, and Richard Bower
Galaxy Formation: A
Bayesian Uncertainty Analysis
The Mitchell Prize is
named after Toby J. Mitchell, and it was established by ISBA after
his tragic death from leukaemia in 1993. He would have been glad to
have inspired so much brilliantly useful research.
Toby was a dedicated
Bayesian, and he is well known for his philosophy
The greater the
amount of information, the less you actually know.
Toby was a Senior
Research Staff Member at Oak Ridge National Laboratory in Tennessee.
He won the George Snedecor Award in 1978 (with Bruce Turnbull) and
made incisive contributions to Statistics, especially in biometry
and engineering applications. He was a magnificent collaborator and
a very insightful scientist.
Norman Draper took one of the last pictures of Toby Mitchell, and he has kindly sent it to me.
Toby J. Mitchell (d. 1993)
Tom and Toby taught Statistics courses together at U.S. military bases and at the space center in Huntsville, Alabama, and visited Washington D.C. from Adelphi, Maryland in 1980.
See Author’s Notes (below) for details of several of the Ph.D. theses whose authors have been awarded one of ISBA’s Savage Prizes during the current century. We have clearly been blessed with some talented up-and-coming Bayesian researchers.
In the remainder of Ch. 7, I will critique some of the Bayesian literature in greater detail. This is to give readers a better understanding of the quality and relevance of 21st-century research, and of the degree of competence, applied nous, and originality of our researchers.
In 1863, Major General
George Pickett was one of three generals who led the assault under
Longstreet at the Battle of Gettysburg, as the Confederate Army
charged towards the Unionist lines in all its Napoleonesque glory.
In the year 2000, Dennis Lindley came out of retirement at age 77 to
confront a full meeting of the Royal Statistical Society with his
magnificently written invited discussion paper ‘The Philosophy of
Statistics’. While Pickett’s charge ended in utter disaster for the
Dixies, Dennis would, in 2002, be awarded the Society’s Guy Medal in
Gold for his life-time of endeavours, having retired from UCL a
quarter of a century earlier.
Here is a brief précis of the ideas and opinions expressed in the first seven sections of his paper:
1. Introduction: Statistics is the study of uncertainty. Statistical inference should be based on probability alone. Progress is therefore dependent upon the construction of a probability model.
Uncertainty can only be measured by probability. The likelihood
principle follows from the basic role played by probability. I
(Dennis) modify my previous opinions by saying that more emphasis
should be placed on model construction than on formal inference.
Formal inference is a systematic procedure within the calculus of
probability. Model construction cannot be so systematic.
2. Statistics: Statisticians should be the experts at handling uncertainty. We do
not study the mechanism of rain, only whether it will rain. We are,
as practitioners, therefore dependent on others. We will suffer,
even as theoreticians if we remain too divorced from the science. We
define ‘the client’ as the person e.g. scientist or lawyer, who
encounters uncertainty in their field of study. Statistics is
ordinarily associated with (numerical) data. It is the link between
uncertainty, or variability in the data, and that in the topic
itself that has occupied statisticians. The passage from process
(sampling model) to data is clear. It is when we attempt to go from
data to process that difficulties occur.
3. Uncertainty: It is only by associating numbers with any scientific concept that the concept can be understood. Statisticians consequently need to measure uncertainty by numbers, in a way that allows them to be combined. It is proposed to measure your uncertainty about an event happening by comparison with a standard, e.g. relating to balls drawn from an urn.
4. Uncertainty and probability: The addition and product rules, together with the convexity rule (a probability always lies in the unit interval), are the defining rules of probability, at least for a finite number of events. The conclusion is that the measurements of uncertainty can
be described by the calculus of probability. The uncertainty related
to bets placed on gambles, the use of odds, combined with the
impossibility of a Dutch book, leads back to probability as before.
A further approach due to De Finetti extracts a penalty score from
you after you have stated your uncertainty of an event, which
depends on whether the event is subsequently shown to be true or
false. This [frequency!] approach provides an empirical check
on the quality of your probability assessments and hence on a test
of your abilities as a statistician. The gathering and publishing of
data remains an essential part of statistics today.
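De Finetti’s penalty idea can be sketched with the quadratic (Brier) score, here a minimal illustration under the assumption of that particular penalty form, with invented events and probabilities: you state a probability for each event, and once the truth is revealed you are charged the squared distance between your statement and the outcome.

```python
# Sketch of De Finetti's penalty (quadratic/Brier) score as an
# empirical check on stated probabilities. The stated probabilities
# and outcomes below are invented for illustration.

def brier_score(stated, outcomes):
    """Mean squared penalty between stated probabilities and observed
    outcomes (1 if the event occurred, 0 if it did not)."""
    assert len(stated) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(stated, outcomes)) / len(stated)

outcomes = [1, 0, 1, 1, 0]
cautious = [0.8, 0.2, 0.7, 0.9, 0.1]   # well-calibrated assessor
rash     = [1.0, 0.0, 0.1, 1.0, 0.9]   # overconfident, sometimes badly wrong

print(brier_score(cautious, outcomes))  # small penalty
print(brier_score(rash, outcomes))      # larger penalty
```

The score is ‘proper’: your expected penalty is minimised by stating your true probabilities, which is why it can serve as the empirical check on a statistician’s assessments that Lindley describes.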
Measurements of uncertainty must obey the rules of probability
calculus. This is intended to be self-evident and that you would
feel foolish if you were to be caught violating it. Axioms (which
refer to a ‘more likely’ binary relation) all lead to probability
being the only satisfactory explanation of uncertainty, though some
writers have considered the axioms carefully and produced objection.
[Lindley does not
cite the 1986 paper by Fishburn or the key source references
quoted by Fishburn].
It is therefore
convenient to state formally the rules of the [conditional]
probability calculus. They are Rule 1 (convexity), Rule 2 (addition;
Cromwell’s rule), and Rule 3 (multiplication). It is easy to extend
the addition rule, for two events, to a finite number of events. I
prefer to use De Finetti’s Rule 4 (conglomerability) to justify
addition for an infinite sequence of events. Conglomerability is in the
spirit of a class of rules known as ‘sure things’. It is ‘easy to
verify’ that the [horribly complicated] Rule 4 follows from rules
1-3 when the partition is finite. [Is it?]
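In symbols (my notation, not Lindley’s exact statement), the three finite rules read:

```latex
% Rules 1-3 of the conditional probability calculus, as summarised above
\begin{align*}
&\text{Rule 1 (convexity):} && 0 \le P(A \mid C) \le 1, \qquad P(A \mid A) = 1,\\
&\text{Rule 2 (addition):} && P(A \cup B \mid C) = P(A \mid C) + P(B \mid C)
  \quad \text{for exclusive } A, B,\\
&\text{Rule 3 (multiplication):} && P(A \cap B \mid C) = P(A \mid B \cap C)\, P(B \mid C).
\end{align*}
```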
6. Significance and
Confidence: Statisticians DO use measures of uncertainty that do
NOT combine according to the rules of probability calculus. Let H
denote a (null) hypothesis. Then a statistician may advise the
client to use a significance level [Dennis means p-value or
significance probability] that is, assuming that H is true, the
probability of the observed, or more extreme, data is calculated.
This usage flies in the face of the arguments above which assert
that uncertainty about H needs to be measured as a probability of H.
This is an example of the prosecutor’s fallacy.
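The contrast Lindley draws can be made concrete with a toy calculation (all numbers invented for illustration): even when the data are ‘significant’ under H, the posterior probability of H can remain substantial if H is plausible a priori and the alternative explains the data only modestly better.

```python
# Toy contrast between P(data | H) and P(H | data).
# All numbers are invented for illustration.

prior_H = 0.5              # prior probability of the null hypothesis H
p_data_given_H = 0.04      # probability of data this extreme under H
p_data_given_not_H = 0.10  # probability of the same data under the alternative

# Bayes' theorem: P(H | data)
posterior_H = (p_data_given_H * prior_H) / (
    p_data_given_H * prior_H + p_data_given_not_H * (1 - prior_H))

print(posterior_H)  # about 0.286: H is far from refuted, despite the 0.04
```

Equating the 0.04 with the probability that H is true is exactly the prosecutor’s fallacy the text refers to.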
If a parameter θ is
uncertain, a statistician will typically recommend a confidence
interval. The development above based on measured uncertainty will
use a probability density for θ, and perhaps an interval of that
density. Again we have a contrast similar to the prosecutor’s fallacy.
Statisticians have paid
inadequate attention to the relationships between the statements
that they make and the sample sizes on which they are based.
How do you combine
several data sets concerning the same hypothesis, each with its own significance level?
The conclusion that
probability is the only measure of uncertainty is not just a pat on
the back but strikes at many of the basic statistical activities.
7. Inference: Let the parameter be (θ, α), where α is a nuisance parameter, and let x be the observation. The uncertainty in x needs to be described probabilistically. Let p(x | θ, α) denote the probability of x given θ and α, and describe the uncertainty in the parameter by a probability p(θ, α). The revised uncertainty in the light of the data can then be evaluated from the probability calculus as the probability p(θ, α | x) ∝ p(x | θ, α) p(θ, α), the constant of proportionality depending upon x, not the parameters. Since α is not of interest it can be eliminated by the probability calculus to give p(θ | x), which evaluates the revised uncertainty in θ. This unfortunate terminology is accompanied by some other which is even worse; p(θ) is often called the prior distribution, and p(θ | x) the posterior.
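The steps just described amount to:

```latex
% Posterior from prior and likelihood, then elimination of the nuisance alpha
\begin{align*}
p(\theta, \alpha \mid x) &\propto p(x \mid \theta, \alpha)\, p(\theta, \alpha),\\
p(\theta \mid x) &= \int p(\theta, \alpha \mid x)\, d\alpha .
\end{align*}
```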
The main protest against the Bayesian position is that the inference is subjective. The remaining sections of Lindley’s paper, which contain further quite influential content, were entitled:
8. Subjectivity 9. Models 10. Data Analysis 11. Models again 12. Optimality 13. The likelihood principle 14. Frequentist concepts 15. Decision Analysis 16. Likelihood principle (again) 17. Risk 18. Science 19. Criminal Law.
During his conclusions, Dennis writes:
The philosophy here has three fundamental tenets: first, that uncertainties should be described by probabilities; second, that consequences should have their merits described by utilities; third, that the optimal decision procedure combines the probabilities and utilities by calculating expected utility and then maximising that.
These tenets were of
course adopted by Raiffa and Schlaifer as long ago as 1961, and
embraced with open arms by Dennis since about 1955. Dennis’s ideas
on utility refer to Leonard ‘Jimmie’ Savage’s 1954 balls-up of an
entreaty The Foundations of Statistics.
Dennis’s suggestions on
modeling pay lip service to Box’s celebrated 1980 paper. See also my 1979 Valencia 1 contribution, where I emphasise the
necessity for the statistician to interact with the client in
relation to the scientific background of the data. The discussion of
criminal law in Dennis’s section 19 quite surprisingly and
unequivocally advocates the Bayes factor approach described in 1991
in Colin Aitken’s and D.A. Stoney’s book The Use of Statistics in Forensic Science.
During the discussion of
Dennis’s paper the issue was raised as to whether he was addressing
‘statistical probability’ as opposed to real applied statistics. It
would have certainly been good to see some more numbers. While of
historical significance, Dennis’s treatise does not, as might have been anticipated, spell out a novel position for the twenty-first century, but rather encourages us to reflect on long-established ideas from the past.
In the same year as
Lindley’s historical presentation to the Royal Statistical Society,
Fernando Quintana, one of our happy Bayesians from Chile, and
Michael Newton wrote a much more important paper in JASA which
concerned computational aspects of semi-parametric Bayesian analysis
together with applications to the modeling of multiple binary
sequences. The contrast, in terms of statistical quality and scientific impact, with Lindley’s paper could not have been greater.
Two interesting applications
in archaeology were published during the year 2000.
Yanan Fan and Stephen Brooks, then working at the University of Bristol, published a
fascinating analysis in the Statistician of sets of
prehistoric corbelled tomb data which were collected from a variety
of sites around Europe. They investigated how earlier analyses of
tomb data, e.g. by Caitlin Buck and her co-workers, where structural
changes were anticipated in the shape of the tomb at various depths,
could be extended and improved by considering a wider class of models. The authors also investigated the extent to which these
analyses may be useful in addressing questions concerning the origin
of tomb building technologies, particularly in distinguishing
between corbelled domes built by different civilisations, as
well as the processes involved in their construction.
Fan and Brooks found no
evidence to dispute or support a previous claim that a β slope
parameter could be used to distinguish between domes of different
origins through a comparison of their shape. By considering a range
of sampling models, they showed that previous analyses may have been
misleading in this regard.
The authors analysed
radius data taken above the lintel of the tomb, firstly by
considering the Cavanagh-Laxton formulation which takes the
log-radii to depend upon corresponding depths in a three-parameter non-linear regression model with i.i.d. normal error
terms. They also investigated change-point modifications to the
Cavanagh-Laxton model which permitted the two parameters within its
regression component to change at different depths.
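As a toy illustration of the change-point idea (with invented, simulated data, and not the authors’ actual Cavanagh-Laxton fit), one can compare a single straight-line regression of log-radius on depth against a two-segment version whose parameters change at an unknown depth, scoring each by AIC:

```python
import math
import random

# Toy change-point comparison for log-radius vs depth regressions.
# The data are simulated and the model is illustrative only, not the
# Cavanagh-Laxton model or the authors' tomb data.

random.seed(1)
depths = [float(d) for d in range(30)]
true_cp = 15.0
log_radius = [2.0 - 0.05 * d if d < true_cp
              else 2.0 - 0.05 * true_cp - 0.15 * (d - true_cp)
              for d in depths]
log_radius = [y + random.gauss(0.0, 0.02) for y in log_radius]

def ols_rss(xs, ys):
    """Residual sum of squares for a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def aic(rss, n, k):
    """Gaussian AIC up to a constant: n*log(rss/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

n = len(depths)
aic_line = aic(ols_rss(depths, log_radius), n, 3)   # slope, intercept, variance

# Change-point model: separate lines before and after a candidate depth.
best = None
for cp in range(3, n - 3):                          # a few points per segment
    rss = (ols_rss(depths[:cp], log_radius[:cp])
           + ols_rss(depths[cp:], log_radius[cp:]))
    cand = aic(rss, n, 6)                           # two lines, variance, change point
    if best is None or cand < best[0]:
        best = (cand, cp)

aic_cp, cp_hat = best
print(aic_line, aic_cp, depths[cp_hat])  # the change-point model wins here
```

The grid search over the change depth stands in for what the authors did by MCMC; the point is simply that model comparison (here by AIC, as suggested below) can adjudicate between the single-line and change-point formulations.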
The authors were able to avoid detailed algebraic prior-to-posterior analyses by immediately referring to MCMC and its generalisations. This enabled them to
apply a complicated version of acceptance sampling which calculated
the posterior probabilities for several choices of their sampling
models; their posterior inferences were doubtlessly quite sensitive
to the prior assumptions within any particular model.
Fan and Brooks reported
their conditional posterior inferences for the four top models for
their Stylos data set, and their posterior model
probabilities and conditional posterior inferences for the
Nuraghe, Achladia and Dimini data.
This was a fascinating
Bayesian analysis which, however, deserves further practical
investigation, e.g. using maximum likelihood and AIC. It is
important not to be blinded by the Metropolis-Hastings algorithm to
the point where it could snow over and blur both conceptual
inadequacies and your statistical conclusions, or encourage you into
employing over-complex sampling models.
Caitlin Buck and Sujit
Sahu of the Universities of Cardiff and Southampton were a bit
cagier when they were analysing a couple of two-way contingency
tables, which relate to refuse mounds in Awatovi, Arizona, and to
Mesolithic stone tools from six different sites in Southern England.
One of their objectives in their paper in Applied Statistics
was to give insights about the relative dates of deposition of
archaeological artifacts even when stratigraphic information is unavailable.
The first of the
contingency tables cross-classifies proportions of five different
pottery types against layer. The second table cross-classifies seven
different types of microlith against site. Both tables would have
benefited from a preliminary log-linear interaction analysis since
this would have helped the authors to assess the strength of
information, and practical significance, in the data, before
disguising it within a specialised choice of model.
The authors instead
referred to an extension of a Robinson-Kendall (RK) model which
imposes serial orderings on the cell probabilities, and to a
hierarchical Bayesian model for seriation. They completed their
posterior computations using a hybrid Monte Carlo method based on
Langevin diffusion, and compared different models by reference to
posterior predictive densities. They justified their model
comparison procedure by reference to an entropy-like divergence measure.
The authors showed that the Awatovi data is better modelled using the RK model, but that both models give the same ordering d e f g h i j of the layers.
For the stone tool
data, the extended RK model and a hierarchical canonical correlation
model give quite different orderings of the six sites. The
co-authors surmise, after some deliberation, that the RK model gives
the better fit.
Buck and Sahu thereby
concluded that the point estimates of the relative archaeological
orders provided by erstwhile non-Bayesian analyses fail to capture
other orderings which, given that the data are inherently noisy,
could be considered appropriate seriations. A most impressive
applied Bayesian analysis, and set of conclusions! The authors were
extremely careful when choosing the particular Bayesian techniques
which were best suited to their own situation. They, for example,
steered clear of the Bayes factor trap and the related
super-sensitive posterior probabilities for candidate sampling models.
Caitlin Buck is
Professor of Statistics at the University of Sheffield. She has
published widely in the areas of archaeology, palaeo-environmental
sciences, and scientific dating, and she is a leader in her field.
In his JASA vignette ‘Bayesian Analysis: A Look at Today and
Thoughts of Tomorrow’, Jim Berger catalogues successful Bayesian
applications in the areas of archaeology, atmospheric sciences,
economics and econometrics, education, epidemiology, engineering,
genetics, hydrology, law, measurement and assay, medicine, physical
sciences, quality management, and social sciences.
Berger rolling out the barrel
Jim also provides the reader with an
excellent reference list for Bayesian developments in twenty
different areas, most controversially ‘causality’, but also
including graphical models and Bayesian networks, non-parametrics
and function estimation, and testing, model selection, and variable selection.
Jim agrees with many
statisticians that subjective probability is the soul of Bayesian
Statistics. In many problems, use of subjective prior information is
clearly essential, and in others it is readily available; use of
subjective Bayesian analysis for such problems can provide dramatic gains.
Jim also discusses the
non-objectivity of ‘objective’ Bayesian analysis, together with
robust Bayesian, frequentist Bayes, and quasi-Bayesian analysis. I
guess I can’t see too much difference between his objective Bayes
analysis and quasi-Bayes analysis since both refer to vague priors.
Jim Berger concludes
with his reflections on Bayesian computation and software.
The ASA’s vignettes for
the year 2000 (see the December 2000 issue of JASA) also include
very informative items by other authors on statistical decision
theory (Larry Brown), MCMC (Olivier Cappé and Christian Robert),
Gibbs sampling (Alan Gelfand), the variable selection problem (Ed
George), hierarchical models (James Hobert), and hypothesis testing
and Bayes factors (John Marden), a wonderful resource.
In 2000, Ehsan Soofi and his
three co-authors published a paper on maximum entropy Dirichlet
modeling of consumer choice, in the Proceedings of Bayesian
Ehsan Soofi is
Distinguished Professor of Business Statistics at UW Milwaukee. His
research focuses on developing statistical information measures and
showing their use in economic and business applications.
Ehsan once encouraged me
to purchase a new tie while we were attending a Bayesian seminar.
In the same year, Mark
Berliner, Christopher Wikle and Noel Cressie co-authored a
fascinating paper ‘Long-lead prediction of Pacific SST’s via
Bayesian dynamic models’ in the Journal of Climate.
The following important
Bayesian papers were published in Royal Statistical Society journals
during the year 2000:
A Bayesian Lifetime Model for the ‘Hot 100’ Billboard Songs (Eric Bradlow and Peter Fader)
Hierarchical Bayesian Modeling: Tropical Ocean Surface Winds
(Christopher Wikle et al)
Regression on Curves with Application to a Spectroscopic Calibration
Problem (Philip Brown, Tom Fearn, and Marina Vannucci)
Approach to Improve Wavelet Thresholding for Image Noise Reduction (Maarten
Jansen and Adhemar Bultheel)
Evolutionary Monte Carlo with Applications to Bayesian Mixture
Models (Faming Liang and Wing Hung Wong)
A Bayesian Time-Course
Model for Functional Magnetic Resonance Data (Christopher Genovese)
Bayesian Regression Modeling with Interaction and Smooth Effects (Paul Gustafson)
Inference for Dynamic Mixture Models (Richard Gerlach, Chris Carter
and Robert Kohn)
Quantifying Expert Opinion in the UK Water Industry: An Experimental Study (Paul Garthwaite and Tony O’Hagan)
The diversity of the
authors, their subject matter, and their countries of origin, is
most impressive. The ever-evolving Bayesian spatio-temporal process models were also well represented.
In their discussion papers
in JASA in 2000, Christopher Genovese investigated his
Bayesian time-course model for functional magnetic resonance data,
and Phil Dawid considered causal inference without counterfactuals,
and I’m sure that he was glad to do without them.
Also in JASA in
2000, Balgobin Nandram, Joe Sedransk and Linda Williams Pickle
described their Bayesian Analysis for chronic obstructive pulmonary
disease, and David Dunson and Haibo Zhou proposed a Bayesian model
for fecundability and sterility. Way to go, guys!
In 2000, Orestis
Papasouliotis described a Bayesian spatio-temporal model in the
fifth chapter of his University of Edinburgh Ph.D. thesis. Orestis
used this to develop a search procedure for discovering efficient
pairs of injector and producer oilwells in hydrocarbon reservoirs where the geological rifts may create a propensity for long-term correlations.
Orestis’ endeavours led
to our two international patents with Ian Main FRSE, Professor of
Seismology and Rock Physics in our Department of Geology and
Geophysics. Ian also enjoys playing the guitar, and singing in
sessions in Edinburgh’s Old Town.
Ian Main FRSE
Orestis and I co-authored
four papers with Ian Main and his colleagues in the Geophysics
literature, in 1999, 2001, 2006 and 2007, the first two on topics
relating to the activity of earthquakes. So I guess we turned Ian
into a Bayesian seismologist and geophysicist.
He is currently developing a fully Bayesian
method for earthquake hazard calculation with Richard Chandler of UCL.
The fourth of these
articles was published in Structurally Complex Reservoirs by the
Geological Society of London. Kes Heffer, of the Institute of
Petroleum Engineering at Heriot-Watt was one of the co-authors of
our 2006 and 2007 publications. He was formerly a leading researcher with BP.
Orestis Papasouliotis is a very accomplished M&S scientist with Merck Serono
Pharmaceuticals in Geneva. He visited me again in Edinburgh this
year, and he hopes to fly over again soon with his wife and daughter
to visit the penguins in the zoo.
Orestis Papasouliotis, with his wife and daughter
By the inception of Anno
Domini 2001, a fresh generation of Bayesians was already moving
way ahead of the quasi-religious practices and cults of the past, as
they rose to help meet the social challenges of our economically
volatile, and medically and genetically challenged era.
The quality of the new
era material was re-emphasised as the century turned. In their paper
in the Journal of Computational Biology, Michael Newton,
Christina Kendziorski, Craig Richmond, Frederick Blattner and
Kam-Wah Tsui reported their improved statistical inferences concerning gene expressions from microarray data. An inspired
effort from five free spirits of Bayesian persuasion.
They modelled intensity levels R and approximate target values G by
independent Poisson variates, conditionally on different means and
the same coefficient of variation c. Then the ratio of the means ρ
is the parameter of interest. They take a hierarchical approach
where the conditional means are assumed to be independent and
identically Gamma distributed. The marginal posterior distribution
of ρ, given R and G, is then the distribution of the ratio of two
updated Gamma variates. Moreover, the posterior mean of ρ may be
described as a differential expression of the form
(R+ν) / (G+ν)
where ν reflects three expressions of interest, including the prior predictive mean of R.
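A hedged numerical sketch of this conjugate Gamma-Poisson setup (with invented counts R and G and invented hyperparameters, using a common Gamma prior on both means rather than the authors’ exact parameterisation): each posterior mean is again Gamma, the posterior mean of the ratio ρ is available in closed form, and Monte Carlo simulation of the ratio of two updated Gamma variates confirms it.

```python
import random

# Conjugate Gamma-Poisson sketch (illustrative numbers, not the
# authors' heat-shock or E. coli data). R and G are observed counts;
# their means get independent Gamma(a, rate b) priors.
random.seed(7)
R, G = 30, 20
a, b = 2.0, 1.0

# Posterior: mean_R | R ~ Gamma(a + R, rate b + 1), similarly for G.
# Exact posterior mean of rho = mean_R / mean_G, using
# E[1/Y] = rate / (shape - 1) for a Gamma variate Y with shape > 1.
exact = ((a + R) / (b + 1.0)) * ((b + 1.0) / (a + G - 1.0))

# Monte Carlo check: ratio of two updated Gamma draws
# (random.gammavariate takes a shape and a SCALE = 1/rate).
draws = [random.gammavariate(a + R, 1.0 / (b + 1.0)) /
         random.gammavariate(a + G, 1.0 / (b + 1.0))
         for _ in range(200_000)]
mc = sum(draws) / len(draws)

print(exact)  # 32/21, roughly 1.524
print(mc)     # close to the exact value
```

No MCMC is needed anywhere: as the text notes below, the whole analysis is available in closed form, and the simulation here is only a check.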
Michael Newton and his
co-authors estimate c and the two prior parameters pragmatically,
using marginal likelihood, and then apply their methodology to
heat-shock and E. coli data. They furthermore refer to some simple extensions.
A wonderfully succinct
analysis which did not require any simulations at all, since exact
algebraic expressions were available using probability calculus.
This sort of analysis is hopefully not becoming a lost art.
In their 2001 paper in Biometrics, James Albert and Sid Chib
applied their sequential ordinal modeling methodology to the
analysis of survival data.
In 2002, Sid co-authored
a paper in the Journal of Econometrics with Federico Nardari
and Neil Shephard which highlighted even more MCMC procedures for
stochastic volatility models.
Siddhartha Chib is the
Harry C. Hartford Professor of Econometrics at Washington University
in St. Louis, and has a fine history of applying Bayesian methods,
e.g. to Economics, very much in the Arnold Zellner tradition. In
2008, he and Edward Greenberg published a review of hierarchical
Bayes modeling in the New Palgrave Dictionary of Economics.
Formerly a don at Oxford, Neil Shephard is now Professor of Statistics and
Economics at Harvard University. His work is occasionally Bayesian,
though he does sometimes experience technical difficulties when
constraining his unknown covariance matrices to the interior of the parameter space.
In 2001, Gareth Roberts
co-authored a paper with Jesper Møller and Antonietta Mira in
JRSSB about perfect slice samplers. I hope he spiced them with
In 2004, Gareth Roberts,
Omiros Papaspiliopoulos and Petros Dellaportas wrote another famous
paper in JRSSB, this time about Bayesian Inference for
non-Gaussian Ornstein-Uhlenbeck stochastic volatility processes.
These processes are likely to handle stochastic volatility pretty
well because of their elasticity properties. I developed a
log-Gaussian doubly stochastic Poisson version of them in 1978, and applied it to the flashing Green Man pedestrian crossing.
Gareth Roberts F.R.S. is distinguished for his work spanning Applied Probability, Bayesian Statistics and Computational Statistics. He has made fundamental contributions to convergence and stability theory, extensions
to the Metropolis-Hastings algorithm and adaptive MCMC, infinite
dimensional simulation problems, and inference in stochastic
processes. His work has found application in the study of epidemics
such as Avian influenza and foot and mouth disease.
As Professor of
Statistics and Director of CRiSM at the University of Warwick,
Gareth is one of the most distinguished Bayesians to have graced
that now world-renowned institution. He obtained his Ph.D. there in
1988 on the topic ‘Some boundary hitting problems for diffusion
processes’ and has taken the Bayesian paradigm to fresh heights in
the quarter of a century since.
Petros Dellaportas is
Professor of Statistics and Economics at the University of Athens.
He has developed and computed Bayesian solutions for a variety of
complex problems, including hierarchical, GARCH, and generalised linear models.
Omiros Papaspiliopoulos was awarded his Ph.D. in 2003 at the University of Lancaster. His
thesis topic was ‘Non-centred parameterizations for hierarchical
models and data augmentation’.
Formerly an Assistant
Professor in Statistics at the University of Warwick, Omiros is
currently ICREA Research Professor in the Department of Economics at
Universitat Pompeu Fabra.
Omiros received the Guy
Medal in Bronze from the Royal Statistical Society in 2010. He is
one of our up and coming young stars, with a name to fit.
Also in 2001, Chris Glasbey
and Kanti Mardia presented their seminal, effectively Bayesian paper
‘A penalised likelihood approach to image warping’ to a full meeting
of the Royal Statistical Society. The authors achieved new frontiers
in Image Analysis by identifying a new Fourier-Von Mises model with
phase differences between Fourier-transformed images having Von
Mises distributions. They used their a posteriori smoothing
procedures (a) to register a remote-sensed image with a map, (b) to align microscope images from different optics, and (c) to
discriminate between different images of fish from photographic
images. Even Chris Glasbey was impressed.
Doubtlessly one of
Britain’s and India’s most brilliantly productive mathematical
statisticians, Kanti Mardia has made a number of important Bayesian
and effectively Bayesian contributions. After a pre-eminent career,
he is currently taking time out as Senior Research Fellow in the
Department of Mathematics at the University of Leeds. While other
leading statisticians are better at self-promotion, Kanti deserves
all the accolades that our profession can give him.
Kevin Patrick Murphy was an
Associate Professor in Computer Science and Statistics at the
University of British Columbia until 2012, but now works as a
research scientist for Google. In 2001 he published the three
somewhat Bayesian papers,
Linear Time Inference in Hierarchical HMMs (with Mark Paskin), in Neural Info. Proc. Systems.
The Factored Frontier Algorithm for Approximate Inference in DBNs (with Yair Weiss), in Uncertainty in Artificial Intelligence.
Particle Filtering for Dynamic Bayesian Networks (with Stuart
Russell). In Sequential Monte Carlo Methods in Practice
Kevin Murphy has published
extensively in A.I., Machine Intelligence, Bayesian Statistics, and
probabilistic graphical models, with applications to information
extraction, machine reading, knowledge base construction, computer
vision and computational biology.
While his remarkable 2012 book Machine Learning: A Probabilistic Perspective is more Bayesian than some approaches to machine intelligence, he is
more concerned about frequency properties than most other Bayesians
working in this area. Good for him!
Radford Neal, another
Canadian Bayesian Statistician and Machine Intelligence expert,
reported his work on annealed importance sampling in 2001 in
Statistics and Computing. This is one of several very useful
papers which Radford has published on Bayesian simulation.
In their JASA papers
in 2001, William Bolstad and Samuel Manda described their Bayesian
investigation of child mortality in Malawi which referred to family
and community random effects, our doughty Durham friends Peter
Craig, Michael Goldstein, Jonathan Rougier and Allan Seheult told us
all about their Bayesian forecasting for complex systems, Peter
Westfall and Keith Soper used their priors to improve animal carcinogenicity tests, and three musketeers from the Orient, namely
Hoon Kim, Dongchu Sun and Robert Tsutakawa followed in the footsteps
of Hickman and Miller by proposing a bivariate Bayesian method for
estimating mortality rates with a conditional autoregressive model.
Enrique González and Josep Ginebra Molins of the Polytechnic University of Catalonia
published their book Bayesian Heuristics for Multi-Period Control
in 2001, after working on lots of similarly impressive Bayesian
research in Barcelona.
In that same year, Aaron
Ellison’s book An Introduction to Bayesian Inference for
Ecological Research and Environmental Decision Making was
published on-line on JSTOR.
Ludwig Fahrmeir and Stefan Lang reported their Bayesian semi-parametric analysis of multi-categorical time-space data in the Annals of the Institute of Statistical Mathematics. They applied their
methodology most effectively to the analysis of monthly unemployment
data from the German Federal Employment Office, and they reported
some of their results spatially as well as temporally on maps of Germany.
Fahrmeir and Lang also
reported their Bayesian inferences for generalised additive mixed
models based on Markov random field priors in Applied Statistics.
They use discretized versions of prior processes which could
alternatively be representable via the mean value functions and
covariance kernels of Gaussian processes, and may, if necessary,
incorporate spatial covariates. Their prior to posterior analysis is
completed by MCMC inference.
The authors apply their
methodology to forest damage and to duration of employment data.
While their theory is fairly routine and unembellished, the
applications are interesting.
In the same issue of
Applied Statistics, David Dunson and Gregg Dinse of the U.S.
National Institute of Environmental Health Sciences in Research
Triangle Park report their Bayesian incidence analysis of
tumorigenicity data. In most animal carcinogenicity experiments,
tumours are not observable in live animals, and censoring of the
tumour onset times is informative. Dunson and Dinse focus on the
incidence of tumours and censored onset times without restricting
tumour lethality, relying on cause-of-death data, or requiring
interim sacrifices.
The authors’ sampling
model for the four observable outcomes at each death time combines
multistate stochastic, probit, and latent variables assumptions, and
also model covariate effects. Their prior distributions are elicited
from experts in the subject area and refer also to a meta-analysis
which employs a random effects model. These complex assumptions are
applied to a triphosphate study, yielding some interesting posterior
inferences via Gibbs sampling.
This is a top-quality,
though highly subjective, investigation which does not refer to
model comparison criteria. The method adjusts for animal survival
and tumour lethality through a multi-state model of tumorigenesis
and death. If I had refereed this paper then I would have advised
the authors to check out their assumptions and conclusions a bit
more in empirical terms, just in case they became the state of the
art.
David Dunson is, like
Jim Berger, currently an Arts and Sciences Distinguished Professor
in the Department of Statistical Science at Duke University. His
Bayesian research interests focus on complex medical data sets and
machine learning applications, and include image and shape analysis.
I once met Gregg Dinse
in Wisconsin. An extremely charming man, he is good at interacting
with subject matter experts, and is also a very prolific applied
statistician.
Murray Aitkin's article 'Likelihood and Bayesian Analysis of Mixtures' was published
in Statistical Modelling in 2001. Murray is nowadays
Professorial Fellow in the Department of Statistics at the
University of Melbourne. He has published many highly innovative
papers in Bayesian areas.
In their path-breaking 2002 invited paper to the Royal Statistical
Society, David Spiegelhalter, Nicky Best, Brad Carlin and Angelika
van der Linde proposed a Bayesian measure of model complexity and
fit called DIC (the Deviance Information Criterion) that
facilitates the comparison of various choices of sampling model for
a specified data set.
Let L(θ) denote the
log-likelihood for your p×1 vector θ of parameters when a
particular sampling model, with p parameters, is assumed to be true,
and consider the deviance

D(θ) = −2L(θ) + C,

where C is an arbitrary
constant which cancels out in the calculations. Let ED denote the
posterior expectation of the deviance, subject to your choice of
prior assumptions for θ. Then ED measures how well the model under
consideration fits the data.
The effective number
of parameters is, by definition,

q = ED − D(ξ),

where ξ is some
convenient-to-calculate Bayes estimate for θ, such as the posterior
mean vector, if it exists, or, much more preferably, the
vector of posterior medians (which gives invariance of q under
non-linear transformations of the parameters). Spiegelhalter et al
quite brilliantly suggest referring to

DIC = ED + q,

which penalises ED, or equivalently D(ξ),
according to the number of effective parameters in the model.
At the model comparison
stage of your analysis, you could consider using a vague, e.g.
Jeffreys or reference prior for θ, and only referring to your
informative prior assumptions when considering your model-based
inferences, since their elicitation and application might only serve
to confuse this pre-inferential procedure. If you are considering
several candidate models, then simply choose the one with the
smallest DIC, maybe after cleaning-up the data using an exploratory
data analysis. If q is close to p, and ξ is close to the maximum
likelihood vector of θ, then DIC will be well-approximated by
Akaike’s criterion AIC. However, when considering models with
complex likelihoods, DIC will sometimes be easier to calculate e.g.
using Importance Sampling, MCMC or acceptance sampling.
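The DIC recipe above is easy to apply once posterior simulations are available. The following sketch, for an illustrative one-parameter normal model with known variance and a conjugate prior (my own toy example, not Spiegelhalter et al's code), computes ED, q and DIC directly from posterior draws:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: N(theta, 1) sampling model with unknown mean theta.
y = rng.normal(0.5, 1.0, size=50)
n = len(y)

# Conjugate N(0, 10^2) prior gives a normal posterior for theta.
post_var = 1.0 / (n + 1.0 / 10.0 ** 2)
post_mean = post_var * y.sum()
theta = rng.normal(post_mean, np.sqrt(post_var), size=20_000)  # posterior draws

def deviance(th):
    # D(theta) = -2 L(theta) + C, taking the arbitrary constant C = 0.
    loglik = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y[:, None] - th) ** 2, axis=0)
    return -2.0 * loglik

ED = deviance(theta).mean()            # posterior expected deviance
xi = np.median(theta)                  # invariant plug-in estimate of theta
q = ED - deviance(np.array([xi]))[0]   # effective number of parameters
DIC = ED + q
print(round(q, 2), round(DIC, 1))
```

For this one-parameter model q should land close to 1, and replacing the posterior median ξ with the posterior mean makes little difference here, though the median is the invariant choice recommended above.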
Spiegelhalter et al
justify DIC via convincing, though approximate, information
theoretic arguments which refer to Kullback-Leibler divergence.
Similar arguments would appear to hold when ED is replaced by the
posterior median of the deviance; this parallels taking ξ to denote
the posterior median vector of θ.
Since reference priors
maximise Lindley’s expected measure of information it would appear
natural, in principle at least, to use them when calculating DIC.
However, it is often easier to express Jeffreys’ invariance prior in
algebraic terms, or to use some other sensible form of vague prior.
The Deviance Information Criterion has greatly enhanced the Bayesian paradigm
during the course of the last decade. It has taken Bayesians into
the 'unconstrained-by-restrictive-sampling-models' ballpark
envisioned by George Box in 1980, and enables us, including the
Economists, to determine scientifically meaningful,
parameter-parsimonious sampling models, e.g. as special cases of a
larger all-inclusive model.
An inferential model
comparison procedure which refers to the full posterior
distributions of the deviances is discussed below in Author’s
Notes. In connection with this, it might be possible to develop
alternatives, also relating to cross-validation, to Box’s 1980
overall model-checking criterion, which he believed to be
appropriate for situations where you have no specific alternative
model in mind.
The deviance information
criterion DIC provides Bayesians with a particularly useful and
straightforward way of applying Occam's razor. See Jefferys
and Berger for a historical discussion of the razor; they quote the
fourteenth-century English Franciscan
friar, logician, physicist and theologian William of Ockham, who
first envisioned the philosophies of the razor,
'Pluralitas non est ponenda sine necessitate'
(Plurality is not to be posited without necessity)
I wonder whether the
Bayesian Goddess has already elevated the four discoverers of DIC to
immortality, as the Four Horsemen of Ockham's Apocalypse.
But perhaps the Goddess
of Statistics should wait and see. DIC is not applicable to all
models, as demonstrated by Sylvia Richardson and Christian Robert in
the discussion of the 2002 paper, and by Angelika van der Linde in
her 2005 paper on DIC in variable selection, in Statistica
Neerlandica.
Angelika puts DIC into
context and points to some possible alternatives. One problem is
that we do not, as yet, have an established and elaborate theory for
the estimation of information theoretic quantities like
Kullback-Leibler divergence. In their 2008 paper in Statistics,
Van der Linde and Tutz illustrate this problem for the coefficient
of determination for regression models, and this can also be reasonably
related to Kullback-Leibler diagnostics.
Gerhard Tutz is a
Professor of Statistics at the University of Munich. He has
published many high quality papers in Bayes-related areas.
In his 2002 paper ‘On irrelevance of alternatives and opinion
pooling’ in the Brazilian Journal of Probability and
Statistics, Gustavo Gilardoni of the University of Brasilia
considered the implications of two modified versions of the
‘irrelevance of alternatives’ axiom. The consequences included a
characterization of the Logarithmic Opinion Pool. This looks very
promising.
In 1993, Gustavo
published an article in the Annals of Statistics with Murray
Clayton on the reaching of a consensus using DeGroot's iterative
pooling procedure.
Phil Dawid, Julia
Mortera, V. Pascali and D. van Boxel reported their probabilistic expert
systems for forensic evidence from DNA profiling in 2002 in the
Scandinavian Journal of Statistics, another magnificent piece of
work.
Julia Mortera is
Professor of Statistics at the University of Rome. She has published
numerous high quality papers on Bayesian methods, including a number
of joint papers with Phil Dawid and other authors, including Steffen
Lauritzen, on Bayesian inference in forensic identification. She is
one of our leading applied women Bayesians, and a very pleasant lady
too. Maybe I should compose a sonnet about her. The Forensic
Empress of Rome, maybe.
Persi Diaconis and Susan
Holmes of Stanford University took a Bayesian peek into Feller
Volume 1 in 2002 in their fascinating article in Sankhyā,
and developed Bayesian versions of three classical problems: the
birthday problem, the coupon collector’s problem, and the matching
problem. In each case the Bayesian component involves a prior on the
underlying probability mechanism, which could appreciably change the
classical answers.
Persi had previously
published his finite forms of De Finetti’s beautiful exchangeability
theorem (De Finetti’s theorem was praised in one of William Feller’s
treasured footnotes in Feller Volume 2), a fundamental paper with
David Freedman on Bayesian consistency, and a dozen De Finetti-style
results in search of a theory.
In 2011, Persi Diaconis
and Ron Graham were to co-author Magical Mathematics: The
Mathematical Ideas that Animate Great Magical Tricks.
Persi is a wonderful statistical probabilist and a great magician.
In 2002, Ming-Hui Chen,
David Harrington and Joseph Ibrahim described some useful Bayesian
cure rate models for malignant melanoma, in the online Wiley
library.
I'm quite interested in
all this, since it’s now fully two years since an artistic-looking
mole was removed from my wrist, leaving a dog-bite shaped scar
which I’ve recently flashed to a couple of eminent audiences during
my ongoing public health campaign.
The models included:
1. A piecewise
exponential model: The authors proposed a semi-parametric
development based on a piecewise constant hazard version of the
proportional hazards model. The degree of nonparametricity is
controlled by J, the number of intervals in the partition.
2. A parametric cure
rate model: This assumes that a certain fraction ρ of the
population are 'cured' and the remaining 1 − ρ are not cured. The
survivor function for the entire population is ρ + (1 − ρ)S(t), where
S(t) is the survivor function for the non-cured group in the
population. The authors construct an elaborate choice of S(t) which
refers to the numbers of metastatic competent tumour cells for each
of n subjects, depending on a parameter of the random times taken
for the tumour cells to produce detectable metastatic disease.
3. A semi-parametric
cure rate model: This takes the survivor function to be
representable by a piecewise hazards model. The degree of
nonparametricity is controlled by J, the number of unknown parameters
in the model.
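The survivor function in the parametric cure rate model (item 2) is simple to evaluate. In this sketch the exponential choice of S(t) is purely illustrative, standing in for the authors' elaborate metastatic-cell construction described above:

```python
import math

def population_survivor(t, rho, S):
    # Cure rate model: a fraction rho is 'cured' (never fails); the
    # remaining 1 - rho has survivor function S(t).
    return rho + (1.0 - rho) * S(t)

# Illustrative choice only: exponential survival for the non-cured group.
S_exp = lambda t: math.exp(-0.3 * t)

print(population_survivor(0.0, 0.2, S_exp))    # 1.0 at t = 0
print(population_survivor(1e6, 0.2, S_exp))    # plateaus at rho = 0.2
```

Note the characteristic feature of cure rate models: the population survivor function does not tend to zero but to the cured fraction ρ as t grows.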
For each of these
models, the authors construct a power prior to represent the
prior information concerning the unknown parameters. Their choices
are proportional to the product of a subjectively assessed beta
distribution and a power of the likelihood when it conditions on a
set of hypothetical prior observations.
The authors assess their
models by n separate CPO statistics. The i-th CPO statistic is just
the predictive density of the observed response variable for case i,
when conditioned on the remaining n-1 observed response variables.
They also refer to the average log-pseudo-Bayes factor B,
which averages the logs of the CPO statistics.
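One standard way to estimate the CPO statistics from a single MCMC run, which may or may not match the authors' exact computations, is the harmonic-mean identity CPO_i = [E{1/f(y_i | θ) | data}]^(-1). A toy normal-model sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=30)
n = len(y)

# Posterior draws for theta in a N(theta, 1) model under a flat prior.
theta = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=10_000)

# Case-wise likelihoods f(y_i | theta_s), shape (n, draws).
lik = np.exp(-0.5 * (y[:, None] - theta[None, :]) ** 2) / np.sqrt(2 * np.pi)

# CPO_i = 1 / posterior mean of 1 / f(y_i | theta): harmonic-mean identity.
cpo = 1.0 / np.mean(1.0 / lik, axis=1)

# Average log pseudo-Bayes factor statistic: the mean of the log CPOs.
B = np.log(cpo).mean()
print(cpo.shape, round(B, 2))
```

A larger B indicates better cross-validatory predictive performance, which is how the authors compare their three candidate models.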
The authors completed
Bayesian analyses of the E1690 time-to-event data for high risk
melanoma using MCMC and variations of each of the three preceding
choices of sampling model. The cure rate models performed equally
well; they fit the data slightly better than the piecewise
exponential model, according to the values of the B-statistics.
These results have considerable implications for the design of studies in
high risk melanoma.
Thank you, guys! After a
non-malignant mole on my shoulder vanished during a recent scare, my
predictive probability of a malignant recurrence is now close to
zero.
Kevin Gross, Bruce Craig and
William Hutchison published their insightful paper ‘Bayesian
estimation of a demographic matrix model from stage-frequency data’
in 2002 in Ecology.
Bruce Craig is
nowadays Professor of Statistics and director of the statistical
consulting service at Purdue. His areas of interest include Bayesian
hierarchical modeling, protein structure determination, and the
design and analysis of micro-array experiments.
The year 2002 was indeed
a good one for Bayesian applications. Carmen Fernandez, Eduardo Ley
and Mark Steel modelled the catches of cod, Greenland halibut,
redfish, roundnose grenadier and skate in a north-west Atlantic
fishery, and reported their conclusions in Applied Statistics.
Not to be outdone, David
Laws and Tony O’Hagan proposed a hierarchical Bayes model for
multi-location auditing in the on-line Wiley library. During their
somewhat adventurous presentation they introduced the notion of the
fractional error or taint of a transaction. They
proposed a complicated procedure for the elicitation of their prior
parameters, and their prior to posterior analysis involved oodles of
ratios and products of Gamma functions.
On a more theoretic
note, Stuart Barber, Guy Nason and Bernie Silverman of the
University of Bristol reported on their posterior probability
intervals for wavelet thresholding in JRSSB. They
approximated the first four cumulants of the posterior distribution
of each wavelet coefficient by linear combinations of wavelet
scaling functions, and then fitted a probability distribution to the
approximate cumulants. Their method assumed either independent
normal mixture priors for the wavelet coefficients or limiting forms
of these mixtures, and this yielded a posterior distribution which
was difficult to handle with exact computations. They however showed
that their approximate posterior credibility intervals possessed
good frequency coverage.
It is not obvious
whether the authors’ adaptive Bayesian wavelet and Bayes Thresh
choices of prior model made their posterior analysis overly
complicated. Indeed their prior formulations meant that the
posterior distribution was not overly robust to outliers. It might
be worth assuming a suitable prior distribution on function space
for the entire wavelet regression function, and then approximating
this on a linear subspace in order to develop a more sensible joint
prior distribution for the wavelet coefficients.
In their articles in
JASA in 2002, Valen Johnson, Robert Deaner and Carel van Schaik
very bravely performed a Bayesian analysis of some rank data for
primate intelligence experiments, B.M. Golam Kibria, Li Sun, Jim
Zidek and Nhu Le reported their Nostradamus-style Bayesian spatial
prediction of random space-time fields for mapping PM2.5 exposure,
and Steven Scott came out of hiding and reviewed the recent Bayesian
recursive computing methodology for hidden Markov models.
Suitably encouraged, I now
survey the RSS journals for 2003 which I’ve just discovered
languishing with my long-discarded Irvine Welsh novels in the
tottery John Lewis bookcase in my spare room.
Alexandra Schmidt and
Tony O’Hagan co-authored an important paper about Bayesian
inferences for non-stationary covariance structures via spatial
deformations.
Ian Dryden, Mark
Scarr, and Charles Taylor report their Bayesian texture
segmentation of weed and crop images using reversible jump MCMC
methods. They model their pixel intensities using second order
Gaussian Markov random fields and the second-order stationary Potts
model. They take the number of textures in a particular image to
have a prior truncated Poisson distribution, and compute some
interesting, though not that detailed, statistically smoothed onion,
carrot, and sugar-beet images. Their trace plots for the simulated
posteriors are not entirely convincing.
Maura Mezzetti of
the University of Roma Tor Vergata and her five worthy co-authors
propose a Bayesian compartmental model for the evaluation of
1,3-butadiene (BD) metabolism. This refers to three differential
equations which represent the quantity of BD in three compartments
as a function of time. The equations depend upon the blood flows
through the compartments, the blood-tissue specific partition
coefficients, the total blood flow, the alveolar ventilation rates,
and body weights.
Mezzetti et al propose a
hierarchical model which assigns prior distributions to the
population parameters and to the individual parameters. They fit
their proposed pharmacokinetic model to the BD data with some
interesting general conclusions.
Stephen Walker addressed the problem of sample size determination by using a
Bayesian semi-parametric approach. He expressed his prior
distribution for an unknown sampling distribution as a convolution
of Dirichlet processes and selected the optimal size n of his random
sample by maximizing the posterior expectation of a utility function
which invokes the cost in utiles of taking n observations. Stephen’s
numerical example is absolutely gobsmacking, and, given the hefty
prior specifications required to choose a single value n, I am left
to ponder about the practical importance of his approach.
Roger Hutching and his co-authors applied their Bayesian updating procedures to
data from the UK water industry. Experts are expected to express
their subjective probabilities for n binary outcomes, both as prior
estimates, and as ‘study estimates’ when more information is
available after a typically expensive study is undertaken. Moreover,
the better 'study estimates' should be sufficient for the prior
estimates.
As the joint p.m.f. of
the binary responses typically requires the specification of an
immense number of joint probabilities, the authors refer to a
‘threshold copula’ which generates dependence between the binary
responses for specified marginal distributions by taking conditional
probits to be correlated, using a multivariate normal distribution.
They then employ ‘Jeffreys conditionalization’ as an updating
procedure. Their elicitation of the probability assessments and the
correlations from experts was supervised by statisticians, though
reportedly not in ideal fashion. Finally, they completed a sound
MCMC analysis of pipe data provided by South West Water Services, a
most fascinating study.
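The 'threshold copula' construction can be sketched as follows: draw correlated latent normals and threshold each coordinate at the normal quantile of its target marginal probability. The marginals and the equicorrelation below are illustrative assumptions of mine, not the values elicited in the water-industry study:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)

p = np.array([0.3, 0.6, 0.8])      # target marginal probabilities (illustrative)
k = len(p)
corr = np.full((k, k), 0.5)        # illustrative equicorrelation of the probits
np.fill_diagonal(corr, 1.0)

# Correlated latent normals: rows ~ N(0, corr) via a Cholesky factor.
z = rng.standard_normal((200_000, k)) @ np.linalg.cholesky(corr).T

# Threshold each coordinate at Phi^{-1}(p_i), so P(x_i = 1) = p_i marginally.
thresholds = np.array([NormalDist().inv_cdf(pi) for pi in p])
x = (z <= thresholds).astype(int)  # dependent binary responses

print(np.round(x.mean(axis=0), 2))   # close to the target marginals
print(np.corrcoef(x.T)[0, 1] > 0)    # positive dependence induced
```

The appeal of the device is exactly as described in the text: the full joint p.m.f. of the binary responses never has to be elicited, only the marginals and the latent correlations.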
Fernando Quintana and Pilar Iglesias present a decision
theoretic formulation of product partition models (PPMs) that allows
a formal treatment of different decision problems such as estimation
or hypothesis testing together with clustering methods. The PPMs are
thereby constructed in the context of model selection. A Dirichlet
process prior is assumed for the unknown sampling distribution, and
the posterior inferences are used to detect outliers in a Chilean
stock-market data set. An excellent contribution from Bayesian Chile. Why
don’t you give the plucky Bolivians their corridor to the sea back,
guys? You’re hogging too much of the coast.
Maria Rita Sebastiani
uses Markov random-field models to estimate local labour markets,
using a Bayesian texture segmentation approach, and applies these
techniques to data from 287 communes in Tuscany.
Stuart Coles and Luis
Pericchi used likelihood and Bayesian techniques to estimate
probabilities of future extreme levels of a process (i.e.
catastrophes) based upon historical data which consist of annual
maximum observations and may be modelled as a random sample from a
member of the generalised extreme value (GEV) family of
distributions which possess location and scale parameters together
with a shape parameter. They in particular compute the predictive
distribution, given the historical data, of a future annual maximum
observation Z, under relatively vague, though proper, prior
assumptions for the three unknown parameters. Their maximum
likelihood and Bayes procedures were applied to good effect to a set
of rainfall data from Venezuela. While the authors’ prior to
posterior analysis amounted to an extremely simple application of
standard Bayesian techniques, the practical conclusions were both
perceptive and fascinating.
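The predictive distribution of a future annual maximum Z is just the posterior average of the GEV distribution function. In this sketch the 'posterior draws' of (μ, σ, ξ) are illustrative normal stand-ins of mine rather than output from an actual MCMC fit to the Venezuelan data:

```python
import numpy as np

def gev_cdf(z, mu, sigma, xi):
    # GEV distribution function with location mu, scale sigma, shape xi.
    t = np.maximum(1.0 + xi * (z - mu) / sigma, 1e-12)  # support truncation
    return np.exp(-t ** (-1.0 / xi))

rng = np.random.default_rng(7)
# Illustrative stand-ins for posterior draws of (mu, sigma, xi); in a real
# analysis these would be MCMC output under the proper vague prior.
mu = rng.normal(50.0, 2.0, 5_000)
sigma = np.abs(rng.normal(10.0, 1.0, 5_000))
xi = rng.normal(0.1, 0.05, 5_000)

def predictive_cdf(z):
    # Predictive P(Z <= z): average the GEV cdf over the posterior draws.
    return gev_cdf(z, mu, sigma, xi).mean()

print(round(predictive_cdf(60.0), 3), round(predictive_cdf(100.0), 3))
```

Averaging the cdf over the posterior, rather than plugging in maximum likelihood estimates, is precisely what fattens the predictive upper tail and makes the Bayesian catastrophe probabilities more cautious.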
While still on the
subject of rainfall, the Scottish BIOSS biomathematicians David
Allcroft and Chris Glasbey described how to use a latent
Gaussian Markov random-field model for spatiotemporal rainfall
disaggregation. They transform the rainfall observations to supposed
normality using an empirically estimated quadratic function of an
empirically estimated power transformation. In so doing they censor
any, albeit very informative, zero values of rainfall. Not a good
start! I’m surprised that they were allowed to publish it. The
authors then crank the MCMC handle and analyse a retrospective set
of rainfall data from the Red Basin river valley.
Yosihiko Ogata, Koichi
Katsura and Masaharu Tanemura used a Bayesian hierarchical model
on tessellated spatial regions to investigate earthquakes, which of
course occur heterogeneously in space and time. Their assumptions
generalise a space-time epidemic-type aftershock model, and they
check them out by reference to an elaborate space-time residual
analysis. The authors constructed a MAP estimate of the
non-homogeneous region Poisson intensity across a coastal region of
Japan, and a MAP estimate of their hierarchical space-time model.
This was a magnificent piece of work.
Yosihiko Ogata and his
colleagues have published many papers during the last three decades
which concern the aftershocks of earthquakes. He is on the faculty
of the Institute of Statistical Mathematics in Tokyo.
Maria Rita Sebastiani is
Professor of Economics at the Sapienza University of Rome. She
obtained her doctorate there in 1998 with a thesis on the estimation
of local labour markets ('per la stima dei mercati locali del lavoro').
Maria’s areas of
research interest include Bayesian inference, hierarchical spatial
modelling, business mortality risk, European populations and
demography, and transition probabilities to illness and dependency.
Elsewhere in 2003, William Penny and Karl Friston of the Wellcome
Trust Centre for Neuro-imaging at UCL used mixtures of generalised
linear models for functional neuro-imaging in their article in
IEEE Trans Med Imaging, and constructed some interesting
posterior probability maps.
Gary Chamberlain and
Guido Imbens reported their supposedly non-parametric applications
of Bayesian inference in 2003 in the Journal of Business &
Economic Statistics. Gary is currently the Louis Berkman
Professor of Economics at Harvard.
Moreover, Linda Garside
and Darren Wilkinson applied their dynamic lattice-Markov spatio-temporal
models to environment data in Bayesian Statistics 7, and
Harry Martz and Michael Hamada addressed uncertainty in counts and
operating time in estimating Poisson occurrence rates in their
article in Reliability Engineering & System Safety.
Japanese-American statistician Michael Hamada obtained his Ph.D.
from the University of Wisconsin-Madison during the early 1980s
where he attended my Bayesian course and worked as a project
assistant with Jeff Wu and myself at the U.S. Army’s ill-fated Math
Research Center, which was to finally get squished during the Gulf
War of 1991. Mike nowadays solves top level problems at the Los
Alamos National Laboratory in New Mexico. In 2008 he published the
outstanding book Bayesian Reliability with Alyson
Wilson, Shane Reese and Harry Martz.
Harry Martz is the
principal associate director for Global Security at Los Alamos. Now
that’s a good application of Bayesian reliability. And of addressing
uncertainty in counts.
Bradley Efron’s ASA Presidential address, delivered in Toronto
during August 2004, was entitled Bayesians, Frequentists, and
Scientists.
Professor Efron said, in
summary, that ‘My guess is that a combination of Bayesian and
frequentist ideas will be needed to deal with our increasingly
intense scientific environment.’
Brad discussed his ideas
in the context of breast cancer risk, data from an imaging scan
which quite clearly distinguished between 7 supposedly normal
children and 7 dyslexic children, and a bivariate scatterplot which
measured kidney function against age.
When concluding his
address, Professor Efron said,
‘Now the planets
may be aligning for Statistics. New technology, electronic
computation, has broken the bottleneck of computation that limited
classical statistical theory. At the same time an onrush of new
questions has come upon us, in the form of huge data sets and large
scale inference problems. I believe that the statisticians of this
generation will participate in a new age of statistical innovation
that might rival the golden age of Fisher, Neyman, Hotelling, and
Wald.'
Bradley Efron was
applauded by Frequentists and Bayesians alike, while the Goddess
Fortune hovered in the background.
In his 2004 paper in the Annals of Statistics, Stephen Walker
used martingales to investigate Bayesian consistency. He derives
sufficient conditions for both Hellinger and Kullback-Leibler
consistency, which do not rely on the use of a sieve, together with
some alternative conditions for Hellinger consistency, a splendid
piece of work.
I'm wondering whether to
write a poem about Stephen’s brave exploits. I could call it ‘Ode to
a Martingale’. There is a martingale on Berkeley Square which drives
everybody spare? Perhaps not.
Stephen is now Professor
of Mathematics at the University of Texas in Austin. His research
focuses on Bayesian parametric and semi-parametric methods, with
applications in medical statistics. He obtained his Ph.D. from
Imperial College London in 1995, where he was supervised by Jon
Wakefield.
Richard Boys and Daniel
Henderson published their Bayesian approach to DNA sequence
segmentation in 2004 in Biometrics. Many DNA sequences
display compositional heterogeneity in the form of segments of
similar structure. The authors identified such segments using a
(real) Markov chain governed by a hidden Markov model. They quite
novelly assumed that the order of dependence q and the number of
parameters r were unknown, and took these parameters to possess
independent truncated Poisson priors. The vectors of transition
probabilities were taken, in the prior assessment, to possess
independent Dirichlet distributions.
Boys and Henderson
applied their computer-simulated prior-to-posterior MCMC procedure
to an analysis of the bacteriophage lambda, a parasite of the
intestinal bacterium Escherichia coli. They computed a very
illuminating joint posterior p.m.f. for the parameters q and r, and
checked this out with a prior sensitivity analysis. A highly
original and very thorough piece of work.
Richard Boys is
Professor of Statistics at the University of Newcastle upon Tyne.
He applies his Bayesian ideas to science,
social science, and medicine, and he also has research interests in
statistical biomathematics and stochastic systems biology. Daniel
Henderson is a hard-working teaching fellow in the same department.
Their colleague Professor Darren Wilkinson is also well-published in
similar sorts of areas. When I first met him in 1997, he was still
wet behind the ears, but he is now highly accomplished. The
Bayesian Geordies are good to drink with. They sip their beer in
silence for lengthy periods of time, while making the occasional wry
wisecrack out of the depths of their grey matter.
In their JASA papers of 2004, Scott Berry and five co-authors
employed their Bayesian survival analysis with non-proportional
hazards for a meta-analysis of combination pravastatin-aspirin, the
redoubtable Jay Kadane and Nicole Lazar discussed various
model-checking criteria, Stephen Walker, Paul Damien and Peter Lenk
investigated priors with a Kullback-Leibler property, and Nidhan
Choudhuri, Subhashis Ghosal and Anindya Roy considered the
Bayesian estimation of a spectral density.
In 2004, the ever
insightful Michael Hamada, Valen Johnson, Leslie Moore and Joanne
Wendelberger co-authored a useful paper in Technometrics on
Bayesian prediction and tolerance intervals.
In the same year, George Streftaris and Gavin
Gibson published their very useful paper 'Bayesian Inference for
stochastic epidemics in closed populations' in Statistical
Modelling.
George and Gavin are respectively Senior Lecturer and
Professor in the Department of Actuarial Mathematics and
Statistics at Heriot-Watt.
In the Royal Statistical
Society journals of 2004:
Patrick Wolfe, Simon
Godsill and Wee-Jing Ng described their Bayesian variable
selection and regularization methodologies for time-series surface
estimation. They, in particular, analysed the Gabor regression
model, and investigated frame theory, sparsity, and related prior
structures.
Dan Cornford, Lehel
Csató, David Evans and Manfred Opper published an invited
discussion paper about their Bayesian analysis of the scatterometer
wind retrieval inverse problem. They showed how Gaussian processes
can be used efficiently with a variety of likelihood models, using
local forward observation models and direct inverse models for the
scatterometer. Their vector Gaussian process priors are very useful.
Randall Eubank and five
co-authors discussed smoothing spline estimation in varying
coefficient models. They used the Kalman filter to compute their
posterior inferences, and developed Bayesian intervals for the
coefficient curves. A very competent piece of research.
Jeremy Oakley and Tony
O’Hagan described their posterior sensitivity analysis of
complex models. This is a very intense paper with lots of novel
ideas, which is well worth scrutinizing in detail. I think.
Tae Young Yang
described his Bayesian binary segmentation procedure for detecting
streakiness in sports. He employed an interesting
integer-valued change-point model but falls straight into the Bayes
factor trap. His posterior inferences would be very sensitive to
small changes to the parameters of his beta priors, e.g. unit
changes in the 'prior sample sizes'. What a shame they didn't teach you
that at Myongji University, Tae.
He nevertheless successfully investigated the assertions that Barry Bonds was, and
Javy Lopez wasn't, a streaky home run hitter during the 2001 and
1998 seasons, whether the Golden State Warriors were a streaky basketball
team during the 2000-2001 season, and whether Tiger Woods was a
streaky golfer during September 1996-June 2001. This is all very,
very interesting stuff!
Gary Koop followed
that up by describing his Bayesian techniques for modelling the
evolution of entire distributions over time. He uses his techniques
to model the distribution of team performance in Major League
baseball between 1901 and 2000.
Konstandinos Politis and
Lennart Robertson described a forecasting system which predicts
the dispersal of contamination on a large scale grid following a
nuclear accident. They employed a hierarchical Bayesian forecasting
model with multivariate normal assumptions, and computed some
convincing estimated dispersion maps.
Carolyn Rutter and
Gregory Simon of the Center for Health Studies in Seattle
described their Bayesian method for estimating the accuracy of
recalled depression among outpatients suffering from supposed
bipolar disorder [mood swings may well be symptomatic of physical,
e.g organ, gland or sleep, malfunctions rather than the famously
hypothetical ‘biochemical imbalance’] who took part in LIFE
interviews. In the study under consideration, each of 376 patients
were interviewed twice by phone. One of various problems with this
approach is that telephone interviews are likely to increase the
accuracy of recall i.e. during the second telephone call the patient
may remember his first interview rather than his previous mood.
It does seem strange
that these patients only received follow-up telephone interviews as
a mode of treatment. In Britain they would have been more closely
monitored during their mood swings by their community psychiatric
nurses (CPNs). There are often big differences between the moods
observed by the CPNs and the moods perceived by the patients (David
Morris, personal communication), neither of which, of course, may
be accurate.
Researchers at the
Center for Health Studies in Seattle have also been known to
advocate the use of an ‘optimal’ regime of atypical anti-psychotic
drugs which have statistically established high probabilities
(around 70% in the short term as established by the CATIE study) of
causing intolerable physical side effects among patients suffering
from schizophrenia. This apparent mental disorder could, however, be
caused by calcification of the pineal gland, and maybe even from
high oil concentration in the skin, or from a variety of other possible physical causes.
I therefore have some
mild a priori reservations regarding the sorts of advice the
center in Seattle might give which could influence future mental
health treatment. Rutter and Simon would appear to have omitted a
number of important symptom variables from their analysis e.g.
relating to lack of sleep and thyroid dysfunction. Their conclusions
could therefore be a bit spurious.
Fulvio De Santis, Marco
Perone Pacifico and Valeria Sambucini describe some Bayesian
procedures for determining the ‘optimal’ predictive sample size n
for case control studies.
Let ψ denote a parameter
of interest, for example the log-measure of association in a 2x2
contingency table, and consider the 100(1-α)% highest posterior
density (HPD) Bayesian interval for ψ. Then, according to the
length probability criterion (LPC), we should choose the value of n
which minimizes the expected length of this interval, where the
expectation should be taken with respect to the joint prior
predictive distribution of the endpoints.
The authors generalise
LPC, in order to take variability into account, by choosing the
smallest n such that the prior predictive probability of obtaining an
interval estimate whose length exceeds a given threshold is no more
than a chosen level.
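The interval-length version of this criterion is easy to prototype numerically. The sketch below is my own illustration, not the authors’ procedure: it takes a binomial proportion with a Beta prior (rather than the log-measure of association in a 2x2 table), uses an equal-tailed posterior interval as a stand-in for the HPD interval, and approximates the prior predictive expectation by Monte Carlo.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

def expected_interval_length(n, a=1.0, b=1.0, alpha=0.05, sims=2000):
    """Average length of the central 100(1-alpha)% posterior interval
    for a binomial proportion, averaged over the prior predictive
    distribution of the data under a Beta(a, b) prior."""
    theta = rng.beta(a, b, size=sims)     # draws from the prior
    y = rng.binomial(n, theta)            # prior predictive data
    lo = beta.ppf(alpha / 2, a + y, b + n - y)
    hi = beta.ppf(1 - alpha / 2, a + y, b + n - y)
    return float(np.mean(hi - lo))

def lpc_sample_size(target, **kw):
    """Smallest n whose expected interval length falls below `target`."""
    n = 1
    while expected_interval_length(n, **kw) > target:
        n += 1
    return n

print(lpc_sample_size(0.2))
```

With a uniform prior and a target expected length of 0.2 for a 95% interval, this lands at a sample size of roughly sixty.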
In the context of
hypothesis testing they recommend choosing the smallest n such that
the probability that neither the null nor the alternative hypothesis
is ‘strongly supported’ is less than a chosen threshold, where a
further threshold needs to be specified in order to determine what counts as strong support.
The authors apply these
ideas to a practical example concerning the possible association
between non-Hodgkin’s lymphoma and exposure to herbicide.
Observations for 1145 patients are used as a training sample that
helps to determine the very strong choice of prior distribution,
which is doubtless needed to justify the normal approximations.
I’m sorry if I’m
sounding too frequentist, but wouldn’t it be simpler, in terms of
prior specification, to choose the smallest value of n which attains
a specified strength, i.e. the average power with respect to some
prior measure, for a test of specified size?
Francesco de Pasquale,
Piero Barone, Giovanni Sebastini and Julian Stander describe an
integrated Bayesian methodology for analysing dynamic magnetic
resonance images of human breasts. The methods comprise image
restoration and classification steps. The authors use their
methodology to analyse a DMRI sequence of 20 two-dimensional images
of 256x256 pixels of the same slice of breast. An absolutely
splendid and highly influential contribution.
Nicky Best and Sylvia
Richardson co-authored nine splendid joint papers with other
co-workers between 2005 and 2009. They include articles on modelling
complexity in health and social science: Bayesian graphical models
as a tool for combining multiple sources of information, improving
ecological inference using individual-level data, studying place
effects on health by synthesising individual and area-level incomes,
and adjusting for self-selection bias in case control studies
Nicky Best is Professor
of Statistics and Epidemiology at Imperial College London. She and
Deborah Ashby recently developed a Bayesian approach to complex
clinical diagnoses, with a case study in child abuse. They reported
this, with Frank Dunstan, David Foreman and Neil McIntosh in an
invited paper to the Royal Statistical Society in 2013.
In George Barnard and David
Cox’s days, the statisticians at ICL taught in a charming house on
Exhibition Road, and in the Victorianesque Huxley Building, which
tagged onto the similarly spacious Victoria and Albert Museum and
where, according to George Box, a statistician was once crushed to
death by the much dreaded lift.
[The infamous Huxley Building lift shaft]
Philip Prescott writes
from the University of Southampton ‘the lift had large metal gates
that would have to be carefully closed in order that the lift would
open properly. It was usually quicker to walk up the stairs, or run
if we were late for our lectures.’
Bayesians at ICL are much safer nowadays.
I last visited ICL in
1978 to teach a short course on Bayesian Categorical Data Analysis.
This was during the final stages of the World Cup in Argentina (I
remember David Cox telling me that Argentina had just beaten Peru
six nil). Nowadays, the statisticians are housed in much more modern
premises on Queen’s Gate, and the life expectancy has improved.
Nicky Best received
the Royal Statistical Society’s Guy Medal in Bronze in 2004.
While Sylvia Richardson
is slightly more mathematically inclined, Nicky also shows
considerable practical acumen. Both Nicky and Sylvia have been
heavily involved in Imperial College’s BIAS research program, which
addresses social science data that are notoriously full of missing
values, non-responses, selection biases and other idiosyncrasies.
Bayesian graphical and hierarchical models offer a natural tool for
linking many different sub-models and data sources, though they may
not provide the final answer.
In 2005 the second edition of the best-selling book Statistics
for Experimenters, by George Box, J. Stuart Hunter, and Bill
Hunter was published, a quarter of a century after the first
edition, but with the new subtitle Design, Innovation, and
Discovery. The second edition incorporated many new ideas at Stu
Hunter’s suggestion, such as the optimal design of experiments.
J. Stuart Hunter is
considered by many people to be a wonderful character, a gifted
scientist, and one of the most important and influential
statisticians of the last half century, especially with regard to
applying statistics to problems in industry. He is currently
investigating developments in data mining and machine learning.
In their JASA
papers of 2005, Peter Müller discussed applied Bayesian modeling,
Michael Elliott and Rod Little described their Bayesian evaluation
of the 2000 census, using ACE survey data and demographic analysis,
and Antonio Lijoi, Igor Prünster and Stephen Walker investigated the
consistency of semi-parametric normal mixtures for Bayesian density
estimation.
John Pratt published his paper ‘How many balance functions does it
take to determine a utility function?’ in 2005 in the Journal of
Risk and Uncertainty.
John is the William
Ziegler Professor Emeritus of Business Administration at Harvard. He
is a traditional Bayesian of the old school, but with all sorts of modern interests.
Peter Rossi and Greg
Allenby published their book Bayesian Statistics and Marketing
with John Wiley in 2005.
Peter is James Collins
Professor of Marketing, Statistics, and Economics at UCLA. He used
to be one of Arnold Zellner’s happy crew in the University of
Chicago Business School.
Dario Spanò and Robert
C. Griffiths co-authored their paper on transition functions with
Dirichlet and Poisson-Dirichlet stationary distributions in
Oberwolfach Reports in 2005.
Dario Spanò obtained his
Ph.D. in Mathematical Statistics from the University of Pavia in
2003. In 2013 he was promoted to Associate Professor of Statistics
at the University of Warwick, where he is also director of the M.Sc.
program and a diehard Bayesian to boot.
Robert C. Griffiths FRS
is Professor of Statistics at the University of Oxford.
Also in 2005, Geoff McLachlan and David Peel co-authored A
Bayesian Analysis of Mixture Models for the Wiley on-line
library. This is the fourth chapter of their 2004 book Finite
Mixture Models, and it is very important in terms of
semi-parametric multivariate density estimation. This is a situation
where proper priors are always needed, e.g. any improper prior for
the mixing probabilities will invariably lead to an improper
posterior. Morris DeGroot was the first to tell me that. Maybe a
maximum likelihood procedure using the EM algorithm would sometimes
work better in practical terms. Geoff has produced a wonderful
computer package which does just that. See , p3. Orestis
Papasouliotis and I applied Geoff’s package to the Lepenski Vir
Mesolithic and Neolithic skeletons data to excellent effect.
Professor McLachlan is
interested in applications in medicine and genetics. He is a
Distinguished Senior Research Fellow at the University of
Queensland. In 2011, he was awarded the Pitman Medal, the
Statistical Society of Australia’s highest honour.
Laurent Itti and Pierre
Baldi published their paper ‘Bayesian Surprise Attracts Human
Attention’ in 2005 in Advances in Neural Information Processing Systems.
The following year, the authors were to characterise surprise in
humans and monkeys, and to model what attracts human gaze over
natural dynamic scenes.
Researchers in the murky
interior of the USC Hedco Neuroscience building in Los Angeles, who
are heavily involved in the Bayesian theory of surprise, later
developed a ‘bottom-up visual surprise’ model for event detection
in natural dynamic scenes. I wouldn’t be surprised by whatever the
heady people at Hedco come up with next.
Jay Kadane and four
co-authors reported their conjugate analysis of the
Conway-Maxwell-Poisson (CMP) distribution in 2005 in Bayesian
Analysis. This distribution adds an extra parameter to the
Poisson distribution to model overdispersion and under-dispersion.
In the spirit of my oft-referred-to four-decade-old paradigm (Do
transform to a priori normality!), I might, if the mood strikes,
instead assume a more flexible bivariate normal prior for the logs of the two
parameters. This non-conjugate prior extends quite naturally to a
multivariate normal prior for the parameters in generalised linear
models which refer to CMP sampling assumptions.
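For concreteness, here is a small numerical illustration (my own hypothetical example, not the analysis in the paper) of how the extra CMP parameter ν controls dispersion; the normalising constant of the CMP distribution is approximated by truncating its infinite series.

```python
import numpy as np
from math import lgamma

def cmp_pmf(lam, nu, y_max=200):
    """Conway-Maxwell-Poisson probabilities, proportional to
    lam**y / (y!)**nu, with the infinite normalising series
    truncated at y_max."""
    y = np.arange(y_max + 1)
    log_w = y * np.log(lam) - nu * np.array([lgamma(k + 1) for k in y])
    w = np.exp(log_w - log_w.max())   # stabilised before normalising
    return w / w.sum()

def mean_var(p):
    y = np.arange(len(p))
    m = float(np.sum(y * p))
    return m, float(np.sum((y - m) ** 2 * p))

# nu = 1 recovers the Poisson (variance equals the mean);
# nu > 1 is under-dispersed, nu < 1 over-dispersed.
for nu in (1.0, 2.0, 0.5):
    m, v = mean_var(cmp_pmf(4.0, nu))
    print(nu, v / m)
```

A bivariate normal prior on (log λ, log ν), as suggested above, would then sit naturally on the unconstrained scale of these two parameters.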
During 2005, Luis Pericchi
published an article in the Elsevier B.V. Handbook of Statistics
entitled ‘Model Selection and Hypothesis Testing based on Objective
Probabilities and Bayes Factors’.
Luis Pericchi is
Professor of Mathematics at the University of Puerto Rico. He has
authored numerous influential papers on Bayesian methodology and its
applications, and is highly regarded.
Samuel Kou, Sunney Xie
and Jun Liu of Harvard University reported their Bayesian analysis
of single-molecule experimental data in 2005 in the invited
discussion paper to the Royal Statistical Society. They investigate
an interesting two-state model for Y(t), the total number of photons
arriving up to time t. This is equivalent to the photon arrivals
following a doubly stochastic Poisson process, and the authors use
their two-state formulation to investigate Brownian motion, and to
solve problems relating to the Deoxyribonucleic acid hairpin and
fluorescence life-time experiments.
Professor Alan Hawkes
of the University College of Wales at Swansea was slightly puzzled
when he proposed the Vote of Thanks, and he raised a few quibbles.
For example, the authors’ process for photon observation depends on
an underlying gamma process which is not dependent on the sequence
of pulses. And yet it is the pulses that produce the photon emissions. The
authors addressed these quibbles in their written reply.
In 2005, Thomas Louis
published an article in Clinical Trials on the fundamental
concepts of Bayesian methods.
Louis is Professor of
Biostatistics at the Johns Hopkins Bloomberg School of Public Health in
Baltimore. He has many Bayesian research interests, for example in
the analysis of medical longitudinal, spatial, and observational
data, and he is very well published.
During 2005 in Applied Statistics:
Samuel Mwalili, of the Jomo
Kenyatta University of Agriculture and Technology in Nairobi,
Emmanuel Lesaffre and Dominic Declerck used a Bayesian ordinal
regression model to correct for inter-observer measurement error in
a geographical health study;
Jaime Peters and five
colleagues at the University of Leicester used Bayesian procedures
to investigate the cross-design synthesis of epidemiological and
toxicological evidence, using some ingeniously vague choices for
their prior parameters;
Claudio Verzilli, of the
London School of Hygiene and Tropical Medicine, John Whittaker,
Nigel Stallard and Daniel Chapman proposed a hierarchical Bayesian
multivariate adaptive regression spline model for predicting the
functional consequences of amino-acid polymorphisms, and used it to
investigate the lac repressor molecule in Escherichia coli. In the
absence of lactose, this molecule binds to the DNA double helix
upstream of the genes that code for enzymes;
Sujit Sahu and Kanti Mardia
proposed a Bayesian kriged Kalman model for the short-term
forecasting of air pollution levels, which assumes that the spatial
covariance kernel belongs to the Matérn family. The authors apply
their Bayesian analysis most successfully to the New York air pollution data.
Sujit Sahu is Professor of
Statistics at the University of Southampton. He is interested in
Bayesian modeling for interpreting large and complex data sets in a
wide range of application areas.
In 2006, Stephen Pitt
reported his efficient Bayesian inferences, with David Chan and
Robert Kohn, for Gaussian copula regression models in Biometrika.
He is one of the most insightful of our up-and-coming Bayesian
Economists. I remember chewing the rag with him in 1998 at Valencia.
Stephen is Professor of
Economics at the University of Warwick, where he has worked as
liaison officer for the highly Bayesian, integrated single honours
MORSE (Mathematics, Operational Research, Statistics, and Economics)
degree which I helped Robin Reed and the Economists to create in
1975. Stephen’s research areas include financial time series,
non-Gaussian state models, stochastic volatility models, and MCMC.
The campus of the
University of Warwick is situated in the southern fringes of the
once bombed out City of Coventry several miles north of the ruins of
the historic Kenilworth Castle where the rebellious thirteenth
century leader Simon de Montfort, the ill-fated King Edward the
Second, and King Henry the Fifth, the victor at Agincourt, all
stayed. The better preserved Warwick Castle, with its vibrant
dungeons and strutting peacocks, lies even further to the south.
Therein, the evil Kingmaker of the Wars of the Roses once lived, a
century or so after Piers Gaveston, the butch lover of the losing
king at Bannockburn, was beheaded by irritated nobles in a ditch nearby.
The University of
Warwick of the late 1960s and early 1970s just consisted of a few
loosely white-tiled buildings scattered across a long tract of farm
land, and the junior academics were reportedly exploited as guinea
pigs by the Napoleonic Vice-Chancellor. However, after that
gentleman melted into thin air, the campus slowly became
jam-packed with buildings and eventually accommodated one of the
most thriving universities in Europe. The MORSE degree, which
started in 1975 with 30 students a year, became world famous; the
current undergraduate intake for the MORSE and MMORSE degrees is 150 a year,
with many students coming from overseas. A new interdepartmental
B.Sc. degree in Data Analysis will start in 2014.
Getting back to 2006,
Petros Dellaportas, Nial Friel and Gareth Roberts reported their
Bayesian model selection criterion for partially (finitely) observed
diffusion models in Biometrika. For a fixed model
formulation, the strong dependence between the missing paths and the
volatility of the diffusion can be broken down using one of Gareth’s
previous methods. The authors described how this method may be
extended via reversible jump MCMC to the case of model selection. As
is ever the case with reversible jump MCMC, my mind boggles as to
how long the simulations will dither and wander before settling
close to the theoretical solution.
Nial Friel is Associate
Professor and Head of Statistics at University College Dublin. His
research interests include Bayesian inference for statistical
network models, social network analysis, and model selection. He
obtained his Ph.D. in 1999 from the University of Glasgow.
In his interesting 2006 article in Statistics in Society,
Gene Hahn re-examined informative prior elicitation through the lens
of MCMC methods. After reviewing the literature, he stated four
principles for prior specification relating to the need (1) to
elicit prior distributions which are of flexible form (2) to
minimize the cognitive demands on the expert (3) to minimize the
demands on the statistician, and (4) to develop prior elicitation
methodologies which can be easily applied to a wide range of models.
With these ambitious,
though somewhat expedient, principles in mind, Hahn recommended
eliciting non-conjugate priors by reference to Kullback-Leibler
divergence. He applied his ideas to inference about a regression
parameter in the context of a set of data on rainfall in York.
Overall, an intriguing study, which seeks to simplify the plethora
of modern prior elicitation techniques.
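Screening candidate priors by Kullback-Leibler divergence is easy to illustrate. The toy sketch below is my own construction, not Hahn’s method: it uses the closed-form divergence between two normal distributions to pick, from a grid of candidate prior scales, the member of a simple parametric family closest to an elicited target.

```python
import numpy as np

def kl_normal(mu0, s0, mu1, s1):
    """Closed-form KL( N(mu0, s0^2) || N(mu1, s1^2) )."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

# Elicited target prior: N(0, 2^2).  Candidate family: N(0, s^2)
# over a grid of scales; keep the scale minimising the divergence.
target_mu, target_s = 0.0, 2.0
grid = np.linspace(0.5, 5.0, 91)
best = grid[np.argmin([kl_normal(target_mu, target_s, 0.0, s) for s in grid])]
print(best)
```

In a real elicitation the candidate family would of course be richer than a zero-mean normal, but the minimise-the-divergence step is the same.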
Gene Hahn is Associate
Professor of Information and Decision Sciences at Salisbury
University in Maryland. His research interests include management
decision making, Bayesian inference, and international operations
including offshore and global supply chain management.
In contrast to Gene Hahn’s
prior informative approach, Trevor Sweeting, Gauri Datta, and Malay
Ghosh proposed deriving vague non-subjective priors by minimizing
predictive entropy loss. Their suggestions in their exquisitely
mathematical paper ‘Nonsubjective priors via predictive relative
entropy regret’ in the Annals of Statistics (2006) may be
contrasted with reference priors. Sweeting is one of those wonderful
English surnames which makes you proud to feel that you’re British.
Trevor Sweeting is
Emeritus Professor of Statistics at UCL. His interests also include
Bayesian computations, Laplacian Approximations, and Bayesian
semi-parametric hierarchical modeling, and he has published
extensively within the Bayesian paradigm. He is one of the later
generation of UCL Bayesians who, together with Tom Fearn, picked up
the pieces during the decades following Dennis Lindley’s dramatic
departure in 1977, which had all the ingredients of a Shakespearean tragedy.
The UCL statisticians are now housed in modern premises on Tottenham
Court Road, a couple of hundred yards to the west of the Malet
Street quadrangle, where feisty Administrators and beefeaters once
roamed, and well away from the padded skeleton of the purveyor of
happiness Jeremy Bentham, which is still on show in a glass case. An
innocuously quiet, unassuming Bayesian called Rodney Brooks still
beavers away in the Stats department over forty years after he
published a straightforward version of Bayesian Experimental Design
in his UCL Ph.D. thesis, and Mervyn Stone still floats through, over
a century after Karl Pearson first moved into Malet Street with an
autocratic demeanour which was only to be rivalled by Sir Ronald
Fisher. According to the industrial statistician and crafty Welsh
mountain walker Owen Davies, who succeeded Dennis Lindley to the
Chair in Aberystwyth in 1967, Karl and Sir Ronald both accused the
other of behaving too autocratically; indeed their interdepartmental
dispute over who should teach which course was never to be resolved.
[The UCL Centre for Computational Statistics and Machine Learning]
In their 2006 paper in Statistics in Society, John Hay,
Michelle Haynes, Tony Pettitt and Thu Tran investigated a Bayesian
hierarchical model for the analysis of categorical longitudinal data
from a large social survey of immigrants to Australia.
The observed binary
responses can be arranged in an NxJxT array, where each of the N
individuals in the sample can be in any one of J states at each of T
times. Under appropriate multinomial assumptions, the authors took
the three-dimensional array of multivariate logits to be constrained
by a linear model that may or may not have random components, but
which depends upon NxT vectors of explanatory variables and NxT
vectors of lagged variables. While the authors could have explained
a bit more fully what they were up to, they referred to WinBUGS for their,
presumably first stage multivariate normal, prior specifications and
used DIC when comparing different special cases of their general
model specification. Then they used their model-specific inferences
to draw some tolerably interesting applied conclusions from their data.
John Hay published his
book Statistical Modeling for Non-Gaussian Time Series Data with
Explanatory Variables out of his 1999 Ph.D. thesis at the
Queensland University of Technology (QUT) in Brisbane.
Tony Pettitt is
Professor of Statistics at QUT. His areas of interest, while working
in that neck of the woods, include Bayesian Statistics, neurology,
inference for transmissions of pathogens and disease, motor unit
number registration, and Spatial Statistics.
In 1787 eleven ships set
forth from Portsmouth, England, packed with West Country petty
criminals and sheep stealers; they landed in Botany Bay in 1788, and
founded Sydney, Australia. No offence, Tony. I was just
trying to wax lyrical. Good luck on your motor registrations.
George Kuczera of the Department of Engineering of the University of
Newcastle in New South Wales has established himself as a world
authority on the theory and applications of Bayesian methods in
hydrology and water resources.
Mark Steel, our friendly Dutchman at the University of Warwick, was
his usual dynamic self in 2006. He published the following three
joint papers in JASA during that same year:
Order-Based Dependent
Dirichlet Processes (with Jim Griffin), A Constructive
Representation of Univariate Skewed Distributions (with José Ferreira), and
Non-Gaussian Bayesian Geostatistical Modelling (with M. Blanca Palacios).
Mark also co-authored
papers in 2006 with Jim Griffin in the Journal of Econometrics,
and with J.T. Ferreira in the Canadian Journal of Statistics.
By coincidence, our very
own Peter Diggle and his mate Soren Lophaven also published a paper
on Bayesian geostatistical design in 2006, but in the
Scandinavian Journal of Statistics.
Peter Diggle is a
Distinguished Professor of Statistics at the University of Lancaster
down by the Lake District. Peter has authored a number of Bayesian
papers, and his research interests are in spatial statistics,
longitudinal data, and environmental epidemiology, with applications
in the biomedical, clinical, and health sciences. He is very highly regarded.
Peter is President-elect
of the Royal Statistical Society, and received the Guy Medal in
Silver in 1997.
Jim Griffin is Professor
of Statistics at the University of Kent. His research interests
include Bayesian semi-parametrics, slice sampling, high frequency
financial data, variable selection, shrinkage priors, and stochastic volatility.
In their 2006 papers in
JASA, Knashawn Morales, Joseph Ibrahim, Chien-Jen Chen and
Louise Ryan applied their Bayesian model averaging techniques to
benchmark dose estimation for arsenic in drinking water, Nicholas
Heard, Christopher Holmes and David Stephens described their
quantitative study of the gene regulation involved in the immune
response of anopheline mosquitoes, and Niko Kaciroti and his five
courageous co-authors proposed a Bayesian procedure for clustering
longitudinal ordinal outcomes for the purpose of evaluating an
asthma education program.
In their 2007 paper in Statistics in Society, David Ohlssen, Linda
Sharples and David Spiegelhalter proposed a hierarchical modelling
framework for identifying unusual performance in health-care
providers. In a special case where the patients are nested within
surgeons, a two-way hierarchical linear logistic regression model is
assumed for the zero-one counts which incorporates provider effects,
and the logits of EuroSCOREs are used as covariates for case-mix
adjustments. This is a special case of a more general formulation
which contains the same sort of parametrization.
The provider effects are taken either to be fixed or to constitute a random
sample from some distribution e.g. a normal or heavier-tailed
t-distribution, a mixture of distributions, or a non-parametric
distribution. I would personally use a large equally weighted
mixture of normal distributions with common unknown dispersion, and
unknown locations. If this random effects distribution is estimated
from the data by either hierarchical or empirical Bayesian
procedures, then appropriate posterior inferences will detect
unusual performances by the health care providers. The authors
seemed to take a long time saying this, but their job was then well done.
Meanwhile, John Quigley
and Tim Bedford of the University of Strathclyde in Glasgow used
Empirical Bayes techniques to estimate the rate of occurrence of
rare events on railways. They published their results in the
journal Reliability Engineering and System Safety.
Jean-Michel Marin and
Christian Robert published their challenging and high level book
Bayesian Core: A Practical Approach to Computational Bayesian
Statistics in 2007. The authors address complex Bayesian
computational problems in regression and variable selection,
generalised linear models, capture-recapture models, dynamic models,
and image analysis.
If computer R labs are
added to three hours of lectures a week, then culturally-adjusted
graduate students with a proper mathematical background can
reportedly hope to achieve a complete picture of Marin and Robert’s
treatise within a single semester. I hope that they are also given
an intuitive understanding of the subjective ideas involved, and are
not just taught how to crank the computational handle.
In their article in
Accident Analysis and Prevention, Tom Brijs, Dimitris Karlis,
Filip Van den Bossche and Geert Wets used an ingenious two-stage
multiplicative Poisson model to rank hazardous road sites according
to numbers of accidents, fatalities, slight injuries, and serious
injuries. They assumed independent gamma priors for the seven sets
of unknown parameters, before referring to an MCMC analysis, and
derived the posterior distributions of the ‘expected cost’
parameters for the different sites. These are expressible as
complicated functions of the model parameters.
The authors utilised
this brilliant representation to analyse the official traffic
accidents on 563 road intersections in the Belgian city of Leuven
for the years 1991-98. Their MCMC algorithm converged quite easily
because the sampling model was so well-conditioned. Their posterior
boxplots described their conclusions in quite beautiful fashion, and
they checked out their model using a variety of predictive p-values.
Their Bayesian p-values for accidents, fatalities, severely injured
and slightly injured were respectively 0.208, 0.753, 0.452 and
0.241. The authors therefore deemed their fit to be satisfactory.
This is what our paradigm is all about, folks!
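Predictive p-value checks of this kind are straightforward to sketch. The toy example below is mine, not the authors’ seven-parameter multiplicative model: a single i.i.d. Poisson sample with a conjugate Gamma prior on the rate, checked with the sample variance as the discrepancy statistic.

```python
import numpy as np

rng = np.random.default_rng(1)

def predictive_p_value(counts, a=1.0, b=1.0, sims=4000):
    """Posterior predictive p-value for an i.i.d. Poisson model with a
    Gamma(a, b) prior on the rate: the probability that replicated data
    show at least as much spread (sample variance) as the observed data."""
    counts = np.asarray(counts)
    n = counts.size
    # Conjugate update: rate | data ~ Gamma(a + sum, b + n)
    lam = rng.gamma(a + counts.sum(), 1.0 / (b + n), size=sims)
    reps = rng.poisson(lam[:, None], size=(sims, n))
    return float(np.mean(reps.var(axis=1) >= counts.var()))

# Wildly over-dispersed counts should give a p-value near zero,
# flagging the Poisson fit as unsatisfactory.
print(predictive_p_value([0, 0, 0, 0, 20, 20, 20, 20]))
```

A p-value near 0 or 1 signals misfit; middling values, like the authors’ 0.208 to 0.753, are reassuring.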
In their 2007 JASA papers,
Chunfang Jin, Jason Fine and Brian Yandell applied their unified
semi-parametric framework for quantitative trait loci to the
analysis of spike phenotypes, Anna Grohovac Rappold, Michael Lavine
and Susan Lozier used subjective likelihood techniques to assess the
trends in the ocean’s mixed layer depth, Daniel Cooley, Doug Nychka
and Philippe Naveau used Bayesian spatial models to analyse extreme
precipitation return levels, and Bo Cai and David Dunson used
Bayesian multivariate isotonic regression splines in their carcinogenicity studies.
In their 2007 paper
in Applied Statistics, Rolando De la Cruz-Mesía, Fernando
Quintana and Peter Müller used semi-parametric Bayesian
classification to obtain longitudinal markers for 173 pregnant women
who are measured for β human chorionic gonadotropin hormone during
the first 80 days of gestational age.
The data consisted of
the observed response vector y for each patient for the known
time-points at which their hormone level was observed, together with
a zero-one observation x indicating whether the pregnancy was
regarded as normal or abnormal. A two-stage sampling model was
assumed, where the y’s were taken to be independent given a matching
set of random effects parameters. Then the random effects were
assumed to be independent with distributions depending on the
corresponding x’s, a vector φ of common unknown parameters, and two
random effects distributions G. The posterior classification
probabilities (as to whether a further patient with another y vector
has a normal or abnormal pregnancy) can then be obtained in terms of
the prior probabilities by a simple application of Bayes rule.
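That final classification step is just Bayes’ rule applied to two class-conditional marginal likelihoods. Here is a minimal sketch, with a toy Gaussian stand-in for the fitted hierarchical model; the helper `toy_log_lik` and its numbers are hypothetical, not the authors’.

```python
import numpy as np

def posterior_class_prob(y, log_lik, prior_normal=0.5):
    """P(normal pregnancy | y) by Bayes' rule, where log_lik(y, label)
    returns the log marginal likelihood of the response vector y under
    each class ('normal' or 'abnormal')."""
    log_post = np.array([
        np.log(prior_normal) + log_lik(y, "normal"),
        np.log(1.0 - prior_normal) + log_lik(y, "abnormal"),
    ])
    p = np.exp(log_post - log_post.max())   # stabilised before normalising
    return float(p[0] / p.sum())

# Hypothetical stand-in likelihoods: hormone responses centred at 2.0
# for normal pregnancies and at 1.0 for abnormal ones, unit variance.
def toy_log_lik(y, label):
    mu = 2.0 if label == "normal" else 1.0
    return float(np.sum(-0.5 * (np.asarray(y) - mu) ** 2))

print(posterior_class_prob([2.1, 1.9, 2.0], toy_log_lik))
```

In the paper the two marginal likelihoods come from the fitted random-effects model rather than from anything this simple, but the Bayes-rule arithmetic is identical.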
When applying this
procedure, it would have seemed important to use empirical estimates
for φ and distributions of the G random effects parameters, in order
to avoid the interpretative difficulties inherent in Bayes factors.
However, the authors represented φ in terms of latent variables in
the form of random matrices, and assumed dependent Dirichlet priors
for the distributions of the G parameters. Phew, Gor blimey!
The authors proceeded
merrily along with their MCMC computations, and got the posterior
inferences which they, as full-dress hierarchical semi-parametric
Bayes factorists, may well have deserved. A little more practical
common sense would not have gone amiss.
Jose Bernardo presented a high-powered invited discussion paper in
Sort in 2007 on objective Bayesian point and region
estimation in location-scale models. I don’t know what Jose meant by
‘objective’. Isn’t it a bit of a problem for Bayesians to use this
evocative sort of terminology? Observational data are usually more
subjective than objective.
In 2007, the book
Gaussian Markov Random Fields by Håvard Rue and Leonhard Held
was a runner-up for ISBA’s De Groot Prize.
Whoops! We almost missed
out on three exciting publications in the 2007 RSS journals:
Yuan Ji and seven
co-authors used Bayesian mixture models for complex high dimensional
count data in phage display experiments, and proposed an interesting
hierarchical prior distribution for the parameters in a Poisson/
log-linear model for the observed counts, which were taken at
different phages and from different, e.g. mouse, organs. They then
referred to a full-dress, parametric Bayesian analysis. The wonders
of MCMC will never cease!
Brent Coull, Joel Schwartz and Helen Suh used semi-parametric latent
variable regression models for the spatiotemporal modelling of
mobile source particles in the greater Boston area. They used a
non-linear factor analytic model together with geoadditive
semi-parametric regression assumptions, and sorted out the
identifiability problems with their informative prior
specifications. Their MCMC analysis yielded some beautiful-looking,
and neatly coloured maps, of Boston, Massachusetts.
Nicole Augustin, Stefan
Lang, Monica Musio and Klaus von Wilpert, God bless their cotton
socks, applied Bayesian structural additive regression to a spatial
model for the needle losses of pine-trees in the forests of
Baden-Württemberg. A delightful bed-time read.
Pilar Iglesias-Zuazola, the spiritual leader of Bayesians in Chile,
passed away on the third of March 2007. She was one of the leading
Bayesian researchers in Latin America, and an outstanding educator
who would visit Chilean pupils in their high schools.
After working with
Carlos Pereira, Pilar received her doctorate from the University of
São Paulo in Brazil. Upon becoming a faculty member of the Catholic
University of Chile (PUC), she started making contact with Bayesians
at other Chilean Universities. Consequently, the University of La
Serena, Chile, hosted the First Bayesian Workshop in January 1996.
In 2010, ISBA instituted
the Pilar Iglesias Travel Award in Pilar’s honour. Recipients to
date include Delson Chivabu (South Africa), Jose Ramiro (Chile),
Fernando do Nascimento (Brazil), and Francisco Torres-Avilés (Chile).
In 2007, Samprit Banerjee,
Brian Yandell and Nengjun Yi co-authored a seminal paper ‘Bayesian
Quantitative Trait Loci Mapping for Multiple Traits’ in Genetics.
In 2008 Brian Yandell
taught a course on Quantitative Trait Loci (QTL) mapping at the
University of Wisconsin-Madison. The goals of his QTL study included
(1) discovering underlying biochemistry, (2) finding useful
candidates for medical intervention, (3) discovering how the genome
is organised, (4) discerning units of natural selection, and (5)
predicting phenotype or breeding value. An outstanding enterprise.
In their 2008 paper in
JASA, Dimitris Fouskakis and David Draper compared stochastic
optimization methods for variable selection in binary outcome
prediction, with application to health policy. They published two
further important Bayesian papers in 2009, with Ioannis Ntzoufras,
which related to Bayesian variable selection with application to
cost-effective measurement of quality of health care. Fouskakis and
Ntzoufras? They sound like two up-and-coming names for the future.
I’m sure they’re the best of buddies.
David Draper advocates
‘Bayesian-Frequentist fusion’, and thinks that Gauss, Galton and
Fisher were early Bayesians who may have fused. He is a Professor of
Applied Mathematics and Statistics at the University of California
at Santa Cruz. A past president of ISBA, he is a prolific applied
Bayesian researcher and prize-winning short-course teacher. There’s
a great mug shot of him on the Internet.
In their 2008 JASA
papers, Chiara Sabatti and Kenneth Lange used Bayesian Gaussian
mixture models to analyse high-density genotype arrays, Shane Jensen
and Jun Liu used a Bayesian clustering procedure to analyse
transcription factor binding motifs, Abel Rodriguez, David Dunson
and Alan Gelfand drew nested Dirichlet processes to the world’s
attention, and David Dunson, Ya Xue and Lawrence Carin used a matrix
stick-breaking process to develop a flexible Bayesian meta-analysis.
I enjoy breaking my matrix sticks at bedtime, Professor Dunson, and
you’re one of the world’s leading Bayesian researchers.
In the same year, the
Spanish mermaid Carmen Fernandez and her three not-so-fishy
co-authors reported their Bayesian analysis of a two-stage biomass
model for the Bay of Biscay anchovy, in the ICES Journal of
Marine Science. Carmen is a very enthusiastic research
scientist at the Spanish Institute of Oceanography, and she has
published 33 very useful papers in Statistics and Fisheries
journals, several of them jointly with her friend and mentor Mark
Steel of the University of Warwick.
Also in 2008, Lei Sun and Murray Clayton co-authored ‘Bayesian
analysis of cross-classification spatial data’, and Murray Clayton
and his six co-authors reported ‘Predicting spatial patterns of fire
on a California landscape’ in the International Journal of
Wildland Fire.
Murray Clayton is
Professor of Statistics and Plant Pathology at the University of
Wisconsin-Madison. As one of the leading Canadian Bayesians he has
published a number of high quality theoretical articles, and many
application papers, e.g. in the agricultural sciences. As a student
of Don Berry of the University of Minnesota, Murray is well able to
mix Bayesian theory with practical relevance. Don has done all sorts
of important things too, as well as running a remunerative Bayesian
consultancy.
Murray’s eminent colleague Charles Franklin, in the University of
Wisconsin’s Department of Political Science, teaches multitudinous advanced
level Bayesian methodology courses to social scientists. Up to a few
years ago, he was using my book with John Hsu for course
material, with lots of emphasis on the first chapter on likelihood
formulations and frequency procedures. I’m glad that his students
could understand it. The Statistics graduate students at Wisconsin
used to experience lots of difficulties constructing the
likelihoods, particularly if they hadn’t bothered to take Advanced
Calculus, and some of them didn’t even multiply from 1 to n. But
after that Bayes was a cinch (as long as they were told what the
prior was!). I was once advised that the best thing I taught them
was ‘that neat formula for completing the sum of squares’. Anyway,
Professor Franklin is getting more package orientated nowadays, so
some of the arts of yore may be getting lost.
Trevor Park and George
Casella introduced the Bayesian Lasso in JASA in 2008,
as an interpretation of Tibshirani’s Lasso, which
estimates linear regression coefficients through constrained least
squares. The authors showed that the Lasso estimate can be
interpreted as a Bayesian posterior modal estimate when the
parameters have independent Laplace (double-exponential) priors.
Representing the Laplace prior as a scale mixture of normals
provides full conditional posterior distributions for MCMC, and the
interval estimates provided by the Bayesian Lasso help to guide
variable selection.
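The identity at the heart of their paper is easy to check in one dimension. Here is a minimal sketch (my own toy numbers, not the authors’ examples): with a single observation y ~ N(theta, 1) and a Laplace prior, the posterior mode coincides with the lasso soft-thresholding estimate.

```python
# One-dimensional check of the Park-Casella identity (toy numbers):
# with y ~ N(theta, 1) and Laplace prior p(theta) ∝ exp(-lam * |theta|),
# the negative log posterior is 0.5 * (y - theta)**2 + lam * |theta|,
# whose minimiser is the soft-thresholding estimate
# sign(y) * max(|y| - lam, 0).
y, lam = 2.0, 0.7

# Brute-force the posterior mode over a fine grid
grid = [i * 1e-4 - 5.0 for i in range(100_001)]
mode = min(grid, key=lambda t: 0.5 * (y - t) ** 2 + lam * abs(t))

# Analytic lasso (soft-thresholding) solution
soft = (1.0 if y > 0 else -1.0) * max(abs(y) - lam, 0.0)
print(round(mode, 4), soft)   # both 1.3
```

The agreement of the two numbers is the Park-Casella point in miniature: the lasso penalty is just the negative log of a Laplace prior.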
Alan Izenman published his celebrated book Modern Multivariate
Statistical Techniques in the same year. Alan takes a broad
perspective on multivariate analysis in the light of the remarkable
advances in computation and data storage and the ready availability
of huge data sets which have been the keys to the growth of the new
disciplines of data mining and machine learning. Meanwhile, the
enormous success of the Human Genome Project has opened up the field
of bioinformatics. The book presents an integrated mixture of theory
and applications, and of classical, Bayesian, and modern
multivariate analysis techniques.
Alan Izenman is Senior Research Professor in
Statistics at Temple University. In 1976 he and Sandy Zabell
investigated the 1965 New York City blackout and showed that this
did not substantively affect the city’s birth-rate as previously
advertised (see p. 95). In this case ‘induced births’ provide the
confounding variable. A wonderful message to popularist commentators.
Alan also researches on
the interaction between Statistics and the Law, and he has used
Bayesian methods to draw inferences about the amount of drugs
previously smuggled in by a defendant in the pit of his stomach.
THE INLA PACKAGE:
During 2009, two professors
from Norway and a Frenchman from Paris published a breath-taking
paper in JRSSB which, together with the discussion thereof,
graced 73 pages of the Society’s journal. ‘Approximate Bayesian
inference for latent Gaussian models by using integrated nested
Laplace approximations’ is doubtlessly the
most exciting and far-reaching Bayesian paper of the 21st century.
The now much-celebrated co-authors were Håvard Rue and Sara Martino
of the Norwegian University of Science and Technology in Trondheim,
and Nicolas Chopin from the Research Centre for Economics and
Statistics in Paris.
Much of Scotland was
once part of the Archdiocese of Trondheim, and maybe that should now
include Edinburgh, and perhaps even London too. The co-authors
completed all of their computations by application of INLA, their
computer package, which is thoroughly documented by Martino and
Rue. This manual refers to Sara Martino’s 2007 Ph.D. thesis and to
review papers by Ludwig Fahrmeir and Gerhard Tutz, and others.
The co-authors wrote in their summary: Structured additive
regression models are perhaps the most commonly used class of models
in statistical applications. It includes, among others, (generalised)
linear models, (generalised) additive models, smoothing spline
models, state space models, semi-parametric regression, spatial and
spatio-temporal models, log-Gaussian Cox processes, and
geostatistical and geoadditive models. We consider approximate
Bayesian inference in a popular subset of structured additive
regression models, latent Gaussian models, where the latent field is
Gaussian, controlled by a few hyperparameters and with non-Gaussian
response variables. The posterior marginals are not available in
closed form owing to the non-Gaussian response variables. For such
models, MCMC methods can be implemented, but they are not without
problems, in terms of both convergence and computational time.
In some practical
applications, the extent of these problems is such that MCMC is
simply not an appropriate tool for routine analysis. We show that,
by using an INLA approximation and its simplified version, we can
compute very accurate approximations to the posterior marginals. The
main benefit of these approximations is computational; where MCMC
algorithms need hours or days to run, our approximations provide
more precise estimates in seconds or minutes. Another advantage with
our approach is its generality, which makes it possible to perform
Bayesian analysis in an automatic, streamlined way, and to compute
model comparison criteria and various predictive measures, so that
models can be compared and the model under study can be challenged.
What more can I say? Well, here goes. The important
influences of the numerically highly accurate conditional Laplace
approximations of the 1980s, and the soundly based Importance
Sampling computations which began in 1978, took a fair drenching
during the MCMC fever of the 1990s. It may take a little time, but
the tide is beginning to turn towards techniques which can be
algebraically justified and which will more fully test the
mathematical expertise of our Ph.D. students and up-and-coming
researchers as they strive to develop technical talents of their
own. Conditional Laplace approximations can be used to address most
sampling models under the sun, not just the wonderful range of
structurally additive models considered by Rue, Martino, and Chopin.
Our journals will become filled with wondrous algebra once again,
and our computations will be extraordinarily accurate right down the
tails of our marginal posteriors.
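The Laplace step at the core of these approximations is elementary to illustrate. Here is a minimal sketch (my own toy example, not the INLA machinery): approximate the log normalising constant of a posterior by fitting a Gaussian at the mode, for a case where the exact answer is known in closed form.

```python
import math

# Toy posterior for theta = log(rate): p(theta) ∝ exp(a*theta - b*exp(theta)),
# the log-transformed Gamma(a, b) density, so the exact log normalising
# constant is lgamma(a) - a*log(b).
a, b = 30.0, 2.0

theta_hat = math.log(a / b)       # mode: l'(theta) = a - b*exp(theta) = 0
curvature = a                     # -l''(theta_hat) = b*exp(theta_hat) = a

# Laplace approximation:
#   log ∫ exp(l(theta)) d theta ≈ l(theta_hat) + 0.5*log(2*pi/curvature)
log_laplace = (a * theta_hat - a) + 0.5 * math.log(2.0 * math.pi / curvature)
log_exact = math.lgamma(a) - a * math.log(b)
print(round(log_laplace, 3), round(log_exact, 3))   # differ by about 0.003
```

INLA nests refinements of exactly this device inside numerical integration over the hyperparameters, which is why it is so fast and so accurate.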
It is as if the
Valhalla-esque Gods Woden and Thor and the Goddess Freyja of
Fólkvangr have returned to guide us. Maybe the more ardent of our
never-ever-convergent simulators will decide to take the night bus
to Vulcan, the Ninth World, or wherever.
In their 2009 JASA paper, David Henderson, Richard Boys, Kim
Krishnan, Conor Lawless and Darren Wilkinson described their
Bayesian emulation and calibration of a stochastic computer model of
mitochondrial DNA deletions in substantia nigra neurons.
Athanasios Micheas and Christopher Wikle investigated their hierarchical
non-overlapping random disc growth model, Pulak Ghosh, Sanjib Basu
and Ram Tiwari performed a Bayesian analysis of cancer rates using
parametric and semi-parametric joinpoint regression models, Jeff
Gill and George Casella described their specification and estimation
of non-parametric priors for ordinal social science models, and
Susie Bayarri, Jim Berger and seven worthy co-authors predicted
vehicle crashworthiness by validating computer models for functional
and hierarchical data.
Not to forget
Christopher Paciorek and Jason McLachlan’s mapping of ancient
forests. These worthy co-authors developed Bayesian inferential
techniques for spatio-temporal trends in forest composition using
the fossil pollen proxy record.
Beat that! JASA
excelled itself in 2009. However, never one to take the back seat,
our very own Bradley Efron described his empirical Bayes estimates
for large-scale prediction problems. Do you remember the proud days
of yore when you were shrinking group means towards zero and
inventing the bootstrap, Brad?
Still in 2009, Eva Riccomagno and Jim Smith reported their geometry
of causal probability trees which are algebraically constrained, in
the co-edited volume Optimal Design and Related Areas in
Optimization and Statistics.
Gee whiz, Charlie Brown!
I never knew that Statistics could prove causality. I’m sure that
the eighteenth-century philosopher David Hume (see my Ch. 2) wouldn’t
have approved of causal probability trees unless they were called
something else entirely.
And Fabio Rigat and Jim
Smith published their non-parametric dynamic time series approach,
which they applied to the detection of neural dynamics, in the
Annals of Applied Statistics, a wonderful contribution.
Professor Jim Q. Smith’s
highly innovative Bayesian research began at the University of
Warwick during the 1970s, and it is still continuing unabated. He
published his book Bayesian Decision Analysis with
Cambridge University Press in 2010, and has recently worked on
military training applications of his ‘decision making under
uncertainty’ methodology.
That’s an interesting
application of posterior expected loss, guys. I hope that the
soldiers benefit from maximising their expected utility. Of course,
if they’re dead then optimal long term expectations won’t help them.
Jim is currently the
holder of an EPSRC grant with Liz Dowler and Rosemary Collier to
investigate ways groups of experts can ensure coherence of their
judgements when managing food crises. That’s a tall order, Jim. Many
of our working population are currently starving and struggle to
remain both coherent and in the land of the living.
To cap Jim’s exploits,
M.H. Rahaman Khan and Ewart Shaw published a paper in the
International Journal of Interdisciplinary Social Sciences in
2009. In this paper they reported their hierarchical modeling
approach for investigating the determinants of contraceptive use in
Bangladesh.
That reminds me of the
time I got a rubber stuck in my ear during a chess tournament in
Oshkosh, Wisconsin, guys. All the medics just stood around and
laughed. But I won the tournament and became champion of North-East
Wisconsin for 1992.
Ewart Shaw is Principal
Teaching Fellow in Statistics at the University of Warwick. He is
also a well-published researcher, and his research interests include
Bayesian Inference, numerical methods, number theory, coding theory,
computer algebra in statistics, survival analysis, medical
statistics and splines.
Ewart’s a big
contributor and deserves more of the cherry pie.
Speaking of cherry pie,
Lyle Broemeling’s 2009 text Bayesian Methods for Measures of
Agreement draws on data taken from various studies at the
University of Texas MD Anderson Cancer Center. An admirable
contribution.
And Mohammad Raqab and
Mohamed Madi co-authored an article in Metron in 2009
describing their Bayesian analysis for the exponentiated Rayleigh
distribution.
Madi is Professor of
Statistics and Associate Dean of Economics and Business at the
United Arab Emirates University, and a prolific Bayesian researcher.
Raqab is Professor of
Statistics at the University of Jordan in Amman.
In 2009, ISBA awarded the prestigious De Groot Prize to Carl Edward
Rasmussen and Christopher K.I. Williams for their book Gaussian
Processes for Machine Learning.
To cap that,
Sandy Zabell described his philosophy of inductive logic from a
Bayesian perspective in The Development of Modern Logic (ed.by
Leila Haaparanta for Oxford University Press). Sandy would’ve
doubtlessly got on well with Richard Price.
In the same year,
Sebastjan Strasek, Stefan Lang and numerous co-authors published a
paper in the Annals of Epidemiology with the
impressively long title,
‘Use of penalized splines
in extended Cox-type hazard regression to flexibly estimate the
effect of time-varying serum uric acid on risk of cancer incidence:
a prospective population study in 78,850 men’.
Thank you, gentlemen. I’ll
keep taking my allopurinol.
Stefan Lang is a
University Professor of Statistics in Innsbruck. His research
interests include Bayesian semi-parametric regression, and
applications in marketing science and development economics.
Rob Kass wrote an excellent analysis of Sir Harold Jeffreys’ legacy
in Statistical Science in 2009, largely by reference to
Jeffreys’ Theory of Probability. Sir Harold often
approximated the posterior distribution by a normal distribution
centred on the maximum likelihood estimate, and he was also a great
fan of Bayes factors. So he weren’t perfect.
Gene Hwang, Jing Qiu and
Zhigen Zhao, of Cornell University and the University of Missouri,
reported their empirical Bayes confidence intervals in 2009 in JRSSB.
Their estimates and intervals smooth and shrink both the means and
the variances in the heterogeneous one-way ANOVA model.
Quite remarkably, the
authors make exactly the same exchangeable prior assumptions for the
treatment means and log-variances that I proposed in my 1973 Ph.D.
thesis and in my 1975 paper in Technometrics. However, the
authors derive some elegant, but approximate double-shrinkage
confidence intervals, empirically estimate the hyperparameters in
quite appealing fashion, and algebraically derive some outstanding
frequency coverage probabilities. Perhaps I should put their
solution into another Appendix. I only wish that I’d had the nous to
derive these very useful results myself.
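The flavour of this double shrinkage is easy to convey with a minimal, means-only sketch (my own hypothetical numbers, not the authors’ intervals): group means are pulled towards the grand mean by an empirically estimated factor.

```python
import statistics

# Hypothetical one-way ANOVA summary: k group means, each with known
# sampling variance s2 (toy numbers for illustration only).
group_means = [4.1, 5.6, 3.8, 6.2, 5.0, 4.7]
s2 = 0.5
k = len(group_means)

grand = statistics.fmean(group_means)
ss = sum((m - grand) ** 2 for m in group_means)

# Empirical Bayes shrinkage factor under an exchangeable normal prior
# (James-Stein style, truncated to [0, 1])
B = min(1.0, (k - 3) * s2 / ss)
shrunk = [grand + (1.0 - B) * (m - grand) for m in group_means]
print(round(B, 3), [round(t, 2) for t in shrunk])
```

Each shrunk estimate lies between its raw group mean and the grand mean; Hwang, Qiu and Zhao’s contribution was to shrink the log-variances too, and to supply intervals with demonstrably good frequency coverage.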
BAYESIAN GENOMICS: In 2009, the much-respected high-flying
professors Matthew Stephens and David Balding of the University of
Chicago and Imperial College London published their paper ‘Bayesian
statistical methods for genetic association studies’ in Nature
Reviews Genetics.
The authors write: Bayesian
methods have recently made great inroads into many areas of science,
and this is now extending to the assessment of association between
genotype and disease or other phenotypes. We review these methods,
focussing on single SNP tests in genome-wide association studies. We
discuss the advantages of the Bayesian approach over classical
(frequency) approaches in this setting and provide tutorials in
basic analysis steps, including practical guidelines for appropriate
prior specification. We demonstrate the use of Bayesian methods for
fine mapping in candidate regions, discuss meta-analysis and provide
guidance for refereeing manuscripts that contain Bayesian analysis.
The approach subsequently reported by the authors depends almost
entirely on the sorts of Bayes factors which I have critiqued during
the course of this concise history, and Matthew and David do make
some attempts to address the paradoxes, prior sensitivity problems,
and poor frequency properties that are usually associated with these
‘measures of evidence’. However, Matthew and David could circumvent
these difficulties by associating a Baskurt-Evans-style Bayesian
p-value with each of their Bayes factors.
Why don’t you try
perturbing your conditional prior distributions under your composite
alternative hypotheses with a tiny blob of probability way out in
the right tail, guys? I think that your Bayes factors, however
fine-tuned, would go bananas. Keep the mass of the blob of
probability constant and zoom it right off towards infinity. I think
that the tails of the ‘outlier prone’ mixture priors you select will
still be too thin to adequately accommodate this.
[Professor Balding kindly responded to these comments in
early January 2014 by partly agreeing with them. He feels that he
and his co-author should have focussed more on an estimation, rather
than a Bayes factor, approach. However, Professor Stephens has advised
us that their Bayes factors would remain stable if we let a blob of
probability in the right tail zoom off to infinity. This surprises
me since their conditional prior distributions under the alternative
hypothesis refer to mixtures of thin-tailed normal distributions.
However, the Bayes factors will anyway be highly dependent upon the
choices of these conditional prior distributions. David advises me
that Bayes factors were first proposed in this context by Peter
Donnelly and his co-authors in their landmark 2007 paper.]
The derivations proposed by
Stephens and Balding do depend heavily on specific genetic
assumptions. Perhaps we should look for something entirely different,
like a genetic-assumption-free direct data analysis of the
statistical observations provided, which could then be parametrized
by a mainstream statistical sampling model.
During 2009, Byron Morgan and his several co-authors published their
book Bayesian Analysis for Population Ecology with CRC Press.
Byron Morgan is
Professor of Applied Statistics at the University of Kent, and his
research interests include Bayesian methods and population dynamics,
stochastic models for molecular biology, and statistical ecology.
In their 2010 article in the Annals of Applied Statistics,
Xia Wang and Dipak Dey applied their Bayesian generalised extreme
value regression methodology for binary response data to an
application to electronic payments system adoption. Xia is an
assistant professor at the University of Cincinnati, with an
interest in applying her Bayesian ideas to genomics and proteomics
data, and spatial and spatio-temporal statistics. She is clearly a
very bright young researcher. In their article in the JSM 2011
Proceedings, she and Nell Sedransk reported their analysis of
some Bayesian models on biomarker discovery using spectral count
data in the label-free environment.
The eminent Indian
statistician Dipak Dey has co-authored several papers with Xia Wang.
He is a Distinguished Professor of Statistics at the University of
Connecticut, and his many influential Bayesian and decision
theoretic publications include his 1998 book Practical
nonparametric and semi-parametric Bayesian statistics.
Xia is in good company.
Her buddy Nell Sedransk is the Associate Director of the U.S.
National Institute of Statistical Sciences, and Professor of
Statistics at North Carolina State University.
During the late 1970s,
Nell and her husband Joe were two of the last faculty members to
leave the once celebrated Department of Statistics at SUNY at
Buffalo, which did not resurrect itself until several years
afterwards. Nell and Joe were always very dynamic, and Nell has made
many Bayesian contributions. Her application areas of interest
include physiology, medicine, multi-observer scoring in the social
sciences, and ethical designs for clinical trials.
Joe Sedransk is
currently Professor of Statistics at Case Western Reserve University in
Cleveland. He has also published a number of important Bayesian
papers. I met Nell and Joe when I gave a seminar at SUNY at Albany
in 1978. They were very kind and hospitable and took me for a drive
in Vermont. I ate a massive T-bone steak for lunch, and felt guilty
afterwards because it cost so much.
In their 2010 articles in JASA, J. McLean Sloughter, Tilmann
Gneiting and Adrian Raftery used Bayesian model averaging and
ensembles for probabilistic wind forecasting, and Lu Ren, David
Dunson, Scott Lindroth and Lawrence Carin attained the fabled
heights by analysing music using dynamic semi-parametric Bayesian
models. Scott Lindroth is Professor of Music and Vice-Provost for
the Arts at Duke University.
Moreover, Kwang Woo Ahn,
Kung-Sik Chan and my old friend Michael Kosoy addressed a problem in
pathogen diversity using their Bayesian inferences for incomplete
multinomial data, and Soma Dhavala and six determined co-authors
performed a gene expression analysis of their bovine Salmonella data
by reference to their Bayesian modeling of MPSS data.
Not to be outdone,
Jonathan Stroud and four quite predictable co-authors developed an
ensemble Kalman filter and smoother for assimilating satellite data,
and Morgan C. Wang, Mike Daniels, Daniel Scharfstein and Susan Land
proposed a Bayesian shrinkage model for incomplete longitudinal data
and applied it to a breast cancer prevention trial.
Also in 2010, Teddy Seidenfeld, Mark Schervish and Jay Kadane
reported their coherent choice functions under uncertainty in Synthese,
and Jay Kadane published ‘Amalgamating Bayesian experts: a sceptical
view’ in Rethinking Risk Measurement and Reporting, Vol. 1.
All deep blue high Bayesian stuff.
Good on you, Jay! I’m a sceptic too.
In the same year, Tom
Fearn et al reported their inverse, classical, and non-parametric
calibrations in a Bayesian framework, in the context of infrared
spectroscopy, in the Journal of Near Infrared Spectroscopy.
Tom is Head of
Statistical Science at UCL. He has worked in many application areas,
including food and agriculture, analytic chemistry, and medicine.
His 1975 Biometrika paper ‘A Bayesian Approach to Growth
Curves’ came out of his UCL Ph.D. thesis. His Ph.D. supervisor was
Dennis Lindley, and his department is still as active as ever,
though in less charismatic ways.
Manuel Wiesenfarth and
Thomas Kneib attempted in 2010 to use Bayesian geoadditive selection
models in JRSSC to correct for non-randomly selected data in
a two-way hierarchy, a most ambitious task worthy of Hercules.
The authors’ selection
equation was formulated as a two-way binary probit model, and the
correlations between the response variables were induced by a latent
Gaussian model representation. Uniform priors were assumed for the
parametric effects, Bayesian P-splines were used to model the
non-parametric effects, a Markov random-field was used to model the
spatial effects, and further prior assumptions were made to
facilitate a full hierarchical Bayesian analysis. The MCMC computed
posterior inferences led to a very interesting analysis of a set of
relief supply data from communities in Pakistan which were affected
by the 2005 earthquake in Azad Jammu Kashmir province. A wonderful
piece of work.
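The latent-Gaussian representation behind such probit selection equations can be sketched very simply. Here is a minimal single-covariate illustration in the style of Albert and Chib’s data augmentation (my own toy simulation, not the authors’ geoadditive machinery): a binary response is modelled as the sign of a latent Gaussian variable, and a Gibbs sampler alternates between the latent variables and the regression coefficient.

```python
import math
import random
from statistics import NormalDist

nd = NormalDist()
rng = random.Random(1)

def trunc_normal(mean, positive):
    """Sample N(mean, 1) truncated to (0, inf) if positive, else (-inf, 0)."""
    p0 = nd.cdf(-mean)                       # P(N(mean, 1) <= 0)
    if positive:
        u = p0 + rng.random() * (1.0 - p0)
    else:
        u = max(rng.random(), 1e-12) * p0
    return mean + nd.inv_cdf(u)

# Simulate single-covariate probit data: P(y = 1 | x) = Phi(beta * x)
beta_true = 1.5
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [1 if rng.gauss(0.0, 1.0) < beta_true * xi else 0 for xi in x]

# Albert-Chib style Gibbs sampler with a flat prior on beta: alternate
# between the truncated-normal latent utilities z and the Gaussian
# full conditional of beta given z.
sxx = sum(xi * xi for xi in x)
beta, draws = 0.0, []
for it in range(600):
    z = [trunc_normal(beta * xi, yi == 1) for xi, yi in zip(x, y)]
    mean_beta = sum(xi * zi for xi, zi in zip(x, z)) / sxx
    beta = rng.gauss(mean_beta, 1.0 / math.sqrt(sxx))
    if it >= 100:
        draws.append(beta)

post_mean = sum(draws) / len(draws)   # posterior mean of beta
print(round(post_mean, 2))
```

Wiesenfarth and Kneib’s model embeds this probit core in a much richer latent Gaussian field, with P-splines for the non-parametric effects and a Markov random field for the spatial ones.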
In the same year, Qi
Long, Rod Little, and Xihong Lin addressed, in their paper in
Applied Statistics, the conceptually formidable task of
estimating ‘causal’ effects in trials involving multi-treatment
arms. They applied their theoretically intense Bayesian procedure to
the analysis of the ‘women take pride’ cardiac bothersome data, and
reported the posterior mean and standard deviation of a variety of
‘causal’ effects, together with the corresponding 95% Bayesian
credible intervals. A difficult article to unravel, but I am sure
that it was all good stuff.
Christophe Andrieu, Arnaud Doucet and Roman Holenstein read a long
invited discussion paper to the Royal Statistical Society in 2009,
and it was published the following year in Series B of the Society’s
journal, at which time the new terminology ‘Particle MCMC’ was thrust like a ray
of beaming sunlight into the Bayesian literature. Particle MCMC can
be used to deal with high dimensionality and complex patterns of
dependence in statistical models. Whether PMCMC works well in
practice, only time will tell. Maybe we need a special journal for
purely computational articles like this.
In their 2010 article
‘Perceiving is believing: A Bayesian approach to explaining the
positive symptoms of schizophrenia’ in Nature Reviews
Neuroscience, Paul Fletcher and Chris Frith of the Universities
of Cambridge and Aarhus write:
Advances in cognitive neuroscience offer us new ways to understand
the symptoms of mental illness by uniting basic neurochemical and
neurophysiological observations with the conscious experiences that
characterize these symptoms. Cognitive theories about the positive
symptoms of schizophrenia (hallucinations and delusions) have
tended to treat perception and belief formation as distinct
processes. However, recent advances in computational neuroscience
have led us to consider the unusual perceptive experiences of
patients and their sometimes bizarre beliefs as part of the same
core abnormality: a disturbance in error-dependent updating of
inferences and beliefs about the world. We suggest that it is
possible to understand these symptoms in terms of a disturbed
hierarchical Bayesian framework, without recourse to separate
considerations of experience and belief.
Thank you, gentlemen.
Perhaps you should consider calcification of the pineal gland as a
contributory factor.
Sonia Petrone and Piero
Veronese co-authored ‘Feller Operators and mixture priors in
Bayesian Non-Parametrics’ in 2010 in Statistica Sinica.
Sonia is a Full
Professor of Statistics at Bocconi University. She will be the next
president of ISBA in 2014. Good luck, Sonia!
In 2010, Murray Aitkin
published his path-breaking book Statistical Inference: An
Integrated Bayesian/Likelihood Approach. Rather than
using Bayes factors or DIC, Murray refers to likelihood ratios as
the primary measure of evidence for statistical model parameters and
for the models themselves. He then uses Bayesian non-informative
inferences to interpret the likelihood ratios.
For further discussion,
see Author’s Notes (below). I have always admired Murray
Aitkin’s ingeniously pragmatic approach to Bayesian inference.
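A minimal Monte Carlo sketch of the flavour of Murray’s idea (my own toy normal-mean example, not taken from his book): treat the likelihood ratio for a point hypothesis as a random quantity under a non-informative posterior, and summarise its posterior distribution rather than any single Bayes factor.

```python
import random

rng = random.Random(0)

def prob_LR_exceeds_one(y, ndraws=100_000):
    """For y ~ N(theta, 1) with a flat prior, the posterior is
    theta ~ N(y, 1). The likelihood ratio for H0: theta = 0 is
    LR(theta) = p(y | 0) / p(y | theta) = exp(((y - theta)**2 - y**2) / 2),
    so LR > 1 exactly when (y - theta)**2 > y**2. Return the posterior
    probability of that event, estimated by Monte Carlo."""
    hits = sum(1 for _ in range(ndraws)
               if (y - rng.gauss(y, 1.0)) ** 2 > y * y)
    return hits / ndraws

p_null = prob_LR_exceeds_one(0.0)   # theta = 0 never fits worse: prob ~ 1
p_far = prob_LR_exceeds_one(3.0)    # theta = 0 rarely competitive: prob ~ 0.003
print(p_null, p_far)
```

The posterior distribution of the likelihood ratio, rather than a prior-sensitive Bayes factor, is what carries the evidence in this formulation.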
Alicia Carriquiry and three
courageous co-authors reported their Bayesian assessment of the
effect of highway passes on crashes and crash rates in 2011 in the
Journal of Safety Research.
Alicia is Professor of
Statistics and Associate Provost at Iowa State. She served as
president of ISBA in 2001.
In 2011, Jay Kadane was
awarded the much-sought-after De Groot prize by ISBA for his
eloquently written book Principles of Uncertainty.
Daniel Thorburn published his conference paper ‘Bayesian probability and methods’ in
the Proceedings of the first Africa-Sweden Conference in
Mathematics in the same year.
Thorburn is Professor of
Statistics at the University of Stockholm, and he has, with Håvard
Rue, done much to foster the Bayesian paradigm in Scandinavia.
Rob Kass published an
invited discussion paper in Statistical Science in 2011
entitled ‘Statistical Inference: The Big Picture’.
Kass wrote to the effect that:
Statistics has moved
beyond the frequentist-Bayesian controversy of the past. Instead a
philosophy compatible with statistical practice which I refer to
here as ‘statistical pragmatism’ serves as a foundation for
inference. Statistical pragmatism is inclusive and emphasises the
assumptions that connect statistical models with observed data.
In his diagram of
the ‘big picture’, Rob connects the data of the real world with the
interactions between the scientific models and statistical models of
the theoretical world, and indicates that the data and the
interaction between the scientific and statistical models together
imply the statistical and scientific conclusions.
That’s all too true,
Professor Kass. You’re teaching some of us grannies how to suck eggs.
But don’t forget to emphasise that the statistician should be
interacting with the scientific expert as often as he can during the
course of the study.
In the same year, William
Kleiber, Adrian Raftery and Tilmann Gneiting co-authored a paper in
JASA entitled ‘Geostatistical model averaging for locally
calibrated probabilistic quantitative precipitation forecasting’. I
am sure that they took Rob Kass’s notion of statistical pragmatism
to heart.
In 2011, Jennifer Hill published her paper
‘Bayesian Nonparametric Modeling for Causal Inference’ in the
Journal of Computational and Graphical Statistics.
Jennifer's very insightful second line of work pursues
strategies for exploring the impact of violations of typical
assumptions that require that all possibly confounding variables
have been measured. She is a very dynamic Associate Professor of
Applied Statistics at NYU.
Brian Reich, Montserrat
Fuentes and David Dunson proposed some brand new methods for
Bayesian spatial quantile regression in their outstanding 2011 lead
article in JASA.
As they were a touch
keener on money supply, Alejandro Cruz-Marcelo, Katherine Ensor and
Gary Rosner investigated corporate bonds by using a semi-parametric
hierarchical model to estimate term structure.
David Dunson retaliated
by investigating non-parametric Bayes stochastically ordered latent
class models with Hong-Xia Yang and Sean O’Brien.
In a rare, high quality,
single-authored paper, Yihua Zhao analysed high-throughput assays by
reference to posterior probabilities and expected rates of discovery
in multiple hypothesis testing.
And Shane Jensen and
Stephen Shore rolled out the barrel by using semi-parametric
Bayesian modeling to investigate volatility heterogeneity.
Other highlights of 2011
included the tutorial Bayesian Non-Parametrics by Peter
Orbanz of Columbia University and Yee Whye Teh of Oxford University,
which was taught in the context of Machine Learning,
a data-augmentation approach in JASA by J. Ghosh and Merlise
Clyde employing Rao-Blackwellization for variable selection and
model averaging in linear and binary regression,
a paper in Demography
by Adrian Raftery and six co-authors entitled ‘Probabilistic
Projections of the Total Fertility Rate for All Countries’,
an article in the
Proceedings of Maximum Entropy by Jonathan Botts and Ning Xiang
on Bayesian inference for acoustic impedance,
and a splendid paper in
JASA by Qing Zhou on multi-domain sampling with application to
the structural inference of Bayesian networks.
George Casella (1951-2012), a leading figure in the field of
Statistics, passed away in June 2012, after a nine-year battle with
multiple myeloma. He was 61.
to Ed George and Christian Robert,
George Casella’s influence on research and education in
Statistics was broad and profound. He published over 200 articles,
co-authored nine books, and mentored 48 MS and Ph.D. students. His
publications included high impact contributions to Bayesian
analysis, clustering, confidence estimation, empirical Bayes,
frequentist decision theory, hypothesis testing, model selection,
Monte Carlo methods, and ridge regression. Of his books, Statistical
Inference (with Roger Berger) became the introduction of choice to
mathematical statistics for vast numbers of graduate students; this
is certainly the book that had the most impact on the community at large.
In 1996, George joined a legendary figure of Statistics, Erich
Lehmann, to write a thorough revision of the already classical
Theory of Point Estimation which Lehmann had written himself in
1983. This collaboration resulted in a more modern, broader, and
more profound book that continues to be a key reference for courses
in mathematical statistics.
An ISI highly cited researcher, George Casella was elected a
Foreign Member of the Spanish Royal Academy of Sciences, selected as
a Medallion lecturer for the IMS, and received the Distinguished
Alumnus Award from Purdue University. His laughter remains with us.
In 2012, David Rios Insua, Fabrizio Ruggeri and Michael Wiper
published their splendid volume on Bayesian Analysis of
Stochastic Process Models. This follows Rios Insua’s and
Ruggeri’s publication in 2000 of their co-edited collection of
lecture notes, Robust Bayesian Analysis.
Michael Wiper is on the faculty of the Carlos III
University of Madrid. His fields of interest include Bayesian
Statistics, inference for stochastic processes, and software
reliability.
Fabrizio Ruggeri is a research director at the Italian National
Research Council’s Institute of Mathematical Applications and
Information Technology in Milan. He was an outstanding president of
ISBA in 2012, and his areas of Bayesian application include
healthcare, quality and reliability.
David Rios Insua is full professor at Rey Juan Carlos University
in Madrid. He is the son and disciple of Sixto Rios (1913-2008), the
father of Spanish Statistics. Sixto founded the Decision Analysis
group of the Spanish Royal Academy of Sciences during the 1960s, and
David is the youngest Fellow of the same illustrious academy. He has
applied his Bayesian methodology to neuronal networks, adversarial
risk analysis, counter-terrorism, and many other areas.
Jesús Palomo, who was
one of Rios Insua’s and Ruggeri’s Ph.D. students, was a cited
finalist for ISBA’s Savage Prize in 2004. His thesis title was
Bayesian methods in bidding processes.
In his JASA paper of
2012, Tristan Zajonc reported his Bayesian inferences for dynamic
treatment regimes, and used them to improve mobility, equity and
efficiency in student tracking.
In the same issue,
Alexandre Rodrigues and Peter Diggle used Bayesian estimation and
prediction for low-rank doubly stochastic log-Gaussian Poisson
process models, with fascinating applications in criminal surveillance.
If that wasn’t
enough to put me into a neurosis, Lane Burgette and Jerome Reiter
described some non-parametric Bayesian imputation techniques when
some data are missing due to the mid-study switching of measurement
methods, and, to cap that, the ubiquitous Valen Johnson and David
Rossell investigated Bayesian model selection criteria in
high-dimensional settings.
In their 2012 JASA
discussion paper, the remarkable Bayesian quintet consisting of
Ioanna Manolopoulou, Melanie Matheu, Michael Cahalan, Mike West and
Thomas Kepler employed Bayesian spatio-dynamic modeling in cell
motility studies, in reference to nonlinear taxic fields guiding the
immune response. Meanwhile, the hyperactive quartet of Laura Hatfield,
Mark Boye, Michelle Hackshaw and Bradley Carlin used multi-level
models to predict survival times and longitudinal patient-reported
outcomes with many zeros.
And in the lead article
of the 500th issue of the Journal of the American
Statistical Association, the Famous Five, namely William Astle,
Maria De Iorio, Sylvia Richardson, David Stephens and Timothy Ebbels,
investigated their Bayesian model of NMR spectra for the
deconvolution and quantification of metabolites in complex
biological mixtures.
Also in the celebrated
500th issue, Michelle Danaher, Anindya Roy, Zhen Chen, Sunni
Mumford and Enrique Schisterman analysed the BioCycle study using
Minkowski-Weyl priors for models with parameter constraints. A
vintage year for the connoisseurs!
During August 2012, Tony O’Hagan led an interdisciplinary ISBA
online debate entitled Higgs Boson - Digest and Discussion
about the statistics relating to the Higgs Boson and the Large
Hadron Collider, which concerned the physicists’ predetermined
standards for concluding that a particle resembles the elusive
Boson. The physicists require a test statistic to be at least five
standard errors from the null hypothesis, but didn’t explain how to
interpret this, e.g. in relation to practical significance when the
confidence interval is very narrow. Some of the participants in the
debate ignored the possibility that the test statistic might not be
approximately normally distributed. While the physicists take their
observed counts to be Poisson distributed, this assumption could
itself be inaccurate, since the counts could be affected by
overdispersion. However, a number of imaginative solutions were
proposed by the Bayesian participants.
Simulated particle traces from an LHC collision in which a Higgs
Boson is produced.
Image Credit: Lucas Taylor
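As a point of reference for that debate, the five-sigma rule can be translated into a tail probability under the very normality assumption that some participants questioned. A minimal sketch in Python (my own illustration, not part of the ISBA discussion):

```python
from math import erfc, sqrt

def one_sided_p(z):
    """One-sided tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2.0))

# The five-standard-error discovery threshold corresponds to a
# one-sided p-value of roughly 2.9e-7 -- but only IF the test
# statistic really is normally distributed. Overdispersed Poisson
# counts would make the true tail probability larger than this
# nominal figure, which was precisely the Bayesians' worry.
p5 = one_sided_p(5.0)
```

The point of the sketch is that the headline "five sigma" number is only as trustworthy as the distributional assumption behind it.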
Other highlights of the year
2012 included the following papers in the Royal Statistical
Society journals: Species non-exchangeability
in probabilistic ecotoxicological risk assessment, by Peter Craig,
Graeme Hickey, Robert Luttik and Andy Hart,
Space-time modelling of
coupled spatiotemporal environmental variables, by Luigi Ippoliti,
Pasquale Valentini and Dani Gamerman,
Bayesian L-optimal exact
design for experiments with biological kinetic models, by Steven
Gilmour and Luzia Trinca
Combining outputs from the
North American Regional Climate Change Assessment Program by using a
Bayesian hierarchical model, by Emily Kang, Noel Cressie and Stephan Sain,
Variable selection for high
dimensional Bayesian density estimation: application to human exposure
simulation, by Brian Reich and his four worthy but underexposed
co-authors.
Thomas Bayes was the real
winner in the US presidential elections on 6th November 2012,
according to a message to ISBA members from Charles Hogg. In 2010,
Charles had published a magnificent discussion paper in Bayesian
Analysis with Jay Kadane, Jong Soo Lee and Sara Majetich
concerning their inferential Bayesian error analysis for small angle
neutron scattering data sets.
As reported by Charles
Hogg, Nate Silver constructed a Bayesian model in 2008 to forecast
the US general election results. Silver won fame for correctly
predicting 49 of the 50 States, as well as every Senate race. That
brought him a New York Times column and a much higher profile.
In 2012, Nate’s
continuing predictions that Obama would win earned him a curious
backlash among pundits. Few of the criticisms had any merit;
most were mathematically illiterate, indignantly mocking the idea
that the race was anything other than a toss-up. Nevertheless, Nate
confounded his critics by correctly predicting every single state.
Charles Hogg was
quick to advise us that Nate did strike lucky in Florida. Nate
‘called’ this state with a 50.3% Bayesian probability, essentially
the proverbial coin-toss, a lucky gold coin perhaps. Way to go, Mr.
Silver! Did you use a Jeffreys prior or a conjugate one?
In his 2013 Kindle book
The Signal and the Noise, which is about Bayesian prediction
in general, Nate Silver assigned a prior probability of 1/20000 to
the event that at least one plane is intentionally crashed into a
Manhattan skyscraper on a given day. He then used Bayes’ theorem to
update his prior probability to a probability of 0.385 that a single
plane crash is part of a terrorist attack, and then to a probability
of 99.99% that two plane crashes amount to a terrorist attack.
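Silver’s figures follow from a direct application of Bayes’ theorem. A minimal sketch reproducing them, where the accidental-crash likelihood of roughly 1 in 12,500 per day is my assumption supplied to make the arithmetic work, not a value quoted above:

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior probability of hypothesis H after seeing the evidence."""
    num = prior * p_evidence_given_h
    return num / (num + (1.0 - prior) * p_evidence_given_not_h)

prior = 1.0 / 20000.0           # Silver's prior for a deliberate attack on a given day
p_hit_attack = 1.0              # a plane certainly hits if an attack is under way
p_hit_accident = 1.0 / 12500.0  # assumed daily accidental-crash probability (my assumption)

post1 = bayes_update(prior, p_hit_attack, p_hit_accident)   # roughly 0.385 after one crash
post2 = bayes_update(post1, p_hit_attack, p_hit_accident)   # roughly 0.9999 after a second
```

Note how the second update takes the first posterior as its new prior: that chaining is what drives the probability from a coin-toss to near certainty.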
Maybe Nate should rename
his book The Sound and the Fury, though a guy called Bill
Faulkner once used a title like that. Calling it The Power and
the Glory would, nowadays, be much too naff.
In 2013, Nate left the New York Times for a role at ESPN and ABC News.
Perhaps he will provide us with a Bayesian commentary on the next
Superbowl game. I have a prior probability of 0.2179 that the Green
Bay Packers will win, but a sneaking suspicion that the Cowboys will
prevail.
In the June 2013 issue of
Scientific American, Hans Christian von Baeyer debates whether
Quantum Bayesianism can fix the paradoxes of Quantum Mechanics. The
article’s teaser reads:
A new version of quantum
theory sweeps away the bizarre paradoxes of the microscopic world.
The cost? Quantum
information only exists in your imagination.
That’s a great application
of the Bayesian paradigm, Mr. von Baeyer, or at least I imagine so.
The 2013 volume Bayesian Theory and Applications, edited by
Paul Damien, Petros Dellaportas, Nicholas Polson and David Stephens,
contains 33 exciting papers and includes outstanding sections on
dynamic models and exchangeability.
Andrew Lawson’s 2013
book Bayesian Disease Mapping: hierarchical modeling in spatial
epidemiology highlighted some of the recent applications of the
2008 Rue-Martino computer package for conditional Laplacian approximations (INLA).
Various applications of
INLA to Bayesian non-parametric phylodynamics are described by Julia
Palacios and Vladimir Minin in the Proceedings of the 28th
Conference on Uncertainty in Artificial Intelligence.
In a 2013 issue of
Biometrics & Biostatistics, Xiao-Feng Wang of the Cleveland
Clinic Lerner Research Institute describes some applications of INLA
to Bayesian non-parametric regression and density estimation.
INLA has arrived and the
pendulum is beginning to swing! Maybe right back to the Halcyon days of the
mathematically Bayesian theoretical research era of the 1970s and
1980s, no less, when Bayesians needed to know all sorts of stuff.
In his 2013 YouTube video What are Bayesian Methods?
Professor Simon French said that after he took his first statistics
course in 1970 he thought that ANOVA was a Russian mathematician.
However, his first course in Bayesian Statistics changed his life,
and he subsequently learnt much more about the paradigm from Adrian Smith.
Simon French is Director
of the Risk Initiative and Statistical Consultancy Unit at the
University of Warwick. His 1986 text Decision Theory was
followed by his book Statistical Decision Theory with David
Rios Insua. However, Simon’s work has now become generally more
applied. He is looking at ways of supporting real-life decision
makers facing major strategic and risk issues.
In their JASA 2013 articles, Roee Gutman, Christopher
Afendulis and Alan Zaslavsky proposed a Bayesian file-linking
procedure for analysing end-of-life costs, and Man-Wai Ho, Wanzhu Tu,
Pulak Ghosh and Ram Tiwari performed a nested Dirichlet process
analysis of cluster randomized trial data for geriatric care.
Riten Mitra, Peter
Müller, Shoudan Liang, Lu Yue and Yuan Ji investigated a Bayesian
graphical model for ChIP-Seq data on histone modifications, Drew
Linzer recounted the dynamic Bayesian forecasting of U.S.
presidential elections, and Curtis Storlie and his five reliable
co-authors reported their Bayesian reliability analysis of
neutron-induced errors in high performance computing hardware;
Yueqing Wang, Xin
Jiang, Bin Yu and Ming Jiang threw aside their parasols and reported
their hierarchical approach for aerosol retrieval using MISR data,
Josue Martinez, Kirsten Bohn, Ray Carroll and Jeffrey Morris put an
end to their idle chatter and described their study of Mexican
free-tailed bat chirp syllables, which employed Bayesian functional
mixed models for non-stationary acoustic time series, while Juhee
Lee, Peter Müller and Yuan Ji chummed up and described their
nonparametric Bayesian model for local clustering with application
to proteomics. Meanwhile, Michael Goldstein and Leanna House
exchanged vows and reported their
second-order exchangeability analysis for multimodel ensembles, and
Francesco Stingo, Michele Guindani, Marina Vannucci and Vince
Calhoun presented the best possible image while describing their
integrative modeling approach to imaging genetics.
In March 2013, the
entire statistical world mourned the death of George Edward Pelham
Box F.R.S. (1919-2013), who passed away in Madison, Wisconsin at the
ripe old age of ninety-three, in the arms of his third wife Claire.
As a wit, a kind man, and a statistician, he and his ‘Mr.
Regression’ buddy Norman Draper have fostered many successful
careers in American industry and academia. Pel was a son-in-law of
Sir Ronald Fisher and the father, with his second wife Joan Fisher
Box, of two of Fisher’s grandchildren. Pel’s life, works and
immortality should be celebrated by all Bayesians, because of the
multitudinous ways in which his virtues have enhanced the diversity,
richness and scientific reputation of our paradigm.
Born in Gravesend, Box
was the co-inventor of the Box-Cox and Box-Muller transformations
and the Box-Pierce and Ljung-Box tests, a one-time army sergeant who
worked on the Second World War defences against Nazi poisonous gases, and
an erstwhile scientist with Imperial Chemical Industries. His
much-publicised first divorce transformed his career, and his older
son Simon Box also survives him.
Bisgaard making a presentation to George Box
I envision Pel sitting
there watching us from the twelfth floor of the crimson cube of
Heaven, kicking his dog while the Archangels Stephen One and Stephen
Two flutter to his bidding, and with a fine-tuned secromobile
connection to the, as ever obliging, Chairman of Statistics in
hometown Madison. Even in his retirement and maybe in death, George
Box was the quintessential ripple from above.
On 27th August 2013, Alan Gelfand of Duke University
received a Distinguished Achievement Medal from the ASA Section on
Statistics and the Environment at the Joint Statistical Meetings in
Montreal. The award recognizes his seminal work and leadership in
Bayesian spatial statistics, in particular hierarchical modeling
with applications in the natural and environmental sciences. Well
done, Alan! You’re not just an MCMC whiz kid.
I once nodded off during
one of Alan’s seminars and missed out on the beef. When I woke up, I
asked an utterly inane question. Ah well. You win some and you lose
some. Other highlights of the year 2013 included the article in Applied
Statistics by David Lunn, Jessica Barrett, Michael Sweeting and
Simon Thompson of the MRC Biostatistics Unit in Cambridge on fully
Bayesian hierarchical modelling with applications to meta-analysis.
The authors applied their multi-stage generalised linear models to
data sets concerning pre-eclampsia and abdominal aortic aneurysm.
Also, a paper in
Statistics in Society by Susan Paddock and Terrance Savitsky of
the RAND Corporation, Santa Monica on the Bayesian hierarchical
semi-parametric modelling of longitudinal post-treatment outcomes
from open enrolment therapy groups. The authors apply their
methodology to a case-study which compares the post-treatment
depressive symptom scores for patients on a ‘building recovery’
program with the scores for usual care clients.
After reading the
large number of Bayesian papers published in the March 2013 issue of
JASA, in particular in the Applications and Case Studies
section, Steve Scott declared, on Google,
It clearly goes to show
that Bayes has gone a long way from the intellectually appealing but
too hard to implement approach to the approach that many
practitioners now feel to be both natural and easy to use.
RESPONSES TO SOME OF THE ISSUES RAISED IN DENNIS LINDLEY'S 2013 INTERVIEW
During the Fall of 2013, I participated in an on-line ISBA debate
concerning some of the issues raised in Dennis Lindley’s 2013 YouTube
interview by Tony O’Hagan, who phrased some of his rather probing
questions quite craftily. I was somewhat concerned when Dennis, now
in his 90th year, confirmed his diehard opposition to
improper priors, criticised his own student José Bernardo, and
confirmed that he’d encouraged Florence David to leave UCL in 1967.
He, moreover, advised Tony that he’d wanted two colleagues at UCL to
retire early because they weren’t ‘sufficiently Bayesian’. This led
me to wonder how many other statistical (e.g. applied Bayesian)
careers Dennis had negatively influenced due to his individualistic
opinions about his subject. And whatever did happen to Dennis’s
fabled velvet glove?
I was also concerned
about Dennis’s apparent long-standing naivety in relation to the
Axioms of Coherence and Expected Utility. He was still maintaining
that only very simple sets of axioms are needed to imply their
required objective, i.e. that you should be ‘coherent’ and behave
like a Bayesian, and I found it difficult to decide whether he was
‘pulling the wool’, or unfortunately misguided, or, for some bizarre
reason, perfectly correct. I was also more than a little concerned
about Dennis’s rather bland adherence to the Likelihood Principle
(LP), despite all the ongoing controversy regarding Birnbaum’s 1962 proof.
I wonder what the
unassuming Uruguayan mathematician Cesareo Villegas, who refuted the
De Finetti axiom system as long ago as 1964, would have thought
about all of this. He may well have been too tactful to say so.
This is what I think,
albeit rather cryptically and with apologies to George Orwell and
John Steinbeck:
‘How many fingers?’ asked Emperor Dennis O’Brien, stroking his
schlosshund Bruno’s furry chest.
‘A billion billion,
Your Imperial Majesty, just like the neo-Savageous axiom system’,
answered bumptious young Winston, turning green with awe. His
forearms seemed to wither like a frog’s, as if he, like the sadly
exiled Tom Winston, was way East of Eden, in Kate’s place in
Salinas, perhaps.
‘There are only
three, you blithering nincompoop’, raged O’Brien. ‘You don’t even
understand simple arithmetic. You’re incoherent!’
‘I’m sorry, sorry,
Your Munificence,’ wailed Winston, as the Royal Acolytes clapped him
in red-hot irons.