
In the aftermath of Dennis Lindley's 2013 YouTube video interview by Tony O'Hagan, it became abundantly apparent to me, following detailed correspondence with Peter Wakker, Deborah Mayo and Michael Evans, that Bayesian Statistics can no longer be justifiably motivated by the Axioms of Subjective Probability, or the Axioms of Utility and Modified Utility, or by the way the Sufficiency and Conditionality Principles had previously been thought to imply our all-sacred Likelihood Principle. This puts our paradigm into a totally different, and quite refreshing, ballpark. Maybe axiomatized coherence, incoherence and sure-loser principles will soon be relics, strange artefacts of the past. Deck the halls with balls of holly!



How
many fingers? 

The wide diversity of novel Bayesian theoretical methods developed, and practical situations analysed, during the current century is well represented by the eleven published papers which have received the Mitchell Prize (awarded by ISBA) since the turn of the century. They include:
2002 (Genetics): Jonathan Pritchard, Matthew Stephens, and Peter Donnelly, 'Inference of Population Structure using Multilocus Genotype Data'
2003 (Medicine): Jeff Morris, Marina Vannucci, Phil Brown, and Ray Carroll, 'Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis'
2006 (Biology): Ben Redelings and Marc Suchard, 'Joint Bayesian Estimation of Alignment and Phylogeny'
2007 (Sociology): Tian Zheng, Matthew Salganik, and Andrew Gelman, 'How Many People Do You Know in Prison? Using Overdispersion in Count Data to Estimate Social Structure in Networks'
2010 (Astronomy): Ian Vernon, Michael Goldstein, and Richard Bower, 'Galaxy Formation: A Bayesian Uncertainty Analysis'
The Mitchell Prize is
named after Toby J. Mitchell, and it was established by ISBA after
his tragic death from leukaemia in 1993. He would have been glad to
have inspired so much brilliantly useful research.
Toby was a dedicated Bayesian, and he is well known for his philosophy:

The greater the
amount of information, the less you actually know. 

Toby was a Senior
Research Staff Member at Oak Ridge National Laboratory in Tennessee.
He won the George Snedecor Award in 1978 (with Bruce Turnbull) and
made incisive contributions to Statistics, especially in biometry
and engineering applications. He was a magnificent collaborator and
a very insightful scientist.
Norman Draper took one
of the last pictures of Toby Mitchell, and he has kindly sent it to
me. 



Tom's colleague Toby J. Mitchell (d. 1993). Tom and Toby taught Statistics courses together at US Military Bases and the space station in Huntsville, Alabama, and visited Washington D.C. from Adelphi, Maryland in 1980.

See Author's Notes (below) for details of several of the Ph.D. theses whose authors have been awarded one of ISBA's Savage Prizes during the current century. We have clearly been blessed with some talented up-and-coming Bayesian researchers.
In the remainder of Ch. 7, I will critique some of the Bayesian literature in greater detail. This is to give readers a better understanding of the quality and relevance of 21st-century research, and of the degree of competence, applied nous, and originality of our researchers.




George
Pickett 

In 1863, Major General George Pickett was one of three generals who led the assault under Longstreet at the Battle of Gettysburg, as the Confederate Army charged towards the Union lines in all its Napoleonesque glory. In the year 2000, Dennis Lindley came out of retirement at age 77 to confront a full meeting of the Royal Statistical Society with his magnificently written invited discussion paper 'The Philosophy of Statistics'. While Pickett's charge ended in utter disaster for the Dixies, Dennis would, in 2002, be awarded the Society's Guy Medal in Gold for his lifetime of endeavours, having retired from UCL a quarter of a century earlier.
Here is a brief précis
of the ideas and opinions expressed in the first seven sections of
Dennis’s paper:
Summary: Statistics
is the study of uncertainty. Statistical inference should be based
on probability alone. Progress is therefore dependent upon the
construction of a probability model.
1. Introduction:
Uncertainty can only be measured by probability. The likelihood
principle follows from the basic role played by probability. I
(Dennis) modify my previous opinions by saying that more emphasis
should be placed on model construction than on formal inference.
Formal inference is a systematic procedure within the calculus of
probability. Model construction cannot be so systematic.
2. Statistics: Statisticians should be the experts at handling uncertainty. We do not study the mechanism of rain, only whether it will rain. We are, as practitioners, therefore dependent on others. We will suffer, even as theoreticians, if we remain too divorced from the science. We define 'the client' as the person, e.g. a scientist or lawyer, who encounters uncertainty in their field of study. Statistics is ordinarily associated with (numerical) data. It is the link between uncertainty, or variability, in the data and that in the topic itself that has occupied statisticians. The passage from process (sampling model) to data is clear. It is when we attempt to go from data to process that difficulties occur.
3. Uncertainty: It is only by associating numbers with any scientific concept that the concept can be understood. Statisticians consequently need to measure uncertainty by numbers, in a way that allows them to be combined. It is proposed to measure your uncertainty about an event happening by comparison with a standard, e.g. one relating to balls drawn from an urn.
4. Uncertainty and probability: The addition and product rules, together with the convexity rule (a probability always lies in the unit interval), are the defining rules of probability, at least for a finite number of events. The conclusion is that measurements of uncertainty can be described by the calculus of probability. The uncertainty related to bets placed on gambles, and the use of odds combined with the impossibility of a Dutch book, lead back to probability as before.
A further approach, due to De Finetti, extracts a penalty score from you after you have stated your uncertainty about an event, depending on whether the event is subsequently shown to be true or false. This [frequency!] approach provides an empirical check on the quality of your probability assessments, and hence a test of your abilities as a statistician. The gathering and publishing of data remains an essential part of statistics today.
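De Finetti's penalty-score device can be illustrated with the quadratic (Brier) score. The forecasters, probabilities and outcomes below are hypothetical, purely for illustration; the précis above does not specify which scoring rule Lindley had in mind.

```python
# A minimal sketch of De Finetti's penalty-score idea using the quadratic
# (Brier) score: after you state probability p for an event, you are
# penalised (1 - p)^2 if the event occurs and p^2 if it does not.
# All forecasts and outcomes below are illustrative, not from the source.

def brier_penalty(p, occurred):
    """Quadratic penalty for stating probability p of an event."""
    return (1.0 - p) ** 2 if occurred else p ** 2

# Two hypothetical forecasters assessed on the same three events.
outcomes = [True, False, True]
cautious = [0.6, 0.4, 0.6]    # hedged assessments
confident = [0.9, 0.1, 0.9]   # sharper assessments

for name, probs in [("cautious", cautious), ("confident", confident)]:
    total = sum(brier_penalty(p, o) for p, o in zip(probs, outcomes))
    print(name, round(total, 2))
```

The sharper forecaster incurs the smaller total penalty here because the events happened to fall her way; over many events the score empirically checks calibration, which is exactly the point of the device.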
5. Probability: Measurements of uncertainty must obey the rules of the probability calculus. This is intended to be self-evident, such that you would feel foolish if you were caught violating it. Axioms (which refer to a 'more likely' binary relation) all lead to probability being the only satisfactory explanation of uncertainty, though some writers have considered the axioms carefully and produced objections.
[Lindley does not cite the 1986 paper by Fishburn or the key source references quoted by Fishburn.]
It is therefore convenient to state formally the rules of the [conditional] probability calculus. They are Rule 1 (convexity), Rule 2 (addition; Cromwell's rule), and Rule 3 (multiplication). It is easy to extend the addition rule, for two events, to a finite number of events. I prefer to use De Finetti's Rule 4 (conglomerability) to justify addition for an infinite sequence of events. Conglomerability is in the spirit of a class of rules known as 'sure things'. It is 'easy to verify' that the [horribly complicated] Rule 4 follows from Rules 1-3 when the partition is finite. [Is it?]
6. Significance and Confidence: Statisticians DO use measures of uncertainty that do NOT combine according to the rules of the probability calculus. Let H denote a (null) hypothesis. Then a statistician may advise the client to use a significance level [Dennis means a p-value or significance probability]; that is, assuming that H is true, the probability of the observed, or more extreme, data is calculated. This usage flies in the face of the arguments above, which assert that uncertainty about H needs to be measured as a probability of H. This is an example of the prosecutor's fallacy.
If a parameter θ is
uncertain, a statistician will typically recommend a confidence
interval. The development above based on measured uncertainty will
use a probability density for θ, and perhaps an interval of that
density. Again we have a contrast similar to the prosecutor’s
fallacy.
Statisticians have paid
inadequate attention to the relationships between the statements
that they make and the sample sizes on which they are based.
How do you combine
several data sets concerning the same hypothesis, each with its own
significance level?
The conclusion that
probability is the only measure of uncertainty is not just a pat on
the back but strikes at many of the basic statistical activities.
7. Inference: Let the parameter be (θ, α), where α is a nuisance parameter, and let x be the observation. The uncertainty in x needs to be described probabilistically. Let p(x | θ, α) denote the probability of x given θ and α, and describe the uncertainty in the parameter by a probability p(θ, α). The revised uncertainty in the light of the data can then be evaluated from the probability calculus as the probability p(θ, α | x), with the constant of proportionality dependent upon x, not the parameters. Since α is not of interest, it can be eliminated by the probability calculus to give p(θ | x), which evaluates the revised uncertainty in θ. This unfortunate terminology is accompanied by some other terminology which is even worse; p(θ) is often called the prior distribution, and p(θ | x) the posterior.
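In symbols, the prior-to-posterior step and the elimination of the nuisance parameter that Dennis describes are just

```latex
p(\theta, \alpha \mid x) \;\propto\; p(x \mid \theta, \alpha)\, p(\theta, \alpha),
\qquad
p(\theta \mid x) \;=\; \int p(\theta, \alpha \mid x)\, \mathrm{d}\alpha,
```

with the normalising constant depending on x alone, and not on the parameters.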
The main protest against
the Bayesian position is that the inference is considered within the
probability calculus.
The remaining
sections of Lindley’s paper, which contain further quite influential
content, were entitled
8. Subjectivity; 9. Models; 10. Data Analysis; 11. Models again; 12. Optimality; 13. The likelihood principle; 14. Frequentist concepts; 15. Decision Analysis; 16. Likelihood principle (again); 17. Risk; 18. Science; 19. Criminal Law; 20. Conclusions.
During his conclusions, Dennis states,

The philosophy here has three fundamental tenets: first, that uncertainties should be described by probabilities; second, that consequences should have their merits described by utilities; third, that the optimal decision procedure combines the probabilities and utilities by calculating expected utility and then maximising that.
These tenets were of course adopted by Raiffa and Schlaifer as long ago as 1961, and embraced with open arms by Dennis since about 1955. Dennis's ideas on utility refer to Leonard 'Jimmie' Savage's 1954 balls-up of an entreaty, The Foundations of Statistics.
Dennis’s suggestions on
modeling pay lip service to Box’s celebrated 1980 paper [67]. See
also my 1979 Valencia 1 contribution [16] where I emphasise the
necessity for the statistician to interact with the client in
relation to the scientific background of the data. The discussion of
criminal law in Dennis’s section 19 quite surprisingly and
unequivocally advocates the Bayes factor approach described in 1991
in Colin Aitken’s and D.A. Stoney’s book The Use of Statistics in
Forensic Science.
During the discussion of Dennis's paper the issue was raised as to whether he was addressing 'statistical probability' as opposed to real applied statistics. It would certainly have been good to see some more numbers. While of historical significance, Dennis's treatise does not, as might instead have been anticipated, spell out a new and novel position for the twenty-first century, but rather encourages us to refer back to long-revamped philosophies of the past.
In the same year as Lindley's historic presentation to the Royal Statistical Society, Fernando Quintana, one of our happy Bayesians from Chile, and Michael Newton wrote a much more important paper in JASA which concerned computational aspects of semiparametric Bayesian analysis, together with applications to the modeling of multiple binary sequences. The contrast, in terms of statistical quality and scientific impact, with Lindley's paper could not have been greater.



Fernando Quintana 

Two interesting applications
in archaeology were published during the year 2000.
Yanan Fan and Stephen Brooks, then working at the University of Bristol, published a fascinating analysis in the Statistician of sets of prehistoric corbelled tomb data which were collected from a variety of sites around Europe. They investigated how earlier analyses of tomb data, e.g. by Caitlin Buck and her co-workers, where structural changes were anticipated in the shape of the tomb at various depths, could be extended and improved by considering a wider range of models. The authors also investigated the extent to which these analyses may be useful in addressing questions concerning the origin of tomb-building technologies, particularly in distinguishing between corbelled domes built by different civilisations, as well as the processes involved in their construction.
Fan and Brooks found no evidence to dispute or support a previous claim that a β slope parameter could be used to distinguish between domes of different origins through a comparison of their shape. By considering a range of sampling models, they showed that previous analyses may have been misleading in this regard.
The authors analysed radius data taken above the lintel of the tomb, firstly by considering the Cavanagh-Laxton formulation, which takes the log-radii to depend upon the corresponding depths in a three-parameter nonlinear regression model with i.i.d. normal error terms. They also investigated change-point modifications to the Cavanagh-Laxton model which permitted the two parameters within its regression component to change at different depths.
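The change-point idea can be sketched numerically. This is not the authors' code: the power-law regression form, the parameter values, and the simulated data below are all illustrative assumptions, and the actual Cavanagh-Laxton formulation may differ.

```python
# Illustrative sketch of fitting a change-point in a log-radius vs depth
# regression. The form log r = a - b * depth**c and all numbers are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
depths = np.linspace(0.1, 5.0, 60)
change_at = 2.5          # assumed true change-point depth
c = 1.2                  # shape exponent, held fixed for simplicity

# The slope parameter b changes at the change-point; the intercept a does not.
b_true = np.where(depths < change_at, 0.3, 0.45)
log_r = 2.0 - b_true * depths ** c + rng.normal(scale=0.05, size=depths.size)

def segment_sse(mask):
    """Least-squares fit of (a, b) for fixed c on one segment; returns SSE."""
    X = np.column_stack([np.ones(mask.sum()), -depths[mask] ** c])
    beta, *_ = np.linalg.lstsq(X, log_r[mask], rcond=None)
    resid = log_r[mask] - X @ beta
    return float(resid @ resid)

# Profile the change-point over a grid, keeping at least 5 points per segment.
candidates = depths[5:-5]
sses = [segment_sse(depths < cp) + segment_sse(depths >= cp)
        for cp in candidates]
best_cp = float(candidates[int(np.argmin(sses))])
print("estimated change-point depth:", round(best_cp, 2))
```

A Bayesian treatment, as in the paper, would instead place priors on the change-point and regression parameters and average over models, but the profiled sum of squares shows why the change depth is identifiable from such data.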
The authors were able to avoid detailed algebraic prior-to-posterior analyses by immediately referring to MCMC and its generalisations. This enabled them to apply a complicated version of acceptance sampling which calculated the posterior probabilities for several choices of their sampling models; their posterior inferences were doubtless quite sensitive to the prior assumptions within any particular model.
Fan and Brooks reported their conditional posterior inferences for the four top models for their Stylos data set, and their posterior model probabilities and conditional posterior inferences for the Nuranghe, Achladia and Dimini data.
This was a fascinating Bayesian analysis which, however, deserves further practical investigation, e.g. using maximum likelihood and AIC. It is important not to be blinded by the Metropolis-Hastings algorithm to the point where it snows over and blurs both conceptual inadequacies and your statistical conclusions, or encourages you into employing over-complex sampling models.
Caitlin Buck and Sujit Sahu, of the Universities of Cardiff and Southampton, were a bit cagier when they were analysing a couple of two-way contingency tables, which relate to refuse mounds in Awatovi, Arizona, and to Mesolithic stone tools from six different sites in Southern England. One of their objectives in their paper in Applied Statistics was to give insights about the relative dates of deposition of archaeological artifacts even when stratigraphic information is unavailable.
The first of the contingency tables cross-classifies proportions of five different pottery types against layer. The second table cross-classifies seven different types of microlith against site. Both tables would have benefited from a preliminary log-linear interaction analysis, since this would have helped the authors to assess the strength of information, and practical significance, in the data before disguising it within a specialised choice of model.
The authors instead referred to an extension of a Robinson-Kendall (RK) model, which imposes serial orderings on the cell probabilities, and to a hierarchical Bayesian model for seriation. They completed their posterior computations using a hybrid Monte Carlo method based on Langevin diffusion, and compared different models by reference to posterior predictive densities. They justified their model comparison procedure by reference to an entropy-like divergence measure.
The authors showed that the Awatovi data is better modelled using the RK model, but that both models give the same ordering d e f g h i j of the archaeological layers.
For the stone tool
data, the extended RK model and a hierarchical canonical correlation
model give quite different orderings of the six sites. The
coauthors surmise, after some deliberation, that the RK model gives
the better fit.
Buck and Sahu thereby concluded that the point estimates of the relative archaeological orders provided by erstwhile non-Bayesian analyses fail to capture other orderings which, given that the data are inherently noisy, could be considered appropriate seriations. A most impressive applied Bayesian analysis, and set of conclusions! The authors were extremely careful when choosing the particular Bayesian techniques which were best suited to their own situation. They, for example, steered clear of the Bayes factor trap and the related super-sensitive posterior probabilities for candidate sampling models.
Caitlin Buck is
Professor of Statistics at the University of Sheffield. She has
published widely in the areas of archaeology, palaeoenvironmental
sciences, and scientific dating, and she is a leader in her field.




Caitlin Buck 

In his JASA vignette 'Bayesian Analysis: A Look at Today and Thoughts of Tomorrow', Jim Berger catalogues successful Bayesian applications in the areas of archaeology, atmospheric sciences, economics and econometrics, education, epidemiology, engineering, genetics, hydrology, law, measurement and assay, medicine, physical sciences, quality management, and social sciences.



Jim
Berger rolling out the barrel 

Jim also provides the reader with an
excellent reference list for Bayesian developments in twenty
different areas, most controversially ‘causality’, but also
including graphical models and Bayesian networks, nonparametrics
and function estimation, and testing, model selection, and variable
selection.
Jim agrees with many
statisticians that subjective probability is the soul of Bayesian
Statistics. In many problems, use of subjective prior information is
clearly essential, and in others it is readily available; use of
subjective Bayesian analysis for such problems can provide dramatic
gains.
Jim also discusses the non-objectivity of 'objective' Bayesian analysis, together with robust Bayesian, frequentist Bayes, and quasi-Bayesian analysis. I guess I can't see too much difference between his objective Bayes analysis and quasi-Bayes analysis, since both refer to vague priors.
Jim Berger concludes
with his reflections on Bayesian computation and software.
The ASA’s vignettes for
the year 2000 (see the December 2000 issue of JASA) also include
very informative items by other authors on statistical decision
theory (Larry Brown), MCMC (Olivier Cappé and Christian Robert),
Gibbs sampling (Alan Gelfand), the variable selection problem (Ed
George), hierarchical models (James Hobert), and hypothesis testing
and Bayes factors (John Marden), a wonderful resource. 


In 2000, Ehsan Soofi and his
three coauthors published a paper on maximum entropy Dirichlet
modeling of consumer choice, in the Proceedings of Bayesian
Statistical Sciences.
Ehsan Soofi is
Distinguished Professor of Business Statistics at UW Milwaukee. His
research focuses on developing statistical information measures and
showing their use in economic and business applications.
Ehsan once encouraged me
to purchase a new tie while we were attending a Bayesian seminar
together.
In the same year, Mark Berliner, Christopher Wikle and Noel Cressie co-authored a fascinating paper 'Long-lead prediction of Pacific SSTs via Bayesian dynamic models' in the Journal of Climate.
The following important
Bayesian papers were published in Royal Statistical Society journals
during the year 2000:
A Bayesian Lifetime
Model for the ‘Not 100’ Billboard Songs (Eric Bradlow and Peter
Fader)
Spatiotemporal
Hierarchical Bayesian Modeling: Tropical Ocean Surface Winds
(Christopher Wikle et al)
Bayesian Wavelet
Regression on Curves with Application to a Spectroscopic Calibration
Problem (Philip Brown, Tom Fearn, and Marina Vannucci)
Empirical Bayes Approach to Improve Wavelet Thresholding for Image Noise Reduction (Maarten Jansen and Adhemar Bultheel)
Real-Parameter Evolutionary Monte Carlo with Applications to Bayesian Mixture Models (Faming Liang and Felix Martini)
A Bayesian Time-Course Model for Functional Magnetic Resonance Data (Christopher Genovese, with discussion)
Bayesian Regression Modeling with Interaction and Smooth Effects (Paul Gustafson)
Efficient Bayesian
Inference for Dynamic Mixture Models (Richard Gerlach, Chris Carter
and Robert Kohn)
Quantifying expert
opinion in the UK water industry: an experimental study (Paul Garthwaite and Tony O’Hagan)
The diversity of the authors, their subject matter, and their countries of origin is most impressive. The ever-evolving Bayesian spatiotemporal process rolls on!
In their discussion papers in JASA in 2000, Christopher Genovese investigated his Bayesian time-course model for functional magnetic resonance data, and Phil Dawid considered causal inference without counterfactuals, and I'm sure that he was glad to do without them.
Also in JASA in 2000, Balgobin Nandram, Joe Sedransk and Linda Williams Pickle described their Bayesian analysis of chronic obstructive pulmonary disease, and David Dunson and Haibo Zhou proposed a Bayesian model for fecundability and sterility. Way to go, guys!
In 2000, Orestis Papasouliotis described a Bayesian spatiotemporal model in the fifth chapter of his University of Edinburgh Ph.D. thesis. Orestis used this to develop a search procedure for discovering efficient pairs of injector and producer oil wells in hydrocarbon reservoirs where the geological rifts may create a propensity for long-term correlation.
Orestis’ endeavours led
to our two international patents with Ian Main FRSE, Professor of
Seismology and Rock Physics in our Department of Geology and
Geophysics. Ian also enjoys playing the guitar, and singing in
sessions in Edinburgh’s Old Town.




Ian Main FRSE 

Orestis and I coauthored
four papers with Ian Main and his colleagues in the Geophysics
literature, in 1999, 2001, 2006 and 2007, the first two on topics
relating to the activity of earthquakes. So I guess we turned Ian
into a Bayesian seismologist and geophysicist.
He is currently developing a fully Bayesian
method for earthquake hazard calculation with Richard Chandler of
UCL.
The fourth of these articles was published in Structurally Complex Reservoirs by the Geological Society of London. Kes Heffer, of the Institute of Petroleum Engineering at Heriot-Watt, was one of the co-authors of our 2006 and 2007 publications. He was formerly a leading researcher with BP.
Nowadays, Orestis
Papasouliotis is a very accomplished M&S scientist with Merck Serono
Pharmaceuticals in Geneva. He visited me again in Edinburgh this
year, and he hopes to fly over again soon with his wife and daughter
to visit the penguins in the zoo. 



Orestis Papasouliotis, with his wife and daughter 

By the inception of Anno Domini 2001, a fresh generation of Bayesians was already moving way ahead of the quasi-religious practices and cults of the past, as they rose to help meet the social challenges of our economically volatile, and medically and genetically challenged, era.
The quality of the new-era material was re-emphasised as the century turned. In their paper in the Journal of Computational Biology, Michael Newton, Christina Kendziorski, Craig Richmond, Frederick Blattner and Kam-Wah Tsui reported their improved statistical inferences concerning gene expression from microarray data. An inspired effort from five free spirits of Bayesian persuasion.




Michael Newton 

The investigators modelled intensity levels R and approximate target values G by independent Poisson variates, conditionally on different means and the same coefficient of variation c. The ratio of the means, ρ, is then the parameter of interest. They take a hierarchical approach where the conditional means are assumed to be independent and identically Gamma distributed. The marginal posterior distribution of ρ, given R and G, is then the distribution of the ratio of two updated Gamma variates. Moreover, the posterior mean of ρ may be described as a differential expression of the form

(R + ν) / (G + ν),

where ν reflects three expressions of interest, including the prior predictive mean of R.
Michael Newton and his co-authors estimate c and the two prior parameters pragmatically, using marginal likelihood, and then apply their methodology to heat-shock and E. coli data. They furthermore refer to some simple model-checking techniques.
A wonderfully succinct
analysis which did not require any simulations at all, since exact
algebraic expressions were available using probability calculus.
This sort of analysis is hopefully not becoming a lost art.
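The Gamma-Poisson conjugacy that makes such an exact analysis possible can be checked with a few lines of simulation. The prior shape a, rate b, and counts R and G below are illustrative assumptions, not the values estimated in the paper.

```python
# Sketch of the Gamma-Poisson hierarchy described above. R and G are Poisson
# counts whose means have independent Gamma(a, b) priors (a, b, R, G are
# illustrative assumptions). A posteriori each mean is Gamma(a + count, b + 1),
# so rho = mean_R / mean_G is a ratio of two independent Gamma variates.
import numpy as np

rng = np.random.default_rng(1)
a, b = 10.0, 1.0           # assumed prior shape and rate
R, G = 85, 40              # assumed observed intensities

post_R = rng.gamma(a + R, 1.0 / (b + 1.0), size=200_000)
post_G = rng.gamma(a + G, 1.0 / (b + 1.0), size=200_000)
rho = post_R / post_G

# Exact posterior mean of rho: E[mean_R] * E[1 / mean_G]
#   = (a + R)/(b + 1) * (b + 1)/(a + G - 1) = (a + R)/(a + G - 1),
# a shrinkage ratio of the same (R + const)/(G + const) shape as in the text.
exact = (a + R) / (a + G - 1.0)
print("simulated:", round(float(rho.mean()), 3), " exact:", round(exact, 3))
```

The simulated and exact posterior means agree to Monte Carlo error, which is the point: no simulation was needed in the first place, because the probability calculus delivers the answer in closed form.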
In their 2001 paper in Biometrics, James Albert and Sid Chib
applied their sequential ordinal modeling methodology to the
analysis of survival data.




Jim
Albert 

In 2002, Sid coauthored
a paper in the Journal of Econometrics with Federico Nardari
and Neil Shephard which highlighted even more MCMC procedures for
stochastic volatility models.
Siddhartha Chib is the
Harry C. Hartford Professor of Econometrics at Washington University
in St. Louis, and has a fine history of applying Bayesian methods,
e.g. to Economics, very much in the Arnold Zellner tradition. In
2008, he and Edward Greenberg published a review of hierarchical
Bayes modeling in the New Palgrave Dictionary of Economics.
Formerly a don at
Cambridge, Neil Shephard is now Professor of Statistics and
Economics at Harvard University. His work is occasionally Bayesian,
though he does sometimes experience technical difficulties when
constraining his unknown covariance matrices to the interior of the
parameter space.
In 2001, Gareth Roberts
coauthored a paper with Jesper Møller and Antonietta Mira in
JRSSB about perfect slice samplers. I hope he spiced them with
chunky marmalade.
In 2004, Gareth Roberts, Omiros Papaspiliopoulos and Petros Dellaportas wrote another famous paper in JRSSB, this time about Bayesian inference for non-Gaussian Ornstein-Uhlenbeck stochastic volatility processes. These processes are likely to handle stochastic volatility pretty well because of their elasticity properties. I developed a log-Gaussian doubly stochastic Poisson version of them in 1978 in [64] and applied it to the flashing Green Man pedestrian crossing data.
Gareth Roberts F.R.S. is distinguished for his work spanning Applied Probability, Bayesian Statistics and Computational Statistics. He has made fundamental contributions to convergence and stability theory, extensions to the Metropolis-Hastings algorithm and adaptive MCMC, infinite-dimensional simulation problems, and inference for stochastic processes. His work has found application in the study of epidemics such as avian influenza and foot-and-mouth disease.
As Professor of
Statistics and Director of CRiSM at the University of Warwick,
Gareth is one of the most distinguished Bayesians to have graced
that now worldrenowned institution. He obtained his Ph.D. there in
1988 on the topic ‘Some boundary hitting problems for diffusion
processes’ and has taken the Bayesian paradigm to fresh heights in
the quarter of a century since.
Petros Dellaportas is
Professor of Statistics and Economics at the University of Athens.
He has developed and computed Bayesian solutions for a variety of
complex problems, including hierarchical, GARCH, and generalised
linear models.
Omiros Papaspiliopoulos was awarded his Ph.D. in 2003 at the University of Lancaster. His thesis topic was 'Non-centred parameterizations for hierarchical models and data augmentation'.
Formerly an Assistant Professor in Statistics at the University of Warwick, Omiros is currently ICREA Research Professor in the Department of Economics at Universitat Pompeu Fabra.
Omiros received the Guy
Medal in Bronze from the Royal Statistical Society in 2010. He is
one of our up and coming young stars, with a name to fit.
Also in 2001, Chris Glasbey and Kanti Mardia presented their seminal, effectively Bayesian paper 'A penalised likelihood approach to image warping' to a full meeting of the Royal Statistical Society. The authors achieved new frontiers in Image Analysis by identifying a new Fourier-von Mises model, with phase differences between Fourier-transformed images having von Mises distributions. They used their a posteriori smoothing procedures (a) to register a remote-sensed image with a map, (b) to align microscope images from different optics, and (c) to discriminate between different images of fish from photographic images. Even Chris Glasbey was impressed.








Chris Glasbey
FRSE 

Kanti Mardia 


Doubtless one of Britain's and India's most brilliantly productive mathematical statisticians, Kanti Mardia has made a number of important Bayesian and effectively Bayesian contributions. After a pre-eminent career, he is currently taking time out as Senior Research Fellow in the Department of Mathematics at the University of Leeds. While other leading statisticians are better at self-promotion, Kanti deserves all the accolades that our profession can give him.
Kevin Patrick Murphy was an Associate Professor in Computer Science and Statistics at the University of British Columbia until 2012, but now works as a research scientist for Google. In 2001 he published the three somewhat Bayesian papers:
Linear Time Inference in Hierarchical HMMs (with Mark Paskin), in Neural Info. Proc. Systems
The Factored Frontier Algorithm for Approximate Inference in DBNs (with Yair Weiss), in Uncertainty in Artificial Intelligence
Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks (with Stuart Russell), in Sequential Monte Carlo Methods in Practice (Springer-Verlag)
Kevin Murphy has published extensively in A.I., Machine Intelligence, Bayesian Statistics, and probabilistic graphical models, with applications to information extraction, machine reading, knowledge-base construction, computer vision and computational biology.
While Murphy's remarkable 2012 book Machine Learning: A Probabilistic Perspective is more Bayesian than some approaches to machine intelligence, he is more concerned about frequentist properties than most other Bayesians working in this area. Good for him!
Radford Neal, another
Canadian Bayesian Statistician and Machine Intelligence expert,
reported his work on annealed importance sampling in 2001 in
Statistics and Computing. This is one of several very useful
papers which Radford has published on Bayesian simulation.




Radford Neal 

In their JASA papers in 2001, William Bolstad and Samuel Manda described their Bayesian investigation of child mortality in Malawi, which referred to family and community random effects; our doughty Durham friends Peter Craig, Michael Goldstein, Jonathan Rougier and Allan Seheult told us all about their Bayesian forecasting for complex systems; Peter Westfall and Keith Soper used their priors to improve animal carcinogenicity tests; and three musketeers from the Orient, namely Hoon Kim, Dongchu Sun and Robert Tsutakawa, followed in the footsteps of Hickman and Miller by proposing a bivariate Bayesian method for estimating mortality rates with a conditional autoregressive model.
Enrique González and Josep Ginebra Molins of the Polytechnic University of Catalonia published their book Bayesian Heuristics for Multi-Period Control in 2001, after working on lots of similarly impressive Bayesian research in Barcelona.




Josep
Ginebra Molins 

In that same year, Aaron
Ellison’s book An Introduction to Bayesian Inference for
Ecological Research and Environmental Decision Making was
published online on JSTOR.
Moreover, Ludwig
Fahrmeir and Stefan Lang reported their Bayesian semiparametric
analysis of multicategorical time-space data in the Annals of
the Institute of Statistical Mathematics. They applied their
methodology most effectively to the analysis of monthly unemployment
data from the German Federal Employment Office, and they reported
some of their results spatially as well as temporally on maps of
Germany.
Fahrmeir and Lang also
reported their Bayesian inferences for generalised additive mixed
models based on Markov random field priors in Applied Statistics.
They use discretized versions of prior processes which could
alternatively be represented via the mean value functions and
covariance kernels of Gaussian processes, and may, if necessary,
incorporate spatial covariates. Their prior to posterior analysis is
completed by MCMC inference.
The authors apply their
methodology to forest damage and to duration of employment data.
While their theory is fairly routine and unembellished, the
applications are interesting.
In the same issue of
Applied Statistics, David Dunson and Gregg Dinse of the U.S.
National Institute of Environmental Health Sciences in Research
Triangle Park report their Bayesian incident analysis of
tumorigenicity data. In most animal carcinogenicity experiments,
tumours are not observable in live animals, and censoring of the
tumour onset times is informative. Dunson and Dinse focus on the
incidence of tumours and censored onset times without restricting
tumour lethality, relying on cause-of-death data, or requiring
interim sacrifices.
The authors’ sampling
model for the four observable outcomes at each death time combines
multistate stochastic, probit, and latent variable assumptions, and
also models covariate effects. Their prior distributions are elicited
from experts in the subject area and refer also to a meta-analysis
which employs a random effects model. These complex assumptions are
applied to a triphosphate study, yielding some interesting posterior
inferences via Gibbs sampling.
This is a top-quality,
though highly subjective, investigation which does not refer to
model comparison criteria. The method adjusts for animal survival
and tumour lethality through a multistate model of tumorigenesis
and death. If I had refereed this paper then I would have advised
the authors to check out their assumptions and conclusions a bit
more in empirical terms, in case they were to become the state of the
art. 

David Dunson is, like
Jim Berger, currently an Arts and Sciences Distinguished Professor
in the Department of Statistical Science at Duke University. His
Bayesian research interests focus on complex medical data sets and
machine learning applications, and include image and shape analysis. 



David
Dunson 

I once met Gregg Dinse
in Wisconsin. An extremely charming man, he is good at interacting
with subject matter experts, and is also a very prolific applied
researcher. 



Gregg
Dinse 

Murray Aitkin’s
article ‘Likelihood and Bayesian Analysis of Mixtures’ was published
in Statistical Modelling in 2001. Murray is nowadays
Professorial Fellow in the Department of Statistics at the
University of Melbourne. He has published many highly innovative
papers in Bayesian areas.
In their path-breaking 2002 invited paper to the Royal Statistical
Society, David Spiegelhalter, Nicky Best, Brad Carlin and Angelika
van der Linde proposed a Bayesian measure of model complexity and
fit called DIC (the Deviance Information Criterion) that
facilitates the comparison of various choices of sampling model for
a specified data set.




Bradley Carlin 

Let L(θ) denote the
log-likelihood for your p×1 vector θ of parameters when a
particular sampling model, with p parameters, is assumed to be true,
and consider the deviance

D(θ) = −2L(θ) + C,

where C is an arbitrary
constant which cancels out in the calculations. Let ED denote the
posterior expectation of the deviance, subject to your choice of
prior assumptions for θ. Then ED measures how well the model under
consideration fits the data.
The effective number
of parameters is, by definition,

q = ED − D(ξ),

where ξ is some
convenient-to-calculate Bayes estimate for θ, such as the posterior
mean vector, if it exists, or, much more preferably, the
vector of posterior medians (which gives invariance of q under
nonlinear transformations of the parameters). Spiegelhalter et al.
quite brilliantly suggest referring to

DIC = ED + q = D(ξ) + 2q,

which penalises ED, or D(ξ),
according to the number of effective parameters in the model.
At the model comparison
stage of your analysis, you could consider using a vague, e.g.
Jeffreys or reference, prior for θ, and only referring to your
informative prior assumptions when considering your model-based
inferences, since their elicitation and application might only serve
to confuse this pre-inferential procedure. If you are considering
several candidate models, then simply choose the one with the
smallest DIC, maybe after cleaning up the data using an exploratory
data analysis. If q is close to p, and ξ is close to the maximum
likelihood vector of θ, then DIC will be well-approximated by
Akaike’s criterion AIC. However, when considering models with
complex likelihoods, DIC will sometimes be easier to calculate, e.g.
using importance sampling, MCMC or acceptance sampling.
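As a concrete illustration of the recipe above, here is a minimal sketch of my own, using a conjugate normal model with known unit variance so that exact posterior draws can stand in for MCMC output; all the numbers are toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 50 observations from a normal with unknown mean, known sd = 1.
y = rng.normal(loc=1.5, scale=1.0, size=50)
n = len(y)

# A conjugate N(0, 100) prior on the mean gives a normal posterior,
# so exact posterior draws stand in for an MCMC run.
prior_var = 100.0
post_var = 1.0 / (n + 1.0 / prior_var)
post_mean = post_var * y.sum()
theta = rng.normal(post_mean, np.sqrt(post_var), size=20000)

def deviance(mu):
    # D(theta) = -2 * log-likelihood; the arbitrary constant C is taken
    # as the n*log(2*pi) term, common to every model under comparison.
    return np.sum((y[:, None] - mu) ** 2, axis=0) + n * np.log(2 * np.pi)

D = deviance(theta)                     # deviance at each posterior draw
ED = D.mean()                           # posterior expected deviance
xi = np.median(theta)                   # posterior median as plug-in estimate
q = ED - deviance(np.array([xi]))[0]    # effective number of parameters
DIC = ED + q                            # equivalently D(xi) + 2q
print(q, DIC)
```

With one free parameter, q comes out very close to 1, as it should; in hierarchical models q typically falls well below the nominal parameter count, which is the whole attraction of the criterion.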
Spiegelhalter et al
justify DIC via convincing, though approximate, information-theoretic
arguments which refer to Kullback-Leibler divergence.
Similar arguments would appear to hold when ED is replaced by the
posterior median of the deviance; this parallels taking ξ to denote
the posterior median vector of θ.
Since reference priors
maximise Lindley’s expected measure of information it would appear
natural, in principle at least, to use them when calculating DIC.
However, it is often easier to express Jeffreys’ invariance prior in
algebraic terms, or to use some other sensible form of vague prior.
The Deviance
Information Criterion has greatly enhanced the Bayesian paradigm
during the course of the last decade. It has taken Bayesians into
the ‘unconstrained-by-restrictive-sampling-models’ ballpark
envisioned by George Box in 1980, and enables us, including the
Economists, to determine scientifically meaningful,
parameter-parsimonious sampling models, e.g. as special cases of a
larger all-inclusive model, which are by no means as big as an
elephant.
An inferential model
comparison procedure which refers to the full posterior
distributions of the deviances is discussed below in Author’s
Notes. In connection with this, it might be possible to develop
alternatives, also relating to cross-validation, to Box’s 1980
overall model checking criterion, which he believed to be
appropriate for situations where you have no specific alternative
model in mind.
The deviance information
criterion DIC provides Bayesians with a particularly useful and
straightforward way of applying Occam’s razor. See Jeffreys
and Berger [1] for a historical discussion of the razor. Jeffreys
and Berger quote the thirteenth-century English Franciscan
friar, logician, physicist and theologian William of Ockham, who
first envisioned the philosophy of the razor,


‘Pluralitas non est ponenda sine necessitate’
(Plurality is not to be posited without necessity) 

I wonder whether the
Bayesian Goddess has already elevated the four discoverers of DIC to
immortality, as the Four Horsemen of Ockham’s Apocalypse
perchance.
But perhaps the Goddess
of Statistics should wait and see. DIC is not applicable to all
models, as demonstrated by Sylvia Richardson and Christian Robert in
the discussion of the 2002 paper, and by Angelika van der Linde in
her 2005 paper on DIC in variable selection, in Statistica
Neerlandica.
Angelika puts DIC into
context and points to some possible alternatives. One problem is
that we do not, as yet, have an established and elaborate theory for
the estimation of information-theoretic quantities like
Kullback-Leibler divergence. In their 2008 paper in Statistics,
Van der Linde and Tutz illustrate this problem for the coefficient
of determination for regression models, and this can also be reasonably
related to Kullback-Leibler diagnostics.
Gerhard Tutz is a
Professor of Statistics at the University of Munich. He has
published many high quality papers in Bayesrelated areas.




Gerhard Tutz 

In his 2002 paper ‘On irrelevance of alternatives and opinion
pooling’ in the Brazilian Journal of Probability and
Statistics, Gustavo Gilardoni of the University of Brasilia
considered the implications of two modified versions of the
‘irrelevance of alternatives’ axiom. The consequences included a
characterization of the Logarithmic Opinion Pool. This looks very
interesting, Gustavo.
In 1993, Gustavo
published an article in the Annals of Statistics with Murray
Clayton on the reaching of a consensus using DeGroot’s iterative
pooling.
Phil Dawid, Julia
Mortera, V. Pascali and D. van Boxel reported their probabilistic expert
systems for forensic evidence from DNA profiling in 2002 in the
Scandinavian Journal of Statistics, another magnificent piece of
work.
Julia Mortera is
Professor of Statistics at the University of Rome. She has published
numerous high quality papers on Bayesian methods, including a number
of joint papers with Phil Dawid and other authors, including Steffen
Lauritzen, on Bayesian inference in forensic identification. She is
one of our leading applied women Bayesians, and a very pleasant lady
too. Maybe I should compose a sonnet about her. The Forensic
Empress of Rome, maybe.
Persi Diaconis and Susan
Holmes of Stanford University took a Bayesian peek into Feller
Volume 1 in 2002 in their fascinating article in Sankhyā,
and developed Bayesian versions of three classical problems: the
birthday problem, the coupon collector’s problem, and the matching
problem. In each case the Bayesian component involves a prior on the
underlying probability mechanism, which could appreciably change the
answer.
Persi had previously
published his finite forms of De Finetti’s beautiful exchangeability
theorem (De Finetti’s theorem was praised in one of William Feller’s
treasured footnotes in Feller Volume 2), a fundamental paper with
David Freedman on Bayesian consistency, and a dozen De Finetti-style
results in search of a theory.




Persi
Diaconis 

In 2011, Persi Diaconis
and Ron Graham were to coauthor Magical Mathematics: The
Mathematical Ideas that Animate Great Magical Tricks.
Persi is a wonderful statistical probabilist and a great
entertainer.
In 2002, MingHui Chen,
David Harrington and Joseph Ibrahim described some useful Bayesian
cure rate models for malignant melanoma, in the online Wiley
library.
I’m quite interested in
all this, since it’s now fully two years since an artistic-looking
mole was removed from my wrist, leaving a dog-bite-shaped scar
which I’ve recently flashed to a couple of eminent audiences during
my ongoing public health campaign.
The models included
1. A piecewise
exponential model: The authors proposed a semiparametric
development based on a piecewise constant hazard version of the
proportional hazards model. The degree of nonparametricity is
controlled by J, the number of intervals in the partition.
2. A parametric cure
rate model: This assumes that a certain fraction ρ of the
population are ‘cured’ and the remaining 1 − ρ are not cured. The
survivor function for the entire population is ρ + (1 − ρ)S(t), where
S(t) is the survivor function for the non-cured group in the
population. The authors construct an elaborate choice of S(t) which
refers to the numbers of metastatic-competent tumour cells for each
of n subjects, depending on a parameter of the random times taken
for the tumour cells to produce detectable metastatic disease.
3. A semiparametric
cure rate model: This takes the survivor function to be
representable by a piecewise hazards model. The degree of
nonparametricity is controlled by J, the number of unknown parameters
in the model.
For each of these
models, the authors construct a power prior to represent the
prior information concerning the unknown parameters. Their choices
are proportional to the product of a subjectively assessed beta
distribution and a power of the likelihood when it conditions on a
set of hypothetical prior observations.
The authors assess their
models by n separate CPO statistics. The i-th CPO statistic is just
the predictive density of the observed response variable for case i,
when conditioned on the remaining n − 1 observed response variables.
They also refer to the average log pseudo-Bayes factor B,
which averages the logs of the CPO statistics.
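The CPO statistics can be estimated directly from posterior output. The sketch below uses the standard harmonic-mean identity CPO_i = 1 / E_post[1 / f(y_i | θ)] on a toy normal model; this is a commonly used estimator, not necessarily the authors' exact computation.

```python
import numpy as np

rng = np.random.default_rng(2)

# The i-th CPO statistic is f(y_i | y_{-i}), the predictive density of
# observation i given the rest.  A standard Monte Carlo identity gives
#   CPO_i = 1 / E_post[ 1 / f(y_i | theta) ],
# estimated by a harmonic mean over posterior draws theta_m.
y = rng.normal(0.2, 1.0, size=30)      # toy data: normal, known sd = 1
n = len(y)

# Exact posterior draws for the mean under a flat prior: N(ybar, 1/n).
theta = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=5000)

# Sampling density f(y_i | theta_m) for every (i, m) pair.
like = np.exp(-0.5 * (y[:, None] - theta[None, :]) ** 2) / np.sqrt(2 * np.pi)

cpo = 1.0 / np.mean(1.0 / like, axis=1)   # harmonic-mean CPO estimates
B = np.mean(np.log(cpo))                  # average log pseudo-Bayes factor
print(B)
```

Larger values of B indicate better cross-validatory predictive performance, which is how the melanoma models below are ranked.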
The authors completed
Bayesian analyses of the E1690 time-to-event data for high-risk
melanoma using MCMC and variations of each of the three preceding
choices of sampling model. The cure rate models performed equally
well; they fit the data slightly better than the piecewise
exponential model, according to the values of the B-statistics.
These results have considerable implications for the design of
studies in high-risk melanoma.
Thank you, guys! After a
nonmalignant mole on my shoulder vanished during a recent scare, my
predictive probability of a malignant recurrence is now close to
zero.
Kevin Gross, Bruce Craig and
William Hutchison published their insightful paper ‘Bayesian
estimation of a demographic matrix model from stage-frequency data’
in 2002 in Ecology.
Bruce Craig is
nowadays Professor of Statistics and director of the statistical
consulting service at Purdue. His areas of interest include Bayesian
hierarchical modeling, protein structure determination, and the
design and analysis of microarray experiments.
The year 2002 was indeed
a good one for Bayesian applications. Carmen Fernandez, Eduardo Ley
and Mark Steel modelled the catches of cod, Greenland halibut,
redfish, roundnose grenadier and skate in a northwest Atlantic
fishery, and reported their conclusions in Applied Statistics.
Not to be outdone, David
Laws and Tony O’Hagan proposed a hierarchical Bayes model for
multilocation auditing in the online Wiley library. During their
somewhat adventurous presentation they introduced the notion of the
fractional error or taint of a transaction. They
proposed a complicated procedure for the elicitation of their prior
parameters, and their prior to posterior analysis involved oodles of
ratios and products of Gamma functions.
On a more theoretic
note, Stuart Barber, Guy Nason and Bernie Silverman of the
University of Bristol reported on their posterior probability
intervals for wavelet thresholding in JRSSB. They
approximated the first four cumulants of the posterior distribution
of each wavelet coefficient by linear combinations of wavelet
scaling functions, and then fit a probability distribution to the
approximate cumulants. Their method assumed either independent
normal mixture priors for the wavelet coefficients or limiting forms
of these mixtures, and this yielded a posterior distribution which
was difficult to handle with exact computations. They however showed
that their approximate posterior credibility intervals possessed
good frequency coverage.
It is not obvious
whether the authors’ adaptive Bayesian wavelet and BayesThresh
choices of prior model made their posterior analysis overly
complicated. Indeed their prior formulations meant that the
posterior distribution was not overly robust to outliers. It might
be worth assuming a suitable prior distribution on function space
for the entire wavelet regression function, and then approximating
this on a linear subspace in order to develop a more sensible joint
prior distribution for the wavelet coefficients.
In their articles in
JASA in 2002, Valen Johnson, Robert Deaner and Carel van Schaik
very bravely performed a Bayesian analysis of some rank data for
primate intelligence experiments, B.M. Golam Kibria, Li Sun, Jim
Zidek and Nhu Le reported their Nostradamus-style Bayesian spatial
prediction of random space-time fields for mapping PM2.5 exposure,
and Steven Scott came out of hiding and reviewed the recent Bayesian
recursive computing methodology for hidden Markov models.
Suitably encouraged, I now
survey the RSS journals for 2003 which I’ve just discovered
languishing with my long-discarded Irvine Welsh novels in the
tottery John Lewis bookcase in my spare room.
Alexandra Schmidt and
Tony O’Hagan coauthored an important paper about Bayesian
inferences for nonstationary covariance structures via spatial
deformations.
Ian Dryden, Mark
Scarr, and Charles Taylor report their Bayesian texture
segmentation of weed and crop images using reversible jump MCMC
methods. They model their pixel intensities using second order
Gaussian Markov random fields and the second-order stationary Potts
model. They take the number of textures in a particular image to
have a prior truncated Poisson distribution, and compute some
interesting, though not that detailed, statistically smoothed onion,
carrot, and sugar-beet images. Their trace plots for the simulated
posteriors are not entirely convincing.
Maura Mezzetti of
the University of Roma Tor Vergata and her five worthy coauthors
propose a Bayesian compartmental model for the evaluation of
1,3-butadiene (BD) metabolism. This refers to three differential
equations which represent the quantity of BD in three compartments
as a function of time. The equations depend upon the blood flows
through the compartments, the blood-tissue-specific partition
coefficients, the total blood flow, the alveolar ventilation rates,
and body weights.




Maura
Mezzetti 

Mezzetti et al propose a
hierarchical model which assigns prior distributions to the
population parameters and to the individual parameters. They fit
their proposed pharmacokinetic model to the BD data with some
interesting general conclusions.
Stephen Walker
addressed the problem of sample size determination by using a
Bayesian semiparametric approach. He expressed his prior
distribution for an unknown sampling distribution as a convolution
of Dirichlet processes and selected the optimal size n of his random
sample by maximizing the posterior expectation of a utility function
which invokes the cost in utiles of taking n observations. Stephen’s
numerical example is absolutely gobsmacking, and, given the hefty
prior specifications required to choose a single value n, I am left
to ponder about the practical importance of his approach.
Michail Papathomas
and Roger Hutching applied their Bayesian updating procedures to
data from the UK water industry. Experts are expected to express
their subjective probabilities for n binary outcomes, both as prior
estimates, and as ‘study estimates’ when more information is
available after a typically expensive study is undertaken. Moreover,
the better ‘study estimates’ should be sufficient for the prior
estimates.
As the joint p.m.f. of
the binary responses typically requires the specification of an
immense number of joint probabilities, the authors refer to a
‘threshold copula’ which generates dependence between the binary
responses for specified marginal distributions by taking conditional
probits to be correlated, using a multivariate normal distribution.
They then employ ‘Jeffreys conditionalization’ as an updating
procedure. Their elicitation of the probability assessments and the
correlations from experts was supervised by statisticians, though
reportedly not in ideal fashion. Finally, they completed a sound
MCMC analysis of pipe data provided by South West Water Services, a
most fascinating study.
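A threshold copula of this general flavour can be sketched in a few lines. The construction below is a simplified stand-in for the authors' model, not their exact specification: latent probits are jointly normal with a chosen correlation matrix, and thresholding each at Phi^{-1}(p_i) preserves the marginal probabilities while inducing dependence between the binary responses.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)

# Gaussian threshold copula for dependent binary outcomes: latent
# probits are jointly normal with correlation matrix R, and the i-th
# binary response is 1 when its latent variable falls below the
# threshold Phi^{-1}(p_i), so the marginals p_i are preserved while
# R governs the dependence.
p = np.array([0.2, 0.5, 0.8])             # target marginal probabilities
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.6],
              [0.3, 0.6, 1.0]])           # latent correlation matrix
thresholds = np.array([NormalDist().inv_cdf(pi) for pi in p])

z = rng.multivariate_normal(np.zeros(3), R, size=200_000)
yb = (z < thresholds).astype(int)         # threshold the latent probits

print(yb.mean(axis=0))                    # close to p: marginals preserved
print(np.corrcoef(yb, rowvar=False)[0, 1])  # induced binary correlation
```

Note that the induced correlation between the binary responses is attenuated relative to the latent correlation, which is one reason elicitation on the latent scale needs care.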
Fernando Quintana and Pilar Iglesias present a decision
theoretic formulation of product partition models (PPMs) that allows
a formal treatment of different decision problems such as estimation
or hypothesis testing together with clustering methods. The PPMs are
thereby constructed in the context of model selection. A Dirichlet
process prior is assumed for the unknown sampling distribution, and
the posterior inferences are used to detect outliers in a Chilean
stock-market data set. An excellent contribution from Bayesian Chile. Why
don’t you give the plucky Bolivians their corridor to the sea back,
guys? You’re hogging too much of the coast.
Maria Rita Sebastiani
uses Markov randomfield models to estimate local labour markets,
using a Bayesian texture segmentation approach, and applies these
techniques to data from 287 communes in Tuscany.
Stuart Coles and Luis
Pericchi used likelihood and Bayesian techniques to estimate
probabilities of future extreme levels of a process (i.e.
catastrophes) based upon historical data which consist of annual
maximum observations and may be modelled as a random sample from a
member of the generalised extreme value (GEV) family of
distributions, which possess location and scale parameters together
with a shape parameter. They in particular compute the predictive
distribution, given the historical data, of a future annual maximum
observation Z, under relatively vague, though proper, prior
assumptions for the three unknown parameters. Their maximum
likelihood and Bayes procedures were applied to good effect to a set
of rainfall data from Venezuela. While the authors’ prior to
posterior analysis amounted to an extremely simple application of
standard Bayesian techniques, the practical conclusions were both
perceptive and fascinating.
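The predictive calculation can be sketched as a posterior average of GEV exceedance probabilities. The "posterior draws" below are purely illustrative stand-ins for MCMC output, not the Coles-Pericchi analysis itself.

```python
import numpy as np

rng = np.random.default_rng(4)

# The predictive probability that a future annual maximum Z exceeds a
# level z is the posterior average of the GEV exceedance probability:
#   P(Z > z | data) = E_post[ 1 - F_GEV(z; mu, sigma, xi) ].
def gev_sf(z, mu, sigma, xi):
    # GEV survival function, valid where 1 + xi (z - mu) / sigma > 0;
    # beyond the upper endpoint the exceedance probability is 0.
    t = 1.0 + xi * (z - mu) / sigma
    return np.where(t > 0,
                    1.0 - np.exp(-np.maximum(t, 1e-12) ** (-1.0 / xi)),
                    0.0)

# Illustrative posterior draws for location, scale, and shape.
mu = rng.normal(50.0, 2.0, size=5000)
sigma = np.exp(rng.normal(np.log(10.0), 0.1, size=5000))
xi = rng.normal(0.1, 0.05, size=5000)

z = 120.0
pred_prob = gev_sf(z, mu, sigma, xi).mean()   # predictive P(Z > z | data)
print(pred_prob)
```

Averaging over the posterior in this way fattens the predictive tail relative to plugging in a single point estimate, which is exactly why the Bayesian answer matters for catastrophe levels.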
While still on the
subject of rainfall, the Scottish BIOSS biomathematicians David
Allcroft and Chris Glasbey described how to use a latent
Gaussian Markov randomfield model for spatiotemporal rainfall
disaggregation. They transform the rainfall observations to supposed
normality using an empirically estimated quadratic function of an
empirically estimated power transformation. In so doing they censor
any, albeit very informative, zero values of rainfall. Not a good
start! I’m surprised that they were allowed to publish it. The
authors then crank the MCMC handle and analyse a retrospective set
of rainfall data from far away, in the Red River basin in
Arkansas.
Yosihiko Ogata, Koichi
Katsura and Masaharu Tanemura used a Bayesian hierarchical model
on tessellated spatial regions to investigate earthquakes, which of
course occur heterogeneously in space and time. Their assumptions
generalise a space-time epidemic-type aftershock model, and they
check them out by reference to an elaborate space-time residual
analysis. The authors constructed a MAP estimate of the
nonhomogeneous Poisson intensity across a coastal region of
Japan, and a MAP estimate of their hierarchical space-time model.
This was a magnificent piece of work.
Yosihiko Ogata and his
colleagues have published many papers during the last three decades
which concern the aftershocks of earthquakes. He is on the faculty
of the Institute of Statistical Mathematics in Tokyo.
Maria Rita Sebastiani is
Professor of Economics at the Sapienza University of Rome. She
obtained her doctorate there in 1998 with the title 

Modelli spaziali
per la stima dei mercati locali del lavoro
(Spatial models for the estimation of local labour markets) 

Maria’s areas of
research interest include Bayesian inference, hierarchical spatial
modelling, business mortality risk, European populations and
demography, and transition probabilities to illness, dependency, and
death.
Elsewhere in 2003, William Penny and Karl Friston of the Wellcome
Trust Centre for Neuroimaging at UCL used mixtures of generalised
linear models for functional neuroimaging in their article in
IEEE Trans Med Imaging, and constructed some interesting
posterior probability maps.
Gary Chamberlain and
Guido Imbens reported their supposedly nonparametric applications
of Bayesian inference in 2003 in the Journal of Business &
Economic Statistics. Gary is currently the Louis Berkman
Professor of Economics at Harvard.
Moreover, Linda Garside
and Darren Wilkinson applied their dynamic lattice-Markov spatio-temporal
models to environmental data in Bayesian Statistics 7, and
Harry Martz and Michael Hamada addressed uncertainty in counts and
operating time in estimating Poisson occurrence rates in their
article in Reliability Engineering & System Safety.




Michael Hamada 

The respected
Japanese-American statistician Michael Hamada obtained his Ph.D.
from the University of Wisconsin-Madison during the early 1980s,
where he attended my Bayesian course and worked as a project
assistant with Jeff Wu and myself at the U.S. Army’s ill-fated Math
Research Center, which was finally to get squished during the Gulf
War of 1991. Mike nowadays solves top-level problems at the Los
Alamos National Laboratory in New Mexico. In 2008 he published the
outstanding book Bayesian Reliability with Alyson
Wilson, Shane Reese and Harry Martz.
Harry Martz is the
principal associate director for Global Security at Los Alamos. Now
that’s a good application of Bayesian reliability. And of addressing
uncertainty in counts.
Bradley Efron’s ASA Presidential address, delivered in Toronto
during August 2004, was entitled Bayesians, Frequentists, and
Scientists.
Professor Efron said, in
summary, that ‘My guess is that a combination of Bayesian and
frequentist ideas will be needed to deal with our increasingly
intense scientific environment.’
Brad discussed his ideas
in the context of breast cancer risk, data from an imaging scan
which quite clearly distinguished between 7 supposedly normal
children and 7 dyslexic children, and a bivariate scatterplot which
measured kidney function against age.
When concluding his
address, Professor Efron said,
‘Now the planets
may be aligning for Statistics. New technology, electronic
computation, has broken the bottleneck of computation that limited
classical statistical theory. At the same time an onrush of new
questions has come upon us, in the form of huge data sets and large
scale inference problems. I believe that the statisticians of this
generation will participate in a new age of statistical innovation
that might rival the golden age of Fisher, Neyman, Hotelling, and
Wald.’
Bradley Efron was
applauded by Frequentists and Bayesians alike, while the Goddess
Fortune hovered in the background.




Bradley Efron 

In his 2004 paper in the Annals of Statistics, Stephen Walker
used martingales to investigate Bayesian consistency. He derived
sufficient conditions for both Hellinger and Kullback-Leibler
consistency, which do not rely on the use of a sieve, together with
some alternative conditions for Hellinger consistency, a splendid
contribution.
I’m wondering whether to
write a poem about Stephen’s brave exploits. I could call it ‘Ode to
a Martingale’. There is a martingale on Berkeley Square which drives
everybody spare? Perhaps not.
Stephen is now Professor
of Mathematics at the University of Texas in Austin. His research
focuses on Bayesian parametric and semiparametric methods, with
applications in medical statistics. He obtained his Ph.D. from
Imperial College London in 1995, where he was supervised by Jon
Wakefield.
Richard Boys and Daniel
Henderson published their Bayesian approach to DNA sequence
segmentation in 2004 in Biometrics. Many DNA sequences
display compositional heterogeneity in the form of segments of
similar structure. The authors identified such segments using a
(real) Markov chain governed by a hidden Markov model. In a novel
step, they assumed that the order of dependence q and the number of
parameters r were unknown, and took these parameters to possess
independent truncated Poisson priors. The vectors of transition
probabilities were taken, in the prior assessment, to possess
independent Dirichlet distributions.
Boys and Henderson
applied their computer-simulated prior-to-posterior MCMC procedure
to an analysis of the bacteriophage lambda, a parasite of the
intestinal bacterium Escherichia coli. They computed a very
illuminating joint posterior p.m.f. for the parameters q and r, and
checked this out with a prior sensitivity analysis. A highly
original and very thorough piece of work.
Richard Boys is
Professor of Statistics at the University of Newcastle upon Tyne.
He applies his Bayesian ideas to science,
social science, and medicine, and he also has research interests in
statistical biomathematics and stochastic systems biology. Daniel
Henderson is a hardworking teaching fellow in the same department.
Richard’s departmental
colleague Professor Darren Wilkinson is also wellpublished in
similar sorts of areas. When I first met him in 1997, he was still
wet behind the ears, but he is now highly accomplished. The
Bayesian Geordies are good to drink with. They sip their beer in
silence for lengthy periods of time, while making the occasional wry
wisecrack out of the depths of their grey matter.








Richard Boys 

Darren
Wilkinson 


In their JASA papers of 2004, Scott Berry and five coauthors
employed their Bayesian survival analysis with non-proportional
hazards for a meta-analysis of combination pravastatin-aspirin, the
redoubtable Jay Kadane and Nicole Lazar discussed various
model-checking criteria, Stephen Walker, Paul Damien and Peter Lenk
investigated priors with a Kullback-Leibler property, and Nidhan
Choudhuri, Subhashis Ghosal and Anindya Roy considered the
Bayesian estimation of a spectral density.
In 2004, the ever
insightful Michael Hamada, Valen Johnson, Leslie Moore and Joanne
Wendelberger coauthored a useful paper in Technometrics on
Bayesian prediction and tolerance intervals.
In the same year, George Streftaris and Gavin
Gibson published their very useful paper 'Bayesian Inference for
stochastic epidemics in closed populations' in Statistical
Modelling.
George and Gavin are respectively Senior Lecturer and
Professor in the Department of Actuarial Mathematics and
Statistics at Heriot-Watt.
In the Royal Statistical
Society journals of 2004,
Patrick Wolfe, Simon
Godsill and Wee-Jing Ng described their Bayesian variable
selection and regularization methodologies for time-frequency surface
estimation. They, in particular, analysed the Gabor regression
model, and investigated frame theory, sparsity, and related prior
dependence structures.
Dan Cornford, Lehel
Csató, David Evans and Manfred Opper published an invited
discussion paper about their Bayesian analysis of the scatterometer
wind retrieval inverse problem. They showed how Gaussian processes
can be used efficiently with a variety of likelihood models, using
local forward observation models and direct inverse models for the
scatterometer. Their vector Gaussian process priors are very useful.
Randall Eubank and five
coauthors discussed smoothing spline estimation in varying
coefficient models. They used the Kalman filter to compute their
posterior inferences, and developed Bayesian intervals for the
coefficient curves. A very competent piece of research.
Jeremy Oakley and Tony
O’Hagan described their posterior sensitivity analysis of
complex models. This is a very intense paper with lots of novel
ideas, which is well worth scrutinizing in detail. I think.
Tae Young Yang
described his Bayesian binary segmentation procedure for detecting
streakiness in sports. He employed an interesting
integer-valued changepoint model but falls straight into the Bayes
factor trap: his posterior inferences would be very sensitive to
small changes, e.g. unit changes in the ‘prior sample sizes’, in the
parameters of his beta priors. What a shame they didn’t teach you
that at Myongji University, Tae.
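The trap is easily demonstrated. In the toy calculation below (my own illustration, not Tae's changepoint model), the Bayes factor for a fair-coin hypothesis against a Beta(a, a) alternative shifts appreciably under unit changes in the prior sample size 2a.

```python
from math import lgamma, exp, log

def log_beta(a, b):
    # log of the Beta function via log-gamma, for numerical stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_bf01(k, n, a, b):
    # Log Bayes factor for H0: p = 1/2 against H1: p ~ Beta(a, b),
    # given k successes in n binomial trials.  The binomial
    # coefficient cancels between the two marginal likelihoods.
    log_m0 = n * log(0.5)
    log_m1 = log_beta(k + a, n - k + b) - log_beta(a, b)
    return log_m0 - log_m1

k, n = 35, 50
for a in (1.0, 2.0, 3.0):   # unit changes in the 'prior sample size'
    print(a, exp(log_bf01(k, n, a, a)))
```

Sharpening the Beta(a, a) prior towards one half by a single prior observation or two changes the evidence against the fair-coin hypothesis by a quite noticeable factor, which is exactly the sensitivity complained about above.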
Tae nevertheless
successfully investigated the assertions that Barry Bonds was, and
Javy Lopez wasn’t, a streaky home run hitter during the 2001 and
1988 seasons, whether the Golden State Warriors were a streaky basketball
team during the 2000-2001 season, and whether Tiger Woods was a
streaky golfer during September 1996 to June 2001. This is all very,
very interesting stuff!
Gary Koop followed
that up by describing his Bayesian techniques for modelling the
evolution of entire distributions over time. He uses his techniques
to model the distribution of team performance in Major League
baseball between 1901 and 2000.
Konstandinos Politis and
Lennart Robertson described a forecasting system which predicts
the dispersal of contamination on a large scale grid following a
nuclear accident. They employed a hierarchical Bayesian forecasting
model with multivariate normal assumptions, and computed some
convincing estimated dispersion maps.
Carolyn Rutter and
Gregory Simon of the Center for Health Studies in Seattle
described their Bayesian method for estimating the accuracy of
recalled depression among outpatients suffering from supposed
bipolar disorder [mood swings may well be symptomatic of physical,
e.g. organ, gland or sleep malfunctions, rather than the famously
hypothetical ‘biochemical imbalance’] who took part in LIFE
interviews. In the study under consideration, each of 376 patients
was interviewed twice by phone. One of various problems with this
approach is that telephone interviews are likely to increase the
apparent accuracy of recall, i.e. during the second telephone call
the patient may remember his first interview rather than his
previous mood state.
It does seem strange
that these patients only received follow-up telephone interviews as
a mode of treatment. In Britain they would have been more closely
monitored during their mood swings by their community psychiatric
nurses (CPNs). There are often big differences between the moods
observed by the CPNs and the moods perceived by the patients (David
Morris, personal communication) neither of which, of course, may
match reality.
Researchers at the
Center for Health Studies in Seattle have also been known to
advocate the use of an ‘optimal’ regime of atypical antipsychotic
drugs which have statistically established high probabilities
(around 70% in the short term as established by the CATIE study) of
causing intolerable physical side effects among patients suffering
from schizophrenia. This apparent mental disorder could, however, be
caused by calcification of the pineal gland, and maybe even from
high oil concentration in the skin, or from a variety of possible
physical causes.
I therefore have some
mild a priori reservations regarding the sorts of advice the
center in Seattle might give which could influence future mental
health treatment. Rutter and Simon would appear to have omitted a
number of important symptom variables from their analysis e.g.
relating to lack of sleep and thyroid dysfunction. Their conclusions
could therefore be a bit spurious.
Fulvio De Santis, Marco
Perone Pacifico and Valeria Sambucini describe some Bayesian
procedures for determining the ‘optimal’ predictive sample size n
for case control studies.
Let ψ denote a parameter of interest, for example the log-measure of
association in a 2×2 contingency table, and consider the 100(1−α)%
highest posterior density (HPD) Bayesian interval for ψ. Then,
according to the length probability criterion (LPC), we should
choose the value of n which minimizes the expected length of this
interval, where the expectation is taken with respect to the joint
prior predictive distribution of the endpoints.
The authors generalise LPC, in order to take variability into
account, by choosing the smallest n such that the prior predictive
probability of obtaining an interval estimate whose length exceeds a
given threshold is bounded by a chosen level.
In the context of
hypothesis testing they recommend choosing the smallest n such that
the probability that neither the null nor the alternative hypothesis
is ‘strongly supported’ is less than a chosen threshold, where a
further threshold needs to be specified in order to determine
‘strong support’.
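Both criteria are easy to prototype. Below is a minimal Monte Carlo sketch for a single binomial proportion with a conjugate beta prior, using the central 95% posterior interval as a computational stand-in for the HPD interval; the prior parameters, interval level and length threshold are illustrative assumptions, not the authors' own settings.

```python
import numpy as np
from scipy.stats import beta

def expected_interval_length(n, a=1.0, b=1.0, alpha=0.05, sims=2000, seed=0):
    """Average length of the central 100(1 - alpha)% posterior interval,
    averaged over the joint prior predictive distribution of the data."""
    rng = np.random.default_rng(seed)
    theta = rng.beta(a, b, size=sims)            # parameters drawn from the prior
    y = rng.binomial(n, theta)                   # prior predictive data sets
    lo = beta.ppf(alpha / 2, a + y, b + n - y)   # posterior is Beta(a + y, b + n - y)
    hi = beta.ppf(1 - alpha / 2, a + y, b + n - y)
    return float(np.mean(hi - lo))

def lpc_sample_size(max_len, candidates=range(10, 401, 10), **kw):
    """Smallest candidate n whose expected interval length is at most max_len."""
    for n in candidates:
        if expected_interval_length(n, **kw) <= max_len:
            return n
    return None
```

Their generalised criterion would simply replace the mean length inside the loop by the prior-predictive probability that the length exceeds a threshold, e.g. `np.mean(hi - lo > t0)`.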
The authors apply these
ideas to a practical example concerning the possible association
between non-Hodgkin's lymphoma and exposure to herbicide.
Observations for 1145 patients are used as a training sample that
helps to determine the very strong choice of prior distribution,
which is doubtless needed to justify the normal approximations.
I’m sorry if I’m
sounding too frequentist, but wouldn’t it be simpler, in terms of
prior specification, to choose the value of n which minimises the
strength, i.e. the average power with respect to some prior measure,
for a test of specified size?
Francesco de Pasquale,
Piero Barone, Giovanni Sebastiani and Julian Stander describe an
integrated Bayesian methodology for analysing dynamic magnetic
resonance images of human breasts. The methods comprise image
restoration and classification steps. The authors use their
methodology to analyse a DMRI sequence of 20 two-dimensional images
of 256×256 pixels of the same slice of breast. An absolutely
splendid and highly influential contribution. 


Nicky Best and Sylvia
Richardson coauthored nine splendid joint papers with other
coworkers between 2005 and 2009. They include articles on modelling
complexity in health and social science: Bayesian graphical models
as a tool for combining multiple sources of information, improving
ecological inference using individual-level data, studying place
effects on health by synthesising individual- and area-level incomes,
and adjusting for self-selection bias in case-control studies.
Nicky Best is Professor
of Statistics and Epidemiology at Imperial College London. She and
Deborah Ashby recently developed a Bayesian approach to complex
clinical diagnoses, with a case study in child abuse. They reported
this, with Frank Dunstan, David Foreman and Neil McIntosh in an
invited paper to the Royal Statistical Society in 2013.
In George Barnard and David Cox's days, the statisticians at ICL
taught in a charming house on Exhibition Road and in the
Victorianesque Huxley Building, which tagged onto the similarly
spacious Victoria and Albert Museum and where, according to George
Box, a statistician was once crushed to death by the much-dreaded
lift.




The
infamous Huxley Building lift shaft 

Philip Prescott writes from the University of Southampton: 'the lift
had large metal gates that had to be carefully closed in order that
the lift would open properly. It was usually quicker to walk up the
stairs, or run if we were late for our lectures.'
Doubtless, the Bayesians at ICL are much safer nowadays.
I last visited ICL in
1978 to teach a short course on Bayesian Categorical Data Analysis.
This was during the final stages of the World Cup in Argentina (I
remember David Cox telling me that Argentina had just beaten Peru
six nil). Nowadays, the statisticians are housed in much more modern
premises on Queen’s Gate, and the life expectancy has improved
somewhat.
Nicky Best received
the Royal Statistical Society’s Guy Medal in Bronze in 2004.
While Sylvia Richardson
is slightly more mathematically inclined, Nicky also shows
considerable practical acumen. Both Nicky and Sylvia have been
heavily involved in Imperial College’s BIAS research program, which
addresses social science data that are notoriously full of missing
values, nonresponses, selection biases and other idiosyncrasies.
Bayesian graphical and hierarchical models offer a natural tool for
linking many different submodels and data sources, though they may
not provide the final answer.
In 2005 the second edition of the best-selling book Statistics
for Experimenters, by George Box, J. Stuart Hunter, and Bill
Hunter was published, a quarter of a century after the first
edition, but with the new subtitle Design, Innovation, and
Discovery. The second edition incorporated many new ideas at Stu
Hunter’s suggestion, such as the optimal design of experiments and
Bayesian Analysis.




J.
Stuart Hunter 

J. Stuart Hunter is
considered by many people to be a wonderful character, a gifted
scientist, and one of the most important and influential
statisticians of the last half century, especially with regard to
applying statistics to problems in industry. He is currently
investigating developments in data mining and machine learning.
In their JASA papers of 2005, Peter Müller discussed applied
Bayesian modeling, Michael Elliott and Rod Little described their
Bayesian evaluation of the 2000 census, using ACE survey data and
demographic analysis, and Antonio Lijoi, Igor Prünster and Stephen
Walker investigated the consistency of semiparametric normal
mixtures for Bayesian density estimation.
John Pratt published his paper ‘How many balance functions does it
take to determine a utility function?’ in 2005 in the Journal of
Risk and Uncertainty.
John is the William
Ziegler Professor Emeritus of Business Administration at Harvard. He
is a traditional Bayesian of the old school, but with all sorts of
inspirational ideas.
Peter Rossi and Greg
Allenby published their book Bayesian Statistics and Marketing
with John Wiley in 2005.
Peter is James Collins
Professor of Marketing, Statistics, and Economics at UCLA. He used
to be one of Arnold Zellner’s happy crew in the University of
Chicago Business School.
Dario Spanò and Robert
C. Griffiths coauthored their paper on transition functions with
Dirichlet and Poisson–Dirichlet stationary distributions in
Oberwolfach Reports in 2005.




Dario Spanò 

Dario Spanò obtained his
Ph.D. in Mathematical Statistics from the University of Pavia in
2003. In 2013 he was promoted to Associate Professor of Statistics
at the University of Warwick, where he is also director of the M.Sc.
program and a diehard Bayesian to boot.
Robert C. Griffiths FRS
is Professor of Statistics at the University of Oxford.
Also in 2005, Geoff McLachlan and David Peel coauthored A
Bayesian Analysis of Mixture Models for the Wiley online
library. This is the fourth chapter of their 2004 book Finite
Mixture Models, and it is very important in terms of
semiparametric multivariate density estimation. This is a situation
where proper priors are always needed, e.g. any improper prior for
the mixing probabilities will invariably lead to an improper
posterior. Morris DeGroot was the first to tell me that. Maybe a
maximum likelihood procedure using the EM algorithm would sometimes
work better in practical terms. Geoff has produced a wonderful
computer package which does just that. See [15], p. 3. Orestis
Papasouliotis and I applied Geoff’s package to the Lepenski Vir
Mesolithic and Neolithic skeletons data to excellent effect.
Professor McLachlan is
interested in applications in medicine and genetics. He is a
Distinguished Senior Research Fellow at the University of
Queensland. In 2011, he was awarded the Pitman Medal, the
Statistical Society of Australia’s highest honour.
Laurent Itti and Pierre
Baldi published their paper ‘Bayesian Surprise Attracts Human
Attention’ in 2005 in Advances in Neural Information Processing Systems.
The following year, the authors were to characterise surprise in
humans and monkeys, and to model what attracts human gaze over
natural dynamic scenes.
Researchers in the murky
interior of the USC Hedco Neuroscience building in Los Angeles, who
are heavily involved in the Bayesian theory of surprise, later
developed a ‘bottom-up visual surprise’ model for event detection
in natural dynamic scenes. I wouldn’t be surprised by whatever the
heady people at Hedco come up with next.
Jay Kadane and four
coauthors reported their conjugate analysis of the
Conway–Maxwell–Poisson (CMP) distribution in 2005 in Bayesian
Analysis. This distribution adds an extra parameter to the
Poisson distribution to model overdispersion and underdispersion.
Following my oft-referred-to, four-decade-old paradigm (Do transform
to a priori normality!), I might, if the mood strikes, instead
assume a more flexible bivariate normal prior for the logs of the
two parameters. This non-conjugate prior extends quite naturally to
a multivariate normal prior for the parameters in generalised linear
models which adopt CMP sampling assumptions.
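As a quick illustration of why the extra CMP parameter is attractive, the sketch below computes the pmf P(Y = y) ∝ λ^y/(y!)^ν by truncation and checks the dispersion: ν = 1 recovers the Poisson, ν > 1 gives underdispersion, and ν < 1 overdispersion. The truncation point is an assumption that is adequate for the moderate parameter values shown.

```python
import math

def cmp_pmf(lam, nu, y_max=400):
    """Truncated pmf of the Conway-Maxwell-Poisson distribution:
    P(Y = y) proportional to lam**y / (y!)**nu, computed on the log scale."""
    logw = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(y_max + 1)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]   # stabilised unnormalised weights
    z = sum(w)
    return [v / z for v in w]

def dispersion(lam, nu):
    """Variance-to-mean ratio: 1 when nu = 1 (Poisson), below 1 for nu > 1
    (underdispersion), above 1 for nu < 1 (overdispersion)."""
    p = cmp_pmf(lam, nu)
    mean = sum(y * py for y, py in enumerate(p))
    var = sum((y - mean) ** 2 * py for y, py in enumerate(p))
    return var / mean
```

A bivariate normal prior on (log λ, log ν) then sits naturally on exactly this parametrisation.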
During 2005, Luis Pericchi
published an article in Elsevier R.V. Handbook of Statistics
entitled ‘Model Selection and Hypothesis Testing based on Objective
Probabilities and Bayes Factors’.




Luis
Pericchi 

Luis Pericchi is
Professor of Mathematics at the University of Puerto Rico. He has
authored numerous influential papers on Bayesian methodology and its
applications, and is highly regarded.
Samuel Kou, Sunney Xie
and Jun Liu of Harvard University reported their Bayesian analysis
of single-molecule experimental data in 2005 in an invited
discussion paper to the Royal Statistical Society. They investigate
an interesting two-state model for Y(t), the total number of photons
arriving up to time t. This is equivalent to the photon arrivals
following a doubly stochastic Poisson process, and the authors use
their two-state formulation to investigate Brownian motion, and to
solve problems relating to the deoxyribonucleic acid hairpin and
fluorescence lifetime experiments.
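The two-state formulation can be mimicked with a short simulation: the intensity of a Poisson process switches between two levels according to a continuous-time Markov chain, which is precisely the doubly stochastic (Cox) process structure mentioned above. All rates below are illustrative placeholders, not those of the authors.

```python
import random

def simulate_photon_arrivals(t_end, rates=(2.0, 20.0), switch=(0.5, 0.5), seed=1):
    """Simulate a two-state doubly stochastic Poisson (Cox) process.

    A hidden state flips between 0 and 1 with exponential holding times
    (rates given by `switch`), and photons arrive as a Poisson process
    whose intensity is rates[state].  Returns the arrival times up to t_end,
    so Y(t) is just the number of returned times not exceeding t."""
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while t < t_end:
        seg_end = min(t + rng.expovariate(switch[state]), t_end)
        s = t
        while True:                      # Poisson arrivals within this segment
            s += rng.expovariate(rates[state])
            if s >= seg_end:
                break
            arrivals.append(s)
        t, state = seg_end, 1 - state    # flip the hidden state
    return arrivals
```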
Professor Alan Hawkes
of the University College of Wales at Swansea was slightly puzzled
when he proposed the Vote of Thanks, and he raised a few quibbles.
For example, the authors’ process for photon observation depends on
an underlying gamma process which is not dependent on the sequence
of pulses, and yet it is the pulses that produce the photon
emissions. The authors addressed these quibbles in their written
reply.
In 2005, Thomas Louis
published an article in Clinical Trials on the fundamental
concepts of Bayesian methods.
Louis is Professor of Biostatistics at the Johns Hopkins Bloomberg
School of Public Health in Baltimore. He has many Bayesian research
interests, for example in
the analysis of medical longitudinal, spatial, and observational
data, and he is very well published.



During 2005 in Applied
Statistics:
Samuel Mwalili, of the Jomo
Kenyatta University of Agriculture and Technology in Nairobi,
Emmanuel Lesaffre and Dominic Declerck used a Bayesian ordinal
regression model to correct for inter-observer measurement error in
a geographical health study;
Jaime Peters and five
colleagues at the University of Leicester used Bayesian procedures
to investigate the cross-design synthesis of epidemiological and
toxicological evidence, using some ingeniously vague choices for
their prior parameters;
Claudio Verzilli, of the
London School of Hygiene and Tropical Medicine, John Whittaker,
Nigel Stallard and Daniel Chapman proposed a hierarchical Bayesian
multivariate adaptive regression spline model for predicting the
functional consequences of amino-acid polymorphisms, and used it to
investigate the lac repressor molecule in Escherichia coli. In the
absence of lactose, this molecule binds to the DNA double helix
upstream of the genes that code for enzymes;
Sujit Sahu and Kanti Mardia proposed a Bayesian kriged Kalman model
for the short-term forecasting of air pollution levels, which
assumes that the spatial covariance kernel belongs to the Matérn
family. The authors applied their Bayesian analysis most
successfully to the New York air pollution data.




Sujit
Sahu 

Sujit Sahu is Professor of
Statistics at the University of Southampton. He is interested in
Bayesian modeling for interpreting large and complex data sets in a
wide range of application areas.
In 2006, Stephen Pitt
reported his efficient Bayesian inferences, with David Chan and
Robert Kohn, for Gaussian copula regression models in Biometrika.
He is one of the most insightful of our up-and-coming Bayesian
Economists. I remember chewing the rag with him in 1998 at Valencia
6.
Stephen is Professor of
Economics at the University of Warwick, where he has worked as
liaison officer for the highly Bayesian, integrated single honours
MORSE (Mathematics, Operational Research, Statistics, and Economics)
degree which I helped Robin Reed and the Economists to create in
1975. Stephen’s research areas include financial time series,
nonGaussian state models, stochastic volatility models, and MCMC.
The campus of the University of Warwick is situated on the southern
fringes of the once bombed-out City of Coventry, several miles north
of the ruins of the historic Kenilworth Castle, where the rebellious
thirteenth-century leader Simon de Montfort, the ill-fated King
Edward the Second, and King Henry the Fifth, the victor at
Agincourt, all stayed. The better-preserved Warwick Castle, with its
vibrant dungeons and strutting peacocks, lies even further to the
south. Therein, the evil Kingmaker of the Wars of the Roses once
lived, a century or so after Piers Gaveston, the butch lover of the
losing King at Bannockburn, was beheaded by irritated nobles in a
ditch nearby.




Warwick Castle 

The University of
Warwick of the late 1960s and early 1970s consisted of just a few
loosely white-tiled buildings scattered across a long tract of farm
land, and the junior academics were reportedly exploited as guinea
pigs by the Napoleonic Vice-Chancellor. However, after that
gentleman melted into thin air, the campus slowly became
jam-packed with buildings and eventually accommodated one of the
most thriving universities in Europe. The MORSE degree, which
started in 1975 with 30 students a year, became world famous with
it.
The current
undergraduate intake for the MORSE and MMORSE degrees is 150 a year,
with many students coming from overseas. A new interdepartmental
B.Sc. degree in Data Analysis will start in 2014.
Getting back to 2006,
Petros Dellaportas, Nial Friel and Gareth Roberts reported their
Bayesian model selection criterion for partially (finitely) observed
diffusion models in Biometrika. For a fixed model
formulation, the strong dependence between the missing paths and the
volatility of the diffusion can be broken down using one of Gareth’s
previous methods. The authors described how this method may be
extended via reversible jump MCMC to the case of model selection. As
is ever the case with reversible jump MCMC, my mind boggles as to
how long the simulations will dither and wander before settling
close to the theoretical solution.




Nial
Friel 

Nial Friel is Associate
Professor and Head of Statistics at University College Dublin. His
research interests include Bayesian inference for statistical
network models, social network analysis, and model selection. He
obtained his Ph.D. in 1999 from the University of Glasgow.
In his interesting 2006 article in Statistics in Society,
Gene Hahn reexamined informative prior elicitation through the lens
of MCMC methods. After reviewing the literature, he stated four
principles for prior specification relating to the need (1) to
elicit prior distributions which are of flexible form, (2) to
minimize the cognitive demands on the expert, (3) to minimize the
demands on the statistician, and (4) to develop prior elicitation
methodologies which can be easily applied to a wide range of models.
With these ambitious, though somewhat expedient, principles in mind,
Hahn recommended eliciting non-conjugate priors by reference to
Kullback–Leibler divergence. He applied his ideas to inference about
a regression parameter in the context of a set of data on rainfall
in York. Overall, an intriguing study, which seeks to simplify the
plethora of modern prior elicitation techniques.




Gene
Hahn 

Gene Hahn is Associate
Professor of Information and Decision Sciences at Salisbury
University in Maryland. His research interests include management
decision making, Bayesian inference, and international operations
including offshore and global supply chain management. 


In contrast to Gene Hahn’s informative prior approach, Trevor
Sweeting, Gauri Datta, and Malay Ghosh proposed deriving vague
nonsubjective priors by minimizing predictive entropy loss. The
suggestions in their exquisitely mathematical paper ‘Nonsubjective
priors via predictive relative entropy regret’ in the Annals of
Statistics (2006) may be contrasted with reference priors. Sweeting
is one of those wonderful English surnames which makes you proud to
feel that you’re British.
Trevor Sweeting is
Emeritus Professor of Statistics at UCL. His interests also include
Bayesian computations, Laplacian Approximations, and Bayesian
semiparametric hierarchical modeling, and he has published
extensively within the Bayesian paradigm. He is one of the later
generation of UCL Bayesians who, together with Tom Fearn, picked up
the pieces during the decades following Dennis Lindley’s dramatic
departure in 1977, which had all the ingredients of a Shakespearean
play.
The UCL statisticians are now housed in modern premises on Tottenham
Court Road, a couple of hundred yards to the west of the Malet
Street quadrangle, where feisty Administrators and beefeaters once
roamed and well away from the padded skeleton of the purveyor of
happiness Jeremy Bentham, which is still on show in a glass case. An
innocuously quiet, unassuming Bayesian called Rodney Brooks still
beavers away in the Stats department over forty years after he
published a straightforward version of Bayesian Experimental Design
in his UCL Ph.D. thesis, and Mervyn Stone still floats through, over
a century after Karl Pearson first moved into Malet Street with an
autocratic demeanour which was only to be rivalled by Sir Ronald
Fisher. According to the industrial statistician and crafty Welsh
mountain walker Owen Davies, who succeeded Dennis Lindley to the
Chair in Aberystwyth in 1967, Karl and Sir Ronald each accused the
other of behaving too autocratically; indeed their interdepartmental
dispute over who should teach which course was never to be
completely resolved.




The
UCL Centre for Computational Statistics and Machine Learning 

In their 2006 paper in Statistics in Society, John Hay,
Michelle Haynes, Tony Pettitt and Thu Tran investigated a Bayesian
hierarchical model for the analysis of categorical longitudinal data
from a large social survey of immigrants to Australia.
The observed binary
responses can be arranged in an N×J×T array, where each of the N
individuals in the sample can be in any one of J states at each of T
times. Under appropriate multinomial assumptions, the authors took
the three-dimensional array of multivariate logits to be constrained
by a linear model that may or may not have random components, but
which depends upon N×T vectors of explanatory variables and N×T
vectors of lagged variables. While the authors could have explained
a bit more what they were up to, they referred to WinBUGS for their,
presumably first stage multivariate normal, prior specifications and
used DIC when comparing different special cases of their general
model specification. Then they used their modelspecific inferences
to draw some tolerably interesting applied conclusions from their
social survey.
John Hay published his
book Statistical Modeling for NonGaussian Time Series Data with
Explanatory Variables out of his 1999 Ph.D. thesis at the
Queensland University of Technology (QUT) in Brisbane.
Tony Pettitt is
Professor of Statistics at QUT. His areas of interest, while working
in that neck of the woods, include Bayesian Statistics, neurology,
inference for transmissions of pathogens and disease, motor unit
number registration, and Spatial Statistics.
In 1788 seven ships set
forth from the Mayflower Steps in Plymouth, England, packed with
West Country petty criminals and sheep stealers, landed in Botany
Bay, and founded Sydney, Australia. No offence, Tony. I was just
trying to wax lyrical. Good luck on your motor registrations.




The
Mayflower Steps 

Meanwhile, Professor
George Kuczera of the Department of Engineering of the University of
Newcastle in New South Wales has established himself as a world
authority on the theory and applications of Bayesian methods in
hydrology and water resources.
Mark Steel, our friendly Dutchman at the University of Warwick, was
his usual dynamic self in 2006. He published the following three
joint papers in JASA during that year:
Order Based Dependent
Dirichlet Processes (with Jim Griffin),
A constructive
representation of skewed normal distributions (with José Ferreira),
NonGaussian Bayesian
geostatistical modelling (with M. Blanca Palacios).
Mark also coauthored
papers in 2006 with Jim Griffin in the Journal of Econometrics,
and with J.T. Ferreira in the Canadian Journal of Statistics.
By coincidence, our very
own Peter Diggle and his mate Soren Lophaven also published a paper
on Bayesian geostatistical design in 2006, but in the
Scandinavian Journal of Statistics.
Peter Diggle is a
Distinguished Professor of Statistics at the University of Lancaster
down by the Lake District. Peter has authored a number of Bayesian
papers, and his research interests are in spatial statistics,
longitudinal data, and environmental epidemiology, with applications
in the biomedical, clinical, and health sciences. He is very highly
regarded.
Peter is Presidentelect
of the Royal Statistical Society, and received the Guy Medal in
Silver in 1997.
Jim Griffin is Professor
of Statistics at the University of Kent. His research interests
include Bayesian semiparametrics, slice sampling, high frequency
financial data, variable selection, shrinkage priors and stochastic
frontier models.
In their 2006 papers in
JASA, Knashawn Morales, Joseph Ibrahim, Chien-Jen Chen and
Louise Ryan applied their Bayesian model averaging techniques to
benchmark dose estimation for arsenic in drinking water, Nicholas
Heard, Christopher Holmes and David Stephens described their
quantitative study of the gene regulation involved in the immune
response of anopheline mosquitoes, and Niko Kaciroti and his five
courageous coauthors proposed a Bayesian procedure for clustering
longitudinal ordinal outcomes for the purpose of evaluating an
asthma education program.


In their 2007 paper in Statistics in Society, David Ohlssen, Linda
Sharples and David Spiegelhalter proposed a hierarchical modelling
framework for identifying unusual performance in healthcare
providers. In a special case where the patients are nested within
surgeons, a two-way hierarchical linear logistic regression model is
assumed for the zero-one counts which incorporates provider effects,
and the logits of EuroSCOREs are used as covariates for case-mix
adjustments. This is a special case of a more general formulation
which contains the same sort of parametrization.
The provider
effects are taken either to be fixed or to constitute a random
sample from some distribution, e.g. a normal or heavier-tailed
t-distribution, a mixture of distributions, or a nonparametric
distribution. I would personally use a large equally weighted
mixture of normal distributions with common unknown dispersion, and
unknown locations. If this random effects distribution is estimated
from the data by either hierarchical or empirically Bayesian
procedures, then appropriate posterior inferences will detect
unusual performances by the health care providers. The authors
seemed to take a long time saying this, but their job was then well
done.
Meanwhile, John Quigley
and Tim Bedford of the University of Strathclyde in Glasgow used
Empirical Bayes techniques to estimate the rate of occurrence of
rare events on railways. They published their results in
Reliability Engineering and System Safety.
Jean-Michel Marin and
Christian Robert published their challenging and high level book
Bayesian Core: A Practical Approach to Computational Bayesian
Statistics in 2007. The authors address complex Bayesian
computational problems in regression and variable selection,
generalised linear models, capturerecapture models, dynamic models,
and image analysis.
If computer R labs are added to three hours of lectures a week, then
culturally-adjusted graduate students with a proper mathematical
background can
reportedly hope to achieve a complete picture of Marin and Robert’s
treatise within a single semester. I hope that they are also given
an intuitive understanding of the subjective ideas involved, and are
not just taught how to crank the computational handle.
In their article in
Accident Analysis and Prevention, Tom Brijs, Dimitris Karlis,
Filip Van den Bossche and Geert Wets used an ingenious two-stage
multiplicative Poisson model to rank hazardous road sites according
to numbers of accidents, fatalities, slight injuries, and serious
injuries. They assumed independent gamma priors for the seven sets
of unknown parameters, before referring to an MCMC analysis, and
derived the posterior distributions of the ‘expected cost’
parameters for the different sites. These are expressible as
complicated functions of the model parameters.
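A stripped-down, single-outcome analogue of this gamma–Poisson ranking idea is easy to write down: with y_i ~ Poisson(θ_i e_i) accidents at site i (exposure e_i) and a conjugate Gamma(a, b) prior, the posterior for θ_i is Gamma(a + y_i, b + e_i), and sites can be ranked by posterior mean. The prior values below are placeholders, and the full paper ranks a cost-weighted combination of four outcomes rather than one.

```python
def rank_sites(counts, exposures, a=1.0, b=1.0):
    """Rank sites by posterior mean accident rate: with y_i ~ Poisson(theta_i * e_i)
    and theta_i ~ Gamma(a, b), the posterior is theta_i | y_i ~ Gamma(a + y_i, b + e_i),
    whose mean is (a + y_i) / (b + e_i)."""
    post_means = [(a + y) / (b + e) for y, e in zip(counts, exposures)]
    order = sorted(range(len(counts)), key=lambda i: post_means[i], reverse=True)
    return order, post_means
```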
The authors utilised
this brilliant representation to analyse the official traffic
accidents on 563 road intersections in the Belgian city of Leuven
for the years 1991–98. Their MCMC algorithm converged quite easily
because the sampling model was so well-conditioned. Their posterior
boxplots described their conclusions in quite beautiful fashion, and
they checked out their model using a variety of predictive p-values.
Their Bayesian p-values for accidents, fatalities, severely injured
and slightly injured were respectively 0.208, 0.753, 0.452 and
0.241. The authors therefore deemed their fit to be satisfactory.
This is what our paradigm is all about, folk!
In their 2007 JASA papers, Chunfang Jin, Jason Fine and Brian
Yandell applied their unified semiparametric framework for
quantitative trait loci to the analysis of spike phenotypes, Anna
Grohovac Rappold, Michael Lavine and Susan Lozier used subjective
likelihood techniques to assess the trends in the ocean’s mixed
layer depth, Daniel Cooley, Doug Nychka and Philippe Naveau used
Bayesian spatial models to analyse extreme precipitation return
levels, and Bo Cai and David Dunson used Bayesian multivariate
isotonic regression splines in their carcinogenicity studies.
In their 2007 paper
in Applied Statistics, Rolando De la Cruz-Mesía, Fernando
Quintana and Peter Müller used semiparametric Bayesian
classification to obtain longitudinal markers for 173 pregnant women
who were measured for β human chorionic gonadotropin hormone during
the first 80 days of gestational age.
The data consisted of
the observed response vector y for each patient for the known
timepoints at which their hormone level was observed, together with
a zero-one observation x indicating whether the pregnancy was
regarded as normal or abnormal. A two-stage sampling model was
assumed, where the y’s were taken to be independent given a matching
set of random-effects parameters. Then the random effects were
assumed to be independent, with distributions depending on the
corresponding x’s, a vector φ of common unknown parameters, and two
random-effects distributions G. The posterior classification
probabilities (as to whether a further patient with another y vector
has a normal or abnormal pregnancy) can then be obtained in terms of
the prior probabilities by a simple application of Bayes’ rule.
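That final classification step is just Bayes' rule. The sketch below uses two normal class-conditional densities as stand-ins for the fitted predictive densities of a new marker, since the point is only the mechanics of turning prior class probabilities and likelihoods into posterior ones; the densities and prior probabilities are illustrative assumptions.

```python
import math

def normal_pdf(y, mu, sigma):
    """Normal density, standing in for a fitted predictive density."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_class_probs(y, priors=(0.8, 0.2), params=((10.0, 2.0), (16.0, 3.0))):
    """Bayes' rule: posterior probability of each class (say, normal vs abnormal
    pregnancy) for an observed marker y, given prior class probabilities and
    class-conditional densities."""
    joint = [p * normal_pdf(y, mu, s) for p, (mu, s) in zip(priors, params)]
    z = sum(joint)
    return [j / z for j in joint]
```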
When applying this
procedure, it would have seemed important to use empirical estimates
for φ and for the distributions of the G random-effects parameters,
in order to avoid the interpretative difficulties inherent in Bayes
factors. However, the authors represented φ in terms of latent
variables in the form of random matrices, and assumed dependent
Dirichlet priors for the distributions of the G parameters. Phew,
Gor blimey!
The authors proceeded merrily along with their MCMC computations,
and got the posterior inferences which they, as full-dress
hierarchical semiparametric Bayes factorists, may well have
deserved. A little more practical nous would not have gone amiss.
José Bernardo presented a high-powered invited discussion paper in
Sort in 2007 on objective Bayesian point and region estimation in
location-scale models. I don’t know what José meant by ‘objective’.
Isn’t it a bit of a problem for Bayesians to use this evocative sort
of terminology? Observational data are usually more subjective than
objective.
In 2007, the book
Gaussian Markov Random Fields by Håvard Rue and Leonhard Held
was a runner-up for ISBA’s DeGroot Prize.
Whoops! We almost missed
out on three exciting publications in the 2007 RSS journals:
Yuan Ji and seven
coauthors used Bayesian mixture models for complex high dimensional
count data in phage display experiments, and proposed an interesting
hierarchical prior distribution for the parameters in a
Poisson/log-linear model for the observed counts, which were taken
at different phages and from different, e.g. mouse, organs. They
then referred to a full-dress, parametric Bayesian analysis. The
wonders of MCMC will never cease.
Alexandros Gryparis,
Brent Coull, Joel Schwartz and Helen Suh used semiparametric latent
variable regression models for the spatiotemporal modelling of
mobile source particles in the greater Boston area. They used a
nonlinear factor analytic model together with geoadditive
semiparametric regression assumptions, and sorted out the
identifiability problems with their informative prior
specifications. Their MCMC analysis yielded some beautiful-looking,
and neatly coloured, maps of Boston, Massachusetts.
Nicole Augustin, Stefan
Lang, Monica Musio and Klaus von Wilpert, God bless their cotton
socks, applied Bayesian structural additive regression to a spatial
model for the needle losses of pine trees in the forests of
Baden-Württemberg. A delightful bedtime read.
Pilar Iglesias-Zuazola, the spiritual leader of Bayesians in Chile,
passed away on the third of March 2007. She was one of the leading
Bayesian researchers in Latin America, and an outstanding educator
who would visit Chilean pupils in their high schools.




Pilar
Iglesias 

After working with
Carlos Pereira, Pilar received her doctorate from the University of
São Paulo in Brazil. Upon becoming a faculty member of the Catholic
University of Chile (PUC), she started making contact with Bayesians
at other Chilean Universities. Consequently, the University of La
Serena, Chile, hosted the First Bayesian Workshop in January 1996.
In 2010, ISBA instituted
the Pilar Iglesias Travel Award in Pilar’s honour. Recipients to
date include Delson Chivabu (South Africa), Jose Ramiro (Chile),
Fernando do Nascimento (Brazil), and Francisco Torres-Arles (Chile).
In 2007, Samprit Banerjee,
Brian Yandell and Nengjun Yi coauthored a seminal paper ‘Bayesian
Quantitative Trait Loci Mapping for Multiple Traits’ in Genetics.
In 2008 Brian Yandell
taught a course on Quantitative Trait Loci (QTL) mapping at the
University of Wisconsin-Madison. The goals of his QTL study included
(1) discovering underlying biochemistry, (2) finding useful
candidates for medical intervention, (3) discovering how the genome
is organised, (4) discerning units of natural selection, and (5)
predicting phenotype or breeding value. An outstanding enterprise.
In their 2008 paper in
JASA, Dimitris Fouskakis and David Draper compared stochastic
optimization methods for variable selection in binary outcome
prediction, with application to health policy. They published two
further important Bayesian papers in 2009, with Ioannis Ntzoufras,
which related to Bayesian variable selection with application to
cost-effective measurement of quality of health care. Fouskakis and
Ntzoufras? They sound like two up-and-coming names for the future.
I’m sure they’re the best of buddies.








Dimitris
Fouskakis 

Ioannis
Ntzoufras 


David Draper advocates
‘Bayesian-frequentist fusion’, and thinks that Gauss, Galton and
Fisher were early Bayesians who may have fused. He is a Professor of
Applied Mathematics and Statistics at the University of California
at Santa Cruz. A past president of ISBA, he is a prolific applied
Bayesian researcher and prize-winning short-course teacher. There’s
a great mug shot of him on the Internet.
In their 2008 JASA
papers, Chiara Sabatti and Kenneth Lange used Bayesian Gaussian
mixture models to analyse high-density genotype arrays, Shane Jensen
and Jun Liu used a Bayesian clustering procedure to analyse
transcription factor binding motifs, Abel Rodriguez, David Dunson
and Alan Gelfand drew nested Dirichlet processes to the world’s
attention, and David Dunson, Ya Xue and Lawrence Carin used a matrix
stick-breaking process to develop a flexible Bayesian meta-analysis.
I enjoy breaking my matrix sticks at bedtime, Professor Dunson, and
you’re one of the world’s leading Bayesian researchers.
In the same year, the
Spanish mermaid Carmen Fernandez and her three not-so-fishy
coauthors reported their Bayesian analysis of a two-stage biomass
model for the Bay of Biscay anchovy, in the ICES Journal of
Marine Science. Carmen is a very enthusiastic research
scientist at the Spanish Institute of Oceanography, and she has
published 33 very useful papers in Statistics and Fisheries
journals, several of them jointly with her friend and mentor Mark
Steel of the University of Warwick.




Carmen
Fernandez 

Also in 2008, Lei Sun and Murray Clayton coauthored ‘Bayesian
analysis of cross-classification spatial data’, and Murray Clayton
and his six coauthors reported ‘Predicting spatial patterns of fire
on a California landscape’ in the International Journal of
Wildland Fire.
Murray Clayton is
Professor of Statistics and Plant Pathology at the University of
Wisconsin-Madison. As one of the leading Canadian Bayesians he has
published a number of high-quality theoretical articles, and many
application papers, e.g. in the agricultural sciences. As a student
of Don Berry of the University of Minnesota, Murray is well able to
mix Bayesian theory with practical relevance. Don has done all sorts
of important things too, as well as running a remunerative Bayesian
consultancy business.




Murray
Clayton 

Charles Franklin,
Murray’s eminent colleague in the University of Wisconsin’s
Department of Political Science, teaches multitudinous advanced
level Bayesian methodology courses to social scientists. Up to a few
years ago, he was using my book [15] with John Hsu for course
material, with lots of emphasis on the first chapter on likelihood
formulations and frequency procedures. I’m glad that his students
could understand it. The Statistics graduate students at Wisconsin
used to experience lots of difficulties constructing the
likelihoods, particularly if they hadn’t bothered to take Advanced
Calculus, and some of them didn’t even multiply from 1 to n. But
after that Bayes was a cinch (as long as they were told what the
prior was!). I was once advised that the best thing I taught them
was ‘that neat formula for completing the sum of squares’. Anyway,
Professor Franklin is getting more package orientated nowadays, so
some of the arts of yore may be getting lost.
Trevor Park and George
Casella introduced the Bayesian Lasso in JASA in 2008,
as an interpretation of Tibshirani’s Lasso, which
estimates linear regression coefficients through constrained least
squares. The authors showed that the Lasso estimate can be
interpreted as a Bayesian posterior modal estimate when the
parameters have independent Laplace (double-exponential) priors. A
scale-mixture representation of these double-exponential priors
provides full conditional posterior distributions for MCMC, and the
interval estimates provided by the Bayesian Lasso help to guide
variable selection. 
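The posterior-mode connection is easy to check numerically. The sketch below is a toy single-parameter case (a unit-variance Gaussian likelihood and a Laplace prior with an assumed rate λ, rather than the authors’ full regression setup): the numerically maximised log-posterior lands on the closed-form Lasso soft-threshold estimate.

```python
import math

def soft_threshold(y, lam):
    """Closed-form Lasso solution for one parameter:
    argmin over theta of 0.5*(y - theta)**2 + lam*abs(theta)."""
    return math.copysign(max(abs(y) - lam, 0.0), y)

def map_estimate(y, lam, lo=-10.0, hi=10.0, n=200_001):
    """Grid-maximise the log-posterior
    -0.5*(y - theta)**2 - lam*abs(theta) + const,
    i.e. a unit-variance Gaussian likelihood with a Laplace(0, 1/lam) prior."""
    best_t, best_v = 0.0, -float("inf")
    for i in range(n):
        t = lo + (hi - lo) * i / (n - 1)
        v = -0.5 * (y - t) ** 2 - lam * abs(t)
        if v > best_v:
            best_t, best_v = t, v
    return best_t

for y in (3.0, 0.4, -2.5):
    # The two estimates agree: the MAP estimate is the soft-threshold value.
    print(y, soft_threshold(y, 1.0), round(map_estimate(y, 1.0), 4))
```

In the full regression problem the same correspondence holds coordinate-wise, which is what makes the Gibbs sampler of Park and Casella possible.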



George
Casella 

Alan Izenman published his celebrated book Modern Multivariate
Statistical Techniques in the same year. Alan takes a broad
perspective on multivariate analysis in the light of the remarkable
advances in computation and data storage and the ready availability
of huge data sets which have been the keys to the growth of the new
disciplines of data mining and machine learning. Meanwhile, the
enormous success of the Human Genome Project has opened up the field
of bioinformatics. The book presents an integrated mixture of theory
and applications, and of classical, Bayesian, and modern
multivariate analysis techniques.
Alan Izenman is Senior Research Professor in
Statistics at Temple University. In 1976 he and Sandy Zabell
investigated the 1965 New York City blackout and showed that this
did not substantively affect the city’s birthrate as previously
advertised. See [15], p95. In this case ‘induced births’ provide the
confounding variable. A wonderful message to popularist
statisticians everywhere.
Alan also researches on
the interaction between Statistics and the Law, and he has used
Bayesian methods to draw inferences about the amount of drugs
previously smuggled in by a defendant in the pit of his stomach. 



Alan Izenman 

THE INLA PACKAGE:
During 2009, two professors
from Norway and a Frenchman from Paris published a breathtaking
paper in JRSSB which, together with the discussion thereof,
graced 73 pages of the Society’s journal.
Approximate Bayesian
inference for latent Gaussian models by using integrated nested
Laplace approximations is doubtlessly the
most exciting and far-reaching Bayesian paper of the 21st century.
The now much-celebrated coauthors were Håvard Rue and Sara Martino
of the Norwegian University of Science and Technology in Trondheim,
and Nicolas Chopin from the Research Centre for Economics and
Statistics in Paris.
Much of Scotland was
once part of the Archdiocese of Trondheim, and maybe that should now
include Edinburgh, and perhaps even London too. The coauthors
completed all of their computations by application of INLA, their
computer package which is thoroughly documented by Martino and Rue
[71]. This manual refers to Sara Martino’s 2007 Ph.D. thesis and to
review papers by Ludwig Fahrmeir and Gerhard Tutz, and others.
The coauthors wrote, in
summary,
Structured additive
regression models are perhaps the most commonly used class of models
in statistical applications. It includes, among others, (generalised)
linear models, (generalised) additive models, smoothing spline
models, state space models, semiparametric regression, spatial and
spatiotemporal models, log-Gaussian Cox processes, and
geostatistical and geoadditive models. We consider approximate
Bayesian inference in a popular subset of structured additive
regression models, latent Gaussian models, where the latent field is
Gaussian, controlled by a few hyperparameters and with non-Gaussian
response variables. The posterior marginals are not available in
closed form owing to the non-Gaussian response variables. For such
models, MCMC methods can be implemented, but they are not without
problems, in terms of both convergence and computational time.
In some practical
applications, the extent of these problems is such that MCMC is
simply not an appropriate tool for routine analysis. We show that,
by using an INLA approximation and its simplified version, we can
compute very accurate approximations to the posterior marginals. The
main benefit of these approximations is computational; where MCMC
algorithms need hours or days to run, our approximations provide
more precise estimates in seconds or minutes. Another advantage with
our approach is its generality, which makes it possible to perform
Bayesian analysis in an automatic, streamlined way, and to compute
model comparison criteria and various predictive measures, so that
models can be compared and the model under study can be challenged.
What more can I say? Well, here goes. The important
influences of the numerically highly accurate conditional Laplace
approximations of the 1980s, and the soundly based Importance
Sampling computations which began in 1978, took a fair drenching
during the MCMC fever of the 1990s. It may take a little time, but
the tide is beginning to turn towards techniques which can be
algebraically justified and which will more fully test the
mathematical expertise of our Ph.D. students and up-and-coming
researchers as they strive to develop technical talents of their
own.
Generalisations of
conditional Laplace approximations can be used to address most
sampling models under the sun, not just the wonderful range of
structured additive models considered by Rue, Martino, and Chopin.
Our journals will become filled with wondrous algebra once again,
and our computations will be extraordinarily accurate right down the
tails of our marginal posteriors.
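To show the basic ingredient at work, here is a minimal sketch of a Laplace approximation to a marginal likelihood, for a toy one-observation latent Gaussian model (a Poisson count with a normal prior on the log-rate). It illustrates the general technique only, not the nested INLA machinery itself: Newton’s method finds the posterior mode, and the Gaussian integral at the mode approximates the marginal likelihood, which a brute-force grid integration then confirms.

```python
import math

def laplace_marginal(y, tau=1.0, iters=50):
    """Laplace approximation to the marginal likelihood p(y) for the toy model
    y ~ Poisson(exp(theta)), theta ~ N(0, tau^2), where
    log p(theta, y) = y*theta - exp(theta) - log(y!)
                      - theta^2/(2*tau^2) - 0.5*log(2*pi*tau^2)."""
    def grad(t): return y - math.exp(t) - t / tau**2
    def hess(t): return -math.exp(t) - 1.0 / tau**2
    t = 0.0
    for _ in range(iters):                    # Newton steps to the posterior mode
        t -= grad(t) / hess(t)
    log_joint = (y * t - math.exp(t) - math.lgamma(y + 1)
                 - 0.5 * t**2 / tau**2 - 0.5 * math.log(2 * math.pi * tau**2))
    # Gaussian integral around the mode: sqrt(2*pi / -l''(mode))
    return math.exp(log_joint + 0.5 * math.log(2 * math.pi / -hess(t)))

def exact_marginal(y, tau=1.0, lo=-8.0, hi=8.0, n=4000):
    """Brute-force grid integration of the same joint density, for comparison."""
    h = (hi - lo) / n
    s = 0.0
    for i in range(n + 1):
        t = lo + i * h
        s += h * math.exp(y * t - math.exp(t) - math.lgamma(y + 1)
                          - 0.5 * t**2 / tau**2
                          - 0.5 * math.log(2 * math.pi * tau**2))
    return s

print(laplace_marginal(4), exact_marginal(4))  # close agreement
```

INLA applies nested versions of exactly this Gaussian-at-the-mode idea, field component by field component, which is why it is so fast and so accurate.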
It is as if the
Valhallaesque Gods Woden and Thor and the Goddess Freyja of
Fólkvangr have returned to guide us. Maybe the more ardent of our
never-ever-convergent simulators will decide to take the night bus
to Vulcan, the Ninth World, or wherever.








Sara Martino 

Håvard Rue 


In their 2009 JASA paper, David Henderson, Richard Boys, Kim
Krishnan, Conor Lawless and Darren Wilkinson described their
Bayesian emulation and calibration of a stochastic computer model of
mitochondrial DNA deletions in substantia nigra neurons.
Moreover, Athanasios
Micheas and Christopher Wikle investigated their hierarchical
non-overlapping random disc growth model, Pulak Ghosh, Sanjib Basu
and Ram Tiwari performed a Bayesian analysis of cancer rates using
parametric and semiparametric joinpoint regression models, Jeff
Gill and George Casella described their specification and estimation
of nonparametric priors for ordinal social science models, and
Susie Bayarri, Jim Berger and seven worthy coauthors predicted
vehicle worthiness by validating computer models for functional and
hierarchical data.
Not to forget
Christopher Paciorek and Jason McLachlan’s mapping of ancient
forests. These worthy coauthors developed Bayesian inferential
techniques for spatiotemporal trends in forest composition using
fossil pollen proxy record.
Beat that! JASA
excelled itself in 2009. However, never one to take the back seat,
our very own Bradley Efron described his empirical Bayes estimates
for large-scale prediction problems. Do you remember the proud days
of yore when you were shrinking group means towards zero and
inventing the bootstrap, Brad?
Still in 2009, Eva Riccomagno and Jim Smith reported their geometry
of causal probability trees which are algebraically constrained, in
the coedited volume Optimal Design and Related Areas in
Optimization and Statistics.
Gee whiz, Charlie Brown!
I never knew that Statistics could prove causality. I’m sure that
the eighteenth century philosopher David Hume (see my Ch.2) wouldn’t
have approved of causal probability trees unless they were called
something different.
And Fabio Rigat and Jim
Smith published their nonparametric dynamic time series approach,
which they applied to the detection of neural dynamics, in the
Annals of Applied Statistics, a wonderful contribution.
Professor Jim Q. Smith’s
highly innovative Bayesian research began at the University of
Warwick during the 1970s, and it is still continuing unabated. He
published his book Bayesian Decision Analysis with
Cambridge University Press in 2010, and has recently worked on
military training applications of his ‘decision making under
conflict’ procedures.
That’s an interesting
application of posterior expected loss, guys. I hope that the
soldiers benefit from maximising their expected utility. Of course,
if they’re dead then optimal long term expectations won’t help them.
Jim is currently the
holder of an EPSRC grant with Liz Dowler and Rosemary Collier to
investigate ways groups of experts can ensure coherence of their
judgements when managing food crises. That’s a tall order, Jim. Many
of our working population are currently starving and struggle to
remain both coherent and in the land of the living.
To cap Jim’s exploits,
M.H. Rahaman Khan and Ewart Shaw published a paper in the
International Journal of Interdisciplinary Social Sciences in
2009. In this paper they reported their hierarchical modeling
approach for investigating the determinants of contraceptive use in
Bangladesh.
That reminds me of the
time I got a rubber stuck in my ear during a chess tournament in
Oshkosh, Wisconsin, guys. All the medics just stood around and
laughed. But I won the tournament and became champion of NorthEast
Wisconsin for 1992.
Ewart Shaw is Principal
Teaching Fellow in Statistics at the University of Warwick. He is
also a wellpublished researcher, and his research interests include
Bayesian Inference, numerical methods, number theory, coding theory,
computer algebra in statistics, survival analysis, medical
statistics and splines.
Ewart’s a big
contributor and deserves more of the cherry pie.
Speaking of cherry pie,
Lyle Broemeling’s 2009 text Bayesian Methods for Measures of
Agreement draws on data taken from various studies at the
University of Texas MD Anderson Cancer Center. An admirable
enterprise.
And Mohammad Raqab and
Mohamed Madi coauthored an article in Metron in 2009
describing their Bayesian analysis for the exponentiated Rayleigh
distribution.
Madi is Professor of
Statistics and Associate Dean of Economics and Business at the
United Arab Emirates University, and a prolific Bayesian researcher.
Raqab is Professor of
Statistics at the University of Jordan in Amman.
In 2009, ISBA awarded the prestigious De Groot Prize to Carl Edward
Rasmussen and Christopher K.I. Williams for their book Gaussian
Processes for Machine Learning.
To cap that,
Sandy Zabell described his philosophy of inductive logic from a
Bayesian perspective in The Development of Modern Logic (ed. by
Leila Haaparanta for Oxford University Press). Sandy would’ve
doubtlessly got on well with Richard Price.
In the same year,
Sebastjan Strasek, Stefan Lang and numerous coauthors published a
paper in the Annals of Epidemiology with the
impressively long title,
Use of penalized splines
in extended Cox-type hazard regression to flexibly estimate the
effect of time-varying serum uric acid on risk of cancer incidence:
a prospective population study in 78,850 men.
Thank you, gentlemen. I’ll
keep taking my allopurinol.




Stefan
Lang 

Stefan Lang is a
University Professor of Statistics in Innsbruck. His research
interests include Bayesian semiparametric regression, and
applications in marketing science, development economics and
insurance mathematics.
Rob Kass wrote an excellent analysis of Sir Harold Jeffreys’ legacy
in Statistical Science in 2009, largely by reference to
Jeffreys’ Theory of Probability. Sir Harold often
approximated the posterior distribution by a normal distribution
centred on the maximum likelihood estimate, and he was also a great
fan of Bayes factors. So he weren’t perfect.
Gene Hwang, Jing Qiu and
Zhigen Zhao of the Universities of Cornell and Missouri reported
their empirical Bayes confidence intervals in 2009 in JRSSB.
Their estimates and intervals smooth and shrink both the means and
the variances in the heterogeneous one-way ANOVA model.
Quite remarkably, the
authors make exactly the same exchangeable prior assumptions for the
treatment means and logvariances that I proposed in my 1973 Ph.D.
thesis and in my 1975 paper in Technometrics. However, the
authors derive some elegant, but approximate, double-shrinkage
confidence intervals, empirically estimate the hyperparameters in
quite appealing fashion, and algebraically derive some outstanding
frequency coverage probabilities. Perhaps I should put their
solution into another Appendix. I only wish that I’d had the nous to
derive these very useful results myself.
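A toy sketch of the double-shrinkage idea (crude method-of-moments weights and an assumed shrinkage constant for the variances, rather than the authors’ empirical-Bayes intervals or my 1975 formulation) might look like this:

```python
import math

def double_shrinkage(groups):
    """Toy 'double shrinkage' for a one-way layout: shrink the group means
    towards the grand mean, and the group log-variances towards their
    average.  Method-of-moments weights; the variance shrinkage weight b
    is an assumed constant for illustration (needs >= 2 groups, each
    with >= 2 observations)."""
    means = [sum(g) / len(g) for g in groups]
    sizes = [len(g) for g in groups]
    s2 = [sum((x - m) ** 2 for x in g) / (len(g) - 1)
          for g, m in zip(groups, means)]
    # Shrink the log sample variances towards their average.
    logv = [math.log(v) for v in s2]
    mean_logv = sum(logv) / len(logv)
    b = 0.5                         # assumed shrinkage weight for the variances
    s2_shrunk = [math.exp(mean_logv + (1 - b) * (lv - mean_logv))
                 for lv in logv]
    # Shrink the group means towards the grand mean, with weights built from
    # the between-group variance and the shrunken within-group variances.
    grand = sum(means) / len(means)
    tau2 = max(sum((m - grand) ** 2 for m in means) / (len(means) - 1), 1e-9)
    w = [tau2 / (tau2 + v / k) for v, k in zip(s2_shrunk, sizes)]
    return ([grand + wi * (m - grand) for wi, m in zip(w, means)], s2_shrunk)

groups = [[4.1, 5.2, 3.9], [7.0, 6.4, 7.7], [5.5, 5.0, 6.1]]
shrunk_means, shrunk_vars = double_shrinkage(groups)
print(shrunk_means)
print(shrunk_vars)
```

Each shrunken mean sits between its raw group mean and the grand mean; Hwang, Qiu and Zhao go much further and supply intervals with provable frequency coverage.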
BAYESIAN GENOMICS: In 2009, the much-respected high-flying
professors Matthew Stephens and David Balding of the University of
Chicago and Imperial College London published their paper, ‘Bayesian
statistical methods for genetic association studies’ in Nature
Reviews Genetics.
The authors write,


Bayesian statistical
methods have recently made great inroads into many areas of science,
and this is now extending to the assessment of association between
genetic variants and disease or other phenotypes. We review these
methods, focussing on single-SNP tests in genome-wide association
studies. We discuss the advantages of the Bayesian approach over
classical (frequentist) approaches in this setting and provide
tutorials in basic analysis steps, including practical guidelines
for appropriate prior specification. We demonstrate the use of
Bayesian methods for fine mapping in candidate regions, discuss
meta-analysis and provide guidance for refereeing manuscripts that
contain Bayesian analysis.
The approach subsequently reported by the authors depends almost
entirely on the sorts of Bayes factors which I have critiqued during
the course of this concise history, and Matthew and David do make
some attempts to address the paradoxes, prior sensitivity problems,
and poor frequency properties that are usually associated with these
‘measures of evidence’. However, Matthew and David could circumvent
these difficulties by associating a Baskurt-Evans-style Bayesian
p-value with each of their Bayes factors.
Why don’t you try
perturbing your conditional prior distributions under your composite
alternative hypotheses with a tiny blob of probability way out in
the right tail, guys? I think that your Bayes factors, however
finetuned, would go bananas. Keep the mass of the blob of
probability constant and zoom it right off towards infinity. I think
that the tails of the ‘outlier-prone’ mixture priors you select will
still be too thin to adequately accommodate this.
[Professor Balding kindly responded to these comments in
early January 2014 by partly agreeing with them. He feels that he
and his coauthor should have focussed more on an estimation, rather
than a Bayes factor, approach. However, Professor Stephens has
advised us that their Bayes factors would remain stable if we let a
blob of probability in the right tail zoom off to infinity. This
surprises me, since their conditional prior distributions under the
alternative hypothesis refer to mixtures of thin-tailed normal
distributions. However, the Bayes factors will anyway be highly
dependent upon the choices of these conditional prior distributions.
David advises me that Bayes factors were first proposed in this
context by Peter Donnelly and his coauthors in 2007 in their
landmark paper in Nature.]
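That dependence on the conditional prior is easy to exhibit in a stripped-down setting. The sketch below uses a single normal observation with a N(0, τ²) alternative, an illustrative stand-in rather than Stephens and Balding’s actual genetic specification; it displays the familiar Bartlett-Lindley effect, whereby simply widening the conditional prior drives the Bayes factor towards the null, however far the data sit from it.

```python
import math

def norm_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bf01(y, tau):
    """Bayes factor for H0: theta = 0 against H1: theta ~ N(0, tau^2),
    given one observation y ~ N(theta, 1).  Under H1 the marginal
    density of y is N(0, 1 + tau^2)."""
    return norm_pdf(y, 0.0, 1.0) / norm_pdf(y, 0.0, 1.0 + tau ** 2)

y = 2.0                                  # about two standard errors from the null
for tau in (1.0, 10.0, 100.0):
    print(tau, round(bf01(y, tau), 3))   # support for H0 grows with tau
```

The data never change, yet the reported evidence swings by more than an order of magnitude as the prior scale is varied, which is precisely the sensitivity at issue.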
The derivations proposed by
Stephens and Balding do depend heavily on specific genetic
assumptions. Perhaps we should look for something entirely different,
like a genetic-assumption-free direct data analysis of the
statistical observations provided, which could then be parametrized
by a mainstream statistical sampling model.
During 2009, Byron Morgan and his several coauthors published their
book Bayesian Analysis for Population Ecology with CRC
Press.




Byron
Morgan 

Byron Morgan is
Professor of Applied Statistics at the University of Kent, and his
research interests include Bayesian methods and population dynamics,
stochastic models for molecular biology, and statistical ecology.
In their 2010 article in the Annals of Applied Statistics,
Xia Wang and Dipak Dey applied their Bayesian generalised extreme
value regression methodology for binary response data to an
application to electronic payments system adoption. Xia is an
assistant professor at the University of Cincinnati, with an
interest in applying her Bayesian ideas to genomics and proteomics
data, and spatial and spatiotemporal statistics. She is clearly a
very bright young researcher. In their article in the JSM 2011
Proceedings, she and Nell Sedransk reported their analysis of
some Bayesian models on biomarker discovery using spectral count
data in the labelfree environment.




Xia
Wang 

The eminent Indian
statistician Dipak Dey has coauthored several papers with Xia Wang.
He is a Distinguished Professor of Statistics at the University of
Connecticut, and his many influential Bayesian and decision
theoretic publications include his 1998 book Practical
nonparametric and semiparametric Bayesian statistics.
Xia is in good company.
Her buddy Nell Sedransk is the Associate Director of the U.S.
National Institute of Statistical Sciences, and Professor of
Statistics at North Carolina State University.
During the late 1970s,
Nell and her husband Joe were two of the last faculty members to
leave the once celebrated Department of Statistics at SUNY at
Buffalo, which did not resurrect itself until several years
afterwards. Nell and Joe were always very dynamic, and Nell has made
many Bayesian contributions. Her application areas of interest
include physiology, medicine, multiobserver scoring in the social
sciences, and ethical designs for clinical trials.




Nell
Sedransk 

Joe Sedransk is
currently Professor of Statistics at Case Western Reserve University
in Cleveland. He has also published a number of important Bayesian
papers. I met Nell and Joe when I gave a seminar at SUNY at Albany
in 1978. They were very kind and hospitable and took me for a drive
in Vermont. I ate a massive Tbone steak for lunch, and felt guilty
afterwards because it cost so much.
In their 2010 articles in JASA, J. McLean Sloughter, Tilmann
Gneiting and Adrian Raftery used Bayesian model averaging and
ensembles for probabilistic wind forecasting, and Lu Ren, David
Dunson, Scott Lindroth and Lawrence Carin attained the fabled
heights by analysing music using dynamic semiparametric Bayesian
models. Scott Lindroth is Professor of Music and ViceProvost for
the Arts at Duke University.
Moreover, Kwang Woo Ahn,
Kung-Sik Chan and my old friend Michael Kosoy addressed a problem in
pathogen diversity using their Bayesian inferences for incomplete
multinomial data, and Soma Dhavala and six determined coauthors
performed a gene expression analysis of their bovine Salmonella data
by reference to their Bayesian modeling of MPSS data.
Not to be outdone,
Jonathan Stroud and four quite predictable coauthors developed an
ensemble Kalman filter and smoother for assimilating satellite data,
and Morgan C. Wang, Mike Daniels, Daniel Scharfstein and Susan Land
proposed a Bayesian shrinkage model for incomplete longitudinal data
and applied it to a breast cancer prevention trial.
Also in 2010, Teddy Seidenfeld, Mark Schervish and Jay Kadane
reported their coherent choice functions under uncertainty in Synthese,
and Jay Kadane published ‘Amalgamating Bayesian experts: a sceptical
view’ in Rethinking Risk Measurement and Reporting, Vol. 1.
All deep blue high Bayesian stuff.
Good on you, Jay! I’m
sceptical too.
In the same year, Tom
Fearn et al. reported their inverse, classical, and nonparametric
calibrations in a Bayesian framework, in the context of infrared
spectroscopy, in the Journal of Near Infrared Spectroscopy.
Tom is Head of
Statistical Science at UCL. He has worked in many application areas,
including food and agriculture, analytic chemistry, and medicine.
His 1975 Biometrika paper ‘A Bayesian Approach to Growth
Curves’ came out of his UCL Ph.D. thesis. His Ph.D. supervisor was
Dennis Lindley, and his department is still as active as ever,
though in less charismatic ways.
Manuel Wiesenfarth and
Thomas Kneib attempted in 2010 to use Bayesian geoadditive selection
models in JRSSC to correct for nonrandomly selected data in
a twoway hierarchy, a most ambitious task worthy of Hercules
himself.
The authors’ selection
equation was formulated as a twoway binary probit model, and the
correlations between the response variables were induced by a latent
Gaussian model representation. Uniform priors were assumed for the
parametric effects, Bayesian P-splines were used to model the
nonparametric effects, a Markov random field was used to model the
spatial effects, and further prior assumptions were made to
facilitate a full hierarchical Bayesian analysis. The MCMC computed
posterior inferences led to a very interesting analysis of a set of
relief supply data from communities in Pakistan which were affected
by the 2005 earthquake in Azad Jammu and Kashmir. A wonderful
piece of work.
In the same year, Qi
Long, Rod Little, and Xihong Lin addressed, in their paper in
Applied Statistics, the conceptually formidable task of
estimating ‘causal’ effects in trials involving multitreatment
arms. They applied their theoretically intense Bayesian procedure to
the analysis of the ‘Women Take Pride’ cardiac bother data, and
reported the posterior mean and standard deviation of a variety of
‘causal’ effects, together with the corresponding 95% Bayesian
credible intervals. A difficult article to unravel, but I am sure
that it was all good stuff.
Christophe Andrieu,
Arnaud Doucet and Roman Holenstein read a long invited discussion
paper to the Royal Statistical Society in 2009, and it was published
the following year in Series B of the Society’s journal, at
which time the new terminology ‘particle MCMC’ was thrust like a ray
of beaming sunlight into the Bayesian literature. Particle MCMC can
be used to deal with high dimensionality and complex patterns of
dependence in statistical models. Whether PMCMC works well in
practice, only time will tell. Maybe we need a special journal for
purely computational articles like this.
In their 2010 article
‘Perceiving is believing: A Bayesian approach to explaining the
positive symptoms of schizophrenia’ in Nature Reviews
Neuroscience, Paul Fletcher and Chris Frith of the Universities
of Cambridge and Aarhus write:
Advances in cognitive neuroscience offer us new ways to understand
the symptoms of mental illness by uniting basic neurochemical and
neurophysiological observations with the conscious experiences that
characterize these symptoms. Cognitive theories about the positive
symptoms of schizophrenia (hallucinations and delusions) have
tended to treat perception and belief formation as distinct
processes. However, recent advances in computational neuroscience
have led us to consider the unusual perceptual experiences of
patients and their sometimes bizarre beliefs as part of the same
core abnormality: a disturbance in error-dependent updating of
inferences and beliefs about the world. We suggest that it is
possible to understand these symptoms in terms of a disturbed
hierarchical Bayesian framework, without recourse to separate
considerations of experience and belief.
Thank you, gentlemen.
Perhaps you should consider calcification of the pineal gland as a
confounding variable.
Sonia Petrone and Piero
Veronese coauthored ‘Feller operators and mixture priors in
Bayesian nonparametrics’ in 2010 in Statistica Sinica.
Sonia is Full Professor of
Statistics at Bocconi University. She will be the next
president of ISBA in 2014. Good luck, Sonia!




Sonia
Petrone (ISBA President 2014) 

In 2010, Murray Aitkin
published his path-breaking book Statistical Inference: An
Integrated Bayesian/Likelihood Approach. Rather than
using Bayes factors or DIC, Murray refers to likelihood ratios as
the primary measure of evidence for statistical model parameters and
for the models themselves. He then uses Bayesian noninformative
inferences to interpret the likelihood ratios.
For further discussion,
see Author’s Notes (below). I have always admired Murray
Aitkin’s ingeniously pragmatic approach to Bayesian inference.
Alicia Carriquiry and three
courageous coauthors reported their Bayesian assessment of the
effect of highway bypasses on crashes and crash rates in 2011 in the
Journal of Safety Research.
Alicia is Professor of
Statistics and Associate Provost at Iowa State. She served as
president of ISBA in 2001.
In 2011, Jay Kadane was
awarded the much-sought-after De Groot prize by ISBA for his
eloquently written book Principles of Uncertainty.
Daniel Thorburn
published his conference paper ‘Bayesian probability and methods’ in
the Proceedings of the first AfricaSweden Conference in
Mathematics in the same year.
Thorburn is Professor of
Statistics at the University of Stockholm, and he has, with Håvard
Rue, done much to foster the Bayesian paradigm in Scandinavia.
Rob Kass published an
invited discussion paper in Statistical Science in 2011
entitled ‘Statistical Inference: The Big Picture’.
Kass wrote to the effect
that,
Statistics has moved
beyond the frequentist-Bayesian controversy of the past. Instead, a
philosophy compatible with statistical practice, which I refer to
here as ‘statistical pragmatism’, serves as a foundation for
inference. Statistical pragmatism is inclusive and emphasises the
assumptions that connect statistical models with observed data.
In his diagram of
the ‘big picture’, Rob connects the data of the real world with the
interactions between the scientific models and statistical models of
the theoretical world, and indicates that the data and the
interaction between the scientific and statistical models together
imply the statistical and scientific conclusions.
That’s all too true,
Professor Kass. You’re teaching some of us Grannies how to suck eggs.
But don’t forget to emphasise that the statistician should be
interacting with the scientific expert as often as he can during
this process.
In the same year, William
Kleiber, Adrian Raftery and Tilmann Gneiting co-authored a paper in
JASA entitled ‘Geostatistical model averaging for locally
calibrated probabilistic quantitative precipitation forecasting’. I
am sure that they took Rob Kass’s notion of statistical pragmatism
to heart.
In 2011, Jennifer Hill published her paper
Bayesian Nonparametric Modeling for Causal Inference in the
Journal of Computational and Graphical Statistics.
Jennifer's very insightful second line of work pursues
strategies for exploring the impact of violations of the typical
assumption that all possibly confounding variables
have been measured. She is a very dynamic Associate Professor of
Applied Statistics at NYU.




Jennifer Hill 

Brian Reich, Montserrat
Fuentes and David Dunson proposed some brand new methods for
Bayesian spatial quantile regression in their outstanding 2011 lead
article in JASA.
As they were a touch
keener on money supply, Alejandro Cruz-Marcelo, Katherine Ensor and
Gary Rosner investigated corporate bonds by using a semiparametric
hierarchical model to estimate term structure.
David Dunson retaliated
by investigating nonparametric Bayes stochastically ordered latent
class models with Hongxia Yang and Sean O’Brien.
In a rare, high-quality,
single-authored paper, Yihua Zhao analysed high-throughput assays by
reference to posterior probabilities and expected rates of discovery
in multiple hypothesis testing.
And Shane Jensen and
Stephen Shore rolled out the barrel by using semiparametric
Bayesian modeling to investigate volatility heterogeneity.
Other highlights of 2011
included the tutorial Bayesian Nonparametrics by Peter
Orbanz of Columbia University and Yee Whye Teh of Oxford University,
which was taught in the context of Machine Learning;
a novel
data-augmentation approach in JASA by J. Ghosh and Merlise
Clyde employing Rao-Blackwellization for variable selection and
model averaging in linear and binary regression;
a paper in Demography
by Adrian Raftery and six co-authors entitled ‘Probabilistic
Projections of the Total Fertility Rate for All Countries’;
an article in the
Proceedings of Maximum Entropy by Jonathan Botts and Ning Xiang
on Bayesian inference for acoustic impedance;
and a splendid paper in
JASA by Qing Zhou on multi-domain sampling with application to
the structural inference of Bayesian networks. 


George Casella (1951–2012), a leading figure in the field of
Statistics, passed away in June 2012 after a nine-year battle with
multiple myeloma. He was 61.
According
to Ed George and Christian Robert,
George Casella’s influence on research and education in
Statistics was broad and profound. He published over 200 articles,
co-authored nine books, and mentored 48 MS and Ph.D. students. His
publications included high impact contributions to Bayesian
analysis, clustering, confidence estimation, empirical Bayes,
frequentist decision theory, hypothesis testing, model selection,
Monte Carlo methods, and ridge regression. Of his books, Statistical
Inference (with Roger Berger) became the introduction of choice to
mathematical statistics for vast numbers of graduate students; this
is certainly the book that had the most impact on the community at
large.
In 1996, George joined a legendary figure of Statistics, Erich
Lehmann, to write a thorough revision of the already classical
Theory of Point Estimation which Lehmann had written himself in
1983. This collaboration resulted in a more modern, broader, and
more profound book that continues to be a key reference for courses
in mathematical statistics.
An ISI highly cited researcher, George Casella was elected a
Foreign Member of the Spanish Royal Academy of Sciences, selected as
a Medallion lecturer for the IMS, and received the Distinguished
Alumnus Award from Purdue University. His laughter remains with us.
Forever.
In 2012, David Rios Insua, Fabrizio Ruggeri and Michael Wiper
published their splendid volume on Bayesian Analysis of
Stochastic Process Models. This follows Rios Insua’s and
Ruggeri’s publication in 2000 of their co-edited collection of
lecture notes, Robust Bayesian Analysis.
Michael Wiper is on the faculty of the Carlos III
University of Madrid. His fields of interest include Bayesian
Statistics, inference for stochastic processes, and software
reliability.
Fabrizio Ruggeri is a research director at the Italian National
Research Council’s Institute of Mathematical Applications and
Information Technology in Milan. He was an outstanding president of
ISBA in 2012, and his areas of Bayesian application include
healthcare, quality and reliability.
David Rios Insua is a full professor at Rey Juan Carlos University
in Madrid. He is the son and disciple of Sixto Rios (1913–2008), the
father of Spanish Statistics. Sixto founded the Decision Analysis
group of the Spanish Royal Academy of Sciences during the 1960s, and
David is the youngest Fellow of the same illustrious academy. He has
applied his Bayesian methodology to neuronal networks, adversarial
risk analysis, counterterrorism, and many other areas.




David
Rios Insua 

Jesús Palomo, who was
one of Rios Insua’s and Ruggeri’s Ph.D. students, was a cited
finalist for ISBA’s Savage Prize in 2004. His thesis title was
Bayesian methods in bidding processes.
In his JASA paper of
2012, Tristan Zajonc reported his Bayesian inferences for dynamic
treatment regimes, and used them to improve mobility, equity and
efficiency in student tracking.
In the same issue,
Alexandre Rodrigues and Peter Diggle used Bayesian estimation and
prediction for low-rank doubly stochastic log-Gaussian Poisson
process models, with fascinating applications in criminal
surveillance.
If that wasn’t
enough to put me into a neurosis, Lane Burgette and Jerome Reiter
described some nonparametric Bayesian imputation techniques for when
data are missing due to the mid-study switching of measurement
methods, and, to cap that, the ubiquitous Valen Johnson and David
Rossell investigated Bayesian model selection criteria in
high-dimensional settings.
In their 2012 JASA
discussion paper, the remarkable Bayesian quintet consisting of
Ioanna Manolopoulou, Melanie Matheu, Michael Cahalan, Mike West and
Thomas Kepler employed Bayesian spatio-dynamic modeling in cell
motility studies, in reference to nonlinear taxic fields guiding the
immune response.
Meanwhile, the
hyperactive quartet of Laura Hatfield, Mark Boye, Michelle Hackshaw
and Bradley Carlin used multilevel models to predict survival times
and longitudinal patient-reported outcomes with many zeros.
And in the lead article
of the 500th issue of the Journal of the American
Statistical Association, the Famous Five, namely William Astle,
Maria De Iorio, Sylvia Richardson, David Stephens and Timothy Ebbels,
investigated their Bayesian model of NMR spectra for the
deconvolution and quantification of metabolites in complex
biological mixtures.
Also in the celebrated
500th issue, Michelle Danaher, Anindya Roy, Zhen Chen, Sunni
Mumford and Enrique Schisterman analysed the BioCycle study using
Minkowski–Weyl priors for models with parameter constraints. A
vintage year for the connoisseurs!
During August 2012, Tony
O’Hagan led an interdisciplinary ISBA online debate entitled Higgs
Boson: Digest and Discussion about the statistics relating to the
Higgs Boson and the Large Hadron Collider, which concerned the
physicists’ predetermined standards for concluding that a particle
resembles the elusive boson. The physicists require a test statistic
to be at least five standard errors from a null hypothesis, but
didn’t understand how to interpret this, e.g. in relation to practical
significance when the confidence interval is very narrow. Some of
the participants in the debate ignored the possibility that the test
statistic might not be approximately normally distributed. While the
physicists take their observed counts to be Poisson distributed,
this assumption could itself be inaccurate, since the counts could be
influenced by overdispersion. However, a number of imaginative
solutions were proposed by the Bayesian participants.
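The five-standard-error criterion can be translated into a tail probability, under the very normality assumption that some debate participants questioned. A minimal sketch (the overdispersion factor below is a hypothetical illustration, not a figure from the debate):

```python
import math

# One-sided tail probability of a five-standard-error departure from the
# null hypothesis, assuming the test statistic is approximately normal.
p_five_sigma = 0.5 * math.erfc(5 / math.sqrt(2))
print(f"p at five sigma: {p_five_sigma:.2e}")  # about 2.9e-07

# If the Poisson counts are overdispersed, with variance inflated by a
# hypothetical factor phi, the same observed excess is worth fewer
# effective sigmas, so the evidence is weaker than the nominal claim.
phi = 1.5
effective_sigmas = 5 / math.sqrt(phi)
p_overdispersed = 0.5 * math.erfc(effective_sigmas / math.sqrt(2))
print(f"p with overdispersion: {p_overdispersed:.2e}")
```

The point of the sketch is that overdispersion always inflates the tail probability, which is one reason the Bayesian participants were wary of the headline five-sigma figure.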




Simulated particle traces from an LHC collision in which a Higgs
Boson is produced.
Image Credit: Lucas Taylor 

Other highlights of the year
2012 included the following papers in the Royal Statistical
Society’s journals:
Species non-exchangeability
in probabilistic ecotoxicological risk assessment, by Peter Craig,
Graeme Hickey, Robert Luttik and Andy Hart;
Space–time modelling of
coupled spatio-temporal environmental variables, by Luigi Ippoliti,
Pasquale Valentini and Dani Gamerman;
Bayesian L-optimal exact
design for experiments with biological kinetic models, by Steven
Gilmour and Luzia Trinca;
Combining outputs from the
North American Regional Climate Change Assessment Program by using a
Bayesian hierarchical model, by Emily Kang, Noel Cressie and Stephan
Sain;
and Variable selection for high-dimensional
Bayesian density estimation, with application to human exposure
simulation, by Brian Reich and his four worthy but under-exposed
co-authors.
Thomas Bayes was the real
winner in the US presidential election on 6th November 2012,
according to a message to ISBA members from Charles Hogg. In 2010,
Charles had published a magnificent discussion paper in Bayesian
Analysis with Jay Kadane, Jong Soo Lee and Sara Majetich
concerning their inferential Bayesian error analysis for small angle
neutron scattering data sets.
As reported by Charles
Hogg, Nate Silver constructed a Bayesian model in 2008 to forecast
the US general election results. Silver won fame for correctly
predicting 49 of the 50 States, as well as every Senate race. That
brought him a New York Times column and a much higher profile.
In 2012, Nate’s
continuing predictions that Obama would win earned him a curious
backlash among pundits. Few of the criticisms had any merit;
most were mathematically illiterate, indignantly mocking the idea
that the race was anything other than a toss-up. Nevertheless, Nate
confounded his critics by correctly predicting every single state.
Charles Hogg was
quick to advise us that Nate did strike lucky in Florida. Nate
‘called’ this state with a 50.3% Bayesian probability, essentially
the proverbial coin-toss, a lucky gold coin perhaps. Way to go, Mr.
Silver! Did you use a Jeffreys prior or a conjugate one?
In his 2013 Kindle book
The Signal and the Noise, which is about Bayesian prediction
in general, Nate Silver assigned a prior probability of 1/20,000 to
the event that at least one plane is intentionally crashed into a
Manhattan skyscraper on a given day. He then used Bayes’ theorem to
update his prior probability to a probability of 0.385 that one
plane crash is part of a terrorist attack, and then to a probability
of 99.99% that two plane crashes amount to a terrorist attack.
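Silver’s two-stage update is a straightforward application of Bayes’ theorem. The likelihoods below are his published figures: an accidental-crash rate of 0.008% per day, and a crash taken as certain if an attack is actually under way.

```python
def bayes_update(prior, p_data_given_h, p_data_given_not_h):
    """One application of Bayes' theorem to the hypothesis h."""
    numerator = prior * p_data_given_h
    return numerator / (numerator + (1 - prior) * p_data_given_not_h)

prior = 1 / 20000               # Silver's prior for an attack on a given day
p_crash_if_attack = 1.0         # a crash is certain if an attack is under way
p_crash_if_no_attack = 0.00008  # Silver's accidental-crash rate

after_first = bayes_update(prior, p_crash_if_attack, p_crash_if_no_attack)
after_second = bayes_update(after_first, p_crash_if_attack, p_crash_if_no_attack)
print(round(after_first, 3), round(after_second, 4))  # 0.385 0.9999
```

Note that the second crash reuses the first posterior as the new prior, which is exactly how the 99.99% figure arises.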




Nate
Silver 

Maybe Nate should rename
his book The Sound and the Fury, though a guy called Bill
Faulkner once used a title like that. Calling it The Power and
the Glory would, nowadays, be much too naff.
In July
2013, Nate left the New York Times for a role at ESPN and ABC news.
Perhaps he will provide us with a Bayesian commentary on the next
Superbowl game. I have a prior probability of 0.2179 that the Green
Bay Packers will win, but a sneaking suspicion that the Cowboys will
take it.
In the June 2013 issue of
Scientific American, Hans Christian von Baeyer debates whether
Quantum Bayesianism can fix the paradoxes of Quantum Mechanics. The
author writes:
A new version of quantum
theory sweeps away the bizarre paradoxes of the microscopic world.
The cost? Quantum
information only exists in your imagination.
That’s a great application
of the Bayesian paradigm, Mr. von Baeyer, or at least I imagine so.
The 2013 volume Bayesian Theory and Applications, edited by
Paul Damien, Petros Dellaportas, Nicholas Polson and David Stephens,
contains 33 exciting papers and includes outstanding sections on
dynamic models, and exchangeability.
Andrew Lawson’s 2013
book Bayesian Disease Mapping: hierarchical modeling in spatial
epidemiology highlighted some of the recent applications of the
2008 Rue–Martino computer package [71] for conditional Laplacian
methodology.
Various applications of
INLA to Bayesian nonparametric phylodynamics are described by Julia
Palacios and Vladimir Minin in the Proceedings of the 28th
Conference on Uncertainty in Artificial Intelligence.
In the 2013 issue of
Biometrics & Biostatistics, Xiao-Feng Wang of the Cleveland
Clinic Lerner Research Institute describes some applications of INLA
to Bayesian nonparametric regression and density estimation.
INLA has arrived, and the
pendulum is beginning to swing! Maybe right back to the halcyon days
of the mathematically Bayesian theoretical research era of the 1970s
and 1980s, no less, when Bayesians needed to know all sorts of stuff.
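At the heart of INLA's Laplacian methodology is the classical Laplace approximation: an awkward posterior integral is replaced by a Gaussian matched to the mode and curvature of the log-posterior. A minimal sketch of the idea on a toy Gamma posterior (the numbers are illustrative, not taken from the cited package):

```python
import math

# Toy posterior kernel for a Poisson rate: lambda^(a-1) * exp(-b*lambda),
# whose exact normalising constant is Gamma(a) / b**a.
a, b = 20.0, 4.0

def log_post(lam):
    return (a - 1) * math.log(lam) - b * lam

# Laplace approximation: evaluate the kernel at its mode, with variance
# set by the curvature (negative second derivative of log_post) there.
mode = (a - 1) / b
curvature = (a - 1) / mode ** 2
laplace = math.exp(log_post(mode)) * math.sqrt(2 * math.pi / curvature)

exact = math.gamma(a) / b ** a
rel_error = abs(laplace - exact) / exact
print(f"relative error: {rel_error:.4f}")  # under half a percent
```

INLA nests such approximations over the latent field and then integrates numerically over the hyperparameters, which is why it can rival MCMC at a fraction of the cost for latent Gaussian models.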
In his 2013 YouTube video What are Bayesian Methods?
Professor Simon French said that after he took his first statistics
course in 1970 he thought that ANOVA was a Russian mathematician.
However, his first course in Bayesian Statistics changed his life,
and he subsequently learnt much more about the paradigm from Adrian
Smith.
Simon French is Director
of the Risk Initiative and Statistical Consultancy Unit at the
University of Warwick. His 1986 text Decision Theory was
followed by his book Statistical Decision Theory with David
Rios Insua. However, Simon’s work has now become generally more
applied. He is looking at ways of supporting reallife decision
makers facing major strategic and risk issues.
In their JASA 2013 articles, Roee Gutman, Christopher
Afendulis and Alan Zaslavsky proposed a Bayesian file-linking
procedure for analysing end-of-life costs, and Man-Wai Ho, Wanzhu Tu,
Pulak Ghosh and Ram Tiwari performed a nested Dirichlet process
analysis of cluster randomized trial data for geriatric care
assessment;
Riten Mitra, Peter
Müller, Shoudan Liang, Lu Yue and Yuan Ji investigated a Bayesian
graphical model for ChIP-Seq data on histone modifications, Drew
Linzer recounted the dynamic Bayesian forecasting of U.S.
presidential elections, and Curtis Storlie and his five reliable
co-authors reported their Bayesian reliability analysis of
neutron-induced errors in high-performance computing software;
Yueqing Wang, Xin
Jiang, Bin Yu and Ming Jiang threw aside their parasols and reported
their hierarchical approach for aerosol retrieval using MISR data,
Josue Martinez, Kirsten Bohn, Ray Carroll and Jeffrey Morris put an
end to their idle chatter and described their study of Mexican
free-tailed bat chirp syllables, which employed Bayesian functional
mixed models for nonstationary acoustic time series, while Juhee
Lee, Peter Müller and Yuan Ji chummed up and described their
nonparametric Bayesian model for local clustering with application
to proteomics;
Jonathan Rougier,
Michael Goldstein and Leanna House exchanged vows and reported their
second-order exchangeability analysis for multi-model ensembles, and
Francesco Stingo, Michele Guindani, Marina Vannucci and Vince
Calhoun presented the best possible image while describing their
integrative modeling approach to imaging genetics. 


In March 2013, the
entire statistical world mourned the death of George Edward Pelham
Box F.R.S. (1919–2013), who died in Madison, Wisconsin at the
ripe old age of ninety-three, in the arms of his third wife Claire.
As a wit, a kind man, and a statistician, he and his ‘Mr.
Regression’ buddy Norman Draper fostered many successful
careers in American industry and academia. Pel was a son-in-law of
Sir Ronald Fisher and the father, with his second wife Joan Fisher
Box, of two of Fisher’s grandchildren. Pel’s life, works and
immortality should be celebrated by all Bayesians, because of the
multitudinous ways in which his virtues have enhanced the diversity,
richness and scientific reputation of our paradigm.
Born in Gravesend, Box
was the co-inventor of the Box–Cox and Box–Muller transformations
and the Box–Pierce and Ljung–Box tests, a one-time army sergeant who
worked on the Second World War defences against Nazi poison gases,
and an erstwhile scientist with Imperial Chemical Industries. His
much-publicised first divorce transformed his career, and his older
son Simon Box also survives him.




Soren
Bisgaard making a presentation to George Box 

I envision Pel sitting
there watching us from the twelfth floor of the crimson cube of
Heaven, kicking his dog while the Archangels Stephen One and Stephen
Two flutter to his bidding, and with a fine-tuned secromobile
connection to the, as ever obliging, Chairman of Statistics in
hometown Madison. Even in his retirement, and maybe in death, George
Box was the quintessential ripple from above.
On 27th August 2013, Alan Gelfand of Duke University
received a Distinguished Achievement Medal from the ASA Section on
Statistics and the Environment at the Joint Statistical Meetings in
Montreal. The award recognizes his seminal work and leadership in
Bayesian spatial statistics, in particular hierarchical modeling
with applications in the natural and environmental sciences. Well
done, Alan! You’re not just an MCMC whiz kid.
I once nodded off during
one of Alan’s seminars and missed out on the beef. When I woke up, I
asked an utterly inane question. Ah well. You win some and you lose
some.
The other
highlights of the year 2013 included the article in Applied
Statistics by David Lunn, Jessica Barrett, Michael Sweeting and
Simon Thompson of the MRC Biostatistics Unit in Cambridge on fully
Bayesian hierarchical modelling with applications to meta-analysis.
The authors applied their multi-stage generalised linear models to
data sets concerning pre-eclampsia and abdominal aortic aneurysm.
Also, a paper in
Statistics in Society by Susan Paddock and Terrance Savitsky of
the RAND Corporation, Santa Monica on the Bayesian hierarchical
semiparametric modelling of longitudinal post-treatment outcomes
from open enrolment therapy groups. The authors apply their
methodology to a case study which compares the post-treatment
depressive symptom scores for patients on a building discovery
program with the scores for usual care clients.
After reading the
large number of Bayesian papers published in the March 2013 issue of
JASA, in particular in the Applications and Case Studies
section, Steve Scott declared, on Google,
It clearly goes to show
that Bayes has gone a long way from the intellectually appealing but
too-hard-to-implement approach to the approach that many
practitioners now feel to be both natural and easy to use.
A RESPONSE
TO SOME OF THE ISSUES RAISED IN DENNIS LINDLEY'S
YOUTUBE INTERVIEW:


During the Fall of 2013, I participated in an online ISBA debate
concerning some of the issues raised in Dennis Lindley’s 2013 YouTube
interview by Tony O’Hagan, who phrased some of his rather probing
questions quite craftily. I was somewhat concerned when Dennis, now
in his 90th year, confirmed his die-hard opposition to
improper priors, criticised his own student José Bernardo, and
confirmed that he’d encouraged Florence David to leave UCL in 1967.
He, moreover, advised Tony that he’d wanted two colleagues at UCL to
retire early because they weren’t ‘sufficiently Bayesian’. This led
me to wonder how many other statistical (e.g. applied Bayesian)
careers Dennis had negatively influenced due to his individualistic
opinions about his subject. And whatever did happen to Dennis’s
fabled velvet glove?
I was also concerned
about Dennis’s apparent long-surviving naivety in relation to the
Axioms of Coherence and Expected Utility. He was still maintaining
that only very simple sets of axioms are needed to imply their
required objective, i.e. that you should be ‘coherent’ and behave
like a Bayesian, and I found it difficult to decide whether he was
‘pulling the wool’, or unfortunately misguided, or, for some bizarre
reason, perfectly correct. I was also more than a little concerned
about Dennis’s rather bland adherence to the Likelihood Principle
(LP) despite all the ongoing controversy regarding Birnbaum’s 1962
justification.
I wonder what the
unassuming Uruguayan mathematician Cesareo Villegas, who refuted the
De Finetti axiom system as long ago as 1964, would have thought
about all of this. He may well have been too tactful to say
anything.
This is what I think,
albeit rather cryptically and with apologies to George Orwell and
John Steinbeck:
‘How many fingers?’ asked Emperor Dennis O’Brien, stroking his
schlosshund Bruno’s furry chest.
‘A billion billion,
Your Imperial Majesty, just like the neo-Savageous axiom system’,
answered bumptious young Winston, turning green with awe.
The Emperor’s
forearms seemed to wither like a frog’s, as if he, like the sadly
exiled Tom Winston, was way East of Eden. In Kate’s place in
Monterey, perhaps.
‘There are only
three, you blithering nincompoop’, raged O’Brien. ‘You don’t even
understand simple arithmetic. You’re incoherent!’
‘I’m sorry, sorry,
Your Munificence,’ wailed Winston, as the Royal Acolytes clapped him
in red hot irons. 

