My friend Irving Jack Good (with whom I corresponded about
Alan Turing and their multinomial shrinkage estimators for
cryptanalysis at Bletchley Park, which were instrumental in breaking
the Nazi codes) did not believe that my more general Logit/First-Stage
Multivariate Normal Prior/hierarchical approach to the
analysis of categorical data was sufficiently recognised by other
Bayesians. Nevertheless, Alan Agresti and several other authors seem
to regard it as a pioneering contribution, along with my
external examiner Patricia Altham’s novel analysis of measures
of association for 2×2 contingency tables. Indeed, many others have
followed in my footsteps.
[See items (4) and (5) of the CDC section for discussions of Jim
Albert’s later contributions, and of my 1978 Imperial College
short-course lecture notes.]
In my Biometrika 1972 paper, I used my method to calculate
shrinkage estimates for the pass rates at several different
colleges. Philip J. Smith of the Pacific Halibut Commission applied
my approach to estimate the proportions of halibut in several
different catches, and implemented various generalisations of my
methodology.
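The flavour of this kind of shrinkage can be sketched in a few lines of code. Everything below is invented for illustration (the college data, the method-of-moments estimate of the prior variance); it is not the exact Biometrika 1972 analysis, just a minimal empirical-logit version of the idea of pulling each college’s rate toward a common mean on the logit scale.

```python
import math

# Hypothetical pass-rate data for five colleges: (passes, candidates).
data = [(28, 40), (12, 30), (45, 50), (9, 25), (33, 45)]

# Empirical logits with the usual 1/2 continuity correction, and their
# approximate sampling variances 1/(y+1/2) + 1/(n-y+1/2).
logits = [math.log((y + 0.5) / (n - y + 0.5)) for y, n in data]
variances = [1.0 / (y + 0.5) + 1.0 / (n - y + 0.5) for y, n in data]

# Crude method-of-moments estimate of the between-college prior variance.
mean_logit = sum(logits) / len(logits)
raw_between = sum((l - mean_logit) ** 2 for l in logits) / (len(logits) - 1)
tau2 = max(raw_between - sum(variances) / len(variances), 1e-6)

# Shrink each logit toward the common mean; the shrinkage weight depends
# on the ratio of prior to sampling variance. Map back to probabilities.
shrunk = []
for l, v in zip(logits, variances):
    w = tau2 / (tau2 + v)               # weight on the raw logit
    post_logit = w * l + (1 - w) * mean_logit
    shrunk.append(1.0 / (1.0 + math.exp(-post_logit)))
```

The extreme colleges are pulled furthest toward the overall mean, exactly the smoothing behaviour described above.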
[My logit/normal first-stage-prior approach to the analysis of
categorical data was reviewed by Leonard and Hsu (1994) in
Aspects of Uncertainty: A Tribute to D.V. Lindley (edited
by Peter Freeman and Adrian Smith). We also reported an Empirical
Bayes analysis that I’d developed in about 1977, relating to the
simultaneous estimation of the parameters of several multinomial
distributions via a multivariate normal prior for the different
sets of logits.
When we applied the methodology to O-level data for 40
London high schools, we discovered that the posterior estimates of
the grade rates smoothed the raw proportions in a highly complex
fashion.
Leonard and Novick (Journal of Educational Statistics,
1986) describe a further educational testing study in relation to
another contingency table that summarizes their Marine Corps data.
Leonard and Hsu (Annals of Statistics, 1992) report an
analysis of a portion of the Project Talent American High School
Data, where the observations are raw scores.]
I shared my office at University College with Adrian F.M. Smith, a
wonderfully inspirational and charismatic Adonis of a man who was to
move on to a highly accomplished career [Don at Keble College, Oxford;
translator of the prestigious works of Bruno de Finetti; many
successful Ph.D. students, including Michael Goldstein and David
Spiegelhalter; Principal of Queen Mary College, London; F.R.S.; and
a couple of top national leadership positions. Knighted in 2010],
and with Daruish Haghighi-Talab, a Persian gentleman with an immense
black beard, who studied road systems and was to become a Deputy
Director of Official Statistics in Iran.
The skylight in our office was often left open. This was to become a
bone of contention in 1977 after Peter Freeman had moved in, when an
over-diligent university administrator insisted that the skylight
should be kept closed. During the kerfuffle that ensued, Dennis
Lindley retired at age 54, albeit with a generously increased
pension. He toured the world with his wife Joan into his sixties, and
a generation of postgraduates missed out on his inspirational
guidance.
Six other students studied for the Masters degree at University
College at the same time as me, including my Glaswegian friends Ben
Torsney and Jim McNicol, and a very nice man from Malaysia. We all
took a core measure-theoretic course on Bayesian Inference from Phil
Dawid, a brilliant junior lecturer, also out of Imperial College, who
was just a year older than me and more recently became a professor at
Cambridge.
I was particularly impressed by Phil’s description of Alan
Birnbaum’s Likelihood Principle, its easy justification via the
Sufficiency and Conditionality Principles, and the way it sorted the
sheep from the goats in statistical methodology. Despite objections
by George Barnard and others, I still find the proof of Birnbaum’s
1962 theorem to be extremely convincing and not at
all tautologous.
[The Neyman-Fisher factorization theorem is the key to the whole
issue. It is this theorem that introduces the key concept of
likelihood into statistical inference, based upon purely frequency
considerations, and Birnbaum applies it to an ingeniously
constructed mixed experiment to extend its influence to two simple
experiments that investigate the same unknown parameter. Birnbaum
has been described as one of the most profound thinkers in
Statistics ever, and he was a buddy of Adrian Smith. He was,
however, highly introspective, and took his own life in London in
1976. I have always empathised with him, particularly because of the
way he was mistreated by other leading psychometricians during the
1960s. He was seriously anti-authoritarian, and there are some
parallels between our life stories.]
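For the record, the skeleton of Birnbaum’s argument can be written down in a few lines. The shorthand here is my own: Ev(E, x) denotes the evidential meaning of outcome x of experiment E.

```latex
% Given: outcomes x_1 of E_1 and x_2 of E_2 with proportional likelihoods,
%   L(\theta; x_1) \propto L(\theta; x_2).
% Form the mixed experiment E^*: toss a fair coin; on heads perform E_1,
% on tails perform E_2.
\underbrace{\mathrm{Ev}\bigl(E^*,(i,x_i)\bigr)
    = \mathrm{Ev}(E_i,x_i)}_{\text{Conditionality}},
\qquad
\underbrace{\mathrm{Ev}\bigl(E^*,(1,x_1)\bigr)
    = \mathrm{Ev}\bigl(E^*,(2,x_2)\bigr)}_{\text{Sufficiency, via factorization}}
\;\Longrightarrow\;
\mathrm{Ev}(E_1,x_1) = \mathrm{Ev}(E_2,x_2).
```

The sufficiency step holds because the statistic identifying (1, x_1) with (2, x_2) is sufficient in the mixed experiment whenever the two likelihoods are proportional, and the conclusion is precisely the Likelihood Principle.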
Phil fully generalised Ericson’s method for Linear Bayes estimation,
and our homework was therefore light years ahead of the literature.
(He later expressed his irritation at the alternative procedure I
used to quickly derive the estimates during the final exam, though
that was in a special case.)
Phil’s parameterization of the conjugate analysis for the linear
model with unknown variance, in terms of degrees of freedom and
prior sample sizes, was also superbly simple. This parameterization
does not appear to have been published until 1986, when J.J. Shiau
(one of Grace Wahba’s Ph.D. students) successfully applied it to
partial spline models after I’d included it in my Statistics 775
course in Madison.
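To indicate the style, here is the standard conjugate updating in this parameterization for the simplest special case, a normal sample rather than the full linear model, and in my notation rather than Phil’s:

```latex
% Prior, with prior sample size n_0 and prior degrees of freedom \nu_0:
\mu \mid \sigma^2 \sim N\!\left(\mu_0,\, \sigma^2 / n_0\right),
\qquad
\nu_0 \sigma_0^2 / \sigma^2 \sim \chi^2_{\nu_0}.

% After observing y_1, \dots, y_n i.i.d. N(\mu, \sigma^2), the posterior
% has the same form, with
n^* = n_0 + n,
\qquad
\mu^* = \frac{n_0 \mu_0 + n \bar{y}}{n_0 + n},
\qquad
\nu^* = \nu_0 + n,

\nu^* \sigma^{*2}
  = \nu_0 \sigma_0^2
  + \sum_{i=1}^{n} (y_i - \bar{y})^2
  + \frac{n_0 n}{n_0 + n}\,(\bar{y} - \mu_0)^2.
```

Prior information simply adds to the sample sizes and degrees of freedom, which is what makes the parameterization so transparent.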
The entire theory of linear splines can of course be
regarded as a special case of the very straightforward Gaussian
prior Bayesian paradigm, and I’ve never quite understood what all
the fuss was about, or why we need to refer to them explicitly at
all, though it is important not to overparameterize when sensibly
modelling the prior mean value function and covariance kernel.
[An exponential quadratic prior kernel often works better than an
autoregressive kernel, since the posterior mean value function can
then be infinitely differentiable. See Hsu and Leonard (Biometrika,
1997), where we used a semiparametric multiple regression and
residual analysis to investigate a binary data set that correlated
the mortality of mice with time of exposure to NO2 and degree of
exposure. It confirmed John’s tenure at UCSB. Some of the
statistical ideas originated from one of my 1982 MRC technical
reports, which described my Bayesian approach to semiparametric
logistic regression. See also Raynor, O’Sullivan and Yandell (JASA,
1985).]
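The smoothness point can be seen directly from the form of the posterior mean, which is a linear combination of kernel translates k(·, x_i) and therefore inherits the kernel’s differentiability. The toy sketch below (invented data and length-scales, plain Gaussian-process regression rather than our semiparametric analysis) computes the posterior mean under both an exponential quadratic kernel and an exponential, AR(1)-type kernel:

```python
import numpy as np

def k_sqexp(a, b, ell=1.0):
    # Exponential quadratic ("squared-exponential") kernel: infinitely
    # differentiable, so the posterior mean is infinitely differentiable.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def k_exp(a, b, ell=1.0):
    # Exponential (autoregressive-type) kernel: continuous but not
    # differentiable at zero, so the posterior mean has kinks at the data.
    return np.exp(-np.abs(a[:, None] - b[None, :]) / ell)

x = np.array([0.0, 1.0, 2.5, 4.0])      # design points (made up)
y = np.array([0.2, 0.9, -0.3, 0.5])     # observations (made up)
noise = 1e-4                            # small nugget for stability

def posterior_mean(kernel, xstar):
    K = kernel(x, x) + noise * np.eye(len(x))
    alpha = np.linalg.solve(K, y)       # weights on the kernel translates
    return kernel(xstar, x) @ alpha

grid = np.linspace(0.0, 4.0, 9)
m_smooth = posterior_mean(k_sqexp, grid)
m_rough = posterior_mean(k_exp, grid)
```

Both means pass (essentially) through the observations; the difference is entirely in how they behave between and beyond the data points.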
I
thought that the correspondence between Bayes estimates and
smoothing splines was established as early as 1970 by
Kimeldorf and
Wahba, in a paper in Ann. Math. Statist. that was
cited by 468 other authors. One of Grace’s students was much later
quite irritating in the way he mimicked my Bayesian density
smoothing techniques with nonlinear smoothing splines, though he
later referenced me quite generously.
Many published spline techniques employ a cross-validation
technique to empirically estimate a smoothing parameter called
lambda. However, such techniques usually either mimic or recursively
modify Mervyn Stone’s pioneering cross-validation method, published
in JRSSB (1974, with Discussion) and JRSSB
(1977).
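Stone’s leave-one-out idea is simple enough to state in code: delete each observation in turn, predict it from the rest for each candidate value of the smoothing parameter, and keep the value minimising the average squared prediction error. The sketch below applies it to an invented ridge-penalised polynomial fit rather than to splines; the data and basis are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(30)
X = np.vander(x, 4, increasing=True)     # cubic polynomial basis

def loo_score(lam):
    # Leave-one-out cross-validation score for smoothing parameter lam.
    err = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        Xi, yi = X[keep], y[keep]
        # Ridge-penalised fit on the remaining n-1 observations.
        beta = np.linalg.solve(Xi.T @ Xi + lam * np.eye(X.shape[1]),
                               Xi.T @ yi)
        err += (y[i] - X[i] @ beta) ** 2   # predict the deleted point
    return err / len(y)

lambdas = [10.0 ** p for p in range(-6, 3)]
best = min(lambdas, key=loo_score)        # lambda with smallest LOO error
```

Generalised cross-validation and its relatives modify this recipe mainly to avoid the n refits per candidate value.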
All the 1970-71 Masters students at University College were expected
to learn advanced probability theory, including convolution
semigroups and domains of attraction, from Feller Volume 2, but
Dieter Girmes would come in for weeks on end, wave the book at us,
and tell us about his latest statistical consultancy. We therefore
had to assimilate Feller largely on our own.
Dennis Lindley taught me educational testing in the
Princeton tradition, Markov decision processes (with stationarity
theorems that were later republished out of a Department of Decision
Theory in Manchester!) from the book by Sheldon Ross, and Masanao
Aoki’s stochastic control theory in all its glory. Mervyn Stone
taught an option on Art Dempster-style multivariate analysis, with
ellipsoids looking like spaceships, which made me feel like a
space cadet. While we didn’t learn any real statistics or
Berkeley-style asymptotics, this was a Masters degree to be reckoned
with. (Derek Teather and I were awarded distinctions. I only say
this to emphasise that I was extremely able at that stage in my
life.)
[During a trip to visit the Rev. Thomas Bayes’ grave in Bunhill
Fields, Moorgate, the caretaker advised Mervyn Stone that Bayes was
responsible for getting rockets to the moon. This was doubtless
because of Aoki’s applications of Bayes’ theorem to stochastic
control theory.]
