The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence. 'Big data', 'data science', and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.
Empirical Bayes Methods for Estimation, Testing, and Prediction
Author: Bradley Efron
Publisher: Cambridge University Press
We live in a new age for statistical inference, where modern scientific technology such as microarrays and fMRI machines routinely produce thousands and sometimes millions of parallel data sets, each with its own estimation or testing problem. Doing thousands of problems at once is more than repeated application of classical methods. Taking an empirical Bayes approach, Bradley Efron, inventor of the bootstrap, shows how information accrues across problems in a way that combines Bayesian and frequentist ideas. Estimation, testing and prediction blend in this framework, producing opportunities for new methodologies of increased power. New difficulties also arise, easily leading to flawed inferences. This book takes a careful look at both the promise and pitfalls of large-scale statistical inference, with particular attention to false discovery rates, the most successful of the new statistical techniques. Emphasis is on the inferential ideas underlying technical developments, illustrated using a large number of real examples.
This book gives a broad and up-to-date coverage of bootstrap methods, with numerous applied examples, developed in a coherent way with the necessary theoretical basis. Applications include stratified data; finite populations; censored and missing data; linear, nonlinear, and smooth regression models; classification; time series and spatial problems. Special features of the book include: extensive discussion of significance tests and confidence intervals; material on various diagnostic methods; and methods for efficient computation, including improved Monte Carlo simulation. Each chapter includes both practical and theoretical exercises. Included with the book is a disk of purpose-written S-Plus programs for implementing the methods described in the text. Computer algorithms are clearly described, and computer code is included on a 3-inch, 1.4M disk for use with IBM computers and compatible machines. Users must have the S-Plus computer application. Author resource page: http://statwww.epfl.ch/davison/BMA/
Statistics is a subject of many uses and surprisingly few effective practitioners. The traditional road to statistical knowledge is blocked, for most, by a formidable wall of mathematics. The approach in An Introduction to the Bootstrap avoids that wall. It arms scientists and engineers, as well as statisticians, with the computational techniques they need to analyze and understand complicated data sets.
A New Approach to Sound Statistical Reasoning
Inferential Models: Reasoning with Uncertainty introduces the authors’ recently developed approach to inference: the inferential model (IM) framework. This logical framework for exact probabilistic inference does not require the user to input prior information. The authors show how an IM produces meaningful prior-free probabilistic inference at a high level. The book covers the foundational motivations for this new IM approach, the basic theory behind its calibration properties, a number of important applications, and new directions for research. It discusses alternative, meaningful probabilistic interpretations of some common inferential summaries, such as p-values. It also constructs posterior probabilistic inferential summaries without a prior and Bayes’ formula and offers insight on the interesting and challenging problems of conditional and marginal inference. This book delves into statistical inference at a foundational level, addressing what the goals of statistical inference should be. It explores a new way of thinking compared to existing schools of thought on statistical inference and encourages you to think carefully about the correct approach to scientific inference.
Statistics is a subject with a vast field of application, involving problems which vary widely in their character and complexity. However, in tackling these, we use a relatively small core of central ideas and methods. This book attempts to concentrate attention on these ideas: they are placed in a general setting and illustrated by relatively simple examples, avoiding wherever possible the extraneous difficulties of complicated mathematical manipulation. In order to compress the central body of ideas into a small volume, it is necessary to assume a fair degree of mathematical sophistication on the part of the reader, and the book is intended for students of mathematics who are already accustomed to thinking in rather general terms about spaces and functions.
This book is for students and researchers who have had a first-year graduate-level mathematical statistics course. It covers classical likelihood, Bayesian, and permutation inference; an introduction to basic asymptotic distribution theory; and modern topics like M-estimation, the jackknife, and the bootstrap. R code is woven throughout the text, and there are a large number of examples and problems. An important goal has been to make the topics accessible to a wide audience, with little overt reliance on measure theory. A typical semester course consists of Chapters 1-6 (likelihood-based estimation and testing, Bayesian inference, basic asymptotic results) plus selections from M-estimation and related testing and resampling methodology. Dennis Boos and Len Stefanski are professors in the Department of Statistics at North Carolina State. Their research has been eclectic, often with a robustness angle, although Stefanski is also known for research concentrated on measurement error, including a co-authored book on non-linear measurement error models. In recent years the authors have jointly worked on variable selection methods.
Statistical Models in S extends the S language to fit and analyze a variety of statistical models, including analysis of variance, generalized linear models, additive models, local regression, and tree-based models. The contributions of the ten authors, most of whom work in the statistics research department at AT&T Bell Laboratories, represent results of research in both the computational and statistical aspects of modeling data.
Discover New Methods for Dealing with High-Dimensional Data
A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of l1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.
A new and refreshingly different approach to presenting the foundations of statistical algorithms, Foundations of Statistical Algorithms: With References to R Packages reviews the historical development of basic algorithms to illuminate the evolution of today’s more powerful statistical algorithms. It emphasizes recurring themes in all statistical algorithms, including computation, assessment and verification, iteration, intuition, randomness, repetition and parallelization, and scalability. Unique in scope, the book reviews the upcoming challenge of scaling many of the established techniques to very large data sets and delves into systematic verification by demonstrating how to derive general classes of worst case inputs and emphasizing the importance of testing over a large number of different inputs. Broadly accessible, the book offers examples, exercises, and selected solutions in each chapter as well as access to a supplementary website. After working through the material covered in the book, readers should not only understand current algorithms but also gain a deeper understanding of how algorithms are constructed, how to evaluate new algorithms, which recurring principles are used to tackle some of the tough problems statistical programmers face, and how to take an idea for a new method and turn it into something practically useful.
Sensitivity analysis should be considered a pre-requisite for statistical model building in any scientific discipline where modelling takes place. For a non-expert, choosing the method of analysis for their model is complex, and depends on a number of factors. This book guides the non-expert through their problem in order to enable them to choose and apply the most appropriate method. It offers a review of the state-of-the-art in sensitivity analysis, and is suitable for a wide range of practitioners. It is focussed on the use of SIMLAB – a widely distributed freely-available sensitivity analysis software package developed by the authors – for solving problems in sensitivity analysis of statistical models. Other key features: Provides an accessible overview of the current most widely used methods for sensitivity analysis. Opens with a detailed worked example to explain the motivation behind the book. Includes a range of examples to help illustrate the concepts discussed. Focuses on implementation of the methods in the software SIMLAB - a freely-available sensitivity analysis software package developed by the authors. Contains a large number of references to sources for further reading. Authored by the leading authorities on sensitivity analysis.
Nature didn’t design human beings to be statisticians, and in fact our minds are more naturally attuned to spotting the saber-toothed tiger than seeing the jungle he springs from. Yet scientific discovery in practice is often more jungle than tiger. Those of us who devote our scientific lives to the deep and satisfying subject of statistical inference usually do so in the face of a certain under-appreciation from the public, and also (though less so these days) from the wider scientific world. With this in mind, it feels very nice to be over-appreciated for a while, even at the expense of weathering a 70th birthday. (Are we certain that some terrible chronological error hasn’t been made?) Carl Morris and Rob Tibshirani, the two colleagues I’ve worked most closely with, both fit my ideal profile of the statistician as a mathematical scientist working seamlessly across wide areas of theory and application. They seem to have chosen the papers here in the same catholic spirit, and then cajoled an all-star cast of statistical savants to comment on them.
This book covers modern statistical inference based on likelihood with applications in medicine, epidemiology and biology. Two introductory chapters discuss the importance of statistical models in applied quantitative research and the central role of the likelihood function. The rest of the book is divided into three parts. The first describes likelihood-based inference from a frequentist viewpoint. Properties of the maximum likelihood estimate, the score function, the likelihood ratio and the Wald statistic are discussed in detail. In the second part, likelihood is combined with prior information to perform Bayesian inference. Topics include Bayesian updating, conjugate and reference priors, Bayesian point and interval estimates, Bayesian asymptotics and empirical Bayes methods. Modern numerical techniques for Bayesian inference are described in a separate chapter. Finally, two more advanced topics, model choice and prediction, are discussed from both a frequentist and a Bayesian perspective. A comprehensive appendix covers the necessary prerequisites in probability theory, matrix algebra, mathematical calculus, and numerical analysis.