# Statistical modeling: The two cultures

@article{Breiman2001StatisticalMT, title={Statistical modeling: The two cultures}, author={Leo Breiman}, journal={Quality Engineering}, year={2001}, volume={48}, pages={81-82} }

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems…

#### 1,556 Citations

Statistical Inference After Model Selection

- Computer Science
- 2010

This paper examines a variety of model selection procedures that are routinely undertaken in criminology and then followed by statistical tests and confidence intervals computed for a “final” model, and shows how they are typically misguided.

Big Data is not only about data: The two cultures of modelling

- Computer Science
- 2017

A brief discussion of model-based recursive partitioning, which can bridge the theory-driven and data-driven approaches to statistical modelling and is an example of how this new approach can help revise models that work for the full dataset.

Discussion Paper

- 2014

The views expressed in this paper are those of the author(s) and do not necessarily reflect the policies of Statistics Netherlands. Data sources referred to as Big data become available for use by…

6-2010 Statistical Inference After Model Selection

- 2017

Conventional statistical inference requires that a model of how the data were generated be known before the data are analyzed. Yet in criminology, and in the social sciences more broadly, a variety…

Comment on "Statistical Modeling: The Two Cultures" by Leo Breiman

- Mathematics
- 2021

Motivated by Breiman’s rousing 2001 paper on the “two cultures” in statistics, we consider the role that different modeling approaches play in causal inference. We discuss the relationship between…

Distributional Trees and Forests

- 2017

Obtaining valuable information from given data requires the use of appropriate methods of analysis. For example, if a certain variable of interest is assumed to depend on a (set of) covariate(s),…

A problem-solving approach to data analysis for economics

- Sociology
- 2018

Data analysis for formal methods is constrained due to the lengthy dominance of the econometric view within economics. Best practice in statistics suggests a shift in emphasis from making statements…

Big data and its epistemology

- Computer Science
- J. Assoc. Inf. Sci. Technol.
- 2015

Whether Big Data, in the form of data‐driven science, will enable the discovery, or appraisal, of universal scientific theories, instrumentalist tools, or inductive inferences is considered.

It takes two to tango: Statistical modeling and machine learning

- Computer Science
- 2021

A scenario is created showing that when the learning gained from a statistical method is applied to machine learning, the ultimate benefit can be greater than the sum of each method’s benefits.

The Causal Nature of Modeling with Big Data

- Computer Science
- 2016

It is shown to lack a pronounced hierarchical, nested structure and the significance of the transition to such “horizontal” modeling is underlined by the concurrent emergence of novel inductive methodology in statistics such as non-parametric statistics.

#### References


Statistical models and shoe leather

- Mathematics
- 1989

Regression models have been used in the social sciences at least since 1899, when Yule published a paper on the causes of pauperism. Regression models are now used to make causal…

Computer-Intensive Methods in Statistics

- Physics
- 1983

In the past few years there has been a surge in the development of new statistical theories and methods that take advantage of the high speed digital computer. The payoff for such intensive…

From Association to Causation via Regression

- Mathematics
- 1997

For nearly a century, investigators in the social sciences have used regression models to deduce cause-and-effect relationships from patterns of association. Path models and automated search…

Discussion of David Freedman’s “Some Issues in the Foundations of Statistics”

- Sociology
- 1995

While results from statistical modelling too often receive blind acceptance, we question whether there is any real alternative to use of modelling. This does not diminish the main point of Professor…

The problem of regions

- Mathematics
- 1998

In the problem of regions, we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive…

Computer Intensive Methods in Statistics

- Computer Science
- 1994

Four topics that have been treated in more detail were: Bayesian Computing; Interfacing Statistics and Computers; Image Analysis; Resampling Methods.

Nonparametric Statistical Data Modeling

- Mathematics
- 1979

This article attempts to describe an approach to statistical data analysis which is simultaneously parametric and nonparametric. Given a random sample $X_1, \ldots, X_n$ of a random variable $X$, one…

The 1991 Census Adjustment: Undercount or Bad Data?

- Computer Science
- 1994

Careful scrutiny of these studies, together with auxiliary sources of information provided by the Census Bureau, is used to examine the issue of whether the data gathered in the Post Enumeration Survey can provide reliable undercount estimates.

Graphical Methods for Assessing Logistic Regression Models

- Mathematics
- 1984

In ordinary linear regression, graphical diagnostic displays can be very useful for detecting and examining anomalous features in the fit of a model to data. For logistic regression models,…

Scientific Method, Statistical Method and the Speed of Light

- Computer Science
- 2000

A history of the speed of light up to the time of Michelson's study is presented, and the details of a single study allow the method of statistics to be placed within the larger context of science.