# Design of experiments

## Information about Design of experiments

Design of experiments includes the design of all information-gathering exercises where variation is present, whether under the full control of the experimenter or not. (The latter situation is usually called an observational study.) Often the experimenter is interested in the effect of some process or intervention (the 'treatment') on some objects (the 'experimental units'), which may be people. Design of experiments is thus a discipline that has very broad application across all the natural and social sciences.

## Early examples of experimental design

In 1747, while serving as surgeon on HM Bark Salisbury, James Lind carried out a controlled experiment to discover a cure for scurvy.

Lind selected 12 men from the ship, all suffering from scurvy, and divided them into six pairs, giving each pair a different addition to their basic diet for a period of two weeks. The treatments were all remedies that had been proposed at one time or another. They were:
• A quart of cider per day
• Twenty-five gutts (drops) of elixir vitriol three times a day upon an empty stomach
• Half a pint of seawater every day
• A mixture of garlic, mustard and horseradish, in a lump the size of a nutmeg
• Two spoonfuls of vinegar three times a day
• Two oranges and one lemon every day.
The men who had been given citrus fruits recovered dramatically within a week. One of them returned to duty after 6 days and the other became nurse to the rest. The others experienced some improvement, but nothing was comparable to the citrus fruits, which were proved to be substantially superior to the other treatments.

In this study his subjects' cases "were as similar as I could have them"; that is, he imposed strict entry requirements to reduce extraneous variation. The men were paired, which provided replication. From a modern perspective, the main thing missing is randomized allocation of subjects to treatments.

## A formal mathematical theory

The first statistician to consider a formal mathematical methodology for the design of experiments was Sir Ronald A. Fisher. As an example, he described how to test the hypothesis that a certain lady could distinguish by flavor alone whether the milk or the tea was first placed in the cup. While this sounds like a frivolous application, it allowed him to illustrate the most important means of experimental design:

1. Comparison

In many fields of study it is hard to reproduce measured results exactly. Comparisons between treatments are much more reproducible and are usually preferable. Often one compares against a standard or traditional treatment that acts as baseline.

2. Randomization

There is an extensive body of mathematical theory that explores the consequences of making the allocation of units to treatments by means of some random mechanism such as tables of random numbers, or the use of randomization devices such as playing cards or dice. Provided the sample size is adequate, the risks associated with random allocation (such as failing to obtain a representative sample in a survey, or having a serious imbalance in a key characteristic between a treatment group and a control group) are calculable and hence can be managed down to an acceptable level. Random does not mean haphazard, and great care must be taken that appropriate random methods are used.

3. Replication

Where measurement is made of a phenomenon that is subject to variation it is important to carry out repeat measurements, so that the variability associated with the phenomenon can be estimated.

4. Blocking

Blocking is the arrangement of experimental units into groups (blocks) that are similar to one another. Blocking reduces known but irrelevant sources of variation between units and thus allows greater precision in the estimation of the source of variation under study.

5. Orthogonality

Orthogonality concerns the forms of comparison (contrasts) that can be legitimately and efficiently carried out. Contrasts can be represented by vectors, and sets of orthogonal contrasts are uncorrelated and independently distributed if the data are normal. Because of this independence, each orthogonal contrast provides different information from the others. If there are T treatments and T - 1 orthogonal contrasts, all the information that can be captured from the experiment is obtainable from the set of contrasts.

6. Factorial experiments

Factorial experiments are used instead of the one-factor-at-a-time method. They are efficient at evaluating the effects and possible interactions of several factors (independent variables).
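Randomization, blocking, and factorial treatment structure can be combined in a few lines of Python. What follows is a minimal sketch rather than a method taken from the text: the function names `full_factorial` and `randomized_block_allocation`, the factor names, the block labels, and the unit labels are all hypothetical, and each block is assumed to contain exactly one unit per treatment.

```python
import itertools
import random


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts.

    `factors` maps a factor name to its list of levels (hypothetical names).
    """
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(factors[n] for n in names))]


def randomized_block_allocation(blocks, treatments, seed=None):
    """Assign each treatment once per block, in an independently shuffled order.

    `blocks` maps a block label to its experimental units; every block must
    hold exactly as many units as there are treatments.
    """
    rng = random.Random(seed)  # an explicit random mechanism, not a haphazard choice
    allocation = {}
    for label, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {label!r} needs {len(treatments)} units")
        order = list(treatments)
        rng.shuffle(order)  # fresh random permutation within each block
        allocation[label] = dict(zip(units, order))
    return allocation


# Hypothetical 2 x 2 factorial: two factors at two levels give four treatments.
treatments = full_factorial({"dose": ["low", "high"], "diet": ["A", "B"]})

# Hypothetical blocks of similar units (for example, patients grouped by age).
blocks = {
    "younger": ["u1", "u2", "u3", "u4"],
    "older": ["u5", "u6", "u7", "u8"],
}

plan = randomized_block_allocation(blocks, treatments, seed=2024)
for label, assignment in plan.items():
    for unit, treatment in assignment.items():
        print(label, unit, treatment)
```

Because every treatment appears once in every block, differences between blocks cancel out of the treatment comparisons, while the shuffle within each block guards against systematic bias in which unit receives which treatment.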

Analysis of the design of experiments was built on the foundation of the analysis of variance, a collection of models in which the observed variance is partitioned into components due to different factors which are estimated and/or tested.
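The partition of variance can be made concrete for the simplest case, a single factor. The sketch below assumes a one-way layout and checks numerically that the total sum of squares equals the between-group part plus the within-group part. The function name `one_way_sums_of_squares` and the data are invented for illustration only.

```python
from statistics import fmean


def one_way_sums_of_squares(groups):
    """Split the total sum of squares into between- and within-group parts.

    `groups` maps a treatment label to its list of observations.
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = fmean(all_values)

    # Total variation of every observation around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(ys) * (fmean(ys) - grand_mean) ** 2
                     for ys in groups.values())
    # Variation of observations around their own group mean.
    ss_within = sum((y - fmean(ys)) ** 2
                    for ys in groups.values() for y in ys)
    return ss_total, ss_between, ss_within


# Illustrative data only (not from any real experiment).
data = {
    "control": [4.0, 5.0, 6.0],
    "treatment_a": [6.0, 7.0, 8.0],
    "treatment_b": [9.0, 10.0, 11.0],
}

ss_total, ss_between, ss_within = one_way_sums_of_squares(data)
print(ss_total, ss_between, ss_within)
```

The between-group component reflects the factor under study and the within-group component reflects unexplained variation; comparing the two is the basis of the usual F test.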

Some efficient designs for estimating several main effects simultaneously were found by Raj Chandra Bose and K. Kishen in 1940 at the Indian Statistical Institute, but remained little known until the Plackett-Burman designs were published in Biometrika in 1946.

In 1950, Gertrude Mary Cox and William Cochran published the book Experimental Designs, which became the major reference work on the design of experiments for statisticians for years afterwards.

Developments of the theory of linear models have encompassed and surpassed the cases that concerned early writers. Today, the theory rests on advanced topics in abstract algebra and combinatorics.

As with all other branches of statistics, there is both classical and Bayesian experimental design.

## Example

This example is attributed to Harold Hotelling in [1]. Although very simple, it conveys at least some of the flavor of the subject.

The weights of eight objects are to be measured using a pan balance that measures the difference between the weights of the objects in the two pans. Each measurement has a random error. The average error is zero; the standard deviation of the probability distribution of the errors is the same number $\sigma$ on different weighings; and errors on different weighings are independent. Denote the true weights by $\theta_1, \ldots, \theta_8$.

We consider two different experiments:
1. Weigh each object in one pan, with the other pan empty. Call the measured weight of the $i$th object $X_i$ for $i = 1, \ldots, 8$.
2. Do the eight weighings according to the following schedule and let $Y_i$ be the measured difference for $i = 1, \ldots, 8$:

:

Then the estimated value of the weight $\theta_1$ is

:

The question of design of experiments is: which experiment is better?

The variance of the estimate $X_1$ of $\theta_1$ is $\sigma^2$ if we use the first experiment. But if we use the second experiment, the variance of the estimate given above is $\sigma^2/8$. Thus the second experiment gives us 8 times as much precision.
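The eightfold gain can be checked by simulation. The sketch below assumes one particular schedule, encoded as a matrix of +1 (object in the left pan) and -1 (object in the right pan) whose columns are mutually orthogonal. It is a schedule that attains the stated variance and is not claimed to be the exact schedule of the original example. The true weights, the normal error distribution, and all names in the code are likewise assumptions made for illustration.

```python
import random
from statistics import pvariance

# Assumed schedule: one row per weighing, one column per object.
# +1 puts the object in the left pan, -1 puts it in the right pan.
SCHEDULE = [
    [+1, +1, +1, +1, +1, +1, +1, +1],
    [+1, +1, +1, -1, -1, -1, -1, +1],
    [+1, -1, -1, +1, +1, -1, -1, +1],
    [+1, -1, -1, -1, -1, +1, +1, +1],
    [-1, +1, -1, +1, -1, +1, -1, +1],
    [-1, +1, -1, -1, +1, -1, +1, +1],
    [-1, -1, +1, +1, -1, -1, +1, +1],
    [-1, -1, +1, -1, +1, +1, -1, +1],
]


def columns_are_orthogonal(schedule):
    """Check that every pair of distinct columns has zero dot product."""
    n = len(schedule[0])
    return all(
        sum(row[j] * row[k] for row in schedule) == 0
        for j in range(n) for k in range(j + 1, n)
    )


def simulate(theta, sigma, trials, seed=0):
    """Return the empirical variances of the two estimates of theta[0]."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(trials):
        # Experiment 1: weigh the first object alone.
        first.append(theta[0] + rng.gauss(0, sigma))
        # Experiment 2: eight combined weighings, then a signed average
        # using the first column of the schedule.
        y = [sum(s * t for s, t in zip(row, theta)) + rng.gauss(0, sigma)
             for row in SCHEDULE]
        second.append(sum(row[0] * yi for row, yi in zip(SCHEDULE, y)) / 8)
    return pvariance(first), pvariance(second)


theta = [3.0, 1.5, 2.0, 4.2, 0.7, 5.1, 2.8, 3.3]  # made-up true weights
var_first, var_second = simulate(theta, sigma=1.0, trials=20000)
print(columns_are_orthogonal(SCHEDULE), var_first, var_second)
```

Orthogonality of the columns is what makes the other seven weights cancel out of the signed average, leaving the first weight plus an average of eight independent errors.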

Many problems of the design of experiments involve combinatorial designs, as in this example.

## References

1. Herman Chernoff, Sequential Analysis and Optimal Design, SIAM Monograph, 1972.
• Box, G. E. P., Hunter, J. S., Hunter, W. G., Statistics for Experimenters: Design, Innovation, and Discovery, 2nd edition, Wiley, 2005, ISBN 0471718130.
• Pearl, J., Causality: Models, Reasoning and Inference, Cambridge University Press, 2000.
