# Theil index

The Theil index[1], derived by econometrician Henri Theil, is a statistic used to measure economic inequality.

## Mathematics

The formula is

$$T = \sum_{i=1}^{N} \left( \frac{x_i}{N\bar{x}} \cdot \ln\frac{x_i}{\bar{x}} \right)$$

where $x_i$ is the income of the $i$th person, $\bar{x}$ is the mean income, and $N$ is the number of people. The first term inside the sum can be considered the individual's share of aggregate income, and the second term is the logarithm of that person's income relative to the mean. If everyone has the same (i.e., mean) income, then the index is 0. If one person has all the income, then the index is $\ln N$.
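The two boundary cases can be checked numerically. The following is a minimal sketch rather than a reference implementation: the function name `theil_t` is hypothetical, and letting zero incomes contribute 0 to the sum (the limit of x ln x as x approaches 0) is an assumption not spelled out in the text.

```python
import math

def theil_t(incomes):
    """Theil index: sum over people of (income share) * ln(income / mean income)."""
    n = len(incomes)
    total = sum(incomes)
    mean = total / n
    t = 0.0
    for x in incomes:
        if x > 0:  # assumption: a zero income contributes 0 (limit of x*ln x)
            t += (x / total) * math.log(x / mean)
    return t

equal = theil_t([10, 10, 10, 10])      # everyone has the mean income
one_has_all = theil_t([0, 0, 0, 40])   # one person holds everything

print(equal)        # 0.0
print(one_has_all)  # equals ln 4, about 1.386
```

With four people, the second case reproduces the upper bound ln N = ln 4.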

The Theil index is derived from Shannon's measure of information entropy. Letting $T$ be the Theil index, $S$ be Shannon's information entropy measure, and $N$ be the number of people,

$$T = \ln N - S$$

Shannon derived his entropy measure in terms of the probability of an event occurring. In the Theil index, this can be interpreted as the probability that a dollar drawn at random from the population came from a specific individual. This is the same as the first term, the individual's share of aggregate income.
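The link to entropy can be illustrated with a short sketch. It assumes natural logarithms and treats each person's income share as the probability in Shannon's formula; the function names are hypothetical. It compares the direct Theil computation with the maximum entropy ln N minus the entropy of the income shares.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def theil_from_entropy(incomes):
    """Theil index as maximum entropy ln(N) minus entropy of the income shares."""
    total = sum(incomes)
    shares = [x / total for x in incomes]  # probability a random dollar is person i's
    return math.log(len(incomes)) - shannon_entropy(shares)

def theil_direct(incomes):
    """Theil index as the sum of share * ln(income / mean income)."""
    total = sum(incomes)
    mean = total / len(incomes)
    return sum((x / total) * math.log(x / mean) for x in incomes if x > 0)

sample = [1, 2, 3, 10]  # illustrative incomes, not data from the article
via_entropy = theil_from_entropy(sample)
direct = theil_direct(sample)
print(via_entropy, direct)  # the two values agree up to rounding
```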

With reference to information theory[2], Theil's measure is a redundancy rather than an entropy. The redundancy of a system at a given time is the difference between its maximum entropy and its present entropy at that time.[3]

## Decomposability

One of the advantages of the Theil index is that it is a weighted average of inequality within subgroups, plus inequality among those subgroups. For example, inequality within the United States is the average inequality within each state, weighted by state income, plus the inequality among states.

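If the population is divided into $m$ subgroups and $s_k$ is the income share of group $k$, $T_k$ is the Theil index for that subgroup, and $\bar{x}_k$ is the average income in group $k$, then the Theil index is

$$T = \sum_{k=1}^{m} s_k T_k + \sum_{k=1}^{m} s_k \ln\frac{\bar{x}_k}{\bar{x}}$$

where $\bar{x}$ is the average income of the whole population.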
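The decomposition can be verified on a small example. The sketch below uses made-up incomes and hypothetical function names; it splits the index into a within-group part (group Theil indices weighted by income share) and a between-group part, then compares their sum with the index of the pooled population. It assumes every group has a positive total income.

```python
import math

def theil_t(incomes):
    """Theil index: sum of share * ln(income / mean income)."""
    total = sum(incomes)
    mean = total / len(incomes)
    return sum((x / total) * math.log(x / mean) for x in incomes if x > 0)

def decompose(groups):
    """Return (within, between) components of the Theil index."""
    pooled = [x for g in groups for x in g]
    total = sum(pooled)
    mean = total / len(pooled)
    within = 0.0
    between = 0.0
    for g in groups:
        share = sum(g) / total           # income share of the group
        group_mean = sum(g) / len(g)     # average income in the group
        within += share * theil_t(g)
        between += share * math.log(group_mean / mean)
    return within, between

groups = [[1, 2, 3], [10, 20], [5, 5, 5, 5]]  # illustrative data only
within, between = decompose(groups)
overall = theil_t([x for g in groups for x in g])
print(within + between, overall)  # the two values agree up to rounding
```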

Another, more popular, measure of inequality is the Gini coefficient. The Gini coefficient is more intuitive to many people since it is based on the Lorenz curve. However, it is not as easily decomposable as the Theil index.

## Application of the Theil index

Theil's index takes an equal distribution as its reference, which is similar to distributions in statistical physics. The index for an actual system is its actual redundancy, that is, the difference between the maximum entropy and the actual entropy of that system.

Theil's measure can be converted into one of the indexes of Anthony Barnes Atkinson. The result of the conversion is also called the normalized Theil index[4]. James E. Foster[5] used such a measure to replace the Gini coefficient in Amartya Sen's welfare function W = f(income, inequality). The income is, for example, the average income of individuals in a group of income earners. Thus, Foster's welfare function can be computed directly from the Theil index $T$, if the conversion is included in the computation of the average per capita welfare function:

$$W = \overline{\text{income}} \cdot e^{-T}$$
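A rough sketch of this computation follows. It rests on two assumptions that should be checked against the cited sources: that the normalized Theil index is 1 − e^(−T), and that the welfare function is the mean income multiplied by one minus that normalized index. All function names are hypothetical.

```python
import math

def theil_t(incomes):
    """Theil index: sum of share * ln(income / mean income)."""
    total = sum(incomes)
    mean = total / len(incomes)
    return sum((x / total) * math.log(x / mean) for x in incomes if x > 0)

def normalized_theil(t):
    """Assumed normalization mapping [0, infinity) onto [0, 1)."""
    return 1.0 - math.exp(-t)

def welfare(incomes):
    """Assumed welfare function: mean income * (1 - normalized Theil index)."""
    mean = sum(incomes) / len(incomes)
    return mean * math.exp(-theil_t(incomes))

equal_welfare = welfare([10, 10, 10, 10])     # no inequality: welfare equals the mean
unequal_welfare = welfare([0, 0, 0, 40])      # same mean, maximal inequality
print(equal_welfare, unequal_welfare)
```

Under these assumptions, two populations with the same mean income are ranked by their inequality: the more unequal one receives the lower welfare value.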

Figure: Map of economic inequality in the United States using the Theil index. A high positive value indicates an area with a larger share of income than of population, a negative value indicates a larger share of population than of income, and a value of zero indicates equal shares of population and income.

Note: The map does not show the Theil index of each area of the United States, but each area's contribution to the US Theil index (the Theil index itself is never negative, whereas individual contributions to it may be negative or positive).

## Theil index and Hoover index

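The formula for the Hoover index (also called the Robin Hood index) is:

$$H = \frac{1}{2} \sum_{i=1}^{N} \left| \frac{E_i}{E_\text{total}} - \frac{A_i}{A_\text{total}} \right|$$

where $E_i$ and $A_i$ are the income and the number of earners in quantile $i$, as detailed in the notation below.

A comparison of the Hoover index and the Theil index clarifies the meaning of both indices:
• For the Hoover index, the relative deviations in each quantile are summed up. Each deviation is weighted by its own sign (+1 or −1). Thus, the Hoover index is the simplest inequality measure. It has no normative foundations and does not refer to any models from physics or information theory.
• For the symmetrized Theil index, the relative deviations in each quantile are summed up as well, but each deviation is weighted by its relative information weight. Thus, the Theil index is an indicator not only of plain relative inequality; it also attempts to indicate how much attention that inequality can get.
The following formulas illustrate that difference in the categories symmetry and perceivability. For the formulas, a notation[7] is used in which the number $N$ of quantiles appears only as the upper bound of summations. Thus, inequities can be computed for quantiles with different widths $A_i$. For example, $E_i$ could be the income in quantile $i$ and $A_i$ could be the number (absolute or relative) of earners in quantile $i$. $E_\text{total}$ would then be the sum of the incomes of all $N$ quantiles and $A_\text{total}$ would be the sum of the income earners in all $N$ quantiles.

Computation of the (asymmetric) Theil index $T_T$ [8]:

$$T_T = \ln\frac{A_\text{total}}{E_\text{total}} - \frac{\sum_{i=1}^{N} E_i \ln\frac{A_i}{E_i}}{E_\text{total}}$$

With normalized data, $E_\text{total} = 1$ and $A_\text{total} = 1$ would apply. This would simplify the formula:

$$T_T = -\sum_{i=1}^{N} E_i \ln\frac{A_i}{E_i}$$

Computation of the symmetrized Theil index $T_s$:

$$T_s = \frac{1}{2} \sum_{i=1}^{N} \ln\frac{E_i}{A_i} \left( \frac{E_i}{E_\text{total}} - \frac{A_i}{A_\text{total}} \right)$$

The only difference between the Hoover index and the symmetrized Theil index is the operation applied to the deviation from equity $\frac{E_i}{E_\text{total}} - \frac{A_i}{A_\text{total}}$: the Hoover index takes its absolute value, whereas the symmetrized Theil index multiplies it by $\ln\frac{E_i}{A_i}$.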
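The three quantile-based measures can be compared side by side in a short sketch. It assumes that `E` holds the income of each quantile and `A` the number (or share) of earners in each quantile, with all entries strictly positive; the function names and the sample figures are illustrative only.

```python
import math

def hoover(E, A):
    """Half the sum of absolute deviations between income and earner shares."""
    e_tot, a_tot = sum(E), sum(A)
    return 0.5 * sum(abs(e / e_tot - a / a_tot) for e, a in zip(E, A))

def theil_asymmetric(E, A):
    """Maximum entropy minus actual entropy of the E-A system."""
    e_tot, a_tot = sum(E), sum(A)
    actual = sum(e * math.log(a / e) for e, a in zip(E, A)) / e_tot
    return math.log(a_tot / e_tot) - actual

def theil_symmetrized(E, A):
    """Deviations from equity weighted by ln(E_i / A_i) instead of by their sign."""
    e_tot, a_tot = sum(E), sum(A)
    return 0.5 * sum(
        math.log(e / a) * (e / e_tot - a / a_tot) for e, a in zip(E, A)
    )

# Two quantiles: 82.4% of earners hold 17.6% of income, and vice versa.
E = [17.6, 82.4]
A = [82.4, 17.6]
h = hoover(E, A)
t_asym = theil_asymmetric(E, A)
t_sym = theil_symmetrized(E, A)
print(h, t_asym, t_sym)
```

For this mirrored two-quantile example the asymmetric and symmetrized Theil indices coincide and are very close to 1, while the Hoover index is 0.648. In general the two Theil variants differ.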

## Pareto principle

### Understanding the range of the Theil index

Unlike the Gini index, the Theil index is not a measure on a closed scale between 0 and 1 (or 0% and 100%). This is a barrier that seems difficult to overcome even for famous scientists: Theil's index "is not a measure that is exactly overflowing with intuitive sense," wrote Amartya Sen in a book[5] in which his co-author James Foster nevertheless used the Theil index. One way to overcome this obstacle is the normalized[4] Theil index $1 - e^{-T}$.

The alternative is not to normalize the index and to use it as it is, owing to an interesting property of the index: for resource distributions described by only two quantiles, the Theil index is 0 for 50:50 distributions and reaches 1 at 82:18[9], which is very close to the distribution often referred to as the "Pareto principle". Higher inequities yield Theil indices above 1. This leads to an intuitive comparison:
• The Gini index is 0 if the distribution is completely equal. It is 1 at maximum inequality.
• The Theil index is 0 if the distribution is completely equal. It is 1 for an inequality slightly above the frequently cited 80:20 distribution.

### Computing the Theil index from an A:B distribution

A Theil index can be found for any A:B distribution in a society that is split into two quantiles. Let $a = A/(A+B)$ and $b = B/(A+B)$, so that $a + b = 1$: a share $a$ of the population holds a share $b$ of the resources, and the remaining share $b$ of the population holds a share $a$ of the resources. First the Gini index is calculated from the A:B distribution:

$$G = |a - b|$$

Then:

$$T = 2\,G\,\operatorname{artanh}(G)$$

For these computations the range 0 to 1 has to be used for $a$ and $b$ instead of 0% to 100%.
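The A:B computation can be sketched as follows. The sketch assumes the relations G = |a − b| and T = 2·G·artanh(G), and cross-checks them against a direct two-quantile Theil computation in which a share a of the people holds a share b of the resources and vice versa. The function names are hypothetical, and both A and B must be positive (artanh is undefined at G = 1).

```python
import math

def theil_from_ab(A, B):
    """Theil index of an A:B distribution via the Gini index (assumed relation)."""
    a = A / (A + B)
    b = B / (A + B)
    gini = abs(a - b)
    return 2.0 * gini * math.atanh(gini)

def theil_two_quantiles(A, B):
    """Direct check: sum of resource share * ln(resource share / population share)."""
    a = A / (A + B)
    b = B / (A + B)
    # quantile 1: population share a, resource share b; quantile 2: the reverse
    return b * math.log(b / a) + a * math.log(a / b)

pareto = theil_from_ab(80, 20)          # the frequently cited 80:20 case
near_one = theil_from_ab(82.4, 17.6)    # the case from the footnote example
even = theil_from_ab(50, 50)
print(pareto, near_one, even)
```

Under these assumptions the 80:20 distribution gives an index of about 0.83, and the 82.4:17.6 distribution an index very close to 1.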

## References

1. ^ Introduction to the Theil index from the University of Texas
2. ^ ISO/IEC DIS 2382-16:1996 Information theory
3. ^ [1] (Redundancy, Entropy and Inequality Measures)
4. ^ Juana Domínguez-Domínguez, José Javier Núñez-Velázquez: The Evolution of Economic Inequality in the EU Countries During the Nineties, 2005
5. ^ James E. Foster and Amartya Sen, 1996, On Economic Inequality, expanded edition with annexe, ISBN 0-19-828193-5
6. ^ [2]
7. ^ The notation using E and A follows the notation of a small calculus published by Lionnel Maugis: Inequality Measures in Mathematical Programming for the Air Traffic Flow Management Problem with En-Route Capacities (for IFORS 96), 1996
8. ^ (1) The first part of the formula is the maximum entropy of the E-A system. The second part (after the minus sign) is the actual entropy of the E-A system at a certain time. Such a difference is called redundancy (ISO/IEC DIS 2382-16, information theory).
(2) This version of Theil's formula can process quantiles with different widths $A_i$; $i$ only serves as the summation index.
(3) Besides comparing this formula mathematically with the formulas found in other sources, you can compare the results 1A and 1B yielded by this formula with the examples 1A and 1B given in The Theoretical Basics of Popular Inequality Measures (Travis Hale, University of Texas Inequality Project, 2003).
9. ^ Example: 82.4% of the people own 17.6% of all resources and 17.6% own 82.4% of all resources. For computation see also [3]