Elias gamma coding

Elias gamma code is a universal code for encoding positive integers. It is most commonly used when coding integers whose upper bound cannot be determined beforehand.

To code a number:
  1. Write it in binary.
  2. Subtract 1 from the number of bits written in step 1 and prepend that many zeros.
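
As an illustration, this two-step procedure can be written as a short C++ function. The sketch below returns the codeword as a string of '0' and '1' characters; the name gammaEncode and the string representation are illustrative choices, not part of the definition of the code:

>
#include <string>

std::string gammaEncode(unsigned int n)              // n must be >= 1
{
    std::string binary;
    for (unsigned int v = n; v > 0; v >>= 1)         // step 1: write n in binary,
        binary.insert(binary.begin(), char('0' + (v & 1))); // most significant digit first
    return std::string(binary.size() - 1, '0') + binary;   // step 2: prepend zeros
}

// gammaEncode(5) returns "00101", matching the table below.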


An equivalent way to express the same process:
  1. Separate the integer into the highest power of 2 it contains (2^N) and the remaining N binary digits of the integer.
  2. Encode N in unary; that is, as N zeroes followed by a one.
  3. Append the remaining N binary digits to this representation of N.
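
The equivalence of the two formulations can be seen in a second sketch, which builds the same codeword arithmetically (again, the name gammaEncode2 is an illustrative choice):

>
#include <string>

std::string gammaEncode2(unsigned int n)             // n must be >= 1
{
    int N = 0;
    while ((n >> (N + 1)) != 0)                      // 2^N is the highest power of 2
        N++;                                         //  contained in n
    std::string code(N, '0');                        // N zeroes ...
    code += '1';                                     // ... followed by a one (N in unary)
    for (int a = N - 1; a >= 0; a--)                 // append the remaining N binary digits
        code += ((n >> a) & 1) ? '1' : '0';
    return code;
}

// gammaEncode2(17) returns "000010001", agreeing with gammaEncode above.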


The code begins:

  1  = 2^0 + 0 = 1
  2  = 2^1 + 0 = 010
  3  = 2^1 + 1 = 011
  4  = 2^2 + 0 = 00100
  5  = 2^2 + 1 = 00101
  6  = 2^2 + 2 = 00110
  7  = 2^2 + 3 = 00111
  8  = 2^3 + 0 = 0001000
  9  = 2^3 + 1 = 0001001
  10 = 2^3 + 2 = 0001010
  11 = 2^3 + 3 = 0001011
  12 = 2^3 + 4 = 0001100
  13 = 2^3 + 5 = 0001101
  14 = 2^3 + 6 = 0001110
  15 = 2^3 + 7 = 0001111
  16 = 2^4 + 0 = 000010000
  17 = 2^4 + 1 = 000010001

To decode an Elias gamma-coded integer:
  1. Read and count 0s from the stream until you reach the first 1. Call this count of zeroes N.
  2. Considering the one that was reached to be the first digit of the integer, with a value of 2^N, read the remaining N digits of the integer.
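
These two steps can likewise be sketched in C++. The function below reads one codeword from a string of '0' and '1' characters and advances pos past it (the interface is illustrative, and a well-formed codeword is assumed):

>
#include <cstddef>
#include <string>

unsigned int gammaDecode(const std::string& bits, std::size_t& pos)
{
    int N = 0;
    while (bits[pos++] == '0')                       // step 1: count zeroes up to
        N++;                                         //  the first 1
    unsigned int value = 1;                          // step 2: that 1 is the leading
    for (int a = 0; a < N; a++)                      //  digit, worth 2^N; read the
        value = (value << 1)                         //  remaining N digits after it
              | (bits[pos++] == '1' ? 1u : 0u);
    return value;
}

// std::size_t pos = 0; gammaDecode("00101", pos) returns 5.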


Gamma coding is used in applications where the largest encoded value is not known ahead of time, or to compress data in which small values are much more frequent than large values.

Generalizations

Gamma coding does not code zero or negative integers. One way of handling zero is to add 1 before coding and then subtract 1 after decoding. Another way is to prefix each nonzero code with a 1 and then code zero as a single 0. One way to code all integers is to set up a bijection, mapping integers (0, 1, -1, 2, -2, 3, -3, ...) to (1, 2, 3, 4, 5, 6, 7, ...) before coding.
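
The bijection mentioned above is simple to implement; a sketch follows (the function names are illustrative):

>
unsigned int toPositive(int n)                       // 0 -> 1, 1 -> 2, -1 -> 3,
{                                                    //  2 -> 4, -2 -> 5, ...
    return (n > 0) ? 2 * (unsigned)n                 // positive n -> even values
                   : (unsigned)(1 - 2 * n);          // zero and negative n -> odd values
}

int fromPositive(unsigned int m)                     // the inverse mapping, applied
{                                                    //  after decoding
    return (m % 2 == 0) ? (int)(m / 2)
                        : -(int)((m - 1) / 2);
}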

Example code

Encode
>
void eliasGammaEncode(char* source, char* dest)
{
    IntReader intreader(source);
    BitWriter bitwriter(dest);
    while (intreader.hasLeft())
    {
        int num = intreader.getInt();            // num must be a positive integer
        int l = 0;
        while ((num >> (l + 1)) != 0)            // l = floor(log2(num)), the number
            l++;                                 //  of bits after the leading 1
        for (int a = 0; a < l; a++)
            bitwriter.putBit(false);             // l zeros announce how many bits follow
        bitwriter.putBit(true);                  // the 1 that ends the unary prefix
        for (int a = l - 1; a >= 0; a--)         // write the remaining l bits as plain
            bitwriter.putBit(((num >> a) & 1) != 0); //  binary, most significant first
    }
    intreader.close();
    bitwriter.close();
}


Decode

>
void eliasGammaDecode(char* source, char* dest)
{
    BitReader bitreader(source);
    BitWriter bitwriter(dest);
    while (bitreader.hasLeft())
    {
        int numberBits = 0;                      // reset for each codeword
        while (bitreader.hasLeft() && !bitreader.getBit())
            numberBits++;                        // count zeros until the first 1
        int current = 1;                         // the 1 just read is the leading
                                                 //  bit, worth 2^numberBits
        for (int a = 0; a < numberBits; a++)     // read the remaining numberBits
            current = (current << 1)             //  bits, most significant first
                    | (bitreader.getBit() ? 1 : 0);
        for (int a = 0; a < 32; a++)             // write the decoded value out as a
            bitwriter.putBit((current & (1 << a)) != 0); // plain 32-bit number
    }
    bitreader.close();
    bitwriter.close();
}

See also

  Elias delta coding
  Elias omega coding



This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.