Golomb coding

Golomb coding is a form of entropy encoding invented by Solomon W. Golomb that is optimal for alphabets following geometric distributions, that is, when small values are vastly more common than large values. If negative values are present in the input stream, an overlap-and-interleave scheme is used, where all non-negative inputs are mapped to even numbers (x → 2x) and all negative numbers are mapped to odd numbers (x → 2|x| − 1).
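A minimal sketch of this overlap-and-interleave (zigzag) mapping in C++ follows; the function names are this example's own, not part of any standard API:

// Map a signed value onto the non-negative integers so that values close
// to zero (of either sign) get small codes: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
unsigned zigzagEncode(int x)
{
    return (x >= 0) ? 2u * (unsigned)x         // non-negative -> even
                    : 2u * (unsigned)(-x) - 1; // negative     -> odd
}

int zigzagDecode(unsigned z)
{
    return (z % 2 == 0) ? (int)(z / 2)         // even -> non-negative
                        : -(int)((z + 1) / 2); // odd  -> negative
}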

Golomb coding can also be used (as in Golomb's original article) to encode an alphabet of two symbols, where one is more likely than the other. In this case it can be thought of as a kind of run-length encoding.

Rice coding is a special case of Golomb coding, independently invented by, and named after, Robert F. Rice. It is equivalent to Golomb coding where the tunable parameter is a power of two. This makes it extremely efficient for use on computers, since the division operation becomes a bit shift and the remainder operation becomes a bitmask operation.

Rice coding is used as the entropy encoding stage in a number of lossless image compression and audio data compression methods.

Overview

Golomb coding uses a tunable parameter M to divide an input value N into two parts: q, the result of a division by M, and r, the remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding. When M = 1, Golomb coding is equivalent to unary coding.



Golomb-Rice codes can be thought of as codes that indicate a number by the position of the bin (q) and the offset within the bin (r): under Golomb-Rice parameter M, the integer N falls into bin number q = ⌊N / M⌋ at offset r = N − qM.

Formally, the two parts are given by the following expressions, where N is the number being encoded:

    q = ⌊N / M⌋  and  r = N − qM

The final result looks like:

    <unary code for q><truncated binary code for r>

Note that r can take a varying number of bits: it is always exactly b bits for a Rice code, and switches between b − 1 and b bits for a Golomb code (i.e., when M is not a power of 2). Let b = ⌈log₂(M)⌉. If r < 2^b − M, then use b − 1 bits to encode r; if r ≥ 2^b − M, then use b bits to encode r + 2^b − M. Clearly, if M is a power of 2, then b = log₂(M) and we can encode all values of r with b bits.

The parameter M is a function of the corresponding Bernoulli process, which is parameterized by p = P(X = 0), the probability of success in a given Bernoulli trial. M and p are related by these inequalities:

    (1 − p)^M + (1 − p)^(M+1) ≤ 1 < (1 − p)^(M−1) + (1 − p)^M

The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were possible to compute the Huffman code.
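As an illustrative sketch (not part of the original article), the smallest M satisfying the left-hand inequality can be found by direct search:

#include <cmath>

// Given the probability p of success in each Bernoulli trial, return the
// smallest M with (1-p)^M + (1-p)^(M+1) <= 1, per the inequality above.
int golombParameter(double p)
{
    double q = 1.0 - p;  // probability of failure
    int M = 1;
    while (std::pow(q, M) + std::pow(q, M + 1) > 1.0)
        M++;
    return M;
}

For example, golombParameter(0.5) returns 1, consistent with the earlier observation that Golomb coding with M = 1 reduces to unary coding.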

Simple Algorithm

Note that this is the Rice-Golomb encoding, where the remainder code uses simple truncated binary encoding, also named "Rice coding" (other variable-length binary encodings, like arithmetic or Huffman encodings, are possible for the remainder codes, if the statistical distribution of remainder codes is not flat, and notably when not all possible remainders after the division are used). In this algorithm, if the M parameter is a power of 2, it becomes equivalent to the simpler Rice encoding. A self-contained code sketch of these steps follows the list below.
  1. Fix the parameter M to an integer value.
  2. For N, the number to be encoded, find
     a. quotient = q = ⌊N / M⌋
     b. remainder = r = N mod M
  3. Generate the codeword
     a. The code format: <quotient code><remainder code>, where
     b. Quotient code (in unary coding)
        i. Write a q-length string of 1 bits
        ii. Write a 0 bit
     c. Remainder code (in truncated binary encoding)
        i. If M is a power of 2, code the remainder in plain binary using log₂(M) bits (Rice code).
        ii. If M is not a power of 2, set b = ⌈log₂(M)⌉.
           1. Code the first 2^b − M values of r as plain binary using b − 1 bits.
           2. Code the remaining values of r as the number r + 2^b − M in plain binary using b bits.
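The sketch below implements these steps for both the power-of-2 and the general case; it returns the codeword as a string of '0'/'1' characters for readability instead of packing bits, and the helper name is this example's own:

#include <string>

// Encode the non-negative integer N with Golomb parameter M, following
// the steps above; the codeword is returned as a text string of bits.
std::string golombCodeword(unsigned N, unsigned M)
{
    unsigned q = N / M;                // quotient
    unsigned r = N % M;                // remainder

    std::string out(q, '1');           // quotient in unary: q one-bits...
    out += '0';                        // ...terminated by a single zero

    unsigned b = 0;                    // b = ceil(log2(M))
    while ((1u << b) < M)
        b++;
    unsigned cutoff = (1u << b) - M;   // first 'cutoff' remainders take b-1 bits

    unsigned value, bits;
    if (r < cutoff) { value = r;          bits = (b > 0) ? b - 1 : 0; }
    else            { value = r + cutoff; bits = b; }

    for (unsigned i = bits; i-- > 0; ) // remainder in plain binary, MSB first
        out += ((value >> i) & 1) ? '1' : '0';
    return out;
}

For instance, golombCodeword(42, 10) returns "11110010", matching the worked example in the next section.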

Example

Set M = 10. Thus b = ⌈log₂(10)⌉ = 4. The cutoff is 2^b − M = 16 − 10 = 6.

Encoding of quotient part

  q      output bits
  0      0
  1      10
  2      110
  3      1110
  4      11110
  5      111110
  6      1111110
  ...    111...10
Encoding of remainder part

  r      offset    binary    output bits
  0      0         0000      000
  1      1         0001      001
  2      2         0010      010
  3      3         0011      011
  4      4         0100      100
  5      5         0101      101
  6      12        1100      1100
  7      13        1101      1101
  8      14        1110      1110
  9      15        1111      1111


For example, with a Rice-Golomb encoding using parameter M = 10, the decimal number 42 would first be split into q = 4 and r = 2, and would be encoded as qcode(q),rcode(r) = qcode(4),rcode(2) = 11110,010. The separating comma does not need to be encoded in the output stream, because the 0 at the end of the q code already indicates where q ends and r begins; both the qcode and the rcode are self-delimited.

Example code

Note: this basic code assumes that the M parameter is a power of 2; it does not implement the case where truncated binary encoding of the division remainders is preferable (when M is not a power of 2, as in the previous example).

Encoding

void golombEncode(char* source, char* dest, int M)
{
    IntReader intreader(source);
    BitWriter bitwriter(dest);
    int b = (int)log2(M);  // number of remainder bits; M must be a power of 2
    while (intreader.hasLeft())
    {
        int num = intreader.getInt();
        int q = num / M;                  // quotient
        for (int i = 0; i < q; i++)
            bitwriter.putBit(true);       // write q ones (unary quotient)
        bitwriter.putBit(false);          // write the terminating zero
        for (int i = 0; i < b; i++)       // write the remainder (num % M)
            bitwriter.putBit(((num >> i) & 1) != 0);  // as b bits, LSB first
    }
    bitwriter.close();
    intreader.close();
}

Decoding

void golombDecode(char* source, char* dest, int M)
{
    BitReader bitreader(source);
    IntWriter intwriter(dest);
    int b = (int)log2(M);  // number of remainder bits; M must be a power of 2
    while (bitreader.hasLeft())
    {
        int q = 0;
        while (bitreader.getBit())        // read the unary quotient
            q++;                          // (potentially dangerous with malformed files)
        int nr = 0;
        for (int a = 0; a < b; a++)       // read the b remainder bits, LSB first
            if (bitreader.getBit())
                nr += 1 << a;
        nr += q * M;                      // combine quotient and remainder
        intwriter.putInt(nr);             // write out the decoded value
    }
    bitreader.close();
    intwriter.close();
}

Use for Run-Length Encoding

Given an alphabet of two symbols, or a set of two events, P and Q, with probabilities p and (1 − p) respectively, where p ≥ 1/2, Golomb coding can be used to encode runs of zero or more P's separated by single Q's. In this application, the best setting of the parameter M is the nearest integer to −1/log₂(p). When p = 1/2, M = 1, and the Golomb code corresponds to unary (n ≥ 0 ones followed by a zero codes for n P's followed by a Q).
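A small sketch of this parameter choice (illustrative, not from the original article); the run length n of P's before each Q can then be encoded with an ordinary Golomb code for n, for example by reusing the golombCodeword() helper sketched earlier:

#include <cmath>

// Pick M for run-length use as the nearest integer to -1/log2(p),
// where p >= 1/2 is the probability of the more likely symbol P.
int runLengthParameter(double p)
{
    int M = (int)std::lround(-1.0 / std::log2(p));
    return (M < 1) ? 1 : M;
}

For p = 1/2 this gives M = 1, the unary case described above.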

Applications

The FLAC audio codec uses Rice coding to represent linear prediction residuals, because these residuals follow a near-geometric distribution (small residuals are more frequent than large ones) and can therefore be encoded with fewer bits using a predefined Rice-Golomb encoding, without requiring more complex variable-length encodings like Huffman or arithmetic coding. However, this assumption fails for a simple sinusoidal signal, because the differential residuals themselves form a sinusoidal signal whose values do not follow a geometric distribution (the highest and lowest residual values occur with similarly high frequency; only the median positive and negative residuals occur less often).

Apple's ALAC audio codec uses modified Rice coding after its Adaptive FIR filter.

The Golomb-Rice coder is used in the entropy coding stage of Rice-algorithm-based lossless image codecs.

[Figure: compression ratio graph from one such experiment.]

In these codecs, the progressive spatial differential data yield an alternating sequence of positive and negative values around 0, which are remapped to positive-only integers (by doubling the absolute value and adding one if the input is negative), after which Rice-Golomb coding is applied with a varying divisor that remains small.
Note that in those results, the Rice coding may create very long sequences of one-bits for the quotient; for practical reasons, it is often necessary to limit the total run length of 1 bits, so a modified version of the Rice-Golomb encoding replaces a long string of one-bits by an encoding of its length using a recursive Rice-Golomb encoding; this requires reserving some values in addition to the initial divisor k to allow the necessary distinction.
