Revision as of 00:28, 5 December 2004
Formal Definition
The formal definition of "dither" is "a state of indecision" or "to be nervously irresolute in acting or doing."
Dither is a form of noise added to digital data for the purpose of minimizing the effects of quantization error. It most often surfaces in the field of digital audio, but is used in many different fields where digital processing and analysis are used, especially waveform analysis. These include digital video, digital photography, seismology, RADAR, weather prediction systems and much more.
The premise is that quantization and re-quantization of digital data yield error. If that error is repeating and correlated to the signal, then the result is repeating, cyclical and mathematically determinable error in the signal. In some fields, especially where the receptor is sensitive to such artifacts, cyclical errors yield undesirable artifacts. In these fields we use dither to produce less determinable artifacts. The field of audio is a primary example of this: the human ear functions much like a Fourier Transform, wherein it hears individual frequencies. The ear is therefore very sensitive to distortion, or additional frequency content that "colors" the sound differently. The ear is far less sensitive to random noise at all frequencies, however, so dither is used in digital audio to turn correlated quantization error into uncorrelated quantization error, or noise.
Dither in Everyday Life
Any time you shake a cookie sheet to get the tater tots to slide slowly off of the sheet, you are "dithering" the cookie sheet: adding dither to allow for smoother, more predictable behavior.
In traditional popcorn making techniques people would often put popcorn in a pan and shake the pan in order to keep the heat even throughout the kernels. This, also, is an example of adding "dither" in everyday life.
In the 1940s (as the story goes) the British Naval Air Fleet was having difficulty calibrating their navigation systems - primitive systems with cogs and gears. While the planes were on the ground the systems would not calibrate because the mechanisms would choke and stick. While the planes were in the air they would work much more fluidly because the vibration of the engines caused more fluid motion. The British ended up using engines to calibrate their equipment while on the ground in order to get more accurate results. This was the first known use of intentionally adding "dither" and calling it such.
Dither in Digital Audio
The final version of audio that goes onto a compact disc contains only 16 bits, but throughout the audio editing process the digital data grows in bit depth as computation occurs. The more math we do, the larger the samples grow in bit depth - just like adding, multiplying, or dividing decimal numbers. In the end, the digital data must be returned to 16 bits for pressing onto a CD and distributing.
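As a rough illustration of how samples grow during computation, the sketch below multiplies a 16-bit sample by a gain of roughly 0.8 stored as a fixed-point coefficient. The Q15 coefficient format and the variable names are assumptions made for this example only; real editing software may use other internal formats.

```python
# Hypothetical fixed-point gain stage (the Q15 format is an assumption).
sample = 32767                    # largest positive 16-bit sample value
gain_q15 = round(0.8 * 32768)     # 0.8 stored with 15 fractional bits

product = sample * gain_q15       # exact result of the multiplication
print("bits needed for the product:", product.bit_length())

# Returning to the original scale means discarding the low 15 bits.
kept = product >> 15              # the part that fits the original word
discarded = product & 0x7FFF      # the excess bits that must be dealt with
print("kept:", kept, "discarded:", discarded)
```

The discarded portion is exactly the information that truncation, rounding, or dithering has to deal with when the data returns to 16 bits.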
There are multiple ways in which one can return the data to 16 bits. One can, for example, simply lop off the excess bits, which is called truncation. One can also round the excess bits to the nearest value. Each of these methods, however, results in predictable and determinable errors in the result. Take, for example, a waveform that consists of the following values:
1 2 3 4 5 6 7 8
If we reduce our waveform by, say, 20% then we end up with the following values:
.8 1.6 2.4 3.2 4.0 4.8 5.6 6.4
If we truncate these values we end up with the following data:
0 1 2 3 4 4 5 6
If we round these values we end up with the following data:
1 2 2 3 4 5 6 6
Any waveform comprised of the original values, then processed by multiplying each value by .8, would contain errors in the result, and those errors would repeat. A repeating sine wave quantized to the original sample values, for example, would experience the same error every time its supposed value was "3," in that the truncated result would be off by .4. Any time the supposed value was "5" the error after processing and truncation would be 0. Therefore, the error amount would change repeatedly as the values change. The result is cyclical behavior in the error, which manifests itself as additional frequency content on the waveform. The ear hears this as 'distortion,' or the presence of additional frequency content.
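The repeating nature of the truncation error can be checked with a short sketch. It uses the eight example values as one cycle of a repeating waveform (a ramp rather than a sine wave, purely for simplicity); the function and variable names are made up for the illustration.

```python
import math

gain = 0.8                               # the 20% reduction from the example
one_cycle = [1, 2, 3, 4, 5, 6, 7, 8]     # the example sample values
waveform = one_cycle * 3                 # the same cycle, played three times

def truncation_error(sample):
    """Error left behind when the scaled sample is truncated."""
    scaled = round(sample * gain, 1)     # round(..., 1) only hides float noise
    return round(scaled - math.floor(scaled), 1)

errors = [truncation_error(s) for s in waveform]

print("errors for one cycle:", errors[:8])
# The error sequence repeats exactly whenever the waveform repeats.
print("error repeats each cycle:", errors[:8] == errors[8:16] == errors[16:24])
```

Because the error depends only on the sample value, it is locked to the signal and has the same period as the signal.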
We cannot avoid error in this process. Taking a 2 digit number (4.8) and turning it into a 1 digit number (4 or 5) is going to result in error, and that is unavoidable. What we want to do, however, is create a system wherein that error does not repeat as the values repeat.
A plausible solution would be to take the 2 digit number (say, 4.8) and round it one direction or the other. For example, we could round it to 5 one time and then 4 the next time, and so on. This would make the long-term average 4.5 instead of 4, so that over the long term the value is closer to its actual value. This, on the other hand, still results in determinable (though more complicated) error. Every other time the value 4.8 comes up the result is an error of .2, and the other times it is -.8. This still results in repeating, quantifiable error.
Another plausible solution would be to take 4.8 and round it so that four times out of five it rounded up to 5, and the other time it rounded to 4. This would average out to exactly 4.8 over the long term. Unfortunately, however, it still results in repeatable and determinable errors, and those errors still manifest themselves as distortion to the ear.
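Both fixed rounding patterns can be tried out in a few lines. This is only a sketch of the idea described above, with hypothetical helper names; it shows that the averages come out as stated while the error sequence still repeats.

```python
from itertools import cycle, islice

true_value = 4.8

def repeat_pattern(pattern, count):
    """Apply a fixed rounding pattern over and over, count times."""
    return list(islice(cycle(pattern), count))

alternating = repeat_pattern([5, 4], 20)            # up, down, up, down, ...
four_of_five = repeat_pattern([5, 5, 5, 5, 4], 20)  # up four times, down once

for name, outputs in (("alternating", alternating),
                      ("four of five", four_of_five)):
    mean = sum(outputs) / len(outputs)
    errors = [round(o - true_value, 1) for o in outputs]
    print(name, "mean:", mean, "first errors:", errors[:10])
```

The second pattern averages 4.8, yet its errors cycle every five samples, which is the kind of predictable behavior the ear picks up as distortion.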
This leads to the dither solution. Rather than predictably rounding up or down in a repeating pattern, what if we rounded up or down in a random pattern? If we came up with a way to randomly toggle our results between 4 and 5 so that 80% of the time it ended up on 5 then we would average 4.8 over the long run but would have random, unrepeating error in the result. This is done through dither.
We generate a series of random numbers between 0 and .9 (ex: .6, .4, .5, .3, .7, etc.) and add one of them to each result of our equation before truncating. Two times out of ten the result will truncate back to 4 (if 0 or .1 is added to 4.8) and the rest of the time it will truncate to 5, so each individual sample has a 20% chance of becoming 4 and an 80% chance of becoming 5. Over the long haul the results average 4.8 and the quantization error is random, or noise. This noise is less offensive to the ear than the determinable distortion that would result otherwise.
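The procedure can be sketched as follows. Values are held in tenths (48 stands for 4.8) so that the arithmetic is exact; that representation, the fixed seed, and the function name are choices made for this illustration, not part of any particular dither implementation.

```python
import random

def dither_and_truncate(value_tenths, rng):
    """Add a random 0.0 to 0.9 (held in tenths), then truncate."""
    dither_tenths = rng.randrange(10)            # 0..9 means 0.0..0.9
    return (value_tenths + dither_tenths) // 10  # drop the fractional digit

# Every possible dither value for 4.8: two of the ten land on 4.
all_cases = [(48 + d) // 10 for d in range(10)]
print("all ten cases:", all_cases)

rng = random.Random(0)                           # fixed seed, repeatable demo
outputs = [dither_and_truncate(48, rng) for _ in range(100_000)]
average = sum(outputs) / len(outputs)
print("long-run average:", round(average, 3))
print("share of 5s:", round(outputs.count(5) / len(outputs), 3))
```

Each output is either 4 or 5, but which one appears at any given sample is unpredictable, so the error no longer follows the signal.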
When to Add Dither
Dither must be added before any quantization or re-quantization process in order to prevent non-linear behavior (distortion). The process still produces distortion, but it is now the noise that is being distorted, so the result is effectively noise. Any bit-reduction process should add dither to the waveform before the reduction is performed.
Different Types of Dither
RPDF stands for "Rectangular Probability Density Function," equivalent to a roll of a die. Every number has the same probability of surfacing.
TPDF stands for "Triangular Probability Density Function," equivalent to a roll of two dice. The sums have different probabilities of surfacing:
1/1 = 2
1/2 2/1 = 3
1/3 2/2 3/1 = 4
1/4 2/3 3/2 4/1 = 5
1/5 2/4 3/3 4/2 5/1 = 6
1/6 2/5 3/4 4/3 5/2 6/1 = 7
2/6 3/5 4/4 5/3 6/2 = 8
3/6 4/5 5/4 6/3 = 9
4/6 5/5 6/4 = 10
5/6 6/5 = 11
6/6 = 12
7 is far more likely to surface than 2 or 12, and the relationship between these probabilities is said to be "triangular."
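The dice analogy is easy to verify by counting. The sketch below enumerates every outcome of one die and of two dice and prints a crude histogram; the names are illustrative only.

```python
from collections import Counter
from itertools import product

faces = range(1, 7)

one_die = Counter(faces)                                      # rectangular
two_dice = Counter(a + b for a, b in product(faces, repeat=2))  # triangular

print("one die:")
for face in faces:
    print(f"{face:2d} {'#' * one_die[face]}")

print("two dice:")
for total in range(2, 13):
    print(f"{total:2d} {'#' * two_dice[total]}")
```

One die gives a flat row of equal counts, while the two-dice counts rise by one per step up to 7 and then fall by one per step, which is the triangle the name refers to.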
Gaussian PDF is equivalent to a roll of an infinite number of dice. The probabilities of the results follow a bell-shaped, or Gaussian, curve. Gaussian PDF dither is the closest to the sound of natural atmospheric noise, tape hiss, etc.
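To see the shape drift toward a bell curve as dice are added, one can count the ways of reaching each total for a growing number of dice. This is a small counting sketch with a hypothetical helper name; it does not generate dither itself.

```python
def dice_sum_counts(n_dice, sides=6):
    """Return {total: number of ways} for the sum of n_dice fair dice."""
    counts = {0: 1}
    for _ in range(n_dice):
        next_counts = {}
        for total, ways in counts.items():
            for face in range(1, sides + 1):
                next_counts[total + face] = next_counts.get(total + face, 0) + ways
        counts = next_counts
    return counts

for n_dice in (1, 2, 4, 8):
    counts = dice_sum_counts(n_dice)
    peak = max(counts.values())
    print(f"{n_dice} dice:")
    for total in sorted(counts):
        bar = "#" * max(1, round(40 * counts[total] / peak))
        print(f"{total:3d} {bar}")
```

With one die the bars are flat, with two they form a triangle, and with four or eight the sides curve into the familiar bell shape.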
Colored Dither is a term sometimes used for dither that has been filtered so that it is no longer white noise. Some dither algorithms use noise that has more energy in the higher frequencies so as to lower the energy in the critical audio band.
Noiseshaping is not actually dither, but rather a feedback process that has dither within it. It is used for the same purposes.
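One common textbook form of such a feedback process is a first-order error-feedback quantizer, sketched below. It is offered only as an assumed example of "a feedback process that has dither within it"; practical noise shapers typically use higher-order filters, and all names here are illustrative.

```python
import math
import random

def tpdf(rng):
    """Triangular dither: the sum of two uniform values, range -1 to +1."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def noise_shaped_quantize(samples, rng):
    """First-order error feedback around a dithered rounding quantizer."""
    outputs = []
    previous_error = 0.0
    for x in samples:
        v = x - previous_error                 # subtract the last error
        y = math.floor(v + tpdf(rng) + 0.5)    # add dither, round to a step
        previous_error = y - v                 # error to feed back next time
        outputs.append(y)
    return outputs

rng = random.Random(0)                         # fixed seed, repeatable demo
samples = [4.8] * 1000
outputs = noise_shaped_quantize(samples, rng)
print("sum of inputs :", round(sum(samples), 3))
print("sum of outputs:", sum(outputs))
```

Because each error is subtracted from the following sample, the errors cancel over time and the running totals of input and output stay within a step or two of each other. The error that remains is pushed toward the higher frequencies.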
Which Dither to Use
If the signal being dithered is to undergo any further processing at all then it should be processed with TPDF dither that has an amplitude of two quantization steps (so that the dither values computed range from, say, -1 to +1, or 0 to 2). If colored dither is used at these intermediate processing stages then the frequency content can "bleed" into other, more noticeable frequency ranges and become distractingly audible.
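A minimal sketch of TPDF dither with an amplitude of two quantization steps follows. It builds the triangular value from two uniform random numbers, so the dither ranges from -1 to +1, and it rounds rather than truncates; the function names and the choice of rounding are assumptions made for the illustration.

```python
import math
import random

def tpdf_dither(rng):
    """Sum of two uniform values of one step each: range -1 to +1 steps."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def requantize(samples, rng):
    """Add TPDF dither to each sample, then round to the nearest step."""
    return [math.floor(s + tpdf_dither(rng) + 0.5) for s in samples]

rng = random.Random(1)                     # fixed seed, repeatable demo
samples = [4.8] * 100_000
outputs = requantize(samples, rng)

print("values produced:", sorted(set(outputs)))
print("long-run average:", round(sum(outputs) / len(outputs), 3))
```

The individual outputs spread over a few neighboring steps, but the long-run average stays at the original value.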
If the signal being dithered is to undergo no further processing - it is being dithered to its final result for distribution - then colored dither or noiseshaping are appropriate, and can effectively lower the audible noise level by putting most of that noise in areas where it is less critical.
Related Topics
Digital Audio
Quantization (signal processing)
External Links
Much of the research in the field of dither for (at absolute least) the audio industry was done by Lipshitz and Vanderkooy out of the University of Waterloo. Other well-written papers on the subject at a more elementary level are available by:
Aldrich, Nika. "Dither Explained"
Katz, Bob. "The Secrets of Dither"
Both Nika Aldrich and Bob Katz are esteemed experts in the field of digital audio and have books available as well, each of which is far more comprehensive in its explanations:
Aldrich, Nika. "Digital Audio Explained"
Katz, Bob. "Mastering Audio"