AIVPCoreyLee: Week 2: Audio, Image and Video Processing

Audio, Image and Video Processing

Week 2: Digital Signal Processing and More

Digital Signal Processing - Digital Signal Processing is the process of taking digitized real-world signals such as voice, audio, video, temperature and even pressure and using maths to manipulate them. A Digital Signal Processor is designed to perform the basic mathematical functions (add, subtract, multiply, divide) and more very quickly. Real-world signals must be processed so that the information they contain can be displayed, analyzed or converted to another type of signal, for example from analogue to digital. Analogue products detect signals such as sound, light, temperature and pressure (as mentioned above) and manipulate them.

Analogue to digital converters take the aforementioned real-world signals and turn them into a digitized form of 1s and 0s, i.e. binary. The DSP then takes the digitized signal and processes it. The result is fed back to the real world in one of two ways: either digitally, or in an analogue format via a digital to analogue converter. All of this manipulation and digitization happens at incredibly high speeds.
Source: http://www.hearingcarecentre.co.uk/Info_page_two_pic_2_det.asp?art_id=6361&sec_id=2937

Above is a diagram of Digital Signal Processing. There are analogue signals at either end and digital signals in between, with the analogue to digital converter and the digital to analogue converter on either side of the DSP.

As an example, consider how a DSP is used in an MP3 player. First there is the recording phase, where analogue audio from the real world is picked up via a receiver or other source. This signal is then converted to a digital signal by an analogue to digital converter. The signal is then passed to the DSP, where it is encoded as an MP3 and saved as a file to memory. In the playback phase this file is taken from memory, decoded by the DSP and converted back to an analogue signal through a digital to analogue converter. The signal can now be played through a speaker system. In more complex examples the DSP can perform other functions such as volume control, or even provide a user interface.

Filters - Electronic filters are electronic circuits that perform signal processing functions. They are specifically used to remove unwanted frequency components from a signal, to enhance wanted ones, or both. A low-pass filter is an electronic filter that passes low-frequency signals but reduces the amplitude of any signals with a frequency higher than a specific cutoff frequency.
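The low-pass behaviour described above can be sketched in software. This is only a minimal illustration, assuming a simple first-order (RC-style) filter; the function names, the 200 Hz cutoff and the 8,000 Hz sample rate are made up for the example and do not come from the lecture.

```python
import math

def one_pole_low_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order (RC-style) low-pass filter applied to a list of samples."""
    # Smoothing factor derived from the RC time constant of the chosen cutoff.
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)
    output = []
    previous = 0.0
    for x in samples:
        # Move the output a fraction of the way towards each new input sample.
        previous = previous + alpha * (x - previous)
        output.append(previous)
    return output

def sine(freq_hz, sample_rate_hz, n):
    """n samples of a unit-amplitude sine wave (helper for this example)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def peak(samples):
    return max(abs(s) for s in samples)

rate = 8000  # assumed sample rate for the example
low = one_pole_low_pass(sine(50, rate, 8000), cutoff_hz=200, sample_rate_hz=rate)
high = one_pole_low_pass(sine(2000, rate, 8000), cutoff_hz=200, sample_rate_hz=rate)

# Skip the first half of the output so the start-up transient is ignored.
low_peak = peak(low[4000:])    # 50 Hz tone, below the cutoff
high_peak = peak(high[4000:])  # 2000 Hz tone, above the cutoff
print(low_peak, high_peak)
```

The 50 Hz tone passes through almost unchanged, while the 2000 Hz tone comes out with a much smaller amplitude.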

The term smoothing refers to a function in which the data points of a signal are changed so that any points higher than the immediately adjacent points (most likely because of noise) are reduced, and any points lower than their neighbours are increased. Doing this creates a smoother signal. As long as the actual signal is smooth to begin with, the true signal will not be distorted much by smoothing but the noise will be reduced.
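One common way to do this is a moving average, where each point is replaced by the mean of itself and its neighbours. The sketch below assumes a three-point window; the function name and the sample data are invented for illustration.

```python
def moving_average(samples, window=3):
    """Replace each point with the mean of itself and its neighbours.

    window is assumed to be odd; the edges use whatever neighbours exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - half)
        end = min(len(samples), i + half + 1)
        neighbourhood = samples[start:end]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed

# A steadily rising signal with one noisy spike (the 9.0).
noisy = [0.0, 1.0, 2.0, 9.0, 4.0, 5.0, 6.0]
smoothed = moving_average(noisy)
print(smoothed)
```

The spike is pulled down towards its neighbours, while points that already sit on the smooth trend (such as the second one) keep their value.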

A sampled input signal must be band-limited (to below half the sampling rate) in order to prevent aliasing, where waves of a higher frequency are recorded as a lower frequency.
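To see which lower frequency a tone ends up recorded as, the folding can be worked out directly. This is a small sketch for a single pure tone; the function name is hypothetical and the 44,100 Hz rate is just a convenient example value.

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling at sample_rate_hz."""
    # Frequencies repeat every sample_rate_hz and mirror around half of it.
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(alias_frequency(1000, 44100))   # below half the sample rate: unchanged
print(alias_frequency(30000, 44100))  # above half the sample rate: folds down
```

A 30,000 Hz tone sampled at 44,100 samples per second would be recorded as 14,100 Hz, which is why the input is filtered before sampling.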

Pitch - A high frequency sound wave means a high pitch and a low frequency sound wave means a low pitch. It's important to note that when two sounds with a difference in frequency greater than 7 Hz are played at the same time, most people are capable of noticing the presence of a complex wave pattern, which is the result of interference and superposition of the two sound waves.
Source: http://www.physicsclassroom.com/class/sound/Lesson-2/Pitch-and-Frequency

It is important to know the pitch and duration of musical notes. The pitch of a note is shown by the vertical position of the note head. Each line of a stave and each space between the lines has a different pitch. These pitches are represented by the first seven letters of the alphabet. The pattern repeats itself after the letter G, where A is the next pitch.
Source: http://www.shredsandpatches.org.uk/abcdetails.html

The System Needs of a DSP - A Digital Signal Processor needs both input and output filtering, analogue to digital and digital to analogue conversion, and a Digital Processing Unit. There are three reasons why we use DSP: it is precise, robust and flexible.

The precision of a DSP system is limited only by the process of conversion at the input and output of a signal. Two things affect this: the sampling rate and the word length restrictions (sampling frequency and number of bits). Due to the increased operating speed and word length of modern appliances this is becoming less of a problem.

Because of logic level noise margins, digital systems are less likely to be affected by Electrical Noise (any form of electrical energy you don't want). You also avoid the need for impractical component values, such as the large capacitors (used to store electrical energy) and inductors (coils of wire) needed for low frequency filtering.

Programmability means that you can upgrade and expand the various operations a DSP provides without necessarily needing to change the hardware. You can create practical systems with adaptive characteristics to suit your needs.

Simple Sound Card Architecture -
Source: http://ronwayking.blogspot.co.uk/

How a Sound Card Works - Prior to the creation of the first sound card the only sound a computer could make was a beep. The PC could change the beep's frequency and duration but that was all. This beep served as a warning or a signal, and later developers used it to create music for early PC games using beeps of alternating pitches and lengths. In the 1980s a leap in sound capabilities brought out sound cards. A computer with a sound card can produce 3-D audio for games and surround sound for CDs and DVDs. The sound card also allows the capture and recording of sound from the real world.

Sounds and computer data are fundamentally different. Sounds are analogue signals made of waves travelling through matter. They are heard when these waves vibrate the eardrums. Computers operate digitally, with electrical impulses represented as 0s and 1s, i.e. binary. A sound card allows a computer to translate between digital and analogue information.

Analogue Wave - Below you see a picture of a graph that shows the analogue wave created by saying the word "hello". As you can see the vibrations are fast paced. The diaphragm picking up the sound is vibrating on the order of 1,000 oscillations per second. It is a complex waveform for a simple word.

A pure tone is a sine wave vibrating at a specific frequency. Below is a 500 Hz wave. That means 500 oscillations per second.

Source: http://www.audiocircuit.com/DIY/CD-Players/Article:How-CD-players-work
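A pure tone like the 500 Hz wave above is easy to generate numerically. The sketch below is illustrative only: the function names are invented, and the 44,100 samples-per-second rate is an assumption chosen to match CD audio.

```python
import math

def sine_wave(frequency_hz, duration_s, sample_rate_hz=44100):
    """Samples of a unit-amplitude sine wave at the given frequency."""
    n = round(duration_s * sample_rate_hz)
    return [math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
            for i in range(n)]

def count_cycles(samples):
    """Count upward zero crossings, one per oscillation."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a <= 0 < b)

tone = sine_wave(500, 1.0)   # one second of a 500 Hz tone
cycles = count_cycles(tone)
print(len(tone), cycles)
```

Counting the upward zero crossings in one second of samples gives 500, matching the 500 oscillations per second that define the tone.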

Digital Data - In any digital recording technology you aim to create a recording that is very similar to the original signal and that the recording sounds the same each and every time you play the track. In order to make this happen digital recording converts an analogue wave into a stream of numbers and then records the numbers as opposed to the wave. This conversion is done by an analogue to digital converter. When you play the music back the numbers are then converted back from digital data into analogue data by an appropriate digital to analogue converter. This wave is then amplified and sent to speakers to produce the sound. If done correctly the analogue wave that the DAC creates will be the exact same every time. This analogue wave will also be incredibly similar to the original wave as long as the first ADC sampled the wave at a high rate so as to produce accurate numbers.

Sampling a Signal - When the analogue wave is sampled by an analogue to digital converter you have control over two variables. The first variable is the sampling rate. This controls how many samples of the wave are taken per second. The second variable is the sampling precision. This controls how many different gradations are possible when taking the sample.
Source: http://www.mixrevu.com/?p=article&id=17

The gap between consecutive digital samples is the sampling period in seconds (T). The first sample line is at T1, the second at T2 and so on. The point where a sample line touches the analogue signal is the analogue sample.
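The two variables of sampling, how many samples are taken per second and how many gradations each sample can take, can be tried out in a few lines. This is a rough sketch rather than how any particular converter works; the function names, the 8,000 Hz rate and the bit counts are assumptions for the example.

```python
import math

def sample_and_quantise(signal, duration_s, sample_rate_hz, bits):
    """Sample signal(t) (values in -1..1) and round each sample to one of
    2**bits evenly spaced levels, returned as integer codes."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)            # spacing between adjacent levels
    n = round(duration_s * sample_rate_hz)
    codes = []
    for i in range(n):
        t = i / sample_rate_hz           # sample taken every T = 1/rate seconds
        value = max(-1.0, min(1.0, signal(t)))
        codes.append(round((value + 1.0) / step))
    return codes

def tone(t):
    """A 500 Hz test tone standing in for the analogue input."""
    return math.sin(2 * math.pi * 500 * t)

coarse = sample_and_quantise(tone, 0.01, 8000, bits=2)  # 4 gradations
fine = sample_and_quantise(tone, 0.01, 8000, bits=8)    # 256 gradations
print(sorted(set(coarse)), len(set(fine)))
```

With only 2 bits every sample is forced onto one of four levels, so the stored wave is a crude staircase; with 8 bits the same samples land on many more distinct levels.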

Comparing Analogue and Digital Audio - Analogue and digital audio transmit and store sound in different ways. Analogue audio uses positive and negative voltages. A microphone picks up the pressure waves of a sound and converts them into voltage changes to be transmitted on a wire. Higher pressure means positive voltage and lower pressure means negative voltage. When these voltages travel from a microphone they can be recorded onto tape as changes in magnetic strength or onto vinyl as changes in groove size. A speaker works in the same way as a microphone but in reverse, recreating the pressure wave via vibrations.

Digital audio uses zeroes and ones (binary). As opposed to analogue storage media like tape or records, computers store audio information digitally as binary. In digital storage the original waveform is broken up into individual snapshots called samples. This is known as sampling the audio, or analogue to digital conversion. When you record from a microphone to a computer, analogue to digital converters take the analogue signal and change it into digital samples (binary) that the computer can store and process.

Understanding Sample Rate - Sample rate is the number of digital samples taken of a signal each second. This rate determines the frequency range of an audio file. If the sample rate is high then the shape of the digital waveform will be close to the shape of the original analogue waveform. Low sample rates limit the range of frequencies that can be recorded, which makes for a poor representation of the original sound.

Below you will see two sample rates. The first is a low sample rate that differs from the original sound wave, and the second is a high sample rate that closely matches the original.

Source: http://www.basic-home-recording-studio.com/audio-sampling-rates.html

In order to reproduce a frequency the sample rate must be at least twice that frequency. CDs have a sample rate of 44,100 samples per second, so they can reproduce frequencies up to 22,050 Hz, which is slightly above the limit of human hearing at around 20,000 Hz.

Understanding Bit Depth - Bit depth determines the dynamic range of a track. When a sound wave is sampled each sample is given the amplitude value that is closest to the original wave's amplitude. Having a high bit depth means that more amplitude values are possible, which in turn produces a greater dynamic range, a lower noise floor and higher fidelity. If you want the best audio quality then remain at 32-bit resolution whilst transforming the audio and convert to a lower bit depth only for output.

A higher bit depth means a greater dynamic range. For example 8-bit audio can reach 48 dB whilst 32-bit can reach 192 dB (roughly 6 dB per bit). Bit depth equals amplitude resolution. A bit has a value of 0 or 1, so a single bit can represent two states (on or off) and two bits can represent four states. Each additional bit therefore doubles the number of representable states.
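The doubling of states per bit, and the rough figure of 6 dB per bit, can be checked with a short calculation. The function names here are made up for the example; the formula is the standard 20 log10 of the number of levels.

```python
import math

def amplitude_levels(bits):
    """Number of distinct amplitude values a sample of this bit depth can hold."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Theoretical dynamic range, 20 * log10(number of levels), in dB."""
    return 20 * math.log10(amplitude_levels(bits))

for bits in (1, 2, 8, 16, 32):
    print(bits, amplitude_levels(bits), round(dynamic_range_db(bits), 1))
```

Each extra bit adds just over 6 dB, which is where the approximate figures of 48 dB for 8 bits and 96 dB for 16 bits come from.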

Amplitude resolution is just as important as the sampling rate. When a waveform is sampled each sample is assigned the amplitude value that is closest to the original analogue wave. With a resolution of two bits each sample can have one of four possible amplitude positions. Higher bit depth means a lower noise floor and higher fidelity. For example CD quality sound has a bit depth of 16 bits, which means that each sample has 65,536 possible amplitude values, i.e. 2 to the power of 16.

Audio File Contents and Size - An audio file consists of a small header indicating sample rate and bit depth, followed by a long series of numbers, one for each sample. This can make for very large files. For example, at 44,100 samples per second and 16 bits per sample, a mono file requires about 86 KB per second, around 5 MB per minute. This figure doubles to about 10 MB per minute for a stereo file because it has two channels.
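The file size figures follow directly from multiplying out the sample rate, the bytes per sample and the number of channels. The sketch below ignores the small header and uses an invented function name.

```python
def audio_bytes(duration_s, sample_rate_hz=44100, bit_depth=16, channels=1):
    """Size in bytes of the uncompressed sample data (header not included)."""
    bytes_per_sample = bit_depth // 8
    return duration_s * sample_rate_hz * bytes_per_sample * channels

mono_second = audio_bytes(1)
mono_minute = audio_bytes(60)
stereo_minute = audio_bytes(60, channels=2)

print(mono_second / 1024)             # KB per second, mono
print(mono_minute / (1024 * 1024))    # MB per minute, mono
print(stereo_minute / (1024 * 1024))  # MB per minute, stereo
```

One second of mono audio is 88,200 bytes, which is the roughly 86 KB quoted above when a KB is taken as 1,024 bytes.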

MIDI files are much smaller. This is because MIDI is not digital audio (an exact recording); it is essentially sheet music for creating musical selections. A MIDI file can be as small as 10 KB per minute. It works by having the sound card take the information from the MIDI file and use a synthesizer to recreate each note on the corresponding instrument. Because every synthesizer sounds different, a MIDI file sounds different depending on which sound card plays it back. MIDI files can't record sounds that can't be resynthesized.

Sound Card Word Length - A sound card that supports a 16 bit word length for quantised sample values can represent 65536 different signal levels within the input voltage range of the card. For example, if the input connection has a voltage of +/- 5 V then the range is 10 V, and a 16 bit system will have a Quantisation Step Size of about 0.15 mV, from the formula Q = 10 V / 65536.
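The step size in this example can be confirmed with a one-line calculation. The function name is hypothetical; the 10 V range and 16 bit word length are the values from the text.

```python
def quantisation_step(voltage_range_v, bits):
    """Voltage difference between adjacent quantisation levels."""
    return voltage_range_v / (2 ** bits)

step_v = quantisation_step(10.0, 16)   # +/- 5 V input, 16 bit word length
step_mv = step_v * 1000
print(step_mv)
```

The result is a little over 0.15 mV, so any change in the input smaller than that cannot be told apart by a 16 bit card with this input range.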

Dynamic Range - The ratio of the largest signal amplitude to the smallest is called the Dynamic Range (DR). A 16 bit word length allows for 65536 signal levels, which means the dynamic range would be:

DR = 20 log10([Voltage Range] / [Quantisation Step Size]) dB

= 20 log10(2^16) dB

≈ 96 dB

(The human ear has a DR of more than 120 dB and as such CD Quality means there is some compromise.)

Below is a comparison of audio recording specifications for Analogue Tape, CD and then DAT.

Feature	Analogue Tape	CD	DAT
Frequency Response	20 Hz – 20 kHz ± 3 dB	20 Hz – 20 kHz ± 0.5 dB	20 Hz – 20 kHz ± 0.5 dB
Dynamic Range	70 dB	> 90 dB	> 90 dB
Signal to Noise Ratio	~ 70 dB	> 90 dB	> 90 dB
Harmonic Distortion	< 0.5 %	0.004%	< 0.05 %
Channel Separation	40 – 60 dB	> 90 dB	> 80 dB
Wow and Flutter	0.03 %	Non-detectable	Non-detectable
Robustness	Low	High	High

AIVPCoreyLee

Wednesday, 1 October 2014

Week 2: Audio, Image and Video Processing - Lecture 2
