Audio and the Web

Nocturne

Permissions
© Secret Garden from the Album Songs from a Secret Garden

Preparation of audio for the Web follows the same principles as for other media. The objective is:

Make the file size of the audio recording small enough for quick transport across the Internet while maintaining an acceptable level of audio quality.

As with other media, such as still images and video, codecs have been developed to meet this objective.

To complete the process of preparing audio for the Web, the audio source must be digitized. The next few sections review the concept of digitization. For another view of audio digitization, see this page in your Audacity manual (it came with Audacity when you downloaded it):

Audacity/help/manual/man/digital_audio.html

The waveform images in the following text are taken from this Audacity manual.

Analog Audio

Remember that in nature, signals received by our human senses, or by a sensory recording device, arrive in one form and are then processed into another, analogous form. With respect to audio, signals arrive at our eardrums in the form of sound waves. Our brains then process these sound waves and produce what we perceive as sound in a manner that is analogous to the sound waves that arrive. For instance, the more pressure (the amplitude, or height of the waves) the sound waves exert on our eardrums, the louder the sound our brains make us perceive. In other words, the loudness of the sound we hear is analogous to the pressure exerted on our eardrums by incoming sound waves. Similarly, the quicker the sound wave changes (its frequency, seen when the peaks are closer together for a given length of time), the higher the sound that our brains make us perceive; that is, the pitch of the sound we hear is analogous to the frequency of the sound wave reaching our eardrums, with higher frequencies generating higher perceived pitch. This all occurs through electrochemical reactions in our brains that are analogs of the sound waves reaching our ears.

Illustration of Sound Wave

Permissions
© Audacity Open Source Project, Found in Audacity/help/manual/man/digital_audio.html

What this all means is that our brains process external signals and represent those signals in some analogous fashion. Early sound recording devices did the same thing. For instance, vinyl records are molded with grooves of varying depth that are analogous to the sound being recorded. Playing a vinyl record causes a needle to vibrate as it tracks the grooves in the record. These vibrations are converted into an electrical signal that is passed to speakers, which are forced to vibrate in a way that generates sound waves analogous to the changes in the groove, thus reproducing the sound waves originally recorded.

Digital Audio

Digital renderings of sensory input are different from analog renderings. In analog renderings, the signals are processed in a smooth fashion. That is, as the strength (amplitude) of a sound wave changes, an analog rendering of it changes smoothly along with the sound wave. For example, when music is played that goes from light sound wave pressure to strong sound wave pressure, and/or from low frequency sound waves to high frequency sound waves, our brains respond with a smooth transition from a perception of quiet sounds to loud sounds and/or low sounds to high sounds.

Digitizing a sound wave results in a digital representation of the sound wave. Digitizing is done by sampling the sound wave at regular intervals and assigning to each sample a number that represents how high (loud) the sound wave is. If enough samples are taken per second of the sound waves reaching our digital recording device, the sound wave can be reconstructed fairly accurately from the individually sampled numbers. If samples are taken often enough, the frequency of the sound wave can also be captured in terms of how fast the sampled numbers change.

Illustration of Sound Wave that has been Sampled

Permissions
© Audacity Open Source Project, Found in Audacity/help/manual/man/digital_audio.html
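The sampling process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_sine`, the 440 Hz tone, the 8000 samples-per-second rate, and the 16-bit number range are choices made for this example and do not come from the Audacity manual.

```python
import math

def sample_sine(tone_hz, sample_rate_hz, duration_s, bit_depth=16):
    """Sample a pure sine tone at regular intervals.

    Each sample is stored as a whole number whose size represents
    the height of the wave at that instant.
    """
    max_level = 2 ** (bit_depth - 1) - 1      # largest number available, 32767 for 16 bits
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                # time of the n-th sample, in seconds
        height = math.sin(2 * math.pi * tone_hz * t)   # wave height between -1.0 and 1.0
        samples.append(round(height * max_level))      # assign a number to the sample
    return samples

# Hypothetical values chosen for illustration: a 440 Hz tone,
# sampled 8000 times per second for one hundredth of a second.
samples = sample_sine(tone_hz=440, sample_rate_hz=8000, duration_s=0.01)
print(len(samples))    # number of samples taken
print(samples[:5])     # the first few sampled numbers
```

How quickly the printed numbers rise and fall reflects the frequency of the tone, and how large they become reflects its loudness.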

It is certainly reasonable to say that the size of a sampled number is analogous to the height of the sampled signal. However, these sampled numbers are discrete. That just means that there is no smooth transition between the numbers, as opposed to a recorded analog signal that changes smoothly with the sound it represents. Think of a vinyl record again. Instead of a smooth deepening and shallowing of the grooves in the record, imagine abrupt, vertical drops or rises in the grooves, each followed by a short, flat stretch, then another abrupt drop or rise, then another flat stretch, and so on. That would be more like a digital recording.

However, the more digital samples that are taken per second, each with high numerical accuracy, the more like the original wave form the numbers will be. Thinking of our vinyl record example again, this would mean that the abrupt, vertical drops and rises would be so small in their change, and the flat stretches between drops and rises so short that the human ear would not perceive them at all.

This, then, is the goal of digitizing any media: take enough samples (represented by numbers) of the sensory media being recorded so that the sensory information can be reconstructed from the sampled numbers and is perceived to be the same as the original, as it was before recording. As an example, the next illustration shows in the second half of the wave a sampling rate that is twice that of the first half. It is easy to see that the original wave can be approximated more closely from the higher sample rate than from the lower one.

Illustration of Sound Wave that has been Sampled at Two Different Sampling Rates

Permissions
© Audacity Open Source Project, Found in Audacity/help/manual/man/digital_audio.html
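The effect of doubling the sampling rate can also be checked numerically. The sketch below rebuilds a tone as a "staircase" (each sampled value is held flat until the next sample, like the flat stretches in the vinyl analogy) and measures how far the staircase strays from the true wave on average. The function name `staircase_error`, the 440 Hz tone, and the two sampling rates are assumptions made for this example, and holding each sample flat is a simplification of how real playback hardware reconstructs a wave.

```python
import math

def staircase_error(tone_hz, sample_rate_hz, fine_rate_hz=80_000, duration_s=0.01):
    """Average gap between a sine wave and its held-sample (staircase) version.

    The wave is compared at many finely spaced instants. At each instant
    the staircase holds the value of the most recent sample.
    """
    n_points = round(fine_rate_hz * duration_s)
    total = 0.0
    for i in range(n_points):
        t = i / fine_rate_hz                                  # instant being checked
        held_index = (i * sample_rate_hz) // fine_rate_hz     # most recent sample taken
        held_t = held_index / sample_rate_hz                  # time of that sample
        actual = math.sin(2 * math.pi * tone_hz * t)
        held = math.sin(2 * math.pi * tone_hz * held_t)
        total += abs(actual - held)
    return total / n_points

# Hypothetical rates for illustration: the second is twice the first.
low_rate_error = staircase_error(440, 4_000)
high_rate_error = staircase_error(440, 8_000)
print(low_rate_error, high_rate_error)
```

The error at the higher rate comes out smaller, which matches what the illustration shows visually: more samples per second leave shorter flat stretches and smaller jumps.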

The problem with this in terms of Web delivery of digital media is that, in general, a large number of samples of the media (over time for video and audio) must be taken for the reconstruction of the original sensory information to be good. The resulting media files, which must contain all of the sampled numbers, are therefore very large and not well suited for delivery over the Web. This fact has given rise to codecs (coder-decoder algorithms, also called compressor/decompressor algorithms).
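A rough calculation shows how quickly the sampled numbers add up. The figures used below (44,100 samples per second, 16 bits per sample, two channels, as used on audio CDs) and the three-minute duration are assumptions chosen for this example, and `uncompressed_size_bytes` is a hypothetical helper name.

```python
def uncompressed_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Size of raw sampled audio: one number per sample, per channel."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * duration_s

# Assumed example: CD-style stereo audio lasting three minutes (180 seconds).
size = uncompressed_size_bytes(sample_rate_hz=44_100, bit_depth=16,
                               channels=2, duration_s=180)
print(size)                # 31752000 bytes
print(size / 1_000_000)    # size in megabytes
```

Under these assumptions a single three-minute song occupies nearly 32 megabytes before any codec is applied, which is the kind of file size that compression aims to reduce.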