Before you can record any audio, you should first understand how sound actually works. This matters in a number of ways, especially when you start working with concepts like frequency, amplitude, phase, and reverb. You can make sound happen without knowing the concepts, but knowing why something works makes it much easier to make it work even better.
This article is intended as an introduction to the basics of sound. It is not meant to be comprehensive or in-depth: there will be no math, and little in the way of advanced terminology. And since I’m a visual thinker, I tend to prefer visual aids when I’m learning, so I’ll provide them for others who think the same way.
The Physics of Air
It all starts with the fact that we live in a gaseous environment. Air is all around us, and even though it barely impedes us, air is physical matter with mass. And like any physical matter, it obeys Newton’s Laws of Motion. This is important, because these physical laws work together to make the air capable of carrying the sounds you hear; if it were less massive, or more massive, you wouldn’t be able to hear anything, because it would respond differently to… well, let’s just start at the beginning, shall we?
Sound is caused by sudden motion. Whether it’s a crack from a collision or a steady tone from vibration, sound is produced by physical objects moving quickly and abruptly. This sudden movement displaces the air around the source, sending air molecules colliding with their neighbors as they speed away. Since they do so en masse, they crush into stationary air molecules, causing compression at that location.
The crush then causes the stationary molecules to move suddenly, while their assailants react by bouncing back in the other direction from the impact. Suddenly, where there was a crush of air molecules, there is now very little air, which means that at this location, the air is expanded. Of course, the molecules that were here are now hitting their own neighbors, sending them flying off, while the earlier molecules bounce back for another compression.
This continues outward in all directions as air continues to collide and bounce from location to location, rippling through the rest of the air until it reaches something capable of picking up the compression and expansion, and transforming the vibrations caused by this chain reaction into signals that can be understood. At this point, the signal being picked up from the compressing and expanding air is “sound.”
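If it helps to watch that chain reaction happen, here’s a toy simulation in Python with NumPy. It models a one-dimensional row of “air parcels” coupled like springs, which is my own drastic simplification for illustration (real air is three-dimensional and far messier): a sudden push at one end becomes a compression pulse that travels down the row, parcel by parcel.

```python
import numpy as np

n = 200
displacement = np.zeros(n)   # how far each air parcel sits from its rest position
velocity = np.zeros(n)
velocity[0] = 5.0            # the "sudden motion" at the source
dt = 0.1                     # simulation time step (arbitrary units)

for step in range(1, 501):
    # Each parcel is pushed toward the average of its neighbors,
    # like a bead connected to the beads on either side by springs.
    force = np.zeros(n)
    force[1:-1] = displacement[2:] - 2 * displacement[1:-1] + displacement[:-2]
    force[0] = displacement[1] - displacement[0]
    force[-1] = displacement[-2] - displacement[-1]
    velocity += force * dt
    displacement += velocity * dt
    if step % 100 == 0:
        # Parcels bunch up where the gap to the next parcel shrinks;
        # that bunching is the travelling compression front.
        crowding = -np.diff(displacement)
        print(f"t={step * dt:4.1f}: compression front near parcel {np.argmax(crowding)}")
```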
Now, sometimes, the sound is softer, because the compressions and expansions are less severe. The compressions and expansions are said to have less amplitude, meaning that they are weaker than they could otherwise be. Their speed also plays an important role: the faster the source vibrates, the more often the compressions and expansions occur. This rate is the sound’s frequency.
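To put numbers on those two ideas, here’s a minimal sketch in Python with NumPy; the 44.1 kHz sample rate and the 440 Hz tone are my own illustrative choices:

```python
import numpy as np

sample_rate = 44100   # measurements per second (a common recording rate)
duration = 1.0        # seconds
t = np.arange(int(sample_rate * duration)) / sample_rate

amplitude = 0.5       # strength of the compressions/expansions: loudness
frequency = 440.0     # how often they repeat per second (Hz): pitch

wave = amplitude * np.sin(2 * np.pi * frequency * t)

print(wave.max())     # strongest compression:  ~0.5
print(wave.min())     # strongest expansion:   ~-0.5
```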
In the above image, you might also have noticed that the central arrows become fainter as the compressions continue outward. As long as there is no constant vibration producing more sound, the air loses energy with every bounce, eventually settling again; this usually happens very quickly. As the compression waves get further from the source, the overall energy in the compression/expansion cycle also decreases, which is why things sound quieter the further away they are.
In a way, the motion of sound behaves much like ripples in a pond. From the moment air molecules start getting bounced around, they are bounced around in all directions. However, unlike the pond’s ripples, sound does not move out in a circle, despite the use of circles in the above visual aids; the sound goes out from the source in a sphere. The waves keep moving, losing power, until the transferred motion no longer carries enough energy to compress the next group of air… or they hit something that does not react, such as a solid wall, and turn back. This bounceback is known as reverb. As you can see from the image, reverb can happen multiple times, although it’s usually not as straightforward as a billiard table (other waves can modify the reverb’s sound).
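To put a rough number on that falloff: for an idealized source in open air, the wave’s energy spreads over the surface of an ever-growing sphere, so the measured amplitude drops roughly in proportion to distance. This free-field model is my own simplification (a real room, with its reverb, behaves differently), but it shows the pattern:

```python
# Free-field falloff sketch: energy spreads over a sphere (area ~ r^2),
# so the pressure amplitude we'd measure drops roughly as 1/r.
reference_distance = 1.0  # metres (assumed measurement point)
for r in [1.0, 2.0, 4.0, 8.0]:
    relative_amplitude = reference_distance / r
    print(f"{r:4.1f} m: amplitude x{relative_amplitude:.3f}")
# Each doubling of distance halves the amplitude (roughly 6 dB quieter).
```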
The Human Ear
Once the air gets bouncing around, the human ear comes in. When the compressions and expansions reach the eardrum, the eardrum vibrates in response. This vibration moves the small bones in the ear, which trigger the nerves associated with hearing. This is where the air’s motion actually becomes sound.
The Audio Waveform
Now, you’ll notice that while sound travels outward as a sphere in all directions, the devices used to pick up the sound are single-point; each ear has only one eardrum, and that eardrum vibrates only according to the movement of the air immediately beside it. This means that we only need to pay attention to the ripples that actually make it to the ear, microphone, or what-have-you.
This introduces us to the measurement used to understand how a sound works, and how your changes apply to it: the waveform.
A waveform is a measurement, over time, of the amplitude of change in the air surrounding the ear, microphone, or other sensing device. Since the air snaps back and forth between compressed and expanded, the measurement over time appears as a series of ripples, just like the above image.
An amplitude of zero is known as the baseline, as at this point the air is neither expanded nor compressed. If a waveform stays at zero, there is no sound at all. In the real world, however, this is nearly impossible outside of a hermetically-sealed room heavily treated to prevent any sound from entering… and even then, true silence is unlikely.
Amplitudes below zero are measurements of expansion in the air, while amplitudes above zero are measurements of compression in the air. The curve showing compression is often referred to as a crest, while the curve showing the expansion is often referred to as a valley. One cycle of audio will always start at the baseline, go to one extreme, then go to the other, and then return to the baseline. Whether the crest or the valley comes first determines the cycle’s phase.
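To see “crest first versus valley first” in action, shift a wave by half a cycle: every sample simply flips sign. A quick sketch (the specific tone and sample rate are arbitrary choices of mine):

```python
import numpy as np

t = np.arange(1000) / 44100.0
crest_first = np.sin(2 * np.pi * 440.0 * t)           # begins by compressing
valley_first = np.sin(2 * np.pi * 440.0 * t + np.pi)  # half-cycle phase shift: begins by expanding

# The two waves are mirror images of each other around the baseline.
print(np.allclose(valley_first, -crest_first))  # True
```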
The above image is a simple generated sine wave; it has no variations in amplitude or frequency. Below, however, you can see three different waveforms. The central waveform is the same one identified above.
To the left of that waveform is another with the same frequency, but roughly half the amplitude. You can tell that this is the case because its crests and valleys extend only about half as far as the original wave’s. To the ear, this results in a quieter sound at the same pitch.
To the right, you will see a wave that has the same amplitude, but twice the frequency; in this case, you can fit two of the cycles on the right within the space of one cycle in the center. What you will actually perceive is that the sound on the right is one octave higher in pitch than the sound in the middle (for those who don’t know: in music, each octave is double the frequency of the octave below it, and half the frequency of the octave above it).
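Since each octave doubles the frequency, hopping between octaves is nothing more than multiplying or dividing by two. A quick illustration, starting from an arbitrary 440 Hz note:

```python
base = 440.0  # an arbitrary starting pitch in Hz
for octave in range(-2, 3):
    print(f"{octave:+d} octaves: {base * 2.0**octave:7.1f} Hz")
# -2 octaves:   110.0 Hz ... +2 octaves:  1760.0 Hz
```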
The human ear can only perceive vibrations within certain ranges; as mentioned, changes in frequency register as changes in pitch, while amplitude determines loudness. The range of frequencies that the ear (or other device) can pick up is known as its frequency response, while the range of amplitudes it can perceive is known as its dynamic range.
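Dynamic range is usually quoted in decibels, a logarithmic scale that compares the loudest and quietest amplitudes a device can handle. As a hypothetical worked example (the 16-bit figure below is a common textbook reference, not something from this article):

```python
import math

def dynamic_range_db(loudest, quietest):
    """Express an amplitude ratio in decibels (20 * log10 of the ratio)."""
    return 20 * math.log10(loudest / quietest)

# 16-bit digital audio offers 65536 distinct amplitude steps:
print(f"{dynamic_range_db(65536, 1):.1f} dB")  # ~96 dB, the figure usually quoted for CDs
```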
As was previously mentioned, sound does not travel alone; it radiates out from the source in all directions. Reverberation then causes some of the sounds to turn back on their brethren. This can have several effects, depending on the differences in amplitude, frequency, and phase between the two sounds.
Most times, when two sounds collide, they are in different states; the sound being transmitted in one direction is newer than the sound travelling in another, so the differences in amplitude and frequency allow them to coexist. However, keep in mind that even if they coexist, they affect each other’s waveforms, causing both sounds to become duller than they were before the collision. This is what’s referred to as diffusion.
In the image above, note that the sine wave comes out fairly close to its original state, because the noise wave was relatively small. You’ll notice, though, that even while unobtrusive, the noise did have an effect on the sine wave; the final wave is not quite as smooth coming out as it was going in. If both had similar amplitudes, neither would have escaped the collision in anything like its original form.
You might also notice that there are square edges on some of the waves; this is because the combination of the two waves took the sound beyond “peak” level. This is a technical issue meaning that a sound has more amplitude than a device is capable of recording; when this happens, the recorded sound simply sits at maximum amplitude for as long as the original stays above that point (this flattening is commonly known as clipping). This won’t matter much until we start to discuss Analog-to-Digital Conversion.
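Here’s a sketch of both effects at once: summing a sine wave with a small amount of noise roughens its shape, and pushing the sum past “peak” flattens the tops. The ±1.0 peak level is my assumed device limit:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(44100) / 44100.0

sine = 0.9 * np.sin(2 * np.pi * 440.0 * t)
noise = 0.2 * rng.standard_normal(t.size)  # small, unobtrusive noise

mixed = sine + noise                  # the collision: no longer a smooth sine
clipped = np.clip(mixed, -1.0, 1.0)   # anything beyond "peak" is recorded flat

print(f"true peak: {np.abs(mixed).max():.2f}")        # above 1.0
print(f"recorded peak: {np.abs(clipped).max():.2f}")  # stuck at 1.00
```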
Diffusion is often used in a studio environment to tame the wrong kind of reverb in a recording without eliminating it entirely. It can provide a pleasant backing sound when creating music, as a form of harmonizing.
Cancellation happens when two equal waves collide while in opposite phases. In other words, the compression of one wave meets the expansion of the other; this essentially drains the energy out of the compression, while the following expansion of the first wave then meets the returning compression of the second in exactly the same way.
The end result of this process is that both sounds are stopped in their tracks. This is desirable wherever one needs as little reverb as possible, and it is the method noise-cancelling headphones use to remove background sound beyond what the sealing cushions can handle.
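The arithmetic behind this is plain addition: an equal wave in the opposite phase contributes the exact negative of every sample, so the two sum to silence. This is an idealized sketch; real noise-cancelling hardware has to estimate the incoming sound and generate its opposite on the fly:

```python
import numpy as np

t = np.arange(44100) / 44100.0
wave = 0.7 * np.sin(2 * np.pi * 200.0 * t)
anti_wave = -wave  # equal amplitude, exactly opposite phase

print(np.max(np.abs(wave + anti_wave)))  # 0.0: complete cancellation
```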
Standing waves are the opposite of cancellation. They are generated when two waveforms collide at exactly the same phase while travelling in directly opposite directions (at any other angle, the collision simply strengthens the signal, resulting in reverb). This collision results in the collided waves remaining in place (hence “standing”), at which point they become a sound source in their own right, generating a new set of sound waves that radiate outward.
In the above image, the standing waves show up where the two other waves intersect, creating their own sources of “ripples” that produce even more sound. As you can see, this can get pretty noisy overall. Keep in mind, however, that while the standing waves can spread, each carries only a part of the original waves’ energy, so they are not going to overpower the original sound, except initially, and the sound will fade as the overall energy dissipates.
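To see the “standing” part numerically, add two equal waves travelling in opposite directions: the combination stops travelling, leaving nodes that never move and antinodes that swing back and forth in place. This is the textbook superposition, offered as my own illustration rather than one of the article’s images:

```python
import numpy as np

x = np.linspace(0, 1, 201)  # positions along a line (arbitrary units)
k = 2 * np.pi * 2           # spatial frequency: two full cycles across the line
omega = 2 * np.pi * 5       # oscillation rate in time

for time in [0.0, 0.05, 0.1]:
    rightward = np.sin(k * x - omega * time)
    leftward = np.sin(k * x + omega * time)
    standing = rightward + leftward  # equals 2*sin(k*x)*cos(omega*t)
    # x = 0.25 is a node (always zero); x = 0.125 is an antinode (swings between +2 and -2).
    print(f"t={time:.2f}: node {standing[50]:+.3f}, antinode {standing[25]:+.3f}")
```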
Standing waves are usually a problem for people attempting to record, because they’re a sign of poor acoustic treatment in a room. For an excellent example of standing waves, go into the bathroom, or somewhere else that has a lot of reverb, and clap your hands. Did you hear a ringing noise? That ring was the result of standing waves being formed; they generate a secondary noise that can be heard after the original sound has already died away.
So, at this point, we have covered what sound is, how it is actually picked up by the ear, and how it is measured. We have identified a number of terms that apply to sound, and how they determine a certain quality or quantity of the sound.
While this, by itself, is not enough to understand how recording or filtering equipment is used, I hope it gives you a better understanding of the basics, which will make researching the tools of the process that much easier.
Hopefully, you’ve learned enough for something good to come of this!