This article will be pretty light on practical, hands-on content. If you’re looking for a how-to article, this will probably be a disappointment. Still, if you’re wondering what the differences are between Alsa, Jack, PulseAudio, and other tools, this might help give you a little perspective.
Some of the things I’ve seen on the web suggest some confusion about Linux audio. I’ve seen and heard people say that Jack and PulseAudio are sound drivers, just like Alsa, FFADO, and OSS.
Abridged Linux Audio History
Let me lay the groundwork a bit. This is somewhat compressed and likely somewhat biased, as most of it comes from my own memories of the intervening years.
Originally, Linux sound was handled by the Open Sound System (OSS). These drivers created a special device file; when sound data was copied to it, the sound would play through the speakers. This allowed any program, or even a simple script, to make sounds for its own purposes.
The problem, however, was (and, for the most part, still is) that a sound card only has one DAC (digital-to-analog converter), meaning that only one stream of audio can be processed at a time. So, while the speakers are playing a sound file, the sound driver cannot add another sound file until the first is done.
Imagine that you have a desktop environment. Now, imagine that you have a music player, an IM program, and a video game running, perhaps even a talking program like TeamSpeak, Ventrilo, or Mumble, and maybe even Skype. Every single one of these programs has sounds that need to be played at the moment they are requested, because you depend on them to let you know what’s going on… except for the music player.
Now, imagine that your music player has been playing music the whole time.
While the music player is playing music, it has access to the DAC; its audio data has been going to the DAC, non-stop. Since the DAC can only handle one sound source at a time, every system, IM, voice chat, and game notification has been waiting patiently for its turn to be played; you’ve heard absolutely nothing but the music. As you can guess, this can be a problem.
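To make that concrete, here is a toy sketch in Python of a device that only admits one client at a time. Everything in it (the `ExclusiveDevice` class, the `DeviceBusy` exception, the client names) is made up for illustration; it is not how OSS is actually implemented, just a model of the "one program holds the device" situation described above.

```python
class DeviceBusy(Exception):
    """Hypothetical error: another client already holds the device."""


class ExclusiveDevice:
    """Toy stand-in for a sound device that admits one client at a time."""

    def __init__(self):
        self.owner = None  # name of the client currently holding the device

    def open(self, client):
        # Refuse a second client while the first one still holds the device.
        if self.owner is not None:
            raise DeviceBusy(f"held by {self.owner}")
        self.owner = client

    def close(self, client):
        # Only the current owner can release the device.
        if self.owner == client:
            self.owner = None


dsp = ExclusiveDevice()
dsp.open("music player")  # the music player grabs the device first

try:
    dsp.open("IM client")  # the IM notification cannot get through
    im_got_device = True
except DeviceBusy:
    im_got_device = False

dsp.close("music player")  # only once the music stops...
dsp.open("IM client")      # ...can anything else be heard
```

In this model the IM client only gets the device after the music player lets go of it, which is exactly the silence problem from the desktop example.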
The first sound server, a program that mixes different sounds together into a single stream for the DAC, was the “Network Audio System,” or “NAS” for short (and yes, it’s still being maintained). As the name suggests, this tool was designed to allow sounds from all over the network to be sent to a central server, which can be overkill for most end users.
Next came EsounD (Enlightened Sound Daemon) and aRts (Analog Real-Time Synthesizer). These were the primary sound servers for the two major competing desktop environments, GNOME and KDE, respectively. They were simpler by comparison, focused on mixing the sounds for the computer they were running on. However, things could get frustrating when the programs you used were limited to one or the other: since both servers were needed to make all your programs usable, you were once again back to multiple applications fighting over the DAC.
While these programs were duking it out, there was also a need for more professional handling of audio, consisting of low-latency manipulation of sound. The Jack Audio Connection Kit served the role admirably. Its ability to route sound freely between different programs has provided a useful tool for managing complex digital signal chains, and its transport feature has provided a valuable means of synchronizing work among many different programs at once.
Jack’s flexibility may have been a boon for audio professionals, but it was both too complex and complete overkill for simple desktop usage. Setting it up required careful consideration of sound performance, since it was designed to be low-latency, and when your only need is to mix different programs’ sound to your system speakers, you don’t need the connection graph. And, once again, only some of the software out there supported Jack, and those programs were similarly designed more for professional use than your normal user would need.
While all the servers were duking it out for dominance on the Linux desktop, the driver space was seeing a little competition itself. OSS lacked some features that a lot of people needed, such as MIDI hardware support and full-duplex audio (simultaneous recording through the ADC and playback through the DAC). Since a single audio device file handled both input and output and could only be accessed by one program at a time, you couldn’t record and play back at the same time, which is required for software monitoring.
Originally, OSS had been free software, but then newer features and device drivers were being released under a proprietary licence, owned by 4Front Technologies. This did not gel well with the open-source community, for obvious reasons. This has since been corrected, but at a point well after the damage was already done.
To address these issues, a competing set of Linux drivers was created. Called the “Advanced Linux Sound Architecture” (ALSA), these drivers provided features for a number of sound cards that OSS couldn’t at the time, and the project used the General Public License to ensure that ALSA would not meet the same fate as OSS.
Additionally, Alsa had the ability to insert plugins into the driver system. This would allow explicit support for tasks that might not otherwise be available in the core drivers. One of the most commonly-used plugins in Alsa is the “dmix” plugin. For the first time, mixing of multiple audio streams could be in the core sound driver system.
It was around this time that yet another sound driver was appearing on the scene. Alsa drivers did not include drivers for Firewire devices, which would become important, especially in the case of professional audio applications, where the high-performance Firewire interface was handily outperforming the USB and PCI buses. For this reason, the FreeBob project was created. FreeBob can be considered a form of “Alsa for FireWire.” Since then, the project has renamed itself as FFADO, which stands for “Free Firewire Audio Drivers”. I don’t know what the “O” stands for, unless it means “organization” or something along those lines.
The sound servers continued to develop during this time, and GStreamer entered into the mix. GStreamer seemed to be more of a framework, allowing different forms of control and conversion to be applied to audio data by adding filters and controls to a pipeline, and then transmitting the data to the desired server or driver. This is similar to how Jack handles sound, but seems to be limited to a linear chain with GStreamer-specific plugins.
GNOME took this project into its fold, and since then, a number of GNOME projects have started using GStreamer as their primary backend, such as Rhythmbox and Miro.
At this point, the war between EsounD and aRts started dying down, as the two camps started to collaborate on GStreamer, as well as the newcomer to the audio server sphere: PulseAudio, a layered and modular sound server that can actually be connected to by the other sound servers, allowing them all to, for the first time, access the same sound device at the same time.
That is the history as I remember it (with some surreptitious peeking at Wikipedia to help me keep it all in order). Now let’s focus on what these tools actually are.
Alsa, FFADO, and OSS: Driver Megapacks
Imagine if you were to download every single audio driver available for Windows, take them apart, merge all the similar parts, and then make a single installer for all of them, perhaps with some steps asking which ones you want to use.
That is essentially what Alsa, FFADO, and OSS are; they are collections of a huge number of drivers for sound devices, all combined in a way that they can be interfaced in the same way. These are the back part of the Linux sound system, the actual drivers that determine Linux’s compatibility with your chosen piece of hardware, whether it is a sound card, audio interface, MIDI interface, or standalone USB speakers, headset, or microphone.
To determine if your hardware is supported, you need to check the lists available on their respective sites:
- Open Sound System (Device Lists)
- Advanced Linux Sound Architecture (Device Lists)
- Free Firewire Audio Drivers Collection (Device Lists)
If you can compare Alsa, OSS, or FFADO to super-collections of Windows drivers, then the above sound servers can be similarly compared to DirectSound or ASIO. Sound servers do not have anything at all to do with the sound devices; your sound card or audio interface will work, regardless of whether you use Jack, Pulse, or GStreamer.
However, these servers will have everything to do with making your audio applications work, as they are the way your programs can access the sound hardware uniformly. Of course, they also provide the ability for programs’ sounds to be mixed together, so that you can hear the sound from multiple programs at the same time.
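As a rough idea of what “mixed together” means, here is a minimal sketch in Python, assuming every program hands over signed 16-bit PCM samples at the same sample rate. The function name `mix_streams` and the sample values are my own inventions for illustration; real sound servers also deal with resampling, buffering, per-stream volume, and timing, none of which is shown here.

```python
from array import array

# Limits of a signed 16-bit sample.
INT16_MIN, INT16_MAX = -32768, 32767


def mix_streams(streams):
    """Mix several signed 16-bit sample sequences into one stream.

    Samples at the same position are summed, and the sum is clipped
    so it still fits in 16 bits. Shorter streams simply run out early.
    """
    if not streams:
        return array("h")
    length = max(len(s) for s in streams)
    mixed = array("h", [0] * length)
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed[i] = max(INT16_MIN, min(INT16_MAX, total))
    return mixed


# Three made-up "programs", each producing a few samples.
music = array("h", [1000, 2000, 3000, 4000])
im_beep = array("h", [500, -500])
game = array("h", [30000, 32000, -30000])

out = mix_streams([music, im_beep, game])
print(list(out))  # [31500, 32767, -27000, 4000]
```

The second sample shows the clipping: the three inputs add up to 33500, which does not fit in 16 bits, so it is capped at 32767. The point is that the DAC still only ever sees one stream; the server just builds that stream out of everyone’s audio.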
The classic sound servers can still be used, but, with the exception of NAS, are discontinued.
Jack is technically classic, except that it exists in a class of its own. Originally, there were actually two competing versions of Jack: the original version, and Jackdmp, a C++ rewrite designed to support multiprocessor load balancing (a technique that spreads Jack’s work across several processors, reducing the risk of XRuns). Recently, these versions have begun merging, with the original version labeled “Jack 1” and the Jackdmp variant taking the “Jack 2” name, which reflects the expectation that the dmp version will replace the original. (For those curious, I use Jack 2, since I have a quad-core system that can benefit from the load balancing.)
The new sound servers are used in most distributions by default, and many of their features are taken advantage of by just about all the major consumer audio applications for Linux. They work together well, so there is no need to choose between them.
Additionally, one of the earliest plugins available for Alsa’s plugin architecture is its digital mixer module (called “dmix”). When set up in Alsa, this allows Alsa itself to function as a sound mixer, so multiple programs can access the sound device without needing a separate sound server program at all. This can be useful when memory is scarce.
- Alsa’s Digital Mixer module (warning: technical information)
Well, there’s everything I can offer. I don’t know if it’s enough to clarify the common confusion between drivers and sound servers, but hopefully, for those who are curious, this can make the connections a little clearer. Perhaps enough to help make something good?