The question reads:
I was wondering if you could make a tutorial/tips for voice removal from stereo audio files. I have had read about people saying that this can be accomplished by using some sort of frequency filter. I have (two) Karaoke plug-in on Ardour, but it doesn’t works quite right with one audio file that I’m working on. Maybe some recommendations about using this sort of “tricks” with the right type of audio files (is better to get 5.1 channels in one audio file than just 2?). With which audio files can be used the Karaoke plugin, etc? if possible… Best regards!
Before I begin, let me point out that you’re not likely to find much music in 5.1-channel sound. Surround sound is mostly used for film, where the sound needs to come from a specific direction to match the positions of actors and props on screen. Music is usually mixed in stereo, because two is the default number of speakers for radios, computer systems, and even personal music devices like the iPod (ever see an iPod with 6 earbuds?). More channels would add little to the effect, and would take up a lot more bandwidth and storage space on a device.
Now, I have not really experimented with making karaoke files, but the principle is very simple. In a typical mix, each instrument has its own place in the stereo soundfield, while the exact center is where the lead singer’s voice is usually placed. This means the singer’s voice is equally loud in both speakers: the waveforms are identical, as if a mono recording had been split evenly between the channels. Because of that, inverting the phase of one channel and mixing the two channels into one can theoretically cancel out the lead singer’s voice, while the voices and instruments panned off-center remain audible. Anything else that sits dead center gets cancelled along with the vocal.
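To make the cancellation concrete, here is a minimal Python sketch using made-up signals rather than a real recording. The 440 Hz "vocal", the 660 Hz "guitar", the pan gains of 0.9 and 0.2, and the helper names `sine` and `remove_center` are all assumptions chosen for illustration.

```python
import math

def sine(freq_hz, n_samples, rate=8000, amplitude=1.0):
    """Return n_samples of a sine wave as a list of floats."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
            for i in range(n_samples)]

def remove_center(left, right):
    """Invert the right channel and mix it with the left (left - right)."""
    return [l - r for l, r in zip(left, right)]

n = 800
vocal = sine(440, n)    # dead center: identical in both channels
guitar = sine(660, n)   # panned mostly left (gains below are made up)

left = [v + 0.9 * g for v, g in zip(vocal, guitar)]
right = [v + 0.2 * g for v, g in zip(vocal, guitar)]

mono = remove_center(left, right)

# The vocal drops out; what remains should be 0.7 * guitar (0.9 - 0.2).
residual = max(abs(m - 0.7 * g) for m, g in zip(mono, guitar))
print(f"largest deviation from 0.7 * guitar: {residual:.2e}")
```

Note that the guitar survives only because its level differs between the two channels, and that the result is a single channel.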
I say “theoretically” because this only works cleanly if lossy compression has never been applied to the audio at any point in the recording’s history, and if no other audio processing, such as reverb, spreads the lead singer’s voice across the stereo soundfield. The more realistic expectation is that the lead singer’s voice will be softened, and if there is reverb, it may become more pronounced.
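A rough sketch of why reverb survives: if each channel carries a slightly different echo of the voice, the dry voice cancels but the difference between the echoes does not. The crude single-echo "reverb", the delays of 37 and 53 samples, the 0.3 gain, and the helper names are assumptions for illustration only; real reverb is far more complex.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def delayed(samples, delay, gain):
    """Return a copy delayed by `delay` samples and scaled by `gain`."""
    return [gain * samples[i - delay] if i >= delay else 0.0
            for i in range(len(samples))]

rate = 8000
dry = [math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]

# Hypothetical stereo reverb: a different echo of the voice in each channel.
left = [d + e for d, e in zip(dry, delayed(dry, 37, 0.3))]
right = [d + e for d, e in zip(dry, delayed(dry, 53, 0.3))]

# Invert one channel and mix: the dry voice cancels, the echoes do not.
difference = [l - r for l, r in zip(left, right)]

print(f"voice level before:         {rms(left):.3f}")
print(f"residue after cancellation: {rms(difference):.3f}")
```

The residue is quieter than the original voice but not silent, and it consists entirely of the echo.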
Additionally, inverting the phase of one channel can have less pleasant effects on the other voices and instruments, which were mastered specifically to be phase-balanced. The phase-inverted instruments can partly cancel themselves and each other, resulting in a muddy or otherwise strange sound. Also, since you have to combine the two channels directly into a single one (in order to apply the phase cancellation), you end up removing the stereo soundfield altogether.
Of course, considering the singing quality of some karaoke singers, I can’t say that the quality of the music will be noticed over the singing.
In the end, all you need to perform this task is one single LADSPA plugin: Inverter.
This plugin should only be applied to one channel in the chain; the other should not be changed.
Granted, this is just being output to the speakers, but the principle is the same: take the two channels from any source, route one through Jack Rack with the Inverter plugin, and then mix both into the desired device or application. The end result will be as close as you can get without having the original tracks and simply muting the lead singer.
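If you would rather process a file offline than route audio through Jack Rack, the same invert-and-mix step can be sketched with Python's standard `wave` and `array` modules. This is not the Jack Rack method described above, just the same arithmetic applied to a file. It assumes the audio has already been converted to 16-bit stereo PCM WAV, and the function name `remove_center_wav` is hypothetical.

```python
import array
import sys
import wave

def remove_center_wav(in_path, out_path):
    """Write a mono WAV holding (left - right) / 2 of a 16-bit stereo WAV."""
    with wave.open(in_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Frames are interleaved: left, right, left, right, ...
    left, right = samples[0::2], samples[1::2]
    # Subtracting is the same as inverting one channel and mixing.
    # Halving keeps every result inside the 16-bit range.
    mono = array.array("h", ((l - r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Calling `remove_center_wav("song.wav", "song_karaoke.wav")` (both file names are placeholders) would write the single-channel result next to the original.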
I tested the process with Styx’s “Lady” (ripped from CD and stored as FLAC), and the piano, which moves dynamically around the soundfield, ended up turning into a pulsing noise. Dennis’s voice was much softer, but you could clearly hear its reverb.
The point of this: Your results will vary. Official karaoke tracks are made from the original tracks with muted singers, or by re-performing the music altogether. Tricks like this are imperfect at best.
But, who knows? You might have something good.