Concept
Immersive surround systems have (at least) two “layers” of speakers: a “bed” layer at ear level and a “height” layer above (or upward-firing Dolby Atmos speakers).
Ideally the height layer would have 4 or more speakers, a pair in front and pair in back.
This creates two horizontal layers of “surround” speakers.
The idea discussed here is to combine traditional surround upmix techniques, in this case SpecWeb, within each horizontal layer, with music source separation techniques that create the stereo source to be upmixed for each layer.
I first experimented by placing the separated Drums in the height layer, but this method could also be used to place Piano, or Piano and Drums, etc., in the height layer.
The source for the bed layer is everything except what was source separated out for the height, so in the case of Drums in the height layer, Bass, Vocals, and “Other” would be in the bed layer.
My original idea was actually to have just the cymbals in the height layer, but sadly there aren’t any pre-built source separation tools (that I could find) that can do that. However, after listening to my results with the entire drum kit in the height layer, I think the surround “WOW” factor of separate speaker sources for separate instruments overrides the logic of not placing the entire drum kit in the height layer.
Also, my upmix philosophy has always been "listener on stage surrounded by instruments and vocals" vs. "listener out in the audience with all sounds coming at you from a front stage" (with only ambience/reflections behind).
Tools Used
Each of the tools used has alternatives, some of which I will mention, but I will mainly focus on the tools I used, for their simplicity and because I had easy access to them.
In order of process:
Acustica Premium Edition – for source separation
This uses Spleeter behind the scenes, but is a quick and easy way to generate the two stereo “stems” used in later steps, namely “Drums” and “Everything Else”.
Any other source separator could be used, and in fact I started with Spleeter on the command line with Anaconda, but that required an extra mix down step to mix Bass, Vocals, and Other stems into a single “Everything Else” stereo stem.
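If you go the command-line Spleeter route, that extra mixdown step is just a sample-wise sum of the time-aligned stems. A minimal sketch (assuming the stems have already been loaded as float arrays in the -1..1 range, e.g. with a library such as soundfile; `mix_stems` is my name, not a Spleeter function):

```python
import numpy as np

def mix_stems(stems):
    """Sum time-aligned stereo stems into one 'Everything Else' stem.

    Stems separated from the same source share length and sample rate,
    so a simple sample-wise sum recombines them.
    """
    out = np.sum(np.stack(stems), axis=0)
    # The summed peaks can exceed full scale, so guard against clipping.
    peak = np.max(np.abs(out))
    if peak > 1.0:
        out /= peak
    return out

# e.g. everything_else = mix_stems([bass, vocals, other])
```

The peak guard is there because, even without Drums, the recombined stems can occasionally sum hotter than the original master.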
SpecWeb 2.2 – for Stereo to 5.1 surround upmix
Used for each layer separately, with the Drums and “Everything Else” stereo stems as the two sources.
Of course, you could substitute another upmixer or maybe use one upmixer for the bed and a different one for the height layer.
In my tests I used SpecWeb 2.2 with defaults for both (more on that later).
Note: this means the resulting upmix is 5.1.4 vs. 7.1.4, so the side surrounds are empty. I could have used a Plogue Bidule Spec 7.1 layout to fully achieve 7.1.4, but I wanted the latest SpecWeb 2.2 upmix method and haven’t created a 7.1 SpecWeb version to date.
Plogue Bidule – With a layout and my plugins for:
- Adjusting the mix or relative level between the layers
- Downmixing the height layer from 5.1 to 4.0 (SpecWeb could have been set for a quad upmix for this layer instead), plus routing the height .1 to the bed .1 channel
- Blending the layers for artifact reduction
- Creating 12 channel output (7.1.4 – Side Surrounds are empty)
- Adjusting the final gain for track normalization, using built in peak meters
- “Zmon 7.1.4” to easily monitor (assuming you can play 7.1.4 audio live) just the height layer, just the bed layer, flip the layers, solo channels, etc.
- Output 12 mono files (1 per channel).
AudioMuxer can also split a 12-channel wav file into monos if you need that. However, I discovered that, for proper file naming for a Dolby Atmos or DTS:X encoder, it is easiest to let AudioMuxer merge numbered channels to create a properly channel-mapped wav file, then split them again (and resample to 48 kHz) to get mono files named properly for drag and drop into an encoder (at least for the bed layer; I had to manually assign the height-layer files to encoder input channels).
AudioMuxer – Join the 12 (7.1.4) mono files and tag the wave file with channel mapping
This step depends on how you are going to play back. If you can’t play back a 12-channel wav file and are instead going to use a DTS:X or Dolby Atmos encoder, this could be skipped, as your encoder will want multiple mono files.
However, if you can play a 12-channel file, say in foobar2000 to a 12-or-more-channel audio interface, or with PlayPcmWin to an HDMI output, then you will absolutely need this channel mapping encoded in the 12-channel wav file.
Note: PlayPcmWin is the only application I know of that uses Windows 10’s Dolby Atmos encoding capability on a file, vs. in a game or other app (because I asked the developer for that functionality ;0)
If for some reason you don’t want to use AudioMuxer, you can do this with eac3to & ffmpeg (which is how AudioMuxer does it).
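For reference, the channel mapping a wav file carries is the WAVE_FORMAT_EXTENSIBLE `dwChannelMask`: the Windows speaker-position flags of the channels present, ORed together. A sketch of the 7.1.4 mask, assuming Ls/Rs map to the BACK flags and Lss/Rss to the SIDE flags (the flag values are the standard ones from mmreg.h):

```python
# Windows WAVEFORMATEXTENSIBLE speaker-position flags (mmreg.h).
SPEAKER = {
    "FL": 0x1, "FR": 0x2, "FC": 0x4, "LFE": 0x8,
    "BL": 0x10, "BR": 0x20, "SL": 0x200, "SR": 0x400,
    "TFL": 0x1000, "TFR": 0x4000, "TBL": 0x8000, "TBR": 0x20000,
}

def channel_mask(names):
    """OR together the speaker flags for the channels present."""
    mask = 0
    for n in names:
        mask |= SPEAKER[n]
    return mask

# 7.1.4 in the file's order: L R C LFE Ls Rs Lss Rss Lfh Rfh Lbh Rbh
MASK_7_1_4 = channel_mask(
    ["FL", "FR", "FC", "LFE", "BL", "BR", "SL", "SR",
     "TFL", "TFR", "TBL", "TBR"])  # == 0x2D63F
```

A player like foobar2000 reads this mask to route each of the 12 channels to the right output, which is why the untagged file won’t do.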
Screen shot of the Plogue Bidule final mix layout:
Results to date
Screen shot of a 12-channel file being played back in foobar2000 (with waveform seekbar)
The vertical gap is the two empty Lss and Rss channels. Channel order is L, R, C, LFE, Ls, Rs, Lss, Rss, Lfh, Rfh, Lbh, Rbh.
Since this is still a manual process, rather than use the 50-ish “regression” tracks I use for SpecWeb testing, I decided to use tracks I had acquired recently: mostly 96/24 from HD Tracks, plus a few recent 44/16 tracks I like.
In all I’ve tested 14 songs to date, of various genres and mixing styles. They are (all stereo sources):
Bad Guy – Billie Eilish
Chick’s Chums – Chick Corea
Clocks – Coldplay
Closer to the heart – Rush
Don’t stop me now – Queen
Hey Nineteen – Steely Dan
Oh I Wept – Free
Pariah – Steven Wilson
People who eat darkness – Steven Wilson
Sgt. Pepper – The Beatles
Skyfall – Adele
The Hard Way – Paula Cole
Tonight’s the night – Neil Young
Wish You Were Here - The Wyclef Jean version ;0)
Listening tests were done both in my 7.1.4 Genelec “studio” and virtually in BBC 9.1.6 or 7.1.4 rooms via a Smyth Research Realiser A16 with Sennheiser HD 800S headphones.
Also, I listened to a DTS:X encoded AVCHD in my living room, which has a 5.2.4 system with NewWave satellites and JBL Arena subs.
Since my studio also has a (rarely used) single “Top” or “Voice of God” Genelec speaker, I also experimented with routing the center channel of the Drum 5.1 upmix to it. Results were good, but as most people don’t have that speaker in their setup, and various parts of the tool chain I used were expecting 7.1.4, I didn’t take it any further.
I am very happy with the results overall and feel that the concept test was a success and further development is warranted (along with other immersive upmix ideas I’ve been developing).
Conclusions
As always, SpecWeb settings should have been fine-tuned for each track and each layer. Setting it up to push more drum signal to the rear heights would also be good, since the majority of tracks had a front-weighted drum sound with the default SpecWeb settings.
As mentioned earlier, SpecWeb for the Drum layer could also have been set up for 4.1 vs. 5.1, which would simplify the final mixdown.
If there is a downside to this Drum layer separation, it is that it doesn’t reduce drum or other separation artifacts. In fact, you can experience artifacts (as always) with both the upmix AND the music source separation processes. In these tests, source separation artifacts were mitigated by blending the layers such that each layer had the other layer mixed in at -9 dB.
That said, the search for a time-to-frequency-domain-and-back algorithm for SpecWeb that doesn’t occasionally suffer from “swishy” drum sounds is still ongoing (you can Google “phase vocoder artifacts” if you’re curious). If you know any audio DSP coders, let me know! I suspect it’s a solved problem for a lot of products, but unfortunately also part of their proprietary code.
Lastly, I noticed that the 7.1 downmix from the 7.1.4 DTS:X AVCHD was also very pleasing, and it occurs to me that this technique could be used in non-immersive surround systems as well, namely using different upmixers or different upmix settings for different source-separated stereo stems.
---
I can post the Plogue Bidule layout and plugins if there is interest, and/or you can PM me for audio examples.
Cheers,
Glenn