Immersive upmixing utilizing drum source separation and SpecWeb

Concept


Immersive surround systems have (at least) two “layers” of speakers: a “bed” layer at ear level and a “height” layer above (or upward-facing Dolby Atmos speakers).

Ideally the height layer would have 4 or more speakers: a pair in front and a pair in back.

This creates two horizontal layers of “surround” speakers.

The idea we are discussing here is to combine two techniques: traditional surround upmixing (in this case SpecWeb) applied within each horizontal layer, and music source separation to create the stereo source that gets upmixed for each layer.

I first experimented by placing the separated Drums in the height layer, but this method could also be used to place Piano, or Piano and Drums, etc., in the height layer.

The source for the bed layer is everything except what was source separated out for the height, so in the case of Drums in the height layer, Bass, Vocals, and “Other” would be in the bed layer.

My original idea was actually to have just the cymbals in the height layer, but sadly there aren’t any pre-built source separation tools (that I could find) that can do that. However, after listening to my results of having the entire drum kit in the height layer, I think the surround “WOW” factor of separate speaker sources for separate instruments overrides the logic of not placing the entire drum kit in the height layer.

Also, my upmix philosophy has always been "listener on stage surrounded by instruments and vocals" vs. "listener out in the audience with all sounds coming at you from a front stage" (and only ambiance/reflections behind).

Tools Used


Each of the tools used has alternatives, some of which I will mention, but I will mainly focus on the tools I used for their simplicity, and because I had easy access.

In order of process:

Acoustica Premium Edition – for source separation

This uses Spleeter behind the scenes, but is a quick and easy way to generate the two stereo “stems” used in later steps: “Drums” and “Everything Else”.

Any other source separator could be used; in fact, I started with Spleeter on the command line with Anaconda, but that required an extra mixdown step to combine the Bass, Vocals, and Other stems into a single “Everything Else” stereo stem.
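
For reference, that extra mixdown step is trivial to script. Here is a minimal Python sketch using the soundfile library (stem file names are assumed from a 4-stem Spleeter run; because the stems are complementary, a plain unity-gain sum reconstructs “Everything Else”):

import soundfile as sf  # pip install soundfile

stems = ["bass.wav", "vocals.wav", "other.wav"]   # assumed Spleeter output names
arrays, rates = zip(*(sf.read(name) for name in stems))
assert len(set(rates)) == 1, "stems must share one sample rate"

n = min(len(a) for a in arrays)                # guard against off-by-one lengths
everything_else = sum(a[:n] for a in arrays)   # unity-gain sum of the stems
sf.write("everything_else.wav", everything_else, rates[0], subtype="FLOAT")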

SpecWeb 2.2 – for Stereo to 5.1 surround upmix

Used for each layer separately, with the Drums and “Everything Else” stereo stems as the two sources.

Of course, you could substitute another upmixer or maybe use one upmixer for the bed and a different one for the height layer.

In my tests I used SpecWeb 2.2 with defaults for both (more on that later).

Note: this means the resulting upmix is 5.1.4 vs. 7.1.4, so the side surrounds are empty. I could have used a Plogue Bidule Spec 7.1 layout to fully achieve 7.1.4, but I wanted the latest SpecWeb 2.2 upmix method and haven’t created a 7.1 SpecWeb version to date.

Plogue Bidule – With a layout and my plugins for:

  • Adjusting the mix or relative level between the layers
  • Downmixing the height layer from 5.1 to 4.0 (SpecWeb could have been set for quad upmix for this layer instead), plus routing the Height .1 to the Bed .1 channel
  • Blending the layers for artifact reduction
  • Creating 12-channel output (7.1.4 – side surrounds are empty)
  • Adjusting the final gain for track normalization, using built-in peak meters
  • “Zmon 7.1.4” to easily monitor (assuming you can play 7.1.4 audio live) just the height layer, just the bed layer, flip the layers, solo channels, etc.
  • Outputting 12 mono files (1 per channel)
Another audio application, such as Audition, Sound Forge, Audacity, etc., could be used, but would perhaps need multiple passes to achieve the same functionality.
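
For those who want to see the channel math behind those steps, here is a minimal numpy sketch of the downmix/blend/assembly stage. To be clear about assumptions: the SpecWeb channel order (L R C LFE Ls Rs), the -3 dB centre fold into the front heights, and the front-to-front/rear-to-rear blend mapping are my choices for illustration; only the -9 dB cross-layer blend and the Height-.1-to-Bed-.1 routing come from the description above:

import numpy as np
import soundfile as sf  # pip install soundfile

bed, sr = sf.read("bed_5p1.wav")     # upmixed "Everything Else" (file names assumed)
hgt, _  = sf.read("drums_5p1.wav")   # upmixed Drums
n = min(len(bed), len(hgt)); bed, hgt = bed[:n], hgt[:n]
db = lambda g: 10.0 ** (g / 20.0)

# Height 5.1 -> 4.0: fold the centre into the front heights at -3 dB,
# and route the height .1 into the bed .1.
h_fl = hgt[:, 0] + db(-3) * hgt[:, 2]
h_fr = hgt[:, 1] + db(-3) * hgt[:, 2]
h_bl, h_br = hgt[:, 4], hgt[:, 5]
lfe = bed[:, 3] + hgt[:, 3]

# Blend for artifact reduction: each layer hears the other at -9 dB.
b = db(-9)
out = np.stack([
    bed[:, 0] + b * h_fl,  # L
    bed[:, 1] + b * h_fr,  # R
    bed[:, 2],             # C
    lfe,                   # LFE
    bed[:, 4] + b * h_bl,  # Ls
    bed[:, 5] + b * h_br,  # Rs
    np.zeros(n),           # Lss (empty)
    np.zeros(n),           # Rss (empty)
    h_fl + b * bed[:, 0],  # Lfh
    h_fr + b * bed[:, 1],  # Rfh
    h_bl + b * bed[:, 4],  # Lbh
    h_br + b * bed[:, 5],  # Rbh
], axis=1)

out *= 0.99 / max(np.abs(out).max(), 1e-9)   # final peak normalization
sf.write("full_7p1p4.wav", out, sr, subtype="PCM_24")

for i, name in enumerate("L R C LFE Ls Rs Lss Rss Lfh Rfh Lbh Rbh".split()):
    sf.write(name + ".wav", out[:, i], sr, subtype="PCM_24")  # the 12 monos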

AudioMuxer can also split a 12-channel wav file into monos if you need that. However, I discovered that for proper file naming for a Dolby Atmos or DTS:X encoder, it is easiest to let AudioMuxer merge numbered channels to create a properly channel-mapped wav file, then split them again (resampling to 48 kHz) to get mono files named properly for drag and drop into an encoder (at least for the bed layer; I had to manually assign the height layer files to encoder input channels).



AudioMuxer – Join the 12 (7.1.4) mono files and tag the wav file with the channel mapping

This step depends on how you are going to play back. If you can’t play back a 12-channel wav file and are instead going to use a DTS:X or Dolby Atmos encoder, this could be skipped, as your encoder will want multiple mono files.

However, if you can play a 12-channel file in, say, foobar2000 to an audio interface with 12 or more channels, or with PlayPcmWin to an HDMI output, then you will absolutely need this channel mapping encoded in the 12-channel wav file.

Note: PlayPcmWin is the only application I know of that uses Windows 10’s Dolby Atmos encoding capability on a file, vs. in a game or other app (because I asked the developer for that functionality ;0)

If for some reason you don’t want to use AudioMuxer, you can do this with eac3to & ffmpeg (which is how AudioMuxer does it).
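
As a rough illustration (untested sketches with assumed file names), the two jobs look something like this in ffmpeg. Extracting the 5th channel (Ls) of a 12-channel wav as a 48 kHz mono file:

ffmpeg -i full_7p1p4.wav -af "pan=mono|c0=c4" -ar 48000 Ls.wav

and joining 12 monos into a single channel-mapped wav (FL FR FC LFE BL BR SL SR TFL TFR TBL TBR would be ffmpeg’s names for this 7.1.4 order, if I have the mapping right):

ffmpeg -i L.wav -i R.wav -i C.wav -i LFE.wav -i Ls.wav -i Rs.wav -i Lss.wav -i Rss.wav -i Lfh.wav -i Rfh.wav -i Lbh.wav -i Rbh.wav -filter_complex "join=inputs=12:channel_layout=FL+FR+FC+LFE+BL+BR+SL+SR+TFL+TFR+TBL+TBR" full_7p1p4.wav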



[Screenshot: the Plogue Bidule final mix layout]


Results to date


[Screenshot: a 12-channel file being played back in foobar2000, with waveform seekbar]

The vertical gap is the two empty Lss and Rss channels. Channel order is L, R, C, LFE, Ls, Rs, Lss, Rss, Lfh, Rfh, Lbh, Rbh

Since this is still a manual process, rather than run the 50-ish “regression” tracks I use for SpecWeb testing, I decided to use tracks I had acquired recently: mostly 96/24 from HDtracks, plus a few recent 44/16 tracks I like.

In all I’ve tested 14 songs to date, of various genres and mixing styles. They are (all stereo sources):

Bad Guy – Billie Eilish

Chick’s Chums – Chick Corea

Clocks – Coldplay

Closer to the heart – Rush

Don’t stop me now – Queen

Hey Nineteen – Steely Dan

Oh I Wept – Free

Pariah – Steven Wilson

People who eat darkness – Steven Wilson

Sgt. Pepper – The Beatles

Skyfall – Adele

The Hard Way – Paula Cole

Tonight’s the night – Neil Young

Wish You Were Here - The Wyclef Jean version ;0)

Listening tests were done both in my 7.1.4 Genelec “studio” and virtually in BBC 9.1.6 or 7.1.4 rooms via a Smyth Research Realiser A16 with Sennheiser HD 800S headphones.

Also, I listened to a DTS:X encoded AVCHD in my living room, which has a 5.2.4 system with NewWave satellites and JBL Arena subs.

Since my studio also has a (rarely used) single “Top” or “Voice of God” Genelec speaker, I also experimented with routing the center channel of the Drum 5.1 upmix to it. Results were good, but as most people don’t have that speaker in their setup, and various parts of the tool chain I used were expecting 7.1.4, I didn’t take it any further.

I am very happy with the results overall and feel that the concept test was a success and that further development is warranted (along with other immersive upmix ideas I’ve been developing).

Conclusions
As always, SpecWeb settings should be fine-tuned for each track and each layer. Setting it up to push more drum signal to the rear heights would be good as well, since with the default SpecWeb settings the majority of tracks had a front-weighted drum sound.

As mentioned earlier, SpecWeb for the Drum layer could also have been set up for 4.1 vs. 5.1, which would simplify the final mixdown.

If there is a downside to this Drum layer separation, it is that it doesn’t reduce drum or other separation artifacts. In fact, you can experience artifacts (as always) with both the upmix AND the music source separation processes. In these tests, source separation artifacts were mitigated by blending the layers such that each layer had the other layer mixed in at -9 dB.

That said, the search for a time-to-frequency-domain-and-back algorithm for SpecWeb that doesn’t occasionally suffer from “swishy” drum sounds is still ongoing (you can Google “phase vocoder artifacts” if you’re curious). If you know any audio DSP coders, let me know! I suspect it’s a solved problem for a lot of products, but unfortunately also part of their proprietary code.

Lastly, I noticed that the 7.1 downmix from the 7.1.4 DTS:X AVCHD was also very pleasing, and it occurs to me that this technique could be used in non-immersive surround systems as well, namely using different upmixers or different upmix settings for different source-separated stereo stems.

---

I can post the Plogue Bidule layout and plugins if there is interest, and/or you can PM me for audio examples.

Cheers,
Glenn
 
(From a one-song test) This appears to be a viable alternative for source separation if you don't want to fork out $$ for Acoustica or similar and/or don't want to mess with Spleeter/Open-Unmix/Python etc.:

https://phonicmind.com/
I paid $3.99 via PayPal for one song. You can get it down to $1.49 per song by paying for a "bundle".

There is no Piano option, however, and I wasn't impressed with the bass stem (compared to Spleeter), but I thought the drums were more artifact-free, though with some bass info that should have been in the bass stem.

Also this site:

Lalal.ai

claims to be better and is "coming soon", but their free sample didn't work, so it's vaporware for now.
 
A couple of (important) updates.

I've switched to demucs for drum separation. I think it has the best drum separation quality of the various choices we have today. It might not be the best at some of the other stems, but for my usage I only care about the drums stem.

https://github.com/facebookresearch/demucs
I am using the GPU and:

python.exe -m demucs.separate --dl -n demucs_extra --shifts=10 "full path to input file"

I do have one more parameter to test but the above gives the best quality to date.

Next, because we have a drums-only signal, we can attack some phase vocoder transient artifacts, if they occur (more on that in a minute).

e.g. in "Hey Nineteen" I got the dreaded smeared snare sound after upmix. However I found that I could clean it up with a Noise Gate VST, before mixing the drums upmixed height channels back in with the "Not Drums" bed channels. I'm VERY pleased with this discovery.:51QQ

Additionally, having discovered this repair ability, I went to test it on my SpecWeb torture track for this, "Rat in Mi Kitchen", which has more of an emphasized "swoosh"-sounding snare vs. a "smeared" one. However, as it turned out, the drum stem from demucs did not produce any artifacts when upmixed.

Also very cool.

Of course I will continue researching phase vocoder artifact reduction (lots of papers, but no source code :() but having ways around it is pretty huge IMO.

By the way you can help by sending me any stereo sources that produce objectionable upmix artifacts.

Lastly, what this means for those of you without immersive systems is that you could still use this drum separation technique for 5.1, by just recombining the separately upmixed 5.1 stems.
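
That recombine is just a sample-accurate sum. A minimal sketch, assuming both upmixes share SpecWeb's channel order and using hypothetical file names:

import numpy as np
import soundfile as sf  # pip install soundfile

a, sr = sf.read("drums_5p1_upmix.wav")       # hypothetical file names
b, _  = sf.read("not_drums_5p1_upmix.wav")
n = min(len(a), len(b))
mix = a[:n] + b[:n]
mix *= 0.99 / max(np.abs(mix).max(), 1e-9)   # normalize to avoid clipping
sf.write("recombined_5p1.wav", mix, sr, subtype="PCM_24")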

Note: the free VST I used as a noise gate (only a 32-bit version is available):

[Screenshot: the noise gate VST]

I chose this one because it has a sidechain input, and I fed that with the (in-sync) stereo drum stem (vs. the upmixed drum stem in the main channels).
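
For anyone who wants to experiment outside a VST host, here is a minimal Python sketch of the same sidechain-keyed gate idea: an envelope follower on the stereo drum stem keys a gate on the upmixed drum channels. The threshold, attack/release times, and file names are all made up, and a real gate would smooth the gain curve to avoid clicks:

import numpy as np
import soundfile as sf  # pip install soundfile

main, sr = sf.read("drums_upmix_5p1.wav")    # channels to be gated (name assumed)
key, _   = sf.read("drums_stereo_stem.wav")  # in-sync stereo sidechain key
n = min(len(main), len(key)); main, key = main[:n], key[:n]

# One-pole envelope follower on the mono-summed key.
k = np.abs(key).mean(axis=1)
att = np.exp(-1.0 / (0.001 * sr))   # ~1 ms attack
rel = np.exp(-1.0 / (0.100 * sr))   # ~100 ms release
env = np.empty(n)
e = 0.0
for i in range(n):
    coef = att if k[i] > e else rel
    e = coef * e + (1.0 - coef) * k[i]
    env[i] = e

# Hard gate: unity above threshold, -60 dB below (smooth this in practice).
thresh = 10.0 ** (-40 / 20)         # -40 dBFS, a made-up threshold
gain = np.where(env > thresh, 1.0, 10.0 ** (-60 / 20))
sf.write("drums_upmix_gated.wav", main * gain[:, None], sr, subtype="FLOAT")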

I can share my Plogue Bidule (32bit) layout for that if anyone is interested.
 
Thanks for all the work you’ve put into this Glenn.

I'm keen to play with 7.1.4 upmixing and contacted the PlayPcmWin dev to get the command line needed to play a file, so I can configure Kodi to use PlayPcmWin as an external audio player for 12-channel wav files.

I've tried upmixing to 7.1.4 using Penteo and Reaper's batch converter, but Reaper has a hard limit of 8 channels. I've contacted Reaper support to ask for that to be increased to 12 for wav files.

Demucs sounds interesting, when I get a chance I’ll try this out.

THX
 
OK cool. Let me know how you get on with Kodi. FYI a better link for the spatial audio player is:

https://sourceforge.net/p/playpcmwin/wiki/WWSpatialAudioPlayer/
Also FYI, I posted in the foobar2000 WASAPI output support forum about adding support for Windows Sonic (which gets us Dolby Atmos over HDMI), but I don't think anyone has responded to date. It seems like it might just need very small changes to go from generic WASAPI to Spatial WASAPI, but I don't know if the existing plugin is open source.

Re: the Reaper batch converter, I could swear others here are using it for 7.1.4. I wrote some Windows batch scripts to assist with wav tagging etc. for that.

I guess I can look and see if I have anything configured to tell (and of course I'm biased towards my own upmix methods ;)). If I find anything I'll start a separate thread.
 
My son in law just released a single on all streaming services:

"It never rains in LA" by Eric Michael Krop

(note there are several different songs with that name or similar so look for the one by him).

and I also up/remixed it for 7.1.4 DTS:X. Now I'm waiting to hear back from Tidal on whether they would be interested in a Dolby Atmos version.

Speaking of Eric, he just appeared on "I Can See Your Voice" as "The Broadway Belter".

Here's his encore version of "Somebody to Love":



and he did a duet with Jesse McCartney:



PM me if you want to check out the 7.1.4 DTS:X version of "It never rains in LA".
 
Wow, G - small world. Eric was great on the show. My wife and I both thought he was a 'good singer' from the beginning of the show. The contestant should have stayed with him through the end.
 
Can you PM me the layout for Bidule?
 
Glenn: even though I still haven't learned to use SpecWeb, I enjoy others' upmixes, and I try to keep abreast of developments. I've also been getting into Atmos, so this is exciting.

I have a slightly off-topic question; feel free to reply by PM if you prefer. I'm wondering how you succeeded in getting foobar2000 to play the overhead channels of Atmos-encoded files (and/or other object-based, like MPEG-H 3D?). So far, I've only gotten VLC, Kodi, and Windows Movies and TV, as well as my Oppos, to play such files properly (in mp4, m4a, or mkv/mka formats). Foobar sees and plays the TrueHD 5.1/7.1 core only, but doesn't seem to read or transmit the object data to my AVR. I'm wondering if there's a codec I'm missing (I have the latest ffmpeg, as well as the latest version of foobar) or a deeply buried pass-through setting that I haven't seen.
 
I am not playing Atmos-encoded files in foobar2000. I am playing 12- or 16-channel wav or WavPack files, because I have an audio interface with 16 outputs.

Yes I use VLC, instead of foobar2000, to play back files after I have Dolby Atmos or DTS:X encoded them, in that case via HDMI to an AVR.

You might want to post in the foobar2000 forum. It seems to me that, given a bit-perfect (ASIO) driver for an HDMI output, it could work in principle.

It's frustrating that the HDMI spec has support for 32 discrete channels, but to my knowledge no drivers or AVRs support more than 8.
 
You have done mind-boggling work. Great. I need some input, and I'm asking you since I presume you would have come across this during your laborious task above.

You have mentioned stem separation using Acoustica, Spleeter, etc. Can we do stem separation in Plogue or Cubase or Reaper? Or is there anything better than Acoustica for stem separation? I mean some software which can further separate what Acoustica separates as "Other". Not online.

And stereo: if we have something, can we change its image to our liking using any software? Using the above software or something else? I have some stereo imaging plug-ins, but some of the software I have doesn't allow those plug-ins, only its own built-in ones.
 
There are commercial music source separation tools and open source tools. The open source ones seem to be where all the development is, and then there are behind-the-scenes deals that turn them into commercial tools, so we don't necessarily know which open source project is behind which commercial tool.

The open source stuff is all in Python, so it is done outside of a DAW. Some of the commercial stuff might be in plugins (I don't recall at the moment), but it's not going to be in real time, so basically not in a DAW the way most plugins would go in your workflow.

Stereo --> Music Source Separation --> 4 or 5 stereo stems --> upmix one or more stems --> remix

Only the remix (or upmix and remix) job above would typically be in a DAW or a program like Plogue Bidule. Plogue Bidule is where I'm doing it, mainly because of the need (in my case) for 12 or 16 output channels.

I'm not sure what you mean by changing the stereo image, unless you are referring to the upmix process. Spreading the stereo into 5 channels (upmix) is changing its image, but in the remix you could also narrow or widen the stereo stem using traditional plugins, etc. You can also "stereoize" essentially mono signals with crossovers etc.; I typically do that with the bass stem.
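
As one illustration of the crossover trick (my own sketch, not a specific plugin chain): keep the lows mono and Haas-delay one side of the highs. The 150 Hz crossover and 12 ms delay are arbitrary choices:

import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfilt  # SciPy for the crossover filters

x, sr = sf.read("bass_stem.wav")          # file name assumed
mono = x.mean(axis=1) if x.ndim == 2 else x

# 4th-order Butterworth crossover at 150 Hz (a made-up frequency choice).
lo = sosfilt(butter(4, 150, "lowpass", fs=sr, output="sos"), mono)
hi = sosfilt(butter(4, 150, "highpass", fs=sr, output="sos"), mono)

# Haas-style decorrelation: delay one side's highs by ~12 ms.
d = int(0.012 * sr)
hi_r = np.concatenate([np.zeros(d), hi[:-d]])

out = np.stack([lo + hi, lo + hi_r], axis=1)  # lows stay centred, highs widen
sf.write("bass_stereoized.wav", out, sr, subtype="FLOAT")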

Coming back to source separation: it's a bit of a moving target and you should probably do your own evaluation, but at the moment it seems to me that:

Ultimate Vocal Remover has the best vocal stem
Demucs has the best drums stem
Spleeter has the best bass (and Piano) stems

So if you wanted the best, you'd run all of those, mix the best stems back together (with no changes to volume or pan, etc.), invert the phase, and mix that with the original stereo to get an "Other" stem, then take all the stems forward through upmix and/or remix.
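
In code, that "invert the phase and mix back" step is just a subtraction, provided the stems are sample-aligned with the original and at unity gain. A minimal sketch with hypothetical file names:

import soundfile as sf  # pip install soundfile

orig, sr = sf.read("original_stereo.wav")   # file names are hypothetical
stems = [sf.read(f)[0] for f in
         ("vocals_uvr.wav", "drums_demucs.wav", "bass_spleeter.wav")]
n = min(len(orig), *(len(s) for s in stems))

# Sum the best stems, invert the polarity, and mix with the original:
# other = original - (vocals + drums + bass)
other = orig[:n] - sum(s[:n] for s in stems)
sf.write("other_stem.wav", other, sr, subtype="FLOAT")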

A friend of mine, however, feels the latest version of Open-Unmix (umxl) is the best at everything, so if you agree, that would be simpler, as you would get all the stems from one tool with no need to craft your own "Other" stem. However, he is doing "spectral cleanup" on the stems, so that is a different workflow than I am using.

The Python-based tools require a Python environment and command-line knowledge, and can be frustrating to get going because of all the Python package dependencies. A big GPU is also helpful if you want the separations to go quickly.

I guess an exception would be SpleeterGUI, which hides all the Python stuff behind the scenes.
 

Stereo image: yes, you are right, I'm referring to narrowing or widening stereo stems with some plug-ins, not stereoizing a mono signal or upmixing to 5 channels. Can you suggest some software, open source or commercial?

And SpleeterGUI: the AVXCheck utility shows everything OK with my laptop, and it works OK for 2-part stem separation. But if I go for 4- or 5-part separation, either the system hangs halfway or, on the rare occasion it completes, it says the process completed OK but no folder or files are found; the save location I give is simply blank.

And this Open-Unmix (umxl) I could not locate. How do I get it?

As you have rightly said, some tools are good at some stems, and I would like to take the best ones from each and remix.

Thank you for your support throughout.
 