
Immersive upmixing utilizing drum source separation and SpecWeb


zeerround

Moderator
Joined
Apr 11, 2010
Messages
277
Concept


Immersive surround systems have (at least) two “layers” of speakers: a “bed” layer at ear level, and a “height” layer above (or upward-facing Dolby Atmos speakers).

Ideally the height layer would have 4 or more speakers, a pair in front and pair in back.

This creates two horizontal layers of “surround” speakers.

The idea discussed here is to combine two techniques: music source separation to create a separate stereo source for each layer, and traditional surround upmixing (in this case SpecWeb) applied to each horizontal layer independently.

I first experimented by placing the separated Drums in the height layer, but this method could also be used for placing Piano, or Piano and Drums, etc. in the height layer.

The source for the bed layer is everything except what was source separated out for the height, so in the case of Drums in the height layer, Bass, Vocals, and “Other” would be in the bed layer.

My original idea was actually to have just the cymbals in the height layer, but sadly there aren’t any pre-built source separation tools (that I could find) that can do that. However, after listening to the results with the entire drum kit in the height layer, I think the surround “WOW” factor of separate speaker sources for separate instruments overrides the logic against placing the entire drum kit in the height layer.

Also, my upmix philosophy has always been "listener on stage, surrounded by instruments and vocals" vs. "listener out in the audience, with all sounds coming at you from a front stage" (and only ambiance/reflections behind).

Tools Used


Each of the tools used has alternatives, some of which I will mention, but I will mainly focus on the tools I used, for their simplicity and because I had easy access to them.

In order of process:

Acustica Premium Edition – for source separation

This uses Spleeter behind the scenes, but is a quick and easy way to generate the two stereo “stems” used in later steps, namely “Drums” and “Everything Else”.

Any other source separator could be used; in fact, I started with Spleeter on the command line with Anaconda, but that required an extra mix-down step to mix the Bass, Vocals, and Other stems into a single “Everything Else” stereo stem.
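For reference, that extra mix-down step amounts to a sample-wise sum of the three stems. A minimal Python sketch (the sample lists stand in for real WAV data, and the function name is my own):

```python
def mix_stems(*stems):
    """Sum equal-length sample sequences into one combined stem."""
    return [sum(samples) for samples in zip(*stems)]

# Toy per-sample data standing in for the Bass, Vocals, and Other stems:
bass   = [0.10, 0.20, 0.30]
vocals = [0.05, 0.05, 0.05]
other  = [0.01, 0.02, 0.03]

everything_else = mix_stems(bass, vocals, other)  # the "Everything Else" stem
```

In practice you would read and write the stems with an audio library and watch for clipping after summation.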

SpecWeb 2.2 – for Stereo to 5.1 surround upmix

Used for each layer separately, with the Drums and “Everything Else” stereo stems as the two sources.

Of course, you could substitute another upmixer or maybe use one upmixer for the bed and a different one for the height layer.

In my tests I used SpecWeb 2.2 with defaults for both (more on that later).

Note: this means the resulting upmix is 5.1.4 vs. 7.1.4, so the side surrounds are empty. I could have used a Plogue Bidule Spec 7.1 layout to fully achieve 7.1.4, but I wanted the latest SpecWeb 2.2 upmix method and haven’t created a 7.1 SpecWeb version to date.

Plogue Bidule – With a layout and my plugins for:

  • Adjusting the mix or relative level between the layers
  • Downmixing the height layer from 5.1 to 4.0 (alternatively, SpecWeb could have been set for a quad upmix for this layer), plus routing the height .1 to the bed .1 channel
  • Blending the layers for artifact reduction
  • Creating 12 channel output (7.1.4 – Side Surrounds are empty)
  • Adjusting the final gain for track normalization, using built in peak meters
  • “Zmon 7.1.4” to easily monitor (assuming you can play 7.1.4 audio live) just the height layer, just the bed layer, flip the layers, solo channels, etc.
  • Output 12 mono files (1 per channel).
Another audio application, such as Audition, Sound Forge, Audacity, etc. could be used, but perhaps would need multiple passes to achieve the same functionality.
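The height-layer fold-down in the list above can be sketched per-sample like this (the -3 dB center split is my assumption for illustration; the actual Bidule layout may distribute the center differently):

```python
def height_51_to_40(l, r, c, lfe, ls, rs, center_gain=0.7071):
    """Fold a 5.1 height upmix down to quad (4.0): the center channel
    is split into the left/right fronts at -3 dB (gain ~0.7071); the .1
    (LFE) is returned separately so it can be summed into the bed
    layer's .1 channel."""
    front_l = l + center_gain * c
    front_r = r + center_gain * c
    return (front_l, front_r, ls, rs), lfe
```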

AudioMuxer can also split a 12-channel wav file into monos if you need that. However, I discovered that for proper file naming for a Dolby Atmos or DTS:X encoder, it is easiest to let AudioMuxer merge the numbered channels into a properly channel-mapped wav file, then split them again (and resample to 48 kHz) to get mono files named properly for drag-and-drop into an encoder (at least for the bed layer; I had to manually assign the height-layer files to encoder input channels).



AudioMuxer – Join the 12 (7.1.4) mono files and tag the wave file with channel mapping

This step depends on how you are going to play back. If you can’t play back a 12-channel wav file and are instead going to use a DTS:X or Dolby Atmos encoder, this step can be skipped, as your encoder will want multiple mono files.

However, if you can play a 12-channel file, say in foobar2000 to a 12-or-more-channel audio interface, or with PlayPcmWin to an HDMI output, then you will absolutely need this channel mapping encoded in the 12-channel wav file.

Note: PlayPcmWin is the only application I know of that uses Windows 10’s Dolby Atmos encoding capability on a file, vs. in a game or other app (because I asked the developer for that functionality ;0)

If for some reason you don’t want to use AudioMuxer, you can do this with eac3to & ffmpeg (which is how AudioMuxer does it).



Screen shot of the Plogue Bidule final mix layout:


1600740387582.png


Results to date


Screen shot of a 12ch file being played back in Foobar 2000 (with waveform seekbar)

1600740421491.png

The vertical gap is the two empty Lss and Rss channels. Channel order is L, R, C, LFE, Ls, Rs, Lss, Rss, Lfh, Rfh, Lbh, Rbh
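That channel order can be captured in a small lookup, which is handy when naming or assigning the per-channel mono files (the filename scheme below is hypothetical, not what any particular encoder requires):

```python
# 12-channel (7.1.4) order, as listed above:
CHANNEL_ORDER = ["L", "R", "C", "LFE", "Ls", "Rs",
                 "Lss", "Rss", "Lfh", "Rfh", "Lbh", "Rbh"]
channel_index = {name: i for i, name in enumerate(CHANNEL_ORDER)}

def mono_filename(base, channel):
    """Hypothetical naming scheme for per-channel mono files."""
    return f"{base}.{channel}.wav"
```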

Since this is still a manual process, rather than using the 50-ish “regression” tracks I use for SpecWeb testing, I decided to use tracks I had acquired recently: mostly 96kHz/24-bit from HD Tracks, plus a few recent 44.1kHz/16-bit tracks I like.

In all I’ve tested 14 songs to date, of various genres and mixing styles. They are (all stereo sources):

Bad Guy – Billie Eilish

Chick’s Chums – Chick Corea

Clocks – Coldplay

Closer to the heart – Rush

Don’t stop me now – Queen

Hey Nineteen – Steely Dan

Oh I Wept – Free

Pariah – Steven Wilson

People who eat darkness – Steven Wilson

Sgt. Pepper – The Beatles

Skyfall – Adele

The Hard Way – Paula Cole

Tonight’s the night – Neil Young

Wish You Were Here - The Wyclef Jean version ;0)

Listening tests were done both in my 7.1.4 Genelec “studio” and virtually in BBC 9.1.6 or 7.1.4 rooms via a Smyth Research Realiser A16 with Sennheiser HD 800S headphones.
1600740492595.png

1600740516268.png

Also, I listened to a DTS:X encoded AVCHD in my living room, which has a 5.2.4 system with NewWave satellites and JBL Arena subs.

Since my studio also has a (rarely used) single “Top” or “Voice of God” Genelec speaker, I also experimented with routing the center channel of the Drum 5.1 upmix to it. Results were good, but as most people don’t have that speaker in their setup, and various parts of the tool chain I used were expecting 7.1.4, I didn’t take it any further.

I am very happy with the results overall and feel that the concept test was a success and further development is warranted (along with other immersive upmix ideas I’ve been developing).

Conclusions
As always, SpecWeb settings should have been fine-tuned for each track and each layer. Setting it up to push more drum signal to the rear heights would also be good, as the majority of tracks had a front-weighted drum sound with the default SpecWeb settings.

As mentioned earlier, SpecWeb for the Drum layer could also have been set up for 4.1 vs. 5.1, which would simplify the final mix-down.

If there is a downside to this drum-layer separation, it is that it doesn’t reduce drum or other separation artifacts. In fact, you can experience artifacts (as always) with both the upmix AND the music source separation processes. In these tests, source separation artifacts were mitigated by blending the layers such that each layer had the other layer mixed in at -9 dB.
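The -9 dB cross-blend works out to a linear gain of about 0.355. A minimal per-sample sketch (function names are mine; the real mixing happens inside the Bidule layout):

```python
def db_to_gain(db):
    """Convert a decibel value to linear amplitude gain."""
    return 10 ** (db / 20.0)

def blend_layers(bed, height, bleed_db=-9.0):
    """Mix each layer with the other at bleed_db to mask
    source separation artifacts."""
    g = db_to_gain(bleed_db)  # -9 dB -> ~0.355 linear
    bed_out    = [b + g * h for b, h in zip(bed, height)]
    height_out = [h + g * b for b, h in zip(bed, height)]
    return bed_out, height_out
```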

That said, the search for a time-to-frequency-domain-and-back algorithm for SpecWeb that doesn’t occasionally suffer from “swishy” drum sounds is still ongoing (you can Google “phase vocoder artifacts” if you’re curious). If you know any audio DSP coders, let me know! I suspect it’s a solved problem for a lot of products, but unfortunately it’s also part of their proprietary code.

Lastly, I noticed that the 7.1 downmix from the 7.1.4 DTS:X AVCHD was also very pleasing, and it occurs to me that this technique could be used on non-immersive surround systems as well: namely, using different upmixers, or different upmix settings, for different source-separated stereo stems.

---

I can post the Plogue Bidule layout and plugins if there is interest, and/or you can PM me for audio examples.

Cheers,
Glenn
 

zeerround

Moderator
(From a one-song test) This appears to be a viable alternative for source separation if you don't want to fork out $$ for Acustica or similar, and/or don't want to mess with Spleeter/Open-Unmix/Python, etc.:


I paid $3.99 via PayPal for one song. You can get it down to $1.49 per song by paying for a "bundle".

There is no Piano option, however, and I wasn't impressed with the bass stem (compared to Spleeter), but I thought the drums were more artifact-free, though with some bass info that should have been in the bass stem.

Also this site:

Lalal.ai

claims to be better and is "coming soon", but their free sample didn't work, so it's vaporware for now.
 

zeerround

Moderator
A couple of (important) updates.

I've switched to demucs for drum separation. I think it has the best drum separation quality of the various choices we have today. It might not be best at some of the other stems, but for my usage I only care about the drum stem.


I am using GPU and:

python.exe -m demucs.separate --dl -n demucs_extra --shifts=10 "full path to input file"

I do have one more parameter to test but the above gives the best quality to date.

Next, because we have a drums only signal, we can attack some Phase Vocoder transient artifacts, if they occur (more on that in a minute).

e.g. in "Hey Nineteen" I got the dreaded smeared snare sound after upmix. However, I found that I could clean it up with a noise gate VST before mixing the upmixed drum height channels back in with the "Not Drums" bed channels. I'm VERY pleased with this discovery.

Additionally, having discovered this repair ability, I went to test it on my SpecWeb torture track for this: "Rat in Mi Kitchen", which has more of an emphasized "swoosh"-sounding snare vs. "smeared". However, as it turned out, the drum stem from demucs did not produce any artifacts when upmixed.

Also very cool.

Of course I will continue research into phase vocoder artifact reduction (lots of papers, but no source code :(), but having ways around it is pretty huge, IMO.

By the way you can help by sending me any stereo sources that produce objectionable upmix artifacts.

Lastly, what this means for those of you without immersive systems is that you could still use this drum separation technique for 5.1, by simply recombining the separately upmixed 5.1 stems.
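Recombining for a 5.1-only target is just a channel-by-channel sum of the two upmixed programs. A sketch (channel names per the usual 5.1 layout; in practice, watch for clipping after summing):

```python
CHANNELS_51 = ["L", "R", "C", "LFE", "Ls", "Rs"]

def recombine_51(drums_upmix, rest_upmix):
    """Sum two separately upmixed 5.1 programs, channel by channel,
    into a single 5.1 result."""
    return {ch: [d + r for d, r in zip(drums_upmix[ch], rest_upmix[ch])]
            for ch in CHANNELS_51}
```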

Note: the free VST I used as a noise gate (only a 32-bit version is available):

1602181502727.png

I chose this one because it has a side-chain input, and I fed that with the (in-sync) stereo drum stem (vs. the upmixed drum stem in the main channels).
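The side-chain keying idea can be illustrated per-sample like this (a real gate uses an envelope follower with attack/release times; the threshold and floor values here are arbitrary illustrations):

```python
def sidechain_gate(main, key, threshold=0.05, floor=0.1):
    """Attenuate 'main' (an upmixed drum channel) to 'floor' gain
    wherever the key signal (the clean drum stem, mono here) is below
    the threshold -- the key, not the main signal, opens the gate."""
    return [s if abs(k) >= threshold else s * floor
            for s, k in zip(main, key)]
```

Keying from the un-upmixed stem, rather than the gated signal itself, means the gate closes between hits even if the upmixed channel still carries smeared energy there.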

I can share my Plogue Bidule (32bit) layout for that if anyone is interested.
 

HomerJAU

Moderator: MCH Media Players
Staff member
Joined
Jun 13, 2013
Messages
3,471
Location
Melbourne, Australia
Thanks for all the work you’ve put into this, Glenn.

I’m keen to play with 7.1.4 upmixing and contacted the PlayPCMWin dev to get the command line needed to play a file, so I can configure Kodi to use PlayPCMWin as an external audio player for 12-channel wav files.

I’ve tried upmixing 7.1.4 using Penteo and Reaper’s batch converter, but Reaper has a hard max limit of 8 channels. I’ve contacted Reaper support to ask for that to be increased to 12 for wav files.

Demucs sounds interesting, when I get a chance I’ll try this out.

THX
 

zeerround

Moderator
OK, cool. Let me know how you get on with Kodi. FYI, a better link for the spatial audio player is:


Also FYI, I posted in the foobar2000 WASAPI output support forum about adding support for Windows Sonic (which gets us Dolby Atmos over HDMI), but I don't think anyone has responded to date. It seems like it might just need very small changes to go from generic WASAPI to Spatial WASAPI, but I don't know if the existing plugin is open source.

Re: Reaper batch convert, I could swear others here are using it for 7.1.4. I wrote some Windows batch scripts to assist in wav tagging etc. for that.

I guess I can look and see if I have anything configured to tell (and of course I'm biased towards my own upmix methods ;)). If I find anything I'll start a separate thread.
 

zeerround

Moderator
My son in law just released a single on all streaming services:

"It never rains in LA" by Eric Michael Krop

(note there are several different songs with that name or similar so look for the one by him).

and I also up/remixed it for 7.1.4 DTS:X. Now I'm waiting to hear back from Tidal on whether they would be interested in a Dolby Atmos version.

Speaking of Eric he just appeared on "I Can See Your Voice" as "The Broadway Belter".

Here's his encore version of "Somebody to Love":


and he did a duet with Jesse McCartney:


PM me if you want to check out the 7.1.4 DTS:X version of "It never rains in LA".
 

Wagonmaster_91

400 Club - QQ All-Star
Since 2002/2003
Joined
Mar 11, 2002
Messages
436
Location
Dallas,TX
Wow, G - small world. Eric was great on the show. My wife and I both thought he was a 'good singer' from the beginning of the show. The contestant should have stayed with him through the end.
 

zeerround

Moderator
Yes the tears you saw were Eric's sadness that she didn't win the $100K for her son and students.
 