New Opensource tool "Spleeter" for extracting stems

QuadraphonicQuad

boondocks

Senior Surround Collector
QQ Supporter
Joined
Apr 19, 2006
Messages
2,761
Location
State of Being
So following zcftr29's lead I installed miniconda. I already had ffmpeg installed and entered into the path. I then installed (er caused to be installed) Spleeter.
Same as before, I used the default separation and it worked perfectly. Next go-around I tried a 4 stem separation and suddenly I'm getting messages that make me scratch my head, like this one where I did a simple help:
(base) E:\Downloads>spleeter separate -h
Fatal error in launcher: Unable to create process using '"d:\bld\spleeter_1586941290053\_h_env\python.exe" "c:\users\gj\miniconda3\Scripts\spleeter.exe" separate -h': The system cannot find the file specified.
Now, I have no idea wth this "d:\bld\spleeter_..." stuff is, as it does not exist on my d: drive. The second path is indeed the miniconda directory, and all of that is correct.
So now even a reboot does not help, so this "d:\bld\spleeter_..." reference must be coming from somewhere, right?
So weird.
 

zeerround

Moderator
Staff member
Moderator
QQ Supporter
Joined
Apr 11, 2010
Messages
803
Please try adding:

python -m

to the front of that, so:

python -m spleeter separate -h

This is a known issue for some on Windows, needing python -m in front of spleeter...
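If you drive Spleeter from a script, the same workaround can be sketched in Python. This is only a sketch: build_spleeter_command and run_spleeter are made-up helper names, and the only Spleeter syntax used is the "separate -h" call quoted above.

```python
import subprocess
import sys


def build_spleeter_command(*args):
    """Return a command list that runs Spleeter as a module.

    sys.executable is the interpreter running this script, so the call
    goes through "python -m spleeter" instead of the spleeter.exe
    launcher that reports "Unable to create process".
    """
    return [sys.executable, "-m", "spleeter", *args]


def run_spleeter(*args):
    """Run Spleeter and return its exit code (assumes Spleeter is installed)."""
    return subprocess.run(build_spleeter_command(*args)).returncode


# Usage, equivalent to typing "python -m spleeter separate -h":
#     run_spleeter("separate", "-h")
```

Run it from the same conda environment Spleeter was installed into, otherwise the interpreter will not find the module.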
 

sjcorne

Moderator
Staff member
Moderator
Joined
Jan 1, 2010
Messages
5,006
Location
Washington, D.C.
I found an interesting use for spleeter while working on a quad vinyl conversion earlier today.

As many quad fans know, the song "You're My Home" on the Piano Man quad mix suffers from a buried lead vocal (both the SQ LP and Q8 tape have the problem - I guess someone forgot to push up a fader during the mixing session?). Anyway, I took my decoded front channels and ran them through spleeter to generate an isolated lead vocal, boosted it a few dB, and mixed it back into the original fronts. Voila - Billy Joel is back in the mix!

Old (Buried Vocals):
BJ_YMH_1.jpg


New (Restored Vocals):
BJ_YMH_2.jpg
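For anyone wanting to do the same outside an audio editor, the arithmetic behind "boost it a few dB and mix it back in" can be sketched as below. It assumes the isolated vocal is sample-aligned with the original and that channels are plain lists of float samples; db_to_gain and boost_vocal are illustrative names, not part of any tool mentioned here.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def boost_vocal(original, vocal, boost_db):
    """Raise the vocal inside `original` by roughly `boost_db` decibels.

    The vocal is already present in `original` at unity gain, so only the
    extra (gain - 1) share of the isolated stem is added on top.
    """
    extra = db_to_gain(boost_db) - 1.0
    return [o + extra * v for o, v in zip(original, vocal)]


# Tiny made-up example: three samples of one front channel.
backing = [0.1, -0.2, 0.05]
vocal = [0.02, 0.03, -0.01]
original = [b + v for b, v in zip(backing, vocal)]
boosted = boost_vocal(original, vocal, 6.0)  # vocal ends up about 6 dB louder
```

Since a separated stem is never a perfect copy of the vocal in the mix, the real boost will only be approximately the figure you ask for.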
 

JonUrban

Forum Curmudgeon
Staff member
Admin
Moderator
Since 2002/2003
Joined
Mar 2, 2002
Messages
17,236
Location
Connecticut
Good job! That's the kind of cool thing to do with this stuff that becomes a new tool for sprucing up your files.
You could have also put it in the center channel. I have done that with a lot of my quad conversions: extract the lead vocal and put it in the center, as it really fills in the void on a 5.1 system, especially in the car.
 

staygroovy

500 Club - QQ All-Star
Joined
Oct 18, 2009
Messages
513
Location
NYC, NY
I tried this version, but couldn't get it to work. Maybe someone else will have better luck...
None of the GUI options I tried for Spleeter worked, and I didn't dwell on this for long. This may have changed since, as it was back around November 2019.

For those who are interested in trying Spleeter out once it's installed locally on your machine, there are many tutorial videos that show in detail how to install it; here is the one I followed (Windows). It's really well explained. Just make sure that the new 16kHz libraries and improved pre-trained models are installed. Mac users will have some more hoops to go through, because they have to deal with Anaconda in order to get Python running, but I was told it's not too complex, and again there are many online resources available to consult for this.

Command line is no problem, the first time it's a bit of a pain but then (since PowerShell remembers your commands) just toggle cursor up, recall the last good command, edit the new name to match the song, press [Enter] and off you go... you'll never have to type all of this complex stuff again, just change the name of the song. If this still scares you too much, you can upload to this web site and they email you a download link for the stems from what I remember.

[caveat] Spleeter needs a fast computer, and I find that even with 32 Gigs of RAM I can't run any other apps when doing long songs.
(Max: under 10 minutes, if your track is longer you'll have to do it in segments and rejoin them later, I find that Adobe Audition multitrack session mode works best for this)
Also if you mess up, remember to delete the sub-folder that was generated in the 'splits' folder otherwise the next time you try for a song that was already attempted, it'll throw an error as the folder for that song already exists.

First, once in PowerShell, you will need to navigate to the directory where your audio files are, so that the path for the files doesn't need to be specified.
Then enter the following command. (Keep in mind I don't have spaces between words in the file name. Any audio format and sample rate works, like wav, flac, etc.; output will always be Red Book audio.)
Code:
> spleeter separate -i NameOfSong.wav -p spleeter:5stems-16kHz -o splits --verbose
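The "delete the old sub-folder first" step can be automated with a small wrapper. This is a rough sketch under two assumptions: that the sub-folder in 'splits' is named after the song file without its extension, and that the python -m form suggested earlier in the thread is used. prepare_separation is a made-up helper name; the flags are exactly the ones in the command above.

```python
import shutil
import sys
from pathlib import Path


def prepare_separation(song, output_dir="splits", model="spleeter:5stems-16kHz"):
    """Clear a leftover output folder for `song`, then return the command to run."""
    song = Path(song)
    # Assumption: results land in <output_dir>/<file name without extension>.
    stale = Path(output_dir) / song.stem
    if stale.is_dir():
        shutil.rmtree(stale)  # remove the result of an earlier attempt
    return [sys.executable, "-m", "spleeter", "separate",
            "-i", str(song), "-p", model, "-o", str(output_dir), "--verbose"]


# Usage (needs Spleeter installed):
#     import subprocess
#     subprocess.run(prepare_separation("NameOfSong.wav"))
```

Be careful with the output folder name: shutil.rmtree deletes the whole sub-folder without asking.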
 

zcftr29

Active Member
Joined
Jun 18, 2012
Messages
85
Looking through the issues and project files yesterday on spleeter, it sounds like a lot of training options will be in the next version.

I also wondered about training, as "we" wouldn't have the constraints around only using open rights music, etc., but I also wasn't sure if "we" could put together more/enough high quality stereo with matching stems/multitracks or not (and keep in mind that Rock Band/ogg files don't count as high quality).

Re: Acoustica, from what my re-mixer friend has said, and what has been mentioned here, I think that (maybe) extracting one source at a time is going to yield cleaner results. However, if the goal is to put back all the separations and end up with the original (and I have done those listening tests) splitting them all at once may be the more valid approach. AFAIK the "other" is generated by summing the separations back together and subtracting from the stereo.
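To make the "AFAIK" part concrete, here is a minimal sketch of a residual stem: sum the separated stems and subtract that from the mix. It illustrates the description above only and is not a statement about how any particular tool computes its "other" output; residual_stem and the sample values are made up.

```python
def residual_stem(mix, stems):
    """Return the mix minus the sum of the given stems, sample by sample."""
    return [m - sum(parts) for m, parts in zip(mix, zip(*stems))]


# One channel, three samples (values chosen to be exact in binary floats).
mix = [1.0, 0.5, -0.25]
vocals = [0.5, 0.25, 0.0]
drums = [0.25, 0.0, -0.25]
other = residual_stem(mix, [vocals, drums])

# Adding every stem back together, residual included, rebuilds the mix.
rebuilt = [v + d + o for v, d, o in zip(vocals, drums, other)]
```

This is also why putting all the separations back together returns the original: whatever the named stems missed ends up in the residual.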

Anyway, for the purposes of remixing/upmixing the normal artifact reduction stuff applies.

1) Don't stress (too much) if you hear it only in isolation
2) Mixing a little of the adjacent channel and/or the original stereo is a trade-off between artifacts and separation.
3) But keep in mind that separation helps things stand out in the mix (surround "WOW" factor)

All that said, and all the issues with this one tool aside, I'm still curious to see what people will do, in terms of upmixing, with the current toolsets.

It would be different story, for me, if you got each drum/cymbal, or each guitar, etc. as separate stems or multitracks, but as it is I'm not sure of the added value for stereo to 5.1

One could take each stem, and upmix it with specweb, using different image widths and rotations, vs. just assigning a stem to C, fronts, or rears, and I haven't done a lot of that, so curious with what others will come up with.

Rotation would let you, say, flip the drum upmix around so the "center" is virtual rear center, or pan an upmixed stem left or right in the surround field.

I have been thinking about some sort of visualization tool, that shows you the LR angle of energy throughout the song, then having that as a panner/mixing tool for the (upmixed) stems. Hard to describe in words.

Although they might get there eventually, I think separated stems are currently a long way off sufficient quality to use as the basis of a multitrack remix - something which is quite different from an upmix. IMO upmixes should 'feel' like an expanded, more immersive version of the original stereo mix - bringing out details that might have been buried before.

One way of bringing stems into the 'traditional' upmix process (eg SpecWeb, Penteo, Neural etc) could be to split each channel into stems post initial upmixing in order to play with the levels in each speaker (or pair of speakers if you deal with fLR and sLR in tandem).

As per my original post, I initially started using Spleeter exclusively to create a focused centre channel on what were previously quad upmixes. As noted by Jon above, it can also be used to good effect to enhance official quad mixes too. For most albums I've tried this on so far, the automated batch file method works really well and required little manual effort.

Over the last week or so, I have been revisiting my Queen upmixes (all time favourite band!) and giving them a bit more of a manual touchup... Using the CentreCutGui centre channel as my starting point, I have split to both 2Stems and 5Stems (because the vocals are generally cleaner on the 2Stem model) and then brought them into Audition to create a more edited front channel as follows:

1) Invert the 2Stem vocals and add to the 5Stem vocals (Mix Paste) - this leaves only the differences. This is just a check to see if there is anything in the 5Stem vocals that I do want to keep that did not make it into the 2Stem vocals. Usually there is just noise, but occasionally there is some residue that I then mix back into the 2Stem vocals (without inverting)

2) Clean up the 2Stem vocals - silencing bits where I know there is no vocal action (sometimes Spleeter gives a false positive to bits of the lead guitar). I also remove other bits I don't want - eg backing vocals - leaving just the main lead vocal. Save this as a new 'clean' stem.

3) Invert and mix paste the clean vocals back into the source file - this leaves everything else...

4) Now I listen carefully to the file from step 3 and see if there are any particular highlights (eg solos) where there are one or more discrete instruments. This happens quite a lot with the Queen stuff where the multilayered guitar parts separate out into different speakers.

5) Unless there is a particular section that really warrants the drums or bass guitar, I then subtract these two stems from the file from steps 3/4 (again mix paste/invert) and do a similar exercise to step 2 to leave behind just the bits I want.

6) Finally, combine all the stuff I want in the centre channel into one file. To mask any artefacts, I then take the combined/edited centre - mix paste/inverted back into the source file to again leave everything else - and then mix paste 'everything else' back into the edited centre channel - not inverted, but on 10% so that it is 'just' audible. That's the centre channel finished.

7) To create the fL+R channels, I then mix paste/invert the finished centre channel back into both front channels, removing all the stuff that is now in the centre to avoid any 'doubling up'. This also means that a fold down to stereo should bring me back to where I started.
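For those who think better in numbers, steps 6 and 7 boil down to adding and subtracting samples. The sketch below treats each file as a list of float samples and assumes that "on 10%" means a linear level of 0.10 and that the fold-down adds the centre back at full level; mix_paste, finish_centre and remove_from_front are illustrative names, not Audition functions.

```python
def mix_paste(target, clip, level=1.0, invert=False):
    """Mimic 'Mix Paste': add `clip` into `target` at `level`, optionally inverted."""
    sign = -1.0 if invert else 1.0
    return [t + sign * level * c for t, c in zip(target, clip)]


def finish_centre(source, edited, bleed=0.10):
    """Step 6: mix a little of 'everything else' back under the edited centre."""
    everything_else = mix_paste(source, edited, invert=True)
    return mix_paste(edited, everything_else, level=bleed)


def remove_from_front(front, centre):
    """Step 7: subtract the finished centre from one front channel."""
    return mix_paste(front, centre, invert=True)


# Made-up samples: the centre source, the edited centre and one front channel.
source = [0.5, -0.25, 0.125]
edited = [0.25, 0.0, 0.125]
front_left = [0.75, -0.5, 0.25]

centre = finish_centre(source, edited)
new_front_left = remove_from_front(front_left, centre)
fold_down = mix_paste(new_front_left, centre)  # should match front_left again
```

The last line is the fold-down check from step 7: whatever was taken out of the front channel sits in the centre, so adding the two gives back the starting point.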

Going back to how this could be worked into SpecWeb, after the initial separation, you could do a second pass where you can selectively move stuff from front to back using the stems. EG if you thought there was too much vocal in the rears but still wanted to have some, you could invert the vocal stem from the rear channels and mix in 50%, and simultaneously mix 50% of the non-inverted rear vocal stem into the front channels. At the extreme, this could work on a channel by channel basis - moving specific stems from one speaker to another by way of sliders.
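The slider idea can be sketched the same way. Here move_stem is a hypothetical helper, channels are lists of float samples, and `amount` stands in for the slider position (0.5 for the 50% example above).

```python
def move_stem(src_channel, dst_channel, stem, amount):
    """Move `amount` (0.0 to 1.0) of `stem` from src_channel to dst_channel."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0.0 and 1.0")
    # Subtracting is the same as mixing in the inverted stem.
    new_src = [s - amount * v for s, v in zip(src_channel, stem)]
    new_dst = [d + amount * v for d, v in zip(dst_channel, stem)]
    return new_src, new_dst


# Made-up samples: move half of the rear vocal stem to the front.
rear = [0.5, 0.25]
front = [0.25, 0.5]
rear_vocal = [0.5, 0.0]
new_rear, new_front = move_stem(rear, front, rear_vocal, 0.5)
```

Because the same amount is removed from one channel and added to the other, the sum of the two channels is unchanged, so a fold-down still matches the original.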
 

winopener

2K Club - QQ Super Nova
Since 2002/2003
Joined
Mar 2, 2002
Messages
4,110
[caveat] Spleeter needs a fast computer, and I find that even with 32 Gigs of RAM I can't run any other apps when doing long songs.

Better with a dedicated machine... even if it's low-power and takes a long while to stem a song, you can meanwhile do something else on another machine. An SSD is a must, since it swaps to disk all the time.
 

boondocks

Senior Surround Collector
QQ Supporter
Joined
Apr 19, 2006
Messages
2,761
Location
State of Being
Sounds like you have a good workflow. I have let my Audition skills pretty much drift away in recent years, time to revisit.
So I'm seeing that regardless of the input sample rate to Spleeter, it only kicks out 44.1 kHz. Hopefully that will be changed at some point, along with overall improvement.
Would really like a GUI that works.
 

J. PUPSTER

💿🐕 Senior Disc Chaser 🎸
QQ Supporter
Joined
May 30, 2017
Messages
11,055
Location
CALIFORNIA (CENTRAL)
Better with a dedicated machine... even if it's low-power and takes a long while to stem a song, you can meanwhile do something else on another machine. An SSD is a must, since it swaps to disk all the time.
That’s the main reason I love my NUC, small but powerful, does most of my audio processing now, sits next to my main computer. One of the best purchases I ever made!
 

zcftr29

Active Member
Joined
Jun 18, 2012
Messages
85
Better with a dedicated machine... even if it's low-power and takes a long while to stem a song, you can meanwhile do something else on another machine. An SSD is a must, since it swaps to disk all the time.
My machine is very modest and showing its age (about 8-9 years old, I think) - Core i5 2500, 16GB RAM, Nvidia Quadro K620 2GB graphics card - and it gets through an album in about 25 minutes, including all the splitting of files into 20 second chunks and then stitching them back together again. The actual Spleeter bit only takes about 8 minutes... And that's both 2 and 5 stems.
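The chunk-and-stitch idea can be sketched with Python's built-in wave module. This is not the batch file mentioned above, just a minimal illustration that only handles uncompressed WAV files; split_wav and join_wavs are made-up names.

```python
import wave


def split_wav(path, chunk_seconds, prefix):
    """Write consecutive chunks of `path` as prefix_000.wav, prefix_001.wav, ..."""
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = f"{prefix}_{len(chunk_paths):03d}.wav"
            with wave.open(chunk_path, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)  # header is fixed up on close
            chunk_paths.append(chunk_path)
    return chunk_paths


def join_wavs(chunk_paths, out_path):
    """Concatenate chunks (all in the same format) back into a single file."""
    if not chunk_paths:
        raise ValueError("no chunks to join")
    with wave.open(out_path, "wb") as dst:
        for i, chunk_path in enumerate(chunk_paths):
            with wave.open(chunk_path, "rb") as src:
                if i == 0:
                    dst.setparams(src.getparams())
                dst.writeframes(src.readframes(src.getnframes()))


# Usage: chunks = split_wav("NameOfSong.wav", 20, "chunk")
#        ...separate each chunk, then join the matching stems in order...
#        join_wavs(chunks, "rejoined.wav")
```

Splitting and rejoining without any processing in between gives back the same audio; whether the joins stay inaudible after each chunk has been separated on its own is something to check by ear.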
 

zeerround

Moderator
Staff member
Moderator
QQ Supporter
Joined
Apr 11, 2010
Messages
803
Sounds like you have a good workflow. I have let my Audition skills pretty much drift away in recent years, time to revisit.
So I'm seeing that regardless of the input sample rate to Spleeter, it only kicks out 44.1 kHz. Hopefully that will be changed at some point, along with overall improvement.
Would really like a GUI that works.

If you look at the help, there is a bit rate flag. Bit depth is also answered (not very clearly) in some of the issue tracking. But I don't know if processing happens at the input rate or only at 44.1 kHz.
 

skherbeck

2K Club - QQ Super Nova
QQ Supporter
Joined
Apr 15, 2015
Messages
3,346
Location
SoCal, USA
A lot of Rockband stems have all the vocals (leads and backgrounds) stacked on top of each other in a single stem. Would a tool like this be able to split them up?
 

zcftr29

Active Member
Joined
Jun 18, 2012
Messages
85
A lot of Rockband stems have all the vocals (leads and backgrounds) stacked on top of each other in a single stem. Would a tool like this be able to split them up?
Spleeter won't do it, but if the vocal stems are stereo, CentreCutGui might be able to separate them into lead in the centre and backing to the sides
 

skherbeck

2K Club - QQ Super Nova
QQ Supporter
Joined
Apr 15, 2015
Messages
3,346
Location
SoCal, USA
Spleeter won't do it, but if the vocal stems are stereo, CentreCutGui might be able to separate them into lead in the centre and backing to the sides
I honestly don’t understand most of this thread, lol... is this something a novice might be able to use, and if so, where might I find it? 😁
 

J. PUPSTER

💿🐕 Senior Disc Chaser 🎸
QQ Supporter
Joined
May 30, 2017
Messages
11,055
Location
CALIFORNIA (CENTRAL)
Spleeter won't do it, but if the vocal stems are stereo, CentreCutGui might be able to separate them into lead in the centre and backing to the sides
I'm wondering if this is one potential way to get around that:
In Audacity, highlight the track, go to Edit and choose Duplicate; then highlight both tracks and, in the dropdown box on the left of a track, choose Make Stereo Track?
 

zcftr29

Active Member
Joined
Jun 18, 2012
Messages
85
CenterCutGui can be downloaded from here: http://www.moitah.net/download/latest/Center_Cut_GUI.zip It's very easy to use, either via a command line or it also has a GUI (hence its name). RE Pupster's comment, duplicating a mono track to make it stereo won't work as CentreCutGui extracts the 'phantom' centre channel (ie stuff that is equally loud in both channels of a stereo mix) and outputs two files - stereo 'sides' and a mono centre. Depending on how the vocal stems are mixed, you might then get lead vocal in the mono track and backing vocals in stereo.
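A toy example shows why the duplicated mono track gives nothing to work with. The sketch below is a crude time-domain stand-in (plain averaging), not Center Cut's actual method, and split_centre is a made-up name; it only illustrates the "equally loud in both channels" point.

```python
def split_centre(left, right):
    """Crude stand-in for centre extraction (NOT Center Cut's real algorithm).

    The centre is what the two channels share on average; the sides are
    whatever is left in each channel after removing that.
    """
    centre = [(l + r) / 2 for l, r in zip(left, right)]
    side_left = [l - c for l, c in zip(left, centre)]
    side_right = [r - c for r, c in zip(right, centre)]
    return centre, (side_left, side_right)


# A mono vocal stem duplicated into both channels of a "stereo" track.
mono = [0.5, -0.25, 0.125]
centre, (side_left, side_right) = split_centre(mono, mono)
# centre is the whole track again and both sides are silent.
```

With identical channels there is no difference between left and right, so everything counts as centre and the backing vocals cannot end up in the sides.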
 

zeerround

Moderator
Staff member
Moderator
QQ Supporter
Joined
Apr 11, 2010
Messages
803
Just FYI, the "slice" mode of Spec and SpecWeb was modeled after center cut, but it uses multiple stages to go from stereo to 5.1.
 

jeffko

Jeffko!
QQ Supporter
Since 2002/2003
Joined
Sep 7, 2003
Messages
75
Location
Bulgaria or LAS VEGAS
I am having a lot of fun playing with this. Does anyone see an easy way to save the stems as individual files: vocals, piano, drums, bass and other? It is incredible how good a job the AI does on the vocals. So far I have muted the other 4 out of 5 stems and then saved the unmuted stem. I am really looking forward to the AI plugins that will give us guitar stems, organ stems, saxophone stems and, most importantly, a crowd removal AI for concert recordings.
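If the command line version is used rather than a GUI, each stem should already be written as its own file. The sketch below lists where to look, assuming the layout is one sub-folder per song inside the output folder with one file per stem; that layout and the stem_paths helper are assumptions for illustration, so check your own 'splits' folder.

```python
from pathlib import Path

# The five stems named in the post above.
FIVE_STEMS = ("vocals", "piano", "drums", "bass", "other")


def stem_paths(song, output_dir="splits", codec="wav"):
    """List where each stem file is expected after a 5-stem separation."""
    # Assumption: <output_dir>/<song name without extension>/<stem>.<codec>
    folder = Path(output_dir) / Path(song).stem
    return {name: folder / f"{name}.{codec}" for name in FIVE_STEMS}


# Usage: print the expected location of each stem for one song.
for name, path in stem_paths("NameOfSong.wav").items():
    print(name, "->", path)
```

Each of those files can then be opened or copied on its own, with no muting needed.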
 