Surround Virtualization for Headphones


zeerround

So this is coming from some discussion of Surround Virtualization in a thread on Apple's Atmos Headphones:

https://www.quadraphonicquad.com/fo...tmos-on-apple-headphones-actually-work.29544/
It got me motivated to (re)test a few different solutions, share my results, and see what others think as well.

First off, two things about me:

1) I have non-average ears, in that before Waves Nx and the Smyth Realiser A16 (both of which have head tracking), nothing ever gave me a convincing sense of depth, regardless of what product or HRTF (Head-related transfer function - Wikipedia) I tried. So you may have much better luck with a given solution than I have, depending on your ear shape, etc.

2) I own and use every day a Smyth Realiser A16 with Sennheiser HD 800 S headphones, so I'm pretty invested (to put it mildly) in that solution, and I do believe it to be the ultimate solution. So, yeah, I will admit to some bias in that regard.

Here are the products/solutions I have used in the last week:

Product | Channels | Surround | Head Tracking
Realiser A16 | 16 | 7.1.4 | Yes
Facebook 360 VST + Oculus Rift | Ambisonic | Ambisonic | Yes
Waves Nx | 6, 8, or Ambisonic | 5.1, 7.1, or Ambisonic | Yes*
Dolby Atmos for Headphones | 10 + objects | 7.1.2 plus objects | No**
DTS Headphone:X | 12 + objects | 7.1.4 plus objects | No
HeSuVi & Equalizer APO | 8 | 7.1 | No
Others are not worth mentioning…
* Waves Nx offers both a wireless tracking sensor and tracking via a webcam with facial recognition. I actually found the webcam worked best.
** Apple's latest have head tracking (not tested by me), but only when "watching" content on an iPhone or iPad (iMac??).

Again, the Realiser A16 is so good you honestly can't tell the difference between headphones and no headphones (other than the weight of the headphones on your head).

The only missing piece is that it doesn't yet do 6DOF (6 degrees of freedom: virtual speakers would get louder as you move toward them and softer as you move away). Also, in terms of decoders, only Dolby and Dolby Atmos are working currently (even though the data sheet mentions DTS, DTS:X, Auro-3D, etc.; those are "coming").

I added the Oculus Rift VR stuff because it does have 6DOF, but it's not really practical for just listening to music. None of the VR theaters seem to have Atmos or DTS:X decoding yet. I guess there are some ways to do multichannel playback using VR game development frameworks, but nothing user-friendly for surround at this time.

What I have done is use 12 instances of the Facebook 360 VST in Plogue Bidule, each one panned to a 7.1.4 speaker position, to play 7.1.4 multichannel WAVs. Inside the VR you see a grid pattern with glowing balls for each channel, which get bigger with louder sounds. You can walk around physically or, using the touch controllers, virtually. You can also grab a sound and move it...
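
For reference, here is a minimal sketch of the kind of panning positions involved. These are typical home-Atmos angles, not a spec, and the channel labels are just mine; exact placement of each instance is up to you:

```python
# Illustrative 7.1.4 panning targets, as (azimuth, elevation) in degrees.
# 0 azimuth = straight ahead, positive = to the right.
POSITIONS = {
    "FL": (-30, 0),   "FR": (30, 0),    "C": (0, 0),       "LFE": (0, 0),
    "SL": (-90, 0),   "SR": (90, 0),    "BL": (-135, 0),   "BR": (135, 0),
    "TFL": (-45, 45), "TFR": (45, 45),  "TBL": (-135, 45), "TBR": (135, 45),
}
```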

Waves Nx is also more a production tool than a casual playback device, and you do need DAW software to run it since it is a plugin. But as far as capability goes, you can feed it 5.1, stereo, or Ambisonics and get binaural (headphone) or Ambisonic output. It has presets for some common headphones and settings for the distance between your ears and the circumference of your head.

[Screenshot: Waves Nx settings, with webcam head tracking]


You can buy it with a wireless head tracker, or use a webcam as shown above. I actually found that the webcam worked better than the wireless module, so I could have saved money on that. Either tracking solution has a limited range compared to the A16, however. You can't really look behind you, for instance.

As I mentioned above, Waves Nx was the first thing that worked for me AT ALL, but I can't see it as really being good enough to mix/produce surround music without any speakers, which is its purpose.

Continued...
 
Then there's Dolby Atmos for Headphones and DTS Headphone:X.

Neither of my 2019 home theater units has those (Marantz AV7704, Yamaha RX-A2070), but now Windows 10 does (if you pay for it).

This week I went back and did some testing on Windows 10. I have to say, without heavy visual cues from movies, neither was convincing as virtual surround. They do an OK job (for me) of putting the center channel inside your head instead of at the top, but not really out in front, and there's definitely nothing convincing behind. Like a lot of solutions, the rears just seem to be above the sides, inside my head, rather than in the back.

I took some screenshots of the apps you have to buy and the settings you need to make. I can include them in a later post.

Then there is a slightly messy combination of two free products that seems to have some promise, certainly as a platform for trying a bunch of different standard HRTFs and impulse responses (IRs) recorded from other products.

That is this:

https://sourceforge.net/projects/hesuvi/
which is a front end for this:

https://equalizerapo.sourceforge.io/
My main complaint about HeSuVi is that the instructions seem overly complicated and poorly written, but it does work as free software that lets you do surround virtualization with headphones.

In yesterday's testing I wasn't super impressed, other than with the ability to try a bunch of different stuff with one tool, but again, I have my non-average ears...

I was playing back 7.1 channel IDs to get an idea of what the different IR files sounded like, and actually found that the Out Of My Head file, ooyh1.wav, performed the best for me on the channel IDs, but sounded horrible with music :0(

Hopefully you will have better luck.

Note that they have impulse responses recorded from Dolby Atmos for Headphones, DTS Headphone:X, and many other surround virtualization solutions.

That's only a partial solution, of course. It's not going to decode Atmos or DTS:X sound objects, it's not immersive (being limited to 7.1), and there's no head tracking...

But in theory you could record your own HRTF as an IR file and use this to play 7.1 multichannel files on Windows via headphones.

More on that in a minute...
 
Before today I would have said (and have said) that head tracking is absolutely necessary for convincing surround virtualization (at least in my case), but now I think that if the HRTF is personalized enough/high-quality enough, you can get away without it.

Of course the sound field is fixed to your head, so it moves with you, but today's experiment showed that this doesn't necessarily make the simulation fall apart.

The experiment was to record the headphone out of the A16, which is going to be a combination of three things: the shape of my ears, the response of my headphones, and an impulse response recorded in a BBC 7.1.4 room with a Neumann KU100 head/mics.

First I recorded the A16 output for 7.1.4 channel IDs and then some 7.1.4 (upmixed) music, as a 2-channel recording, while holding my head in the center of the A16's head-tracking display on top of my monitor. Then I played the recordings back with a different sound card and different headphones.

It worked! Best with the HD 800 S (which the A16 is set up for), but still convincing in terms of virtual surround, if not flat in frequency response. As I said above, the head tracking was not necessary for enjoyable playback (but it is still needed for mixing/producing surround music exclusively with headphones).

Again, this was customized for me personally by the A16's mics-in-ears calibration, but I think it proves a point.

Check out the 7.1.4 Channel ID binaural recording and see if it does anything for you (just curious):

https://drive.google.com/file/d/1FbMghb7pfXefF_w9ypq_NGVEb0f8kjwG/view?usp=sharing

The next experiment was to create an IR file, from the A16, to use with HeSuVi.

That took a while, and was tedious, but again success!

I recorded the supplied dirac_delta_7.1.wav file from the headphone out of the A16, then did a bunch of editing to get the two-channel recording into a 14-channel file in the required channel order...

Instructions here: HeSuVi / Wiki / How-To Record Impulse Responses Digitally
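
For anyone attempting the same, here's a rough sketch of that reassembly step in Python (illustrative only, not my exact steps; the per-speaker file names are placeholders, and you should verify the channel order against the HeSuVi wiki rather than trusting the order below):

```python
# Rough sketch: pack per-speaker stereo impulse recordings into the
# 14-channel hrir WAV that HeSuVi loads. pip install numpy soundfile
# NOTE: the speaker order below is illustrative -- check the HeSuVi wiki
# for the order it actually expects.
import numpy as np
import soundfile as sf

SPEAKERS = ["FL", "FC", "FR", "SL", "SR", "BL", "BR"]  # 7 speakers x 2 ears = 14 ch

def build_hesuvi_ir(outfile="my_ir.wav"):
    columns, rate = [], None
    for sp in SPEAKERS:
        data, sr = sf.read(f"{sp}.wav")      # hypothetical: one stereo file per speaker
        rate = rate or sr
        columns.append(data[:, 0])           # left-ear response
        columns.append(data[:, 1])           # right-ear response
    n = max(len(c) for c in columns)         # pad all IRs to the same length
    stacked = np.stack([np.pad(c, (0, n - len(c))) for c in columns], axis=1)
    sf.write(outfile, stacked, rate, subtype="PCM_24")

build_hesuvi_ir()
```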

With the file all fixed up and loaded into HeSuVi, I tested 7.1 channel IDs and lots of multichannel 5.1 music. It basically sounds identical to the A16 (at least with the HD 800 S headphones), sans head tracking.

Again, it's customized for my ears and headphones, but here is the IR file if you want to try it with HeSuVi:

https://drive.google.com/file/d/11_yAnw792ArBAdDJtgpbg_Qa9gpB9k7O/view?usp=sharing
 
I'm kind of on a different planet with this crazy idea, but here it is: a head tracker for quadphones, using very simple technology such as a compass sensor, an accelerometer, and a Raspberry Pi/Arduino, with no face tracking. The head tracker would control the channel output rotation; it could be just a servo on the mouse tracking pad with the clicker taped down, lol. It wouldn't be hard for someone with good programming skills (not me), it would be very ugly, and it would only work for side-to-side panning, of course. I like looking like an escaped science project anyway.
I'm also looking for a set of the Koss quadphones with the special control box, or something like that: a circuit that controls balance/blend/phase, or a circuit like the pre-synth idea @Sonik Wiz has talked about for using with the SM (I can't find the specific thread), could also be interesting through quadphones, attached to a servo controlled by the head tracker. Need octo-phones! 🧠🎧🎚🕹🤘
 
There are DIY head trackers, some using a hat with IR LEDs and a modded camera from an old game system (a PS3 Eye with the IR filter removed). I have one. I never went much further with it than just getting it to track.

https://www.trackhat.org/
If you google "DIY head tracker" there are lots of hits on other projects as well.

e.g.: Up Your Game With DIY Headset Motion Tracking

But I think the beauty of the A16 is that there is NO lag or noticeable effect, no matter how fast you move around. It just sounds exactly like you're not wearing headphones and are listening to speakers. Rotating the sound rapidly in 3D takes some heavy math (for ambisonics, anyway) on top of low-latency sensors.
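
To give a feel for that math, here's a minimal sketch (assuming numpy) of the very simplest piece: yaw-rotating a first-order ambisonic signal. Higher orders and pitch/roll need larger rotation matrices, and it all has to happen every audio block with near-zero latency:

```python
# Minimal sketch of the easy case: yaw-rotating a first-order ambisonic
# (B-format) block. W (omni) and Z (vertical) are unchanged by yaw;
# X and Y rotate together.
import numpy as np

def rotate_yaw(wxyz: np.ndarray, yaw_deg: float) -> np.ndarray:
    """wxyz: (samples, 4) array of W, X, Y, Z channels."""
    t = np.radians(yaw_deg)  # sign convention depends on rotating the field vs. the head
    w, x, y, z = wxyz.T
    xr = x * np.cos(t) - y * np.sin(t)
    yr = x * np.sin(t) + y * np.cos(t)
    return np.stack([w, xr, yr, z], axis=1)
```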

I also note that the A16 has two tracking technologies in play: magnetic AND optical. While the optical is off by default, my experience is that you need them both on, or else the center keeps drifting off to one side and you have to look center and press a button on the top of the headphone-mounted unit every few minutes.
 
And speaking of "octophones," I have tried a couple of the multi-speaker surround headphones.

They sound like crap for music. They're designed for gaming...
 
And... now that I have the IR for my head and headphones, I can also use it in VST convolvers or FIR tables in DAWs like Plogue Bidule, etc.

There is at least one free convolver VST, Ignite NadIR, that does the trick. You'll need 5 instances in parallel for 5.1, and 7 for 7.1.

[Screenshot: NadIR convolver instances in a DAW]
 
@zeerround thanks for starting this thread, and also thanks for the tip regarding using the magnetic and optical settings simultaneously on the A16. The "trick" you cited has considerably stepped up my enjoyment of the A16.

I don't know if you've heard about or ever experienced the Boss Waza Air headphone system. It's a guitar amp simulator that's said to convincingly simulate a surround experience. I have yet to experience the unit first hand, but I'm almost intrigued enough to drop $400 on one to use for late-night guitar practicing. There are, of course, YouTube videos out there, and the best one AFAIK is on Mary Spender's channel.
 

The free VST I mentioned is also intended for Guitar cabinet simulation. I think there is at least one other (free) as well, but certainly any number of paid products.

For the free STL NadIR, each instance can do two convolutions (two IRs), so it is also suitable for converting one surround channel into left- and right-ear binaural (virtual surround). They have a paid version that does 8 IRs at once, STL Ignite Libra: $30 and/or free trial.

Two Libra instances would be needed for 5.1 or 7.1 (12 or 16 IRs), and 3 for 7.1.4 (24 IRs).

So we've shown multiple ways to play back virtual surround, but we're still using ~$6,500 US in gear to record the custom IR, so I think the next step is to look into DIY HRTFs (IRs). Maybe like this:

https://www.earfish.eu/sites/default/files/2018-01/DIY_earfish_iPhone_0.pdf
which is said to cost on the order of ~$200 US (assuming you have access to an iPhone).

https://www.earfish.eu/
 
"They have a paid version that does 8 IRs at once; STL Ignite - Libra. $30 and/or free trial. "

Actually that only has two inputs, so doesn't really have any advantage over their free 2 IR stereo in VST, for surround virtualization.
 
"How do headphones achieve the vertical axis?"

Not to be flip, but: the same way your ears do. Or, really, by specifically tailoring the sound with knowledge of what your ears do.

It's actually a deep topic, and it has a LOT to do with the shape of your ears, and with the fact that your brain has spent your whole life learning to localize sounds with them.

For "the median plane", specifically the shape of your pinna.

http://www.visualdictionaryonline.com/human-being/sense-organs/hearing/pinna.php
Basically, sounds coming from different vertical directions get filtered (in frequency response) in specific ways due to the shape of your ears.

We can simulate that with convolution against impulse responses recorded with (calibrated) mics in your ear canals.

Most of the discussion in this thread was for 7.1, but it also works (with more convolutions, more IRs) for height information.

e.g., for 7.1.4 surround we would add 8 more IRs: for each height speaker, an IR for each of your ears.

For instance, the A16 can present 16 virtual speakers in any 3D location (I typically have it set up for 7.1.4, i.e., 12 virtual speakers).

from Wikipedia:

The human outer ear, i.e. the structures of the pinna and the external ear canal, form direction-selective filters. Depending on the sound input direction in the median plane, different filter resonances become active. These resonances implant direction-specific patterns into the frequency responses of the ears, which can be evaluated by the auditory system for vertical sound localization. Together with other direction-selective reflections at the head, shoulders and torso, they form the outer ear transfer functions. These patterns in the ear's frequency responses are highly individual, depending on the shape and size of the outer ear. If sound is presented through headphones, and has been recorded via another head with different-shaped outer ear surfaces, the directional patterns differ from the listener's own, and problems will appear when trying to evaluate directions in the median plane with these foreign ears. As a consequence, front–back permutations or inside-the-head-localization can appear when listening to dummy head recordings, or otherwise referred to as binaural recordings. It has been shown that human subjects can monaurally localize high frequency sound but not low frequency sound. Binaural localization, however, was possible with lower frequencies. This is likely due to the pinna being small enough to only interact with sound waves of high frequency.[17] It seems that people can only accurately localize the elevation of sounds that are complex and include frequencies above 7,000 Hz, and a pinna must be present.[18]
To be clear, what we are talking about here (surround virtualization) is not binaural recording, i.e., recording music or other sounds with mics in the ears. Rather, it uses the mics-in-ears measurements to determine the frequency response, time delay, and power difference of sound at each of your ears for each sound source location.

Once you have those "impulse response" measurements, you pipe each surround channel of the music through a pair of "convolution" filters (one for each ear).
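
A bare-bones sketch of that signal chain in Python (numpy/scipy assumed; `irs` is a hypothetical list holding one left-ear/right-ear IR pair per surround channel):

```python
# Sketch of surround virtualization by convolution: each surround channel
# is convolved with its left-ear and right-ear IRs, and everything sums
# into two headphone channels.
import numpy as np
from scipy.signal import fftconvolve

def virtualize(surround: np.ndarray, irs: list) -> np.ndarray:
    """surround: (samples, N) array, e.g. N=8 for 7.1.
    irs: list of N (left_ir, right_ir) pairs, all IRs the same length."""
    left = sum(fftconvolve(surround[:, ch], l) for ch, (l, r) in enumerate(irs))
    right = sum(fftconvolve(surround[:, ch], r) for ch, (l, r) in enumerate(irs))
    return np.stack([left, right], axis=1)
```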

With head tracking, you also keep track of where your head is pointed (and, with 6 degrees of freedom, its distance from each virtual speaker position) and mathematically modify those impulse responses in real time, or perhaps switch/fade to the appropriate IR for that source angle and distance.
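
Here's a toy sketch of that switch/fade idea (my simplification, not how any particular product does it): pick the measured IR nearest the source's head-relative angle and crossfade over a block when it changes:

```python
# Toy sketch of head-tracked IR selection: what matters is the source
# position relative to the head, so subtract the tracked yaw, pick the
# nearest measured IR, and crossfade over one block to hide switches.
import numpy as np

def select_ir(irs: dict, head_yaw: float, source_az: float):
    """irs: {azimuth_degrees: (left_ir, right_ir)} -- hypothetical layout."""
    rel = (source_az - head_yaw) % 360
    def angular_dist(az):
        d = abs(az - rel) % 360
        return min(d, 360 - d)
    return irs[min(irs, key=angular_dist)]

def crossfade(old_block: np.ndarray, new_block: np.ndarray) -> np.ndarray:
    """old_block, new_block: (samples, channels) arrays of equal shape."""
    t = np.linspace(0.0, 1.0, len(new_block))[:, None]  # linear fade in/out
    return (1 - t) * old_block + t * new_block
```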
 
Back to head tracking: I have a fair amount of Arduino experience from my other hobby (astro imaging), so this might be a good head tracker project:

https://github.com/Razor-AHRS/razor-9dof-ahrs/wiki/Tutorial
I've also done a few Raspberry Pi projects, but the Arduino hardware is going to be much smaller physically, so it's more suited to mounting on headphones.
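
If anyone goes this route, a quick sketch of reading the tracker from Python with pyserial (the stock Razor AHRS firmware prints "#YPR=yaw,pitch,roll" lines in its text mode; the port name and baud rate below are assumptions, so match them to your setup):

```python
# Sketch: read yaw/pitch/roll lines from the Razor AHRS over serial.
# pip install pyserial
import serial

with serial.Serial("COM3", 57600, timeout=1) as port:  # port/baud: assumptions
    while True:
        line = port.readline().decode(errors="ignore").strip()
        if line.startswith("#YPR="):
            yaw, pitch, roll = (float(v) for v in line[5:].split(","))
            print(f"yaw={yaw:7.2f}  pitch={pitch:7.2f}  roll={roll:7.2f}")
```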

 
This looks to be a promising tool for doing your own measurements for virtual 7.1 (and/or headphone and room correction):

https://github.com/jaakkopasanen/Impulcifer
I'm thinking about seeing if I can do it all in VirtualBox, so as not to mess up the Anaconda Python I use for other things, and maybe with an eye toward packaging it up for others.
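
A lighter-weight alternative to a full VM, if the goal is just to keep Impulcifer's dependencies away from your main Python install, is a throwaway virtual environment. A minimal sketch (paths are hypothetical; adjust to wherever you cloned the repo):

```python
# Sketch: isolate Impulcifer's dependencies in a throwaway venv instead of
# a whole virtual machine. Pure standard library.
import subprocess, venv

venv.create("impulcifer-env", with_pip=True)
pip = "impulcifer-env/Scripts/pip"  # "impulcifer-env/bin/pip" on Linux/macOS
subprocess.run([pip, "install", "-r", "Impulcifer/requirements.txt"], check=True)
```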
 
For 7.1 virtual surround, if you have installed HeSuVi and Equalizer APO, here are some demo files from Impulcifer to try.

These are not measurements I took, just the demo stuff proving things work (but not too shabby, for my ears anyway).

Put these two files in C:\Program Files\EqualizerAPO\config\HeSuVi\hrir and restart HeSuVi.

In HeSuVi, select one of the two files from the Common HRIRs list on the Virtualization tab.

hesuvi_i_demo is the default demo file from Impulcifer. It has a lot of room reverb in it, which may help with your sense of surround, but I didn't care for it.

I re-ran the demo command, adding --decay=5 to change the reverb decay time to 5 ms. That is the second file: hesuvi_i_demo2.

I'm curious to know how that works for people.

I did do all of that in VirtualBox, by the way (the Impulcifer stuff, not the actual playback with HeSuVi). Next I will try some measurements, using the in-ear capsule mics that came with the A16.
 

Attachments

  • hesuvi_i_demo.zip (1.3 MB)
By the way, the way I manage whether my playback goes through HeSuVi is by using the "default" output driver in foobar2000 for surround virtualization, and switching to ASIO for no virtualization.

Also, the Realtek sound chip on my motherboard didn't work with HeSuVi, and I didn't feel like messing with virtual cables, etc., so I found and hooked up my Gigaport HD interface for this project.

Basically, you need a sound device that supports 7.1, even though you are only going to be using 2 channels for the headphone out. Equalizer APO sits in the middle of the Windows audio stack and routes the 7.1 through the IR convolution and then to the stereo out for headphones.
 
Only one problem: Each person's pinnas are different, and the filtering and phase effects depend on the pinna.
 
Absolutely. That's why I'm looking into less costly (in equipment, effort, and $$) ways to take measurements.

That said, one may stumble on some measurements that are "good enough", and that's why I've posted my measurements from the A16, as well as sound files convolved with those.

Also the "demo" measurement file(s) from impulsifer, because they might be "good enough" for some users.

...
Speaking of measurements, I've realized I don't know the pinout for the A16 mics, but I did discover that I have some SP-TFB-2 mics, which are one of the recommended solutions for Impulcifer measurements. I need to work on cabling and test with different audio interfaces.
 
FYI, given the IR measurements, I can now drag-and-drop convert a multichannel surround file to a "binaural" virtual-surround-for-headphones file.

This is using ffmpeg.

I'm working with 7.1, but it will be extendable to immersive formats like 7.1.4, etc. (given the IR measurements for your ears). Quad, 5.1, etc. will also work.
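
I won't claim this is my exact command, but here's a sketch of the approach: Python driving ffmpeg's afir filter. The two IR files are an assumed layout (an 8-channel "left ear" WAV and an 8-channel "right ear" WAV, one channel per 7.1 speaker, built from your measurements), and it relies on afir's default of filtering each input channel with the matching IR channel when the channel counts agree:

```python
# Sketch: batch-convert a 7.1 WAV to binaural stereo with ffmpeg.
# left_ears.wav / right_ears.wav are hypothetical 8-channel IR files,
# one channel per 7.1 speaker position, for that ear.
import subprocess, sys

SUM = "pan=mono|c0=" + "+".join(f"c{i}" for i in range(8))  # mix 8 ch to mono
GRAPH = (
    "[0:a]asplit[a1][a2];"               # feed the 7.1 input to both ears
    "[a1][1:a]afir[l8];"                 # convolve all 8 ch with left-ear IRs
    "[a2][2:a]afir[r8];"                 # convolve all 8 ch with right-ear IRs
    f"[l8]{SUM}[L];[r8]{SUM}[R];"        # sum each result down to one side
    "[L][R]join=inputs=2:channel_layout=stereo[out]"
)

def convert(infile, outfile):
    subprocess.run(
        ["ffmpeg", "-i", infile, "-i", "left_ears.wav", "-i", "right_ears.wav",
         "-filter_complex", GRAPH, "-map", "[out]", outfile],
        check=True)

if __name__ == "__main__":  # drag-and-drop: the input file arrives as argv[1]
    convert(sys.argv[1], sys.argv[1].rsplit(".", 1)[0] + "_binaural.wav")
```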
 