Two and a half years ago, I started making music; I find it a very interesting and vast subject. I would like to deepen my work, and in that regard, a few months ago I was struck by a song I heard on the radio.
I honestly thought one of the sounds (from the radio) was not coming from my headphones; I believed whatever was producing that sound was actually in my room.
I would love to be able to create such a sound. I am currently working with Ardour (DAW) on Debian Linux (OS); I have started using Panagement2 (binaural plugin) and IEM (Ambisonic plugin).
Could someone please help me move forward with this task?
I love this topic because it's a complicated one, and you know the great thing about complicated topics? It doesn't matter whether you achieve 100% of what you're going for or not; you're still gonna learn a lot about it.
Now, let's say you want to achieve, over headphones, the same sound as if the source were actually in your living room, right?
A simple answer would be that you should try to emulate some of the characteristics of a sound that passes through the same changes as a sound that really came from your living room would. You could, for instance, take a dry sound like a short burst of wideband noise and play it through a speaker placed in the room exactly where the source should be, then put a microphone at the position of the listener and record that short burst coming from the speaker. You can then analyze the recorded file and, if you have the ability, also reverse-engineer any changes that the speaker and the recording chain added to it. This is not easy, but can you see where I'm leading you? You need to know what changes happen to the sound while it travels from its point of origin to your ears.
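If you'd like to experiment with that analysis step, here is a minimal Python sketch of the deconvolution, assuming you have the dry burst and the room recording as mono WAV files at the same sample rate (the filenames are placeholders, not part of any plugin):

```python
# Minimal sketch: estimate a room impulse response by deconvolving the
# room recording with the dry test signal (regularized frequency-domain
# division). Filenames are placeholders; both files assumed mono, same rate.
import numpy as np
from scipy.io import wavfile

fs_dry, dry = wavfile.read("dry_burst.wav")        # the burst you played
fs_rec, rec = wavfile.read("room_recording.wav")   # what the mic captured
assert fs_dry == fs_rec

n = len(rec) + len(dry) - 1                        # full convolution length
DRY = np.fft.rfft(dry, n)
REC = np.fft.rfft(rec, n)

eps = 1e-8 * np.max(np.abs(DRY) ** 2)              # regularize near-zero bins
ir = np.fft.irfft(REC * np.conj(DRY) / (np.abs(DRY) ** 2 + eps), n)

ir /= np.max(np.abs(ir))                           # normalize before saving
wavfile.write("room_ir.wav", fs_dry, ir.astype(np.float32))
```

The resulting file is (roughly) the room's impulse response at that listener position, with the speaker and mic coloration still baked in, which is exactly the reverse-engineering problem mentioned above.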
Here's a list of what changes as sound energy is transferred from the point of origin to your ears:
1. It loses high frequencies because of friction between the air particles; this phenomenon is called damping. Temperature and humidity also play a role here.
2. It is joined by all the reflections created as the sound interacts with the surfaces of the space. These are separated into Early Reflections, the first sparse reflections our brains use to extract spatial information, and Late Reflections, the dense, diffuse reflections that arrive later and that our brains use to judge the size of the space, among other things. The characteristics of both the early and late reflections also define the room.
3. Our two ears actually receive two copies of the sound; the main differences between them are a difference in arrival time (because of the ear-to-ear distance) and a difference in frequency spectrum (because of the acoustic shadow our head casts on each ear). These differences give our brains much of the information about where a sound is coming from, and this is what binaural plugins usually simulate (see the sketch after this list).
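To make cue 3 a bit more concrete: the time-difference part alone can be approximated with the classic Woodworth spherical-head formula. A tiny Python sketch; the head radius and speed of sound are typical textbook values, not measurements of anyone's actual head:

```python
# Sketch: Woodworth approximation of the interaural time difference (ITD)
# for a spherical head. 0 rad = source straight ahead, pi/2 = fully to one side.
import numpy as np

HEAD_RADIUS = 0.0875    # metres, a commonly assumed average head radius
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C

def itd_seconds(azimuth_rad: float) -> float:
    """ITD for a source at the given azimuth (Woodworth formula)."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuth_rad + np.sin(azimuth_rad))

for deg in (0, 30, 60, 90):
    print(f"{deg:3d} deg -> {itd_seconds(np.radians(deg)) * 1e6:6.1f} microseconds")
```

At 90 degrees this lands around 650 microseconds, which is the well-known maximum ITD for an average head.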
So, as you can see, the effect you're trying to achieve is not produced by binaural panners alone; it also needs careful setup of a reverb plugin, plus filters that simulate air damping over distance (a rough sketch of such a filter follows).
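For the air-damping part, a crude but serviceable stand-in is a low-pass filter whose cutoff falls with distance. A minimal Python sketch; the exponential cutoff mapping is my own assumption for illustration, not a measured air-absorption model (ISO 9613-1 covers the real physics):

```python
# Crude sketch of air damping over distance: a first-order low-pass whose
# cutoff drops as the source moves away. The mapping is an assumption.
import numpy as np
from scipy.signal import butter, lfilter

def air_damping(signal: np.ndarray, fs: int, distance_m: float) -> np.ndarray:
    """Roll off highs more strongly the farther away the source is."""
    cutoff_hz = 20000.0 * np.exp(-distance_m / 100.0)   # assumed mapping
    cutoff_hz = float(np.clip(cutoff_hz, 200.0, 0.45 * fs))
    b, a = butter(1, cutoff_hz, btype="low", fs=fs)
    return lfilter(b, a, signal)

# Example: the same noise burst heard from 1 m vs. 50 m away.
fs = 48000
burst = np.random.default_rng(0).standard_normal(fs // 10)
near, far = air_damping(burst, fs, 1.0), air_damping(burst, fs, 50.0)
```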
To add to the complexity: even if you do all that, the final sound will still emulate your living room, not another person's living room.
But you can simulate a more generic living room and achieve a high degree of realism, so that the end result will be believable to headphone listeners in most living rooms.
More realism can be achieved by augmented-reality technologies, but that's out of the scope of this discussion, unless you want to also develop a specialized app that delivers your music in this unique but very believable way.
Thank you for your messages, Pan, and please excuse the delay in my reply.
It's true, "the journey is as important as arriving at the destination"; I've noticed this several times, and this mindset makes any study far more rewarding. I’m truly passionate about our subject, and things are becoming increasingly clear — thanks in large part to your help.
Your information is more than interesting. For example, the two types of reflections, damping, and the sound shadow of the head have allowed me to understand the phenomenon.
I’ve started some experiments following your advice, and I’ll be happy to share them in this part of your forum.
I assume it should be mono? A physical sound source is always mono, right?
It seems the result is the same for a stereo or mono sound source if I use the P2 or P3 plugins. I think the IEM encoders are converting stereo to mono.
For this track, the best binaural effect seems to be achieved with S2 + P3, and:
Regarding P3: I listened to this in my living room. I think the effect might be different outdoors.
Regarding S2: It is a short sound. Is it more difficult to achieve a good effect with longer sounds?
I’m looking to improve binaural audio. I’ve identified two directions:
Plugins: I think I will use IEM, and, for now (since my current goal is headphone listening), I only see:
stereo encoder
room encoder
binaural decoder
Sound design:
I use Surge XT and plan to start with Cardinal.
If I understand correctly, I need to create mono sources.
Here, I've only used Surge XT and disabled all effects: aren't reflections handled by one of the previous encoders? But some instruments seem to have a kind of ‘natural reflection’, like the trumpet and other brass. It’s not very clear to me...
I have many questions:
Should I adapt my sound design to the binaural/ambisonic world?
Is my choice of plugins appropriate?
How can I improve this work?
Very cool that you continued with the experiments. I just downloaded your example render and I'll give it a listen; I'll post my thoughts today or tomorrow. Cheers!
I took a good listen to your tests, here are my thoughts:
About the IEM RoomEncoder:
Using it alone and at a high level doesn't sound real. The plugin is designed to simulate early reflections, and in reality early reflections are neither that loud nor the only thing heard in a space. You should use it together with the IEM FDNReverb. Here's a nice video explainer:
Regarding the source type:
Sources are not mono or stereo; those are ways of artificially playing back audio. In the real world, the way a source emits energy is determined by many factors. A violin, for instance, emits sound from its whole body in different ways, a car makes sounds from its different parts, and a river is a complicated, dynamic volumetric cloud of sound that seems to come from a line following the river's shape.
Depending on the source you want to simulate, you can use:
A point - mono.
Two points - a stereophonic effect may be present if there is stereophonic information in the recording/synth output and the width is not more than approx. 60 degrees.
More than two points - to recreate a more advanced source configuration.
The IEM StereoEncoder is not converting anything to mono. You just have to use the Width parameter to open up the two points in the sphere.
About listening conditions:
You said that you listened in your room and that the results might be different in another listening environment. First of all: yes, to an extent. But most importantly, binaural stereo is supposed to be listened to on headphones, not on speakers in a space. When you don't use headphones, the effect is lost and all that remains is some phasing that may color the mix. The effect also fails completely unless you sit in the perfect sweet spot and barely breathe or move your head. Binaural is for headphones only. You can research binaural for speakers if you like, but that is another beast altogether.
About short/long sounds:
For the best effect, you should use sounds that have "interesting" content in the frequencies our hearing uses to make sense of the source's location. Sounds that have life from roughly 250 Hz to 2.5 kHz are the best for achieving a strong localization effect for the listener. Again, binaural filtering like this is for headphone listening; in open-space listening the probability of the effect failing is large. So it's not about the duration but about the content, meaning the way the timbre changes in the key frequencies our hearing uses to identify location. A quick way to check a sound for such content is sketched below.
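As a rough self-check, you can measure how much of a sound's spectral energy falls inside that band. A minimal Python sketch, assuming a WAV file as input (the filename is a placeholder, and the band edges are just the numbers from above):

```python
# Sketch: fraction of a sound's spectral energy inside the band our
# hearing leans on for localization (~250 Hz to 2.5 kHz, per the text).
import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read("source.wav")   # placeholder filename
x = x.astype(np.float64)
if x.ndim > 1:                       # fold stereo to mono for the estimate
    x = x.mean(axis=1)

spectrum = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(len(x), 1.0 / fs)

band = (freqs >= 250.0) & (freqs <= 2500.0)
fraction = spectrum[band].sum() / spectrum.sum()
print(f"Energy in 250 Hz - 2.5 kHz band: {fraction:.1%}")
```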
About the workflow:
Best approach for creating binaural material using ambisonics and IEM is:
Source --> StereoEncoder --> BinauralDecoder
or
Source --> RoomEncoder --> BinauralDecoder
*RoomEncoder Output --> FDNReverb --> BinauralDecoder (the same BinauralDecoder as above)
or
Source --> MultiEncoder, for simulating sources with more than two points. You can adapt this to the scenarios above, and you can also use multiple points in the RoomEncoder if you like.
Pro Tip: Use the RoomEncoder with restraint; a lot of early reflections can sound artificial.
About sources that contain spatial information:
Any recorded or synthetic sound that includes spatial information, like panning, reverb, chorus, very short delays, or doublers, will mess with the psychoacoustics of localization and work against your purposes. Try to see which parts of the sound you want to keep and which parts you must remove to create the desired outcome. For example, a synth patch that includes reverb is no good to you, since you will add reverb later in your simulation using the RoomEncoder and FDNReverb; but maybe a chorus effect is part of the sound itself, and you could work with it included.
It's very common for synth sounds and sampled instruments to include reverb and other spatial effects in their presets, to give the user a polished result. Personally, I turn those effects off and use the virtual acoustics I created for my production.
Regarding your closing questions:
- Should I adapt my sound design to the binaural/ambisonic world?
It's a matter of preference. Personally, I find ambisonics the best way to produce sound, even if the target is a simple stereo pair of speakers or even mono. I also don't like binaural; I find it a very opinionated and coloring way to filter my sound, and in most cases it only half works. I think it's better to deliver in ambisonics and let the playback device or player do the work; that way it will work best in every case. If delivery in ambisonics is not possible, as on streaming platforms, then I would deliver in stereo or surround formats. If I were to create a special binaural edition of an album, I would use the ambisonics-to-binaural decoder from Blue Ripple Sound, which is based on a better statistical analysis of the binaural datasets and is far better at translating to various people's heads, ears, and hearing. Another way to create the best possible binaural output is to use the EAR Production Suite to produce your material in any format, including binaural; it features one of the best binaural renderers: https://ear-production-suite.ebu.io/
You can use it together with the IEM suite and achieve very realistic binaural output. The EAR suite is object-based audio; you can find all the information and some tutorials on their page.
- Is my choice of plugins appropriate?
I think yes; see above. My workflow also includes the Blue Ripple Sound plugins, which are paid but - to my ears - have the best ambisonic and binaural sound you can find. They are state of the art if you want to seriously start producing with ambisonics.
- How can I improve this work?
Read, experiment, share, discuss. You are already improving, I think.