ADM: production workflows and known issues

Immersion in audio is more than multiple speakers and 3D panners. It is achieved inside the listener's mind. Use this subforum to discuss immersive audio techniques, psychoacoustics, multimodality, and any topic related to creating immersion with sound in any medium.
Flo Angerer
Posts: 5
Joined: 08 Apr 2022 09:14

Hi all!
The title pretty much says it all, but it is probably a good idea to explain my reasoning for starting this thread. There are two main reasons:
1. It's always interesting to hear what other people are doing. This way we might solve issues, or simply pick up inspiring new ideas to try.
2. People working in different parts of the industry have different needs. For example, someone mixing a film in Atmos for a theatrical release would use objects mostly for their ability to address individual speakers, as opposed to using speaker arrays. In contrast, someone working in broadcasting might use objects more for interactivity. A musician, on the other hand, may not care about any of that, and may, above all, be concerned with creating a coherent sound field. So even though we are all to a large extent using the same formats, we may have very different expectations, preferences or even requirements. My (perhaps slightly over-the-top) vision is that we identify and discuss some of these issues, so that they may inform possible future developments related to data interchange (and intelligent data reduction) using ADM and related open standards.
Pan Athen
SoundFellas Crew
Posts: 78
Joined: 04 Dec 2021 20:51
Location: Athens / Greece

Coming from the game development sector, which is a subset of interactive audio practice in general, I think we can find a lot of solutions if we look at how game audio middleware handles these issues.

Games have been at the frontier of interactive audio content that can be rendered to arbitrary speaker configurations, and don't forget that 3D sound production found its early home in games as a commercially successful craft.

In games, for instance, we all knew and implemented open formats right from the beginning, such as the module file (https://en.wikipedia.org/wiki/Module_file). We also used an industry-defined reverb algorithm that was embedded to run on hardware devices like the Sound Blaster sound card.

Those features gave game audio producers an edge in creating immersive content, because we didn't have to reinvent the wheel and could just create content and design interactions, which is great.

I believe the same should happen with ADM, by extending the format to include reverberation in two forms: one using impulse response files, so that creators can add exactly the reverb they like to their products, and the other being support for this pipeline in hardware and players. In my opinion, the best convolution format for channel-agnostic playback is Ambisonics, and there should be two convolution processors, enabling a smooth transition from one reverb to another to simulate the transfer between different spaces if the story needs it.

As a fallback, the format and decoders should also offer an algorithmic reverb solution, just like the reverb that the Unity and Unreal game engines use.

The creator could include the IR files in HOA, and the decoder could choose the appropriate channels to load depending on each manufacturer's implementation in the hardware (or software). For example, if I included all the IRs of my MPEG-H production as 5th-order Ambisonics files in my ADM export, a high-end decoder for domestic use could use the 3rd-order subset for its convolution, while a car audio solution could use just the 1st order to save resources.
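To make the channel-picking concrete, here is a minimal sketch of that kind of order fold-down, assuming ambiX conventions (ACN channel order, SN3D weighting), where lower orders are obtained simply by dropping channels. The function name and shapes are illustrative, not from any spec or library:

```python
import numpy as np

def truncate_hoa_ir(ir: np.ndarray, target_order: int) -> np.ndarray:
    """Fold a HOA impulse response down to a lower order.

    Assumes ambiX conventions (ACN order, SN3D weighting), where an
    order-N signal occupies the first (N + 1) ** 2 channels, so lower
    orders fall out by dropping channels. `ir` is (channels, samples).
    """
    needed = (target_order + 1) ** 2
    if ir.shape[0] < needed:
        raise ValueError("IR has fewer channels than the requested order")
    return ir[:needed, :]

# Example: a 5th-order IR (36 channels) from the ADM export,
# truncated for a 1st-order in-car decoder (4 channels).
ir_hoa = np.random.randn(36, 48000)  # stand-in for a real IR
ir_foa = truncate_hoa_ir(ir_hoa, target_order=1)
```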

Having two convolution processes crossfading can also create great experiences, with smooth overlaps between virtual scenes, and the algorithmic fallback reverb should likewise be programmed for real-time use, so that the creator can author morphing between scenes in the story.
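As a toy illustration of that two-processor crossfade (offline and mono here, my own sketch rather than anything defined in ADM; a real-time renderer would do the same with partitioned convolution):

```python
import numpy as np
from scipy.signal import fftconvolve

def crossfade_reverbs(dry, ir_a, ir_b, fade_start_s, fade_len_s, sr):
    """Morph between two convolution reverbs with an equal-power fade."""
    wet_a = fftconvolve(dry, ir_a)[: len(dry)]
    wet_b = fftconvolve(dry, ir_b)[: len(dry)]

    # 0 -> 1 morph envelope over the fade region.
    env = np.zeros(len(dry))
    start, length = int(fade_start_s * sr), int(fade_len_s * sr)
    env[start : start + length] = np.linspace(0.0, 1.0, length)
    env[start + length :] = 1.0

    # Equal-power crossfade: room A fades out as room B fades in.
    return np.cos(env * np.pi / 2) * wet_a + np.sin(env * np.pi / 2) * wet_b
```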

These techniques have been used in games for decades, and in my opinion they are the definitive solution to the ever-present "what to do about reverb" issue of ADM and NGA.

I know it's not easy to get hardware manufacturers to embed more processing power at the same price point, but if the ADM format and authoring tools like the EAR Production Suite and the MPEG-H Authoring Suite offered such a solution, it could be implemented in software players running on computers and mobile devices, which have the power to play back that kind of content.

To support devices that don't offer real-time reverb features, creators could also supply a switch group containing a scene-based (HOA) track with the reverb baked in. Players could fall back to that track when the hardware (or software player) doesn't support real-time reverb.

I'm sure that if the ADM format and the authoring software supported this, some software players would implement it as well, and as hardware becomes faster and cheaper, hardware manufacturers would eventually adopt it as a unique selling point for their products. That would eventually make it the standard.

From my experience, this is one of the best ways to extend an immersive audio format, as reverb is as important as point-source panning in 360 degrees, if not more so in some cases (depending on the story and the experience design). It's also a way to liberate 3D audio producers from the endless search for reverb plugins.

By the way, what I'm proposing is also backward compatible: creators can keep using whatever they choose and supply it as they do right now, and everything will sound OK. The real-time reverberation solution in ADM would be a feature supporting interactive immersion for dynamic content creators, and it would eventually become a kind of standard, exactly as the same solution has worked in interactive audio production for the video game sector for decades now.

In fact, if you open the Unity editor and add a reverb zone to a game level, you will find the same settings that developers used for Sound Blaster-supported games back in the '90s.

Decoupling the content from the reverb, giving 3D audio authors the power to choose the settings or supply impulse responses, and then rendering the reverb at the decoding stage, at the audience's location, also allows cleaner sound to be used in production. That helps the listener's psychoacoustic localization, resulting in more immersion and a stronger suspension of disbelief.
Flo Angerer
Posts: 5
Joined: 08 Apr 2022 09:14

OK, so here are some thoughts on music production.
Unlike in game audio or film post-production, surround sound hasn't really had much of an impact on music yet. Of course, there have been multichannel music mixes over the years, but in most cases the source material for these is still a stereo mix. I would even go so far as to say that the sound of recorded music itself has been fundamentally shaped by the fact that it is made in stereo, including all the production workflows that come with that. We are just very used to things like panning tracks, summing them to groups, bus processing, send and insert effects and so on, and, perhaps more importantly, we are very used to how music created in such a way sounds. If we move from stereo to any immersive format, we should probably ask ourselves what it is that we actually want to gain. Are we aiming to make, in a way, "larger" stereo mixes? Do we care about interactivity? I will assume for now that we don't care about interactivity, although that is an interesting discussion.

So as we move beyond stereo, it is desirable to retain as many of these features as we can. In channel-based formats like 5.1 this is relatively easy to do, but, as we know, CBA has some drawbacks, the biggest of which is that we are mixing to a specific pre-defined speaker layout, which the end user is less likely to have the higher the channel count gets. As we switch from discrete channels to an object-based paradigm, we inevitably face the fact that there is now no master bus to send things to, as all our objects have to be kept separate. We can think of ways to address this, but it requires a bit of mental gymnastics on our part, and whatever we come up with in terms of crazy side-chain workflows, it can't fully compensate for the lack of bus processing.

Then there is the whole issue of reverb, which for music production has an added layer of ugliness to it: even if we do come up with a standardized approach for object-based reverb, it is very likely that this won't fully solve our problem. Music producers and mixing engineers are very particular about the equipment they use and would, in many cases, likely not want to use the built-in reverb objects. Convolution might partially solve that problem, but even then we are starting to create sounds in the renderer that weren't available as tracks during production, which means there is no way to put a compressor or limiter on that reverb, even with side-chain trickery. This leads to the question: do we now need object-based limiters? Or rather: at what point do we stop? And we haven't even gotten into the issue of timbral differences when using amplitude panning on different speaker setups. So given that we are not dealing with interactive media (and if we were, the whole side-chain thing would be more or less useless anyway), maybe OBA is just no good for music. That is honestly a rather depressing conclusion to reach, considering that the only two immersive formats currently available on music streaming services are object-based.
So what about HOA? From my own experience, things are a bit more straightforward in that domain. Even though there is no routing to discrete channels or locking to the nearest speaker (side note: the panning in HOA might even be preferable in some cases), we gain a lot of things, the most notable of which is probably a master bus, or, more generally speaking, the ability to retain much of what we are used to from stereo. We might need special software or hardware that can deal with HOA input without breaking its spatial integrity, but typical mixing strategies like summing, send and insert effects etc. work pretty much the way we're used to. As a bonus, we can pre-render any reverb or room simulation, even really intricate and resource-hungry ones, given suitable tools. And if we start thinking of Ambisonics as a kind of 3D extension of M-S or Blumlein stereophony, and consider how often we break the rules of that system by introducing delays between channels, we can start to think of creative ways of translating some of those "errors" into the HOA domain. Encoding stereo signals and panning sources using delays, encoding spaced microphone arrays, or even spaced spherical arrays can all work nicely and, if used sensibly, may enable us to make mixes that are informed by the artistic decisions we often make in stereo, while also exploiting the extra benefits of three-dimensional audio.
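To make the "encoding and panning" part concrete, here is a first-order sketch of panning a mono source into ambiX, plus a stereo-style spaced/delayed trick translated into that domain (higher orders just add more spherical-harmonic channels; the helper is illustrative, not from any particular tool):

```python
import numpy as np

def encode_foa(mono: np.ndarray, az_deg: float, el_deg: float) -> np.ndarray:
    """Pan a mono track as a first-order ambiX point source.

    Returns shape (4, samples) in ACN order (W, Y, Z, X) with SN3D
    weighting; azimuth is counterclockwise from the front.
    """
    az, el = np.radians(az_deg), np.radians(el_deg)
    w = mono                              # omni
    y = mono * np.sin(az) * np.cos(el)    # left-right
    z = mono * np.sin(el)                 # up-down
    x = mono * np.cos(az) * np.cos(el)    # front-back
    return np.stack([w, y, z, x])

# A "spaced pair" trick from stereo, translated to the Ambisonics domain:
# encode the same source at two angles, delaying one copy slightly.
sr = 48000
src = np.random.randn(sr)  # stand-in for a real track
a = encode_foa(src, az_deg=+30.0, el_deg=0.0)
b = encode_foa(np.concatenate([np.zeros(16), src[:-16]]), az_deg=-30.0, el_deg=0.0)
mix = a + b
```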

Speaking of three-dimensional audio: if we forget all the technicalities for a moment, we should probably ask ourselves whether immersive is always better, or, perhaps on a more fundamental level, what it is that may be preferable about immersive. As we don't even have a good, universally accepted definition of immersion yet, the first question is probably tricky to answer, but we might be able to come up with a somewhat satisfying answer to the second one: I would say that we mostly use immersive or spatial audio to help transport the listener to a specific environment. This is part of the reason it is used so much in games and movies, and it probably has applications in broadcasting, where we might want to give viewers the feeling of being, for instance, at a football match. Games, VR and, to a lesser extent, broadcasting also have an interactive component. Music usually doesn't, but we may still be able to envelop a listener and transport them to a certain place ... or maybe not. If we think about it, this might further help explain why some genres of music work better than others in spatial audio. It is relatively clear how we can put a listener into a concert hall like the Wiener Musikverein, but how do we go about putting them into, say, a Billie Eilish album? Much of what makes this sort of music exciting is the fact that it is not connected to any particular space, real or simulated. In fact, I distinctly remember that when the album "when we all fall asleep, where do we go?" came out in 2019, much of the talk surrounding it (in music production circles at least) had to do with how insanely close and intimate some of it sounded, especially on headphones, which is how many people listen to it. I'm bringing this up because, judging by the fact that this album has more streams than there are people living on earth, we can safely say that this sort of aesthetic is popular. Anyway, now they are making Atmos mixes of these things, and that's nice, but how much does such a production artistically gain from it?
In conclusion, I would say that in music we would benefit a lot from having spatial audio more readily available during production, and maybe even composition, in order to get away from producing everything in stereo and then treating the immersive version as a kind of post-stereo manual upmix. This includes things like high-quality monitoring during recording (ideally even binaural), but also the ability to design software instruments that can communicate with a spatial audio authoring environment; really, anything that helps us as creators work natively in spatial audio. On the distribution side, it would be great to have more HOA integration. As far as I can see, that would also be the easiest way to avoid relying too much on a future ADM squeezer, as I am not sure how great an idea that is for music-only content. But that's a different story, and it will be interesting to see how it goes.
Pan Athen
SoundFellas Crew
Posts: 78
Joined: 04 Dec 2021 20:51
Location: Athens / Greece

The thoughts in the post above are my thoughts exactly.

This is why I use a workflow based on HOA, and I inject object-based audio only if the project needs it.

It not only allows much more creative freedom but also simplifies 3D audio production enough that we at SoundFellas can concentrate on the creative part of the work, rather than constantly troubleshooting the necessities of object-based audio.

Given the advantages of HOA-based workflows, I think we should push for HOA on streaming services, and that should be our top priority for now. After all, market acceptance of Ambisonics is long overdue in my opinion, and while newer formats may appear shiny, we should concentrate on the benefits that more mature and stable workflows offer for 3D sound.
plush2
Posts: 6
Joined: 04 May 2022 04:31

I finally managed to wade through all the excellent points and counterpoints you two have made. It seems we are all leaning toward HOA as a format. The best reason is that it reflects the reality of sound better than an object-based model does. A single sound in the environment rarely emanates from anything approaching a single point. I was having a discussion with a friend recently and we were talking about wave propagation (don't ask how we got onto that subject). He was conceptualizing sound waves emanating from his mouth when he spoke, and I countered that a certain part of the sound comes from the mouth, but significant timbres come from resonant cavities in the chest and sinuses. A single object reduces this complexity, which is maybe why we use tricks to introduce those "errors" into our stereo productions.

I heartily agree with everything above, with one exception: I think interactivity of a sort is one of the best new frontiers of spatial music production. I read a paper several years ago which, of course, I can't seem to locate right now. It challenged the prevailing test method of the time for auditory spatial localization. They had the usual dummy-head setup rigged up with real-time mechanical head tracking of the participants, so they could wobble the head and turn it from side to side. This greatly reduced the usual areas of confusion for the listeners. Being able to interact even in a very limited way within a spatial audio mix will enhance immersion greatly. That's one of the reasons I'm so baffled that Apple offers head tracking for Netflix but not for its spatial audio music offerings ... what the hell, Apple?

I propose a reductive approach to using HOA for music production. I'm adding my one listen to the billions who've listened to the album you mentioned, Flo Angerer. One of the complaints about these head-locked mixes being released in Atmos is that they nearly always sound worse than the stereo mix. First of all, they are head-locked, so of course they do. More importantly, I think many of the tricks we use in stereo production to create space, depth and the illusion of distance are pretty close analogs of localization, ambience and distance in a soundfield. If there is a good way to reduce HOA down to something like UHJ, instead of binaural, for monitoring, then I think we might have a chance to start drawing straight lines between the production vernacular of stereo and spatial audio. I think something like UHJ is preferable to binaural in this case. The stationary head position is rationalized by the assertion that better and more personalized HRTF/HRIR models will improve localization and realism to the point that they sell the spatial qualities of the mix. To me this is a bit of a shell game, as I'm pretty confident not even the content creators have any kind of solid reference for this better binaural model. At least by using UHJ, or a <=60 degree stereo decode, we have that M/S or Blumlein-type reference to compare against repeatably.
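For what it's worth, one common way to build that kind of <=60 degree reference decode from horizontal first-order material is a pair of virtual microphones. A sketch, assuming FuMa channel weighting (the helper function is illustrative, not from any particular library):

```python
import numpy as np

def virtual_mic(w, x, y, az_deg, pattern=0.5):
    """First-order virtual microphone from horizontal B-format (FuMa).

    pattern: 0.0 = figure-of-eight, 0.5 = cardioid, 1.0 = omni.
    The sqrt(2) undoes FuMa's -3 dB scaling of the W channel.
    """
    az = np.radians(az_deg)
    return (pattern * np.sqrt(2.0) * w
            + (1.0 - pattern) * (np.cos(az) * x + np.sin(az) * y))

# A ~60-degree stereo reference decode: cardioids at +/-30 degrees.
# left  = virtual_mic(w, x, y, +30.0)
# right = virtual_mic(w, x, y, -30.0)
```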

I apologize if this is a bit scattered. I feel like I came to the party a little late here and I'm scrambling to catch up.
Flo Angerer
Posts: 5
Joined: 08 Apr 2022 09:14

@plush2 Thanks for your great insights!
First, a clarification on my part: when I was talking about interactivity, I was referring more to things like changing the volume or positions of individual objects. I just assumed head tracking to be an integral part of binaural listening, but it is of course a form of interaction, so I should have been more specific about that.
Also, I didn't really get your UHJ reference. Are you just talking about stereo compatibility?
plush2
Posts: 6
Joined: 04 May 2022 04:31

Flo Angerer wrote: 20 May 2022 12:39 @plush2 Thanks for your great insights!
First, a clarification on my part: when I was talking about interactivity, I was referring more to things like changing the volume or positions of individual objects. I just assumed head tracking to be an integral part of binaural listening, but it is of course a form of interaction, so I should have been more specific about that.
Looking back, I see now how that comment was tied to your larger point about the limited usefulness of dynamic object panning in spatial music. I fully agree with that point.
Also, I didn't really get your UHJ reference. Are you just talking about stereo compatibility?
I was referring to UHJ as a direct translation of the abstract form of B-format Ambisonics (which needs a decode) into a directly playable format derived from it. I like UHJ for FOA because it retains some of the spatial qualities of the B-format (or ambiX, as the case may be) mix it was derived from. It's nice that it can be re-encoded to regain some or all of the original (depending on how many channels it has), but more importantly it provides a reliable fold-down for monitoring and reference purposes. Binaural provides one person's ears as a reference, which is kind of the ultimate in abstract formats: you literally need to be in that person's head to decode it properly.

What I would like to find, discover, or invent if necessary is a format for HOA (ambiX 3rd order or better, ideally) that works similarly to UHJ. This would provide a stereo deliverable and, just as importantly, a stereo reference for content creators.
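For reference, this is roughly what the existing FOA-to-stereo-UHJ translation looks like, using Gerzon's published encoding coefficients (the sketch assumes FuMa channel weighting, and the sign convention of the 90-degree phase shift should be checked against a known decoder; the HOA analogue I'm after would need something beyond this):

```python
import numpy as np
from scipy.signal import hilbert

def foa_to_uhj_stereo(w, x, y):
    """Two-channel (stereo) UHJ encode from horizontal FuMa B-format."""
    def shift90(sig):
        # +90 degree phase-shifted copy via the Hilbert transform.
        return -np.imag(hilbert(sig))

    s = 0.9396926 * w + 0.1855740 * x
    d = shift90(-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y
    return (s + d) / 2.0, (s - d) / 2.0  # left, right
```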
Pan Athen
SoundFellas Crew
Posts: 78
Joined: 04 Dec 2021 20:51
Location: Athens / Greece

plush2 wrote: 18 May 2022 16:43 I finally managed to wade through all the excellent points and counterpoints you two have made. It seems we are all leaning toward HOA as a format. The best reason is that it reflects the reality of sound better than an object-based model does.
I totally agree with you on that.
plush2 wrote: 18 May 2022 16:43 That's one of the reasons I'm so baffled that Apple offers head tracking for Netflix but not for its spatial audio music offerings ... what the hell, Apple?
Maybe they want to test-drive it in films first, before putting filters into their music platform, which serves people who might not want extra filtering messing with the sound of their experience.
plush2 wrote: 18 May 2022 16:43I apologize if this is a bit scattered. I feel like I came to the party a little late here and I'm scrambling to catch up.
No need to apologize. The reason I created this forum and am trying to gather people here is to leave our thoughts and brainstorming open to the world for future discovery. In that regard, I'm very glad you joined and honor us with your input. :-)
Pan Athen
SoundFellas Crew
Posts: 78
Joined: 04 Dec 2021 20:51
Location: Athens / Greece

I created a separate thread about HOA for mass consumption here: viewtopic.php?t=52

I think the idea deserves its own thread.
Flo Angerer
Posts: 5
Joined: 08 Apr 2022 09:14

plush2 wrote: 20 May 2022 15:30
I was referring to UHJ as a direct translation of the abstract form of B-format Ambisonics (which needs a decode) into a directly playable format derived from it. I like UHJ for FOA because it retains some of the spatial qualities of the B-format (or ambiX, as the case may be) mix it was derived from. It's nice that it can be re-encoded to regain some or all of the original (depending on how many channels it has), but more importantly it provides a reliable fold-down for monitoring and reference purposes. Binaural provides one person's ears as a reference, which is kind of the ultimate in abstract formats: you literally need to be in that person's head to decode it properly.

What I would like to find, discover, or invent if necessary is a format for HOA (ambiX 3rd order or better, ideally) that works similarly to UHJ. This would provide a stereo deliverable and, just as importantly, a stereo reference for content creators.
OK, I see. So we are basically talking about a hierarchical system, in which the first two channels produce a stereo-compatible output, right? Or should it be more like something a decoder outputs when folding down to stereo?
Anyway, do you know of any good software UHJ encoders/decoders?
plush2
Posts: 6
Joined: 04 May 2022 04:31

Flo Angerer wrote: 21 May 2022 12:34 OK, I see. So we are basically talking about a hierarchical system, in which the first two channels produce a stereo-compatible output, right? Or should it be more like something a decoder outputs when folding down to stereo?
Anyway, do you know of any good software UHJ encoders/decoders?
That is a pretty good description, yes. The list of software isn't long: the ATK in REAPER will do both encode and decode, and Hector Centeno has made a player for Android that will play back UHJ (and other FOA B-format and ambiX mixes) with phone-based tracking/panning available. It's called AmbiExplorer.
Pan Athen
SoundFellas Crew
Posts: 78
Joined: 04 Dec 2021 20:51
Location: Athens / Greece

plush2 wrote: 22 May 2022 20:58 Hector Centeno has made a player for Android that will play back UHJ (and other FOA B-format and ambiX mixes) with phone-based tracking/panning available. It's called AmbiExplorer.
This is good; we need as many players as possible on all platforms. A unified solution would work even better to promote trust among all stakeholders. I think Spotify will be the one leading the race on that front; they are a tech company, after all.
Pan Athen
SoundFellas Crew
Posts: 78
Joined: 04 Dec 2021 20:51
Location: Athens / Greece

Flo Angerer wrote: 21 May 2022 12:34Anyway, do you know of any good software UHJ encoders/decoders?
The Blue Ripple Sound O3A Decoding pack has an excellent stereo UHJ decoder. We used those to produce the five albums that we have as a music label on Spotify, YouTube Music, Amazon, and other streaming services.
plush2
Posts: 6
Joined: 04 May 2022 04:31

I should add that Bruce Wiggins has an excellent JSFX UHJ decoder that can convert to planar quad speakers, or to FOA ambiX or FuMa.

https://www.brucewiggins.co.uk/?p=1836
Pan Athen
SoundFellas Crew
Posts: 78
Joined: 04 Dec 2021 20:51
Location: Athens / Greece

plush2 wrote: 29 May 2022 03:19 I should add that Bruce Wiggins has an excellent JSFX UHJ decoder that can convert to planar quad speakers, or to FOA ambiX or FuMa.
Nice! And it's also based on Gerzon's original recommendations.

I have three albums that I mixed in stereo UHJ just for the fun of it. I would like to listen to them through this plugin, but I can't find the download link.

Can you share the file?
plush2
Posts: 6
Joined: 04 May 2022 04:31

I'm attaching the zipped folder. I imagine this is okay, as he has had problems with that link for a while. I recall posting a comment to that effect for him, but I don't think it's been fixed yet.
Attachments
WigWare_UHJ_Decoder.zip
(12.72 KiB) Downloaded 2305 times
Pan Athen
SoundFellas Crew
Posts: 78
Joined: 04 Dec 2021 20:51
Location: Athens / Greece

Thanks!
plush2 wrote: 31 May 2022 16:47 I recall posting a comment to that effect for him, but I don't think it's been fixed yet.
Websites of people from academia are always like that... :-)