OpenMediaLib User and Development Guide

Audio Mixing

As the penultimate section on audio handling we will briefly consider audio mixing.

A common naïve approach to mixing is revealed when you consider a stereo to mono converter. The silly (and typically first) approach is to say: if I divide each sample in the left and right channels by 2 and add the results together, I'll get mono.

This is, of course, completely wrong. I think of it like this: when you have two speakers which are 'correctly' (or semi-correctly) positioned to provide a stereo image and you push them together so that they touch, you get a better approximation of a mono conversion. It will sound louder (by virtue of the fact that there are two speakers rather than one), but the resultant 'image' is more or less correct.

Importantly, if you consider a stereo conversation between two people (one on the left, one on the right, and never speaking at the same time), the mono image should sound as though both are being delivered through a single speaker. Halving each channel first would instead play each voice at half its original level.
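To make the conversation example concrete, here is a minimal Python sketch comparing the halve-and-add approach with an unscaled sum. The function names and the sample values are illustrative assumptions and are not part of OpenMediaLib.

```python
def mix_halved(left, right):
    """Naive down-mix: halve each channel, then add."""
    return [(l // 2) + (r // 2) for l, r in zip(left, right)]


def mix_unscaled(left, right):
    """Down-mix by adding the samples as they are (clipping is not handled here)."""
    return [l + r for l, r in zip(left, right)]


# Hypothetical data: one voice only on the left, the other only on the right,
# and they never talk at the same time.
left = [1000, -2000, 0, 0]
right = [0, 0, 3000, -1500]

halved = mix_halved(left, right)      # every voice comes out at half level
unscaled = mix_unscaled(left, right)  # every voice keeps its original level
```

With the unscaled sum, each voice reaches the single mono channel at the level it had in its own channel, which is what the 'speakers pushed together' picture suggests.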

One correct approach is to add the samples unscaled and use a low-pass filter to remove any high-frequency components introduced by whatever clipping results.
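A rough sketch of this idea in Python follows. It assumes signed 16-bit samples, and it uses a simple one-pole low-pass filter purely as a stand-in; the actual filter, its coefficient (`alpha`) and all the function names here are assumptions made for illustration, not the library's implementation.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sum_and_clip(left, right):
    """Add the samples unscaled, saturating at the 16-bit limits."""
    return [max(INT16_MIN, min(INT16_MAX, l + r)) for l, r in zip(left, right)]


def low_pass(samples, alpha=0.5):
    """One-pole low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]), with y[-1] = 0."""
    out = []
    prev = 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def stereo_to_mono(left, right, alpha=0.5):
    """Sum, clip, then smooth the result; returns integer samples."""
    return [int(round(y)) for y in low_pass(sum_and_clip(left, right), alpha)]
```

Note that a filter this crude smooths the whole signal, not just the clipped regions, and it starts from silence; a better filter with more input to work from would give a better result.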

This approach extends from stereo/mono conversion to the types of mixing required in a multitrack video editing environment.

As with the description of resampling, low-pass filters provide better results with more input, and the compromise of the 2 or 3 frames applied in resampling provides a more than adequate (and deterministic) solution here as well. Again, the separate filter graph provides a more pleasing result, since it can employ past and future samples at a granularity which is distinct from the video frame rate.

However, as a courtesy, the aforementioned composite filter will mix audio on a per-frame basis (with no history or future samples taken into account), provided the audio make-up is identical. This is sufficient for previews and approximations, but for more demanding cases (where quality is the watchword) the need for separation should be clear.
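The per-frame behaviour can be pictured with the following Python sketch. Everything in it is hypothetical: the `AudioFrame` type, its fields and the function names are invented for illustration and are not the OpenMediaLib API, and 'identical make-up' is assumed here to mean the same sample rate, channel count and number of samples.

```python
from dataclasses import dataclass
from typing import List, Optional

INT16_MIN, INT16_MAX = -32768, 32767


@dataclass
class AudioFrame:
    """Hypothetical container for the audio attached to one video frame."""
    frequency: int       # sample rate in Hz
    channels: int        # number of interleaved channels
    samples: List[int]   # interleaved signed 16-bit samples


def same_make_up(a: AudioFrame, b: AudioFrame) -> bool:
    """Assumed meaning of 'identical make-up': rate, channels and length match."""
    return (a.frequency == b.frequency
            and a.channels == b.channels
            and len(a.samples) == len(b.samples))


def mix_frame(a: AudioFrame, b: AudioFrame) -> Optional[AudioFrame]:
    """Mix two frames sample by sample, using no history and no look-ahead.

    Returns None when the make-up differs, leaving the caller to decide
    what to do (for example, resample first or skip the mix).
    """
    if not same_make_up(a, b):
        return None
    mixed = [max(INT16_MIN, min(INT16_MAX, x + y))
             for x, y in zip(a.samples, b.samples)]
    return AudioFrame(a.frequency, a.channels, mixed)
```

Because each frame is mixed in isolation, nothing here can smooth over a frame boundary, which is why this is only suitable for previews.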

The separation of the audio and video graph will be explained in more detail in a future section of this document.