Basic means and methods of sound processing. Basic Research


Modulation theory has a wide range of applications based on signal processing in the time domain; in particular, it can serve as a basis for solving problems of processing broadband audio signals when transmitting them over a narrowband radio channel, including via telephone channels. In modulation theory, a signal is described as a complexly modulated (simultaneously amplitude- and frequency-modulated) process in the form of a product of the envelope (the amplitude-modulating function of the signal) and the cosine of the phase (the frequency-modulating function of the signal). A characteristic feature of this theory is the selection of information parameters of the signal, the number of which increases after each subsequent stage of its decomposition into modulating functions (multi-stage decomposition). This opens up the opportunity to influence selected information parameters of different levels and achieve the desired type of signal processing. The application of modulation theory with multi-stage decomposition will make it possible to conduct new research into the natural modulations of sound signals in order to improve technical means of radio communication that use speech signals as the main transmitted information. The review made it possible to conclude that the prospect of using modulating functions for processing audio signals is relevant. The prospects of using division-multiplication of the instantaneous frequency of a signal, without isolating the modulating functions, for the purpose of noise reduction are revealed. The prerequisites for its use are given, and methods are developed for studying the possibility of using the instantaneous frequency division operation for noise reduction when transmitting frequency-compressed signals in two versions: tracking frequency noise reduction and dynamic filtering.

Keywords: modulation analysis-synthesis, instantaneous frequency, noise reduction

1. Ablazov V.I., Gupal V.I., Zgursky A.I. Conversion, recording and playback of speech signals. – Kyiv: Lybid, 1991. – 207 p.

2. Ageev D.V. The active band of the frequency spectrum of a time function // Proceedings of GPI. – 1955. – Vol. 11. – No. 1.

3. Gippenreiter Yu.B. Perception of sound pitch: Author's abstract of Candidate of Psychological Sciences thesis. – M., 1960. – 22 p.

4. Ishutkin Yu.M. Development of the theory of modulation analysis-synthesis of sound signals and its practical application in film sound recording technology: Author's abstract of Doctor of Technical Sciences thesis. – M.: NIKFI, 1985. – 48 p.

5. Ishutkin Yu.M., Uvarov V.K. Fundamentals of modulation transformations of audio signals / Ed. Uvarova V.K. – St. Petersburg: SPbGUKiT, 2004. – 102 p.

6. Ishutkin V.M. Prospects for processing audio signals based on their modulating functions / In the collection: Problems of sound engineering // Proceedings of LIKI, Vol. XXXI. – L.: LIKI, 1977. – P. 102–115.

7. Korsunsky S.G. The influence of the spectrum of a perceived sound on its pitch // Problems of Physiological Acoustics. – 1950. – Vol. 2. – P. 161–165.

8. Markel J.D., Gray A.H. Linear speech prediction: Trans. from English / Ed. Yu.N. Prokhorova, V.S. 3star. – M.: Communication, 1980. – 308 p.

9. Markin D.N., Uvarov V.K. Results of practical studies of the relationships between the spectra of a signal, its envelope, phase cosine and instantaneous frequency. Deposited manuscript No. 181kt-D07. – ONTI NIKFI, 2007. – 32 p.

10. Markin D.N. Development of a method and technical means for companding the spectra of speech signals: Author's abstract of Candidate of Technical Sciences thesis. – St. Petersburg: SPbGUKiT, 2008. – 40 p.

11. Muravyov V.E. On the current state and problems of vocoder technology // Modern speech technologies, collection of works of the IX session of the Russian Acoustic Society, dedicated to the 90th anniversary of M.A. Sapozhkova. – M.: GEOS, 1999. – 166 p.

12. Orlov Yu.M. Dynamic filter-noise suppressor // TKiT. – 1974. – No. 10. – P. 13–15.

13. Sapozhkov M.A. Speech signal in cybernetics and communications. Speech conversion in relation to problems in communication technology and cybernetics. – M.: Svyazizdat, 1963. – 452 p.

14. Uvarov V.K., Plyushchev V.M., Chesnokov M.A. Application of modulation transformations of audio signals / Ed. V.K. Uvarov. – St. Petersburg: SPbGUKiT, 2004. – 131 p.

15. Uvarov V.K. Compression of the frequency range of sound signals to improve sound quality during film screening: Author's abstract of Candidate of Technical Sciences thesis. – L.: LIKI, 1985. – 22 p.

16. Zwicker E., Feldkeller R. The ear as a receiver of information: Trans. from German. – M.: Communication, 1971. – 255 p.

17. Gabor D. Theory of communication // The Journal of the Institution of Electrical Engineers, Part III (Radio and Communication Engineering). – 1946. – Vol. 93. – No. 26. – P. 429–457.

18. Ville J.A. Théorie et application de la notion de signal analytique // Câbles et Transmission. – 1948. – 2A. – No. 1. – P. 61–74; translated from the French by I. Selin, "Theory and applications of the notion of complex signal". – Tech. Rept. T-92, The RAND Corporation, Santa Monica, CA, August 1958.

Modulation theory has a wide range of applications based on signal processing in the time domain; in particular, it can serve as a basis for solving problems of processing broadband audio signals when transmitting them over a narrowband radio channel, including via telephone channels.

The review of methods for processing audio signals revealed the promise of modulation analysis-synthesis developed by Yu.M. Ishutkin in the 70s of the last century for processing and measuring distortions. Subsequently, the modulation theory was developed in the works of his students and followers.

Modulating functions of oscillations of complex shape

In the middle of the twentieth century two scientists, D. Gabor and J. Ville, independently created the theory of the analytic signal, which makes it possible to describe any random process as an explicit function of time. It was this theory that became the mathematical basis on which the modulation theory of sound signals was subsequently built.

Under some non-rigid restrictions, any oscillation of complex shape can be represented as the product of two explicit functions of time

s(t) = S(t)·cos φ(t), (1)

where s(t) is the original audio signal;

S(t) - the non-negative signal envelope, the amplitude-modulating function of the signal;

cos φ(t) - the cosine of the signal phase, the frequency-modulated function of the signal;

φ(t) - the current phase of the signal, the phase-modulating function of the signal;

ω(t) = dφ(t)/dt - the instantaneous signal frequency, the frequency-modulating function of the signal.

The modulating functions S(t), φ(t) and ω(t) of a signal are real functions of the real argument t. In the general case, the modulating functions cannot be determined from the original signal s(t) alone: it must be supplemented with a second signal, called the reference signal s1(t), and for the pair of signals (s(t), s1(t)) the modulating functions can be determined. The form of these functions depends equally on both signals.

D. Gabor was the first to show, in 1946, the need for a reference signal when determining the modulating functions, and for this purpose he applied the direct Hilbert transform to the original signal s(t). In theoretical radio engineering this led to the concept of the analytic signal. However, analytic signal theory was developed for narrow-band oscillations.
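The analytic-signal construction can be sketched numerically. The following fragment (a sketch assuming NumPy; the function names are illustrative, not from the article) builds the analytic signal with an FFT-based Hilbert transform and extracts the first-stage modulating functions of a pure tone:

```python
import numpy as np

def analytic_signal(s):
    """FFT-based Hilbert transform: returns the analytic signal
    s(t) + j*s1(t). Assumes len(s) is even."""
    N = len(s)
    h = np.zeros(N)
    h[0] = h[N // 2] = 1.0      # keep DC and Nyquist bins
    h[1:N // 2] = 2.0           # double the positive frequencies
    return np.fft.ifft(np.fft.fft(s) * h)

def modulating_functions(s, fs):
    """First-stage decomposition: envelope S(t) and instantaneous
    frequency omega(t) in rad/s."""
    z = analytic_signal(s)
    envelope = np.abs(z)
    phase = np.unwrap(np.angle(z))
    omega = np.gradient(phase) * fs     # d(phi)/dt
    return envelope, omega

fs, N = 8000, 4096
t = np.arange(N) / fs
s = np.cos(2 * np.pi * 500 * t)         # pure tone, bin-exact at 500 Hz
env, omega = modulating_functions(s, fs)
# env is ~1 everywhere; omega/(2*pi) is ~500 Hz everywhere
```

For a pure tone, the envelope is constant and the instantaneous frequency equals the tone frequency, which is a convenient sanity check on the construction.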

Modulating functions of a wideband signal

Subsequently, strict mathematical concepts of modulating functions were extended to broadband audio signals. The choice of the reference signal is, in principle, arbitrary; the only requirement put forward is the orthogonality of the main and reference signals. Nevertheless, at present it is the Hilbert transform that is regarded as the technically convenient way to construct a pair of orthogonal signals.

Since in the general case audio signals are non-periodic and can be considered quasi-periodic only at certain fairly short time intervals, in modulation theory, the direct Hilbert transform with the Cauchy kernel is used to determine the reference signal

s1(t) = H[s(t)] = (1/π) ∫ s(τ)/(t − τ) dτ, (2)

where H is the Hilbert transform operator, integral (2) is singular, i.e. does not exist in the usual sense at the point t = τ, it should be understood as the Lebesgue integral, and its value at the point t = τ as the Cauchy principal value.

Two functions related to each other by transformation (2) are called Hilbert conjugate. From the theory of the Hilbert transform it is known that these functions satisfy the orthogonality condition, that is, their scalar product is equal to zero throughout the entire domain of definition

∫_T s(t)·s1(t) dt = 0. (3)

Expression (3) is a definite integral understood in the Lebesgue sense; T denotes the range of the variable t over which the integration is carried out.
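Orthogonality condition (3) is easy to verify numerically. In the sketch below (NumPy; the Hilbert conjugate is built with the same FFT construction as above, and the variable names are illustrative), the discrete scalar product of a two-tone signal and its conjugate vanishes:

```python
import numpy as np

# Numerical check of (3): a signal and its Hilbert conjugate
# have zero scalar product over the whole interval.
N, fs = 4096, 8000
t = np.arange(N) / fs
s = np.cos(2 * np.pi * 500 * t) + 0.5 * np.cos(2 * np.pi * 1000 * t)
h = np.zeros(N)
h[0] = h[N // 2] = 1.0
h[1:N // 2] = 2.0
s1 = np.imag(np.fft.ifft(np.fft.fft(s) * h))   # Hilbert conjugate of s
dot = np.dot(s, s1) / N                        # discrete analogue of (3)
# dot is ~0
```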

In the geometric representation, the signal is a vector of length S(t) (the envelope) rotating around the origin with the angular frequency ω(t); the rotation can proceed quickly or slowly, but only in the forward direction, never backwards. Each modulating function has, in the general case, a constant and a variable component, and the variable components can take both positive and negative values:

S(t) = S0 + SS(t)·cos ωS(t),
ω(t) = ω0 + ωd(t)·cos ωm(t), (4)

where S0 is the constant component (average value) of the signal envelope;

SS(t) - envelope of the variable component of the signal envelope;

cos ωS(t) - cosine of the phase of the variable component of the signal envelope;

ω0 - average value of the instantaneous signal frequency (carrier frequency);

ωd(t) - deviation of the instantaneous frequency of the signal;

ωm(t) - modulating frequency of the signal.

Multi-stage modulation conversion

From the above it follows that the process of decomposing the signal into its modulating functions can be continued, i.e. a multi-stage modulation decomposition can be carried out.

The first stage of the decomposition gives a pair of first-order modulating functions (see formula (4)).

The second stage of the decomposition gives an additional two pairs of second-order modulating functions. Expansion of the first-order envelope S1(t) gives the envelope of the envelope and the instantaneous frequency of the envelope: S21(t) and ω21(t).

Expansion of the first-order instantaneous frequency ω1(t) gives the envelope of the instantaneous frequency and the instantaneous frequency of the instantaneous frequency: S22(t) and ω22(t).

After the third expansion, four more pairs of third-order modulating functions are obtained, etc.
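The multi-stage decomposition can be sketched as a binary tree in which each node is split into the envelope and the instantaneous frequency of its parent (a NumPy sketch; removing the mean before each stage, and the tree bookkeeping, are simplifications of my own, not the authors' procedure):

```python
import numpy as np

def analytic_signal(s):
    """FFT-based analytic signal (assumes even length)."""
    N = len(s)
    h = np.zeros(N)
    h[0] = h[N // 2] = 1.0
    h[1:N // 2] = 2.0
    return np.fft.ifft(np.fft.fft(s) * h)

def stage(s, fs):
    """One decomposition stage: (envelope, instantaneous frequency)."""
    z = analytic_signal(s)
    return np.abs(z), np.gradient(np.unwrap(np.angle(z))) * fs

def multistage(s, fs, depth):
    """Binary tree of modulating functions: each node is split into the
    envelope ('S') and instantaneous frequency ('w') of its parent."""
    tree = {(): np.asarray(s, dtype=float)}
    frontier = [()]
    for _ in range(depth):
        new_frontier = []
        for key in frontier:
            x = tree[key] - np.mean(tree[key])   # drop the constant part
            env, omega = stage(x, fs)
            tree[key + ('S',)] = env
            tree[key + ('w',)] = omega
            new_frontier += [key + ('S',), key + ('w',)]
        frontier = new_frontier
    return tree

fs, N = 8000, 4096
t = np.arange(N) / fs
s = (1 + 0.3 * np.cos(2 * np.pi * 5 * t)) * np.cos(2 * np.pi * 500 * t)
tree = multistage(s, fs, depth=2)
# depth 2 -> 2 first-order + 4 second-order functions, as in the text
```

A third stage would add four more pairs, doubling the number of leaves at each level exactly as the article describes.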

The parameters of the modulating functions of various orders listed after formula (4) are important information features of an audio signal; influencing their values and frequency locations opens up wide possibilities for processing the audio signal: spectrum compression, timbre change, dynamic range conversion and noise reduction, signal transposition, etc.

The technical tasks of processing audio signals by influencing their modulating functions are as follows:

● create a multi-stage demodulator (converter) which, when a voltage u(t) = s(t) is applied to its input, provides at its outputs voltages proportional to the modulating functions of the first, second and higher orders;

● influence the values and spectra of these voltages;

● restore the audio signal using processed modulation functions, i.e. carry out amplitude and frequency modulation of generator oscillations.

For example, a nonlinear corrective action on the parameters of the amplitude-modulating function allows compression and noise reduction of the reconstructed audio signal. By passing the frequency-modulating channel signal through a nonlinear circuit whose differential transmission coefficient decreases as the instantaneous values of the output voltage grow, compression of the frequency range of the processed audio signal can be achieved. By dividing the frequency ωm(t) and eliminating the high-frequency part of its spectrum, the spectrum of the audio signal can be significantly compressed while maintaining high noise immunity.
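One simple way to realize division and multiplication of the instantaneous frequency is to scale the unwrapped phase of the analytic signal, since scaling the phase scales its derivative (a NumPy sketch with illustrative names; a bin-exact pure tone is used so the effect shows up as a clean frequency shift):

```python
import numpy as np

def analytic_signal(s):
    """FFT-based analytic signal (assumes even length)."""
    N = len(s)
    h = np.zeros(N)
    h[0] = h[N // 2] = 1.0
    h[1:N // 2] = 2.0
    return np.fft.ifft(np.fft.fft(s) * h)

def scale_instantaneous_frequency(s, factor):
    """Multiply the instantaneous frequency by `factor` while keeping
    the envelope: scaling the unwrapped phase scales its derivative."""
    z = analytic_signal(s)
    return np.abs(z) * np.cos(np.unwrap(np.angle(z)) * factor)

fs, N = 8000, 4096
t = np.arange(N) / fs
s = np.cos(2 * np.pi * 2000 * t)
compressed = scale_instantaneous_frequency(s, 1 / 4)     # 2000 Hz -> 500 Hz
restored = scale_instantaneous_frequency(compressed, 4)  # back to 2000 Hz
spectrum = np.abs(np.fft.rfft(compressed))
# the compressed tone sits at bin 256 (500 Hz); restored matches s
```

Division before transmission and multiplication after reception is exactly the round trip the article proposes for narrowband channels.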

Prospects for the use of division-multiplication of the instantaneous frequency of a signal without isolating modulating functions for the purpose of noise reduction

Formulation of the problem

When transmitting audio signals over narrowband communication channels, frequency compression leads to a noticeable limitation in the width of the instantaneous frequency spectrum. We are exploring the possibility of replacing components in the spectrum of phonemes of such signals, caused by high frequencies of frequency modulation, with other components - located at close frequencies, but caused by an increase in the deviation of the instantaneous frequency of the phoneme when restoring frequency-compressed signals. Such a replacement should improve the quality of sound transmission due to a more complete subjective perception.

The prerequisites for such a formulation of the problem can be the following:

1. Vowel sounds for most of their duration can be considered as a periodic signal. As the frequency deviation increases, the number of harmonics of the fundamental tone will increase. Consequently, it is possible to reduce the number of fundamental tone harmonics when transmitting a signal, and restore their number on the receiving side of the channel by increasing the frequency deviation.

2. The spectra of voiceless consonants are continuous. The spectra of their instantaneous frequencies are also continuous, in a band approximately equal to half the frequency band of the signal spectrum. Therefore, as the frequency deviation increases, the spectrum of the instantaneous frequency will remain continuous, but the spectrum of the phoneme will expand.

3. The influence of the spectral composition of complex signals on the perception of their pitch is known. Sounds rich in high-frequency spectral components are perceived as higher in pitch compared to sounds that have the same fundamental frequency, but with weak high-order harmonics or fewer of them.

4. Since the substitution of spectral components will occur at high frequencies, it can be assumed that such a substitution will be imperceptible or almost imperceptible to the ear. The basis for this is the reduced sensitivity of hearing to changes in pitch in the high frequency region.

Development of research methodology

Frequency tracking noise reduction

The possibility of using the instantaneous frequency division operation for the purpose of noise reduction will be quantitatively justified after preliminary studies of the permissible limits for reducing the spectra of the modulating functions of audio signals for different transmission channels.

When instantaneous frequency division is used to transmit audio signals in frequency-compressed form, the transmitted signal is obviously concentrated in the low-frequency region. Moreover, the frequency bandwidth necessary for undistorted transmission will change continuously, following the audio signal itself. Therefore, one of the main tasks of this research is to determine the possibility of creating a tracking low-pass filter whose upper limit frequency changes over time, taking values within the permissible limits on the frequency bands of the instantaneous frequency and the envelope, which will be known after the preliminary research has been carried out. It appears that the bandwidth reduction will be very significant for narrow-band signals, which mask transmission channel noise little or not at all; for such signals, therefore, the gain in signal-to-noise ratio will be substantial.

The second task of this study should be the determination of the control signal for the low-pass filter. As first candidates for the role of the control signal, we can propose signals proportional either to the instantaneous frequency ω(t) or to its derivative. Since noise reduction is achieved by distinguishing the frequency ranges of the signal and the noise, such noise reduction can be called frequency noise reduction.
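A crude prototype of such a tracking frequency noise suppressor can be sketched in a few lines (NumPy; the block processing, the dominant-bin estimate used as the control signal, and the margin factor are illustrative choices of mine, not the authors' design):

```python
import numpy as np

def tracking_lowpass(s, fs, block=512, margin=2.0):
    """Tracking low-pass filter sketch: per block, estimate the dominant
    frequency and zero all spectral bins above margin * that frequency."""
    out = np.zeros(len(s))
    freqs = np.fft.rfftfreq(block, 1 / fs)
    for i in range(0, len(s) - block + 1, block):
        X = np.fft.rfft(s[i:i + block])
        dominant = freqs[np.argmax(np.abs(X[1:])) + 1]   # skip the DC bin
        X[freqs > margin * dominant] = 0.0               # tracking cutoff
        out[i:i + block] = np.fft.irfft(X, block)
    return out

fs, N = 8000, 4096
t = np.arange(N) / fs
tone = np.cos(2 * np.pi * 500 * t)            # wanted narrow-band signal
hiss = 0.1 * np.cos(2 * np.pi * 3000 * t)     # "channel noise" above it
y = tracking_lowpass(tone + hiss, fs)
# the 3 kHz component is removed while the 500 Hz tone passes intact
```

A real design would overlap-add the blocks and smooth the cutoff trajectory, but the sketch shows the principle: the passband follows the signal, and everything above it, including channel noise, is cut.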

When using the envelope for threshold amplitude noise reduction or for dynamic filtering, we obtain a combined noise suppressor for frequency-compressed signals.

Dynamic filtering

As is known, in existing dynamic filters the entire frequency range of the sound signal is divided into bands, and noise reduction is carried out in each band by a threshold noise suppressor (usually an inertial device). The usual disadvantage of dynamic filters is hardware complexity, since a dynamic filter is a combination of several threshold noise suppressors (typically four or more). In addition, difficulties arise in ensuring a linear frequency response.

It now becomes possible to explore dynamic filtering in a single low-frequency band when transmitting frequency-compressed signals, controlling the filter bandwidth with the envelope signal. As is known, when the sound signal level decreases, the upper harmonics of the sound are the first to sink into the noise of the transmission channel, and the fundamental tone is the last. This suggests that by reducing the filter bandwidth in proportion to the decrease in the envelope, a noise reduction effect can be obtained without the usual disadvantages of dynamic filters.
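The envelope-controlled single-band dynamic filter proposed here can be sketched as follows (NumPy; the proportional control law, the block size and the maximum cutoff are assumptions for illustration, not parameters from the article):

```python
import numpy as np

def dynamic_filter(s, fs, f_max=6000.0, block=512):
    """Single-band dynamic filter sketch: the low-pass cutoff shrinks in
    proportion to the block envelope, so quiet passages (where channel
    noise is no longer masked) are filtered hardest."""
    peak = np.max(np.abs(s)) + 1e-12
    out = np.zeros(len(s))
    freqs = np.fft.rfftfreq(block, 1 / fs)
    for i in range(0, len(s) - block + 1, block):
        x = s[i:i + block]
        level = np.max(np.abs(x)) / peak        # 0..1 block envelope
        X = np.fft.rfft(x)
        X[freqs > f_max * level] = 0.0          # proportional control law
        out[i:i + block] = np.fft.irfft(X, block)
    return out

fs = 8000
n = np.arange(2048)
loud = np.cos(2 * np.pi * 500 * n / fs)             # loud passage
quiet = 0.05 * np.cos(2 * np.pi * 3000 * n / fs)    # quiet hissy passage
y = dynamic_filter(np.concatenate([loud, quiet]), fs)
# the loud half passes unchanged; the quiet half is filtered to ~0
```

Because only one band and one control law are involved, the hardware complexity objection raised above for multi-band dynamic filters does not apply.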

Conclusion

In modulation theory, a signal is described as a complexly modulated (simultaneously amplitude and frequency) process in the form of a product of the envelope (amplitude-modulating function of the signal) and the cosine of the phase (frequency-modulating function of the signal). A characteristic feature of this theory is the selection of information parameters of the signal, the number of which increases after each subsequent stage of its decomposition into modulating functions (multi-stage decomposition). This opens up the opportunity to influence selected information parameters of different levels and achieve the desired type of signal processing.

The application of modulation theory with the implementation of multi-stage decomposition will make it possible to conduct new research on the study of natural modulations of sound signals in order to improve technical means of radio communication that use speech signals as the main transmitted information.

The review made it possible to draw a conclusion about the relevance of the prospect of using modulating functions for processing audio signals. The prospects for using the division-multiplying operation of the instantaneous frequency of a signal without isolating modulating functions for the purpose of noise reduction are revealed. The prerequisites for its use are given, and methods are developed to study the possibility of using the instantaneous frequency division operation for noise reduction when transmitting frequency-compressed signals in two versions: tracking frequency noise reduction and dynamic filtering.

Reviewers:

Smirnov N.V., Doctor of Physical and Mathematical Sciences, Associate Professor, Professor of the Department of Modeling Economic Systems, Applied Mathematics of Control Processes, St. Petersburg State University, St. Petersburg;

Starichenkov A.L., Doctor of Technical Sciences, Associate Professor, N.S. Solomenko Institute of Transport Problems of the Russian Academy of Sciences, St. Petersburg.

Bibliographic link

Uvarov V.K., Redko A.Yu. MODULATION ANALYSIS-SYNTHESIS OF SOUND SIGNALS AND PROSPECTS FOR ITS USE FOR NOISE REDUCTION PURPOSES // Basic Research. – 2015. – No. 6-3. – P. 518-522;
URL: http://fundamental-research.ru/ru/article/view?id=38652 (access date: 04/26/2019).

Adapter

Since the line input of an audio adapter is the main receiver of an external signal during recording, each manufacturer strives to provide adequate amplification quality at this input. The sensitivity of the line inputs of most sound adapters is approximately the same, and their quality parameters are proportional to the overall quality of the cards. The situation is completely different with microphone inputs: a board costing $100 may have a much worse microphone input, in terms of sensitivity and quality, than a consumer-grade one for $8. The reason is that the microphone input is secondary for a sound adapter, and its function is most often limited to connecting the simplest cheap microphone for giving voice commands, where noise level and frequency response are not so critical.

The microphone inputs of modern adapters are designed, as a rule, for electret microphones with a built-in amplifier powered from the adapter. Such a microphone has a high output impedance and develops up to 50-100 mV at its output, so a simple preamplifier is sufficient to bring the signal up to line input level (about 500 mV). Some adapters, according to their documentation, allow the connection of dynamic microphones, which need no power; but such a microphone develops only 1-3 mV at its output and requires a fairly sensitive, low-noise amplifier, which is quite rare on sound cards. A typical board therefore at best yields from such a microphone an insufficiently loud, muffled sound replete with noise and interference, and at worst no sound at all. Electret microphones are preferred because the computer is a source of abundant electromagnetic interference, which creates noticeable pickup at the sensitive microphone input and is quite difficult to combat: a low-noise amplifier would require a special board layout, careful filtering of the supply voltages, shielding of the input circuitry, and other complex and expensive tricks.

The microphone input connector of most adapters is monophonic; it uses only the end contact (TIP) of the plug to transmit the signal, which in a stereo jack is responsible for the left channel signal. The middle contact (RING), which is responsible for the right channel in the stereo connector, is either not used at all in the microphone connector, or serves to transmit +5 V supply voltage for an electret microphone. When there is no separate contact for powering the microphone, the supply voltage is supplied directly to the signal input, and the amplifiers in this case must have capacitive isolation of the input and output.

Microphone

As we found out, electret microphones are best suited for direct connection to the adapter; they are usually available in fairly miniature versions: as "pencils" on stands or as "clips" attached to clothing or to the monitor body. They are inexpensive and sold in computer accessory stores; if you do not require recording quality close to professional, you can get by with such a microphone. Otherwise you need a high-quality professional microphone, for which you will have to go to a music equipment store, and its price will be about an order of magnitude higher.

A number of problems are bound to arise when connecting a professional microphone. Such microphones are most often dynamic and produce a signal with an amplitude of a few millivolts, and the microphone input of most sound adapters, as already mentioned, cannot handle such weak signals properly. There are two ways out: either buy a microphone preamplifier in the same music store (it can turn out to be a rather expensive toy) and connect its output not to the microphone input but to the line input of the adapter; or use a microphone with a built-in preamplifier and its own (battery) power supply. If you have radio engineering skills, you can assemble a simple amplifier yourself; circuit variants are quite often found in books and on the Internet.

Additionally, professional microphones usually have XLR connectors, while computer audio adapters usually have 3.5 mm mini-jack sockets, so an adapter cable will be required; such adapters are sometimes sold in music stores, but you may have to solder one yourself.

And finally, it may well turn out that any professional microphone will be far superior to your sound adapter in quality, and the sound you get with such a microphone will in the end be no better than what a simple electret can provide. Therefore, if you have doubts about the quality of your adapter (and simple adapters costing about $10, especially built-in ones, have very mediocre parameters), it makes sense to negotiate with the store about the possible return of the purchased microphone if you cannot get reasonably high-quality sound with it.

Recording technology

Unlike fixed signal sources, a microphone has a number of peculiarities that must be taken into account when working with it. First of all, it is prone to acoustic feedback: if the amplified signal from the microphone reaches the speakers, the microphone picks it up, the signal is amplified again, and so on. This positive feedback "swings" the sound path and drives it into self-excitation, which manifests itself as a loud whistle, ringing or rumble. Even if the path does not go into self-excitation, the positive feedback can add a ringing or whistling coloration that noticeably spoils the signal. A sensitive microphone can pick up a signal even from headphones if the sound in them is loud enough and their external sound insulation is weak. It is therefore necessary to determine experimentally the position and direction of the microphone and the playback volume at which the feedback is least apparent. It is recommended to make the final recording with the speakers turned off, or at least turned down as far as possible.

Sensitive microphones, especially simple and cheap ones, perfectly perceive extraneous sounds, such as the rustling of fingers on the microphone body or the slight creaking of the body itself, even from slight compression (you've probably heard similar sounds during telephone conversations). To avoid such interference, it is better to install the microphone on a comfortable stand or hold it freely without squeezing it with your fingers.

Another unpleasant aspect of using a microphone is the so-called spitting of the air flow, which is especially pronounced on plosive consonants such as "p", "b", "t" and the like. An intense sound pulse hitting the membrane produces a sharp surge in signal amplitude that overloads the amplifier and/or the ADC. Professional microphones have a windscreen against this: a mesh or soft pad located at some distance from the capsule. But even this does not always save the recording, so you have to get used to each microphone, learning to hold it either at an angle such that the direct air flow passes by, or at a distance sufficient for the flow to reach the microphone already weakened.

As you experiment with the microphone, you will find that the timbre of the recorded voice depends quite a lot on the distance from the mouth to the microphone and on the angle of the microphone relative to the face. This is due to the fact that low-frequency components of the voice are most scattered and attenuated with distance, while high-frequency components are attenuated less, but have a more pronounced directionality. The most juicy and velvety voice timbre can be obtained by placing the microphone directly at the mouth, but then you will have to tinker a lot with the angle of inclination and practice a lot to avoid “spitting.”

Recording via external devices

Recently, rather exotic ways of recording sound from a microphone and transferring it to a computer have appeared. Thus, Creative produces a digital player, the Jukebox, containing a miniature hard drive, a standalone controller and a USB interface. The main function of the player is to play sound files transferred to it from a computer, but the built-in microphone allows it to be used as a stand-alone voice recorder: the sound is recorded on the hard drive, which allows several hours of continuous recording, and the soundtrack can subsequently be transferred to the computer. Another Creative product, the PC Cam, is a hybrid of a digital camera, camcorder and voice recorder and allows audio to be recorded into the built-in Flash memory, from which it is retrieved over the same USB interface.

Removing noise and interference

Since the voice signal has a fairly narrow spectrum (from hundreds of hertz to a few kilohertz), the noise removal operation can be applied to it more aggressively than to an arbitrary musical signal. It may also turn out that in the most successful take (from an artistic point of view) the microphone still got "spat" in one or several places, and attempts to repeat the phrase or verse with an equally successful placement of accents do not give the desired result. In such cases, you can try to round off the overload pulses, maintaining or reducing their amplitude. With a small number of pulses it is convenient to do this manually, enlarging the image until the nodal points appear that can be edited with the mouse.
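The manual "rounding off" of overload pulses can be automated naively (a NumPy sketch; the threshold detector and linear interpolation are simplifications of my own for the mouse editing described above):

```python
import numpy as np

def round_off_clicks(s, threshold):
    """Replace runs of samples whose magnitude exceeds `threshold` by
    linear interpolation between the surrounding good samples."""
    y = np.asarray(s, dtype=float).copy()
    bad = np.abs(y) > threshold
    if bad.any() and not bad.all():
        idx = np.arange(len(y))
        y[bad] = np.interp(idx[bad], idx[~bad], y[~bad])
    return y

x = np.array([0.0, 0.1, 0.2, 4.0, 4.5, 0.3, 0.2, 0.0])  # two "spat" samples
y = round_off_clicks(x, threshold=1.0)
# y[3] and y[4] now lie on the straight line from 0.2 to 0.3
```

Real declickers interpolate with smoother models than a straight line, but the principle is the same: keep the good samples, reconstruct the overloaded ones.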

Voice processing methods

As we have already said, a complex musical signal contains many heterogeneous components, which most sound processing methods affect in different ways, so the range of universal signal processing methods is very narrow. The most popular, reverberation, imitates multiple reflections of sound waves and creates the effect of a space: a room, hall, stadium, mountain canyon and so on; reverberation adds richness and volume to a "dry" sound. The other universal processing methods come down to manipulating the frequency response (equalization) and cleaning the phonogram of noise and interference.
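Reverberation as "multiple reflections" can be illustrated with a single feedback delay line (a minimal NumPy sketch; real reverberators combine many such combs with allpass filters, and the parameter names here are illustrative):

```python
import numpy as np

def simple_reverb(s, fs, delay_s=0.05, decay=0.5, mix=0.3):
    """One feedback delay line: each reflection returns delay_s later,
    `decay` times weaker; `mix` blends wet and dry signals."""
    d = int(delay_s * fs)
    dry = np.asarray(s, dtype=float)
    wet = np.zeros_like(dry)
    for n in range(len(dry)):
        echo = wet[n - d] if n >= d else 0.0
        wet[n] = dry[n] + decay * echo
    return (1 - mix) * dry + mix * wet

fs = 8000
x = np.zeros(fs)
x[0] = 1.0                      # unit impulse
y = simple_reverb(x, fs)
# echoes appear every 50 ms, each one half as strong as the previous
```

Feeding an impulse through the filter makes the reflection pattern directly visible in the output samples.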

To a primary, simple sound signal, the entire range of existing processing methods - amplitude, frequency, phase, time, formant and others - can be applied quite successfully. Methods that produce cacophony on a complex signal can, on simple signals, often create very interesting and striking effects, widely used in the audio industry.

Installation

Computer editing of speech phonograms - a typical journalist's task after recording an interview - is both simple and complex. At first it seems simple, thanks to the structure of speech, which is convenient for visual analysis: noticeable pauses between words, bursts of amplitude at points of emphasis, and so on. But when you try, for example, to swap two phrases separated by literally seconds, it turns out that they do not want to join: the intonation, the breathing phase and the background noise have changed, and a seam is clearly audible at the junction. Such splices are easily discernible in almost any radio interview in which the speaker is not a professional radio journalist and therefore does not know how to say only what should go on air. The unnecessary is cut out of the speech and some fragments are rearranged to better suit the meaning, as a result of which the ear is constantly "surprised", since such intonation and dynamic transitions do not occur in the flow of natural human speech.

To smooth out transition effects, you can use the crossfade method, although it matches speech fragments only in amplitude, not in intonation or background noise. Therefore, we consider it necessary to warn those who see computer editing as a convenient way to falsify a recording, for example of a negotiation: an examination can easily identify even splice points that are indistinguishable to the ear, just as with the forgery of documents using a scanner and printer.
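A linear crossfade of the kind described can be sketched as follows (NumPy; the equal-gain linear fades and the function name are illustrative):

```python
import numpy as np

def crossfade(a, b, fs, overlap_s=0.02):
    """Fade `a` out and `b` in linearly over the overlap, hiding the
    amplitude step at the splice."""
    n = int(overlap_s * fs)
    fade = np.linspace(0.0, 1.0, n)
    joint = a[-n:] * (1 - fade) + b[:n] * fade
    return np.concatenate([a[:-n], joint, b[n:]])

fs = 8000
a = np.ones(800)           # fragment ending at level 1.0
b = np.full(800, 0.2)      # fragment starting at level 0.2
y = crossfade(a, b, fs)
# the joint ramps smoothly from 1.0 down to 0.2 over 20 ms
```

As the text notes, this removes the amplitude step only; intonation and background-noise discontinuities at the splice remain audible.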

Amplitude processing

The simplest type of dynamic amplitude processing of the voice is its modulation with a periodic signal: the amplitudes of the signals are multiplied, and the voice acquires the amplitude characteristics of the modulating signal. Modulating with a low-frequency (a few hertz) sinusoidal signal gives a "gurgling" voice; increasing the modulation frequency gives a vibrating one. Using a rectangular, triangular or sawtooth wave instead of a sine gives the voice a metallic, distorted, "robotic" intonation.
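The modulation described above reduces to a sample-by-sample multiplication. A minimal numpy sketch (the function and parameter names are our own, not tied to any editor):

```python
import numpy as np

def tremolo(signal, rate, mod_freq=4.0, depth=1.0, shape="sine"):
    """Multiply the signal by a low-frequency periodic envelope.

    A mod_freq of a few hertz gives a 'gurgling' voice; raising it
    gives a vibrating one; 'square' or 'saw' shapes sound metallic.
    """
    t = np.arange(len(signal)) / rate
    phase = 2 * np.pi * mod_freq * t
    if shape == "sine":
        lfo = np.sin(phase)
    elif shape == "square":
        lfo = np.sign(np.sin(phase))
    else:  # sawtooth
        lfo = 2 * (mod_freq * t - np.floor(mod_freq * t + 0.5))
    # Map the LFO into [1 - depth, 1], so depth=0 leaves the signal as is
    envelope = 1.0 - depth * (1.0 - lfo) / 2.0
    return signal * envelope
```

With depth = 0 the envelope is identically 1 and the signal passes through unchanged, which makes the depth control easy to audition.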

Amplitude modulation of a selected fragment of a phonogram is performed as part of the Generate → Tones operation for generating periodic signals. The Base Frequency field sets the fundamental frequency of the signal in hertz, the Flavor field the waveform, and the Duration field the duration in seconds. The Volume controls set the signal level.

The Frequency Components group of sliders determines the levels of the harmonics of the main signal whose numbers are specified on the sliders. Frequency modulation of the signal can be obtained using the Modulate By field - the deviation from the fundamental frequency in hertz - and the Modulation Frequency field - the modulation frequency. When the Lock... field is checked, all these parameters, including the fundamental frequency, remain constant; when it is unchecked, you can set their initial/final values in the Initial/Final Settings tabs, and they will change linearly over the generated segment.

The Source Modulation field group determines how the generated signal will be used. By default, when none of these fields are checked, the signal is inserted into the soundtrack or replaces the selected fragment; otherwise, it is used to perform a given operation with the selected fragment: Modulate - normal modulation (multiplication), Demodulate - demodulation (division), Overlap (mix) - simple mixing of signals. Successive modulation and demodulation of the same signal restores the original signal (possibly with a modified overall level). Experimenting with different combinations of parameters sometimes gives very funny and unexpected results.

Time-domain processing

This type of processing is based on shifting the original signal in time and mixing the result with the original, after which the shift and mixing can be applied again. With shifts over short intervals, comparable to the period of the original signal, interference-like phase effects arise that give the sound a specific coloring; this effect is called a flanger and is used with a fixed shift value as well as with a periodically changing or even completely random one. With shifts at intervals exceeding the period but not longer than 20 ms, a choral effect (chorus) arises. Owing to the common technology, these two effects are often implemented by one software block with different parameters.
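The shift-and-mix scheme behind the flanger and chorus can be sketched in a few lines of numpy (a simplified model with our own parameter names; real implementations interpolate fractional delays rather than rounding them):

```python
import numpy as np

def flanger(signal, rate, min_delay_ms=0.5, max_delay_ms=3.0,
            period_s=1.0, mix=0.5):
    """Mix the signal with a copy whose delay sweeps periodically.

    Delays comparable to the signal period (a few ms) give the
    flanger coloring; delays of roughly 5-20 ms give a chorus.
    """
    n = np.arange(len(signal))
    # Sinusoidal sweep of the delay between the two limits
    delay_ms = min_delay_ms + (max_delay_ms - min_delay_ms) * 0.5 * (
        1 + np.sin(2 * np.pi * n / (period_s * rate)))
    delay = (delay_ms * rate / 1000).astype(int)
    idx = np.maximum(n - delay, 0)          # integer (non-interpolated) delay
    return (1 - mix) * signal + mix * signal[idx]
```

The mix parameter plays the role of the Original/Delayed balance, and the delay sweep corresponds to the Initial/Final Mix Delay limits described below.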

With multiple shifts at intervals of 20...50 ms, a reverberation effect arises - a boominess and spaciousness - because hearing interprets the delayed copies of the signal as reflections from surrounding objects. At intervals greater than 50 ms the ear no longer clearly associates the individual copies with one another, and an echo effect results.

In Cool Edit 2000, effects based on time delays are grouped under Transform → Delay Effects. The flanger and chorus effects are created by the Flanger operation:

The Original/Delayed slider controls the ratio of the original and delayed signals (the intensity, or depth, of the effect). Initial/Final Mix Delay set the initial and final delay of the copy; the delay cycles between these limits. Stereo Phasing - the phase-shift angle between channels - allows you to create a curious effect of "twisting" the sound, especially in headphones. Feedback - the feedback depth (the share of the resulting signal mixed back into the original before the operation is applied) - controls the severity and sharpness of the effect.

The Rate group specifies the cycling parameters of the effect. Period - the time interval during which the flanger passes from the initial delay to the final delay and back; Frequency - the reciprocal value, the frequency of round trips; Total Cycles - the number of complete passes through the selected fragment. Setting any parameter causes automatic recalculation of the rest.

The Mode group controls the features of the effect: Inverted - inversion of the delayed signal, Special EFX - additional inversion of the original and delayed signals, Sinusoidal - sinusoidal law of delay change from initial to final (if it is disabled, the delay changes linearly).

A set of presets allows you to visually study the features of the operation. Try selecting several presets, changing the preset parameters in each of them and remembering to Undo each time to compare the effect on the sound of different combinations of parameters.

The reverb effect in Cool Edit 2000 can be implemented in two ways: with Echo Chamber, a simulator of a room with given dimensions and acoustic properties, and with Reverb, a spaciousness-effect generator based on the editor's built-in algorithm for simulating multiple reflections in space. Since this type of processing is universal and applies to any sound material, we will briefly describe the second method as the most popular.

The Total Reverb Length field/slider determines the reverberation time, during which the reflected signals decay completely; it is indirectly related to the volume of the space in which the sound travels. Attack Time is the time for the reverberation depth to rise to its nominal level; it serves for a smooth onset of the effect over the processed fragment. High Frequency Absorption Time is the time for the space to absorb the high-frequency components; it is proportional to the "softness" and "muffledness" of the space. Perception sets the degree of distinctness: lower values (smooth) give weak, soft reflections that do not interrupt the main signal; larger values (echoey) give clear, strong, distinctly audible reflections that can worsen speech intelligibility.

The Mixing sliders/fields determine the ratio of the original (dry) and processed (wet) signals in the result.

The echo effect is implemented by the Echo operation, which adds to the signal gradually fading copies of it, shifted by equal time intervals. The Decay control sets the amount of attenuation - the level of each successive copy as a percentage of the level of the previous one. Initial Echo Volume is the level of the first copy as a percentage of the level of the original signal; Delay is the delay between copies in milliseconds. The Successive Echo Equalization group of controls adjusts the equalizer through which each successive copy is passed, which allows you to set different acoustic characteristics of the simulated space.
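The decaying-copies scheme can be modeled directly (a hedged numpy sketch; the parameter names only loosely mirror the editor's controls, and the per-copy equalizer is omitted):

```python
import numpy as np

def echo(signal, rate, delay_ms=250.0, initial_volume=0.5,
         decay=0.5, copies=5):
    """Add gradually fading copies of the signal, each shifted by the
    same delay; decay is the level of each copy relative to the
    previous one, initial_volume that of the first copy."""
    step = int(delay_ms * rate / 1000)
    # Extra room at the end plays the role of 'Continue echo beyond selection'
    out = np.concatenate([signal, np.zeros(step * copies)])
    level = initial_volume
    for k in range(1, copies + 1):
        out[k * step : k * step + len(signal)] += level * signal
        level *= decay
    return out
```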

Because the effect "continues" in time, it can create a sound fragment longer than the original. For this purpose the Continue echo beyond selection item is provided - permission to mix the echo signal into the part of the phonogram that continues beyond the border of the selected fragment. In this case only the selected fragment is taken as the source signal, and the remaining part of the phonogram is used exclusively to place the "tail". If there is not enough room for the "tail" in the phonogram, an error message is displayed, and you will have to append a section of silence to the end of the phonogram using the Generate → Silence operation.

The effect is best perceived on relatively short sounds. On long words or phrases - to avoid "gibberish", multiple repetitions of various syllables or words interrupting each other - it is better to make the effect "final", choosing for repetition only the short closing fragment of the phrase or even the last stressed syllable of the word. Try experimenting with different words and phrases to get a feel for which final part works best in each particular case.

Spectral processing

The most striking and interesting effect of this class implemented in Cool Edit 2000 is the change of pitch and speed. Everyone knows the effect of raising or lowering the pitch of a signal when the tape speed of a tape recorder or the rotation speed of a record changes. With the development of digital signal processing it has become possible to implement each of these effects separately and plausibly - changing the pitch while preserving the timing, or vice versa.

This type of processing in Cool Edit 2000 is carried out by the Transform → Time/Pitch → Stretch operation. There are two options - with a constant or with a sliding coefficient. The coefficients are set by the Initial/Final Ratio fields, which are also linked to sliders for ease of adjustment. The coefficient can, in addition, be set indirectly by the Transpose field as a number of chromatic semitones up (sharp) or down (flat). In the duration-changing mode the Length field is also available, in which you can set the required length of the resulting fragment.

The Precision switch sets the processing accuracy: Low, Medium or High. This matters because the spectral processing operation requires a lot of computation, and reducing the accuracy speeds it up - at least at the experimentation stage. The Stretching Mode switch sets the type of processing: Time Stretch - speeding up/slowing down in time, Pitch Shift - shifting the pitch, Resample - simple resampling, similar to changing the speed of a tape or record.

The Pitch and Time Settings group of parameters controls the specifics of the operation. Processing is performed by breaking the fragment into small sound blocks; the Splicing Frequency parameter specifies the number of such blocks per second of the fragment. Increasing this "splicing frequency" makes the blocks smaller, increasing the naturalness of the processing, but the chopping effect also grows, giving rise to unpleasant overtones. The Overlapping parameter sets the degree of overlap of adjacent blocks when assembling the resulting signal - a small mutual overlap smooths the junctions between blocks. The Choose appropriate defaults item automatically sets these parameters to the values the editor considers most appropriate.
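The block-splicing idea can be illustrated with a naive overlap-add model (our own simplification; Cool Edit's actual algorithm is not published, and a serious implementation would align blocks by waveform similarity):

```python
import numpy as np

def stretch_ola(signal, ratio, block=1024, overlap=0.25):
    """Naive overlap-add time stretch: cut the signal into small
    windowed blocks, re-place them at a stretched spacing and
    cross-fade the overlapping edges."""
    hop_in = int(block * (1 - overlap))      # analysis hop
    hop_out = int(hop_in * ratio)            # synthesis hop
    n_blocks = max(1, (len(signal) - block) // hop_in + 1)
    out = np.zeros(hop_out * (n_blocks - 1) + block)
    norm = np.zeros_like(out)
    win = np.hanning(block)
    for i in range(n_blocks):
        chunk = signal[i * hop_in : i * hop_in + block]
        w = win[: len(chunk)]
        out[i * hop_out : i * hop_out + len(chunk)] += chunk * w
        norm[i * hop_out : i * hop_out + len(chunk)] += w
    return out / np.maximum(norm, 1e-12)     # undo the window gain
```

Here block plays the role of the splicing frequency (smaller blocks mean more splices per second) and overlap that of the Overlapping parameter.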

This article completes a short series on recording and processing sound on a home computer.

ComputerPress 12"2002

Dynamic range converters of audio signals based on modulating functions

Kharitonov Vladimir Borisovich,

Candidate of Technical Sciences, Professor

Zirova Yulia Konstantinovna,

graduate student of the Department of Audio Engineering

St. Petersburg State University of Film and Television.

Conversion of the dynamic range of audio signals based on the modulating functions of modulation analysis-synthesis theory is inertia-free. This is not the nonlinear processing of instantaneous signal values, which is also inertia-free but introduces nonlinear distortion into the processed signal. Processing signals through their modulating functions theoretically solves, in a number of cases, the problem of converting the dynamic range of signals without introducing distortion. In practice, as noted by the authors who built analog implementations of modulating-function processing devices, the theoretical results cannot be achieved because of the limited accuracy and parameter instability of analog devices for isolating and processing modulating functions. This article presents the results of a study of a digital implementation of a dynamic range conversion device on the theoretical basis of modulation analysis-synthesis, which made it possible to clarify the potential of the method by means of precise digital signal processing.

Initially, dynamic range converters (DRCs) of audio signals were used in radio broadcasting to protect audio paths from overload, to match the range of signal levels to the dynamic range of the channels over which they are transmitted, and to reduce the influence of recording-media noise. For similar reasons the sound of the cinematic audio channel was compressed: the dynamic range of analog photographic soundtracks is usually 35-45 dB, while the range of levels of a sound program (as perceived by hearing) can reach almost 110 dB. Compression of both speech and music is also often used to smooth out the dynamics and increase speech intelligibility at live performances - when, for example, the speaker moves noticeably away from the microphone or comes close to it. There are many more examples of the use of amplitude compressors alone. But dynamic range conversion is not limited to compression; it also includes limiting, noise reduction and expansion of audio signals. All these types of sound processing are now widely used and will presumably long continue to help sound engineers realize their creative ideas and solve technical problems.

Some reasons for including DRCs in audio paths lost their relevance with the transition to digital recording: the dynamic range of digital recording media and audio paths is comparable to the dynamic range of human hearing. But in a cinema, if the dynamic range of a film's soundtrack is not compressed during playback, the quiet fragments will simply drown in the noise of the auditorium, while unrestricted upper signal levels can deafen the audience or overload the power amplifiers and theater loudspeakers. Thus, transforming the dynamic range of cinema sound is necessary for comfortable listening in the auditorium. The Dolby Digital format, used to record the soundtracks of most modern films, provides for the formation of a special dynamic range control signal. Playback equipment uses this signal to regulate the dynamic range, and the degree of compression can be changed to suit the conditions of a particular auditorium. It therefore remains relevant to develop dynamic range converters that provide high-quality audio processing with minimal intervention by the sound engineer.

By speed of action, DRCs are divided into two groups: inertial (with a dynamically changing transmission coefficient) and inertia-free (instantaneous).

Inertial converters have been used in audio engineering for many decades; the principle of their operation, their advantages and disadvantages are described in sufficient detail in the literature. Their operation is based on isolating the envelope of the signal, generating a control signal from that envelope, and then multiplying the two signals - the input audio signal and the control signal:

y(t) = x(t)·g(t),

where x(t) is the input signal, g(t) is the control signal, and y(t) is the output signal.
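The inertial scheme just described - envelope smoothing followed by multiplication - can be sketched as a minimal numpy model (function and parameter names are assumptions of ours; the one-pole smoothing constant plays the role of the attack filter):

```python
import numpy as np

def inertial_compressor(x, attack=0.01, ratio=2.0, floor=1e-4):
    """Classic inertial DRC: smooth the magnitude of the input with a
    one-pole low-pass filter to obtain the envelope, derive a control
    signal from it, and multiply:
        y(n) = x(n) * g(n),   g(n) = env(n) ** (1/ratio - 1).
    The smoothing is exactly what makes the control signal lag behind
    fast changes of the input level."""
    env = np.empty_like(x)
    e = floor
    for n, v in enumerate(np.abs(x)):
        e += attack * (max(v, floor) - e)    # one-pole smoothing
        env[n] = e
    g = env ** (1.0 / ratio - 1.0)           # static compression curve
    return x * g
```

With ratio = 2, steady-state output level grows roughly as the square root of the input level, so the level difference between a loud and a quiet passage is approximately halved in decibels.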

Multiplication in the time domain corresponds to convolution of the spectra of these signals in the frequency domain:

Y(jω) = (1/2π)·[X(jω) * G(jω)],

where X(jω) is the frequency spectrum of the input signal, G(jω) is the frequency spectrum of the control signal, and Y(jω) is the frequency spectrum of the output signal.

To generate the control signal, low-frequency filtering of the maximum or root-mean-square values of the input signal is used. As a result of such filtering, the control signal changes inertially with respect to changes in the amplitude or root-mean-square value of the signal. The contradictory requirements on this filtering give rise to the disadvantages of inertial converters:

· a smooth rise of the control signal leads to overshoots in the output signal when the input signal increases sharply. These overshoots may go beyond the linear portion of the transfer characteristic of the audio path, in which case nonlinear distortion appears;

· a sharp rise of the control signal eliminates the overshoots, but gives the control signal a steep edge; this enriches its spectrum, and hence, after convolution of the spectra of the input and control signals, the spectrum of the output signal is significantly enriched. This makes the operation of the DRC audible;

· slow restoration of the transmission coefficient after a sharp drop in the signal produces the "breathing pause noise" effect: an audibly noticeable drop in the level of a quiet fragment of the signal with a gradual subsequent rise;

· sharp restoration of the transmission coefficient causes ripple in the control signal when processing an audio signal with intense low-frequency components. This ripple amplitude-modulates the processed signal and leads to nonlinear distortion.

For the listed distortions not to be audible, the filter parameters must be chosen optimally for the particular type of audio material: speech or music.

Inertia-free signal level limiters are also known; in them, instantaneous signal values exceeding a specified threshold are clipped. The signal shape changes and large nonlinear distortions appear, so such devices are used practically only as a means of protecting the transmission path from overload.
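The harmonic enrichment caused by instantaneous limiting is easy to demonstrate numerically (a numpy sketch of our own: a 1 kHz tone clipped at half its amplitude acquires a strong third harmonic):

```python
import numpy as np

# A 1 kHz tone hard-limited at half its amplitude: the instantaneous
# clipping changes the waveform and creates odd harmonics.
fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
y = np.clip(x, -0.5, 0.5)

def level(sig, f_hz):
    # amplitude of the component at f_hz over the 1-second window
    return 2 * np.abs(np.fft.rfft(sig))[int(f_hz)] / len(sig)
```

Comparing level(x, 3000) with level(y, 3000) shows the third harmonic appearing only in the clipped signal.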

Inertia-free conversion of the dynamic range of audio signals based on the modulating functions of modulation analysis-synthesis theory is free from the listed disadvantages of inertial devices and of the inertia-free limiters mentioned above. In the theory of modulation analysis-synthesis, all transformations, including transformation of the dynamic range, are based on isolating the modulating functions - amplitude and/or frequency - from the signal and then processing them. Besides dynamic range transformation, modulation transformations make it possible to perform inertia-free control of the timbre of audio signals, compression of the frequency range of audio signals based on isolating and nonlinearly processing their instantaneous frequency, and other kinds of transformations.

An analog inertia-free compressor based on modulating functions proved quite difficult to implement. The published results of its operation show that it exhibits none of the disadvantages of inertial compressors. But because of the limited accuracy and parameter instability of the analog devices for isolating and processing modulating functions, those results fall far short of what is theoretically possible. Given the complexity of the analog implementation, the creation of a digital inertia-free DRC based on modulating functions is naturally of great interest. First, it will improve the quality of sound processing through dynamic range conversion algorithms that are inaccessible, or difficult to implement with the required accuracy, in analog form. Second, given the ubiquity of digital methods of audio recording, processing and playback, it is most natural to perform dynamic range conversion in digital form as well. An accurate digital implementation of a DRC based on modulation analysis-synthesis makes it possible to fully elucidate the potential of the method, which until now has been hindered by the fundamentally irremovable errors of the analog implementation.

Before presenting the results of the operation of a digital inertia-free compressor, it makes sense to consider in more detail the modulating functions and the basics of modulation signal conversions.

According to the theory of modulation analysis-synthesis, an arbitrary signal can be represented as the result of the combined application of amplitude and frequency modulation:

s(t) = M(t)·cos φ(t),

if a pair of modulating functions is chosen successfully: M(t), the amplitude modulating function, and ω(t), the frequency modulating function. It has been proven in the theory that, for this pair of functions to be chosen uniquely, the original signal must be supplemented with a reference signal by means of the Hilbert transform. The concept of the modulating functions of a signal was introduced back in 1946 by D. Gabor.

The envelope (amplitude modulating function) of a pair of Hilbert-conjugate signals is the non-negative function of time

M(t) = √(s²(t) + ŝ²(t)). (1)

The instantaneous frequency (frequency modulating function) of the pair of signals is the derivative of the current phase: ω(t) = dφ(t)/dt.
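Both definitions can be computed numerically via an FFT-based analytic signal (a sketch equivalent in idea to scipy.signal.hilbert; the function names are ours):

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal x + j*H{x} for a real input of even
    length (the same construction scipy.signal.hilbert uses)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = h[N // 2] = 1.0       # keep DC and Nyquist
    h[1 : N // 2] = 2.0          # double the positive frequencies
    return np.fft.ifft(X * h)

def modulating_functions(x, rate):
    """Envelope M(n) = |z(n)| (formula (1)) and instantaneous
    frequency as the derivative of the unwrapped phase, in Hz."""
    z = analytic_signal(x)
    envelope = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * rate / (2 * np.pi)
    return envelope, inst_freq
```

For a pure tone the envelope is constant and the instantaneous frequency equals the tone frequency, as the definitions require.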

The concepts introduced by D. Gabor have found wide application in describing the transformations of narrowband signals.

Yu. M. Ishutkin proposed to generalize the definitions of modulating functions introduced by D. Gabor, without imposing restrictions on the width of the frequency spectrum of signals.

The idea, proposed by Yu. M. Ishutkin, of processing an audio signal by acting on its modulating functions is as follows:

1. From the known real signal s(t), use the Hilbert transform to create the complex signal

ṡ(t) = s(t) + j·ŝ(t),

where ŝ(t) is the Hilbert transform (conjugate) of the signal s(t).

2. For this pair of signals, calculate the modulating functions: the amplitude modulating function M(t) and the frequency modulating function ω(t) of the signal.

3. Convert modulating functions for processing purposes using linear and nonlinear circuits.

4. Using the modified modulating functions, synthesize a new audio signal.

The combination of the first two operations, as a result of which the modulating functions of the signal become known, is called modulation analysis. The last operation is called modulation synthesis. The structure of the full modulation analysis-synthesis channel is shown in Fig. 1.

Fig. 1. Structure of the full modulation analysis-synthesis channel.
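The four steps can be checked end to end: with an identity processing stage, synthesis must restore the analyzed signal exactly (a numpy sketch, even-length input assumed):

```python
import numpy as np

def modulation_roundtrip(x):
    """Steps 1-4 with an identity processing stage: analysis
    (Hilbert -> M(n), phi(n)) followed by synthesis M(n)*cos(phi(n))
    must restore the original signal exactly."""
    N = len(x)                               # N must be even here
    h = np.zeros(N)
    h[0] = h[N // 2] = 1.0
    h[1 : N // 2] = 2.0
    z = np.fft.ifft(np.fft.fft(x) * h)       # analytic signal
    M = np.abs(z)                            # amplitude mod. function
    phi = np.angle(z)                        # instantaneous phase
    return M * np.cos(phi)                   # modulation synthesis
```

This works because M(n)·cos φ(n) is exactly the real part of the analytic signal, which is the original signal.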

To build a digital system, the required transformations must be performed on the digital representation of the audio signal. A digital inertia-free DRC can be built using a direct-control scheme. Its block diagram, taking into account the sampling of analog signals, is shown in Fig. 2.


Fig. 2. Block diagram of the digital inertia-free compressor with direct control.

The original signal is represented as

s(n) = M(n)·cos φ(n),

where M(n) is the discrete instantaneous amplitude modulating function and φ(n) is the discrete instantaneous phase of the signal. The envelope demodulator performs the Hilbert transform and calculates the amplitude modulating function. In accordance with the convolution property of the spectrum of a product, the frequency spectrum of the original signal equals

S(jω) = (1/2π)·[F{M(n)} * F{cos φ(n)}], (2)

where F{M(n)} is the frequency spectrum of the amplitude modulating function, F is the symbol of the direct Fourier transform (applied to cos φ(n), it forms the frequency spectrum of the cosine of the instantaneous phase as the second convolution operand), and S(jω) is the frequency spectrum of the signal.

As a result of the nonlinear transformation of the instantaneous amplitude modulating function, performed in the exponentiation block, we obtain a new discrete amplitude modulating function M1(n) = f(M(n)), where f is some nonlinear function - here a power function - realizing the amplitude characteristic of the DRC of the required type. The new amplitude modulating function corresponds to a new frequency spectrum F{M1(n)}, so the frequency spectrum of the signal synthesized with the modified amplitude modulating function has the form S1(jω) = (1/2π)·[F{M1(n)} * F{cos φ(n)}].

The introduction of a delay line is necessary to synchronize the source signal with the envelope signal, the calculation of which is inevitably accompanied by a time delay.

Adding a constant signal to the new envelope is necessary so that the conversion begins with an envelope value greater than the threshold level specified by the constant term.

For the special case of realizing the amplitude characteristic of a compressor that halves the dynamic range, the signal at the output of the inertia-free DRC can be represented by the relation

s_out(n) = √M(n)·cos φ(n).
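This relation translates directly into code (a numpy sketch of ours; the Hilbert transform is done by FFT over the whole block, so the delay line of Fig. 2 is unnecessary offline):

```python
import numpy as np

def inertialess_compress(x, ratio=2.0, eps=1e-12):
    """Inertia-free DRC sketch: synthesize M(n)**(1/ratio) * cos(phi(n))
    from the analytic signal, so a tone of amplitude A comes out with
    amplitude A**(1/ratio), with no attack or release lag at all."""
    N = len(x)                               # even length assumed
    h = np.zeros(N)
    h[0] = h[N // 2] = 1.0
    h[1 : N // 2] = 2.0
    z = np.fft.ifft(np.fft.fft(x) * h)       # analytic signal
    M = np.abs(z)
    phi = np.angle(z)
    return np.maximum(M, eps) ** (1.0 / ratio) * np.cos(phi)
```

For a tone of amplitude 0.25 the output amplitude is √0.25 = 0.5, with the compression applied sample by sample rather than through a smoothed gain.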

Most of the required mathematical operations are performed by a digital system with high accuracy. Perhaps the most complex element of the digital DRC is the wideband digital Hilbert transformer (DHT) that forms part of the envelope demodulation unit; the quality of the DRC largely depends on it. To achieve high conversion quality, the DHT must provide, in a wide frequency band from 32 Hz to 16 kHz, a frequency-independent 90° phase shift of the signal with a very small error. The permissible phase error is chosen so that the resulting ripple of the instantaneous amplitude of a tone signal is not audible; with such a phase error its level will not exceed -80 dB. The implementation of such a transformer is discussed in [10].

A computer model of the digital inertia-free compressor based on modulating functions, built according to the scheme of Fig. 2, gave positive results when tested on single-tone test signals, demonstrating the correctness of the algorithmic solutions found and the successful handling of the problems that inevitably arise in the transition from an analog to a digital representation of signals. The amplitude modulating function of a single-tone signal is a constant function of time; its nonlinear transformation yields a new amplitude modulating function that, for a single-tone signal, is again constant in time. The spectra of the original and of the nonlinearly transformed amplitude modulating functions of a single-tone signal each consist of a single component, so the result of the convolution - the signal synthesized with the modified amplitude modulating function - coincides in shape with the original signal.

Testing the inertia-free DRC on a real sound signal yielded unexpected results: at some moments, very poor quality of the nonlinearly processed sound. To find the cause of the audible artifacts in the processed complex audio signal, the shapes of the modulating functions were analyzed (the values at the output of the digital model were compared with calculated ones), as were the shapes of their frequency spectra. A signal consisting of two harmonic components with frequencies f1 = 1000 Hz and f2 = 1500 Hz was chosen as the test signal:

s(n) = cos(2π·f1·n/fd) + cos(2π·f2·n/fd),

where fd is the signal sampling frequency, equal to 44,100 Hz.
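The test signal and its envelope can be generated and verified directly (a numpy sketch following the formulas above):

```python
import numpy as np

fs = 44100                      # sampling frequency, Hz
f1, f2 = 1000.0, 1500.0         # test-tone frequencies, Hz
t = np.arange(fs) / fs          # one second of signal
s = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# Sum-to-product: s(t) = 2*cos(pi*(f2-f1)*t) * cos(pi*(f1+f2)*t),
# so the envelope is M(t) = 2*|cos(2*pi*250*t)|, periodic at
# f2 - f1 = 500 Hz -- which is why the spectrum of the amplitude
# modulating function consists of components spaced 500 Hz apart.
M = 2 * np.abs(np.cos(np.pi * (f2 - f1) * t))
```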

Below are the calculated time diagram of the test signal and its frequency spectrum (Fig. 3a), as well as the time diagrams and frequency spectra of its amplitude modulating function (Fig. 3b) and phase cosine (Fig. 3c). The spectra of the amplitude modulating function and of the phase cosine each consist of many components, yet the convolution of these spectra leaves only two components.

Fig.3. Time functions (right) and frequency spectra (left): a) beat signal with frequencies of 1000 and 1500 Hz; b) amplitude modulating function of the beat signal; c) cosine of the phase of the beat signal.

The analytical expression for the amplitude modulating function in time has the form

M(t) = 2·|cos(2π·250·t)|,

which follows from converting the sum of the two cosines into a product.
To calculate its spectrum, it is convenient to use the tabulated Fourier cosine transform of a power of the cosine function (3), which is expressed through the gamma function.

Tabulated values of the gamma function for arguments from 0 to 2, as well as formulas for computing the gamma function for large and negative arguments, are given in the reference literature. Table 1 summarizes the results of the analytical calculation of the spectral components of the amplitude modulating function at frequencies from 0 to 3000 Hz, which follow at intervals of the envelope's fundamental frequency of 500 Hz; in the discrete domain this frequency equals the ratio of the sampling frequency to the number of samples in the envelope period. The almost complete coincidence of the values shown in the diagrams of Fig. 3 with the results of the analytical calculation confirms the correctness of the constructions in Fig. 3.

Table 1.

Analytical values of the spectrum of the amplitude modulating function.

Discrete frequency, Hz    Level, dB
0                         –3.93
500                       –13.47
1000                      –27.45
1500                      –34.82
2000                      –39.94
2500                      –43.88
3000                      –47.1
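Table 1 can also be verified numerically: the FFT of the sampled envelope, expressed in dB as two-sided amplitudes relative to the envelope peak of 2 (a normalization we inferred from the tabulated numbers), reproduces the analytical values:

```python
import numpy as np

# Spectrum of the envelope M(t) = 2*|cos(2*pi*250*t)| over one second.
fs = 44100
t = np.arange(fs) / fs
M = 2 * np.abs(np.cos(2 * np.pi * 250 * t))

X = np.abs(np.fft.rfft(M)) / len(M)   # two-sided amplitudes (bin = 1 Hz)

def level_db(f_hz):
    # level in dB relative to the envelope peak value of 2
    return 20 * np.log10(X[int(f_hz)] / 2)
```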

Compression of the dynamic range by half corresponds to power-law processing of the amplitude modulating function with an exponent of 1/2. For this case, Fig. 4 shows the differences in the time functions and frequency spectra of the original (dashed line) and processed (solid line) signals (Fig. 4a), as well as of their amplitude modulating functions (Fig. 4b). In the diagram the spectra of the original and processed signals are shifted relative to each other by 30 samples for a clearer view of their differences.

Table 2 presents the results of the analytical calculation of the spectrum of the nonlinearly transformed envelope, computed using formula (3) for an exponent of 1/2. They almost exactly coincide with the values shown in the diagrams of Fig. 4, which confirms the accuracy of the latter.


Fig.4. Time functions (right) and frequency spectra (left): a) signals at the input and output of the inertia-free converter; b) the amplitude modulating function of the input signal and the result of its power-law processing.

Table 2.

Analytical values of the spectrum of the nonlinearly transformed amplitude modulating function (exponent 1/2).

Discrete frequency, Hz    Level, dB
0                         –2.35
500                       –16.33
1000                      –25.87
1500                      –31.25
2000                      –35.03
2500                      –37.95
3000                      –40.33

The nonlinear transformation of the amplitude modulating function modified its spectrum (Fig. 4b). As a result of the convolution of this nonlinearly transformed envelope spectrum with the spectrum of the phase cosine, the output signal has a significantly enriched spectrum compared with the input (Fig. 4a). Additional components appear in the beat signal synthesized with the modified envelope; they can be regarded as intermodulation distortion, and they change the subjective perception of the converted signal. Obviously, in any case other than a single-tone signal, changing only one of the operands in expression (2) yields a spectrum of the converted signal that differs from the original. The degree of enrichment of the output spectrum depends on the bandwidth of the envelope: the wider the spectrum of the nonlinearly transformed envelope, the more enriched the frequency spectrum of the converted signal.
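The spectral enrichment can be demonstrated numerically on the same beat signal (a numpy sketch of ours): half-compressing the envelope creates components, for example at 500 and 2000 Hz, that the input does not contain.

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
s = np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1500 * t)

# Exact modulating functions of the beat signal (cf. Fig. 3):
M = 2 * np.abs(np.cos(np.pi * 500 * t))       # envelope
cos_phi = s / np.maximum(M, 1e-12)            # cosine of the phase

# Half-compression of the dynamic range: M -> sqrt(M).
# The input contains only 1000 and 1500 Hz; the compressed signal
# acquires intermodulation components at 500, 2000 Hz, etc.
y = np.sqrt(M) * cos_phi

def amp_at(sig, f_hz):
    # amplitude of the component at f_hz over the 1-second window
    return 2 * np.abs(np.fft.rfft(sig))[int(f_hz)] / len(sig)
```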

Thus, in its pure form, inertia-free conversion of the dynamic range of audio signals based on modulating functions is unsuitable because of the change in the frequency spectrum of the converted signal, which in some cases is clearly audible. One could, of course, filter the spectrum of the nonlinearly transformed envelope down to approximately the width of the critical band of hearing in the low-frequency part of the audio range. The additional components would then fall within the same critical band as the components of the original signal's spectrum and would be masked. But the DRC would then lose its inertia-free property, owing to the finite duration of the transient response of the envelope filter.

Conclusions:

· The digital model of the DRC made it possible to eliminate the fatal errors of the analog implementation and to clarify the potential of the DRC method based on nonlinear processing of modulating functions.

· Listening to phonograms after processing by the digital DRC based on modulating functions revealed, at some moments, audible gross distortions of the audio signal.

· Analysis of the time functions and frequency spectra of the signals arising during dynamic range conversion using modulating functions made it possible to explain the audible distortions by the enrichment of the frequency spectrum of the processed signal caused by the change in the spectrum of the amplitude modulating function. To reduce the audibility of the distortion, the converted amplitude modulating function must be filtered. Its spectrum then narrows, and provided the additional components fall into the same critical bands of hearing as the main components, the former are effectively masked by the latter. True, the DRC then loses its inertia-free property.

· Given the fundamental shortcomings of inertia-free dynamic range converters based on modulating functions, it appears that more advanced converters should be developed by improving inertial converters instead.


03/02/2015 at 10:15

In this series of articles we will talk about what compression is and how to use it. Unfortunately, people often apply it without understanding the basics, and the result is far from the best quality. This is what prompted me to write the series, in which we will examine in detail the operation of a device called a compressor, and I will show how to use it in practice.

One of the main properties of sound is its dynamics. Dynamics can be used to emphasize notes and musical phrases, adding new colors to a piece, but, as practice shows, few musicians (professionals aside) manage to do this, and drums left without dynamic processing sound dry and inexpressive. The reason is simple: our hearing is more sensitive to high sounds and less sensitive to low ones. As an example, compare the sound of a cymbal and a sub-kick, normalize both to 0 dB, and listen: the cymbal will be perceived as brighter, clearer and richer. We could, of course, raise the level of the sub-kick, but then (given the other instruments in the mix) we risk an ear-piercing mess of sounds in which the kick goes off the scale and the cymbal rings somewhere in the background. It is precisely to prevent such "dynamic conflicts" that the compressor serves. A finished mix can also be passed through a compressor to even out the overall sound, add density to it and create a pumping effect.

To summarize: a compressor is a device used to reduce the dynamic range - the gap between the quietest and loudest levels of an audio signal.

The principle of operation of a compressor is not as complicated as it seems: it captures everything that exceeds a set value in dB and attenuates it according to the settings. Let's look at an example compressor from the T-RackS plugin bundle.

Threshold - this parameter sets the level at which the compressor starts working, measured in dB. For example, if we set it to -11.1 dB, everything below this level is left unprocessed, while everything above it is captured and processed by the compressor.

A word of warning: work with this parameter carefully and keep an eye on the information panel (top right). If the threshold is set too low, there is a risk of capturing quieter sounds that do not need compression.

Ratio - the compression ratio. Many people misunderstand this parameter, but it is in fact very simple: it determines how much the signal above the threshold is attenuated. Say we set a value of 2 (some compressors write this as 2:1): a signal that exceeds the Threshold by 2 dB will be attenuated to 1 dB above the threshold, one that exceeds it by 8 dB will be attenuated to 4 dB above, and so on. A ratio around 3 is considered moderate compression, 5 medium, 8 strong, and values over 20 already amount to limiting. At such settings a compressor begins to resemble a limiter, although this particular compressor does not allow such extreme values.
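As a sketch of the arithmetic (a hypothetical helper function, not a T-RackS control):

```python
# Static hard-knee compressor curve: input level in dB -> output level in dB.
# The function name and default values are mine, chosen to match the text.

def compress_db(level_db, threshold_db=-11.1, ratio=2.0):
    """Levels at or below the threshold pass unchanged; the excess
    above the threshold is divided by the ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio
```

With the -11.1 dB threshold and a 2:1 ratio, an input 8 dB over the threshold comes out 4 dB over it, and anything below -11.1 dB is untouched.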

Attack Time - the compressor's response time: the time required for the signal to become fully compressed after it crosses the level set by the Threshold parameter. Measured in milliseconds.

On some compressors, the attack time is expressed in dB/sec.

Release - the recovery time, the opposite of Attack Time: the time it takes for the signal to return to its unprocessed level after it falls back below the threshold. The release time is usually significantly longer than the attack time.

On the T-RackS compressor this is especially noticeable: the Release value is given in seconds, while the Attack Time value is in milliseconds.

MakeUp - since a compressor reduces the dynamic level of the signal, the output will be quieter than it was before processing. This parameter compensates for that: we use it to raise the volume of the signal after processing.

In other compressors the same control may be labeled Output Gain, Output, Gain, etc.

Knee - this parameter controls the smoothness of the transition between the uncompressed and compressed signal. It has two types: Hard Knee and Soft Knee. With Soft Knee the transition happens more smoothly and naturally, and the compressor works more gently and unobtrusively. Its operation is well illustrated by the following graph.

Types of compression (by principle of use):

1. Serial compression - the most common type of dynamic processing. We add the compressor we need to the channel's insert and configure it. Simple.

2. Parallel compression - also quite widespread, but with one significant difference from serial compression: here the compressor is placed on a send channel, and its output is mixed with the clean, unprocessed sound.

Some compressors have a Mix parameter, which lets you adjust the ratio of clean to processed signal without creating a separate send track.

3. Multiband compression - compression in which individual frequency ranges are processed differently. Let's take a look at the Multiband Compressor from Waves.

The operating principle of this compressor is not as complicated as it might seem at first glance: it is based on a crossover, a device that divides the audio signal into different frequency ranges. From there it works like a regular compressor, except that each frequency range can be processed with its own settings, which is very useful when processing individual instruments in a mix.
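The crossover idea can be illustrated in a few lines; the FFT-based band split and the band edges below are illustrative stand-ins for the real crossover filters in such plugins:

```python
import numpy as np

# Split a signal into frequency bands that sum back to the original,
# so each band can be compressed with its own settings and recombined.

def split_bands(x, fs, edges=(200.0, 2000.0)):
    """Split x into len(edges)+1 bands; the bands sum back to x."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    bounds = [0.0, *edges, fs / 2 + 1]
    bands = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(X * mask, n=len(x)))
    return bands

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
low, mid, high = split_bands(x, fs)
# e.g. attenuate only the low band, leave the others untouched:
y = 0.5 * low + 1.0 * mid + 1.0 * high
```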

That's all. In the second part I will talk about the features of using various compressors.

Methods used for audio processing:

1. Editing (montage). Consists of cutting sections out of a recording, inserting others, replacing or duplicating them, and so on. All modern audio and video recordings are edited to one degree or another.

2. Amplitude transformations. Performed by various operations on the signal amplitude, which ultimately come down to multiplying sample values by a constant factor (amplification/attenuation) or by a time-varying modulator function (amplitude modulation). A special case of amplitude modulation is shaping an envelope to give a stationary sound some development over time.

Amplitude transformations are performed sequentially on individual samples, so they are easy to implement and do not require much computation.
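In code this is literally one multiplication per sample, e.g. with numpy (the fade-in envelope here is an arbitrary example):

```python
import numpy as np

# Amplitude transformations as described above: each sample is multiplied
# either by a constant (gain) or by a time-varying envelope (modulation).

x = np.ones(8)                       # stand-in for audio samples
gained = 0.5 * x                     # constant factor: attenuation
envelope = np.linspace(0.0, 1.0, 8)  # fade-in envelope
modulated = x * envelope             # amplitude modulation
```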

3. Frequency (spectral) transformations. Performed on the frequency components of the sound. Using spectral decomposition - a representation of sound in which frequency is plotted horizontally and the intensity of each frequency component vertically - many frequency transformations become amplitude transformations applied to the spectrum. For example, filtering - amplifying or attenuating certain frequency bands - comes down to imposing a corresponding amplitude envelope on the spectrum. Frequency modulation, however, cannot be represented this way: it looks like a displacement of the entire spectrum, or of individual sections of it, in time according to a certain law.

Frequency transformations are usually implemented via Fourier spectral decomposition, which requires significant resources. However, the fast Fourier transform (FFT) algorithm can be performed in integer arithmetic and allows even low-end 486-class processors to compute the spectrum of an average-quality signal in real time. Frequency-domain conversion also requires processing the spectrum and a subsequent inverse transform (convolution), so real-time filtering had not yet been achieved on general-purpose processors; instead there are a large number of digital signal processors (DSPs) that perform these operations in real time, often across several channels.
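A minimal example of such a spectral transformation using numpy's FFT (the signal and cutoff are chosen arbitrarily):

```python
import numpy as np

# Frequency-domain filtering as sketched above: transform with the FFT,
# impose an "amplitude envelope" on the spectrum (here: zero out a band),
# then transform back.

fs = 4000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 440 * t)

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
X[freqs > 200] = 0.0            # low-pass: remove everything above 200 Hz
y = np.fft.irfft(X, n=len(x))   # the 440 Hz component is gone
```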

4. Phase transformations. They come down mainly to a constant phase shift of the signal, or to modulation of its phase by some function or by another signal. Because the human auditory system uses phase to determine the direction of a sound source, phase transformations of stereo sound make it possible to obtain effects such as rotating sound, chorus and the like.

5. Time-domain transformations. They involve adding copies of the main signal to itself, shifted in time by various amounts. At small shifts (under about 20 ms) this gives the effect of multiplying the sound source (chorus effect); at larger shifts, an echo effect.

6. Formant transformations. They are a special case of frequency ones and operate with formants - characteristic frequency bands found in sounds pronounced by humans. Each sound has its own ratio of amplitudes and frequencies of several formants, which determines the timbre and intelligibility of the voice. By changing the parameters of the formants, you can emphasize or shade out individual sounds, change one vowel to another, shift the voice register, etc.

Many hardware devices and software tools for sound processing are based on these methods. Some of them are described below.

1. A compressor (from the English "compress") is an electronic device or computer program used to reduce the dynamic range of an audio signal. Downward compression reduces the amplitude of loud sounds above a certain threshold, while sounds below that threshold remain unchanged. Upward compression, on the other hand, increases the volume of sounds below a certain threshold, while sounds above it remain unchanged. Both reduce the difference between soft and loud sounds, narrowing the dynamic range.

Compressor parameters:

Threshold is the level above which the signal begins to be suppressed. Typically set in dB.

Ratio - Determines the ratio of input/output signals exceeding the Threshold. For example, a ratio of 4:1 means that a signal 4 dB above the threshold will be compressed to a level 1 dB above the threshold. The highest ratio of ∞:1 is usually achieved using a ratio of 60:1, and effectively means that any signal above the threshold will be reduced to the threshold level (except for short sharp changes in volume, called "attack").

Attack and Release (Fig. 1.3). The compressor provides some control over how quickly it acts. The "attack phase" is the period during which the compressor is reducing the gain to the amount determined by the ratio. The "release phase" is the period during which the compressor restores the gain after the signal level drops back below the threshold. The duration of each phase is set by the attack and release times.
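A minimal sketch of this behavior, using a common one-pole gain smoother (the time constants and the smoothing scheme are illustrative, not those of any particular compressor):

```python
import numpy as np

# The gain reduction does not jump instantly: it moves toward its target
# with one time constant while falling (attack) and another, slower one
# while recovering (release).

def smooth_gain(target, fs, attack_ms=5.0, release_ms=100.0):
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    g = np.empty_like(target)
    prev = 1.0
    for i, tgt in enumerate(target):
        a = a_att if tgt < prev else a_rel  # gain falling: attack; rising: release
        prev = a * prev + (1.0 - a) * tgt
        g[i] = prev
    return g

fs = 1000
# target gain: clamp to 0.5 for 200 samples, then recover to 1.0
target = np.concatenate([np.full(200, 0.5), np.full(300, 1.0)])
g = smooth_gain(target, fs)
```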

Fig. 1.3. Compressor attack and release.

With many compressors, the attack and release are user-adjustable. In some compressors, however, they are fixed by the circuit design and cannot be changed by the user. Sometimes the attack and release are "automatic" or "program-dependent", meaning that their timing changes depending on the incoming signal.

The compression knee (Knee) controls the bend of the compression curve at the threshold; it can be sharp or rounded (Fig. 1.4). A soft knee raises the compression ratio gradually until it reaches the value set by the user. With a hard knee, compression starts and stops abruptly, which makes it more noticeable.
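The hard and soft knee can be written down directly; the quadratic soft knee below is a common textbook formulation, and the knee-width parameter is my assumption rather than a control of the compressor in Fig. 1.4:

```python
# Output level in dB for input level x_db, with an optional soft knee of
# width knee_db centered on the threshold.

def gain_curve_db(x_db, threshold_db, ratio, knee_db=0.0):
    over = x_db - threshold_db
    if knee_db > 0 and abs(over) <= knee_db / 2:
        # inside the knee: blend quadratically between 1:1 and ratio:1
        return x_db + (1.0 / ratio - 1.0) * (over + knee_db / 2) ** 2 / (2.0 * knee_db)
    if over <= 0:
        return x_db                         # below threshold (and knee): unchanged
    return threshold_db + over / ratio      # above: full compression
```

At the knee edges the soft curve meets the hard one exactly, which is what makes the transition inaudible.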

Fig. 1.4. Soft and hard knee.

2. Expander. Whereas a compressor attenuates the signal once its level rises above a certain value, an expander attenuates it once its level falls below a certain value. In all other respects (its processing parameters) an expander is similar to a compressor.

3. Distortion - a deliberately crude narrowing of the dynamic range in order to enrich the sound with harmonics. As the sound level is artificially limited harder, the waveform increasingly takes on a square rather than sinusoidal shape, and square waves have the largest number of harmonics.
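A minimal illustration: overdriving a sine and hard-clipping it with numpy (the drive value is arbitrary):

```python
import numpy as np

# Hard clipping: a sine pushed past the clip level takes on an increasingly
# square shape, which adds odd harmonics to the spectrum.

t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 5 * t)             # 5 cycles of a clean sine
drive = 10.0
clipped = np.clip(drive * x, -1.0, 1.0)   # overdrive, then limit to [-1, 1]
```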

4. Delay or echo - a sound effect, or the corresponding device, that simulates distinct fading repetitions of the original signal. The effect is realized by adding one or more time-delayed copies of the signal to the original. Delay usually means a single repetition, while echo means multiple repetitions.
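The scheme just described can be sketched with a feedback delay line (the delay length and decay factor are made-up illustration values):

```python
import numpy as np

# Echo: each delayed copy is attenuated and fed back, so every repetition
# spawns the next, fading one - the multiple repeats of an echo.

def echo(x, delay_samples, decay=0.5, n_out=None):
    n_out = n_out or len(x)
    y = np.zeros(n_out)
    y[:len(x)] = x
    for i in range(delay_samples, n_out):
        y[i] += decay * y[i - delay_samples]   # feedback: repeat of the repeat
    return y

impulse = np.zeros(10)
impulse[0] = 1.0
out = echo(impulse, delay_samples=100, decay=0.5, n_out=400)
```

Feeding in a single impulse makes the fading repetitions visible directly in the output samples.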

5. Reverberation is the process of gradual decay of sound intensity through multiple reflections. Virtual reverbs have many parameters that make it possible to obtain the sound characteristic of a particular room.

6. Equalizer (from the English "equalize"; commonly abbreviated "EQ") - a device or computer program that allows you to change the amplitude-frequency response of an audio signal, that is, to adjust the signal's amplitude selectively depending on frequency. Equalizers are characterized first of all by the number of frequency filters (bands) adjustable in level.

There are two main types of multiband equalizers: graphic and parametric. A graphic equalizer has a certain number of level-adjustable frequency bands, each of which is characterized by a constant operating frequency, a fixed bandwidth around the operating frequency, as well as a level adjustment range (the same for all bands). Typically, the outermost bands (lowest and highest) are "shelf" filters, and all others have a "bell-shaped" characteristic. Graphic equalizers used in professional applications typically have 15 or 31 bands per channel, and are often equipped with spectrum analyzers for ease of adjustment.

A parametric equalizer provides much greater possibilities for adjusting the frequency response of a signal. Each of its bands has three main adjustable parameters:

Central (or operating) frequency in hertz (Hz);

Quality factor (the width of the operating band around the central frequency, denoted by the letter “Q”) is a dimensionless quantity;

The level of boost or cut of the selected band in decibels (dB).
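The three parameters above map directly onto the coefficients of a "peaking" biquad filter. A sketch using the widely known Audio EQ Cookbook (R. Bristow-Johnson) formulas; the sample rate, center frequency, Q and gain values here are arbitrary examples:

```python
import numpy as np

# One parametric-EQ band as a peaking biquad: center frequency f0,
# quality factor q, and boost/cut gain_db, exactly the three controls
# listed in the text.

def peaking_coeffs(fs, f0, q, gain_db):
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def magnitude_db(b, a, f, fs):
    z = np.exp(-2j * np.pi * f / fs)   # z^-1 at frequency f
    h = (b[0] + b[1] * z + b[2] * z ** 2) / (a[0] + a[1] * z + a[2] * z ** 2)
    return 20.0 * np.log10(abs(h))

b, a = peaking_coeffs(fs=48000, f0=1000.0, q=1.0, gain_db=6.0)
boost_at_f0 = magnitude_db(b, a, 1000.0, 48000)   # +6 dB at the center
```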

7. Chorus - a sound effect that imitates the choral sound of musical instruments. The effect is realized by adding to the original signal one or more copies of itself, shifted in time by values on the order of 20-30 milliseconds, with the shift time continuously varying.

First, the input signal is split into two independent signals, one of which remains unchanged while the other is fed into a delay line. In the delay line the signal is delayed by 20-30 ms, and the delay time is modulated by a low-frequency generator (LFO), which produces oscillations of a chosen shape at frequencies of about 3 Hz and below. At the output, the delayed signal is mixed with the original. By changing the frequency, shape and amplitude of the LFO's oscillations, you can obtain different output signals.

Effect parameters:

Depth - characterizes the range of variation of the delay time.

Speed (rate) - the speed of change of the "swimming" of the sound, set by the frequency of the low-frequency generator.

The low frequency generator waveform (LFO waveform) can be sinusoidal (sin), triangular (triangle) and logarithmic (log).

Balance (balance, mix, dry/wet) - the ratio of raw and processed signals.
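Putting the pieces above together (delay line, LFO, mix), a minimal chorus sketch; the parameter defaults are illustrative, not taken from any particular unit:

```python
import numpy as np

# Chorus: a ~25 ms delay line whose delay time is swept by a sine LFO,
# read with linear interpolation and mixed with the dry signal.

def chorus(x, fs, base_delay_ms=25.0, depth_ms=3.0, rate_hz=1.0, mix=0.5):
    n = len(x)
    t = np.arange(n)
    delay = (base_delay_ms + depth_ms * np.sin(2 * np.pi * rate_hz * t / fs)) * fs / 1000.0
    pos = t - delay                 # fractional read position in the delay line
    i0 = np.floor(pos).astype(int)
    frac = pos - i0
    i0 = np.clip(i0, 0, n - 1)      # clamp the start-up region for simplicity
    i1 = np.clip(i0 + 1, 0, n - 1)
    wet = (1.0 - frac) * x[i0] + frac * x[i1]   # linear interpolation
    return (1.0 - mix) * x + mix * wet

fs = 8000
x = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
y = chorus(x, fs)
```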

8. Phaser, also often called phase vibrato - a sound effect achieved by filtering an audio signal so as to create a series of peaks and dips in its spectrum. The positions of these peaks and dips vary over the course of the sound, creating a characteristic sweeping effect. The corresponding device is also called a phaser. The principle of operation is similar to chorus, differing in the delay time (1-5 ms); in addition, the phaser's delay is not the same at different frequencies and varies according to a certain law.

Electronically, the phaser effect is created by splitting the audio signal into two streams. One stream is processed by a phase (all-pass) filter, which changes the phase of the audio signal while preserving its frequency content; the amount of phase shift depends on frequency. When the processed and unprocessed signals are mixed, frequencies that are out of phase cancel each other out, creating the characteristic dips in the spectrum. Changing the ratio of original to processed signal changes the depth of the effect, with maximum depth at a 50% mix.

The phaser effect is similar to the flanger and chorus effects, which likewise add delayed copies of the signal to itself (the so-called delay line). However, unlike the flanger and chorus, where the delay can take an arbitrary value (usually from 0 to 20 ms), the delay in a phaser depends on the signal frequency and lies within one period of the oscillation. A phaser can thus be considered a special case of the flanger.

9. Flanger (English flange - rim, ridge) - a sound effect reminiscent of a "flying" sound. The principle of operation is similar to chorus but differs in the delay time (5-15 ms) and the presence of feedback. Part of the output signal is fed back to the input and into the delay line, and the resonance of the signals produces the flanger effect: some frequencies in the spectrum are amplified and some are attenuated. The resulting frequency response shows a series of maxima and minima resembling a ridge, hence the name. The phase of the feedback signal is sometimes inverted to achieve additional variety in the sound.

10. Vocoder (English: “voice coder” - voice encoder) - a speech synthesis device based on an arbitrary signal with a rich spectrum. Initially, vocoders were developed in order to save frequency resources of the radio link of a communication system when transmitting voice messages. Savings are achieved due to the fact that instead of the speech signal itself, only the values ​​of its certain parameters are transmitted, which control the speech synthesizer on the receiving side.

The basis of a speech synthesizer consists of three elements: a tone generator for forming vowels, a noise generator for forming consonants, and a system of formant filters for recreating the individual characteristics of the voice. After all these transformations the human voice becomes similar to that of a robot, which is quite tolerable for communications and interesting for music. This was true, however, only of the most primitive vocoders of the first half of the last century; modern communication vocoders provide far better voice quality at a significantly higher degree of compression.

A vocoder as a musical effect allows you to transfer the properties of one signal (the modulator) onto another, called the carrier. The human voice is used as the modulating signal, and a signal generated by a musical synthesizer or other instrument as the carrier. This achieves the effect of a "talking" or "singing" musical instrument. Besides the voice, the modulating signal can also be a guitar, keyboards, drums - in general any sound of synthetic or "live" origin; there are likewise no restrictions on the carrier signal. By experimenting with the modulating and carrier signals, you can obtain completely different effects - a talking guitar, drums with a piano sound, a guitar that sounds like a xylophone.
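As a rough illustration of the principle (a toy sketch, not any real vocoder's algorithm), one can measure the modulator's energy in a few frequency bands and impose that band profile on the carrier; the band edges and the FFT-based "filter bank" are my simplifications of a real analysis filter bank:

```python
import numpy as np

# Toy channel vocoder: the carrier's spectrum is scaled band by band so
# its band energies match those of the modulator (the "voice").

def vocode(modulator, carrier, fs, edges=(300, 800, 2000, 4000)):
    n = min(len(modulator), len(carrier))
    M = np.fft.rfft(modulator[:n])
    C = np.fft.rfft(carrier[:n])
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros_like(C)
    bounds = [0.0, *edges, fs / 2 + 1]
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        band = (freqs >= lo) & (freqs < hi)
        m_rms = np.sqrt(np.mean(np.abs(M[band]) ** 2))
        c_rms = np.sqrt(np.mean(np.abs(C[band]) ** 2)) + 1e-12
        out[band] = C[band] * (m_rms / c_rms)   # carrier shaped by voice energy
    return np.fft.irfft(out, n=n)

fs = 8000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 500 * t)                   # stand-in for the modulator
noise = np.random.default_rng(0).standard_normal(fs)  # rich-spectrum carrier
y = vocode(voice, noise, fs)
```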