Introduction
PART 1. TOOLS FOR SIGNAL COMPRESSION
Chapter 1. Scalar Quantization
1.1. Introduction
1.2. Optimum scalar quantization
1.3. Predictive scalar quantization
Chapter 2. Vector Quantization
2.1. Introduction
2.2. Rationale
2.3. Optimum codebook generation
2.4. Optimum quantizer performance
2.5. Using the quantizer
2.6. Gain-shape vector quantization
Chapter 3. Sub-band Transform Coding
3.1. Introduction
3.2. Equivalence of filter banks and transforms
3.3. Bit allocation
3.4. Optimum transform
3.5. Performance
Chapter 4. Entropy Coding
4.1. Introduction
4.2. Noiseless coding of discrete, memoryless sources
4.3. Noiseless coding of a discrete source with memory
4.4. Scalar quantizer with entropy constraint
4.5. Capacity of a discrete memoryless channel
4.6. Coding a discrete source with a fidelity criterion
PART 2. AUDIO SIGNAL APPLICATIONS
Chapter 5. Introduction to Audio Signals
5.1. Speech signal characteristics
5.2. Characteristics of music signals
5.3. Standards and recommendations
Chapter 6. Speech Coding
6.1. PCM and ADPCM coders
6.2. The 2.4 bit/s LPC-10 coder
6.3. The CELP coder
Chapter 7. Audio Coding
7.1. Principles of “perceptual coders”
7.2. MPEG-1 layer 1 coder
7.3. MPEG-2 AAC coder
7.4. Dolby AC-3 coder
7.5. Psychoacoustic model: calculating a masking threshold
Chapter 8. Audio Coding: Additional Information
8.1. Low bit rate/acceptable quality coders
8.2. High bit rate lossless or almost lossless coders
Chapter 9. Stereo Coding: A Synthetic Presentation
9.1. Basic hypothesis and notation
9.2. Determining the inter-channel indices
9.3. Downmixing procedure
9.4. At the receiver
9.5. Draft International Standard
PART 3. MATLAB® PROGRAMS
Chapter 10. A Speech Coder
10.1. Introduction
10.2. Script for the calling function
10.3. Script for called functions
Chapter 11. A Music Coder
11.1. Introduction
11.2. Script for the calling function
11.3. Script for called functions
Bibliography
Index
First published 2011 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Adapted and updated from Outils pour la compression des signaux : applications aux signaux audio published 2009 in France by Hermes Science/Lavoisier © Institut Télécom et LAVOISIER 2009
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK (www.iste.co.uk)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA (www.wiley.com)
© ISTE Ltd 2011
The rights of Nicolas Moreau to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
Moreau, Nicolas, 1945-
[Outils pour la compression des signaux. English]
Tools for signal compression / Nicolas Moreau.
p. cm.
“Adapted and updated from Outils pour la compression des signaux : applications aux signaux audio.”
Includes bibliographical references and index.
ISBN 978-1-84821-255-8
1. Sound--Recording and reproducing--Digital techniques. 2. Data compression (Telecommunication) 3. Speech processing systems. I. Title.
TK7881.4.M6413 2011
621.389’3--dc22
2011003206
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-84821-255-8
In everyday life, we often come in contact with compressed signals: when using mobile telephones, mp3 players, digital cameras, or DVD players. The signals in each of these applications (telephone-band speech, high-fidelity audio, still or video images) are not only sampled and quantized, to put them into a form suitable for saving in mass storage devices or for sending across networks, but also compressed. The first operation is very basic and is presented in all courses and introductory books on signal processing. The second operation is more specific and is the subject of this book: the standard tools for signal compression are presented first, followed by examples of how these tools are applied in compressing speech and musical audio signals.

In the first part of this book, we focus on a problem which is theoretical in nature: minimizing the mean squared error. The second part is more concrete and qualifies the previous approach by seeking to minimize the bit rate while respecting psychoacoustic constraints. We will see that signal compression consists of seeking not only to eliminate the redundant parts of the original signal but also to eliminate those parts that are inaudible.
The compression techniques presented in this book are not new. Their theoretical framework, information theory and source coding, aims to formalize the first (and last) element of a digital communication chain: the coding of an analog signal (continuous in time and in amplitude) into a digital signal (discrete in time and in amplitude). These techniques stem from the work of C. Shannon published at the beginning of the 1950s. However, apart from the development of speech coders in the 1970s to support an entirely digital switched telephone network, they really came into use only toward the end of the 1980s under the influence of standardization groups such as the “Groupe Spécial Mobile” (GSM), the “Joint Photographic Experts Group” (JPEG), and the “Moving Picture Experts Group” (MPEG).
The results of these techniques are quite impressive and have enabled the development of the applications referred to earlier. Let us consider the example of a music signal. We know that a music signal can be reconstructed with quasi-perfect (CD) quality if it is sampled at 44.1 kHz and quantized with a resolution of 16 bits, which requires a bit rate of 705 kb/s per mono channel when transmitted across a network. The most successful audio coder, MPEG-4 AAC, ensures “transparency” at a bit rate of the order of 64 kb/s, a compression ratio greater than 10, and the more recent MPEG-4 HE-AAC v2 coder, standardized in 2004, provides very acceptable quality (for video on mobile phones) at 24 kb/s for two stereo channels: a compression ratio better than 50!
In Part 1 of this book, the standard tools (scalar quantization, predictive quantization, vector quantization, transform and sub-band coding, and entropy coding) are presented. To compare the performance of these tools, we use an academic example: the quantization of a realization x(n) of a one-dimensional random process X(n). Although this is a theoretical approach, it not only allows an objective assessment of performance but also shows the coherence between all the available tools. In Part 2, we concentrate on the compression of audio signals (telephone-band speech, wideband speech, and high-fidelity audio signals).
Throughout this book, we discuss the basic ideas of signal processing using the following language and notation. We consider a one-dimensional, stationary, zero-mean random process X(n) with power σ_X^2 and power spectral density S_X(f). We also assume that it is Gaussian, primarily because the Gaussian distribution is preserved under all linear transformations, in particular filtering, which greatly simplifies the notation, and also because a Gaussian signal is the most difficult signal to encode: it gives the greatest quantization error at any given bit rate. A column vector of dimension N is denoted X(m) = [X(mN) … X(mN + N − 1)]^t. These N random variables are completely defined statistically by their probability density function:
$$ p_X(x) = \frac{1}{(2\pi)^{N/2} \det^{1/2} R_X} \exp\left( -\frac{1}{2} x^t R_X^{-1} x \right) $$
where R_X = E{X(m) X^t(m)} is the autocovariance matrix, a Toeplitz matrix of dimensions N × N. Moreover, we assume that X(n) is an auto-regressive process of order P, obtained by filtering white noise W(n) of variance σ_W^2 through a filter of order P with transfer function 1/A(z), where A(z) has the form:
$$ A(z) = 1 + a_1 z^{-1} + \cdots + a_P z^{-P}. $$
The purpose of taking the quantization of an auto-regressive waveform as our example is that it allows all the statistical characteristics of the source to be expressed simply as functions of the filter parameters, for example the power spectral density:
$$ S_X(f) = \frac{\sigma_W^2}{|A(f)|^2} $$
where the notation A(f) is an abuse and should properly be written A(exp(j2πf)). It also allows us to give analytical expressions for the quantization error power of the different quantization methods when the quadratic error is chosen as the distortion measure, so that their performance can be compared. From a practical point of view, this example is not artificial because it is a reasonable model for a number of signals, for example for speech signals (which are only locally stationary) when the order P is chosen high enough (e.g. 8 or 10).
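To make this model concrete, here is a minimal MATLAB sketch (in the spirit of the programs of Part 3; the variable names and the example coefficients are ours, not the book's) that generates an AR(2) signal by filtering white noise through 1/A(z) and compares its periodogram with σ_W^2/|A(f)|^2:

% Generate an AR(2) signal x(n) by filtering white noise w(n) through 1/A(z)
% and compare its periodogram with sigma_W^2 / |A(f)|^2 (illustrative sketch).
N = 4096; sigma_W = 1;
a = [1 -0.9 0.5];                           % A(z) = 1 - 0.9 z^-1 + 0.5 z^-2 (example, stable)
w = sigma_W*randn(N,1);                     % white noise W(n)
x = filter(1, a, w);                        % AR process X(n)
f = (0:N-1)'/N;                             % normalized frequencies
Sx_hat = abs(fft(x)).^2 / N;                % periodogram of x(n)
Sx_th  = sigma_W^2 ./ abs(fft(a(:), N)).^2; % sigma_W^2 / |A(exp(j 2 pi f))|^2
semilogy(f(1:N/2), Sx_hat(1:N/2), f(1:N/2), Sx_th(1:N/2));
xlabel('normalized frequency f'); legend('periodogram', '\sigma_W^2 / |A(f)|^2');

Averaging the periodogram over several realizations would make the agreement with the theoretical density clearer.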
Let us consider a discrete-time signal x(n) with values in the range [−A, +A]. Defining a scalar quantization with a resolution of b bits per sample requires three operations:
– partitioning the range [−A, +A] into L = 2^b non-overlapping intervals {Θ_1 … Θ_L} of lengths {Δ_1 … Δ_L};
– numbering the partitioned intervals {i_1 … i_L};
– selecting a reproduction value x̂_i for each interval; the set {x̂_1 … x̂_L} of these reproduction values forms a dictionary (codebook).
Encoding (at the transmitter) consists of deciding which interval x(n) belongs to and associating with it the corresponding number i(n) ∈ {1 … L = 2^b}. It is this number of the chosen interval, the symbol, which is transmitted or stored. Decoding (at the receiver) consists of associating with the number i(n) the corresponding reproduction value. More formally, we observe that quantization is a non-bijective mapping from [−A, +A] onto a finite set C = {x̂_1 … x̂_L} with the assignment rule:
$$ \hat{x}(n) = \hat{x}_i \quad \text{if } x(n) \in \Theta_i. $$
The process is irreversible and involves a loss of information, a quantization error defined as q(n) = x(n) − x̂(n). A distortion measure must be defined; we use the simplest one, the quadratic error:
$$ d(x(n), \hat{x}(n)) = [x(n) - \hat{x}(n)]^2. $$
This measures the error in each sample. For a more global distortion measure, we use the mean squared error (MSE):
$$ E\{ [X(n) - \hat{X}(n)]^2 \}. $$
This error is simply called the quantization error power. We use the notation σ_Q^2 for the MSE.
Figure 1.1(a) shows, on the left, the signal before quantization and the partition of the range [−A, +A] for b = 3, and Figure 1.1(b) shows the reproduction values, the reconstructed signal, and the quantization error. The bitstream between the transmitter and the receiver is not shown.
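As an illustration of these operations, the following minimal MATLAB sketch implements a uniform b-bit quantizer on [−A, +A] (a toy example with our own variable names, not one of the book's programs):

% Minimal uniform scalar quantizer on [-A, A] with b bits (illustrative sketch).
A = 1; b = 3; L = 2^b; Delta = 2*A/L;          % L intervals of equal length Delta
x = A*(2*rand(1,1000) - 1);                    % test signal, uniform in [-A, A]
idx = floor((x + A)/Delta);                    % encoder: interval number 0 ... L-1
idx = min(max(idx, 0), L-1);                   % keep x = +A inside the last interval
xr = -A + (idx + 0.5)*Delta;                   % decoder: reproduction value (mid-interval)
q = x - xr;                                    % quantization error
fprintf('MSE = %g, Delta^2/12 = %g\n', mean(q.^2), Delta^2/12);

For this uniformly distributed test signal, the measured MSE is close to Δ^2/12, which anticipates the high-resolution results given below.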
The problem now consists of defining the optimal quantizer, that is, of defining the intervals {Θ_1 … Θ_L} and the set of reproduction values {x̂_1 … x̂_L} that minimize σ_Q^2.
Assume that x(n) is the realization of a real-valued stationary random process X(n). In scalar quantization, what matters is the distribution of the values that the random process X(n) takes at a given time n; no direct use is made of the correlation that exists between values of the process at different times. It is therefore enough to know the marginal probability density function of X(n), written p_X(.).
To characterize the optimum scalar quantization, the range partition and reproduction values must be found which minimize:
$$ \sigma_Q^2 = E\{ [X(n) - \hat{X}(n)]^2 \} = \sum_{i=1}^{L} \int_{\Theta_i} (x - \hat{x}_i)^2 \, p_X(x) \, dx. \qquad [1.1] $$
This joint minimization is not simple to solve. However, two necessary conditions for optimality are straightforward to find: if the reproduction values are known, the best partition {Θ_1 … Θ_L} can be calculated; once the partition is known, the best reproduction values can be deduced. In other words, the encoder must be optimal for the given decoder, and vice versa. These two necessary conditions are simple to express when the squared error is chosen as the distortion measure.
– Condition 1: Given the codebook {x̂_1 … x̂_L}, the best partition satisfies:
$$ \Theta_i = \{ x : (x - \hat{x}_i)^2 \le (x - \hat{x}_j)^2 \ \ \forall j \ne i \}. $$
This is the nearest neighbor rule.
If t_i denotes the boundary between the intervals Θ_i and Θ_{i+1}, minimizing the MSE with respect to t_i gives:
$$ \frac{\partial \sigma_Q^2}{\partial t_i} = (t_i - \hat{x}_i)^2 \, p_X(t_i) - (t_i - \hat{x}_{i+1})^2 \, p_X(t_i) = 0 $$
such that:
$$ t_i = \frac{\hat{x}_i + \hat{x}_{i+1}}{2}. $$
– Condition 2: Given a partition {Θ_1 … Θ_L}, the optimum reproduction value x̂_i is the centroid (or center of gravity) of the portion of the probability density function restricted to Θ_i:
$$ \hat{x}_i = \frac{\int_{\Theta_i} x \, p_X(x) \, dx}{\int_{\Theta_i} p_X(x) \, dx} = E\{ X \mid X \in \Theta_i \}. \qquad [1.2] $$
First, note that minimizing σ_Q^2 with respect to x̂_i involves only one term of the sum given in [1.1]. From:
$$ \frac{\partial}{\partial \hat{x}_i} \int_{\Theta_i} (x - \hat{x}_i)^2 \, p_X(x) \, dx = -2 \int_{\Theta_i} (x - \hat{x}_i) \, p_X(x) \, dx = 0 $$
we find the first identity of equation [1.2].
Since:
$$ p_X(x \mid x \in \Theta_i) = \frac{p_X(x)}{\mathrm{Prob}\{ X \in \Theta_i \}} \quad \text{for } x \in \Theta_i $$
where p_X(x | x ∈ Θ_i) is the conditional probability density function of X given that X ∈ Θ_i, we find:
$$ \hat{x}_i = \int_{\Theta_i} x \, p_X(x \mid x \in \Theta_i) \, dx = E\{ X \mid X \in \Theta_i \}. $$
The required value is the mean value of X in the interval under consideration.
It can be demonstrated that these two optimization conditions are not sufficient to guarantee optimized quantization except in the case of a Gaussian distribution.
Note that detailed knowledge of the partition is not necessary: the partition is entirely determined by the distortion measure, the nearest neighbor rule, and the set of reproduction values. Figure 1.2 shows a diagram of the encoder and decoder.
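The following sketch illustrates this remark: the encoder needs only the codebook and the nearest neighbor rule, and the decoder is a simple table lookup (the codebook values and variable names are ours, chosen for illustration):

% Encoder/decoder of a scalar quantizer defined only by its codebook:
% nearest neighbor rule at the encoder, table lookup at the decoder.
codebook = [-1.5 -0.5 0.5 1.5];               % example reproduction values (b = 2 bits)
x = randn(8, 1);                              % samples to encode
[~, idx] = min(abs(x - codebook), [], 2);     % nearest neighbor rule -> symbols i(n)
xr = codebook(idx); xr = xr(:);               % decoder output (reproduction values)
disp([x xr]);                                 % original and reconstructed samples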
When the number L of quantization levels is high, the optimum partition and the quantization error power can be expressed, unlike in the previous case, directly as functions of the probability density function p_X(x). This hypothesis, referred to as the high-resolution hypothesis, states that the probability density function can be assumed constant over each interval [t_{i−1}, t_i] and that the reproduction value is located at the middle of the interval. We can therefore write:
$$ p_X(x) \approx p_X(\hat{x}_i) \quad \text{and} \quad \hat{x}_i \approx \frac{t_{i-1} + t_i}{2} \quad \text{for } x \in [t_{i-1}, t_i]. $$
We define the length of the interval [t_{i−1}, t_i] as:
$$ \Delta(i) = t_i - t_{i-1} $$
and:
$$ p(i) = \int_{t_{i-1}}^{t_i} p_X(x) \, dx \approx p_X(\hat{x}_i) \, \Delta(i) $$
is the probability that X(n) belongs to the interval [t_{i−1}, t_i]. The quantization error power is written as:
$$ \sigma_Q^2 = \sum_{i=1}^{L} \int_{t_{i-1}}^{t_i} (x - \hat{x}_i)^2 \, p_X(x) \, dx \approx \sum_{i=1}^{L} p_X(\hat{x}_i) \int_{t_{i-1}}^{t_i} (x - \hat{x}_i)^2 \, dx. $$
Since:
$$ \int_{t_{i-1}}^{t_i} (x - \hat{x}_i)^2 \, dx = \frac{\Delta^3(i)}{12} $$
we find:
$$ \sigma_Q^2 = \frac{1}{12} \sum_{i=1}^{L} p_X(\hat{x}_i) \, \Delta^3(i). \qquad [1.3] $$
This is also written as:
$$ \sigma_Q^2 = \frac{1}{12} \sum_{i=1}^{L} p(i) \, \Delta^2(i). $$
The quantization error power depends only on the lengths Δ(i) of the intervals. We are looking for {Δ(1) … Δ(L)} such that σ_Q^2 is minimized. Let:
$$ \alpha(i) = p_X^{1/3}(\hat{x}_i) \, \Delta(i). $$
As:
$$ \sum_{i=1}^{L} \alpha(i) = \sum_{i=1}^{L} p_X^{1/3}(\hat{x}_i) \, \Delta(i) \approx \int_{-\infty}^{+\infty} p_X^{1/3}(x) \, dx $$
and this integral is independent of the Δ(i), we have to minimize the sum of the cubes of L positive numbers whose sum is constant. This is achieved when the numbers are all equal. Hence, we have:
$$ \alpha(i) = \frac{1}{L} \int_{-\infty}^{+\infty} p_X^{1/3}(x) \, dx $$
which implies:
$$ \Delta(i) = \frac{1}{L \, p_X^{1/3}(\hat{x}_i)} \int_{-\infty}^{+\infty} p_X^{1/3}(x) \, dx. $$
This relation means that an interval is smaller where the probability that X(n) falls in it is higher, and that all the intervals contribute equally to the quantization error power. The expression for the quantization error power is:
$$ \sigma_Q^2 = \frac{1}{12} \sum_{i=1}^{L} p_X(\hat{x}_i) \, \Delta^3(i) $$
where:
$$ \Delta^3(i) = \frac{1}{L^3 \, p_X(\hat{x}_i)} \left[ \int_{-\infty}^{+\infty} p_X^{1/3}(x) \, dx \right]^3. $$
Hence, we have:
$$ \sigma_Q^2 = \frac{1}{12 L^2} \left[ \int_{-\infty}^{+\infty} p_X^{1/3}(x) \, dx \right]^3. $$
Since L = 2^b, we obtain what is known as the Bennett formula:
$$ \sigma_Q^2 = \frac{1}{12} \, 2^{-2b} \left[ \int_{-\infty}^{+\infty} p_X^{1/3}(x) \, dx \right]^3. \qquad [1.4] $$
This demonstration is not mathematically rigorous. It will be discussed at the end of Chapter 4 where we compare this mode of quantization with what is known as quantization with entropy constraint.
Two cases are particularly interesting. When X(n) is uniformly distributed on [−A, +A], we find:
$$ \sigma_Q^2 = \frac{A^2}{3} \, 2^{-2b} = \sigma_X^2 \, 2^{-2b}. $$
Note that going through Bennett's formula is not necessary here: we can obtain this result directly!
For a Gaussian zero-mean signal with power σ_X^2, for which:
$$ p_X(x) = \frac{1}{\sqrt{2\pi} \, \sigma_X} \exp\left( -\frac{x^2}{2\sigma_X^2} \right) $$
we have:
$$ \left[ \int_{-\infty}^{+\infty} p_X^{1/3}(x) \, dx \right]^3 = 6\pi\sqrt{3} \, \sigma_X^2. $$
From this, we deduce that:
$$ \sigma_Q^2 = c \, \sigma_X^2 \, 2^{-2b} \qquad [1.5] $$
where:
$$ c = \frac{\pi\sqrt{3}}{2} \approx 2.72. $$
This equation is referred to throughout this book. From it, we can write the equivalent expression in decibels:
$$ 10 \log_{10} \frac{\sigma_X^2}{\sigma_Q^2} = 6.02 \, b - 10 \log_{10} c \approx 6.02 \, b - 4.35 \ \text{dB}. $$
From this we deduce the 6 dB per bit rule. We can show that for all other distributions (Laplacian, etc.), the minimum quantization error power is always between these two values. The case of the uniformly distributed signal is more favorable, whereas the Gaussian case is less favorable. Shannon’s work and the rate/distortion theory affirm this observation.
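As a quick numerical illustration of equation [1.5] and of the 6 dB per bit rule (a small sketch; the tabulation is ours):

% Signal-to-noise ratio predicted by equation [1.5] for a Gaussian source:
% 10*log10(sigma_X^2/sigma_Q^2) = 6.02*b - 10*log10(c), with c = pi*sqrt(3)/2.
c = pi*sqrt(3)/2;                 % approximately 2.72
b = (1:16)';                      % resolutions in bits per sample
SNR = 6.02*b - 10*log10(c);       % roughly 6 dB gained per additional bit
disp([b SNR]);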
It is interesting to know the statistical properties of the quantization error. We can show that the quantization error is uncorrelated with the reconstructed signal, but that this property does not hold for the original signal. We can also show that the quantization error can be modeled by white noise only within the framework of the high-resolution hypothesis. A detailed analysis is possible (see [LIP 92]).
In practice, p_X(x) is unknown. To construct a quantizer, we use empirical data, assign the same weight to each value, and apply the Lloyd-Max algorithm in its so-called Linde-Buzo-Gray (LBG) form. This algorithm, which generalizes to vector quantization, is presented in the following chapter.
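A minimal sketch of this construction in the scalar case, alternating the two necessary conditions on empirical data (the initialization, the fixed number of iterations, and the variable names are our own choices; the LBG form and its vector generalization are presented in Chapter 2):

% Lloyd-Max algorithm on empirical data: alternate the nearest neighbor rule
% (condition 1) and the centroid rule (condition 2).
x = randn(1e5, 1);                              % training data, equal weight per sample
b = 3; L = 2^b;
c = sort(x(round(linspace(1, numel(x), L))));   % initial codebook: L sorted samples
c = c(:);
for iter = 1:50
    [~, idx] = min(abs(x - c'), [], 2);         % condition 1: nearest neighbor partition
    for i = 1:L                                 % condition 2: centroid of each cell
        if any(idx == i), c(i) = mean(x(idx == i)); end
    end
end
mse = mean((x - c(idx)).^2);
fprintf('empirical MSE = %g, high-resolution value [1.5] = %g\n', ...
        mse, (pi*sqrt(3)/2)*2^(-2*b));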
A non-uniform scalar quantizer can be seen as a uniform scalar quantizer preceded by a non-linear transformation and followed by the inverse transformation. The transformation is defined by its characteristic f(x). From this perspective, the problem consists of choosing the non-linear transformation that minimizes the quantization error power. This is the subject of important developments in the books by Jayant and Noll [JAY 84] and by Gersho and Gray [GER 92]. This approach no longer seems to be of great importance because vector quantization has become the basic tool of choice.
When quantizing real signals (speech, music, images), it is important to estimate the parameter A, which varies with time: real signals do not satisfy the stationarity hypothesis! We examine this problem in the following chapter by introducing a special form of quantization, called gain-shape quantization, which is particularly well adapted to signals with significant instantaneous changes in power, for example audio signals.
In the preceding exposition, we saw that quantization makes no use of the statistical dependence between successive values of the signal. We will see that predictive scalar quantization aims to decorrelate the signal before quantizing it, and that exploiting this correlation improves the overall behavior of the system, that is, it reduces the quantization error power.
An outline of the principle of predictive scalar quantization is shown in Figure 1.3. We subtract an auxiliary signal v(n) from the signal x(n), then perform the encoding/decoding procedure on the signal y(n) = x(n) − v(n). At the decoder, we add v(n) back to the reconstructed values ŷ(n) to obtain x̂(n) = ŷ(n) + v(n).
We can immediately see that, in a real-world coding application, this scheme is not very realistic since the signal v(n) must also be transmitted to the decoder, but let us wait until the end of the chapter before showing how to go from this open-loop scheme to a more realistic, but harder to analyze, closed-loop scheme.
If we subtract a value from the signal before encoding it and add it back after decoding, the quantization error and the reconstruction error are always equal because:
$$ x(n) - \hat{x}(n) = [y(n) + v(n)] - [\hat{y}(n) + v(n)] = y(n) - \hat{y}(n). $$
Hence their respective powers are identical. Since the main interest of the user of the complete system is to have the smallest possible reconstruction error power, the problem becomes simply the reduction of the quantization error power. If we assume an optimized scalar quantization of y(n), we know that the quantization error power can be expressed as:
$$ \sigma_Q^2 = c \, \sigma_Y^2 \, 2^{-2b}. $$
From this, we conclude that seeking to minimize the reconstruction error power leads us back to minimizing the power σ_Y^2 of y(n).
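A small sketch verifying this identity with an arbitrary auxiliary signal v(n) and a uniform quantizer (the signal model, the choice of v(n), and the step size are illustrative assumptions of ours):

% Open-loop scheme: subtract v(n), quantize y(n) = x(n) - v(n), add v(n) back.
% The reconstruction error x - xr equals the quantization error y - yr.
N = 1000; x = filter(1, [1 -0.9], randn(N,1));   % example correlated signal
v = 0.9*[0; x(1:end-1)];                         % example auxiliary signal (a simple prediction)
y = x - v;
Delta = 0.05; yr = Delta*round(y/Delta);         % uniform quantization of y(n)
xr = yr + v;                                     % reconstruction at the decoder
fprintf('max |(x - xr) - (y - yr)| = %g\n', max(abs((x - xr) - (y - yr))));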
We have a wide range of choices for v(n). If we take v(n) in the form:
$$ v(n) = -\sum_{i=1}^{P} a_i \, x(n-i) $$
thereby introducing P parameters, we speak of linear prediction of order P. The signal y(n) is the prediction error, which is expressed as:
$$ y(n) = x(n) + \sum_{i=1}^{P} a_i \, x(n-i). $$
The relationship between x(n) and y(n) is that of a filtering operation with transfer function:
$$ A(z) = 1 + a_1 z^{-1} + \cdots + a_P z^{-P}. $$
Minimizing σ_Y^2 with respect to the coefficients of this prediction filter is the problem to be solved.
This problem has been the subject of numerous studies since the 1960s. All modern books presenting the basic techniques of signal processing devote a chapter to it, for example [KAY 88]. Here we give only a few reminders.
Since this theory, which is used in numerous signal processing applications, was developed primarily with the goal of speech coding at reduced bit rates, and since speech coding processes blocks of N samples, the problem can be posed in the following way: knowing [x(0) … x(N − 1)], determine the coefficients a_1 … a_P that minimize the empirical power of the prediction error:
$$ \epsilon = \frac{1}{N} \sum_{n=0}^{N-1} y^2(n) $$
where:
$$ y(n) = x(n) + \sum_{i=1}^{P} a_i \, x(n-i). $$
Since:
$$ y = \begin{bmatrix} y(0) \\ \vdots \\ y(N-1) \end{bmatrix}, \qquad x = \begin{bmatrix} x(0) \\ \vdots \\ x(N-1) \end{bmatrix}, \qquad a = \begin{bmatrix} a_1 \\ \vdots \\ a_P \end{bmatrix} $$
such that:
$$ y = x + \Gamma a $$
where Γ is the N × P matrix whose column i contains the delayed samples x(n − i), we can write:
$$ N \epsilon = \| x + \Gamma a \|^2 = x^t x + 2 x^t \Gamma a + a^t \Gamma^t \Gamma a. $$
The vector a_opt is that which satisfies:
$$ \Gamma^t (x + \Gamma a_{\mathrm{opt}}) = 0. $$
If Γ^t Γ is invertible, we find:
$$ a_{\mathrm{opt}} = -(\Gamma^t \Gamma)^{-1} \Gamma^t x. $$
The minimum can be written as:
$$ N \epsilon_{\min} = x^t x + x^t \Gamma \, a_{\mathrm{opt}} = x^t x - x^t \Gamma (\Gamma^t \Gamma)^{-1} \Gamma^t x. $$
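A sketch of this least-squares solution in MATLAB (the construction of Γ, the handling of the samples before n = 0 with zeros, and the example signal are our own choices):

% Least-squares determination of the prediction coefficients from N samples:
% minimize ||x + Gamma*a||^2, i.e. a = -(Gamma'*Gamma) \ (Gamma'*x).
N = 256; P = 4;
x = filter(1, [1 -1.1 0.98 -0.37 0.15], randn(N,1));  % example AR(4) data
xp = [zeros(P,1); x];                               % past samples (zeros before n = 0)
Gamma = zeros(N, P);
for i = 1:P
    Gamma(:, i) = xp(P+1-i : P+N-i);                % column i holds x(n - i)
end
a = -(Gamma'*Gamma) \ (Gamma'*x);                   % prediction coefficients a_1 ... a_P
y = x + Gamma*a;                                    % prediction error y(n)
fprintf('empirical prediction error power = %g\n', mean(y.^2));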
Let us now assume that the signal x(n) can be interpreted (modeled) as the realization of a stationary random process X(n) with autocovariance function r_X(k) = E{X(n)X(n − k)}. We need to find the vector a which minimizes the prediction error power:
$$ \sigma_Y^2 = E\{ Y^2(n) \} = E\left\{ \left[ X(n) + \sum_{i=1}^{P} a_i X(n-i) \right]^2 \right\}. $$
Setting the derivatives of σ_Y^2 with respect to the a_i to zero yields the normal equations:
$$ R_X \, a_{\mathrm{opt}} = -r_X \quad \text{with} \quad R_X = \begin{bmatrix} r_X(0) & \cdots & r_X(P-1) \\ \vdots & \ddots & \vdots \\ r_X(P-1) & \cdots & r_X(0) \end{bmatrix}, \qquad r_X = \begin{bmatrix} r_X(1) \\ \vdots \\ r_X(P) \end{bmatrix} $$
and as the autocovariance matrix is positive definite (except in the limiting case where X(n) is a harmonic random process), it is invertible. We therefore have:
$$ a_{\mathrm{opt}} = -R_X^{-1} r_X. \qquad [1.6] $$
We also have:
$$ \sigma_{Y \min}^2 = r_X(0) + r_X^t \, a_{\mathrm{opt}}. \qquad [1.7] $$
Note that equations [1.6] and [1.7] taken together admit the single matrix representation:
$$ \begin{bmatrix} r_X(0) & r_X^t \\ r_X & R_X \end{bmatrix} \begin{bmatrix} 1 \\ a_{\mathrm{opt}} \end{bmatrix} = \begin{bmatrix} \sigma_{Y \min}^2 \\ 0 \end{bmatrix}. \qquad [1.8] $$
The two solutions are comparable, which is unsurprising because (1/N) Γ^t x is an estimate of the vector r_X and (1/N) Γ^t Γ is an estimate of R_X. More precisely, the two approaches are asymptotically equivalent since, when the signal X(n) is an ergodic random process, that is, if:
$$ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} x(n) \, x(n-k) = r_X(k) $$
we have:
$$ \lim_{N \to \infty} \frac{1}{N} \Gamma^t \Gamma = R_X \quad \text{and} \quad \lim_{N \to \infty} \frac{1}{N} \Gamma^t x = r_X. $$
The difference (essential but subtle) is that the exact covariance matrix, being positive definite (except in the limiting case where the process is harmonic), is always invertible, while the matrix Γ^t Γ, whose entries are the empirical correlations
$$ [\Gamma^t \Gamma]_{ij} = \sum_{n} x(n-i) \, x(n-j), $$
stays symmetric but is not always positive definite.
In practice, when we have only N observed data, we wish to preserve this positivity property. We can see that estimating the autocovariance function as:
$$ \hat{r}_X(k) = \frac{1}{N} \sum_{n=k}^{N-1} x(n) \, x(n-k) \quad \text{for } k = 0 \ldots P $$
and then constructing R̂_X and r̂_X from these values maintains the positive definite character of R̂_X, which therefore guarantees its invertibility. We can then define the filter coefficients by:
$$ \hat{a} = -\hat{R}_X^{-1} \, \hat{r}_X \qquad [1.9] $$
and the residual signal power by:
$$ \hat{\sigma}_Y^2 = \hat{r}_X(0) + \hat{r}_X^t \, \hat{a}. $$
This is called linear predictive coding (LPC).
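A sketch of this LPC analysis by the autocorrelation method, following equation [1.9] (the variable names and the example signal are ours; one could also use the Signal Processing Toolbox function lpc, based on the Levinson-Durbin recursion):

% LPC analysis by the autocorrelation method (equation [1.9]):
% estimate r_X(0..P), solve for the coefficients, get the residual power.
N = 256; P = 10;
x = filter(1, [1 -0.9 0.5], randn(N,1));        % example signal (here an AR(2) process)
r = zeros(P+1, 1);
for k = 0:P
    r(k+1) = sum(x(k+1:N) .* x(1:N-k)) / N;     % biased autocovariance estimate
end
R = toeplitz(r(1:P));                            % P x P Toeplitz matrix R_X
a = -R \ r(2:P+1);                               % coefficients a_1 ... a_P   [1.9]
sigma2_Y = r(1) + r(2:P+1)'*a;                   % residual (prediction error) power
A = [1; a];                                      % A(z) = 1 + a_1 z^-1 + ... + a_P z^-P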
We can also show that this positivity property ensures that all the zeros of the polynomial A(z) lie inside the unit circle, which guarantees the stability of the filter 1/A(z). This is a very important property in practice, as we will see shortly when we look at coding speech at a reduced rate.
We can show that the prediction error Y(n) is white (more precisely, that the signal X(n) is whitened). Note that the normal equations express the orthogonality of the prediction error to the past samples:
$$ E\{ Y(n) \, X(n-i) \} = 0 \quad \text{for } i = 1 \ldots P. $$
Assume that P is large. As Y(n) is uncorrelated with the preceding X(n − i), and Y(n − i) is a linear combination of X(n − i), X(n − i − 1), …, we deduce that Y(n) is uncorrelated with Y(n − i). The prediction error Y(n) is therefore white noise, but this property only holds, strictly speaking, as P → ∞ (asymptotic behavior). The filter which produces Y(n) from X(n) is called the “whitening filter”.
If Y(n) is completely whitened, we can write:
$$ S_Y(f) = \sigma_Y^2 = S_X(f) \, |A(f)|^2. $$
Hence, we have:
$$ S_X(f) = \frac{\sigma_Y^2}{|A(f)|^2}. $$
Recall that the most regularly used spectral density estimate is the periodogram. From the N observed data [x(0) … x(N − 1)], we derive:
$$ \hat{S}_X(f) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) \, e^{-j2\pi f n} \right|^2. $$
The equation hints at a second spectral density estimate. From N observed data, we calculate the whitening filter coefficients through an LPC analysis before using the preceding equation.
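A sketch comparing the two estimates on the same data (the example signal, the order P, and the FFT length are our own choices):

% Two spectral estimates from the same N samples: the periodogram and the
% LPC-based estimate sigma_Y^2 / |A(f)|^2.
N = 512; P = 10; Nfft = 1024;
x = filter(1, [1 -0.9 0.5], randn(N,1));         % example signal
Sx_per = abs(fft(x, Nfft)).^2 / N;               % periodogram
r = zeros(P+1, 1);
for k = 0:P, r(k+1) = sum(x(k+1:N).*x(1:N-k))/N; end
a = -toeplitz(r(1:P)) \ r(2:P+1);                % LPC coefficients (autocorrelation method)
sigma2_Y = r(1) + r(2:P+1)'*a;                   % prediction error power
Sx_lpc = sigma2_Y ./ abs(fft([1; a], Nfft)).^2;  % sigma_Y^2 / |A(f)|^2
f = (0:Nfft/2-1)'/Nfft;
semilogy(f, Sx_per(1:Nfft/2), f, Sx_lpc(1:Nfft/2));
xlabel('normalized frequency'); legend('periodogram', 'LPC estimate');

The LPC-based estimate appears as a smoothed envelope of the periodogram, which is part of what makes it useful for speech coding.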