Perceptual Postfiltering for Low Bit Rate Speech Coders
M.Eng. Thesis, November 2007
Supervisor: P. Kabal
Adaptive postfiltering has become a common part of speech coding standards based on the Linear Prediction Analysis-by-Synthesis algorithm to decrease audible coding noise. However, a conventional adaptive postfilter is based on empirical assumptions about masking phenomena, which sometimes makes it difficult to balance noise reduction against speech distortion.
This thesis introduces a novel perceptual postfiltering system for low bit rate speech coders. The proposed postfilter works at the decoder, as is the case for the conventional adaptive postfilter. Specific human auditory properties are considered in the postfilter design to improve speech quality. A Gaussian Mixture Model based Minimum Mean Squared Error estimation of the perceptual postfilter is performed from the information received at the decoder. Perceptual postfiltering is then applied to the reconstructed speech to improve speech quality. Test results show that the proposed system gives better perceptual speech quality than conventional adaptive postfiltering.
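The abstract gives no implementation details; as a generic sketch of GMM-based MMSE estimation (the standard conditional-mean estimator under a joint Gaussian mixture, not necessarily the exact estimator used in the thesis), the estimate of y given x is a posterior-weighted sum of per-component linear regressors:

```python
import numpy as np

def gmm_mmse_estimate(x, weights, means, covs):
    """MMSE estimate E[y | x] under a joint GMM on z = (x, y).

    means[i] stacks (mu_x, mu_y); covs[i] is partitioned the same way.
    """
    dx = len(x)
    comps = []
    for w, mu, C in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        Cxx, Cxy = C[:dx, :dx], C[:dx, dx:]
        diff = x - mu_x
        # posterior responsibility, proportional to w * N(x; mu_x, Cxx)
        lik = w * np.exp(-0.5 * diff @ np.linalg.solve(Cxx, diff)) \
              / np.sqrt((2 * np.pi) ** dx * np.linalg.det(Cxx))
        # per-component linear MMSE regressor
        comps.append((lik, mu_y + Cxy.T @ np.linalg.solve(Cxx, diff)))
    total = sum(l for l, _ in comps)
    return sum((l / total) * m for l, m in comps)
```

With a single component this reduces to the familiar linear MMSE estimator E[y|x] = mu_y + C_yx C_xx^{-1} (x - mu_x).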
Robustness in ASR: An Experimental Study of the Interrelationship between Discriminant Feature-Space Transformation, Speaker Normalization and Environment Compensation
M.Eng. Thesis, March 2007
Supervisors: R. Rose and P. Kabal
This thesis addresses the general problem of maintaining robust automatic speech recognition (ASR) performance under diverse speaker populations, channel conditions, and acoustic environments. To this end, the thesis analyzes the interactions between environment compensation techniques, frequency warping based speaker normalization, and discriminant feature-space transformation (DFT). These interactions were quantified by performing experiments on the connected digit utterances comprising the Aurora 2 database, using continuous density hidden Markov models (HMM) representing individual digits.
Firstly, given that the performance of speaker normalization techniques degrades in the presence of noise, it is shown that reducing the effects of noise through environment compensation, prior to speaker normalization, leads to substantial improvements in ASR performance. The speaker normalization techniques considered here were vocal tract length normalization (VTLN) and the augmented state-space acoustic decoder (MATE). Secondly, given that discriminant feature-space transformations are known to increase class separation, it is shown that performing speaker normalization using VTLN in a discriminant feature-space improves the performance of this technique. Classes, in our experiments, corresponded to HMM states. Thirdly, an effort was made to achieve higher class discrimination by normalizing the speech data used to estimate the discriminant feature-space transform. Normalization, in our experiments, corresponded to reducing the variability within each class through the use of environment compensation and speaker normalization. Significant ASR performance improvements were obtained when normalization was performed using environment compensation, while our results were inconclusive for the case where normalization consisted of speaker normalization. Finally, a simple modification of MATE, aimed at increasing its noise robustness, is presented. This modification consisted of using, during recognition, knowledge of the distribution of warping factors selected by MATE during training.
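As background for the frequency warping mentioned above, a common piecewise-linear VTLN warping function can be sketched as follows (this is one textbook variant with an assumed breakpoint; the exact warp and parameters used in the thesis are not given here):

```python
import numpy as np

def vtln_warp(freqs, alpha, f_nyq=8000.0, f0_frac=0.875):
    """Piecewise-linear VTLN warp of the frequency axis.

    Frequencies below a breakpoint f0 are scaled by the warping
    factor alpha; above f0 the warp is a straight line chosen so
    that the Nyquist frequency maps to itself.
    """
    f0 = f0_frac * f_nyq
    freqs = np.asarray(freqs, dtype=float)
    low = alpha * freqs
    # second segment: line from (f0, alpha*f0) to (f_nyq, f_nyq)
    slope = (f_nyq - alpha * f0) / (f_nyq - f0)
    high = alpha * f0 + slope * (freqs - f0)
    return np.where(freqs <= f0, low, high)
```

In VTLN-style normalization, the warping factor alpha is typically chosen per speaker (or per utterance) by maximizing the likelihood of the warped features under the acoustic model.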
On Distance Measurement Methods for Turbo Codes
Ph.D. Thesis, November 2005
Supervisor: P. Kabal
New digital communication applications, such as multimedia, require very powerful error correcting codes that deliver low error rates while operating at low to moderate signal-to-noise ratios (SNRs). Turbo codes have reasonable complexity and can achieve very low error rates provided a proper interleaver design is in place. Well-designed interleavers yield very low error rates, especially for medium to long interleavers, where turbo codes offer the greatest potential for achieving high minimum distance (dmin) values.
The reliable determination of a code’s error performance at very low error rates using simulations may take months or may not be practical at all. However, knowledge of dmin and its multiplicities can be used to estimate the error rates at high SNR. This thesis is concerned with efficient and accurate distance measurement methods for turbo codes. High values of dmin can be caused by input sequences of high weight, say 20 or higher, so a brute-force determination of dmin requires testing all possible input sequences of weight up to 20. Testing all possible input sequences becomes impractical as the size of the interleaver and the input weight increase. Thus, the accurate determination of the distance spectrum, or at least of dmin and its multiplicities, is a significant problem, especially for interleavers that yield high dmin. Building on Garello’s true distance measurement method, this thesis presents an efficient and accurate distance measurement method for single- and double-binary turbo codes that use proper trellis termination, such as dual-termination or tail-biting. This method is applied to determine the distance spectra of the double-binary turbo codes of the digital video broadcasting with return channel via satellite (DVB-RCS) standard. It is also used to design new interleavers for DVB-RCS that yield a significant improvement in error performance compared to those in the standard.
The method fits particularly well with tail-biting turbo codes that use structured interleavers: since the distance properties repeat, this knowledge can be exploited to reduce the search space. The reduced search space translates into a significant reduction in complexity (i.e., execution time), which allows high dmin values to be determined in reasonable time. This efficiency is demonstrated for both single- and double-binary turbo codes, using structured interleavers that have high dmin values for various code rates. The method reduces execution times by a factor of 40 to 400.
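To illustrate on a toy scale why brute-force distance measurement becomes impractical (this sketch is unrelated to the efficient method of the thesis), the following computes dmin of a small linear block code by enumerating all 2^k - 1 nonzero information words; for a linear code, dmin equals the minimum Hamming weight of the nonzero codewords, and the enumeration grows exponentially in k:

```python
from itertools import product
import numpy as np

# Generator matrix of the (7,4) Hamming code, used here only as a
# small example; turbo-code interleaver sizes make this enumeration
# hopeless in practice.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def brute_force_dmin(G):
    """Minimum distance by exhaustive encoding of all nonzero inputs."""
    k, n = G.shape
    best = n + 1
    for u in product([0, 1], repeat=k):
        if not any(u):
            continue                      # skip the all-zero word
        c = np.mod(np.array(u) @ G, 2)    # encode over GF(2)
        best = min(best, int(c.sum()))
    return best
```

Restricting the search to low input weights, as discussed above, trims this enumeration, but only an algorithm that exploits the code structure avoids the exponential blow-up.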
A Study of Bit Allocation for Gaussian Mixture Model Quantizers and Image Coders
M.Eng. Thesis, September 2005
Supervisors: P. Kabal and R. Rose
This thesis describes different bit allocation schemes and their performance when applied to coding line spectral frequencies (LSFs) with the GMM-based coder designed by Subramaniam, and to a simple image transform coder. The new algorithms are compared with the original bit allocation formula, the pruning algorithm used by Subramaniam, Segall's method, and the Greedy bit allocation algorithm, using the log spectral distortion and mean-square error for the LSF quantizer and the peak signal-to-noise ratio for the image coder.
First, a Greedy level allocation algorithm is developed. It follows the philosophy of the Greedy algorithm but allocates quantizer levels one at a time, weighing the benefit of each allocation against its bit cost. The Greedy level allocation algorithm is computationally intensive in general, so we discuss combining it with other algorithms to lower the cost.
Second, another algorithm is proposed that addresses the problems of negative bit allocations and integer level constraints. Throughout the algorithm, the level allocations are kept in a fixed ratio with respect to each other, so as to remain closest to the condition for lowest distortion. Moreover, the original formula assumes a 6 dB gain for each added bit, which is not true in general. The algorithm therefore introduces a new parameter k, which controls the assumed benefit of adding one bit and is set at 0.5 in the high-rate optimal bit allocation formula for MSE; the resulting method is called the Two-Stage Iterative Bit Allocation (TSIBA) algorithm. Simulations show that modifying the bit allocation formula brings gains over the previous methods.
The formula containing the new parameter is then generalized so that it weights not only the variances but also the dimensions, with the weighting parameter trained on their distribution function. TSIBA is an a-posteriori decision algorithm: the value of k yielding the lowest distortion is selected only after all distortions have been computed. The Generalized TSIBA (GTSIBA), on the other hand, uses a training procedure to estimate the weighting factor for each dimension at a given bit rate. Simulation results show yet another improvement when using the Generalized TSIBA over all previous methods.
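For reference, the Greedy bit allocation baseline compared against above can be sketched under the high-rate model D_i = sigma_i^2 * 2^(-2*b_i), in which each added bit cuts a component's distortion by a factor of 4 (the roughly 6 dB-per-bit assumption discussed above). This is an illustrative textbook baseline, not the TSIBA algorithm itself:

```python
import numpy as np

def greedy_bit_allocation(variances, total_bits):
    """Greedy MSE bit allocation: give each successive bit to the
    component with the largest current distortion, updating that
    component's distortion by the high-rate factor 2^(-2) per bit.
    """
    variances = np.asarray(variances, dtype=float)
    bits = np.zeros(len(variances), dtype=int)
    dist = variances.copy()
    for _ in range(total_bits):
        i = int(np.argmax(dist))  # component that benefits most
        bits[i] += 1
        dist[i] /= 4.0            # one bit = factor-4 MSE reduction
    return bits
```

By construction the allocation is non-negative and integer-valued, which is exactly the pair of constraints the closed-form high-rate formula violates.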
An Enhanced Joint Source-Channel Decoder
M.Eng. Thesis, May 2005
Supervisor: F. Labeau
Tandem coding and decoding has been demonstrated to yield arbitrarily low error rates provided a sufficiently large block length is used. When applied to practical systems, which are inherently limited to finite complexity and therefore to finite block lengths, such a strategy may be largely suboptimal. Indeed, a tandem decoding strategy ignores two types of information: the source memory and the residual redundancy of the source coder. Moreover, conventional source decoders within a tandem decoding strategy are designed to perform the inverse operation of the source coder and may severely degrade performance if errors are still present at their input. One viable alternative, which has been demonstrated to yield gains, is the design of a joint source-channel decoding scheme that takes these additional sources of redundancy - the source memory and the source coder’s residual redundancy - into account. In this context, we propose a novel, iterative joint source-channel decoding algorithm. The proposed scheme is derived from a Bayesian network representation of the coding chain and incorporates three types of information: the source memory, the residual redundancy of the source coder, and the redundancy introduced by the channel coder. Specifically, we modify an existing algorithm by first deriving a new, equivalent Bayesian network representation of the coding chain. Next, we derive a fully consistent methodology, within the framework of Bayesian networks, for performing the iterations. The proposed algorithm is shown to yield significant gains along with a drastic reduction in computational complexity compared with the existing one. Finally, we outline possible further improvements to the proposed algorithm, including methods for additional reductions in computational complexity at no cost in performance in one case, and at a slight cost in performance in the other.
Quantization Noise Shaping in Oversampled Filter Banks
M.Eng. Thesis, April 2005
Supervisors: F. Labeau, P. Kabal
The use of a noise-shaping system in oversampled filter banks has been shown to improve the effective resolution of subband coders. Although the filter banks directly determine the noise-shaping coefficients, a comparison between theoretical and simulated results has not been done, and the effect of the choice of filter bank on the performance of the noise-shaping system has not yet been evaluated. Therefore, an algorithm for the generation of cosine-modulated perfect-reconstruction filter banks is presented, so that the generated filters can be used as a test bed. The optimal noise-shaping coefficients are then derived, and the noise-shaping system is inserted into the subband coder. It is found that the theoretical results agree with the simulations, but that the performance of the noise-shaping system is limited by ill-conditioning at higher system orders. An increase in filter length and an increase in the degree of overlap between neighbouring channels contribute independently to better performance. Also, it is seen that near-perfect-reconstruction filter banks are limited by their reconstruction error but yield good results at low bit rates.
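A generic error-feedback noise-shaping loop can be sketched as follows (illustrative only; in the system above the coefficients are derived from the filter banks rather than chosen freely):

```python
import numpy as np

def noise_shaping_quantize(x, coeffs, step):
    """Error-feedback quantizer: past quantization errors are
    filtered by `coeffs` and fed back into the quantizer input,
    shaping the noise spectrum by 1 - sum_k coeffs[k] z^{-k}.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    errs = np.zeros(len(coeffs))        # most recent error first
    out = np.empty(len(x))
    for n, xn in enumerate(x):
        v = xn + coeffs @ errs          # add filtered past errors
        q = step * np.round(v / step)   # uniform mid-tread quantizer
        out[n] = q
        errs = np.roll(errs, 1)
        errs[0] = v - q                 # store the newest error
    return out
```

With coeffs = [1.0] this is the classic first-order loop: the quantization error is differentiated, pushing the noise power toward high frequencies while the in-band (low-frequency) content is preserved.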
Simplified Trellis Decoding of Block Codes by Selective Pruning
M.Eng. Thesis, January 2005
Supervisor: F. Labeau
Error correcting codes are of paramount importance for reliable communications. By adding redundancy to the transmitted data, they allow the decoder to detect and correct errors. However, in favorable channel conditions, part of this redundancy can be removed in order to increase throughput. Unfortunately, most coding schemes are poorly adapted to these higher coding rates. For example, the trellis decoding complexity of block codes grows exponentially with code length. In this thesis we propose a novel solution to this problem: selective trellis pruning. Selective trellis pruning reduces decoding complexity by removing certain codewords from the trellis. This reduction is accomplished by making hard decisions on the bits of the received sequence whose reliability exceeds a certainty threshold. Thanks to the reduced trellis size, this method can produce near-optimal results with only a fraction of the operations required by full decoding. In this work we also introduce an innovative way of obtaining the pruned trellis directly from a simplified version of the generator matrix. This avoids the long process of constructing and then pruning the full trellises, making the selective trellis pruning algorithm an efficient decoding tool. Finally, we apply this algorithm to the parallel concatenated turbo block code decoder in order to reduce its complexity.
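The pruning idea can be illustrated on a toy scale. The sketch below enumerates codewords rather than walking a trellis, and the (5,2) generator matrix is hypothetical; it only shows the principle that bits with reliability above a threshold are fixed by hard decision, and the search then runs only over codewords consistent with them:

```python
from itertools import product
import numpy as np

G = np.array([[1, 0, 0, 1, 1],          # hypothetical (5,2) code
              [0, 1, 1, 0, 1]])

def pruned_ml_decode(llr, G, threshold):
    """Soft-decision decoding with hard decisions on reliable bits.

    llr[i] > 0 favours bit 0; bits with |llr| above the threshold
    are fixed, pruning every codeword that disagrees with them.
    """
    fixed = {i: int(l < 0) for i, l in enumerate(llr)
             if abs(l) > threshold}
    best, best_metric = None, -np.inf
    for u in product([0, 1], repeat=G.shape[0]):
        c = np.mod(np.array(u) @ G, 2)
        if any(c[i] != b for i, b in fixed.items()):
            continue                    # pruned from the search
        # correlation metric between LLRs and candidate codeword
        metric = sum(l * (1 - 2 * b) for l, b in zip(llr, c))
        if metric > best_metric:
            best, best_metric = c, metric
    return best
```

In the actual trellis setting, each fixed bit removes branches (and the states they feed), which is where the complexity savings come from; an overly aggressive threshold, however, risks pruning the transmitted codeword itself.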
Audio Coding with an Excitation Pattern Distortion Measure
M.Eng. Thesis, January 2005
Supervisor: P. Kabal
A new distortion measure for audio coding is proposed, formulated as a distance on the space of excitation patterns. We investigate the psychoacoustic properties of the measure, as well as the implementation issues that arise under a constrained-distortion coding structure. Experimental results show that the excitation distortion metric produces higher-quality coded files than the usual Noise-to-Mask ratio measure at the same rate.