CompAudio
Routine:
CompAudio [options] AFileA [AFileB]
Purpose:
Compare audio files, printing statistics
Description:
This program gathers and prints statistics for one or two input audio files.
With one input file, this program prints the statistics for that file. With
two input files, the signal-to-noise ratio (SNR) of the second file relative
to the first file is also printed. For this calculation, the first audio
file is used as the reference signal. The "noise" is the difference between
sample values in the files.
For SNR comparisons between multi-channel audio files, the data are treated as
a single channel. For stereo files, the SNR calculated is the same as if the
two channels contain the real and imaginary components of complex samples.
For each file, the following statistical quantities are calculated and printed
for each channel.
- Mean:
-
Xm = SUM x(i) / N
- Standard deviation:
-
sd = sqrt[ (SUM x(i)^2 - Xm^2) / (N-1) ]
- Max value:
-
Xmax = max(x(i))
- Min value:
-
Xmin = min(x(i))
The above values are reported as a percent of full scale. For instance for
a file with 16-bit integer data, full scale is 32768.
- Number of Overloads:
-
Count of values at or exceeding the full scale value. For integer data
from a saturating A/D converter, the presence of such values is an
indication of a clipped signal.
- Significant bits test:
-
For integer data, if the number of significant bits is less than the
number of bits in each sample, the non-significant bits should be zeros.
If they are not, a message is printed.
- Number of Anomalous Transitions:
-
An anomalous transition is defined as a transition from a normalized
sample value greater than +0.5 directly to a sample value less than -0.5
or vice versa. A large number of such transitions is an indication of
corrupted data.
- Active Level:
-
This measurement calculates the active speech level using Method B of
ITU-T Recommendation P.56. The smoothed envelope of the signal is used
to segment the signal into active and non-active regions. The active
level is the average envelope value for the active region. The activity
factor in percent is also reported. The speech activity measurements are
calculated only for mono or stereo files with sampling rates appropriate
for recording speech.
An optional delay range can be specified when comparing files. The samples
in file B are delayed or advanced relative to those in file A by each of the
delay values in the delay range. For each delay, the SNR with optimized gain
factor (see below) is calculated. For the delay corresponding to the largest
SNR (with optimized gain), the full regalia of file comparison values is
reported.
In calculating the SNR for stereo files, the channels are considered to
represent the real and imaginary parts of a complex signal. For multi-channel
files, the comparisons are carried out on samples in each channel, unless the
number of channels is large. In that case, the samples in the files are
considered to be a single channel.
- Conventional SNR:
-
SUM |xa(i)|^2
SNR = --------------------- .
SUM |xa(i) - xb(i)|^2
-
The corresponding value in dB is printed.
- SNR with the optimized gain factor:
-
SNR = 1 / (1 - r^2) ,
-
where r is the (normalized) correlation coefficient,
-
SUM xa(i)*xb'(i)
r = ------------------------------------- .
sqrt[ SUM |xa(i)|^2 * SUM |xb(i)|^2 ]
-
The SNR value in dB is printed. This SNR calculation corresponds to using
an optimized gain factor Sf for file B,
-
SUM xa(i)*xb'(i)
Sf = ---------------- .
SUM |xb(i)|^2
- Segmental SNR:
-
This is the average of SNR values calculated for segments of data. The
default segment length corresponds to 16 ms (128 samples at a sampling
rate of 8000 Hz). However if the sampling rate is such that the segment
length is outside the range of 64 to 768 samples, the segment is set to
the appropriate edge value. For each segment, the SNR is calculated as
-
SUM xa(i)^2
SS(k) = 1 + ------------------------- .
eps + SUM [xa(i)-xb(i)]^2
-
The term eps in the denominator prevents division by zero. The additive
unity term discounts segments with SNR's less than unity. The final
average segmental SNR in dB is calculated as
-
SSNR = 10 * log10( 10^[SUM log10(SS(k)) / N] - 1 ) dB.
-
The first term in the bracket is the geometric mean of SS(k). The
subtraction of the unity term tends to compensate for the unity term in
SS(k).
If any of these SNR values is infinite, only the optimal gain factor is
printed as part of the message (Sf is the optimized gain factor),
"File A = Sf * File B".
The comparison between the files also include maximum absolute difference
between samples of the files and the number of samples that differ as a
percentage of the number of samples in all channels.
An example of the output for stereo files with integer 16-bit data over a
range of delays is shown below.
Delay: 20, SNR = 5.25 dB (File B Gain = 0.558)
Delay: 21, SNR = 65.4 dB (File B Gain = 0.667)
Delay: 22, SNR = 5.25 dB (File B Gain = 0.558)
File A:
Channel 1:
Number of Samples: 23829
Std Dev = 855.02 (2.609%), Mean = -9.9431 (-0.03034%)
Maximum = 4774 (14.57%), Minimum = -6156 (-18.79%)
Active Level: 946.7 (2.889%), Activity Factor: 81.6%
Channel 2:
Number of Samples: 23829
Std Dev = 427.5 (1.305%), Mean = -4.9737 (-0.01518%)
Maximum = 2387 (7.285%), Minimum = -3078 (-9.393%)
Active Level: 473.34 (1.445%), Activity Factor: 81.6%
File B:
Channel 1:
Number of Samples: 23829
Std Dev = 1282.4 (3.914%), Mean = -14.886 (-0.04543%)
Maximum = 7160 (21.85%), Minimum = -9233 (-28.18%)
Active Level: 1418.6 (4.329%), Activity Factor: 81.7%
Channel 2:
Number of Samples: 23829
Std Dev = 641.2 (1.957%), Mean = -7.4417 (-0.02271%)
Maximum = 3580 (10.93%), Minimum = -4617 (-14.09%)
Active Level: 709.29 (2.165%), Activity Factor: 81.7%
Best match at delay = 21
SNR = 6.023 dB
SNR = 65.39 dB (File B Gain = 0.667)
Seg. SNR = 6.02 dB (256 sample segments)
Max Diff = 3077 (9.39%), No. Diff = 46127 (573 runs)
Options:
- Input file(s), AFileA [AFileB]:
-
The environment variable AUDIOPATH specifies a list of directories to be
searched for the input audio file(s). One or two input files can be
specified. For a single input file, specifying "-" as the input file
indicates that input is from standard input (use the "-t" option to
specify the format of the input data).
- -d DL:DU, --delay=DL:DU
-
Specify a delay range (in samples). Each delay in the delay range
represents a delay of file B relative to file A. A single delay value can
also be specified. The default range is 0:0.
- -g GAIN, --gain=GAIN
-
A gain factor applied to the data from the input files. This gain applies
to all channels in a file. The gain value can be given as a real number
(e.g., "0.003") or as a ratio (e.g., "1/256"). This option may be given
more than once. Each invocation applies to the input files that follow
the option. The gain factor when two files are compared. The gain does
not affect the statistics calculated for each input file.
- -l L:U, --limits=L:U
-
Sample limits for the input files (numbered from zero). Each invocation
applies to the input files that follow the option. The specification "L:"
means from sample L to the end; "N" means from sample 0 to sample N-1.
- -h, --help
-
Print a list of options and exit.
- -v, --version
-
Print the version number and exit.
See routine CopyAudio for a description of other options.
- -t FTYPE, --type=FTYPE
-
Input file type and environment variable AF_FILETYPE
- -P PARMS, --parameters=PARMS
-
Input file parameters and environment variable AF_INPUTPAR
Environment variables:
- AUDIOPATH:
-
This environment variable specifies a list of directories to be searched
when opening the input audio files. Directories in the list are separated by
colons (semicolons for Windows).
Author / version:
P. Kabal / v10r5 2023-04-10
See Also
CopyAudio,
InfoAudio
Main Index AFsp