ALIZE/SpkDet: a state-of-the-art open source software for speaker recognition
Cited by 24 (6 self)
This paper presents the ALIZE/SpkDet open source software packages for text-independent speaker recognition. The software is based on the well-known UBM/GMM approach. It also includes the latest speaker recognition developments, such as Latent Factor Analysis (LFA) and unsupervised adaptation. Discriminant classifiers such as SVM supervectors are also provided, linked with the Nuisance Attribute Projection (NAP). The software's performance is demonstrated within the framework of the NIST’06 SRE evaluation campaign. Several other applications, such as speaker diarization, embedded speaker recognition, password-dependent speaker recognition and pathological voice assessment, are also presented.
State-of-the-Art Performance in Text-Independent Speaker Verification through Open-Source Software
- IEEE Transactions on Audio, Speech and Language Processing, 2007
Cited by 21 (3 self)
This paper illustrates an evolution in state-of-the-art speaker verification by highlighting the contribution of newly developed techniques. Starting from a baseline system based on Gaussian mixture models that reached state-of-the-art performance during the NIST’04 SRE, final systems with new intersession compensation techniques show a relative gain of around 50%. This work highlights that a key element in recent improvements is still the classical maximum a posteriori (MAP) adaptation, while the latest compensation methods have a crucial impact on overall performance. Nuisance attribute projection (NAP) and factor analysis (FA) are examined and shown to provide significant improvements. For FA, a new symmetrical scoring (SFA) approach is proposed. We also show further improvement with an original combination of a support vector machine and SFA. This work is undertaken through the open-source ALIZE toolkit. Index Terms: Channel compensation, factor analysis, nuisance attribute projection, speaker verification.
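As an illustration of the classical MAP adaptation this abstract refers to, the following is a minimal sketch of a relevance-factor mean update for a one-dimensional GMM. It is not taken from the paper or from ALIZE: the function names, the one-dimensional setting, and the relevance factor of 16 are assumptions chosen for brevity.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP update of GMM means (weights and variances are kept).

    Hypothetical helper for illustration; `relevance` is an assumed value.
    """
    counts = [0.0] * len(means)  # soft frame counts n_k
    first = [0.0] * len(means)   # sum over frames of posterior * x
    for x in frames:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # frame too far from every component to assign
        for k, like in enumerate(likes):
            gamma = like / total  # posterior of component k for this frame
            counts[k] += gamma
            first[k] += gamma * x
    adapted = []
    for k, prior_mean in enumerate(means):
        if counts[k] == 0.0:
            adapted.append(prior_mean)
            continue
        alpha = counts[k] / (counts[k] + relevance)  # data-dependent mixing
        data_mean = first[k] / counts[k]
        adapted.append(alpha * data_mean + (1.0 - alpha) * prior_mean)
    return adapted
```

Components that see many frames move toward the data mean, while components with little data stay close to the prior (background model) mean.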
Kernel Methods for Text-Independent Speaker Verification
- 2010
Cited by 4 (0 self)
In recent years, systems based on support vector machines (SVMs) have become standard for speaker verification (SV) tasks. An important aspect of these systems is the dynamic kernel, which operates on sequence data and handles the dynamic nature of speech. In this thesis a number of techniques are proposed for improving dynamic kernel-based SV systems.
The first contribution of this thesis is the development of alternative forms of dynamic kernel. Several popular dynamic kernels proposed for SV are based on the Kullback-Leibler divergence between Gaussian mixture models. Since this has no closed-form solution, typically a matched-pair upper bound is used instead. This places significant restrictions on the forms of model structure that may be used. In this thesis, dynamic kernels are proposed based on alternative, variational approximations to the divergence. Unlike standard approaches, these allow the use of a more flexible modelling framework. Also, using a more accurate approximation may lead to performance gains.
The second contribution of this thesis is to investigate the combination of multiple systems to improve SV performance. Typically, systems are combined by fusing the output scores.
For SVM classifiers, an alternative strategy is to combine at the kernel level. Recently an efficient maximum-margin scheme for learning kernel weights has been developed. In this thesis several modifications are proposed to allow this scheme to be applied to SV tasks.
System combination will only lead to gains when the kernels are complementary. In this thesis it is shown that many commonly used dynamic kernels can be placed into one of two broad classes, derivative and parametric kernels. The attributes of these classes are contrasted and the conditions under which the two forms of kernel are identical are described. By avoiding these conditions gains may be obtained by combining derivative and parametric kernels.
The final contribution of this thesis is to investigate the combination of dynamic kernels with traditional static kernels for vector data. Here two general combination strategies are available. Static kernel functions may be defined over the dynamic feature vectors; alternatively, a static kernel may be applied at the observation level. In general, it is not possible to explicitly train a model in the feature space associated with a static kernel. However, it is shown in this thesis that this form of kernel can be computed by using a suitable metric with approximate component posteriors. Generalised versions of standard parametric and derivative kernels that include an observation-level static kernel are proposed based on this approach.
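The matched-pair upper bound mentioned in this abstract can be sketched as follows for one-dimensional GMMs with the same number of components, paired by index. This is an illustrative sketch rather than code from the thesis: the function names and the grid-based numerical check are assumptions, and the variational approximations the thesis proposes are not shown.

```python
import math


def kl_gauss(mp, vp, mq, vq):
    """Closed-form KL divergence between two 1-D Gaussians N(mp, vp) and N(mq, vq)."""
    return 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)


def kl_matched_pair_bound(p, q):
    """Upper bound on KL(p || q) for GMMs given as lists of (weight, mean, var).

    Component k of p is paired with component k of q, which is why both
    models must share the same structure.
    """
    return sum(wp * (math.log(wp / wq) + kl_gauss(mp, vp, mq, vq))
               for (wp, mp, vp), (wq, mq, vq) in zip(p, q))


def gmm_pdf(x, gmm):
    """Density of a 1-D GMM at x."""
    return sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
               for w, m, v in gmm)


def kl_numeric(p, q, lo=-12.0, hi=12.0, steps=20000):
    """Midpoint-rule estimate of KL(p || q); used only to check the bound."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        px = gmm_pdf(x, p)
        if px > 0.0:
            total += px * math.log(px / gmm_pdf(x, q)) * dx
    return total
```

The bound follows from the log-sum inequality, so it is never smaller than the true divergence. The requirement that components be paired one-to-one is the structural restriction the abstract refers to.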
A generalised derivative kernel for speaker verification
- In Proc. ICSLP, 2008
Cited by 1 (1 self)
An important aspect of SVM-based speaker verification systems is the choice of dynamic kernel. For the GLDS kernel, a static kernel is used to map each observation into a higher order feature space. Features are then obtained by taking a simple average over all frames. Derivative kernels, such as the Fisher kernel, use a generative model as a principled way of extracting a fixed set of features from each utterance. However, the model and features are defined using the original observations. Here, a dynamic kernel is described that combines these two approaches. In general, it is not possible to explicitly train a model in the feature space associated with a static kernel. However, by using a suitable metric with approximate component posteriors, this form of dynamic kernel can be computed. This kernel includes the GLDS and derivative kernels as special cases and is also closely related to parametric kernels such as the GMM-supervector kernel. Preliminary results using this kernel are presented on the 2002 NIST SRE dataset.
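The GLDS-style feature extraction described in this abstract (expand each frame into a higher order feature space, then average over frames) can be sketched as below. This is a simplified illustration, not the paper's method: the monomial expansion up to degree 2, the function names, and the plain inner product are assumptions. In particular, GLDS systems normally also normalise by a correlation matrix estimated on background data, which is omitted here.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_expand(frame, degree=2):
    """Map one observation to all monomials of its entries up to `degree`."""
    feats = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(frame)), d):
            feats.append(prod(frame[i] for i in idx))
    return feats


def utterance_features(frames, degree=2):
    """Average the expanded frames to get one fixed-length vector per utterance."""
    expanded = [poly_expand(f, degree) for f in frames]
    n = len(expanded)
    return [sum(col) / n for col in zip(*expanded)]


def glds_like_kernel(frames_a, frames_b, degree=2):
    """Inner product of averaged expansions (identity metric assumed)."""
    a = utterance_features(frames_a, degree)
    b = utterance_features(frames_b, degree)
    return sum(x * y for x, y in zip(a, b))
```

Because every utterance is reduced to a vector of the same length, utterances with different numbers of frames can be compared directly.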
Chapter 1 Cognitive Radio Network for Smart Grid
Recently, cognitive radio and the smart grid have both received considerable research attention. Cognitive radios are fully programmable wireless devices that can sense their environment and dynamically adapt their transmission waveform, channel access method, spectrum use, and networking protocols. It is widely anticipated that cognitive radio technology will become a general-purpose programmable radio that will serve as a universal platform for wireless system development, much as microprocessors have served a similar role for computation. The salient features of the cognitive radio, namely frequency agility, transmission speed, and range, are ideal for application to the smart grid. In this regard, a cognitive radio network can serve as a robust and efficient communications infrastructure that can address both the current and future energy management needs of the smart grid. The cognitive radio network can be deployed as a large-scale Wireless Regional Area Network (WRAN) in a smart grid, to utilize the unused TV bands recently approved for use by the Federal Communications Commission (FCC). In addition, a cognitive radio network testbed for the smart grid would serve as an ideal platform not only to address various issues related to the smart grid, such as security, information flow and power flow management, but also to reveal more practical problems for further research. In this chapter, the novel concept of incorporating a cognitive radio network as the communications backbone for the smart grid is outlined. A brief overview of the cognitive radio is provided, including the recently proposed IEEE 802.22 standard. In particular, an overview of Cognitive Radio Network testbed, existing and new hardware platforms for
Cognitive Radio Network for the Smart Grid: Experimental System Architecture, Control
- IEEE Transactions on Smart Grid
Cognitive Radio Network as Wireless Sensor Network (III): Passive Target Intrusion Detection and Experimental Demonstration
for radio frequency (RF) passive target intrusion detection. Compared to a cheap WSN, the CRN-based WSN is expected to deliver better results due to its strong communication functions and powerful computing ability. Issues addressed in this paper include experimental architecture, waveform design, and a machine learning algorithm for classification. In particular, passive target intrusion is experimentally demonstrated using multiple WARP platforms that serve as the cognitive/sensor nodes. In contrast to traditional localization methods relying on radio propagation properties, the technique used in this research is based on machine learning with measured data, taking into account the complicated multipath environment and the high-dimensional sensing data collected by the CRN-based WSN. Preliminary experimental results are quite encouraging, suggesting that a large-scale CRN-based WSN supported by machine learning techniques has promising potential for passive target intrusion detection in harsh RF environments.
unknown title
In this paper, we propose a novel semi-supervised speaker identification method that can alleviate the influence of nonstationarity such as session-dependent variation, changes in the recording environment, and physical condition or emotion. We assume that the utterance variation follows the covariate shift model, where only the utterance sample distribution changes between the training and test phases. Our method consists of weighted versions of kernel logistic regression and cross-validation and is theoretically shown to have the capability of alleviating the influence of covariate shift. We experimentally show through text-independent speaker identification simulations that the proposed method is promising in dealing with session-dependent utterance variation. Index Terms: Speaker identification, covariate shift, semi-supervised learning, kernel logistic regression, importance estimation.
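The idea of weighting training samples to counter covariate shift can be sketched as follows. This is a simplified illustration under strong assumptions, not the paper's method: it uses a linear (not kernel) logistic regression on one-dimensional inputs, and it assumes the training and test input densities are known Gaussians so the importance weight p_test(x) / p_train(x) can be computed directly. The paper estimates the importance instead. All function names and default parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def importance_weights(xs, train=(0.0, 1.0), test=(1.0, 1.0)):
    """w(x) = p_test(x) / p_train(x), with both densities assumed known."""
    return [gaussian_pdf(x, *test) / gaussian_pdf(x, *train) for x in xs]


def weighted_log_loss(params, xs, ys, weights):
    """Importance-weighted logistic loss; labels ys are in {-1, +1}."""
    a, b = params
    return sum(w * math.log1p(math.exp(-y * (a * x + b)))
               for x, y, w in zip(xs, ys, weights)) / len(xs)


def fit_weighted_logreg(xs, ys, weights, lr=0.1, steps=500):
    """Plain gradient descent on the weighted loss for the model a*x + b."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y, w in zip(xs, ys, weights):
            # derivative of log(1 + exp(-y*z)) with respect to z
            s = -y / (1.0 + math.exp(y * (a * x + b)))
            grad_a += w * s * x
            grad_b += w * s
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b
```

Samples that are more likely under the test distribution than under the training distribution receive larger weights, so the fitted classifier is tuned toward the region where it will actually be used.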
Wireless Tomography in Noisy Environments using Machine Learning
This paper, one in a continuing series, describes a new initiative in wireless tomography. Our goal is to combine two technologies, wireless communication and radio frequency (RF) tomography, for close-in remote sensing. A hybrid system that includes wireless communication devices for wireless tomography is proposed in this paper. Noise reduction, modified standard phase reconstruction, and imaging are exploited sequentially to perform wireless tomography in noisy environments. The performance reported in this paper illustrates the significance and prospects of wireless tomography. The contributions of this paper are threefold: (1) the hybrid system provides a strong and flexible infrastructure for wireless tomography; (2) machine learning, especially non-linear dimensionality reduction, is explored to perform noise reduction and combat the non-linear noise effect; (3) modified standard phase reconstruction is achieved using the de-noised amplitude-only total fields from the simple sensors and the accurate full-data total fields received from the advanced sensors. Experimental data provided by the Institut Fresnel in Marseille, France are used to demonstrate the concept of wireless tomography and validate the corresponding algorithms. Index Terms: wireless tomography, noise reduction, machine learning, phase reconstruction, imaging