Classifiers for Synthetic Speech Detection: A Comparison
Citations: | 1 - 1 self |
Citations
13178 | Statistical learning theory, in: A
- Vapnik
- 1996
Citation Context ...e training data are adapted and the remaining components remain unchanged. In the recognition phase, detection score is computed using (2) as above. 2.2. GMM supervectors Support vector machine (SVM) =-=[21]-=- is a well-known discriminative classifier used extensively in speaker and language recognition [22]. It models the decision boundary between two classes as a separating hyperplane optimized to maximi... |
11917 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...ight and $p_i(x)$ is a D-variate Gaussian density function with mean vector $\mu_i$ and covariance matrix $\Sigma_i$. The model parameters are denoted by $\lambda = \{w_i, \mu_i, \Sigma_i\}_{i=1}^{M}$. Expectation-maximization (EM) algorithm =-=[18, 19]-=- is used to estimate the parameters of each class independently via maximum likelihood (ML) criterion. In the test phase, given the models, $\lambda_{\mathrm{nat}}$ and $\lambda_{\mathrm{synth}}$, and feature vectors of the test utterance, Y... |
6467 | LIBSVM: a library for support vector machines
- Chang, Lin
- 2011
Citation Context ...rs from WSJ0 and WSJ1 databases [32]. The T-matrix, for the i-vector system, is trained using 35704 utterances from 178 male and 177 female speakers selected from WSJ0 and WSJ1 corpora. LIBSVM package =-=[33]-=- is used to train SVM models for GMM-SVM, GLDS-SVM and SVM back-end using i-vector systems. 4. Results We first optimize the number of Gaussian components used to train natural and synthetic speech mod... |
1009 | Speaker verification using adapted gaussian mixture models
- Reynolds, Quatieri, et al.
- 2000
Citation Context ...on parameter estimation for GMMs is maximum a posteriori (MAP) adaptation of a universal background model (UBM) trained on a large amount of speech data from many speakers, popularly known as GMM-UBM =-=[20]-=-. The UBM represents a general distribution of the acoustic feature space while the target models, $\lambda_{\mathrm{nat}}$ and $\lambda_{\mathrm{synth}}$, are obtained via MAP adaptation of the UBM. The mean vectors of the target models ar... |
629 | Robust text-independent speaker identification using Gaussian mixture speaker models
- Reynolds, Rose
- 1995
Citation Context ...detection jointly against voice conversion attacks was proposed with promising results. Previous studies on spoof detection mostly utilize standard GMM trained using maximum likelihood (ML) criterion =-=[18]-=- classifier and focus on the feature extraction based on the prior knowledge about the synthesis system to improve detection performance. However, robust generalized countermeasures are desired to det... |
315 | Front-End Factor Analysis For Speaker Verification
- Dehak, Kenny, et al.
- 2010
Citation Context ...itional data or model (i.e. UBM in GMM-SVM) to compute high-dimensional supervectors. 2.4. I-vector System The so-called I-vector technique has become a modern de-facto standard in speaker recognition =-=[24]-=-. Recently, it has been used for speaker verification and spoof detection jointly against voice conversion attacks in [17]. It extracts a low-dimensional vector, w, called an i-vector, from a speech s... |
188 | Support Vector Machines Using GMM Supervectors for Speaker Verification
- Campbell, Sturim, et al.
- 2006
Citation Context ...s the decision boundary between two classes as a separating hyperplane optimized to maximize the margin of separation. In speaker recognition, SVM is generally combined with the GMM (GMM supervector) =-=[23]-=-. First, the set of feature vectors extracted from a speech signal is represented with a single high-dimensional vector obtained by concatenation of mean vectors of MAP-adapted GMM. Those supervectors... |
156 | An Overview of Text-independent Speaker Recognition: From Features to Supervectors
- Kinnunen, Li
- 2010
111 | Support vector machines for speaker and language recognition
- Campbell, Campbell, et al.
- 2006
Citation Context ... detection score is computed using (2) as above. 2.2. GMM supervectors Support vector machine (SVM) [21] is a well-known discriminative classifier used extensively in speaker and language recognition =-=[22]-=-. It models the decision boundary between two classes as a separating hyperplane optimized to maximize the margin of separation. In speaker recognition, SVM is generally combined with the GMM (GMM sup... |
104 | Analysis of I-vector Length Normalization in Speaker Recognition Systems
- Garcia-Romero, Espy-Wilson
- 2011
Citation Context ...thm and serves as i-vector extractor as detailed in [24]. The extracted i-vectors are pre-processed by applying within-class covariance normalization (WCCN) [25] followed by length normalization (LN) =-=[26]-=-. In speaker recognition, WCCN normalizes within-speaker variation [24]. In synthetic speech detection, in contrast, we use WCCN to normalize within-class (natural or synthetic) variation caused by ch... |
81 | Within-Class Covariance Normalization for SVM-Based Speaker Recognition
- Hatch, Kajarekar, et al.
- 2006
Citation Context ...The T matrix is trained using the EM algorithm and serves as i-vector extractor as detailed in [24]. The extracted i-vectors are pre-processed by applying within-class covariance normalization (WCCN) =-=[25]-=- followed by length normalization (LN) [26]. In speaker recognition, WCCN normalizes within-speaker variation [24]. In synthetic speech detection, in contrast, we use WCCN to normalize within-class (n... |
23 | Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech
- Kinnunen, Wu, et al.
- 2012
Citation Context ...er, recent developments in voice conversion and speech synthesis technology and mass-market adoption of speaker verification technology, have drawn increased attention to spoofing attacks [9, 10]. In =-=[6, 7, 11, 12, 13]-=-, it has been independently reported that current systems are highly vulnerable to spoofing attacks based on speech synthesis and voice conversion. Speaker recognition systems should be integrated wit... |
20 | Artificial impostor voice transformation effects on false acceptance rates
- Bonastre, Matrouf, et al.
- 2007
Citation Context ... another to gain unauthorized access, is a security problem [1]. Speaker recognition systems can be deliberately spoofed by replay [2], impersonation [3, 4], speech synthesis [5] and voice conversion =-=[6, 7]-=-. Replay attack, repetition of a prerecorded speech signal of the target speaker is one of the easiest ways to spoof recognizers [2, 8]. Impersonation, in turn, is a difficult attack since it requires... |
18 | Evaluation of speaker verification security and detection of HMM-based synthetic speech, Audio, Speech, and Language Processing
- Leon, Pucher, et al.
- 2012
Citation Context ...r speaker masquerading as another to gain unauthorized access, is a security problem [1]. Speaker recognition systems can be deliberately spoofed by replay [2], impersonation [3, 4], speech synthesis =-=[5]-=- and voice conversion [6, 7]. Replay attack, repetition of a prerecorded speech signal of the target speaker is one of the easiest ways to spoof recognizers [2, 8]. Impersonation, in turn, is a diffic... |
17 | On the vulnerability of automatic speaker recognition to spoofing attacks with artificial signals.
- Alegre, Vipperla
- 2012
Citation Context ...er, recent developments in voice conversion and speech synthesis technology and mass-market adoption of speaker verification technology, have drawn increased attention to spoofing attacks [9, 10]. In =-=[6, 7, 11, 12, 13]-=-, it has been independently reported that current systems are highly vulnerable to spoofing attacks based on speech synthesis and voice conversion. Speaker recognition systems should be integrated wit... |
17 | Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals
- Alegre, Vipperla, et al.
- 2012
Citation Context ...ompared in synthetic speech detection task using Gaussian mixture model (GMM) classifier, yielding EER of 10.98% with MFCCs whereas tailored group delay features reduced EER further down to 1.25%. In =-=[16]-=-, EER of 2.7% to discriminate converted speech and natural speech was reported. In a more recent study [17], an i-vector system performing speaker verification and spoof detection jointly against voic... |
15 | Detection of synthetic speech for the problem of imposture
- Leon, Hernaez, et al.
- 2011
Citation Context ...er, recent developments in voice conversion and speech synthesis technology and mass-market adoption of speaker verification technology, have drawn increased attention to spoofing attacks [9, 10]. In =-=[6, 7, 11, 12, 13]-=-, it has been independently reported that current systems are highly vulnerable to spoofing attacks based on speech synthesis and voice conversion. Speaker recognition systems should be integrated wit... |
14 | Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition, in Interspeech
- Wu, Chng, et al.
- 2012
Citation Context ...r synthetic/converted, in order to safeguard recognizers against attacks. There are a few studies which concentrate on the detection of natural and synthetic/converted speech signals. For example, in =-=[14]-=-, the authors compared three different feature sets and reported EERs of 6.60% and 3.93% for GMM-based and unit selection based converted speech detection, respectively. In [15], four different sets o... |
13 | Speaker verification performance degradation against spoofing and tampering attacks
- Villalba, Lleida
Citation Context ...trics, spoofing, the situation of an impostor speaker masquerading as another to gain unauthorized access, is a security problem [1]. Speaker recognition systems can be deliberately spoofed by replay =-=[2]-=-, impersonation [3, 4], speech synthesis [5] and voice conversion [6, 7]. Replay attack, repetition of a prerecorded speech signal of the target speaker is one of the easiest ways to spoof recognizers... |
13 | How vulnerable are prosodic features to professional imitators?, in Odyssey
- Farrús, Wagner, et al.
- 2008
Citation Context ...e situation of an impostor speaker masquerading as another to gain unauthorized access, is a security problem [1]. Speaker recognition systems can be deliberately spoofed by replay [2], impersonation =-=[3, 4]-=-, speech synthesis [5] and voice conversion [6, 7]. Replay attack, repetition of a prerecorded speech signal of the target speaker is one of the easiest ways to spoof recognizers [2, 8]. Impersonation... |
13 | Effect of speech transformation on impostor acceptance
- Matrouf, Bonastre, et al.
- 2006
Citation Context ... another to gain unauthorized access, is a security problem [1]. Speaker recognition systems can be deliberately spoofed by replay [2], impersonation [3, 4], speech synthesis [5] and voice conversion =-=[6, 7]-=-. Replay attack, repetition of a prerecorded speech signal of the target speaker is one of the easiest ways to spoof recognizers [2, 8]. Impersonation, in turn, is a difficult attack since it requires... |
9 | I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry
- Hautamäki, Kinnunen, et al.
- 2013
Citation Context ...e situation of an impostor speaker masquerading as another to gain unauthorized access, is a security problem [1]. Speaker recognition systems can be deliberately spoofed by replay [2], impersonation =-=[3, 4]-=-, speech synthesis [5] and voice conversion [6, 7]. Replay attack, repetition of a prerecorded speech signal of the target speaker is one of the easiest ways to spoof recognizers [2, 8]. Impersonation... |
9 | Spoofing and countermeasures for speaker verification: a need for standard corpora, protocols and metrics
- Evans, Yamagishi, et al.
- 2013
Citation Context ...asets. However, recent developments in voice conversion and speech synthesis technology and mass-market adoption of speaker verification technology, have drawn increased attention to spoofing attacks =-=[9, 10]-=-. In [6, 7, 11, 12, 13], it has been independently reported that current systems are highly vulnerable to spoofing attacks based on speech synthesis and voice conversion. Speaker recognition systems s... |
9 | Support vector machines for speaker and language recognition
- Campbell, Campbell, et al.
- 2006
Citation Context ... detection score is computed using (2) as above. 2.2. GMM supervectors Support vector machine (SVM) [21] is a well-known discriminative classifier used extensively in speaker and language recognition =-=[22]-=-. It models the decision boundary between two classes as a separating hyperplane optimized to maximize the margin of separation. In speaker recognition, SVM is generally combined with the GMM (GMM sup... |
7 | Synthetic speech detection using temporal modulation feature
- Wu, Xiao, et al.
- 2013
Citation Context ...gnals. For example, in [14], the authors compared three different feature sets and reported EERs of 6.60% and 3.93% for GMM-based and unit selection based converted speech detection, respectively. In =-=[15]-=-, four different sets of features including standard mel-frequency cepstral coefficients (MFCCs) were compared in synthetic speech detection task using Gaussian mixture model (GMM) classifier, yieldin... |
4 | Speaker recognition anti-spoofing, in Handbook of biometric anti-spoofing
- Evans, Kinnunen, et al.
- 2014
Citation Context ...asets. However, recent developments in voice conversion and speech synthesis technology and mass-market adoption of speaker verification technology, have drawn increased attention to spoofing attacks =-=[9, 10]-=-. In [6, 7, 11, 12, 13], it has been independently reported that current systems are highly vulnerable to spoofing attacks based on speech synthesis and voice conversion. Speaker recognition systems s... |
3 | Biometric Authentication: System Security and User Privacy
- Jain, Nandakumar
- 2012
Citation Context ...tional face and fingerprint biometrics. However, similar to these biometrics, spoofing, the situation of an impostor speaker masquerading as another to gain unauthorized access, is a security problem =-=[1]-=-. Speaker recognition systems can be deliberately spoofed by replay [2], impersonation [3, 4], speech synthesis [5] and voice conversion [6, 7]. Replay attack, repetition of a prerecorded speech signa... |
3 | A study on replay attack and anti-spoofing for text-dependent speaker verification
- Wu, Gao, et al.
- 2014
Citation Context ... impersonation [3, 4], speech synthesis [5] and voice conversion [6, 7]. Replay attack, repetition of a prerecorded speech signal of the target speaker is one of the easiest ways to spoof recognizers =-=[2, 8]-=-. Impersonation, in turn, is a difficult attack since it requires special skills for mimicking a target speaker [3]. Speech synthesis involves artificial production of a target speaker’s voice given a... |
3 | From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification
- Rajan, Afanasyev, et al.
- 2014
Citation Context ...training i-vectors for natural and synthetic speech classes, respectively. Another method, when multiple training i-vectors are available, is score averaging over all training i-vectors of each class =-=[27]-=-, i.e. $\mathrm{score}_{\mathrm{nat}}^{\mathrm{avg}} = \frac{1}{J} \sum_{j=1}^{J} \mathrm{score}(w_{\mathrm{nat}}^{j}, w_{\mathrm{tst}})$, where $\mathrm{score}(w_{\mathrm{nat}}^{j}, w_{\mathrm{tst}})$ is the cosine similarity defined in (3) between the $j$th training i-vector of natural class, $w_{\mathrm{nat}}^{j}$, and the test i-vector, w... |
2 | Joint speaker verification and anti-spoofing in the i-vector space
- Sizov, Khoury, et al.
- 2015
Citation Context ...0.98% with MFCCs whereas tailored group delay features reduced EER further down to 1.25%. In [16], EER of 2.7% to discriminate converted speech and natural speech was reported. In a more recent study =-=[17]-=-, an i-vector system performing speaker verification and spoof detection jointly against voice conversion attacks was proposed with promising results. Previous studies on spoof detection mostly utiliz... |
2 | ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge
- Wu, Kinnunen, et al.
- 2015
Citation Context ...velopment subsets but also five new unknown methods. More details about the database, voice conversion/speech synthesis methods, recording conditions and number of trials and speakers can be found in =-=[28]-=-. 3.2. Performance Measure Equal error rate (EER) is used as the objective performance criterion. It corresponds to the error rate for the threshold at which the false alarm (Pfa) and the miss rate (P... |
1 | A comparison of features for synthetic speech detection
- Sahidullah, Kinnunen, et al.
- 2015
Citation Context ... in turn, we provide the average EERs for five known methods (S1-S5) and unknown methods (S6-S10). 3.3. Feature Extraction Standard MFCC features are used in the experiments. While our companion paper =-=[30]-=- demonstrates that these may not be the optimal features for synthetic speech detection task, they are the standard features in speaker verification and provide still low error rates on ASVspoof 2015.... |
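
Illustrative note: the score-averaging rule quoted in the citation context for Rajan et al. (mean cosine similarity between a test i-vector and all training i-vectors of a class) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. The function names, the toy 3-dimensional vectors, and the final decision rule (natural average minus synthetic average) are assumptions made for illustration, since the quoted context is truncated before the decision rule is stated.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_score(train_ivectors, test_ivector):
    """Mean cosine similarity between the test i-vector and each training i-vector."""
    scores = [cosine_similarity(w, test_ivector) for w in train_ivectors]
    return sum(scores) / len(scores)


# Toy 3-dimensional "i-vectors", made up for illustration only.
natural_train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
synthetic_train = [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]]
test = [1.0, 0.0, 0.0]

score_nat_avg = average_score(natural_train, test)
score_synth_avg = average_score(synthetic_train, test)

# Assumed decision rule (not taken from the quoted context):
# a positive difference favours the natural class.
detection_score = score_nat_avg - score_synth_avg
```

Because cosine similarity ignores vector length, the two synthetic training vectors in the toy data contribute the same score even though one is twice as long as the other.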