
## A Geometric Approach to Sound Source Localization from Time-Delay Estimates (2014)

Venue: IEEE Transactions on Audio, Speech and Language Processing

Citations: 3 (2 self)

### Citations

7413 | Convex Optimization
- Boyd, Vandenberghe
- 2004
Citation Context ...b and s-lb: In [1], the constrained problem is converted into an unconstrained problem with a different cost function. The intuition is that the cost function is modified to penalize those points that are closer to the feasibility border. In practice, the inequality constraint is added to the cost by means of a log-barrier function: $\min_{t} J(t) - \mu \log(\Delta(t))$ s.t. $t \in W \cap B$, $E(t) = 0$ (25), where $\mu \geq 0$ is a regularizing parameter. Consequently, the original task (23) is converted into a sequence of tasks indexed by $\mu$. Each of the problems has an optimal solution $t_\mu$. It can be proven (see [3]) that $t_\mu \to t$ when $\mu \to 0$. Log-barrier methods are gradient-based techniques, which decrease the value of $\mu$ with the iterations, thus converging to the closest feasible local minimum of J. Therefore, it is recommended to provide the analytic derivatives in order to increase both the convergence speed and the accuracy (see Appendices A and B for the expressions of the gradients and Hessians of the cost function and the constraints, respectively). Unfortunately, log-barrier methods are designed for convex problems. In other words, these methods find the local minimum closest to the initializat...
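
The log-barrier idea quoted above can be illustrated on a toy one-dimensional problem. The sketch below is not the paper's implementation: the cost `J`, the constraint `delta`, the shrink factor and the damped Newton inner solver are all assumptions chosen for illustration. It minimizes J(t) − µ log(∆(t)) for a decreasing sequence of µ, warm-starting each subproblem from the previous solution.

```python
import math

def J(t):
    # Hypothetical cost; its unconstrained minimum (t = 2) is infeasible.
    return (t - 2.0) ** 2

def delta(t):
    # Hypothetical inequality constraint: feasible when delta(t) > 0, i.e. t < 1.
    return 1.0 - t

def barrier_cost(t, mu):
    return J(t) - mu * math.log(delta(t))

def minimize_barrier(t, mu, iters=50):
    """Damped Newton iterations on the barrier cost for a fixed mu."""
    for _ in range(iters):
        s = delta(t)
        grad = 2.0 * (t - 2.0) + mu / s
        hess = 2.0 + mu / s ** 2
        step = -grad / hess
        # Halve the step until the candidate is feasible and does not increase the cost.
        for _ in range(60):
            cand = t + step
            if delta(cand) > 0.0 and barrier_cost(cand, mu) <= barrier_cost(t, mu):
                t = cand
                break
            step *= 0.5
    return t

def log_barrier_solve(t0=0.0, mu=1.0, shrink=0.1, mu_min=1e-6):
    """Solve a sequence of barrier problems with decreasing mu."""
    t = t0
    while mu >= mu_min:
        t = minimize_barrier(t, mu)
        mu *= shrink
    return t

t_hat = log_barrier_solve()
print(t_hat)  # approaches the constrained minimizer t = 1 from the feasible side
```

In this toy case the barrier cost is convex on the feasible set, so the sequence of solutions approaches the constrained minimizer. For a non-convex cost, as the quoted passage notes, the same scheme would only reach the local minimum nearest the initialization.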

651 | The generalized correlation method for estimation of time delay
- Knapp, Carter
- 1976
Citation Context ...These local optimization techniques are initialized on the unconstrained (Gu), constrained (Gc) and sparse (Gs) grids respectively. The details are given in Section VII-B1. • dm is the straightforward generalization of [11] to arbitrarily shaped microphone arrays. J, defined in (21), is evaluated on Gc, and the minimum over the grid is selected. The difference between dm and d-lb is that in the former no local minimization is carried out. • n-mult, t-mult and f-mult are implementations of the method described in [8]. In this case the time delay estimates, t, are computed independently (using [26]), and the sound source position, S, is chosen to be as close as possible to the hyperboloids associated with t. Because the algorithm was designed for distributed sensor networks and not for egocentric arrays, we had to modify it. Further explanations are given in Section VII-B2. • pi corresponds to pair-wise independent time delay estimation based on cross-correlation [26]. That is, t1,j is the maximizer of the function ρ1,j(τ). This is the simplest multilateration algorithm one can think of. Except for n-mult, t-mult, and f-mult, which provide S directly, all other algorithms provide a time ...
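
The pair-wise independent estimator (pi) described above picks, for each microphone pair, the lag that maximizes the cross-correlation. A minimal sketch follows, assuming plain (unweighted) cross-correlation over integer sample lags instead of the generalized correlation of [26]. The function names, the synthetic white-noise signal and the `max_lag` search window are hypothetical.

```python
import random

def cross_correlation(x, y, lag):
    """rho(lag) = sum_n x[n] * y[n + lag], over the samples where both exist."""
    start = max(0, -lag)
    stop = min(len(x), len(y) - lag)
    return sum(x[i] * y[i + lag] for i in range(start, stop))

def estimate_delay(x1, x2, max_lag):
    """Delay (in samples) of x2 relative to x1: the lag maximizing the cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(x1, x2, lag))

def make_delayed_pair(delay, n, seed):
    """Synthetic test signals: white noise and a copy delayed by `delay` samples."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [0.0] * delay + x1[:n - delay]
    return x1, x2

x1, x2 = make_delayed_pair(delay=5, n=200, seed=0)
print(estimate_delay(x1, x2, max_lag=20))
```

The sign convention matches tm,n = tn − tm: the estimate is positive when the signal reaches the second microphone later. Dividing by the sampling rate converts it to seconds. Because each pair is treated independently, nothing here enforces consistency between the delays of different pairs.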

227 | The DARPA TIMIT acoustic-phonetic continuous speech corpus. Tech. Rep., Speech Disc CD1-1.1
- Garofolo, Lamel, et al.
- 1986
Citation Context ... placed on a sphere of 1.7 m radius centred at the microphone array. More precisely, the source was placed at 21 different azimuth values, between −160° and 160°, and at 9 different elevation values between −60° and 60°, hence at 189 different directions. [Footnote 2: In order to decide whether a region satisfies the constraints or not, we test its centre. This approximation is justified by the fact that, at this stage of the algorithm, the regions are extremely small (since we force a maximum region size).] The speech fragments emitted by the source were randomly chosen from a publicly available data set [18]. One hundred millisecond cuts of these sounds were used as input of the evaluated methods. In the simulated case, we controlled two parameters. Firstly, we controlled the value of T60, a parameter of the image-source model [29] (available at [28]) that controls the amount of reverberation. More precisely, T60 measures the time needed for the emitted signal to decay by 60 dB. The higher the T60, the larger the amount of reverberations and their energy. In our simulations, T60 took the following values (in seconds): 0, 0.1, 0.2, 0.4 and 0.6. Secondly, we controlled the amount of white noise added to th...

99 | Closed-form least-squares source location estimation from range-difference measurements
- Smith, Abel
- 1987
Citation Context ...e-of-arrival at the m-th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time del...

94 | Maximum likelihood multiple-source localization using acoustic energy measurements with wireless sensor networks
- Sheng, Hu
- 2005
Citation Context ...consider a three-microphone linear array. Let tm be the time-of-arrival at the m-th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of method...

84 | A practical methodology for speech source localization with microphone arrays.
- Brandstein, Silverman
- 1997
Citation Context ...l at the m-th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at onc...

74 | A Tree-Search Algorithm for Mixed Integer Programming Problems
- Dakin
- 1965
Citation Context ...l optimization technique proposed to solve (23). VI. BRANCH & BOUND OPTIMIZATION Global optimization is, in most cases, an extremely challenging task. Nevertheless, the optimization of (23) is well suited for a global optimizer. Indeed, J is continuously differentiable on B, therefore ∇J is continuous. This implies that ∇J is bounded on any compact set, in particular on B. Hence, by means of Theorem 9.5.1 in [44], J is Lipschitz on B. Subsequently, a branch & bound (B&B) type of algorithm is well suited. Such optimization techniques were initially proposed for linear mixed-integer programming [13] and extended later on to the nonlinear case [30]. They alternate between the branch and bound procedures in order to recursively seek the potential regions where the global minimum lies. While the branch step splits the potential regions into smaller pieces, the bound step estimates the lower and upper bounds of each potential region. After the bounding, the discarding threshold is set to the minimum of the upper bounds. Then, all regions whose lower bound is greater than the discarding threshold are discarded (since they cannot contain the global minimum). The B&B algorithm that we propose main...
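
The branch and bound steps described above can be sketched for a one-dimensional Lipschitz function. This is only an illustration under stated assumptions, not the algorithm of the paper. The cost `example_cost`, its Lipschitz constant and the bisection rule are hypothetical. The upper bound of a region is taken as the cost at its centre, the lower bound as that value minus the Lipschitz constant times the half-width, and the discarding threshold as the smallest upper bound seen so far.

```python
import math

def example_cost(x):
    # Hypothetical multimodal cost; |f'(x)| <= 3.5, so 3.5 is a valid Lipschitz constant.
    return math.sin(3.0 * x) + 0.5 * x

def branch_and_bound(f, lo, hi, lipschitz, tol=1e-4):
    """Global minimization of a Lipschitz function on [lo, hi] by interval bisection."""
    regions = [(lo, hi)]
    best_x, best_val = None, math.inf
    # All surviving regions share the same width, so checking the first one suffices.
    while regions and (regions[0][1] - regions[0][0]) > tol:
        # Branch: split every potential region into two halves.
        halves = []
        for a, b in regions:
            m = 0.5 * (a + b)
            halves.extend([(a, m), (m, b)])
        # Bound: centre value is an upper bound, Lipschitz continuity gives a lower bound.
        bounded = []
        for a, b in halves:
            c = 0.5 * (a + b)
            fc = f(c)
            lower = fc - lipschitz * 0.5 * (b - a)
            bounded.append((a, b, lower))
            if fc < best_val:
                best_x, best_val = c, fc
        # Discard regions whose lower bound exceeds the threshold (smallest upper bound).
        regions = [(a, b) for a, b, lower in bounded if lower <= best_val]
    return best_x, best_val

x_star, f_star = branch_and_bound(example_cost, -3.0, 3.0, lipschitz=3.5)
print(x_star, f_star)
```

A region is discarded only when its lower bound is above a value the cost actually attains, so the region containing the global minimum always survives. In more than one dimension the same logic applies with boxes in place of intervals, at a higher cost per level.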

67 | A closed-form location estimator for use with room environment microphone arrays.
- Brandstein, Adcock, et al.
- 1997
Citation Context ...rrival at the m-th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays a...

62 | Time delay estimation in room acoustic environments: An overview
- Chen, Benesty, et al.
- 2006
Citation Context ...presents the full geometric analysis, together with the formal proofs. Section V casts the TDE-SSL task into a constrained optimization task. Section VI describes the branch-and-bound global optimization technique. The proposed SSL-TDE method is evaluated and compared to the state-of-the-art in Section VII. Finally, conclusions and a discussion for future work are provided in Section VIII. [Footnote 1: http://www.aldebaran-robotics.com/] II. RELATED WORK The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and that it lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer functi...

60 | Analysis and design of spherical microphone arrays
- Rafaely
- 2005
Citation Context ...if the method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. [11] has been extended using temporal prediction [20] and has also been proven to be equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for other array geometries, such as circular [38] or spherical [43], [41], [40], [50] arrays. Again, the geometry is directly embedded in the methods in both cases. Hence, none of these methods can be generalized to microphone arrays having a general geometric configuration. Recently, we addressed multichannel TDE-SSL in the case of arbitrary arrays, thus guaranteeing the system’s adaptability [1]. TDE-SSL was modelled as a non-linear programming task, for which a gradient-based local optimization technique was proposed. However, this method has several drawbacks. First, the geometric analysis is incomplete. Indeed, the reported model is not valid for arrays with more than...

58 | Real-time passive source localization: A practical linear-correction least-squares approach. Speech and Audio Processing
- Huang, Benesty, et al.
- 2001
Citation Context ...th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensu...

54 | A passive localization algorithm and its accuracy analysis
- Friedlander
- 1987
Citation Context ...the m-th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thu...

45 | Integrating SQP and branch-and-bound for mixed integer nonlinear programming
- Leyffer
Citation Context ...VI. BRANCH & BOUND OPTIMIZATION Global optimization is, in most cases, an extremely challenging task. Nevertheless, the optimization of (23) is well suited for a global optimizer. Indeed, J is continuously differentiable on B, therefore ∇J is continuous. This implies that ∇J is bounded on any compact set, in particular on B. Hence, by means of Theorem 9.5.1 in [44], J is Lipschitz on B. Subsequently, a branch & bound (B&B) type of algorithm is well suited. Such optimization techniques were initially proposed for linear mixed-integer programming [13] and extended later on to the nonlinear case [30]. They alternate between the branch and bound procedures in order to recursively seek the potential regions where the global minimum lies. While the branch step splits the potential regions into smaller pieces, the bound step estimates the lower and upper bounds of each potential region. After the bounding, the discarding threshold is set to the minimum of the upper bounds. Then, all regions whose lower bound is greater than the discarding threshold are discarded (since they cannot contain the global minimum). The B&B algorithm that we propose maintains two lists of regions: P containing the pote...

41 | A class of frequency-domain adaptive approaches to blind multichannel identification
- Huang, Benesty
Citation Context ...derstanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SSL can be further split into two sub-groups. The first sub-group performs SSL using the TDEs extracted from the acoustic impulse responses [16], [42], [21], [32], [36]. These responses are directly estimated from the raw data, which is very challenging. As with bichannel SSL, large training sets and complex learning procedures are necessary. Moreover, the estimated impulse responses correspond to the acoustic signature of the environment associated with one particular microphone-array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Even if the me...

37 | Robust time delay estimation exploiting redundancy among multiple microphones
- Chen, Benesty, et al.
- 2003
Citation Context ...ms SSL using the TDEs extracted from the acoustic impulse responses [16], [42], [21], [32], [36]. These responses are directly estimated from the raw data, which is very challenging. As with bichannel SSL, large training sets and complex learning procedures are necessary. Moreover, the estimated impulse responses correspond to the acoustic signature of the environment associated with one particular microphone-array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Even if the method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. [11] has been extended using temporal prediction [20] and has also been proven to be equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for ...

37 | An EM algorithm for localizing multiple sound sources in reverberant environments
- Mandel, Ellis, et al.
- 2007
Citation Context ...nstrained optimization task. Section VI describes the branch-and-bound global optimization technique. The proposed SSL-TDE method is evaluated and compared to the state-of-the-art in Section VII. Finally, conclusions and a discussion for future work are provided in Section VIII. [Footnote 1: http://www.aldebaran-robotics.com/] II. RELATED WORK The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and that it lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer function (HRTF) and the room impulse response (RIR). In order to guarantee the adaptability of the system, the intrinsi...

32 | On the use of spatial cues to improve binaural source separation.
- Viste, Evangelista
- 2003
Citation Context ...ned optimization task. Section VI describes the branch-and-bound global optimization technique. The proposed SSL-TDE method is evaluated and compared to the state-of-the-art in Section VII. Finally, conclusions and a discussion for future work are provided in Section VIII. [Footnote 1: http://www.aldebaran-robotics.com/] II. RELATED WORK The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and that it lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer function (HRTF) and the room impulse response (RIR). In order to guarantee the adaptability of the system, the intrinsic prop...

29 | Prediction of energy decay in room impulse responses simulated with an image-source model
- Lehmann, Johansson
- 2008
Citation Context ...nce at 189 different directions. [Footnote 2: In order to decide whether a region satisfies the constraints or not, we test its centre. This approximation is justified by the fact that, at this stage of the algorithm, the regions are extremely small (since we force a maximum region size).] The speech fragments emitted by the source were randomly chosen from a publicly available data set [18]. One hundred millisecond cuts of these sounds were used as input of the evaluated methods. In the simulated case, we controlled two parameters. Firstly, we controlled the value of T60, a parameter of the image-source model [29] (available at [28]) that controls the amount of reverberation. More precisely, T60 measures the time needed for the emitted signal to decay by 60 dB. The higher the T60, the larger the amount of reverberations and their energy. In our simulations, T60 took the following values (in seconds): 0, 0.1, 0.2, 0.4 and 0.6. Secondly, we controlled the amount of white noise added to the received signals by setting the signal-to-noise ratio (SNR) to −10, −5, or 0 dB. In the real case, we used a slightly modified version of the acquisition protocol defined in [15]. This protocol was designed to automatical...

29 | Maximum likelihood sound source localization and beamforming for directional microphone arrays in distributed meetings, in Trans. Multimedia
- Zhang, Florencio, et al.
- 2008
Citation Context ...ear array. Let tm be the time-of-arrival at the m-th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel...

27 | Acoustic source localization and beamforming: theory and practice.
- Chen, Yao, et al.
- 2003
Citation Context ...t, we consider a three-microphone linear array. Let tm be the time-of-arrival at the m-th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of ...

24 | Robust adaptive time delay estimation for speaker localization in noisy and reverberant acoustic environments
- Moonen
- 2003
Citation Context ... a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SSL can be further split into two sub-groups. The first sub-group performs SSL using the TDEs extracted from the acoustic impulse responses [16], [42], [21], [32], [36]. These responses are directly estimated from the raw data, which is very challenging. As with bichannel SSL, large training sets and complex learning procedures are necessary. Moreover, the estimated impulse responses correspond to the acoustic signature of the environment associated with one particular microphone-array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Ev...

22 | Robust 3D localization and tracking of sound sources using beam forming and particle filtering
- Valin, Michaud, et al.
Citation Context ...lobal optimization technique is proposed to solve the programming task, and hence to estimate the time delays and to localize the sound source. An extensive set of experiments is performed on simulated and real data. The experiments clearly show that the global optimization technique that we proposed outperforms existing methods in both the multilateration and the multichannel SSL literatures. This work could be extended in several ways. First, the multiple-source case could be considered. This could be achieved using a frequency filter bank, which would also discard empty frequency bands as in [52]. Second, a different set of experiments could be performed on distributed microphone arrays, to evaluate the behaviour of the proposed methods in such settings. Other TDOA estimators, possibly more accurate than [26], could be used in our benchmark, such that f-mult, t-mult, and n-mult yield better results. Third, the method could also be used in calibration applications. Indeed, the positions of the microphones could be estimated if they were free parameters in our current formulation. In that case, measurements from many different source positions would certainly be required, e.g., [25]. Fourth...

21 | Classification of time delay estimates for robust speaker localization.
- Strobel, Rabenstein
- 1999
Citation Context ...crophone linear array. Let tm be the time-of-arrival at the m-th microphone, and tm,n = tn − tm be the time delay associated with the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as ...

19 | Binaural localization of multiple sources in reverberant and noisy environments
- Woodruff, Wang
- 2012
Citation Context ...timization task. Section VI describes the branch-and-bound global optimization technique. The proposed SSL-TDE method is evaluated and compared to the state-of-the-art in Section VII. Finally, conclusions and a discussion for future work are provided in Section VIII. [Footnote 1: http://www.aldebaran-robotics.com/] II. RELATED WORK The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and that it lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer function (HRTF) and the room impulse response (RIR). In order to guarantee the adaptability of the system, the intrinsic properties...

15 |
Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays
- Brutti, Omologo, et al.
- 2005
Citation Context ...ated to the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SSL can be fu... |

15 | The cocktail party robot: Sound source separation and localisation with an active binaural head.
- Deleforge, Horaud
- 2012
Citation Context ...algorithm outperforms existing methods. This in-depth geometric understanding, the practical algorithms, and the encouraging results open several opportunities for future work. I. INTRODUCTION For the past decades, source localization has been a fruitful research topic. Sound source localization (SSL), in particular, has become an important application, because many speech, voice and event recognition systems assume the knowledge of the sound source position. Time delay estimation (TDE) has proven to be a high-performance methodological framework for SSL, especially when it is combined with training [15], statistics [48] or geometry [8], [1]. We are interested in the development of a general-purpose TDE-based method for SSL, i.e., TDE-SSL, and we are particularly interested in indoor environments. This is extremely challenging for several reasons: (i) there may be several sound sources and their number varies over time, (ii) regular rooms are echoic, thus leading to reverberations, and (iii) the microphones are often embedded in devices (for example, robot heads and smartphones) generating high-level noise. In this context, we focus on arbitrarily shaped non-coplanar microphone arrays, becaus... |

14 | Novel closed-form ML position estimator for hyperbolic location.
- Urruela, Riba
- 2004
Citation Context ...hree-microphone linear array. Let tm be the time-of-arrival at the m-th microphone, and tm,n = tn − tm be the time delay associated to the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred ... |
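
The sign constraint quoted in the context above (t1,2 > 0 and t3,2 > 0 cannot occur together on a three-microphone linear array) can be checked numerically. The following is a minimal sketch, not code from the paper: the microphone spacing of 0.1 m, the speed of sound of 343 m/s, the sampling volume, and the function names are all assumptions made for illustration.

```python
import numpy as np

def time_delays(source, mics, c=343.0):
    """Return the matrix t[m, n] = t_n - t_m of pairwise time delays (seconds).

    `c` is an assumed speed of sound in m/s.
    """
    toa = np.linalg.norm(mics - source, axis=1) / c  # time-of-arrival per microphone
    return toa[None, :] - toa[:, None]

# Hypothetical linear array: microphones 1, 2, 3 on the x axis, 2 in the middle.
mics = np.array([[-0.1, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.1, 0.0, 0.0]])

def infeasible_sign_pattern(source):
    """True if t_{1,2} > 0 and t_{3,2} > 0, i.e. the middle microphone is reached last."""
    t = time_delays(source, mics)
    return bool(t[0, 1] > 0 and t[2, 1] > 0)

# Sample random source positions in a 10 m cube around the array.
rng = np.random.default_rng(0)
sources = rng.uniform(-5.0, 5.0, size=(10000, 3))
violations = sum(infeasible_sign_pattern(s) for s in sources)
print("positions producing the infeasible pattern:", violations)
```

The count stays at zero because the distance from a fixed source to a point moving along a line is a convex function of that point, so the middle microphone can never be strictly farther from the source than both outer ones.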

11 | 2D sound-source localization on the binaural manifold.
- Deleforge, Horaud
- 2012
Citation Context ...K The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example, [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer function (HRTF) and the room impulse response (RIR). In order to guarantee the adaptability of the system, the intrinsic properties of the recording device encompassed in the HRTF must be estimated separately from the acoustic properties of the environment, modeled by the RIR. Furthermore, these methods lead to localization techniques which do not yield closed-form expressions, thus increasing the computational complexity. Moreover, the dependency ... |

11 |
Spoken Dialogues with Computers.
- Mori
- 1997
Citation Context ...associated to the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SSL can ... |

11 | Closed-form formulae for time-difference-of-arrival estimation. Signal Processing
- So, Chan, et al.
- 2008
Citation Context ...orms existing methods. This in-depth geometric understanding, the practical algorithms, and the encouraging results open several opportunities for future work. I. INTRODUCTION For the past decades, source localization has been a fruitful research topic. Sound source localization (SSL), in particular, has become an important application, because many speech, voice and event recognition systems assume the knowledge of the sound source position. Time delay estimation (TDE) has proven to be a high-performance methodological framework for SSL, especially when it is combined with training [15], statistics [48] or geometry [8], [1]. We are interested in the development of a general-purpose TDE-based method for SSL, i.e., TDE-SSL, and we are particularly interested in indoor environments. This is extremely challenging for several reasons: (i) there may be several sound sources and their number varies over time, (ii) regular rooms are echoic, thus leading to reverberations, and (iii) the microphones are often embedded in devices (for example, robot heads and smartphones) generating high-level noise. In this context, we focus on arbitrarily shaped non-coplanar microphone arrays, for three main r... |

9 |
Metric spaces.
- Searcoid
- 2006
Citation Context ...hm. In other words, the estimation procedure will always provide a set of time delays corresponding to a position in the sound source space. The next section describes the branch & bound global optimization technique proposed to solve (23). VI. BRANCH & BOUND OPTIMIZATION Global optimization is, in most cases, an extremely challenging task. Nevertheless, the optimization of (23) is well suited to a global optimizer. Indeed, J is continuously differentiable on B; therefore, ∇J is continuous. This implies that ∇J is bounded on any compact set, in particular on B. Hence, by Theorem 9.5.1 in [44], J is Lipschitz on B. Consequently, a branch & bound (B&B) type of algorithm is well suited. Such optimization techniques were initially proposed for linear mixed-integer programming [13] and extended later on to the nonlinear case [30]. They alternate between the branch and bound procedures in order to recursively seek the potential regions where the global minimum may lie. While the branch step splits the potential regions into smaller pieces, the bound step estimates the lower and upper bounds of each potential region. After the bounding, the discarding threshold is set to the minimum of the up... |
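
The branch and bound idea described in the context above can be illustrated on a one-dimensional toy problem. This is a generic sketch, not the paper's algorithm: it assumes a known Lipschitz constant, uses interval midpoints for the bounds, and minimizes a hypothetical test function instead of the cost J over the feasible set B. All names are illustrative.

```python
import heapq

def lipschitz_branch_and_bound(f, lo, hi, lipschitz, tol=1e-4, max_iter=100000):
    """Minimize f on [lo, hi], assuming |f(x) - f(y)| <= lipschitz * |x - y|."""
    def make_region(a, b):
        mid = 0.5 * (a + b)
        value = f(mid)                               # upper bound on the region minimum
        lower = value - lipschitz * 0.5 * (b - a)    # Lipschitz lower bound
        return (lower, a, b, mid, value)

    first = make_region(lo, hi)
    heap = [first]                                   # regions ordered by lower bound
    best_x, best_val = first[3], first[4]
    for _ in range(max_iter):
        if not heap:
            break
        lower, a, b, mid, value = heapq.heappop(heap)
        if lower >= best_val - tol:
            break                                    # no remaining region can improve by more than tol
        # Branch: split the region in two. Bound: evaluate each half.
        for child in (make_region(a, mid), make_region(mid, b)):
            if child[4] < best_val:
                best_x, best_val = child[3], child[4]
            if child[0] < best_val - tol:            # otherwise discard the region
                heapq.heappush(heap, child)
    return best_x, best_val

def toy_cost(x):
    """Hypothetical non-convex cost with two local minima on [-2, 2]."""
    return (x * x - 1.0) ** 2 + 0.3 * x

# |toy_cost'(x)| <= 24.3 on [-2, 2], so 25 is a valid Lipschitz constant.
best_x, best_val = lipschitz_branch_and_bound(toy_cost, -2.0, 2.0, lipschitz=25.0)
print(best_x, best_val)
```

The discarding threshold here is the best value found so far, which plays the role of the minimum of the upper bounds mentioned in the excerpt. A local method started near x = 1 would stop at the shallower minimum, whereas the search above settles on the deeper one near x = -1.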

9 |
A survey of mathematical methods for indoor localization.
- Seco, Jimenez, et al.
- 2009
Citation Context ... Section IV presents the full geometric analysis, together with the formal proofs. Section V casts the TDE-SSL task into a constrained optimization task. Section VI describes the branch-and-bound global optimization technique. The proposed SSL-TDE method is evaluated and compared to the state of the art in Section VII. Finally, conclusions and a discussion of future work are provided in Section VIII. [Footnote 1: http://www.aldebaran-robotics.com/] II. RELATED WORK The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example, [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related tra... |

8 |
2D binaural sound localization: for urban search and rescue robotics.
- Kullaib, Al-Mualla, et al.
- 2009
Citation Context ...ED WORK The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example, [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer function (HRTF) and the room impulse response (RIR). In order to guarantee the adaptability of the system, the intrinsic properties of the recording device encompassed in the HRTF must be estimated separately from the acoustic properties of the environment, modeled by the RIR. Furthermore, these methods lead to localization techniques which do not yield closed-form expressions, thus increasing the computational complexity. Moreover, the depen... |

8 |
Azimuthal source localization using interaural coherence in a robotic dog: modeling and application.
- Liu, Wang
- 2010
Citation Context ...o a constrained optimization task. Section VI describes the branch-and-bound global optimization technique. The proposed SSL-TDE method is evaluated and compared to the state of the art in Section VII. Finally, conclusions and a discussion of future work are provided in Section VIII. [Footnote 1: http://www.aldebaran-robotics.com/] II. RELATED WORK The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example, [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer function (HRTF) and the room impulse response (RIR). In order to guarantee the adaptability of the system, the in... |

8 |
Spherical microphone array beamforming
- Rafaely, Peled, et al.
- 2010
Citation Context ... Even if the method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. The method in [11] has been extended using temporal prediction [20] and has also been proven equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for other array geometries, such as circular [38] or spherical [43], [41], [40], [50] arrays. Again, the geometry is directly embedded in the methods in both cases. Hence, none of these methods can be generalized to microphone arrays with a general geometric configuration. Recently, we addressed multichannel TDE-SSL in the case of arbitrary arrays, thus guaranteeing the system’s adaptability [1]. TDE-SSL was modelled as a non-linear programming task, for which a gradient-based local optimization technique was proposed. However, this method has several drawbacks. First, the geometric analysis is incomplete. Indeed, the reported model is not valid for arrays with mor... |

8 | Robust localization of multiple sources in reverberant environments using EB-ESPRIT with spherical microphone arrays
- Sun, Teutsch, et al.
- 2011
Citation Context ... method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. The method in [11] has been extended using temporal prediction [20] and has also been proven equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for other array geometries, such as circular [38] or spherical [43], [41], [40], [50] arrays. Again, the geometry is directly embedded in the methods in both cases. Hence, none of these methods can be generalized to microphone arrays with a general geometric configuration. Recently, we addressed multichannel TDE-SSL in the case of arbitrary arrays, thus guaranteeing the system’s adaptability [1]. TDE-SSL was modelled as a non-linear programming task, for which a gradient-based local optimization technique was proposed. However, this method has several drawbacks. First, the geometric analysis is incomplete. Indeed, the reported model is not valid for arrays with more than four ... |

7 | Time delay estimation via minimum entropy.
- Benesty, Huang, et al.
- 2007
Citation Context ...e acoustic signature of the environment associated with one particular microphone-array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Even if the method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. The method in [11] has been extended using temporal prediction [20] and has also been proven equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for other array geometries, such as circular [38] or spherical [43], [41], [40], [50] arrays. Again, the geometry is directly embedded in the methods in both cases. Hence, none of these methods can be generalized to microphone arrays with a general geometric configuration. Recently, we addressed multichannel TDE-SSL in the case of a... |

6 |
An enhanced binaural 3D sound localization algorithm
- Keyrouz, Diepold
- 2006
Citation Context ...izing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example, [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer function (HRTF) and the room impulse response (RIR). In order to guarantee the adaptability of the system, the intrinsic properties of the recording device encompassed in the HRTF must be estimated separately from the acoustic properties of the environment, modeled by the RIR. Furthermore, these methods lead to localization techniques which do not yield closed-form expressions, thus increasing the computational complexity. Moreover, the dependency on both HRTF and RI... |

5 | Geometrically-constrained robust time delay estimation using non-coplanar microphone arrays
- Alameda-Pineda, Horaud
- 2012
Citation Context ... This in-depth geometric understanding, the practical algorithms, and the encouraging results open several opportunities for future work. I. INTRODUCTION For the past decades, source localization has been a fruitful research topic. Sound source localization (SSL), in particular, has become an important application, because many speech, voice and event recognition systems assume the knowledge of the sound source position. Time delay estimation (TDE) has proven to be a high-performance methodological framework for SSL, especially when it is combined with training [15], statistics [48] or geometry [8], [1]. We are interested in the development of a general-purpose TDE-based method for SSL, i.e., TDE-SSL, and we are particularly interested in indoor environments. This is extremely challenging for several reasons: (i) there may be several sound sources and their number varies over time, (ii) regular rooms are echoic, thus leading to reverberations, and (iii) the microphones are often embedded in devices (for example, robot heads and smartphones) generating high-level noise. In this context, we focus on arbitrarily shaped non-coplanar microphone arrays, for three main reasons. First, microp... |

5 |
Speaker localization based on oriented global coherence field.
- Brutti, Omologo, et al.
- 2006
Citation Context ...to the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SSL can be further... |

5 | Alignment of binocular-binaural data using a moving audio-visual target
- Khalidov, Forbes, et al.
- 2013
Citation Context ...s as in [52]. Second, a different set of experiments could be performed on distributed microphone arrays, to evaluate the behaviour of the proposed methods in such settings. Other TDOA estimators, possibly more accurate than [26], could be used in our benchmark, so that f-mult, t-mult, and n-mult yield better results. Third, the method could also be used in calibration applications. Indeed, the positions of the microphones could be estimated if they were free parameters in our current formulation. In that case, measurements from many different source positions would certainly be required, e.g., [25]. Fourth, the proposed model and algorithms could be tested in the case of dynamic sources, and the framework subsequently extended to perform tracking. Finally, experiments with a higher number of microphones should be performed, and the influence of the microphones’ positions should be evaluated. APPENDIX A THE DERIVATIVES OF THE COST FUNCTION The log-barrier algorithm relies on the use of the gradient and the Hessian of both the objective function and the constraint(s). Providing the analytic expression for them would lead to a much more efficient and precise algorithm than estimating them using... |

5 |
A robust and self-reconfigurable design of spherical microphone array for multi-resolution beamforming.
- Li, Duraiswami
- 2005
Citation Context ...microphones into a robot head, such as the humanoid robot NAO (footnote 1), which possesses four microphones in a tetrahedron-like shape. [Title-page note: INRIA Grenoble Rhone-Alpes and Universite de Grenoble. This work was supported by the EU project HUMAVIPS FP7-ICT-2009-247525.] There are robot design constraints that are not compatible with a particular type of microphone array. Moreover, solving for the most general non-coplanar microphone configuration opens the door to dynamically reconfigurable microphone arrays in arbitrary layouts. Such methods have already been studied in the specific case of spherical arrays [31]. Nevertheless, the most general case is worth studying, since non-coplanar arrays include an extremely wide range of specific configurations. This paper makes the following original contributions: • The geometric analysis of the microphone array. We are able to characterize those time delays that correspond to a position in the source space. Such time delays will be called feasible and the derived necessary and sufficient conditions will be called feasibility conditions. • A closed-form solution for SSL. Indeed, we formally prove that every feasible set corresponds to exactly one posi... |

5 |
Speaker localization in CHIL lectures: Evaluation criteria and results.
- Omologo, Svaizer, et al.
- 2006
Citation Context ...delay associated to the microphone pair (m, n). In the particular setup of a three-microphone linear array, the case t1,2 > 0 and t3,2 > 0 is not physically possible. Indeed, this is equivalent to saying that the acoustic wave reaches the first and the third microphones before reaching the middle one, which is inconsistent with the propagation path of the acoustic wave. In order to overcome this issue, multilateration is formulated either as maximum likelihood (ML) [10], [46], [48], [51], [49], [56], [55], as least squares (LS) [47], [4], [5], [17], [22], [8] or as global coherence fields (GCF) [37], [35], [6], [7]. Multilateration methods possess the advantage of being able to evaluate different TDE and SSL techniques. This allows for a better understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SS... |

4 |
Real-time multiple sound source localization and counting using a circular microphone array. Audio, Speech, and Language Processing
- Pavlidi, Griffin, et al.
- 2013
Citation Context ...correlation is proposed. Even if the method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. The method in [11] has been extended using temporal prediction [20] and has also been proven equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for other array geometries, such as circular [38] or spherical [43], [41], [40], [50] arrays. Again, the geometry is directly embedded in the methods in both cases. Hence, none of these methods can be generalized to microphone arrays with a general geometric configuration. Recently, we addressed multichannel TDE-SSL in the case of arbitrary arrays, thus guaranteeing the system’s adaptability [1]. TDE-SSL was modelled as a non-linear programming task, for which a gradient-based local optimization technique was proposed. However, this method has several drawbacks. First, the geometric analysis is incomplete. Indeed, the reported model is not v... |

4 |
Acoustic Source Localization in a Room Environment and at Moderate Distances. Tampereen teknillinen yliopisto.
- Pertila
- 2009
Citation Context ...on IV presents the full geometric analysis, together with the formal proofs. Section V casts the TDE-SSL task into a constrained optimization task. Section VI describes the branch-and-bound global optimization technique. The proposed SSL-TDE method is evaluated and compared to the state of the art in Section VII. Finally, conclusions and a discussion of future work are provided in Section VIII. [Footnote 1: http://www.aldebaran-robotics.com/] II. RELATED WORK The task of localizing a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example, [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer ... |

3 |
Acoustic source localization with distributed asynchronous microphone networks.
- Canclini, Antonacci, et al.
- 2013
Citation Context ...hods. This in-depth geometric understanding, the practical algorithms, and the encouraging results open several opportunities for future work. I. INTRODUCTION For the past decades, source localization has been a fruitful research topic. Sound source localization (SSL), in particular, has become an important application, because many speech, voice and event recognition systems assume the knowledge of the sound source position. Time delay estimation (TDE) has proven to be a high-performance methodological framework for SSL, especially when it is combined with training [15], statistics [48] or geometry [8], [1]. We are interested in the development of a general-purpose TDE-based method for SSL, i.e., TDE-SSL, and we are particularly interested in indoor environments. This is extremely challenging for several reasons: (i) there may be several sound sources and their number varies over time, (ii) regular rooms are echoic, thus leading to reverberations, and (iii) the microphones are often embedded in devices (for example, robot heads and smartphones) generating high-level noise. In this context, we focus on arbitrarily shaped non-coplanar microphone arrays, for three main reasons. First, m... |

3 |
MATLAB primal-dual interior-point solver for convex programs with constraints
- Carbonetto
- 2008
Citation Context ...to the same log-barrier method initialized on a sparse grid Gs. We conjecture that the global minimum of J corresponds to one of the local maxima of ρ1,m in (22) for m = 2, ..., M. For each microphone pair (1, m) we extract K = 3 local maxima of ρ1,m. Gs consists of all possible combinations of these values, thus containing K^(M−1) = 27 points (in the case of M = 4 microphones). s-lb is implemented to assess the robustness of the local optimization technique to its initialization. Both d-lb and s-lb are reimplementations of the publicly available MATLAB log-barrier dual interior-point method [9]. 2) Methods n-mult, t-mult and f-mult: As already mentioned, we implemented [8] with some modifications. Indeed, the method was designed for distributed microphone arrays. With such a setup, the sound source position lies inside the volume defined by the microphone positions in the room. In the case of an egocentric array the sound source is necessarily located outside the volume delimited by the microphone array. The method described in [8] seeks the locations closest to the hyperboloids given by the independently estimated time delays t. More precisely, the following criterion is minim... |
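
The construction of the sparse grid Gs described in the context above (K local maxima per microphone pair, combined into K^(M−1) candidate points) can be sketched as follows. This is an illustrative sketch only: the peak-picking rule, the synthetic correlation curves, and every function name are assumptions, and the excerpt does not say how the paper extracts the local maxima.

```python
import itertools
import numpy as np

def top_k_local_maxima(rho, lags, k=3):
    """Return the lags of the k highest strict interior local maxima of rho."""
    rho = np.asarray(rho, dtype=float)
    interior = np.arange(1, len(rho) - 1)
    is_peak = (rho[interior] > rho[interior - 1]) & (rho[interior] > rho[interior + 1])
    peaks = interior[is_peak]
    best = peaks[np.argsort(rho[peaks])[::-1][:k]]   # highest peaks first
    return [float(lags[i]) for i in best]

def sparse_grid(correlations, lags, k=3):
    """Combine the k best lags of every pair (1, m) into candidate delay vectors."""
    per_pair = [top_k_local_maxima(rho, lags, k) for rho in correlations]
    return list(itertools.product(*per_pair))

# Synthetic stand-ins for the cross-correlation functions of pairs (1,2), (1,3), (1,4).
lags = np.linspace(-1.0, 1.0, 201)
shifts = [0.05, 0.12, -0.07]
correlations = [np.cos(2 * np.pi * 3 * (lags - s)) * np.exp(-lags ** 2) for s in shifts]

grid = sparse_grid(correlations, lags, k=3)
print(len(grid))  # K^(M-1) candidate initializations
```

With M = 4 microphones and K = 3, the grid holds 3^3 = 27 delay vectors, each of which would serve as one initialization of the local optimizer.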

3 |
Matlab code for image-source model in room acoustics. http://www.eric-lehmann.com/ism_code.html
- Lehmann
- 2012
Citation Context ...t directions. The speech fragments emitted by the source were randomly chosen from a publicly available data set [18]. [Footnote 2: In order to decide whether a region satisfies the constraints or not, we test its centre. This approximation is justified by the fact that, at this stage of the algorithm, the regions are extremely small (since we force a maximum region size).] One-hundred-millisecond cuts of these sounds were used as input of the evaluated methods. In the simulated case, we controlled two parameters. Firstly, the value of T60, which is a parameter of the image-source model [29] (available at [28]), controlling the amount of reverberations. More precisely, T60 measures the time needed for the emitted signal to decay by 60 dB. The higher the T60, the larger the amount of reverberations and their energy. In our simulations, T60 took the following values (in seconds): 0, 0.1, 0.2, 0.4 and 0.6. Secondly, we controlled the amount of white noise added to the received signals by setting the signal-to-noise ratio (SNR) to −10, −5, or 0 dB. In the real case, we used a slightly modified version of the acquisition protocol defined in [15]. This protocol was designed to automatically gather sound sig... |
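
The second simulation parameter in the context above, white noise added at a prescribed SNR, can be sketched as follows. This is a minimal sketch under stated assumptions: Gaussian white noise, an SNR defined on average signal power, a 16 kHz sampling rate, and a sine wave standing in for the speech cut. None of these details are given in the excerpt, and the function name is hypothetical.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Return signal plus Gaussian white noise scaled to the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# A 100 ms stand-in signal at an assumed 16 kHz sampling rate.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
clean = np.sin(2 * np.pi * 440.0 * t)

rng = np.random.default_rng(0)
for snr_db in (-10.0, -5.0, 0.0):
    noisy = add_white_noise(clean, snr_db, rng)
    measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(snr_db, round(measured, 2))
```

The measured SNR differs slightly from the target on each draw because the noise power is estimated from a finite number of samples.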

3 |
Spherical microphone array for spatial sound localization for a mobile robot.
- Sasaki, Kabasawa, et al.
- 2012
Citation Context ...posed. Even if the method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. [11] has been extended using temporal prediction [20] and has also been proven to be equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for other array geometries, such as circular [38] or spherical [43], [41], [40], [50] arrays. Again, the geometry is directly embedded in the methods in both cases. Hence, none of these methods can be generalized to microphone arrays with a general geometric configuration. Recently, we addressed multichannel TDE-SSL in the case of arbitrary arrays, thus guaranteeing the system’s adaptability [1]. TDE-SSL was modelled as a non-linear programming task, for which a gradient-based local optimization technique was proposed. However, this method has several drawbacks. First, the geometric analysis is incomplete. Indeed, the reported model is not valid for arrays wi... |

2 |
Humanoid Binaural Sound Tracking Using Kalman Filtering and HRTFs. Robot Motion and Control
- Keyrouz, Diepold, et al.
- 2007
Citation Context ...a sound source from time delay estimates has received a lot of attention in the past; recent reviews can be found in [45], [39], [12]. One group of approaches (referred to as bichannel SSL) requires one pair of microphones. For example, [33], [34], [53], [54] estimate the azimuth from the interaural time difference. These methods assume that the sound source is placed in front of the microphones and that it lies in a horizontal plane. Consequently, they are intrinsically limited to one-dimensional localization. Other methods either guess both the azimuth and elevation [27], [14] or track them [24], [23]. These methods are based on estimating the impulse response function, which is a combination of the head-related transfer function (HRTF) and the room impulse response (RIR). In order to guarantee the adaptability of the system, the intrinsic properties of the recording device encompassed in the HRTF must be estimated separately from the acoustic properties of the environment, modeled by the RIR. Furthermore, these methods lead to localization techniques which do not yield closed-form expressions, thus increasing the computational complexity. Moreover, the dependency on both HRTF and RIR of t... |

2 | Intelligent sound source localization and its application to multimodal human tracking.
- Nakamura, Nakadai, et al.
- 2011
Citation Context ...of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SSL can be further split into two sub-groups. The first sub-group performs SSL using the TDEs extracted from the acoustic impulse responses [16], [42], [21], [32], [36]. These responses are directly estimated from the raw data, which is very challenging. As with bichannel SSL, large training sets and complex learning procedures are necessary. Moreover, the estimated impulse responses correspond to the acoustic signature of the environment associated with one particular microphone array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Even if the method is base... |

2 | Adaptive time delay estimation using filter length constraints for source localization in reverberant acoustic environments.
- Salvati, Canazza
- 2013
Citation Context ...ter understanding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SSL can be further split into two sub-groups. The first sub-group performs SSL using the TDEs extracted from the acoustic impulse responses [16], [42], [21], [32], [36]. These responses are directly estimated from the raw data, which is very challenging. As with bichannel SSL, large training sets and complex learning procedures are necessary. Moreover, the estimated impulse responses correspond to the acoustic signature of the environment associated with one particular microphone array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Even if ... |

1 |
Time delay estimation via non-mutual information among multiple microphones.
- He, Lu, et al.
- 2013
Citation Context ... to the acoustic signature of the environment associated with one particular microphone array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Even if the method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. [11] has been extended using temporal prediction [20] and has also been proven to be equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for other array geometries, such as circular [38] or spherical [43], [41], [40], [50] arrays. Again, the geometry is directly embedded in the methods in both cases. Hence, none of these methods can be generalized to microphone arrays with a general geometric configuration. Recently, we addressed multichannel TDE-SSL in the case... |

1 |
Time difference of arrival estimation exploiting multichannel spatio-temporal prediction. Audio, Speech, and Language Processing,
- He, Wu, et al.
- 2013
Citation Context ...g procedures are necessary. Moreover, the estimated impulse responses correspond to the acoustic signature of the environment associated with one particular microphone array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Even if the method is based on pair-wise cross-correlation functions, the estimation of the time delays is performed at once. [11] has been extended using temporal prediction [20] and has also been proven to be equivalent to two information-theoretic criteria [19], [2], under some statistical assumptions. However, all these methods were specifically designed for linear microphone arrays. Indeed, the line geometry is directly embedded in the proposed criterion and in the associated algorithms. Likewise, some methods were designed for other array geometries, such as circular [38] or spherical [43], [41], [40], [50] arrays. Again, the geometry is directly embedded in the methods in both cases. Hence, none of these methods can be generalized to microphone arrays with a general ... |

1 | Time delay estimation method based on canonical correlation analysis. Circuits, Systems and Signal Processing,
- Lim, Pang
- 2013
Citation Context ...nding of the interactions between TDE and SSL. Unfortunately, even if the ML/LS/GCF frameworks are able to discard TDE outliers, they can neither prevent nor reduce their occurrence. Consequently, the performance of these methods drops dramatically when used in highly reverberant environments. A third group of methods (referred to as multichannel SSL) estimates all time delays at once, thus ensuring their mutual consistency. Multichannel SSL can be further split into two sub-groups. The first sub-group performs SSL using the TDEs extracted from the acoustic impulse responses [16], [42], [21], [32], [36]. These responses are directly estimated from the raw data, which is very challenging. As with bichannel SSL, large training sets and complex learning procedures are necessary. Moreover, the estimated impulse responses correspond to the acoustic signature of the environment associated with one particular microphone array position and orientation. Therefore, such methods suffer from low adaptability to a changing environment. The second sub-group exploits the redundancy among the received signals. In [11] a multichannel criterion based on cross-correlation is proposed. Even if the method i... |