Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation. (2010)
Venue: NeuroImage
Citations: 24 (6 self)
BibTeX
@ARTICLE{Pearce10unsupervisedstatistical,
author = {Marcus T Pearce and Maria Herrojo Ruiz and Selina Kapasi and Geraint A Wiggins and Joydeep Bhattacharya},
title = {Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation.},
journal = {NeuroImage},
year = {2010}
}
Abstract
The ability to anticipate forthcoming events has clear evolutionary advantages, and predictive successes or failures often entail significant psychological and physiological consequences. In music perception, the confirmation and violation of expectations are critical to the communication of emotion and the aesthetic effect of a composition. Neuroscientific research on musical expectation has focused on harmony. Although harmony is important in Western tonal styles, other musical traditions, which emphasize pitch and melody, have been rather neglected. In this study, we investigated melodic pitch expectations elicited by ecologically valid musical stimuli by drawing together computational, behavioural, and electrophysiological evidence. Unlike rule-based models, our computational model acquires knowledge through unsupervised statistical learning of sequential structure in music and uses this knowledge to estimate the conditional probability (and information content) of musical notes. Unlike previous behavioural paradigms that interrupt a stimulus, we devised a new paradigm for studying auditory expectation without compromising ecological validity. A strong negative correlation was found between the probability of notes predicted by our model and their subjectively perceived degree of expectedness. Our electrophysiological results showed that low-probability notes, as compared to high-probability notes, elicited a larger (i) negative ERP component at a late time period (400-450 ms), (ii) beta-band (14-30 Hz) oscillation over the parietal lobe, and (iii) long-range phase synchronization between multiple brain regions. Altogether, the study demonstrates that statistical learning produces information-theoretic descriptions of musical notes that are proportional to their perceived expectedness and are associated with characteristic patterns of neural activity.

Introduction

The brain's ability to anticipate forthcoming events accurately and efficiently has clear adaptive value, and predictive successes and failures often entail significant psychological and physiological consequences, modulating arousal and affecting reward circuits in the brain. Previous research has investigated event-related potential (ERP) responses to violations of expectation in harmony. While harmony is important in Western tonal music, it plays a less significant role in other musical traditions, which emphasize pitch, timbre, and rhythm. To date, little is known about the neural correlates of expectation in these musical dimensions. There is, however, a sparse literature reporting ERP responses to violations of melodic expectation, and the picture appears to be somewhat more complex than for violations of harmonic expectation. To address these issues, we systematically investigated melodic expectation using a tripartite approach involving distinct computational, behavioural, and electrophysiological (electroencephalogram, EEG) components.
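In symbols (a standard information-theoretic formulation consistent with the definitions given in the Materials and methods below, with x_1, x_2, ..., x_n denoting the pitches of a melody), the two quantities the model estimates for each note are:

    p(x_i | x_1, ..., x_{i-1})                  (the conditional probability of the i-th pitch given its melodic context)
    h(x_i) = -log2 p(x_i | x_1, ..., x_{i-1})   (its information content in bits: high for unexpected, low-probability notes)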
Instead of designing stimuli to match experimental hypotheses, our work started with an operational definition of musical expectation embodied in a computational model.

Materials and methods

Computational models of musical expectation

Existing computational models of musical expectation fall into two groups: (i) supervised or rule-based models, which generate expectations according to static rules that predict what will happen next in a given context; and (ii) unsupervised models, which learn associations between events that co-occur and use these acquired associations to predict future events on the basis of the current context. Probably the best-known rule-based account of melodic expectation is that of Narmour (1990).

In this study, we adopted a model of musical expectation based on statistical learning, probability estimation, and information theory. We hypothesised that, while listening to music (or, indeed, perceiving other phenomena that are sequential in time), the brain anticipates or predicts possible continuations of the current (musical) context. These predictions are based on a model of the perceived domain (music, in the current case) formed by an inductive process of unsupervised statistical learning of perceived sequential structure. The learned model encodes past experience and can be used to anticipate future events on that basis, using its acquired statistical knowledge of sequential structure to generate estimates of the probabilities of known events, conditional upon the current sequential context. In music, such expectations depend on many aspects of musical structure, including harmony, but here we focused on pitch expectations for single-note continuations of melodic contexts. Specifically, we predicted that a listener estimates the probability of different anticipated pitched continuations of a melody from the frequency with which each one has followed the context in his or her previous musical experience. High-probability notes are expected, while low-probability notes are unexpected.

We have developed a computational model that embodies this account of expectation. The model's goal is to estimate, in any context, a conditional probability distribution governing the pitch of the next note in a melody given the preceding notes. Thus, if we represent a melody X of n notes as a sequence of pitches, x_1, x_2, ..., x_n, the goal of the model is to estimate the conditional probability of the i-th note in the melody, p(x_i | x_1, ..., x_{i-1}). Given these estimates of conditional probability, the model's expectations may be quantified by information content.

The model has been designed to produce probability estimates that are as accurate as possible, and we now summarise how this is achieved. Probabilities were estimated using the n-gram models commonly used in statistical language modelling. The most elementary n-gram model of melodic pitch structure (a monogram model, where n = 1) simply tabulates the frequency of occurrence of each chromatic pitch encountered in a traversal of each melody in the training set. During prediction, the expectations of the model are governed by a zeroth-order pitch distribution derived from these frequency counts and do not depend on the preceding context of the melody.
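As an illustration of how such frequency-count models work, here is a minimal sketch in Python. It is not the authors' implementation (which operates over pitch-interval and scale-degree pairs and uses the more sophisticated order selection and weighting described in the following paragraphs): the class name, the exponential order weighting, and the use of raw MIDI pitch numbers are all assumptions made for the example.

    from collections import defaultdict
    from math import log2

    class NGramPitchModel:
        """Toy variable-order n-gram model over pitch sequences (illustrative only)."""

        def __init__(self, max_order=3):
            self.max_order = max_order  # the longest n-gram uses a context of max_order - 1 notes
            self.counts = defaultdict(lambda: defaultdict(int))  # context tuple -> {next pitch: count}

        def train(self, melodies):
            # Tabulate frequency counts for contexts of every length from 0 (monogram)
            # up to max_order - 1, as in the variable-order scheme described in the text.
            for melody in melodies:
                for i, pitch in enumerate(melody):
                    for n in range(min(self.max_order, i + 1)):
                        context = tuple(melody[i - n:i])
                        self.counts[context][pitch] += 1

        def _order_prob(self, context, pitch):
            # Maximum-likelihood estimate from the counts of one fixed order,
            # or None if this context never occurred in training.
            dist = self.counts.get(context)
            if not dist:
                return None
            return dist.get(pitch, 0) / sum(dist.values())

        def prob(self, context, pitch):
            # Weighted linear combination of all available orders; higher-order
            # (more context-specific) estimates receive greater weight.
            probs, weights = [], []
            for n in range(min(self.max_order, len(context) + 1)):
                ctx = tuple(context[len(context) - n:])
                p = self._order_prob(ctx, pitch)
                if p is not None:
                    probs.append(p)
                    weights.append(2.0 ** n)  # assumption: simple exponential weighting
            if not probs:
                return None
            return sum(w * p for w, p in zip(weights, probs)) / sum(weights)

        def information_content(self, context, pitch):
            # h = -log2 p: large for unexpected (low-probability) notes.
            p = self.prob(context, pitch)
            return None if not p else -log2(p)

For example, after model = NGramPitchModel() and model.train([[60, 62, 64, 65, 64, 62, 60]]), the call model.information_content((60, 62), 64) returns a small value (about 0.4 bits), because 64 always follows the context (60, 62) in the training melody.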
In a digram model (where n = 2), frequency counts are maintained for sequences of two pitch symbols, and predictions are governed by a first-order pitch distribution derived from the frequency counts associated with only those digrams whose initial pitch symbol matches the final pitch symbol of the melodic context.

Fixed-order models such as these suffer from a number of problems. Low-order models (such as the monogram model discussed above) clearly fail to provide an adequate account of the structural influence of the context on expectations. However, increasing the order can prevent the model from capturing much of the statistical regularity present in the training set; an extreme case occurs when the model encounters an n-gram that does not appear in the training set, for which it returns an estimated probability of zero. To address these problems, the models used in the present research maintain frequency counts during training for n-grams of all possible values of n in any given context. During prediction, distributions are estimated using a weighted linear combination of all models below a variable order bound, which is determined in each predictive context using simple heuristics designed to minimize model uncertainty. The combination is designed such that higher-order predictions (which are more specific to the context) receive greater weight than lower-order predictions (which are more general). In a given melodic context, therefore, the predictions of the model may reflect the influence of both the digram model and (to a lesser extent) the monogram model discussed above. Furthermore, in addition to the general, low-order statistical regularities captured by these models, the predictions of the model can also reflect higher-order regularities that are more specific to the current melodic context (to the extent that these exist in the training set).

For the purposes of this study, the model derives its pitch predictions from a representation of pitch interval and scale degree, reflecting the fundamental influences of melodic and tonal structure respectively (though in other work we use richer representations). Each note in a melody is represented by a pair of values: first, the pitch interval preceding the note; and second, the scale degree of the note relative to the notated key of the melody. The long- and short-term models produce probability distributions over an alphabet of such pairs, and these are converted into probabilities for concrete chromatic pitches before being combined. The long-term component was trained on a corpus of melodies, while the short-term component learns incrementally from the current melody itself.

Ethics statement

Both the behavioural and electrophysiological experiments were approved by the local ethics committee of the Department of Psychology at Goldsmiths College, University of London. Informed written consent was obtained from all participants.

Behavioural experiment

Participants and experimental design

Forty participants (17 females and 23 males, age range 19-72 years, mean age 27.58 years) took part in the experiment: 20 musicians (10 females and 10 males, age range 19-72 years, mean age 30.5 years, 18 right-handed, 2 left-handed) and 20 non-musicians (7 females and 13 males, age range 19-40 years, mean age 29.6 years, 17 right-handed, 3 left-handed).
Musicians had an average of 12.5 years of formal training and had played a musical instrument for an average of 22.5 years, whereas non-musicians had an average of 0.48 years of formal training and had played an instrument for an average of 1.6 years. All participants were students or staff at Goldsmiths, University of London, and were in good health, with normal hearing and no past history of neurological illness. In total, five participants self-identified as left-handed.

The stimuli consisted of 28 hymn melodies. In each melodic excerpt, two notes were selected as locations at which to probe the expectations of listeners. The probe locations were selected according to the predictions of the computational model of perceived pitch expectations in melody (Pearce and Wiggins, 2006) described earlier: in the melodic context in which it appears, one of these notes has a high conditional probability of occurrence, while the other has a low probability of occurrence.

Participants were instructed to listen carefully to the musical stimuli, presented binaurally through headphones. For each stimulus, the probe locations were indicated by the rotating hand of a clock, which counted down stepwise, in quarters, in time with the music, informing the participant in advance of when they were required to respond. Participants rated, on a Likert scale of 1 to 7 (1 being highly unexpected and 7 being highly expected), how expected or unexpected the probe note was in the context of the preceding melodic passage. After listening to each melody, participants were asked to indicate whether it was familiar to them. Practice trials were provided for familiarisation with the experimental procedure. The order of presentation of the stimuli was randomised across participants.

Electrophysiological experiment

Participants and experimental design

Twenty healthy adults (13 males and 7 females, age range 19-26 years, mean age 20.7 years) participated in the EEG study. None of them had taken part in the behavioural study. All participants were in good health, had no past history of neurological disorders, and reported no hearing difficulties. None reported having any formal musical training. The same set of 28 melodic excerpts selected for the behavioural experiment was used here. To avoid artefacts caused by eye and head movements, participants were asked to listen attentively to each melodic excerpt with their eyes closed. No explicit expectedness ratings were requested, and participants were not made overtly aware of the locations of the probe notes, thereby emphasizing the implicit aspect of melodic processing.

Data acquisition and preprocessing

EEG signals were recorded from 28 Ag/AgCl electrodes placed according to the extended 10-20 system (Fp1, Fp2, F7, F3, Fz, F4, F8, FC3, FCz, FC4, C5, C3, Cz, C4, C6, CP5, CP3, CPz, CP4, CP6, P7, P3, Pz, P4, P8, O1, Oz, O2). We used the EEGLAB MATLAB toolbox (Delorme and Makeig, 2004) for visualization and filtering. A high-pass filter at 0.5 Hz was applied to remove linear trends, and a notch filter at 50 Hz (49-51 Hz) was applied to eliminate line noise. The EEG data were further cleaned of remaining artefacts by means of wavelet-enhanced independent component analysis.

Data analyses

We performed the following types of data analysis: (i) standard time-averaging to analyze the ERPs associated with high- and low-probability notes.
The ERPs for each subject and condition were baseline-corrected using the mean activity from 200 to 0 ms before note onset. Next, we computed wavelet-based time-frequency representations (TFRs) to analyze (ii) the spectral power of the oscillatory content and (iii) the spatiotemporal dynamics of phase coupling, as measured by bivariate synchronization analysis.

A complex Morlet wavelet was used to extract the time-frequency complex phases, at an electrode i and epoch k, and amplitudes of the EEG signal x(t). The frequency domain was sampled from 2 to 60 Hz, with a 1-Hz interval between frequencies. To study changes in spectral power, we used the TFR of the wavelet energy (Tallon-Baudry et al., 1997). After removing the baseline level (200 ms prestimulus), we normalized the wavelet energy by the standard deviation of the baseline period and expressed it as a percentage of power change. Oscillatory activity was analyzed in the theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz), and gamma (31-60 Hz) frequency bands.

Bivariate phase synchronization is a useful approach for assessing phase synchronization between neurophysiological signals. For a pair of electrodes i and j, the phase synchronization index is R_ij = |(1/n) Σ_{k=1..n} exp(i(φ_i,k − φ_j,k))|, where n is the number of epochs and φ_i,k is the instantaneous phase at electrode i in epoch k. This index approaches 0 for no phase relationship, and 1 for a strict phase relationship, between the considered electrode pair across the epochs. The average of this index across pairs of electrodes represents a measure of global synchronization strength (R). For the bivariate synchronization analysis, a modified version of the nearest-neighbour Hjorth Laplacian algorithm, computed by Taylor-series expansion, was applied. We also computed the phase lag index, PLI_ij = |(1/n) Σ_{k=1..n} sign(Δφ_k)|, where Δφ_k is the phase difference between electrodes i and j in epoch k, wrapped to (−π, π). The PLI ranges between 0, for no coupling or coupling with a phase difference around 0 mod π, and 1, for consistent nonzero phase coupling. We used an average reference before computing the PLI. At each frequency from 2 to 60 Hz, with a step size of 1 Hz, the indexes R_ij and PLI_ij were computed and baseline-corrected (the baseline being 200 ms prestimulus). They were subsequently averaged across electrode pairs to obtain measures of global synchronization strength, R and PLI. We focused our analysis on the theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz), and gamma (31-60 Hz) frequency bands.

Statistics

To assess statistical differences in spectral power, phase synchronization, and phase lag indices, we first averaged these measures for each participant and condition across all electrodes. Next, for each time-frequency point in the bands under study, the averaged measures were analyzed by means of a nonparametric pairwise permutation test.

Results

Behavioural experiment

There were two categories of probe notes: high- and low-probability. The pitch interval preceding the high-probability notes (mean = 2.4 semitones) was smaller than that preceding the low-probability notes (mean = 5.3 semitones; t = 6.6, p < 0.01). The mean expectedness ratings and response times were summarised for each condition; for the perceived expectedness ratings, the analysis revealed a significant main effect of probe type. Finally, to assess the significance of these results, we compared the performance of our proposed computational model with a competing rule-based model, the two-factor model of Schellenberg (1997).
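For concreteness, the two coupling measures defined above can be sketched in a few lines of Python. This is a minimal illustration, assuming instantaneous phases across epochs (e.g., extracted from the complex Morlet wavelet coefficients) are already available; the function names and array shapes are assumptions of the example, not the authors' code.

    import numpy as np

    def phase_sync_index(phase_i, phase_j):
        # Phase synchronization index R_ij across epochs: the magnitude of the
        # mean phase-difference vector. 0 = no phase relationship, 1 = strict.
        dphi = phase_i - phase_j  # shape: (n_epochs,), phases at one electrode pair, time point, frequency
        return np.abs(np.mean(np.exp(1j * dphi)))

    def phase_lag_index(phase_i, phase_j):
        # Phase lag index (PLI): consistency of the sign of the phase difference.
        # sign(sin(dphi)) is equivalent to sign of dphi wrapped to (-pi, pi).
        # 0 for no coupling or coupling around 0 mod pi; 1 for a consistent nonzero lag.
        dphi = phase_i - phase_j
        return np.abs(np.mean(np.sign(np.sin(dphi))))

    # Example: two signals with a consistent 0.8-rad lag plus noise.
    rng = np.random.default_rng(0)
    phi_i = rng.uniform(-np.pi, np.pi, size=200)
    phi_j = phi_i - 0.8 + 0.1 * rng.standard_normal(200)
    print(phase_sync_index(phi_i, phi_j))  # close to 1
    print(phase_lag_index(phi_i, phi_j))   # close to 1

Averaging such pairwise indices across all electrode pairs yields the global synchronization strengths R and PLI described above.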