«Year: 2016 On the origin of post-aspirated stops: production and perception of /s/ + voiceless stop sequences in Andalusian Spanish Ruch, Hanna; ...»
Regarding the emergence of post-aspirated stops in Andalusian Spanish, the role of both articulatory (Parrell, 2012; Torreira, 2012) and perceptual factors (Ruch & Harrington, 2014) has been discussed and tested, with different and, to some extent, conflicting results (see Section 1.2). The present study readdresses this issue by systematically investigating the influence of place of articulation in two varieties and at two apparent-time stages of the sound change. This allows a comparison of phonetic variation in the production of /sp, st, sk/ across groups of speakers that have undergone the sound change from pre- to post-aspiration to different degrees. The paper will acoustically compare /sp, st, sk/-sequences produced by older EAS speakers, who have not yet undergone the sound change, with those of younger WAS speakers, who appear to be the most advanced in the sound change, and with those of two intermediate groups, older WAS and younger EAS speakers (Ruch & Harrington, 2014). The influence of stop type on the duration of pre- and post-aspiration will also be analyzed. If all four speaker groups show the VOT pattern predicted by articulatory and aerodynamic factors, that is, the longest VOT for the velar and the shortest for the bilabial context (Cho & Ladefoged, 1999), then the effect of stop type on VOT will be ascribed to universal phonetic principles (Maddieson, 1997).
Based on the literature on Andalusian Spanish and on findings for languages with phonological pre-aspiration, it is expected that pre-aspiration in Andalusian Spanish /s/ + voiceless stop sequences will be longer in velar than in dental stops, and shortest preceding bilabial stops, and VOT is expected to be longest in /sk/, and to be shortest in /sp/. At the same time, it is expected that pre-aspiration will fade faster in the bilabial than in the velar context, a hypothesis that will be tested by combining the factors age and stop type. Younger and older speakers are hypothesized to differ in pre-aspiration more clearly among bilabial than among velar stops. Concerning words with intervocalic stops, the typical VOT pattern (Cho & Ladefoged, 1999) is expected to occur, with velars showing the longest and bilabials showing the shortest VOT, and no difference among the age groups or varieties.
A third issue to be addressed in this paper is the perception of post-aspirated stops.
Conflicting claims have been made about the possible phonologization of post-aspirated stops in Andalusian Spanish based on acoustic data. As discussed in Section 1.2, Torreira (2012) concluded that post-aspirated stops in Western Andalusian Spanish are the result of articulatory overlap, and that no series of phonologized post-aspirated stops exists in this variety. Parrell (2012) observed that some speakers produced a long VOT across all speech rates, suggesting that this came about because post-aspirated stops might be phonologized to a certain degree in Western Andalusian Spanish.
If listeners of Andalusian Spanish are able to distinguish the minimal pair /pasta/ vs. /pata/ based only on the presence or absence of post-aspiration, it can be concluded that post-aspiration is interpreted as a cue to /sp, st, sk/ on its own, and that the originally phonetic effect of a long post-aspiration is interpreted as a distinct phonetic target. Following Hyman (1976, 2013), phonologization is defined as the exaggeration of a phonetic effect “beyond what can be considered universal” (Hyman, 2013, p. 6). The use of such an exaggerated phonetic effect in the perception of a phonological contrast, in the absence of pre-aspiration or [s], would then indicate that post-aspiration in Andalusian Spanish is to a certain degree phonologized (see Baker et al., 2011; Beddor, 2009; Harrington & Stevens, 2014; Kirby, 2014, for similar accounts of phonologization).
In order to test whether and to what degree listeners of Andalusian Spanish make use of post-aspiration to distinguish between /t/ and /st/, a forced-choice perception experiment was conducted with a minimal pair pata-pasta that differed only in VOT. The experiment was run with younger and older listeners of Eastern and Western Andalusian Spanish. The aim was to assess whether listeners of EAS and WAS differ in the perception of the phonological contrast, and to learn whether the sound change in apparent time found for the production of /st/ (Ruch & Harrington, 2014) also takes hold in perception. The hypothesis tested is that younger and WAS listeners will distinguish pata and pasta more categorically than older and EAS listeners. The results of this perception test were further correlated with the production data of the same speakers. The findings of several studies suggest a relationship between the perception and the production of a phonological contrast in a sound change in progress (e.g., Fridland & Kendall, 2012; Harrington et al., 2008; Kleber et al., 2012). In the case of Andalusian Spanish, the question is whether a speaker who realizes the contrast between /t/ and /st/ in production based on post-aspiration is also more sensitive to post-aspiration as a cue to the /t/-/st/ distinction in perception. However, empirical work on vowel mergers illustrates how perception and production can be misaligned, that is, how a phonological contrast that a speaker perceives can be merged in that same speaker's production (Babel et al., 2013; Labov et al., 1991). Further evidence against a direct production-perception link comes from studies on compensation for coarticulation (Grosvald & Corina, 2012; Kataoka, 2011), in which the degree of perceptual compensation for coarticulation and the degree of coarticulation in production were not correlated within speakers.
An analysis of the relationship between production and perception may provide further insights into the role of perceptual or articulatory factors that contribute to the sound change.
In summary, there are three aims to this study. First, to assess whether the sound change from pre- to post-aspiration for /st/ found by an apparent-time study (Ruch & Harrington, 2014) takes hold also for /sk/- and /sp/-sequences. Second, to systematically compare velar, dental, and bilabial stops in order to understand in which contexts the sound change might have started and which articulatory factors might have brought it about. And third, to tackle the hypothetical process of phonologization of post-aspirated stops in this variety of Spanish by testing if Andalusian listeners use post-aspiration as a perceptual cue to /st/-sequences.
Ruch and Peters: On the Origin of Post-Aspirated Stops, Art. 2
2 Production
2.1 Method
To investigate the influence of stop type, age, and variety on the production of pre- and post-aspiration, 18 isolated words from 48 speakers were analyzed.² All target words were trisyllabic words with either intervocalic /s/ + voiceless stop sequences (12 target words, e.g., espada, estado, escapa) or intervocalic singleton stops (6 target words, e.g., separa, etapa, secaba), embedded into an /e_a/ context. The lexical stress in all target words fell on the second syllable so that a phonological /s/ occurred in the unstressed syllable (/esˈpada/, /esˈtado/, /esˈkapa/). Every target word was produced three times, resulting in a total number of 18 (target words) × 3 (repetitions) × 48 (speakers) = 2,592 tokens.
Table 1 contains a list of the target words.
All 48 subjects were native speakers of Andalusian Spanish; 24 were from Seville, the capital of Western Andalusia, and 24 from Granada, a city in Eastern Andalusia. For each variety, there was an older group (age range 55–79 years) and a younger group (age range 20–36 years; see Ruch & Harrington, 2014). These four speaker groups were equal in terms of gender, i.e., there were six women and six men in each speaker group. All but six subjects had lived for at least 20 years in Seville or in Granada. The remaining six speakers had lived for at least 20 years in the nearby surrounding area.
The recordings were carried out in spring 2011 in Seville or Granada, using the SpeechRecorder software (Draxler & Jänsch, 2004). One recording session consisted of a semi-directed interview, reading a text, and reading isolated words in which the target words of this study were embedded. The 18 target words were displayed individually and in a randomized order on a laptop monitor at a constant rate of just over 40 items per minute, together with 45 fillers and 118 words for a related study, resulting in a total number of 181 words per speaker. A laptop computer was used with a USB device (Cakewalk UA-25 EX CV 2 or M-Audio MobilePre) and a headset microphone (Beyerdynamic Opus 54.16-3), and the recordings were digitized at 44.1 kHz. Recordings were carried out in the phonetics laboratory at the University of Seville, in the radio studio of the University of Granada, or in a quiet room at the subjects’ residence or workplace. Before starting the interview, all speakers were asked to speak in their dialect, in a natural way as if they were talking to a friend. Despite these instructions, some speakers used a more formal speech style and produced some tokens with a full alveolar fricative [s] instead of with a lenited /s/. These tokens were removed using an acoustic procedure (see below) and were not considered in the statistical analysis since the focus of this study is not on whether, but on how /s/-aspiration is realized.
Table 1: Target words in the production study.
² The current production study shares the method and parts of its materials with Ruch and Harrington
From the 2,592 target tokens, 219 had to be discarded because of hesitations or false starts or because the speaker had produced a different word from the one displayed on the screen. The remaining 2,373 target words were segmented automatically using the Munich Automatic Segmentation System (MAuS; Schiel, 2004) on the basis of a broad phonemic transcription. The segment boundaries were then adjusted manually for the onset of V1 (V1.Onset), the onset (Cl.Onset) and offset of oral closure (Cl.Offset), and the offset of V2 (V2.Offset; see Figure 1). The boundaries of the oral closure were set where the energy decreased clearly, as inferred from the spectrogram and the waveform. The onset of V1 was set at the beginning of the first periodic waveform. All /s/-tokens were classified auditorily as [s], corresponding to a full alveolar fricative, or [h], corresponding to a weakened /s/ realized as either [h] or elided.
The onset of pre-aspiration (i.e., V1.Offset) and the offset of post-aspiration (i.e., V2.Onset) were set by an automatic procedure using two pitch trackers, one based on ESPS/Waves and one based on Scheffers (1983). The idea of this procedure is to find the offset of voicing preceding the oral stop closure, and the onset of voicing subsequent to the stop closure. For the onset of pre-aspiration, this was done by moving from left to right and starting at V1.Onset to find the first point in time where the pitch value was zero. When voicing ceased preceding the oral closure, which was the case for the majority of /sp, st, sk/ tokens, this was done based on the pitch calculated with the ESPS/Waves algorithm. When voicing extended into the oral closure (which happened in several tokens with intervocalic /p, t, k/), then the pitch calculated with the Scheffers (1983) algorithm was used instead (see Ruch & Harrington, 2014, p. 15, for details on the differences between the two pitch trackers and for further reasoning about this method).
Figure 1: Waveform and spectrogram of the word /esˈtanko/, produced by a young female speaker from Granada. The solid lines represent the manually set boundaries, the dashed lines the automatically set boundaries.
To set the offset of post-aspiration (i.e., V2.Onset), the same procedure was applied in the opposite direction, going backwards in time from right to left and starting at V2.Offset. Tokens for which no pitch could be calculated because the preceding vowel was completely voiceless or deleted were removed; this resulted in the removal of 137 tokens.
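The boundary search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as actually implemented in the study: it assumes that a pitch track is already available as parallel lists of frame times and F0 values, with 0 marking unvoiced frames. The function name `first_unvoiced_time` and the toy values are hypothetical, and the choice between the two pitch trackers is not modeled.

```python
from typing import Optional, Sequence


def first_unvoiced_time(
    times: Sequence[float],
    f0: Sequence[float],
    start: float,
    stop: float,
    backwards: bool = False,
) -> Optional[float]:
    """Return the time of the first frame with F0 == 0 between start and stop.

    Frames are scanned left to right by default and right to left when
    backwards is True. None is returned if every frame in the window is voiced.
    """
    window = [(t, p) for t, p in zip(times, f0) if start <= t <= stop]
    if backwards:
        window.reverse()
    for t, p in window:
        if p == 0:
            return t
    return None


# Toy pitch track (hypothetical values): frame times in ms, F0 in Hz.
times = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
f0 = [120, 118, 115, 0, 0, 0, 0, 0, 110, 112]

# Forward search starting at V1.Onset (0 ms) yields V1.Offset.
v1_offset = first_unvoiced_time(times, f0, start=0, stop=90)
# Backward search starting at V2.Offset (90 ms) yields V2.Onset.
v2_onset = first_unvoiced_time(times, f0, start=0, stop=90, backwards=True)
```

In this sketch a token for which the search returns `None` has no usable boundary; how such cases map onto the exclusion criteria of the study is left open here.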
The semi-automatically calculated interval between V1.Offset and Cl.Onset was defined as voice termination time (VTT), which is henceforth used to measure the duration of pre-aspiration. Accordingly, voice onset time (VOT), the interval between Cl.Offset and V2.Onset, is used to measure post-aspiration duration. It has to be kept in mind that in perception, Andalusian listeners might also rely on other acoustic details such as the voiced transition between the preceding vowel and the voiceless pre-aspiration (see Ní Chasaide, 1985, for a perception experiment with Icelandic listeners).
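As a small worked illustration of these two definitions, the durations can be computed directly from the four boundaries. The class name `TokenBoundaries`, its field names, and the example times (in ms) are hypothetical and chosen only to make the arithmetic concrete.

```python
from dataclasses import dataclass


@dataclass
class TokenBoundaries:
    """Segment boundaries of one token, in milliseconds (hypothetical layout)."""

    v1_offset: float  # end of voicing in V1 (set automatically)
    cl_onset: float   # onset of the oral closure (set manually)
    cl_offset: float  # offset of the oral closure (set manually)
    v2_onset: float   # onset of voicing in V2 (set automatically)

    @property
    def vtt(self) -> float:
        """Voice termination time: interval between V1.Offset and Cl.Onset."""
        return self.cl_onset - self.v1_offset

    @property
    def vot(self) -> float:
        """Voice onset time: interval between Cl.Offset and V2.Onset."""
        return self.v2_onset - self.cl_offset


# Invented example token: 25 ms of pre-aspiration, 30 ms of post-aspiration.
token = TokenBoundaries(v1_offset=30, cl_onset=55, cl_offset=110, v2_onset=140)
pre_aspiration_ms = token.vtt
post_aspiration_ms = token.vot
```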