Chapter 2

Intonation and emotions in Kɔnni: A preliminary study

Michael Cahill
SIL International

One of the paralinguistic functions of intonation is the use of gradient changes of pitch and duration to indicate emotional states of the speaker. This study examines the differences in pitch that accompany several different emotions in the speech of native speakers of Kɔnni. A neutral utterance was compared to the same sentence uttered as if the speakers were surprised, bored, angry, or contemptuous, or wanted to emphasize the sentence. Base pitch level, pitch range, and overall duration of the sentences were measured and compared to the neutral statement. The results of this study are compatible with those found in other languages, and add to the knowledge of how tone languages are able to express paralinguistic intonation in a systematic way.

1 Introduction

The term "intonation" does not have a universally agreed-upon definition. Some researchers either explicitly define it in terms of pitch alone or seem to assume such a definition (Lieberman 1967; most papers in Bolinger 1972; Gussenhoven 2004). As Hirst & Di Cristo (1998: 3) note, the terms "intonation" and "prosody" have often been used interchangeably in the literature. These authors spend significant time discussing the ambiguities of various terms, distinguishing "intonation proper", which deals with pitch, from the broader concept "prosody", which also includes intensity and quantity. Ladd (2008: 4) gives a useful definition which we will assume here: "the use of suprasegmental phonetic features to convey postlexical or sentence-level pragmatic meanings in a linguistically structured way" (his emphasis). Though pitch will be the most common measure referred to in this paper, duration will also be examined.

A particular instance of intonation can be either structural or paralinguistic (Gussenhoven 2004; Ladd, Scherer & Silverman 1986; Ladd 2008).
Structural intonation is categorical and phonological, indicating linguistic boundaries or morphosyntactic functions. Paralinguistic intonation involves gradient phonetic values of pitch, as well as duration and intensity, often indicating emotions and attitudes. Kɔnni has cases of each of these, and a broader range of intonational patterns is examined in Cahill (2016), but this paper's focus will be on paralinguistic intonation.

Michael Cahill. Intonation and emotions in Kɔnni: A preliminary study. In Doris L. Payne, Sara Pacchiarotti & Mokaya Bosire (eds.), Diversity in African languages, Berlin: Language Science Press. DOI: /langsci.b

Intonation in tone languages has not been studied nearly as much as in non-tonal languages, probably on the assumption that lexical and grammatical tone would override any pitch differences attributable to intonation.¹ The papers on various languages in Hirst & Di Cristo's (1998) survey include a few tone languages (Thai, Vietnamese, and Beijing Mandarin), and the papers on Thai and Vietnamese have some detailed remarks on the topic of this paper: how emotional states influence intonation. However, the overall literature on emotions and intonation in tone languages is still sparse. Green's (2009) work titled Prosody and Intonation in Non-Bantu Niger-Congo Languages: An Annotated Bibliography includes 125 works on individual languages, of which only five deal at all with intonation, and none with the emotion/intonation issues addressed here.

Tone languages often use particles or other morphosyntactic strategies, rather than pitch, to indicate grammatical functions which are indicated by pitch in other languages. Focus will serve to illustrate this difference. Narrow focus in English is indicated intonationally, with pitch as a major component: "You DROVE to the store" (that is, you didn't walk).
Cruttenden (1997: 73) notes that tone languages are less likely to use intonation as a means of focus than non-tone languages, and several recent studies affirm this. In Awutu (Lomotey 2014), a deliberate attempt to have speakers focus on one part of a sentence produced almost no pitch variation. Schwarz (2009) writes that Kɔnni and two closely related languages (Buli and Dagbani) use only morphosyntactic structure to indicate focus. Harley (2009) notes five strategies that Tuwili uses for focus. Four are morphosyntactic, though there is also a pitch-accent strategy. Even in the non-tonal African language Wolof, focus is marked by morphosyntactic means, not by intonation (Rialland & Robert 2001). Fiedler & Jannedy (2013) show that Ewe's most reliable prosodic cue for focus is not F0, but the duration of the focused element.

In light of this, the natural question is how intonation can function in a tone language, since both intonation and tone affect pitch. Cruttenden (1997: 9–10) notes four ways that tone languages may implement what he terms "superimposed intonation":

• The pitch level of the whole utterance may be raised or lowered.
• The range of pitch may be narrower or wider.
• The normal downdrift of a sentence may be suspended.
• The final tone of the utterance may be modified.

The first two of these, and sometimes the others, are paralinguistic expressions of intonation, and these are more common than structural intonation in African languages. We will see the first two of these (change in pitch level and change in pitch range) exemplified in the present study of the interaction of pitch with emotional states in Kɔnni.

Kɔnni ([kma], Gur family) has two underlying tones, High (H) and Low (L). These may combine as rising (LH) and falling (HL) contours on single syllables. A second High may be downstepped from the preceding High (H!H). This sequence can appear on adjacent syllables or on a single syllable as a second type of falling tone. The tone-bearing unit is the syllable, to which one or two autosegmental tones may associate. A detailed presentation and analysis of Kɔnni tone can be found in Cahill (2007).

¹ An exception to this is a volume entirely devoted to intonation in African languages, Downing & Rialland (2016). This includes my broader review of several intonation patterns in Kɔnni.

Cahill (2012) gives an examination of Kɔnni polar question intonation. This phenomenon is structural: the tone of the final syllable of the utterance is lowered in one of several distinct ways by adding tonal autosegments. For a final noun ending in High tone, either a L autosegment is added, resulting in a falling HL tone as in Table 1, example (a); or LH autosegments are added, resulting in a falling H!H tone as in example (b). Which pattern applies appears to be a lexical choice. If the final noun ends in a Low tone, HLH autosegments are added, in effect raising the tone before it is lowered, as in example (c). The final vowel of the syllable is also categorically lengthened.²

Table 1: Statements with corresponding polar questions

    Kɔnni                    gloss
a.  ù sìé gìlìnsìèlé         s/he is dancing gilinsiele dance
    sìé gìlìnsìèléè          is s/he dancing gilinsiele dance?
b.  ʊ ŋmɪ á gúúm! bú         s/he is rolling the rope
    ʊ ŋmɪ á gúúm! bú! ú      is s/he rolling the rope?
c.  ʊ dàwá níígè             s/he has bought a cow
    ʊ dàwá níí! gé! é        has s/he bought a cow?

Sometimes a polar question is the response given when the speaker is asked to act surprised, as we will see in some situations in this paper.

2 Methodology

The data for this study was gathered by Mr. Konlan Kpeebi of the Ghana Institute of Linguistics, Literacy, and Bible Translation (GILLBT). It was a small part of a broader project which gathered data on several aspects of Kɔnni intonation (Cahill 2016). I provided detailed instructions but was not personally present for the data gathering.
Kpeebi recorded Mr. Naaza Solomon Dintigi and Mr. Mumuni Salifu Barnabas, both men in their 20s and native speakers of Kɔnni. This was done in a recording studio in Tamale, Ghana. Specifics of the recording hardware are not available, but the recordings were free of the roosters and other outside noises so frequently encountered in field recording situations, and the quality was more than adequate for pitch and duration analysis. I am extremely grateful to all of them for their input and expertise.

² Word order in Kɔnni is, like English, SVO, so the Kɔnni examples and their English translations here can be matched essentially word for word.

These two Kɔnni speakers were verbally given a natural Kɔnni sentence and told first to say it normally (termed the "neutral" intonation here), then to repeat the same sentence, saying it as if they were experiencing various emotional states. Instructions were given in English, in which the speakers are both fluent. They repeated each utterance three times. Solomon produced one sentence with its emotional variants, and Salifu produced that sentence as well as six additional sentences with emotional variants, seven in all.

Table 2: Sentences produced by speakers

Both speakers
    ʊ dìgìwó nyʊ à               s/he has cooked yams
Salifu only
a.  ʊ gàwá! nyʊ ŋ                s/he has gone to market
b.  ʊ dààwá kpááŋ                s/he has bought oil
c.  ù dìgìwó gɪĺà                s/he has cooked eggs
d.  ù chʊ ŋwá gɪĺà               s/he has fried eggs
e.  ù dùùwó! sááŋ                s/he has eaten TZ (porridge)
f.  ʊ chɔ gɪ sɪ wó! bólíŋ        s/he has fetched fire

The speakers were asked to say each sentence as if they were surprised, bored, angry, or contemptuous, and finally to say it emphatically (emphasizing that the statement is what really happened). Studies on emotions and intonation have covered a very wide and inconsistent list of emotions, even including irony and admiration (Đố Thế Dũng, Trần Thiên Hưong & Boulakia 1998: 402).
All the emotions in this study (plus several others) were included in the study on Thai by Luksaneeyanawin (1998), and several other studies in Hirst & Di Cristo (1998) included emphasis. Surprise, anger, boredom, and emphasis were all mentioned by Ladd (2008) as emotions that have been the subject of intonation studies. Considering my previous background in Ghana, as well as these fairly common prior mentions of these emotions in intonation studies, the particular emotions chosen for this study were a reasonably practical subset of the possible states to elicit.

The response sentences had the same word order and lexical and grammatical tones as the input neutral sentences, with one partial exception. The surprise response often resulted in the speakers producing a polar question. That is, "S/he went to market", expressed with surprise, became "S/he went to market?" These are discussed somewhat separately in this study.

To test the phonetic variation of pitch in the utterances, the pitch range and base pitch level were measured.³ This was done by measuring the frequency at two positions in each utterance: the initial Low tone of the sentence, and the first High tone in the sentence.⁴ The initial Low is labeled as the base level, and the difference between this and the first High is the range. All the input utterances of this study had a Low-toned pronoun sentence-initially, and a High-toned verb suffix two or three syllables later, as exemplified in Figure 1. The syllables which were measured are bolded and underlined.

³ All recordings were analyzed using SIL's Speech Analyzer program, available as free download at http: //

Figure 1: ù dìì-wó! sááŋ 's/he ate TZ (a type of porridge)'

The frequency, in Hertz,⁵ is read directly off a part of the graph not included in Figure 1. The frequency was read at either the stable part of the vowel or, lacking a flat portion of the frequency, at the midpoint of the vowel.
The base pitch level in Figure 1 is the frequency at the left cursor, i.e. at the Low-toned [ù]. The pitch range is the difference between this Low and the High of the [wó] syllable at the right cursor.

Duration has also been found relevant in studies of other African languages (Hyman & Monaka 2011; Fiedler & Jannedy 2013), even when pitch is not directly involved. The duration of the entire sentence was therefore also measured, the distance between two cursors again being read directly off a part of the graph not included in Figure 1.

Regular and systematic differences were found between the neutral form of the utterance and the various emotional states for which data was gathered. We turn now to these.

⁴ An alternative would be to measure the maximal pitch range, that is, the highest and lowest pitch in the sentence. This was not done because of downdrift. As is common in African languages, there is a continual downdrift of High tones after a Low, so that in a H-L-H-L-H-L-H sequence the last H is considerably lower than the first H, and in a longer sentence the last High may even be lower in pitch than the initial Low tone. Downdrift is also the reason why an average pitch was not taken across the sentence; the longer the sentence, the lower the average pitch.

⁵ Reading the data in semitones is also an option in Speech Analyzer, and semitones would have been reported if there had been a mixed-gender sample.

3 Results

All the numbers reported in the following tables are averages of three utterances of that particular sentence. Frequencies are reported in Hertz (Hz), and duration is reported in milliseconds (ms). Other abbreviations in the tables are:

• exp = expanded range (a larger L-H difference than the neutral)
• cont = contracted range (a smaller L-H difference than the neutral)
• l (as a prefix, e.g. l-exp) = only slightly more of that quality, noticeable but with perhaps marginal significance
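The measures defined above (base level, range, and averaging over three repetitions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in Speech Analyzer: the function names `summarize` and `range_label` are hypothetical, the input numbers are invented placeholders rather than measurements from this study, and the semitone conversion uses the standard formula 12·log2(H/L), which the study mentions as an option but does not report. Because the text gives no cutoff for the "l" (slight) prefix, the sketch only distinguishes expanded from contracted.

```python
import math
from statistics import mean


def summarize(reps):
    """Average repeated tokens of one sentence.

    reps: list of (low_hz, high_hz, duration_ms) tuples, one per repetition,
    where low_hz is the initial Low tone and high_hz the first High tone.
    """
    low = mean(r[0] for r in reps)
    high = mean(r[1] for r in reps)
    return {
        "base_hz": low,                           # base pitch level
        "range_hz": high - low,                   # pitch range (H-L)
        "range_st": 12 * math.log2(high / low),   # same interval in semitones
        "duration_ms": mean(r[2] for r in reps),
    }


def range_label(emotive, neutral):
    """Return 'exp' or 'cont' by comparing H-L ranges; no 'l-' threshold is assumed."""
    diff = emotive["range_hz"] - neutral["range_hz"]
    if diff > 0:
        return "exp"
    if diff < 0:
        return "cont"
    return "same"


# Placeholder values for illustration only (NOT data from the study).
neutral = summarize([(120.0, 150.0, 800.0), (118.0, 149.0, 810.0), (122.0, 151.0, 790.0)])
angry = summarize([(130.0, 175.0, 760.0), (132.0, 178.0, 750.0), (128.0, 172.0, 770.0)])

print(neutral)
print(angry, range_label(angry, neutral))
```

Comparing ranges in semitones rather than Hertz would matter mainly when speakers differ greatly in base pitch, which is the mixed-gender case that footnote 5 alludes to.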
As noted before, the requested surprise intonation often elicited a polar question as the response. In terms of structural vs. paralinguistic intonation, the polar question exhibits both. As briefly mentioned above and detailed in Cahill (2012), a polar question in Kɔnni is not only phonetically raised in pitch (paralinguistic), but is analyzable phonologically in terms of autosegments added to the neutral sentence (structural), and has one of several varieties of a falling tone on the final syllable. That syllable is categorically lengthened, and this accounts for the total duration of the surprise intonation being lengthened in all the measurements to follow (thus the label "longer" rather than "slower").

We begin with a detailed examination of results from one sentence, with separate tables for the two speakers. Each figure in the cells is the average from three repetitions. The columns L (Hz), H (Hz) and duration are all direct measurements, with the range (H-L) being derived from the first two. The last column sums up, in general terms, the difference between that emotional state and the neutral base form.

Table 3: "S/he has cooked yams" ù dìgìwó nyʊ à (Solomon)

                   L (Hz)   H (Hz)   range (H-L)   duration (ms)   compared to neutral
neutral
bored                                                              l-exp, l-slower
angry                                                              higher, l-exp, faster
contemptuous                                                       higher, l-exp, faster
emphatic                                                           l-higher, exp, faster
surprise (no Q)                                                    higher, l-exp, faster

The first thing to note is that the two speakers had a few seemingly categorical differences in their expressions. Especially noteworthy is that the bored expression was slower than the neutral one for Solomon and faster for Salifu. Also, Solomon's angry and emphatic expressions were faster than Salifu's. There was other minor variation, but the main difference between the speakers was speed in three utterances.
Table 4: "S/he has cooked yams" ù dìgìwó nyʊ à (Salifu)

                   L (Hz)   H (Hz)   range (H-L)   duration (ms)   compared to neutral
neutral
bored                                                              l-higher, l-exp, faster
angry                                                              higher, exp
contemptuous                                                       l-higher, l-exp
emphatic                                                           higher, exp, slower
surprise (ques)                                                    higher, exp, longer

But more broadly, both speakers had results consistent with each other in that:

• Bored was slightly expanded in range, a definite but not robust result.
• Angry was definitely higher with an expanded range.
• Contemptuous was slightly higher and slightly expanded, again definite but not robust.
• Emphatic was higher, with an expanded range.

On the surprise intonation, Salifu consistently responded by turning the statement into a polar (yes/no) question ("She has cooked yams?"). Solomon, however, uttered a non-question surprise intonation, which was higher and faster. It seems likely that, pragmatically, the polar question is a more natural response to a surprising situation, but this cannot be verified at this point. Also, the pitch in a polar question in isolation is higher than in the corresponding statement, but the pitch in a polar question when someone is surprised is higher still (Cahill 2012), and these two are quite distinguishable. It is the second situation that was produced and examined here.

Next we turn to a variety of input sentences, with the results of speaker Salifu. These are the same ones listed in Table 2. Figure 2 shows the aggregate results for pitch of the six sentences that Salifu repeated with neutral intonation and various emotional states. For this, the raw data measurements were used and combined. (Sentence-by-sentence summary tables are in the Appendix.) For each emotional state, the bottom measure is the initial Low tone of the sentence, and the second measure is the first High tone. Bars represent one standard deviation above and below the average.
As can be seen, the bored and contemptuous states have approximately the same starting pitch as neutral, but with the High of the sentences slightly lower than the neutral, they have a slightly narrower range. The angry and emphatic states start slightly higher than neutral, but have a significantly larger pitch range. The surprise intonation starts at the highest pitch of all (recall that Salifu turned this into a question), and also has a significantly larger pitch range than the neutral. At this point, as a rough approximation, the bored and contemptuous intonations appear quite similar to each other, as do the angry and emphatic, while surprise stands somewhat apart.

Figure 2: Base pitch and first H tone: a measure of height and of range

Measurements of duration must be done sentence by sentence, since the target sentences are not uniform in syllable count. We would expect ʊ chɔ gɪ sɪ wó! bólíŋ, with seven syllables, to have an inherently longer duration than the four syllables of ʊ gàwá! nyʊ ŋ, and indeed in the neutral form they average 801 vs. 651 ms respectively. Thus the ratio of the various emotive sentences to the neutral one is what is revealing, and these ratios are presented in Figure 3.

Figure 3: Ratio of duration of emotive sentences to corresponding neutral sentence

The duration of the surprise question is partly due to the extra mora added at the end of a polar question, as illustrated in Table 1. If 100 ms is subtracted from the average duration to account for this extra vowel, the duration of the surprise sentences drops closer to the range of the neutral.

The duration of the emphatic sentence is worth singling out, as it contrasts with the other sentences (ignoring surprise for now) in being longer in duration than neutral.

Several observations can be made on the basis of Figure 2 and Figure 3. Again, actual data for these sentences is found in the Appendix.
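Before turning to those observations, the duration normalization described above can be made concrete with a small Python sketch. It is an illustration only: the function name `duration_ratio` and its parameters are hypothetical, the two neutral durations (801 ms and 651 ms) are the averages quoted in the text, and the emotive durations (700 ms and 950 ms) are invented placeholders, not measurements from this study. The optional 100 ms subtraction mirrors the adjustment suggested for the extra final vowel of a polar question.

```python
def duration_ratio(emotive_ms, neutral_ms, final_vowel_ms=0.0):
    """Ratio of an emotive sentence's duration to its neutral counterpart.

    final_vowel_ms optionally removes the extra final vowel of a polar
    question before the ratio is taken.
    """
    if neutral_ms <= 0:
        raise ValueError("neutral duration must be positive")
    return (emotive_ms - final_vowel_ms) / neutral_ms


# Neutral averages quoted in the text for the seven- and four-syllable sentences.
NEUTRAL_MS = {"seven_syllable": 801.0, "four_syllable": 651.0}

# Placeholder emotive duration (NOT from the study): the same raw duration
# falls below one neutral baseline and above the other.
HYPOTHETICAL_EMOTIVE_MS = 700.0
ratios = {
    name: duration_ratio(HYPOTHETICAL_EMOTIVE_MS, neutral)
    for name, neutral in NEUTRAL_MS.items()
}

# Placeholder surprise-question duration (NOT from the study), with and
# without the 100 ms allowance for the added final vowel.
HYPOTHETICAL_SURPRISE_MS = 950.0
raw = duration_ratio(HYPOTHETICAL_SURPRISE_MS, NEUTRAL_MS["seven_syllable"])
adjusted = duration_ratio(
    HYPOTHETICAL_SURPRISE_MS, NEUTRAL_MS["seven_syllable"], final_vowel_ms=100.0
)

print(ratios)
print(raw, adjusted)
```

The first comparison shows why raw durations cannot be pooled across sentences of different length: an identical 700 ms utterance counts as faster than neutral for the longer sentence and slower than neutral for the shorter one.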
• The bored expressions in Salifu's speech were consistently faster than the neutral ones, and most of the time had a contracted range. No consistent pattern of raising or lowering the base pitch was found.
• The contemptuous expressions in Salifu's speech varied in speed, but were generally faster than neutral, and most of the time had a contracted range. Again, no consistent pattern of raising or lowering was found. The bored and contemptuous patterns thus were quite similar to each other.
• The angry expression was sometimes higher than ne
