Intonation cues to English public discourse perception

Abstract

The study considers the discourse functions of intonation and identifies a few problems and contradictions in the current theories of intonation functioning. Particular attention is paid to such issues as the multiple functions of nuclear tones, pitch declination in utterances and perceptual identification of spoken paragraphs. An attempt has been made to handle these problems in the context of English public speeches delivered in the format of TED talks. The purpose of the auditory analysis of discourse intonation in these talks is to check the perceptual reliability of intonation cues in processing spoken discourse and the possibility of using intonation as an on-line perception strategy. The methods applied in the research are descriptive, auditory and comparative, supported by a certain amount of quantitative data. The results obtained in the auditory analysis show that intonation cues work most effectively at the level of intonation groups and utterances but are not self-sufficient in paragraph identification. The leading function of nuclear tones in public discourse organization turns out to be the information structuring of utterances, which is occasionally interrupted by the attitudinal function. Pitch declination has been found to be one of the most important cues to the integrity and cohesion of utterances, whose average length is three to four intonation groups. Multiple declinations in spoken utterances are infrequent and are triggered by particular types of syntactic relations in elongated sentences. The study contributes to the linguistic description of discourse intonation, and its results can be beneficial for language teaching practice and automated speech synthesis.

Introduction

Intonation has long been viewed as an important linguistic means facilitating spoken discourse production and perception. It is responsible for marking global and local relations between subsequent discourse units, being a cue to segmentation and cohesion of texts [Wennerstrom 2001; Wichmann 2013]; it contributes to revealing the information structure of utterances, their theme-rheme components, given and new information [Brazil, Coulthard, Johns 1980; Halliday, Matthiessen 2004; Fawcett 2007]; it can even signal certain types of rhetorical relations between neighbouring discourse units [Asher, Vieu 2005; Kleinhans et al. 2017; Riester, Nápoles, Hoek 2021]; it can bear some stylistic and pragmatic implications [Kalita 2019; Mitrofanova 2022]. The aim of the present study is to provide an overview of the theoretical framework for the analysis of intonation in spoken discourse, spotting particular problems and contradictions, and to use this framework in the auditory evaluation of English public speeches to check its reliability and effectiveness as a perception strategy.

By intonation we mean, first and foremost, speech melody created by pitch variations in speech flow, although a wider notion of it, embracing pitch, tempo, loudness, sentence stress and voice timbre, is also widely applied in phonetics and termed 'prosody' [Crystal 1969]. Suprasegmental and continuous, melody nevertheless includes perceptually bright discrete points called nuclear tones which can be defined as considerable changes of pitch on the last stressed word of the intonation unit. The intensive pitch changes are also supported by increased values of duration and loudness on nuclear syllables or words. Nuclear tones are not only perceptually prominent, they are most actively involved in the performance of all the intonation functions and form paradigmatic rows of contours and meanings for different functions.

The intonation group created and delineated by a nuclear tone is considered by some scholars to be the basic unit of discourse production and perception, as its length corresponds to the volume of human working memory and one focus of consciousness, while extended utterances with a few intonation groups are coordinated by the so-called superfocus of consciousness [Chafe 1994]. We will follow this point of view on the basic role of the intonation group in the structure of spoken discourse.

Apart from dividing speech stream into basic discourse units and providing a certain degree of connection between them, nuclear tones are also involved in the expression of the information structure of utterances, that is marking the semantic centre of every intonation group and identifying each of them as belonging to a theme or rheme component. This actual division into a theme and a rheme is primarily performed at the level of utterances, not intonation groups [Firbas 1992]. The type of tone in an intonation group indicates if it is referred to by the speaker as part of a theme or a rheme. The theme is expected to be pronounced with the falling-rising nuclear tone [Brazil, Coulthard, Johns 1980; Halliday, Matthiessen 2004], characterized by the researchers as referring. While a theme contains information accessible from the preceding context or circumstances, the rheme expresses new information, added by the speaker to what is taken to be known. The semantic centre of the rheme, therefore, is marked by the proclaiming falling tone [Brazil, Coulthard, Johns 1980]. In the framework of this function, a rising nuclear tone can emerge in a non-final intonation group of a theme or a rheme component to integrate them into a unified structure. The theme component can vary in length and be occasionally reduced to a stressed or unstressed word joining the rheme. This kind of null or atonic theme is found, for instance, in sentences with personal pronouns in the initial position. Besides, themes may be of various types: ideational, contextual and interpersonal [Fawcett 2007].

The problem with nuclear tones is determined by their multi-functional character. The orderly system of tones and their meanings presented above is likely to be shattered if a speaker chooses to express their personal stance, involvement or passion, which will lead to a large number of falling tones substituting the other types. The speaker's subjective stance, their concentration on self-expression paves the way for another communicative function – attitudinal, which is realized through a different paradigmatic line of tones and their meanings [Mitrofanova 2022]. In the context of our research material, we are going to find out what function of nuclear tones predominates in public speaking or, perhaps, how different functions interact.

Another pitch component which is actively involved in organising spoken discourse is the pitch range consisting of relative pitch levels. The pitch range can be described as a span between the lowest and the highest points of pitch modulations, which are treated as the baseline and the top-line respectively, creating the intermediate area for pitch variation. English has been found to have a wider pitch range compared to many other languages, and native English speakers demonstrate more refined pitch distinctions than non-native English learners [Clark 1999; Bus, Cardoso, Kennedy 2015], although the pitch range is subject to stylistic modifications and narrowing [Ayers 1994; Dilley 2010]. The pitch range is used and structured by speakers in an orderly way. In a number of studies [Wennerstrom 2001; Ladd 2008; Wichmann 2013], it has been established that all the intonation units in discourse – intonation groups, utterances and spoken paragraphs – are realised with declination, that is “the tendency of pitch to gradually fall in the course of an utterance and across higher units of discourse – paragraph declination” [Wichmann 2013, p. 6]. For example, acoustic measurements of read news items showed that a new paragraph (item) regularly began with an extra-high initial accent in the region of 240 Hz, while the first accented syllable in other sentences only reached 140-150 Hz [Wichmann 2013, p. 42]. In another reading experiment, the paragraph-initial accent was at 315 Hz and the intermediate utterances started at a stable level of about 256 Hz [Levis, Pickering 2004, p. 59]. In spite of the differences in the absolute pitch values in the two observations, the contrast between the paragraph-initial accent and the other utterance-initial accents is present in both. Paragraphs appear to be prosodically marked by an extra-high pitch reset on the first accent and an extra-low final falling tone suggesting a high degree of finality.
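Pitch contrasts of this kind are often compared on a logarithmic (semitone) scale rather than in raw hertz, since perceived intervals depend on frequency ratios. The following is a minimal sketch and not part of the original study: the function name `semitone_interval` is hypothetical, and 145 Hz is an assumed midpoint of the 140-150 Hz range quoted above.

```python
import math


def semitone_interval(f_high_hz: float, f_low_hz: float) -> float:
    """Return the interval between two frequencies in semitones (12 per octave)."""
    if f_high_hz <= 0 or f_low_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Approximate values quoted in the text, in Hz:
# (paragraph-initial accent, initial accent of other utterances)
observations = {
    "Wichmann 2013": (240.0, 145.0),  # 145 Hz is an assumed midpoint of 140-150 Hz
    "Levis, Pickering 2004": (315.0, 256.0),
}

for source, (paragraph_initial, utterance_initial) in observations.items():
    interval = semitone_interval(paragraph_initial, utterance_initial)
    print(f"{source}: paragraph-initial accent is {interval:.1f} semitones higher")
```

Both intervals are positive, so the paragraph-initial accent is the higher one in each data set.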

Declination is also observed at the level of extended utterances consisting of a number of intonation groups. The first intonation group in such an utterance has been found to have a greater pitch range and higher mean pitch than the other groups. If the mean pitch of the initial group is 200 Hz, the corresponding measurement for medial tone groups is 165–170 Hz [Clark 1999, p. 70]. According to such observations, every utterance appears to be realised as one declination. However, there is a suggestion that there may be a few of them, particularly in a long utterance read aloud. Thus, S. Nooteboom states that in longer sentences we often come across a declination reset, marking a new chunk of speech [Nooteboom 1997]. So, due to the different views on the character of utterance declination, this problem needs further consideration. Handling this issue about the number of declinations in an utterance is one of the objectives of the present research.

Studying intonation as a means of showing semantic connection between discourse units, some researchers go further and try to reveal particular types of dependence disclosed by intonation patterns, such as elaboration, background, result, consequence, parallel, contrast and other rhetorical relations [Jasinskaja, Mayer, Schlangen 2004]. However, a few recent studies have reported a great deal of overlapping between prosodic patterns involved in their realisation [Asher, Vieu 2005; Kleinhans et al. 2017; Riester, Nápoles, Hoek 2021]. As this issue requires further consideration and testing, it is not included in our perceptual analysis.

The purpose of this research is to apply the presented theoretical basis to intonation analysis in a particular communicative context in order to assess its validity in perceptual terms and to find some intonation peculiarities in this type of discourse. The research objectives are as follows:

1/ to find out the dominating function of nuclear tones within intonation groups;

2/ to analyse declinations in utterances and to see if multiple declinations occur in public discourse;

3/ to check if paratones (paragraph declinations) can serve as a key to dividing spoken texts into paragraphs.

Handling these issues may contribute to the solution of some practical problems in the areas of foreign language teaching (FLT) and automated speech synthesis. Public speaking is often used in FLT both as a teaching resource to develop listening and speaking skills and as a way to test the level of prepared speaking. Most textbooks on public speaking concentrate on the psycholinguistic aspects of speech preparation and delivery but do not treat intonation in any detail, whereas intonation could be a structural basis for the preparation and presentation of a spoken text. Speech synthesis can also benefit from phonetic research and new phonetic facts; on the other hand, it can verify them and show whether the research has involved enough detail and captured the effect correctly [Clark 1998].

 

Materials and methods

Public discourse has been chosen for research material because we expect it to demonstrate discourse intonation features to the maximum extent, as in these conditions, speakers do their best to put across ideas as clearly, distinctly and expressively as possible. We deal here with cases of prepared speaking, as public speeches are always carefully planned and even rehearsed beforehand. The considerable amount of phonetic research done on public speeches so far has often concentrated on their elocutionary, rhetorical potential and structure, but not on the way intonation helps to achieve the phonetic clarity of speech, which is a special challenge for foreign learners of English. In addition to that, researchers have often compared the intonation organisation of spontaneous speech and reading aloud [Ayers 1994; Swerts, Strangert, Heldner 1996; Tǿndering 2011]; much less attention has been paid to intonation differences between spontaneous and prepared speech.

For listening material we selected ten public speeches from the Internet platform TED Talks (technology, entertainment, design), where speakers raise urgent contemporary problems and put forward solutions to them speaking in front of a big audience. Generally, TED talks pursue the goals to inform, persuade and entertain the listeners. These goals cannot be achieved without a skillful use of intonation. The ten talks (each 10–12 minutes long) were selected at random and had been given by male and female speakers of British, American and Australian variants of English. The major phonetic differences between them lay in speech sounds; the similarity of the communicative goals accounted for no considerable differences in intonation.

The methods used in the research are descriptive, auditory and comparative, supplemented with elementary mathematical calculations. Auditory intonation analysis was preferred to electro-acoustic description because the aim was to obtain results applicable in teaching practice.

At the preliminary listening stage, the spoken texts were supplied with written scripts which were annotated with the help of standard symbols reflecting their intonation: (|) – pauses between intonation groups inside utterances, (||) – boundaries between declinations inside utterances, (|||) – boundaries between spoken paragraphs, (') – stressed word or syllable, (\) – falling nuclear tone, (/) – rising nuclear tone, (\/) – falling-rising nuclear tone. The annotation was based on the researcher's listening observations, subsequently confirmed by three invited auditors, experienced teachers of English phonetics.
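The annotation symbols listed above lend themselves to simple automatic counting. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the helper names `split_intonation_groups` and `nuclear_tone` are hypothetical, and the code assumes that each intonation group carries at most one nuclear tone mark and that the last tone mark in a group is the nuclear one. The sample utterance is the opening sentence of the talk by A. Shariff as transcribed in this paper.

```python
import re
from collections import Counter


def split_intonation_groups(utterance: str) -> list[str]:
    """Split an annotated utterance into intonation groups.

    Any run of '|' characters ('|', '||' or '|||') ends an intonation group.
    """
    parts = re.split(r"\|+", utterance)
    return [part.strip() for part in parts if part.strip()]


def nuclear_tone(group: str) -> str:
    """Classify the nuclear tone by the last tone mark found in the group."""
    # Try the two-character fall-rise mark first so '\/' is not read as '\' + '/'.
    marks = re.findall(r"\\/|\\|/", group)
    if not marks:
        return "unmarked"
    return {"\\/": "fall-rise", "\\": "fall", "/": "rise"}[marks[-1]]


# A raw string keeps the backslashes of the tone marks intact.
example = (
    r"I'magine for a \second | that your 'job was 'made re\dundant | "
    r"by an ad'vanced 'piece of \software | that could 'do the \work | "
    r"at the 'same 'level of \quality | for \free"
)

groups = split_intonation_groups(example)
tones = Counter(nuclear_tone(group) for group in groups)
print(len(groups), dict(tones))
```

For this utterance the sketch finds six intonation groups, all with falling nuclear tones.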

At the second stage, the three auditors listened to the talks and answered the following questions:

1/ Is it easy to identify intonation groups? What helps to recognise their boundaries?

a/ the nuclear tone; b/ the high-pitch initial accent (stress); c/ the pitch contrast between the low ending of one intonation group and the high beginning of the following one; d/ the physical pause.

2/ Is it easy to identify utterances that could make complete sentences in writing? Is every utterance integrated into a whole on the basis of a gradual pitch declination? Are there utterances containing more than one declination? If there are, mark the place of declination reset in them.

3/ Try to divide the speech into paragraphs. What are the most helpful cues to paragraph boundaries?

a/ the extra-high pitch of the initial accent (stress); b/ the extra-low ending of the final falling tone; c/ the increased length of the pause; d/ lexical means – initial discourse markers.

The listening results were discussed with the auditors and generalized into regularities, supported by some quantitative data.

 

Results and discussion

Intonation groups

We will start generalizing our observations at the level of the smallest discourse units – intonation groups. According to the auditors' assessment, it is quite easy to identify intonation groups in public speaking because they are delineated by several bright cues: nuclear tones, high initial accents and pitch level contrasts at the boundary. Pauses between intonation groups do not appear to be a reliable guide, with some functional pauses being almost imperceptible and haphazard hesitation pauses occurring inside intonation groups. If listeners follow a succession of nuclear tones, they cannot fail to identify the highlighted information centres and at the same time the boundaries of intonation groups. Thus, the initial utterance of the talk by A. Shariff consists of 6 intonation groups, all of them quite short, with one, two or three stressed words. The initially accented words imagine, job, advanced, do, same are pronounced at a high pitch level, with imagine being the highest at the beginning of the utterance and the paragraph. All the nuclear tones in this series are falling ones, the first five incomplete and the last one on 'free' complete, reaching the baseline of the speaker's pitch range.

I'magine for a \second | that your 'job was 'made re\dundant | by an ad'vanced 'piece of \software | that could 'do the \work | at the 'same 'level of \quality | for \free (Shariff 2023).

Although nuclear tones are easy to identify, their functions and meanings may be difficult to interpret. Our research material provides evidence that information structuring may be viewed as the basic default function of nuclear tones in public speaking. Large chunks across all the talks consistently demonstrate the predominant use of falling-rising tones in thematic intonation groups and falling tones on the semantic centres of rhemes. Here is a sample of such a chunk from the same speech.

The anthro'pologist 'David \/Graeber | 'wondered how \/capitalism | could su'stain 'so many of what he 'bluntly called \bullshit jobs. 'These are \/jobs | in which 'even the 'people 'doing the \/work | 'see it as \pointless, | a'ccomplishing 'nothing of so'cietal \worth. A capita'listic \/system | should 'root 'out those ine\fficiencies | but it \doesn't. And the \/reason it doesn't | is because a'longside \/capitalism | we 'also 'operate 'under a\nother system, | what the 'journalist 'Derek \/Thomson | 'calls \workism. \/Workism | is about your \/job | 'not just being the 'source of your \/paycheck, | but the 'source of your i\dentity | and your 'pathway to .self-actuali\sation (Shariff 2023).

The falling-rising tone in the sequence of utterances above allows the listener to easily identify their thematic component and leads them through it to the more important rhematic part. The theme here consists of one to three intonation groups and is never reduced to one stressed or unstressed word joining the rheme, although the latter option is possible with the ideational theme becoming more familiar.

In some thematic clusters, the attitudinal function of intonation outweighs its information structuring capacity and, instead of falling-rising tones, we hear incomplete and complete falls. The most typical attitude expressed in the context of a public speech is an appeal to the listeners to support the speaker's position, some idea the speaker feels strongly about. Such an appeal is likely to occur at the opening or closing stage of a speech. For instance, A. Shariff starts his talk with the sentence we have already commented on above, in which the falling tones appear to be the best choice for drawing the listeners' attention to the dilemma – to work or not to work. The same speaker also ends his talk in an appealing way, pronouncing even the sentence-initial if-clauses, typically associated with falling-rising tones, with falling ones.

Nuclear tones also change when people refer to their personal experiences or reminiscences, as is evident in an extract from the talk by S. Sinek, where the speaker is relating an incident at the airport with falling tones. However, after finishing this episode, the person returns to the combination of a falling-rising and a falling tone in his generalising judgments.

I was 'flying on a \trip, | and I was 'witness to an \incident | where a \/passenger | a'ttempted to \board | before their 'number was \called, | and I 'watched the 'gate agent 'treat this \man | like he had 'broken the \law, | like a \criminal. He was \yelled at | for a'ttempting to 'board 'one 'group ↑too \soon. I \said, “Why do you 'have to 'treat us like \cattle? 'Why 'can't you 'treat us like \human beings?” And 'this is e'xactly what she \said to me. She \said, “\/Sir, | if I 'don't 'follow the \/rules, | I could 'get in \trouble | or 'lose my \job”. 'All she was \/telling me | is that she 'doesn't 'feel \safe (Sinek 2014).

These observations show that the phonetic clarity of a public speech depends on the clear boundaries between intonation groups provided by nuclear tones, with the types of tones used in them indicating primarily the information structuring of the utterance, although this function may occasionally be interfered with by the other intonation functions, particularly attitudinal. The list of interfering factors remains open and requires further observations.

Utterances

Intonation groups are integrated into utterances which can be roughly defined as pronounced sentences. They are grammatically complete and carry more complicated pieces of information than their building blocks – intonation groups. Table 1 generalises some structural quantitative data including the number of paragraphs, utterances and intonation groups in the ten talks, the mean utterance length measured in intonation groups, the number of utterances with multiple declinations (or pitch resets) and the frequency of discourse markers introducing paragraphs.

The last two columns reveal two important characteristics of utterances – the mean number of intonation groups per utterance and the number of utterances with multiple declinations. According to the calculations, the average utterance in a public speech is likely to consist of 3-4 intonation groups (mean 3.56). Presumably, speakers intuitively find this number to be comfortable for their oral speech production and listeners' perception. However, particular utterances demonstrate certain deviations from this mean, the shortest consisting of one intonation group and the longest extending to as many as 12. Both cases are extreme and rare. The former case emerges in questions, important concise statements and conclusions pronounced in one intonation group. Elongated utterances, on the other hand, are typically brought about by the introduction of homogeneous parts of sentences, enumerations or lists. This can be observed in the example below – a syntactically complex sentence including 12 intonation groups and two series of lists.

And the 'story was \/so embedded | that when re'searchers 'looked at the 'names of /trees, | /birds, | /flowers | and 'other \/keywords relating to nature, | 'used across 'millions of /books, | /songs | and /movies, | from 19'00 to 20\/14 | they 'found a dra'matic de\cline | in the 'use of those \words | across 'that \period (Gameau 2022).

Alongside the common tendency, however, there are individual variations in the utterance length. It can be noticed in Table 1 that Speech 3 is characterised by the shortest average utterance – 2.8 intonation groups, while Speech 8 has the highest measurement – 4.97. Building shorter utterances, a speaker renders an idea in a more discrete way. For instance, the following extract enumerates different causes of danger and stress in the modern world, presenting each cause in a separate short sentence.

The 'modern \/day | is e'xactly the 'same \thing. The \/world | is 'filled with \danger, | 'things that are 'trying to frus'trate our \lives | or re'duce our su\ccess, | re'duce our oppor'tunity for su\ccess. It could be the 'ups and 'downs in the e/conomy, | the un'certainty of the /stock market. It could be a 'new tech\nology | that 'renders your /business model | 'obsolete over\night. Or it could be your compe\tition | that is 'sometimes 'trying to \kill you. It's 'sometimes 'trying to 'put you 'out of \business, | but at the 'very \/minimum | is 'working \hard | to frus'trate your \growth | and 'steal your \business from you. We have 'no con\trol over these forces. 'These are a \constant, | and they are 'not going a\way (Sinek 2014).

The opposite elongated example from Speech 8 contains a brief summary of recent scientific advances in neuroscience, mentioning two varieties of MRI and the types of research questions they can help to answer. The first sentence is so long that it is pronounced with three declinations (that is, with two declination resets), separated with the symbol || and joined by the conjunctions so and and. The second sentence also starts with the conjunction and, which provides a closer connection between the two sentences. On the whole, the following chunk sounds less structured than the preceding one.

In the 'past \/decade or so, | 'mainly 'due to ad'vances in 'brain \/imaging technology | such as mag'netic \/resonance imaging, | or MR\/I, | \/neuroscientists | have \started | to 'look in'side the 'living 'human \brain | of 'all \ages, | and to 'track 'changes in 'brain /structure | and 'brain \function, || so we 'use /structural MRI | if you'd 'like to 'take a \/snapshot, | a \/photograph, | at 'really 'high reso\lution | of the 'inside of the 'living 'human /brain, || and we can 'ask \questions like | 'how much \gray matter does the brain contain, | and 'how does 'that \change | with \age. And we 'also use \functional MRI, | 'called \fMRI, | to 'take a \video, | a \movie, | of 'brain ac\tivity | when par\/ticipants | are 'taking /part | in some 'kind of \task | like \thinking | or \feeling | or per\ceiving something (Blakemore 2012).

In the auditors' opinion, long and short utterances make different impressions on the listener: shorter utterances recurring in a sequence sound matter-of-fact and businesslike, whereas longer sentences make the chunk sound expository and argumentative. Utterances of different length create different rhythms and modes of perception [Borisova, Daineko 2023].

Answering the research questions about the perception of utterances, the auditors find their identification quite easy due to the high-pitched initial accent, the extra-low ending of the final falling tone and the declination throughout the whole utterance, which holds all the intonation groups together. While a single declination appears to be a universal feature of all the utterances, multiple declinations take place rarely. Table 1 indicates that there were 58 utterances with more than one declination, including 51 utterances with two declinations, 5 utterances with three and 2 utterances with four. Three talks did not contain any multiple declinations.
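The counts reported above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the total number of declination resets is a derived figure that is not reported in the study; it follows from the assumption that an utterance with n declinations contains n - 1 resets.

```python
# Utterances with multiple declinations, as reported in the text:
# number of declinations in the utterance -> number of such utterances
multi_declination_counts = {2: 51, 3: 5, 4: 2}

# Total number of utterances with more than one declination.
total_multi = sum(multi_declination_counts.values())

# An utterance with n declinations contains n - 1 declination resets.
total_resets = sum(
    (n_declinations - 1) * n_utterances
    for n_declinations, n_utterances in multi_declination_counts.items()
)

print(total_multi)   # utterances with more than one declination
print(total_resets)  # declination resets implied by these counts
```

The first figure reproduces the 58 utterances mentioned above.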

Table 2 shows the types of syntactic structures which are likely to trigger a new declination or pitch reset in extended utterances.

The table presents the syntactic parts of sentences which tend to form a declination reset when they extend over more than one intonation group. It is evident that contrastive chunks start a new declination more often than the other types, which is quite predictable as they are semantically opposed to the preceding chunk. The utterance below has three declinations, with two declination resets introduced by the contrastive conjunctions but and and yet.

And the 'central i\/dea of this work | is that the 'human 'mind and \/brain | is 'not a /single, | 'general-purpose pro\cessor, || but a co\/llection | of 'highly 'specialised com\ponents, | 'each 'solving a 'different 'specific \problem, || and yet co\/llectively | 'making /up | 'who we \are | as 'human \beings | and \thinkers (Kanwisher 2014).

In this sentence, the intonation groups but a co\/llection and and yet co\/llectively do not continue the preceding pitch declinations but are pronounced with a pitch step-up which starts a new declination. As a result, the sentence has three declinations which make it easier for the listener to catch the semantic contrasts. It is noteworthy that the average number of intonation groups in each declination is three or four, like in an independent sentence with one declination.
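The number of intonation groups per declination can be read directly off the annotation. The following is a minimal sketch, assuming a single annotated utterance in which '||' separates declinations and '|' separates intonation groups (no '|||' paragraph marks); the function name `groups_per_declination` is hypothetical, and the input is the Kanwisher sentence as transcribed above.

```python
def groups_per_declination(utterance: str) -> list[int]:
    """Count the intonation groups in each declination of one annotated utterance.

    Assumes '||' separates declinations and a single '|' separates intonation
    groups; paragraph boundaries ('|||') are not expected inside an utterance.
    """
    counts = []
    for declination in utterance.split("||"):
        groups = [group for group in declination.split("|") if group.strip()]
        counts.append(len(groups))
    return counts


# A raw string keeps the backslashes of the tone marks intact.
sentence = (
    r"And the 'central i\/dea of this work | is that the 'human 'mind and \/brain | "
    r"is 'not a /single, | 'general-purpose pro\cessor, || but a co\/llection | "
    r"of 'highly 'specialised com\ponents, | 'each 'solving a 'different 'specific \problem, || "
    r"and yet co\/llectively | 'making /up | 'who we \are | as 'human \beings | and \thinkers"
)

counts = groups_per_declination(sentence)
mean_groups = sum(counts) / len(counts)
print(counts, mean_groups)
```

For this sentence the three declinations contain four, three and five intonation groups, a mean of four.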

Additive chunks are often joined with the coordinator and, which can have different shades of meaning: and then; as a result, therefore; also, similarly; and in contrast [Wichmann 2013, p. 81], hence its ample ability to start a new declination. The sentence below is realized with three declinations, two of them separated with the conjunction and. Such a division of the long complicated utterance clearly brings home to the listener the three ideas it contains: about the brain research procedure, its positive and negative results.

So we 'spent 'much of the 'next 'couple of \/years | \scanning subjects | while they 'looked at 'lots of 'different 'kinds of \images, || and we 'showed that 'that \/part of the brain | re'sponds \/strongly | when you 'look at \/any images | that are 'faces of \any kind, || and it re'sponds 'much \/less strongly | to \any image you show | that 'isn't a \face, | like some of \these (Kanwisher 2014).

Adverbial modifiers creating a declination are usually located at the beginning of an utterance and, if lengthened, are likely to be separated so as not to overshadow the subject-predicate line. Here is an example with an adverbial modifier of time.

'Once we 'start ex'plaining its \/properties | in 'terms of /things | 'happening in'side 'brains and \/bodies, || the a'pparently in'soluble \/mystery | of what 'consciousness \/is | should 'start to 'fade a\way (Seth 2017).

In our research material, there are only four utterances in which the subject group forms a declination and gets separated from the predicate declination. It usually happens when the subject group is extended and specified by relative clauses and parentheticals, as in the following sentence.

So, a \/second line of enquiry | that we \/use | to 'track \/changes | in the ado'lescent \/brain || is 'using \functional MRI | to 'look at 'changes in 'brain ac\tivity | a'cross \age (Blakemore 2012).

It is typical of a subject group to form a separate intonation group but not a separate declination, because it is not recommended to make the subject group too long and interrupt the subject-predicate connection with long phrases and clauses, as “dilly-dallying will impede comprehension” [Barry 2019, p. 43].

Enumerations, or lists, often belong to the same declination, but if they become syntactically more complicated, each of them can form a separate declination to set clear boundaries for each element of the enumeration. In the following sentence there are two extended parts of the enumeration forming two declinations.

They 'tunneled her \bosom | for 'coal and \metals, || they 'scraped and \plowed | over her 'skin with their \tractors (Gameau 2022).

It has been observed that lists often include three components because “there is an attractive rhythm that comes from orderly information in threes” [Barry 2019, p. 52]. Our material confirms this regularity: most lists introduced by the speakers contain three parts, usually belonging to one declination, as in the example 'Take the /money, | 'go /home, | 'watch \TED talks (Shariff 2023).

Close to lists are parallel constructions which characterize different sides of the same object through the same syntactic pattern. In our research material, there were only two examples of this type, one of them, containing four declinations, is given below.

Because to \/them | those \/trees ... | those \/trees | that were 'home to 'thousands of 'species of \/animals | and 'millions of 'species of \/insects || those \/trees | that 'sent \/nutrients to each other | via 'underground 'fungal \/networks ... || those \/trees | that tran'spired 'moisture into the \/air | to cr'eate \rainfall | that would 'feed 'crops in \countries | 'thousands of ki'lometers a\way || … 'those \trees | were 'just \timber | for \decking (Gameau 2022).

Realising a declination in a long utterance, a speaker should have a pitch range wide enough for scaling and arranging intonation groups in a gradually descending succession. That is why longer sentences require a higher beginning and a wider pitch span to locate the intonation groups at different levels [Tǿndering 2011]. If a speaker is planning a long utterance ahead, they use the pitch span thriftily so as not to reach the low baseline too quickly; as a result, the declination appears less steep than in short sentences [Swerts, Strangert, Heldner 1996]. The importance of declination for utterance cohesion may be connected with the high demands of the English language for syntactic and semantic linearity [Asher, Vieu 2005].

To sum up our observations and illustrations of utterances, we should characterise pitch declination as an important intonation cue for perceiving the English utterance and setting its boundaries. Multiple declinations, however, are limited to extended sentences with particular types of syntactic relations among their parts. They are infrequent in public speaking, occurring in only 4.8 % of utterances.

Paragraphs

The most difficult task for the auditors was to identify prosodic paragraphs. While there were almost no discrepancies between the auditors in the perception of intonation groups and utterances, their agreement on the division of the talks into paragraphs only reached 72 % (Table 3). It follows from previous research that paragraphs are marked by an extra-high pitch on the first accented word. The auditors in our research, however, heard no significant contrasts in pitch between the initial accent in the first utterance of a paragraph and the initial accent in the subsequent utterances, all of them appearing quite high. We can suppose that the prominent extra-high beginnings of paragraphs are more typical of reading, where a reader observes the graphical indentation and signals it to the listener through a complex of prosodic features, including an extra-high pitch level on the initially accented word. In public speaking, every utterance has a sharp prosodic contour marked by a high beginning, a low-pitch ending and a declination; therefore, the initial utterance does not stand out in this respect.
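
The article does not state how the agreement figure was calculated. As a hedged illustration only, one simple possibility is to record each auditor's paragraph boundaries as the indices of the utterances after which a boundary was heard, and to take the share of boundaries marked by both auditors out of all boundaries marked by either. The function name boundary_agreement and the sample data are hypothetical.

```python
def boundary_agreement(a: set[int], b: set[int]) -> float:
    """Percentage of boundaries marked by both auditors out of all
    boundaries marked by at least one of them.

    Assumption: this is one possible agreement measure, not
    necessarily the one used in the study.
    """
    union = a | b
    if not union:
        # Neither auditor marked any boundary: treat as full agreement.
        return 100.0
    return 100.0 * len(a & b) / len(union)


# Hypothetical data: utterance indices after which each auditor
# perceived a paragraph boundary.
auditor_1 = {3, 7, 12, 18}
auditor_2 = {3, 7, 13, 18}

agreement = boundary_agreement(auditor_1, auditor_2)
print(agreement)  # 60.0 (3 shared boundaries out of 5 marked in total)
```

A measure of this kind penalises every boundary that only one listener hears, so even a single shifted boundary lowers the score noticeably.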

To identify boundaries between paragraphs, the listeners resorted to lexical and grammatical means, in particular to discourse markers, such as Now, So, Well, But, You see, You know, OK, Let me give you... / describe... / show.... These markers started 45.8 % of the paragraphs (Table 1).

Another way of introducing paragraphs was by asking questions stimulating the audience to think about and predict the content of the coming passage.

A third possibility speakers use to structure their speech into paragraphs is to introduce a new theme or character at the very beginning. Below, for example, are some paragraph introductions from Speech 6.

Another school in Lincoln …. Here's one more example … There's one more thing …

And speaking of learners, … One of the real challenges is … (Dziko 2021).

When discourse markers are chosen as a starting point, they do not elicit a pitch rise, because they themselves require a low pitch level. Being extraneous to the ideational content and performing purely organisational and interactional functions, they are likely to have a low-pitch accent [Wennerstrom 2001, p. 99]. Thus, the most frequent discourse markers in our talks, Now and So, were accompanied by a low falling tone or remained unstressed. Due to the low pitch level of discourse markers, the sentence-initial pitch rise is shifted from the very beginning to some subsequent word, which makes the paragraph onset prosodically less prominent than expected and “accounts for at least some of the irregularities across paragraphs” [Wichmann 2013, p. 92].

In a few cases, it was the whole initial sentence of a paragraph that was pronounced at a low pitch level to make a sharp contrast with the general high pitch level of the preceding paragraph, where the speaker was discussing something in an involved, passionate manner. By coming down to a lower pitch level and decreased loudness at the beginning of a paragraph, a speaker changes the emotional key of his talk and switches the listeners' attention over to a new idea. For instance, this manner of paragraph introduction was vividly realised in the following initial utterances.

So where does this feeling come from? (Sinek 2014).

But of course, it wasn't a new story at all (Gameau 2022).

The analysis of the listeners' approaches to paragraph perception also revealed their different notions of how long a spoken paragraph should be: what was perceived as one major idea by one auditor was split into two paragraphs by another. This discrepancy typically occurred at the beginning or close to the end of the text. In presenting and perceiving paragraphs as conceptual units, speakers and listeners may be guided by schemata, for instance introduction, complication, climax, denouement, final state and coda in a narrative text [Chafe 2015, p. 394]. As a result, paragraph identification requires more cognitive effort than an analysis of the surface intonation structures, and a transition from paragraph to paragraph may sometimes be gradual rather than instantaneous.

As for declination running through a whole paragraph, it was not easily identified by ear. In the auditors' opinion, the medial utterances started at approximately the same level, without a noticeable step-down in pitch at the beginning of a subsequent utterance. On the contrary, in Speech 3 they noticed an occasional step-up in pitch in the speaker's transition from utterance to utterance as the speaker became more involved and passionate. But when the emotion weakened, the default declination inside utterances was restored.

Now let us recapitulate the most consistent intonation cues to public discourse perception, as indicated by the auditory analysis, in Table 3.

 

Conclusion

The auditory analysis has shown that in public speaking intonation provides reliable cues to the perception and identification of such discourse units as intonation groups and utterances. Unlike spontaneous speech, prepared public speaking contains well-shaped utterances which could be easily converted into written sentences. Utterances are built of clear-cut intonation groups, which underlie a gradual deployment of thought, its linear unfolding over time. On the other hand, utterances are united into paragraphs under the control of the super-foci of consciousness, on the basis of long-term planning. Intonation alone does not seem to be a sufficient marker of paragraph boundaries. Difficulties with the discourse analysis of paragraphs are pointed out by other researchers, for example J. Sinclair and M. Coulthard, who write that “the research problem with contiguous utterances is primarily a descriptive one; major theoretical problems arise when extensive units are postulated” [Sinclair, Coulthard 2002, p. 22]. We can infer from this statement and our research that paragraphs are still to be studied more closely as regards their discourse organization and the role of intonation in their delimitation and integration.

Unlike paragraphs, utterances and intonation groups are perceived and assessed almost unanimously by ear on the basis of intonation features. Intonation groups stand out quite sharply and represent the smallest steps in the development of discourse. It is essential for perception and comprehension that listeners catch and process intonation groups rather than separate words. Some discourse researchers [Ayers 1994; Couper-Kuhlen 2015] present their speech material as a succession of intonation groups, with each of them occupying a separate line. This kind of graphic layout may seem uneconomical if the page is to be printed out, but it makes a speaker or reader concentrate thoroughly on every intonation group as a sense unit.

Intonation also plays an important role in the perception of utterances as whole, complete units. We have described a few intonation cues to the perception of their cohesion and unity. Particular attention has been paid to one of them – declination throughout a whole utterance, which comfortably embraces 3–4 intonation groups. Declination is not a mere gradual physiological descent in pitch; it mirrors semantic, syntactic and prosodic dependencies in an utterance.

In conclusion, we should state that although intonation supplies only structural and procedural guidelines, it is essential for information perception and processing. The intonation cues we have pointed out are important not only for the linguistic analysis of discourse; they are also key elements in teaching English intonation, listening and speaking skills.

About the authors

Yе. N. Mitrofanova

Kursk State University

Author for correspondence.
Email: enmitrofanova2009@yandex.ru
ORCID iD: 0000-0002-6475-4077

Candidate of Philological Sciences, associate professor, associate professor of the Department of Language Theory and Methods of Teaching Foreign Languages

Russian Federation, 33, Radishcheva Street, Kursk, 305000, Russian Federation

References

  1. Asher, Vieu 2005 – Asher N., Vieu L. (2005) Subordinating and coordinating discourse relations. Lingua, vol. 115, issue 4, pp. 591–610. DOI: http://doi.org/10.1016/j.lingua.2003.09.017.
  2. Ayers 1994 – Ayers G. M. (1994) Discourse functions of pitch range in spontaneous and read speech. In: Venditti J.J. (ed.) Working Papers in Linguistics, no. 44, pp. 1–49. Available at: https://kb.osu.edu/items/efc194e3-682f-5330-bb89-a8a8dd11cfe3.
  3. Barry 2019 – Barry P. (2019) Good with Words: Writing and Editing. Michigan: Maize Books, 246 p. DOI: https://doi.org/10.3998/mpub.9997109.
  4. Borisova, Daineko 2023 – Borisova E.B., Daineko M.V. (2023) Rhythmic and syntactic structure of means to create material world image in V. Nabokov’s novel «Машенька» and its English translation «Mary». Vestnik Samarskogo universiteta. Istoriia, pedagogika, filologiia Vestnik of Samara University. History, pedagogics, philology, 2023, vol. 29, no. 3, pp. 170–176. DOI: http://doi.org/10.18287/2542-0445-2023-29-3-170-176. (In Russ.) = Борисова Е.Б., Дайнеко М.В. Ритмико-синтаксическая организация средств создания образа вещного мира в оригинале и переводе романа В. Набокова «Машенька» // Вестник Самарского университета. История, педагогика, филология Vestnik of Samara University. History, pedagogics, philology. 2023. Т. 29, № 3. С. 170–176. DOI: http://doi.org/10.18287/2542-0445-2023-29-3-170-176.
  5. Brazil, Coulthard, Johns 1980 – Brazil D., Coulthard M., Johns C. (1980) Discourse intonation and language teaching. London: Longman, 205 p. Available at: https://archive.org/details/discourseintonat0000braz/mode/2up.
  6. Buss, Cardoso, Kennedy 2015 – Buss L., Cardoso W., Kennedy S. (2015) Discourse intonation in L2 academic presentations: A pilot study. In: Levis J., Mohammed R., Qian M. & Zhou Z. (Eds.) Proceedings of the 6th Pronunciation in Second Language Learning and Teaching Conference. Santa Barbara, CA, pp. 27–37. Available at: https://www.iastatedigitalpress.com/psllt/article/id/15245.
  7. Chafe 1994 – Chafe W. (1994) Discourse, Consciousness and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago: University of Chicago Press, 327 p. DOI: https://doi.org/10.2307/1423020.
  8. Chafe 2015 – Chafe W. (2015) Constraining and Guiding the Flow of Discourse. In: Tannen D., Hamilton H.E. & Schiffrin D. (Eds.) The Handbook of Discourse Analysis, 2nd ed., vol. 1. Oxford: Wiley Blackwell, pp. 391-405. DOI: https://doi.org/10.1002/9781118584194.ch18.
  9. Clark 1999 – Clark R.A.J. (1999) Using prosodic structure to improve pitch range variation in text to speech synthesis. In: Proceedings of the XIVth International Congress of Phonetic Sciences. San Francisco, pp. 69–72. Available at: https://www.cstr.ed.ac.uk/downloads/publications/1999/clark_icphs99.pdf.
  10. Couper-Kuhlen 2015 – Couper-Kuhlen E. (2015) Intonation and discourse. In: Tannen D., Hamilton H.E. & Schiffrin D. (Eds.) The Handbook of Discourse Analysis, 2nd ed., vol. 1. Oxford: Wiley Blackwell, pp. 82–104.
  11. Crystal 1969 – Crystal D. (1969) Prosodic systems and intonation in English. Cambridge: Cambridge University Press, 381 p. Available at: https://archive.org/details/prosodicsystemsi0000crys.
  12. Fawcett 2007 – Fawcett R.P. (2007) The Many Types of 'Theme' in English: their Syntax, Semantics and Discourse Functions. Research Papers in the Humanities, 142 p. Available at: http://www.isfla.org/Systemics/Print/Papers/Fawcett-ThemePaperv3.pdf.
  13. Firbas 1992 – Firbas J. (1992) Functional sentence perspective in written and spoken communication. Cambridge: Cambridge University Press, 239 p. DOI: https://doi.org/10.1017/CBO9780511597817.
  14. Halliday, Matthiessen 2004 – Halliday M.A.K., Matthiessen Ch. (2004) An Introduction to Functional Grammar. London: Arnold, 689 p. Available at: https://www.uel.br/projetos/ppcat/pages/arquivos/RESOURCES/2004_HALLIDAY_MATTHIESSEN_An_Introduction_to_Functional_Grammar.pdf.
  15. Dilley 2010 – Dilley L.C. (2010) Pitch Range Variation in English Tonal Contrasts: Continuous or Categorical? Phonetica, no. 67 (1–2), pp. 63-81. DOI: https://doi.org/10.1159/000319379.
  16. Jasinskaja, Mayer, Schlangen 2004 – Jasinskaja E., Mayer J., Schlangen D. (2004) Discourse structure and information structure: interfaces and prosodic realisation. In: Ishihara S., Schmitz M. & Schwarz A. (Eds.) Interdisciplinary Studies on Information Structure, no. 1, pp. 151-206. Available at: http://pub.sfb632.uni-potsdam.de/downloads/publications/A3_Jasinskaja_2004_2.pdf.
  17. Kalita 2019 – Kalita A.A. (2019) Prosodic models of accentuated personalities' English public speeches. Cognition, Communication, Discourse, no. 18, pp. 34-45. DOI: https://doi.org/10.26565/2218-2926-2019-18-03.
  18. Kleinhans et al. 2017 – Kleinhans J., Farrus M., Gravano A., Perez J.M., Lai C., Wanner L. (2017) Using Prosody to Classify Discourse Relations. In: Proceedings of Interspeech 2017: 18th Annual Conference of the International Speech Communication Association. Stockholm, pp. 3201-3205. DOI: http://dx.doi.org/10.21437/Interspeech.2017-710.
  19. Ladd 2008 – Ladd D.R. (2008) Intonational phonology. 2nd ed. Cambridge: Cambridge University Press, 356 p. DOI: https://doi.org/10.1017/CBO9780511808814.
  20. Levis, Pickering 2004 – Levis J., Pickering L. (2004) Teaching intonation in discourse using speech visualization technology. System, vol. 32, issue 4, pp. 505–524. DOI: http://doi.org/10.1016/j.system.2004.09.009.
  21. Mitrofanova 2022 – Mitrofanova Yе.N. (2022) Speakers' objective / subjective stance as a source of intonation variation in English spontaneous monologues. Russian Linguistic Bulletin, no. 7 (35). DOI: http://doi.org/10.1/RULB.2022.35.11.
  22. Nooteboom 1997 – Nooteboom S. (1997) The prosody of speech: Melody and rhythm. In: Hardcastle W.J., Laver J. (Eds.) The Handbook of Phonetic Sciences. Oxford: Basil Blackwell Ltd, pp. 640–673. Available at: https://www.researchgate.net/publication/46675980_The_prosody_of_speech_Melody_and_rhythm.
  23. Riester, Nápoles, Hoek 2021 – Riester A., Nápoles A.C., Hoek J. (2021) Combined discourse representations: Coherence relations and questions under discussion. In: Proceedings of the First Workshop on Integrating Perspectives on Discourse Annotation. Tübingen, Germany, pp. 26–30. Available at: https://aclanthology.org/2021.discann-1.5.pdf.
  24. Sinclair, Coulthard 2002 – Sinclair J., Coulthard M. (2002) Towards an analysis of discourse. In: Coulthard M. (ed.) Advances in Spoken Discourse Analysis. London: Routledge, Taylor & Francis, pp. 1–34. Available at: https://uomustansiriyah.edu.iq/media/lectures/8/8_2018_04_01!10_12_14_PM.pdf.
  25. Swerts, Strangert, Heldner 1996 – Swerts M.G.J., Strangert E. and Heldner M. (1996) F0 declination in read-aloud and spontaneous speech. In: ICSLP-96: Proceedings of the 4th International Conference on Spoken Language Processing. Philadelphia, USA, pp. 1501–1504. DOI: http://doi.org/10.1109/ICSLP.1996.607901.
  26. Tǿndering 2011 – Tǿndering J. (2011) Preplanning of intonation in spontaneous versus read aloud speech: evidence from Danish. In: Proceedings of the XVIIth International Congress of Phonetic Sciences. Hong Kong, pp. 2010–2013. Available at: https://www.researchgate.net/publication/228520110_Preplanning_of_Intonation_in_Spontaneous_versus_Read_Aloud_Speech_Evidence_from_Danish.
  27. Wennerstrom 2001 – Wennerstrom A. (2001) The Music of Everyday Speech: Prosody and Discourse Analysis. New York: Oxford University Press, 317 p. DOI: https://doi.org/10.5860/choice.39-5642.
  28. Wichmann 2013 – Wichmann A. (2013) Intonation in text and discourse: Beginnings, middles, and ends. London: Routledge, Taylor & Francis Group, 172 p. DOI: https://doi.org/10.4324/9781315843599.


Copyright (c) 2024 Mitrofanova Y.N.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.