Linguistic Complexity and Planning Effects on Word Duration in Hindi Read Aloud Speech

Faculty: Sumeet Agarwal


Our study investigates the impact of linguistic complexity and planning on word durations in Hindi read aloud speech. Reading aloud involves both comprehension and production processes, and we use measures defined by two influential theories of sentence comprehension, Surprisal Theory and Dependency Locality Theory (DLT), to model the time taken to enunciate individual words. We model planning processes using an information-theoretic measure we call FORWARD SURPRISAL, inspired by Surprisal Theory, which has been prominent in recent psycholinguistic work. Forward surprisal aims to capture articulatory planning when readers make use of parafoveal viewing while reading aloud. Using a Linear Mixed Model with memory and surprisal costs as predictors of word duration in read aloud speech (and with part of speech and speaker as intercept terms), we investigate the following hypotheses: 1. High values of linguistic complexity measures (lexical and PCFG surprisal and DLT memory costs) lead to longer word durations. 2. High values of forward lexical surprisal tend to induce longer word durations. 3. High-frequency words are read aloud faster than low-frequency words. We validate the above hypotheses using data from the TDIL corpus of read aloud speech. Further, using a Generalized Linear Model to predict content and function word labels, we show that lexical surprisal measures do not help distinguish between these two classes. Thus reading aloud might not involve distinct access strategies for content and function words, unlike spontaneous speech.
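To make the two surprisal measures concrete, here is a minimal sketch. Lexical surprisal of a word is -log2 P(w_i | preceding context). The abstract does not give a formal definition of forward surprisal, so this sketch assumes it is the surprisal of the upcoming word w_{i+1}, i.e. the word a reader could preview parafoveally while articulating w_i. The add-one smoothed bigram model, the function names (`train_bigram`, `surprisal`, `word_measures`) and the toy romanized Hindi corpus are all hypothetical stand-ins; the study's own language model and its PCFG surprisal and DLT measures are not reproduced here.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count context and bigram frequencies over tokenized sentences.

    Sentences are padded with <s> and </s>. The vocabulary holds every token
    that can be predicted (all words plus </s>, but not <s>)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens[1:])
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab

def surprisal(prev, word, model):
    """Add-one smoothed bigram surprisal in bits: -log2 P(word | prev)."""
    contexts, bigrams, vocab = model
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return -math.log2(p)

def word_measures(sent, model):
    """Return (word, surprisal, forward_surprisal) for each word in sent.

    Assumption: forward surprisal of word i is the surprisal of word i+1
    (for the final word, the surprisal of the end marker </s>)."""
    tokens = ["<s>"] + sent + ["</s>"]
    rows = []
    for i in range(1, len(tokens) - 1):
        current = surprisal(tokens[i - 1], tokens[i], model)
        forward = surprisal(tokens[i], tokens[i + 1], model)
        rows.append((tokens[i], current, forward))
    return rows

# Toy corpus (invented for illustration only).
corpus = [["raam", "ghar", "gayaa"],
          ["raam", "bazaar", "gayaa"],
          ["siitaa", "ghar", "gayii"]]
model = train_bigram(corpus)
rows = word_measures(["raam", "ghar", "gayaa"], model)
for word, s, fwd in rows:
    print(f"{word:8s} surprisal={s:.3f} forward={fwd:.3f}")
```

In a setup like the one described above, per-word values such as these would enter the Linear Mixed Model as predictors alongside DLT memory costs and word frequency; a bigram model is used here only because it keeps the sketch short and self-contained.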

Ranjan, Sidharth; Rajkumar, Rajakrishnan; and Agarwal, Sumeet (2022) “Linguistic Complexity and Planning Effects on Word Duration in Hindi Read Aloud Speech,” Proceedings of the Society for Computation in Linguistics: Vol. 5, Article 11.
