We start our discussion of language and technology with voice recognition. Bernard mentions a general bias towards female voices, as discussed in this paper:
Edworthy, J., Hellier, E., & Rivers, J. (2003). The use of male or female voices in warnings systems: a question of acoustics. Noise and Health, 6(21): 39-50.
Pitch range is also important, as demonstrated in the experiment on using different voices for sat navs that Erika mentions:
Niebuhr, O., & Michalsky, J. (2019). Computer-generated speaker charisma and its effects on human actions in a car-navigation system experiment: or how Steve Jobs’ tone of voice can take you anywhere. In Misra S. et al. (eds) Computational Science and Its Applications – ICCSA 2019. Lecture Notes in Computer Science, vol. 11620: 375-390. Springer, Cham. https://doi.org/10.1007/978-3-030-24296-1_31
Moving from acoustics to culture, the following paper discusses how male voices are perceived as more authoritative:
Anderson, R.C., & Klofstad, C.A. (2012). Preference for leaders with masculine voices holds in the case of feminine leadership roles. PLoS ONE, 7(12): e51216. https://doi.org/10.1371/journal.pone.0051216
It is worth sharing a few more auto-captioning gems from the lectures of Veronika and her colleagues at Lancaster University:
“my grammar is leaving me” → “my grandma is leading me”
“n-sizes” → “incisors”
“Hardaker and McGlashan” → “heartache and regression”
“institutional” → “it’s too slow” (truth!)
“masculine” → “mass killer” (bit harsh)
And finally, one for fans of the Doctor:
On readability, Bernard mentions an example from accounting, namely the obfuscation hypothesis. The following paper on the topic is considered the first accounting study that uses automated textual analysis with a very large sample to address readability:
Li, F. (2008). Annual report readability, current earnings, and earnings persistence. Journal of Accounting & Economics, 45: 221-247. https://doi.org/10.1016/j.jacceco.2008.02.003
The first of three interview guests is Joke Daems from Ghent University, whose publications include the following:
Daems, J., Vandepitte, S., Hartsuiker, R.J., & Macken, L. (2017). Identifying the machine translation error types with the greatest impact on post-editing effort. Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.01282
Daems, J., Vandepitte, S., Hartsuiker, R.J., & Macken, L. (2017). Translation methods and experience: A comparative analysis of human translation and post-editing with students and professional translators. Meta: Journal des traducteurs/Meta: Translators’ Journal, 62(2): 245-270.
We then go on to talk about sentiment analysis, which is used to find out about, for example, brand perceptions or patient satisfaction. Here is an example of the latter:
Hopper, A.M., & Uriyo, M. (2015). Using sentiment analysis to review patient satisfaction data located on the internet. Journal of Health Organization and Management, 29(2): 221-233. https://doi.org/10.1108/JHOM-12-2011-0129
In the context of this episode, we want to distinguish between corpus linguistics and computational linguistics. Although language corpora are used to train systems in machine learning, corpus linguists engage in the computer-assisted analysis of large text collections, often combining automated statistical analysis with manual qualitative analysis. A company using such mixed corpus linguistic methods to provide their customers with insights about their products and services is Relative Insight. (We did not receive any funding from them for this episode, but they are a spin-off company that started at Lancaster University.)
A critical evaluation of another area of computational linguistics, topic modelling, written by two corpus linguists is:
Brookes, G., & McEnery, T. (2018). The utility of topic modelling for discourse studies: A critical evaluation. Discourse Studies, 21(1): 3-21. https://doi.org/10.1177/1461445618814032
(Incidentally, the above paper is also based on data about patient satisfaction.)
The PhD thesis on automatic irony detection that Bernard mentions was written by Cynthia Van Hee and is available here.
The second interview guest is another of Bernard’s colleagues from Ghent University, Orphée De Clercq. Her recent publications include:
De Bruyne, L., De Clercq, O., & Hoste, V. (2021). Annotating affective dimensions in user-generated content. Language Resources and Evaluation, 55(4): 1017-1045.
De Clercq, O., De Sutter, G., Loock, R., Cappelle, B., & Plevoets, K. (2021). Uncovering machine translationese using corpus analysis techniques to distinguish between original and machine-translated French. Translation Quarterly, 101: 21-45.
And finally, we talk to Doris Dippold from the University of Surrey in the UK. Her work on chatbots can be found in:
Dippold, D., Lynden, J., Shrubsall, R., & Ingram, R. (2020). A turn to language: How interactional sociolinguistics informs the redesign of prompt: response chatbot turns. Discourse, Context & Media, 37. https://doi.org/10.1016/j.dcm.2020.100432
The following screenshots of human-chatbot conversations show that more work remains to be done. See also the publication below for recent research on the performance of Facebook’s open domain chatbot Blender.
Doğruöz, A.S., & Skantze, G. (2021). How ‘open’ are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation. SIGDIAL 2021: 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 392-402.