Our minds must process a vast amount of sensory information in order to perceive the world around us. This process involves both the integration and discrimination of multisensory information. Audiovisual interactions are commonly used to investigate multisensory processing, which refers to how information presented to one sensory modality can affect the way information is processed and perceived in another modality (Briscoe, 2016). Previous research has suggested that monolinguals and bilinguals differ in such multisensory processing (Marian et al., 2018; Bidelman and Heath, 2019). This study will investigate the difference between monolinguals' and bilinguals' performance on multisensory processing using a temporal order judgement (TOJ) task. Furthermore, the study will focus specifically on the discrimination of audiovisual speech, as this is an underdeveloped field of research.
Existing literature investigating audiovisual speech perception has suggested that there is a difference between how bilinguals and monolinguals integrate visual and auditory inputs. Bilinguals have been found to place a stronger reliance on the visual input, over the auditory input, during audiovisual processing, relative to monolinguals (Marian et al., 2018). Marian et al. (2018) conducted an experiment on multimodal processing with a focus on the McGurk effect: a phenomenon whereby what we see changes what we hear. For example, hearing the auditory stimulus “ba” while seeing the visual stimulus “ga” leads people to perceive the intermediate sound “da”. They found that bilinguals experienced more McGurk-type effects, suggesting that they placed a heavier reliance on the visual input compared to monolinguals. This demonstrates that language background has an impact on the way we integrate audiovisual information.
Furthermore, a vast amount of research into audiovisual processing has focused on non-linguistic stimuli. For example, Shams, Kamitani and Shimojo's (2002) research into audiovisual processing revealed a visual illusion called the double-flash illusion: when participants were presented with a single visual flash alongside multiple auditory beeps, they perceived more than one flash. Bidelman and Heath (2019) further investigated the double-flash illusion by comparing monolinguals' and bilinguals' performance. In their study, participants were presented with one visual flash alongside multiple auditory stimuli and had to judge how many flashes they saw. They also measured the temporal binding window, a period of time within which paired sensory stimuli are integrated or discriminated (Navarra et al., 2005), by varying the stimulus onset asynchrony (SOA) between the visual and auditory stimuli. They found that bilinguals responded faster and were less susceptible to the double-flash illusion compared to monolinguals. Therefore, bilingualism may enhance the cognitive ability to discriminate between audiovisual stimuli.
With regards to the temporal binding window, much of the current literature indicates a relationship between the width of the temporal binding window and the accuracy of audiovisual processing. More specifically, it has been shown that individuals with a narrower temporal binding window are less susceptible to illusions such as the McGurk effect and the double-flash illusion (Stevenson et al., 2012). Bidelman and Heath's (2019) research found that bilingual individuals have a more refined temporal binding window for audiovisual processing, and therefore experience fewer double-flash illusions, compared to monolinguals. Research concerning why bilinguals have a narrower temporal binding window suggests that this could be due to increased functional connectivity between the auditory and visual systems (Erickson et al., 2014, as cited in Bidelman and Heath, 2019).
A common method used to assess audiovisual processing is a TOJ task. Typically, a TOJ task presents information to different senses at varying temporal offsets (SOAs) and asks participants to judge which sensory input came first. Practice trials are usually administered to familiarise participants with the task. For example, Thompson and Frank (in prep) used a TOJ task with varying SOAs (e.g., 100 milliseconds between the audio and visual presentation) to assess audiovisual processing. Participants were required to view a flash and listen to a beep, then decide which stimulus was presented first. They found that bilinguals were faster and more accurate than monolinguals at discriminating which stimulus was presented first.
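The structure of such a task can be sketched in code. The following is a minimal, hypothetical illustration of a TOJ trial list with varying SOAs and a scoring rule; the specific SOA values, repetition count, and function names are assumptions for illustration, not the actual design of Thompson and Frank (in prep).

```python
# Hypothetical sketch of a TOJ trial list. Negative SOA: the beep leads
# the flash; positive SOA: the flash leads the beep. All values are
# illustrative assumptions.

def build_toj_trials(soas_ms, repetitions):
    """Create a trial list with each SOA repeated a fixed number of times."""
    trials = []
    for soa in soas_ms:
        for _ in range(repetitions):
            trials.append({"soa_ms": soa,
                           "audio_first": soa < 0})
    return trials

def correct_response(trial, judged_audio_first):
    """Score a participant's 'which came first?' judgement."""
    return judged_audio_first == trial["audio_first"]

# Example: six SOA levels from -300 ms (audio leads) to +300 ms (visual leads)
trials = build_toj_trials([-300, -200, -100, 100, 200, 300], repetitions=10)
print(len(trials))  # 60
```

In an actual experiment the trial order would be randomised and the SOA levels chosen to span the expected temporal binding window; accuracy as a function of SOA then indexes how finely a participant discriminates audiovisual order.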
Although evidence suggests that bilinguals are proficient at executive function tasks outside of language, they may have difficulty with language-based tasks. Bialystok and Feng (2009) highlighted that most studies suggesting that bilinguals are better at executive function used non-verbal stimuli. Generally, research has shown that bilinguals have a lexical retrieval disadvantage (Bialystok and Feng, 2009) and a smaller vocabulary (Portocarrero, Burright and Donovick, 2007), and that bilinguals display advantages in nonverbal creativity whereas monolinguals display advantages in verbal creativity (Kharkhurin, 2010). Additionally, the frequency-lag hypothesis proposes that because bilinguals know two languages, they divide their frequency-of-use between the two and thus use each language less frequently, which leads to slower lexical retrieval compared to monolinguals (Gollan et al., 2008; Emmorey, Petrich and Gollan, 2013). Therefore, the use of language measures in executive function tasks may impact bilinguals' performance relative to monolinguals.
When dealing with multisensory events, the sensory inputs are subject to natural lags as they are transmitted to and processed in the different sensory streams (Vroomen and Keetels, 2010). In audiovisual processing, the auditory stimulus reaches the sensory receptors more slowly than the visual stimulus (Van Eijk et al., 2008). However, the processing speed for auditory input (approximately 10 milliseconds) is faster than for visual input (approximately 50 milliseconds) (Vroomen and Keetels, 2010). When auditory and visual inputs arrive simultaneously, the brain activation for the auditory input occurs 30-50 milliseconds before that for the visual input (Van Eijk et al., 2008). Consequently, although the visual input reaches the sensory receptors first, it takes longer to process, so individuals perceive the inputs as a single multisensory event. Moreover, if the visual stimulus were presented before the auditory stimulus, individuals would still perceive a single event, because the visual stimulus takes longer to process. However, if the auditory stimulus were presented before the visual stimulus, individuals would be more likely to perceive two separate events, because the auditory stimulus is processed quickly. Individuals are therefore more tolerant of a lag in an auditory stimulus than of a lag in a visual stimulus, suggesting that they would be better at detecting asynchronies when the audio is presented first (Van Eijk et al., 2008). However, Marian et al. (2018) suggested that bilinguals place a heavier reliance on the visual stimulus than on the auditory stimulus. Nevertheless, research suggests that visual information can enhance bilinguals' auditory ability (Navarra and Soto-Faraco, 2007). Therefore, it is possible that bilinguals' reliance on the visual input enhances their processing of the auditory input.
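The timing arithmetic above can be made concrete with a short sketch. The processing latencies are the approximate figures cited (Vroomen and Keetels, 2010; Van Eijk et al., 2008); the speed of sound and the example distance are illustrative assumptions, and light travel time is treated as instantaneous.

```python
# Approximate latency figures from the literature cited above; the
# distance used in the example is an illustrative assumption.

SPEED_OF_SOUND_M_S = 343.0       # approximate speed of sound in air
AUDITORY_PROCESSING_MS = 10.0    # approximate auditory processing latency
VISUAL_PROCESSING_MS = 50.0      # approximate visual processing latency

def perceived_latency_ms(distance_m):
    """Event-to-perception latency for each modality.

    Light travel time is treated as instantaneous, while sound travel
    time grows with distance from the event.
    """
    audio_travel_ms = distance_m / SPEED_OF_SOUND_M_S * 1000.0
    return (audio_travel_ms + AUDITORY_PROCESSING_MS, VISUAL_PROCESSING_MS)

# Simultaneous arrival at the receptors (distance ~0): the auditory input
# finishes processing 40 ms earlier, within the cited 30-50 ms range.
audio_ms, visual_ms = perceived_latency_ms(0.0)
print(visual_ms - audio_ms)  # 40.0

# At greater distances the sound's travel lag eats into that advantage:
# around 13.7 m the two totals approximately coincide.
audio_far, visual_far = perceived_latency_ms(13.7)
```

This illustrates why a visual-leading asynchrony is absorbed by the slower visual processing stage, whereas an audio-leading asynchrony compounds the auditory input's head start and is more likely to be detected.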