HAL Id: hal-03235931
https://hal.archives-ouvertes.fr/hal-03235931
Submitted on 13 Jun 2021
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
scientific research documents, whether they are
published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
Peripheral and central factors in contextual
lateralization based on interaural level differences
Bernhard Laback
To cite this version:
Bernhard Laback. Peripheral and central factors in contextual lateralization based on interaural level
differences. Forum Acusticum, Dec 2020, Lyon, France. pp.1287-1294, ⟨10.48465/fa.2020.0743⟩.
⟨hal-03235931⟩
Peripheral and Central Factors in Contextual Lateralization based
on Interaural Level Differences
Bernhard Laback
Acoustics Research Institute, Austrian Academy of Sciences, Austria
ABSTRACT
It is well established that the perceived azimuth of a target sound depends on the azimuthal position of a precursor sound, referred to as the precursor effect (PE). Previous studies demonstrated the PE for both interaural time differences (ITDs) and interaural level differences (ILDs). The current study focused on the contribution of peripheral and more central factors to the ILD-based PE at high frequencies. The perceived target azimuth was measured using a head-pointing task in a virtual audio-visual environment. Exp. 1 showed response compression and expansion for ipsilateral and centered precursors, respectively, consistent with the literature. As a new finding, contralateral precursors elicited response expansion across the entire azimuth range tested. A binaurally-linked model of the auditory periphery up to the auditory nerve level accounted for most of the effects observed. Exp. 2 showed, by selectively introducing a temporal pitch cue, that the PE is robust against perceptual segregation of precursor and target. Exp. 3 showed, by comparing the effects of envelope-ITD- vs. ILD-based precursors, only partial translation of the PE across binaural cues. Overall, the results support the idea that the ILD-based PE at high frequencies is dominated by peripheral mechanisms.
1. INTRODUCTION
The human auditory system is remarkably accurate in determining the azimuthal angle of incidence of a sound source. Absolute localization errors for a single sound source in the frontal (best) region are on the order of 4-5 degrees (for a recent review, see [1]). However, in everyday situations, sounds are often perceived in a context of preceding sounds, resulting in relative localization. On the one hand, the auditory system is very sensitive to relative sound localization, resulting in a minimum audible angle as small as 1 degree. On the other hand, there is a long-standing line of research showing that the perceived absolute location of a sound can be systematically shifted by the presence of a preceding sound. As early as the 1920s, it was demonstrated that exposure to a “fatiguing” or “adapting” sound shifts the perceived azimuth of a target sound away from the azimuthal position of the adapter (e.g., [2-3]). To avoid implying a particular underlying mechanism, in the following the term precursor is used to refer to any preceding sound, and its effect on the target sound is referred to as the precursor effect (PE). The PE has been observed for free-field stimuli as well as headphone stimuli, where either of the two binaural cues for azimuthal localization, the interaural time difference (ITD) and the interaural level difference (ILD), was varied independently to control the perceived lateralization of precursor and target. The most important findings were that a lateral precursor shifts the perceived azimuth of ipsilateral targets towards the midline and that a centered precursor shifts lateral targets further to the side [4].
Later studies demonstrated that the PE is frequency-specific, meaning that the precursor and target must share the same spectral region for the effect to occur [4-5]. This suggested that the effect involves some form of sensory interaction and is not simply attributable to a response bias at the decision level. Further, the PE caused by an ITD-based precursor was shown to transfer to an ILD-based target and vice versa, indicating that it is not cue-specific but location-specific [13]. While these studies (and others cited below) pointed to a lower-level mechanism underlying the PE, [6] reported some influence of a higher-order factor, i.e., the perceptual similarity between precursor and target, on the PE.
Different mechanisms have been proposed to account for the PE, depending on which particular conditions were to be accounted for. Early studies suggested that differential adaptation at the two ears could explain the effect: stronger monaural adaptation at the ear closer to the precursor would shift the “effective” target ILD cue towards the other side, away from the precursor. While this explanation appears attractive, it does not easily account for the PE caused by purely ITD-based precursors, for which adaptation effects are interaurally symmetric. [7] suggested that a model based on the relative activity of three neural-perceptual channels, with broad tuning to the left and right hemifields and to the midline, accounts for ITD- and ILD-based PEs for both lateral and central precursors. Most recently, [8] proposed a model that explains PEs by hemisphere-specific short-term adaptation effects in the earliest binaural processing stages, followed by an across-hemisphere comparison.
This study focuses on the high-frequency region (around 4 kHz) where ILDs represent the most prominent and salient azimuthal localization cue [1]. One goal was to characterize ILD-based PEs in this frequency region. Stimulus timing conditions were chosen to be representative of real-life situations with alternating sound sources. Another goal was to clarify to what extent mechanisms like the ones proposed by [7-8] are required to predict the ILD-based PE at high frequencies, where peripheral mechanisms (i.e., up to the level of the auditory nerve, AN) may already explain some of the effects [9]. [5] concluded that because monaural target detection thresholds were only marginally affected by the presence of the precursor, peripheral adaptation cannot explain the PE. However, a change in absolute threshold may not be a good predictor of suprathreshold effects.
To address these issues, Exp. 1 first measured the effects of lateral and diotic ILD-based precursors on the perceived azimuth of ILD-based targets. The results were then compared to the prediction of a peripheral-physiology-based ILD model. The monaural frontends, incorporating realistic cochlear compression and AN adaptation, had been shown to account for a large variety of monaural spectro-temporal effects [10]. Combining two such frontends with a simple ILD extraction stage had been shown to predict a variety of ILD effects [11]. Here a binaurally-linked (BL) version of the frontends was combined with the ILD extraction stage. The BL version incorporates efferent feedback via dynamic ipsi- and contralateral cochlear compression control [12]. Thus, the combined model, referred to as the BL-ILD model, has the potential to predict dynamic ILD-based effects. The extent to which the BL-ILD model accounts for the observed PEs indicates whether additional higher-order mechanisms (beyond the AN) are required. As an additional test of the involvement of higher-order effects, Exp. 2 studied the influence of perceptual segregation of the target and precursor stimuli on the PEs, by comparing conditions with and without a temporal pitch cue. Finally, Exp. 3 compared the effects of ILD- vs. envelope-ITD-based precursors on ILD-based targets, to address the question of binaural-cue vs. location specificity of the PE.
2. EXPERIMENT 1
This experiment studied the effects of ILD-based lateral and centered precursors (Ps) on the perceived lateralization of ILD-based targets (Ts).
2.1 Methods
2.1.1 Equipment, Stimuli, and Procedure
Five participants with no indications of past or present hearing disorders completed the experiment (average age: 33.6 years; two females). Participants stood on a platform surrounded by a circular railing inside a double-walled sound booth, wearing headphones (HD 580, Sennheiser) and a head-mounted display (HMD, Oculus Rift CV1) that included a head tracker. The HMD immersed the participants in the center of a virtual visual sphere that was rendered in real time according to the participants’ head rotation. The purpose of the visual environment was to provide a visual reference frame and dynamic visual feedback, in addition to the proprioceptive feedback, in a head-pointing task.
Binaural auditory stimuli were generated using a computer and output via a digital audio interface (ADI-8, RME) at a 96-kHz sampling rate. Both T and P were band-pass filtered Gaussian noises that were generated independently in each trial. T was one octave wide and centered at 4 kHz. P had the same lower cutoff frequency but a slightly higher upper cutoff frequency of 6489 Hz (thus a bandwidth of about 1.2 octaves). This higher upper cutoff of P was used to facilitate perceptual segregation from T while covering the spectral band of T.
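The band edges implied by this description can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes "centered at 4 kHz" refers to the geometric center of the band (the text does not state this explicitly); the function names are illustrative and not taken from the study.

```python
import math

def octave_band_edges(center_hz, width_octaves):
    """Return (lower, upper) cutoffs of a band geometrically centered on center_hz.

    Assumption: the center frequency is the geometric mean of the two cutoffs.
    """
    half_ratio = 2.0 ** (width_octaves / 2.0)
    return center_hz / half_ratio, center_hz * half_ratio

def bandwidth_octaves(lower_hz, upper_hz):
    """Bandwidth in octaves between two cutoff frequencies."""
    return math.log2(upper_hz / lower_hz)

# Target T: one octave wide, centered at 4 kHz.
t_low, t_high = octave_band_edges(4000.0, 1.0)

# Precursor P: same lower cutoff as T, upper cutoff of 6489 Hz (from the text).
p_low, p_high = t_low, 6489.0
p_width = bandwidth_octaves(p_low, p_high)

print(f"T: {t_low:.0f}-{t_high:.0f} Hz, P: {p_low:.0f}-{p_high:.0f} Hz "
      f"({p_width:.2f} octaves)")
```

Under this assumption, the precursor bandwidth comes out close to the 1.2 octaves stated above, and the band of P fully covers the band of T.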
P and T had durations of 600 and 300 ms, respectively (including raised-cosine ramps), and were separated by a 10-ms gap. These timing parameters were chosen to represent real-life situations, where, for example, speech fragments from different speakers are temporally interleaved.
Trials without a P had a silent interval such that T was always presented at the same time after initiating a trial. Both P and T had an A-weighted level of 60 dB SPL. ILDs were used to induce the percept of laterality and were split linearly (in dB) between the ears. Negative and positive ILDs favored the left and right ear, respectively.
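One way to read the ILD convention described above is sketched below. It assumes that the ILD is split symmetrically, i.e., half of it (in dB) is added at the favored ear and half is subtracted at the other ear; this is an interpretation of the description, and the function names are hypothetical.

```python
import math

def ear_gains_db(ild_db):
    """Split an ILD symmetrically (in dB) between the ears.

    Assumption: +ILD/2 at the favored ear, -ILD/2 at the other ear.
    Positive ILDs favor the right ear, negative ILDs the left ear.
    Returns (left_gain_db, right_gain_db).
    """
    return -ild_db / 2.0, ild_db / 2.0

def db_to_amplitude(gain_db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

# Example: the right-side precursor ILD of +10 dB.
left_db, right_db = ear_gains_db(10.0)
left_amp = db_to_amplitude(left_db)
right_amp = db_to_amplitude(right_db)

# The level difference between the scaled ear signals recovers the ILD.
recovered_ild_db = 20.0 * math.log10(right_amp / left_amp)
```

The two amplitude factors would be applied to the left- and right-ear copies of the same noise token before presentation.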
The virtual visual environment (implemented using Unity) immersed the participant in a sphere with a light-grey surface that was illuminated from the top and bottom. A red ball represented the frontal reference position (0 degrees azimuth). There were only two more position markers, one at the far left (-90 degrees) and one at the far right (+90 degrees). A cross-hair indicated the participant’s head orientation within the sphere.
Figure 1. Results of Exp. 1, using band-pass noise. Mean response azimuth across the five participants as a function of the T ILD. NoPrec = without P, Diotic = centered P, Left = left-side P (ILD=-10 dB), and Right = right-side P (ILD=+10 dB). Error bars indicate standard errors.
The general task was to indicate the azimuth of the perceived intracranial image via a head turn. At the beginning of each trial the participant oriented towards the visually indicated reference position. A button press on a hand-held control device initiated the auditory stimulus. The participants had to remain at the reference position (with a tolerance of ±5 degrees) until the end of the stimulus presentation (which was enforced by repeating the presentation if required). The participants then indicated the perceived azimuth by an appropriate head turn and confirmed the indicated position by another button press. A minimal onset-to-onset interval of 2 s between subsequent trials was enforced.
Participants were instructed that Ts were presented at random lateral positions in the frontal arc from left to right. They were asked to report the perceived azimuth of T and ignore P. Because stimuli were relatively narrowband and contained no HRTF filtering, participants likely experienced no externalization. Thus, participants had to map the perceived intracranial image to the visual response field. Recent studies showed that head-pointing in a virtual audio-visual environment provides intuitive and reliable judgments of perceived azimuth [e.g., 17].
2.1.2 Experimental Conditions
Four basic conditions were studied. In the NoPrec condition the target was presented without a precursor. The three conditions involving precursors included a central precursor (Diotic, ILD=0 dB), a left-sided precursor (ILD=-10 dB), and a right-sided precursor (ILD=+10 dB). Each of the four conditions was tested with 11 target ILDs, ranging from -10 to +10 dB in steps of 2 dB.
The experiment started with a short training period that contained the conditions NoPrec and Diotic and tested three repetitions per target ILD (i.e., 66 trials in total). After checking the overall response distribution and, if necessary, presenting follow-up instructions, the four blocks of the main experiment started. The first and fourth blocks tested condition NoPrec, using 8 repetitions per target ILD (i.e., 88 trials each). These were intended to measure the extent of lateralization of the target alone both before (Pre) and after (Post) the main experiment. The second block tested conditions NoPrec and Diotic using 8 repetitions per target ILD (i.e., 176 trials in total). The third (main) block tested all four precursor conditions, using 12 repetitions per target ILD (i.e., 528 trials in total). Only the data of the main block are presented here.
The trials within each block were presented in completely randomized order, using a new randomization for each participant. The overall duration of Exp. 1 was about 70 minutes (without breaks).
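The block structure and trial counts given above can be reproduced with a short sketch. The seed, the function name, and the tuple layout of a trial are illustrative assumptions; only the conditions, target ILDs, and repetition counts are taken from the text.

```python
import random

# 11 target ILDs from -10 to +10 dB in steps of 2 dB.
TARGET_ILDS_DB = list(range(-10, 11, 2))

def build_block(conditions, repetitions, rng):
    """Return a fully randomized list of (condition, target_ild_db) trials."""
    trials = [
        (condition, ild)
        for condition in conditions
        for ild in TARGET_ILDS_DB
        for _ in range(repetitions)
    ]
    rng.shuffle(trials)  # new random order for each call
    return trials

# Hypothetical per-participant seed; a new randomization per participant.
rng = random.Random(1)

training = build_block(["NoPrec", "Diotic"], 3, rng)
block_pre = build_block(["NoPrec"], 8, rng)
block_2 = build_block(["NoPrec", "Diotic"], 8, rng)
block_main = build_block(["NoPrec", "Diotic", "Left", "Right"], 12, rng)
block_post = build_block(["NoPrec"], 8, rng)

for name, block in [("training", training), ("pre", block_pre),
                    ("block 2", block_2), ("main", block_main),
                    ("post", block_post)]:
    print(name, len(block))
```

The resulting list lengths correspond to the trial counts stated for the training period and the four blocks.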
2.1.3 Modeling
The results were compared to the prediction of a model of the “internal” target ILD cue that is assumed to be indicative of the perceived azimuth. The BL-ILD model consists of three main stages: 1) A well-established auditory frontend up to the level of the AN [10]; 2) A BL version of two frontends that incorporates efferent feedback via the medial olivocochlear (MOC) reflex [12]. This feedback loop controls non-linear cochlear compression via both an ipsi- and a contralateral path and thereby dynamically adapts the ILD representation over the course of a stimulus and across stimuli; 3) A stage extracting the “effective” internal ILD, obtained by windowing the AN response to the target at the two ears and determining the mean interaural spike rate difference. For the range of ILDs used in the present study (±10 dB), this simple subtraction captures a hemispheric-difference mechanism of localization well [11].
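Stage 3 can be illustrated in isolation with a minimal sketch. The AN frontend and the MOC feedback are not modeled here; the rate traces are synthetic placeholders, the toy sampling rate is 1 kHz (the model itself uses 100 kHz), and the right-minus-left sign convention as well as the function names are assumptions chosen to match the positive-ILD-favors-right convention of the stimuli.

```python
def mean_rate(rate, fs_hz, t_start_s, t_end_s):
    """Mean of a firing-rate trace within the window [t_start_s, t_end_s)."""
    i0 = int(round(t_start_s * fs_hz))
    i1 = int(round(t_end_s * fs_hz))
    window = rate[i0:i1]
    return sum(window) / len(window)

def internal_ild(rate_left, rate_right, fs_hz, t_start_s, t_end_s):
    """Mean interaural rate difference (right minus left) within the target window.

    The result is in the units of the rate traces (e.g., spikes/s), not in dB.
    """
    return (mean_rate(rate_right, fs_hz, t_start_s, t_end_s)
            - mean_rate(rate_left, fs_hz, t_start_s, t_end_s))

# Synthetic example: 600-ms precursor, 10-ms gap, 300-ms target, 90 ms of silence.
fs_hz = 1000
rate_left = [50.0] * 610 + [80.0] * 300 + [0.0] * 90
rate_right = [50.0] * 610 + [120.0] * 300 + [0.0] * 90

# Window placed on the target interval (0.61 s to 0.91 s after precursor onset).
internal = internal_ild(rate_left, rate_right, fs_hz, 0.61, 0.91)
```

In the actual model, the two traces would be the synapse outputs of the left- and right-ear frontends after the precursor has already altered the cochlear gain, which is what makes the extracted value context-dependent.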
Standard model parameters for normal hearing were applied and the output of the synapse model (i.e., without spike generator) at a CF of 4 kHz was used for the predictions. The internal sampling rate of the model was 100 kHz. The ratio of ipsi- vs. contralateral pathway contributions of the MOC mechanism was 2:1 (in dB), based on physiological data [12]. AN fibers with a range of spontaneous rates (SRs) were used (see Section 2.2).
2.2 Results and Discussion
Fig. 1 shows the mean raw lateralization data across participants (including SEs of the mean). The pattern across conditions is reasonably symmetric across hemispheres.
Figure 2. Data from Fig. 1 mirrored across sides and with Left/Right conditions transformed into Ipsi/Contra conditions (see text). Other aspects as in Fig. 1.
Given the hemispheric symmetry of the PEs, the left- and right-side precursor conditions were transformed into ipsilateral (Ipsi) and contralateral (Contra) precursor conditions (e.g., a left-side T and right-side P and vice versa represent the category Contra). The left panel of Fig. 2 shows the transformed data after mirroring across sides. This hemispheric mirroring removed the overall bias and resulted in a completely symmetric pattern at an ILD of zero.
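The relabeling of lateral precursors can be sketched as follows. This is an illustrative reading of the transformation, not the analysis code of the study: each trial is folded onto the right side by flipping the signs of the target ILD and the response for left-side targets, and the precursor is labeled Ipsi or Contra depending on whether it was on the same side as the target. How trials with a target ILD of zero were treated is not specified in the text, so the sketch simply leaves them out.

```python
def to_ipsi_contra(precursor, target_ild_db, response_deg):
    """Fold one trial onto the right side and relabel lateral precursors.

    precursor: one of "NoPrec", "Diotic", "Left", "Right".
    Returns (label, folded_ild_db, folded_response_deg), or None for a
    centered target (sidedness is undefined there in this sketch).
    """
    if target_ild_db == 0:
        return None
    target_sign = 1 if target_ild_db > 0 else -1
    if precursor in ("Left", "Right"):
        precursor_sign = 1 if precursor == "Right" else -1
        label = "Ipsi" if precursor_sign == target_sign else "Contra"
    else:
        label = precursor  # NoPrec and Diotic are side-neutral
    return label, target_sign * target_ild_db, target_sign * response_deg

# A left-side target with a left-side precursor is an Ipsi trial.
example_ipsi = to_ipsi_contra("Left", -6, -40.0)
# A left-side target with a right-side precursor is a Contra trial.
example_contra = to_ipsi_contra("Right", -6, -40.0)
```

Averaging the folded responses per label and folded ILD, and plotting them together with their sign-flipped copies, would give a pattern that is symmetric about an ILD of zero by construction.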
Figure 3. “Internal” ILDs as a function of T ILDs predicted by the BL-ILD model. The internal ILDs are assumed to be indicative of the perceived azimuth. Different panels show results for different AN-fiber types (see text). Standard errors across model repetitions are smaller than the symbols. Other aspects as in Fig. 2.
Without precursors (NoPrec), the extent of lateralization increases almost linearly with increasing ILD, in line with the natural azimuth-ILD pattern in the tested frequency region. The presence of precursors induced shifts in perceived azimuth relative to condition NoPrec (quantified as the precursor effect, PE, defined a priori as a signed difference). An ipsilateral precursor (Ipsi) shifted the perceived T azimuth largely towards the midline (average PE across ILDs ≥ 6 dB: 15.1 degrees), whereas a contralateral precursor (Contra) shifted the perceived T azimuth moderately away from the midline (average PE across ILDs ≥ 6 dB: 4.6 degrees). The ipsilateral PE tends to decrease with decreasing target ILD. The contralateral PE remains constant across the entire target-ILD range. Finally, a centered precursor (Diotic) shifted the perceived azimuth away from the midline for non-zero target ILDs (average shift across ILDs ≥ 6 dB: 8.6 degrees).
Figure 4. Results of Exp. 2, using “transposed noises” providing a Pitch cue (left panel: precursor rate = 125 Hz; target rate = 250 Hz) or no Pitch cue (right panel: precursor rate = target rate = 125 Hz). Other aspects as in Fig. 1.
The effects were supported by a repeated-measures analysis of variance (RM ANOVA), using the factors target ILD, Precursor condition, and Side. Data for an ILD of zero were excluded. All the main effects and the interaction ILD vs. Precursor condition were highly significant (p < 0.001). The other two-way interactions were non-significant (p > 0.11). Tukey’s post-hoc test showed significant differences between all combinations of Precursor conditions (p < 0.001). The interaction was found to be driven by the compressive lateralization function for condition Ipsi.
The results for a lateral precursor are overall consistent with the previously shown pulling of perceived lateralization of ipsilateral targets towards midline, either under purely ILD/ITD-based or free field conditions [5,9].
One remarkable finding of the present data is that they show a pushing effect away from the midline for contralateral targets, up to the most remote target. This clearly differs from the ITD-based PE at low frequencies, where the lateral PE was confined to the ipsilateral hemifield [8]. This point is addressed again in Section 4.2, by comparing Exps. 2 and 3, which used different amounts of precursor laterality. Finally, the expansive effect of the diotic precursor is consistent with low-frequency studies (e.g., [3-4, 8]), and thus appears to be independent of frequency region.
Fig. 3 shows the predictions of the BL-ILD model, using the same format as for the data in Fig. 2. Panels a, b, and c show results for fiber types with low, mid, and high spontaneous firing rates (SRs of 0.8, 10, and 130 Hz, respectively). These differ in their dynamic range and, more importantly, in the slope of their input-output (I/O) functions. At the stimulus level used, these functions are slightly expansive for low-SR, compressive for mid-SR, and very compressive for high-SR fibers, which impacts the predictions. For all three fiber types, the model correctly predicts a shift of lateralization towards the midline for condition Ipsi compared to condition Contra. The amount of this shift depends on the amount of compression of the I/O function, being largest for the most compressive high-SR fibers. For condition Diotic the model correctly predicts expanded lateralization relative to condition NoPrec in the case of compressive (mid- and high-SR) fibers. While none of the three fiber types alone correctly predicts the relation between the interaurally symmetric (NoPrec, Diotic) and asymmetric conditions (Ipsi, Contra), weighted averaging across the three types (panel d) results in a pattern that seems to account for most of the effects shown in Fig. 2. The weighting was chosen “by hand”, attempting to capture the “best” features from each fiber type (weighting coefficients of 0.7/0.2/0.1 for the low-, mid-, and high-SR fibers). In particular, it correctly predicts the consistently larger perceived azimuth for condition Contra compared to condition NoPrec. However, even that “best” combination of fiber types does not predict the slightly larger perceived azimuth for condition Diotic compared to condition Contra.
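The weighted averaging across fiber types can be written down in a few lines. The sketch below uses the weighting coefficients given in the text (0.7/0.2/0.1); the per-fiber prediction values are made-up placeholders rather than model output, and the function and variable names are hypothetical.

```python
# Weighting coefficients for low-, mid-, and high-SR fibers (from the text).
WEIGHTS = {"low": 0.7, "mid": 0.2, "high": 0.1}

def weighted_internal_ild(predictions_by_type, weights=WEIGHTS):
    """Weighted average of internal-ILD predictions across fiber types.

    predictions_by_type maps a fiber type to a list of internal ILDs,
    one entry per target ILD (all lists must have the same length).
    """
    total_weight = sum(weights.values())
    n_points = len(next(iter(predictions_by_type.values())))
    return [
        sum(weights[fiber] * predictions_by_type[fiber][i] for fiber in weights)
        / total_weight
        for i in range(n_points)
    ]

# Placeholder predictions for two target ILDs (not actual model output).
placeholder_predictions = {
    "low": [10.0, 20.0],
    "mid": [20.0, 40.0],
    "high": [30.0, 60.0],
}
combined = weighted_internal_ild(placeholder_predictions)
```

Because the weights were chosen by hand, the same function could also be used to explore how sensitive the combined pattern is to other weightings.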
In summary, these results demonstrate that several (but not all) PEs observed in Exp. 1 can be attributed to peripheral processing up to the level of the AN. It should be noted that the deliberately chosen short gap duration between P and T may have emphasized the contribution of peripheral processing. PEs measured in studies using much longer gap durations of up to 500 ms may not be attributed to peripheral effects as easily (but note that the MOC onset and decay time constants are on the order of hundreds of milliseconds, see [12]).
Evaluation of the effects of different model parameters (not shown) revealed that compression and, particularly, its efferent control are important to obtain the predicted effects. Model versions with no compression at all, or with compression but without its efferent control, predicted the PEs considerably less well.
3. EXPERIMENT 2
Previous studies typically used the same stimulus as target and precursor which may promote their perceptual fusion. As state-of-the-art PE models assume that the contextual localization mechanism operates directly upon the localization information (e.g., [8]), they implicitly assume that the mechanism is independent of perceptual grouping. To the knowledge of the author, this question has not been fully answered yet. To that end, Exp. 2 tested the working hypothesis that introducing a segregation cue (i.e., a pitch difference) between precursor and target causes a release from PEs compared to a configuration without the pitch cue. Interestingly, such a scenario is theoretically possible even if the grouping mechanism operates beyond the PE mechanism, e.g. by means of a top-down feedback loop. Alternatively, if perceptual segregation of the precursor and target does not modulate the PE, it may still affect the behavioral results. For example, perceptual fusion of same-pitch precursor and target may result in merging of the precursor’s and target’s lateralization cues at the level of object formation, particularly if the ILD cues are similar. In that scenario, higher-order object fusion would partly overrule the PE.
Figure 5. Difference of PEs between conditions Pitch and NoPitch from Exp. 2. Error bars indicate mean standard errors.
3.1 Methods
Given the dependence of the PE on the spectral overlap of precursor and target [4-5], it was important to introduce a segregation cue that does not alter their spectral range. For that purpose, the use of so-called transposed-noise stimuli, with different envelope rates for precursor and target, appeared suitable to provide a temporal pitch cue. Thus, in the condition promoting fusion (condition NoPitch), precursor and target stimuli had the same envelope rate (125 Hz), whereas in the condition promoting segregation (condition Pitch) the target had twice the envelope rate of the precursor (250 Hz). These rates approximate the mean fundamental frequencies of male and female voices, respectively. The transposed-noise stimuli were generated by band-pass filtering Gaussian noise at a center frequency of either 125 or 250 Hz (bandwidth: 25 Hz), extracting the envelope and low-pass filtering it at 2000 Hz, and subsequently modulating a 4-kHz pure tone with that envelope signal. The choice of the bandwidth was guided by the additional requirement that in Exp. 3 the same stimuli should provide salient envelope ITD cues.
The Pitch and NoPitch conditions were each tested in all four precursor configurations (as in Exp. 1). Two blocks with the Pitch condition and two blocks with the NoPitch condition were tested in counterbalanced order across participants. Each block included 6 repetitions of each of the 11 target ILDs for each precursor condition, resulting in 264 trials per block. The precursor and target ILDs, as well as all other aspects of the experiment, were as in Exp. 1.
3.2 Results and Discussion
Fig. 4 shows the results for the two pitch cue conditions, showing broadly similar overall patterns of PE. For ease of comparison, Fig. 5 plots the PE difference for condition Pitch minus condition NoPitch. In the case of the diotic and contralateral precursors, the PE tends to be smaller for condition Pitch (average PE difference for nonzero target ILDs: -2.3 and -2.5 degrees, respectively). In the case of the ipsilateral precursor, however, the PE appears to be clearly larger for condition Pitch (average PE difference for nonzero target ILDs: 6.4 degrees).
The statistical analysis first considered the effects for the two pitch cue conditions separately. For condition Pitch, an RM-ANOVA (combined with Tukey’s post-hoc comparisons using the 5% criterion) showed significant differences between all combinations of the factor Precursor condition except between NoPrec and Contra. Thus, with the pitch cue, the contralateral precursor had no effect. The analogous analysis for condition NoPitch showed significant differences between all combinations of precursor conditions.
For direct comparison across the pitch conditions, an RM-ANOVA was conducted using the factors Pitch cue, Precursor condition, and Target ILD, showing significant main effects of all factors (p ≤ 0.016), a significant interaction of Pitch cue vs. Precursor condition (p < 0.001) and of Pitch cue vs. Precursor condition vs. Target ILD (p = 0.01). To find out which of the PEs changed with the Pitch condition, separate RM-ANOVAs were performed for the three precursor conditions, each paired with condition NoPrec, evaluating the interaction Pitch cue vs. Precursor condition. The interaction was non-significant for the pairs Diotic/NoPrec (p = 0.071) and Contra/NoPrec (p = 0.066), but highly significant for the pair Ipsi/NoPrec (p ≤ 0.001). An RM-ANOVA on the target alone (condition NoPrec) showed a significant main effect of the Condition (i.e., target envelope rate, p < 0.001) but no interaction with the target ILD (p = 0.71).
To summarize, the analyses revealed that introducing a pitch cue caused a larger ipsilateral PE and more lateralization of the target alone compared to the condition without a pitch cue. The stronger lateralization with an envelope rate of 250 Hz relative to 125 Hz is qualitatively consistent with [11], which showed increasing ILD sensitivity with increasing envelope rate from 100 to 400 Hz (which is also predicted by the model, not shown).
The main finding of this experiment was that introducing a pitch cue increased the ipsilateral PE, but had no significant effect on the contralateral and diotic PEs. The lack of a smaller PE for the pitch-cued condition thus contradicts the working hypothesis that segregation of precursor and target causes a release from the PE. The finding of an increased ipsilateral PE with a segregation cue appears consistent with the idea that in the NoPitch condition perceptual fusion of the precursor and target may have caused merging (averaging) of their lateralization cues at the object formation stage. The averaged azimuthal cue would be closer to the precursor azimuth, thus reducing the PE for condition NoPitch and thereby increasing the PE difference to condition Pitch.
Comparison with Exp. 1 reveals better agreement of the ipsilateral PE in the case of the Pitch condition. The fact that Exp. 1 also provided a segregation cue (but in a different form, namely a higher upper cutoff frequency for P compared to T) fits the described relation between perceptual segregation and the ipsilateral PE.
[6] reported that contextual localization effects are stronger if precursor and target are perceptually more similar, which appears inconsistent with the conclusion from the current experiment. However, the stimulus modifications controlling perceptual similarity in [6] altered the spectro-temporal spread of the stimuli, which potentially invokes effects related to the spectral specificity of the PE [4]. The temporal pitch cue in the current study was intentionally designed to keep the spectro-temporal stimulus extent constant and, thus, to avoid such a potential confound.
Figure 6. Results of Exp. 3, using “transposed noises” for both P and T (precursor rate = target rate = 125 Hz). Format is analogous to Fig. 2. Left panel: ILD-based P (±6 dB); right panel: ITD-based P (±2 ms). Other aspects as in Fig. 1.
4. EXPERIMENT 3
The first goal of Exp. 3 was to readdress the question of binaural-cue versus location specificity of the PE for high-frequency stimuli. While data with low-frequency stimuli showed transfer of the PE across binaural cues, suggesting location rather than cue specificity [13], the interpretation of that result regarding the locus of the PE (more peripheral or more central) is somewhat ambiguous. Recent physiological data demonstrated an adaptation mechanism at the binaural cue extraction stages (Medial and Lateral Superior Olives, MSO and LSO) that appears to be capable of explaining the PE. [8] stated “Such adaptation effects should influence the neuronal coding of acoustic space in the ITD and ILD pathway similarly and thus the perception of space”, thus proposing location specificity of the PE. To the knowledge of the author, location specificity has not yet been shown psychophysically for high-frequency stimuli. Exp. 3 therefore tested the working hypothesis that the PE does not (or not fully) translate across binaural cues. Confirming this hypothesis would suggest either that the properties of the short-term adaptation mechanism in the LSO for high frequencies are different for ILD and envelope-ITD cues or that some higher-order cue-specific processes are involved.
The secondary goal of Exp. 3 was to evaluate whether an ipsilateral precursor that is more central than the target pushes the perceived target azimuth to the side or pulls it to the center (as more lateral precursors do). This should make it possible to distinguish a high-level repulsion mechanism from an adaptation mechanism.
4.1 Methods
The PEs were compared between ILD- and envelope-ITD-based precursors that were matched in perceived azimuth. In order to provide salient envelope-ITD cues (referred to as ITD cues), precursor and target were composed of transposed noises with an envelope rate of 125 Hz, as used in the NoPitch condition of Exp. 2. To identify perceived-azimuth-matched binaural cues, a pretest measured the perceived azimuth of a range of ILDs and ITDs mixed in one block of trials (ILDs from -8 to +8 dB; ITDs from -2 to +2 ms; 8 repetitions each). A good match was found between ILDs of ±6 dB and ITDs of ±2 ms. ITDs were imposed on the entire waveforms. The target ILDs were the same as in Exps. 1 and 2.
The four precursor conditions from Exps. 1 and 2 were tested. Conditions Diotic and NoPrec were identical to the NoPitch condition of Exp. 2 and were included for evaluating the comparability of experiments 2 and 3. Two blocks with the ILD condition and two blocks with the ITD condition were tested in counterbalanced order across participants. Each block included 6 repetitions of each of the 11 target ILDs for each precursor condition, resulting in 264 trials per block. All other aspects of the experiment were as in Exps. 1 and 2.
Figure 7. Difference of PEs between conditions ILD and ITD from Exp. 3. Error bars indicate mean standard errors.
4.2 Results and Discussion
Fig. 6 compares the results for ILD-based (left panel) vs. ITD-based (right panel) precursors. The pattern of PEs appears to be similar between these two conditions, with one exception: for condition Contra, the small expansive PE for the ILD condition completely disappears in the ITD condition, turning into a compressive effect, like for the ipsilateral precursor. For ease of comparison, Fig. 7 plots the PE difference for condition ILD minus condition ITD. In the case of Diotic and Ipsi precursors, the PE difference does not systematically deviate from zero. For the Contra precursors, however, the PE difference is clearly positive, indicating a largely reduced PE for the ITD condition (average PE difference for nonzero target ILDs: 8.7 degrees).
The statistical analysis was analogous to that of Exp. 2, considering first the effects for the two binaural cue conditions separately. For condition ILD, an RM-ANOVA combined with Tukey’s post-hoc comparisons showed significant differences between all combinations of precursor conditions except for NoPrec vs. Contra. This differs from the NoPitch condition from Exp. 2, where a significant contralateral PE was found. Because the only stimulus difference was the smaller precursor ILD in the present experiment (6 dB) compared to Exp. 2 (10 dB), the loss of the contralateral PE appears to be attributable to the smaller precursor laterality. The analogous analysis for condition ITD showed significant differences between all combinations except for Ipsi vs. Contra. This confirms that in the ITD condition the Ipsi and Contra precursors had the same effect.
For direct comparison across the binaural-cue conditions, an RM-ANOVA was conducted using the factors Binaural cue, Precursor condition, and Target ILD, showing significant main effects of all factors (p < 0.001), a significant interaction of Binaural cue vs. Precursor condition (p < 0.001) and of Binaural cue vs. Precursor condition vs. Target ILD (p = 0.003). To find out which of the PEs changed with the Binaural cue condition, separate RM-ANOVAs were performed for the three precursor conditions, each paired with the NoPrec condition, and evaluating the interaction Binaural cue vs. Precursor condition. This interaction was non-significant for the pairs Diotic/NoPrec (p = 0.26) and Ipsi/NoPrec (p = 0.75), but highly significant for the pair Contra/NoPrec (p < 0.001).
On the one hand, the ipsilateral PE was similar in size for the ITD- and ILD-based precursors. This indicates a translation of the effect of an ipsilateral ITD-based precursor to an ILD-based target. On the other hand, the contralateral PE clearly differed between ITD- and ILD-based precursors. In case of the ITD-based precursor, the contralateral precursor had a compressive effect that resembled the effect of the ipsilateral precursor (more explicitly: both the ipsi- and contralateral ITD-precursors caused compression). In contrast, in case of the ILD-based precursor, the contralateral precursor had no significant effect at all (neither expansive nor compressive). As a check of reliability, the diotic PE was well reproducible when tested within blocks of either ILD-based or ITD-based precursors. Because the ILD- and ITD-based precursors were matched in perceived azimuth, the results suggest that the PE mechanism at high frequencies is at least partly binaural-cue specific and, thus, not completely location specific. Although this outcome seems to differ from outcomes at low frequencies, where location specificity has been reported [13], a direct comparison is complicated by differences in methodology.
The secondary goal of Exp. 3 was to determine the potential impact of an ipsilateral ILD-based precursor being either more medial or more lateral than a target. This was achieved by comparing data from Exps. 2 and 3, which used large and moderate-sized precursor ILDs, respectively. Consistent with the findings of [14] testing low-frequency ITD-based PEs, the PE was always compressive, both for more lateral and for more medial target locations relative to the precursors. This supports the general idea that some type of adaptation underlies the PE, rather than a high-level repulsion mechanism as proposed, e.g., in [3]. Interestingly, the PE for the more medial precursor (Exp. 3) was confined to the precursor hemisphere while the PE for the more lateral precursors (Exps. 1 and 2) spread to the contralateral hemisphere. This differs from results for ITD-based low-frequency stimuli where the opposite pattern was observed [8], suggesting differences of the mechanisms in the two frequency regions.
5. OVERALL CONCLUSIONS
This study used an auditory lateralization paradigm employing visual-proprioceptive feedback to characterize short-time contextual localization effects based on high-frequency ILD cues. Participants indicated the perceived azimuth of a target stimulus in isolation or immediately following different types of task-irrelevant precursor stimuli. The contribution of nonlinear peripheral auditory processing and of higher-order mechanisms to the observed precursor effects (PEs) was studied.
Consistent with previous studies that mainly focused on ITD-based PEs, Exp. 1 showed systematic response compression of targets following an ipsilateral precursor. A new finding was that the lateral PE spread across the entire contralateral side, where it caused response expansion. In a recent low-frequency ITD-based study [8] this spread was much more restricted in lateral extent, which seems to suggest some difference in the underlying adaptation mechanism in low- vs. high-frequency regions. A diotic (centered) precursor was found to expand the perceived azimuth of lateral targets, as observed previously in both low- and high-frequency regions.
It was hypothesized that at least some of the PEs might be predicted by a model providing a realistic (physiology-based) representation of the effective ILD cues conveyed by the AN signals at the two ears. Indeed, with appropriate model configurations, the BL-ILD model predicted the expansive effect of a diotic precursor, the compressive effect of an ipsilateral precursor, and even the expansive effect of a contralateral precursor. Although not described in detail here, nonlinear cochlear amplification and its dynamic ipsi- and contralateral control via an efferent feedback mechanism [12] were critical for obtaining reasonable model predictions. Given the dynamic nature of efferent compression control, it would be interesting to explore the model’s ability to predict PEs for various temporal configurations of P and T (particularly regarding their separation). Within the larger context of modeling the PE (see [8]), future studies should further characterize under which conditions and to what extent peripheral processing (up to the AN) contributes to observed effects.
One limitation of the BL-ILD model is that its prediction of PEs depends on the selection and weighting of individual AN-fiber types which largely differ in their I/O function characteristics. It is currently not clear if and how the central auditory system selects relevant fibers at a given stimulus level. Addressing this question by means of modeling would require data for a variety of conditions (e.g., different stimulus levels and ILD configurations).
Exp. 2 addressed the role of perceptual segregation between precursor and target. Given the indications that the PE is mediated by a relatively early mechanism (probably not beyond brainstem binaural processing stages), an interesting question is how auditory grouping mechanisms interpret the peripherally adapted (and thus shifted) lateralization cues in the process of object formation. By comparing conditions with and without a salient temporal-pitch cue, meant to segregate T from P, Exp. 2 tested the working hypothesis that strong perceptual segregation of precursor and target deactivates or at least mitigates the PE. In fact, the results showed no indication of a release from the PE when providing the segregation cue. This supports the view that the PE is a quite peripheral effect that cannot be modulated by higher-order auditory object-formation mechanisms.
Instead, the presence of a segregation cue even increased the PE for the ipsilateral precursor, while it had no significant effect for the contralateral and diotic precursors. An interesting explanation for the stronger ipsilateral PE in the pitch-cue condition could be that, in the condition without pitch cue, perceptual fusion of precursor and target merges (i.e., averages) their lateralization cues, which would work against the PE (and thus increase the PE difference between conditions with vs. without the pitch cue). This explanation would be consistent with the overall view that monaural features determine the grouping (linking) of objects and, therefore, the allocation of spatial cues to objects (e.g., [15]). Future studies might further address this idea by testing the effects of gradual manipulation of precursor-target similarity on the PE. A general implication of that explanation is that measurements of the duration of the binaural window (e.g., [16]) could be affected by target-interferer similarity.
Exp. 3 further characterized the PE for high-frequency stimuli by examining its binaural-cue specificity. Given that both envelope ITDs and ILDs are processed at the LSO, where adaptation feedback loops have also been found [8], it is not unreasonable to assume that the PE of an ILD-based precursor resembles that of an ITD-based precursor matched in perceived azimuth (as reported for low frequencies [5]). The ipsilateral PE was indeed found to be similar across conditions, i.e., translating across binaural cues (ILD being the target cue in both conditions). However, in contralateral configurations, the expansive effect of an ILD-based precursor was shown to contrast with the compressive effect of an ITD-based precursor. This suggests some cue specificity in the PE for high-frequency stimuli, which may be related to the differential contributions of peripheral mechanisms to ILD vs. envelope-ITD coding (see also [3]).
Finally, the results may have practical relevance. In everyday communication situations, sound sources often compete with each other, resulting in moment-to-moment switching between them at time constants approximated by the precursor and target stimuli used here. The current results suggest context-dependent shifts in source lateralization of up to 50 percent of the azimuth, particularly for highly segregated sources such as speakers of different gender.
6. ACKNOWLEDGEMENTS
The author thanks Beth Strickland and Michael Heinz for providing the Matlab code of the model described in [12]. The author also thanks Michael Mihocic for implementing several elements of the audio-visual environment.
7. REFERENCES
[1] C. Stecker and E. Gallun: Binaural Hearing, Sound Localization, and Spatial Hearing. In Tremblay & Burkard: Translational perspectives in audit. neurosc. (pp. 387–438), Plural, San Diego, 2012.
[2] J. Flugel: On local fatigue in the auditory system, Br. J. Psych. 11, 105–134, 1921.
[3] W. Thurlow and C. Jack: Some determinants of localization-adaptation effects for successive auditory stimuli, J. Acoust. Soc. Am. 53, 1573-1577, 1973.
[4] M. Kashino and S. Nishida: Adaptation in the processing of interaural time differences revealed by the auditory localization aftereffect, J. Acoust. Soc. Am. 103, 597-604, 1998.
[5] D. Phillips and S. Hall: Psychophysical evidence for adaptation of central auditory processors for interaural differences in time and level, J. Acoust. Soc. Am. 202, 188-199, 2005.
[6] L. Hladek, B. Tomoriova, and N. Kopco: Temporal characteristics of contextual effects in sound localization, J. Acoust. Soc. Am. 142, 3288–3296, 2017.
[7] R. Dingle, S. Hall, and D. Phillips: The three-channel model of sound localization mechanisms: interaural time differences, J. Acoust. Soc. Am. 133, 417-424, 2013.
[8] A. Lingner, M. Pecka, C. Leibold, and B. Grothe: A novel concept for dynamic adjustment of auditory space, Sci. Rep. 29 (8), 8335, 2018.
[9] J. Dahmen, P. Keating, F. Nodal, A. Schulz, and A. King: Adaptation to stimulus statistics in the perception and neural representation of auditory space, Neuron 24, 937-48, 2010.
[10] M. Zilany, I. Bruce, P. Nelson, and L. Carney: A phenomenological model of the synapse between the inner hair cell and auditory nerve: long-term adaptation with power-law dynamics, J. Acoust. Soc. Am. 126, 2390-2412, 2009.
[11] B. Laback, M. Dietz, and P. Joris: Temporal effects in interaural and sequential level difference perception, J. Acoust. Soc. Am. 142, 3267–83, 2017.
[12] C. Smalt, M. Heinz, and E. Strickland: Modeling the time-varying and level-dependent effects of the medial olivocochlear reflex in auditory nerve responses, J. Assoc. Res. Otol. 15, 159-173, 2013.
[13] D. Phillips, M. Carmichael, and S. Hall: Interaction in the perceptual processing of interaural time and level differences, Hear. Res. 211, 96-102, 2006.
[14] D. Phillips, E. Mew, and S. Hall: Selective adaptation in sound lateralization is not due to a repulsion effect, J. Acoust. Soc. Am. 136, EL424, 2014.
[15] V. Best, F. J. Gallun, S. C., and B. Shinn-Cunningham: Binaural interference and auditory grouping, J. Acoust. Soc. Am. 121, 1070-6, 2007.
[16] A. Kolarik and J. Culling: Measurement of the binaural temporal window using a lateralisation task, J. Acoust. Soc. Am. 121, 248, 60-68, 2009.
[17] M. Klingel, N. Kopco, and B. Laback: Reweighting of Binaural Localization Cues Induced by Lateralization Training, J. Assoc. Res. Otol. (under revision).