Effect of Reverb on Speech Perception
Class: Perception and Cognition
Professor: Peter Zhang
Written By John Garretson
Additional Group Members: Joshua Rivkin, Ryan Cranfill
Abstract
The effect of reverb on the perception of speech was tested at Columbia College Chicago’s acoustic modeling lab. Test samples were provided by the instructor, and reverb was added to them using the AIR Non-Linear Reverb plug-in in Pro Tools. A simple test program was created to play the files and record each participant’s results. The results showed little difference in perception across the levels of reverb added to the test samples, but certain words were more sensitive to any amount of reverb. The most sensitive word was “eight”, which was never correctly identified.
Joshua Rivkin, Ryan Cranfill, and John Garretson tested listeners on their ability to correctly identify words when differing amounts of reverb were added to the recordings they heard. Testing was performed in the Modeling Lab, room LL12, of the Audio Arts and Acoustics department in Columbia College Chicago’s 33 E. Congress building on May 3, 2011, from approximately 12:30 to 3:00 p.m., with 11 participants in total.
2. Methods
2.1 Recording Samples
A library of test recordings was provided by the course instructor, Peter Zhang. Each recording followed the format of a male speaker saying “Ready (name) go to (color) (number) now”. The names used in the test files were Charlie, Ringo, Laker, Hopper, Arrow, Tiger, Eagle, and Baron. The colors were blue, red, white, and green, and the numbers were limited to 1-8.
2.2 Adding Reverb to Samples
Three separate amounts of reverb were added to all test samples using the AIR Non-Linear Reverb plug-in in Pro Tools. The first level had 300 milliseconds of reverb, the second 400 milliseconds, and the third 600 milliseconds. All three levels used a wet mix of 100% on the reverb plug-in. A screen capture of the plug-in is shown in Figure 1.
Figure 1 Screen capture of the Pro Tools reverb unit used to create test samples
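To make the size of the stimulus space concrete, the sketch below enumerates every phrase the sample format allows and pairs each one with the three reverb times. It assumes the library contains every name, color, and number combination, which the report does not state. The names NAMES, COLORS, NUMBERS, REVERB_MS, and phrase are illustrative only.

```python
from itertools import product

# Vocabulary taken from the report; the container names are hypothetical.
NAMES = ["Charlie", "Ringo", "Laker", "Hopper", "Arrow", "Tiger", "Eagle", "Baron"]
COLORS = ["blue", "red", "white", "green"]
NUMBERS = list(range(1, 9))   # 1-8
REVERB_MS = [300, 400, 600]   # reverb times, all at 100% wet mix


def phrase(name, color, number):
    """Build the spoken sentence for one name/color/number combination."""
    return f"Ready {name} go to {color} {number} now"


# Assumption: every combination exists in the library (not confirmed by the report).
phrases = [phrase(n, c, k) for n, c, k in product(NAMES, COLORS, NUMBERS)]
stimuli = [(p, ms) for p in phrases for ms in REVERB_MS]

print(len(phrases))  # 8 * 4 * 8 = 256 possible phrases
print(len(stimuli))  # 256 * 3 = 768 phrase/reverb pairs
```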
2.3 Test Program and Procedures
Using Visual Studio 2008, a Windows Form was created that plays a series of 15 randomly selected test samples from the bank of samples that had reverb added to them: five with 300 ms, five with 400 ms, and five with 600 ms of reverb. The participant can replay the current test sound as many times as needed to make a determination, and the program records the number of plays each participant required for each test sample. After the participant plays a test sample for the first time, a set of radio button lists appears below the Play button with all possible names, colors, and numbers from the test samples. The participant then selects the name, color, and number they believe they heard and presses the Submit button below the radio button lists. Once the Submit button is pressed, the program compares the answers to the correct answers, records the results as a string (which includes whether the participant was right or wrong for the name, color, and number, how many times they listened to the sample, and what the correct answers were), and then randomly selects the next test sample to play. The process repeats until all 15 trials have occurred, at which point the program saves the results string to disk as a text file. A screenshot of the test can be seen in Figure 2 and the results file in Figure 3.
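The selection of 15 trials, five per reverb level, can be sketched as follows. This is a Python illustration rather than the original Windows Forms code. It assumes the samples for each level are kept in separate lists and that no sample repeats within a session, neither of which the report confirms. The function build_schedule and the file names are hypothetical.

```python
import random


def build_schedule(bank, per_level=5, rng=None):
    """Pick `per_level` samples from each reverb level and shuffle the order.

    `bank` maps a reverb time in ms to the list of sample identifiers
    available at that level (an assumed layout, not the original program's).
    """
    if rng is None:
        rng = random.Random()
    trials = []
    for reverb_ms, samples in bank.items():
        # Sampling without replacement is an assumption.
        for sample in rng.sample(samples, per_level):
            trials.append((reverb_ms, sample))
    rng.shuffle(trials)  # randomize presentation order across levels
    return trials


# Hypothetical file names standing in for the processed recordings.
bank = {ms: [f"sample_{i:03d}_{ms}ms.wav" for i in range(20)] for ms in (300, 400, 600)}
schedule = build_schedule(bank, rng=random.Random(0))
print(len(schedule))  # 15
```

Passing a seeded random.Random makes a session reproducible, which can help when checking the program against a known trial order.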
Figure 2 Screenshot of the test program used to measure the effect of reverb on speech, showing the test sample format, the Play and Submit buttons, and the radio button lists used to make selections.
Figure 3 Screenshot of the results stored as a text file. The recorded data includes the test number for each of the participant’s trials, followed by whether their answers for the Name, Color, and Number sections were correct or incorrect, the number of plays the participant gave to each of the fifteen trials, and finally the correct answers for that particular trial.
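The scoring and logging step could look like the sketch below. The field order follows the description of Figure 3, but the exact text layout of the original results file is not given, so the separators and labels here are assumptions, as are the names score_trial and format_result.

```python
def score_trial(selected, correct):
    """Compare the selected (name, color, number) with the correct triple.

    Returns a tuple of three booleans, one per field.
    """
    return tuple(s == c for s, c in zip(selected, correct))


def format_result(trial_no, selected, correct, plays):
    """Render one trial as a line of text (assumed layout, not the original's)."""
    marks = ["Correct" if ok else "Incorrect" for ok in score_trial(selected, correct)]
    name, color, number = correct
    return (f"Test {trial_no}: Name {marks[0]}, Color {marks[1]}, Number {marks[2]}; "
            f"Plays {plays}; Answer {name} {color} {number}")


# Made-up example trial: the listener chose Arrow when Baron was spoken.
line = format_result(1, ("Arrow", "white", 8), ("Baron", "white", 8), plays=2)
print(line)
```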
3. Results
The percentage of incorrect and correct responses across all of the testing can be seen in Figure 4. The percentages are taken out of 495 total responses: 11 participants multiplied by 15 trials each, with each trial consisting of three answers.
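The response total follows directly from the design, and the percentages in Figure 4 are simple ratios against it. In the sketch below the incorrect count is an argument rather than a value, since the report does not give the raw tallies. The helper percent_split is hypothetical.

```python
PARTICIPANTS = 11
TRIALS_PER_PARTICIPANT = 15
ANSWERS_PER_TRIAL = 3  # name, color, number

total_responses = PARTICIPANTS * TRIALS_PER_PARTICIPANT * ANSWERS_PER_TRIAL
print(total_responses)  # 495


def percent_split(incorrect, total=total_responses):
    """Return (percent incorrect, percent correct) for a given incorrect count."""
    pct_incorrect = 100.0 * incorrect / total
    return pct_incorrect, 100.0 - pct_incorrect
```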
Figure 4 Percentage of incorrect and correct answers from the 495 total responses.
The distribution of incorrect answers for each level of reverb can be seen in Figure 5 below.
Figure 5 Percentage of incorrect answers out of the total amount of incorrect answers that each level of reverb test waves produced
The percentage that each individual answer represented in the total amount of incorrect answers is shown in Figure 6.
Figure 6 Percent of incorrect answers that each possible answer represents
The percent that a specific response was incorrectly identified when tested for audibility is shown here in Figure 7.
Figure 7 The percent that a response was incorrectly identified when it was tested
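Figures 6 and 7 report two different per-word measures: the share of all incorrect answers attributable to a word, and the rate at which a word was missed when it was the correct answer. The sketch below computes both from a list of (correct word, identified correctly?) records. The records are invented toy data for illustration and are not the study's results. The names error_share and error_rate are hypothetical.

```python
from collections import Counter


def error_share(records):
    """Percent of all incorrect answers that each word accounts for (Figure 6 style)."""
    misses = Counter(word for word, ok in records if not ok)
    total_misses = sum(misses.values())
    if total_misses == 0:
        return {}
    return {w: 100.0 * n / total_misses for w, n in misses.items()}


def error_rate(records):
    """Percent of presentations of each word that were misidentified (Figure 7 style)."""
    shown = Counter(word for word, _ in records)
    misses = Counter(word for word, ok in records if not ok)
    return {w: 100.0 * misses[w] / shown[w] for w in shown}


# Toy data only: (correct word, identified correctly?)
records = [("eight", False), ("eight", False),
           ("white", False), ("white", True), ("white", True), ("white", True),
           ("blue", True), ("blue", True)]

print(error_share(records))  # eight holds 2 of 3 misses, white 1 of 3
print(error_rate(records))   # eight 100%, white 25%, blue 0%
```

The two measures use different denominators, so a word that is presented only a few times can account for a modest share of all errors while still having a 100% error rate. That is why the two figures can rank words differently.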
The results in Figure 4 show that none of the three levels of reverb added to the test waves made the majority of the speech unintelligible. Figure 5 shows that even though the amount of reverb differed across the three levels (300 ms, 400 ms, and 600 ms), these changes in reverb length had a negligible effect on the results. This could be because the change in reverb time between levels was not drastic enough, or because the lowest level of reverb, 300 ms, was already too long a starting reverb time. Figure 6 shows that the most misidentified words were Baron, white, eight, and Laker, in that order. It makes sense that “white” and “eight” were misidentified a similar number of times because of the similar sounds in their pronunciations. Baron was the most misidentified word on the list, and one possible explanation is how closely “Baron” sounds like “Arrow”. While Arrow was not misidentified as many times as Baron, Arrow was closer to the top of the list of responses. Anyone reading the list from the top down would therefore always come across Arrow first, which might lead to a trend of selecting Arrow without continuing further down the list.
The most interesting thing the data shows can be seen in Figure 7: the 100% incorrect response rate for the word “eight”. Every time “eight” was the correct response for a test sample, it was misidentified. While the word “white” is very similar, and one could expect similar results, it was misidentified only about 30% of the time. This could be due to the distinctive “w” sound at the beginning of “white”, or because the color section had only four possible choices, compared to the name and number sections, which each had eight possible responses.
5. Conclusion
Testing showed that all three levels of reverb had approximately the same effect on speech recognition. The results also showed that certain words were more sensitive to reverb than others, and that most of the words were still discernible at all three levels of reverb. By far the word most affected by the reverb was “eight”, which was misidentified 100% of the time.