Listening Test

  • Thread starter: NL5
I would call that a success. Several people separated the tracks by the dithering. Like I said, too bad there weren't more votes. You can't really make a determination from 4 people.

Now maybe Ethan will go after the MP3 myth! :eek: ;)
 
It's funny how many people will say they can pick out dither on a track easily and then when it comes to testing, they don't even reply. :rolleyes:
 
The dithered tracks sound a bit smoother for these samples.

Eck
 
I think very few people here have a proper listening environment or great converters. It does make a difference in these kinds of tests. We are talking about some pretty subtle differences that an average listener wouldn't hear individually, but they do add up.

True, I think that in order to perform an accurate test scenario you need to account for the above. An ABX test would be a better way of doing this:

http://en.wikipedia.org/wiki/ABX_test
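
One common way to score an ABX run is a one-sided binomial test: how likely is it to get at least this many trials right by pure guessing? The sketch below is only an illustration of that arithmetic; the function name `abx_p_value` and the example trial counts are made up here, not something anyone in the thread ran.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` of `trials` ABX trials right
    by guessing alone (each trial is a 50/50 coin flip)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more hits, out of 2**trials outcomes.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# Hypothetical runs, for illustration only.
print(abx_p_value(4, 4))              # 0.0625
print(round(abx_p_value(12, 16), 4))  # 0.0384
```

Note that even a perfect 4 out of 4 comes out at 0.0625, above the usual 0.05 cutoff, so a short run cannot show much either way.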

It can be very difficult to tell a difference when one doesn't know what one is listening for. The simple fact that folks here did hear a difference is proof enough that there are differences but actually identifying which is which without a reference point is a best guess.
 
I'm gonna do another one.
It can be very difficult to tell a difference when one doesn't know what one is listening for.
One of the reasons I did not participate in this test was because test subjects are going in with a bias; they know exactly what they are being given, they just don't know the order. This induces a bias on their part that says something like "OK, I know there are two 128k MP3s in there somewhere, let me find those first because they are the easiest to pick out, that'll leave four that I know are this and this, etc."

This is why double blind with a control group is the only true way to test. I'd suggest in the next test not mentioning exactly how many of what you have, or even exactly what kinds you have. And include some control groups for placebo effect checking. Then just have the test subjects rate the relative quality between them. Rather than have them judge, "I think this one is BrandA and that one is BrandB," just have them say, "I think #1 is a bit ______ sounding than #2."

And finally, each tester really should receive the samples in a different randomized order in order to eliminate both cheating (who, US??? NEVER? :D ) and, maybe more importantly, bias caused by order of presentation (a radish tastes different after eating a slice of pizza than it does after taking a bite of taco.)

I know all this sounds really anal, but such testing procedure is what is needed (along with a much larger test sample, of course) to really make sure we are testing the files and not testing the test itself.

G.
 
How could we supply them in random order? Even worse, how would we know what order they got them in?

I'm thinking of doing everything you said, except for the random order thing. Maybe have people PM their guesses, so there won't be bias based on previous guesses.
 
Also, maybe we can track down a great 24 bit recording - mine are not the greatest........
 
I would suggest supplying a known dithered and known truncated version along with several unknown versions marked A, B, C, etc.

From there people have to guess if each A, B, C, etc. is dithered or not.

The audio sample should also include a good recording with dynamic range and plenty of ambience that hasn't been dithered or truncated previously. A Metal tune cranked to -3 dBFS average isn't a good test for this sort of thing since noise and harmonic distortion are kinda part of the product and also produced by severe limiting.
 
The random order thing is not difficult to program onto a web page at all. It's not unlike the quotes that I have on the front of the IRN website; if you notice, every time you go to or even refresh the page, a new random quote is displayed. The same thing can be done with links to the various sound clips.

As far as keeping track, what I'd do is set up a testing form that the tester fills out, picking the order of the clips as they like them by clicking on a radio button selection or selecting a value from 1 to 10 or something like that, along with some room for some comment lines where they can say that this one felt itchy or that one smelled like onions or whatever (;)). When they hit the "Submit" button at the end, the form will automatically submit back to you the randomized clip order along with the tester's answers, so that their answers will automatically be associated with the right clip.

Another advantage of doing it that way is that the data can automatically be stored in a database, so if you get, say 100 or 1000 or whatever replies, you can have the computer automatically calculate and display results totals for the survey without having to go through each entry manually and adding them up one-by-one.
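
To make the tabulation idea concrete, here is a rough Python sketch. It assumes each submission carries the clip order that tester was shown plus their 1 to 10 ratings in that same order; the field names `order` and `scores`, the clip IDs, and the ratings are all invented for illustration.

```python
from collections import defaultdict

def tabulate(submissions):
    """Average the ratings per clip ID.

    Position i of `scores` belongs to clip `order[i]`, so each rating is
    mapped back to the right clip no matter what order the tester heard."""
    ratings = defaultdict(list)
    for sub in submissions:
        if len(sub["order"]) != len(sub["scores"]):
            raise ValueError("order and scores must have the same length")
        for clip_id, score in zip(sub["order"], sub["scores"]):
            ratings[clip_id].append(score)
    return {clip_id: sum(v) / len(v) for clip_id, v in sorted(ratings.items())}

# Hypothetical submissions; none of these numbers come from a real test.
submissions = [
    {"order": ["dithered", "truncated", "mp3_128"], "scores": [8, 6, 3]},
    {"order": ["mp3_128", "dithered", "truncated"], "scores": [4, 7, 7]},
]
print(tabulate(submissions))
# {'dithered': 7.5, 'mp3_128': 3.5, 'truncated': 6.5}
```

Because the order travels with the answers, nobody has to work out by hand which clip a tester meant by "#2".
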
masteringhouse said:
The audio sample should also include a good recording with dynamic range and plenty of ambience that hasn't been dithered or truncated previously. A Metal tune cranked to -3 dBFS average isn't a good test for this sort of thing since noise and harmonic distortion are kinda part of the product and also produced by severe limiting.
I agree that different kinds of program material need to be used, absolutely. Since one of the claims made here and elsewhere is that dither is program-dependent, that is one of the parameters that should be tested. I don't know that I'd necessarily leave pancaked metal out, however; I'd leave it in there as well just to be able to demonstrate that very point.

G.
 
There would need to be a box so we could put our name or email address in along with our answers. :)

Eck
 
Yep, if you wanted to find out just how you did. However, I'd recommend that the tester not send those results out until after the test has been closed. Everybody needs to get their results back at the same time, at the end, even if the files are randomized. There will always be attempts to compare results or to inadvertently "cheat" that would skew the results.

G.
 
I dunno about picking samples from a web page though. Personally I would load the samples into my workstation for critical listening using my converters. It's one of the reasons I didn't participate before, it takes a good amount of time away from "real work".

I think just having a sampling of several, and possibly setting up a poll here would do the trick. After the poll has been up for a bit, someone can post the correct answers. Much like dither tests I've seen on other sites.
 
I dunno about picking samples from a web page though.
Not streamed samples, Tom, downloads. They have to be distributed somehow, right? :) I don't know about you, but I don't want to pay to burn data CDs to mail out to everybody ;)

There is still a big problem with that in that there will always be those that run forensic analyses on the files instead of just playing the game honestly and letting their ears do the talking, but there's not a whole lot that can be done about that that I can see. One's just gotta hope that the percentage of respondents who do that won't be enough to skew the results too badly.

Public polls like on these forums are useless for anything other than entertainment value or striking up conversations. Being able to see the results and the discussions before one votes prejudices the upcoming vote, whether consciously or not. This is why the media take such heat (and rightly so) for using exit polls during elections and talking about the results before the polls close.

G.
 
Understood, but to randomize and download this way there would have to be a database entry for each participant mapping the versions of downloads they got, then using these mappings for their answers after they had a chance to listen on the system of their choice, and tabulate based on that.

Phew, it's a decent sized project for something like this. Do we really have to go that far?
 
It's not that huge of a project, really. It's really a simple flat file database using the username or e-mail address as the key field. They go to the download page, which requires them to put in a username or e-mail addy (depending upon how anonymous they wish to remain) and then displays or d/ls the files in a random order (it's one line of PHP to do the randomization). In the background the software automatically creates a new record in the flat file that has the user ID and the file order.

Then when they come back, they log in under the same ID, and they have a choice: do some more d/ling (in case they didn't get all the files cleanly, or at all, the first time; the order will be the same because it is already in their DB record), or go right to the survey form. When they are done with the survey form, they hit submit, and the form automatically appends the survey data to the user's record.
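
For anyone who wants to see the shape of it, here is a minimal sketch of that flat-file idea, written in Python rather than the PHP mentioned above. The JSON file layout, the function names, and the clip IDs are all assumptions made for illustration, and it ignores things a real site would need, such as two people writing to the file at once.

```python
import json
import os
import random
import tempfile

CLIPS = ["clip1", "clip2", "clip3", "clip4"]  # hypothetical clip IDs

def load(path):
    """Read all records, or start empty if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def save(path, records):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

def get_order(path, user_id):
    """Return this user's clip order, shuffling only on their first visit."""
    records = load(path)
    if user_id not in records:
        order = CLIPS[:]
        random.shuffle(order)  # the randomization step
        records[user_id] = {"order": order, "survey": None}
        save(path, records)
    return records[user_id]["order"]

def submit_survey(path, user_id, answers):
    """Attach the survey answers to the user's existing record."""
    records = load(path)
    if user_id not in records:
        raise KeyError(f"unknown user: {user_id}")
    records[user_id]["survey"] = answers
    save(path, records)

db_path = os.path.join(tempfile.mkdtemp(), "listening_test.json")
first = get_order(db_path, "tester@example.com")
again = get_order(db_path, "tester@example.com")
print(first == again)  # True: a return visit sees the same order
submit_survey(db_path, "tester@example.com", {"1": "a bit smoother"})
```

The user ID is the key, the stored order is what makes repeat downloads consistent, and the survey answers end up on the same record as the order they refer to.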

On a technical level, it's not difficult at all to implement. Is it necessary? Well, none of this is *really* necessary. I just figure if one is going to do it, they might as well do it as right as possible. It's not that difficult to implement, honestly.

This forum and every other are filled with polls and listening tests that tell us far more about the nature of the test and the tester than they do about the actual questions being tested for. You know the old saying, "There are lies, there are damned lies, and there are statistics." Well, "statistics" are only as good as the method used for collecting them, and the "statistics" gotten from the majority of the polls and tests and such that I have seen in these circles aren't worth the bandwidth used to display them, frankly, when it comes to giving answers of any real reliability.

(Not picking on or dissing anybody in particular, including NL5. What you did may not have been perfect, but it did keep things rolling, and that has value. :) )

G.
 
Yep, if you wanted to find out just how you did. However, I'd recommend that the tester not send those results out until after the test has been closed. Everybody needs to get their results back at the same time, at the end, even if the files are randomized. There will always be attempts to compare results or to inadvertently "cheat" that would skew the results.

G.

Yeah totally.

Eck
 
I'm curious. Throwing the mp3 versions aside, how many feel that the non-dithered 16 bit sounds just a touch "brighter" than the dithered version? Similar to adding a very slight amount of an exciter?
 
I'm sure the dithered 16 bit sounded a tad smoother, which very well could have been because the non-dithered was brighter. Yeah, an exciter adds distortion, doesn't it?

Eck
 
Yes, though exciters (hopefully) add a more musical and controlled type of harmonic distortion than that created from quantization distortion.

The other side of the argument is that dither can make the sound slightly "veiled" or "muted". So some folks may actually think that truncation sounds better than dither in some circumstances.
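
A toy illustration of the trade-off, in Python: it takes a few steady levels that sit below one 16-bit step and reduces them with plain truncation and with triangular (TPDF) dither followed by rounding. This is a sketch with made-up numbers, not audio and not the processing used for the test files. Levels are expressed in units of one 16-bit step.

```python
import math
import random

def truncate(x: float) -> int:
    """Drop everything below one 16-bit step (x is in units of that step)."""
    return math.floor(x)

def dither_and_round(x: float, rng: random.Random) -> int:
    """Add triangular (TPDF) noise spanning +/-1 step, then round to nearest."""
    tpdf = rng.random() - rng.random()  # difference of two uniforms is triangular
    return math.floor(x + tpdf + 0.5)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
for level in (0.1, 0.3, 0.7):
    plain = sum(truncate(level) for _ in range(n)) / n
    dithered = sum(dither_and_round(level, rng) for _ in range(n)) / n
    print(f"input {level:.1f}  truncated avg {plain:.3f}  dithered avg {dithered:.3f}")
```

Truncation maps all three levels to 0, so the error is tied to the signal, which is where quantization distortion comes from. With dither each individual output is noisier, but the average lands close to the input level: the low-level detail is kept, at the price of a constant noise floor.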

When converting to mp3, filtering is used to remove higher frequencies (depending on bit rate). As a result it's going to be more difficult to hear the differences between the truncated and the dithered versions. Similar to if one took a water color painting by a great artist and one from a kid, pissed on both of them until they bled all over the page, and then asked everyone which was the better one, we might all be confused.
 