July 10, 2011

Using Mechanical Turk to Generate Random Numbers: Episode 1 – The Jerks

in Experiments in Info Design

Wait, what?  Who would use Amazon’s crowd-sourcing marketplace to generate random numbers?  A fine question!  Two types of people – those with a rock-solid faith in the abilities of mankind, and those who get into discussions in a Google cafe about the likelihood of certain numbers (like primes or perfect squares) being over- or under-represented in the resulting sample if you were to do such a thing.  I’ll pay you $0.025 if you correctly guess which one I am.

Why $0.025?  Well, because that’s what I paid my helpful mechanical turkers, on average, to complete this task:

“Just send me a single random number between 1 and 100 (including numbers 1 and 100).”

If you’ve ever used Mechanical Turk, you know that it’s pretty much a breeze to set up such a task, and I was able to do so in just a couple minutes.   My first intention was to pay 1,000 people $.01 to complete the task.  So I published the request, sat back, and waited for the joyful flow of random numbers to hit my screen.

Unfortunately, I underestimated either the number of people willing to make 1 cent for 30 seconds of work or the number of people on Mechanical Turk.  After seven hours I had only received 53 responses.  After 24, I had 165.  I didn’t really feel like waiting another 8 days to get to 1,000, so I decided to add a little flavor to the experiment.

Instead of asking 1,000 people to complete the task for $.01, I would collect buckets of 100 replies, raising the reward by 1 cent for each bucket.  I had my original 165, and over the next week, submitted requests at $.02, $.03, $.04, and $.05.  I thought it would be interesting to see if the accuracy or response rate changed as I increased the payoff.
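(Incidentally, none of this requires the web UI.  With the boto3 MTurk client — which postdates this experiment — the five buckets could be scripted roughly like so.  This only builds the parameter dicts you would hand to `client.create_hit(**params)`; everything besides the rewards and bucket size is an illustrative guess, not what I actually used.)

```python
def hit_params(reward_cents: int, assignments: int = 100) -> dict:
    """Parameters for one 'send me a random number' request at a given
    reward.  Field names follow boto3's mturk create_hit; the duration
    and lifetime values are illustrative assumptions."""
    return {
        "Title": "Send me a single random number between 1 and 100",
        "Description": "Reply with one integer from 1 to 100, inclusive.",
        "Reward": f"{reward_cents / 100:.2f}",  # MTurk wants a dollar string
        "MaxAssignments": assignments,
        "AssignmentDurationInSeconds": 60,
        "LifetimeInSeconds": 7 * 24 * 3600,
    }

# One bucket per reward level, $.01 through $.05:
buckets = [hit_params(c) for c in range(1, 6)]
print([b["Reward"] for b in buckets])  # ['0.01', '0.02', '0.03', '0.04', '0.05']
```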

After about a week, the experiment was completed.  Here’s a timeline of the first 100 responses I got for each project request, starting on Saturday, June 04 and concluding sometime in the morning of the following Saturday (you’ll need to click through so you can see a larger version):

[Figure: Mechanical Turk Random Number Requests - Overall Project Timeline]

Response Rates

You’ll notice that the various projects took different amounts of time to get to 100 replies.  I expected at least an upward trend in response rate as the reward increased, but that wasn’t the case.  Here is a time-normalized view of the same data:

[Figure: Mechanical Turk Random Number Replies, Time-Normalized]

From this view, it looks like the response rate decreased as the reward increased.  This was unexpected, but clearly there are a lot of outside factors that could have impacted these rates.  I didn’t start the requests at the same time of day, I have no idea what other jobs were available for Turkers at the time of publishing, and Amazon may mess with the prioritization of projects.  But, for completeness, here is a table/plot of the average time between responses for each reward:

[Figure: Mean Response Rate between Mechanical Turk Responses]

So, I have absolutely no idea why this would be the case.  I’m debating ways I can run this experiment that might produce better results, such as increasing the reward increment, starting the projects simultaneously or at the same day and time of the week, etc.  Until then, I’ll have to claim that in this reward range, there is no obvious increase in the response rate when you offer to pay a bit more.
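(For anyone who wants to reproduce the response-rate table from their own results export, the mean-gap calculation is just this.  The timestamps and their format here are made up for illustration, not the experiment’s actual data.)

```python
from datetime import datetime
from statistics import mean

def mean_gap_minutes(timestamps):
    """Average minutes between consecutive responses in one bucket.
    Timestamps are 'YYYY-MM-DD HH:MM' strings, assumed already sorted."""
    ts = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps]
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(ts, ts[1:])]
    return mean(gaps)

# Toy example: gaps of 12 and 18 minutes average to 15.
print(mean_gap_minutes(["2011-06-04 09:00",
                        "2011-06-04 09:12",
                        "2011-06-04 09:30"]))  # 15.0
```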


In the timeline charts above, you’ll notice some ugly red marks.  Yes, these are the people that somehow failed to earn their Abrahams.  Despite the seemingly simple task, about 5% of responses were incorrect.  5%!  I might expect 5% of people to incorrectly answer questions that involve some sort of mathematical operation (addition, perhaps), but to simply give me a number?   I am saddened.

After looking through the 33 replies I rejected, I have decided to group the people that submitted these replies into two groups:

  1. Those who have trouble with directions.  20 of the 33 rejected responses included somewhere between two and a thousand numbers between 1 and 100.  Not a single number, but multiple.  Maybe they thought they would get a bonus for extra effort?  Interestingly enough, not one answer came from a random number generator – most were patterns of numbers (though sadly, no Fibonacci).
  2. Jerks.  13 of the 595 people who submitted replies are jerks.  Does 12,900 look like a number between 1 and 100?  How about “23556235845101000000000000000000000000000000”?  No, it does not.  You, sir, are a jerk.  And I will not pay you!
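(If I ever rerun this, the reject button could be a few lines of Python.  Here’s a rough classifier matching the three buckets above — valid, directions-challenged, and jerk.  The comma-stripping and the regex are my own assumptions about how to parse free-text replies, not what I actually did by hand.)

```python
import re

def classify_reply(reply: str) -> str:
    """Bucket a Turker's reply: 'valid' (exactly one integer from 1 to
    100), 'multiple' (several numbers -- trouble with directions), or
    'jerk' (anything else, including out-of-range numbers)."""
    # Strip thousands separators so "12,900" reads as one number.
    numbers = re.findall(r"\d+", reply.replace(",", ""))
    if len(numbers) == 1 and 1 <= int(numbers[0]) <= 100:
        return "valid"
    if len(numbers) > 1:
        return "multiple"
    return "jerk"

print(classify_reply("42"))         # valid
print(classify_reply("1 2 3 5 8"))  # multiple
print(classify_reply("12,900"))     # jerk
```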

Here is a summary of the errors by reward, for those that are interested:

[Figure: Table of Mechanical Turk Response Errors]



Episode 1 Wrap-Up

Well, I learned two things.  First, I didn’t really think through the set-up of my experiment very well.  But for less than $20, I can always give it another shot and I’m open to suggestions.  Second, 2.2% of Mechanical Turkers are jerks, with a fairly tight confidence interval around that number.  More or less than the real world?  We’ll never know.
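(For the curious, that confidence interval is just the normal approximation on 13 jerks out of 595 replies:)

```python
from math import sqrt

def jerk_ci(jerks=13, n=595, z=1.96):
    """95% normal-approximation confidence interval for the jerk rate."""
    p = jerks / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = jerk_ci()
print(f"{lo:.1%} to {hi:.1%}")  # 1.0% to 3.4%
```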

Stay tuned for Episode 2: Distributed Terror, where I’ll look at the distribution of response rates (are they Poisson-distributed?  I really hope so…) and the actual distribution of Turker responses from 1 to 100. How uniform are they?  Which numbers were most popular?  Which were shunned by the Turk community?  Can we somehow tie this all back to the golden ratio?  These and other questions will be answered.
