Varshney demonstrates novel crowdsourcing approach
Jonathan Damery, ECE ILLINOIS
- ECE Assistant Professor Lav Varshney and collaborators at Syracuse University have demonstrated an algorithmic solution to crowdsourcing tasks.
- Their technique uses error-correcting coding matrices to break each identification task into simple questions for workers, and decoding algorithms to make the final inferences. Their results show that this approach performs much better than typical majority-based approaches.
- Talks have begun about applying these techniques to real-world crowdsourcing projects, including in astronomy and education research.
ECE Assistant Professor Lav R Varshney has an algorithmic solution to a problem that plagues ecologists and astronomers, educators and sociologists — the problem of effectively and accurately crowdsourcing information, even when the individual contributors lack training or expertise.
Suppose an ecologist wants to monitor a set of 100 plant species across a large geographic region, recording when leaves emerge in the spring and drop in the fall. These events do not happen on the same date for every species: the first leaves on an invasive honeysuckle in Illinois may appear several weeks before those of the native black walnut. There simply aren’t enough hours in the day for ecologists to do this alone. What to do?
“You can think of human attention as a scarce resource, especially expert human attention,” Varshney said. “Doing tasks like wildlife monitoring ... is very difficult because there are not so many ecologists around.”
Varshney and collaborators at Syracuse University — including his father and ECE ILLINOIS alumnus Pramod Varshney (BSEE and BS Computer Science ’72, MSEE ’74, PhD ’76) and graduate student Aditya Vempaty — have recently demonstrated an improved approach that melds human behavioral science with error-correcting coding theory to determine the best answers for crowdsourcing problems. Their work will be published in the IEEE Journal of Selected Topics in Signal Processing in August and is already available online.
For any crowdsourcing scenario, the researchers first need descriptive material such as a dichotomous key: a series of two-answer questions that lead step by step to the correct identity of an object. (In the hypothetical plant survey, one step might be “leaf margins toothed” versus “leaf margins smooth.”)
These descriptive materials are then fed into coding matrices that isolate questions, or query keys, to be sent to the workers. Unlike most dichotomous keys, this method has built-in redundancy. If a worker answers one of the questions incorrectly, the mistake affects only one of several descriptive features collected about the organism. In a stepwise key, by contrast, one wrong answer automatically leads all subsequent answers astray.
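The weakness of a stepwise key can be illustrated with a small Python sketch. The key below, its species labels, and the helper `walk_key` are hypothetical and are not taken from the researchers’ paper; the sketch only shows how, in a stepwise key, one wrong early answer sends every later answer down the wrong branch.

```python
# A hypothetical two-level dichotomous key (illustration only).
# Internal nodes are (question, branch_if_yes, branch_if_no); leaves are labels.
KEY = (
    "leaf margins toothed?",
    ("feature X present?", "species A", "species B"),
    ("feature Y present?", "species C", "species D"),
)


def walk_key(node, answers):
    """Follow the key step by step, consuming one yes/no answer per question."""
    remaining = iter(answers)
    while isinstance(node, tuple):
        _question, if_yes, if_no = node
        node = if_yes if next(remaining) else if_no
    return node


print(walk_key(KEY, [True, True]))   # species A
# Flipping only the first answer lands in the other subtree, so the second
# answer is now applied to a different question entirely.
print(walk_key(KEY, [False, True]))  # species C
```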
“You get back answers from several different crowd workers, and then you can use decoding algorithms to actually make the final inferences,” Varshney said. “The crowd workers might incorrectly answer the questions. So that’s where the redundancy in the code really comes into play.”
Their results demonstrated that this approach is much more successful than typical majority-based techniques, in which the crowd workers submit their final determinations (“this is a prairie rose”) and the most common answer is selected. Because the error-correcting codes use a distributed representation (they were originally designed for communication networks with distributed sensors), each worker’s answer contributes one piece of evidence about the object, and the redundancy in the code lets the decoder recover from some wrong answers.
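The coding-and-decoding idea can be sketched in Python under a toy setup that is an assumption, not the researchers’ actual construction: four hypothetical species, a hypothetical 4 × 5 binary code matrix whose rows are codewords and whose columns are yes/no questions, one question per worker, and plain minimum-Hamming-distance decoding (the published method may use a different matrix or decoder). A simple majority vote over species names is included as the baseline described above. The names `CODE_MATRIX`, `decode`, and `majority_vote` are illustrative.

```python
from collections import Counter

# Hypothetical setup: each row is the codeword for one species, and each
# column is one yes/no question (1 = "yes") sent to one worker.
SPECIES = ["species A", "species B", "species C", "species D"]
CODE_MATRIX = [
    [0, 0, 0, 0, 0],  # species A
    [1, 1, 1, 0, 0],  # species B
    [0, 0, 1, 1, 1],  # species C
    [1, 1, 0, 1, 1],  # species D
]


def hamming(a, b):
    """Number of positions where two equal-length answer vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(matrix):
    """Smallest Hamming distance between any two codewords (rows)."""
    return min(
        hamming(matrix[i], matrix[j])
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))
    )


def decode(answers, matrix=CODE_MATRIX, labels=SPECIES):
    """Minimum-Hamming-distance decoding: pick the species whose codeword
    disagrees with the fewest worker answers (ties go to the earliest row)."""
    distances = [hamming(answers, row) for row in matrix]
    return labels[distances.index(min(distances))]


def majority_vote(reported_labels):
    """Baseline: each worker names a species and the most common name wins."""
    return Counter(reported_labels).most_common(1)[0][0]


# Worker i answers question i. The true species is C (0, 0, 1, 1, 1),
# but the first worker answers incorrectly.
received = [1, 0, 1, 1, 1]
print(min_distance(CODE_MATRIX))  # 3
print(decode(received))           # species C
```

Because every pair of codewords in this toy matrix differs in at least three positions, any single wrong answer still leaves the true species as the unique closest codeword. Tolerating more wrong answers would require longer codewords, which means more questions and more workers.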
Varshney and his collaborators also demonstrated how relationships between the workers impact their effectiveness, including the influence of common sources of information: citizen scientists, for example, whose only botanical training came from the same two-day workshop. In these cases, the shared resources link the results, making the observations statistically dependent and allowing errors to propagate more readily.
“We find that these kinds of correlations are not helpful,” Varshney said. “In fact, they degrade performance.”
He pointed out, however, that this does not mean training programs or educational resources for volunteers are inherently counterproductive. Rather, it illustrates the importance of having diverse perspectives represented in the crowd. Instead of recruiting a single high-school ecology club, for instance, whose members all draw on the same resources, the ecologist should look for volunteers with assorted backgrounds.
The researchers also looked at correlations between workers arising from cooperation or competition. While the common-source models stemmed from research by ECE Professor Thomas S Huang, the cooperation-and-competition models drew on work by Computer Science Associate Professor Wai-Tat Fu.
“He and his group have recently argued that if you link people’s rewards together — what they call peer-dependent rewards — then you can actually induce them to operate in some better ways,” Varshney said. “The way we modeled that is that the reliability (or the noisiness) of the workers are actually correlated. And, interestingly, suitable pairing of workers actually leads to improved performance for us.”
The next step for Varshney is to demonstrate these algorithmic solutions in real-world scenarios. First up is a discussion with Illinois researchers participating in the multi-institutional Dark Energy Survey. This project aims to collect hundreds of thousands of celestial images over a five-year span and then have volunteers, as well as professional astronomers, determine what is depicted, as part of an effort to understand the accelerating expansion of the universe. Varshney may develop user-friendly query keys for the volunteers.
He’s also been talking with education researchers about applying this crowdsourcing approach to MOOCs (massive open online courses), where hundreds of thousands of students may enroll in a single class. For instructors to evaluate students on open-ended, subjective questions, peer grading is a must.
“[Yet] by definition, your peers are unskilled and unreliable,” Varshney said. “So if you can have your peers answer simpler questions about your work, like from a scoring rubric, then hopefully you can provide insight and good feedback.”
Whatever the application (astronomy, education, ecology, perhaps even speech transcription), the work ties into Varshney’s overall interest in understanding human effort and helping others make the most of their personal energies.
“Using human effort in more efficient ways is generally a good thing,” Varshney said. “We definitely want to draw on people’s passions. If we can direct it in an efficient and effective way, then I think it’s pretty powerful.”