Crowdsourcing, Drug Safety and Internet Searches

Apr 03, 2013
Bart Cobert

Pharmacovigilance, Drug Safety and Regulatory Affairs Author & Expert

There have been many articles and even books about the utility of crowdsourcing, which one can define as using large groups of people to obtain information or services that are useful and valid.  The concept is old but the term seems to date to 2006, paralleling the rise of the internet and social media, which allow easy and rapid access to large numbers of people and large amounts of data.  This is a subset of the broad and revolutionary use of social media and the internet to run modern society.

As expected, everyone will, at some point, jump on the bandwagon and see if there is a use for crowdsourcing and big internet databases in his or her area of interest.  We are now starting to see the use of crowdsourcing for drug safety.  In fact, this has been going on for quite some time under different headings such as Bayesian analysis (which requires lots of data), signaling, data mining and other terms.

What has precipitated current interest is an article published in the Journal of the American Medical Informatics Association entitled: “Web-scale pharmacovigilance: listening to signals from the crowd”, by White, Tatonetti, Shah, Altman and Horvitz. The lead author is from Microsoft and the others from Columbia and Stanford Universities.  See J Am Med Inform Assoc doi:10.1136/amiajnl-2012-001482 at http://jamia.bmj.com/content/early/2013/02/05/amiajnl-2012-001482.abstract .

This is a serious paper and a relatively simple one in which the authors looked at millions of search records on Google, Bing and Yahoo (the big three search engines in the US and EU) for paroxetine and pravastatin pairings.  They looked at 82 million queries from 6 million users.  People who looked up these two drugs together were more likely to also look up hyperglycemia terms compared to people who looked up the drugs individually.  The authors also found over 30 other drug pairs associated with high blood sugar searches.
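The comparison at the heart of the paper can be sketched in a few lines. This is only an illustration of the idea, not the authors' actual method: the function name and all counts are invented for the example and are not taken from the paper.

```python
def search_rate(users_with_symptom_query: int, total_users: int) -> float:
    """Fraction of users in a group who also searched a symptom term."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return users_with_symptom_query / total_users


# Hypothetical counts, invented purely for illustration (NOT from the paper).
pair_rate = search_rate(100, 1000)    # users who searched both drugs
single_rate = search_rate(50, 1000)   # users who searched only one of the drugs

# A ratio well above 1 means the pair group searched hyperglycemia terms
# more often than the single-drug group.
rate_ratio = pair_rate / single_rate
print(f"pair: {pair_rate:.1%}, single: {single_rate:.1%}, ratio: {rate_ratio:.2f}")
```

A ratio above 1 is only a starting point; it says nothing yet about the biases discussed below.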

The authors concluded that: “There is a potential public health benefit in listening to such signals, and integrating them with other sources of information…We see a potentially valuable signal, even though search logs are unstructured, not necessarily related to health, and can include any words entered by users…We believe that patient search behaviour directly captures aspects of patients’ concerns …and can complement more traditional sources of data for.”

Certainly can’t disagree with this.  The authors are clearly onto something that is cheap, easy to do, fast and can possibly pick up very low volume, unlikely adverse events and interactions.  Another source of signals and more techniques for signaling.

There are a few large issues here that need to be resolved and which surely will be over time.

Biases in the signals

Signaling using unstructured internet data will clearly introduce biases.  Obviously, it is limited to people who have access to and knowledge of search engines.  There is, and probably will continue to be, underrepresentation of certain populations, including the very young, the very old, the illiterate, those who don’t own a smartphone or a computer, etc.

The authors note clustering by time.  In the authors’ example, 30% of the searches on the combo in question were done on the same day and 60% within the same month.  This suggests an element of “going viral” whereby some topic gets hot and many people look for it and then the next hot topic takes its place.  Does this mean anything and does it bias the results?  Maybe too many queries in a short length of time invalidate the results.  We don’t know yet.

Are there other biases?  Surely yes.  Does language make a difference?  Does geographic source make a difference?  Is the key statistic the number of searches on a particular topic?  Or the duration (days, weeks) of the high level of searches?  Or the number of clicks on hits in the search answer?  Do searches with specific symptoms have more meaning than general searches?  We don’t know yet.

Gaming the system

Once this technique gains legitimacy and popularity, human nature will take over and folks will start gaming the system.  Someone will surely try to poison the opposition by having as many people as possible search on a competitor’s product and ask about side effects (“Let us use crowdsourcing to see if our competitor’s drug causes cancer”).  Some will surely try to make money on it (“Let us help you clean up your drug’s reputation on the internet”, just as there are companies claiming to do this for people with bad items on the internet about themselves).  Eventually the hackers will discover this and start doing robosearches and other techniques, some of which are not yet known, to crash or debase the techniques and results.

Filtering the garbage from the gold

Clearly crowdsourcing has very low barriers to entry.  Anyone can do a search on “meproazine, snedil and cancer” on three or four or 40 search engines (I made those drug names up).  In fact it can be automated using aggregator technology.  So everyone can get his or her own signals and everyone can become a “pharmacovigilante”!  In other words, there will be a lot of junk science and junk data and it will be very hard to separate the real and important signals from the trash.  And we don’t know how to do that yet.

Wisdom of the crowd/Vox populi, vox dei

For anyone who reads history, there is an ominous element in the concept of the crowd being right.  Vox populi, vox dei is an old proverb that translates as: the voice of the people is the voice of God.  This has long been recognized as a dangerous notion: “And those people should not be listened to who keep saying the voice of the people is the voice of God, since the riotousness of the crowd is always very close to madness” (Alcuin to Charlemagne, 798 CE; The Oxford Dictionary of Quotations, edited by Elizabeth Knowles, Oxford University Press).

Many things in history have been justified as being done to please the people or please the crowd from lynching to witch burning to war.

If the wisdom of the crowd were always right, there would probably be a way to pick stock market winners or horse race winners using this methodology.  Of course, then the very act of measuring may influence the results.  This will get very complicated.

In other words, the crowd is not always right.

Do we need more signals?

We seem to be inundated with lots of new methodology on signal generation.  See the report of the working group of CIOMS VIII (http://www.cioms.ch/index.php/2012-06-10-08-47-53/working-groups/working-group-viii).

We can generate as many signals now as we wish using some of the databases now available with disproportionality methods and other lovely statistical techniques.  We can filter and slice and dice and come up with the top 10 signals or the top 2% or signals with more than 3 cases of the particular MedDRA code or any other subset desired.
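To make “disproportionality” concrete, here is a minimal sketch of one commonly used measure, the proportional reporting ratio (PRR), computed from a 2x2 table of reports.  The choice of PRR is mine for illustration; the function names, the PRR cutoff of 2 and all of the counts are assumptions invented for the example.  The “more than 3 cases” filter echoes the one mentioned above.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """PRR from a 2x2 table of adverse event reports.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


def is_candidate_signal(a: int, b: int, c: int, d: int,
                        min_cases: int = 4, min_prr: float = 2.0) -> bool:
    """Illustrative filter: more than 3 cases and PRR at or above an assumed cutoff."""
    return a >= min_cases and proportional_reporting_ratio(a, b, c, d) >= min_prr


# Hypothetical counts, invented purely for illustration.
prr = proportional_reporting_ratio(20, 180, 100, 9900)
print(f"PRR = {prr:.1f}, candidate signal: {is_candidate_signal(20, 180, 100, 9900)}")
```

Changing the cutoffs changes how many signals come out, which is exactly the slicing and dicing described above.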

The question here, then, is whether the crowdsourcing technique is too crude and generates far too many signals that turn out to be false.  In other words, the old question of sensitivity and specificity.
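For readers who want the arithmetic, a minimal sketch follows.  The counts are hypothetical and invented for illustration; the positive predictive value is added by me because it answers the practical question of what fraction of generated signals turn out to be real.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of real problems that the technique flags as signals."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-problems that the technique correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of flagged signals that turn out to be real."""
    return true_pos / (true_pos + false_pos)


# Hypothetical year of signaling, invented purely for illustration:
# 1000 drug-event pairs screened, 10 of them real problems.
tp, fp, fn, tn = 8, 92, 2, 898

print(f"sensitivity: {sensitivity(tp, fn):.2f}")
print(f"specificity: {specificity(tn, fp):.2f}")
print(f"positive predictive value: {positive_predictive_value(tp, fp):.2f}")
```

With these made-up numbers the technique looks respectable on sensitivity and specificity, yet most of the signals it produces are false, because real problems are rare among the pairs screened.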

The finance folks (read: the accountants) will also get involved and will start saying things like, “you can only spend x dollars on signaling” or “you had too many false signals last year, so we’re cutting your signaling budget this year by 25%”.  Perhaps they will start publishing batting averages on signaling.  Merck hit .310 last year on signaling while Pfizer was at .295 and Pharmajunk Inc. only hit .096.  To follow the baseball analogy, Pharmajunk may be sent down to the minors till they improve their skills.  There may be a triple crown of best signal average, most drugs with a new SAE added to the label and fewest hospitalizations due to that company’s drugs.

But seriously, the issue will be which techniques are worth pursuing and which should be dropped.  Which give the greatest specificity and sensitivity?

And now the hard stuff: working up the signals

Finally, the really hard part of public health (and this really does boil down to public health) is what to do with the many signals generated.  Two issues I see here.

The first is that the authors of this paper refer to 30 other hyperglycemia signals.  This is uncomfortable.  They suggest their technique is valuable, predictive and works more quickly than the standard techniques used today.  If this is the case, then we probably should invest some energy and resources in the other 30 signals.  At some point this may become an ethical imperative and we must look at these signals.  Again, how many turn out to be real and how many never pan out?

The second is that we really don’t know how to work up signals in the most cost-effective and expeditious manner.  Yes, techniques exist: make a case series of similar cases, look at other databases, check the medical literature, look for plausible cases, etc. (see CIOMS III and V), and then gather smart medical people and try to come to a conclusion.  Most signals are grey and the evidence is unclear.  FDA publishes a signal list on its website and updates it regularly (http://www.fda.gov/drugs/guidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082196.htm).  But FDA freely admits that it is not sure what these signals mean:

“The appearance of a drug on this list does not mean that FDA has concluded that the drug has this listed risk. It means that FDA has identified a potential safety issue, but does not mean that FDA has identified a causal relationship between the drug and the listed risk. If after further evaluation the FDA determines that the drug is associated with the risk, it may take a variety of actions including requiring changes to the labeling of the drug, requiring development of a Risk Evaluation and Mitigation Strategy (REMS), or gathering additional data to better characterize the risk.”

So we will generate lots of signals, publish them, not know what to do with them, not know what to tell doctors and patients and make a lot of people worried.  This is not necessarily doing the public a service.  But it does cover one’s behind.

Proposal:  Let’s spend 80% of our funding for this on signal work-up methodology and 20% on signal generation.