Wednesday, February 15, 2017

Biostatistics: Possible Remedy in the Era of Fake News?

Well before the debate over "Fake News" started trending, researchers at Stanford University began a project studying how well we evaluate information, especially from online sources. Starting in early 2015, the researchers studied the behaviour of students, from schools through universities like Stanford, for 18 months. In the report's summary, the researchers summed up their disappointment: "in every case and at every level, we were taken aback by students' lack of preparation." The participants did a pretty poor job of assessing the credibility of information and sources. Though unfortunate, this might not be entirely shocking, as it confirms the pattern we observe around us.

In the era of social media journalism, the reliability of information appears to have taken a back seat. Facebook has announced its intention to crack down on fake news, and recently Twitter has joined the call. Maybe a few other companies will follow suit. Though a commendable initiative, it doesn't seem enough for the enormous scale of the problem. There are 85+ virtual communities worldwide with at least a million registered users each (like Facebook and Twitter). Additionally, there are a few dozen instant messaging services like WhatsApp. Holding all these platforms accountable seems like a practically impossible task. And even if many of them implement some measure of regulation, can we trust these platforms with their self-moderation policies?

Not just that: there have been recent instances of mainstream media publishing news citing social media references, only to find out it was inaccurate. For example, in February 2017 multiple leading news agencies in India published a story about Shawna Pandya, a Canadian citizen of Indian origin, claiming she had been selected for one of NASA's 2018 flights. (Some of the articles added extra feathers to her cap, like neuroscientist, opera singer etc.) Shawna had to debunk these claims in a Facebook post published on February 10, 2017, stressing that it was merely a possibility at that point. This is not an isolated incident, so it would be unwise to view the mainstream media as a highly reliable source.

In general, trusting information platforms entirely to provide factually accurate and unbiased information doesn't seem like a wise strategy. An alternative approach would be to make the points of information consumption more resistant to the onslaught of misinformation. It is worth exploring whether we can address the problems in information consumption (like subjective bias and exposure to highly exaggerated or false information) by employing some ideas from the field of experiment design. Experiment design techniques are the framework we humans have invented in the quest for truth; the framework takes us closer to it by accounting for multiple biases. Though its application is mostly limited to research, let's see if we can borrow a few concepts from it for daily life.

Blinding

Just imagine if news articles started excluding the identities associated with remarks, opinions, speeches, policies etc. That would be really weird, right? We might have to actually read and analyse the content before passing any judgement. There are some fascinating resources on how people with prejudice react when you hide or interchange the identities associated with the source of information.

For example, an interviewer asked Hillary supporters a bunch of questions about made-up stuff presented as Hillary's stand or policy.

Interviewer: One of Hillary's primary campaign promises is to expand Sharia law program in minority communities in America. You think that is the right campaign platform to be running on? 
Girl: Yeah 
Interviewer: Sharia law expansion? 
Girl: I would say yes! I am pro. 
Interviewer: To change the way women are treated in America by implementing Sharia law? 
Girl: Absolutely! 
Interviewer: Hillary knows what's best? 
Girl: Hillary is cool!

You will find similar videos featuring Trump supporters as well. In fact, this phenomenon seems universal among followers of politicians, celebrities, entrepreneurs etc. Most of us demonstrate a strong association bias, which makes us vulnerable to misinformation. It can also lead us to play an active role in the misinformation distribution network.

In my board exams, the examiner would place a sticker over the personal information box on the first page of the answer sheet, so anyone downstream dealing with my answer sheet could see only my answers, not my name, town or any other details. This helps reduce the personal bias of evaluators and moderators, so they grade based on the content alone. Similarly, say a good researcher wants to test which of two available medicines works better for the common cold. Being a good researcher, she would take the labels off both brands of pills, give them to two fairly similar groups of patients, and measure the outcomes. This reduces the personal bias of patients and doctors when they report the results back to the researcher.
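To make the idea concrete, here is a minimal sketch of such a blinded two-arm trial, in Python. The brand names, group sizes and recovery times are all invented for illustration; the point is only that outcomes get reported under anonymous codes, so whoever evaluates them cannot favour a brand.

```python
import random
import statistics

random.seed(0)

# Hypothetical two-arm trial; brands and outcome numbers are made up.
PILLS = ["BrandX", "BrandY"]

# Randomly assign 20 patients to one of the two pills.
assignments = {patient: random.choice(PILLS) for patient in range(20)}

# Blinding: brand names are replaced with anonymous codes before
# anyone reports or evaluates outcomes.
codes = {"BrandX": "Group 1", "BrandY": "Group 2"}

# Simulated recovery times in days (illustrative random values).
outcomes = {}
for patient, pill in assignments.items():
    days = random.gauss(5.0 if pill == "BrandX" else 4.5, 1.0)
    outcomes.setdefault(codes[pill], []).append(days)

# The evaluator compares groups without knowing which brand is which;
# the code-to-brand mapping is revealed only after the analysis.
for code, days in sorted(outcomes.items()):
    print(f"{code}: mean recovery {statistics.mean(days):.1f} days")
```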

Basically, blinding helps us evaluate or measure subjective aspects in an unbiased manner. Unfortunately, we cannot blind ourselves selectively in the real world. So either we have to define what we wish to evaluate as objectively as possible, or we must not let the source or association prejudice us. Even better if we can do both together.

Meta-analysis strategy

Let's assume Albert conducts a study and concludes that meditating daily for 10 minutes reduces anxiety attacks by 50%. Coincidentally, two of his friends have conducted similar studies "independently". However, their percentage reductions differ from Albert's conclusion: the first friend finds a 40% reduction and the second a 60% reduction. Statistically, we can combine these results (effect size, standard error, variance, confidence etc.) and get closer to the ground truth. Note that the combined result is not necessarily the simple average of the percentages (i.e. 50, 40 and 60).
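A minimal sketch of one standard way to pool such results, fixed-effect inverse-variance meta-analysis, is shown below. The standard errors are invented for illustration, since the thought experiment doesn't specify them; the takeaway is that more precise studies pull the pooled estimate harder.

```python
# Fixed-effect, inverse-variance meta-analysis.
# Effect sizes are from the thought experiment; standard errors are invented.
studies = [
    {"name": "Albert",   "effect": 50.0, "se": 5.0},
    {"name": "Friend 1", "effect": 40.0, "se": 8.0},
    {"name": "Friend 2", "effect": 60.0, "se": 10.0},
]

# Each study is weighted by the inverse of its variance (se**2).
weights = [1 / s["se"] ** 2 for s in studies]
pooled = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5

print(f"Pooled reduction: {pooled:.1f}% (SE {pooled_se:.1f})")
# Prints roughly 49.1%, not the simple average of 50, 40 and 60 (50.0%).
```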

Assume you are a manager planning to hire an engineer for your team. A simple method you could follow is to have four of your teammates interview the candidate and then collect feedback from all four, which could look something like the following:

 Interviewer 1 (with 9 years of work experience) - Highly recommends hiring
 Interviewer 2 (with 7 years of work experience) - Recommends hiring
 Interviewer 3 (with 5 years of work experience) - Recommends not hiring
 Interviewer 4 (with 5 years of work experience) - Highly recommends not hiring

As a manager, you might end up hiring the candidate. The important detail here is that you would consider all four pieces of feedback in the final decision.

Now consider a new piece of information in the same example: the position you are hiring for is highly technical, so you decide to take the interviewers' technical experience into account.

 Interviewer 1 (0 years of Technical experience)
 Interviewer 2 (0 years of Technical experience)
 Interviewer 3 (5 years of Technical experience)
 Interviewer 4 (5 years of Technical experience)

Now you might not hire the candidate. This is why gathering information from multiple, and often diverse, sources is crucial. A toy weighting scheme along these lines is sketched below.
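Here is that sketch in Python. The score mapping and the experience weighting are my own invention for illustration; any real hiring process would choose these differently.

```python
# Toy aggregation of interview feedback; the score mapping and
# weighting scheme are invented purely for illustration.
SCORES = {
    "Highly recommends hiring": 2,
    "Recommends hiring": 1,
    "Recommends not hiring": -1,
    "Highly recommends not hiring": -2,
}

interviews = [
    {"feedback": "Highly recommends hiring",     "tech_years": 0},
    {"feedback": "Recommends hiring",            "tech_years": 0},
    {"feedback": "Recommends not hiring",        "tech_years": 5},
    {"feedback": "Highly recommends not hiring", "tech_years": 5},
]

# Unweighted: every opinion counts equally.
unweighted = sum(SCORES[i["feedback"]] for i in interviews)

# Weighted: technical experience scales each vote (a floor of 1 keeps
# non-technical interviewers from being silenced entirely).
weighted = sum(SCORES[i["feedback"]] * max(i["tech_years"], 1)
               for i in interviews)

print("Unweighted score:", unweighted)  # 0   -> borderline hire
print("Weighted score:", weighted)      # -12 -> lean towards not hiring
```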

A lot of us seem to prefer consuming news from similar sources: only left-leaning, right-leaning, pro-environment, pro-industry etc. It is certainly not evil to have a political or social position, but building an information bubble out of sources favouring your sociopolitical position is a risky business. To make matters worse, the follow/unfollow options and recommendation algorithms of social media expedite this bubble construction. We see a version of this in mainstream media too. By following only Fox News we might underestimate the lives saved by the Affordable Care Act, and by following CNN alone we might not learn about a possible increase in health insurance premiums due to the same act. A voice of dissent can make our understanding less skewed. As people say in data science, outliers are interesting.

Exposing yourself to diverse information sources and conflicting ideologies can help you arrive at a less biased conclusion.

Systematic review approach

Arguably, systematic reviews are the highest level of evidence we have today. A systematic review can be pretty time-consuming, but it is a highly methodical way to remove bias and arrive at a conclusion. It requires going through the available resources/literature on a specific question in a systematic way, evaluating the quality of each, and then aggregating the insights to reach a conclusion. Technically, a meta-analysis is often part of a systematic review.

Obviously, you would not use this approach very frequently, but only for critical decisions that merit gathering, weighing and aggregating comprehensive information. A less formal but more realistic version of this method could help us answer some very important questions.

A classic application for systematic reviews would be the electoral surveys (opinion polls) conducted in election-bound states by news agencies. Often these surveys make contradictory claims, and very rarely are they close to the mark. Praveen Chakravarty published an article in The Hindu analysing 82 electoral surveys from 1996 to 2014 for Lok Sabha and State Assembly elections. The highly disappointing conclusion: zero (yes, zero) surveys were fully accurate, where "fully accurate" was defined as predicting the seats of both the winner and the runner-up within +/-5 percent of the actual results.

So it would be worth trying to run a systematic review over these surveys for a specific election. One would start by going through each survey's methods of data collection and analysis. Once you have an idea of how systematic each survey is, you could take a weighted average of their predictions, as sketched below. That might end the streak of utterly incorrect predictions.
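A minimal sketch of that quality-weighted average, with invented survey names, seat predictions and quality scores (a real review would derive each weight from the survey's documented methodology):

```python
# Quality-weighted pooling of seat predictions from several surveys.
# Survey names, predictions and quality scores are all invented.
surveys = [
    {"name": "Survey A", "seats": 180, "quality": 0.9},  # transparent methodology
    {"name": "Survey B", "seats": 220, "quality": 0.4},  # small, urban-only sample
    {"name": "Survey C", "seats": 200, "quality": 0.7},
]

total_quality = sum(s["quality"] for s in surveys)
pooled = sum(s["seats"] * s["quality"] for s in surveys) / total_quality

print(f"Quality-weighted seat estimate: {pooled:.0f}")  # ~195 seats
```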

Another takeaway: when presented with a survey, we should ask how the data was collected, what kind of people participated, and what kind of questions were asked. These questions are equally applicable when a government claims overwhelming citizen support for a new nationwide initiative through a non-transparent internal survey. In late 2016, the Indian Prime Minister claimed that 93% of citizens supported his demonetisation initiative, in which 86% of the currency in circulation (by value) was cancelled overnight without any warning to the general public or banks. If you ask the same set of questions discussed earlier, you will recognise flaws in the survey's methods. In addition to drawing on a highly biased (urban and pro-government) sample, the survey asked leading questions like "Did you mind the inconvenience faced in our fight to curb corruption, black money, terrorism and counterfeiting of currency?" Considering these details, the survey itself looks like state-sponsored fake news. Again, I don't intend to single out this government; every ruling force will exhibit this to some degree as part of its propaganda machinery.

Conclusion

We cannot rely on information sources to be self-aware and to account for various biases. Instead, we need to work consistently on our own prejudices around information consumption. As Rumi said eight hundred years ago, "Yesterday I was clever, so I wanted to change the world. Today I am wise, so I am changing myself."
