Who Is the Data For?
March 1, 2017. Posted by Peter Varhol in Publishing, Technology and Culture.
Tags: big data, data
Andreas Weigend recently published an intriguing book called Data For the People, in which he argues that we are not going to stop the proliferation of personal data that is used to categorize and market to us, so we should embrace this change and find ways to use collected data to our advantage.
He cites many of the data points that I do in my blog posts, but comes to different conclusions. In particular, my own thoughts are to limit my use of personal data on a case-by-case basis. His own conclusion is that we need to accept the proliferation of personal data as inevitable, and embrace it in a way that makes it valuable to us.
He makes a lot of sense, from a point of view different from mine, and I won’t dismiss it out of hand.
However, I would like to contrast that with another article, one that points out that when we choose our friends through shared data, we lose our ability to connect with our physical neighbors.
So, here is what I think. I think Andreas is correct, strategically. But I am simply not sure how we get from where we are to where he wants to be. I don’t think it will be clean and neat. And it certainly won’t be convenient, especially for those of us who are at least part way through our lives.
I Am 95 Percent Confident
June 9, 2013. Posted by Peter Varhol in Education, Technology and Culture.
Tags: big data, statistics
I spent the first six years of my higher education studying psychology, along with a smattering of biology and chemistry. While most people don’t think of psychology as a disciplined science, I found an affinity with the scientific method, and with the analysis and interpretation of research data. I was good enough at it that I went from there to get a master’s degree in applied math.
I didn’t practice statistics much after that, but I’ve always maintained an excellent understanding of just how to interpret statistical techniques and their results. And we get it wrong all the time. For example:
- Correlation does not mean causation, even when variables are intuitively related. There may be cause and effect, or it could be in reverse (the dependent variable actually causes the corresponding value of the independent variable, rather than vice versa). Or both variables may be caused by another, unknown and untested variable. Or the result may simply have occurred through random chance. In any case, a correlation by itself doesn’t tell me anything about whether or not two (or more) variables are related in a real-world sense.
- Related to that, the coefficient of determination (R-squared) does not “explain” anything in a human sense. There is no explanation in our thought patterns. Most statistics books will say that the square of the correlation coefficient explains that amount of variation in the relationship between the variables. We interpret “explains” in a causative sense. Wrong. It’s simply that the movement between two variables is a mathematical relationship with that amount of variation. When I describe this, I prefer using the term “accounts for”.
- Last, if I’m 95 percent confident there is a statistically significant difference between two results (a common cutoff for concluding that the difference is a “real” one), our minds tend to interpret that conclusion as “I’m really pretty sure about this.” Wrong again. It means that if there were truly no difference and I conducted the study 100 times, I would still expect to see a “significant” difference about five times through chance alone. Any single significant result could be one of those five.
- Okay, one more, related to that last one. Statistically significant does not mean significant in a practical sense. I may conduct a drug study that indicates that a particular drug under development significantly improves our ability to recover from a certain type of cancer. Sounds impressive, doesn’t it? But the sample size and definition of recovery could be such that the drug may only really save a couple of lives a year. Does it make sense to spend billions to continue development of the drug, especially if it might have undesirable side effects? Maybe not.
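The last two points can be checked with a quick simulation. What follows is only a sketch under stated assumptions, not a method taken from any particular study: it uses a simple two-sample z-test with a known standard deviation of 1, and the sample sizes, effect size, seeds, and function names are all made up for illustration. The first function runs many studies in which there is truly no difference and counts how often the 95 percent cutoff is crossed anyway; the share should land close to 0.05. The second measures a very small real difference on a very large sample, which typically clears the cutoff easily even though the effect is tiny.

```python
import math
import random

Z_CUTOFF = 1.96  # two-sided cutoff for 95 percent confidence


def z_statistic(group_a, group_b, sigma=1.0):
    """Two-sample z statistic, assuming a known, shared standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    std_error = sigma * math.sqrt(1.0 / n_a + 1.0 / n_b)
    return diff / std_error


def false_positive_rate(studies=2000, n=50, seed=0):
    """Share of studies that look 'significant' when no real difference exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]  # same population as a
        if abs(z_statistic(a, b)) > Z_CUTOFF:
            hits += 1
    return hits / studies


def tiny_effect_z(true_diff=0.02, n=200_000, seed=1):
    """z statistic for a very small real difference measured on a huge sample."""
    rng = random.Random(seed)
    treated = [rng.gauss(true_diff, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return z_statistic(treated, control)


if __name__ == "__main__":
    print(f"false positive rate: {false_positive_rate():.3f}")
    print(f"z for a 0.02 SD effect: {tiny_effect_z():.2f}")
```

A difference of 0.02 standard deviations is far too small to matter in most practical settings, yet with enough subjects it registers as statistically significant. The test tells you the difference is unlikely to be zero, not that it is large enough to care about.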
I could go on. Scientific experiments in the natural and social sciences are valuable, and they often incrementally advance the field in which they are conducted, even if they are set up, conducted, or interpreted incorrectly. That’s a good thing.
But even when scientists get the explanation of the results right, it is often presented to us incorrectly, or our minds draw an incorrect conclusion. A part of that is that a looser interpretation is often more newsworthy. Another part is that our minds often want to relate new information to our own circumstances. And we often don’t understand statistics well enough to draw informed conclusions.
Let us remember the three types of mendacity in the line Mark Twain made famous (he credited it to Disraeli) – lies, damned lies, and statistics. Make no mistake, that last one is the most insidious. And we fall for it all the time.
Really Big Data and the Pursuit of Privacy
June 7, 2013. Posted by Peter Varhol in Technology and Culture.
Tags: big data, NSA, privacy
There’s been so much excitement these days about the commercial potential of Big Data that we’ve forgotten that the Federal government is in the best position to obtain and analyze many terabytes of data. We were reminded of that in a big way following revelations that the National Security Agency (NSA) was obtaining, under secret court order, information about all phone calls made by Verizon customers. I am not a Verizon customer, but I have no doubt that the same court orders exist for other carriers.
(Interesting side note: Many years ago, after I earned my MS in Math, I had a job offer to join the NSA as a civilian cryptologist. Perhaps now I wish I had taken it.)
With virtually unlimited fast computing power, the NSA can identify patterns that provide a basis for follow-up law enforcement activities.
Here’s a simple example of how it works. A computer program identifies twenty or so different phone numbers in the New York City area that have called the same number in, oh, the Kingdom of Jordan about two hundred times in the last two months. The number in Jordan is a suspected front (through other sources) for some sort of terrorist activity. This connection might provide law enforcement reason to look more closely into the activities of those making these calls. That’s not inherently a bad thing.
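To make that example concrete, here is a minimal sketch of the kind of counting involved. It is not a description of how the NSA actually does this; the record layout (caller, callee, date), the function name, and the threshold are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def frequent_callers(call_records, watched_numbers, start, end, min_calls):
    """Count calls to watched numbers within [start, end], per caller.

    call_records is an iterable of (caller, callee, day) tuples.
    Returns {(caller, callee): count} for pairs with at least min_calls calls.
    """
    counts = Counter(
        (caller, callee)
        for caller, callee, day in call_records
        if callee in watched_numbers and start <= day <= end
    )
    return {pair: n for pair, n in counts.items() if n >= min_calls}


if __name__ == "__main__":
    # Hypothetical records; "A", "B", "X" and "Y" stand in for phone numbers.
    records = (
        [("A", "X", date(2013, 5, 1))] * 3
        + [("B", "X", date(2013, 5, 2))]
        + [("A", "Y", date(2013, 5, 3))] * 5
    )
    print(frequent_callers(records, {"X"}, date(2013, 4, 1), date(2013, 6, 1), 3))
```

The logic itself is trivial. What makes this a Big Data problem is running it over billions of records, and deciding which numbers go on the watch list in the first place.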
Of course, there are ways that terrorists and criminals can combat this, such as the use of prepaid and disposable cell phones bought with cash, calling cards, and even random pay phones. At best, analyzing call records represents one tool among many in the pursuit of wrongdoing, and not really a “Big Brother is Watching” scenario.
From a privacy standpoint, I’m mostly sanguine about the NSA collecting and analyzing calling data. I’m not engaged in terrorist or criminal activities, and my phone calls are just a few data points among the billions out there. I’m not directly threatened, or even inconvenienced.
But . . . there may be a slippery slope here. The definition of suspicious calling activity may gradually expand to include things that aren’t illegal, but perhaps just unethical or embarrassing. Once you have the data and the computing power, you can start looking for other things. Call it scope creep, an all-too-common affliction of many projects.
And in a larger sense, many of our freedoms are actually constructed on the premise that the Federal government cannot connect the dots between the myriad of records held by the many Federal agencies on each of us. Call it privacy by disorganization, but it has worked at least throughout my lifetime to protect my liberties. But thanks to the advancements made in Big Data over the last several years, we may be seeing the end of that type of protection.
Security and privacy represent direct tradeoffs. Unlike many Americans, I would prefer to be a little less secure and a little more private. But the majority does rule, and I do believe that the majority has little issue with the current state of affairs.
Can Our Shopping Cards Save Our Lives?
March 17, 2013. Posted by Peter Varhol in Software platforms, Technology and Culture.
Tags: big data
I’m a bit of a throwback when it comes to certain applications of technology. In addition to not using Facebook, I don’t have supermarket rewards cards, or even use a credit or debit card at the supermarket. My reasoning for the latter is simple – I would prefer not to have the supermarket chain know what I’m eating. I realize that I may be giving up coupons or other special deals by not identifying myself, but I’m willing to accept that tradeoff. It’s not a big deal either way, but it’s how I prefer to make that particular life decision.
But now there seem to be better reasons to use your supermarket rewards card – according to this NBCNews.com article, it may save your life. Really.
The story goes something like this. When there is a known food contamination, health officials can see who bought that particular food, and approach those people individually, rather than send out vague alerts that not everyone sees or hears.
Count me as dubious. This is really a sort of pie-in-the-sky application of Big Data that people can dream up when they picture the potential of the data itself. It would take weeks to reach all of the buyers of a particular contaminated product, even if you could match all of the different systems and databases together somehow. By then, the scare would have run its course.
The reality is that such data is stored in hundreds or thousands of different systems, without any means of pulling it together, let alone querying it for a specific product across millions of purchases.
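A small sketch shows where the work lies. The two record layouts below are invented for illustration, as are the function names and field names; real supermarket systems will differ. The sketch also assumes, optimistically, that every chain identifies the product with the same code.

```python
def from_chain_a(row):
    """Hypothetical chain A: one row per item, keyed by 'card' and 'upc'."""
    return [(row.get("card"), row["upc"])]


def from_chain_b(row):
    """Hypothetical chain B: one row per basket, with a list of 'skus'."""
    return [(row.get("loyalty_id"), sku) for sku in row["skus"]]


def buyers_of(product_code, sources):
    """Collect identified customers who bought product_code.

    sources is a list of (rows, normalize) pairs, where normalize turns one
    row into a list of (customer, product_code) tuples. Purchases with no
    customer identifier (cash, no card) are skipped because there is nobody
    to contact.
    """
    buyers = set()
    for rows, normalize in sources:
        for row in rows:
            for customer, code in normalize(row):
                if code == product_code and customer is not None:
                    buyers.add(customer)
    return buyers


if __name__ == "__main__":
    chain_a = [{"card": "A-1", "upc": "0001"}, {"card": None, "upc": "0001"}]
    chain_b = [{"loyalty_id": "B-7", "skus": ["0001", "0003"]}]
    print(buyers_of("0001", [(chain_a, from_chain_a), (chain_b, from_chain_b)]))
```

The query at the end is the easy part. Every additional system needs its own adapter like from_chain_a, and someone has to write, test, and maintain each one before the first contaminated product ever shows up.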
And then, of course, there are people like me, who still insist on dealing in cash, and remaining somewhat anonymous. Then again, they could take my photo in the supermarket, and rather quickly match it up to my other identified photos on the Internet, where I am well known as a speaker and writer.
The idea is intriguing, but it falls into the same tradeoff as many other applications of technology in society today. We can do things to make ourselves safer, but at the cost of providing more information. Some don’t seem to have a problem with the latter, but I, in my doddering middle age, do.