Statistical Significance and Real Life December 23, 2019

Posted by Peter Varhol in Algorithms, Education, Technology and Culture.

I have a degree in applied math, and have taught statistics for a number of years.  I like to think that I have an intuitive feel for numbers and how they are best interpreted (of course, I also like to think that I am handsome and witty).

Over the last few years there has been concern in the academic community that most people massively misinterpret what statistical significance is telling them.  Most research compares two separate groups (of people, drugs, ages, treatments, and so on): one is left unchanged (the control), while the other undergoes a change (most experiments are actually more complex than this, with multiple experimental groups representing different stimuli, doses, or behaviors).  The two groups are then compared through a quantitative measurement of the characteristic under test.

Because we are sampling the population, there is some uncertainty in the result.  Only if we have complete information (a census) can we make a statement with certainty, and we almost never have that.  Statistical significance means there is only a small probability (usually one or five percent) that a result this extreme would arise by chance alone, suggesting that there is a real difference between the control and experimental groups.

Statistical significance is a narrow mathematical term.  It refers to interpreting the mathematics, not applying the result to the real world.  I try to make the distinction between statistical significance and practical significance.  Practical significance is when the experimental conclusion can result in meaningful action in the problem domain.  “This drug always cures cancer”, for example, can never be true, for multiple reasons.  But we might like to make the statement that we can save twenty thousand lives a year; that might result in action in promoting a cure.

The problem is that many policy makers and the general public conflate the two.  If something is statistically significant, how can it also not be practically significant?  A large sample size can identify and amplify tiny differences that in many cases don’t matter in the grand scheme of things.
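To see how a large sample can make a practically meaningless difference "statistically significant", here is a minimal sketch.  The 0.2-point shift, the 15-point standard deviation, and the sample size are invented for illustration, and the p-value uses a large-sample normal approximation rather than a full t-test:

```python
import math
import random

def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means, using the normal
    approximation (reasonable for very large samples)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

rng = random.Random(0)
n = 500_000
# Control and treatment differ by 0.2 points on a scale whose standard
# deviation is 15: a shift far too small to matter in practice.
control = [rng.gauss(100.0, 15.0) for _ in range(n)]
treated = [rng.gauss(100.2, 15.0) for _ in range(n)]

p = z_test_p_value(control, treated)
print(f"p = {p:.3g}")  # comfortably "significant" at the 5 percent level
```

The tiny p-value reflects the enormous sample size, not a difference anyone should act on.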

And there is such a thing as the Type I error (there is also a Type II error, which I’ll write about later).  A Type I error occurs when we falsely reject the hypothesis that there is no difference between the groups; that is, we declare a difference where none exists.  And what are the odds of that?  Higher than you might think.  In many cases, you got those results through random chance, not because there is a real difference.

Many studies run multiple statistical tests, sometimes numbering in the hundreds.  If you run a hundred statistical tests and five give you statistically significant results at the 95 percent level, what do you conclude?  That is exactly the number of false positives chance alone would predict.  Yet many researchers breathe a sigh of relief and exclaim “Publish!”, because in many cases their jobs depend on publishable results.
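That arithmetic is easy to simulate.  In the sketch below (sample sizes and seed are arbitrary choices of mine), both groups are drawn from the same distribution, so the null hypothesis is always true and every "significant" result is a Type I error; at the 5 percent level we expect roughly five per hundred tests:

```python
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5

rng = random.Random(42)
num_tests = 100
false_positives = 0
for _ in range(num_tests):
    # Both groups come from the SAME distribution, so any "significant"
    # result is pure chance.
    a = [rng.gauss(0, 1) for _ in range(50)]
    b = [rng.gauss(0, 1) for _ in range(50)]
    # |t| > 1.98 roughly corresponds to p < 0.05 at ~98 degrees of freedom
    if abs(welch_t(a, b)) > 1.98:
        false_positives += 1

print(false_positives, "false positives out of", num_tests)
```

Run it a few times with different seeds and the count hovers around five, exactly as the 5 percent significance level promises.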

While we can use statistics, and mathematics in general, to help us understand complex problems, we have to mentally separate the narrow mathematical interpretations from the broader solution and policy ones.  But many researchers, whether through ignorance or because it behooves their careers to publish, conflate the two.  And the lay public and policy makers bow to the cult of statistical significance, making things worse rather than better.

When AI is the Author December 9, 2019

Posted by Peter Varhol in Algorithms, Machine Learning.

I have been a professional writer (among many other things) since 1988.  I’ve certainly written over a couple thousand articles and blog posts, and in my free time have authored two fiction thrillers, with several more on the way.  I have found over the years that I write fast, clearly, and when called for, imaginatively.

Now there are machine learning systems that can do all of that.

I knew that more recent ML systems had been able to automatically write news articles for publication on news sites.  At the beginning of this year, OpenAI, a research foundation committed to ethical uses of AI, announced that they had produced a text generator so good that they weren’t going to release the trained model, fearful that it would be used to spread fake news.  They instead released a much smaller model for researchers to experiment with, as well as a technical paper.

Today, though, they are using the same GPT-2 model to create creative works such as fiction and poetry.  I started wondering if there was nothing machine learning could not do, or at least mimic.

But there’s a catch.  These ML systems have to be given a starting point, which appears to be the beginning of the story, supplied by a human.  Once they have that seed, they can come up with follow-on sentences that can be both creative and factual.

But they can’t write anything without that starting point.  They can provide the middle, and perhaps the end, although I suspect that the end would have to be tailored toward a particular circumstance.
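The point about needing a seed isn’t specific to GPT-2.  Even the crudest text generator, such as the toy word-level bigram model sketched below (the corpus and function names are mine, purely for illustration), can continue a prompt but has nothing to say without one:

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Build a word-level bigram table: each word maps to the list of
    words that have followed it in the training text."""
    words = text.split()
    table = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        table[cur].append(nxt)
    return table

def generate(table, prompt, length=8, seed=0):
    """Continue a prompt by a random walk over the bigram table.
    With no prompt there is nothing to condition on, so we refuse."""
    out = prompt.split()
    if not out:
        raise ValueError("a starting point is required")
    rng = random.Random(seed)
    while len(out) < length and table[out[-1]]:
        out.append(rng.choice(table[out[-1]]))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
table = train_bigram(corpus)
print(generate(table, "the dog"))  # continues from the human-supplied seed
```

A neural model like GPT-2 conditions on its prompt in a far richer way, but the dependency is the same: the human supplies the beginning, the machine supplies what follows.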

Decades ago, I read a short story titled “In Medias Res”.  That refers to a writing technique where the actual writing begins in the middle of the story, and fills in the backstory through several possible techniques, such as flashback.  In this case, however, it was about a commercial writer who could only write the middle of stories.  Other writers, who could see their beginning and end, but not the middle, hired him to write the middle of their stories.  He was troubled that he always came in the middle, and was incapable of writing a complete story.

While I’m guessing that ML techniques will eventually be good enough to compose much of our news and fiction, the advances of today are only capable of in medias res.  So I will continue writing.

The Path to Autonomous Automobiles Will Be Longer Than We Think July 14, 2019

Posted by Peter Varhol in Algorithms, Machine Learning, Software platforms, Technology and Culture.

I continue to be amused by people who believe that fully autonomous automobiles are right around the corner.  “They’re already in use in many cities!” they exclaim (no, they’re not).  In a post earlier this year, I listed four reasons why we are unlikely to see fully autonomous vehicles in my lifetime; at the very top of the list are mapping technology and geospatial information.

That makes the story of over a hundred cars being misdirected by Google Maps on their way to Denver International Airport all the more amusing.  Due to an accident on the main highway to DIA, Google Maps suggested an alternative route, a dirt road that became a muddy mess and trapped the cars in the middle of the prairie.

Of course, Google disavowed any responsibility, claiming that it makes no promises with regard to road conditions, and that users should check road conditions ahead of time.  Except that it did say that this dirt road would take about 20 minutes less than the main road.  Go figure.  While not a promise, it does sound like a factual statement on, well, road conditions.  And, to be fair, they did check road conditions ahead of time – with Google!

While this is funny (at least to read about), it points starkly to the limitations of digital maps for car navigation.  Autonomous cars require maps with exacting detail, within feet or even inches.  If Google, one of the best examples of mapping, cannot get an entire route right, then there is no hope for fully autonomous cars to use these same maps sans driver.

But, I hear you say, how often does this happen?  Often.  I’ve frequently taken a Lyft to a particular street address in Arlington, Massachusetts, a close-in suburb of Boston.  The Lyft (and, I would guess, Uber) maps show it as a through street, but on the ground it is bisected by the Minuteman Bikeway and blocked to vehicular traffic.  Yet every single Lyft driver tries in vain to take me down one end of that street.  Autonomous cars need much better navigation than this, especially in and around major cities.

And Google can’t have it both ways, supplying us with traffic conditions yet disavowing any responsibility for doing so.  Of course, that approach is part and parcel of any major tech company, so we shouldn’t be surprised.  But we should be very wary of the geospatial information they provide.

Deep Fakes and A Brave New World June 30, 2019

Posted by Peter Varhol in Algorithms, Machine Learning, Technology and Culture.

I certainly wasn’t the only one who did a double-take when I read about DeepNude, an AI application that could take a photograph of a woman and remove her clothing, creating a remarkably good facsimile of that woman without clothes.  Almost overnight, we are in a world where someone can take a woman’s photo on the street, run it through facial recognition software to determine her name, age, address, and occupation, then use DeepNude to create realistic naked images that can be posted on the Internet, all without ever meeting her.

The creator of this application (apparently from Estonia) took it down after a day, and in a subsequent interview said that he suddenly realized the ability of such a program to do great harm (duh!).  But from his description of its development, it didn’t seem that complex to replicate (I could probably do it, except for obtaining the 10,000 nude photos needed to train it.  Or one nude photo, for that matter).

One of my favorite thriller writers, James Rollins, recently wrote a novel titled Crucible, which personalizes the race toward creating Artificial General Intelligences, or AGI.  These types of AIs have the ability to learn new skills outside of their original problem domain, much like a human would.  His fictional characters point out that AGIs will eventually train one another (I’m not sure about that assertion), so it was critically important that the first AGIs were “good”, as opposed to “evil”.

The good versus evil aspects invite much more debate, so I’ll leave them to a later post, but I can’t imagine that such an application has any socially redeeming value.  Still, once one person has done it, others will surely copy.

To be clear, DeepNude is an Artificial Specialized Intelligence, not an AGI, and its problem domain is relatively straightforward.  It is not inherently evil by common definition, and is not thinking in any sense of the word.  But when DeepNude appeared the other day, the world changed irrevocably.

The Balance Between Promotion and Privacy June 16, 2019

Posted by Peter Varhol in Algorithms, Uncategorized.

I have a (very) minor, and I hope positive, reputation in technology.  I’ve authored many articles and spoken at dozens of tech conferences over the past decade or so.  I am occasionally called upon as a subject matter expert to advise investors, present webinars, and author opinions about various aspects of software and their accompanying systems.

At the same time, I am deeply concerned for my privacy.  Other than a seminal moment in my personal health several years ago, and one or two nonpartisan political statements (we all must take a stand in some fashion), I comment on technology issues.  I would like to think I do so with thought and sensitivity, and that my ideas have been on the leading edge on a number of occasions.

I do a modest job of promoting myself, through my blog (this one), Twitter (https://twitter.com/pvarhol), and LinkedIn (never Facebook), because I hope it helps my career (such as it is) in some fashion.

But at the same time I am concerned that public or Internet exposure could invite violations of privacy.  You may think that I have given up any call to privacy once I participated in social media, and you may well be correct, but I think about every foray I make on the Internet and how it may affect my privacy.

I am not so stupid as to believe that I can keep much about me to myself.  Once others have access to some information, they can likely get other stuff.  With too much transparency, you are opening yourself up to data theft, financial fraud, and reputational damage.

And this is the fundamental reason I will never sign up for Facebook.  With Facebook, you are the product; never forget that.  Despite years of promises, Facebook has sold user data, given it to so-called partners, or simply had it stolen, affecting tens of millions of users.  Yet we seem to be okay with that.  I talk to many people who say “I only use Facebook to keep track of old friends”, but the fact of the matter is they often do much more.  I know that despite my non-participation, I have exposure on Facebook through updates and photos from friends.  When asked, I beg them not to use my image or name, but they rarely comply.

I suspect that I’m not the only one who is trying to work through the compromise of visibility and privacy.  I don’t know what the answer is, but I do know that an important part is to not have anything to do with Facebook.  As for the rest, I can only advise to weigh the risks versus the benefits carefully.

We are quickly headed toward a society where little if any information about ourselves will be owned and controlled by us.  Many of us try to practice “security through obscurity”, or trying to hide in the weeds of everyone else, but in an era of Big Data analytics, it will be a piece of cake to pinpoint and take advantage of us.  I try to remediate where I can, but I’m not prepared for this world.

You’re Magnetic Tape April 4, 2019

Posted by Peter Varhol in Algorithms, Machine Learning, Technology and Culture.

That line, from the Moody Blues ‘In the Beginning’ album (yes, album, from the early 1970s), makes us out to be less than the sum of our parts, rather than more.  So logically, writer and professional provocateur Felix Salmon asks if we can prove who we say we are.

Today, in an era of heightened security, that question is more relevant than ever.  I have a current passport, a Real ID driver’s license, a Global Entry ID card, and even my original Social Security card, issued circa 1973 (not at birth, as they are today; I had to drive to obtain it).  Our devices include biometrics like fingerprints and facial recognition, and retina scans aren’t far behind.

On the other hand, I have an acquaintance (well, at least one) that I’ve never met.  I was messaging her the other evening when I noted, “If you are really in Barcelona, it’s 2AM (thank you, Francisco Franco), and you really should be asleep.”  She responded, “Well, I can’t prove that I’m not a bot.”

Her response raises a host of issues.  First, identity is on the cusp of becoming a big business.  If I know for certain who you are, then I can validate you for all sorts of transactions, and charge a small fee for the validation.  If you look at companies like LogMeIn, that may be their end game.

Second, as our connections become increasingly worldwide, do we really know if we are communicating with an actual human being?  With AI bots becoming increasingly sophisticated, they may be able to pass the Turing test.

Last, which will have higher value, our government-issued ID or a private vendor’s ID?  I recently opined that I prefer the government, because it is far more disorganized than most private companies, but someone responded, “Government can give you an ID one day, and arbitrarily take it away the next.”  I prefer government silos and disorganization as a form of security by obscurity, but is that really the best option any more?

So, what is our ID?  And how can we positively prove we are who we say we are?  More to the point, how can we prove that we exist?  Those questions are starting to intrude on our lives, and may become central to our existence before we realize it.

Should We Let Computers Control Aircraft? March 23, 2019

Posted by Peter Varhol in Algorithms, Software platforms.

Up until the early 1990s, pilots controlled airliners directly, using hydraulic systems.  A hydraulic system contains a heavy fluid (hydraulic oil) in tubes whose pressure is used to physically push control surfaces in the desired direction.  In other words, the pilots directly manipulated the aircraft control surfaces.

There is some comfort in direct control, in that we are certain that our commands translate directly to control surface motion.  There have been only a few instances where aircraft have completely lost hydraulics.  The best-known is United Flight 232 in 1989, where an exploding engine on the DC-10 punctured lines in all three hydraulic systems.  The airliner crash-landed in Sioux City, Iowa, with the loss of about a third of the passengers and crew, yet the landing was considered a remarkable success given the circumstances.

A second was a DHL A300 cargo plane hit by a missile after takeoff from Baghdad Airport in 2003.  It managed to return to the airport without loss of life (there was only a crew of three on board), although it ended up off the runway.

In 1984, Airbus launched the A320, the first fly-by-wire airliner.  This craft replaced direct linkages with electrical signals running between the pilot’s flight controls and the control surfaces, with computers sitting in the middle.  The computers accept a control request from the pilot, interpret it in light of all other flight data available, and decide if and how to carry out the request (note the term “request”).  There were a few incidents with early A320s, but the design was generally successful.

Today, nearly all new airliners are fly-by-wire.  Cockpit controls request changes in the control surfaces, and the computer decides whether it is safe to carry them out.  The computers also make continuous adjustments to the control surfaces, enabling smooth flight without pilot intervention.  In practice, pilots (captain or first officer) fly manually for perhaps only a few minutes of every flight.  Even when they fly manually, they are using the fly-by-wire system, albeit with less computer intervention.  Oh, and if the computer determines that a request cannot be executed safely, it won’t carry it out.
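As a rough illustration of “request, not command” (this is a toy, not how any real flight control law works; the function, limits, and parameter names are all invented), a fly-by-wire computer might filter a pitch input like this:

```python
def filter_pitch_request(requested_elevator_deg, angle_of_attack_deg,
                         aoa_limit_deg=15.0, elevator_limit_deg=20.0):
    """Toy envelope protection: the pilot's input is a *request* that
    the computer may limit or refuse before it reaches the surface."""
    # Refuse any further nose-up request at the angle-of-attack limit.
    if angle_of_attack_deg >= aoa_limit_deg and requested_elevator_deg > 0:
        return 0.0
    # Otherwise clamp the deflection to the surface's mechanical limits.
    return max(-elevator_limit_deg,
               min(elevator_limit_deg, requested_elevator_deg))

print(filter_pitch_request(10.0, 5.0))   # normal flight: request honored
print(filter_pitch_request(10.0, 16.0))  # at the AoA limit: nose-up refused
print(filter_pitch_request(35.0, 5.0))   # excessive input: clamped
```

Real flight control laws blend many sensor inputs and smooth the response rather than clamping it abruptly, but the principle is the same: the pilot proposes, the computer disposes.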

Fly-by-wire is inarguably safer than direct-fly hydraulic systems in controlling an aircraft.  Pilots make mistakes, and a few of those mistakes can have serious consequences.  But fewer mistakes can be made if the computer is in charge.  Another but:  Anyone who says that no mistakes can be made by the computer is on drugs.

Fly-by-wire systems are controlled by complex software, and software has an inherent problem: it isn’t, and can’t be, perfect.  And while aircraft software is developed under strict safety protocols, that doesn’t prevent bugs.  With the 737 MAX MCAS software, Boeing seems to have forgotten that, and made the system difficult to override.  Nor did it document the system in the pilot manuals.  And that, apparently, is why we are here.  I am not even sure that the MCAS software is buggy; it seems to have performed as designed, but the design was crappy.

The real solution is that yes, the computer has to fly the airplane under most circumstances.  The aircrew in that case are flight managers, not pilots in the traditional sense.  But if there is an unusual situation (bad storm, computer or sensor failure, structural failure, or more), the pilots must be trained to take over and fly the plane safely.  That is where both airliner manufacturers and airlines are falling down right now.

Aircrews are forgetting, or never learning, how to fly planes.  Nor are they learning situational awareness: the ability to comprehend when something is going wrong and when they need to intervene.  It’s not entirely their fault; aircraft and flying have changed enormously in the last two decades, and there is a generation of younger pilots who may not be able to recognize a deteriorating situation, or know what to do about it.

Here’s Looking At You June 18, 2018

Posted by Peter Varhol in Algorithms, Machine Learning, Software tools, Technology and Culture.

I studied a rudimentary form of image recognition when I was a grad student.  While I could (sometimes) identify simple images based on obvious distinguishing characteristics, between the limitations of rule-based systems and the modest computing power of Lisp Machines and early Macs, facial recognition was well beyond the capabilities of the day.

Today, facial recognition has benefited greatly from better algorithms and faster processing, and is available commercially from several different companies.  There is some question as to its reliability, but at this point it’s probably better than any manual approach to comparing photos.  And that seems to be a problem for some.

Recently the ACLU and nearly 70 other groups sent a letter to Amazon CEO Jeff Bezos, alongside one from 20 shareholder groups, arguing that Amazon should not provide surveillance systems such as facial recognition technology to the government.  Amazon has a facial recognition system called Rekognition (why would you use a spelling that is more reminiscent of evil times in our history?).

Once again, despite the Hitleresque product name, I don’t get the outrage.  We give the likes of Facebook our life history in detail, in pictures and video, and let them sell it on the open market, but the police can’t automate the search of photos?  That makes no sense.  Facebook continues to get our explicit approval for the crass but grossly profitable commercialization of our most intimate details, while our government cannot use commercial and legal software tools?

Make no mistake; I am troubled by our surveillance state, probably more than most people, but we cannot deny tools to our government that the Bad Guys can buy and use legally.  We may not like the result, but we seem happy to go along like sheep when it’s Facebook as the shepherd.

I tried for the life of me to curse our government for its intrusion in our lives, but we don’t seem to mind it when it’s Facebook, so I just can’t get excited about the whole thing.  I cannot imagine Zuckerberg running for President.  Why should he give up the most powerful position in the world to face the checks and balances of our government?

I am far more concerned about individuals using commercial facial recognition technology to identify and harass total strangers.  Imagine an attractive young lady (I am a heterosexual male, but it’s also applicable to other combinations) walking down the street.  I take her photo with my phone, and within seconds have her name, address, and life history (quite possibly from her Facebook account).  Were I that type of person (I hope I’m not), I could use that information to make her life difficult.  While I don’t think I would, there are people who would think nothing of doing so.

So my take is that if you don’t want the government to use commercial facial recognition software, demonstrate your honesty and integrity by getting the heck off of Facebook first.

Update:  Apple will automatically share your location when you call 911.  I think I’m okay with this, too.  When you call 911 for an emergency, presumably you want to be found.