
But For the Want of An Algorithm November 21, 2021

Posted by Peter Varhol in Algorithms, Machine Learning, Strategy.

I’ll bet you never heard that statement before.  Algorithms are the basis of machine learning and decision-making today.  Recently, for the want of an accurate and representative algorithm, Zillow is laying off a quarter of its staff and exiting its home-buying and selling business.

To be fair, it’s not all about the machine learning algorithm(s) that Zillow used to determine which homes to buy and what to pay for them.  The algorithm was only the starting point for an in-person inspection and negotiation by actual people.  But the algorithm overpriced homes across the country, leaving Zillow losing millions of dollars and unable to sell many homes at the list price.

But it does illustrate our over-dependence on algorithms to represent a particular problem domain.  And home prices are particularly susceptible to local conditions.  One algorithm does not fit all.

There are a number of ways that algorithms can be biased.  They may not fully represent the problem domain, or the domain may have changed since the algorithm was developed.  In this case, probably both of these contributed to the problem.

So there are dangers in using algorithms to represent domains and scenarios, dangers that can cost a great deal of money.  This illustrates the importance of testing; not just testing the code, but testing and evaluating the algorithms themselves.  And that seems to be a step Zillow didn’t take.
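To make that concrete, here is a minimal sketch of the kind of check I have in mind (not Zillow’s actual system, whose models and data are not public): a hypothetical home-price model is evaluated against recent sales to see whether the market has drifted away from its training data.

```python
# Hypothetical sketch: checking a home-price model for drift.
# The data, model, and threshold are all illustrative, not Zillow's.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

def simulate_sales(n, price_per_sqft):
    """Simulate home sales: square footage drives price, plus noise."""
    sqft = rng.uniform(800, 4000, n)
    price = sqft * price_per_sqft + rng.normal(0, 20_000, n)
    return sqft.reshape(-1, 1), price

# Train on last year's market, then evaluate on a market that has shifted.
X_train, y_train = simulate_sales(5_000, price_per_sqft=200)
X_recent, y_recent = simulate_sales(1_000, price_per_sqft=230)  # prices moved

model = GradientBoostingRegressor().fit(X_train, y_train)

mae_train = mean_absolute_error(y_train, model.predict(X_train))
mae_recent = mean_absolute_error(y_recent, model.predict(X_recent))
print(f"MAE on training-era sales: ${mae_train:,.0f}")
print(f"MAE on recent sales:       ${mae_recent:,.0f}")

# A simple guardrail: flag the model for retraining or human review
# if its error on recent sales has grown well beyond its training error.
if mae_recent > 1.5 * mae_train:
    print("Warning: model error has drifted; do not price offers with it.")
```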

Cybersecurity, Past and Future June 23, 2021

Posted by Peter Varhol in Algorithms, Software platforms.

I just returned from helping drop off my grandnephew at Space Camp in Huntsville, Alabama, where he is attending a weeklong camp on cybersecurity.  Before dropping him off, I asked him if he knew what SQL injection and buffer overruns were.  He didn’t, but he’s only twelve, and I hope he does before returning at the end of the week.
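For the record, here is a minimal illustration of the first of those, using Python’s built-in sqlite3 module; the table and the attacker’s input are invented, but the pattern (pasting user input directly into a query versus parameterizing it) is the whole lesson.

```python
# Minimal illustration of SQL injection, using Python's built-in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 1)")

# Attacker-controlled input designed to defeat the query's intent.
user_input = "alice' OR '1'='1"

# Vulnerable: the input is pasted directly into the SQL text, so the
# OR clause becomes part of the query and returns every row.
unsafe = f"SELECT * FROM users WHERE name = '{user_input}'"
print(conn.execute(unsafe).fetchall())   # leaks both users

# Safe: a parameterized query treats the input as a literal value.
safe = "SELECT * FROM users WHERE name = ?"
print(conn.execute(safe, (user_input,)).fetchall())  # returns nothing
```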

This got me thinking about cybersecurity in general, and about encryption in particular, which seems to have become a backwater.  I’m going to start with the Clipper chip, a hardware integrated circuit promoted by the US government that implemented a secret encryption algorithm with a backdoor letting the government access encrypted communications.  The chip, announced in 1993, was found to have at least one security flaw, and because the US government did not or could not mandate its use, it disappeared entirely later in the decade.

There was a particularly tense period in computing when it looked possible that the government would be able to impose Clipper on computer manufacturers (as well as phone manufacturers), which would have given the US government a back door into every single one of our systems.

I can’t count the number of problems with this approach, nor the extent of its arrogance.  First was the security flaw, which had nothing to do with the (secret) algorithm and everything to do with how the chip exchanged keys, a scheme simplistic enough to be hacked fairly easily.  Plus, while the government said it would never read anyone’s mail or files without serious reason and a court order, no one believed it.  Despite the obvious use in helping to fight crime, such a backdoor is ripe for government overreach and abuse.

At about the same time (1991), computer scientist and software engineer Phil Zimmerman introduced an encryption program called Pretty Good Privacy (PGP), which arguably provided a far superior encryption approach to Clipper.  Rather than attempt to profit from it, Zimmerman released it and its source code as open source, meaning that anyone could download, modify, and use it.  He let the cat out of the bag, so to speak.

The amusing thing (not for Zimmerman) was that at the time encryption technology was considered a munition by the US government.  Yes, that’s right; a weapon (it still is, although now at a higher level of encryption than PGP).  As a result, Zimmerman was hounded by the FBI, the Customs agency, and the NSA for making a controlled weapon available outside of the US.  Zimmerman was never arrested, but he was harassed mercilessly by the authorities, before that case was finally dropped.

Today, I’m not sure where encryption stands with the general population.  The problem was that these approaches, known as public/private key encryption, required users to go through multiple steps just to decode their own documents or to send email to others, and no one wants to go through those extra steps.  Using it in a phone is potentially easier, and that may be where it has found a home.  Clipper, meanwhile, has been completely dead for over 20 years.
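For those who never went through them, here is a sketch of the multi-step public-key workflow I mean, using the third-party Python cryptography package; RSA stands in purely for illustration, since PGP itself layers key management and symmetric encryption on top of this same idea.

```python
# Sketch of the public/private key steps, using the `cryptography` package.
# RSA is used here purely for illustration; PGP combines public-key and
# symmetric encryption with key management on top of this idea.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# Step 1: the recipient generates a key pair and publishes the public key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Step 2: the sender encrypts with the recipient's public key.
message = b"Meet at noon."
ciphertext = public_key.encrypt(
    message,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)

# Step 3: only the matching private key can decrypt.
plaintext = private_key.decrypt(
    ciphertext,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)
assert plaintext == message
```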

Our major cybersecurity issues today involve hacking through more traditional attack techniques (SQL injection and buffer overruns are still popular), rather than trying to read encrypted files.  Truth be told, whatever is encrypted today is unlikely to be read by anyone soon; modern encryption algorithms simply take too long to crack.  And individuals aren’t going to go through the extra steps needed to encrypt and decrypt files.  While personal encryption may be an important technology, it is also an intellectual backwater.

Back to my grandnephew.  It is too early to tell whether cybersecurity will hold his attention, but he could do worse.

Why Testing Needs Explainable Artificial Intelligence April 19, 2021

Posted by Peter Varhol in Algorithms, Machine Learning, Software development.

Many artificial intelligence/machine learning (AI/ML) applications produce results that are not easily understood from their training and input data.  This is because these systems are largely black boxes that use multiple algorithms (sometimes hundreds) to process data and return a result.  Tracing how that data is processed through those mathematical algorithms is an impossible task for a person.

Further, these algorithms were “trained”, or adjusted, based on the data used as the foundation of learning.  What is really happening is that the data is adjusting the algorithms to reflect what we already know about the relationship between inputs and outputs.  In other words, we are doing a very complex type of nonlinear regression, without any inherent knowledge of a causal relationship between inputs and outputs.
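A toy sketch of that idea follows; a small neural network fit to invented data is doing nothing more mysterious than nonlinear regression, with training adjusting its weights until known inputs map to known outputs.

```python
# Toy illustration: "training" a neural network is nonlinear regression.
# The network knows nothing about cause and effect; it only adjusts
# weights until its outputs match the training targets.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 500)   # noisy nonlinear relation

net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=1)
net.fit(X, y)                                     # weights adjusted to the data

print(net.predict([[1.5]]))   # close to sin(1.5), about 0.997
```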

At worst, the outputs from AI systems can sometimes seem nonsensical, based on what is known about the problem domain.  Yet because those outputs come from software, we are inclined to trust them and apply them without question.  Maybe we shouldn’t.

But it can be more subtle than that.  The results could embody a systemic bias that makes outputs seem correct, or at least plausible, when they are not, or at least are not ethically right.  And users rarely have any recourse to question the outputs, making the system a black box.

This is where explainable AI (XAI) comes in.  In cases where the relationship between inputs and outputs is complex and not especially apparent, users need the application to explain why it delivered a certain output.  It’s a matter of trusting the software to do what we think it is doing.  Ethical AI also plays into this concept.

So how does XAI work?  There is a long way to go here, but a couple of techniques show some promise.  XAI operates on the principles of transparency, interpretability, and explainability.  Transparency means that we need to be able to look into the algorithms to clearly discern how they are processing input data.  While that may not tell us how those algorithms were trained, it provides insight into the path to the results, and is intended for interpretation by the design and development team.

Interpretability is how the results might be presented for human understanding.  In other words, if you have an application and are getting a particular result, you should be able to see and understand how that result was achieved, based on the input data and processing algorithms.  There should be a logical pathway between data inputs and result outputs.
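One partial way to approximate that pathway today is feature attribution.  Here is a brief sketch using scikit-learn’s permutation importance on a standard dataset; it does not explain an individual result, but it does show which inputs the model leans on.

```python
# Sketch of one interpretability aid: permutation importance, which asks
# how much a model's accuracy degrades when each input feature is shuffled.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for name, score in sorted(zip(X.columns, result.importances_mean),
                          key=lambda p: -p[1])[:5]:
    print(f"{name:25s} {score:.3f}")   # the features the model leans on most
```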

Explainability remains a vague concept while researchers try to define exactly how it might work.  We might want to support queries into our results, or to get detailed explanations into more specific phases of the processing.  But until there is better consensus, this feature remains a gray area.

The latter two characteristics are more important to testers and users.  How you achieve them depends on the application.  Facial recognition software can usually be built to describe facial characteristics and how they match up to values in an identification database.  It becomes possible to build at least interpretability into the software.
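As a hypothetical sketch of that kind of interpretability, consider matching a face embedding against a small identification database; the embeddings and names below are invented, but reporting the per-identity similarity scores is what makes the match interpretable.

```python
# Hypothetical sketch: matching a face embedding against a small database.
# The vectors and names are invented purely to illustrate the matching step;
# a real system would produce embeddings with a trained face model.
import numpy as np

database = {
    "person_a": np.array([0.1, 0.9, 0.3]),
    "person_b": np.array([0.8, 0.2, 0.5]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(query_embedding, threshold=0.9):
    """Return the best match plus every score, so the result is interpretable."""
    scores = {name: cosine(query_embedding, vec) for name, vec in database.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

match, scores = identify(np.array([0.12, 0.88, 0.31]))
print(match, scores)   # the per-identity scores are the "explanation"
```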

But interpretability and explainability are not as easy when the problem domain is more ambiguous.  How can we interpret an e-commerce recommendation that may or may not have anything to do with our product purchase?  I have received recommendations on Amazon that clearly bear little relationship to what I have purchased or examined, so we don’t always have a good path between source and destination.

So how do we implement and test XAI? 

Where Testing Gets Involved

Testing AI applications tends to be very different from testing traditional software.  Testers often don’t know what the right answer is supposed to be.  XAI can be very helpful in that regard, but it’s not the complete answer.

Here’s where XAI can help.  If the application is developed and trained in such a way that the algorithms show the steps they take from problem to solution, then we have something that is testable.

Rule-based systems can make it easier, because the rules form a big part of the knowledge.  In neural networks, however, the algorithms rule, and they bear little relationship to the underlying intelligence.  But rule-based intelligence is much less common today, so we have to go back to the data and algorithms.

Testers often don’t have control over how AI systems work to create results.  But they can delve deeply into both data and algorithms to come up with ways to understand and test the quality of systems.  It should not be a black box to testers or to users.  How do we make it otherwise?

Years ago, I wrote a couple of neural network AI applications that simply adjusted their algorithms in response to training, without any insight into how that happened.  While this may work in cases where the connection isn’t important, knowing how our algorithms contribute to our results has become vital.

Sometimes AI applications “cheat”, using cues that do not accurately reflect the knowledge within the problem domain.  For example, it may be possible to recognize people not through their facial characteristics, but through their surroundings.  You may have data indicating that I live in Boston, and use the Boston Garden in the background as your cue, rather than my own face.  That may be accurate (or may not be), but it’s not facial recognition.
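A simple check a tester might run for this kind of shortcut is an occlusion test: blank out the face, or everything except the face, and see whether the prediction changes.  The sketch below is hypothetical; model.predict() stands in for whatever recognition API is actually under test.

```python
# Hypothetical sketch of an occlusion test to catch the "Boston Garden in the
# background" shortcut.  model.predict() and the image handling are
# placeholders, not a real face-recognition API.
import numpy as np

def occlusion_check(model, image, face_box):
    """Compare predictions on the full image vs. face-only vs. background-only."""
    x0, y0, x1, y1 = face_box

    face_only = np.zeros_like(image)
    face_only[y0:y1, x0:x1] = image[y0:y1, x0:x1]   # keep the face, blank the rest

    background_only = image.copy()
    background_only[y0:y1, x0:x1] = 0               # blank out the face

    # If the model still "recognizes" the person with the face removed,
    # it is keying on surroundings, not facial characteristics.
    return {
        "full": model.predict(image),
        "face_only": model.predict(face_only),
        "background_only": model.predict(background_only),
    }
```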

A tester can use an XAI application here to help tell the difference.  That’s why developers need to build in this technology.  But testers need deep insight into both the data and the algorithms.

Overall, a human in the loop remains critical.  Unless someone is looking critically at the results, they can be wrong, and quality will suffer.

There’s no one correct answer here.  Instead, testers need to be intimately involved in the development of AI applications, and insist on an explanatory architecture.  Without that, there is no way of assessing whether these applications have the quality they need to deliver actionable results.

The Limits of Data December 21, 2020

Posted by Peter Varhol in Algorithms, Machine Learning, Strategy.

I’ve been teaching statistics and operations research since, well, the mid-1980s I guess, to students with greater or lesser degrees of sophistication.  In most cases, I try to add some real-world context to what most students consider a dry and irrelevant topic, even as I realize that most people are in the room because it’s required for their degree.

Except that over the last few years, statistics and analytics have shown themselves to be anything but irrelevant.  As data has become easier to collect and store, and faster processing has turned data into information in real time, more and more scientific, engineering, business, and management professionals are at least trying to use data to make more justifiable decisions.

(I casually follow American professional football, and have been amazed over the last few years to see disdain for any sort of analytics turn into a slavish following and detailed definition of obscure analytical results.)

And at least some people seem to be paying attention.  I still get a lot of “I’m not a math person” or “I make my decisions without considering data” but that is becoming less common as people recognize that they are expected to justify the directions they take.

In general this is a good trend.  An informed decision is demonstrably better than one based on “gut feel.”  As the saying goes, you are entitled to your own opinion, but not your own facts.  Professionals making decisions based on analytics won’t always arrive at the right answer, but they will do better than many are doing today.

But data is not a universal panacea.  First, any data set we use may not accurately represent the problem domain.  There may have been data collection errors, or the data may not be closely related to the conclusion we want to draw.  For example, there may be a correlation between intelligence and income, but the true determiner may well be education, not intelligence.  In these circumstances, our analytics can lead us to the wrong conclusion.
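A small simulation makes the point; the numbers are invented, but when education drives income and intelligence only acts through education, the raw correlation between intelligence and income looks convincing until education is controlled for.

```python
# Toy simulation of the intelligence/income example: education drives income,
# and intelligence only influences income through education.  All numbers
# are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
intelligence = rng.normal(100, 15, n)
education = 0.1 * intelligence + rng.normal(0, 2, n)      # years of schooling
income = 5_000 * education + rng.normal(0, 10_000, n)     # depends on education only

print(np.corrcoef(intelligence, income)[0, 1])   # sizable "raw" correlation

# Control for education by regressing it out of both variables,
# then correlating the residuals (a partial correlation).
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r_partial = np.corrcoef(residuals(intelligence, education),
                        residuals(income, education))[0, 1]
print(r_partial)   # near zero once education is held fixed
```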

Our data can also be biased.  Machine learning systems do a poor job of facial recognition for races underrepresented in their training data, for example, causing high levels of misidentification.  This is primarily because we don’t have good data on the facial characteristics of those groups.  Years ago, Amazon came up with an algorithm to identify potential candidates for IT jobs that was trained overwhelmingly on data from male candidates.  The algorithm quite naturally came to the incorrect conclusion that only men made good IT workers.
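Even a very basic check on the training data would have surfaced that problem.  The sketch below uses invented column names and counts, but the questions (how balanced are the groups, and what outcome rates do the labels already encode?) are the ones to ask.

```python
# Hypothetical sketch of a basic bias check on training data for a hiring
# model.  The column names and numbers are invented for illustration.
import pandas as pd

applicants = pd.DataFrame({
    "gender": ["M"] * 900 + ["F"] * 100,
    "hired":  [1] * 450 + [0] * 450 + [1] * 20 + [0] * 80,
})

print(applicants["gender"].value_counts(normalize=True))   # 90% male data
print(applicants.groupby("gender")["hired"].mean())        # hire rate per group

# A model trained on this data will happily learn "male implies hire",
# reproducing the historical skew rather than finding good candidates.
```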

So while data can make our decisions more accurate, that’s only true when we apply it correctly.  And that’s not as easy as it sounds.

Statistical Significance and Real Life December 23, 2019

Posted by Peter Varhol in Algorithms, Education, Technology and Culture.

I have a degree in applied math, and have taught statistics for a number of years.  I like to think that I have an intuitive feel for numbers and how they are best interpreted (of course, I also like to think that I am handsome and witty).

Over the last few years there has been concern among the academic community that most people massively misinterpret what statistical significance is telling them.  Most research is done by comparing two separate groups (people, drugs, ages, treatments, and so on), one of which is not changed, while the other of which undergoes a change (most experiments are actually more complex than this, with multiple change groups representing different stimuli, different doses, or different behaviors).  The two groups are then compared through a quantitative measurement of the characteristic under test.

Because we are sampling the population, there is some uncertainty in the result.  Only if we have complete information (a census) can we make a statement with certainty, and we almost never have that.  Statistical significance means that there is only a small probability (usually one or five percent) that a given result would occur by chance alone, suggesting that there is a real difference between the control and experimental groups.

Statistical significance is a narrow mathematical term.  It refers to interpreting the mathematics, not applying the result to the real world.  I try to make the distinction between statistical significance and practical significance.  Practical significance is when the experimental conclusion can result in meaningful action in the problem domain.  “This drug always cures cancer”, for example, can never be true, for multiple reasons.  But we might like to make the statement that we can save twenty thousand lives a year; that might result in action in promoting a cure.

The problem is that many policy makers and the general public conflate the two.  If something is statistically significant, how can it also not be practically significant?  A large sample size can identify and amplify tiny differences that in many cases don’t matter in the grand scheme of things.
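A quick simulation shows how this happens; the difference between the groups below is fixed at a trivially small size, yet the p-value will typically cross the usual threshold once the sample gets large enough.

```python
# Demonstration: a trivial difference becomes "significant" with enough data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def experiment(n):
    control = rng.normal(100.0, 15.0, n)
    treatment = rng.normal(100.3, 15.0, n)     # a difference of 0.3 -- tiny
    _, p = stats.ttest_ind(control, treatment)
    d = (treatment.mean() - control.mean()) / 15.0   # effect size (Cohen's d)
    return p, d

for n in (50, 500, 50_000):
    p, d = experiment(n)
    print(f"n={n:6d}  p-value={p:.4f}  effect size d={d:.3f}")

# Typical output: the p-value drops below 0.05 as n grows, while the effect
# size stays around 0.02 -- statistically significant, practically meaningless.
```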

And there is such a thing as a Type I error (there is also a Type II error, which I’ll write about later).  A Type I error means that we falsely reject the hypothesis that there is no difference between the groups.  And what are the odds of that?  Pretty good, actually.  Run enough tests and chances are you will get some “significant” results through random chance alone, not because there is a real difference.

Many studies rely on multiple statistical tests, sometimes numbering in the hundreds.  If you do a hundred statistical tests, and find five that give you statistically significant results at the 95 percent level, what do you conclude?  Many researchers breathe a sigh of relief and exclaim “Publish!”, because in many cases their jobs are dependent on publishable results.
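Here is what that looks like in a simulation where nothing is actually different between the groups; you should expect about five of the hundred tests to come out “significant” anyway, and a simple Bonferroni correction (dividing the threshold by the number of tests) removes most of them.

```python
# Simulation of the multiple-testing trap: run 100 tests where nothing is
# actually different, and see how many come out "significant" anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, alpha = 100, 0.05

p_values = []
for _ in range(n_tests):
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)       # drawn from the same distribution as a
    p_values.append(stats.ttest_ind(a, b).pvalue)
p_values = np.array(p_values)

print((p_values < alpha).sum(), "false positives out of", n_tests)
# Expect about 5 purely by chance.  A Bonferroni correction divides the
# threshold by the number of tests, which removes most of them:
print((p_values < alpha / n_tests).sum(), "survive Bonferroni correction")
```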

While we can use statistics and mathematics in general to help us understand complex problems, we have to mentally separate the narrow mathematical interpretations from the broader solution and policy ones.  But most researchers, either through ignorance or because it behooves their careers to publish, fail to make that separation.  And the lay public and policy makers bow to the cult of statistical significance, making things worse rather than better.

When AI is the Author December 9, 2019

Posted by Peter Varhol in Algorithms, Machine Learning.

I have been a professional writer (among many other things) since 1988.  I’ve certainly written over a couple of thousand articles and blog posts, and in my free time have authored two thrillers, with several more on the way.  I have found over the years that I write fast, clearly, and, when called for, imaginatively.

Now there are machine learning systems that can do all of that.

I knew that more recent ML systems had been able to automatically write news articles for publication on news sites.  At the beginning of this year, OpenAI, a research foundation committed to ethical uses of AI, announced that they had produced a text generator so good that they weren’t going to release the trained model, fearful that it would be used to spread fake news.  They instead released a much smaller model for researchers to experiment with, as well as a technical paper.

Today, though, they are using the same GPT-2 model to create creative works such as fiction and poetry.  I started wondering if there was nothing machine learning could not do, or at least mimic.

But there’s a catch.  These ML systems have to be given a starting point, which seems to be the beginning of the story, supplied by a human.  Once they have that starting point, they can come up with follow-on sentences that can be both creative and factual.

But they can’t do so without that starting point.  They can provide the middle, and perhaps the end, although I suspect that the end would have to be tailored toward a particular circumstance.
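For the curious, here is roughly what that starting point looks like in practice, using the small, publicly released GPT-2 model through the Hugging Face transformers library; the opening line and settings are illustrative.

```python
# Sketch: GPT-2 will continue a story, but only once you hand it the beginning.
# Uses the small, publicly released GPT-2 model via Hugging Face transformers.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

opening = "The lighthouse keeper had not seen a ship in forty days, until"
continuation = generator(opening, max_length=60, num_return_sequences=1)

print(continuation[0]["generated_text"])
# With no opening line to work from, there is nothing for the model to continue.
```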

Decades ago, I read a short story titled “In Medias Res”.  That refers to a writing technique where the actual writing begins in the middle of the story, and fills in the backstory through several possible techniques, such as flashback.  In this case, however, it was about a commercial writer who could only write the middle of stories.  Other writers, who could see their beginning and end, but not the middle, hired him to write the middle of their stories.  He was troubled that he always came in the middle, and was incapable of writing a complete story.

While I’m guessing that ML techniques will eventually be good enough to compose much of our news and fiction, the advances of today are only capable of in medias res.  So I will continue writing.

The Path to Autonomous Automobiles Will Be Longer Than We Think July 14, 2019

Posted by Peter Varhol in Algorithms, Machine Learning, Software platforms, Technology and Culture.

I continue to be amused by people who believe that fully autonomous automobiles are right around the corner.  “They’re already in use in many cities!” they exclaim (no, they’re not).  In a post earlier this year, I listed four reasons why we are unlikely to see fully autonomous vehicles in my lifetime; at the very top of the list was mapping technology: maps and geospatial information.

That makes the story of hundreds of cars bound for Denver International Airport being misdirected by Google Maps all the more amusing.  Due to an accident on the main highway to DIA, Google Maps suggested an alternative route, a dirt road that eventually became a muddy mess and trapped over 100 cars in the middle of the prairie.

Of course, Google disavowed any responsibility, claiming that it makes no promises with regard to road conditions, and that users should check road conditions ahead of time.  Except that it did say this dirt road would take about 20 minutes less than the main road.  Go figure.  While not a promise, that does sound like a factual statement on, well, road conditions.  And, to be fair, the drivers did check road conditions ahead of time – with Google!

While this is funny (at least to read about), it points starkly to the limitations of digital maps for car navigation.  Autonomous cars require maps with exacting detail, within feet or even inches.  Yet if Google, one of the best examples of mapping, cannot get an entire route right, then there is no hope for fully autonomous cars using these same maps sans driver.

But, I hear you say, how often does this happen?  It happens often.  I’ve often taken a Lyft to a particular street address in Arlington, Massachusetts, a close-in suburb of Boston.  The Lyft (and, I would guess, Uber) maps have it as a through street, but in ground truth it is bisected by the Minuteman Bikeway and blocked to vehicular traffic.  Yet every single Lyft tries to take me down one end of that street in vain.  Autonomous cars need much better navigation than this, especially in and around major cities.

And Google can’t have it both ways, supplying us with traffic conditions yet disavowing any responsibility for doing so.  Of course, that approach is part and parcel of any major tech company, so we shouldn’t be surprised.  But we should be very wary of the geospatial information they provide.

Deep Fakes and A Brave New World June 30, 2019

Posted by Peter Varhol in Algorithms, Machine Learning, Technology and Culture.

I certainly wasn’t the only one who did a double-take when I read about DeepNude, an AI application that could take a photograph of a woman and remove her clothing, creating a remarkably good facsimile of that woman without clothes.  At the drop of a hat, we are now in a world where someone can take a woman’s photo on the street, run it through facial recognition software to determine her name, age, address, and occupation, then use DeepNude to create realistic naked images that can be posted on the Internet, all without even meeting her.

The creator of this application (apparently from Estonia) took it down after a day, and in a subsequent interview said that he suddenly realized the ability of such a program to do great harm (duh!).  But from his description of its development, it didn’t seem that complex to replicate (I could probably do it, except for obtaining 10,000 nude photos to train it.  Or one nude photo, for that matter).

One of my favorite thriller writers, James Rollins, recently wrote a novel titled Crucible, which personalizes the race toward creating Artificial General Intelligences, or AGI.  These types of AIs have the ability to learn new skills outside of their original problem domain, much like a human would.  His fictional characters point out that AGIs will eventually train one another (I’m not sure about that assertion), so it was critically important that the first AGIs were “good”, as opposed to “evil”.

The good-versus-evil aspects invite much more debate, so I’ll leave them for a later post, but I can’t imagine that such an application has any socially redeeming value.  Still, once one person has done it, others will surely copy it.

To be clear, DeepNude is an Artificial Specialized Intelligence, not an AGI, and its problem domain is relatively straightforward.  It is not inherently evil by common definition, and is not thinking in any sense of the word.  But when DeepNude appeared the other day, the world changed irrevocably.