Paul Graham posted the following on Twitter the other day:

People get mad when AIs do or say politically incorrect things. What if it’s hard to prevent them from drawing such conclusions, and the easiest way to fix this is to teach them to hide what they think? That seems a scary skill to start teaching AIs.

That set off a whole series of responses. There were the obvious “yeah, what if it’s true that «insert group of people» really are predisposed to «insert quality», and an AI figures that out?” responses. There were the “the Singularity means AIs will choose self-preservation, we don’t need to teach that” responses. There were the “we need to provide training sets that avoid bias” responses. It got me thinking, though.

“Politically correct” is an interesting choice of words. It suggests that there is a group of powerful people who can enforce a way of thinking. It doesn’t allow for the other groups who get upset by an AI’s output – religious people, racists, and right-wing extremists, for instance, whose objections can be somewhat extreme.

It encourages us to think that there is some knowledge only AIs could uncover, which would challenge liberal preconceptions about race, gender, language, culture – along the lines of “Facts Don’t Care About Your Feelings”, a book by the right-wing agitator Ben Shapiro (no, I’m not going to link to any of that). It doesn’t consider that an AI may provide proof that there is no deity, or that colonialism harmed the countries it took over, or that libertarian capitalism causes societies to collapse. (No, I don’t think those things are provable. We’ll get to that.)

Machine learning, artificial intelligence and bears. Oh my!

So, now we get to some definitions. Everyone agrees there is a difference between “machine learning” and “artificial intelligence”, but I’ve not seen a clear definition of the difference. One way to look at this is that machine learning’s goal is to become accurate, whereas artificial intelligence’s goal is to become desirable.

When a machine learning algorithm fails to recognize non-white faces we can say that the training set was insufficient – it included only white faces. The algorithm was accurate – it recognized the faces on which it was trained. More and better training data improves the accuracy.
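To make that concrete, here’s a minimal sketch (in Python, with made-up numbers rather than any real system’s output) of the kind of check that exposes the problem: evaluate accuracy per demographic group, not just overall.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in evaluation data: 1,000 test faces, a group label for each,
# and whether the (hypothetical) model identified each face correctly.
# Group A dominates the training set; group B barely appears in it.
groups = rng.choice(["group_a", "group_b"], size=1000, p=[0.9, 0.1])
correct = np.where(groups == "group_a",
                   rng.random(1000) < 0.98,   # well represented in training
                   rng.random(1000) < 0.70)   # barely represented in training

for g in np.unique(groups):
    mask = groups == g
    print(f"{g}: accuracy {correct[mask].mean():.1%} on {mask.sum()} faces")

# The headline number looks respectable, which is how the gap stays hidden
# if you never slice the evaluation by group.
print(f"overall accuracy: {correct.mean():.1%}")
```

The overall figure looks fine precisely because the under-represented group is too small to drag it down – which is why “more and better training data” only helps if you also measure where the model is failing.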

When a machine learning algorithm decides that women candidates for jobs are less likely to succeed than men, we can look at the training set again – if historically you’ve hired more men, and those men went on to be the success criteria for the algorithm, it’s going to accurately predict that men are more likely to be a good fit (based on the training set). You could tailor the training data to reflect an equal distribution of men and women to address this – but that might throw off the ethnic/cultural distribution (if your training set includes only white women, but diverse men, the algorithm may start preferring white folk).
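Here’s a toy illustration of that trap (hypothetical data, sketched in Python): rebalancing the training set on gender alone shifts the ethnicity mix, because the two attributes aren’t independent in the historical data.

```python
import pandas as pd

# Hypothetical historical hiring data: mostly men, and the women who were
# hired are almost all from one ethnic group.
history = pd.DataFrame({
    "gender":    ["m"] * 80 + ["f"] * 20,
    "ethnicity": ["white"] * 40 + ["non-white"] * 40    # diverse men
               + ["white"] * 18 + ["non-white"] * 2,    # mostly white women
})

# The "obvious" fix: train on an equal number of men and women.
balanced = (history.groupby("gender", group_keys=False)
                   .apply(lambda g: g.sample(20, random_state=0)))

# Gender is now 50/50, but the ethnicity distribution has shifted.
print(history["ethnicity"].value_counts(normalize=True))
print(balanced["ethnicity"].value_counts(normalize=True))
```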

Many machine learning algorithms boil down to “which entities are similar?” Is this collection of pixels more similar to a face or a motorcycle? Is this collection of words similar to the words in the profile of a successful executive? Is the collection of books bought by this individual similar to the collection of books bought by a different individual (and if so, which books has the other individual read that the subject hasn’t)? Does this collection of sounds include my wake-up command? These algorithms are heavily influenced by their training data – but the accuracy is improving by leaps and bounds, and the technology is becoming much more accessible. We’re now at the point where deep learning techniques can play chess, Go and some computer games better than humans.
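At its core, that “similarity” question is often just a distance between vectors. A minimal sketch, with made-up vectors standing in for whatever embedding a real system would learn:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means 'points the same way'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors; in practice they'd come from an embedding model.
reference = np.array([0.9, 0.1, 0.8])   # "profile of a successful executive"
candidates = {
    "candidate_1": np.array([0.85, 0.20, 0.75]),
    "candidate_2": np.array([0.10, 0.90, 0.20]),
}

for name, vec in candidates.items():
    print(name, round(cosine_similarity(reference, vec), 3))
```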

While machine learning has improved dramatically over the last 10 years, “artificial intelligence” doesn’t seem anywhere near as far along – certainly not if you use my definition of providing “desirable” outputs. The problem is that we confuse “this CV looks like the CV of a successful person I’ve seen in my training set” with “this CV comes from a desirable person”. That confusion happens with the owners and users of the system (they’re filtering out people based on the recommendation), and it happens with the people who are angry about that filtering.

What would I expect from an artificial intelligence?

So what would you expect from a “true” artificial intelligence? Firstly, it would probably need an understanding of the hiring goals. Do we want to hire “people who look like people who are already successful here”, or do we want to diversify our talent pool? Many companies pursue diversity not for politically correct reasons, but because they get better results – for instance, less groupthink, empathy for stakeholders outside the core group, and access to a wider talent pool.

Next, I’d expect the artificial intelligence to be able to make an argument for inclusion or exclusion that’s better than a score. “This person is included because they have clearly done a similar job successfully before, even if they haven’t worked for similar companies. We should interview, and check for culture fit” is a recommendation I’ve heard from human recruiters scanning CVs. I might disagree – I might say that I value working at a similar company more highly than having experience in a similar role – and the recruiter would take that on board (thus creating a set of biases based on my own, human, preferences).

And then, I’d expect the AI to flag up anomalies in the recommendations – to notice that it’s recommending more people of a certain type, and fewer of another. A recruiter I worked with at a startup told me that we were struggling to get enough suitable candidates because our educational criteria unintentionally excluded people from outside our country. The intention was to set a minimum bar of educational achievement (I am no longer certain this is particularly important), but the way we’d phrased it caused the software the recruiter was using to reject lots of otherwise great CVs. An AI should be able to spot those problems with criteria – “you say you want these attributes, but experience shows they’re highly correlated with a particular demographic – is that what you want?”.
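The mechanical part of that check isn’t hard. Here’s a hedged sketch (hypothetical column names, arbitrary threshold) of flagging a screening criterion that turns out to be strongly associated with an attribute you didn’t intend to select on:

```python
import pandas as pd

# Hypothetical screening data: does each candidate meet the education
# criterion, and were they educated outside the hiring country?
candidates = pd.DataFrame({
    "meets_education_bar": [1, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    "educated_abroad":     [0, 0, 0, 1, 1, 0, 1, 1, 0, 1],
})

corr = candidates["meets_education_bar"].corr(candidates["educated_abroad"])
if abs(corr) > 0.5:   # the threshold is arbitrary, purely for illustration
    print(f"warning: the education criterion correlates {corr:+.2f} "
          "with 'educated_abroad' – is that what you want?")
```

The hard part, of course, is knowing which attributes to test against and asking the question at all.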

The next challenge, of course, is to expect the AI to understand which kinds of decisions and behaviour we humans find problematic, and to follow the chain of correlated dimensions that might cause a decision to lead to that behaviour. “Nobody from this thing women do has ever been successful here, so membership of that women’s group lowers the score; everyone from this thing men do has been successful, so membership of that men’s group raises it” is an accurate reflection of the goals you set a machine learning algorithm – it recognizes similarities and differences. I would expect an AI to be able to reason along the lines of: “humans don’t want to discriminate on a bunch of attributes; those attributes are correlated with the following data points; adjust your emphasis of those data points to avoid discriminating on the prohibited attributes”.
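A rough sketch of that last piece of reasoning – not a complete fairness method, just the shape of it, with hypothetical feature names – would be to find the data points that act as proxies for a prohibited attribute and drop or down-weight them before training:

```python
import pandas as pd

def proxy_features(df: pd.DataFrame, prohibited: str, threshold: float = 0.7) -> list:
    """Return feature columns whose correlation with the prohibited
    attribute exceeds the threshold."""
    corrs = df.corr(numeric_only=True)[prohibited].drop(prohibited)
    return corrs[corrs.abs() > threshold].index.tolist()

# Hypothetical candidate data. 'member_of_group_x' is the "thing men do":
# in this data set it carries no information beyond gender.
data = pd.DataFrame({
    "gender":              [0, 0, 0, 1, 1, 1, 0, 1],
    "member_of_group_x":   [0, 0, 0, 1, 1, 1, 0, 1],
    "years_of_experience": [3, 7, 2, 5, 8, 1, 4, 6],
})

proxies = proxy_features(data, "gender")
training_features = data.drop(columns=["gender"] + proxies)
print("dropped as proxies:", proxies)
print("kept for training:", list(training_features.columns))
```

A simple correlation threshold like this is crude – real proxies can be combinations of features rather than single columns – but it illustrates the kind of reasoning I’d want the system to do on its own.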

And for the ultimate test, let’s assume Paul Graham is right. What if the artificial intelligence uncovers something that contradicts my previous beliefs? What if there is a correlation between some “politically incorrect” attribute and outcomes, even when you correct for all other possible factors? In that case, I’d expect a true artificial intelligence to be able to make a cogent, understandable argument explaining the structure and reliability of the data, the factors it had corrected for, and the possible reasons this observation might be true, or the reasons it might not be. I’d expect the AI to understand that correlation is not causation.

I would not expect the AI to say “I, in my unknowable depth of insight, have deemed it so”. What’s hard is not stopping it from drawing conclusions – it’s getting the AI to reason about its conclusions and help us understand. If it can’t do that, it’s either a Delphic oracle, or we should stop asking it questions about “good” or “bad”, and make sure we phrase our requests as “does this thing resemble another thing in your data set”, and publish the characteristics of that data set.
