Rational Robot Ritual
Much of what we regard as 'explanation' of how artificial intelligence works is really rationalization. A recent paper by a leading academic in creative AI confirms this. But it raises some interesting – and perhaps also troubling – questions about separating the process of subjective and less-than-accurate human-like explanation of an underlying true decision-making process from the objective interpretation of that process as a means of understanding it. Given that all we have to describe human behavior is a series of potentially faulty explanations produced either by ourselves or an external observer, we have various external social rituals of rationalization such that our explanations eventually become authoritative interpretations. These rituals and the institutions that perform them are flawed, imperfect, and often fail in the attempt to make what is confusing coherent, understandable, and logical. But somehow we are mostly willing to accept this process. Anyone interested in autonomous decision-making and tech policy/ethics should think about how machines are incorporated into the process and what it means for them to be a part of it. The key questions flow out of the recognition that explanations also serve as a means of social performance – and the intermediary stage between explanation and interpretation is a kind of dramaturgical process in which performance takes place.
Mark Riedl of Georgia Tech has a new paper out – produced under the aegis of DARPA's Explainable AI project – that should be of interest to many who read this blog.
We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had done the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of the autonomous agent into natural language. We evaluate our technique in the Frogger game environment. The natural language is collected from human players thinking out loud as they play the game. We motivate the use of rationalization as an approach to explanation generation, show the results of experiments on the accuracy of our rationalization technique, and describe future research agenda. ...
My first reaction to this was rather crude and uncharitable, and I leapt straight to judgement without realizing who the authors were (people that I had seen do good work previously) and what they were trying to do (solve a hard human-computer interaction problem). I have grown tired of many of the cliches of computer ethics and technology policy, and I have been perhaps primed to react to things like this uncharitably. That was not fair, especially because Riedl is an academic professional trying to solve a serious problem – as opposed to one of the many FUD entrepreneurs behind techno-panics. There is a lot of value in the paper and it also admits its own limitations:
In this paper we introduce a new approach to explainable AI: AI rationalization. AI rationalization is a process of producing an explanation for agent behavior as if a human had done the behavior. AI rationalization is based on the observation that humans do not generally understand how their own brains work and consequently do not give explanations that literally reveal how a decision was made. Rather it is more likely that humans invent plausible explanations on the spot. However, we accept human-generated rationalizations as providing some lay insight into the mind of the other.
AI rationalization has a number of theoretical benefits: (1) by communicating like humans, rationalizations are naturally accessible and intuitive to humans without AI or computer science training; (2) human-likeness of communication between autonomous system and human operator may afford higher degrees of trust, rapport, and willingness to use autonomous systems; (3) rationalization is fast, sacrificing absolute accuracy for real-time response, appropriate for real-time human-agent collaboration. Note that AI rationalizations, like human-produced rationalizations, are not required to be accurate, but only to give some insight into the agent’s decision-making process. Should deeper, more accurate explanations or interpretations be necessary, rationalizations may need to be supplemented by other explanation, interpretation, or visualization techniques.
In Riedl and co's experimental setup, players are recorded thinking out loud while playing and then prompted to identify post-hoc a mapping of important utterances to game actions. These annotations are then used as follows:
We then use these action-rationalization annotations to generate a grammar for generating sentences. The role of this grammar is to act as a representation of the reasoning that humans go through when they generate rationalizations or explanations for their actions. Thus, a grammar allows us to associate rules about the environment with the rationalizations that are generated. One benefit of using a grammar is that it allows us to explicitly evaluate how accurate our system is at producing the right kind of rationalizations. Since the grammar contains the rules that govern when certain rationalizations are generated, it allows us to compare automatically generated rationalizations against a ground-truth that one would not normally have if the entire corpus was crowdsourced in the wild.
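To make the idea concrete, here is a minimal sketch of what a grammar that maps state-action pairs to rationalization templates might look like. This is purely illustrative – the state names, actions, and templates are invented, and the paper's actual system uses neural machine translation over learned representations rather than a hand-written lookup. The sketch only shows why a grammar doubles as ground truth: since it defines which rationalizations are licensed for a given state-action pair, a generated sentence can be checked against it.

```python
import random

# Hypothetical, simplified grammar for a Frogger-like environment.
# Keys are (state, action) pairs; values are candidate rationalization
# templates. None of these names come from the paper itself.
GRAMMAR = {
    ("car_approaching", "wait"): [
        "I'm waiting for the {obstacle} to pass before I move.",
        "That {obstacle} is too close, so I'll hold still.",
    ],
    ("path_clear", "move_up"): [
        "The way ahead looks clear, so I'm moving up.",
        "Nothing is in my lane, so I'll hop forward.",
    ],
}

def rationalize(state, action):
    """Return a natural-language rationalization for a state-action pair."""
    templates = GRAMMAR.get((state, action))
    if templates is None:
        return "I'm not sure why I did that."
    return random.choice(templates).format(obstacle="car")

print(rationalize("car_approaching", "wait"))
print(rationalize("path_clear", "move_up"))
```

Because the grammar enumerates the acceptable rationalizations per state-action pair, evaluating a learned generator reduces to checking whether its output falls within (or close to) the licensed set – the "ground truth" the authors describe.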
As Riedl and his co-authors note, the fact that a rationalization may be inaccurate is a price to be paid for real-time response in everyday language that an operator or teammate can understand. And they outperform the accuracy of rationalizations generated by alternative baseline methods. So is this helpful? I have mixed feelings. Suppose we assume a scenario in which a machine is inserted into a difficult and high-risk professional domain such as military operations, emergency response, civil aviation, or the operation of a nuclear power plant. In this situation, it's questionable whether thinking out loud as a running stream of thought is useful either for training machines to work together with humans or for real-world cooperation with human teammates. In any kind of team situation, teammates do not want colleagues to think out loud as much as communicate the minimum information necessary for collaboration. This is why military and civilian comms operators have a truncated set of voice procedures that simplify and standardize communication. Team collaboration in general is often nested within explicit and implicit interpersonal and organizational structures that simplify joint action; this is why a group of soldiers storming a fortified building can rely on something as simple as arm and hand signals to hold together during a loud, violent, and chaotic event. For, say, a Marine infantry rifleman to trust a robot teammate in a collaborative patrol task, the robot teammate does not always need to be able to explain its behavior. Rather, the robo-Marine needs to be able to model the social-psychological habitus of the Marine rifleman as expressed through a particular language of voice commands, gestures, and particular movements associated with how Marines patrol. Additionally, to the rifleman, "thinking out loud" as rationalization might simply be interpreted as ass-covering bullshit on the machine's part.
The rifleman needs to have enough information from the machine to understand what it is doing in context and the rifleman may infer the rest non-verbally.
It should also be noted that in any kind of professional domain, understanding how professionals think and act involves grokking their ways of reflection-in-action and reflection-on-action – topics that the broader field of explainable AI seems to neglect. Reflection-in-action is reflecting in the "action present" on what you are doing while you are doing it. Reflection-in-action is often useful when the situation is unanticipated or unique; as Donald Schon says "[w]hen someone reflects-in-action, he becomes a researcher in the practice context...he is not dependent on the categories or established theory and technique, but constructs a new theory of the unique case." It might be more useful for an explainable AI system to be able to prioritize display of its reflection on an edge case than merely giving a point-by-point update of what it is doing. This might be useful in a scenario in which, [for example](https://arxiv.org/abs/1606.06565), an AI finds itself acting and planning in an environment very different from the one in which it was trained. If the machine needs to deliberate more than it otherwise would instead of relying on a pre-cached way of acting, it should alert its teammate that it is doing so and how it is doing so. Reflection-on-action is how people reframe their models of the world when thinking about what they did after the event, a topic I will touch on more substantially in the next section.
Riedl and co make an important distinction between 'explainable' and 'interpretable' AI.
Explanation differs from interpretability, which is a feature of an algorithm or representation that affords inspection for the purposes of understanding behavior or results. For example, decision trees are considered to be relatively interpretable and neural networks are generally considered to be uninterpretable without additional processes to visualize patterns of neuron activation (Zeiler & Fergus, 2014; Yosinski et al., 2015). Interpretability is a feature desirable for AI/ML system developers who can use it to debug and improve their systems. Explanation, on the other hand, is grounded in natural language communication and is theorized to be more useful for non-AI-experts who need to operate autonomous or semi-autonomous systems.
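The quoted distinction can be made concrete. A decision tree is considered interpretable because its decision procedure is its structure: the path taken through the tree is itself a human-readable account of why a prediction was made. The following toy sketch (invented thresholds and labels, not drawn from the paper or any real model) classifies a sensor reading and returns the rule path alongside the label – no separate explanation module is needed, unlike a neural network whose weights afford no such reading.

```python
def classify(temp, noise):
    """Return (label, rule_path) for a reading; the path *is* the explanation."""
    path = []
    if noise > 0.5:
        path.append("noise > 0.5")
        return "reject", path
    path.append("noise <= 0.5")
    if temp > 100:
        path.append("temp > 100")
        return "flag", path
    path.append("temp <= 100")
    return "accept", path

label, path = classify(temp=85, noise=0.2)
print(label, "because", " and ".join(path))
```

The point of the contrast: the tree's "explanation" falls out of inspection of the mechanism itself (interpretability), whereas a rationalization is a natural-language account generated alongside an opaque mechanism (explanation).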
Explanation, however, also serves a social function. And the social function of explanation is generating eventual interpretations. Legal cases do not solely hinge on the observable effects of someone's behavior but also on their beliefs, desires, and intentions. And while these can be inferred from sources other than the testimony of the particular suspect, they are also important as a factor in the overall outcome – which is a governmental interpretation intended to authoritatively cast judgement on the behavior being explained. In general I think that, with respect to Explainable AI, people underestimate the importance of reflection-on-action as a way in which humans understand the world and themselves. Take, for example, something that police, military, and fire personnel call the "hot wash" – an immediate after-action review in which the team collectively participates in a lessons-learned reflection on what occurred. For the team to learn and grow from the experience, all members of the team need to be able to participate. The hot wash helps produce a collective understanding of and meaning for the operation and each participant's role in it. The stated function of this is to help the team learn from experience. But it is also a way in which the cohesion of the team itself is preserved. Everyone's actions are interrogated and critiqued, and the result is a collective understanding of what occurred that then becomes the basis for an official formal After Action Report.
This, ideally, allows everyone to have a say and also to be held responsible before the "official history" of the event is produced. Having participated in a hot wash or two as a civilian participant in law enforcement simulation and gaming, the best way for me to describe it is as a mixture of couples' counseling and a detective investigation. It is, in some ways, about how individual members of a team understand each other and come to a new understanding through dialogue and negotiation. It is also a systematic review intended to create an authoritative understanding and explanation for an event. These two aspects of social interaction are probably inherently in tension with each other, though how much depends on the context of the particular hot wash. So we see many situations in which people's subjective explanation – which is a performance (a term that is ambiguous here and not meant to strictly refer to dishonesty as much as presenting oneself on a stage) nested inside a grander stage that includes other performers – becomes the basis for how a group of people eventually interpret the decision-making of individuals such that social outcomes are dispensed in some way. We have socially accepted rituals by which people's different performative explanation-rationalizations are themselves rationalized through interpretation and into authoritative interpretations.
Beyond the hot wash, also consider the more hypothetical problem of a complex joint task involving a human-machine team in an industrial domain that goes horribly awry. ACME Industries wants to make system operator John Doe, not Skynet 3000, the person to blame. If John is more to blame, ACME has less liability. ACME would like to minimize Skynet 3000's liability and maximize John Doe's liability. And vice versa. ACME would like to argue that the explanation-generation module in Skynet 3000 was sufficient for John to know what it was doing and why it did it. Explainable AI at most provides insight into what Skynet 3000 was doing sufficient for him to do his job while working with it. John willingly ignored this information, ACME's counsel says. John's lawyers would counter that the system – though not intended to be fully accurate as a depiction of the underlying machine decision-making process – was still too inaccurate for John to be able to understand its behavior. So what, then, does 'insight' mean here? Insight as in offering John a glimpse of how the infernal contraption works? Or insight in allowing a third party to cast judgement on whether or not John should have been able to minimally work with the machine such that a disaster could have been avoided?
The problem here is, in other words, not really one of how accurate the rationalizations of Skynet 3000's behavior are within the ACME Industries factory domain. It is also not whether or not John Doe was negligent in following Skynet 3000's stream of explanation-rationalizations. The problem is that at some point a court needs to make an interpretation of what occurred, and by necessity in a mixed-initiative environment John's ability to follow Skynet 3000's explanations erases the practical distinction between interpretability of the true decision-making mechanism and the explanation for it. Suppose that John had a human co-worker named Jane. No one really knows what the true decision-making mechanism behind Jane's behavior is. John, unless he has a strong reason to believe otherwise, has to take Jane's explanation as an interpretation of what she is doing. After all, peering into Jane's head to look at the neural networks underneath doesn't generate a lot of interpretability. Nor is it always possible to represent often tacit professional knowledge formally and objectively. On this note, not even symbolic expert systems really are interpretable, as Sebastian Benthall argues. Benthall rightly quotes a paper from Knowledge Management (KM) professionals about the failure of expert systems:
[W]e argue that this approach is flawed and some knowledge simply cannot be captured. A method is needed which recognises that knowledge resides in people: not in machines or documents. We will argue that KM is essentially about people and the earlier technology driven approaches, which failed to consider this, were bound to be limited in their success. One possible way forward is offered by Communities of Practice, which provide an environment for people to develop knowledge through interaction with others in an environment where knowledge is created nurtured and sustained. ...
Viewing knowledge as a duality can help to explain the failure of some KM initiatives. When the harder aspects are abstracted in isolation the representation is incomplete: the softer aspects of knowledge must also be taken into account. Hargadon (1998) gives the example of a server holding past projects, but developers do not look there for solutions. As they put it, ‘the important knowledge is all in people’s heads’, that is the solutions on the server only represent the harder aspects of the knowledge. For a complete picture, the softer aspects are also necessary. Similarly, the expert systems of the 1980s can be seen as failing because they concentrated solely on the harder aspects of knowledge. Ignoring the softer aspects meant the picture was incomplete and the system could not be moved from the environment in which it was developed. However, even knowledge that is ‘in people’s heads’ is not sufficient – the interactive aspect of Cook and Seely Brown’s (1999) ‘knowing’ must also be taken into account. This is one of the key aspects to the management of the softer side to knowledge.
Some kinds of professional know-how are formalizable and articulable in natural language. A significant portion is not. There is always an aspect of professional knowledge that is unlikely to ever be articulated, perhaps because it is essentially a-rational and embodied in a phenomenological coupling with the agent's social environment. Agents' explanations of their behavior are essentially performative and often shaped by expectations of what an audience needs to hear, something that Riedl and co's paper mimics by producing rationalizations that sound acceptable to human audiences. Human social structures and institutions act as mediating devices that make certain kinds of explanations – which we often have no choice but to treat as things to interpret – good and others bad.
My rage about the tech policy/ethics crowd's fearmongering stems simply from a refusal by tech ethics and tech policy analysts to take seriously the following, which I take as axiomatic. (1) How people and institutions choose to interpret and often evaluate behavior is one of the oldest and most divisive questions of human civilization. (2) Peering into the black box of sociotechnical artifacts is often an exercise in learning what we already 'know' about the environments they are embedded within. (3) Which is often that they are related to external things that we disagree about, which goes back to (1). And (4): no one knows what the true model of human decision-making is. However, we have a number of social, cultural, legal, and policy systems that exist such that we can find a means of transforming potentially conflicting or ambiguous explanations into interpretations that we are often willing to accept. Until, of course, we don't. That's when the trouble starts. When the system works, it is like a well-run theater production. It takes the various performative explanations of individual actors and knits them together into a common plot and narrative. As per Weberian rationalization, they replace idiosyncratic individual standards of how to grok action with abstract rules and calculations.
But people are not always satisfied with the show. And if they can become discontented about how human explanations become authoritative interpretations, they can also become much angrier about non-human explanations. The risk is that the mismatch between people's expectations of how certain performance-explanations should become interpretations might eventually cause discontent and unrest if we do not take either the performance-explanations or the dramaturgical process of how performance-explanations become interpretations seriously. It is good that we get this right before, say, an autonomous car accidentally crushes a little kid to death rather than after. Because that will happen at some point. Soon. A non-autonomous car killed a little girl in my old neighborhood before I left California for the East Coast. The legal system found a way of at least creating an interpretation that everyone was willing to accept regardless of their feelings about the driver's explanation of how the car killed the little girl. We are in uncharted territory in the autonomous/self-driving version of this scenario. Explainable AI today is part of the solution but it also needs greater attention to the social component of explanation, rationalization, and the linkage of both to eventual interpretations.
As Riedl and co note, perhaps the practical problem isn't really interpreting a machine's behavior (humans find it difficult to interpret even their own behavior objectively). But where I would urge caution is really in the entire business of what it means for a machine to generate a "good enough" social performance of what it is doing for human audiences. My concern about Explainable AI is that it is a method of social performance that humans will inevitably use as a standard of interpretability. This is unavoidable, because in a mixed-initiative environment the incentive for ACME-like entities is to make the problem about what John knew in action rather than what their engineers or managers knew prior to action or what can be known about Skynet 3000 post-action. Ultimately, institutions that use AI are going to have differing standards of what interpretation means, what kinds of explanations ultimately produce it, and how ambiguities and problems in explanations are resolved. Some institutions will need to change their ways to ethically delegate decision-making to AI. Others might find ways to incorporate Explainable AI into the narrative-generation and outcome-generation mechanisms they already have. If Riedl and co could find a way of modeling the social and perhaps dramaturgical aspect of how people and institutions in professional domains generate and use explanations – either as a standalone thing or as a model of how it becomes interpretation – they could probably make a landmark contribution not only to computer science but also to at least several other academic disciplines. And it would also be a very practical tool for policymakers who need help in dealing with the robot revolution.