\chapter{Related Work} \label{sect-related}
Our three chosen tools comprise two model-agnostic explainers, namely \emph{LIME} \cite{lime} and \emph{SHAP} \cite{NIPS2017_7062}, as well as \emph{Lucid} \cite{Https://github.com/tensorflow/lucid}, a collection of infrastructure and tools for research in neural network interpretability. These tools are discussed in greater detail in Chapter~\ref{sect-background}. In this chapter we discuss related research efforts which attempt to provide interpretability for Neural Networks and explain why they have not been chosen for further evaluation.
\section{Activation Maximization}
\emph{Activation Maximization (AM)} is the idea of generating input patterns that would maximize the activation of a given hidden unit in a Neural Network \cite{simonyan2014deep}\cite{10.1162/neco.2006.18.8.1868}\cite{articlec}. Suppose we had a Neural Network which maps an input vector $x$ to a set of classes $(w_{c})$. The output layer of the Neural Network is a set of neurons which encode this as class probabilities $p(w_{c}|x)$. We can generate a prototype $x^{*}$ of a class $w_{c}$ by optimizing,
\begin{equation}
\max_{x} \log p(w_{c}|x) - \lambda||x||^{2}
\label{eq:am}
\end{equation}
where the rightmost term is a regularizer which prefers inputs that are close to the origin. The probabilities produced by the Neural Network are functions which have gradients \cite{10.5555/525960}. Therefore we are able to optimize Equation~\ref{eq:am} using gradient ascent \cite{DBLP:journals/corr/Ruder16}. This allows us to visualize the features that individual neurons are looking for within the input. For example, given a network which attempts to discern whether an image is of a cat or a dog, AM should theoretically allow us to identify which neurons in the network look for features of a dog, such as floppy ears. However, for image-based networks the prototypes mostly look like gray images with patterns at key points \cite{simonyan2014deep}. The prototypes produced by this optimization are largely unnatural and therefore cannot be considered reliable. One of our chosen tools, Lucid, further expands on this concept for image-based networks, producing prototypes that are easily understood.
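Equation~\ref{eq:am} can be optimized with a few lines of gradient ascent. The sketch below is a minimal illustration in TensorFlow, assuming a \texttt{model} that outputs class probabilities; the step count, learning rate and regularization strength are illustrative choices rather than values from the cited work.
\begin{verbatim}
# Minimal activation-maximization sketch: gradient ascent on the input to
# maximize log p(w_c | x) minus an L2 regularizer. `model` is assumed to
# output class probabilities; all names and hyperparameters are illustrative.
import tensorflow as tf

def activation_maximization(model, class_index, input_shape,
                            steps=200, lr=0.1, lam=1e-4):
    x = tf.Variable(tf.random.normal([1, *input_shape]) * 0.01)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            prob = model(x)[0, class_index]
            # Objective: log-probability of the target class - lambda * ||x||^2
            objective = tf.math.log(prob + 1e-12) - lam * tf.reduce_sum(x ** 2)
        grad = tape.gradient(objective, x)
        x.assign_add(lr * grad)          # gradient *ascent* step
    return x.numpy()[0]                  # prototype x* for class w_c
\end{verbatim}
Without further regularization, the raw result of such an optimization is exactly the kind of unnatural, gray prototype described above.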
\section{Sensitivity Analysis}
\emph{Sensitivity Analysis} \cite{DBLP:journals/corr/MontavonSM17} is a technique used to identify the most important input features of a model \cite{Zhou2008}. We use the model's locally evaluated gradient to calculate the relevance score of each feature as,
\begin{equation}
R_{i}(x) = \frac{\partial f}{\partial x_{i}} \cdot x_{i}
\end{equation}
where $x$ is the input vector, $x_{i}$ is the feature at index $i$, and $f$ is the model. The relevance score can be interpreted as the product of sensitivity (given by the locally evaluated partial derivative) and saliency (given by the input value) \cite{DBLP:journals/corr/MontavonSM17}. A feature is therefore considered relevant if it is present in the input and the model's output is locally sensitive to it. This tells us that a feature is relevant in some way to the model, but it does not tell us exactly how it affects the prediction.
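As a concrete illustration, these relevance scores can be computed directly with automatic differentiation. The sketch below is a minimal gradient-times-input implementation in TensorFlow, assuming a differentiable \texttt{model} whose output is the scalar of interest (for example the probability of a single class).
\begin{verbatim}
# Gradient-times-input relevance scores R_i(x) = (df/dx_i) * x_i, computed
# with automatic differentiation. `model` and `x` are assumed; `model(x)`
# is taken to return a scalar (e.g. one class probability).
import tensorflow as tf

def relevance_scores(model, x):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        f = model(x)                  # locally evaluated model output
    grad = tape.gradient(f, x)        # sensitivity: df/dx_i
    return (grad * x).numpy()         # saliency-weighted relevance R_i
\end{verbatim}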
\section{Trepan}
\emph{Extracting Tree-Structured Representations of Trained Networks (Trepan)} \cite{Craven1995ExtractingTR} is an algorithm which attempts to extract comprehensible, symbolic representations from trained Neural Networks. This is done by inducing a decision tree \cite{articleb} which is interpretable and describes the concept represented by the network. The goal of the algorithm is to produce a decision tree which, given the same input as the Neural Network, produces the same results. The decisions made by the decision tree can then be seen as the same decisions that the Neural Network makes, and the interpretation of the decision tree can be seen as an approximate explanation of the network. The algorithm was developed in 1996 and there are few working examples in practice; popular tools have instead opted for linear models as approximators because they are easier to compute. Since modern Neural Networks have become considerably more complex, investigating how this algorithm performs on modern architectures would be an interesting future experiment.
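The core idea, a decision tree that mimics the network, can be sketched with a modern library. The snippet below simply relabels the training data with the network's predictions and fits a surrogate tree; it omits Trepan's membership queries and $m$-of-$n$ splits, and the \texttt{network} and \texttt{X\_train} objects are assumed to exist.
\begin{verbatim}
# Simplified sketch of the core idea behind Trepan: fit a decision tree to
# mimic a trained network's predictions. The full algorithm additionally
# draws new membership queries and uses m-of-n splits; this sketch only
# relabels the training data with the network's own outputs.
from sklearn.tree import DecisionTreeClassifier

def extract_surrogate_tree(network, X_train, max_depth=5):
    # Labels come from the network, not the ground truth, so the tree
    # approximates the *network's* decision function.
    y_network = network.predict(X_train).argmax(axis=1)
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X_train, y_network)
    return tree
\end{verbatim}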
\section{BETA}
\emph{Black Box Explanations through Transparent Approximations (BETA)} \cite{DBLP:journals/corr/LakkarajuKCL17} is a model-agnostic framework which aims to optimize both the fidelity to the original model and the interpretability of the explanation. BETA constructs a small number of compact decision sets which are inherently interpretable \cite{inproceedingsb}, with each set attempting to capture how the Neural Network behaves in a certain part of the feature space. The framework provides reasoning as to why a specific instance was assigned its label given its feature values. This is done by ensuring that the decision sets do not overlap within the feature space for which they provide their decision rules. The framework is guided by four properties,
\begin{description}
\item[Fidelity] The approximations should accurately represent the behaviour of the Neural Network in all parts of the decision space.
\item[Unambiguity] A single deterministic rationale is provided for the prediction of every instance.
\item[Interpretability] The approximations constructed should be able to be understood by humans.
\item[Interactivity] The user should be able to customize the approximations based on their preference e.g. adjusting the approximation for patients within a certain age range.
\end{description}
BETA has been tested on a real-world depression diagnosis dataset with few features, the majority of which are binary. It has not been tested on networks with complex architectures and many features. The code is proprietary and has not been made publicly available, therefore we have not considered BETA as a possible tool for comparison.
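For illustration only, a two-level decision set of the kind BETA constructs can be thought of as a list of non-overlapping subspace descriptors, each paired with an if-then decision rule. The feature names and thresholds below are invented and do not reflect BETA's actual output format.
\begin{verbatim}
# Hypothetical illustration of a two-level decision set: each entry pairs a
# subspace descriptor (non-overlapping across entries) with an if-then
# decision rule and the label it assigns. All names are invented.
decision_set = [
    # (subspace descriptor, decision rule,                              label)
    ("age < 40",  "if exercise == 'regular' and smoker == False", "not depressed"),
    ("age >= 40", "if sleep_hours < 5 or smoker == True",         "at risk"),
]
\end{verbatim}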
\section{Structured Causal Models}
Using the first principles of causality \cite{10.5555/1642718} \cite{DBLP:journals/corr/abs-1210-4852}, a new method of providing feature attributions for Neural Networks is introduced in the paper \emph{Neural Network Attributions: A Causal Perspective} \cite{DBLP:journals/corr/abs-1902-02302}. The approach involves viewing a Neural Network as a \emph{Structured Causal Model (SCM)} \cite{10.5555/1642718} and computing the Average Causal Effect (ACE) \cite{rubin1978} of an input neuron on a given output neuron. The standard principles of causality make this problem tractable by identifying input neurons which can be treated as jointly dependent on a latent cause, such as inputs generated by the same data-generating mechanism. The proposed methodology only works on specific types of networks, and the authors consider adapting it to more generic networks future work. A large drawback of this methodology is that it requires prior knowledge of the training dataset. It has not been adapted into a generic framework, so the code would need to be manually adapted for each new model. Adapting it into a generic framework and expanding it to more complex Neural Network architectures is a possible future experiment.
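The interventional flavour of this approach can be sketched as a Monte Carlo estimate of the Average Causal Effect: fix an input neuron to a value, average the output neuron over samples of the remaining inputs, and compare against a baseline intervention. The snippet below is an illustration of that idea rather than the exact procedure of the paper; \texttt{model} and \texttt{X\_samples} (draws from the input distribution) are assumed.
\begin{verbatim}
# Hedged Monte Carlo sketch of an interventional ACE estimate: intervene on
# input neuron i (do(x_i = alpha)), average the chosen output neuron over
# samples of the remaining inputs, and compare against a baseline value.
import numpy as np

def average_causal_effect(model, X_samples, i, alpha, baseline, out_idx):
    def interventional_mean(value):
        X_do = X_samples.copy()
        X_do[:, i] = value                     # intervention do(x_i = value)
        return model.predict(X_do)[:, out_idx].mean()
    return interventional_mean(alpha) - interventional_mean(baseline)
\end{verbatim}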
\section{Deep Visualization}
The \emph{Understanding Neural Networks Through Deep Visualization} \cite{DBLP:journals/corr/YosinskiCNFL15} paper introduced two novel tools which aim to visualize the inner workings of a Convolutional Neural Network (CNN). The first is a tool which visualizes the activations produced by each layer of a CNN, and the second visualizes the features learned at each layer. The downside is that these tools only work on networks trained with an outdated Deep Learning framework called Caffe \cite{jia2014caffe}, which had its last stable release in 2017. Lucid, which is built on the TensorFlow framework, provides various forms of visualization for image-based models and includes both of these concepts.
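The first of the two ideas, inspecting the activations produced by each layer, translates directly to modern frameworks. The sketch below shows one way to expose every intermediate layer's output in TensorFlow/Keras, assuming a trained functional or sequential \texttt{cnn} model and an \texttt{image} array.
\begin{verbatim}
# Sketch of per-layer activation inspection in TensorFlow/Keras rather than
# Caffe: build a probe model that returns every intermediate layer's output
# for a given input image. `cnn` and `image` are assumed to exist.
import tensorflow as tf

def layer_activations(cnn, image):
    probe = tf.keras.Model(inputs=cnn.input,
                           outputs=[layer.output for layer in cnn.layers])
    # One activation tensor per layer, ready to be plotted as feature maps.
    return probe(image[None, ...])
\end{verbatim}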
\section{DeepLIFT}
Deep Learning Important Features (DeepLIFT) is a tool introduced in \emph{Not Just a Black Box: Learning Important Features Through Propagating Activation Differences} \cite{DBLP:journals/corr/ShrikumarGSK16} and further expanded upon in \emph{Learning Important Features Through Propagating Activation Differences} \cite{DBLP:journals/corr/ShrikumarGK17}, which assigns importance scores to the input variables of a model. The importance scores are based on the difference between the model's state on the original input and its state on a \emph{reference} input, which is selected based on the specific problem. Each input is assigned a reference value which represents the absence of a signal, for example the absence of a specific feature. Extending this idea to Neural Networks, each individual neuron is assigned a reference value, namely the activation of that neuron given the reference input. The goal of DeepLIFT is therefore to explain the difference between the output of the model on its original input and its output on the reference input. SHAP incorporates the ideas of DeepLIFT.
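A crude approximation of this difference-from-reference idea is to weight the change in each input by the model's gradient, as sketched below. This is not DeepLIFT's actual propagation rules (Rescale or RevealCancel); it only illustrates the role of the reference input. \texttt{model}, \texttt{x} and \texttt{reference} are assumed.
\begin{verbatim}
# Crude difference-from-reference attribution in the spirit of DeepLIFT:
# score each input by gradient * (input - reference). An illustration of
# the idea only, not DeepLIFT's propagation rules.
import tensorflow as tf

def difference_from_reference(model, x, reference):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    reference = tf.convert_to_tensor(reference, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        out = model(x)
    grad = tape.gradient(out, x)
    # Attribution of the change in output relative to the reference input.
    return (grad * (x - reference)).numpy()
\end{verbatim}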
\section{Pixel-wise Decomposition}
In the paper \emph{On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation} \cite{Bach2015OnPropagation} the concept of pixel-wise decomposition is introduced, which aims to measure how pixels positively and negatively affect the prediction of an image-based model for a particular image. Two novel methods which provide pixel-wise decomposition are introduced in the paper, namely \emph{layer-wise relevance propagation (LRP)} and a technique based on Taylor decomposition \cite{DBLP:journals/corr/MontavonBBSM15} which yields an approximation of layer-wise relevance propagation. Lucid also provides explanations for individual pixels in an image, so for our use case pixel-wise decomposition is redundant.
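For a single dense layer, the $\epsilon$-variant of the LRP redistribution rule can be written in a few lines: the relevance arriving at each output unit is shared among the inputs in proportion to their contributions $a_{j}w_{jk}$. The sketch below uses NumPy; the shapes and the $\epsilon$ stabilizer are illustrative assumptions.
\begin{verbatim}
# Minimal sketch of the epsilon-variant LRP rule for one dense layer:
# redistribute the relevance R_out of the layer's outputs onto its inputs
# in proportion to each input's contribution a_j * w_jk.
import numpy as np

def lrp_dense_layer(a, W, R_out, eps=1e-6):
    # a: input activations (d_in,), W: weights (d_in, d_out), R_out: (d_out,)
    z = a @ W                                   # z_k = sum_j a_j * w_jk
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # epsilon stabilizer
    s = R_out / z                               # relevance per unit of z
    return a * (W @ s)                          # R_j = a_j * sum_k w_jk * s_k
\end{verbatim}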
\section{aLIME}
\emph{Anchor Local Interpretable Model-Agnostic Explanations (aLIME)} \cite{ribeiro2016nothing} is a system which aims to explain individual predictions with if-then rules in a model-agnostic manner. The rules provided are intuitive to humans and are usually easily understood. aLIME achieves this by providing rules which ``anchor'' a prediction, meaning that with high probability no other change to the instance will affect the prediction. Consider a model which predicts whether an adult's salary is higher or lower than \$50,000. An example of an anchor would be that the model always predicts the adult to earn more than \$50,000 if they have more than a high school education, regardless of other features. The aim is to provide the shortest anchor with the highest precision; solving this exactly is infeasible, so aLIME makes use of approximations. At the moment aLIME only supports explaining individual predictions for text classifiers or classifiers that act on tabular data. It is an open-source project, so it is possible for a future contributor to extend its support to other types of models.
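The precision of a candidate anchor can be estimated by perturbation sampling: draw perturbed instances, hold the anchor's features at their original values, and measure how often the model's prediction is unchanged. The sketch below is a toy version of such an estimate, not aLIME's actual implementation; \texttt{model} and \texttt{X\_background} (a pool of background instances to sample from) are assumed.
\begin{verbatim}
# Toy estimate of an anchor's precision: sample perturbations, force the
# anchor's features to keep their original values, and measure how often
# the prediction stays the same. Sampling scheme and names are illustrative.
import numpy as np

def anchor_precision(model, instance, anchor_idx, X_background, n_samples=1000):
    rng = np.random.default_rng(0)
    rows = X_background[rng.integers(len(X_background), size=n_samples)]
    rows[:, anchor_idx] = instance[anchor_idx]       # hold anchor features fixed
    original = model.predict(instance[None, :]).argmax(axis=1)[0]
    perturbed = model.predict(rows).argmax(axis=1)
    return float((perturbed == original).mean())     # precision of the anchor
\end{verbatim}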