
Paper deep dive

Compositional Explanations of Neurons

Jesse Mu, Jacob Andreas

Year: 2020 · Venue: NeurIPS 2020 · Area: Mechanistic Interp. · Type: Empirical · Embeddings: 70

Models: BiLSTM, ResNet-18

Intelligence

Status: succeeded | Model: google/gemini-3.1-flash-lite-preview | Prompt: intel-v1 | Confidence: 92%

Last extracted: 3/12/2026, 8:14:12 PM

Summary

The paper introduces a procedure for generating compositional explanations of neurons in deep neural networks by identifying logical combinations of primitive concepts. This approach improves upon atomic label-based interpretability methods, revealing that neurons in vision models often learn meaningful perceptual abstractions, while neurons in natural language inference (NLI) models frequently encode shallow, performance-degrading heuristics. The authors demonstrate that these compositional explanations can predict model accuracy and facilitate the creation of 'copy-paste' adversarial examples.

Entities (5)

Broden · dataset · 95%
Compositional Explanations of Neurons · methodology · 95%
ResNet-18 · model-architecture · 95%
SNLI · dataset · 95%
Polysemantic Neurons · concept · 90%

Relation Signals (3)

NLI Neurons encode Shallow Heuristics

confidence 95% · in natural language inference (NLI), neurons learn shallow lexical heuristics from dataset biases.

Compositional Explanations of Neurons applied to ResNet-18

confidence 90% · We take the final 512-unit convolutional layer of a ResNet-18 trained on the Places365 dataset

Compositional Explanations correlated with Model Performance

confidence 85% · vision neurons that detect human-interpretable concepts are positively correlated with task performance

Cypher Suggestions (2)

Find all models analyzed in the paper · confidence 90% · unvalidated

MATCH (e:Entity {entity_type: 'model-architecture'}) RETURN e.name

Identify relations between neurons and concepts · confidence 85% · unvalidated

MATCH (n:Neuron)-[r:DETECTS]->(c:Concept) RETURN n.name, r.type, c.name

Abstract

Abstract: We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts that closely approximate neuron behavior. Compared to prior work that uses atomic labels as explanations, analyzing neurons compositionally allows us to more precisely and expressively characterize their behavior. We use this procedure to answer several questions on interpretability in models for vision and natural language processing. First, we examine the kinds of abstractions learned by neurons. In image classification, we find that many neurons learn highly abstract but semantically coherent visual concepts, while other polysemantic neurons detect multiple unrelated features; in natural language inference (NLI), neurons learn shallow lexical heuristics from dataset biases. Second, we see whether compositional explanations give us insight into model performance: vision neurons that detect human-interpretable concepts are positively correlated with task performance, while NLI neurons that fire for shallow heuristics are negatively correlated with task performance. Finally, we show how compositional explanations provide an accessible way for end users to produce simple "copy-paste" adversarial examples that change model behavior in predictable ways.

Tags

ai-safety (imported, 100%) · empirical (suggested, 88%) · mechanistic-interp (suggested, 92%)

Links

Open PDF directly →

Full Text

70,104 characters extracted from source content.


Compositional Explanations of Neurons

Jesse Mu, Stanford University (muj@stanford.edu) · Jacob Andreas, MIT CSAIL (jda@mit.edu)

Abstract: We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts that closely approximate neuron behavior. Compared to prior work that uses atomic labels as explanations, analyzing neurons compositionally allows us to more precisely and expressively characterize their behavior. We use this procedure to answer several questions on interpretability in models for vision and natural language processing. First, we examine the kinds of abstractions learned by neurons. In image classification, we find that many neurons learn highly abstract but semantically coherent visual concepts, while other polysemantic neurons detect multiple unrelated features; in natural language inference (NLI), neurons learn shallow lexical heuristics from dataset biases. Second, we see whether compositional explanations give us insight into model performance: vision neurons that detect human-interpretable concepts are positively correlated with task performance, while NLI neurons that fire for shallow heuristics are negatively correlated with task performance. Finally, we show how compositional explanations provide an accessible way for end users to produce simple "copy-paste" adversarial examples that change model behavior in predictable ways.

1 Introduction

In this paper, we describe a procedure for automatically explaining logical and perceptual abstractions encoded by individual neurons in deep networks. Prior work in neural network interpretability has found that neurons in models trained for a variety of tasks learn human-interpretable concepts, e.g. faces or parts-of-speech, often without explicit supervision [5,10,11,27]. Yet many existing interpretability methods are limited to ad-hoc explanations based on manual inspection of model visualizations or inputs [10,26,27,35,38,39].
To instead automate explanation generation, recent work [5,11] has proposed to use labeled "probing datasets" to explain neurons by identifying concepts (e.g. dog or verb) closely aligned with neuron behavior. However, the atomic concepts available in probing datasets may be overly simplistic explanations of neurons. A neuron might robustly respond to images of dogs without being exclusively specialized for dog detection; indeed, some have noted the presence of polysemantic neurons in vision models that detect multiple concepts [12,27]. The extent to which these neurons have learned meaningful perceptual abstractions (versus detecting unrelated concepts) remains an open question. More generally, neurons may be more accurately characterized not just as simple detectors, but rather as operationalizing complex decision rules composed of multiple concepts (e.g. dog faces, cat bodies, and car windows). Existing tools are unable to surface such compositional concepts automatically. We propose to generate explanations by searching for logical forms defined by a set of composition operators over primitive concepts (Figure 1). Compared to previous work [5], these explanations serve as better approximations of neuron behavior, and identify behaviors that help us answer a variety of interpretability questions across vision and natural language processing (NLP) models. First, what kind of logical concepts are learned by deep models in vision and NLP? Second, do the quality and interpretability of these learned concepts relate to model performance? Third, can we use the logical concepts encoded by neurons to control model behavior in predictable ways?

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada. arXiv:2006.14032v2 [cs.LG] 2 Feb 2021

Figure 1: Given a set of inputs (a) and scalar neuron activations (b) converted into binary masks (c), we generate an explanation via beam search, starting with an inventory of primitive concepts (d), then incrementally building up more complex logical forms (e). We attempt to maximize the IoU score of an explanation (f); depicted is the IoU of M_483(x) and (water OR river) AND NOT blue.

We find that:

1. Neurons learn compositional concepts: in image classification, we identify neurons that learn meaningful perceptual abstractions (e.g. tall structures) and others that fire for unrelated concepts. In natural language inference (NLI), we show that shallow heuristics (based on e.g. gender and lexical overlap) are not only learned, but reified in individual neurons.

2. Compositional explanations help predict model accuracy, but interpretability is not always associated with accurate classification: in image classification, human-interpretable abstractions are correlated with model performance, but in NLI, neurons that reflect shallower heuristics are anticorrelated with performance.

3. Compositional explanations allow users to predictably manipulate model behavior: we can generate crude "copy-paste" adversarial examples based on inserting words and image patches to target individual neurons, in contrast to black-box approaches [1, 36, 37].
2 Generating compositional explanations

Consider a neural network model f that maps inputs x to vector representations r ∈ R^d. f might be a prefix of a convolutional network trained for image classification or a sentence embedding model trained for a language processing task. Now consider an individual neuron f_n(x) ∈ R and its activation on a set of concrete inputs (e.g. ResNet-18 [15] layer 4 unit 483; Figure 1a–b). How might we explain this neuron's behavior in human-understandable terms?

The intuition underlying our approach is shared with the NetDissect procedure of Bau et al. [5]; here we describe a generalized version. The core of this intuition is that a good explanation is a description (e.g. a named category or property) that identifies the same inputs for which f_n activates. Formally, assume we have a space of pre-defined atomic concepts C ∈ C, where each concept is a function C : x ↦ {0, 1} indicating whether x is an instance of C. For image pixels, concepts are image segmentation masks; for the water concept, C(x) is 1 when x is an image region containing water (Figure 1d). Given some measure δ of the similarity between neuron activations and concepts, NetDissect explains the neuron f_n by searching for the concept C that is most similar:

EXPLAIN-NETDISSECT(n) = argmax_{C ∈ C} δ(n, C).   (1)

While δ can be arbitrary, Bau et al. [5] first threshold the continuous neuron activations f_n(x) into binary masks M_n(x) ∈ {0, 1} (Figure 1c). This can be done a priori (e.g. for post-ReLU activations, thresholding above 0), or by dynamically thresholding above a neuron-specific percentile. We can then compare binary neuron masks and concepts with the Intersection over Union score (IoU, or Jaccard similarity; Figure 1f):

δ(n, C) ≜ IoU(n, C) = [ Σ_x 1(M_n(x) ∧ C(x)) ] / [ Σ_x 1(M_n(x) ∨ C(x)) ].   (2)

Compositional search. The procedure described in Equation 1 can only produce explanations from the fixed, pre-defined concept inventory C.
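The thresholding and IoU scoring of Equations 1–2 reduce to a few lines of array code. Below is a minimal NumPy sketch; the toy data and the helper names `neuron_mask` and `iou` are ours, not the authors' implementation:

```python
import numpy as np

def neuron_mask(activations, percentile=99.5):
    """Threshold continuous activations into a binary mask M_n(x).
    The paper's dynamic threshold keeps the top 0.5% of activations."""
    t = np.percentile(activations, percentile)
    return activations > t

def iou(mask, concept):
    """Equation 2: Jaccard similarity between a neuron mask and a concept mask."""
    inter = np.logical_and(mask, concept).sum()
    union = np.logical_or(mask, concept).sum()
    return inter / union if union > 0 else 0.0

# toy example: one neuron's activations over 10 "pixels" and a water concept mask
acts = np.array([0.1, 0.2, 9.0, 8.5, 0.0, 0.3, 7.9, 0.1, 0.2, 0.0])
water = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0], dtype=bool)
m = neuron_mask(acts, percentile=70)  # looser threshold for the tiny toy array
print(round(iou(m, water), 2))  # → 0.75
```

Equation 1 is then just an `argmax` of this score over the concept inventory.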
Our main contribution is to combinatorially expand the set of possible explanations to include logical forms L(C) defined inductively over C via composition operations such as disjunction (OR), conjunction (AND), and negation (NOT), e.g. (L1 AND L2)(x) = L1(x) ∧ L2(x) (Figure 1e). Formally, if Ω_η is the set of η-ary composition functions, define L(C):

1. Every primitive concept is a logical form: for all C ∈ C, we have C ∈ L(C).
2. Any composition of logical forms is a logical form: for all η, ω ∈ Ω_η, and (L_1, ..., L_η) ∈ L(C)^η, where L(C)^η is the set of η-tuples of logical forms in L(C), we have ω(L_1, ..., L_η) ∈ L(C).

Now we search for the best logical form L ∈ L(C):

EXPLAIN-COMP(n) = argmax_{L ∈ L(C)} IoU(n, L).   (3)

The argmax in Equation 3 ranges over a structured space of compositional expressions, and has the form of an inductive program synthesis problem [23]. Since we cannot exhaustively search L(C), in practice we limit ourselves to formulas of maximum length N, by iteratively constructing formulas from primitives via beam search with beam size B = 10. At each step of beam search, we take the formulas already present in our beam, compose them with new primitives, measure IoU of these new formulas, and keep the top B new formulas by IoU, as shown in Figure 1e.

3 Tasks

The procedure we have described above is model- and task-agnostic. We apply it to two tasks in vision and NLP: first, we investigate a scene recognition task explored by the original NetDissect work [5], which allows us to examine compositionality in a task where neuron behavior is known to be reasonably well-characterized by atomic labels. Second, we examine natural language inference (NLI): an example of a seemingly challenging NLP task which has recently come under scrutiny due to models' reliance on shallow heuristics and dataset biases [13,14,22,25,30,37]. We aim to see whether compositional explanations can uncover such undesirable behaviors in NLI models.
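The beam search behind Equation 3 (Section 2) can be sketched compactly if concepts are represented as boolean vectors over the probing dataset. Everything here — the toy data, the formula strings, and the restriction to AND / OR / AND NOT — is an illustrative simplification, not the authors' code:

```python
import numpy as np
from itertools import product

def iou(a, b):
    """Jaccard similarity between two boolean vectors."""
    u = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / u if u else 0.0

def beam_search(neuron, concepts, max_len=3, beam_size=10):
    """Search for the logical form over primitive concept masks that best
    matches the neuron mask by IoU. `concepts` maps names to boolean vectors."""
    # beam entries: (iou, formula_string, boolean vector)
    beam = [(iou(neuron, v), name, v) for name, v in concepts.items()]
    beam.sort(key=lambda t: -t[0])
    beam = beam[:beam_size]
    for _ in range(max_len - 1):
        candidates = list(beam)
        for (_, f, v), (name, c) in product(beam, concepts.items()):
            for op, g in ((" AND ", v & c), (" OR ", v | c),
                          (" AND NOT ", v & ~c)):
                candidates.append((iou(neuron, g), f"({f}{op}{name})", g))
        candidates.sort(key=lambda t: -t[0])  # stable: earlier ties win
        beam = candidates[:beam_size]
    return beam[0][:2]  # best (IoU, formula)

# toy probing data: the neuron fires for water-or-river regions that are not blue
water = np.array([1, 1, 0, 0, 1, 0, 0, 1], dtype=bool)
river = np.array([0, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
blue  = np.array([0, 1, 0, 1, 0, 0, 0, 0], dtype=bool)
neuron = (water | river) & ~blue
score, formula = beam_search(neuron, {"water": water, "river": river, "blue": blue})
print(formula, round(score, 2))  # → ((water OR river) AND NOT blue) 1.0
```

With max length 3 and this tiny inventory, the search recovers the exact specialization formula; the paper's runs use N up to 10 (vision) or 30 (NLI) with B = 10 over the full Broden/SNLI concept inventories.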
Figure 2: Example concepts from the Broden dataset [5], reproduced with permission: street (scene), pink (color), flower (object), headboard (part).

Image Classification. NetDissect [5] examines whether a convolutional neural network trained on a scene recognition task has learned detectors that correspond to meaningful abstractions of objects. We take the final 512-unit convolutional layer of a ResNet-18 [15] trained on the Places365 dataset [40], probing for concepts in the ADE20k scenes dataset [41] with atomic concepts C defined by annotations in the Broden dataset [5]. There are 1105 unique concepts in ADE20k, categorized by Scene, Object, Part, and Color (see Figure 2 for examples). Broden has pixel-level annotations, so for each input image X ∈ R^{H×W}, inputs are indexed by pixels (i, j): x_{i,j} ∈ X. Let f_n(x_{i,j}) be the activation of the nth neuron at position (i, j) of the image X, after the neuron's activation map has been bilinearly upsampled from layer dimensions H_l × W_l to the segmentation mask dimensions H × W. Following [5], we create neuron masks M_n(x) via dynamic thresholding: let T_n be the threshold such that P(f_n(x) > T_n) = 0.005 over all inputs x ∈ X. Then M_n(x) = 1(f_n(x) > T_n). For composition, we use operations AND (∧), OR (∨), and NOT (¬), leaving more complex operations (e.g. relations like ABOVE and BELOW) for future work.

NLI. Given premise and hypothesis sentences, the task of NLI is to determine whether the premise entails the hypothesis, contradicts it, or neither (neutral). We investigate a BiLSTM baseline architecture proposed by [7]. A bidirectional RNN encodes both the premise and hypothesis to form 512-d representations. Both representations, and their elementwise product and difference, are then concatenated to form a 2048-d representation that is fed through a multilayer perceptron (MLP) with two 1024-d layers with ReLU nonlinearities and a final softmax layer.
This model is trained on the Stanford Natural Language Inference (SNLI) corpus [6], which consists of 570K sentence pairs. Neuron-level explanations of NLP models have traditionally analyzed how RNN hidden states detect word-level features as the model passes over the input sequence [4,10], but in most NLI models, these RNN features are learned early and are often quite distant from the final sentence representation used for prediction. Instead, we analyze the MLP component, probing the 1024 neurons of the penultimate hidden layer for sentence-level explanations, so our inputs x are premise-hypothesis pairs. We use the SNLI validation set as our probing dataset (10K examples). As our features, we take the Penn Treebank part of speech tags (labeled by SpaCy, https://spacy.io/) and the 2000 most common words appearing in the dataset. For each of these we create 2 concepts that indicate whether the word or part-of-speech appears in the premise or hypothesis. Additionally, to detect whether models are using lexical overlap heuristics [25], we define 4 concepts indicating that the premise and hypothesis have more than 0%, 25%, 50%, or 75% overlap, as measured by IoU between the unique words.

Figure 3: Distribution of IoU versus max formula length for NLI and Vision. The line indicates mean IoU. N = 1 is equivalent to NetDissect [5]; IoU scores steadily increase as max formula length increases.

Figure 4: Unit 106: bullring OR pitch OR volleyball court OR batters box OR baseball stadium OR baseball field OR tennis court OR badminton court AND (NOT football field) AND (NOT railing); IoU 0.05 → 0.12 → 0.17. NetDissect [5] assigns unit 106 the label bullring, but in reality it detects general sports fields, except football fields, as revealed by the length 3 and length 10 explanations.
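The lexical-overlap concepts follow directly from that definition. A short sketch, with hypothetical helper names of our choosing (the paper's code may differ, e.g. in tokenization):

```python
def word_overlap(premise, hypothesis):
    """IoU between the unique word sets of premise and hypothesis."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(p | h)

def overlap_concepts(premise, hypothesis):
    """The 4 overlap concepts: each fires if overlap exceeds its threshold."""
    o = word_overlap(premise, hypothesis)
    return {f"overlap-{int(t * 100)}%": o > t for t in (0.0, 0.25, 0.5, 0.75)}

# all four concepts fire here: word-set IoU is 4/5 = 0.8
print(overlap_concepts("a woman rides a bike", "a woman rides a red bike"))
```

Each concept then becomes one boolean feature per premise-hypothesis pair, alongside the word and part-of-speech indicator concepts.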
For our composition operators, we keep AND, OR, and NOT; in addition, to capture the idea that neurons might fire for groups of words with similar meanings, we introduce the unary NEIGHBORS operator. Given a word feature C, let the neighborhood N(C) be the set of 5 closest words C′ to C, as measured by their cosine distance in GloVe embedding space [28]. Then, NEIGHBORS(C)(x) = ∨_{C′ ∈ N(C)} C′(x) (i.e. the logical OR across all neighbors). Finally, since these are post-ReLU activations, instead of dynamically thresholding we simply define our neuron masks M_n(x) = 1(f_n(x) > 0). There are many "dead" neurons in the model, and some neurons fire more often than others; we limit our analysis to neurons that activate reliably across the dataset, defined as being active at least 500 times (5%) across the 10K examples probed.

4 Do neurons learn compositional concepts?

Image Classification. Figure 3 (left) plots the distribution of IoU scores for the best concepts found for each neuron as we increase the maximum formula length N. When N = 1, we get EXPLAIN-NETDISSECT, with a mean IoU of 0.059; as N increases, IoU increases up to 0.099 at N = 10, a statistically significant 68% increase (p = 2×10^−9). We see diminishing returns after length 10, so we conduct the rest of our analysis with length 10 logical forms. The increased explanation quality suggests that our compositional explanations indeed detect behavior beyond simple atomic labels: Figure 4 shows an example of a bullring detector which is actually revealed to detect fields in general. We can now answer our first question from the introduction: are neurons learning meaningful abstractions, or firing for unrelated concepts? Both happen: we manually inspected a random sample of 128 neurons in the network and their length 10 explanations, and found that 69% learned some meaningful combination of concepts, while 31% were polysemantic, firing for at least some unrelated concepts.
The 88 "meaningful" neurons fell into 3 categories (examples in Figure 5; more in Appendix C; Appendix A.1 reports concept uniqueness and granularity across formula lengths):

1. 50 (57%) learn a perceptual abstraction that is also lexically coherent, in that the primitive words in the explanation are semantically related (e.g. to towers or bathrooms; Figure 5a).
2. 28 (32%) learn a perceptual abstraction that is not lexically coherent, as the primitives are not obviously semantically related. For example, cradle OR autobus OR fire escape is a vertical rails detector, but we have no annotations of vertical rails in Broden (Figure 5b).
3. 10 (12%) have the form L1 AND NOT L2, which we call specialization. They detect more specific variants of Broden concepts (e.g. (water OR river) AND NOT blue; Figure 5c).

Figure 5: Image classification explanations categorized by semantically coherent abstraction (a–b) and specialization (c), and unrelated polysemanticity (d). For clarity, logical forms are length N = 3. (a) abstraction (lexical and perceptual): Unit 192, skyscraper OR lighthouse OR water tower, IoU 0.06; Unit 310, sink OR bathtub OR toilet, IoU 0.16. (b) abstraction (perceptual only): Unit 102, cradle OR autobus OR fire escape, IoU 0.12; Unit 321, ball pit OR orchard OR bounce game, IoU 0.12. (c) specialization: Unit 483, (water OR river) AND NOT blue, IoU 0.13; Unit 432, attic AND (NOT floor) AND (NOT bed), IoU 0.15. (d) polysemanticity: Unit 439, bakery OR bank vault OR shopfront, IoU 0.08; Unit 314, operating room OR castle OR bathroom, IoU 0.05.

Figure 6: NLI length 5 explanations. For each neuron, we show the explanation (e.g. pre:x indicates x appears in the premise), IoU, class weights w_entail, w_neutral, w_contra, and activations for 2 examples.
- Unit 870 (gender-sensitive): ((((NOT hyp:man) AND pre:man) OR hyp:eating) AND (NOT pre:woman)) OR hyp:dancing; IoU 0.123; w_entail −0.046, w_neutral −0.021, w_contra 0.040.
- Unit 15 (sitting only in hypothesis): hyp:eating OR hyp:sitting OR hyp:sleeping OR hyp:sits AND (NOT pre:sits); IoU 0.239; w_entail −0.083, w_neutral −0.059, w_contra 0.086.
- Unit 99 (high overlap): ((NOT hyp:J) AND overlap-75% AND (NOT pre:people)) OR pre:basket OR pre:tv; IoU 0.118; w_entail 0.043, w_neutral −0.029, w_contra −0.021.
- Unit 473 (unclear): ((NOT hyp:sleeping) AND (pre:NN OR pre:NNS)) AND (NOT hyp:alone) AND (NOT hyp:nobody); IoU 0.586; w_entail 0.020, w_neutral 0.016, w_contra −0.050.

The observation that IoU scores do not increase substantially past length 10 corroborates the finding of [12], who also note that few neurons detect more than 10 unique concepts in a model. Our procedure, however, allows us to more precisely characterize whether these neurons detect abstractions or unrelated disjunctions of concepts, and identify more interesting cases of behavior (e.g. specialization).
While composition of Broden annotations explains a majority of the abstractions learned, there is still considerable unexplained behavior. The remaining behavior could be due to noisy activations, neuron misclassifications, or detection of concepts absent from Broden.

NLI. NLI IoU scores reveal a similar trend (Figure 3, right): as we increase the maximum formula length, we account for more behavior, though scores continue increasing past length 30. However, short explanations are already useful: Figure 6, Figure 9 (explained later), and Appendix D show example length 5 explanations, and Appendix A.2 reports on the uniqueness of these concepts across formula lengths. Many neurons correspond to simple decision rules based mostly on lexical features: for example, several neurons are gender sensitive (Unit 870), and activate for contradiction when the premise, but not the hypothesis, contains the word man. Others fire for verbs that are often associated with a specific label, such as sitting, eating, or sleeping. Many of these words have high pointwise mutual information (PMI) with the class prediction; as noted by [14], the top two highest words by PMI with contradiction are sleeping (Unit 15) and nobody (Unit 39, Figure 9). Still others (Unit 99) fire when there is high lexical overlap between premise and hypothesis, another heuristic in the literature [25]. Finally, there are neurons that are not well explained by this feature set (Unit 473). In general, we have found that many of the simple heuristics [14,25] that make NLI models brittle to out-of-distribution data [13,22,37] are actually reified as individual features in deep representations.

5 Do interpretable neurons contribute to model accuracy?
Figure 7: Top: neuron IoU versus model accuracy over inputs where the neuron is active, for vision (length 10) and NLI (length 3). Bottom: Pearson correlation between these quantities versus max formula length.

A natural question to ask is whether it is empirically desirable to have more (or less) interpretable neurons, with respect to the kinds of concepts identified above. To answer this, we measure the performance of the entire model on the task of interest when the neuron is activated. In other words, for neuron n, what is the model accuracy on predictions for inputs where M_n(x) = 1?

In image classification, we find that the more interpretable the neuron (by IoU), the more accurate the model is when the neuron is active (Figure 7, left; r = 0.31, p < 1e−13); the correlation increases as the formula length increases and we are better able to explain neuron behavior. Given that we are measuring abstractions over the human-annotated features deemed relevant for scene classification, this suggests, perhaps unsurprisingly, that neurons that detect more interpretable concepts are more accurate. However, when we apply the same analysis to the NLI model, the opposite trend occurs: neurons that we are better able to explain are less accurate (Figure 7, right; r = −0.60, p < 1e−08). Unlike vision, most sentence-level logical descriptions recoverable by our approach are spurious by definition, as they are too simple compared to the true reasoning required for NLI. If a neuron can be accurately summarized by simple deterministic rules, this suggests the neuron is making decisions based on spurious correlations, which is reflected by the lower performance.
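The accuracy-when-firing analysis can be sketched in a few lines. The masks, correctness vector, and IoU values below are invented toy data, and the helper names are ours:

```python
import numpy as np

def accuracy_when_firing(masks, correct):
    """Per-neuron model accuracy restricted to inputs where M_n(x) = 1.
    masks: (n_neurons, n_inputs) boolean; correct: (n_inputs,) boolean."""
    return np.array([correct[m].mean() for m in masks])

def pearson_r(x, y):
    """Pearson correlation, e.g. between explanation IoU and accuracy-when-firing."""
    x, y = x - x.mean(), y - y.mean()
    return (x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum())

# toy data: 3 neurons' binary masks over 6 inputs, and per-input correctness
masks = np.array([[1, 1, 0, 0, 1, 0],
                  [1, 0, 1, 1, 0, 1],
                  [0, 0, 1, 1, 0, 1]], dtype=bool)
correct = np.array([1, 0, 1, 1, 0, 1], dtype=bool)
ious = np.array([0.05, 0.12, 0.20])  # hypothetical explanation IoUs
acc = accuracy_when_firing(masks, correct)
print(round(pearson_r(ious, acc), 2))  # → 0.85
```

A positive r corresponds to the vision trend in Figure 7; the NLI trend would show up as a negative r under the same computation.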
Analogously, the more restricted our feature set (by maximum formula length), the better we capture this anticorrelation. One important takeaway is that the "interpretability" of these explanations is not a priori correlated with performance, but rather dependent on the concepts we are searching for: given the right concept space, our method can identify behaviors that may be correlated or anticorrelated with task performance.

6 Can we target explanations to change model behavior?

Finally, we see whether compositional explanations allow us to manipulate model behavior. In both models, we have probed the final hidden representation before a final softmax layer produces the class predictions. Thus, we can measure a neuron's contribution to a specific class with the weight between the neuron and the class, and see whether constructing examples that activate (or inhibit) these neurons leads to corresponding changes in predictions. We call these "copy-paste" adversarial examples to differentiate them from standard adversarial examples involving imperceptible perturbations [36].

Image Classification. Figure 8 shows some Places365 classes along with the neurons that most contribute to the class as measured by the connection weight.
Figure 8: "Copy-paste" adversarial examples for vision. For each scene (with 3 example images at bottom), the neurons that contribute most (by connection weight) are shown, along with their length 3 explanations. We target the bold explanations to crudely modify an input image and change the prediction towards/away from the scene. In the top-right corner, the left-most image is presented to the model (with predictions from 4 models shown); we modify the image to the right-most image, which changes the model prediction(s). The four scenes and their top contributing explanations are: swimming hole ((water OR river) AND (NOT blue); forest-broad OR waterfall OR forest-needle; creek OR waterfall OR desert-sand), clean room (pool table OR machine OR bank vault; martial arts gym OR ice OR fountain; batters box OR martial arts gym OR clean room), fire escape (fire escape OR bridge OR staircase; house OR porch OR townhouse; cradle OR autobus OR fire escape), and viaduct (aqueduct OR viaduct OR cloister-indoor; bridge OR viaduct OR aqueduct; washer OR laundromat OR viaduct); predictions are compared across ResNet18, AlexNet, ResNet50, and DenseNet161.

Figure 9: "Copy-paste" adversarial examples for NLI. Taking an example from SNLI, we construct an adversarial (adv) premise or hypothesis which changes the true label and results in an incorrect model prediction (original label/prediction → adversarial label/prediction).
- Unit 39 (nobody in hypothesis): hyp:nobody AND (NOT pre:hair) AND (NOT pre:RB) AND (NOT pre:'s); IoU 0.465; w_entail −0.117, w_neutral −0.053, w_contra 0.047. Pre: "Three women prepare a meal in a kitchen." Orig Hyp: "The ladies are cooking." Adv Hyp: "Nobody but the ladies are cooking." True entail → neutral; Pred entail → contra.
- Unit 133 (couch words in hypothesis): NEIGHBORS(hyp:couch) OR hyp:inside OR hyp:home OR hyp:indoors OR hyp:eating; IoU 0.202; w_entail −0.125, w_neutral −0.024, w_contra 0.088. Pre: "5 women sit around a table doing some crafts." Orig Hyp: "5 women sit around a table." Adv Hyp: "5 women sit around a table near a couch." True entail → neutral; Pred entail → contra.
- Unit 15 (sitting only in hypothesis): hyp:eating OR hyp:sitting OR hyp:sleeping OR hyp:sits AND (NOT pre:sits); IoU 0.239; w_entail −0.083, w_neutral −0.059, w_contra 0.086. Orig Pre: "A blond woman is holding 2 golf balls while reaching down into a golf hole." Adv Pre: "A blond woman is holding 2 golf balls." Hyp: "A blond woman is sitting down." True contra → neutral; Pred contra → contra.
- Unit 941 (inside/indoors in hypothesis): hyp:inside OR hyp:not OR hyp:indoors OR hyp:moving OR hyp:something; IoU 0.151; w_entail 0.086, w_neutral −0.030, w_contra −0.023. Orig Pre: "Two people are sitting in a station." Adv Pre: "Two people are sitting in a pool." Hyp: "A couple of people are inside and not standing." True entail → neutral; Pred entail → entail.

In many cases, these connections are sensible; water, foliage, and rivers contribute to a swimming hole prediction; houses, staircases, and fire escape (objects) contribute to fire escape (scene). However, the explanations in bold involve polysemanticity or spurious correlations. In these cases, we found it is possible to construct a "copy-paste" example which uses the neuron explanation to predictably alter the prediction.² In some cases, these adversarial examples are generalizable across networks besides the probed ResNet-18, causing the same behavior across AlexNet [24], ResNet-50 [15], and DenseNet-161 [21], all trained on Places365.
For example, one major contributor to the swimming hole scene (top-left) is a neuron that fires for non-blue water; making the water blue switches the prediction to grotto in many models. The consistency of this misclassification suggests that models are detecting underlying biases in the training data. Other examples include a neuron contributing to clean room that also detects ice and igloos; putting an igloo in a corridor causes a prediction to shift from corridor to clean room, though this does not occur across models, suggesting that this is an artifact specific to this model.

² Appendix B tests sensitivity of these examples to size and position of the copy-pasted subimages.

NLI. In NLI, we are able to trigger similar behavior by targeting spurious neurons (Figure 9). Unit 39 (top-left) detects the presence of nobody in the hypothesis as being highly indicative of contradiction. When creating an adversarial example by adding nobody to the original hypothesis, the true label shifts from entailment to neutral, but the model predicts contradiction. Other neurons predict contradiction when couch-related words (Unit 133) or sitting (Unit 15) appear in the hypothesis, and can be similarly targeted. Overall, these examples are reminiscent of the image-patch attacks of [9], adversarial NLI inputs [1,37], and the data collection process for recent counterfactual NLI datasets [13,22], but instead of searching among neuron visualizations or using black-box optimization to synthesize examples, we use explanations as a transparent guide for crafting perturbations by hand.

7 Related Work

Interpretability. Interpretability in deep neural networks has received considerable attention over the past few years. Our work extends existing work on generating explanations for individual neurons in deep representations [4,5,10–12,27], in contrast to analysis or probing methods that operate at the level of entire representations (e.g. [2,19,29]).
Neuron-level explanations are fundamentally limited, since they cannot detect concepts distributed across multiple neurons, but this has advantages: first, neuron-aligned concepts offer evidence for representations that are disentangled with respect to concepts of interest; second, they inspect unmodified "surface-level" neuron behavior, avoiding recent debates on how complex representation-level probing methods should be [18, 29].

Complex explanations. In generating logical explanations of model behavior, one related work is the Anchors procedure of [33], which finds conjunctions of features that "anchor" a model's prediction in some local neighborhood in input space. Unlike Anchors, we do not explain local model behavior, but rather globally consistent behavior of neurons across an entire dataset. Additionally, we use not just conjunctions, but more complex compositions tailored to the domain of interest.

As our compositional formulas increase in complexity, they begin to resemble approaches to generating natural language explanations of model decisions [2, 8, 16, 17, 31]. These methods primarily operate at the representation level, or describe rationales for individual model predictions. One advantage of our logical explanations is that they are directly grounded in features of the data and have explicit measures of quality (i.e. IoU), in contrast to language explanations generated from black-box models that can themselves be uninterpretable and error-prone: for example, [17] note that naive language explanation methods often mention evidence not directly present in the input.

Dataset biases and adversarial examples. Complex neural models are often brittle: they fail to generalize to out-of-domain data [3, 13, 22, 32] and are susceptible to adversarial attacks where inputs are subtly modified in a way that causes a model to fail catastrophically [34, 36, 37].
This may be due in part to biases in dataset collection [3, 14, 30, 32], and models fail on datasets which eliminate these biases [3, 13, 22, 32]. In this work we suggest that these artifacts are learned to the degree that we can identify specific detectors for spurious features in representation space, enabling "copy-paste" adversarial examples constructed solely from the explanations of individual neurons.

8 Discussion

We have described a procedure for obtaining compositional explanations of neurons in deep representations. These explanations more precisely characterize the behavior learned by neurons, as shown through higher measures of explanation quality (i.e. IoU) and qualitative examples of models learning perceptual abstractions in vision and spurious correlations in NLI. Specifically, these explanations (1) identify abstractions, polysemanticity, and spurious correlations localized to specific units in the representation space of deep models; (2) can disambiguate higher versus lower quality neurons in a model with respect to downstream performance; and (3) can be targeted to create "copy-paste" adversarial examples that predictably modify model behavior.

Several unanswered questions emerge:

1. We have limited our analysis in this paper to neurons in the penultimate hidden layers of our networks. Can we probe other layers, and better understand how concepts are formed and composed between the intermediate layers of a network (cf. [27])?

2. Does model pruning [20] more selectively remove the "lower quality" neurons identified by this work?

3. To what extent can the programs implied by our explanations serve as drop-in approximations of neurons, thus obviating the need for feature extraction in earlier parts of the network? Specifically, can we distill a deep model into a simple classifier over binary concept detectors defined by our neuron explanations?
4. If there is a relationship between neuron interpretability and model accuracy, as Section 5 has suggested, can we use neuron interpretability as a regularization signal during training, and does encouraging neurons to learn more interpretable abstractions result in better downstream task performance?

Reproducibility

Code and data are available at github.com/jayelm/compexp.

Broader Impact

Tools for model introspection and interpretation are crucial for better understanding the behavior of black-box models, especially as they make increasingly important decisions in high-stakes societal applications. We believe that the explanations generated in this paper can help unveil richer concepts that represent spurious correlations and potentially problematic biases in models, thus helping practitioners better understand the decisions made by machine learning models. Nonetheless, we see two limitations with this method as it stands: (1) it currently requires technical expertise to implement, limiting usability by non-experts; (2) it relies on annotated datasets, which may be expensive to collect and may be biased in the kinds of features they contain (or omit). If a potential feature of interest is not present in the annotated dataset, it cannot appear in an explanation. Both of these issues can be ameliorated with future work in (1) building easier user interfaces for explainability and (2) reducing data annotation requirements. In high-stakes cases, e.g. identifying model biases, care should also be taken to avoid relying too heavily on these explanations as causal proof that a model is encoding a concept, or assuming that the absence of an explanation is proof that the model does not encode the concept (or bias).
We provide evidence that neurons exhibit surface-level behavior that is well correlated with human-interpretable concepts, but by themselves, neuron-level explanations cannot identify the full array of concepts encoded in representations, nor establish definitive causal chains between inputs and decisions.

Acknowledgments and Disclosure of Funding

Thanks to David Bau, Alex Tamkin, Mike Wu, Eric Chu, and Noah Goodman for helpful comments and discussions, and to anonymous reviewers for useful feedback. This work was partially supported by a gift from NVIDIA under the NVAIL grant program. JM is supported by an NSF Graduate Research Fellowship and the Office of Naval Research Grant ONR MURI N00014-16-1-2007.

References

[1] M. Alzantot, Y. S. Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W. Chang. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.

[2] J. Andreas, A. Dragan, and D. Klein. Translating neuralese. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 232–242, 2017.

[3] A. Barbu, D. Mayo, J. Alverio, W. Luo, C. Wang, D. Gutfreund, J. Tenenbaum, and B. Katz. ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. In Advances in Neural Information Processing Systems, pages 9448–9458, 2019.

[4] A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, F. Dalvi, and J. Glass. Identifying and controlling important neurons in neural machine translation. In International Conference on Learning Representations, 2019.

[5] D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6541–6549, 2017.

[6] S. Bowman, G. Angeli, C. Potts, and C. D. Manning.
A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, 2015.

[7] S. Bowman, J. Gauthier, A. Rastogi, R. Gupta, C. D. Manning, and C. Potts. A fast unified model for parsing and sentence understanding. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1466–1477, 2016.

[8] O.-M. Camburu, T. Rocktäschel, T. Lukasiewicz, and P. Blunsom. e-SNLI: Natural language inference with natural language explanations. In Advances in Neural Information Processing Systems, pages 9539–9549, 2018.

[9] S. Carter, Z. Armstrong, L. Schubert, I. Johnson, and C. Olah. Activation atlas. Distill, 4(3):e15, 2019.

[10] F. Dalvi, N. Durrani, H. Sajjad, Y. Belinkov, A. Bau, and J. Glass. What is one grain of sand in the desert? Analyzing individual neurons in deep NLP models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6309–6317, 2019.

[11] F. Dalvi, A. Nortonsmith, A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, and J. Glass. NeuroX: A toolkit for analyzing individual neurons in neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 9851–9852, 2019.

[12] R. Fong and A. Vedaldi. Net2Vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8730–8738, 2018.

[13] M. Gardner, Y. Artzi, V. Basmova, J. Berant, B. Bogin, S. Chen, P. Dasigi, D. Dua, Y. Elazar, A. Gottumukkala, et al. Evaluating NLP models via contrast sets. arXiv preprint arXiv:2004.02709, 2020.

[14] S. Gururangan, S. Swayamdipta, O. Levy, R. Schwartz, S. Bowman, and N. A. Smith. Annotation artifacts in natural language inference data.
In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112, 2018.

[15] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[16] L. A. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, and T. Darrell. Generating visual explanations. In Proceedings of the European Conference on Computer Vision, pages 3–19, 2016.

[17] L. A. Hendricks, R. Hu, T. Darrell, and Z. Akata. Grounding visual explanations. In Proceedings of the European Conference on Computer Vision, pages 264–279, 2018.

[18] J. Hewitt and P. Liang. Designing and interpreting probes with control tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2733–2743, 2019.

[19] J. Hewitt and C. D. Manning. A structural probe for finding syntax in word representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4129–4138, 2019.

[20] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

[21] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.

[22] D. Kaushik, E. Hovy, and Z. C. Lipton. Learning the difference that makes a difference with counterfactually-augmented data. In International Conference on Learning Representations (ICLR), 2020.

[23] E. Kitzelmann. Inductive programming: A survey of program synthesis techniques.
In International Workshop on Approaches and Applications of Inductive Programming, pages 50–73. Springer, 2009.

[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[25] T. McCoy, E. Pavlick, and T. Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428–3448, 2019.

[26] A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In Advances in Neural Information Processing Systems, pages 3387–3395, 2016.

[27] C. Olah, N. Cammarata, L. Schubert, G. Goh, M. Petrov, and S. Carter. Zoom in: An introduction to circuits. Distill, 5(3):e00024–001, 2020.

[28] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, 2014.

[29] T. Pimentel, J. Valvoda, R. H. Maudslay, R. Zmigrod, A. Williams, and R. Cotterell. Information-theoretic probing for linguistic structure. arXiv preprint arXiv:2004.03061, 2020.

[30] A. Poliak, J. Naradowsky, A. Haldar, R. Rudinger, and B. Van Durme. Hypothesis only baselines in natural language inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191, 2018.

[31] N. F. Rajani, B. McCann, C. Xiong, and R. Socher. Explain yourself! Leveraging language models for commonsense reasoning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4932–4942, 2019.

[32] B. Recht, R. Roelofs, L. Schmidt, and V. Shankar. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning, pages 5389–5400, 2019.

[33] M. T. Ribeiro, S.
Singh, and C. Guestrin. Anchors: High-precision model-agnostic explanations. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[34] M. T. Ribeiro, S. Singh, and C. Guestrin. Semantically equivalent adversarial rules for debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865, 2018.

[35] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[36] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.

[37] E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh. Universal adversarial triggers for attacking and analyzing NLP. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2153–2162, 2019.

[38] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833. Springer, 2014.

[39] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene CNNs. In International Conference on Learning Representations, 2015.

[40] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

[41] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. Scene parsing through ADE20K dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 633–641, 2017.
A Concept uniqueness and granularity

Here, we report statistics about the uniqueness of neuron concepts as we increase the maximum formula length of our explanations.

[Figure S1: Number of repeated concepts across probed vision (left) and NLI (right) models, by maximum formula length (1, 3, 5, 10).]

Table S1: For probed Image Classification and NLI models, average number of occurrences of each detected concept and percentage of detected concepts that are unique (i.e. appear only once).

          Image Classification            NLI
N     Mean concept count  % unique   Mean concept count  % unique
1     2.61                42%        3.30                39%
3     1.03                97%        1.20                86%
5     1.01                99%        1.04                96%
10    1.00                100%       1.00                100%

A.1 Image Classification

Figure S1 (left) plots the number of times each unique concept appears across the 512 units of ResNet-18 as the maximum formula length increases. Table S1 displays the mean number of occurrences per concept and the percentage of concepts that are unique (i.e. occur only once). At length 1 (equivalent to NetDissect), many concepts appear multiple times: only 42% of concepts occur exactly once, and the mean number of occurrences is 2.61. But uniqueness increases dramatically as the formula length increases: already by length 3, 97% of concepts are unique, and all concepts are unique by length 10. Our explanations thus reveal significant specialization in neuron function (vs. NetDissect [5]). Table S2 shows the most common repeated concepts for each maximum formula length.

A.2 NLI

Similarly, Figure S1 (right) plots the number of times each unique concept appears across the neurons of the NLI model, and Table S1 displays occurrence statistics. As in the Image Classification model, longer formula lengths reveal significantly more specialization in neuron function. Table S3 shows the most common repeated concepts for each maximum formula length.
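The uniqueness statistics reported in Table S1 reduce to counting occurrences over the list of per-neuron explanations. A minimal sketch (our own illustration over hypothetical length-1 explanations, not the paper's data):

```python
from collections import Counter

def uniqueness_stats(explanations):
    """Mean occurrences per distinct concept and the percentage of
    distinct concepts that appear exactly once."""
    counts = Counter(explanations)
    mean_count = sum(counts.values()) / len(counts)
    pct_unique = 100.0 * sum(1 for c in counts.values() if c == 1) / len(counts)
    return mean_count, pct_unique

# Hypothetical length-1 explanations for six neurons.
explanations = ["pool table", "pool table", "house", "corridor", "house", "bed"]
mean_count, pct_unique = uniqueness_stats(explanations)
print(mean_count, pct_unique)  # → 1.5 50.0
```

Applied to the real explanation lists, this yields one (mean count, % unique) pair per maximum formula length, i.e. one row of Table S1 per column pair.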
B Adversarial example sensitivity

In Figure S2 we vary the size and position of subimages for the copy-paste examples (note this analysis is less straightforward for examples like non-blue water). Sensitivity depends on the specific example. In general, if the sub-image is too small (left), the original class prevails; otherwise, the igloo → clean room example is quite reliable, while the street → fire escape example is less so.

Table S2: Up to the 10 most repeated concepts in ResNet-18 conv4 by length N. At N = 5 there is only 1 non-unique concept. For a full list see the code.

N = 1: pool table (15); house (12); corridor (11); cockpit (10); bed (10); bakery shop (9); bathroom (9); alley (8); airport terminal (8); car (8)
N = 3: pillow OR (bed AND bedroom) (4); sink OR toilet OR bathtub (3); auditorium OR movie theater OR conference center (2); sink OR toilet OR countertop (2); pool table OR arcade machine OR slot machine (2); pool table OR golf course OR fairway (2); greenhouse OR vegetable garden OR herb garden (2); water OR river AND (NOT blue) (2); living room AND (sofa OR cushion) (2); street AND sky AND white (2)
N = 5: auditorium OR indoor theater OR conference center OR movie theater OR silver screen (2)

Table S3: Up to 10 of the most repeated concepts in our probed NLI baseline model by length N. At N = 3 there are only 9 non-unique concepts; at N = 5 there are only 3 non-unique concepts. For a full list see the code.
N = 1: pre:N (17); hyp:N (14); hyp:IN (8); overlap-75% (4); hyp:VBG (4); hyp:in (3); hyp:. (3); hyp:sitting (2); hyp:DT (2); hyp:for (2)
N = 3: pre:N AND (NOT overlap-50%) AND (NOT hyp:outside) (4); pre:N AND (NOT hyp:for) AND (NOT hyp:VB) (3); (NOT hyp:sleeping) AND (pre:N OR hyp:NNS) (3); hyp:N AND (NOT overlap-50%) AND (NOT hyp:outside) (2); hyp:for OR hyp:PRP$ OR hyp:to (2); hyp:N AND (NOT overlap-50%) AND (NOT hyp:there) (2); pre:N AND (NOT overlap-75%) AND (NOT hyp:outside) (2); pre:N AND (NOT hyp:for) AND (NOT hyp:PRP$) (2); hyp:eating OR (hyp:IN AND (NOT overlap-50%)) (2)
N = 5: hyp:NNP OR ((NOT hyp:EX) AND (hyp:IN OR hyp:PRP$ OR hyp:to)) (2); (NOT hyp:for) AND (NOT hyp:PRP) AND (overlap-50% OR (pre:N AND (NOT hyp:PRP$))) (2); pre:N AND (NOT overlap-50%) AND (NOT hyp:outside) AND (NOT hyp:EX) AND (NOT hyp:outdoors) (2)

[Figure S2: Varying the size and position of pasted sub-images for the vision copy-paste adversarial examples: (a) street + cradle = fire escape; (b) corridor + igloo = clean room; (c) forest path + laundry machines = viaduct. Green: prediction changes to the intended adversarial class; yellow: prediction changes to a different class (e.g. aqueduct for the bottom row); red: no change.]

C Additional image classification examples

Examples are not cherry picked; we enumerate neurons 0–39.
[Per-neuron example figures for units 0–39: each panel gives the unit index, a qualitative category (e.g. lexical and perceptual, polysemantic, or perceptual only), and its explanations at maximum formula lengths 1, 3, 5, and 10 with the corresponding IoU scores. The extracted text for these figures is garbled and omitted here.]

D Additional NLI examples

Examples are not cherry picked; we enumerate the first 25 neurons that fire reliably (i.e. at least 500 times across the validation dataset), skipping those already illustrated in the main paper.
[Per-neuron NLI example figures: each panel gives the unit index, its compositional explanation over premise (pre:) and hypothesis (hyp:) features, its IoU and per-class weights (entail, neutral, contra), and sample premise/hypothesis pairs with true labels and model predictions. The extracted text for these figures is garbled and omitted here.]
3UHCDCPFYJKEJKPENWFGUCPWRTKIJVDCUURNC[GTKURNC[KPIKPCVGPVKPHTQPVQHECPCFKCPÖCIU + [GT 25 $EV6TWG2TGFFRQWUDFRQWUD 3UHCDQ[YKVJCEQPEGTPGFNQQMKVJQNFKPIWRVYQPGYURCRGTUHGCVWTKPICJGCFNKPGCDQWVOWTFGT + [KUPQVJQNFKPICP[VJKPI $EV6TWG2TGFFRQWUDFRQWUD 3UHC[QWPIDQ[YGCTKPICTGFEQCVGCVUCEJQEQNCVGDCT + [JCUPQENQVJGUQP $EV6TWG2TGFFRQWUDFRQWUD 7PKV 016J[RVJGTG #0&J[R00 #0& 016J[RUKVVKPI #0& 016J[RUVCPFKPI 14J[R8$ áQ7Y GPVCKN PGWVTCN EQPVTC 3UHCITQWRQHRGQRNGYGCTKPIJCVUCPFWUKPIYCNMKPIUVKEMUCTGYCNMKPIVJTQWIJCYQQFGFCTGCQPCVTCKN + $EV6TWG2TGFQHXWUDOQHXWUDO 3UHCIGPVNGOCPKPCUVTKRGFUJKTVIGUVWTKPIYKVJCUVKEMNKMGQDLGEVKPJKUJCPFYJKNGRCUUGTUD[UVCTGCVJKO + [QWUN[IGUVWTKPI $EV6TWG2TGFQHXWUDOQHXWUDO 3UHCOKFFNGCIGFOCPKPCITC[VUJKTVCPFDTQYPRCPVUUKVVKPIQPJKUDGFTGCFKPICÖ[GTNKMGRCRGT + Ö[GTCDQWVCPGYLQDJGKUKPVGTGUVGFKP $EV6TWG2TGFQHXWUDOQHXWUDO 7PKV J[R+0#0&QXGTNCR 14J[RPQV 14J[RPQ 14J[RQPN[ áQ7Y GPVCKN PGWVTCN EQPVTC 3UHCOCPKPCDNWGUJKTVMJCMKUJQTVUDCNNECRCPFYJKVGUQEMUCPFNQCHGTUYCNMKPIDGJKPFCITQWRQHRGQRNGYCNMKPIFQYPCUVQPG YCNMYC[YKVJCYCVGTDQVVNGKPJKUNGHVJCPF + YCNMYC[YKVJCYCVGTDQVVNGKPJKUNGHVJCPF $EV6TWG2TGFFRQWUDFRQWUD 3UHCDQ[YKVJCEQPEGTPGFNQQMKVJQNFKPIWRVYQPGYURCRGTUHGCVWTKPICJGCFNKPGCDQWVOWTFGT + [KUPQVJQNFKPICP[VJKPI $EV6TWG2TGFFRQWUDFRQWUD 3UHCFTGUUGFWRYQOCPYCNMKPIPGZVVQCUVQTGCVPKIJV + [CVPKIJV $EV6TWG2TGFQHXWUDOQHXWUDO 7PKV J[R+014J[RVQ 14J[R242 #0& 016J[R': 14J[R002 áQ7Y GPVCKN PGWVTCN EQPVTC 3UHVYQYQOGPYCNMKPIKPCPCTGCQH70- + 70-YQTMGTUYCNMFQYPVJGUVTGGVQHVJGQPEGDGCWVKHWNUWDWTDCPPGKIJDQTJQQFUWTXG[KPIVJGFCOCIGHTQOVJGUVQTO $EV6TWG2TGFQHXWUDOQHXWUDO 3UHCITQWRQHMKFUCTGRNC[KPIQPCVKTGUYKPI + $EV6TWG2TGFFRQWUDFRQWUD 3UHCYQOCPYCNMUD[CDTKEMDWKNFKPIVJCV UEQXGTGFYKVJITCÞVK + UUQPFTGYUQOGQHVJGITCÞVK $EV6TWG2TGFQHXWUDOQHXWUDO 26