[Paper Review] Locating and Editing Factual Associations in GPT

Mohit Mishra
7 min read · Mar 16


Paper Summary/Reading


  • People want to know what happens inside a model for two reasons. The first is scientific and interpretability-driven: where is knowledge stored, and what is the model doing to produce it? The second is practical: models sometimes get facts wrong, and sometimes we want to change a stored fact because it is outdated.
  • Although many of the largest neural networks in use today are autoregressive, the way that they store knowledge remains under-explored.
  • Some research of this kind has already been done on masked models.
  • Causal interventions have proved very useful for determining what happens inside a model.
  • The fundamental question behind causal tracing is: which hidden states carry the information that matters for a prediction, so that we can debug the model?
  • One might assume that every hidden state carries important information. The paper checks whether that is actually the case.

Let’s unpack what is actually happening here

What is Causal Tracing?

Causal tracing obfuscates the subject by adding noise to it, so the network no longer knows what we are talking about and produces a whole set of corrupted activations. The question then is: if we take a hidden state from the clean run and put it back, can we restore the correct answer?

Causal tracing
  • In the first run, the language model processes the prompt "The Space Needle is in downtown" normally, as shown in image (a), and gives the correct answer, Seattle.
  • In the second run, we feed in corrupted inputs, marked in image (b) with an asterisk after each corrupted subject token.
  • As expected, the model now gives the wrong answer. We can use causal tracing to check which hidden states carry the information needed to recover the right one.
  • We take a single hidden state from the clean run, put it in the same position in the corrupted run, and check whether the model answers Seattle again.
  • For some states this works, which shows that not all hidden states carry the same information: a few of them hold the decisive information, but most do not.
  • An early site and a late site have very high causal effect. They hold enough information to restore the factual statement, while the other states do not. Only a very sparse set of activations can actually restore the answer.
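The corrupt-then-restore procedure described above can be sketched on a toy network. This is only an illustration under loud assumptions: the model is a tiny random residual network rather than GPT, and the names `run`, `causal_avg`, `subject` and all sizes are made up for the example. It shows the three runs (clean, corrupted, corrupted with one restored hidden state) and how the effect of each (token, layer) state is measured.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, L, V = 5, 8, 4, 10                        # tokens, width, layers, vocab (toy sizes)
W_mlp = rng.normal(scale=0.5, size=(L, D, D))   # one toy "MLP" matrix per layer
W_out = rng.normal(size=(D, V))                 # toy unembedding matrix
causal_avg = np.tril(np.ones((T, T)))
causal_avg /= causal_avg.sum(axis=1, keepdims=True)  # stand-in for causal attention

def run(embeddings, patch=None):
    """Forward pass; patch = (token, layer, vector) overwrites one hidden state."""
    h = embeddings.copy()
    states = []
    for layer in range(L):
        h = h + 0.5 * (causal_avg @ h)          # mix in information from earlier tokens
        h = h + np.tanh(h @ W_mlp[layer])       # per-token MLP
        if patch is not None and patch[1] == layer:
            h[patch[0]] = patch[2]              # restore a clean hidden state
        states.append(h.copy())
    logits = h[-1] @ W_out                      # predict from the last token
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), states

clean_emb = rng.normal(size=(T, D))
subject = [0, 1]                                # pretend tokens 0-1 are the subject
corrupt_emb = clean_emb.copy()
corrupt_emb[subject] += rng.normal(scale=3.0, size=(len(subject), D))  # add noise

p_clean, clean_states = run(clean_emb)          # run 1: clean
answer = int(p_clean.argmax())                  # the toy model's "Seattle"
p_corrupt, _ = run(corrupt_emb)                 # run 2: corrupted subject

# Run 3: corrupted input, but one hidden state restored from the clean run.
effect = np.zeros((T, L))
for t in range(T):
    for layer in range(L):
        p_restored, _ = run(corrupt_emb, patch=(t, layer, clean_states[layer][t]))
        effect[t, layer] = p_restored[answer] - p_corrupt[answer]
```

Each entry `effect[t, layer]` is how much probability of the original answer comes back when only that one state is restored. In this toy, restoring the last token at the last layer recovers the clean prediction exactly, because the output is read from that state.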
Attention (Late Site)
  • Attention accounts for the causal tracing effect at the late site.
  • This is not very surprising: the model has to recall the past context and output the next token, and this site sits right next to the prediction.
MLP (Early Site)
  • The MLP layers account for the causal tracing effect at the early site.
  • This is surprising, because the site sits in the middle of the network and seems to come out of nowhere.
  • If we average the effect over 1000 prompts, the early site systematically lands at the last subject token.
  • There is a patch of high causal effect in the MLP layers.
  • The authors therefore hypothesize that these MLPs are the ones recalling factual knowledge.
MLP over 1000 Prompts
  • The MLPs recall the factual association, and the attention at the late site, which is responsible for the next-token prediction, reads it.
  • To test the hypothesis that the MLPs store and recall factual associations, the authors severed the MLP and the attention contributions one at a time. The effect is visible in the image above.
  • With the MLP severed, the remaining causal effect is much smaller than with the attention severed.
  • In the early layers, roughly 10–20, the MLP is very important for recalling factual associations.
  • With nothing severed, we get the blue lines at the top. As soon as the MLP is severed, the effect drops to the green lines at the bottom.
  • Severing attention also lowers the effect, shown by the red lines in the image above, but by much less than severing the MLP in the early layers.
  • So attention does play a role, but the MLP plays a much more important role in recalling the information.
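The severing experiment can also be sketched on a toy residual network. Everything here is an assumption made for illustration: the network is random, `run`, `frozen_mlp` and the sizes are hypothetical names and values, and only the MLP path is severed. "Severing" is read as follows: restore a clean hidden state at one layer, but keep that token's MLP outputs in all later layers frozen at the values they had in the corrupted run, so the MLPs cannot react to the restored state.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, L, V = 4, 8, 5, 10                          # toy sizes
W_attn = rng.normal(scale=0.3, size=(L, D, D))
W_mlp = rng.normal(scale=0.5, size=(L, D, D))
W_out = rng.normal(size=(D, V))
mix = np.tril(np.ones((T, T)))
mix /= mix.sum(axis=1, keepdims=True)             # stand-in for causal attention

def run(emb, restore=None, frozen_mlp=None):
    """restore = (token, layer, vector); frozen_mlp = {(token, layer): vector}."""
    h = emb.copy()
    states, mlp_outs = [], {}
    for layer in range(L):
        h = h + (mix @ h) @ W_attn[layer]         # toy attention contribution
        m = np.tanh(h @ W_mlp[layer])             # toy MLP contribution
        if frozen_mlp:
            for (tok, lyr), vec in frozen_mlp.items():
                if lyr == layer:
                    m[tok] = vec                  # severed: reuse corrupted-run output
        h = h + m
        if restore is not None and restore[1] == layer:
            h[restore[0]] = restore[2]            # restore clean hidden state
        states.append(h.copy())
        for tok in range(T):
            mlp_outs[(tok, layer)] = m[tok].copy()
    logits = h[-1] @ W_out
    p = np.exp(logits - logits.max())
    return p / p.sum(), states, mlp_outs

clean = rng.normal(size=(T, D))
corrupt = clean.copy()
corrupt[:2] += rng.normal(scale=3.0, size=(2, D))  # tokens 0-1 play the subject
p_clean, clean_states, _ = run(clean)
answer = int(p_clean.argmax())
p_corrupt, _, corrupt_mlp = run(corrupt)

tok = 1                                            # last subject token
normal, severed = [], []
for layer in range(L):
    patch = (tok, layer, clean_states[layer][tok])
    p, _, _ = run(corrupt, restore=patch)
    normal.append(p[answer] - p_corrupt[answer])
    frozen = {(tok, later): corrupt_mlp[(tok, later)] for later in range(layer + 1, L)}
    p, _, _ = run(corrupt, restore=patch, frozen_mlp=frozen)
    severed.append(p[answer] - p_corrupt[answer])
```

Comparing `normal` and `severed` layer by layer plays the role of comparing the top and bottom lines in the figure. A random toy network will not reproduce the paper's pattern; the sketch only shows the mechanics of the comparison.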
MLP Layer
  • Right after the tokens "Space Needle", an MLP layer has already recalled the fact that it is in Seattle.
  • This happens even though we have not yet asked the model about the location of the Space Needle.
  • So if the paper's hypothesis is correct, once the model sees the subject, its MLP layers recall a whole bunch of information about that subject.
  • At the end, the attention layers focus on what we actually asked, so that the model can pull out that particular piece from all the information recalled by the different MLP layers.
  • Here the MLP acts as a key-value store. A researcher named Geva found that the second feed-forward layer acts as a key-value memory. There is a lot to say about this, and we will discuss it in future blogs.
  • The key here probably corresponds to the subject.
  • The value is not exactly the output we want, since we have not yet asked our question, but it contains something like a fact about the subject.
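To make the key-value picture concrete, here is a minimal sketch of a linear associative memory: a single matrix W that maps each stored key to its value. The vectors are random stand-ins, and `K`, `V`, `recall` and `space_needle_key` are names invented for this example; in a real model the key would be an MLP activation for the subject and W would be one of the MLP weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
d_key, d_value, n_facts = 16, 12, 5

# Hypothetical keys (one column per subject) and values (one column per fact).
K = rng.normal(size=(d_key, n_facts))
V = rng.normal(size=(d_value, n_facts))

# Least-squares memory: W is chosen so that W @ K reproduces V.
W = V @ np.linalg.pinv(K)

def recall(key):
    """Read the memory with a key vector and return the associated value."""
    return W @ key

space_needle_key = K[:, 0]            # stand-in for the "Space Needle" subject key
recalled = recall(space_needle_key)   # matches the first stored value
```

With fewer facts than key dimensions and linearly independent keys, every stored value is recalled exactly. The value comes back as soon as the key is presented, before any question has been asked, which is the behaviour described above.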

How to edit the language model?

  • Let's say that instead of Seattle, they want Paris as the answer from our language model.
  • They have the value for Paris in vector form, and with that value and a bit of math, the authors set this up as a constrained optimization problem.
  • It turns out that solving it gives a closed-form solution in the form of a rank-one update.
  • Here W is the original weight matrix, and K and V are the keys and values.
  • The rank-one update is easy to compute. They add it to the original weight matrix and get an updated weight matrix that respects the new fact they want to insert.
  • Here the new fact is simply the location Paris.
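As a rough sketch of that rank-one update: the closed form is W_hat = W + Λ (C⁻¹ k*)ᵀ with C = K Kᵀ and Λ = (v* − W k*) / ((C⁻¹ k*)ᵀ k*), where k* is the key for the subject and v* is the value encoding the new fact. The code below applies it to random matrices. The function name `rank_one_edit` and all sizes are assumptions for illustration, and C is left unnormalized here.

```python
import numpy as np

def rank_one_edit(W, C, k_star, v_star):
    """Return W_hat = W + lam (C^-1 k*)^T, which satisfies W_hat @ k_star == v_star."""
    c_inv_k = np.linalg.solve(C, k_star)                # C^-1 k*
    lam = (v_star - W @ k_star) / (c_inv_k @ k_star)    # rescaled residual
    return W + np.outer(lam, c_inv_k)                   # rank-one correction

rng = np.random.default_rng(0)
d_key, d_value, n_old = 6, 4, 50
K = rng.normal(size=(d_key, n_old))     # previously stored keys, one per column
W = rng.normal(size=(d_value, d_key))   # original weight matrix
C = K @ K.T                             # second-moment matrix of the old keys

k_star = rng.normal(size=d_key)         # key for the subject ("Space Needle")
v_star = rng.normal(size=d_value)       # value encoding the new fact ("Paris")
W_hat = rank_one_edit(W, C, k_star, v_star)
```

After the edit, `W_hat @ k_star` equals `v_star`, and `W_hat - W` has rank one. Weighting the update by C⁻¹ is what keeps the change small for the keys that were already stored.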

This raises a question:

How do they know which key and value vectors to insert?

  • The key is relatively simple: it corresponds to the subject, which we know, so we can run it through the network and grab the activation at a particular site. They always choose the same site.
  • The value is a different matter: they obtain it by solving an optimization problem.

How did they choose the MLP where they want to make the edit?

  • They have to target a specific MLP at a specific layer.
  • Causal tracing gives the range of MLP layers where the edit works.

As we scale up the model, most of the FLOPs in training and inference go into the feed-forward layers, that is, into the MLP and not necessarily into the attention mechanism. Everyone tries to make the attention mechanism more efficient without realizing that these large models do most of their work in the feed-forward layers. The fan-out and fan-in feed-forward layers can absorb a huge amount of resources (basically memorized information).
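A back-of-the-envelope count illustrates the FLOPs claim. This sketch assumes a standard transformer layer with four d×d attention projections, an MLP with a 4x fan-out, and 2 FLOPs per multiply-add; the function names and the example sizes are illustrative, not measurements of any particular model.

```python
def layer_flops_per_token(d_model, seq_len, mlp_ratio=4):
    """Approximate forward-pass FLOPs per token for one transformer layer."""
    attn_proj = 2 * 4 * d_model * d_model          # Q, K, V and output projections
    attn_mix = 2 * 2 * seq_len * d_model           # attention scores plus weighted sum of values
    mlp = 2 * 2 * mlp_ratio * d_model * d_model    # fan-out and fan-in matrices
    return {"attention": attn_proj + attn_mix, "mlp": mlp}

def mlp_share(d_model, seq_len):
    """Fraction of the layer's FLOPs spent in the MLP."""
    flops = layer_flops_per_token(d_model, seq_len)
    return flops["mlp"] / (flops["mlp"] + flops["attention"])

small = mlp_share(d_model=1600, seq_len=1024)      # roughly 0.60
large = mlp_share(d_model=12288, seq_len=2048)     # roughly 0.65
```

Under these assumptions the MLP takes about 60% of a layer's compute at width 1600 and a bit more at width 12288, approaching two thirds as the width grows relative to the sequence length.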

With that, let’s end this blog, the research paper summary of Locating and Editing Factual Associations in GPT. If you find any issue regarding concept or code, you can message me on my Twitter or LinkedIn. The next blog will be published on 21 March 2023.

Some words about me

I’m Mohit.❤️ You can also call me Chessman. I’m a Machine learning Developer and a competitive programmer. Most of my time is spent staring at a computer screen. During the day, I am usually programming, working to derive insight from large datasets. My skills include Data Analysis, Data Visualization, Machine learning, Deep Learning, DevOps and working toward Full Stack. I have developed a strong acumen for problem-solving, and I enjoy occasional challenges.

My Portfolio and Github.


