ArgMax and SoftMax

Mohit Mishra
3 min read · Mar 29, 2023

Understanding ArgMax and SoftMax in Minutes

  • Suppose a neural network produces raw output values for some input data. These values are not restricted to the range 0 to 1: they can be greater than 1 or less than 0.
  • Because of this, the raw output values are sent through an ArgMax or SoftMax layer before the final decision is made.


  • ArgMax simply takes any set of output values and sets the largest output value to 1 and all the others to 0.
  • So, when we use ArgMax, the neural network’s prediction is simply the output with a 1 in it.
  • This makes the output of the network very easy to interpret.
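  • The biggest problem with ArgMax is that we can’t use it to optimize the weights and biases in the neural network. Its output is piecewise constant: a small change in the raw output values does not change the result at all, unless a different value becomes the largest, in which case the result jumps abruptly. Its derivative is therefore either 0 or undefined.
  • This means that we can’t use the ArgMax function for backpropagation, because a derivative of 0 gives gradient descent no direction in which to adjust the weights and biases.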
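
The ArgMax behaviour described above can be sketched in a few lines of plain Python. This is only an illustration, not code from any particular library: the function name argmax_one_hot and the raw output values are made up for the example.

```python
def argmax_one_hot(raw_outputs):
    """Return a list with 1 at the largest value's position and 0 elsewhere."""
    # Index of the largest raw output value (the first one wins on ties).
    best_index = max(range(len(raw_outputs)), key=lambda i: raw_outputs[i])
    return [1 if i == best_index else 0 for i in range(len(raw_outputs))]


# Hypothetical raw output values: note they are not confined to 0..1.
raw_outputs = [1.43, -0.4, 0.23]
one_hot = argmax_one_hot(raw_outputs)
print(one_hot)  # [1, 0, 0]

# Nudging the largest raw value leaves the result unchanged, which is
# why the derivative of ArgMax is 0 wherever it is defined.
nudged = argmax_one_hot([1.44, -0.4, 0.23])
print(nudged == one_hot)  # True
```

The second print shows the optimization problem in miniature: the input changed, the output did not, so there is no slope to follow.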

Note: People usually want to use ArgMax for the final output, but they use SoftMax for training.


  • The SoftMax function changes the raw output values but preserves their original order.
  • Every SoftMax output value is between 0 and 1.
  • This holds regardless of how many raw output values there are.
  • The SoftMax output values always add up to 1.
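
The properties listed above can be checked with a small sketch. It assumes the standard definition of SoftMax, where output i is exp(z_i) divided by the sum of exp(z_j) over all raw output values z_j. The function name softmax and the input values are chosen only for this example.

```python
import math


def softmax(raw_outputs):
    """Map raw output values to values between 0 and 1 that add up to 1."""
    # Subtracting the largest value first avoids overflow in exp()
    # and does not change the result.
    largest = max(raw_outputs)
    exps = [math.exp(value - largest) for value in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw output values, the same kind of input ArgMax would get.
raw_outputs = [1.43, -0.4, 0.23]
probs = softmax(raw_outputs)
print(probs)

# Check the three properties: range, sum, and preserved order.
in_range = all(0.0 < p < 1.0 for p in probs)
sums_to_one = abs(sum(probs) - 1.0) < 1e-9
same_order = (sorted(range(len(probs)), key=lambda i: probs[i])
              == sorted(range(len(raw_outputs)), key=lambda i: raw_outputs[i]))
print(in_range, sums_to_one, same_order)
```

Because the order is preserved, applying ArgMax to the SoftMax output picks the same winner as applying it to the raw output values.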

Note: Unlike the ArgMax function, whose derivative is always either 0 or undefined, the derivative of the SoftMax function is not always 0, so we can use it for gradient descent.
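
To make the point about derivatives concrete, here is a rough sketch that computes the derivative of each SoftMax output with respect to each raw output value. It assumes the standard result that this derivative equals p_i * (1 - p_i) when i = j and -p_i * p_j otherwise, and compares it with a simple numerical estimate. All function names and input values are illustrative.

```python
import math


def softmax(z):
    largest = max(z)
    exps = [math.exp(value - largest) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Entry [i][j] is the derivative of output i with respect to raw value j."""
    p = softmax(z)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]


def numerical_jacobian(z, eps=1e-6):
    """Estimate the same derivatives by nudging one raw value at a time."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(z), list(z)
        up[j] += eps
        down[j] -= eps
        p_up, p_down = softmax(up), softmax(down)
        for i in range(n):
            jac[i][j] = (p_up[i] - p_down[i]) / (2 * eps)
    return jac


raw_outputs = [1.43, -0.4, 0.23]  # hypothetical raw output values
analytic = softmax_jacobian(raw_outputs)
numeric = numerical_jacobian(raw_outputs)
max_gap = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(3) for j in range(3))
has_nonzero_slope = any(abs(d) > 1e-3 for row in analytic for d in row)
print(max_gap, has_nonzero_slope)
```

Unlike the ArgMax case, nudging a raw output value here does move the outputs, and that non-zero slope is what gradient descent needs.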

