Focal Loss for Dense Object Detection

  • Tsung-Yi Lin
  • Priya Goyal
  • Ross Girshick
  • Kaiming He
  • Piotr Dollár

Focal loss is a classification loss designed for dense object detection, where the extreme class imbalance between foreground and background samples makes standard cross entropy ineffective. It has two characteristics:

  • It weights positive and negative samples differently.
  • It weights easy-to-classify and hard-to-classify samples differently.

Cross entropy

$$ CE(p, y) = \left\{\begin{matrix} -\log(p) & \text{if} \ y = 1 \\ -\log(1-p) & \text{otherwise} \end{matrix}\right. $$ where $p \in [0, 1]$ is the model's estimated probability for the class $y = 1$, and $y$ is the ground-truth label. For notational convenience, we define $p_{t}$: $$ p_{t} = \left\{\begin{matrix} p & \text{if} \ y = 1 \\ 1-p & \text{otherwise} \end{matrix}\right. $$ and rewrite $CE(p, y) = CE(p_{t}) = -\log(p_{t})$.
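As a concrete illustration, here is a minimal NumPy sketch of this cross entropy (the function name is an illustrative choice, not from the paper):

```python
import numpy as np

def cross_entropy(p, y):
    """Binary cross entropy CE(p, y) = -log(p_t).

    p: predicted probability of the positive class (0 < p < 1)
    y: ground-truth label, 1 for positive, 0 for negative
    """
    # p_t is p for positive samples and 1 - p for negative ones
    p_t = p if y == 1 else 1 - p
    return -np.log(p_t)
```

A confident correct prediction ($p = 0.9$, $y = 1$) gives a small loss of about $0.105$, while the same prediction on a negative sample ($y = 0$) gives a large loss of about $2.303$.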

Balanced cross entropy

To reduce the impact of negative samples, we can add a weighting coefficient $\alpha_{t}$ to $CE(p_{t})$. Analogous to $p_{t}$, we define $\alpha_{t}$: $$ \alpha_{t} = \left\{\begin{matrix} \alpha & \text{if} \ y = 1 \\ 1-\alpha & \text{otherwise} \end{matrix}\right. $$ where $\alpha \in [0, 1]$. In practice $\alpha$ may be set by inverse class frequency or treated as a hyper-parameter tuned by cross-validation. Setting $\alpha_{t}$ lets us control the contribution of positive and negative samples to the loss, so we rewrite: $$ CE(p_{t}) = -\alpha_{t}\log(p_{t}) $$ Written out in full: $$ CE(p, y, \alpha) = \left\{\begin{matrix} -\alpha\log(p) & \text{if} \ y = 1 \\ -(1-\alpha)\log(1-p) & \text{otherwise} \end{matrix}\right. $$
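A minimal NumPy sketch of the $\alpha$-balanced version (the function name and the default $\alpha = 0.25$ are illustrative choices):

```python
import numpy as np

def balanced_cross_entropy(p, y, alpha=0.25):
    """Alpha-balanced cross entropy CE(p_t) = -alpha_t * log(p_t).

    alpha weights positive samples; 1 - alpha weights negative ones.
    """
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * np.log(p_t)
```

With $\alpha < 0.5$, each negative sample contributes more weight than each positive one; a small $\alpha$ is useful when negatives vastly outnumber positives but each should count for less.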

Focal loss

  • The coefficient $\alpha_{t}$ balances positive and negative samples, but it does not distinguish easy samples from hard ones:
    • Positive samples ($y = 1$): the higher $(1-p)$ is, the harder the sample is to classify.
    • Negative samples ($y = 0$): the higher $p$ is, the harder the sample is to classify.

Focal loss therefore adds a modulating factor that down-weights easy samples: $$ FL(p_{t}) = -(1-p_{t})^{\gamma} \log{(p_{t})} $$ where $(1-p_{t})^{\gamma}$ differentiates easy samples from hard ones, and $\gamma \geq 0$ is called the focusing parameter.

  • When $p_{t} \to 0$ (a hard, misclassified sample), $(1-p_{t})^{\gamma} \to 1$ and the loss is nearly unaffected.
  • When $p_{t} \to 1$ (an easy, well-classified sample), $(1-p_{t})^{\gamma} \to 0$ and the loss is down-weighted.
  • When $\gamma = 0$, FL is equivalent to CE.
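The modulating factor is straightforward to sketch in NumPy (the function name is an illustrative choice; the default $\gamma = 2$ follows the paper's best-reported setting):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Focal loss FL(p_t) = -(1 - p_t)^gamma * log(p_t)."""
    p_t = p if y == 1 else 1 - p
    # Easy samples (p_t near 1) get a factor near 0; hard ones a factor near 1
    return -(1 - p_t) ** gamma * np.log(p_t)
```

With $\gamma = 0$ this reduces to plain cross entropy; with $\gamma = 2$, an easy sample at $p_{t} = 0.9$ is down-weighted by a factor of $0.1^{2} = 0.01$, i.e. 100x relative to CE.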

The $\alpha\text{-balanced}$ variant

$$ FL(p_{t}) = -\alpha_{t}(1-p_{t})^{\gamma} \log{(p_{t})} $$

  • This final form combines both mechanisms:
    • $\alpha_{t}$ weights positive and negative samples.
    • The focusing parameter $\gamma$ controls how strongly easy samples are down-weighted; the paper finds $\gamma = 2$ with $\alpha = 0.25$ works best in its experiments.
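Putting both pieces together, a vectorized NumPy sketch of the $\alpha$-balanced focal loss (the function name is an illustrative choice; the defaults $\alpha = 0.25$, $\gamma = 2$ follow the paper's best-reported setting):

```python
import numpy as np

def alpha_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Alpha-balanced focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p, y may be arrays of predicted probabilities and 0/1 labels.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
```

In a detector, this would be averaged over anchors (the paper normalizes by the number of anchors assigned to ground-truth boxes); the sketch above just returns per-sample losses.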