# Focal Loss for Dense Object Detection

- Tsung-Yi Lin
- Priya Goyal
- Ross Girshick
- Kaiming He
- Piotr Dollár

Focal Loss is a loss function designed for dense object detection. Its key properties are:

- It controls the relative weight of positive and negative samples.
- It controls the relative weight of easy-to-classify and hard-to-classify samples.

# Cross entropy

$$ CE(p, y) = \left\{\begin{matrix} -\log(p) & \text{if} \ y = 1 \\ -\log(1-p) & \text{otherwise} \end{matrix}\right. $$

where $p$ is the predicted probability and $y$ is the ground-truth label. For notational convenience, we define $p_{t}$:

$$ p_{t} = \left\{\begin{matrix} p & \text{if} \ y = 1 \\ 1-p & \text{otherwise} \end{matrix}\right. $$

and rewrite $CE(p, y) = CE(p_{t}) = -\log(p_{t})$.
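The $p_{t}$ notation can be sketched directly in plain Python (the function name is illustrative, not from the paper):

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy; p is the predicted probability, y the label (0 or 1)."""
    p_t = p if y == 1 else 1 - p  # p_t as defined above
    return -math.log(p_t)
```

A confident correct prediction (e.g. `cross_entropy(0.9, 1)`) yields a small loss, while a confident wrong one (`cross_entropy(0.1, 1)`) yields a large loss.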

## Balanced Cross entropy

To reduce the impact of negative samples, we can add a coefficient $\alpha_{t}$ to the $CE(p_{t})$ function. Analogously to $p_{t}$, we define $\alpha_{t}$:

$$ \alpha_{t} = \left\{\begin{matrix} \alpha & \text{if} \ y = 1 \\ 1-\alpha & \text{otherwise} \end{matrix}\right. $$

where $\alpha \in [0, 1]$. In practice $\alpha$ may be set by inverse class frequency or treated as a hyperparameter tuned by cross-validation. By setting $\alpha_{t}$, we can control the contribution of positive and negative samples to the loss. So we rewrite:

$$ CE(p_{t}) = -\alpha_{t}\log(p_{t}) $$

Written out in full:

$$ CE(p, y, \alpha) = \left\{\begin{matrix} -\alpha\log(p) & \text{if} \ y = 1 \\ -(1-\alpha)\log(1-p) & \text{otherwise} \end{matrix}\right. $$
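A minimal sketch of the $\alpha$-balanced variant, mirroring the piecewise definition above (function and argument names are illustrative):

```python
import math

def balanced_cross_entropy(p, y, alpha=0.25):
    """Alpha-balanced binary cross-entropy: alpha weights the positive class,
    (1 - alpha) the negative class."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * math.log(p_t)
```

With $\alpha < 0.5$, the many easy negatives contribute less to the total loss than positives with the same $p_{t}$.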

# Focal loss

Focal loss addresses the easy/hard sample problem. Hard samples are those with low $p_{t}$:

- Positive samples ($y = 1$): the higher $(1-p)$ is, the harder the sample is to classify.
- Negative samples ($y = 0$): the higher $p$ is, the harder the sample is to classify.

$$ FL(p_{t}) = -(1-p_{t})^{\gamma} \log{(p_{t})} $$

where the modulating factor $(1-p_{t})^{\gamma}$ differentiates easy from hard samples, and $\gamma \geq 0$ is called the focusing parameter.

- When $p_{t} \to 0$ (hard example), $(1-p_{t})^{\gamma} \to 1$ and the loss is nearly unaffected
- When $p_{t} \to 1$ (easy example), $(1-p_{t})^{\gamma} \to 0$ and the loss is down-weighted
- When $\gamma = 0$, FL is equivalent to CE
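The behavior in these three bullets can be checked with a small sketch (names are illustrative):

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Focal loss: the modulating factor (1 - p_t)^gamma down-weights
    easy examples (those with p_t close to 1)."""
    p_t = p if y == 1 else 1 - p
    return -((1 - p_t) ** gamma) * math.log(p_t)
```

With `gamma=0` the factor is 1 and the function reduces to plain cross-entropy; with `gamma=2` an easy example like `p=0.9, y=1` is penalized far less than under CE, while a hard example like `p=0.1, y=1` keeps nearly its full loss.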

## With $\alpha\text{-balanced}$ variant

$$ FL(p_{t}) = -\alpha_{t}(1-p_{t})^{\gamma} \log{(p_{t})} $$

This formula combines both mechanisms:

- $\alpha_{t}$ assigns class weights to positive and negative samples.
- The tunable focusing parameter $\gamma$ controls the impact of easy versus hard examples.
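Putting both mechanisms together, a sketch of the $\alpha$-balanced focal loss (defaults follow the commonly reported $\alpha = 0.25$, $\gamma = 2$; the function name is illustrative):

```python
import math

def balanced_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Alpha-balanced focal loss: alpha_t weights the classes,
    (1 - p_t)^gamma down-weights easy examples."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * ((1 - p_t) ** gamma) * math.log(p_t)
```

In a detector, this per-example loss would be summed over all anchors and normalized; the sketch above only shows the per-example term.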