Class-Balanced Loss Based on Effective Number of Samples
- Yin Cui
- Menglin Jia
- Tsung-Yi Lin
- Yang Song
- Serge Belongie
CB Loss (Class-Balanced Loss) is a re-weighted loss designed to address the challenges posed by long-tailed data distributions:
- Aim:
- Address the issue of long-tailed data distribution.
  - Particularly beneficial in image recognition and other machine learning tasks where a few classes have far more samples than the rest.
- Characteristics:
- Introduction of a new weighting scheme:
- Utilizes the effective number of samples for each class.
- Aims to re-balance the loss across classes.
- Calculation of class-wise loss:
- Ensures a balanced contribution from each class to the overall loss computation.
- Outcome:
- Facilitates better balance across classes.
- Enhances the model’s generalization ability and performance.
- Provides a robust solution even in scenarios with long-tailed data distributions.
Effective Number of Samples
- The idea is to capture the diminishing marginal benefit of adding more samples of a class.
- Because real-world data contain intrinsic similarities, the probability that a newly added sample closely resembles existing ones increases as the sample count grows.
- In addition, CNNs are trained with heavy data augmentation, where every augmented instance is treated as equivalent to the original example.
Definition
$$ E_{n} = \frac{1-\beta^{n}}{1-\beta}, \qquad \beta = \frac{N-1}{N} $$ where $n$ is the number of samples of the class, $N$ is the total number of unique prototypes, and $\beta \in [0, 1)$ controls how quickly $E_{n}$ grows with $n$. The two limiting cases are illustrated in the short sketch after the list below.
- If $\beta = 0$ (implying all samples are overlapped or $N = 1$), $E_{n} = 1$
- If $\beta \to 1$ (implying all samples are independent or $N \to \infty$), $E_{n} \to n$
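A minimal numeric sketch of this definition (plain Python, not from the authors' code) illustrating the two limiting cases:

```python
import numpy as np

def effective_number(n, beta):
    """Effective number of samples: E_n = (1 - beta^n) / (1 - beta)."""
    return (1.0 - beta ** n) / (1.0 - beta)

print(effective_number(1000, beta=0.0))     # 1.0   -> all samples overlap (N = 1)
print(effective_number(1000, beta=0.99))    # ~100  -> bounded by N = 1 / (1 - beta)
print(effective_number(1000, beta=0.9999))  # ~952  -> approaches n as beta -> 1
```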
Class-Balanced Loss
By introducing weighting coefficients inversely proportional to the effective number of samples of each class, the loss is re-balanced so that the model learns effectively even from long-tailed, imbalanced data. $$ \mathrm{CB}(\mathbf{p}, y) = \frac{1}{E_{n_{y}}}\mathcal{L}(\mathbf{p}, y) = \frac{1-\beta}{1-\beta^{n_{y}}}\mathcal{L}(\mathbf{p}, y) $$ Where $n_{y}$ is the number of samples in the ground-truth class $y$. $\beta = 0$ corresponds to no re-weighting and $\beta \to 1$ corresponds to re-weighting by inverse class frequency.
The proposed class-balanced term is model-agnostic and loss-agnostic, i.e., it is independent of the choice of loss function $\mathcal{L}$ and of the predicted class probabilities $\mathbf{p}$.
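As a concrete illustration, the re-balancing term can be computed from the per-class sample counts alone. A minimal NumPy sketch (the function name and the sum-to-$C$ normalization are implementation choices, not prescribed by the paper):

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Per-class weights (1 - beta) / (1 - beta^n_y), rescaled to sum to the
    number of classes so the overall loss magnitude stays comparable."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - beta ** n) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(n) / weights.sum()

# Long-tailed example: the rare classes receive much larger weights.
print(class_balanced_weights([10000, 1000, 100, 10]))
```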
Class-Balanced Softmax Cross-Entropy Loss
$$ {\rm CB}_{\rm softmax}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \log{\left( \frac{\exp{(z_y)}}{\sum_{j=1}^{C} \exp{(z_j)}} \right)} $$
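A minimal single-example sketch of this loss in NumPy (argument names are illustrative; `n_y` is the sample count of the ground-truth class):

```python
import numpy as np

def cb_softmax_ce(logits, y, n_y, beta=0.9999):
    """Class-balanced softmax cross-entropy for one example with logits z."""
    weight = (1.0 - beta) / (1.0 - beta ** n_y)
    z = logits - logits.max()                    # stabilize the softmax
    log_prob_y = z[y] - np.log(np.exp(z).sum())  # log-softmax at the true class
    return -weight * log_prob_y

print(cb_softmax_ce(np.array([2.0, 0.5, -1.0]), y=2, n_y=25))
```

In practice the same effect can be obtained by passing the class-balanced weights as per-class weights to a framework's standard weighted cross-entropy.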
Class-Balanced Sigmoid Cross-Entropy Loss
$$ {\rm CB}_{\rm sigmoid}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \sum_{i=1}^{C} \log{\left(\frac{1}{1+\exp{(-z_{i}^{t})}} \right)} $$ where $z_{i}^{t} = z_{i}$ if $i = y$ and $z_{i}^{t} = -z_{i}$ otherwise.
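A corresponding single-example NumPy sketch for the sigmoid (one-vs-all) variant, using $z_{i}^{t}$ as defined above:

```python
import numpy as np

def cb_sigmoid_ce(logits, y, n_y, beta=0.9999):
    """Class-balanced sigmoid (one-vs-all) cross-entropy for one example."""
    weight = (1.0 - beta) / (1.0 - beta ** n_y)
    zt = -logits
    zt[y] = logits[y]                       # z_i^t: z_y for the true class, -z_i otherwise
    log_sigmoid = -np.logaddexp(0.0, -zt)   # log(1 / (1 + exp(-z^t))), numerically stable
    return -weight * log_sigmoid.sum()

print(cb_sigmoid_ce(np.array([2.0, 0.5, -1.0]), y=0, n_y=25))
```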
Class-Balanced Focal Loss
$$ {\rm CB}_{\rm focal}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \sum_{i=1}^{C} (1 - p_{i}^{t})^{\gamma} \log{(p_{i}^{t})} $$ where $p_{i}^{t} = \mathrm{sigmoid}(z_{i}^{t}) = 1/(1+\exp{(-z_{i}^{t})})$.
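A matching sketch of the focal variant; setting $\gamma = 0$ recovers the class-balanced sigmoid cross-entropy above (illustrative NumPy code, not the authors' implementation):

```python
import numpy as np

def cb_focal(logits, y, n_y, gamma=2.0, beta=0.9999):
    """Class-balanced focal loss for one example, with p_i^t = sigmoid(z_i^t)."""
    weight = (1.0 - beta) / (1.0 - beta ** n_y)
    zt = -logits
    zt[y] = logits[y]                       # z_i^t as in the sigmoid loss above
    log_pt = -np.logaddexp(0.0, -zt)        # log p^t, numerically stable
    pt = np.exp(log_pt)
    return -weight * ((1.0 - pt) ** gamma * log_pt).sum()

print(cb_focal(np.array([2.0, 0.5, -1.0]), y=0, n_y=25))
```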