Class-Balanced Loss Based on Effective Number of Samples

Oct 10, 2023

Yin Cui
Menglin Jia
Tsung-Yi Lin
Yang Song
Serge Belongie

CB Loss (Class-Balanced Loss) is a loss re-weighting method designed for training on long-tailed data distributions:

Aim:
- Address the issue of long-tailed data distribution.
- Particularly useful in image recognition and other machine learning tasks where a few classes have far more samples than the rest.

Characteristics:
- Introduction of a new weighting scheme:
  - Utilizes the effective number of samples for each class.
  - Aims to re-balance the loss across classes.
- Calculation of class-wise loss:
  - Ensures a balanced contribution from each class to the overall loss computation.
- Outcome:
  - Balances the contributions of frequent and rare classes to the loss.
  - Is intended to improve generalization, particularly on under-represented classes.
  - Remains applicable when the class distribution is strongly long-tailed.

Effective Number of Samples

The idea is to capture the diminishing marginal benefit of adding more data points to a class.
Because real-world data contains inherent similarities, the probability that a newly added sample is a near-duplicate of existing ones grows as the number of samples increases.
In addition, CNNs are trained with heavy data augmentation, and all augmented versions of an example are treated as equivalent to the original.

Definition

$$ E_{n} = \frac{1-\beta^{n}}{1-\beta} $$

$$ \beta = \frac{N-1}{N} $$

where $n$ is the actual number of samples in the class, $N$ is the number of unique prototypes, and $\beta \in [0, 1)$ controls how fast $E_{n}$ grows as $n$ increases.
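
One way to see why $E_{n}$ captures diminishing returns is a short derivation that follows directly from the formula. The closed form is a finite geometric series:

$$ E_{n} = \frac{1-\beta^{n}}{1-\beta} = \sum_{j=1}^{n} \beta^{j-1} $$

so the $n$-th sample contributes only

$$ E_{n} - E_{n-1} = \beta^{n-1} $$

which shrinks geometrically with $n$. The total therefore saturates at

$$ \lim_{n \to \infty} E_{n} = \frac{1}{1-\beta} = N $$

meaning that the effective number can never exceed the number of unique prototypes.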

- If $\beta = 0$ (all samples overlap, i.e. $N = 1$), then $E_{n} = 1$.
- If $\beta \to 1$ (all samples are independent, i.e. $N \to \infty$), then $E_{n} \to n$.
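
The formula and its two limiting cases can be checked numerically. The following is a minimal sketch in plain Python; the function names `effective_number` and `beta_from_prototypes` are illustrative and do not come from any reference implementation.

```python
def effective_number(n: int, beta: float) -> float:
    """Effective number of samples E_n = (1 - beta**n) / (1 - beta)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return (1.0 - beta ** n) / (1.0 - beta)


def beta_from_prototypes(num_prototypes: int) -> float:
    """beta = (N - 1) / N for N unique prototypes."""
    if num_prototypes < 1:
        raise ValueError("N must be a positive integer")
    return (num_prototypes - 1) / num_prototypes


if __name__ == "__main__":
    beta = beta_from_prototypes(1000)  # N = 1000, so beta = 0.999
    # E_n grows almost linearly for small n, then flattens out near N.
    for n in (1, 10, 100, 1000, 10000):
        print(n, round(effective_number(n, beta), 2))
```

With $\beta = 0$ the function returns 1 for every $n$, and for $\beta$ very close to 1 it returns a value close to $n$, matching the two cases above.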

Class-Balanced Loss

To train effectively on long-tailed, imbalanced data, the loss is re-weighted by a factor inversely proportional to the effective number of samples of the ground-truth class:

$$ \mathrm{CB}(\mathbf{p}, y) = \frac{1}{E_{n_{y}}}\mathcal{L}(\mathbf{p}, y) = \frac{1-\beta}{1-\beta^{n_{y}}}\mathcal{L}(\mathbf{p}, y) $$

where $n_{y}$ is the number of samples in the ground-truth class $y$. Setting $\beta = 0$ corresponds to no re-weighting, and $\beta \to 1$ corresponds to re-weighting by inverse class frequency.

The proposed class-balanced term is model-agnostic and loss-agnostic: it does not depend on the choice of loss function $\mathcal{L}$ or on the predicted class probabilities $\mathbf{p}$.
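
Because the weighting term depends only on the class counts and $\beta$, it can be precomputed once per dataset. The sketch below assumes a hypothetical helper `class_balanced_weights`; the optional rescaling so that the weights sum to the number of classes is a convention assumed here to keep the overall loss scale comparable to the unweighted case, and is not stated in the text above.

```python
def class_balanced_weights(samples_per_class, beta, normalize=True):
    """Per-class weights (1 - beta) / (1 - beta**n_y).

    samples_per_class: list of positive integer class counts n_y.
    normalize: if True, rescale so the weights sum to the number of
               classes (an assumed convention, see the lead-in).
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if any(n < 1 for n in samples_per_class):
        raise ValueError("every class needs at least one sample")

    # Inverse of the effective number of samples for each class.
    weights = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]

    if normalize:
        scale = len(weights) / sum(weights)
        weights = [w * scale for w in weights]
    return weights


if __name__ == "__main__":
    counts = [5000, 500, 50]  # hypothetical long-tailed class counts
    for beta in (0.0, 0.9, 0.999, 0.9999):
        print(beta, [round(w, 4) for w in class_balanced_weights(counts, beta)])
```

As $\beta$ increases, the weight assigned to the rarest class grows relative to the most frequent one, which interpolates between no re-weighting and inverse-frequency re-weighting.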

Class-Balanced Softmax Cross-Entropy Loss

$$ {\rm CB}_{\rm softmax}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \log{\left( \frac{\exp{(z_y)}}{\sum_{j=1}^{C} \exp{(z_j)}} \right)} $$
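
A minimal single-example sketch of this loss in plain Python follows. The function name `cb_softmax_loss` and its argument layout are assumptions made for illustration; `logits` plays the role of $\mathbf{z}$ and `samples_per_class[y]` the role of $n_y$. The log-sum-exp shift is only there for numerical stability and does not change the value.

```python
import math


def cb_softmax_loss(logits, y, samples_per_class, beta):
    """Class-balanced softmax cross-entropy for one example.

    logits: list of C raw scores z_1..z_C.
    y: index of the ground-truth class.
    samples_per_class: list of class counts (each assumed >= 1).
    """
    # Class-balanced weight (1 - beta) / (1 - beta**n_y).
    weight = (1.0 - beta) / (1.0 - beta ** samples_per_class[y])

    # log softmax(z)_y computed with the log-sum-exp trick.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob_y = logits[y] - log_sum_exp

    return -weight * log_prob_y


if __name__ == "__main__":
    counts = [1000, 10]  # hypothetical class counts
    z = [0.0, 0.0]       # an uninformative prediction
    print(cb_softmax_loss(z, 0, counts, beta=0.99))  # frequent class
    print(cb_softmax_loss(z, 1, counts, beta=0.99))  # rare class
```

For the same prediction, a mistake on the rare class is penalized more heavily than one on the frequent class.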

Class-Balanced Sigmoid Cross-Entropy Loss

$$ {\rm CB}_{\rm sigmoid}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \sum_{i=1}^{C} \log{\left(\frac{1}{1+\exp{(-z_{i}^{t})}} \right)} $$

where $z_{i}^{t} = z_{i}$ if $i = y$ and $z_{i}^{t} = -z_{i}$ otherwise.
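
The sigmoid variant treats each class as an independent binary problem. The sketch below assumes the usual convention that $z_{i}^{t} = z_{i}$ for the ground-truth class and $z_{i}^{t} = -z_{i}$ for every other class; the names `log_sigmoid` and `cb_sigmoid_loss` are illustrative.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def cb_sigmoid_loss(logits, y, samples_per_class, beta):
    """Class-balanced sigmoid cross-entropy for one example.

    logits: list of C raw scores z_1..z_C.
    y: index of the ground-truth class.
    samples_per_class: list of class counts (each assumed >= 1).
    """
    # Class-balanced weight (1 - beta) / (1 - beta**n_y).
    weight = (1.0 - beta) / (1.0 - beta ** samples_per_class[y])

    total = 0.0
    for i, z in enumerate(logits):
        z_t = z if i == y else -z  # flip the sign for non-target classes
        total += log_sigmoid(z_t)

    return -weight * total


if __name__ == "__main__":
    counts = [1000, 100, 10]  # hypothetical class counts
    print(cb_sigmoid_loss([2.0, -1.0, -3.0], 0, counts, beta=0.999))
```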

Class-Balanced Focal Loss

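$$ {\rm CB}_{\rm focal}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \sum_{i=1}^{C} (1 - p_{i}^{t})^{\gamma} \log{(p_{i}^{t})} $$

where $p_{i}^{t} = 1 / (1 + \exp{(-z_{i}^{t})})$, with $z_{i}^{t}$ equal to $z_{i}$ for $i = y$ and $-z_{i}$ otherwise, and $\gamma \geq 0$ is the focusing parameter. With $\gamma = 0$ this reduces to the class-balanced sigmoid cross-entropy loss.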
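
A single-example sketch of the focal variant is given below. It assumes $p_{i}^{t}$ is the sigmoid of $z_{i}^{t}$, with $z_{i}^{t} = z_{i}$ for the ground-truth class and $-z_{i}$ otherwise, and it includes a leading minus sign so that the loss is non-negative. The function names and the default `gamma=2.0` are illustrative choices, not values taken from the text.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def cb_focal_loss(logits, y, samples_per_class, beta, gamma=2.0):
    """Class-balanced focal loss for one example.

    logits: list of C raw scores z_1..z_C.
    y: index of the ground-truth class.
    samples_per_class: list of class counts (each assumed >= 1).
    gamma: focusing parameter (gamma = 0 gives the sigmoid loss).
    """
    # Class-balanced weight (1 - beta) / (1 - beta**n_y).
    weight = (1.0 - beta) / (1.0 - beta ** samples_per_class[y])

    total = 0.0
    for i, z in enumerate(logits):
        z_t = z if i == y else -z  # flip the sign for non-target classes
        log_p = log_sigmoid(z_t)   # log p_i^t
        p = math.exp(log_p)        # p_i^t
        # Well-classified terms (p close to 1) are down-weighted.
        total += (1.0 - p) ** gamma * log_p

    return -weight * total


if __name__ == "__main__":
    counts = [1000, 100, 10]  # hypothetical class counts
    z = [2.0, -1.0, -3.0]
    for gamma in (0.0, 0.5, 2.0):
        print(gamma, cb_focal_loss(z, 0, counts, beta=0.999, gamma=gamma))
```

Increasing `gamma` shrinks the contribution of terms the model already classifies confidently, while the class-balanced weight is applied on top, unchanged.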