Class-Balanced Loss Based on Effective Number of Samples

  • Yin Cui
  • Menglin Jia
  • Tsung-Yi Lin
  • Yang Song
  • Serge Belongie

CB Loss (Class-Balanced Loss) is a loss re-weighting method designed to address the challenges posed by long-tailed data distributions:

  • Aim:
    • Address training on long-tailed data distributions.
    • Particularly beneficial in image recognition and other machine learning tasks where some classes have far more samples than others.
  • Characteristics:
    • A new weighting scheme:
      • Uses the effective number of samples of each class instead of its raw sample count.
      • Re-balances the loss across classes.
    • Class-wise loss weighting:
      • Scales each class's loss by the inverse of its effective number, so that frequent classes do not dominate the overall loss.
    • Outcome:
      • Achieves a better balance across classes.
      • Improves generalization on long-tailed data (the paper evaluates on artificially long-tailed CIFAR-10/100, iNaturalist, and ImageNet).

Effective Number of Samples

  • The idea is to capture the diminishing marginal benefit of each additional data point from a class.
  • Because real-world data contain intrinsic similarities, the more samples a class already has, the more likely a newly added sample is to be a near-duplicate of existing ones.
  • In addition, CNNs are trained with heavy data augmentation, and all augmented versions of an example are treated as the same example as the original.


$$ E_{n} = \frac{1-\beta^{n}}{1-\beta} $$ $$ \beta = \frac{N-1}{N} $$ Where, $n$ represents the literal number of occurrences. $N$ is the number of unique prototypes. $\beta \in [0, 1)$ controls the rate at which $E_{n}$ grows as $n$ increases.

  • If $\beta = 0$ (all samples overlap, i.e. $N = 1$), then $E_{n} = 1$.
  • If $\beta \to 1$ (all samples are independent, i.e. $N \to \infty$), then $E_{n} \to n$.
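A minimal sketch of the two formulas above in plain Python (standard library only). The helper names `effective_number` and `beta_from_prototypes` are hypothetical and introduced only for illustration; they are not taken from the paper's code. The calls exercise the two limiting cases of $\beta$.

```python
def effective_number(n: int, beta: float) -> float:
    """Effective number of samples E_n = (1 - beta**n) / (1 - beta), for 0 <= beta < 1."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return (1.0 - beta ** n) / (1.0 - beta)


def beta_from_prototypes(num_prototypes: int) -> float:
    """beta = (N - 1) / N, where N is the number of unique prototypes."""
    return (num_prototypes - 1) / num_prototypes


# N = 1 gives beta = 0: every sample overlaps, so E_n = 1 regardless of n.
print(effective_number(1000, beta_from_prototypes(1)))  # 1.0
# beta close to 1 (very large N): E_n approaches the raw count n = 1000.
print(effective_number(1000, 0.9999))
# Intermediate beta: E_n saturates near 1 / (1 - beta) = 10 for large n.
print(effective_number(1000, 0.9))
```

Note that $E_{n}$ is bounded above by $1/(1-\beta)$, which is why adding more samples to an already large class barely changes its effective number.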

Class-Balanced Loss

By introducing a weighting factor inversely proportional to the effective number of samples of each class, the loss is re-balanced so that the model can learn effectively even from long-tailed, imbalanced data.

$$ \mathrm{CB}(\mathbf{p}, y) = \frac{1}{E_{n_{y}}}\mathcal{L}(\mathbf{p}, y) = \frac{1-\beta}{1-\beta^{n_{y}}}\mathcal{L}(\mathbf{p}, y) $$

Where $n_{y}$ is the number of samples in the ground-truth class $y$. $\beta = 0$ corresponds to no re-weighting, and $\beta \to 1$ corresponds to re-weighting by inverse class frequency.
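The per-class weights $\frac{1-\beta}{1-\beta^{n}}$ can be computed once from the class counts. This is a sketch with a hypothetical function `class_balanced_weights`; it assumes every class has at least one sample. The optional rescaling so that the weights sum to the number of classes $C$ is a common practical convention and an assumption here, not part of the formula above; it changes only the overall scale, not the ratios between classes.

```python
def class_balanced_weights(samples_per_class, beta, normalize=True):
    """Return one weight per class: (1 - beta) / (1 - beta**n_c) = 1 / E_{n_c}.

    Assumes every class count n_c >= 1 and 0 <= beta < 1.
    If normalize is True, the weights are rescaled to sum to the number of
    classes C (an assumed convention that only changes the overall scale).
    """
    weights = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    if normalize:
        scale = len(weights) / sum(weights)
        weights = [w * scale for w in weights]
    return weights


counts = [5000, 500, 50]  # hypothetical long-tailed class counts
print(class_balanced_weights(counts, beta=0.0))       # no re-weighting: [1.0, 1.0, 1.0]
print(class_balanced_weights(counts, beta=0.999))     # tail classes up-weighted
print(class_balanced_weights(counts, beta=0.999999))  # close to inverse-frequency ratios
```

Intermediate values of $\beta$ interpolate between uniform weights and inverse-frequency weights, which is what makes $\beta$ a useful tuning knob.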

The proposed class-balanced term is model-agnostic and loss-agnostic: it does not depend on the choice of the loss function $\mathcal{L}$ or on how the predicted class probabilities $\mathbf{p}$ are obtained.
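Because the class-balanced weight depends only on $n_{y}$ and $\beta$, it can wrap any per-example loss. A minimal sketch, assuming $\mathcal{L}$ is available as a Python callable `loss_fn(p, y)`; the wrapper `class_balanced` and the two example losses are hypothetical illustrations, not functions from the paper.

```python
import math


def class_balanced(loss_fn, samples_per_class, beta):
    """Wrap a per-example loss L(p, y) into CB(p, y) = (1 - beta) / (1 - beta**n_y) * L(p, y)."""
    def wrapped(p, y):
        n_y = samples_per_class[y]
        return (1.0 - beta) / (1.0 - beta ** n_y) * loss_fn(p, y)
    return wrapped


def neg_log_likelihood(p, y):
    """Cross-entropy on already-normalized probabilities p."""
    return -math.log(p[y])


def squared_error(p, y):
    """Squared error against the one-hot target (a non-logarithmic loss)."""
    return sum((p_i - (1.0 if i == y else 0.0)) ** 2 for i, p_i in enumerate(p))


counts = [1000, 10]  # hypothetical class counts
probs = [0.6, 0.4]
for base in (neg_log_likelihood, squared_error):
    cb_loss = class_balanced(base, counts, beta=0.99)
    print(base.__name__, cb_loss(probs, 0), cb_loss(probs, 1))
```

The same wrapper works unchanged for both losses, which is the practical meaning of "loss-agnostic".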

Class-Balanced Softmax Cross-Entropy Loss

$$ {\rm CB}_{\rm softmax}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \log{\left( \frac{\exp{(z_y)}}{\sum_{j=1}^{C} \exp{(z_j)}} \right)} $$

Where ${\bf z} = [z_{1}, \ldots, z_{C}]$ are the model's output logits for the $C$ classes.
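A single-example sketch of the class-balanced softmax cross-entropy in plain Python. The names `log_softmax` and `cb_softmax_ce` are hypothetical; the log-softmax subtracts $\max(\mathbf{z})$ before exponentiating, a standard trick (and a design choice of this sketch) that avoids overflow for large logits without changing the result.

```python
import math


def log_softmax(z):
    """Numerically stable log-softmax: subtract max(z) before exponentiating."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(z_j - m) for z_j in z))
    return [z_j - log_sum_exp for z_j in z]


def cb_softmax_ce(z, y, samples_per_class, beta):
    """Class-balanced softmax cross-entropy for one example with logits z and label y."""
    weight = (1.0 - beta) / (1.0 - beta ** samples_per_class[y])
    return -weight * log_softmax(z)[y]


counts = [5000, 50]  # hypothetical class counts
logits = [1.0, 0.2]
print(cb_softmax_ce(logits, 0, counts, beta=0.999))  # head class: small weight
print(cb_softmax_ce(logits, 1, counts, beta=0.999))  # tail class: larger weight
```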

Class-Balanced Sigmoid Cross-Entropy Loss

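$$ {\rm CB}_{\rm sigmoid}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \sum_{i=1}^{C} \log{\left(\frac{1}{1+\exp{(-z_{i}^{t})}} \right)} $$

Where $z_{i}^{t} = z_{i}$ if $i = y$ and $z_{i}^{t} = -z_{i}$ otherwise, so each class is treated as an independent binary classification.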
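A single-example sketch of the class-balanced sigmoid cross-entropy, using the convention $z_{i}^{t} = z_{i}$ for the ground-truth class and $z_{i}^{t} = -z_{i}$ for all other classes. The names `log_sigmoid` and `cb_sigmoid_ce` are hypothetical; `log_sigmoid` uses `math.log1p` on whichever branch keeps the exponent non-positive, so large logits do not overflow.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cb_sigmoid_ce(z, y, samples_per_class, beta):
    """Class-balanced sigmoid cross-entropy for one example with logits z and label y."""
    weight = (1.0 - beta) / (1.0 - beta ** samples_per_class[y])
    # z_i^t = z_i for the ground-truth class y, and -z_i for every other class.
    total = sum(log_sigmoid(z_i if i == y else -z_i) for i, z_i in enumerate(z))
    return -weight * total


counts = [5000, 500, 50]  # hypothetical class counts
print(cb_sigmoid_ce([2.0, -1.0, 0.5], 2, counts, beta=0.999))
```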

Class-Balanced Focal Loss

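$$ {\rm CB}_{\rm focal}({\bf z}, y) = - \frac{1 - \beta}{1 - \beta^{n_y}} \sum_{i=1}^{C} (1 - p_{i}^{t})^{\gamma} \log{(p_{i}^{t})} $$

Where $p_{i}^{t} = \frac{1}{1+\exp(-z_{i}^{t})}$, with $z_{i}^{t} = z_{i}$ if $i = y$ and $z_{i}^{t} = -z_{i}$ otherwise, and $\gamma \ge 0$ is the focusing parameter of the focal loss.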
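A single-example sketch of the class-balanced focal loss built on per-class sigmoids. The names `log_sigmoid` and `cb_focal`, and the default `gamma=2.0`, are assumptions made for illustration. The sketch returns a non-negative loss, i.e. it negates the weighted sum of $(1-p_{i}^{t})^{\gamma}\log(p_{i}^{t})$ terms, as in the standard focal loss.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cb_focal(z, y, samples_per_class, beta, gamma=2.0):
    """Class-balanced sigmoid focal loss for one example with logits z and label y."""
    weight = (1.0 - beta) / (1.0 - beta ** samples_per_class[y])
    total = 0.0
    for i, z_i in enumerate(z):
        z_t = z_i if i == y else -z_i   # z_i^t
        log_p_t = log_sigmoid(z_t)      # log(p_i^t)
        p_t = math.exp(log_p_t)         # p_i^t = sigmoid(z_i^t)
        # (1 - p_t)^gamma down-weights classes the model already handles well.
        total += (1.0 - p_t) ** gamma * log_p_t
    return -weight * total


counts = [5000, 500, 50]  # hypothetical class counts
print(cb_focal([2.0, -1.0, 0.5], 0, counts, beta=0.999))
```

Setting `gamma=0.0` makes the modulating factor equal to 1, so the function reduces to the class-balanced sigmoid cross-entropy.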