<aside> 💡

If you have two probability distributions P and Q, and one of them is the ground truth, use cross-entropy. But if both P and Q are just arbitrary distributions (i.e. neither is the ground truth), use KL divergence.

</aside>
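A minimal sketch of the relationship between the two quantities, assuming PyTorch; the tensors `p` and `q` below are made-up example distributions, not values from these notes:

```python
import torch

p = torch.tensor([0.7, 0.2, 0.1])   # plays the role of the "true" / reference distribution
q = torch.tensor([0.5, 0.3, 0.2])   # plays the role of the model's predicted distribution

cross_entropy = -(p * q.log()).sum()       # H(P, Q) = -sum_x P(x) log Q(x)
kl_divergence = (p * (p / q).log()).sum()  # KL(P || Q) = sum_x P(x) log(P(x) / Q(x))
entropy_p = -(p * p.log()).sum()           # H(P)

# Identity: H(P, Q) = H(P) + KL(P || Q).
# When P is a fixed ground truth, H(P) is constant, so minimising
# cross-entropy and minimising KL divergence are equivalent.
print(cross_entropy, entropy_p + kl_divergence)
```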

Representation / Similarity Learning


1. Supervised

2. Unsupervised Learning (better representations)

Motivation: For any downstream task, we are better off if we can learn better representations of the samples. With better representations, the MLP head can classify samples more easily (as in the Siamese network with BCE above; see the sketch below).
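A minimal sketch of such a Siamese network with a BCE head, assuming PyTorch; the encoder layout, input size (784) and embedding size are illustrative choices, not the exact architecture from these notes:

```python
import torch
import torch.nn as nn

class SiameseBCE(nn.Module):
    def __init__(self, emb_dim=128):
        super().__init__()
        # Shared encoder: maps a raw sample to an embedding (the learned representation).
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )
        # MLP head: classifies whether the two embeddings come from the same class.
        self.head = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)    # same weights for both inputs
        return self.head(torch.cat([z1, z2], dim=-1))  # logit for "same / not same"

model = SiameseBCE()
x1, x2 = torch.randn(8, 784), torch.randn(8, 784)
labels = torch.randint(0, 2, (8, 1)).float()           # 1 = same pair, 0 = different pair
loss = nn.BCEWithLogitsLoss()(model(x1, x2), labels)
```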

[Figure: Siamese network with a shared encoder and an MLP head trained with BCE]

[Figure: contrastive loss vs. pair distance; the blue curve drops to zero beyond the margin]

For the blue curve, the loss is positive until the distance reaches 1; beyond that point it is zero, i.e. there is no reward for pushing samples any further apart. The point 1 (the margin) is controlled by alpha.
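A minimal sketch of this margin-based contrastive loss, assuming PyTorch; `margin` plays the role of alpha, and the function name and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pairwise contrastive loss with a margin.

    same = 1: pull the pair together (loss grows with distance).
    same = 0: push the pair apart, but only while the distance is below the
              margin; beyond the margin the loss is zero, so there is no
              reward for pushing negatives any further (the blue curve).
    """
    d = F.pairwise_distance(z1, z2)
    positive = same * d.pow(2)
    negative = (1 - same) * F.relu(margin - d).pow(2)
    return (positive + negative).mean()

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
same = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(z1, z2, same, margin=1.0))
```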