transformers_domain_adaptation.data_selection.metrics.similarity

Similarity metrics for data selection introduced by Ruder and Plank.

The functions here were adapted and vectorized from those in the authors’ repo.

transformers_domain_adaptation.data_selection.metrics.similarity.jensen_shannon_similarity(repr1, repr2)

Calculate similarity based on Jensen-Shannon divergence.

https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence

Parameters:
  • repr1 (numpy.ndarray) –

  • repr2 (numpy.ndarray) –

Return type:

numpy.ndarray
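
As an illustration of the underlying quantity, the following is a minimal NumPy sketch of a row-wise Jensen-Shannon divergence. It is not the library's implementation: the function name `js_divergence`, the `eps` guard, the base-2 logarithm, and the assumption that each row is a probability distribution over the last axis are all choices made for this example.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Row-wise Jensen-Shannon divergence in bits (hypothetical helper).

    Assumes the last axis of `p` and `q` holds probability distributions
    and that the two arrays broadcast against each other.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # KL(a || b) in bits; entries with a == 0 contribute 0 by convention.
        ratio = np.where(a > 0, a / np.maximum(b, eps), 1.0)
        return np.sum(a * np.log2(ratio), axis=-1)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


candidates = np.array([[0.5, 0.5], [1.0, 0.0]])
target = np.array([0.5, 0.5])
divergence = js_divergence(candidates, target)
# Assumption: one simple way to turn the divergence into a similarity.
similarity = 1.0 - divergence
```

With base-2 logarithms the divergence lies in [0, 1], so `1 - divergence` also stays in [0, 1]. Whether the library uses this exact conversion is not stated on this page.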

transformers_domain_adaptation.data_selection.metrics.similarity.renyi_similarity(repr1, repr2, alpha=0.99)

Calculate similarity based on Rényi divergence.

https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy#R.C3.A9nyi_divergence

Parameters:
  • repr1 (numpy.ndarray) –

  • repr2 (numpy.ndarray) –

  • alpha (float) –

Return type:

numpy.ndarray
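
For reference, the Rényi divergence of order alpha (alpha ≠ 1) is D_alpha(P || Q) = log(sum_i p_i^alpha * q_i^(1 - alpha)) / (alpha - 1). The sketch below computes that quantity row-wise with NumPy. The name `renyi_divergence`, the natural logarithm, and the input layout (distributions on the last axis) are assumptions for illustration. How the library maps the divergence to a similarity is not documented here.

```python
import numpy as np


def renyi_divergence(p, q, alpha=0.99):
    """Row-wise Renyi divergence D_alpha(p || q) in nats (hypothetical helper)."""
    if alpha == 1:
        raise ValueError("alpha must differ from 1 (the limit is the KL divergence)")
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # sum_i p_i^alpha * q_i^(1 - alpha), reduced over the last axis
    s = np.sum(np.power(p, alpha) * np.power(q, 1.0 - alpha), axis=-1)
    return np.log(s) / (alpha - 1.0)


p = np.array([[0.5, 0.5], [0.9, 0.1]])
q = np.array([0.5, 0.5])
d = renyi_divergence(p, q)  # first entry is 0: the distributions are identical
```

With the default `alpha=0.99`, the result is close to the Kullback-Leibler divergence, which is the limit as alpha approaches 1.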

transformers_domain_adaptation.data_selection.metrics.similarity.cosine_similarity(repr1, repr2)

Calculate cosine similarity (https://en.wikipedia.org/wiki/Cosine_similarity).

Parameters:
  • repr1 (numpy.ndarray) –

  • repr2 (numpy.ndarray) –

Return type:

numpy.ndarray
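
A vectorized cosine similarity can be sketched in a few lines of NumPy. The helper name `cosine_rows`, the `eps` guard against zero-norm vectors, and the layout (one vector per row, compared against a single vector or a matching stack) are assumptions for this example, not the library's code.

```python
import numpy as np


def cosine_rows(a, b, eps=1e-12):
    """Cosine similarity over the last axis (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    # eps avoids division by zero when either vector is all zeros
    return dot / np.maximum(norms, eps)


docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
sims = cosine_rows(docs, query)
```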

transformers_domain_adaptation.data_selection.metrics.similarity.euclidean_similarity(repr1, repr2)

Calculate similarity based on Euclidean distance.

https://en.wikipedia.org/wiki/Euclidean_distance

Parameters:
  • repr1 (numpy.ndarray) –

  • repr2 (numpy.ndarray) –

Return type:

numpy.ndarray
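
The Euclidean distance itself is straightforward to vectorize. The sketch below computes only the distance: this page does not say how the library converts it into a similarity score, so no conversion is shown. The helper name `euclidean_rows` and the row-per-vector layout are assumptions.

```python
import numpy as np


def euclidean_rows(a, b):
    """Euclidean (L2) distance over the last axis (hypothetical helper)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.sqrt(np.sum(diff ** 2, axis=-1))


points = np.array([[0.0, 0.0], [3.0, 4.0]])
origin = np.array([0.0, 0.0])
dists = euclidean_rows(points, origin)
```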

transformers_domain_adaptation.data_selection.metrics.similarity.variational_similarity(repr1, repr2)

Calculate similarity based on L1 / Manhattan distance.

https://en.wikipedia.org/wiki/Taxicab_geometry

Parameters:
  • repr1 (numpy.ndarray) –

  • repr2 (numpy.ndarray) –

Return type:

numpy.ndarray
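
The L1 distance is the sum of absolute coordinate differences. A minimal row-wise sketch follows, with the helper name `l1_rows` and the input layout assumed for illustration. As with the other distance-based metrics, the mapping from distance to similarity used by the library is not specified here.

```python
import numpy as np


def l1_rows(a, b):
    """L1 / Manhattan distance over the last axis (hypothetical helper)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.sum(np.abs(diff), axis=-1)


p = np.array([[0.5, 0.5], [1.0, 0.0]])
q = np.array([0.5, 0.5])
dists = l1_rows(p, q)
```

For probability distributions, this distance ranges from 0 (identical) to 2 (disjoint support).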

transformers_domain_adaptation.data_selection.metrics.similarity.bhattacharyya_similarity(repr1, repr2)

Calculate similarity based on Bhattacharyya distance.

https://en.wikipedia.org/wiki/Bhattacharyya_distance

Parameters:
  • repr1 (numpy.ndarray) –

  • repr2 (numpy.ndarray) –

Return type:

numpy.ndarray
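
For discrete distributions, the Bhattacharyya distance is -ln(BC), where BC = sum_i sqrt(p_i * q_i) is the Bhattacharyya coefficient. The sketch below computes it row-wise. The helper name `bhattacharyya_rows` and the input layout are assumptions, and the step from distance to similarity is left out because it is not documented on this page.

```python
import numpy as np


def bhattacharyya_rows(p, q):
    """Bhattacharyya distance over the last axis (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q), axis=-1)  # Bhattacharyya coefficient
    # Disjoint distributions give bc == 0 and therefore an infinite distance.
    with np.errstate(divide="ignore"):
        return -np.log(bc)


p = np.array([[0.5, 0.5], [1.0, 0.0]])
q = np.array([0.5, 0.5])
dists = bhattacharyya_rows(p, q)
```

Callers may want to handle the infinite case explicitly, for example by clipping before ranking.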

transformers_domain_adaptation.data_selection.metrics.similarity.similarity_func_factory(metric)

Return the corresponding similarity function based on the provided metric.

Parameters:

metric (str) – Similarity metric

Raises:

ValueError – If metric does not exist in SIMILARITY_FEATURES

Return type:

Callable[[numpy.ndarray, numpy.ndarray], numpy.ndarray]
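
A factory of this kind is commonly a dictionary lookup that raises ValueError on an unknown key. The sketch below illustrates that pattern only. The registry `_REGISTRY`, its keys, the helper functions, and the name `make_similarity_func` are hypothetical and do not reflect the contents of SIMILARITY_FEATURES.

```python
from typing import Callable, Dict

import numpy as np

SimilarityFunc = Callable[[np.ndarray, np.ndarray], np.ndarray]


def _cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return dot / np.maximum(norms, 1e-12)


def _neg_euclidean(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Assumption: negate the distance so that larger means more similar.
    return -np.sqrt(np.sum((a - b) ** 2, axis=-1))


# Hypothetical registry; the library's real metric names may differ.
_REGISTRY: Dict[str, SimilarityFunc] = {
    "cosine": _cosine,
    "euclidean": _neg_euclidean,
}


def make_similarity_func(metric: str) -> SimilarityFunc:
    """Look up a similarity function by name, failing loudly on typos."""
    try:
        return _REGISTRY[metric]
    except KeyError:
        valid = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown metric {metric!r}; expected one of: {valid}") from None


sim_fn = make_similarity_func("cosine")
scores = sim_fn(np.array([[1.0, 0.0], [0.0, 2.0]]), np.array([1.0, 0.0]))
```

Returning the function rather than calling it lets the caller resolve the metric name once and then reuse the callable across many batches.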