information_theory module

information_theory.entropy module

pyrocs.information_theory.entropy.discrete_entropy(values: ndarray, counts: ndarray = None, base: int = 2) → float

Entropy is often used to measure the state of disorder/randomness in a system. The general equation follows the form:

\[H = - \sum_{i=1}^N [p_i * \log p_i]\]

where \(H\) is the entropy, \(p_i\) is the discrete probability of the occurrence of an event from the \(i^{\mathrm{th}}\) category, and \(N\) is the total number of categories. Low entropy values indicate a well-ordered system, while higher entropy values indicate a greater state of disorder. The maximum possible value of the entropy for a given system is \(\log(N)\), and thus varies with the number of categories. Please see [Sha48] for more information.

The function assumes users will input either an array of values or counts of values. These are then normalized prior to calculating the entropy value. This metric builds on the entropy function within the scipy package (including exposure of the specific base). Various bases can be selected based on user interests, including 2, 10, and e.

For more details about entropy, please consult the scipy documentation as well as the references noted above.

Parameters:
  • values (array) – Sequence of observed values from a random process

  • counts (array[int]) – Number of times each value was observed

  • base (int) – Base of returned entropy (default returns number of bits)

Returns:

float
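
As an illustration of the calculation described above, the following is a minimal sketch in plain Python. It is not the pyrocs implementation: the name discrete_entropy_sketch is hypothetical, and making values optional when counts are given is an assumption made only for this example.

```python
import math
from collections import Counter


def discrete_entropy_sketch(values=None, counts=None, base=2):
    """Shannon entropy from raw observations or from pre-tallied counts.

    Hypothetical helper for illustration only; not the pyrocs function.
    """
    if counts is None:
        # Tally how often each distinct value was observed.
        counts = list(Counter(values).values())
    total = sum(counts)
    # Normalize counts to probabilities; skip empty categories (0 * log 0 = 0).
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in probs)


# Four equally likely categories reach the maximum, log2(4) = 2 bits.
uniform = discrete_entropy_sketch(values=["a", "b", "c", "d"])

# A heavily skewed system is far more ordered, so its entropy is much lower.
skewed = discrete_entropy_sketch(counts=[97, 1, 1, 1])

# A system with a single category has no disorder at all.
constant = discrete_entropy_sketch(values=["a", "a", "a"])
```

Here uniform equals the maximum \(\log_2(4) = 2\), while skewed and constant fall below it, matching the interpretation that lower entropy corresponds to a more ordered system.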

information_theory.kl_divergence module

pyrocs.information_theory.kl_divergence.kl_divergence(p: ndarray, q: ndarray, base: int = 2) → float

Sometimes called relative entropy, the Kullback-Leibler Divergence (KLD) measures how much one distribution (a sample) differs from another (a reference). It is zero when the two distributions are identical and is not symmetric in its arguments. In contrast to the continuous version available in scipy, the formulation in this package uses a discrete form of the equation following [Jos21]:

\[D(p||q) = \sum_{i=1}^N [p_i * \log (p_i/q_i)]\]

where \(D\) is the KLD value, \(N\) is the total number of categories, and \(p_i\) and \(q_i\) reflect the discrete probability of the occurrence of an event from the \(i^{\mathrm{th}}\) category of the sample distribution and reference distribution respectively.

The function is able to calculate KLD for cases where not all categories from the reference distribution are present within the sample distribution.

Parameters:
  • p (array) – discrete probability distribution

  • q (array) – discrete probability distribution

  • base (int) – log base to compute from; base 2 (bits), base 10 (decimal/whole numbers), or base e (ecology, earth systems)

Returns:

float
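
The discrete form can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the pyrocs implementation: kl_divergence_sketch is a hypothetical name, terms with \(p_i = 0\) are taken to contribute zero, and returning infinity when \(q_i = 0\) but \(p_i > 0\) is a common convention that the package may or may not follow.

```python
import math


def kl_divergence_sketch(p, q, base=2):
    """Discrete D(p || q) for two aligned lists of probabilities.

    Hypothetical helper for illustration only; not the pyrocs function.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            # Category absent from the sample: 0 * log(0 / q_i) is taken as 0.
            continue
        if q_i == 0:
            # Assumed convention: the sample has mass where the reference has none.
            return math.inf
        total += p_i * math.log(p_i / q_i, base)
    return total


# The sample never shows the third category that the reference contains.
sample = [0.5, 0.5, 0.0]
reference = [0.25, 0.25, 0.5]

d_pq = kl_divergence_sketch(sample, reference)   # finite
d_qp = kl_divergence_sketch(reference, sample)   # infinite under the assumed convention
d_pp = kl_divergence_sketch(sample, sample)      # identical distributions
```

The first call shows the case where a reference category is missing from the sample, and comparing d_pq with d_qp shows that the divergence depends on the order of its arguments.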

pyrocs.information_theory.kl_divergence.novelty_transience_resonance(thetas_arr: ndarray, window: int) → tuple[ndarray]

These three related metrics extend the Kullback-Leibler Divergence formulation to consider how a distribution differs from past and future distributions within a sequence. Specifically, novelty measures how “new” the information within a distribution is relative to the past sequence, and transience measures how “new” the current information is relative to what occurs in the future sequence. In contrast, resonance reflects the “stickiness” of “new” topics between the past and the future; it is calculated by taking the difference between novelty and transience.

The equations for these calculations are sourced from Barron et al. [BHSD18].

\[\begin{split}N_w(p_i) &= \frac{1}{w}\sum_{k=1}^{w} D(p_i || p_{i-k})\\ T_w(p_i) &= \frac{1}{w}\sum_{k=1}^{w} D(p_i || p_{i+k})\\ R_w(p_i) &= N_w(p_i) - T_w(p_i)\end{split}\]

where \(N\) is novelty, \(T\) is transience, \(R\) is resonance, \(w\) is the window, i.e., the number of distributions to use either in the past or the future, \(p_i\) is the distribution (topic mixture) at the \(i^{\mathrm{th}}\) position in the sequence, \(k\) indexes the offset from position \(i\) within the window, and \(D\) is the KLD.

Parameters:
  • thetas_arr (array) – rows are topic mixtures

  • window (int) – positive integer defining the scale, i.e., the window size \(w\)

Returns:

tuple(array) [novelties, transiences, resonances]
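
The three quantities can be sketched in plain Python as below. This is a hedged illustration rather than the pyrocs implementation: ntr_sketch and kl_sketch are hypothetical names, the distributions are assumed to be strictly positive wherever they are used as a reference, and only positions with a full window on both sides are evaluated (how the package treats the edges of the sequence is not covered here).

```python
import math


def kl_sketch(p, q, base=2):
    """Discrete D(p || q); assumes q_i > 0 wherever p_i > 0."""
    return sum(p_i * math.log(p_i / q_i, base) for p_i, q_i in zip(p, q) if p_i > 0)


def ntr_sketch(thetas, window):
    """Novelty, transience and resonance for rows that have a full window.

    Hypothetical helper for illustration only; not the pyrocs function.
    """
    novelties, transiences, resonances = [], [], []
    for i in range(window, len(thetas) - window):
        offsets = range(1, window + 1)
        # Novelty: mean divergence of row i from the preceding `window` rows.
        nov = sum(kl_sketch(thetas[i], thetas[i - k]) for k in offsets) / window
        # Transience: mean divergence of row i from the following `window` rows.
        tra = sum(kl_sketch(thetas[i], thetas[i + k]) for k in offsets) / window
        novelties.append(nov)
        transiences.append(tra)
        resonances.append(nov - tra)
    return novelties, transiences, resonances


# The topic mixture shifts at row 2 and the new mixture persists afterwards.
thetas = [
    [0.8, 0.2],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.2, 0.8],
    [0.2, 0.8],
]
novelties, transiences, resonances = ntr_sketch(thetas, window=2)
```

With a window of 2, only row 2 has a full past and future. It differs from the past (positive novelty) but matches the future (zero transience), so its resonance is positive: the new mixture "sticks".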

information_theory.mutual_info module

pyrocs.information_theory.mutual_info.mutual_info(x: ndarray, y: ndarray, counts: ndarray = None, base: int = 2) → float

Mutual information measures how much knowledge is gained about one random variable when another is observed. It is also a measure of mutual dependence between the random variables.

The equation within the package follows the formulation from Cover and Thomas [CT05], using both the individual and the joint entropies:

\[I(X;Y)=H(X)+H(Y)-H(X,Y)\]

where \(I(X;Y)\) is the mutual information of \(X\) and \(Y\), \(H(X)\) is the entropy for random variable \(X\) alone, \(H(Y)\) is the entropy for random variable \(Y\) alone, and \(H(X,Y)\) is the joint entropy across both \(X\) and \(Y\).

Mutual information ranges from 0 to \(\min(H(X), H(Y))\). Higher values indicate that more information is shared (i.e., mutual dependence is greater) between the two random variables, \(X\) and \(Y\). Thus, higher values of mutual information indicate that more information can be gained about one variable when the other is observed.

Parameters:
  • x (array) – discretized observations from random distribution x in X

  • y (array) – discretized observations from random distribution y in Y

  • counts (array[int]) – If present, the number of times each (x,y) pair was observed

  • base (int) – If present, the log base in which to return the mutual information

Returns:

float
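
The identity \(I(X;Y)=H(X)+H(Y)-H(X,Y)\) can be sketched in plain Python as follows. This is an illustration only, not the pyrocs implementation: mutual_info_sketch and entropy_from_counts are hypothetical names, and the sketch takes raw paired observations rather than the optional counts argument.

```python
import math
from collections import Counter


def entropy_from_counts(counts, base=2):
    """Shannon entropy of a collection of category counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def mutual_info_sketch(x, y, base=2):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from paired discrete observations.

    Hypothetical helper for illustration only; not the pyrocs function.
    """
    h_x = entropy_from_counts(Counter(x).values(), base)
    h_y = entropy_from_counts(Counter(y).values(), base)
    # Joint entropy: tally each observed (x, y) pair as its own category.
    h_xy = entropy_from_counts(Counter(zip(x, y)).values(), base)
    return h_x + h_y - h_xy


x = [0, 0, 1, 1]

# y is an exact copy of x: observing one fully determines the other.
mi_dependent = mutual_info_sketch(x, [0, 0, 1, 1])

# y is balanced within each value of x: observing one says nothing about the other.
mi_independent = mutual_info_sketch(x, [0, 1, 0, 1])
```

In the dependent case the result reaches the upper bound \(\min(H(X), H(Y)) = 1\) bit, and in the independent case it falls to the lower bound of 0.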