biosciences module

biosciences.affinity module

pyrocs.biosciences.affinity.affinity(data: ~numpy.array | ~pandas.core.frame.DataFrame, weights=None, to_bool=<class 'bool'>) array

Returns the affinity between all pairs of columns in binary data.

This metric measures how likely two species are to co-occur, using the log odds ratio. Unlike other co-occurrence formulations, the affinity model is insensitive to the relative prevalence of the two species. The equation for affinity is based on the formulation in [MSSF22].

\[\alpha = \log((p_1/(1-p_1))/ (p_2/(1-p_2)))\]

where \(\alpha\) is the affinity, and \(p_1\) and \(p_2\) are the probabilities of species 1 and species 2, respectively.

The normalization of each species probability by its complement (i.e., \(1-p\)) results in a binary implementation of affinity within this software.

Parameters:
  • data – array or dataframe

  • weights – (optional) float or array

  • to_bool – boolean type

Returns:

affinity between columns in data
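As a minimal sketch of the log odds ratio in the equation above, the snippet below evaluates \(\alpha\) directly from two given probabilities. The helper name log_odds_ratio is hypothetical and is not part of pyrocs; how the package estimates \(p_1\) and \(p_2\) from binary data is not shown here.

```python
import numpy as np


def log_odds_ratio(p1, p2):
    """alpha = log((p1 / (1 - p1)) / (p2 / (1 - p2))); hypothetical helper."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    odds1 = p1 / (1.0 - p1)  # odds for species 1
    odds2 = p2 / (1.0 - p2)  # odds for species 2
    return np.log(odds1 / odds2)


alpha_equal = log_odds_ratio(0.3, 0.3)    # identical odds give alpha = 0
alpha_higher = log_odds_ratio(0.5, 0.25)  # odds of 1 versus 1/3 give log(3)
```

The result is undefined when either probability is exactly 0 or 1, so a real implementation would need to handle those cases.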

biosciences.functional_redundancy module

pyrocs.biosciences.functional_redundancy.functional_redundancy(p: array, delta: array) float

This metric evaluates how interchangeable groups within a population are based on the specific function they perform. As a biological concept, functional redundancy reflects the extent to which different species within a community have the same ecological role.

The equation within the package follows the formulation given in [RDBM+16].

\[\begin{split}R &= 1-(Q/D) \\ &\text{where} \\ Q &= \sum_i p_i \left(\sum_j p_j \delta_{ij}\right) \\ D &= \sum_i p_i (1-p_i)\end{split}\]

Args:

p : np.array

Relative abundances p[i] (i = 1, 2,…,N) with 0 < p[i] ≤ 1. The constraint 0 < p[i] means that all calculations involve only those species that are actually present in the assemblage, i.e. those with nonzero abundance.

delta : np.array

\(δ_{ij}\) symmetric array of pairwise functional dissimilarities between species i and j

Returns:

FR : float

Functional Redundancy Score
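The formula for R can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the package's code: the helper name redundancy_sketch and the example inputs are hypothetical, and p is assumed to sum to 1.

```python
import numpy as np


def redundancy_sketch(p, delta):
    """R = 1 - Q/D; hypothetical helper, not the pyrocs code."""
    p = np.asarray(p, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Q = sum_i p_i * sum_j p_j * delta_ij, written as a quadratic form.
    q_term = p @ delta @ p
    # D = sum_i p_i * (1 - p_i).
    d_term = np.sum(p * (1.0 - p))
    return 1.0 - q_term / d_term


p = np.array([0.5, 0.3, 0.2])
identical = np.zeros((3, 3))  # every pair of species has dissimilarity 0
distinct = 1.0 - np.eye(3)    # every pair of different species has dissimilarity 1
r_identical = redundancy_sketch(p, identical)
r_distinct = redundancy_sketch(p, distinct)
```

With all dissimilarities equal to 0 the score is 1 (fully redundant); with all off-diagonal dissimilarities equal to 1, Q equals D and the score is 0.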

biosciences.hill_diversity module

pyrocs.biosciences.hill_diversity.hill_diversity(p: array, q: float) float

The Hill Numbers are a family of diversity metrics describing the “effective number of species”.

For intuition, consider a distribution over N species in which only K species hold a “significant” share, while the remaining species together hold a “small” share. The simple species count, N, is sensitive to the presence of individuals from rare species. A formula that discounts rare species and returns a value close to K is more robust and better reflects the number of important species.

To understand how the Hill Number achieves the above property, consider that “number of species” and “mean probability” have an inverse relationship: if N species are equally common, each has probability p = 1/N, so N = 1/p. Therefore, one way to compute the “effective number of species” is to first compute an “effective mean probability” and then return its inverse. Hill Numbers compute a mean probability using the generalized power mean, weighted by the probabilities themselves. Using the probabilities as weights discounts rare (i.e. low probability) species. Since the power mean is parameterized, using different parameter values generates a family of Hill Numbers.

The equations for the set of Hill metrics are based on the formulation in [RDW21].

Hill Diversity:

\[H_q = (\sum p_i^q)^{1/(1-q)}\]

where \(p_i\) is the proportion of all individuals that belong to species \(i\), and \(q\) is the exponent that determines the rarity scale on which the mean is taken.
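As a worked step connecting this formula to the power-mean description, write the probability-weighted power mean of the \(p_i\) with exponent \(r\) as \(M_r\) (the symbols \(M_r\) and \(r\) are introduced here for illustration only). Taking \(r = q-1\) and inverting the mean recovers \(H_q\) for \(q \neq 1\):

\[\begin{split}M_r &= (\sum_i p_i \cdot p_i^{r})^{1/r} \\ 1/M_{q-1} &= (\sum_i p_i^{q})^{-1/(q-1)} = (\sum_i p_i^{q})^{1/(1-q)} = H_q\end{split}\]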

Parameters:
  • p – p[i] is the proportion of all individuals that belong to species i.

  • q – The exponent that determines the rarity scale on which the mean is taken. Common choices are species richness (q=0), Hill-Shannon diversity (q=1), and Hill-Simpson diversity (q=2).

Returns:

a metric for effective count of species (diversity)

Return type:

float
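A minimal sketch of the general formula, assuming NumPy. The helper name hill_number is hypothetical and this is not the pyrocs implementation. Because the exponent 1/(1-q) is undefined at q = 1, the sketch switches to the limiting (Hill-Shannon) form there.

```python
import numpy as np


def hill_number(p, q):
    """H_q = (sum_i p_i**q) ** (1 / (1 - q)); hypothetical helper."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # keep only species that are actually present
    if np.isclose(q, 1.0):
        # Limit of H_q as q approaches 1: exponential of the Shannon entropy.
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


uniform = np.full(4, 0.25)
skewed = np.array([0.7, 0.1, 0.1, 0.1])
richness = hill_number(skewed, 0)  # counts all four species equally
simpson = hill_number(skewed, 2)   # discounts the three rare species
```

For the uniform distribution every order q returns 4, the actual number of species. For the skewed distribution the value falls as q increases, since larger q gives more weight to common species.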

pyrocs.biosciences.hill_diversity.hill_shannon(p: array) float

The Hill-Shannon number is a specific instance (i.e. the Perplexity) of Hill Diversity, which prioritizes neither common nor rare species.

The use of the geometric mean captures the proportional difference from the mean of extreme values (rather than the absolute values). The equation for the Hill-Shannon number is based on the formulation in [RDW21].

Hill Shannon (Perplexity):

\[H_1 = e^{-\sum p_i \ln(p_i)}\]

where \(H_1\) is the limit of \(H_q\) as \(q\) approaches \(1\), and the mean is the geometric mean.

Parameters:

p – p[i] is the proportion of all individuals that belong to species i

Returns:

A metric for effective count of species (diversity)
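A quick numerical check, assuming NumPy, that the perplexity form agrees with the general Hill formula as q approaches 1. The distribution p is made up for illustration, and nothing here calls pyrocs itself.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])

# Perplexity: exponential of the Shannon entropy (natural logarithm).
perplexity = np.exp(-np.sum(p * np.log(p)))

# General Hill formula evaluated just below and just above q = 1.
below = np.sum(p ** 0.999) ** (1.0 / (1.0 - 0.999))
above = np.sum(p ** 1.001) ** (1.0 / (1.0 - 1.001))
```

The two neighbouring values bracket the perplexity and differ from it only slightly, which is why the exponential-entropy form is used in place of the general formula at q = 1.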

pyrocs.biosciences.hill_diversity.hill_simpson(p: array) float

The Hill-Simpson number is a specific instance (i.e. the Inverse Simpson Index) of Hill Diversity that prioritizes the common species. The use of an arithmetic mean gives more weight to more frequently occurring species. The equation for the Hill-Simpson number is based on the formulation in [RDW21].

Hill Simpson (Inverse Simpson Index):

\[H_2 = 1/\sum p_i^2\]

where \(q=2\) and the mean is the usual arithmetic mean.

Parameters:

p – p[i] is the proportion of all individuals that belong to species i

Returns:

A metric for effective count of species (diversity)
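To illustrate how the Inverse Simpson Index discounts rare species, the sketch below compares it with a plain species count on a made-up community of three dominant and seven rare species. The numbers are hypothetical and the computation uses NumPy directly rather than pyrocs.

```python
import numpy as np

# Three dominant species hold 97% of individuals; seven rare species share 3%.
p = np.array([0.33, 0.32, 0.32] + [0.03 / 7] * 7)

richness = np.count_nonzero(p)          # counts every present species equally
inverse_simpson = 1.0 / np.sum(p ** 2)  # H_2, dominated by the common species
```

The species count is 10, while the Inverse Simpson Index comes out a little above 3, close to the number of dominant species.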