biosciences module

biosciences.affinity module

pyrocs.biosciences.affinity.affinity(data: numpy.ndarray, weights=None, to_bool=bool) → float

Returns the affinity between all pairs of columns in binary data.

This metric evaluates how likely two species are to co-occur by computing the log odds ratio. Unlike other co-occurrence formulations, the affinity model is insensitive to the relative prevalence of the two species. The equation for affinity is based on the formulation in [MSSF22].

α = \log ((p_{1} / (1 - p_{1})) / (p_{2} / (1 - p_{2})))

where $α$ is the affinity, and $p_{1}$ and $p_{2}$ are the probabilities of species 1 and species 2, respectively.

The normalization of each species' probability by its complement (i.e., $1 - p$) results in a binary implementation of affinity within this software.

Parameters:

data (array) – Matrix of co-occurring variables
weights (optional array) – weights for each variable
to_bool – function or type to convert array values to boolean

Returns:

float
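
As a minimal sketch of the log odds ratio above (not the pyrocs implementation), the following computes α directly from two probabilities. The helper name `log_odds_ratio` is hypothetical, and how `affinity` derives $p_{1}$ and $p_{2}$ from `data` and `weights` is not covered here.

```python
import math


def log_odds_ratio(p1: float, p2: float) -> float:
    """Log of the ratio between the odds p1/(1-p1) and the odds p2/(1-p2).

    Hypothetical helper for illustration only; it is not part of pyrocs.
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    odds1 = p1 / (1.0 - p1)  # each probability normalized by its complement
    odds2 = p2 / (1.0 - p2)
    return math.log(odds1 / odds2)


alpha_equal = log_odds_ratio(0.5, 0.5)  # equal odds, so the log ratio is zero
alpha_pos = log_odds_ratio(0.8, 0.5)    # odds of 4 against odds of 1
print(alpha_equal, alpha_pos)
```

Swapping the two arguments flips the sign of the result, which is a quick sanity check on any implementation of this formula.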

biosciences.functional_redundancy module

pyrocs.biosciences.functional_redundancy.functional_redundancy(p: ndarray, delta: ndarray) → float

This metric evaluates how interchangeable groups within a population are based on the specific function they perform. As a biological concept, functional redundancy reflects the extent to which different species within a community have the same ecological role.

The equation within the package follows the formulation given in [RDBM+16].

\begin{aligned} R & = 1 - (Q / D) \\ \text{where} \\ Q & = \sum_{i} p_{i} \sum_{j} p_{j} δ_{i j} \\ D & = \sum_{i} p_{i} (1 - p_{i}) \end{aligned}

Parameters:

p (array) – Relative abundances p[i] (i = 1, 2, …, N) with 0 < p[i] ≤ 1. The constraint 0 < p[i] means that the calculation involves only species that are actually present in the assemblage, i.e. those with nonzero abundance.
delta (array) – Symmetric array of pairwise functional dissimilarities $δ_{i j}$ between species i and j

Returns:

float
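
The definitions of R, Q and D can be sketched in a few lines of NumPy. This is an illustration of the equations as written here, not the package's code; the function name `redundancy_sketch` is hypothetical, and the sketch assumes `p` sums to 1 and contains at least two species (otherwise D is zero).

```python
import numpy as np


def redundancy_sketch(p, delta) -> float:
    """R = 1 - Q / D, with Q = sum_i sum_j p_i p_j delta_ij and D = sum_i p_i (1 - p_i).

    Hypothetical helper for illustration only; it is not part of pyrocs.
    """
    p = np.asarray(p, dtype=float)
    delta = np.asarray(delta, dtype=float)
    q = p @ delta @ p            # double sum written as a quadratic form
    d = np.sum(p * (1.0 - p))    # zero when a single species has p = 1
    return float(1.0 - q / d)


p = [0.5, 0.5]
r_distinct = redundancy_sketch(p, [[0.0, 1.0], [1.0, 0.0]])   # maximally dissimilar pair
r_half = redundancy_sketch(p, [[0.0, 0.5], [0.5, 0.0]])       # partially similar pair
r_identical = redundancy_sketch(p, [[0.0, 0.0], [0.0, 0.0]])  # functionally identical pair
print(r_distinct, r_half, r_identical)
```

With two equally abundant species, redundancy rises from 0 to 1 as their dissimilarity falls from 1 to 0.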

biosciences.hill_diversity module

pyrocs.biosciences.hill_diversity.hill_diversity(p: ndarray, q: float) → float

The Hill Numbers are a family of diversity metrics describing the “effective number of species”.

For intuition, consider a distribution over N species in which only K species hold a “significant” share, while the remaining species together hold a “small” share. The simple species count, N, is sensitive to the presence of individuals from rare species. A formula that discounts rare species and returns a value close to K is more robust and better reflects the number of important species.

To understand how Hill Numbers achieve the above property, note that the number of species and the mean probability are inversely related: N = 1/p, where p is the mean probability across species. One way to compute the “effective number of species” is therefore to first compute an “effective mean probability” and then return its inverse. Hill Numbers compute the mean probability using the generalized power mean, weighted by the probabilities themselves. Using the probabilities as weights discounts rare (i.e. low-probability) species. Since the power mean is parameterized, different parameter values generate a family of Hill Numbers.
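
The reasoning above can be sketched as follows: take the power mean of the probabilities, weighted by the probabilities themselves, and invert it. The helper `weighted_power_mean` is a hypothetical name, and the sketch assumes the power-mean exponent is r = q - 1, which is what makes the inverse mean agree with the Hill Diversity formula.

```python
import numpy as np


def weighted_power_mean(x, w, r: float) -> float:
    """Generalized (power) mean of x with weights w that sum to 1.

    Hypothetical helper for illustration only; it is not part of pyrocs.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if np.isclose(r, 0.0):
        # The limit r -> 0 is the weighted geometric mean.
        return float(np.exp(np.sum(w * np.log(x))))
    return float(np.sum(w * x ** r) ** (1.0 / r))


p = np.array([0.5, 0.3, 0.2])

# Effective number of species = 1 / (effective mean probability).
effective_n = {q: 1.0 / weighted_power_mean(p, p, q - 1.0) for q in (0.0, 1.0, 2.0)}
print(effective_n)
```

In this sketch q = 0 gives a harmonic mean and recovers the plain species count, q = 1 gives a geometric mean, and q = 2 gives an arithmetic mean, so larger q yields a smaller effective number of species.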

The equations for the set of Hill metrics are based on the formulation in [RDW21].

Hill Diversity:

H_{q} = (\sum p_{i}^{q})^{1 / (1 - q)}

where $p_{i}$ is the proportion of all individuals that belong to species $i$, and $q$ is the exponent that determines the rarity scale on which the mean is taken.

Parameters:

p (array) – p[i] is the proportion of all individuals that belong to species i
q (float) – The exponent that determines the rarity scale on which the mean is taken: species richness (q=0), Hill-Shannon diversity (q=1), Hill-Simpson diversity (q=2)

Returns:

float
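
To illustrate the "K important species out of N" intuition, the sketch below evaluates the Hill Diversity formula on a made-up community with three dominant and four rare species. The function name `hill_number_sketch` is hypothetical, and the handling of q = 1 through the exponential of the Shannon entropy is an assumption of this sketch, not a statement about how pyrocs treats that case.

```python
import numpy as np


def hill_number_sketch(p, q: float) -> float:
    """H_q = (sum_i p_i^q)^(1 / (1 - q)), using the q -> 1 limit when q is 1.

    Hypothetical helper for illustration only; it is not part of pyrocs.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # only species that are actually present
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


# Illustrative data: 3 dominant species (32% each) and 4 rare ones (1% each).
p = np.array([0.32, 0.32, 0.32, 0.01, 0.01, 0.01, 0.01])

richness = hill_number_sketch(p, 0.0)  # counts every species equally
shannon = hill_number_sketch(p, 1.0)
simpson = hill_number_sketch(p, 2.0)   # stays close to the 3 dominant species
print(richness, shannon, simpson)
```

As q increases, the rare species are discounted more heavily and the value moves from N = 7 toward K = 3.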

pyrocs.biosciences.hill_diversity.hill_shannon(p: ndarray) → float

The Hill-Shannon number is a specific instance (i.e. the Perplexity) of Hill Diversity, which prioritizes neither common nor rare species.

The use of the geometric mean captures the proportional difference from the mean of extreme values (rather than the absolute values). The equation for the Hill-Shannon number is based on the formulation in [RDW21].

Hill Shannon (Perplexity):

H_{1} = e^{- \sum p_{i} \ln (p_{i})}

where $q$ approaches $1$ and the mean is the geometric mean

Parameters:

p (array) – p[i] is the proportion of all individuals that belong to species i

Returns:

float
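
The statement that Hill-Shannon is the limit of Hill Diversity as q approaches 1 can be checked numerically. The sketch below is illustrative only: `hill_general` and `perplexity` are hypothetical names, and the proportions are made up.

```python
import numpy as np


def hill_general(p, q: float) -> float:
    """General Hill formula; undefined at exactly q = 1 (division by zero)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


def perplexity(p) -> float:
    """Exponential of the Shannon entropy (natural log) of p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # ln(0) is undefined, so absent species are skipped
    return float(np.exp(-np.sum(p * np.log(p))))


p = np.array([0.6, 0.3, 0.1])
limit_value = perplexity(p)

# Evaluate the general formula at q values that approach 1 from above.
near_limit = [hill_general(p, q) for q in (1.1, 1.01, 1.001)]
gaps = [abs(h - limit_value) for h in near_limit]
print(limit_value, near_limit)
```

The gap between the general formula and the perplexity shrinks as q gets closer to 1, which is why the q = 1 case is defined through the exponential form.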

pyrocs.biosciences.hill_diversity.hill_simpson(p: ndarray) → float

The Hill-Simpson number is a specific instance (i.e. the Inverse Simpson Index) of Hill Diversity that prioritizes the common species. The use of an arithmetic mean gives more weight to more frequently occurring species. The equation for the Hill-Simpson number is based on the formulation in [RDW21].

Hill Simpson (Inverse Simpson Index):

H_{2} = 1 / \sum p_{i}^{2}

where $q = 2$ and the mean is the usual arithmetic mean

Parameters:

p (array) – p[i] is the proportion of all individuals that belong to species i

Returns:

float
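
A short sketch of the Inverse Simpson Index follows. Unlike `hill_simpson`, which takes proportions, this hypothetical helper `inverse_simpson` starts from raw individual counts and normalizes them first; that convenience step is an assumption of the example, and the counts are made up.

```python
import numpy as np


def inverse_simpson(counts) -> float:
    """1 / sum_i p_i^2, where p_i are proportions derived from raw counts.

    Hypothetical helper for illustration only; it is not part of pyrocs.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()  # convert counts to proportions
    return float(1.0 / np.sum(p ** 2))


even = inverse_simpson([25, 25, 25, 25])  # four equally common species
skewed = inverse_simpson([97, 1, 1, 1])   # one species dominates
print(even, skewed)
```

Both communities contain four species, but the skewed one scores close to 1 because the sum of squared proportions is dominated by the most common species.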