Package 'TDAvec'

Title: Vector summaries of persistence diagrams
Description: Provides tools for computing various vector summaries of persistence diagrams studied in Topological Data Analysis. For improved computational efficiency, all code for the vector summaries is written in 'C++' using the 'Rcpp' and 'RcppArmadillo' packages
Authors: Umar Islambekov, Aleksei Luchinsky
Maintainer: Umar Islambekov <[email protected]>
License: MIT License
Version: 0.1.3
Built: 2025-02-19 02:40:50 UTC
Source: https://github.com/uislambekov/tdavec

Help Index


Compute Algebraic Functions from a Persistence Diagram

Description

For a given persistence diagram D={(bi,di)}i=1ND=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computeAlgebraicFunctions() computes the following four algebraic functions based on the birth and death values:

  • f1=ibi(dibi).f_1=\sum_i b_i (d_i - b_i).

  • f2=i(dmaxdi)(dibi)f_2=\sum_i (d_{\max} - d_i) (d_i - b_i), where dmax=max(di)d_{\max} = \max(d_i).

  • f3=ibi2(dibi)4f_3=\sum_i b_i^2 (d_i - b_i)^4.

  • f4=i(dmaxdi)2(dibi)4f_4=\sum_i (d_{\max} - d_i)^2 (d_i - b_i)^4.

Points in DD with infinite death values are ignored.

Usage

computeAlgebraicFunctions(D, homDim)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

Details

The function extracts rows from D where the first column equals homDim, and computes the four algebraic functions based on the filtered data. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A (named) numeric vector (f1,f2,f3,f4)(f_1,f_2,f_3,f_4).

Author(s)

Umar Islambekov

References

1. Adcock, A., Carlsson, E. and Carlsson, G., 2013. The ring of algebraic functions on persistence bar codes. Homology, Homotopy Appl., 18:381–402, 2016.

2. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Compute algebraic functions for homological dimension H_0
computeAlgebraicFunctions(D, homDim = 0)

# Compute algebraic functions for homological dimension H_1
computeAlgebraicFunctions(D, homDim = 1)

A Vector Summary of the Betti Curve

Description

For a given persistence diagram D=(bi,di)i=1ND={(b_i,d_i)}_{i=1}^N (corresponding to a specified homological dimension), computeBettiCurve() vectorizes the Betti curve

β(t)=i=1N1[bi,di)(t)\beta(t)=\sum_{i=1}^N\bold 1_{[b_i,d_i)}(t)

based on a scale sequence scaleSeq. The evaluation method depends on the argument evaluate.

Usage

computeBettiCurve(D, homDim, scaleSeq, evaluate = "intervals")

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

scaleSeq

a numeric vector of increasing scale values used for vectorization.

evaluate

a character string indicating the evaluation method. Must be either "intervals" (default) or "points".

Details

The function extracts rows from D where the first column equals homDim, and computes values based on the filtered data and scaleSeq. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A numeric vector containing elements computed using scaleSeq={t1,t2,,tn}\{t_1,t_2,\ldots,t_n\} according to the method specified by evaluate.

  • "intervals": Computes average values of the Betti curve over intervals defined by consecutive elements in scaleSeq:

    (1Δt1t1t2β(t)dt,1Δt2t2t3β(t)dt,,1Δtn1tn1tnβ(t)dt)Rn1,\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}\beta(t)dt,\frac{1}{\Delta t_2}\int_{t_2}^{t_3}\beta(t)dt,\ldots,\frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}\beta(t)dt\Big)\in\mathbb{R}^{n-1},

    where Δtk=tk+1tk\Delta t_k=t_{k+1}-t_k.

  • "points": Computes values of the Betti curve at each point in scaleSeq:

    (β(t1),β(t2),,β(tn))Rn.(\beta(t_1),\beta(t_2),\ldots,\beta(t_n))\in\mathbb{R}^n.

Author(s)

Umar Islambekov, Hasani Pathirana

References

1. Chazal, F., & Michel, B. (2021). An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists. Frontiers in Artificial Intelligence, 108.

2. Chung, Y. M., & Lawson, A. (2022). Persistence curves: A canonical framework for summarizing persistence diagrams. Advances in Computational Mathematics, 48(1), 1-42.

3. Islambekov, U., & Pathirana. H. (2024). Vector Summaries of Persistence Diagrams for Permutation-based Hypothesis Testing. Foundations of Data Science 6 (1), 41-61.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values

# Compute a vector summary of the Betti curve for homological dimension H_0
computeBettiCurve(D, homDim = 0, scaleSeq)

# Compute a vector summary of the Betti curve for homological dimension H_1
computeBettiCurve(D, homDim = 1, scaleSeq)

Compute Complex Polynomial Coefficients from a Persistence Diagram

Description

For a given persistence diagram D={(bi,di)}i=1ND=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computeComplexPolynomial() computes the coefficients of a complex polynomial

CX(z)=i=1N[zX(bi,di)],C_X(z) = \prod_{i=1}^N [z - X(b_i,d_i)],

where X:R2CX:\mathbb{R}^2\rightarrow\mathbb{C} is any one of the following three functions:

  • R(x,y)=x+iy.R(x,y)=x+iy.

  • S(x,y)={yxα2(x+iy)if (x,y)(0,0)0otherwise.S(x, y) = \begin{cases} \frac{y - x}{\alpha \sqrt{2}}(x+iy) & \text{if } (x, y) \neq (0, 0) \\ 0 & \text{otherwise.} \end{cases}

  • T(x,y)=yx2[(cosαsinα)+i(cosα+sinα)]T(x, y) = \frac{y - x}{2} [(\cos\alpha - \sin\alpha)+i(\cos\alpha + \sin\alpha)],

where α=x2+y2\alpha=\sqrt{x^2+y^2}. Points in DD with infinite death values are ignored.

Usage

computeComplexPolynomial(D, homDim, m = 1, polyType = "R")

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

m

the number of coefficients to return (default is 1). Must be less than or equal to the number of points in the diagram.

polyType

a string specifying the polynomial type to use. Options are "R" (default), "S", or "T".

Details

The function extracts rows from D where the first column equals homDim, and computes values based on the filtered data and polyType. If D does not contain any points corresponding to homDim, a matrix of zeros is returned.

Value

A numeric matrix of dimension m×2m \times 2. Each row corresponds to a coefficient of the polynomial CX(z)C_X(z):

  • Column 1: Real part of the coefficient.

  • Column 2: Imaginary part of the coefficient.

Author(s)

Umar Islambekov

References

1. Ferri, M. and Landi, C., (1999). Representing size functions by complex polynomials. Proc. Math. Met. in Pattern Recognition, 9, pp.16-19.

2. Di Fabio, B. and Ferri, M., (2015). Comparing persistence diagrams through complex vectors. In Image Analysis and Processing—ICIAP 2015: 18th International Conference, Genoa, Italy, September 7-11, 2015, Proceedings, Part I 18 (pp. 294-305). Springer International Publishing.

3. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Compute the first 5 coefficients of the complex polynomial of type "R" for homological dimension H_0
computeComplexPolynomial(D, homDim = 0, m = 5, polyType = "R")

# Compute the first 5 coefficients of the complex polynomial of type "S" for homological dimension H_0
computeComplexPolynomial(D, homDim = 0, m = 5, polyType = "S")

A Vector Summary of the Euler Characteristic Curve

Description

Vectorizes the Euler characteristic curve

χ(t)=k=0d(1)kβk(t),\chi(t)=\sum_{k=0}^d (-1)^k\beta_k(t),

where β0,β1,,βd\beta_0,\beta_1,\ldots,\beta_d are the Betti curves corresponding to persistence diagrams D0,D1,,DdD_0,D_1,\ldots,D_d of dimensions 0,1,,d0,1,\ldots,d respectively, all computed from the same filtration. The evaluation method depends on the argument evaluate.

Usage

computeEulerCharacteristic(D, scaleSeq, maxhomDim = -1, evaluate = "intervals")

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

scaleSeq

a numeric vector of increasing scale values used for vectorization.

maxhomDim

the maximum homological dimension considered (0 for H0H_0, 1 for H1H_1, etc.). If set to -1 (default), it uses the maximum dimension found in D.

evaluate

a character string indicating the evaluation method. Must be either "intervals" (default) or "points".

Value

A numeric vector containing elements computed using scaleSeq={t1,t2,,tn}\{t_1,t_2,\ldots,t_n\} according to the method specified by evaluate. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

  • "intervals": Computes average values of the Euler characteristic curve over intervals defined by consecutive elements in scaleSeq:

    (1Δt1t1t2χ(t)dt,1Δt2t2t3χ(t)dt,,1Δtn1tn1tnχ(t)dt)Rn1,\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}\chi(t)dt,\frac{1}{\Delta t_2}\int_{t_2}^{t_3}\chi(t)dt,\ldots,\frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}\chi(t)dt\Big)\in\mathbb{R}^{n-1},

    where Δtk=tk+1tk\Delta t_k=t_{k+1}-t_k.

  • "points": Computes values of the Euler characteristic curve at each point in scaleSeq:

    (χ(t1),χ(t2),,χ(tn))Rn.(\chi(t_1),\chi(t_2),\ldots,\chi(t_n))\in\mathbb{R}^n.

Author(s)

Umar Islambekov

References

1. Richardson, E., & Werman, M. (2014). Efficient classification using the Euler characteristic. Pattern Recognition Letters, 49, 99-106.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values

# Compute a vector summary of the Euler characteristic curve
computeEulerCharacteristic(D, scaleSeq)

Compute the Extreme Values of Birth, Death, and Persistence Across Multiple Persistence Diagrams

Description

Given a list of persistence diagrams, computeLimits() computes the extreme values of birth, death, and persistence across all diagrams for a specified homological dimension. Points with infinite death values are ignored.

Usage

computeLimits(Dlist, homDim)

Arguments

Dlist

a list of persistence diagrams, where each diagram is a matrix containing three columns representing homological dimension, birth, and death values.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows of the diagrams are filtered based on this value.

Value

A (named) numeric vector containing:

  • minB: the minimum birth value across all diagrams.

  • maxB: the maximum birth value across all diagrams.

  • maxD: the maximum death value across all diagrams.

  • minP: the minimum persistence value across all diagrams.

  • maxP: the maximum persistence value across all diagrams.

Author(s)

Umar Islambekov

Examples

set.seed(123)
N <- 100 # The size of point clouds
nD <- 50 # The number of persistence diagrams
Dlist <- list()
for (i in 1:nD){
  # sample N points uniformly from the unit circle and add Gaussian noise
  theta <- runif(N, min = 0, max = 2 * pi)
  X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

  # Compute the persistence diagram using the Rips filtration built on top of X
  # The 'threshold' parameter specifies the maximum distance for building simplices
  Dlist[[i]] <- TDAstats::calculate_homology(X, threshold = 2)
}

# Compute the extreme values of birth, death, and persistence across all 50 diagrams for homological dimension H_0
computeLimits(Dlist, homDim = 0)

# Compute the extreme values of birth, death, and persistence across all 50 diagrams for homological dimension H_1
computeLimits(Dlist, homDim = 1)

A Vector Summary of the Normalized Life Curve

Description

For a given persistence diagram D={(bi,di)}i=1ND=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computeNormalizedLife() vectorizes the normalized life curve

sl(t)=i=1NdibiL1[bi,di)(t),sl(t)=\sum_{i=1}^N \frac{d_i-b_i}{L}\bold{1}_{[b_i,d_i)}(t),

where L=i=1N(dibi)L=\sum_{i=1}^N (d_i-b_i), based on a scale sequence scaleSeq. The evaluation method depends on the argument evaluate. Points in DD with infinite death values are ignored.

Usage

computeNormalizedLife(D, homDim, scaleSeq, evaluate = "intervals")

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

scaleSeq

a numeric vector of increasing scale values used for vectorization.

evaluate

a character string indicating the evaluation method. Must be either "intervals" (default) or "points".

Details

The function extracts rows from D where the first column equals homDim, and computes values based on the filtered data and scaleSeq. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A numeric vector containing elements computed using scaleSeq={t1,t2,,tn}\{t_1,t_2,\ldots,t_n\} according to the method specified by evaluate.

  • "intervals": Computes average values of the normalized life curve over intervals defined by consecutive elements in scaleSeq:

    (1Δt1t1t2sl(t)dt,1Δt2t2t3sl(t)dt,,1Δtn1tn1tnsl(t)dt)Rn1,\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}sl(t)dt,\frac{1}{\Delta t_2}\int_{t_2}^{t_3}sl(t)dt,\ldots,\frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}sl(t)dt\Big)\in\mathbb{R}^{n-1},

    where Δtk=tk+1tk\Delta t_k=t_{k+1}-t_k.

  • "points": Computes values of the normalized life curve at each point in scaleSeq:

    (sl(t1),sl(t2),,sl(tn))Rn.(sl(t_1),sl(t_2),\ldots,sl(t_n))\in\mathbb{R}^n.

Author(s)

Umar Islambekov

References

Chung, Y. M., & Lawson, A. (2022). Persistence curves: A canonical framework for summarizing persistence diagrams. Advances in Computational Mathematics, 48(1), 1-42.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values

# Compute a vector summary of the normalized life curve for homological dimension H_0
computeNormalizedLife(D, homDim = 0, scaleSeq)

# Compute a vector summary of the normalized life curve for homological dimension H_1
computeNormalizedLife(D, homDim = 1, scaleSeq)

A Vector Summary of the Persistence Block

Description

For a given persistence diagram D={(bi,pi)}i=1ND=\{(b_i,p_i)\}_{i=1}^N (corresponding to a specified homological dimension), computePersistenceBlock() vectorizes the persistence block

f(x,y)=i=1N1E(bi,pi)(x,y),f(x,y)=\sum_{i=1}^N \bold 1_{E(b_i,p_i)}(x,y),

where E(bi,pi)=[biλi2,bi+λi2]×[piλi2,pi+λi2]E(b_i,p_i)=[b_i-\frac{\lambda_i}{2},b_i+\frac{\lambda_i}{2}]\times [p_i-\frac{\lambda_i}{2},p_i+\frac{\lambda_i}{2}] and λi=2τpi\lambda_i=2\tau p_i with τ(0,1]\tau\in (0,1]. Points in DD with infinite persistence values are ignored.

Usage

computePersistenceBlock(D, homDim, xSeq, ySeq, tau=0.3)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and persistence values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

xSeq

a numeric vector of increasing x (birth) values used for vectorization.

ySeq

a numeric vector of increasing y (persistence) values used for vectorization.

tau

a parameter (between 0 and 1) controlling block sizes. Default is tau=0.3.

Details

The function extracts rows from D where the first column equals homDim, and computes values based on the filtered data, xSeq and ySeq. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A numeric vector whose elements are the weighted averages of the persistence block computed over each cell of the two-dimensional grid constructred from xSeq={x1,x2,,xn}\{x_1,x_2,\ldots,x_n\} and ySeq={y1,y2,,ym}\{y_1,y_2,\ldots,y_m\}:

(1Δx1Δy1[x1,x2]×[y1,y2]f(x,y)wdA,,1Δxn1Δym1[xn1,xn]×[ym1,ym]f(x,y)wdA)Rd,\Big(\frac{1}{\Delta x_1\Delta y_1}\int_{[x_1,x_2]\times [y_1,y_2]}f(x,y)wdA,\ldots,\frac{1}{\Delta x_{n-1}\Delta y_{m-1}}\int_{[x_{n-1},x_n]\times [y_{m-1},y_m]}f(x,y)wdA\Big)\in\mathbb{R}^{d},

where d=(n1)(m1)d=(n-1)(m-1), wdA=(x+y)dxdywdA=(x+y)dxdy, Δxk=xk+1xk\Delta x_k=x_{k+1}-x_k and Δyj=yj+1yj.\Delta y_j=y_{j+1}-y_j.

If homDim=0 and all the birth values are equal (e.g., zero), univariate persistence block functions are used instead for vectorization:

(1Δy1y1y2f(y)ydy,,1Δym1ym1ymf(y)ydy)Rm1,\Big(\frac{1}{\Delta y_1}\int_{y_1}^{y_2}f(y)ydy,\ldots,\frac{1}{\Delta y_{m-1}}\int_{y_{m-1}}^{y_m}f(y)ydy\Big)\in\mathbb{R}^{m-1},

where f(y)=i=1N1[piλi2,pi+λi2](y)f(y)=\sum_{i=1}^N \bold 1_{[p_i-\frac{\lambda_i}{2},p_i+\frac{\lambda_i}{2}]}(y) and Δyj=yj+1yj.\Delta y_j=y_{j+1}-y_j.

Author(s)

Umar Islambekov, Aleksei Luchinsky

References

1. Chan, K. C., Islambekov, U., Luchinsky, A., & Sanders, R. (2022). A computationally efficient framework for vector representation of persistence diagrams. Journal of Machine Learning Research 23, 1-33.

Examples

N <- 100 # The number of points to sample
set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Switch from the birth-death to the birth-persistence coordinates
D[,3] <- D[,3] - D[,2]
colnames(D)[3] <- "Persistence"

# Construct one-dimensional grid of scale values
ySeqH0 <- unique(quantile(D[D[,1] == 0, 3], probs = seq(0, 1, by = 0.2)))
tau <- 0.3 # Parameter in [0,1] which controls the size of blocks around each point of the diagram

# Compute a vector summary of the persistence block for homological dimension H_0
computePersistenceBlock(D, homDim = 0, xSeq = NA, ySeq = ySeqH0, tau = tau)

xSeqH1 <- unique(quantile(D[D[,1] == 1, 2], probs = seq(0, 1, by = 0.2)))
ySeqH1 <- unique(quantile(D[D[,1] == 1, 3], probs = seq(0, 1, by = 0.2)))

# Compute a vector summary of the persistence block for homological dimension H_1
computePersistenceBlock(D, homDim = 1, xSeq = xSeqH1, ySeq = ySeqH1, tau = tau)

A Vector Summary of the Persistence Surface

Description

For a given persistence diagram D={(bi,pi)}i=1ND=\{(b_i,p_i)\}_{i=1}^N (corresponding to a specified homological dimension), computePersistenceImage() computes the persistence image - a vector summary of the persistence surface:

ρ(x,y)=i=1Nf(bi,pi)ϕ(bi,pi)(x,y),\rho(x,y)=\sum_{i=1}^N f(b_i,p_i)\phi_{(b_i,p_i)}(x,y),

where ϕ(bi,pi)(x,y)\phi_{(b_i,p_i)}(x,y) is the Gaussian distribution with mean (bi,pi)(b_i,p_i) and covariance matrix σ2I2×2\sigma^2 I_{2\times 2} and

f(b,p)=w(p)={0p0p/pmax0<p<pmax1ppmaxf(b,p) = w(p)=\left\{ \begin{array}{ll} 0 & \quad p\leq 0 \\ p/p_{max} & \quad 0<p<p_{max}\\ 1& \quad p\geq p_{max} \end{array} \right.

is the weighting function with pmaxp_{max} being the maximum persistence value among all persistence diagrams considered in the experiment. Points in DD with infinite persistence values are ignored.

Usage

computePersistenceImage(D, homDim, xSeq, ySeq, sigma)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and persistence values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

xSeq

a numeric vector of increasing x (birth) values used for vectorization.

ySeq

a numeric vector of increasing y (persistence) values used for vectorization.

sigma

a standard deviation (σ\sigma) of the Gaussian.

Details

The function extracts rows from D where the first column equals homDim, and computes values based on the filtered data, xSeq and ySeq. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A numeric vector whose elements are the average values of the persistence surface computed over each cell of the two-dimensional grid constructred from xSeq={x1,x2,,xn}\{x_1,x_2,\ldots,x_n\} and ySeq={y1,y2,,ym}\{y_1,y_2,\ldots,y_m\}:

(1Δx1Δy1[x1,x2]×[y1,y2]ρ(x,y)dA,,1Δxn1Δym1[xn1,xn]×[ym1,ym]ρ(x,y)dA)Rd,\Big(\frac{1}{\Delta x_1\Delta y_1}\int_{[x_1,x_2]\times [y_1,y_2]}\rho(x,y)dA,\ldots,\frac{1}{\Delta x_{n-1}\Delta y_{m-1}}\int_{[x_{n-1},x_n]\times [y_{m-1},y_m]}\rho(x,y)dA\Big)\in\mathbb{R}^{d},

where d=(n1)(m1)d=(n-1)(m-1), dA=dxdydA=dxdy, Δxk=xk+1xk\Delta x_k=x_{k+1}-x_k and Δyj=yj+1yj.\Delta y_j=y_{j+1}-y_j.

If homDim=0 and all the birth values are equal (e.g., zero), univariate Gaussians are used instead for vectorization:

(1Δy1y1y2ρ(y)dy,,1Δym1ym1ymρ(y)dy)Rm1,\Big(\frac{1}{\Delta y_1}\int_{y_1}^{y_2}\rho(y)dy,\ldots,\frac{1}{\Delta y_{m-1}}\int_{y_{m-1}}^{y_m}\rho(y)dy\Big)\in\mathbb{R}^{m-1},

where ρ(y)=i=1Nf(pi)ϕpi(y)\rho(y)=\sum_{i=1}^N f(p_i)\phi_{p_i}(y) and Δyj=yj+1yj.\Delta y_j=y_{j+1}-y_j.

Author(s)

Umar Islambekov

References

1. Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., ... & Ziegelmeier, L. (2017). Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18.

Examples

N <- 100 # The number of points to sample
set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Switch from the birth-death to the birth-persistence coordinates
D[,3] <- D[,3] - D[,2]
colnames(D)[3] <- "Persistence"

resB <- 5 # Resolution (or grid size) along the birth axis
resP <- 5 # Resolution (or grid size) along the persistence axis

# Compute persistence image for homological dimension H_0
minPH0 <- min(D[D[,1]==0,3]); maxPH0 <- max(D[D[,1]==0,3])
ySeqH0 <- seq(minPH0, maxPH0, length.out = resP + 1)
sigma <- 0.5 * (maxPH0 - minPH0) / resP
computePersistenceImage(D, homDim = 0, xSeq = NA, ySeq = ySeqH0, sigma = sigma)

# Compute persistence image for homological dimension H_1
minBH1 <- min(D[D[,1]==1,2]); maxBH1 <- max(D[D[,1]==1,2])
minPH1 <- min(D[D[,1]==1,3]); maxPH1 <- max(D[D[,1]==1,3])
xSeqH1 <- seq(minBH1, maxBH1, length.out = resB + 1)
ySeqH1 <- seq(minPH1, maxPH1, length.out = resP + 1)
sigma <- 0.5 * (maxPH1 - minPH1) / resP
computePersistenceImage(D, homDim = 1, xSeq = xSeqH1, ySeq = ySeqH1, sigma = sigma)

Vector Summaries of the Persistence Landscape Functions

Description

For a given persistence diagram D={(bi,di)}i=1ND=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computePersistenceLandscape() vectorizes the first kk persistence landscape functions

λj(t)=jmax1iNΛi(t),j=1,,k,\lambda_j(t) = j\hbox{max}_{1\leq i \leq N} \Lambda_i(t), \quad j=1,\dots,k,

where jmaxj\hbox{max} returns the jjth largest value and

Λi(t)={tbit[bi,bi+di2]ditt(bi+di2,di]0otherwise\Lambda_i(t) = \left\{ \begin{array}{ll} t-b_i & \quad t\in [b_i,\frac{b_i+d_i}{2}] \\ d_i-t & \quad t\in (\frac{b_i+d_i}{2},d_i]\\ 0 & \quad \hbox{otherwise} \end{array} \right.

based on a scale sequence scaleSeq. For generalized persistence landscape functions, we instead take

Λi(t)={dibi2Kh(0)Kh(tbi+di2)for tbi+di2h10otherwise\Lambda_i(t) = \left\{ \begin{array}{ll} \frac{d_i-b_i}{2K_h(0)}K_h(t-\frac{b_i+d_i}{2}) & \hbox{for } |\frac{t-\frac{b_i+d_i}{2}}{h}| \leq 1 \\ 0 & \hbox{otherwise} \end{array} \right.

where Kh()K_h(\cdot) is a kernel function with the bandwidth parameter h.h.

Usage

computePersistenceLandscape(D, homDim, scaleSeq, k = 1, generalized = FALSE, kernel = "triangle", h = NULL)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

scaleSeq

a numeric vector of increasing scale values used for vectorization.

k

an integer specifying the number of persistence landscape functions to consider (default is 1). Must be less than or equal to the number of points in the diagram.

generalized

a logical value indicating whether to use a generalized persistence landscape or not. Default is FALSE.

kernel

a string specifying the kernel type to use if generalized = TRUE. Options are "triangle", "epanechnikov", or "tricubic". Default is "triangle".

h

a positive numeric value specifying the bandwidth for the kernel. Must be provided if generalized = TRUE.

Details

The function extracts rows from D where the first column equals homDim, and computes values based on the filtered data and scaleSeq. If D does not contain any points corresponding to homDim, a vector of zeros is returned. If generalized = TRUE, three different kernel functions are currently supported:

  • Triangle kernel: Kh(t)=1hmax(0,1th)K_h(t) = \frac{1}{h} \max(0, 1 - \frac{|t|}{h})

  • Epanechnikov kernel: Kh(t)=34hmax(0,1t2h2)K_h(t) = \frac{3}{4h} \max(0, 1 - \frac{t^2}{h^2})

  • Tricubic kernel: Kh(t)=7081hmax(0,(1t3h3)3)K_h(t) = \frac{70}{81h} \max(0, (1 - \frac{|t|^3}{h^3})^3)

Value

A matrix where the jjth column contains the values of the jjth order persistence landscape function evaluated at each point of scaleSeq={t1,t2,,tn}\{t_1,t_2,\ldots,t_n\}:

[λ1(t1)λ2(t1)λk(t1)λ1(t2)λ2(t2)λk(t2)λ1(tn)λ2(tn)λk(tn)]\begin{bmatrix} \lambda_1(t_1) & \lambda_2(t_1) & \cdots & \lambda_k(t_1) \\ \lambda_1(t_2) & \lambda_2(t_2) & \cdots & \lambda_k(t_2)\\ \vdots & \vdots& \ddots & \vdots \\ \lambda_1(t_n) & \lambda_2(t_n) & \cdots & \lambda_k(t_n) \end{bmatrix}

Author(s)

Umar Islambekov

References

1. Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research, 16(1), 77-102.

2. Chazal, F., Fasy, B. T., Lecci, F., Rinaldo, A., & Wasserman, L. (2014, June). Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the thirtieth annual symposium on Computational geometry (pp. 474-483).

3. Berry, E., Chen, Y. C., Cisewski-Kehe, J., & Fasy, B. T. (2020). Functional summaries of persistence diagrams. Journal of Applied and Computational Topology, 4(2):211–262.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values

# Compute vector summaries of the first three persistence landscape functions for homological dimension H_1
computePersistenceLandscape(D, homDim = 1, scaleSeq, k = 3)

# Compute vector summaries of the first three generalized persistence landscape functions (with triangle kernel and bandwidth h=0.2) for homological dimension H_1
computePersistenceLandscape(D, homDim = 1, scaleSeq, generalized = TRUE, k = 3, h = 0.2)

A Vector Summary of the Persistence Silhouette Function

Description

For a given persistence diagram D={(bi,di)}i=1ND=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computePersistenceSilhouette() vectorizes the ppth power persistence silhouette function

ϕp(t)=i=1NdibipΛi(t)i=1Ndibip,\phi_p(t) = \frac{\sum_{i=1}^N |d_i-b_i|^p\Lambda_i(t)}{\sum_{i=1}^N |d_i-b_i|^p},

where

Λi(t)={tbit[bi,bi+di2]ditt(bi+di2,di]0otherwise\Lambda_i(t) = \left\{ \begin{array}{ll} t-b_i & \quad t\in [b_i,\frac{b_i+d_i}{2}] \\ d_i-t & \quad t\in (\frac{b_i+d_i}{2},d_i]\\ 0 & \quad \hbox{otherwise} \end{array} \right.

based on a scale sequence scaleSeq. The evaluation method depends on the argument evaluate. Points in DD with infinite death values are ignored.

Usage

computePersistenceSilhouette(D, homDim, scaleSeq, p = 1.0, evaluate = "intervals")

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

scaleSeq

a numeric vector of increasing scale values used for vectorization.

p

power of the weights for the silhouette function. By default, p=1.

evaluate

a character string indicating the evaluation method. Must be either "intervals" (default) or "points".

Details

The function extracts rows from D where the first column equals homDim, and computes values based on the filtered data and scaleSeq. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A numeric vector containing elements computed using scaleSeq={t1,t2,,tn}\{t_1,t_2,\ldots,t_n\} according to the method specified by evaluate.

  • "intervals": Computes average values of the persistence silhouette function over intervals defined by consecutive elements in scaleSeq:

    (1Δt1t1t2ϕp(t)dt,1Δt2t2t3ϕp(t)dt,,1Δtn1tn1tnϕp(t)dt)Rn1,\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}\phi_p(t)dt,\frac{1}{\Delta t_2}\int_{t_2}^{t_3}\phi_p(t)dt,\ldots,\frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}\phi_p(t)dt\Big)\in\mathbb{R}^{n-1},

    where Δtk=tk+1tk\Delta t_k=t_{k+1}-t_k.

  • "points": Computes values of the persistence silhouette function at each point in scaleSeq:

    (ϕp(t1),ϕp(t2),,ϕp(tn))Rn.(\phi_p(t_1),\phi_p(t_2),\ldots,\phi_p(t_n))\in\mathbb{R}^n.

Author(s)

Umar Islambekov

References

1. Chazal, F., Fasy, B. T., Lecci, F., Rinaldo, A., & Wasserman, L. (2014). Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the thirtieth annual symposium on Computational geometry (pp. 474-483).

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values

# Compute a vector summary of the persistence silhouette (with p=1) for homological dimension H_0
computePersistenceSilhouette(D, homDim = 0, scaleSeq)

# Compute a vector summary of the persistence silhouette (with p=1) for homological dimension H_1
computePersistenceSilhouette(D, homDim = 1, scaleSeq)

A Vector Summary of the Persistent Entropy Summary Function

Description

For a given persistence diagram D=(bi,di)i=1ND={(b_i,d_i)}_{i=1}^N (corresponding to a specified homological dimension), computePersistentEntropy() vectorizes the persistent entropy summary function

S(t)=i=1NliLlog2(liL)1[bi,di](t),S(t)=-\sum_{i=1}^N \frac{l_i}{L}\log_2{(\frac{l_i}{L}})\bold 1_{[b_i,d_i]}(t),

where li=dibil_i=d_i-b_i and L=i=1NliL=\sum_{i=1}^Nl_i, based on a scale sequence scaleSeq. If N=1N=1, LL is set to 1. The evaluation method depends on the argument evaluate. Points in DD with infinite death values are ignored.

Usage

computePersistentEntropy(D, homDim, scaleSeq evaluate = "intervals")

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

scaleSeq

a numeric vector of increasing scale values used for vectorization.

evaluate

a character string indicating the evaluation method. Must be either "intervals" (default) or "points".

Details

The function extracts rows from D where the first column equals homDim, and computes values based on the filtered data and scaleSeq. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A numeric vector containing elements computed using scaleSeq={t1,t2,,tn}\{t_1,t_2,\ldots,t_n\} according to the method specified by evaluate.

  • "intervals": Computes average values of the persistent entropy summary function over intervals defined by consecutive elements in scaleSeq:

    (1Δt1t1t2S(t)dt,1Δt2t2t3S(t)dt,,1Δtn1tn1tnS(t)dt)Rn1,\Big(\frac{1}{\Delta t_1}\int_{t_1}^{t_2}S(t)dt,\frac{1}{\Delta t_2}\int_{t_2}^{t_3}S(t)dt,\ldots,\frac{1}{\Delta t_{n-1}}\int_{t_{n-1}}^{t_n}S(t)dt\Big)\in\mathbb{R}^{n-1},

    where Δtk=tk+1tk\Delta t_k=t_{k+1}-t_k.

  • "points": Computes values of the persistent entropy summary function at each point in scaleSeq:

    (S(t1),S(t2),,S(tn))Rn.(S(t_1),S(t_2),\ldots,S(t_n))\in\mathbb{R}^n.

Author(s)

Umar Islambekov

References

1. Atienza, N., Gonzalez-Díaz, R., & Soriano-Trigueros, M. (2020). On the stability of persistent entropy and new summary functions for topological data analysis. Pattern Recognition, 107, 107509.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values

# Compute a vector summary of the persistent entropy summary function for homological dimension H_0
computePersistentEntropy(D, homDim = 0, scaleSeq)

# Compute a vector summary of the persistent entropy summary function for homological dimension H_1
computePersistentEntropy(D, homDim = 1, scaleSeq)

Compute Descriptive Statistics for Births, Deaths, Midpoints, and Lifespans in a Persistence Diagram

Description

For a given persistence diagram D={(bi,di)}i=1ND=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computeStats() calculates descriptive statistics of the birth, death, midpoint (the average of birth and death), and lifespan (death minus birth) values. Additionally, it computes the total number of points and entropy of the lifespan values. Points in DD with infinite death values are ignored.

Usage

computeStats(D, homDim)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

Details

The function extracts rows from D where the first column equals homDim, and computes the mean, standard deviation, median, IQR (interquartile range), range, 10th, 25th, 75th and 90th percentiles of the birth, death, midpoint, lifespan (or persistence) values; the total number of bars (or points in the diagram) and the entropy of the lifespan values (-i=1NliLlog2(liL)\sum_{i=1}^N\frac{l_i}{L}\log_2(\frac{l_i}{L}), where li=dibil_i=d_i-b_i (lifespan) and L=i=1NliL=\sum_{i=1}^N l_i). If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A (named) 38-dimensional numeric vector containing:

  • mean_births, stddev_births, median_births, iqr_births, range_births, p10_births, p25_births, p75_births, p90_births: Descriptive statistics for birth values.

  • mean_deaths, stddev_deaths, median_deaths, iqr_deaths, range_deaths, p10_deaths, p25_deaths, p75_deaths, p90_deaths: Descriptive statistics for death values.

  • mean_midpoints, stddev_midpoints, median_midpoints, iqr_midpoints, range_midpoints, p10_midpoints, p25_midpoints, p75_midpoints, p90_midpoints: Descriptive statistics for midpoint values (mean of birth and death values).

  • mean_lifespans, stddev_lifespans, median_lifespans, iqr_lifespans, range_lifespans, p10_lifespans, p25_lifespans, p75_lifespans, p90_lifespans: Descriptive statistics for lifespan (or persistence) values (difference between death and birth values).

  • total_bars: The total number of points in the specified homological dimension.

  • entropy: The entropy of the lifespan values.

Author(s)

Umar Islambekov

References

1. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Compute statistics for homological dimension H_0
computeStats(D, homDim = 0)

# Compute statistics for homological dimension H_1
computeStats(D, homDim = 1)

Compute a Vectorization of a Persistence Diagram based on Tent Template Functions

Description

For a given persistence diagram D={(bi,di)}i=1ND=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computeTemplateFunction() computes a vectorization using a collection of tent template functions defined by

G(b,p),δ(D)=i=1Nmax{0,11δmax(bib,pip)},G_{(b,p),\delta}(D) = \sum_{i=1}^N\max \left\{ 0, 1 - \frac{1}{\delta} \max \left( |b_i - b|, |p_i - p| \right) \right\},

where pi=dibip_i=d_i-b_i (persistence), b0b\geq 0, p>0p>0 and 0<δ<p0<\delta<p. The point (b,p)(b,p) is referred to as the center. Points in DD with infinite death values are ignored.

Usage

computeTemplateFunction(D, homDim, delta, d, epsilon)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

delta

a positive scalar representing the increment size used in the computation of the template function.

d

a positive integer specifying the number of bins along each axis in the grid.

epsilon

a positive scalar indicating the vertical shift applied to the grid.

Details

The function extracts rows from D where the first column equals homDim, and computes the tent template function on a discretized grid determined by delta, d, and epsilon. The number of tent functions is controlled by d. The value of δ\delta is chosen such that the box [0,δd]×[ϵ,δd+ϵ][0, \delta d] \times [\epsilon,\delta d + \epsilon] contains all the points of the diagrams considered in the birth-persistence plane. ϵ\epsilon should be smaller than the minimum persistence value across all the diagrams under consideration. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

If homDim=0 and all the birth values are equal (e.g., zero), one-dimensional tent template functions are used instead for vectorization:

Gp,δ(D)=i=1Nmax(0,1pipδ).G_{p,\delta}(D) = \sum_{i=1}^N\max (0, 1 - \frac{|p_i - p|}{\delta}).

Value

A numeric vector of dimension (d+1)d(d+1)d, containing the values of the tent template functions centered at the grid points {(δi,δj+ϵ)}i=0,j=1d,d\{(\delta i, \delta j + \epsilon)\}_{i=0,j=1}^{d,d}:

{G(δi,δj+ϵ),δ(D)0id,1jd}.\{ G_{(\delta i, \delta j + \epsilon), \delta}(D) \mid 0 \leq i \leq d, \, 1 \leq j \leq d \}.

When one-dimensional tent template functions are used, the returned vector has a dimension of dd:

{Gδj+ϵ,δ(D)1jd}.\{ G_{\delta j + \epsilon, \delta}(D) \mid 1 \leq j \leq d \}.

Author(s)

Umar Islambekov

References

1. Perea, J.A., Munch, E. and Khasawneh, F.A., (2023). Approximating continuous functions on persistence diagrams using template functions. Foundations of Computational Mathematics, 23(4), pp.1215-1272.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Compute a vectorizaton based on tent template functions for homological dimension H_0
computeTemplateFunction(D, homDim = 0, delta = 0.1, d = 20, epsilon = 0.01)

# Compute a vectorizaton based on tent template functions for homological dimension H_1
computeTemplateFunction(D, homDim = 1, delta = 0.1, d = 9, epsilon = 0.01)

Compute Tropical Coordinates from a Persistence Diagram

Description

For a given persistence diagram D={(bi,di)}i=1ND=\{(b_i,d_i)\}_{i=1}^N (corresponding to a specified homological dimension), computeTropicalCoordinates() computes the following seven tropical coordinates based on the lifespans (or persistence) λi=dibi\lambda_i = d_i - b_i:

  • F1=maxiλiF_1 = \max_i \lambda_i.

  • F2=maxi<j(λi+λj)F_2 = \max_{i<j} (\lambda_i+\lambda_j).

  • F3=maxi<j<k(λi+λj+λk)F_3 = \max_{i<j<k} (\lambda_i+\lambda_j+\lambda_k).

  • F4=maxi<j<k<l(λi+λj+λk+λl)F_4 = \max_{i<j<k<l} (\lambda_i+\lambda_j+\lambda_k+\lambda_l).

  • F5=iλiF_5 = \sum_i \lambda_i.

  • F6=imin(rλi,bi)F_6 = \sum_i \min(r \lambda_i, b_i), where rr is a positive integer.

  • F7=j(maxi(min(rλi,bi)+λi)(min(rλj,bj)+λj))F_7 = \sum_j \big(\max_i(\min(r \lambda_i, b_i)+\lambda_i) - (\min(r \lambda_j, b_j)+\lambda_j)\big).

Points in DD with infinite death values are ignored.

Usage

computeTropicalCoordinates(D, homDim, r = 1)

Arguments

D

a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively.

homDim

the homological dimension (0 for H0H_0, 1 for H1H_1, etc.). Rows in D are filtered based on this value.

r

a positive integer used to compute F6F_6 and F7F_7. Default is 1.

Details

The function extracts rows from D where the first column equals homDim, and computes the seven tropical coordinates based on the filtered data. If D does not contain any points corresponding to homDim, a vector of zeros is returned.

Value

A (named) numeric vector (F1,F2,F3,F4,F5,F6,F7)(F_1, F_2, F_3, F_4, F_5, F_6, F_7).

Author(s)

Umar Islambekov

References

1. Kališnik, S., (2019). Tropical coordinates on the space of persistence barcodes. Foundations of Computational Mathematics, 19(1), pp.101-129.

2. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Examples

N <- 100 # The number of points to sample

set.seed(123) # Set a random seed for reproducibility

# Sample N points uniformly from the unit circle and add Gaussian noise
theta <- runif(N, min = 0, max = 2 * pi)
X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2)

# Compute the persistence diagram using the Rips filtration built on top of X
# The 'threshold' parameter specifies the maximum distance for building simplices
D <- TDAstats::calculate_homology(X, threshold = 2)

# Compute tropical coordinates for homological dimension H_0
computeTropicalCoordinates(D, homDim = 0)

# Compute tropical coordinates for homological dimension H_1
computeTropicalCoordinates(D, homDim = 1)