Title: | Vector summaries of persistence diagrams |
---|---|
Description: | Provides tools for computing various vector summaries of persistence diagrams studied in Topological Data Analysis. For improved computational efficiency, all code for the vector summaries is written in 'C++' using the 'Rcpp' and 'RcppArmadillo' packages |
Authors: | Umar Islambekov, Aleksei Luchinsky |
Maintainer: | Umar Islambekov <[email protected]> |
License: | MIT License |
Version: | 0.1.3 |
Built: | 2025-02-19 02:40:50 UTC |
Source: | https://github.com/uislambekov/tdavec |
For a given persistence diagram (corresponding to a specified homological dimension),
computeAlgebraicFunctions()
computes the following four algebraic functions based on the birth and death values:
, where
.
.
.
Points in with infinite death values are ignored.
computeAlgebraicFunctions(D, homDim)
computeAlgebraicFunctions(D, homDim)
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
The function extracts rows from D
where the first column equals homDim
, and computes the four algebraic functions based on the filtered data. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A (named) numeric vector .
Umar Islambekov
1. Adcock, A., Carlsson, E. and Carlsson, G., 2013. The ring of algebraic functions on persistence bar codes. Homology, Homotopy Appl., 18:381–402, 2016.
2. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute algebraic functions for homological dimension H_0 computeAlgebraicFunctions(D, homDim = 0) # Compute algebraic functions for homological dimension H_1 computeAlgebraicFunctions(D, homDim = 1)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute algebraic functions for homological dimension H_0 computeAlgebraicFunctions(D, homDim = 0) # Compute algebraic functions for homological dimension H_1 computeAlgebraicFunctions(D, homDim = 1)
For a given persistence diagram (corresponding to a specified homological dimension),
computeBettiCurve()
vectorizes the Betti curve
based on a scale sequence scaleSeq
. The evaluation method depends on the argument evaluate
.
computeBettiCurve(D, homDim, scaleSeq, evaluate = "intervals")
computeBettiCurve(D, homDim, scaleSeq, evaluate = "intervals")
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
scaleSeq |
a numeric vector of increasing scale values used for vectorization. |
evaluate |
a character string indicating the evaluation method. Must be either |
The function extracts rows from D
where the first column equals homDim
, and computes values based on the filtered data and scaleSeq
. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A numeric vector containing elements computed using scaleSeq
= according to the method specified by
evaluate
.
"intervals"
: Computes average values of the Betti curve over intervals defined by consecutive elements in scaleSeq
:
where .
"points"
: Computes values of the Betti curve at each point in scaleSeq
:
Umar Islambekov, Hasani Pathirana
1. Chazal, F., & Michel, B. (2021). An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists. Frontiers in Artificial Intelligence, 108.
2. Chung, Y. M., & Lawson, A. (2022). Persistence curves: A canonical framework for summarizing persistence diagrams. Advances in Computational Mathematics, 48(1), 1-42.
3. Islambekov, U., & Pathirana. H. (2024). Vector Summaries of Persistence Diagrams for Permutation-based Hypothesis Testing. Foundations of Data Science 6 (1), 41-61.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the Betti curve for homological dimension H_0 computeBettiCurve(D, homDim = 0, scaleSeq) # Compute a vector summary of the Betti curve for homological dimension H_1 computeBettiCurve(D, homDim = 1, scaleSeq)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the Betti curve for homological dimension H_0 computeBettiCurve(D, homDim = 0, scaleSeq) # Compute a vector summary of the Betti curve for homological dimension H_1 computeBettiCurve(D, homDim = 1, scaleSeq)
For a given persistence diagram (corresponding to a specified homological dimension),
computeComplexPolynomial()
computes the coefficients of a complex polynomial
where is any one of the following three functions:
,
where . Points in
with infinite death values are ignored.
computeComplexPolynomial(D, homDim, m = 1, polyType = "R")
computeComplexPolynomial(D, homDim, m = 1, polyType = "R")
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
m |
the number of coefficients to return (default is 1). Must be less than or equal to the number of points in the diagram. |
polyType |
a string specifying the polynomial type to use. Options are "R" (default), "S", or "T". |
The function extracts rows from D
where the first column equals homDim
, and computes values based on the filtered data and polyType
. If D
does not contain any points corresponding to homDim
, a matrix of zeros is returned.
A numeric matrix of dimension . Each row corresponds to a coefficient of the polynomial
:
Column 1: Real part of the coefficient.
Column 2: Imaginary part of the coefficient.
Umar Islambekov
1. Ferri, M. and Landi, C., (1999). Representing size functions by complex polynomials. Proc. Math. Met. in Pattern Recognition, 9, pp.16-19.
2. Di Fabio, B. and Ferri, M., (2015). Comparing persistence diagrams through complex vectors. In Image Analysis and Processing—ICIAP 2015: 18th International Conference, Genoa, Italy, September 7-11, 2015, Proceedings, Part I 18 (pp. 294-305). Springer International Publishing.
3. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute the first 5 coefficients of the complex polynomial of type "R" for homological dimension H_0 computeComplexPolynomial(D, homDim = 0, m = 5, polyType = "R") # Compute the first 5 coefficients of the complex polynomial of type "S" for homological dimension H_0 computeComplexPolynomial(D, homDim = 0, m = 5, polyType = "S")
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute the first 5 coefficients of the complex polynomial of type "R" for homological dimension H_0 computeComplexPolynomial(D, homDim = 0, m = 5, polyType = "R") # Compute the first 5 coefficients of the complex polynomial of type "S" for homological dimension H_0 computeComplexPolynomial(D, homDim = 0, m = 5, polyType = "S")
Vectorizes the Euler characteristic curve
where are the Betti curves corresponding to persistence diagrams
of dimensions
respectively, all computed from the same filtration. The evaluation method depends on the argument
evaluate
.
computeEulerCharacteristic(D, scaleSeq, maxhomDim = -1, evaluate = "intervals")
computeEulerCharacteristic(D, scaleSeq, maxhomDim = -1, evaluate = "intervals")
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
scaleSeq |
a numeric vector of increasing scale values used for vectorization. |
maxhomDim |
the maximum homological dimension considered (0 for |
evaluate |
a character string indicating the evaluation method. Must be either |
A numeric vector containing elements computed using scaleSeq
= according to the method specified by
evaluate
. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
"intervals"
: Computes average values of the Euler characteristic curve over intervals defined by consecutive elements in scaleSeq
:
where .
"points"
: Computes values of the Euler characteristic curve at each point in scaleSeq
:
Umar Islambekov
1. Richardson, E., & Werman, M. (2014). Efficient classification using the Euler characteristic. Pattern Recognition Letters, 49, 99-106.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the Euler characteristic curve computeEulerCharacteristic(D, scaleSeq)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the Euler characteristic curve computeEulerCharacteristic(D, scaleSeq)
Given a list of persistence diagrams, computeLimits()
computes the extreme values of birth, death, and persistence across all diagrams for a specified homological dimension. Points with infinite death values are ignored.
computeLimits(Dlist, homDim)
computeLimits(Dlist, homDim)
Dlist |
a list of persistence diagrams, where each diagram is a matrix containing three columns representing homological dimension, birth, and death values. |
homDim |
the homological dimension (0 for |
A (named) numeric vector containing:
minB
: the minimum birth value across all diagrams.
maxB
: the maximum birth value across all diagrams.
maxD
: the maximum death value across all diagrams.
minP
: the minimum persistence value across all diagrams.
maxP
: the maximum persistence value across all diagrams.
Umar Islambekov
set.seed(123) N <- 100 # The size of point clouds nD <- 50 # The number of persistence diagrams Dlist <- list() for (i in 1:nD){ # sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices Dlist[[i]] <- TDAstats::calculate_homology(X, threshold = 2) } # Compute the extreme values of birth, death, and persistence across all 50 diagrams for homological dimension H_0 computeLimits(Dlist, homDim = 0) # Compute the extreme values of birth, death, and persistence across all 50 diagrams for homological dimension H_1 computeLimits(Dlist, homDim = 1)
set.seed(123) N <- 100 # The size of point clouds nD <- 50 # The number of persistence diagrams Dlist <- list() for (i in 1:nD){ # sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices Dlist[[i]] <- TDAstats::calculate_homology(X, threshold = 2) } # Compute the extreme values of birth, death, and persistence across all 50 diagrams for homological dimension H_0 computeLimits(Dlist, homDim = 0) # Compute the extreme values of birth, death, and persistence across all 50 diagrams for homological dimension H_1 computeLimits(Dlist, homDim = 1)
For a given persistence diagram (corresponding to a specified homological dimension),
computeNormalizedLife()
vectorizes the normalized life curve
where , based on a scale sequence
scaleSeq
. The evaluation method depends on the argument evaluate
. Points in with infinite death values are ignored.
computeNormalizedLife(D, homDim, scaleSeq, evaluate = "intervals")
computeNormalizedLife(D, homDim, scaleSeq, evaluate = "intervals")
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
scaleSeq |
a numeric vector of increasing scale values used for vectorization. |
evaluate |
a character string indicating the evaluation method. Must be either |
The function extracts rows from D
where the first column equals homDim
, and computes values based on the filtered data and scaleSeq
. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A numeric vector containing elements computed using scaleSeq
= according to the method specified by
evaluate
.
"intervals"
: Computes average values of the normalized life curve over intervals defined by consecutive elements in scaleSeq
:
where .
"points"
: Computes values of the normalized life curve at each point in scaleSeq
:
Umar Islambekov
Chung, Y. M., & Lawson, A. (2022). Persistence curves: A canonical framework for summarizing persistence diagrams. Advances in Computational Mathematics, 48(1), 1-42.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the normalized life curve for homological dimension H_0 computeNormalizedLife(D, homDim = 0, scaleSeq) # Compute a vector summary of the normalized life curve for homological dimension H_1 computeNormalizedLife(D, homDim = 1, scaleSeq)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the normalized life curve for homological dimension H_0 computeNormalizedLife(D, homDim = 0, scaleSeq) # Compute a vector summary of the normalized life curve for homological dimension H_1 computeNormalizedLife(D, homDim = 1, scaleSeq)
For a given persistence diagram (corresponding to a specified homological dimension),
computePersistenceBlock()
vectorizes the persistence block
where and
with
. Points in
with infinite persistence values are ignored.
computePersistenceBlock(D, homDim, xSeq, ySeq, tau=0.3)
computePersistenceBlock(D, homDim, xSeq, ySeq, tau=0.3)
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and persistence values respectively. |
homDim |
the homological dimension (0 for |
xSeq |
a numeric vector of increasing x (birth) values used for vectorization. |
ySeq |
a numeric vector of increasing y (persistence) values used for vectorization. |
tau |
a parameter (between 0 and 1) controlling block sizes. Default is |
The function extracts rows from D
where the first column equals homDim
, and computes values based on the filtered data, xSeq
and ySeq
. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A numeric vector whose elements are the weighted averages of the persistence block computed over each cell of the two-dimensional grid constructred from xSeq
= and
ySeq
=:
where ,
,
and
If homDim=0
and all the birth values are equal (e.g., zero), univariate persistence block functions are used instead for vectorization:
where and
Umar Islambekov, Aleksei Luchinsky
1. Chan, K. C., Islambekov, U., Luchinsky, A., & Sanders, R. (2022). A computationally efficient framework for vector representation of persistence diagrams. Journal of Machine Learning Research 23, 1-33.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Switch from the birth-death to the birth-persistence coordinates D[,3] <- D[,3] - D[,2] colnames(D)[3] <- "Persistence" # Construct one-dimensional grid of scale values ySeqH0 <- unique(quantile(D[D[,1] == 0, 3], probs = seq(0, 1, by = 0.2))) tau <- 0.3 # Parameter in [0,1] which controls the size of blocks around each point of the diagram # Compute a vector summary of the persistence block for homological dimension H_0 computePersistenceBlock(D, homDim = 0, xSeq = NA, ySeq = ySeqH0, tau = tau) xSeqH1 <- unique(quantile(D[D[,1] == 1, 2], probs = seq(0, 1, by = 0.2))) ySeqH1 <- unique(quantile(D[D[,1] == 1, 3], probs = seq(0, 1, by = 0.2))) # Compute a vector summary of the persistence block for homological dimension H_1 computePersistenceBlock(D, homDim = 1, xSeq = xSeqH1, ySeq = ySeqH1, tau = tau)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Switch from the birth-death to the birth-persistence coordinates D[,3] <- D[,3] - D[,2] colnames(D)[3] <- "Persistence" # Construct one-dimensional grid of scale values ySeqH0 <- unique(quantile(D[D[,1] == 0, 3], probs = seq(0, 1, by = 0.2))) tau <- 0.3 # Parameter in [0,1] which controls the size of blocks around each point of the diagram # Compute a vector summary of the persistence block for homological dimension H_0 computePersistenceBlock(D, homDim = 0, xSeq = NA, ySeq = ySeqH0, tau = tau) xSeqH1 <- unique(quantile(D[D[,1] == 1, 2], probs = seq(0, 1, by = 0.2))) ySeqH1 <- unique(quantile(D[D[,1] == 1, 3], probs = seq(0, 1, by = 0.2))) # Compute a vector summary of the persistence block for homological dimension H_1 computePersistenceBlock(D, homDim = 1, xSeq = xSeqH1, ySeq = ySeqH1, tau = tau)
For a given persistence diagram (corresponding to a specified homological dimension),
computePersistenceImage()
computes the persistence image - a vector summary of the persistence surface:
where is
the Gaussian distribution with mean
and
covariance matrix
and
is the weighting function with being the maximum persistence value among all persistence diagrams considered in the experiment. Points in
with infinite persistence values are ignored.
computePersistenceImage(D, homDim, xSeq, ySeq, sigma)
computePersistenceImage(D, homDim, xSeq, ySeq, sigma)
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and persistence values respectively. |
homDim |
the homological dimension (0 for |
xSeq |
a numeric vector of increasing x (birth) values used for vectorization. |
ySeq |
a numeric vector of increasing y (persistence) values used for vectorization. |
sigma |
a standard deviation ( |
The function extracts rows from D
where the first column equals homDim
, and computes values based on the filtered data, xSeq
and ySeq
. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A numeric vector whose elements are the average values of the persistence surface computed over each cell of the two-dimensional grid constructred from xSeq
= and
ySeq
=:
where ,
,
and
If homDim=0
and all the birth values are equal (e.g., zero), univariate Gaussians are used instead for vectorization:
where and
Umar Islambekov
1. Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., ... & Ziegelmeier, L. (2017). Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Switch from the birth-death to the birth-persistence coordinates D[,3] <- D[,3] - D[,2] colnames(D)[3] <- "Persistence" resB <- 5 # Resolution (or grid size) along the birth axis resP <- 5 # Resolution (or grid size) along the persistence axis # Compute persistence image for homological dimension H_0 minPH0 <- min(D[D[,1]==0,3]); maxPH0 <- max(D[D[,1]==0,3]) ySeqH0 <- seq(minPH0, maxPH0, length.out = resP + 1) sigma <- 0.5 * (maxPH0 - minPH0) / resP computePersistenceImage(D, homDim = 0, xSeq = NA, ySeq = ySeqH0, sigma = sigma) # Compute persistence image for homological dimension H_1 minBH1 <- min(D[D[,1]==1,2]); maxBH1 <- max(D[D[,1]==1,2]) minPH1 <- min(D[D[,1]==1,3]); maxPH1 <- max(D[D[,1]==1,3]) xSeqH1 <- seq(minBH1, maxBH1, length.out = resB + 1) ySeqH1 <- seq(minPH1, maxPH1, length.out = resP + 1) sigma <- 0.5 * (maxPH1 - minPH1) / resP computePersistenceImage(D, homDim = 1, xSeq = xSeqH1, ySeq = ySeqH1, sigma = sigma)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Switch from the birth-death to the birth-persistence coordinates D[,3] <- D[,3] - D[,2] colnames(D)[3] <- "Persistence" resB <- 5 # Resolution (or grid size) along the birth axis resP <- 5 # Resolution (or grid size) along the persistence axis # Compute persistence image for homological dimension H_0 minPH0 <- min(D[D[,1]==0,3]); maxPH0 <- max(D[D[,1]==0,3]) ySeqH0 <- seq(minPH0, maxPH0, length.out = resP + 1) sigma <- 0.5 * (maxPH0 - minPH0) / resP computePersistenceImage(D, homDim = 0, xSeq = NA, ySeq = ySeqH0, sigma = sigma) # Compute persistence image for homological dimension H_1 minBH1 <- min(D[D[,1]==1,2]); maxBH1 <- max(D[D[,1]==1,2]) minPH1 <- min(D[D[,1]==1,3]); maxPH1 <- max(D[D[,1]==1,3]) xSeqH1 <- seq(minBH1, maxBH1, length.out = resB + 1) ySeqH1 <- seq(minPH1, maxPH1, length.out = resP + 1) sigma <- 0.5 * (maxPH1 - minPH1) / resP computePersistenceImage(D, homDim = 1, xSeq = xSeqH1, ySeq = ySeqH1, sigma = sigma)
For a given persistence diagram (corresponding to a specified homological dimension),
computePersistenceLandscape()
vectorizes the first persistence landscape functions
where returns the
th largest value and
based on a scale sequence scaleSeq
. For generalized persistence landscape functions, we instead take
where is a kernel function with the bandwidth parameter
computePersistenceLandscape(D, homDim, scaleSeq, k = 1, generalized = FALSE, kernel = "triangle", h = NULL)
computePersistenceLandscape(D, homDim, scaleSeq, k = 1, generalized = FALSE, kernel = "triangle", h = NULL)
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
scaleSeq |
a numeric vector of increasing scale values used for vectorization. |
k |
an integer specifying the number of persistence landscape functions to consider (default is 1). Must be less than or equal to the number of points in the diagram. |
generalized |
a logical value indicating whether to use a generalized persistence landscape or not. Default is FALSE. |
kernel |
a string specifying the kernel type to use if |
h |
a positive numeric value specifying the bandwidth for the kernel. Must be provided if |
The function extracts rows from D
where the first column equals homDim
, and computes values based on the filtered data and scaleSeq
. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned. If generalized = TRUE
, three different kernel functions are currently supported:
Triangle kernel:
Epanechnikov kernel:
Tricubic kernel:
A matrix where the th column contains the values of the
th order persistence landscape function evaluated at each point of
scaleSeq
=:
Umar Islambekov
1. Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research, 16(1), 77-102.
2. Chazal, F., Fasy, B. T., Lecci, F., Rinaldo, A., & Wasserman, L. (2014, June). Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the thirtieth annual symposium on Computational geometry (pp. 474-483).
3. Berry, E., Chen, Y. C., Cisewski-Kehe, J., & Fasy, B. T. (2020). Functional summaries of persistence diagrams. Journal of Applied and Computational Topology, 4(2):211–262.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute vector summaries of the first three persistence landscape functions for homological dimension H_1 computePersistenceLandscape(D, homDim = 1, scaleSeq, k = 3) # Compute vector summaries of the first three generalized persistence landscape functions (with triangle kernel and bandwidth h=0.2) for homological dimension H_1 computePersistenceLandscape(D, homDim = 1, scaleSeq, generalized = TRUE, k = 3, h = 0.2)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute vector summaries of the first three persistence landscape functions for homological dimension H_1 computePersistenceLandscape(D, homDim = 1, scaleSeq, k = 3) # Compute vector summaries of the first three generalized persistence landscape functions (with triangle kernel and bandwidth h=0.2) for homological dimension H_1 computePersistenceLandscape(D, homDim = 1, scaleSeq, generalized = TRUE, k = 3, h = 0.2)
For a given persistence diagram (corresponding to a specified homological dimension),
computePersistenceSilhouette()
vectorizes the th power persistence silhouette function
where
based on a scale sequence scaleSeq
. The evaluation method depends on the argument evaluate
. Points in with infinite death values are ignored.
computePersistenceSilhouette(D, homDim, scaleSeq, p = 1.0, evaluate = "intervals")
computePersistenceSilhouette(D, homDim, scaleSeq, p = 1.0, evaluate = "intervals")
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
scaleSeq |
a numeric vector of increasing scale values used for vectorization. |
p |
power of the weights for the silhouette function. By default, |
evaluate |
a character string indicating the evaluation method. Must be either |
The function extracts rows from D
where the first column equals homDim
, and computes values based on the filtered data and scaleSeq
. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A numeric vector containing elements computed using scaleSeq
= according to the method specified by
evaluate
.
"intervals"
: Computes average values of the persistence silhouette function over intervals defined by consecutive elements in scaleSeq
:
where .
"points"
: Computes values of the persistence silhouette function at each point in scaleSeq
:
Umar Islambekov
1. Chazal, F., Fasy, B. T., Lecci, F., Rinaldo, A., & Wasserman, L. (2014). Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the thirtieth annual symposium on Computational geometry (pp. 474-483).
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the persistence silhouette (with p=1) for homological dimension H_0 computePersistenceSilhouette(D, homDim = 0, scaleSeq) # Compute a vector summary of the persistence silhouette (with p=1) for homological dimension H_1 computePersistenceSilhouette(D, homDim = 1, scaleSeq)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the persistence silhouette (with p=1) for homological dimension H_0 computePersistenceSilhouette(D, homDim = 0, scaleSeq) # Compute a vector summary of the persistence silhouette (with p=1) for homological dimension H_1 computePersistenceSilhouette(D, homDim = 1, scaleSeq)
For a given persistence diagram (corresponding to a specified homological dimension),
computePersistentEntropy()
vectorizes the persistent entropy summary function
where and
, based on a scale sequence
scaleSeq
. If ,
is set to 1. The evaluation method depends on the argument
evaluate
. Points in with infinite death values are ignored.
computePersistentEntropy(D, homDim, scaleSeq evaluate = "intervals")
computePersistentEntropy(D, homDim, scaleSeq evaluate = "intervals")
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
scaleSeq |
a numeric vector of increasing scale values used for vectorization. |
evaluate |
a character string indicating the evaluation method. Must be either |
The function extracts rows from D
where the first column equals homDim
, and computes values based on the filtered data and scaleSeq
. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A numeric vector containing elements computed using scaleSeq
= according to the method specified by
evaluate
.
"intervals"
: Computes average values of the persistent entropy summary function over intervals defined by consecutive elements in scaleSeq
:
where .
"points"
: Computes values of the persistent entropy summary function at each point in scaleSeq
:
Umar Islambekov
1. Atienza, N., Gonzalez-Díaz, R., & Soriano-Trigueros, M. (2020). On the stability of persistent entropy and new summary functions for topological data analysis. Pattern Recognition, 107, 107509.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the persistent entropy summary function for homological dimension H_0 computePersistentEntropy(D, homDim = 0, scaleSeq) # Compute a vector summary of the persistent entropy summary function for homological dimension H_1 computePersistentEntropy(D, homDim = 1, scaleSeq)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) scaleSeq = seq(0, 2, length.out = 11) # A sequence of scale values # Compute a vector summary of the persistent entropy summary function for homological dimension H_0 computePersistentEntropy(D, homDim = 0, scaleSeq) # Compute a vector summary of the persistent entropy summary function for homological dimension H_1 computePersistentEntropy(D, homDim = 1, scaleSeq)
For a given persistence diagram (corresponding to a specified homological dimension),
computeStats()
calculates descriptive statistics of the birth, death, midpoint (the average of birth and death), and lifespan (death minus birth) values. Additionally, it computes the total number of points and entropy of the lifespan values. Points in with infinite death values are ignored.
computeStats(D, homDim)
computeStats(D, homDim)
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
The function extracts rows from D
where the first column equals homDim
, and computes the mean, standard deviation, median, IQR (interquartile range), range, 10th, 25th, 75th and 90th percentiles of the birth, death, midpoint, lifespan (or persistence) values; the total number of bars (or points in the diagram) and the entropy of the lifespan values (-, where
(lifespan) and
). If
D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A (named) 38-dimensional numeric vector containing:
mean_births
, stddev_births
, median_births
, iqr_births
, range_births
, p10_births
, p25_births
, p75_births
, p90_births
: Descriptive statistics for birth values.
mean_deaths
, stddev_deaths
, median_deaths
, iqr_deaths
, range_deaths
, p10_deaths
, p25_deaths
, p75_deaths
, p90_deaths
: Descriptive statistics for death values.
mean_midpoints
, stddev_midpoints
, median_midpoints
, iqr_midpoints
, range_midpoints
, p10_midpoints
, p25_midpoints
, p75_midpoints
, p90_midpoints
: Descriptive statistics for midpoint values (mean of birth and death values).
mean_lifespans
, stddev_lifespans
, median_lifespans
, iqr_lifespans
, range_lifespans
, p10_lifespans
, p25_lifespans
, p75_lifespans
, p90_lifespans
: Descriptive statistics for lifespan (or persistence) values (difference between death and birth values).
total_bars
: The total number of points in the specified homological dimension.
entropy
: The entropy of the lifespan values.
Umar Islambekov
1. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute statistics for homological dimension H_0 computeStats(D, homDim = 0) # Compute statistics for homological dimension H_1 computeStats(D, homDim = 1)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute statistics for homological dimension H_0 computeStats(D, homDim = 0) # Compute statistics for homological dimension H_1 computeStats(D, homDim = 1)
For a given persistence diagram (corresponding to a specified homological dimension),
computeTemplateFunction()
computes a vectorization using a collection of tent template functions defined by
where (persistence),
,
and
. The point
is referred to as the center. Points in
with infinite death values are ignored.
computeTemplateFunction(D, homDim, delta, d, epsilon)
computeTemplateFunction(D, homDim, delta, d, epsilon)
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
delta |
a positive scalar representing the increment size used in the computation of the template function. |
d |
a positive integer specifying the number of bins along each axis in the grid. |
epsilon |
a positive scalar indicating the vertical shift applied to the grid. |
The function extracts rows from D
where the first column equals homDim
, and computes the tent template function on a discretized grid determined by delta
, d
, and epsilon
. The number of tent functions is controlled by d
. The value of is chosen such that the box
contains all the points of the diagrams considered in the birth-persistence plane.
should be smaller than the minimum persistence value across all the diagrams under consideration. If
D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
If homDim=0
and all the birth values are equal (e.g., zero), one-dimensional tent template functions are used instead for vectorization:
A numeric vector of dimension , containing the values of the tent template functions centered at the grid points
:
When one-dimensional tent template functions are used, the returned vector has a dimension of :
Umar Islambekov
1. Perea, J.A., Munch, E. and Khasawneh, F.A., (2023). Approximating continuous functions on persistence diagrams using template functions. Foundations of Computational Mathematics, 23(4), pp.1215-1272.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute a vectorizaton based on tent template functions for homological dimension H_0 computeTemplateFunction(D, homDim = 0, delta = 0.1, d = 20, epsilon = 0.01) # Compute a vectorizaton based on tent template functions for homological dimension H_1 computeTemplateFunction(D, homDim = 1, delta = 0.1, d = 9, epsilon = 0.01)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute a vectorizaton based on tent template functions for homological dimension H_0 computeTemplateFunction(D, homDim = 0, delta = 0.1, d = 20, epsilon = 0.01) # Compute a vectorizaton based on tent template functions for homological dimension H_1 computeTemplateFunction(D, homDim = 1, delta = 0.1, d = 9, epsilon = 0.01)
For a given persistence diagram (corresponding to a specified homological dimension),
computeTropicalCoordinates()
computes the following seven tropical coordinates based on the lifespans (or persistence) :
.
.
.
.
.
, where
is a positive integer.
.
Points in with infinite death values are ignored.
computeTropicalCoordinates(D, homDim, r = 1)
computeTropicalCoordinates(D, homDim, r = 1)
D |
a persistence diagram: a matrix with three columns containing the homological dimension, birth and death values respectively. |
homDim |
the homological dimension (0 for |
r |
a positive integer used to compute |
The function extracts rows from D
where the first column equals homDim
, and computes the seven tropical coordinates based on the filtered data. If D
does not contain any points corresponding to homDim
, a vector of zeros is returned.
A (named) numeric vector .
Umar Islambekov
1. Kališnik, S., (2019). Tropical coordinates on the space of persistence barcodes. Foundations of Computational Mathematics, 19(1), pp.101-129.
2. Ali, D., Asaad, A., Jimenez, M.J., Nanda, V., Paluzo-Hidalgo, E. and Soriano-Trigueros, M., (2023). A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute tropical coordinates for homological dimension H_0 computeTropicalCoordinates(D, homDim = 0) # Compute tropical coordinates for homological dimension H_1 computeTropicalCoordinates(D, homDim = 1)
N <- 100 # The number of points to sample set.seed(123) # Set a random seed for reproducibility # Sample N points uniformly from the unit circle and add Gaussian noise theta <- runif(N, min = 0, max = 2 * pi) X <- cbind(cos(theta), sin(theta)) + rnorm(2 * N, mean = 0, sd = 0.2) # Compute the persistence diagram using the Rips filtration built on top of X # The 'threshold' parameter specifies the maximum distance for building simplices D <- TDAstats::calculate_homology(X, threshold = 2) # Compute tropical coordinates for homological dimension H_0 computeTropicalCoordinates(D, homDim = 0) # Compute tropical coordinates for homological dimension H_1 computeTropicalCoordinates(D, homDim = 1)