Time Series

Misc

  • AKA Time Series Classification
  • Methods
    • Euclidean Distance Approaches
    • Dynamic Time Warping
    • Shapelet-based
    • Kernel-based
    • Feature-based
  • Resources
  • High resoulution time series are typically noisy, so smoothing before clustering may produce better clusters.

Cluster Validity Indices (CVI)

  • Notes from {dtwclust} vignette
  • Types
    • For either Hard (aka Crisp) or Soft (aka Fuzzy) Partitioning
    • Characteristics
      • Internal - Try to define a measure of cluster purity
      • External - Compare the obtained partition to the correct one. Thus, external CVIs can only be used if the ground truth is known
      • Relative - ?
    • Scores
      • Available through dtwclust::cvi
      • Global centroid is one that’s computed using the whole dataset
        • dtwclust implements whichever distance method that originally used in the clustering computation
      • Some CVIs require symmetric distance functions (distance from a to b = distance b to a)
        • A warning is printed if an asymmetric distance method was used
      • {clue} - Compare repetitions of non-deterministic clustering methods (e.g. partitional) where random element means you get a different result each time
        • It uses a measure called “dissimilarities using minimal Euclidean membership distance” to compare different runs of a cluster method

{dtwclust}

  • See vignette appendices from examples

Workflow

  • Pick one of each:
    1. Clustering Method
      • Partitional
      • Hierarchical
      • TADPole
      • k-Shape
      • Fuzzy
    2. Distance Algorithm (See Association, Time Series >> Distances)
      • Dynamic Time Warping (DTW)
      • Soft-DTW
      • Slope Based Distance (SBD)
      • Triangular Global Alignment Kernel  (TGAK)
    3. Prototype Method
      • Partition Around Medoids (PAM)
      • DTW Barycenter Averaging (DBA)
      • Shape Extraction
      • Fuzzy-Based
  • Not all methods, distances, prototypes are interchangeable with each other
  • Not all methods, distances, prototypes handle multivariate series or unequal length series
    • Multivariate series are mentioned throughout
    • I think this means comparing 2 different dataframes of time series but not sure (maybe examples in appendices)
  • The documentation for each function specifies if they use parallelization and how, but in general,
    • All distances use multi-threading (RcppParallel) (see TADPole method for more details on options)
    • Multi-Processing (foreach) is leveraged during clustering but no further parallelization on each node
  • Grid Search
    • compare_clusterings
    • Provides a way to efficiently grid search different settings for a clustering method or multiple methods (I think)

Clustering Methods

  • In addition to what’s described below, validity measures in the next section can also be used to choose the best number of clusters
  • Hierarchical
    • hierarchical_control through tsclust( type = "hierarchical")
    • Memory size needs to be a consideration. Depending on resources, might be limited to smaller datasets
    • Creates a hierarchy of groups in which, as the level in the hierarchy increases, clusters are created by merging the clusters from the next lower level, such that an ordered sequence of groupings is obtained.
    • A (dis)similarity measure (linkage) between groups should be specified, in addition to the one that is used to calculate pairwise similarities.
    • A specific number of clusters does not need to be specified for the hierarchy to be created, and the procedure is deterministic, so it will always give the same result for a chosen set of (dis)similarity measures
    • If data are easily grouped, the type of linkage used doesn’t make much of difference
    • Dendrogram - a binary tree where the height of each node is proportional to the value of the inter-group dissimilarity between its two daughter nodes
      • Visually evaluate the dendrogram in order to assess the height at which the largest change in dissimilarity occurs, consequently cutting the dendrogram at said height and extracting the clusters that are created

        • 3rd level or 4th level from the top is where the disparities in segment length between the daughters start to even out
  • Partitional
    • partitional_control through tsclust(type = "partitional")
    • Number of clusters, k, needs to specified beforehand
      • Sometimes during iterations, clusters become empty, and instability created. Try lower value of k or different distance method
    • Stochastic due to their random start. Thus, it is common practice to test different random starts to evaluate several local optima and choose the best result out of all the repetitions.
    • Tends to produce spherical clusters, but has a lower complexity, so it can be applied to very large datasets
    • Steps
      1. k centroids are randomly initialized, usually by choosing k objects from the dataset at random; these are assigned to individual clusters.
      2. The distance between all objects in the data and all centroids is calculated, and each object is assigned to the cluster of its closest centroid.
      3. A prototyping function is applied to each cluster to update the corresponding centroid.
      4. Distances and centroids are updated iteratively until a certain number of iterations have elapsed, or no object changes clusters any more
  • TADPole
    • TADPole or tadpole_control through tsclust(type= "tadpole")
    • Requires series of equal length (uses Sakoe-Chiba window)
    • Memory size needs to be a consideration. Depending on resources, might be limited to smaller datasets
    • Utilizes DTW lower bound method so lb_keogh or lb_improved needs to be specified
      • In order to use euclidean distance for the upper bound, symmetric1 step pattern needs to specified
    • dc:  cutoff distance(s). Can be a vector with several values
      • Anything below this distance is a neighbor
      • No details on how to determine the value(s) to use, may need to look at original paper
    • Deterministic like hierarchical
    • Uses series from the data as centroids (like PAM)
    • Uses parallelization with certain specifications and will utilize all available threads
      • Use RcppParallel::setThreadOptions( to adjust unless using {doParellel}
  • k-Shape
    • partitional_control through tsclust(type = "partitional")
    • Stochastic see the different component setting descriptions for more details
    • Settings
      • Partitional as the method
      • SBD as the distance measure
      • Shape extraction as the centroid function
      • z -normalization as the preprocessing step
  • Fuzzy
    • fuzzy_control through tsclust(type = "fuzzy" )
    • Soft clustering method (like Gaussian Mixture Models)
      • A series can belong to more than one cluster
        • A percentage of belongedness to each cluster

Prototypes/Average Series/Centroids

  • Defines a time-series that effectively summarizes the most important characteristics of all series in a given cluster
  • Prototypes could be used for visualization aid or to perform time-series classification
  • Methods
    • Partition Around Medoids (PAM)
      • A series in the data whose average distance to all other objects in the same cluster is minimal
      • Possible to calculate the distance matrix once and use it for each iteration. Takes less compute time and is done by default
        • partitional_control(pam.precompute = TRUE)
        • Mentions that if done this way then the matrix is allocated to memory. So can be an issue for large datasets and the option would need to switched off.
          • When switched off, sparse matrices are used.
            • partitional_control(pam.precompute = FALSE, pam.sparse=TRUE)
    • DTW Barycenter Averaging (DBA)
      • DBA
      • With series of different lengths, either symmetric1 or symmetric2 should be used
      • Expensive unless used in conjunction with the DTW distance algorithm
      • Steps
        1. For each cluster, a series in the data is chosen at random to be the reference series (or centroid)
        2. DTW alignments are calculated wrt to the reference series
        3. For each point on the reference series, the point values from the other series that map to that point are averaged.
        4. All the calculated average points become the new reference series.
        5. Process is iterated until convergence or number of iterations is reached
    • Shape Extraction
      • shape_extraction
      • Requires z-normalized in preprocessing argument
      • If lengths unequal, a reference series is chosen
      • Steps
        1. 2 series are aligned based maximizing a NCCc score
          • If unequal lengths zeros are append to the shorter series or the longer series is truncated (required for next step
        2. Aligned series are entered row-wise into a matrix and the Rayleigh Quotient (::shrugs::) is maximized to obtain the final prototype
    • Fuzzy-based
      • C-Means
        • Requires series of equal lengths
        • Centroid is like a weighted average using the cluster “belongedness” weight
          \[ \mu_{c,i} = \frac{\sum_{p=1}^N u_{p,c}^m x_{p,i}}{\sum_{p=1}^N u_{p,c}^m} \]
          • \(\mu\): The ith element of the cth centroid
          • \(x\): The ith element of the pth series in the data
          • \(u\): “Belongedness” proportion of the pth series to the cth cluster
          • \(m\): The fuzziness exponent and should be greater than 1, with a common default value of 2
      • C-Medoids (FCMdd)
        • Centroid selection similar to PAM
        • Handles series of unequal lengths (as long as distance method permits)
          \[ \mu_c = x_q \\ q = \arg \min \sum_{p=1}^N u_{p,c}^m d_{p,c}^2 \]
          • \(\mu\): The cth centroid
          • \(x_q\):The qth series in the data
          • \(u\): “Belongedness” proportion of the pth series to the cth cluster
          • \(d\): Represents the distance between the pth series of the data and the cth centroid
          • See vignette for details on the optimization procedure