Time Series

Misc

AKA Time Series Classification
Types
- Euclidean Distance Approaches
- Dynamic Time Warping
- Shapelet-based
- Kernel-based
- Feature-based
Papers
- Time-series clustering – A decade review (2015)
- Comparing Clustering Approaches for Smart Meter Time Series: Investigating the Influence of Dataset Properties on Performance
  - Tested on synthetic, high-resolution time series data where the generation process was informed by the literature and domain experts.
  - Dynamic Time Warping and k-Sliding Distance algorithms consistently outperformed traditional approaches
  - When combined with hierarchical clustering using Ward’s linkage, these methods demonstrated consistent robustness across varying dataset characteristics without careful parameter tuning
- Clustering methods for Categorical Time Series and Sequences : A scoping review (2025)
  - Provides feature comparison tables for distance, feature, and model based methods with the following columns:
    - Name, Subfamily, Dependency Order, Continuous Time. Various Lengths, Missing Data, Multi Variable, Covariates, Main Data Type, and Data Volume

Preprocessing

High resoulution time series are typically noisy, so smoothing before clustering may produce better clusters.
Standardization or min-max normalization if your data is bounded

{dtwclust}

See vignette appendices from examples

Workflow

Pick one of each:
1. Clustering Method
  - Partitional
  - Hierarchical
  - TADPole
  - k-Shape
  - Fuzzy
2. Distance Algorithm (See Association, Time Series >> Distances)
  - Dynamic Time Warping (DTW)
  - Soft-DTW
  - Slope Based Distance (SBD)
  - Triangular Global Alignment Kernel (TGAK)
3. Prototype Method
  - Partition Around Medoids (PAM)
  - DTW Barycenter Averaging (DBA)
  - Shape Extraction
  - Fuzzy-Based
Not all methods, distances, prototypes are interchangeable with each other
Not all methods, distances, prototypes handle multivariate series or unequal length series
- Multivariate series are mentioned throughout
- I think this means comparing 2 different dataframes of time series but not sure (maybe examples in appendices)
The documentation for each function specifies if they use parallelization and how, but in general,
- All distances use multi-threading (RcppParallel) (see TADPole method for more details on options)
- Multi-Processing (foreach) is leveraged during clustering but no further parallelization on each node
Grid Search
- compare_clusterings
- Provides a way to efficiently grid search different settings for a clustering method or multiple methods (I think)

Clustering Methods

In addition to what’s described below, validity measures in the next section can also be used to choose the best number of clusters
Hierarchical
- hierarchical_control through tsclust( type = "hierarchical")
- Memory size needs to be a consideration. Depending on resources, might be limited to smaller datasets
- Creates a hierarchy of groups in which, as the level in the hierarchy increases, clusters are created by merging the clusters from the next lower level, such that an ordered sequence of groupings is obtained.
- A (dis)similarity measure (linkage) between groups should be specified, in addition to the one that is used to calculate pairwise similarities.
- A specific number of clusters does not need to be specified for the hierarchy to be created, and the procedure is deterministic, so it will always give the same result for a chosen set of (dis)similarity measures
- If data are easily grouped, the type of linkage used doesn’t make much of difference
- Dendrogram - a binary tree where the height of each node is proportional to the value of the inter-group dissimilarity between its two daughter nodes
  - Visually evaluate the dendrogram in order to assess the height at which the largest change in dissimilarity occurs, consequently cutting the dendrogram at said height and extracting the clusters that are created
    - 3rd level or 4th level from the top is where the disparities in segment length between the daughters start to even out
Partitional
- partitional_control through tsclust(type = "partitional")
- Number of clusters, k, needs to specified beforehand
  - Sometimes during iterations, clusters become empty, and instability created. Try lower value of k or different distance method
- Stochastic due to their random start. Thus, it is common practice to test different random starts to evaluate several local optima and choose the best result out of all the repetitions.
- Tends to produce spherical clusters, but has a lower complexity, so it can be applied to very large datasets
- Steps
  1. k centroids are randomly initialized, usually by choosing k objects from the dataset at random; these are assigned to individual clusters.
  2. The distance between all objects in the data and all centroids is calculated, and each object is assigned to the cluster of its closest centroid.
  3. A prototyping function is applied to each cluster to update the corresponding centroid.
  4. Distances and centroids are updated iteratively until a certain number of iterations have elapsed, or no object changes clusters any more
TADPole
- TADPole or tadpole_control through tsclust(type= "tadpole")
- Requires series of equal length (uses Sakoe-Chiba window)
- Memory size needs to be a consideration. Depending on resources, might be limited to smaller datasets
- Utilizes DTW lower bound method so lb_keogh or lb_improved needs to be specified
  - In order to use euclidean distance for the upper bound, symmetric1 step pattern needs to specified
- dc: cutoff distance(s). Can be a vector with several values
  - Anything below this distance is a neighbor
  - No details on how to determine the value(s) to use, may need to look at original paper
- Deterministic like hierarchical
- Uses series from the data as centroids (like PAM)
- Uses parallelization with certain specifications and will utilize all available threads
  - Use RcppParallel::setThreadOptions( to adjust unless using {doParellel}
k-Shape
- partitional_control through tsclust(type = "partitional")
- Stochastic see the different component setting descriptions for more details
- Settings
  - Partitional as the method
  - SBD as the distance measure
  - Shape extraction as the centroid function
  - z -normalization as the preprocessing step
Fuzzy
- fuzzy_control through tsclust(type = "fuzzy" )
- Soft clustering method (like Gaussian Mixture Models)
  - A series can belong to more than one cluster
    - A percentage of belongedness to each cluster

Prototypes/Average Series/Centroids

Defines a time-series that effectively summarizes the most important characteristics of all series in a given cluster
Prototypes could be used for visualization aid or to perform time-series classification
Methods
- Partition Around Medoids (PAM)
  - A series in the data whose average distance to all other objects in the same cluster is minimal
  - Possible to calculate the distance matrix once and use it for each iteration. Takes less compute time and is done by default
    - partitional_control(pam.precompute = TRUE)
    - Mentions that if done this way then the matrix is allocated to memory. So can be an issue for large datasets and the option would need to switched off.
      - When switched off, sparse matrices are used.
        
        partitional_control(pam.precompute = FALSE, pam.sparse=TRUE)
- DTW Barycenter Averaging (DBA)
  - DBA
  - With series of different lengths, either symmetric1 or symmetric2 should be used
  - Expensive unless used in conjunction with the DTW distance algorithm
  - Steps
    1. For each cluster, a series in the data is chosen at random to be the reference series (or centroid)
    2. DTW alignments are calculated wrt to the reference series
    3. For each point on the reference series, the point values from the other series that map to that point are averaged.
    4. All the calculated average points become the new reference series.
    5. Process is iterated until convergence or number of iterations is reached
- Shape Extraction
  - shape_extraction
  - Requires z-normalized in preprocessing argument
  - If lengths unequal, a reference series is chosen
  - Steps
    1. 2 series are aligned based maximizing a NCCc score
      - If unequal lengths zeros are append to the shorter series or the longer series is truncated (required for next step
    2. Aligned series are entered row-wise into a matrix and the Rayleigh Quotient (::shrugs::) is maximized to obtain the final prototype
- Fuzzy-based
  - C-Means
    - Requires series of equal lengths
    - Centroid is like a weighted average using the cluster “belongedness” weight
      \[ \mu_{c,i} = \frac{\sum_{p=1}^N u_{p,c}^m x_{p,i}}{\sum_{p=1}^N u_{p,c}^m} \]
      - \(\mu\): The i^th element of the c^th centroid
      - \(x\): The i^th element of the p^th series in the data
      - \(u\): “Belongedness” proportion of the p^th series to the c^th cluster
      - \(m\): The fuzziness exponent and should be greater than 1, with a common default value of 2
  - C-Medoids (FCMdd)
    - Centroid selection similar to PAM
    - Handles series of unequal lengths (as long as distance method permits)
      \[ \mu_c = x_q \\ q = \arg \min \sum_{p=1}^N u_{p,c}^m d_{p,c}^2 \]
      - \(\mu\): The c^th centroid
      - \(x_q\):The q^th series in the data
      - \(u\): “Belongedness” proportion of the p^th series to the c^th cluster
      - \(d\): Represents the distance between the p^th series of the data and the c^th centroid
      - See vignette for details on the optimization procedure