General

Misc

Also see
- Notebook, pgs 57-58
- Diagnostics, Clustering
Packages
- {clustord} - Cluster Ordinal Data via Proportional Odds or Ordered Stereotype
- {FPDclustering} - Probabilistic distance clustering (PD-clustering) is an iterative, distribution-free, probabilistic clustering method.
  - PD-clustering assigns units to a cluster according to their probability of membership under the constraint that the product of the probability and the distance of each point to any cluster center is a constant.
  - PD-clustering is a flexible method that can be used with elliptical clusters, outliers, or noisy data.
- {MSCA} - Clustering of Multiple Censored Time-to-Event Endpoints
  - Computes Jaccard distances between instances described by one or more records of events
- {anticlust} - Partitions a pool of elements into clusters (or anticlusters) with the goal of achieving high between-cluster similarity and high within-cluster heterogeneity.
- {PLNmodels} - Poisson Lognormal Models
  - Clustering of multivariate count data with PLN-mixture
- {qVarSel} - Select Variables for Optimal Clustering
  - Implements an optimization method to find the variables that most separate units between clusters. In this way, masking variables can be discarded from the data frame and the clustering is more accurate.
- {torch-deeptype} (Paper, Article) - Uses both supervised and unsupervised components to find subclusters within known clusters.
  - The supervised part guides the learning toward known clusters/groups, while the clustering component is free to subdivide these into more distinct, homogeneous groups — subject to maintaining classification performance.
Resources
- Applied Machine Learning Using mlr3 in R, Ch.13.4
“Clustering algorithms always find clusters if you tune their similarity and grouping enough. Unfortunately, clusters by themselves have no inherent meaning. Only understanding the data and the algorithm can lend meaning to the clusters. To make a clustering project well defined, start by knowing what the data means. It’s especially important to know about the process that generates the data points. Based on understanding, you can form a hypothesis about the relationships that might exist between points. An algorithm with thoughtfully chosen parameters can either confirm or deny that hypothesis.” (source)
For static data, i.e., if the values do not change with time, clustering methods are usually divided into five major categories:
1. Partitioning (or Partitional)
2. Hierarchical
3. Density-Based
4. Grid-Based
5. Model-Based Methods
Biclustering (aka Two-Mode Clustering)
- Simultaneously clusters the rows and columns of an (\(n×p\))-dimensional data matrix \(Y\), with rows associated to \(n\) statistical units and columns consisting of \(p\) ordinal outcomes.
- Packages
  - {biclustermd} - Biclustering with Missing Data
  - {MixSim} - Simulating Data to Study Performance of Clustering Algorithms
- Papers
  - A Bayesian Model for Co-clustering Ordinal Data with Informative Missing Entries (no code yet)
- Use Cases
  - Analysis of political votes
    - The voting records of \(n\) politicians on \(p\) legislative bills are only available for the sessions they attended, where absence from a session may represent a deliberate political strategy.
  - Recommendation systems
    - The data consists of customer (\(n\)) ratings on \(p\) items.
    - In the well-known Netflix Prize data, missingness arises when users choose not to rate certain movies, potentially reflecting their lack of interest, preference, or other underlying factors

Terms

Cluster Centroid - The middle of a cluster. A centroid is a vector that contains one number for each variable, where each number is the mean of a variable for the observations in that cluster. The centroid can be thought of as the multi-dimensional average of the cluster.
Hard (or Crisp) Clustering - each point belongs to a single cluster
Soft (or Fuzzy) Clustering - each point is allowed to belong to multiple clusters

Cluster Descriptions

Packages
- {parameters} - provides various functions for describing, analyzing, and visualizing clusters for various methods
- {clustereval} - compute the statistical association between the features and the detected cluster labels and whether they are significant.
  - Categorical: Chi-Square, Fisher’s Exact, or Hypergeometric tests
  - Continuous: Mann-Whitney-U test
Examine variable values at the centroids of each cluster
- A higher absolute value indicates that a certain variable characteristic is more pronounced within that specific cluster (as compared to other cluster groups with lower absolute mean values).
Distributional statistics for each cluster
- Numeric variables: mean and sd for each variable in that cluster
- Categorical variables:
  - binary: percent where event = 1
  - multinomial: most prominent category
Run a decision tree on clusters
- Each color (orange, blue, green, purple) represents a cluster
- Explains how clusters were generated
- Issue
  - Suboptimal Labels: A depth of 3 might be sufficient to fully represent the cluster boundaries, but CART might only achieve 100% accuracy when using a depth of 5 or 6. This is because of the greedy nature of the decision tree algorithms.
- {treeheatr}
Radar charts
- 3 clusters: blue (highlighted), red, green
- Guessing the mean values for each variable are the points
Scatter
- Use clustering variables of interest for a scatter plot then label the points with cluster id
  - See K-Means >> Example 2 for code
Interpetable Clustering
- Notes from Introduction to Interpretable Clustering
  - Overview of methods with references to papers
- Instead of fitting a classifier in a supervised manner after clustering, clusters are directly constructed using a model.
- Unsupervised Clustering Trees - Trees are constructed by optimizing a variety of clustering metrics through various optimization algorithms
- Others: Specialized Neural Networks, Polytope Machines, and Rectangular Rules

Gaussian Mixture Models (GMM)

Misc
Packages
- {mclust} - Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation
  
  EBook: Model-Based Clustering, Classification, and Density Estimation Using mclust in R
  
  JOSS: Model-based Methods of Classification: Using the mclust Software in Chemometrics
  
  Website: mclust.org
  
  Lists other packages from the group that are associated with GMMs.
  
  Hands-On Machine Learning with R, Ch. 22 (mclust tutorial)
- {mixtools} (Vignette) - Provides computational techniques for finite mixture model analysis in which components are regressions, multinomial vectors arising from discretization of multivariate data, or even distributions that are almost completely unspecified.
  
  Mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma)
  
  Various Reliability Mixture Models (RMMs),
  
  Mixtures-of-Regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts)
  
  Tools for selecting the number of components (bootstrapping the likelihood ratio test statistic, mixturegrams, and model selection criteria)
  
  Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands.
- {EMCluster} - EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution
  
  Much of it is in C, so it should be fast
- {flexmix} - A general framework for finite mixtures of regression models using the EM algorithm is implemented.
  
  Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
  
  Video: Extending flexmix to model-based clustering with sparse data
- {otrimle}
  
  Uses Improper Maximum Likelihood Estimator Clustering (IMLEC) method
  
  Hyperparameters automatically tuned; Outliers removed
  
  Robust gaussian mixture clustering algorithm
  
  Webpage has links to paper, Coretto and Hennig, 2016
- {pivmet} (JOSS) - Pivotal methods for consensus clustering in k-means and mixture modeling. Method mitigates label-switching problem that occurs during Bayesian estimation of mixture models
- {foldcluster} (Paper) - Bayesian Clustering via Fusing of Localized Densities
  
  Addresses GMM kernel misspecification problem. Clustering results are highly sensitive to kernel misspecification. For example, if Gaussian kernels are used but the true density of data within a cluster is even slightly non-Gaussian, then clusters will be broken into multiple Gaussian components.
- {plotmm} - Tidy Tools for Visualizing Mixture Models. Currently supports: mixtools, flexmix, and EMCluster models
- {telescope} - Fits Bayesian finite mixtures with an unknown number of components using the telescoping sampler and different component distributions
  
  Mixture Latent Class Analysis
  
  Univariate and Multivariate Gaussian Mixtures, Poisson Mixtures
- Soft clustering algorithm
- Notes from
  - Serrano video: https://www.youtube.com/watch?v=q71Niz856KE&ab_channel=LuisSerrano
- Resources
  - Model-Based Clustering and Classification for Data Science
Components of the Algorithm
- “Color” points according to gaussians (clusters)
  - The closer a point is to the center of a gaussian the more intensely it matches the color of that gaussian
  - Points in between gaussians are a mixture or proportion of the colors of each gaussian
- Fitting a Gaussian
  - Find the center of mass
    - 2-dim: calculate the mean of x and the mean of y and that’s the coordinates of your center of mass
  - Find the spread of the points
    - 2-dim: calculate the x-variance, y-variance, and covariance
      - First Equation: Height of Gaussian (Multivariate Gaussian distribution equation).
      - Second Equation: 1-D gaussian equation that’s just being used for reference
    - Partially “colored” points affect spread and center of mass calculations
      - Fully colored points “weigh” more than partially colored points and pull the center of mass and change the orientation
Steps
1. Start with random Gaussians
  - Each gaussian has random means, variances
2. Color points according to distance to the random gaussians
  - The heights in the distributions pic above
3. (Forget about old gaussians) Calculate new gaussians based on the colored points
4. (Forget about old colors) Color points according to distance to the new gaussians
5. Repeat until some threshold is reached (i.e. gaussians or colors don’t change much)
Tuning
- Initial Conditions (i.e. Good starting points for the random gaussians at the beginning)
- Limits on the mean and variance calculations
- Number of gaussians, k, can be chosen by minimizing the Davies-Bouldin score
  - See Diagnostics, Clustering >> Spherical/Centroid Based >> Davies-Bouldin Index
- Running algorithm multiple times
  - Like CV grid search algs or bootstrapping

Hierarchical

Misc
- Packages
  - {heatmaply} (Vignette, Examples) - Visualizes Hierarchical Clustering via Heatmap
    - Options for the clustering function for both axes or each individual axis.
      - Default is stats::hclust but the introductory vignette also showed fastcluster::hclust is capable of being used. I didn’t see anything other than the default used in the other examples
    - Distance methods (through hclust): “euclidean” (Default), “maximum”, “manhattan”, “canberra”, “binary” or “minkowski”.
    - Linkage methods (through hclust): “complete” (Default), “ward.D”, “ward.D2”, “single”, “average” (= UPGMA), “mcquitty” (= WPGMA), “median” (= WPGMC) or “centroid” (= UPGMC)
- Resources
  - Hands-On Machine Learning with R, Ch. 21
Linkages
- Maximum or complete linkage clustering: Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the largest value of these dissimilarities as the distance between the two clusters. It tends to produce more compact clusters.
- Minimum or single linkage clustering: Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the smallest of these dissimilarities as a linkage criterion. It tends to produce long, “loose” clusters.
- Mean or average linkage clustering: Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the average of these dissimilarities as the distance between the two clusters. Can vary in the compactness of the clusters it creates.
- Centroid linkage clustering: Computes the dissimilarity between the centroid for cluster 1 (a mean vector of length p, one element for each variable) and the centroid for cluster 2.
- Ward’s minimum variance method: Minimizes the total within-cluster variance. At each step the pair of clusters with the smallest between-cluster distance are merged. Tends to produce more compact clusters.

Examples

Example: {heatmaply} (source)

Code

p <- 
  heatmaply(
    mat,
    #dendrogram = "row",
    xlab = "", ylab = "",
    main = "",
    scale = "column",
    margins = c(60,100,40,20),
    grid_color = "white",
    grid_width = 0.00001,
    titleX = FALSE,
    hide_colorbar = TRUE,
    branches_lwd = 0.1,
    label_names = c("Country", "Feature:", "Value"),
    fontsize_row = 5, fontsize_col = 5,
    labCol = colnames(mat),
    labRow = rownames(mat),
    heatmap_layers = theme(axis.line=element_blank())
  )

# save the widget
# library(htmlwidgets)
# saveWidget(p, file= "~/Desktop/R-graph-gallery/HtmlWidget/heatmapInter.html")

For the data, see the first example in the source article
The right dendogram checks which countries tend to have the same features on their numeric variables, and therefore which countries are similar
- Burundi and Angola (bottom) are grouped together. This grouping can be examined using the heatmap where it’s shown they are two countries in strong expansion, with a lot of children per woman but still a strong mortality rate.
From the top dendogram clusters the features
- It shows birth rate and children per woman (left) are grouped together since they are highly correlated.

Latent Profile Analysis (LPA)

Sort of like k-means + GMM
k number of profiles (i.e. clusters) are chosen
Model outputs probabilities that an observation belongs to any particular cluster
GOF metrics available
“As with Exploratory Factor Analysis (EFA )(and other latent-variable models), the assumption of LPA is that the latent (unobserved) factor”causes” (I’m using the term loosely here) observed scores on the indicator variables. So, to refer back to my initial hypothetical example, a monster being a spell caster (the unobserved class) causes it to have high intelligence, low strength, etc. rather than the inverse. This is a worthwhile distinction to keep in mind, since it has implications for how the model is fit.”
Bin variables that might dominate the profile. This way the profiles will represent a latent variable and not gradations of the dominate variable (e.g. low, middle, high values of the dominate variable).

Center other variable observations according to dominant variable bin those observations are in. (e.g. subtract values in bin1 from bin1’s mean)

# From D&D article where challenge_rating is a likely dominant variable
mons_bin <- mons_df %>%
  mutate(cr_bin = ntile(x = challenge_rating, n = 6))
ab_scores <- c("strength", "dexterity", "constitution", "intelligence", "wisdom", "charisma") 
mons_bin <- mons_bin %>%
  group_by(cr_bin) %>%
  mutate(across(.cols = ab_scores, .fns = mean, .names = "{.col}_bin_mean")) %>%
  ungroup()

tSNE

Packages
- {Rtsne}
t-Distributed Stochastic Neighbor Embedding
Looks at the local distances between points in the original data space and tries to reproduce them in the low-dimensional representation
- Both UMAP and tSNE attempt to do this but fails (Lior Pachter paper thread, Doesn’t preserve local structure, No theorem says that it preserves topology)
Results depend on a random starting point
Tuning parameters: perplexity

UMAP

Packages:
- {umap}
- {scDEED} (article) - Detects Dubious t-SNE and UMAP Embeddings and Optimizes Hyperparameters
  - scDEED assigns a reliability score to each 2D embedding to indicate how much the data point’s mid-range neighbors change in the 2D space. Observations whose 2D embedding neighbors have been drastically changed through the embedding process are called ‘dubious.’
- {SaturnCoefficient} - A metric expressing the quality of a UMAP layout
  - Range: [0, 1]
  - Higher value means better dimensionality reduction
Uniform Manifold Approximation and Projection (variation of tSNE) which projects variables to a nonlinear space
Issues
- See tSNE section for Lior Pachter threads on why not to use tSNE or UMAP
- Two dimensions are not enough to predict with high accuracy. This isn’t an issue if our goal is exploratory data analysis (EDA). However, if we over-interpret this as evidence of discrete clusters defined solely by their position in the two-dimensional scatter plot, we risk drawing incorrect conclusions.
- There are no weights like in PCA determine which variables are contributing the most to each dimension.
- Computationally intensive
Preprocessing
- Only for numeric variables
- Standardize
Differences with tSNE
- Random starting point has less of an impact that in tSNE
- Can be supervised (give it an outcome variable)
- UMAP can take a training model and apply it to test data or new data
Try pca first
- If successful (good separation between categories), then prediction may be easier
- If not, umap, tsne needed
Tuning parameter: neighbors
- Example used 500 iterations (n_epochs) as limit for convergence

K-Means

Packages

{tidyclust} - Clustering for tidymodels
- Engines
  - stats and ClusterR run classical K-means
  - laR runs K-Modes models which are the categorical analog to K-means, meaning that it is intended to be used on only categorical data
  - clustMixType to run K-prototypes which are the more general method that works with categorical and numeric data at the same time.
{ClusterR} - Consists of Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering algorithms with the option to plot, validate, predict (new data) and find the optimal number of clusters.
{pivmet} (JOSS) - Pivotal methods for consensus clustering in k-means and mixture modeling
{fkms} (Paper) - k-means clustering for sparsely observed longitudinal data (i.e. repeated measure of multiple numeric variable over time)
- Normal k-means or mixture models can’t be applied to unbalanced/irregular designs where time and measurement differ for each subject. These methods also ignore autocorrelation.
- Also applicable to densely observed data
- Employs the basis function expansion (i.e. splines) to model the cluster centers
{sparcl} - Sparse Hierarchical Clustering and Sparse K-Means Clustering
{clusterHD} (Paper) - Regularized k-Means through hard thresholding
- Uses a \(\mathcal{L}_0\) penalty to induce sparsity in the variables
{RobPC} - Performs both classical and robust panel clustering by applying Principal Component Analysis (PCA) for dimensionality reduction and clustering via standard K-Means or Trimmed K-Means.

Seeks to assign n points to k clusters and find cluster centers so as to minimize the sum of squared distances from each point to its cluster center.
Papers
- Wasserstein k-Centres Clustering for Distributional Data
  - Distributional data arises when each data point can be regarded as a probability distribution
  - Examples of a distributional data set include age distributions of countries and house price distributions of cities.
  - The real data example was classifying 84= 42 (Austrian districts × 2 (men and women) age distributions into two groups with several clustering methods and determine whether the clustering results reflect gender differences.
For choosing the number of clusters, elbow method (i.e. WSS) is usually awful if there are more than few clusters. Recommended: Calinski-Harabasz Index and BIC then Silhouette Coefficient or Davies-Bouldin Index (See Diagnostics, Clustering >> Spherical/Centroid Based (article)
Base R kmeans uses the Hartigan-Wong algorithm
- For large k and larger n, the density of cluster centers should be proportional to the density of the points to the power (d/d+2). In other words the distribution of clusters found by k-means should be more spread out than the distribution of points. This is not in general achieved by commonly used iterative schemes, which stay stuck close to the initial choice of centers.
Issues
- Inefficient in distinguishing between groups of unbalanced sizes. See Packages >> {pivmet} which uses consensus clustering to help mitigate this.

{tidyclust}

Example 1: Mixed K-Means

Model and Extract Centroids

library(tidymodels)
library(tidyclust)

data("ames", package = "modeldata")

kproto_spec <- k_means(num_clusters = 3) %>%
  set_engine("clustMixType")

kproto_fit <- kproto_spec %>%
  fit(~ ., data = ames)

kproto_fit %>%
  extract_centroids() %>%
  select(11:20) %>%
  glimpse()
#> Rows: 3
#> Columns: 10
#> $ Lot_Config     <fct> Inside, Inside, Inside
#> $ Land_Slope     <fct> Gtl, Gtl, Gtl
#> $ Neighborhood   <fct> College_Creek, North_Ames, Northridge_Heights
#> $ Condition_1    <fct> Norm, Norm, Norm
#> $ Condition_2    <fct> Norm, Norm, Norm
#> $ Bldg_Type      <fct> OneFam, OneFam, OneFam
#> $ House_Style    <fct> Two_Story, One_Story, One_Story
#> $ Overall_Cond   <fct> Average, Average, Average
#> $ Year_Built     <dbl> 1989.977, 1953.793, 1998.765
#> $ Year_Remod_Add <dbl> 1995.934, 1972.973, 2003.035

Example 2: Classic K-Means (article, code)

Preprocess and Fit

recipe_kmeans <- recipe(~ ., data = data_prep_tbl) %>%
    step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
    step_normalize(all_numeric_predictors()) %>%
    step_rm("ID")

recipe_kmeans %>% prep() %>% juice() %>% glimpse()

model_kmeans <- k_means(num_clusters = 4) %>%
    set_engine("stats")

set.seed(123)
wflw_fit_kmeans <- workflow() %>%
    add_model(model_kmeans) %>%
    add_recipe(recipe_kmeans) %>%
    fit(data_prep_tbl)

Visualize

g <- data_prep_tbl %>%
    bind_cols(extract_cluster_assignment(wflw_fit_kmeans), .) %>%
    ggplot(aes(Spent, Income)) +
    geom_point(
        aes(fill = .cluster),
        shape = 21,
        alpha = 0.3,
        size  = 5
    ) +
    geom_smooth(color = "blue", se = FALSE) +
    scale_x_continuous(labels = scales::dollar_format()) +
    scale_y_continuous(
        labels = scales::dollar_format(),
        limits = c(0, 200000)
    ) +
    labs(title = "Customer Clusters: Spent vs Income") +
    scale_fill_tq() +
    theme_tq()

ggplotly(g)

See Cluster Descriptions >> Scatter for the output

DBSCAN

Misc
- Notes from:
  - Understanding DBSCAN and Implementation with Python
  - Clustering with DBSCAN, Clearly Explained video
- Packages
  - {dbscan}- A fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data
    - DBSCAN: Density-based spatial clustering of applications with noise
    - Jarvis-Patrick Clustering: Clustering using a similarity measure based on shared near neighbors
    - SNN Clustering: Shared nearest neighbor clustering
    - HDBSCAN: Hierarchical DBSCAN with simplified hierarchy extraction
    - FOSC: Framework for optimal selection of clusters for unsupervised and semisupervised clustering of hierarchical cluster tree
    - OPTICS/OPTICSXi: Ordering points to identify the clustering structure and cluster extraction methods
  - {parameters}
    - n_clusters_dbscan - Given a “min_size” (aka minPts?), the function estimates the optimal “eps”
    - cluster_analysis - Shows Sum of Squares metrics and the (standardized) mean value for each variable within each cluster.
- HDBSCAN is the hierarchical density-based clustering algorithm
- Use Cases
  - Geospatially Clustering Earthquakes
    - Events can occur in irregular shaped clusters (i.e., along faults of different orientations).
    - Events can occur in different densities (i.e. some fault zones are more active than others).
    - Events can occur far away from fault zones (i.e. outliers)
Tuning
- eps - The maximum distance between two samples for one to be considered to be connected to the other
  - Large eps tend to include more points within a cluster,
  - Too-large eps will include everything in the same single cluster
  - Too-small eps will result in no clustering at all
- minPts (or min_samples) - The minimum number of samples in a neighborhood for a point to be considered as a core point
  - Too-small minPts is not meaningful because it will regard every point as a core point.
  - Larger minPts can be better to deal with noisy data
Algorithm
- For each data point, find the points in the neighborhood within eps distance, and define the core points as those with at least minPts neighbors.
  - The orange circle represents the eps area
  - If minPts = 4, then the top 4 points are core points because they have at least 4 points overlapping the eps area
- Define groups of connected core points as clusters.
  - All the green points have been labelled as core points
- Assign each non-core point to a nearby cluster if it’s directly reachable from a neighboring core point, otherwise define it as an outlier.
  - The black points are non-core points but are points that overlap the eps area for the outer-most core points.
  - Adding these black points finalizes the first cluster
- This process is repeated for the next group of core points and continues until all that’s left are outliers.
Advantages
- Doesn’t require users to specify the number of clusters.
- Not sensitive to outliers.
- Clusters formed by DBSCAN can be any shape, which makes it robust to different types of data.
  - Example: Nested Cluster Structure
    - K-Means
      - K-Means wants spherical clusters which makes it grab groups of points it shouldn’t
    - DBSCAN
      - Able correctly identify the oblong shaped cluster
Disadvantages
- If the data has a very large variation in densities across clusters because you can only use one pair of parameters, eps and MinPts, on one dataset
- It could be hard to define eps without the domain knowledge of the data
- Clusters not totally reproducible. Clusters are defined sequentially so depending on which group of core points the algorithm starts with and hyperparameter values, some non-core points that are within the eps area of multiple clusters may be assigned to different clusters on different runs of the algorithm.

Mixed Variable Types

Packages
- {DIBclust} (see paper) Deterministic Information Bottleneck (DIB) clustering
  - Preserves the most relevant information while forming concise and interpretable clusters, guided by principles from information theory
- {kamila} - KAMILA clustering, a novel method for clustering mixed-type data in the spirit of k-means clustering.
  - It does not require dummy coding of variables, and is efficient enough to scale to rather large data sets.
- {FactoMineR} - Multivariate Exploratory Data Analysis and Data Mining
  - Has function for FAMD with K-Means approach
- {clustMixType} (Vignette) - Performs k-prototypes partitioning clustering for mixed variable-type data
- {cluster} - Can use for PAM w/Gower’s Dissimilarity mixed type clustering
  - First compute the dissimilarity matrix using daisy(df, metric = "gower"). Then use that dissimilarity matrix as input for pam(matrix, k, diss = TRUE).
Papers
- A Deterministic Information Bottleneck Method for Clustering Mixed-Type Data
  - Introduces {DIBclust}
  - Compares the Deterministic Information Bottleneck (DIB) method to KAMILA, K-Prototypes, Factor Analysis for Mixed Data (FAMD) with K-Means, and PAM using Gower’s dissimilarity
    - Used Adjusted Rand Index to compare cluster method results with the ground truth.
  - Simulation Data
    - DIBmix outperforms the other methods in a majority of scenarios with the only exception being when the proportion of categorical variables is high. Alternative bandwidth choices could potentially enhance performance in these cases.
    - FAMD performed best with a high proportion of categorical variables probably due to its data reduction step.
      - It was only slightly better than DIBmix. The other methods were substantially worse.
    - Unlike the other methods, DIBmix effectively handles datasets with unbalanced cluster sizes.
      - This robustness arises from the entropy term in its objective function, which minimises for imbalanced clusters, thus allowing exploration of diverse partition structures
  - Real Data Sets
    - Out of 10 datasets for the UCI Repository, DIBmix and K-Prototypes each had 4 best scores. Kamila and PAM each had 1. The results given the characteristics of the datasets were in-line with the results of the simulated data.