FCM, The Fuzzy C-Means clustering algorithm

A. Feature

Each data point can belong to more than one cluster (Soft Clustering).
- The degree of belonging to each centroid is represented by scalar between 0 and 1.
- It seems proper to cluster which the boundary is ambiguous such as natural language task.
Likewise other prototype-based clustering such as K-means family, Fuzzy C-means clustering finds optimal status through repetition.

B. Process

Choose a number of clusters, K
Assign randomly to each point coefficients for being in the clusters.
Compute the centroid for each cluster.
For each point, compute its coefficients of being in the clusters.
Repeat 3 & 4 until the coefficients’ change between two iterations is no more than the given sensitivity threshold

C. Option

distance metric
- cosine distance
- euclidean distance
- mahalanobis distance

Python example

1. Generate sample data

Define three cluster centers

centers = [[4, 2],
         [1, 7],
         [5, 6]]

Define three cluster sigmas in x and y, respectively

sigmas = [[0.8, 0.3],
        [0.3, 0.5],
        [1.1, 0.7]]

Generate test data

np.random.seed(42)  # Set seed for reproducibility
xpts = np.zeros(1)
ypts = np.zeros(1)
labels = np.zeros(1)
for i, ((xmu, ymu), (xsigma, ysigma)) in enumerate(zip(centers, sigmas)):
  xpts = np.hstack((xpts, np.random.standard_normal(200) * xsigma + xmu))
  ypts = np.hstack((ypts, np.random.standard_normal(200) * ysigma + ymu))
  labels = np.hstack((labels, np.ones(200) * i))
alldata = np.vstack((xpts, ypts))

2. Clustering w/ fuzzy_cmeans class

(python implementation is posted on github)

n_clusters = 3
m = 2
metric = "euclidean" # "cosine" # "mahalanobis"
optimization = ["weights"] # ["weights", "centers", "error"]
fcm_model = fuzzy_cmeans(n_clusters=n_clusters, m=m, metric=metric, min_error_change=1e-4, min_weights_change=1e-6, max_iter=500, verbose=False)
fcm_model.fit(X, optimization=optimization)

Reference and Implementation:

paper: FCM: The Fuzzy C-Means clustering algorithm, DEC 1984
implementation: python implementation