pylearn.clustering package

Submodules

pylearn.clustering.clustering module

class pylearn.clustering.clustering.Clustering(k=3)

Bases: object

Abstract base class for clustering algorithms. Defines the attributes and methods shared by the clustering classes derived from it.

Attributes:
k (int):

Number of clusters

centroids (numpy.ndarray):

Matrix of centroids of all k clusters

data_points (numpy.ndarray):

Matrix of all data points

data_points_to_cluster (list):

List of each data point’s assigned cluster

clusters (list):

List of all k clusters

assigned_clusters(clusters: list | str | int) -> list[tuple]

Returns all data points assigned to the given cluster(s).

Parameters:
clusters (list | str | int):

Cluster name(s)

Returns:

List of the data points
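
The lookup described above can be sketched as a standalone function. This is only an illustration under assumptions: `assigned_points` is a hypothetical helper (not part of pylearn), and it assumes the labels in `data_points_to_cluster` are aligned row by row with `data_points` and that each returned tuple holds the coordinates of one point.

```python
import numpy as np

def assigned_points(data_points, data_points_to_cluster, clusters):
    """Hypothetical helper: collect the points whose label is in `clusters`."""
    # Accept a single cluster name (str or int) or a list of names.
    wanted = set(clusters) if isinstance(clusters, list) else {clusters}
    return [
        tuple(point.tolist())
        for point, label in zip(data_points, data_points_to_cluster)
        if label in wanted
    ]

points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
labels = [0, 0, 1]  # one cluster name per row of `points`

in_cluster_0 = assigned_points(points, labels, 0)
in_both = assigned_points(points, labels, [0, 1])
```

Here `in_cluster_0` contains the first two points and `in_both` contains all three.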

static euclidean_distance(x: ndarray, centroids: ndarray) -> ndarray

Calculates the Euclidean distance from a data point x to each of the k centroids.

Parameters:
x (numpy.ndarray):

Data point (vector)

centroids (numpy.ndarray):

Centroids in a matrix (each row is one centroid)

Returns:

Array of the distances, one per centroid
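
A minimal NumPy sketch of this computation, assuming x has shape (d,) and centroids has shape (k, d). The function below is a standalone illustration, not the library's source.

```python
import numpy as np

def euclidean_distance(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Distance from one point x, shape (d,), to each row of centroids, shape (k, d)."""
    # Broadcasting subtracts x from every centroid row; the norm is taken per row.
    return np.linalg.norm(centroids - x, axis=1)

x = np.array([0.0, 0.0])
centroids = np.array([[3.0, 4.0], [0.0, 1.0], [0.0, 0.0]])

distances = euclidean_distance(x, centroids)  # array([5., 1., 0.])
```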

static median(x: ndarray) -> ndarray

Determines the point with the median smallest distance to all other data points in the cluster. The median point must be one of the data points.

Parameters:

x (numpy.ndarray): Matrix of all data points in one cluster

Returns:

Data point as one-element array
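
One common way to pick such a point is to choose the data point with the smallest summed distance to all other points in the cluster (often called the medoid). The sketch below assumes that criterion. The name `medoid` is hypothetical, and the exact rule and return shape used by the library may differ.

```python
import numpy as np

def medoid(points: np.ndarray) -> np.ndarray:
    """Return the row of `points` with the smallest total distance to all other rows."""
    # Pairwise Euclidean distances, shape (n, n).
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Total distance from each point to every other point in the cluster.
    totals = pairwise.sum(axis=1)
    return points[np.argmin(totals)]

cluster = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]])

center = medoid(cluster)        # the data point [2., 0.]
mean = cluster.mean(axis=0)     # [3.2, 0.], which is not a data point
```

Unlike the mean, the selected point is always a member of the cluster, so a single outlier such as [10, 0] pulls it far less.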

rename(old_clusters: list, new_clusters: list) -> list

Renames the clusters.

Parameters:
old_clusters (list):

List of all old clusters to get renamed

new_clusters (list):

List of the renamed clusters

Returns:

A list of the data points

pylearn.clustering.k_means module

class pylearn.clustering.k_means.KMeans(k=3)

Bases: Clustering

The K-Means algorithm computes clusters by calculating the mean of the cluster points.

Attributes:
k (int):

Number of clusters

centroids (numpy.ndarray):

Matrix of centroids of all k clusters

data_points (numpy.ndarray):

Matrix of all data points

data_points_to_cluster (list):

List of each data point’s assigned cluster

clusters (list):

List of all k clusters

fit(X: ndarray, max_iterations=500, threshold=0.001) -> list

Assigns each data point to the best cluster by calculating its distances to the centroids.

Parameters:
X (numpy.ndarray):

Matrix of data points (each row is one data point)

max_iterations (int, optional):

Maximum number of iterations to update the centroids, default: 500

threshold (float, optional):

Stopping criterion to interrupt the update iterations, default: 0.001

Returns:

A list of the clusters assigned to the data points
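
The fitting loop can be sketched roughly as follows. This is a standalone illustration, not the library's implementation: it assumes random initialization from the data points, assumes that `threshold` is compared against the total shift of the centroids between iterations, and adds a hypothetical `seed` parameter for reproducibility.

```python
import numpy as np

def k_means_sketch(X, k=3, max_iterations=500, threshold=0.001, seed=0):
    """Illustrative k-means loop; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iterations):
        # Distance of every point to every centroid, shape (n, k).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # New centroid is the mean of its points; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < threshold:  # assumed stopping criterion
            break
    # Final assignment against the final centroids.
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1).tolist(), centroids

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
              [10.0, 0.0], [11.0, 0.0], [12.0, 0.0]])
labels, centroids = k_means_sketch(X, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster receives which index depends on the initialization.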

pylearn.clustering.k_medoids module

class pylearn.clustering.k_medoids.KMedoids(k=3)

Bases: Clustering

The K-Medoids algorithm computes clusters by calculating the median of the cluster points. Each centroid must itself be a data point.

Attributes:
k (int):

Number of clusters

centroids (numpy.ndarray):

Matrix of centroids of all k clusters

data_points (numpy.ndarray):

Matrix of all data points

data_points_to_cluster (list):

List of each data point’s assigned cluster

clusters (list):

List of all k clusters

fit(X: ndarray, max_iterations=500, threshold=0.001) -> list

Parameters:
X (numpy.ndarray):

Matrix of data points (each row is one data point)

max_iterations (int, optional):

Maximum number of iterations to update the centroids, default: 500

threshold (float, optional):

Stopping criterion to interrupt the update iterations, default: 0.001

Returns:

A list of the clusters assigned to the data points
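
A rough sketch of a k-medoids style loop, for comparison with k-means. It is an illustration under assumptions rather than the library's code: each center is taken to be the cluster member with the smallest summed distance to the other members, initialization is random, `threshold` is compared against the total shift of the centers, and `seed` is a hypothetical extra parameter.

```python
import numpy as np

def k_medoids_sketch(X, k=3, max_iterations=500, threshold=0.001, seed=0):
    """Illustrative k-medoids loop; returns (labels, medoids)."""
    rng = np.random.default_rng(seed)
    # All pairwise distances are computed once, shape (n, n).
    pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Assumption: start from k distinct data points chosen at random.
    medoid_idx = rng.choice(len(X), size=k, replace=False)
    for _ in range(max_iterations):
        labels = pairwise[:, medoid_idx].argmin(axis=1)
        new_idx = medoid_idx.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # Summed distance of each member to the other members of its cluster.
                within = pairwise[np.ix_(members, members)].sum(axis=1)
                new_idx[j] = members[within.argmin()]
        shift = np.linalg.norm(X[new_idx] - X[medoid_idx])
        medoid_idx = new_idx
        if shift < threshold:  # assumed stopping criterion
            break
    labels = pairwise[:, medoid_idx].argmin(axis=1)
    return labels.tolist(), X[medoid_idx]

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
              [10.0, 0.0], [11.0, 0.0], [12.0, 0.0]])
labels, medoids = k_medoids_sketch(X, k=2)
```

Every returned center is a row of X, which is the property that distinguishes this variant from k-means.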

Module contents