What Is K-Means Clustering?

Data mining with the K-Means algorithm

By Mike Chapple

Updated on November 18, 2021

How the K-Means Algorithm Functions

The k-means algorithm is an iterative algorithm that takes its name from its method of operation. It clusters observations into k groups, where k is provided as an input parameter. It assigns each observation to a cluster based on the observation’s proximity to that cluster’s mean. Each cluster’s mean is then recomputed and the process begins again. Here’s how the algorithm works:

1. The algorithm arbitrarily selects k points as the initial cluster centers (the means).
2. Each point in the dataset is assigned to the closest cluster, based upon the Euclidean distance between the point and each cluster center.
3. Each cluster center is recomputed as the average of the points in that cluster.
4. Steps 2 and 3 repeat until the clusters converge. Convergence may be defined differently depending upon the implementation, but it normally means either that no observations change clusters when steps 2 and 3 are repeated, or that the changes do not make a material difference in the definition of the clusters.
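
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the function name kmeans, the max_iters and seed parameters, and the sample points are all assumptions made for the example, and convergence is taken to mean that no point changes cluster.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups.

    Returns (centers, assignments), where assignments[i] is the index
    of the cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Step 1: arbitrarily pick k points as the initial cluster centers.
    centers = rng.sample(points, k)
    assignments = None

    for _ in range(max_iters):
        # Step 2: assign each point to the closest center (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 4: stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 3: recompute each center as the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # assumption: keep the old center if a cluster is empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return centers, assignments


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

Because the starting centers are chosen at random, the cluster numbering can differ from run to run, and on less tidy data the final clusters can differ as well.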

Choosing the Number of Clusters

One of the main disadvantages of k-means clustering is that you must specify the number of clusters as an input to the algorithm. As designed, the algorithm cannot determine the appropriate number of clusters and depends upon the user to identify it in advance.

For example, if you had a group of people to be clustered based upon binary gender identity as male or female, calling the k-means algorithm with the input k=3 would force the people into three clusters, when two clusters (an input of k=2) would provide a more natural fit.

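Similarly, if a group of individuals was easily clustered based upon home state and you called the k-means algorithm with the input k=20, the results might be too generalized to be effective: if the group spans more than 20 states, some clusters would have to combine people from several states.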

For this reason, it’s often a good idea to experiment with different values of k to identify the value that best suits your data. You also may wish to explore the use of other data mining algorithms in your quest for machine-learned knowledge.
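
One way to run that experiment is to cluster the same data with several values of k and compare how tightly the points sit around their cluster centers. The sketch below measures tightness as the within-cluster sum of squared distances; that measure, the helper names, the choice of five random restarts per k, and the toy data are all assumptions made for illustration. The score generally shrinks as k grows, so a common heuristic is to look for the value of k after which it stops improving much.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Compact k-means for illustration: returns (centers, labels)."""
    centers = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        # Assign each point to its closest center, then recompute the centers.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            if members:
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centers, labels


def within_cluster_ss(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, labels))


def best_score(points, k, restarts=5):
    """Lowest score over several random starts, since results depend on the start."""
    return min(
        within_cluster_ss(points, *kmeans(points, k, seed=s))
        for s in range(restarts)
    )


# Hypothetical toy data: three loose groups in two dimensions.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.3),
    (5.0, 5.2), (5.3, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.7),
]

scores = {k: best_score(points, k) for k in range(1, 6)}
for k, score in scores.items():
    print(f"k={k}: within-cluster sum of squares = {score:.2f}")
```

A lower score alone does not make a value of k better, because the score keeps falling until every point has its own cluster. The comparison is only a guide to where adding clusters stops paying off.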

FAQ

What is data mining?

Data mining is the practice of analyzing huge amounts of information to look for trends and patterns. This data is often gathered from a person's internet browsing history, their shopping habits, location data, and more.

How does data mining work?

Companies commonly collect data from rewards programs, social media, mailing lists, and more. That data is then analyzed for patterns and behaviors. This is how grocery stores know that egg nog sells big during the holiday season, for example, or why ads for camping supplies pop up in your social media feed after you research that trip to Zion.

What are the main objectives associated with data mining?

The main objectives are finding insights, trends, and relationships within large amounts of data. Experts use this raw information to develop marketing strategies. It's also used in fields like policing, science, and engineering.