Facebook Social Network Clustering

Community Detection and Node Clustering in Social Networks

View Interactive Notebook

Project Overview

This project explores the structure of Facebook's social network by identifying communities and clusters of connected users. Using graph theory and machine learning techniques, the analysis compares multiple clustering algorithms and community detection methods to uncover meaningful patterns in how users form groups and interact within the network. The project demonstrates both traditional graph-based community detection and modern machine learning clustering approaches on real-world social network data.

Key Objectives

Dataset

Source: facebook_combined.txt - Facebook social network graph

Size: 4,039 nodes and 88,234 edges representing user connections

Type: Undirected graph with unlabeled nodes

Features Extracted (20+ local graph features):

Methods & Techniques

Community Detection Algorithms:

Clauset-Newman-Moore Label Propagation Louvain Method Kernighan-Lin

Machine Learning Clustering:

K-Means Agglomerative (Ward) Gaussian Mixture Models OPTICS Affinity Propagation Self-Organizing Maps

Advanced Techniques:

Results & Performance

0.7289
Best Silhouette Score
0.4373
Best Davies-Bouldin
9844.54
Max Calinski-Harabasz
8
Optimal Clusters

Key Findings

  • Best Overall Method: Agglomerative Clustering with Ward linkage achieved the highest silhouette score (0.7289) and lowest Davies-Bouldin index (0.4373) with 8 clusters
  • K-Means Performance: Highly effective with silhouette score of 0.7152 for 8 clusters, demonstrating robustness across metrics
  • Node2Vec Advantage: Graph embeddings produced competitive results (7 optimal clusters) by capturing network topology more effectively than raw features
  • Outlier Impact: Removing outliers significantly improved clustering quality across all metrics
  • Community Structure: Network exhibits clear modular structure with meaningful community boundaries detected by multiple algorithms

Technologies Used

Python NetworkX scikit-learn Node2Vec PyCaret pandas NumPy matplotlib seaborn sklearn-som Gap-Statistic Google Colab