A machine learning intrusion detection system (IDS) trained on benchmark network traffic datasets, with SHAP-based explanations that security analysts can interpret.
A machine learning–based Intrusion Detection System trained to identify malicious network traffic across multiple attack categories. Built on two benchmark datasets and designed with explainability in mind so security analysts can understand why a detection was made.
Trained and evaluated on two industry-standard IDS datasets: CIC-IDS2017 (Canadian Institute for Cybersecurity) and UNSW-NB15 (University of New South Wales). These datasets cover a wide range of attack types including DoS, DDoS, brute force, web attacks, infiltration, and botnets.
Built a full preprocessing pipeline: feature extraction from raw network flow data, handling of severe class imbalance using SMOTE and undersampling techniques, normalization, and feature selection based on importance scores.
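The resampling and normalization steps can be sketched in plain Python. This is a from-scratch illustration, not the project's code. It assumes samples are lists of numeric features, and `smote`, `random_undersample`, `min_max_fit` and `min_max_transform` are hypothetical helpers. In practice a maintained implementation such as imbalanced-learn's `SMOTE` and `RandomUnderSampler` would be the usual choice. Resampling and scaling bounds are fit on the training split only, so the test set keeps its real class distribution.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies on the segment
    between a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first (Euclidean distance).
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of at most n_keep majority-class samples."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


def min_max_fit(rows):
    """Per-feature minimum and maximum, computed on training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_transform(rows, lo, hi):
    """Scale features with training-set bounds (training rows land in [0, 1]);
    constant features map to 0.0."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


if __name__ == "__main__":
    attacks = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy minority (attack) flows
    benign = [[5.0 + i, 5.0] for i in range(20)]     # toy majority (benign) flows
    attacks += smote(attacks, n_new=7, k=2)
    benign = random_undersample(benign, n_keep=10)
    print(len(attacks), len(benign))  # 10 10
```

Combining both directions (oversample the rare class, trim the common one) avoids duplicating minority rows verbatim and avoids discarding most of the benign data.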
Compared multiple approaches: XGBoost (supervised, best overall performance), Random Forest (ensemble baseline), and anomaly-based detection using Isolation Forest. Evaluated tradeoffs in precision, recall, and false positive rate relevant to real-world IDS deployment.
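The three metrics named above all come from the confusion matrix. A minimal sketch, assuming binary labels with 1 = attack and 0 = benign (a hypothetical encoding; a multiclass setup would compute them per class):

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall and false positive rate for binary labels (1 = attack, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flows wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign flows correctly passed
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(detection_metrics(y_true, y_pred))
    # precision = 2/3, recall = 2/3, fpr = 1/7
```

False positive rate deserves separate attention because benign traffic dominates: at a hypothetical volume of 1,000,000 benign flows per day, an FPR of 0.1% still produces 1,000 false alerts for analysts to triage.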
Applied SHAP (SHapley Additive exPlanations) to identify which network features drive attack classifications. This makes the model interpretable: for any flagged flow, a security analyst can see which features pushed the prediction toward an attack class and by how much, not just that the flow was flagged.
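SHAP attributions are Shapley values from cooperative game theory. The brute-force sketch below computes them exactly for a tiny hypothetical model by enumerating every feature coalition. It is a simplification, not how the project or the `shap` library computes them: `shap` uses much faster algorithms (for tree ensembles such as XGBoost, `shap.TreeExplainer`) and a more careful treatment of "absent" features than the single baseline point used here. `shapley_values`, `toy_attack_score` and the feature values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x against a single baseline point.

    A feature absent from a coalition takes its baseline value, so the values
    sum to f(x) - f(baseline). Cost grows as 2^n model calls, so this is only
    practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_attack_score(z):
    """Hypothetical model: two linear terms plus an interaction between features 0 and 2."""
    return 2.0 * z[0] + 0.5 * z[1] + z[0] * z[2]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]         # hypothetical flow being explained
    baseline = [0.0, 0.0, 0.0]  # hypothetical reference flow
    phi = shapley_values(toy_attack_score, x, baseline)
    print([round(v, 6) for v in phi])  # [3.5, 1.0, 1.5]
```

The output shows SHAP's additivity property: the three values sum to 6.0, which is exactly f(x) - f(baseline), and the interaction term z[0] * z[2] is split evenly between features 0 and 2.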
Class imbalance was the biggest challenge: attack traffic is a tiny fraction of normal traffic in both datasets, so a naive model can reach 99% accuracy by predicting every flow as benign while detecting no attacks at all. Getting meaningful precision and recall on the rare attack classes required a careful resampling strategy and threshold tuning.
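Both halves of this problem fit in a few lines. The sketch uses hypothetical toy data, not figures from either dataset: first, an always-benign classifier on a 1%-attack sample scores 99% accuracy with an F1 of zero; second, sweeping the decision threshold on validation scores recovers attacks that the default 0.5 cutoff misses. F1 is an assumed tuning objective here; a deployment could instead target a maximum false positive rate.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the attack class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(y_true, scores):
    """Pick the score threshold that maximises F1 (choose it on validation data, not the test set)."""
    best_thr, best_f1 = None, -1.0
    for thr in sorted(set(scores)):
        preds = [1 if s >= thr else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1


if __name__ == "__main__":
    # Accuracy paradox: 1 attack in 100 flows, model predicts "benign" for everything.
    y = [1] + [0] * 99
    always_benign = [0] * 100
    print(accuracy(y, always_benign), f1_score(y, always_benign))  # 0.99 0.0

    # Threshold tuning on hypothetical validation scores (probability of "attack").
    y_val = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.35, 0.30, 0.40, 0.20, 0.10, 0.05, 0.05, 0.02, 0.01, 0.01]
    print(f1_score(y_val, [1 if s >= 0.5 else 0 for s in scores]))  # 0.0
    print(best_threshold(y_val, scores))  # (0.3, 0.8)
```

Lowering the cutoff trades a few false positives for recall on the rare class, which is exactly the tradeoff that a plain accuracy score hides.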
A functional IDS pipeline with explainable detections, demonstrating both supervised and anomaly-based detection approaches and their real-world tradeoffs. The SHAP analysis revealed the top network features most predictive of each attack category.
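A common way to turn per-flow SHAP values into per-category rankings is to average their absolute values within each class. A minimal sketch, assuming a matrix of SHAP values (one row per flow, one column per feature) has already been computed with one attribution vector per flow; for a multiclass model that would mean picking, for example, the attributions for the flow's true class. The helper `top_features_per_class`, the feature names and the values are hypothetical.

```python
from collections import defaultdict


def top_features_per_class(shap_rows, labels, feature_names, k=3):
    """Rank features for each class by mean absolute SHAP value over that class's flows."""
    totals = defaultdict(lambda: [0.0] * len(feature_names))
    counts = defaultdict(int)
    for row, label in zip(shap_rows, labels):
        counts[label] += 1
        for j, v in enumerate(row):
            totals[label][j] += abs(v)
    ranking = {}
    for label, acc in totals.items():
        means = [s / counts[label] for s in acc]
        order = sorted(range(len(feature_names)), key=lambda j: means[j], reverse=True)
        ranking[label] = [feature_names[j] for j in order[:k]]
    return ranking


if __name__ == "__main__":
    names = ["flow_duration", "syn_count", "bytes_per_sec"]  # hypothetical feature names
    shap_rows = [                                             # hypothetical per-flow SHAP values
        [0.10, 0.90, -0.20],
        [0.00, 0.70, -0.40],
        [0.50, -0.10, 0.05],
    ]
    labels = ["dos", "dos", "scan"]
    print(top_features_per_class(shap_rows, labels, names, k=2))
    # {'dos': ['syn_count', 'bytes_per_sec'], 'scan': ['flow_duration', 'syn_count']}
```

Taking absolute values before averaging keeps features that push some flows toward an attack and others away from it from cancelling out.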