Decentralized Machine Learning and Efficient Anomaly Detection
Distributed computing systems are a promising domain for applying machine learning methods. In this project, we design a system and propose a novel approximation scheme for continuous online anomaly detection that dramatically reduces the communication burden on the production network.
Our system combines intelligent data filtering at distributed monitors with Principal Component Analysis (PCA) for detection at the Network Operation Center (NOC). Our approximation scheme involves a set of local monitors that maintain parameterized sliding filters. These sliding filters yield quantized data streams that are sent to the NOC, which makes global decisions based on them.
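The division of work between monitors and the NOC can be illustrated with a minimal Python sketch. It is not the project's implementation: the `LocalMonitor` class, the `residual_energy` helper, the fixed filter width `delta`, the synthetic traffic, and the choice of `k` principal components are all assumptions made for illustration. Each hypothetical monitor reports a reading only when it drifts more than `delta` from the last value it sent, and the NOC runs a PCA residual test on the filtered view.

```python
import numpy as np


class LocalMonitor:
    """Hypothetical monitor: sends a reading only when it moves more than
    `delta` away from the last value it reported."""

    def __init__(self, delta):
        self.delta = delta
        self.last_sent = None
        self.messages = 0

    def observe(self, value):
        if self.last_sent is None or abs(value - self.last_sent) > self.delta:
            self.last_sent = value
            self.messages += 1
        return self.last_sent  # the value the NOC currently holds


def residual_energy(X, k):
    """Squared norm of each row after removing its projection onto the
    top-k principal components of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                      # columns span the principal subspace
    residual = Xc - Xc @ P @ P.T
    return np.sum(residual ** 2, axis=1)


# Synthetic, smoothly varying low-rank "traffic" (an assumption, not real data).
rng = np.random.default_rng(0)
T, n, k, delta = 500, 8, 2, 2.0
t = np.arange(T)
latent = np.column_stack([10 * np.sin(2 * np.pi * t / 100),
                          10 * np.cos(2 * np.pi * t / 37)])
X = latent @ rng.normal(size=(k, n)) + rng.normal(scale=0.2, size=(T, n))
X[400, 3] += 50.0                     # injected spike on one monitor

# Each monitor filters its own stream; the NOC only sees X_hat.
monitors = [LocalMonitor(delta) for _ in range(n)]
X_hat = np.array([[m.observe(x) for m, x in zip(monitors, row)] for row in X])

err_exact = residual_energy(X, k)
err_filtered = residual_energy(X_hat, k)
total_messages = sum(m.messages for m in monitors)

print("messages sent:", total_messages, "of", T * n)
print("largest residual (exact data) at t =", int(np.argmax(err_exact)))
print("largest residual (filtered data) at t =", int(np.argmax(err_filtered)))
```

A practical detector would compare the residual energy against a threshold rather than taking the maximum, and would update the filters and the PCA model over a sliding window; both are left out here to keep the sketch short.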
We derive analytical results based on stochastic matrix perturbation theory that let us balance the tradeoff between detection accuracy and the amount of data communicated over the network. By avoiding the expensive step of centralizing all traffic data, our solution tracks PCA-based anomalies in real time with minimal data communication. This overcomes the key scalability limitation of the state-of-the-art network-wide anomaly detection solution. Experiments with traffic data from an ISP backbone network demonstrate that our methods yield significant communication savings while still achieving high detection accuracy.
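To give a feel for why perturbation theory is relevant, the following Python sketch checks a classical deterministic result, Weyl's inequality, which states that the sorted eigenvalues of a symmetric matrix move by at most the spectral norm of the perturbation. This is a stand-in for illustration only: the project's analysis is stochastic and is not reproduced here. The data, the uniform error model, and the bound `delta` on the per-entry filtering error are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, delta = 400, 6, 0.5

# Hypothetical exact measurements and a bounded filtering error |W_ij| <= delta.
X = rng.normal(size=(T, n)) @ rng.normal(size=(n, n))
W = rng.uniform(-delta, delta, size=(T, n))
X_hat = X + W

# Second-moment matrices seen with exact and with filtered data.
A = X.T @ X / T
A_hat = X_hat.T @ X_hat / T
E = A_hat - A                          # symmetric perturbation

# eigvalsh returns eigenvalues in ascending order for both matrices.
eig_A = np.linalg.eigvalsh(A)
eig_A_hat = np.linalg.eigvalsh(A_hat)

max_shift = np.max(np.abs(eig_A_hat - eig_A))
bound = np.linalg.norm(E, 2)           # spectral norm of the perturbation

print("largest eigenvalue shift:", max_shift)
print("Weyl bound ||E||_2:      ", bound)
```

Shrinking `delta` shrinks the perturbation and hence the possible eigenvalue shift, at the cost of more messages from the monitors, which is the tradeoff described above.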
Publications
- Communication-Efficient Online Detection of Network-Wide Anomalies. Ling Huang, XuanLong Nguyen, Minos Garofalakis, Joseph Hellerstein, Anthony D. Joseph, Michael Jordan and Nina Taft. To appear in the 26th Annual IEEE Conference on Computer Communications (INFOCOM'07). Anchorage, Alaska, May 2007.
- In-Network PCA and Anomaly Detection. Ling Huang, XuanLong Nguyen, Minos Garofalakis, Anthony Joseph, Michael Jordan and Nina Taft. In Advances in Neural Information Processing Systems (NIPS) 19. Vancouver, B.C., December 2006.
