Synchrophasor Data Anomaly Detection and Classification Considering Data Drift and Contextual Information
Written by Sagnik Basumallik, Anurag Srivastava, Arman Ahmed and Yinghui Wu
Motivation for Synchrophasor Data Analytics and Impact: The uncertainty of renewables, the growing penetration of inverter-based resources, and varying load dynamics have increased the operational complexity of the grid. Monitoring power system dynamics under such complex scenarios is essential, and Phasor Measurement Units (PMUs) provide synchronized, high-resolution measurements for this purpose. PMUs enable coordinated system-wide situational awareness and faster post-event analysis by providing real-time voltage and current phasor data time-stamped by the Global Positioning System (GPS). Analyzing PMU data comes with multiple challenges, such as data drift, data anomalies, and missing data. To ensure data quality, efficient approaches are needed to detect, process, and remove such inconsistencies from PMU measurements before they are passed to critical PMU data analytics applications.
Challenges: Data Anomalies, PMU Estimation Errors, Missing Labels, Data Drift
PMU data quality varies widely as measurements are transferred through the communication network. Inconsistencies in the data may arise from actual changes in grid operating conditions, measurement inaccuracies, or errors anywhere in the end-to-end synchrophasor system, including the communication network. To prevent low-quality data from producing anomalous results that lead to incorrect operator decisions, PMU data must be monitored for availability and accuracy by addressing the following challenges.
- Data anomalies: Sources of measurement anomalies include biases and calibration errors in instrument transformers, noise, device and channel errors, incorrect data alignment, missing timestamps, packet loss, network congestion, problems during data compression, and harmonic interference. Anomalies may occur at multiple PMUs or at the data concentrator level.
- PMU estimation errors: These errors are introduced by the PMU estimation and filtering process and include synchrophasor total vector error (TVE), magnitude error, phase error, frequency error, and rate of change of frequency (ROCOF) error; they are typically higher during system dynamics.
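For reference, the total vector error is conventionally defined (as in IEEE C37.118.1) as the magnitude of the difference between the estimated and reference phasors, normalized by the reference magnitude. The short Python sketch below computes it, along with signed magnitude and phase errors, for a single phasor. The function names are our own and the phasors are illustrative values, not PMU data.

```python
import cmath
import math


def total_vector_error(estimated: complex, reference: complex) -> float:
    """TVE = |X_est - X_ref| / |X_ref|, equivalent to the C37.118.1 form
    sqrt(((Xr_est - Xr)^2 + (Xi_est - Xi)^2) / (Xr^2 + Xi^2))."""
    return abs(estimated - reference) / abs(reference)


def magnitude_error(estimated: complex, reference: complex) -> float:
    """Signed magnitude error, in the same units as the phasors."""
    return abs(estimated) - abs(reference)


def phase_error_deg(estimated: complex, reference: complex) -> float:
    """Signed phase error in degrees, wrapped into [-180, 180)."""
    d = math.degrees(cmath.phase(estimated) - cmath.phase(reference))
    return (d + 180.0) % 360.0 - 180.0


# Illustrative values: reference phasor of 1.0 pu at 0 degrees.
ref = cmath.rect(1.0, 0.0)
phase_only = cmath.rect(1.0, math.radians(0.573))  # about 0.01 rad of phase error
mag_only = 1.01 * ref                              # 1% magnitude error

for name, est in [("phase only", phase_only), ("magnitude only", mag_only)]:
    print(f"{name}: TVE = {100 * total_vector_error(est, ref):.3f} %, "
          f"mag err = {magnitude_error(est, ref):+.4f}, "
          f"phase err = {phase_error_deg(est, ref):+.3f} deg")
```

Both cases land at roughly 1% TVE: a phase error of about 0.573 degrees alone is enough, which at 60 Hz corresponds to a timing error of roughly 26 microseconds. This is why estimation errors matter most during system dynamics, when phase changes quickly.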
In addition to data anomalies and estimation-induced errors, two further challenges can arise when analyzing PMU data.
- Missing labels: Machine-learning-based event or anomaly detection algorithms require a considerable amount of labeled PMU measurements, which are often unavailable because of the manual labor and cost involved in labeling.
- Data drift: Drift in PMU measurements occurs due to dynamic changes in operating conditions such as asset tripping, line reconnection, or switching events. The drift can be gradual or sudden, and in both cases the statistical properties of the PMU data change over time. As a result, the definition of a PMU data anomaly changes: data considered normal at time t may not be considered normal later. Any machine learning or data analytics algorithm trained on one set of PMU data may therefore no longer make accurate anomaly predictions, leading to model degradation and reduced reliability over time.
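To make the degradation concrete, the following Python sketch trains a static z-score rule on pre-drift voltage-magnitude samples and applies it to a stream that drifts gradually. The data, the 4-sigma threshold, and the block size are illustrative assumptions, not values from the article. Because the rule's notion of "normal" is frozen, the share of normal but drifted samples it flags as anomalous grows as the drift accumulates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-drift training data: voltage magnitude around 1.0 pu with small noise (synthetic).
train = 1.0 + 0.002 * rng.standard_normal(5000)

# Post-drift stream: the operating point ramps gradually from 1.00 to 0.98 pu.
ramp = np.linspace(1.0, 0.98, 5000)
stream = ramp + 0.002 * rng.standard_normal(5000)

# Static anomaly rule learned once, before the drift (hypothetical 4-sigma rule).
mu, sigma = train.mean(), train.std()
flagged = np.abs(stream - mu) / sigma > 4.0

# Fraction of samples flagged in each consecutive block of 1000 samples.
rates = flagged.reshape(5, 1000).mean(axis=1)
print(np.round(rates, 3))
```

None of the streamed samples is actually anomalous; every flag in the later blocks is a false alarm caused purely by the drift.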
Solution Strategies: Dynamic Change Driven Learning with Data Change Detection Algorithm
We aim to address a key challenge: integrating contextual information, such as changes in power system operating conditions, into anomaly detection. For context sensing, we introduce the dynamic Data Change Driven Learning (DCDL) method, which explicitly accounts for data drift.
Consider online anomaly detection when data drift is detected. In this situation, the dynamic DCDL trains an unsupervised deep-learning algorithm that takes additional input from a non-parametric multivariate Data Change Detection Algorithm (DCDA), whose purpose is to identify when the data statistics have changed. The entire algorithm is implemented at the PMU device level, as shown in Figure 1 (below), and works as follows.
Consider a single PMU streaming n measurements, and take N consecutive data points of each measurement as one window. To detect data drift, the window is first divided into two equal halves, samples 1 to M and samples M+1 to N, where M = N/2. A histogram of the PMU data is generated for each half, and for each bin q, the fraction of that half's measurements falling in bin q is calculated. This is repeated for each of the n measurements. Finally, a log-likelihood measure is calculated over all bins for each measurement, and the maximum of this measure over all measurements is compared against a threshold to detect data drift. The intuition is that, under data drift, the log-likelihood measure captures the statistical differences between the two halves by quantifying the differences between their histogram bins.
To filter anomalous measurements under data drift, a pre-trained unsupervised deep-learning module is used to make predictions and generate pseudo-labels for retraining. The anomalous data are filtered out using these intermediate labels, and the deep-learning module is retrained on the new data statistics, as shown in Figure 1. Anomaly classification is subsequently carried out.
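The drift test described above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the article does not specify the exact log-likelihood measure, so we use a symmetric log-likelihood ratio between the two half-window bin fractions (a symmetric Kullback-Leibler divergence). We also assume shared bin edges for both halves and add-one smoothing so empty bins do not produce log(0). The helper names (dcda_statistic, drift_detected), the bin count, and the threshold are hypothetical and chosen only for illustration.

```python
import numpy as np


def dcda_statistic(window: np.ndarray, n_bins: int = 20) -> float:
    """Drift statistic for one window of PMU data (hypothetical helper).

    window has shape (N, n): N consecutive samples of n measurements.
    Returns the maximum, over the n measurements, of a symmetric
    log-likelihood ratio between the histograms of the two half-windows.
    """
    N, n = window.shape
    M = N // 2
    first, second = window[:M], window[M:2 * M]  # samples 1..M and M+1..N (N even)
    scores = []
    for j in range(n):
        # Shared bin edges so that bin q covers the same range in both halves.
        edges = np.histogram_bin_edges(window[:2 * M, j], bins=n_bins)
        c1, _ = np.histogram(first[:, j], bins=edges)
        c2, _ = np.histogram(second[:, j], bins=edges)
        # Fraction of measurements in each bin q, with add-one smoothing.
        p1 = (c1 + 1) / (c1.sum() + n_bins)
        p2 = (c2 + 1) / (c2.sum() + n_bins)
        # Symmetric log-likelihood ratio summed over all bins.
        scores.append(float(np.sum((p1 - p2) * np.log(p1 / p2))))
    return max(scores)


def drift_detected(window: np.ndarray, threshold: float = 0.5, n_bins: int = 20) -> bool:
    """Flag data drift when the maximum statistic exceeds the (illustrative) threshold."""
    return dcda_statistic(window, n_bins) > threshold


# Synthetic example: three voltage-magnitude channels (pu) with small noise.
rng = np.random.default_rng(0)
N, n = 1000, 3
steady = 1.0 + 0.002 * rng.standard_normal((N, n))
drifted = steady.copy()
drifted[N // 2:, 0] -= 0.03  # sudden step on channel 0 in the second half

print(drift_detected(steady), drift_detected(drifted))
```

Taking the maximum over measurements means a drift on any single channel is enough to trigger detection, which matches the multivariate, per-measurement structure described in the text. In practice the threshold would need to be tuned to the window length and bin count.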
Figure 1: PMU data anomaly detection with data drift module
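The pseudo-label, filter, and retrain step can be illustrated with a deliberately simple stand-in. In the sketch below, a per-channel Gaussian z-score model (GaussianDetector, hypothetical) replaces the unsupervised deep-learning module, and the intermediate labels come from a robust median/MAD rule applied to the post-drift data. The article instead generates pseudo-labels with the pre-trained deep-learning module, so treat this only as an illustration of the control flow. The data are synthetic and the thresholds illustrative.

```python
import numpy as np


class GaussianDetector:
    """Hypothetical stand-in for the unsupervised deep-learning module:
    a sample is anomalous if any channel's |z-score| exceeds z_max."""

    def __init__(self, z_max: float = 4.0):
        self.z_max = z_max
        self.mu = None
        self.sigma = None

    def fit(self, x: np.ndarray) -> "GaussianDetector":
        self.mu = x.mean(axis=0)
        self.sigma = x.std(axis=0) + 1e-12  # avoid division by zero
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        z = np.abs((x - self.mu) / self.sigma)
        return (z > self.z_max).any(axis=1)  # True = anomalous sample


def pseudo_labels(x: np.ndarray, z_max: float = 4.0) -> np.ndarray:
    """Intermediate labels from median/MAD, which a few anomalies barely move."""
    med = np.median(x, axis=0)
    mad = 1.4826 * np.median(np.abs(x - med), axis=0) + 1e-12  # ~sigma for Gaussian data
    return (np.abs((x - med) / mad) > z_max).any(axis=1)


def retrain_after_drift(detector: GaussianDetector, new_data: np.ndarray):
    """Filter pseudo-labelled anomalies, then refit on the new data statistics."""
    labels = pseudo_labels(new_data, detector.z_max)
    detector.fit(new_data[~labels])
    return detector, labels


# Synthetic pre-drift and post-drift voltage magnitudes (pu), 3 channels.
rng = np.random.default_rng(1)
old = 1.0 + 0.002 * rng.standard_normal((2000, 3))
new = 0.97 + 0.002 * rng.standard_normal((2000, 3))  # after a sudden drift
new[::200, 1] += 0.05                                 # 10 injected spike anomalies

detector = GaussianDetector().fit(old)                # "pre-trained" model
print("fraction flagged before retraining:", detector.predict(new).mean())
detector, labels = retrain_after_drift(detector, new)
print("pseudo-labelled anomalies:", labels.sum())
print("samples flagged after retraining:", detector.predict(new).sum())
```

Before retraining, the stale model flags essentially every post-drift sample. After the anomalies are filtered out with the intermediate labels and the model is refit, it flags the injected spikes while accepting the new normal operating point.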
Future Needs: Integrated Data Anomaly Detection and Classification, Cybersecurity, Data Value
The proposed approach addresses the challenge of integrating contextual information when the definition of PMU data anomalies varies over time due to changes in power system operating conditions. In the future, integrated anomaly detection and classification for cyber-physical data will be required. Contextual information can also be incorporated into cyber anomaly detection when changes in the interdependent PMU communication network result in data drift. Such changes in the cyber data distribution may arise from dynamic updates to network routing tables through software-defined networking, or from server and link failures, and need to be accounted for. In addition, PMU data should be considered for multiple applications to enhance its value. Integrating context into both cyber and physical anomaly detection modules will help build a secure and resilient grid.
- Phadke, Arun G. "Synchronized phasor measurements in power systems." IEEE Computer Applications in Power 6, no. 2 (1993): 10-15.
- Khaledian, Ehdieh, Shikhar Pandey, Pratim Kundu, and Anurag K. Srivastava. "Real-time synchrophasor data anomaly detection and classification using isolation forest, KMeans, and LoOP." IEEE Transactions on Smart Grid 12, no. 3 (2020): 2378-2388.
- NASPI. "PMU data quality: A framework for the attributes of PMU data quality and quality impacts to synchrophasor applications." (2017).
- Ahmed, Arman, K. Sadanandan Sajan, Anurag Srivastava, and Yinghui Wu. "Anomaly detection, localization and classification using drifting synchrophasor data streams." IEEE Transactions on Smart Grid 12, no. 4 (2021): 3570-3580.
This article was edited by Ali Nabavi.
Sagnik Basumallik is currently working as an Engineering Scientist at West Virginia University. He received his Ph.D. from Syracuse University, New York. His research interests include cyber-physical power systems, control, and optimization.
Anurag K. Srivastava (M’05–SM’09–F’22) received his Ph.D. degree in power engineering from the Illinois Institute of Technology, Chicago, IL, USA, in 2005. He is a Raymond J. Lane Professor and Chairperson of the Computer Science and Electrical Engineering Department at West Virginia University. He is also an Adjunct Professor with Washington State University and a Senior Scientist with the Pacific Northwest National Lab. He has authored more than 300 technical publications, including a book on power system security, and holds three patents. His research interests include data-driven algorithms for power system operation, control, and resiliency analysis. Prof. Srivastava serves as chair of the IEEE PES synchrophasor applications task force and the voltage stability working group, vice-chair of the power system operation subcommittee, and vice-chair of the tools for power grid resilience task force.
Arman Ahmed received his Ph.D. degree in Computer Science from the School of Electrical Engineering and Computer Science, Washington State University (WSU), Pullman, WA, USA, in 2022. His doctoral work was supported by the Pacific Northwest National Lab (PNNL), Richland, WA, USA, through a Distinguished Graduate Research Fellowship. He is now working as an Artificial Intelligence Engineer at Intel, Hillsboro, OR 97124 USA.
Yinghui Wu is an Associate Professor in the Department of Computer and Data Science at Case Western Reserve University. He holds a joint position as a staff scientist at the Pacific Northwest National Laboratory. His areas of expertise are data and knowledge management, databases, data mining, and machine learning, including data quality, graph query processing, and knowledge base systems. He received his Ph.D. in Computer Science from the University of Edinburgh, UK, and a B.S. in Computer Science from Peking University, China. His current research develops scalable graph analytical systems for multidisciplinary database applications. He serves as an associate editor for the ACM Journal of Data and Information Quality.