# Interview with Rui Yang - "Predictive Analytics for Power Systems Decision Making"

**Dr. Rui Yang** is a research engineer in the Power Systems Engineering Center at the National Renewable Energy Laboratory (NREL). Her areas of expertise include advanced data analytics, machine learning, and optimization in electric power systems. She currently leads multiple efforts at NREL on developing advanced data analytics for energy systems applications. She holds a Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University and a B.E. degree in Electrical Engineering from Tsinghua University.

*In this interview, Rui answers questions from her webinar, "Predictive Analytics for Power Systems Decision Making", originally presented on April 25th, 2019. For more details regarding these questions, please view her webinar on-demand on the IEEE SG Resource Center.*

**Where could we find the bus-level load data/forecast?**

The simulations were done using the IEEE 123-bus system, and we created synthetic data for the load and PV profiles. All the data used in this work are available upon request.

**What is the visibility considered in predicting the states? Percentage and resolution of PMUs and SCADA data?**

We use historical system states to forecast the future states by employing machine learning methods. The historical system states are estimated from measurements collected in the system at the considered time steps. We have validated the performance of the developed state estimation method under different levels of available measurements, from 10% to 100%. The developed matrix completion-based state estimation method can work with whichever heterogeneous measurements are available at a given time step. These measurements could come from SCADA, PMUs, AMI, and other grid-edge sensors and may be collected at different time resolutions. We are currently evaluating the performance of the developed state estimation method under different mixtures of available measurements.

**The green dots on slide 12 mean we get data from each DER too?**

The green dots on slide 12 represent the distributed energy resources (DERs) that may be deployed in the system. The figure on slide 12 is an illustrative example showing that the measurements collected in distribution systems may come from SCADA, DERs, loads, and other grid-edge sensors. Therefore, the distribution state estimation method needs to incorporate heterogeneous measurement data.

**Why does minimizing the rank in the objective function make sense? (Slide 16)**

In the developed matrix completion-based state estimation method, we formulate a data matrix containing the unknown state variables and the partially known measurements in the system. Since the state variables and system measurements, such as power injections, are correlated through the power flow equations, the formulated data matrix is approximately low rank. By minimizing the rank of the data matrix as the objective, we recover a state estimate for which the resulting data matrix is approximately low rank. By doing this, we are able to use the correlations between the state variables and measurements to estimate the system states under low-observability conditions.
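As a toy numerical illustration of the low-rank argument (not the data matrix from the webinar), the sketch below stacks randomly generated "states" next to "measurements" that depend on them linearly. The matrix `A` is a random stand-in for a linearized power flow relation, and all sizes and names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states = 4   # hypothetical number of state variables per row
n_meas = 6     # hypothetical number of measurement quantities per row
n_rows = 50    # number of rows in the toy data matrix

# Random stand-in for a linearized power flow relation: measurements = states @ A.
A = rng.normal(size=(n_states, n_meas))

states = rng.normal(size=(n_rows, n_states))
# Measurements follow the linear relation, plus a small amount of noise.
measurements = states @ A + 1e-3 * rng.normal(size=(n_rows, n_meas))

# Toy data matrix: each row holds the states and the measurements they imply.
data = np.hstack([states, measurements])

singular_values = np.linalg.svd(data, compute_uv=False)
# Count singular values that are not negligible relative to the largest one.
effective_rank = int(np.sum(singular_values > 1e-2 * singular_values[0]))

print("singular values:", np.round(singular_values, 3))
print("effective rank:", effective_rank, "of", min(data.shape))
```

Although the toy matrix has ten columns, only as many singular values as there are state variables are significant; the rest sit at the noise level. That redundancy is what a completion method can exploit when some entries are missing.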

**The formulation of the problem by taking the power flow as perfect constraints may have missed the opportunity of detecting topology error. Is there action to deal with that?**

In the formulated matrix completion problem, we incorporate the power flow equations as constraints. Therefore, the system topology and admittance matrix need to be known. If the system topology changes, we need to update the power flow constraints in the formulated optimization problem to accurately estimate the system states. If the exact topology is not known or a topology change is not detected, the power flow constraints may not be accurate, which leads to an inaccurate state estimation solution. Currently, we are tackling this problem in two different ways. First, we are developing methods to recover the system topology and detect topology changes. Second, we are investigating model-free approaches for state estimation.

**In your test system, what happens when you add additional PV generation? Your model is no longer valid for the system. Do you have to retrain your model? How long does it take to train your model and make adjustments?**

In the simulations, we used the IEEE 123-bus system and created diversified load and PV profiles for individual nodes. Power flow analysis was then performed to provide the measurement data and the ground truth for the system states. We first used matrix completion to estimate the system states from measurements and created a dataset of historical system states, which was then used to train a state forecaster to predict the system states in the short-term future. Since the historical system states cover a wide range of operating conditions, the trained state forecaster can accurately predict the system states under various operating conditions. If additional PV systems are deployed in the system, the patterns of system states over time may change; therefore, the state forecaster may need to be retrained. As shown in the results, our developed state forecasting method is computationally efficient, and it needs only a few seconds to retrain the forecaster for this test system.

**Does the sampling frequency of the measurements affect the accuracy of your estimation/forecasting? I see in the results figure is about 1 minute resolution, however I think the real data availability is more coarse (15 minutes). (Slides 18 & 27)**

The developed matrix completion-based state estimation method can work with whichever heterogeneous measurements are available at a given time step to estimate the system states at that time step. These measurements could come from SCADA, PMUs, AMI, and other grid-edge sensors and may be collected at different time resolutions. If certain types of measurements are collected every 15 minutes, these measurements would be used to provide the state estimates every 15 minutes. If some measurements are available at a higher time resolution, we could use only this subset of higher-resolution measurements to produce state estimates at more granular time scales. Since our developed method can work under low-observability conditions, we could provide accurate state estimation results even with a limited number of measurements.

For forecasting the states, we directly use the historical system states to forecast the system states in the short-term future, so the resolution of the state forecasts depends on the resolution of the historical data of the system states. If the system states are estimated every 15 minutes, the forecasted states will also be in 15-minute resolution. If the system states are estimated at a more granular resolution, such as 1 minute, we could also produce the state forecasts at that higher resolution.
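To make the mixed-resolution case concrete, the following minimal sketch lists which measurement sources would report at a given minute. The reporting periods are hypothetical values chosen for illustration only; they are not taken from the webinar, and real devices report at different rates.

```python
# Hypothetical reporting periods in minutes (illustrative assumptions only).
reporting_period_min = {"PMU": 1, "SCADA": 5, "AMI": 15}


def sources_available(minute):
    """Return the measurement sources that report at the given minute."""
    return [
        name
        for name, period in reporting_period_min.items()
        if minute % period == 0
    ]


# At most minutes only the fastest source reports; every 15 minutes all do.
for minute in (0, 1, 5, 15, 16, 30):
    print(minute, sources_available(minute))
```

Under these assumptions, a state estimate at an arbitrary minute would rely on the fastest-reporting subset alone, while the estimates at the 15-minute marks could use every source.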

**On Slide 21, have you tested the work with 100% data available? 100% data case can test the accuracy of the linearized power flow to represent the AC power flow.**

Yes, we have tested the developed state estimation method with 100% of the data available, which means that the active and reactive power injections and voltage magnitudes at each node are known. We found that the estimation results with 100% of the data available are comparable to the results with 70% available, since the system is already fully observable at 70%.

For the accuracy of the linear power flow model compared to the full AC model, we have some related work specifically validating how accurate the linear power flow model we developed is. In that work, we evaluated the accuracy of the linear model under a wide range of operating conditions in the systems, and we found the accuracy of the linear power flow model is within 0.2% in voltage phasors compared to the full AC model. Detailed information about our linear power flow model can be found here.

**Using machine learning method, how do you make sure the SE solution is self-consistent? (Satisfying Kirchhoff's Laws)**

In this work, we employ a machine learning method to forecast the system states, i.e., voltage phasors, in the short-term future directly from historical system states. The historical system states are estimated from the collected measurements by solving a constrained matrix completion problem. Since we incorporate the power flow equations as constraints in the formulated matrix completion problem, the resulting state estimation solution satisfies the power flow constraints within a small tolerance bound.

**Why have you used simple NN and not RNN which better suits time series forecasting?**

In this work, we aim to forecast the system states in the short-term future by learning the temporal correlations in system states. Commonly used machine-learning-based forecasting methods, such as artificial neural networks and support vector regression, are computationally intensive. Therefore, we employ a simple neural network model, the extreme learning machine, to forecast the short-term system states. The developed extreme learning machine has very good forecasting accuracy and is very computationally efficient. Recurrent neural networks are well suited for time series forecasting problems, and we will explore them in future work.
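For readers unfamiliar with the extreme learning machine, here is a minimal generic sketch, not the forecaster from the webinar. The hidden layer weights are drawn at random and left fixed, and only the output weights are fitted by least squares, which is why training is fast. The window length, hidden layer size, `tanh` activation, and the synthetic sinusoidal voltage series are all assumptions made for the example.

```python
import numpy as np


def make_windows(series, n_lags):
    """Build (inputs, targets): each input row holds the previous n_lags values."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y


class ExtremeLearningMachine:
    """Single hidden layer with fixed random weights; only output weights are fitted."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))  # random, never trained
        self.b = rng.normal(size=n_hidden)              # random, never trained
        self.beta = None                                # fitted output weights

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Training reduces to one linear least-squares solve.
        self.beta, *_ = np.linalg.lstsq(self._hidden(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


# Synthetic per-unit voltage magnitude with a smooth periodic pattern (assumed data).
t = np.arange(600)
voltage = 1.0 + 0.02 * np.sin(2 * np.pi * t / 144)

X, y = make_windows(voltage, n_lags=6)
split = 500  # first 500 windows for training, the rest for testing
model = ExtremeLearningMachine(n_inputs=6, n_hidden=30).fit(X[:split], y[:split])

pred = model.predict(X[split:])
mae = float(np.mean(np.abs(pred - y[split:])))
print(f"one-step-ahead test MAE: {mae:.2e} p.u.")
```

Because fitting is a single least-squares solve, retraining on updated historical states is cheap, which matches the computational efficiency argument given above.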

**On Slide 19, how good is the angle prediction, since angle difference is small in distribution systems?**

For the results shown in slide 19, the mean absolute error (MAE) for angle estimation is 0.3 degrees, and the mean absolute percentage error (MAPE) for angle estimation is 6%. Hence, the angle estimation is accurate.
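The two error metrics quoted above can be computed as in the following sketch. The angle values are made up for illustration and are not the results from slide 19.

```python
import numpy as np


def mean_absolute_error(actual, estimate):
    """Average absolute deviation, in the same unit as the inputs."""
    return float(np.mean(np.abs(estimate - actual)))


def mean_absolute_percentage_error(actual, estimate):
    """Average absolute deviation relative to the actual value, in percent."""
    return float(100.0 * np.mean(np.abs((estimate - actual) / actual)))


# Made-up voltage angles in degrees (illustrative only).
true_angle = np.array([-2.0, -4.0, -5.0, -8.0])
est_angle = np.array([-2.2, -3.8, -5.5, -7.6])

mae = mean_absolute_error(true_angle, est_angle)
mape = mean_absolute_percentage_error(true_angle, est_angle)
print(f"MAE = {mae:.3f} deg, MAPE = {mape:.2f} %")
```

MAPE divides by the true angle, so it grows quickly when angles are close to zero; reading it together with MAE gives a more balanced picture for distribution systems, where angle differences are small.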

**In general, how do you deal with time skewing of measurements?**

The developed matrix completion-based state estimation method can work with whichever heterogeneous measurements are available at a given time step to estimate the system states at that time step. These measurements could come from SCADA, PMUs, AMI, and other grid-edge sensors and may be collected at different time resolutions. The method simply uses the measurements collected at each time step to estimate the system states. We are currently validating the performance of the developed state estimation method under realistic system operation scenarios and will further explore how to deal with the time skew of measurements.

**The matrix rank minimization problem is very hard to solve, are there any suggested techniques, like the nuclear norm reformulation?**

In our work, we reformulate the rank minimization problem as minimizing the nuclear norm of the data matrix containing the unknown state variables and the partially known measurements in the system.
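For reference, the generic textbook form of this relaxation is sketched below. The symbols are assumptions for illustration: $X$ is the data matrix to be completed, $M$ holds the known entries, $\Omega$ is the set of index pairs of known entries, and $\sigma_k(X)$ are the singular values of $X$. The formulation used in the webinar additionally includes power flow constraints and may treat measurement noise differently.

```latex
% Rank minimization: nonconvex and hard to solve in general
\min_{X} \ \operatorname{rank}(X)
\quad \text{s.t.} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega

% Nuclear norm relaxation: convex surrogate for the rank
\min_{X} \ \|X\|_{*}
\quad \text{s.t.} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega,
\qquad \text{where } \|X\|_{*} = \sum_{k} \sigma_k(X)
```

The nuclear norm is the sum of the singular values, so minimizing it pushes many singular values toward zero while keeping the problem convex.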

**Do you think to use deep learning techniques for state forecasting, which may be better than Extreme learning machine?**

Yes, deep learning techniques can be leveraged for state forecasting, and they may provide better forecasting accuracy than the extreme learning machine. The advantage of the extreme learning machine is that it is a computationally efficient method because of its simple structure while it can still provide very accurate forecasts.

**Do you have a plan for further research on this topic? If so, what are your objectives for the next phase?**

Our future research plan includes:

- Improve the scalability of the developed state estimation and forecasting methods.
- Integrate the state estimation and forecasting methods in real-time controls.

The objectives of the ongoing and future work are:

- to ensure the developed methods could work on large-scale, realistic systems under different operation scenarios; and
- to generate actionable information for real-time operations and controls.

**Is the case where there is loss of communication in data considered? If yes, how is it handled?**

Yes, we have considered the loss of communication scenario in evaluating the developed state estimation method. With communication loss, different measurements may be available at different time steps, which means that the known elements in the formulated data matrix may change from one time step to the next. Since our developed matrix completion-based state estimation method is flexible, we can use whatever measurements are available to estimate the voltage phasors at every time step. In our test, the baseline scenario has measurements at 50% of the nodes, and we randomly remove 20% of the available measurements at each time step over a 2-hour period. The estimates for voltage phasors under communication loss are similar to the estimates without data loss, with slightly larger deviations at some time steps when critical measurements are lost. This demonstrates that the developed matrix completion-based state estimation method is robust to real-world operating conditions.
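The test setup described above can be mimicked with a random availability mask, as in the sketch below. The 50% baseline and 20% loss fractions come from the answer; the node count of 123, the 1-minute step (120 steps over 2 hours), and the random seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n_nodes = 123            # assumed node count for illustration
n_steps = 120            # assumed 1-minute steps over a 2-hour period
baseline_fraction = 0.5  # share of nodes with measurements (from the text)
loss_fraction = 0.2      # share of available measurements lost per step (from the text)

n_measured = int(baseline_fraction * n_nodes)
n_lost = int(loss_fraction * n_measured)

# Baseline: a fixed set of nodes that have measurements.
measured_nodes = rng.choice(n_nodes, size=n_measured, replace=False)

# available[t, i] is True when node i delivers a measurement at step t.
available = np.zeros((n_steps, n_nodes), dtype=bool)
for t in range(n_steps):
    # A different random subset of the measured nodes is lost at each step.
    lost = rng.choice(measured_nodes, size=n_lost, replace=False)
    available[t, measured_nodes] = True
    available[t, lost] = False

print("measured nodes:", n_measured)
print("available per step:", int(available[0].sum()))
```

Each row of `available` marks which entries of the data matrix are known at that time step, so the set of known elements changes from step to step while the number of known entries stays the same.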

To view past interviews, please visit the IEEE Smart Grid Resource Center.