Visualizing Dynamic Power System Scenarios for Data Mining

Pierre GEURTS and Louis WEHENKEL†

Dept. of Electrical Engineering - University of Liège - Sart Tilman B 28, B 4000 Liège - Belgium

† Research Associate, F.N.R.S.

Abstract - This paper presents a power system dynamic scenario visualization tool which is used to support user interaction in the context of data mining. Data mining may be used in order to extract important synthetic information from data bases containing power system dynamic trajectories under various conditions, for example generated by simulations or obtained from historical records. Data mining relies on automatic learning algorithms, such as decision trees or neural networks, to extract information and uses various graphical tools to communicate this information to the user. The paper describes the type of information stored in dynamic security assessment data bases, the purpose of data mining and the tools developed. A PostScript version of this paper, showing the graphics in color, is available on the World Wide Web at http://www.montefiore.ulg.ac.be/lwh/.

Keywords - Data mining; Automatic learning; Power system dynamics; Graphical visualization

1 INTRODUCTION

Like many other application areas, the power system field is presently facing an explosive growth of data. In power systems, irrespective of the particular application, there are three main sources of data: (i) field data, collected by various devices distributed throughout the system, such as digital recorders; (ii) centralized data archives, such as those maintained by control center SCADA systems; (iii) data from simulations, carried out in planning or operation environments. Trends in technology (e.g. in digital protection and recording, control center architectures, simulation software and hardware...) result in more and more data being generated, and the dramatic decrease in the cost of mass storage devices makes it possible to store them efficiently.

However, it will not be possible to exploit such huge amounts of data (possibly terabytes) effectively unless appropriate software tools are developed to help experts extract meaningful information from them. Note that, due to the stochastic nature of power systems, the available data will essentially be of a statistical nature. Hence, exploiting it properly will call for a tool box of statistical knowledge extraction techniques (i.e. automatic learning) in order to fit the variety of power system problem types (linear/non-linear, local/global, smooth/discrete, static/dynamic...). At the same time, data will be corrupted by noise, outliers, or missing information, and appropriate data cleansing methods will also be necessary. In addition, such data mining tools will require appropriate graphical front-ends to enable information exchange with power system engineers.

To appear in Proc. of LESCOPE'98 Large Engineering Systems Conference on Power Engineering, Halifax, June 98

The development of such software environments, and of the underlying methods, is the purpose of the recently emerged fields of Knowledge Discovery from Data Bases (KDD) and Data Mining (DM) [1]. Note that within the KDD community, it is well recognized that DM and KDD tools need to be tailored to application specifics in order to be effective. Some important power system specific aspects of DM applications are: the large scale character of power systems (thousands of state variables); the temporal nature of data and the coexistence of very fast phenomena (ms) with slower trends (minutes, hours, weeks, years), all being important for some problems; the mixture of discrete (e.g. events such as topology changes or protection arming) and continuous information (i.e. analog state variables); the necessity of relating information to the geography and topology of the power system, i.e. the necessity of using one-line diagrams and power system maps to communicate with the experts [2].

The purpose of this paper is to present useful data visualization techniques (some generic and some power system specific) which enhance data mining software developed at the University of Liège. This software incorporates various automatic learning methods (statistical, artificial neural networks, and machine learning) tailored to power system data mining problems, together with appropriate graphical user interfaces. It was developed progressively over the last 10 years, in the context of applying automatic learning techniques to power system dynamic security assessment, and is able to handle large scale applications.

The paper is organized as follows. Section 2 first describes the type of information contained in power system security assessment data bases, and then focuses on both generic and special purpose visualization techniques. Section 3 describes the developed tools from the software implementation point of view. Section 4 provides examples from a real large scale data base created for dynamic security assessment of the EHV system of Électricité de France [3]. In the conclusions we briefly discuss the remaining work to be done and review a certain number of applications where the developed tools could be useful.

2 DATA MINING AND VISUALIZATION

2.1 Dynamic security information data bases

Although much of our discussion is also relevant to other power system data mining problems (some will be mentioned in the conclusions) we will concentrate on data bases containing information about the dynamic behavior of a power system under various conditions. Such data bases are presently mainly obtained from simulation studies, but in the future it may be envisioned that similar information could be provided by real-time wide area measurement systems.

Figure 1. Data base structure (3rd dimension: time). Rows: objects (scenarios sc1, sc2, ..., scN). Columns: attributes (characteristics), either temporal (curves, events) or non temporal (numeric, discrete).

As to how such data bases are obtained in practice, let us mention that one possible methodology consists of using Monte-Carlo sampling together with parallel computations, in order to generate such data bases automatically in off-line study environments (planning or operation planning) [4]. Another possibility would be to store systematically in a data base the simulations carried out manually by the engineers in charge of design and security assessment, or those that are run by an on-line dynamic security assessment tool coupled with the energy management system of a control center [5].

2.1.1 Data base structure

Before discussing which kind of tools may be useful in order to extract useful information from such data bases, let us briefly describe their general structure.

A data base, as we view it, is composed of a certain number of objects, each of which is described by a certain number of attributes (see Fig. 1). In a dynamic security information data base, each object essentially corresponds to a dynamic trajectory of the studied system under some particular conditions; we use the term scenario to denote such objects. The attributes correspond to characteristics of various devices in the system, and more generally to variables deemed interesting. We may distinguish among the following types of attributes.

Temporal. These attributes are used in order to describe what happens along the system trajectory; there are mainly two types of such temporal attributes:

Numerical curves. Examples are the variation with time of the values of rotor angles, voltage magnitudes, currents...

Discrete events. Examples are time tagged sequences of events indicating various relay armings, breaker trippings, fault occurrences...

Non temporal. These attributes provide static characteristics about a particular scenario, such as lists of available plants, lines, transformers, load demand... Again we may distinguish among numerical and discrete attribute values. In addition, it is worth noticing that non temporal attributes may be derived as functionals of temporal ones (e.g. initial or final value of a state variable, total number of elements of a particular type tripped during a certain time period, stability/instability classification...).

As illustrated in Fig. 1, a dynamic security information data base may be viewed as a three-dimensional structure, which is normally filled in a sparse fashion. Indeed, not all attribute values are meaningful for all scenarios (e.g. if a line is out of operation, the corresponding current need not be represented). Furthermore, along the time axis the values of the temporal attributes are only represented at the relevant instants (e.g. events when they occur; curves with variable time steps adapted to their dynamics in a particular case). In addition, some attributes may be implicitly represented as functions of others, and are computed on the fly upon request (e.g. total generation in the system as the sum of the active powers of the individual generators).
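The sparse structure described above can be illustrated with a small Python sketch. It is not the internal representation used by the authors' software: the class `Scenario`, the attribute names (`P_G1`, `P_G2`, `P_G3`, `load_MW`) and all numerical values are hypothetical, and reading a curve by holding its last stored sample is an assumption made only for this example.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """One object of the data base: a trajectory plus static data."""
    name: str
    static: dict = field(default_factory=dict)   # non temporal attributes
    curves: dict = field(default_factory=dict)   # name -> sorted [(t_ms, value)]
    events: list = field(default_factory=list)   # sorted [(t_ms, label)]

    def curve_value(self, attr, t_ms):
        """Value of a curve at t_ms, holding the last stored sample."""
        samples = self.curves.get(attr)
        if not samples:
            return None  # attribute not represented for this scenario
        times = [t for t, _ in samples]
        i = bisect_right(times, t_ms) - 1
        return samples[max(i, 0)][1]


def total_generation(sc, t_ms, generators):
    """Derived attribute computed on request: sum over represented units."""
    values = [sc.curve_value(g, t_ms) for g in generators]
    return sum(v for v in values if v is not None)


# Hypothetical scenario: curves are stored only at the relevant instants.
sc1 = Scenario(
    name="sc1",
    static={"load_MW": 950.0},
    curves={
        "P_G1": [(0, 600.0), (100, 580.0), (2000, 590.0)],  # variable step
        "P_G2": [(0, 400.0), (150, 0.0)],                   # unit trips at 150 ms
    },
    events=[(100, "fault cleared"), (150, "G2 breaker trip")],
)

# P_G3 is absent from this scenario and is simply skipped.
print(total_generation(sc1, 120, ["P_G1", "P_G2", "P_G3"]))
```

Storing each curve as a list of (time, value) pairs keeps only the instants that matter for a given scenario, which is the point made about sparsity along the time axis.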

In large scale dynamic security assessment data bases, the number of scenarios may be very large (typically, a few thousand) as well as the number of attributes (typically, a few hundred), and time-constants may range from milliseconds to several minutes. Hence it is quite important to take advantage of the sparse structure of the data base, in particular along the time axis.¹

2.1.2 Physical world

While in a data base attributes are abstract entities, they generally correspond to devices or sets of devices which are well defined in the physical world (e.g. voltages are related to buses, currents to lines). Moreover, in the physical world various relationships exist between entities (e.g. a line is related to the buses it connects).

An engineer who is working with such a data base will think about the power system according to the physical world relationships, and it is hence useful to allow the data mining tool to be aware of them. In the sequel we suppose that the physical world relationships are given in the form of an external data base, the so-called physical data base, and that links are established between the latter and the dynamic security information data base. More precisely, the data mining tool is supposed to know, for each attribute (as well as for each discrete attribute value), to which entity it is related in the physical world.

2.2 Data mining

“Data mining” is a buzzword which denotes a set of activities aiming to extract interesting information (“data nuggets”) from data bases. In power system dynamic security information data bases, it can be used for example to build rules characterizing secure and insecure scenarios, or to assess the influences among different parameters, or to evaluate how accurately one can predict future states or events from past information, or to determine criteria to rank contingencies, or to evaluate consequences of unstable behaviour...

In order to help extract such diverse types of information, data mining tools offer a wide range of functionalities, and in particular the following ones:

Subview selection. This allows the user to select part of the scenarios and/or part of the attributes in order to focus his analysis. Subsets of scenarios may, for example, be selected on the basis of some property defined on their attribute values, by name, or as a random subset of the data base. Attributes are generally chosen manually by the expert, according to the task he wants to carry out.

Sorting and grouping. Attributes may be sorted or grouped by correlations w.r.t. a reference attribute; objects are sorted or grouped by attribute values or by distance to a prototype.

¹ 2000 scenarios, 500 attributes, fixed time-step of 10 ms and 10 minutes of

Figure 2. Examples of generic visualization techniques: (a) histograms (conditional histograms of CCT-SBS vs SECURITY; total: 10000 scenarios, 6971 SECURE, 3029 INSECURE; horizontal axis: CCT (ms), vertical axis: number of scenarios); (b) scatter plots (correlation (PU, CCT) vs. SECURITY; horizontal axis: Pu (MW), vertical axis: CCT; total: 1000, rho = -.88175; insecure: 302, rho = -.35708; secure: 698, rho = -.82327); (c) temporal curves (temporal attribute OMEGA(t) of data base OMIB, 18 scenarios; horizontal axis: t (ms))

Figure 3. Data mining results visualization: (a) regression tree (CCT function of $\delta(t)$); (b) correlation dendrogram. Legend: V: voltages; Xf: equivalent line reactance; CCT: critical clearing time; Omega(t): transient machine rotor speed; Delta(t): transient machine rotor angle; Pl: local load active power; Pu, Qu: steady state power of machine

Temporal transformations. They consist of shifting the time origin (e.g. to synchronize scenarios according to some event), increasing or decreasing time granularity or truncating it, in order to focus on a particular aspect of the problem.

Attribute definition language. This very important facility enables the user to define new attributes by combining values of existing ones. For example, the user could easily combine elementary characteristics into more synthetic ones, and hence implement various possible criteria.

Automatic learning. Algorithms are provided in order to automatically build function attributes (in various forms) so as to approximate a given attribute as a function of some others (e.g. decision trees are used to approximate discrete attributes; linear or non-linear regression may be used to approximate numerical attributes...).

Similarity analysis. Algorithms are provided in order to evaluate similarities among scenarios, find the most similar scenario to a given one, build a set of representative prototypes, analyse correlations among attributes...

Visualization. Graphical tools are used in order to view information such as attribute values for sets of objects, and results of automatic learning.

In the sequel, we focus on visualization techniques. We refer the interested reader to [2] for a detailed description of relevant automatic learning and similarity analysis methods.
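Three of the functionalities listed in this section (subview selection, sorting of attributes by correlation with a reference attribute, and shifting of the time origin) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table of attribute values is made up, and the helper names `pearson` and `shift_origin` are hypothetical and unrelated to the actual tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical non temporal attributes for five scenarios.
table = {
    "Pu":  [700.0, 850.0, 900.0, 1050.0, 1200.0],
    "Qu":  [120.0, 60.0, 150.0, 90.0, 110.0],
    "CCT": [390.0, 300.0, 260.0, 170.0, 80.0],
}

# Subview selection: keep the scenarios satisfying a property.
keep = [i for i, cct in enumerate(table["CCT"]) if cct > 100.0]
subview = {name: [col[i] for i in keep] for name, col in table.items()}

# Sorting: rank attributes by |correlation| with a reference attribute.
ref = "CCT"
ranking = sorted(
    (name for name in subview if name != ref),
    key=lambda name: abs(pearson(subview[name], subview[ref])),
    reverse=True,
)


def shift_origin(samples, t0_ms):
    """Temporal transformation: move the time origin to t0_ms."""
    return [(t - t0_ms, v) for t, v in samples]


print(ranking)
print(shift_origin([(100, 1.0), (250, 2.0)], 100))
```

Shifting every scenario so that a chosen event (for instance fault clearing) falls at time zero is one way to synchronize scenarios before comparing their curves.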

2.3 Generic visualization techniques

Generic visualization techniques are those which are useful for any type of data mining activity, power systems related or not. Figure 2 illustrates three basic types of graphics which are very useful. Note that they are drawn from an academic

example data base related to transient stability assessment of a one-machine-infinite-bus (OMIB) system.

The first two are related to non temporal attributes: conditional histograms (Fig. 2a) provide information about the distribution of a given numerical attribute (horizontal axis) in relation to a discrete attribute (colors or grey levels); scatter plots (Fig. 2b) provide information on the relations between two numerical attributes (horizontal and vertical) and a discrete one (colors and markers). Figure 2a represents the distribution of critical clearing times (CCTs) of the OMIB system, among the 10,000 states of the data base, classified into secure and insecure scenarios (w.r.t. a threshold of 155ms on the CCT). Figure 2b shows the relationship between the active power generated by the machine and the CCT values, indicating a quadratic shape and also, for each active power level, the range of CCTs possible due to the effect of other parameters. Figure 2c provides a plot of a temporal attribute for some scenarios: the variation of the mechanical speed of the machine, when the fault is cleared at $t_{cl} = 155$ ms.
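The counting behind a conditional histogram such as the one of Fig. 2a can be sketched as follows. The 155 ms threshold comes from the text; the bin width, the CCT values and the convention that a scenario is secure when its CCT is strictly above the threshold are assumptions made for this example.

```python
from collections import Counter

THRESHOLD_MS = 155.0  # security threshold on the CCT quoted in the text
BIN_MS = 50.0         # bin width: an arbitrary choice for this sketch


def conditional_histogram(ccts):
    """Count scenarios per (CCT bin start, security class)."""
    counts = Counter()
    for cct in ccts:
        # Assumed convention: secure when the CCT exceeds the threshold.
        label = "SECURE" if cct > THRESHOLD_MS else "INSECURE"
        bin_start = int(cct // BIN_MS) * int(BIN_MS)
        counts[(bin_start, label)] += 1
    return counts


ccts = [60.0, 120.0, 140.0, 160.0, 180.0, 240.0, 310.0]  # made-up values
hist = conditional_histogram(ccts)
for (bin_start, label), n in sorted(hist.items()):
    print(f"{bin_start:3d}-{bin_start + int(BIN_MS):3d} ms  {label:8s} {n}")
```

Each (bin, class) count corresponds to one colored bar segment in the plotted histogram.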

Figure 3 illustrates two generic visualization techniques for data mining results.

Figure 3a (left part) shows a (partial view of a) regression tree which approximates the CCT of the OMIB system as a piecewise constant function of the machine rotor angle in the during-fault period. Note that the shaded area and the number in each box provide information about the mean and standard deviation of the CCTs of the scenarios corresponding to each node. The top node corresponds to all possible scenarios, whereas the terminal nodes correspond to subsets of scenarios falling within a certain range of $\delta(t)$ values. For example, the lower left node corresponds to scenarios such that $\delta(150) > 54.2^\circ$ and $\delta(100) > 67.4^\circ$; for this kind of scenario the mean CCT value is 68ms and its standard deviation is equal to 19ms. Similarly, the lower right node corresponds to scenarios such that $\delta(150) \le 39.5^\circ$; they have a mean CCT of 383ms and a standard deviation of 21ms. Thus, the regression tree allows one to approximate the CCT from rotor angles in the during-fault period; in our example this approximation is actually quite accurate, the mean absolute error being about 2ms.

Figure 4. Example of visualization of scenario clustering (clustering according to temporal attribute DELTA(t), data base OMIB; cluster number 2: 73 scenarios, cluster number 12: 51 scenarios, cluster number 15: 157 scenarios; horizontal axis: time)
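The piecewise constant approximation built by a regression tree can be illustrated by the search for a single split. The sketch below is a generic textbook step of top down tree induction, not necessarily the exact criterion used by the authors' software, and the (rotor angle, CCT) pairs are made up.

```python
from statistics import mean, pstdev


def best_split(xs, ys):
    """Threshold on x minimizing the squared error of two constants."""
    def sse(vals):
        m = mean(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for thr in sorted(set(xs))[:-1]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, thr, left, right)
    return best[1:]


# Made-up (rotor angle in degrees, CCT in ms) pairs.
delta150 = [35.0, 38.0, 42.0, 47.0, 52.0, 58.0, 63.0, 70.0]
cct = [390.0, 370.0, 330.0, 290.0, 250.0, 180.0, 120.0, 70.0]

thr, low_angle, high_angle = best_split(delta150, cct)
# Each node is summarized by the mean and standard deviation of its CCTs.
print(f"delta(150) >  {thr}: mean {mean(high_angle):.0f} ms, "
      f"st.dev. {pstdev(high_angle):.0f} ms")
print(f"delta(150) <= {thr}: mean {mean(low_angle):.0f} ms, "
      f"st.dev. {pstdev(low_angle):.0f} ms")
```

Applying the same search recursively to each subset, until the subsets are small or homogeneous enough, yields a tree whose terminal nodes each predict a constant CCT.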

Figure 3b (right part) shows a dendrogram summarizing a correlation analysis among various attributes, and three groups of attributes which may be considered as highly correlated (the numbers refer to correlation coefficients among variables computed on a representative subset of the data base).
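Cutting a correlation dendrogram at a given level yields groups of highly correlated attributes. A minimal sketch of that step is given below, assuming single linkage on the absolute correlation and a hypothetical cut level of 0.95; the attribute names and coefficients are invented, and the paper does not state which linkage the authors used.

```python
def correlated_groups(names, rho, cut=0.95):
    """Merge attributes whose |correlation| reaches `cut` (single linkage)."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links up to the representative of the group.
        while parent[n] != n:
            n = parent[n]
        return n

    for (a, b), r in rho.items():
        if abs(r) >= cut:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


names = ["A", "B", "C", "D", "E"]
rho = {  # hypothetical pairwise correlation coefficients
    ("A", "B"): 0.99, ("A", "C"): 0.40, ("B", "C"): 0.35,
    ("C", "D"): -0.97, ("D", "E"): 0.10, ("A", "E"): 0.05,
}
groups = correlated_groups(names, rho)
print(groups)
```

Using the absolute value treats strongly anti-correlated attributes as carrying the same information, which is usually what is wanted when selecting candidate attributes.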

Figure 4 illustrates how curve coloring can be useful to analyse the results of a clustering algorithm. A random sample of the data base was automatically partitioned by a clustering algorithm into 15 classes, according to the values taken by the temporal attribute $\delta(t)$, i.e. rotor angle temporal behaviour. The $\delta(t)$ curves of the scenarios belonging to three of these clusters are shown in Fig. 4.

2.4 Requirements of specific visualizations

Power system specific visualization tools mainly use two-dimensional maps, where different icons are used to represent different types of elements, and colors are used to highlight groups of elements which share a common property (same voltage level, same island, same on/off status...). In addition, some fields of these maps show numerical information (voltage magnitudes, power flows...). Given the complexity of most power systems, it is normally impossible to visualize all elements in full detail without overloading the picture. Thus, tools must be able to display information at different levels of detail, and allow the user to easily switch from one level to another. In addition, graphical maps should of course provide easy to use zooming, focusing and navigation mechanisms.

These facilities are well known in the control center community; some interesting research has also been carried out in order to design special features useful in the context of displaying results from security assessment [6, 7, 8]. We will not elaborate on the general features of such displays, which are clearly also required in the context of data mining applications. Rather, we will discuss special features which need to be added to these basic tools in order to provide a tool to engineers in charge of data mining security information data bases. In passing we notice that, while the graphic tools provided in the real-time environments of control centers are indeed highly sophisticated, those at the disposal of engineers in charge of studies in the off-line environment are often very minimal.

The following provides a (probably non exhaustive) list of desirable features of such visualization tools:

2.4.1 General features needed in the study environment

Focusing and searching objects on the map. Since a rather large number of attributes and physical objects may be represented on a map, it is useful to provide some focusing mechanism which automatically centers the display on the relevant part of the map. Another useful feature is to allow the user to input physical object names for focusing. Notice that these features are especially useful since the user is not necessarily fully acquainted with the layout.

Flexibility. Since in the off-line study environment a very large variety of problems are considered, the tool must be able to display any kind of information.

Easy and incremental editing. In particular, it should be easy for an engineer to edit graphic layouts, and to add or remove information on them, since it is not possible to design in advance graphical layouts for any kind of application.

Paper version of the graphics. In order to write reports it should be possible to generate automatically paper versions of maps, for a specified set of scenarios, and with a minimum amount of user intervention.

2.4.2 Data mining related features

Interaction with the data mining tool. The visualization tool should provide facilities to send information to, and acquire information from, the data mining tool. In particular, it may be very useful to display groups of attributes in different colors and to select or unselect some attributes by clicking on the corresponding graphical representation. It is also useful to allow the user to call some generic visualizations directly from the tool.

Multiple scenarios. It should be possible to switch quickly from one scenario to another, or to visualize simultaneously several scenarios, in order to compare their temporal behaviour in the data mining process.

Temporal motion. In order to visualize temporal attributes it should be possible to move forwards and backwards through time easily, or to skip to a particular event occurrence.

Most of these features have been incorporated in the tool described in the next section and will be illustrated below. Notice that they are quite different from the graphics requirements in the real-time environment.
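The "temporal motion" requirement (stepping forwards and backwards through time, or skipping to an event) can be sketched with a small cursor object. This is an illustrative sketch only: the class `TimeCursor`, its methods and the sample instants and events are hypothetical and do not describe the actual tool.

```python
from bisect import bisect_left, bisect_right


class TimeCursor:
    """Moves through the instants at which a scenario has stored data."""

    def __init__(self, instants_ms, events):
        self.instants = sorted(instants_ms)
        self.events = sorted(events)          # [(t_ms, label)]
        self.index = 0

    @property
    def now(self):
        return self.instants[self.index]

    def step(self, n=1):
        """Move n instants forwards (n > 0) or backwards (n < 0)."""
        last = len(self.instants) - 1
        self.index = min(max(self.index + n, 0), last)
        return self.now

    def next_event(self):
        """Skip to the first event strictly after the current instant."""
        times = [t for t, _ in self.events]
        i = bisect_right(times, self.now)
        if i == len(times):
            return None                       # no later event
        t_event, label = self.events[i]
        last = len(self.instants) - 1
        self.index = min(bisect_left(self.instants, t_event), last)
        return label


cursor = TimeCursor(
    [0, 10, 20, 100, 150, 400],
    [(100, "fault cleared"), (150, "breaker trip")],
)
cursor.step(2)
label = cursor.next_event()
print(cursor.now, label)
```

Such a cursor maps naturally onto tape recorder style buttons: single steps, and jumps to the next recorded event.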

3 SOFTWARE

Below we briefly describe the data mining software (called GTDIDT) and then focus on the implementation of the scenario visualization tool VISIONET. Notice that both have been developed in the research environment, and thus should be considered as prototypes rather than full-fledged software tools.

3.1 Data mining tool: GTDIDT

The data mining tool was developed gradually in the context of various research projects [9]. The features recently added concern the representation and manipulation of the (large amounts of) temporal data encountered in dynamic security information data bases. The overall architecture of GTDIDT is depicted in Fig. 5. It is composed of two main components.

Figure 5. Data mining tool GTDIDT (an empty data mining tool, containing only algorithms, loads a data base view and builds cloned data mining tools containing attributes and algorithms; each comprises a data mining data base management module and a data mining tool box)

Data mining data base management module. It loads data from flat files, while allowing one to select a subset of the scenarios and a subset of attributes, yielding a particular view on the data bases. These data are then compiled into an internal representation and an executable image is cloned, containing the viewed data and the data mining algorithms. On a workstation with 1GB of main memory it is thus possible to access simultaneously up to 200 million attribute values. The data base management module allows the user to define additional attributes as functions of the basic ones extracted from the simulations and compiles them into efficient code. It also provides facilities for various subset and variable sorting/selection/grouping operations and time axis transformations.

Data mining tool box. This module contains both low level and high level algorithms in order to allow the user to extract knowledge from a data base. The former are basic statistical summarizations in text and/or graphical forms (means, standard deviations, correlations, bar diagrams, curves, scatter plots...). The latter are automatic learning algorithms, deemed useful for security assessment applications, e.g. unsupervised learning (clustering of similar types of scenarios or attributes), top down induction of decision and regression trees, and smooth linear and non-linear regression techniques. Notice that the illustrations in Figs. 2, 3 and 4 were all generated by GTDIDT.

GTDIDT is written in Lisp and runs under Unix/X11 : its user interface is a customization of the GNU-Emacs editor while the core of the tool is written in GNU CommonLisp (GCL). It generates generic graphics in the form of postscript files which are visualized using any available postscript previewer. The automatic learning algorithms generate their output mainly in the form of graphics and function attributes (lisp code) which are automatically compiled and added to the data base.

3.2 Security scenario visualization: VISIONET

VISIONET and GTDIDT have been developed independently; they run in parallel (as different processes) and communicate using Unix sockets, as illustrated in Fig. 6.

VISIONET is written in Tcl/Tk: Tcl (Tool command language) is an interpreted command language; Tk is a graphical tool-kit for X11 integrated with Tcl. The main strengths of Tcl/Tk are not computational efficiency but portability, open design (its integration with other languages was a main objective when it was designed), and user-friendliness (thanks to the provision of powerful graphical libraries). It was hence perfectly suitable for the development of a general power system visualization tool, where flexibility is more important than efficiency (in comparison to CAD/CAM, or image processing, for example).

Figure 6. Interfacing GTDIDT and VISIONET

The software may run either in stand-alone mode or coupled with GTDIDT. In stand-alone mode, starting with the physical power system model, it allows the visualization and editing of one-line diagrams, with features like zooming, searching of physical elements, hard copy, and an easy to use graphics editor. The power system may be decomposed into several logical levels (typically corresponding to different voltage levels) which may be displayed together or separately, and also into geographical regions. Notice that, while the tool was designed for power systems, it could be used to represent the topology of any other kind of interconnected system (computer network, ground transportation...).

The coupling between this tool and GTDIDT was carried out using the 'gcl-tk' package, which provides the Tk facilities to GCL. GTDIDT may send directives to VISIONET (which then behaves as a graphics server), and it is possible to let GTDIDT take control over any existing Tcl/Tk program. The interaction also works in the other direction: Tcl/Tk may send expressions to GTDIDT (which then behaves both as a data and computation server) and get results back. The two applications work in parallel, and it is thus possible to continue interacting with the graphic tool while GTDIDT is busy doing some computations. They take advantage of the Unix/X11 environment, which makes it possible to run them on different hosts, or to use several screens to display information.

As shown in Fig. 6, both tools make use of the physical data base as a common vocabulary and information source: GTDIDT links its attributes with physical objects, VISIONET links graphical objects (positions, icon geometries, colors...) with physical objects. To realize the coupling, GTDIDT was extended by two interfaces: int1 and int2 (see Fig. 6). Upon user request to GTDIDT, the latter creates a new Tcl/Tk window 'int2' which is attached to VISIONET; this window contains buttons and fields corresponding to the additional functionalities related to the data mining activities (visualization of attribute values, selection of candidate attributes, switching among scenarios...). During this initialization stage, VISIONET also receives information about the data mining environment in the form of GTDIDT names (scenarios, attributes).

In normal use, most of the interactions correspond to two-way communications. In order to serve a graphic request from the user, VISIONET may need to call GTDIDT to get information (e.g. a particular attribute value, corresponding to a given scenario and time instant). GTDIDT then sends information back in the form of values together with a list of the physical object identifiers to which these values relate. In turn, from the list of physical objects, VISIONET derives the graphical information required to serve the user request.

Figure 7. VISIONET graphical user interface: the lower part is specific to “int2”; the upper part is generic

Other types of interactions may be purely one-way. For example, VISIONET may be directly controlled by GTDIDT in order to reflect data mining results automatically in the graphics. Similarly, VISIONET may act directly on the GTDIDT environment, e.g. in order to call some of its facilities (curve plotting, attribute selection...). In this latter case, physical object names are sent to GTDIDT, which carries out the translation.
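The exchanges described above rely on physical object identifiers as the shared vocabulary between the two tools. The sketch below illustrates this idea under stated assumptions: the class names `DataServer`, `GraphicsTool` and `GraphicalObject`, the bus names and the numerical values are all hypothetical, and the actual tools communicate through Tcl/Tk messages rather than Python method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphicalObject:
    """Drawing data the graphics side keeps for one physical object."""
    x: float
    y: float
    icon: str


class DataServer:
    """Plays the role of the data mining side: attribute values per physical object."""

    def __init__(self, values):
        # values[(scenario, attribute, time)] -> {physical_id: value}
        self._values = values

    def attribute_values(self, scenario, attribute, time):
        """Answer a request with values keyed by physical object identifier."""
        return dict(self._values[(scenario, attribute, time)])


class GraphicsTool:
    """Plays the role of the graphics side: physical object -> graphical object."""

    def __init__(self, layout, server):
        self._layout = layout
        self._server = server

    def display_list(self, scenario, attribute, time):
        """Serve a user request: fetch values, then attach them to map positions."""
        values = self._server.attribute_values(scenario, attribute, time)
        return [
            (self._layout[pid].x, self._layout[pid].y, value)
            for pid, value in sorted(values.items())
            if pid in self._layout  # skip objects that are not drawn on the map
        ]


def demo():
    """Illustrative data only; identifiers and voltages are made up."""
    layout = {
        "BUS_A": GraphicalObject(10.0, 20.0, "bus"),
        "BUS_B": GraphicalObject(30.0, 5.0, "bus"),
    }
    server = DataServer(
        {(1, "voltage_kV", 40.0): {"BUS_A": 390.0, "BUS_B": 405.0, "BUS_C": 400.0}}
    )
    return GraphicsTool(layout, server).display_list(1, "voltage_kV", 40.0)
```

Neither side needs to know the other's internal representation: the data side never sees coordinates, and the graphics side never sees how attribute values are computed.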

Figure 7 shows the graphical appearance of VISIONET's user interface. The lower part corresponds to the 'int2' window, and the upper part to the standard graphics display where the one-line diagrams are shown. The 'int2' window contains in its lower part a task bar with tape-recorder buttons for moving through time or through a list of scenarios; above this, a scrollable window allows the selection of groups of attributes and/or events to display on the map. On top of this is a task bar with navigation functionalities (zoom, level focusing, searching and showing object names). The task bar immediately above the latter contains editing functions and the 'Edit' button, which switches between editing and visualization modes. The main window (showing part of the 400kV and 225kV network of Electricité de France) contains objects representing the power system and attribute values. Rather than being indicated permanently on the map, device names are displayed automatically when the mouse moves over an object. The next section (and Fig. 8) illustrates some other features.

4 ILLUSTRATION

4.1 Case study description

We draw our illustration from a large-scale study which was carried out on the EHV system of Electricité de France (see [3] for further details). The data base is composed of about 1500 scenarios, among which a certain proportion is highly disturbed. The aim of the study was to identify the main system failure modes in the South-Eastern part of the system, by studying both long-term and fast dynamics, and taking into account all kinds of protective devices which may act when the system is highly disturbed. The simulation model comprises 11,000 state variables and each scenario is simulated over a period of about 60 minutes of real time, using a variable time step numerical integration method (EUROSTAG [10]). Below we will focus on a particular scenario which shows very complex and interesting phenomena, appropriate to illustrate our point.
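Because the scenarios are simulated with a variable time step, the samples of a trajectory are not equally spaced, and reading an attribute value at a given time instant requires a lookup on an irregular time grid. A minimal sketch of such a lookup follows; the function name `value_at` and the choice of linear interpolation are assumptions made for illustration, not a description of how the tools store or query trajectories.

```python
from bisect import bisect_right


def value_at(times, values, t):
    """Linearly interpolate a trajectory sampled at strictly increasing times."""
    if len(times) != len(values) or not times:
        raise ValueError("times and values must be non-empty and of equal length")
    if t <= times[0]:
        return values[0]   # before the first sample: hold the first value
    if t >= times[-1]:
        return values[-1]  # after the last sample: hold the last value
    i = bisect_right(times, t)  # now times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

With closely spaced samples around a fault and sparse samples elsewhere, the binary search keeps each query logarithmic in the number of stored points.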

4.2 Examples of visualizations on scenario 1439

Let us briefly describe the chronology of events in this scenario (see also Fig. 8):

Initialization (t=0s to t=505s). At t=20s a first (bus bar) fault appears in a 400kV substation (see Fig. 8a), leading to immediate tripping of 225kV lines (overload protections working improperly). The fault is cleared at t=20.1s by permanent tripping of breakers in the substation, leading to the loss of the two 400kV lines connected to the bus bar. At t=40s, 12 tap-changer blocking devices act within the 225kV subsystem close to the hydro plants (see Fig. 8b,h). Nothing else happens until the occurrence of the second fault, at t=500s: it is a three-phase short-circuit on a critical 400kV line in the Rhône valley (see Fig. 8a), leading to the loss of two major 400kV lines (a parallel line, due to overload protection misoperation, and the faulted line, due to normal operation of protections).

Figure 8. Scenario 1439 (graphics not reproduced). Map views of regions I and L: line overload protections shown by blue (dark) lines with tripping times (time = 0 to 2903s); voltages and blocked OLTCs at time = 40s and at time = 2230s; generator protections (undervoltage, overspeed) at time = 2310 to 2345s. Temporal curves over 0 to 3000s: mean tap positions of zones (ratios in pu); mean MV voltage of zones (tripped loads included); 400kV voltages; power exports; step curves of line overload tripping (each step corresponds to a line tripping), tap changer blocking (steps correspond to zones of tap changers), and undervoltage and overspeed protection of generators (steps correspond to generators tripped).

Intermediate stage (t=505s to t=2000s). Given the number of circuits lost, 9 (400kV and 225kV) lines are overloaded in the study region, leading to their successive tripping between t=1650s and t=2000s (see Fig. 8a,e). During the same period, tap changers start reacting, first in region I, then in region L (see Fig. 8g).

Voltage collapse in region L (t=2000s to t=2904s). Upon loss of the last three lines the critical point is reached in the Eastern part of region L: tap changers continue raising their taps but load decreases (compare Fig. 8j and 8g). At about the same time, low voltage protections start disconnecting 56 further (mainly 225kV) lines around Lyon, leading to the loss of 3100MW of load in region I (see Fig. 8j). We can observe in Fig. 8k how this load decrease is compensated by an increase in the interconnection flows out of France. At t=2230s a second wave of 30 tap changer blocking devices is activated throughout region L, unfortunately too late (see Fig. 8c,g,j). At t=2310s a further 400kV line in the North-Eastern part of region I trips on overload protection (this is actually an interconnection between France and Switzerland, as can be deduced from the sudden decrease in the corresponding power flow in Fig. 8k) and two seconds later some generators in the Southern part trip on undervoltage protections (see Fig. 8d,j). About ten seconds later, some other generators are lost due to overspeed protections (see Fig. 8d,i); two further lines are tripped on overload at t=2650s and t=2900s, but since the simulation is stopped shortly afterwards we cannot observe further consequences. At t=2904s the system stabilizes with very low voltages throughout region L and normal ones in region I (see Fig. 8h).
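Several panels of Fig. 8 present protection operations as staircase curves, in which each step corresponds to one tripping. Such a curve is simply the cumulative count of events over time, which can be sketched as follows. The function name `step_curve` is hypothetical, and the tripping times used in the usage note are a small illustrative subset of instants, not the complete record of the scenario.

```python
from bisect import bisect_right


def step_curve(event_times, sample_times):
    """Return, for each sample time, the number of events at or before it."""
    ordered = sorted(event_times)
    # bisect_right counts the events with time <= t, which is the step height.
    return [bisect_right(ordered, t) for t in sample_times]
```

For instance, with illustrative tripping times `[1650.0, 1949.0, 1974.0, 2310.0, 2650.0, 2900.0]`, evaluating the curve at `[0.0, 1650.0, 2000.0, 2904.0]` gives `[0, 1, 3, 6]`.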

To summarize, this scenario corresponds to the loss of 9 400kV and 65 225kV lines, 5 thermal units, 6 hydro units, 3100MW of load in region I and 4700MW in region L; an increase in total exports of 3600MW; very low voltages in region L (153kV to 266kV) and normal voltages in region I (408kV). While the dynamic behavior of power systems may be extremely complex, our illustration shows how visualization tools can make the engineer's life easier, if not easy.

5 CONCLUSION

The aim of this paper was to present tools and techniques which may help engineers to study (and hence improve) the dynamic performance of their power system in a systematic way. More and more possibilities are becoming available to generate and collect data about power system dynamics, be it from simulations or from historical recordings. The data mining and visualization tools discussed in this paper provide a multitude of possibilities to the engineers in charge of power system planning, maintenance and operation. Some of these tools are generic in nature and could be provided by any commercially available data mining tool; however, many more possibilities are offered by designing special-purpose tools, which make use of physical information about the power system structure and present information in a way compatible with the approach experts use to analyse system behavior.

Our illustrations have been drawn from the power system dynamic security assessment area, which is certainly one of the fields where needs tend to become more and more stringent. However, there are many other problems in power systems where data mining may be very useful. In particular, we mention the broad field of modeling (i.e. making use of measured data in order to develop better models), state-estimation and forecasting (building up and maintaining statistical models about load patterns, measurement and signaling errors...), and the very broad area of Monte-Carlo based probabilistic tools used in many long-term and short-term planning problems (to find out which parameters influence most significantly loss-of-load-probabilities, costs...).

Further work will aim at improving the flexibility and graphical features of the proposed tool so that it can be used for other data mining problems, and at building interfaces in order to couple it also with dynamic simulation packages.

ACKNOWLEDGMENTS

This research was partly carried out in the context of a collaboration with Electricité de France. We thank the engineers of Electricité de France for their valuable suggestions and feedback.

REFERENCES

[1] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI Press/MIT Press, 1996.

[2] L. Wehenkel. Automatic learning techniques in power systems. Kluwer Academic, Boston, 1998.

[3] L. Wehenkel, C. Lebrevelec, M. Trotignon, and J. Batut. A probabilistic approach to the design of power systems protection schemes against blackouts. In Proc. IFAC-CIGRE Symp. on Cont. of Power Plants and Power Syst., pp. 506–511, Beijing, 1997.

[4] Y. Jacquemart, L. Wehenkel, and P. Pruvot. Practical contribution of a statistical methodology to voltage security criteria determination. In Proc. of the 12th PSCC, pp. 903–910, Aug. 1996.

[5] T. E. Dy-Liacco. Enhancing power system security control. IEEE Computer Applications in Power 10, no. 3, pp. 38–41, July 1997.

[6] P. M. Mahadev and R. D. Christie. Envisioning power system data: concepts and a prototype system state representation. IEEE Trans. on Power Syst., 1993.

[7] F. L. Alvarado, Y. Hu, C. Rinzing, and R. Adapa. Visualization of spatially differentiated security margins. In Proc. of the 11th PSCC, pp. 519–525, Aug.–Sept. 1993.

[8] P. M. Mahadev and R. D. Christie. Envisioning power system data: vulnerability and severity representations for static security assessment. IEEE Trans. on Power Syst., 1995.

[9] L. Wehenkel, M. Pavella, E. Euxibie, and B. Heilbronn. Decision tree based transient stability method - a case study. IEEE Trans. on Power Syst. PWRS-9, no. 1, pp. 459–469, 1994.

[10] B. Meyer and M. Stubbe. EUROSTAG, a single tool for power system simulation. Transmission and Distribution International 3, no. 1, pp. 47–52, 1992.

BIOGRAPHIES

Pierre Geurts is currently a student at the electrical engineering department of the University of Liège, nearing completion of his undergraduate studies. He started working on temporal data mining problems about one year ago.

Louis Wehenkel received the Electrical (Electronics) engineering degree in 1986 and the Ph.D. degree in 1990, both from the University of Liège, Belgium, where he is presently a research associate of the F.N.R.S. His research interests lie in automatic learning and its application to power system security analysis and control.

