Developing Proxy Radar Data with the Aid of Cloud-to-Ground Lightning for a Nowcasting System

by Erin B. Munsell

Submitted to the Department of Earth, Atmospheric and Planetary Sciences in Partial Fulfillment of the Requirements for the Degree of

Bachelor of Science in Earth, Atmospheric and Planetary Sciences at the Massachusetts Institute of Technology

May 8th, 2009

Copyright 2009 Erin B. Munsell. All rights reserved.

The author hereby grants to M.I.T. permission to reproduce and distribute publicly paper and electronic copies of this thesis

and to grant others the right to do so.

Author: [signature redacted]
Department of Earth, Atmospheric and Planetary Sciences
May 8, 2009

Certified by: [signature redacted]
Professor Kerry Emanuel
Thesis Supervisor

Reviewed by: [signature redacted]
Dr. Haig Iskenderian
Lincoln Laboratory Supervisor

Accepted by: [signature redacted]
Professor Samuel Bowring
Chair, Committee on Undergraduate Program

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.


MIT Libraries

77 Massachusetts Avenue

Cambridge, MA 02139

http://libraries.mit.edu/ask

DISCLAIMER NOTICE

Due to the condition of the original material, there are unavoidable flaws in this reproduction. We have made every effort possible to provide you with the best copy available.

Thank you.

The following pages were not included in the original document submitted to the MIT Libraries. This is the most complete copy available.


Developing Proxy Radar Data with the Aid of Cloud-to-Ground Lightning for a Nowcasting System

by Erin B. Munsell

Submitted to the Department of Earth, Atmospheric and Planetary Sciences on May 8th, 2009

In Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Earth, Atmospheric and Planetary Sciences

ABSTRACT

Air traffic managers need up-to-date nowcast information over the entire CONUS for efficient operations in the National Airspace System. In areas of degraded or no radar coverage, cloud-to-ground lightning data (provided by the National Lightning Detection Network) can provide valuable information through the creation of proxy vertically integrated liquid (VIL) and echo tops (ET). To develop these lightning-VIL and lightning-ET relationships, the lightning and radar data were analyzed in "climate zones" throughout the CONUS, because lightning may behave differently in different areas of the country. After a quantile analysis revealed differences in the data between zones, lightning-VIL and lightning-ET relationships were developed using a probability matching method for a baseline relationship (all climate zones) and for each individual climate zone. The potential benefits of the inclusion of each zone were analyzed through bootstrap testing of the proxy VIL and proxy ET models, and performance was assessed using a system of binary scoring. For a given lightning flash rate, VIL values in the Mid-Latitude Land West zone were considerably lower than in other zones. The Mid-Latitude Land West zone also showed a noticeable improvement in the performance of the proxy VIL model. For a given lightning flash rate, ET values in the Mid-Latitude Water zone were considerably lower than in other zones. The Mid-Latitude Water zone appeared to provide a statistical improvement in the proxy ET model, but because of a lack of data in this zone on the days chosen for model testing, this improvement was not noticeable in the overall performance of the proxy ET model and needs to be investigated further.

Thesis Supervisor: Professor Kerry Emanuel

Title: Professor of Meteorology, Department of Earth, Atmospheric and Planetary Sciences


ACKNOWLEDGMENTS

I would like to thank the following people who assisted with the completion of this thesis:

Dr. Haig Iskenderian, for his nearly constant supervision, his willingness to answer all of my questions, and his meticulous suggestions to improve my writing.

Dr. Marilyn Wolfson, for suggesting this project as my thesis and giving me the opportunity to do this work at Lincoln Laboratories.

Professor Kerry Emanuel, for his stylistic comments and suggestions.

Jane Connor and Garrett Marino, for helping with the writing process.

My family, Chris and the girls, for encouraging me throughout the entire year.

And to Hurricane Emily - if you had not brushed the Carolina Coast in the summer of 1993, with a 1% chance of making landfall at Long Beach Island, New Jersey where my family was vacationing at the time, I may have never fallen in love with the weather and my dreams of becoming a meteorologist may have never been born.


CONTENTS

List of Tables and Figures

1 Introduction

1.1 Motivation and Goals

1.2 Previous Work

2 Climate Zone Analysis

2.1 Sources of Data

2.2 Organizing the Data

2.3 Climate Zone Creation

2.4 Initial Statistical Results and Analysis

2.5 Quantile Analyses of Data for Each Climate Zone

2.6 Quantile Analyses of Data for Each Climate Zone and Time of Day

3 Development of the Proxy Relationships

3.1 Explanation of the Probability Matching Method

3.2 Application of the Probability Matching Method

3.2.1 Lightning-VIL Relationships

3.2.2 Lightning-ET Relationships

3.3 Assessing Zone Differences

3.3.1 The Bootstrap Technique and Other Important Statistics

3.3.2 Differences in the Lightning-VIL Relationships

3.3.3 Differences in the Lightning-ET Relationships

4 Applications of Relationships

4.1 Proxy VIL Maps and Binary Scoring

4.2 Proxy ET Maps and Binary Scoring

4.3 Evidence of Improvement in Model Performance from Zone Relations

4.3.1 VIL and the Mid-Latitude Land West Zone

4.3.2 ET and the Mid-Latitude Water Zone

5 Conclusions and Future Work

5.1 Conclusions

5.2 Future Work


TABLES AND FIGURES

TABLES

Table 1: Summary of Ways to Express VIL

Table 2: ET Colorbar from the CIWS System

Table 3: List of Eleven Case Days Used in this Study

Table 4: Climate Zones and their Numbering Used in this Study

Table 5: List of Four Case Days Used for Initial Analysis (highlighted in yellow)

Table 6: Comparison of Baseline Lightning-VIL Relationships to Individual Zone Lightning-VIL Relationships in Terms of Model Performance for Each Zone

Table 7: Comparison of Baseline Lightning-ET Relationships to Individual Zone Lightning-ET Relationships in Terms of Model Performance for Each Zone

FIGURES

Figure 1: Example of VIL mosaic from April 2nd, 2009 at 22 UTC

Figure 2: Example of ET mosaic from April 2nd, 2009 at 22 UTC

Figure 3: Radar and NLDN Coverage

Figure 4: Situations where proxy VIL could be beneficial

Figure 5: Lightning Strikes and Flash Rates

Figure 6: Flow chart of data organization process

Figure 7: Methodology used to calculate smoothed lightning flash rate or strength

Figure 8: The five climate zones used in this study

Figure 9: Mean number of days per year with thunderstorm activity from 1951-75

Figure 10: Lightning Flash Rate (# flashes/6 min) vs. Climate Zone

Figure 11: VIL (count) vs. Climate Zone

Figure 12: ET (kft) vs. Climate Zone

Figure 13: Mean Relative Humidity (%) vs. Climate Zone

Figure 14: Lightning Flash Rate (# flashes/6 min) vs. Time (UTC) for each climate zone

Figure 15: ET (kft) vs. Time (UTC) for each climate zone

Figure 16: Quantitative Relationships between Lightning Flash Rate (flashes/6 min) and VIL (count)

Figure 17: Quantitative Relationships between Lightning Flash Rate (flashes/6 min) and ET (kft)

Figure 18: Contingency table for binary scoring when forecasting

Figure 19: Histograms of POD, FAR and CSI scores

Figure 20: Proxy VIL and Binary Scoring

Figure 21: Statistical summary of proxy VIL performance for August 5th, 2008

Figure 22: Statistical summary for proxy VIL performance for August 24th, 2008

Figure 23: Proxy ET and Binary Scoring

Figure 24: Statistical summary of proxy ET performance for August 5th, 2008

Figure 25: Statistical summary for proxy ET performance for August 24th, 2008

Figure 26: Proxy VIL performance when the Zone 5 relationship is included and not

Figure 27: Proxy ET performance when the Zone 2 relationship is included and not

1 Introduction

1.1 Motivation and Goals

Finding someone in the United States today who has flown in an airplane and who has never experienced a weather-related delay is a tricky task. Many of the delays that are disruptive to the flying public and are also so costly to the airlines occur during the summer months when convective weather is frequent and the National Airspace System (NAS) is operating at or near capacity. In these active convective situations, FAA flight controllers need to make decisions on how to tactically (0-2 hours into the future) re-route aircraft to maintain safe and efficient operations. The air traffic controllers therefore require accurate, reliable short-term (0-2 hr) forecasts of convective weather.

To address these traffic management needs, the Weather Sensing Group at MIT's Lincoln Laboratories in Lexington, Massachusetts has developed the Corridor Integrated Weather System (CIWS) (Evans and Ducot, 2006). CIWS ingests data from various sources in order to produce 0-2 hour weather forecasts that help identify these potential problem areas. The sources of ingested data include weather radars (Next Generation Radar - NEXRAD, Terminal Doppler Weather Radar - TDWR, and Canadian radar), surface observations, lightning detection sensors (National Lightning Detection Network - NLDN), environmental satellites (Geostationary Operational Environmental Satellites - GOES) and numerical models (Rapid Update Cycle - RUC). The air traffic controllers consult the 0-2 hour forecasts to determine where planes should be rerouted in order to minimize delays. CIWS also provides a two hour loop of the prior locations of storms as depicted by the radar, and a two hour forecast of the future radar image. The past and current weather components of CIWS provide current situational weather awareness, and the forecast component provides insight into future weather impacts to traffic managers. Real-time products that are also available through the CIWS online interface include a vertically integrated liquid (VIL) mosaic and forecast, an echo tops (ET) mosaic and forecast, a winter precipitation phase mosaic and forecast, storm growth and decay trends, forecast accuracy scores, forecast contours, a satellite mosaic, and storm motion/storm extrapolated position.

All of the real-time products in the CIWS system are useful in different ways. However, the most important features for this work are the VIL and ET mosaics, the forecasts, and the lightning maps. VIL is a measure of how much liquid precipitation is in a vertical column of air from the ground to the top of the clouds, as detected by radar. Higher values of VIL provide an indication of strong updrafts and storm core locations. The convective storm cores often contain hazards to aviation such as turbulence and lightning, and are therefore regions for aircraft to avoid. Figure 1 is an example of a VIL mosaic from the CIWS online interface. The colors correspond to different levels of VIL ranging from 1 to 6, with the lower levels (levels 1 and 2 - greens) corresponding to smaller amounts of precipitation and the higher levels (levels 3 to 6 - yellow, orange and red) corresponding to higher amounts of precipitation. These levels are determined using amounts of VIL measured in units of kg/m². Lincoln Laboratories transforms the VIL in kg/m² to a count, which is created by linearly stretching the VIL level cut-offs measured in kg/m² to a scale from 0-255. The different ways in which VIL can be expressed are summarized below in Table 1. Air traffic management pays particular attention to areas where the VIL is level 3 or higher since these areas represent the storm cores.

Table 1: Summary of Ways to Express VIL.

Level          1           2          3         4          5           6
VIL (kg/m²)    0.14-0.76   0.76-3.5   3.5-6.9   6.9-12.0   12.0-32.0   32.0+
VIL (count)    13-72       73-132     133-158   159-180    181-218     219-255
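As a rough illustration of how the cut-offs in Table 1 can be applied, the sketch below looks up the VIL level from either a value in kg/m² or a count, and flags the level 3 or higher values that represent storm cores. The function names are hypothetical and chosen only for this example. Returning level 0 for values below the level 1 cut-off is an assumption, and the sketch does not reproduce the CIWS stretch from kg/m² to counts.

```python
# Cut-offs copied from Table 1:
# (level, VIL low, VIL high in kg/m^2, count low, count high)
VIL_LEVELS = [
    (1, 0.14, 0.76, 13, 72),
    (2, 0.76, 3.5, 73, 132),
    (3, 3.5, 6.9, 133, 158),
    (4, 6.9, 12.0, 159, 180),
    (5, 12.0, 32.0, 181, 218),
]
LEVEL6_MIN_VIL = 32.0    # level 6 is open-ended ("32.0+")
LEVEL6_MIN_COUNT = 219   # level 6 spans counts 219-255


def vil_level(vil_kg_m2):
    """Return the VIL level for a value in kg/m^2 (0 = below level 1, an assumption)."""
    if vil_kg_m2 >= LEVEL6_MIN_VIL:
        return 6
    for level, low, high, _, _ in VIL_LEVELS:
        if low <= vil_kg_m2 < high:
            return level
    return 0


def vil_count_to_level(count):
    """Return the VIL level for a 0-255 count (0 = below level 1, an assumption)."""
    if count >= LEVEL6_MIN_COUNT:
        return 6
    for level, _, _, count_low, count_high in VIL_LEVELS:
        if count_low <= count <= count_high:
            return level
    return 0


def is_storm_core(vil_kg_m2):
    """Level 3 or higher marks the storm cores that traffic managers watch."""
    return vil_level(vil_kg_m2) >= 3
```

For example, `vil_level(5.0)` and `vil_count_to_level(140)` both fall in level 3, so `is_storm_core(5.0)` is true.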

Figure 1: Example of VIL mosaic from April 2nd, 2009 at 22 UTC. VIL is displayed according to levels on a scale from 1 to 6. Areas that are yellow, orange or red correspond to VIL levels in the 3-6 range. VIL in these levels represent storm cores and are dangerous to aircraft and should usually be avoided.


The second important CIWS forecast product for this work, echo tops, is a measure of the height of the storm, or, more specifically, the height of the 18 dBZ radar reflectivity. The vertical extent of a storm is a very important quantity for aviation purposes because a plane can fly through clouds as long as the plane flies over the storm core. Therefore, a storm with an ET height of about 30,000 ft does not create a significant problem for commercial aircraft that routinely fly above 30,000 ft. ET heights above 30,000 ft, however, can be hazardous to aircraft, and air traffic controllers and pilots will re-route the aircraft around these cores if possible. Figure 2 gives an example of the ET mosaic with a lightning map overlaid on top of it from the CIWS online interface. The colors on this mosaic correspond to values of kilofeet for ET and are outlined below in Table 2. The lightning map overlaid on the ET mosaic gives positions of lightning flashes that have been detected in the past 6 minutes and are marked by the white plus symbols (+). Note that most of the lightning activity is located near the storm cores (Carte and Kidder, 1977), with far less cloud-to-ground lightning in the stratiform regions of the anvil.

Table 2: ET Colorbar from the CIWS System.

ET (kft) 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50 50+

Figure 2: Example of ET mosaic from April 2nd, 2009 at 22 UTC. ET above 30,000 ft are indicated by the yellows, oranges and reds and should in most circumstances be avoided by aircraft. The white plus symbols (+) contained in the red circle correspond to lightning strikes that have been recorded in the last 6 minutes. Note that most of the lightning activity is located near the storm cores, with little cloud-to-ground lightning in the stratiform anvil regions.

The CIWS system relies heavily upon high quality radar data to produce its forecasts of VIL and ET. However, there are parts of the CONUS where the radar coverage is unreliable, such as in the Mountain West due to beam blockage by the mountains, and in offshore regions due to distance from the land-based radars, particularly in the southeastern part of the United States and the Gulf of Mexico. In addition, radar data can sometimes be temporarily unavailable to the CIWS system due to communication problems. In these situations of degraded or nonexistent coverage, the situational awareness and forecast aspects of CIWS are severely limited and the usefulness of CIWS to traffic managers drops sharply.


In the regions of degraded radar, the lightning data coverage can be much better than the radar coverage. Figure 3 shows the coverage of both the National Lightning Detection Network (Cummins et al., 1998b) and of the radars across the CONUS on a typical day. By comparing the two coverages, it can be seen that certain regions of poor and unreliable radar coverage would benefit from the use of the lightning data to estimate VIL and ET. Figure 4 shows several examples of actual situations where there is degraded or no radar coverage, but there is still lightning data, and therefore the possibility exists to extract useful storm information from the lightning data. The purpose of this research is to use cloud-to-ground lightning data provided by the NLDN to improve nowcasts of convection by depicting VIL and ET in situations where the radar data are degraded or unavailable.


Figure 3: Radar and NLDN Coverage. (a) Radar coverage on a typical day over the CONUS. The gray regions are areas of good coverage, the yellow regions indicate areas where the coverage is degraded, and the blue regions are areas where there is no coverage at all. Notice how there are many regions in the Mountain West that have degraded radar due to beam blocking by the mountains, and how the coverage degrades rapidly in the off-shore regions. (b) NLDN flash detection efficiency measured in percent of lightning recorded. Parts of the Mountain West measure at least 90 percent of the lightning strikes in the same regions where there is degraded radar coverage, providing very useful information for depicting ET and VIL (from Cummins et al., 1998a).

Figure 4: Situations where proxy VIL could be beneficial. CIWS display showing examples of a case from the Mountain West (a), off the coast of Florida (b) and off the coast of the Carolinas (c) where there is cloud-to-ground lightning data in areas of degraded (dark gray) or no (black) radar coverage. White points represent lightning strikes in the prior 6 minutes and greens, yellows and reds indicate varying levels of VIL. Dates and times are noted.

1.2 Previous Work

The idea of using cloud-to-ground lightning to depict VIL and ET is not new. Weber et al. (1998) performed a linear regression of NLDN data with radar VIL for three convective cases and noted the possibility of using NLDN data to fill in gaps in the NEXRAD coverage. Mueller et al. (1999) created empirical relationships between NLDN lightning and radar quantities from visual comparison of the two data sets. Their purpose for these relationships was to improve the radar data latency in the National Convective Weather Forecast (NCWF). Megenhart et al. (2004) mapped lightning data to a 4-km grid to create relationships between lightning strikes and VIL, which serve as an input to a hazard detection field called the National Convective Weather Detection (NCWD).

Weygandt et al. (2006) converted lightning data to radar reflectivity for use in the RUC model assimilation scheme to improve the model initialization of clouds, hydrometeors and convection. Iskenderian (2007) developed relationships between cloud-to-ground lightning and ET and VIL using data provided by the NLDN, but only for the Northeast Corridor of the United States, because the work was based upon data from the CIWS system, whose domain prior to June 2008 only included that Northeast portion of the country. Figure 5 is an example from this work which shows that the lightning-VIL relationships captured operationally significant features of the storm, as shown by the jet routes and corresponding radar image.


Figure 5: Lightning Strikes and Flash Rates. (a) Lightning strikes for a six minute period on 27 July 2007, ending at 2245 UTC. (b) Smoothed lightning flash rate (flashes/6 minutes) for same time period on 27 July 2007. (c) Radar VIL at 2259 UTC. White and blue lines are arrivals and departures, respectively, in the area in the last 30 minutes. Notice how the cores in the squall line appear in the flash rate and depict areas where the aircraft did not fly, as well as the gap in the line, through which planes were routed (Iskenderian 2007).


In June 2008, the CIWS domain was expanded to the CONUS, but the relationships that hold in the Northeast Corridor may not apply to the entire CONUS region, due to differing climates. The regions of the CONUS that are particularly different include the Mountain West and the sub-tropical southeastern Gulf, which, in addition to having different climates from the Northeast, also have significant areas of degraded or no radar coverage. In order to determine if there are any differences between the VIL and ET relationships derived from cloud-to-ground lightning across the CONUS, preliminary statistical work will be done by dividing the country into "climate zones", or regions of the country that could have potentially different relationships because of their climates (see Section 2). Then, once a sense of which climate zones might have different relationships is obtained, the actual quantitative relationships for each climate zone will be developed using a probability matching method; subsequent testing will determine which, if any, climate zone relationships are different from the CONUS-wide relationships (see Section 3). Once these relationships are finalized, proxy VIL and ET fields can be derived from lightning data for a given day, and these proxy fields can be compared with the actual VIL and ET using a binary scoring system to visually and quantitatively see how accurate the proxy fields are (see Section 4). Finally, conclusions and future potential work on this project are discussed in Section 5.

2 Climate Zone Analysis

2.1 Sources of Data

The various sources of data used for the climate zone analysis include time and locations of cloud-to-ground lightning strikes, VIL, ET, convective available potential energy (CAPE), cloud top potential and relative humidity. Information on lightning strike locations, polarity and strength is provided by the NLDN which is provided to MIT Lincoln Laboratory by Vaisala (http://www.vaisala.com/). The radar data, which includes both the VIL and ET, are acquired from NEXRAD, while both the CAPE and the cloud top potential are obtained from the RUC (Benjamin et al., 2004) and the Space-Time Mesoscale Analysis System (STMAS) (Xie et al., 2005). All of this information is stored in the Lincoln Laboratory CIWS archives, which provide a central data resource and allow for convenient access to it.

2.2 Organizing the Data

Before beginning statistical analysis to illustrate the relationships of the lightning data, the environment, and geographical features to VIL and ET, the data had to be organized in a manner that allows easy analysis. Eleven case days from the summer (June, July and August) of 2008 were selected for this analysis. A list of these days is shown below in Table 3. These days were selected because they had a lot of lightning activity that was spread out over many different regions of the country, which provided enough data to derive statistically meaningful


Table 3: List of Eleven Case Days Used in this Study.

June 15th, 2008     August 7th, 2008
July 6th, 2008      August 8th, 2008
July 23rd, 2008     August 10th, 2008
July 27th, 2008     August 11th, 2008
July 28th, 2008     August 15th, 2008
July 31st, 2008

In order to prepare the data for analysis, three steps had to be taken that are outlined in Figure 6. First, for each case day considered, archive structures of the radar and stability data (CAPE, cloud top potential and relative humidity) were constructed. An archive structure groups together data files for a given day extracted from the Lincoln Laboratory archives, and allows for easier data access and processing. Only points less than 230 km from the radar were considered for analysis and included in the archive structures. These points are within the reach of the 'near range' radar, which contains radar data of good quality. Beyond this range, the radar data quality degrades. All of these radar and stability quantities were gathered only in regions extending 9 km outward from the lightning strikes.
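The two spatial filters described above can be sketched as follows. This is only an illustration, not the processing used for the thesis: the function names are hypothetical, and positions are assumed to be planar (x, y) coordinates in kilometers so that a simple Euclidean distance applies.

```python
import math

MAX_RADAR_RANGE_KM = 230.0  # "near range" limit: points must be closer than this
STRIKE_RADIUS_KM = 9.0      # data are gathered only this close to a strike


def distance_km(a, b):
    """Euclidean distance between two (x_km, y_km) points (planar assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def select_points(points, radar, strikes):
    """Keep grid points that are within radar range and near at least one strike."""
    kept = []
    for point in points:
        in_range = distance_km(point, radar) < MAX_RADAR_RANGE_KM
        near_strike = any(
            distance_km(point, strike) <= STRIKE_RADIUS_KM for strike in strikes
        )
        if in_range and near_strike:
            kept.append(point)
    return kept
```

With a radar at (0, 0) and strikes at (100, 0) and (232, 0), the point (105, 0) is kept, (120, 0) is dropped because no strike is within 9 km, and (235, 0) is dropped because it lies beyond 230 km from the radar.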

After making the archive structures, the lightning data for each case day were placed on the same 1 km-by-1 km horizontal grid as the other quantities (VIL, ET, CAPE, cloud top potential, relative humidity) for easy comparison. Finally, a database was constructed which contained all of the extracted data for each case day in one structure, which made data analysis more manageable. In this database, information was included such as lightning density, lightning strength, VIL, ET, CAPE, cloud top potential, mean relative humidity, time of day, latitude and longitude, terrain and whether or not the lightning strike occurred over land or water. In addition to the lightning densities and strengths for all of the lightning strikes, the structure also contained the densities and strengths of just the negatively charged strikes, as well as the data for just the positively charged strikes.
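A minimal sketch of this gridding step is given below, assuming each strike is supplied as planar coordinates in kilometers plus a signed peak current in kA whose sign gives the polarity. The input layout and the function name are hypothetical; the thesis processing itself was done in MATLAB and is not reproduced here.

```python
import math
from collections import defaultdict


def grid_strikes(strikes, cell_km=1.0):
    """Bin strikes onto a regular grid.

    strikes: iterable of (x_km, y_km, current_ka); negative current = negative strike.
    Returns {"all" | "negative" | "positive": {cell: (density, mean_strength_ka)}},
    where density is the number of strikes in the cell and strength is the mean
    absolute peak current.
    """
    names = ("all", "negative", "positive")
    sums = {name: defaultdict(lambda: [0, 0.0]) for name in names}
    for x_km, y_km, current_ka in strikes:
        cell = (math.floor(x_km / cell_km), math.floor(y_km / cell_km))
        polarity = "negative" if current_ka < 0 else "positive"
        # Every strike counts toward "all" and toward its own polarity.
        for name in ("all", polarity):
            entry = sums[name][cell]
            entry[0] += 1
            entry[1] += abs(current_ka)
    return {
        name: {cell: (count, total / count) for cell, (count, total) in cells.items()}
        for name, cells in sums.items()
    }
```

Keeping the negative and positive strikes in separate dictionaries mirrors the separate storage of densities and strengths by polarity described above.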

Figure 6: Flow chart of data organization process. These steps are performed in order to produce a data structure that contains all of the necessary information and is easy to use and manipulate for statistical analysis. The flow chart contains the following steps:

Create Archive Structure (Easy Access):
-File locations in Lincoln Laboratory archives (Radar, Lightning, RUC, Stability)
-File times
-Data product names (VIL, ET, CAPE, etc.)

Process Lightning Data:
-Average in time (6 min)
-Sum # of flashes at pixel (Density)
-Take mean strength at pixel
-Place on 1 km-by-1 km grid
-Positive and negative density and strength stored separately

Assemble Data Structure:
-Combine lightning, radar, RUC with latitude, longitude, terrain, water/land
-Smooth lightning in kernel to yield flash rates
-Sub sampling of points

In addition to calculating the lightning densities and strengths in each 1 km-by-1 km pixel, the same quantities were also calculated by applying a circular kernel of a diameter equal to 17 km at each point in the domain. The 17 km diameter was chosen to approximate the scale of a convective core within a storm. Therefore, these smoothed lightning fields represent lightning characteristics on the scale of convective cores. To calculate this smoothed field, the pixel values of lightning density were summed inside the 17 km kernel, and the resulting flash rate was assigned to the center point of the kernel. In the case of the lightning strength, the maximum inside the circular kernel was assigned to the center point. Figure 7 provides an example of how the smoothed lightning density and smoothed lightning strength fields are constructed. All of the above processing was done using MATLAB.

Figure 7: Methodology used to calculate smoothed lightning flash rate or strength. (a) A 1 km² lightning density field in the process of being smoothed. Color scheme is as follows: Blue - 1 flash/6 min/km², Green - 2 flashes/6 min/km², Yellow - 3 flashes/6 min/km², Orange - 4 flashes/6 min/km², Red - 5 flashes/6 min/km². Note the 17 km circular kernel surrounding the four lightning density data points. The smoothed lightning flash rate for the kernel at that point (X - center of the circle) is the sum of the lightning densities within the kernel. Therefore, a blue (1 flash/6 min), a green (2 flashes/6 min), an orange (4 flashes/6 min) and a red (5 flashes/6 min) data point produces a smoothed lightning flash rate of 12 flashes/6 min. (b) A lightning strength field in the process of being smoothed. The same color scheme applies as before, except units are now in kA for mean strength at a pixel. The smoothed lightning strength is calculated by taking the maximum strength within the kernel, so in this example the smoothed result at that point (X - center of circle) would be 5 kA.
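The kernel calculation can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the MATLAB code used for the thesis: the field is assumed to be a dictionary mapping 1 km grid cells to values, the function names are hypothetical, a pixel counts as inside the kernel when its center is within 8.5 km of the kernel center, and an empty kernel is given a strength of 0.

```python
import math

KERNEL_DIAMETER_KM = 17.0  # approximate scale of a convective core


def values_in_kernel(field, center, diameter_km=KERNEL_DIAMETER_KM):
    """Return the values of all pixels whose centers fall inside the circular kernel."""
    radius = diameter_km / 2.0
    cx, cy = center
    return [
        value
        for (x, y), value in field.items()
        if math.hypot(x - cx, y - cy) <= radius
    ]


def smoothed_flash_rate(density, center):
    """Sum of the lightning densities inside the kernel (flashes/6 min)."""
    return sum(values_in_kernel(density, center))


def smoothed_strength(strength, center):
    """Maximum pixel strength inside the kernel (kA); 0.0 if the kernel is empty."""
    values = values_in_kernel(strength, center)
    return max(values) if values else 0.0


# Four pixels inside the kernel, as in the Figure 7 example, plus one outside it.
field = {(2, 3): 1, (-4, 1): 2, (5, -5): 4, (-3, -6): 5, (12, 0): 3}
print(smoothed_flash_rate(field, (0, 0)))  # 1 + 2 + 4 + 5 = 12
print(smoothed_strength(field, (0, 0)))    # 5
```

The pixel at (12, 0) lies outside the 8.5 km radius and is ignored, so the results match the 12 flashes/6 min and 5 kA quoted in the Figure 7 caption.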


2.3 Climate Zone Creation

The next step in the data analysis was to divide the country into 'climate zones'. These climate zones were chosen in order to search for differences in both the lightning and radar characteristics in each zone that would provide insight into which zones have lightning vs. VIL and lightning vs. ET relationships that could potentially differ from a CONUS-wide set of relationships. The five climate zones that were considered were the Mid-Latitude Land East, the Mid-Latitude Water, the Sub-Tropical Land, the Sub-Tropical Water and the Mid-Latitude Land West. Figure 8 shows these divisions.

Figure 8: The five climate zones used in this study. M-L W is part of mid-latitude water and S-T W is part of sub-tropical water. These climate zones were created in order to search for differences in lightning characteristics across the country. Zones included regions of the country that typically have different weather patterns, such as the moist sub-tropics and the drier

The division between the Mid-Latitude Land East and West zones was chosen to be 102°W. Besides separating the mountainous and arid western half of the country from the flatter and moister eastern half of the country, this division is also aligned with a relative minimum in the maximum number of days per year with thunderstorm activity, as shown in Figure 9. The climate zones are also marked on Figure 9 and the numbering is explained in Table 4. The 102°W line separates a relative maximum in thunderstorm days in the western region of the country of 40 days per year, from a relative maximum in the central region of the country, which is more likely to experience thunderstorm activity of, on average, a maximum of at least 50 days per year. The more thunderstorm activity there is in a region, the more lightning activity there usually is as well, which could lead to different lightning-VIL and lightning-ET relationships in these regions.

The division between the Mid-Latitude Land East and the Sub-Tropical Land zone was chosen to be 32°N. South of this line is the area of the country that usually has more moisture than north of this line, and therefore increased thunderstorm activity and generally more lightning. This can also be seen in the plot of the number of days per year with thunderstorm activity (Figure 9) as south of 32°N the number of days with thunderstorm activity is 60-70 days per year on average, while north of this line, thunderstorm activity occurs on less than 60 days per year, on average. Because of this increase in thunderstorm activity and therefore lightning south of 32°N, another division in zones was made.


Table 4: Climate Zones and their Numbering Used in this Study.

Number Zone

1 Mid-Latitude Land East

2 Mid-Latitude Water

3 Sub-Tropical Land

4 Sub-Tropical Water

5 Mid-Latitude Land West

Figure 9: Mean number of days per year with thunderstorm activity from 1951-75. The red lines indicate 102°W and 32°N. Numbers indicate various climate zones outlined in Table 4. Notice how the climate zones were created to divide regions of the country with varying amounts of thunderstorm activity, and therefore potentially different lightning behavior (Colman 1990).

Besides dividing the country based on lightning activity over land, separate climate zones were also created for lightning data over the water. Lightning activity is usually much weaker in strength and less frequent over water compared to land (Williams and Stanfill, 2002). In addition, a Sub-Tropical Water zone and a Mid-Latitude Water zone were created because of the differences in thunderstorm activity over the land in each of these regions, as explained above. It is likely that because the thunderstorm activity is different over land, there will be differences in lightning behavior over the water in these two regions as well. These two water zones were also created because of the differences in sea surface temperatures in the two regions, which could also contribute to differences in lightning activity.
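The zone divisions can be expressed as a small lookup, sketched below with the numbering of Table 4. Only the 102°W and 32°N land boundaries come from the text. The sketch additionally assumes that the two water zones are also split at 32°N, that all land south of 32°N is treated as Sub-Tropical Land regardless of longitude, and that west longitudes are negative; the function name is hypothetical.

```python
ZONE_NAMES = {
    1: "Mid-Latitude Land East",
    2: "Mid-Latitude Water",
    3: "Sub-Tropical Land",
    4: "Sub-Tropical Water",
    5: "Mid-Latitude Land West",
}

EAST_WEST_DIVIDE_LON = -102.0  # 102 degrees W, written as a negative longitude
SUBTROPICAL_DIVIDE_LAT = 32.0  # 32 degrees N


def climate_zone(lat_deg, lon_deg, over_water):
    """Return the Table 4 zone number for a point (illustrative assumptions only)."""
    subtropical = lat_deg < SUBTROPICAL_DIVIDE_LAT
    if over_water:
        # Assumption: water zones share the 32 N boundary used over land.
        return 4 if subtropical else 2
    if subtropical:
        # Assumption: land south of 32 N is sub-tropical at any longitude.
        return 3
    return 5 if lon_deg < EAST_WEST_DIVIDE_LON else 1
```

For instance, a land point at 40°N, 110°W maps to zone 5 (Mid-Latitude Land West), while a water point at 27°N, 90°W maps to zone 4 (Sub-Tropical Water).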

2.4 Initial Statistical Results and Analysis

In order to begin the statistical analysis of the lightning data for the various climate zones, four (highlighted in yellow in Table 5) out of the eleven summer case days were selected, in order to explore potential signals in the data, and refine the data processing technique. These days were chosen because they contained a lot of lightning in both the eastern and western halves of the country, as well as lightning over the land and water in both the mid-latitude and sub-tropical regions. Therefore, in just these four days, there were from 500 to 10,000 lightning strikes in each of the five climate zones, enough to perform the initial statistical analysis.


Table 5: List of Four Case Days Used for Initial Analysis (highlighted in yellow).

June 15th, 2008    August 7th, 2008
July 6th, 2008     August 8th, 2008
July 23rd, 2008    August 10th, 2008
July 27th, 2008    August 1 m, 2008
July 28th, 2008    August 15th, 2008
July 31st, 2008

Once these four case days were chosen and the databases were set up, some basic statistical analysis was performed on the data. For each of the fields contained in the databases, the mean, standard deviation, maximum value and minimum value were calculated for each climate zone. In addition to categorizing these fields by climate zone, statistics were also produced as a function of the time of day at which the lightning strike occurred and as a function of terrain, expressed as the height above sea level of each point where a lightning strike occurred.

Upon looking more closely at the compiled statistics, it was noticed that the four basic statistical quantities of mean, standard deviation, maximum and minimum were not capturing the structure and distribution of the data to the extent needed. For example, the mean flash rate in each climate zone was severely affected by the large number of flash rates equal to 1 flash/6 min in the dataset; this made it more difficult to define the higher end of the distribution, which contains strong storms and is therefore of operational importance. Developing relationships between these stronger values of lightning density and VIL and ET is important when deriving a VIL and ET proxy from the lightning, because most users will want to know where the stronger storms are located. In addition, the maximum and minimum values for each field were almost always the same for each zone, which was to be expected and did not provide much new information. For example, the lightning strength statistics in each zone always had a minimum close to zero, and the VIL statistics in each zone always had a maximum near 255, which is the maximum value, or the heaviest rain, on the VIL scale. Therefore, the simple statistical analysis of the lightning data from the four chosen case days failed to capture what the distribution of the data really looked like, as the 'tails' of the distributions were not represented in the statistics. In order to understand the data more completely, another statistical approach had to be taken.


2.5 Quantile Analyses of Data for Each Climate Zone

In order to get a better sense of the distributions, quantiles for each field and climate zone were calculated using all eleven case days. The quantiles that were used were the 5th percentile, the 25th percentile, the median or 50th percentile, the 75th percentile and the 95th percentile. These quantiles capture the shapes of the distributions of the different fields more accurately than the basic statistics. A series of bar graphs was generated in order to visualize the data. Figure 10 is an example of such a bar graph, with six vertical bars along the x-axis, one for statistics from all the data across all zones and one for statistics from each climate zone, and the lightning flash rate in the 17 km diameter kernel on the y-axis. The pink portion of each bar displays the lightning flash rate values up to the 25th percentile, the orange portion shows the lightning flash rates between the 25th percentile and the median, and the yellow portion shows the lightning flash rates between the median and the 75th percentile. Besides lightning flash rate, bar graphs like this were created to show information in each climate zone about the lightning strength, VIL, ET, CAPE, cloud top potential and mean relative humidity. The resulting bar graph of VIL vs. Climate Zone is Figure 11, the bar graph of ET vs. Climate Zone is Figure 12 and the bar graph of Mean Relative Humidity vs. Climate Zone is Figure 13.
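The per-zone quantile calculation described above could be sketched as follows. This is a minimal illustration, not the processing actually used in the study: the function name and the layout of the inputs (one array of field values and one array of zone numbers, aligned point by point) are assumptions, and NumPy's default linear interpolation between data points is used for the percentiles.

```python
import numpy as np

# The five quantiles named in the text.
PERCENTILES = (5, 25, 50, 75, 95)

def zone_quantiles(values, zones):
    """Compute the five percentiles for all data and for each climate zone.

    values -- field values (e.g. flash rate), one per lightning data point
    zones  -- climate zone number (1-5) of each data point, same length
    """
    values = np.asarray(values, dtype=float)
    zones = np.asarray(zones)
    # Baseline statistics over every zone combined.
    result = {"All Zones": np.percentile(values, PERCENTILES)}
    # Same percentiles restricted to the points of each individual zone.
    for zone in np.unique(zones):
        result[int(zone)] = np.percentile(values[zones == zone], PERCENTILES)
    return result
```

Each entry of the returned dictionary holds the 5th, 25th, 50th, 75th and 95th percentiles in that order, which is the information summarized by one bar in the figures.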


[Figure 10 image: bar graph of lightning flash rate (y-axis) by Climate Zone (x-axis: All Zones, ML-LandE, ML-Water, ST-Land, ST-Water, ML-LandW).]

Figure 10: Lightning Flash Rate (# flashes/6 min) vs. Climate Zone. The pink bar shows the 0-25th percentile, the orange bar shows the 25th-50th percentile and the yellow bar shows the 50th-75th percentile. Note that the distributions for each zone are very similar, with all five zones having a 25th percentile equal to 1 flash/6 min and a median equal to 2 flashes/6 min. The only differences appear when looking at the 75th percentiles, as the Mid-Latitude Water zone has a flash rate slightly higher (5 flashes/6 min) than the other zones and the Mid-Latitude Land West zone has a flash rate slightly lower (3 flashes/6 min) than the other zones.

In Figure 10, notice how the distributions of lightning flash rate in each climate zone are nearly the same, as they all have the same 25th percentile and median. Although the Mid-Latitude Water zone has the highest 75th percentile at 5 flashes/6 min, this may be due to a single very strong system from one of the eleven case days that retained many of its characteristics once it moved offshore. A system does not usually persist as long as this one did, and that storm may be slightly skewing the results. Accounting for this, the lightning flash rates across each of


Figure 11: VIL (count) vs. Climate Zone. Bar colors as in Figure 10. Looking from left to right, the VIL counts slowly but steadily decrease in the 25th percentiles, the medians and the 75th percentiles. The highest VIL values occur in the Mid-Latitude Land East and Mid-Latitude Water zones, while the lowest VIL values occur in the Mid-Latitude Land West zone.

A few interesting facts about the VIL distributions for each climate zone can be seen in Figure 11. The VIL values are highest in the Mid-Latitude Land East zone and decrease as one looks from the Mid-Latitude Land East zone to the Mid-Latitude Land West zone. Therefore, the Mid-Latitude Land East and Mid-Latitude Water zones have the highest VIL values, the Sub-Tropical Land and Sub-Tropical Water zones are next, while the Mid-Latitude Land West zone generally has the lowest values of VIL. This indicates that the VIL values in the eastern half of


[Figure 12 image: bar graph of ET (y-axis) by Climate Zone (x-axis: All Zones, ML-LandE, ML-Water, ST-Land, ST-Water, ML-LandW).]

Figure 12: ET (kft) vs. Climate Zone. Bar colors as in Figure 10. The Mid-Latitude Land East, Sub-Tropical Land, Sub-Tropical Water and Mid-Latitude Land West zones have very similar values for each quantile. The Mid-Latitude Water zone, however, has noticeably lower 50th and 75th percentiles.

When the quantile distributions for ET in Figure 12 are compared between the zones, similar values for the 25th, 50th and 75th percentiles are noted for the Mid-Latitude Land East, Sub-Tropical Land and Sub-Tropical Water zones. The most noticeable difference is seen in the Mid-Latitude Water results, as the 50th and 75th percentiles of ET are lower than in any other zone. In addition, the Mid-Latitude Land West zone has the next lowest 75th percentile of all the zones, so from this analysis the Mid-Latitude Water zone and the Mid-Latitude Land West zone are the two zones most likely to have relationships between cloud-to-ground lightning and ET that deviate from the baseline (All Zones) relationship.


[Figure 13 image: bar graph of mean relative humidity (y-axis) by Climate Zone (x-axis: All Zones, ML-LandE, ML-Water, ST-Land, ST-Water, ML-LandW).]

Figure 13: Mean Relative Humidity (%) vs. Climate Zone. Bar colors as in Figure 10. The Mid-Latitude Land East and Sub-Tropical Land zones have the highest mean relative humidities, followed by the Mid-Latitude Water and Sub-Tropical Water zones. The Mid-Latitude Land West zone has by far the lowest mean relative humidities, which could be a reason for the lower values of VIL in this zone.

Figure 13 provides the 25th, 50th and 75th percentiles of mean relative humidity in the 900mb-700mb layer for each climate zone. The Mid-Latitude Land East and the Sub-Tropical Land zones have the highest values of mean relative humidity, while the Mid-Latitude Water and the Sub-Tropical Water zones have slightly lower values at the same percentiles. The most significant difference in mean relative humidity is seen in the percentiles for the Mid-Latitude Land West zone, where the 25th, 50th and 75th percentiles of mean relative humidity are lower than in any other zone. These lower values of mean relative humidity could be a reason why the values of VIL in this zone are also the lowest of any


2.6 Quantile Analyses of Data for Climate Zones and Time of Day

To explore a possible diurnal signal in the data, the seven fields of lightning flash rate, lightning strength, VIL, ET, CAPE, cloud top potential and mean relative humidity were analyzed by time of day and climate zone. For each of the seven fields, the same five quantiles as before (the 5th percentile, the 25th percentile, the median or 50th percentile, the 75th percentile and the 95th percentile) were calculated for each climate zone and for each of four 6-hour blocks of the day. In other words, the data for each climate zone was further broken down into four categories, depending on the hour of the day in which the lightning strike occurred. The four time blocks into which the data was divided were 01Z-07Z (evening), 07Z-13Z (overnight), 13Z-19Z (morning) and 19Z-01Z (afternoon). All of these times are in Coordinated Universal Time (UTC). Once again, bar graphs were created to visualize the data. Figure 14 shows the lightning flash rate values as a function of climate zone and time, and Figure 15 shows the ET values as a function of climate zone and time. Each figure contains six bar graphs, one for all of the climate zones together and one for each climate zone, with four bars representing the 25th, 50th and 75th percentiles for each time block. The color scheme is the same as before, with the pink portion representing the data up to the 25th percentile, the orange portion showing the data between the 25th and 50th percentiles and the yellow portion showing the data between the 50th and 75th percentiles. The times indicated on the x-axis are the start times of each time block, so the bar labeled 01Z is for the block 01Z-07Z, etc.
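The assignment of a lightning strike to one of the four time blocks can be sketched as below. The function name is hypothetical, and the sketch assumes that each block includes its start hour and excludes its end hour (so a strike at exactly 07Z falls in the 07Z-13Z block); the text does not state how boundary times were handled.

```python
# Start-time labels of the four 6-hour blocks, in the order used on the x-axis.
BLOCK_LABELS = ("01Z", "07Z", "13Z", "19Z")

def time_block(hour_utc):
    """Return the start label of the 6-hour block containing hour_utc (0-23)."""
    if not 0 <= hour_utc < 24:
        raise ValueError("hour_utc must be in the range 0-23")
    # Shift by one hour so that 01Z maps to 0; the modulo wraps 00Z into
    # the 19Z-01Z block, which spans midnight UTC.
    return BLOCK_LABELS[((hour_utc - 1) % 24) // 6]
```

With this convention, `time_block(0)` and `time_block(23)` both return `"19Z"`, since the afternoon block runs across 00Z.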


[Figure 14 image: six panels (All Zones, Mid-Latitude Land East, Mid-Latitude Water, Sub-Tropical Land, Sub-Tropical Water, Mid-Latitude Land West), each showing lightning flash rate against Time (UTC) for the 01Z, 07Z, 13Z and 19Z blocks.]

Figure 14: Lightning Flash Rate (# flashes/6 min) vs. Time (UTC) for each climate zone. The times indicated on the x-axis are the start times of each time block, so the bar labeled 01Z is for the block 01Z-07Z, etc. Bar colors as in Figure 10. Notice how the Mid-Latitude Water zone, the Sub-Tropical Land zone and the Sub-Tropical Water zone have their highest lightning flash rates for the 01Z-07Z block. The lightning flash rates in the Mid-Latitude Land West zone appear to be slightly lower than in the rest of the zones, especially overnight and into the morning (07Z-19Z).

Figure 14 shows how the different climate zones reach their peak lightning flash rates in different time blocks of the day. The Mid-Latitude Water zone, the Sub-Tropical Land zone and the Sub-Tropical Water zone all reach their highest flash rates in the 01Z-07Z block. However, the Mid-Latitude Land East zone reaches its peak lightning flash rate in the 19Z-01Z time block. Based on these bar graphs, there is an indication that both time of day and climate zone can affect the lightning flash rates that are recorded, as different zones have different flash rate distributions over time.

[Figure 15 image: six panels (All Zones, Mid-Latitude Land East, Mid-Latitude Water, Sub-Tropical Land, Sub-Tropical Water, Mid-Latitude Land West), each showing ET against Time (UTC) for the 01Z, 07Z, 13Z and 19Z blocks.]

Figure 15: ET (kft) vs. Time (UTC) for each climate zone. The times indicated on the x-axis are the start times of each time block, so the bar labeled 01Z is for the block 01Z-07Z, etc. Bar colors as in Figure 10. Both the Mid-Latitude Land East zone and the Mid-Latitude Land West zone have a minimum in their ET percentiles for the 13Z-19Z (morning) time block, and maximums in the ET percentiles from 19Z-07Z (afternoon into evening). Also, the Sub-Tropical Water zone shows a slight increase in ET during the 07Z-13Z and 13Z-19Z time blocks, while the Sub-Tropical Land zone shows a slight decrease in these same time blocks.


Figure 15 displays the effects of time of day on ET for each climate zone. The biggest impacts are in the Sub-Tropical Water zone, where an apparent increase in ET values can be seen in the 07Z-13Z and 13Z-19Z bars, and in the Sub-Tropical Land zone, where a definite decrease in ET values can be seen in the 07Z-13Z and 13Z-19Z bars. Also, both of the Mid-Latitude Land zones have minimums in the ET percentiles for the 13Z-19Z (morning) time block and maximums in the ET percentiles for the 19Z-01Z (afternoon) and 01Z-07Z (evening) time blocks.

This quantile analysis indicated that there may be differences in the lightning-VIL and lightning-ET relationships for the various climate zones. The lower values of VIL and mean relative humidity in the Mid-Latitude Land West zone are significant enough that a more quantitative difference in the lightning-VIL relationship should be pursued. The lower values of ET for the Mid-Latitude Water and Mid-Latitude Land West zones also provide some indication that there may be quantitative differences in the lightning-ET relationships for these two zones. Based on this evidence that there could be significant differences in the lightning-VIL and lightning-ET relationships for different climate zones, the proxy relationships should be developed by maintaining these zones.


3 Development of the Proxy Relationships

3.1 Explanation of the Probability Matching Method

While the climate zone analysis provided some insight into how lightning relates to both VIL and ET in each climate zone, the application of lightning as a proxy for VIL and ET requires quantitative relationships. In order to develop these quantitative relationships from the data and identify differences in these relationships among the climate zones, the probability matching method (PMM) (Rosenfeld et al., 1993) was used. Inspection of scatter plots revealed non-linear relationships between lightning and VIL and between lightning and ET, and the PMM does not assume a linear relationship. The goal in using the PMM was to construct the relationships between cloud-to-ground lightning and VIL and between cloud-to-ground lightning and ET by using pairs of lightning and VIL (or ET) values, L_i and VIL_i (or ET_i), and matching them to the cumulative distribution functions (CDFs) of lightning and VIL (or ET) at the i-th probability. This concept is shown more explicitly in Equations 1 and 2:

\int_{VIL_T}^{VIL_i} P(VIL) \, dVIL = \int_{L_T}^{L_i} P(L) \, dL, and (1)

\int_{ET_T}^{ET_i} P(ET) \, dET = \int_{L_T}^{L_i} P(L) \, dL. (2)

In Equations 1 and 2, P() is a probability density function, and VIL_T and ET_T are the low threshold values for VIL and ET. L_T is identified as the lowest flash rate that can be detected, which is simply 1 flash/6 min. To find the matching low threshold value VIL_T, the CDFs of lightning and VIL for the data for all of the zones were examined. For example, for the All Zones data, L_T = 1 flash/6 min represents the 46th percentile in the lightning CDF. Therefore, the low threshold value of VIL, VIL_T, is the value of VIL corresponding to the 46th percentile of the VIL CDF. In this data set, this value is VIL_T = 132. This low threshold matching is performed for each zone to recalibrate the relationships by removing the data below the threshold values of lightning and VIL. Equations 1 and 2 are then applied to match the CDFs of lightning flash rate and VIL at like probabilities to yield the final quantitative relationships between lightning and VIL. The same process is repeated for the ET data to yield the final quantitative relationships between lightning and ET.
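The two steps described above, finding the matching low threshold and then matching the CDFs at like probabilities, can be illustrated with empirical CDFs. This is a hedged sketch rather than the procedure actually implemented for the thesis: the function names are hypothetical, the CDFs are estimated directly from the samples, and NumPy's default linear interpolation is assumed for the VIL quantiles. The removal of data below the thresholds is left to the caller, since the text does not specify exactly how values equal to the threshold were treated.

```python
import numpy as np

def matching_threshold(flash_rate, vil, flash_threshold=1.0):
    """Return the VIL value at the same cumulative probability as flash_threshold."""
    # Cumulative probability of the lowest detectable flash rate.
    p_low = np.mean(np.asarray(flash_rate, dtype=float) <= flash_threshold)
    # VIL value found at that same probability in the VIL CDF.
    return float(np.percentile(np.asarray(vil, dtype=float), 100.0 * p_low))

def probability_match(flash_rate, vil, flash_levels):
    """For each flash rate level, return the VIL value at the same CDF probability."""
    flash_sorted = np.sort(np.asarray(flash_rate, dtype=float))
    # Empirical lightning CDF evaluated at each requested flash rate level.
    probs = np.searchsorted(flash_sorted, flash_levels, side="right") / flash_sorted.size
    # VIL quantiles at those same probabilities give the matched relationship.
    return np.percentile(np.asarray(vil, dtype=float), 100.0 * probs)
```

Because only the two marginal distributions are used, the resulting relationship is monotonic by construction and makes no assumption of linearity. The same functions apply to ET by passing ET values in place of VIL.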

3.2 Application of the Probability Matching Method

The probability matching method was applied to the entire dataset of 11 case days from the summer of 2008 (See Table 3). The lightning and VIL relationships are shown in Figure 16, and the lightning and ET relationships are shown in Figure 17.

3.2.1 Lightning-VIL Relationships

The lightning-VIL relationships in Figure 16 show that VIL increases with flash rate in all zones. It should be noted that the Mid-Latitude Land West zone and possibly the Sub-Tropical Water zone have PMM relationships that differ from the other zones. For the Mid-Latitude Land West zone, a given lightning flash rate is matched to lower values of VIL for lightning flash rates below 12 flashes/6 min and to higher values of VIL for lightning flash rates over 12 flashes/6 min. This is a noticeable difference when compared with the PMM relationships for the rest of the zones. This difference is consistent with the quantile analysis of VIL for each zone because, as anticipated, the Mid-Latitude Land West region has consistently lower values of VIL in the presence of lightning when compared with the other zones (see Figure 11).


The only other zone that has a potentially significant difference in its PMM relationship is the Sub-Tropical Water zone. For very high lightning flash rates, the matching VIL values for this zone are lower than for the other zones. This slight difference could also have been anticipated from the quantile analysis (see Figure 11). Both the 50th and the 75th percentile VIL values are slightly lower for the Sub-Tropical Water zone when compared to the other zones whose PMM relationships look similar (Mid-Latitude Land East, Mid-Latitude Water and Sub-Tropical Land). However, because the difference in the PMM relationships occurs at values of lightning flash rate that are very high and rarely recorded, the impact of this difference may not be noticeable, as there will be relatively few flash rates in this region of the PMM relationship. Therefore, in order to determine if there is a clear distinction between the Sub-Tropical Water zone and the Mid-Latitude Land West zone and all other zones, further analysis needs to be done. The eleven-day data set will be used to score each model's performance. If a difference in a model's performance is observed relative to the All Zone relationship, that model will be retained. The scoring of the model performance is discussed below in Section 3.3.


[Figure 16 image: VIL (y-axis) plotted against Flash Rate (x-axis), with one curve each for ML-LandEast, ML-Water, ST-Land, ST-Water, ML-LandWest and All Zones.]

Figure 16: Quantitative Relationships between Lightning Flash Rate (flashes/6 min) and VIL (count). The PMM relationships for the Mid-Latitude Land East, Mid-Latitude Water, Sub-Tropical Land and All Zones are practically the same. Slight differences appear in the PMM relationship for the Sub-Tropical Water zone, and the most noticeable differences appear in the PMM relationship for the Mid-Latitude Land West zone.

3.2.2 Lightning-ET Relationships

The lightning-ET PMM relationships plotted in Figure 17 show that the ET heights increase with flash rate in all zones. The Mid-Latitude Land East, Sub-Tropical Land and Sub-Tropical Water relationships appear similar to the All Zone relationship. However, the two zones that may require a different set of lightning-ET relationships are the Mid-Latitude Land West zone and the Mid-Latitude Water zone, which are the two curves that seem to deviate from the pattern of the other zones. The PMM relationship for the Mid-Latitude Land West zone predicts ET values that are higher than those of all of the other zones for a given lightning flash rate. The impression that the cloud tops are much higher in the Mid-Latitude Land West zone may be due in part to the fact that the elevations in this zone are much higher than in any other zone, and therefore cloud bases are higher. The echo top heights could be about the same in the Mid-Latitude Land West zone as in all of the other zones in terms of height from the ground to the top of the cloud, but because of the high terrain, the echo top heights measured from sea level could be larger, which would explain the higher values of ET in the PMM relationship for the Mid-Latitude Land West zone.

The lightning-ET relationship for the Mid-Latitude Water zone also differs from the relationships for the other zones. For all values of lightning flash rate, the matching ET values are significantly lower than the ET values for all of the other zones. This could be in part due to the fact that most storms that are over the Mid-Latitude Water regions have moved over cooler water from off the east coast, and are in a stage of their lifetimes where the storm is becoming weaker, and therefore the echo tops are usually decaying and descending. These differences in the PMM relationships are in line with the quantile analysis of ET vs. Climate Zone in Figure 12, where the Mid-Latitude Water zone had the lowest percentiles out of any of the zones, and the Mid-Latitude Land West zone was the only other zone with any noticeable difference in the values of the percentiles. Based on the PMM relationships and the quantile analysis, it appears necessary to carry a different set of lightning-ET relationships for both the Mid-Latitude Land West and Mid-Latitude Water zones, but model testing still needs to be done and the resulting model performance will be analyzed. The extra sets of relationships will only be kept if the model shows noticeable improvement over the All Zones relationship.


[Figure 17 image: ET (y-axis) plotted against Flash Rate (x-axis), with one curve each for ML-LandEast, ML-Water, ST-Land, ST-Water, ML-LandWest and All Zones.]

Figure 17: Quantitative Relationships between Lightning Flash Rate (flashes/6 min) and ET (kft). The PMM relationships for the Mid-Latitude Land East, Sub-Tropical Land, Sub-Tropical Water and All Zones are essentially the same. Differences arise in the Mid-Latitude Land West zone, which predicts values of ET higher than the other zones for a given lightning flash rate, and the Mid-Latitude Water zone, which predicts values of ET significantly lower than the other zones for a given lightning flash rate.


3.3 Assessing Zone Differences

3.3.1 The Bootstrap Technique and other Important Statistics

In order to test whether the apparent differences between the lightning-VIL and lightning-ET PMM relationships for the different zones are significant, further statistical testing needed to be performed. This analysis was done using a statistical technique known as the bootstrap method (Efron 1982). The bootstrap method performs a set number of trials on points selected from the dataset. One key feature of the bootstrap method is that re-sampling is allowed, so once a data point is selected and the relationships are applied, the point is put back into the dataset and can be selected again. In addition, this method requires the user to set a threshold for event occurrence and non-occurrence; for this analysis, these thresholds were set at VIL = 143 (count) and ET = 33 kft. To test the PMM relationships, 1000 bootstrap trials were performed. To accommodate memory limitations of the MATLAB bootstrap routine, every 40th data point in the dataset was selected for the bootstrap trials. Because the dataset consists of millions of data points, this type of sub-sampling did not sacrifice our confidence in the results. The relationships were still being tested on at least a million data points with the sub-sampling, which was more than enough to maintain confidence in the results.
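The sub-sampling and re-sampling steps described above can be sketched in a few lines. This is an illustration in Python rather than the MATLAB routine used in the study, and the function names, the fixed random seed and the choice of mean absolute error as the example statistic are all assumptions made for the sketch; any score computed from predicted and observed values could be passed in its place.

```python
import numpy as np

def bootstrap_statistic(predicted, observed, statistic,
                        n_trials=1000, step=40, seed=0):
    """Return one value of `statistic` per bootstrap trial."""
    rng = np.random.default_rng(seed)
    # Sub-sample: keep every `step`-th data point to limit memory use.
    pred = np.asarray(predicted, dtype=float)[::step]
    obs = np.asarray(observed, dtype=float)[::step]
    n = pred.size
    results = np.empty(n_trials)
    for trial in range(n_trials):
        # Draw n indices with replacement, so a point can be selected again.
        idx = rng.integers(0, n, size=n)
        results[trial] = statistic(pred[idx], obs[idx])
    return results

def mean_absolute_error(pred, obs):
    """Example statistic: mean absolute difference between proxy and observation."""
    return float(np.mean(np.abs(pred - obs)))
```

The mean and spread of the returned array summarize how repeatable a score is across trials.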

To determine which set of relationships proved to be more successful for each zone, the probability of detection (POD), the false alarm rate (FAR), the critical success index (CSI), the Bias, the mean absolute error (MAE) and the root mean squared error (RMSE) from the bootstrap trials were examined. The POD, FAR, CSI and Bias are all calculated assuming a system of binary scoring. This is a typical way of scoring the accuracy of a model when the occurrence of an event is important to the user (e.g. VIL or ET above some level). In this situation, the VIL or ET predicted by the proxy relationships at each data point can be classified into one of four categories: hit, miss, false alarm or correct rejection. These categories can be organized into an I-by-J contingency table, where I = J = 2 and the categories of the predictions are either "yes" or "no". This contingency table is shown below in Figure 18, and clarifies what a hit, miss, false alarm and correct rejection mean and how the quantities POD, FAR, CSI and Bias are derived. A hit occurs when an event is predicted to occur and it is observed. A false alarm results when the event is predicted and it does not occur. A miss is when an event occurs that was not predicted, and a correct rejection is when nothing is predicted to happen and nothing actually does happen. The thresholds of VIL = 143 (count) and ET = 33 kft were used to determine if a prediction was recorded as a hit, miss, false alarm or correct rejection. Ideally, a perfect model would only produce counts in the hit and correct rejection areas of the table, and no misses or false alarms would be recorded.

                    Observed
                    Yes                 No
Predicted   Yes     a (hit)             b (false alarm)
            No      c (miss)            d (correct rejection)

Figure 18: Contingency table for binary scoring of a prediction model. If an event is predicted and it is also observed (yes and yes), this is considered a hit (a). If an event is predicted and not observed (yes and no), this is considered a false alarm (b). If an event is not predicted, but it is observed (no and yes), this is considered a miss (c). If an event is not predicted and it is not observed (no and no), this is considered a correct rejection (d). A perfect prediction would


Now that the concepts of a hit, a false alarm, a miss and a correct rejection have been explained, the quantities CSI, POD, FAR and Bias can be defined. The critical success index, or CSI, is a way to determine the accuracy of predictions in a situation where the event to be predicted occurs much less often than it does not occur, as is the case when predicting levels of VIL and ET. A perfect CSI would be equal to one, while zero is the worst possible CSI score. Equation 3 shows that CSI is equal to

CSI = \frac{a}{a + b + c}, (3)

where a is the number of hits, b is the number of false alarms and c is the number of misses (see Figure 18). The probability of detection, or POD, gives the fraction of the time that the event was predicted when it actually occurred. A perfect POD would therefore be equal to one, with zero being the worst possible POD. Equation 4 shows that POD is equal to

POD = \frac{a}{a + c}. (4)

The false alarm rate, or FAR, is the fraction of the times the event is predicted to occur in which it does not actually occur. Unlike CSI and POD, a low FAR is desired, with zero being a perfect FAR and one being the worst possible FAR. Equation 5 shows that FAR is equal to

FAR = \frac{b}{a + b}. (5)

Bias is a measure of how often an event is predicted compared to how often an event occurs. This is simply the ratio of the number of "yes" predictions to the number of "yes" observations. Equation 6 shows that Bias is equal to

B = \frac{a + b}{a + c}. (6)

If an event is predicted the same number of times that it was observed, Bias would be one, and this would be a perfectly unbiased prediction. A Bias greater than one is obtained when an event is predicted more often than it is observed, and a Bias less than one is obtained when an event is observed more often than it is predicted. All four of these quantities are useful in determining whether or not the model for a given zone improved the lightning-VIL and lightning-ET relationships when compared with the All Zone model.
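The binary scoring described above can be written out as a short sketch. The function names are hypothetical, and the sketch assumes that an event is counted when a value is at or above the threshold (for example VIL = 143); the text does not say whether a value exactly equal to the threshold counts as an event. No guard against empty categories is included, so the scores are undefined when a denominator is zero.

```python
def contingency_counts(predicted, observed, threshold):
    """Count hits (a), false alarms (b), misses (c) and correct rejections (d)."""
    a = b = c = d = 0
    for pred_value, obs_value in zip(predicted, observed):
        event_predicted = pred_value >= threshold   # assumed event definition
        event_observed = obs_value >= threshold
        if event_predicted and event_observed:
            a += 1      # hit
        elif event_predicted:
            b += 1      # false alarm
        elif event_observed:
            c += 1      # miss
        else:
            d += 1      # correct rejection
    return a, b, c, d

def binary_scores(a, b, c):
    """Return POD, FAR, CSI and Bias from the contingency table counts."""
    return {
        "POD": a / (a + c),          # hits over all observed events
        "FAR": b / (a + b),          # false alarms over all predicted events
        "CSI": a / (a + b + c),      # hits over all predicted or observed events
        "Bias": (a + b) / (a + c),   # predicted events over observed events
    }
```

Correct rejections (d) do not enter any of the four scores, which is why these scores suit events that occur much less often than they do not occur.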

As discussed before, in addition to having confidence in our results due to the large number of data points that are included in the bootstrap procedure, further confidence was acquired when the spread of the POD, FAR and CSI from the bootstrap trials was analyzed. Figure 19 summarizes these quantities for the 1000 scored trials of the lightning-VIL PMM relationships. The small spread in each of the quantities indicates that the results are repeatable and reliable, and therefore an even higher confidence in the statistics is obtained. To compare the performance of the model under the individual zone relationships vs. the baseline (All Zones) relationship, a mean of each of these quantities was calculated for the results from the 1000 bootstrap trials (See Table 6 and Table 7 below). The histograms allow for a visualization of these values as the mean POD is simply the mean value from the POD histogram. This analysis is identical for all of the statistical quantities.
