Third SRNWP Workshop on Mesoscale Verification

17-18 May 2006, Sofia (Bulgaria)

The meeting report is followed by the recommendations.

1. Report on the main scientific issues discussed

Report by J. Quiby

1. Verification of precipitation from km-scale models (Δx = 1-3 km)

It has been re-affirmed that short-range NWP has taken the road of non-hydrostatic km-scale models. Such models - with horizontal resolutions of, say, 1-4 km - are under development in all the Consortia or are even already operational (the Met Office with 4 km over the British Isles).

The same experience has been made everywhere with these models: the verification results for precipitation are worse than for lower-resolution models when the standard verification method is used. (The standard verification method is characterised by a one-to-one correspondence between grid points and rain gauges and by the computation of scores from the associated contingency table.)
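As an illustration only (not part of the report), the standard point-matching verification can be sketched as follows. The function name and the particular scores computed (POD, FAR and the Equitable Threat Score) are assumptions chosen for the example; the report does not prescribe them:

```python
def contingency_scores(forecasts, observations, threshold):
    """Standard point-matching verification: each forecast value is
    paired one-to-one with its rain gauge (already matched here), a
    2x2 contingency table is built for exceedance of `threshold`,
    and common scores are derived from it."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecasts, observations):
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    n = hits + misses + false_alarms + correct_negatives
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    # Equitable Threat Score: hits corrected for those expected by chance
    hits_random = (hits + misses) * (hits + false_alarms) / n
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    return pod, far, ets
```

The double-penalty effect discussed below is visible in such scores: a shower displaced by one grid point counts as both a miss and a false alarm.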

2. How should we verify precipitation from km-scale models?

It has been strongly affirmed that there is no point in verifying convective precipitation against stations at the scale of these models. This would only produce erratic and inconsistent results, owing to the random realisation of the events (the showers) in space and time.

Upscaling is indispensable. But to which scale should we upscale? No unique answer to this question could be found.

One opinion was to upscale to the scale of predictability of the event. But this scale is not uniquely defined, as the predictability at a given scale is a function of the forecast range.

Another opinion was that a "general verification" is not meaningful: verification must be made "à la carte", that is, it must be defined according to the needs of the user, or adapted to the information the user requests from the model.

3. Justification of the km-scale models

As the standard verification methods give, for the km-scale models - at least for precipitation - worse results than for lower-resolution models, how can we justify the development and the operational use of the km-scale models?

We can justify the use of km-scale models only if they are better than models with lower resolutions.

If we have to compare the precipitation of a 2-km model with that of a 10-km model, we must first, as written above, upscale the 2-km precipitation fields to the 10-km resolution and compare the fields at that latter resolution only. If the upscaled fields are better than the native 10-km ones, it is justified for an NMS to run the 2-km model operationally. But the precipitation fields must then be disseminated in their 10-km upscaled version only.
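A minimal sketch of such an upscaling, assuming simple block averaging over non-overlapping blocks (one common choice; the report does not specify the aggregation method). A factor of 5 would take a 2-km field to a 10-km grid:

```python
def upscale(field, factor):
    """Upscale a 2-D precipitation field (list of equal-length rows,
    both dimensions divisible by `factor`) by averaging each
    non-overlapping factor x factor block into one coarse cell."""
    ny, nx = len(field), len(field[0])
    coarse = []
    for j in range(0, ny, factor):
        row = []
        for i in range(0, nx, factor):
            block = [field[jj][ii]
                     for jj in range(j, j + factor)
                     for ii in range(i, i + factor)]
            row.append(sum(block) / len(block))  # block-mean precipitation
        coarse.append(row)
    return coarse
```

Averaging conserves the area-mean precipitation, which is why the comparison with the native 10-km model remains fair at the coarse resolution.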

Suppose now that the upscaled precipitation fields are worse than the native 10-km fields. Is the use of the 2-km model still justified?

If the results coming from the 2-km model are not better after upscaling to 10 km, this does not mean that the 2-km model is unnecessary.

Reason: one of the motivations for developing km-scale models is the hope of better predicting severe and damaging weather. Severe weather is characterised by very high values of some standard parameters, like wind and precipitation.

Km-scale models have the ability to produce locally large values of these parameters, something that models with significantly lower resolutions cannot do, owing to the absence of short waves (we exclude, of course, numerical instabilities and aliasing).

When very high values of wind or precipitation occur in a forecast, they are a strong signal for severe weather. It nevertheless remains true that the place and time of these events may be forecast with error.

4. Quality of the severe weather forecasts

As we have just seen, km-scale models seem to be the best tool for indicating the possible occurrence of severe weather, and also of extreme weather, in a deterministic as well as a probabilistic way. If we accept this statement, we have to ask the following question: what is the quality of the model forecasts for severe weather and for extreme weather?

A verification of extreme-weather model forecasts has been attempted at the Met Office. This attempt led to a clear conclusion: it was impossible to produce statistically significant results because these events are too rare. Thus we have to take an indirect approach.

It has been suggested that we should specifically verify events with strong winds (say 50-60 km/h) or heavy precipitation (say 8-10 mm/6 h), which are much more frequent than the extreme weather events and would probably yield statistically significant verification results. If the model is good at forecasting these high values, we can hope that it will not be too bad for extreme values. But it is no more than a hope!

2. Actions and Recommendations

Report by Jean Quiby, SRNWP Programme Coordinator

1. Verification scores should be accompanied by their statistical significance

It is a fact that statistical results (moments or scores) of model performance presented at conferences or published in scientific journals are rarely accompanied by their statistical significance or, equivalently, by their significance level.

For example, when results are presented as 1-D curves, the points rarely carry vertical error bars whose length would indicate the statistical significance of the values.

This contrasts with other sciences where it is self-evident to indicate the statistical significance of results presented.

Without information on the statistical significance, the contribution of chance is unknown.

The drawback is that obtaining statistically significant results will generally require a larger number of realisations (in NWP, of model integrations) than we are used to performing. And if we want to render small differences in the results statistically significant, the necessary number of realisations (model integrations!) can become enormous.

The same is true for the correlation coefficient: in order to have a narrow confidence interval around the value of your correlation coefficient, you must perform a large number of experiments (in NWP, these would probably be model integrations).
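To illustrate the point about sample size, here is a sketch of one standard way (an assumption for this example, not something prescribed by the report) to attach a confidence interval to a correlation coefficient, via the Fisher z-transform:

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation
    coefficient r estimated from n independent samples, using the
    Fisher z-transform (valid for |r| < 1, n > 3)."""
    z = math.atanh(r)                # transform: approximately normal
    se = 1.0 / math.sqrt(n - 3)      # standard error in z-space
    lo = z - z_crit * se
    hi = z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to r-space
```

Because the standard error shrinks only like 1/sqrt(n), halving the width of the interval requires roughly four times as many samples - in NWP, four times as many model integrations.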

This recommendation should encourage the scientists in NWP - particularly the scientists active in verification - to accompany their results with a measure of their statistical significance whenever possible.

2. Guide-lines for the verification of the PEPS forecasts

The SRNWP-DWD multi-model EPS - baptised PEPS (Poor man's EPS) - is working with great stability and reliability in its pre-operational phase.

Each participating NMS (all the European NMSs active in NWP!) has received a password allowing it to look at the operational forecasts.

Over Germany, the system has already reached an "operational status" as the forecasts used are calibrated (with the Bayesian Model Averaging method) and the results are verified.

It would be very useful, and even necessary for assessing the quality of the method and for determining future development work, if each NMS verified the PEPS probabilistic forecasts over its territory. But this work will only be meaningful if the PEPS forecasts are verified everywhere in the same way.

Action: The DWD will issue guide-lines specifying how the verification of the PEPS probabilistic forecasts should be conducted (parameters, ranges, scores, verification periods, etc.) by the National Meteorological Services over their respective territories.

3. Radar precipitation information: composites

Radar information will become very important in km-scale modelling for precipitation validation and verification and, later on, for data assimilation.

For many applications, radar data aggregated into so-called "composites" will be the adequate tool. But scepticism was expressed by the audience about the quality of today's radar composites.

Firstly, nobody was aware of a comprehensive study devoted to assessing the quality of the radar composites. It was claimed that, even when the radar data have previously been calibrated with rain gauges, the quality of the composites remains uncertain.

Although today's radar composites have shortcomings (a colleague said that on the European composite one can recognise the radar of his country at first glance!), it would nevertheless be very useful to have free access to them, particularly since their quality will surely improve in the future.

But the dissemination of the radar composites of the Programme OPERA has been blocked by the EUMETNET Council at its 23rd Meeting (14 December 2004).

Concerning the concerns of the NWP verification specialists about the quality of the composites, the SRNWP Coordinator will discuss this point at a meeting between the EUMETNET Programmes OPERA and SRNWP that he will organise in the fourth quarter of 2006 or the first quarter of 2007.

Action: The SRNWP Coordinator must organise the meeting between the EUMETNET Programmes OPERA and SRNWP and raise there the question of the quality of the radar composites.

4. Expression of the verification of deterministic forecasts with probabilistic scores

Kees Kok from KNMI presented a talk titled "Probabilistic Approach in the Verification of Deterministic High Resolution Model Output".

Kees' method expresses the verification results of deterministic forecasts in terms of Brier scores and ROC curves. The method seems most suitable for convective precipitation, as the double-penalty problem does not arise and the skill of small-scale information (scattered showers) can be assessed.
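For reference, the Brier score named in the talk can be sketched as follows. How the deterministic model output is converted into the forecast probabilities is the substance of Kees' method and is not reproduced here; the probabilities below are assumed to be given:

```python
def brier_score(probabilities, outcomes):
    """Brier score: mean squared difference between forecast
    probabilities (values in [0, 1]) and binary observed outcomes
    (0 or 1). Ranges from 0 (perfect) to 1 (worst)."""
    return sum((p - o) ** 2
               for p, o in zip(probabilities, outcomes)) / len(outcomes)
```

A perfectly sharp and correct forecast scores 0; a permanently non-committal forecast of 0.5 scores 0.25 regardless of the outcomes.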

The recommendation passed is that this verification method should be included in the standard verification packages used for deterministic forecasts.

The problem is that the slides presented at the meeting give a good overview but do not contain enough information to implement the method.