Accuracy and quality of weather data

At OpenWeather, we employ a proprietary forecasting framework that blends multiple data sources such as observational data from weather stations, satellite imagery, radar data, and sensor networks into a unified model. Some key elements include:

Data Assimilation. We continuously collect and ingest weather observations from global meteorological agencies, airports (METAR reports), buoys, and private data networks. This data is combined with satellite inputs and radar imagery, allowing us to track atmospheric changes in near-real time.
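The core idea of combining a model "background" value with an observation can be illustrated with the textbook scalar analysis update (a one-step Kalman update, also known as optimal interpolation). This is a minimal sketch assuming unbiased, uncorrelated errors with known variances. It is not OpenWeather's proprietary assimilation scheme, and the function name `analysis_update` and the sample values are hypothetical.

```python
def analysis_update(background: float, obs: float,
                    var_background: float, var_obs: float) -> tuple[float, float]:
    """Blend a model background value with an observation.

    Assumes both errors are unbiased and uncorrelated, with known variances.
    Returns (analysis value, analysis error variance).
    """
    if var_background <= 0 or var_obs <= 0:
        raise ValueError("error variances must be positive")
    # Gain: how far to move from the background towards the observation
    gain = var_background / (var_background + var_obs)
    analysis = background + gain * (obs - background)
    # The analysis error variance is smaller than either input variance
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis


# Hypothetical example: the model says 18.0 °C (error variance 4.0),
# a nearby station reports 20.0 °C (error variance 1.0).
value, variance = analysis_update(18.0, 20.0, 4.0, 1.0)
print(f"{value:.1f} °C, variance {variance:.2f}")
```

Because the observation is assumed to be four times more trustworthy than the background here, the analysis moves 80% of the way towards the station value.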

Global and Regional Modeling. Our system incorporates outputs from large-scale global models (e.g., ECMWF, GFS) to form the initial state and boundary conditions. We then run high-resolution regional models to capture local phenomena such as microclimates, sea-land interactions, and urban heat islands.

Ensemble Forecasting. To account for uncertainty, we use ensemble techniques that run multiple forecast scenarios in parallel. This helps us quantify the range of possible outcomes and generate probabilistic forecasts, which are especially valuable for precipitation and severe weather events.
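As a generic illustration of how an ensemble becomes a probabilistic forecast, the sketch below summarises the precipitation values predicted by several ensemble members: the ensemble mean, the spread (standard deviation), and the probability of precipitation at or above a threshold, estimated as the fraction of members that reach it. The member values, the 0.1 mm threshold and the function name `ensemble_summary` are hypothetical; the actual ensemble configuration is not described in this article.

```python
from statistics import mean, pstdev


def ensemble_summary(members: list[float], threshold: float = 0.1) -> dict[str, float]:
    """Summarise precipitation forecasts (mm) from the members of an ensemble."""
    if not members:
        raise ValueError("ensemble must contain at least one member")
    # Count members forecasting precipitation at or above the threshold
    exceed = sum(1 for m in members if m >= threshold)
    return {
        "mean": mean(members),
        "spread": pstdev(members),            # ensemble standard deviation
        "prob_exceed": exceed / len(members),  # probabilistic forecast
    }


# Hypothetical 10-member ensemble for one location and hour (mm)
members = [0.0, 0.0, 0.3, 1.2, 0.0, 2.5, 0.8, 0.0, 0.05, 0.4]
summary = ensemble_summary(members)
print(summary["prob_exceed"])
```

A large spread signals low confidence in any single scenario, which is exactly the information a deterministic forecast cannot convey.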

High-Performance Computing (HPC). The complex numerical calculations required to run these high-resolution models are processed on dedicated HPC infrastructure. This allows us to update forecasts more frequently and incorporate the latest observations efficiently.

Machine Learning Integration. We apply machine learning techniques (such as neural networks and gradient boosting) to post-process raw model outputs. These methods help correct known biases, refine local precipitation predictions, and improve temperature and humidity forecasts in microclimates.
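Neural networks and gradient boosting are beyond a short example, but the core idea of statistical bias correction can be shown with a much simpler, swapped-in technique: fit a linear correction (observed ≈ slope · forecast + intercept) by ordinary least squares on past forecast/observation pairs, then apply it to new raw forecasts. This is a minimal sketch, not OpenWeather's post-processing; the function names and the data are hypothetical.

```python
def fit_linear_correction(forecasts: list[float], observed: list[float]) -> tuple[float, float]:
    """Least-squares fit of observed = slope * forecast + intercept."""
    n = len(forecasts)
    if n != len(observed) or n < 2:
        raise ValueError("need at least two paired values")
    mean_f = sum(forecasts) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observed))
    var = sum((f - mean_f) ** 2 for f in forecasts)
    if var == 0:
        # Constant forecasts: only the mean bias can be removed
        return 1.0, mean_o - mean_f
    slope = cov / var
    return slope, mean_o - slope * mean_f


def apply_correction(forecast: float, slope: float, intercept: float) -> float:
    """Correct a new raw forecast with the fitted coefficients."""
    return slope * forecast + intercept


# Hypothetical history: the raw model runs 1.5 °C too warm at this site
raw = [10.0, 12.0, 15.0, 18.0, 21.0]
obs = [8.5, 10.5, 13.5, 16.5, 19.5]
slope, intercept = fit_linear_correction(raw, obs)
print(round(apply_correction(20.0, slope, intercept), 2))  # 18.5
```

In practice such corrections are usually fitted per site (or per region and season), since biases differ between microclimates.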

OpenWeather’s Numerical Weather Prediction (NWP) Model

To provide weather data through the OpenWeather API, we use our in-house Numerical Weather Prediction (NWP) model, which processes and integrates data from a variety of sources in real time. Once the source data is downloaded, our proprietary algorithms process it and blend it into a unified forecast. These real-time processes are crucial for generating accurate nowcasts (very short-term forecasts) as well as longer-range forecasts for our clients, and our goal is to continuously improve their quality and accuracy. The main data sources are:

Global NWP models:
Weather stations:
High-resolution weather radar imagery used to detect and track precipitation in near-real time.
Global satellite observations for cloud cover, water vapor, and infrared imagery.

Metrics

To evaluate forecasts, we need reliable reference data (ground truth). We use several sources that can be considered reliable: in general, observations from weather stations operated by meteorological agencies and, for precipitation, weather radar data.

There are many metrics for evaluating the quality of weather forecasts, both general-purpose and specialised. We divide them into three groups:

Common scores for forecast users, which show the overall accuracy our clients can rely on.
Metrics that we use to compare raw data sources and post-processing algorithms, and to choose between them.
Diagnostic metrics that help localise certain types of forecast errors for further improvement.

In this article, we discuss scores from the first group. Other metrics are intended for internal use.

List of cities

We used 371 cities for evaluation. This list consists of national capitals and many other major cities.

Nowcast errors for temperature

Current weather is also a type of prediction: global NWP models and weather stations cannot provide data at a regular one-minute time step, and not every location has a station, so current conditions have to be estimated. This type of prediction is known as nowcasting.
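To see why current conditions must be estimated, consider a model that provides values only at hourly steps: the value at an arbitrary minute has to be inferred. The sketch below uses plain linear interpolation in time between the two surrounding model steps. This is a deliberately simple assumption for illustration only; OpenWeather's nowcasting algorithm, which also draws on station data, is not described here, and the function name and values are hypothetical.

```python
from datetime import datetime, timezone


def interpolate_in_time(t: datetime, t0: datetime, v0: float,
                        t1: datetime, v1: float) -> float:
    """Linearly interpolate a value at time t between model steps t0 and t1."""
    if not t0 <= t <= t1 or t0 == t1:
        raise ValueError("t must lie within a non-empty interval [t0, t1]")
    # Fraction of the interval that has elapsed at time t
    frac = (t - t0).total_seconds() / (t1 - t0).total_seconds()
    return v0 + frac * (v1 - v0)


utc = timezone.utc
t0 = datetime(2020, 5, 1, 12, 0, tzinfo=utc)   # model step: 16.0 °C
t1 = datetime(2020, 5, 1, 13, 0, tzinfo=utc)   # next model step: 19.0 °C
now = datetime(2020, 5, 1, 12, 20, tzinfo=utc)
print(interpolate_in_time(now, t0, 16.0, t1, 19.0))
```

Interpolation like this cannot capture changes between model steps, such as a sudden shower cooling the air, which is one reason nowcasts benefit from blending in fresh observations.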

We use several metrics to check how accurate the nowcast is. Two of them are the most commonly used statistical error metrics for temperature prediction: Mean Absolute Error (MAE) measures the average magnitude of errors, while Root Mean Square Error (RMSE) squares errors before averaging and therefore penalises large errors more heavily. The Reliability and Inaccuracy metrics give an intuitive description of accuracy in percentage terms. All calculations below were conducted in degrees Celsius.

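MAE - mean absolute difference between model values and station observations, in degrees; lower is better.
RMSE - square root of the mean squared difference between model values and station observations, in degrees; lower is better.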
Reliability - percentage of time when model values were within ±2 degrees of ground truth; higher is better. We will regard an error of up to 2 degrees in the nowcast as acceptable.
Inaccuracy - percentage of time when model values were not within ±5 degrees of ground truth; lower is better. We will regard an error of more than 5 degrees in the nowcast as inaccurate.
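Written out for N pairs of model values f_i and station observations o_i (in °C), the four metrics above can be expressed as follows, where 1[·] is the indicator function (1 if the condition holds, 0 otherwise). This is our reading of the definitions in the list, with the 2 °C and 5 °C thresholds taken from it.

$$
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert f_i - o_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (f_i - o_i)^2}
$$

$$
\mathrm{Reliability} = \frac{100\%}{N}\sum_{i=1}^{N} \mathbf{1}\big[\lvert f_i - o_i \rvert \le 2\big], \qquad
\mathrm{Inaccuracy} = \frac{100\%}{N}\sum_{i=1}^{N} \mathbf{1}\big[\lvert f_i - o_i \rvert > 5\big]
$$

Because errors are squared before averaging, RMSE is always greater than or equal to MAE, and the gap grows when a few large errors are present.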

The numbers used as the thresholds for reliability and inaccuracy might vary for different industries.
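The sketch below computes the four scores for one series of paired model and station values, with the reliability and inaccuracy thresholds exposed as parameters so they can be adjusted for a given industry. The function name, parameter names and sample values are hypothetical, and the article does not specify how values are paired or aggregated across cities and time.

```python
import math


def nowcast_scores(model: list[float], observed: list[float],
                   reliable_within: float = 2.0,
                   inaccurate_beyond: float = 5.0) -> dict[str, float]:
    """MAE, RMSE, Reliability (%) and Inaccuracy (%) for paired values in °C."""
    if len(model) != len(observed) or not model:
        raise ValueError("need equally long, non-empty series")
    errors = [m - o for m, o in zip(model, observed)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        # Share of values within the acceptable error band
        "Reliability": 100.0 * sum(abs(e) <= reliable_within for e in errors) / n,
        # Share of values whose error exceeds the inaccuracy threshold
        "Inaccuracy": 100.0 * sum(abs(e) > inaccurate_beyond for e in errors) / n,
    }


# Hypothetical nowcast values and station observations (°C)
model = [14.0, 15.5, 17.0, 21.0, 12.0]
station = [14.5, 15.0, 16.0, 15.0, 12.0]
scores = nowcast_scores(model, station)
print(scores)
```

In this toy series a single 6 °C miss lifts RMSE (about 2.74) well above MAE (1.6), and it is also the only value counted as inaccurate (20%).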

The period of the analysis is from 13 April 2020 00:00 to 26 June 2020 00:00 (UTC time zone).

Figure 1. Quality metrics for Nowcast: OpenWeather NWP model; raw data from NOAA GFS05 and GFS025; NOAA GFS05 and GFS025 corrected by our algorithm; weather provider 1.

We analysed the behaviour of these metrics for the OpenWeather NWP model over the analysis period. The figure shows that MAE is about 0.5 degrees, RMSE is less than 2 degrees, reliability is between 90% and 100%, and inaccuracy is about 1% (lower is better). Among the compared sources, the OpenWeather NWP model gives the most accurate results. However, keep in mind that these are average errors: the actual difference between the nowcast and the real conditions at a specific place and time can be larger.

Conclusion

In this report, we analysed four common metrics for temperature nowcasting. We plan to extend the evaluation to other weather parameters, such as precipitation, humidity and wind, and to add new, more advanced metrics to refine our forecasts further.

We hope this report gives you a better understanding of how we measure and continually improve the accuracy of our weather data. If you have any questions or suggestions, please do not hesitate to contact us.