Automated Weather Model Processing With FOSS4G: Lessons Learned - by Thomas Horner - The Startup - Medium

Thomas Horner · Nov 9, 2019 · 17 min read
A vast array of continuously updating weather models is freely available in the United States, lowering the barrier of entry to creating weather forecasting websites or weather visualization services (meteorological knowledge aside). It is possible to fully leverage this data on an automated basis using only Free and Open Source Software for Geospatial (FOSS4G).

While developing a weather forecasting website of my own, I stumbled across countless unseen pitfalls which influenced my final implementation of the data processing backend. This article documents the more egregious issues I encountered, along with their solutions, covering everything from the initial data extraction to the final technical architecture of the automated data processing.
Figure: The NAEFS weather model, rendered by MapServer and consumed as a WMS layer in Leaflet.

Overview
The National Oceanic and Atmospheric Administration of the U.S. Department of Commerce is responsible for creating and freely distributing a large number of weather models on the web. These weather models range from older, low-resolution global numerical models such as the GFS (Global Forecast System) to high-resolution mesoscale models like the HRRR (High Resolution Rapid Refresh) that simulate minute atmospheric physics.

Each model generally occupies a specific niche. The HRRR is good for identifying the strength and location of potential storm cells, but its forecasts only extend 18 to 36 hours past the time the model was initialized. The GFS is good at identifying large-scale weather conditions across the entire planet, and has predictions for up to hundreds of hours after each model run. Other popular models include the NAM, the RAP, and the ECMWF, Europe’s answer to the GFS, which requires an extremely expensive license to access.

In the United States, the various centers responsible for each model have agreed upon a common format for distributing this data: GRIB2. For most models, a GRIB2 file is made available for each specific forecast hour. For instance, the HRRR run at 00Z (midnight UTC) will have a file for each of forecast hours 0 to 36. Forecast hour 12 would indicate the model’s forecast for 12:00 UTC. Each GRIB2 file is a raster covering a specific area of the planet, with a number of bands, each pertaining to a specific combination of parameter and level.
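The relationship between a model run, a forecast hour, and the time being forecast is simple addition. A minimal sketch, assuming a hypothetical 00Z run on an arbitrary date (the function name is illustrative, not from any library):

```python
from datetime import datetime, timedelta, timezone


def valid_time(run_init: datetime, forecast_hour: int) -> datetime:
    """Return the UTC time that a given forecast hour of a model run describes."""
    return run_init + timedelta(hours=forecast_hour)


# A hypothetical 00Z run; the date itself is arbitrary.
run = datetime(2019, 11, 1, 0, tzinfo=timezone.utc)
print(valid_time(run, 12).isoformat())  # forecast hour 12 of the 00Z run
print(valid_time(run, 36).isoformat())  # forecast hour 36 lands on the next day
```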
For instance, the NAEFS (North American Ensemble Forecast System) model only has a handful of bands: temperature (the parameter/variable) at two meters above the ground (the level), wind vectors at 10 m above the ground, atmospheric pressure at the surface, etc. Some models, like the NAM, have hundreds of bands, modelling a dizzying array of parameters at dozens of levels in the atmosphere. For simple forecasts, the desired parameters only comprise a few bands of each model run: temperature, winds, cloud cover, precipitation, etc.
Figure: The glorious NCEP supercomputers (photo credit: NCEP). The safety cones presumably boost clock speeds by a few MHz.
Each model is run several times a day on supercomputers operated by NOAA’s National Centers for Environmental Prediction (NCEP). The HRRR runs every hour, while some of the ensembles only run every twelve hours. As each forecast hour is completed, it is placed on various servers for distribution and consumption, where it is likely downloaded by a large number of third-party data providers, researchers, and organizations.
Using the Data

Figure: Visualizing the spread (uncertainty) of a weather model in Chart.js.
From a technical standpoint, knowing how to consume the data and convert it to more standardized formats may not be readily obvious. But there are additional unseen issues as well: NCEP’s servers are temperamental (at best!), documentation is poor, data downloads slowly, and it’s not always apparent where the desired data is hiding. The technical hurdles are actually the easiest to overcome: to my surprise, obtaining the necessary data in the first place caused far more headaches than actually working with the data… but I’ll cover that later.

Thanks to the work of government agencies and dedicated volunteer developers, a variety of tools have been developed to work with GRIB2 data. The most obvious choice, falling under the purview of FOSS4G/OSGeo, is the ubiquitous GDAL library, which can read and translate GRIB2 data natively, aside from a few hiccups.
Figure: An excerpt of gdalinfo output for a grib2 file from the NAM model.
For the most part, GDAL is able to identify what parameter and level each band pertains to, if you’re able to sift through the distracting amount of metadata to get to it. The file can then be converted to a standard geospatial raster file format, such as GeoTIFF, with a standardized projection like EPSG:4326. (Note: for parameters dependent on raster cell orientation, like wind vectors in some models, you need to be aware of how this reprojection impacts the data!) The GeoTIFF has the same number of bands as the GRIB2.
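One way to sift through that metadata is to ask gdalinfo for JSON output and keep only the two fields that matter. The sketch below is illustrative rather than the author's code: the helper name is made up, and the sample is a trimmed, hand-written stand-in for real `gdalinfo -json` output, so it runs without GDAL installed.

```python
import json


def list_grib_bands(gdalinfo_json: str) -> list:
    """Return (band number, GRIB_ELEMENT, GRIB_SHORT_NAME) for each band."""
    info = json.loads(gdalinfo_json)
    rows = []
    for band in info.get("bands", []):
        # gdalinfo groups metadata by domain; the default domain is keyed "".
        meta = band.get("metadata", {}).get("", {})
        rows.append((band["band"],
                     meta.get("GRIB_ELEMENT", "?"),
                     meta.get("GRIB_SHORT_NAME", "?")))
    return rows


# Hand-written sample, not real model output.
sample = json.dumps({"bands": [
    {"band": 1, "metadata": {"": {"GRIB_ELEMENT": "PRMSL", "GRIB_SHORT_NAME": "0-MSL"}}},
    {"band": 2, "metadata": {"": {"GRIB_ELEMENT": "TMP", "GRIB_SHORT_NAME": "2-HTGL"}}},
]})

for number, element, level in list_grib_bands(sample):
    print(number, element, level)
```

With GDAL installed, the real input would come from running `gdalinfo -json` on the GRIB2 file, and the conversion itself can be done with `gdalwarp -t_srs EPSG:4326`.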
To put it in perspective, many commercial weather model visualization products just run a script that renders static images for each desired parameter for specific geographic areas. This has its advantages, but the result is purely static. With a GeoTIFF file, the potential for a richer, more interactive experience is obvious: you can make the raster available for web maps or GIS software with Open Web Services, opening up the ability for visualization, data retrieval, or analysis of any kind with any geographical extent.

The technical, architectural approach is not entirely straightforward. How would you make a service that lets you retrieve the value or rendering of a specific forecast hour and parameter from a particular model run? How about for a collection of models and older model runs? How would you automate this?
Extracting the Right Data
Before we can really dig into the technical architecture, we need to know how to get the desired data in the first place. As mentioned previously, this was the more difficult issue.

Accessing a particular model variable and level was the first problem I tackled. Bands, whether in a GRIB2 or GeoTIFF file, are only identified by their index in the file: the first band is band 1. There’s no way to simply tell GDAL to request the correct band given the variable (for instance, temperature) and level (for instance, two meters above ground).
Figure: If you start diving into the models, you’ll be treated to the pinnacle of 90s web design, complete with broken links. Even a solid background in meteorology won’t help you decipher some of the acronyms on the left.
The obvious solution is that you don’t need to know which band index is which if you have the inventory file for the model (provided by NCEP). All you need is to reference the table which tells you which band number pertains to which variable and level. Simple!

…If only. There are two major problems, the first being that NCEP doesn’t provide inventories for a large number of its models, and even where it does, they are likely out of date or inaccurate. This can be circumvented by looking at the metadata itself, such as with gdalinfo, from which the correct band number can usually be gleaned, as long as the desired data doesn’t have screwed-up metadata.

Let’s say you’ve been able to identify that the “temperature at two meters” (I’ll refer to this as T2M from now on) can be found in band 32 of the weather model you want to use. Now you can extract that band with the -b flag of gdal_translate and do something useful with it. Using that approach, you can painstakingly cross-reference existing inventory files and GRIB2 metadata for a variety of variables across multiple models, until you have a collection of band numbers that prove useful for future forecasts and visualization applications.
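Extracting that single band can be scripted by wrapping the gdal_translate call. This is a minimal sketch with hypothetical file names; the helper only builds the command unless told to run it, so it can be tried without GDAL on the machine.

```python
import subprocess


def extract_band(src: str, dst: str, band: int, dry_run: bool = True) -> list:
    """Build (and optionally run) a gdal_translate call that copies one band."""
    cmd = ["gdal_translate", "-of", "GTiff", "-b", str(band), src, dst]
    if not dry_run:
        # Requires the GDAL command-line tools on PATH.
        subprocess.run(cmd, check=True)
    return cmd


# File names are hypothetical; band 32 is the T2M example from the text.
cmd = extract_band("model.f012.grib2", "t2m.f012.tif", 32)
print(" ".join(cmd))
```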
…Except the indices aren’t constant. This consideration is mentioned on some of the NCEP inventory pages, but if you didn’t pay attention to that, you’ll notice that certain timestamps have incorrect data values. At some forecast hours (often at intervals of 3, 6, and 12) new bands will be inserted into the raster, usually to represent variables like “accumulated precipitation over the past 6 hours” that only appear during those intervals. This might cause T2M to be located in band 35 instead of 32 at those forecast hours. You are then faced with a choice: provide a lookup table of the band numbers applicable to each forecast hour (a lot of manual, error-prone work), or script it.
Figure: Rolling our own rasters using the osgeo/gdal package for Python.
Since GDAL is available as a library for a variety of languages, it’s easy to create your own rasters and write bands to them. It’s also possible to read the metadata of GRIB2 files and use that to extract the desired variables and levels. When I encountered the band ordering issue, I added another step that creates an empty raster with the number of bands I would end up needing. The script then goes through the source GRIB2 file band by band, looking for the suitable variables (GRIB_ELEMENT) and levels (GRIB_SHORT_NAME, which is not totally obvious). Matching bands are copied over to the next available empty band in the output raster. The result is that we end up with a GeoTIFF where T2M is always, for instance, band 3.
Figure: I used a JSON file to provide the variable and level for each output band I wanted.
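The band-matching step can be sketched roughly as follows. This is not the author's script: function names, the wanted list, and the sample metadata (including the `APCP03` element) are illustrative assumptions. It also differs slightly from the description above by pinning each wanted variable/level pair to a fixed output position rather than taking the next empty band. The matching logic is pure Python; only `remap_grib` needs GDAL and NumPy.

```python
def plan_band_copy(source_meta, wanted):
    """Map each output band (1-based) to the source band that matches it.

    source_meta: one metadata dict per source band, in band order.
    wanted: (GRIB_ELEMENT, GRIB_SHORT_NAME) pairs in the desired output order.
    """
    plan = {}
    for out_band, (element, level) in enumerate(wanted, start=1):
        for src_band, meta in enumerate(source_meta, start=1):
            if (meta.get("GRIB_ELEMENT") == element
                    and meta.get("GRIB_SHORT_NAME") == level):
                plan[out_band] = src_band
                break
    return plan


def remap_grib(src_path, dst_path, wanted):
    """Write a GeoTIFF whose band order always follows `wanted`."""
    from osgeo import gdal  # imported here so plan_band_copy works without GDAL

    src = gdal.Open(src_path)
    meta = [src.GetRasterBand(i).GetMetadata()
            for i in range(1, src.RasterCount + 1)]
    dst = gdal.GetDriverByName("GTiff").Create(
        dst_path, src.RasterXSize, src.RasterYSize, len(wanted), gdal.GDT_Float32)
    dst.SetGeoTransform(src.GetGeoTransform())
    dst.SetProjection(src.GetProjection())
    for out_band, src_band in plan_band_copy(meta, wanted).items():
        data = src.GetRasterBand(src_band).ReadAsArray()
        dst.GetRasterBand(out_band).WriteArray(data)
    dst.FlushCache()


# Hypothetical wanted list; level strings follow GDAL's GRIB_SHORT_NAME style.
WANTED = [("TMP", "2-HTGL"), ("UGRD", "10-HTGL"), ("VGRD", "10-HTGL")]

# Made-up metadata for two forecast hours; the second has an extra band up front.
tmp = {"GRIB_ELEMENT": "TMP", "GRIB_SHORT_NAME": "2-HTGL"}
ugrd = {"GRIB_ELEMENT": "UGRD", "GRIB_SHORT_NAME": "10-HTGL"}
vgrd = {"GRIB_ELEMENT": "VGRD", "GRIB_SHORT_NAME": "10-HTGL"}
apcp = {"GRIB_ELEMENT": "APCP03", "GRIB_SHORT_NAME": "0-SFC"}
hour_1 = [tmp, ugrd, vgrd]
hour_3 = [apcp, tmp, ugrd, vgrd]

print(plan_band_copy(hour_1, WANTED))
print(plan_band_copy(hour_3, WANTED))
```

In both cases output band 1 is T2M, even though its source index shifts when the accumulated-precipitation band appears. The `WANTED` list plays the role of the JSON file mentioned in the caption.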
This assumes you are downloading one full raster per forecast hour. There are ways to make things even easier, such as using .idx files (for some models) to directly download just the bands you want, or using NCEP’s gribfilter web application to do basically the same thing. I will cover both of these in a bit. There’s also the chance that some models (the SREF comes to mind) provide every single forecast hour, variable, and level in one monolithic GRIB2 file. Good luck with that!
Downloading the Data
With a strategy in place for at least getting data out of GRIB2 files, it’s time to obtain them from the web. NCEP is responsible for coordinating and centralizing the distribution of weather models, no matter which center manages them. All models are available via NOMADS, the NOAA Operational Model Archive and Distribution System, which according to NCEP provides five terabytes of model data per month. There are a variety of ways to retrieve the data from NOMADS, but HTTP is the most straightforward.

Each forecast hour is uploaded shortly after the numbers finish crunching for that time step, as opposed to the entire model run being uploaded all at once when it is complete. This can help you get the jump on processing each forecast hour before the model is finished.

At this point, you probably need to know what weather models you actually want, and what they do, since dozens are available from the NOMADS homepage (I will not be discussing the weather models themselves in this article). That web page is just a thin veneer over a rat’s nest of cryptically named folders and subfolders on their web server. For example, you might want to retrieve the HIRESW model by navigating to the correct folder and timestamp subfolder, and then bam! Your eyes are assaulted by a list of hundreds of downloads pertaining to the HIRESW at various resolutions, forecast hours, geographic areas, initialization perturbations, algorithm cores (ARW vs. NMMB), etc.
Figure: There is no shortage of impenetrable, dizzying acronyms.
Hopefully, if you’re looking for something specific (a common example would be the GFS at 0.25 degree resolution), you can find it without too much trouble. For most models, subfolders are present in the main model folder pertaining to the date of the model run initialization, usually going back a few days. For some models, there is a subfolder in each date folder for the hour of the model run (such as 00Z, 06Z, 12Z, and 18Z), while for other models, the files for all of the run hours are dumped into the date folder, with the hour included somewhere in the filename (usually following a lowercase t). The forecast hour is also mentioned in the filename, usually following a lowercase f.

This file naming convention varies pretty widely depending on the model. Some GRIB2 files don’t even end with the .grib/.grb2 extension, and instead might end with the forecast hour (.f160), which is something to keep in mind when downloading them and passing them to your processing scripts. However, individual models are consistent in their naming convention for each forecast hour, down to the number of digits in the forecast hour (f000 to f384). By making these file names programmatic, you can retrieve them with an automated process. Here are a few examples of weather model file names that I’ve made programmatic. My script replaces the %D with the model run initialization date, the %H with the model run initialization hour, and the %T with the forecast hour.
GEFS 0.5 Degree Resolution Ensemble Spread
https://www.ftp.ncep.noaa.gov/data/nccf/com/gens/prod/gefs.%D/%H/pgrb2ap5/gespr.t%Hz.pgrb2a.0p50.f%T

NAEFS NDGD Resolution Ensemble Average
https://www.ftp.ncep.noaa.gov/data/nccf/com/naefs/prod/naefs.%D/%H/ndgd_gb2/naefs.t%Hz.geavg.f%T.conus_ext_2p5.grib2

WPC Probabilistic Snow, 12 Hourly, 50th Percentile
https://ftp.wpc.ncep.noaa.gov/pwpf/conus_2.5km/2.5kmpwpf_12hr/2.5kmprcntil_12hsnow_50pt_%D%Hf0%T.grb
Note that despite the three different file extensions, all of the above products are GRIB2 files. Also, if you look carefully, you can see that the WPC stands apart from the other NOAA centers in that its GRIB2 files are served from its own servers rather than from NOMADS. Go a directory up and you’ll find yourself in a magical land of old PowerPoints, temporary folders, and employee dropboxes.
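Filling in such a template takes only a few lines. The sketch below uses the GEFS template from above; the exact formats are assumptions inferred from the example URLs (%D as YYYYMMDD, %H as a two-digit hour, %T zero-padded to a per-model width), and the function name is illustrative.

```python
from datetime import datetime

GEFS_SPREAD = ("https://www.ftp.ncep.noaa.gov/data/nccf/com/gens/prod/gefs."
               "%D/%H/pgrb2ap5/gespr.t%Hz.pgrb2a.0p50.f%T")


def build_url(template: str, run_init: datetime, forecast_hour: int,
              hour_digits: int = 3) -> str:
    """Fill %D (run date), %H (run hour) and %T (forecast hour) in a template."""
    return (template
            .replace("%D", run_init.strftime("%Y%m%d"))
            .replace("%H", run_init.strftime("%H"))
            .replace("%T", str(forecast_hour).zfill(hour_digits)))


# Hypothetical 06Z run, forecast hour 12.
url = build_url(GEFS_SPREAD, datetime(2019, 11, 1, 6), 12)
print(url)
```

The padding width has to be set per model. The WPC template, for example, already carries a literal `f0` before %T, so it would presumably take `hour_digits=2`.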
Slimming it Down
Anyways, remember how NOMADS serves five terabytes of data per month? (Likely more these days!) It’s unlikely you’ll encounter any issues when downloading low-resolution rasters with low band counts, but the flagship high-resolution models present a big problem. For example, each forecast hour of the 3 km NAM NEST is over a gigabyte in size, there are sixty forecast hours per model run, and four runs are available per day. This requires a couple hundred gigabytes of data transfer per day if you’re accessing every run. NCEP’s download speeds leave much to be desired, to the point that the next NAM NEST run will almost be starting before the last one has even been downloaded.

If that wasn’t enough, the servers are cranky and will often just stop sending data, especially with large downloads. This often occurs without the connection being closed, which means you need to account for that in your scripts lest they hang forever, waiting for data that will never come. Be wary of writing loops and other highly structured logic for retrieving this data, as it is prone to failure.
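One defensive pattern for stalled transfers is a socket timeout combined with a bounded number of retries. This is a sketch of that idea using only the standard library, not the author's implementation; the function names, timeout, and retry counts are arbitrary assumptions.

```python
import time
import urllib.request


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL; the timeout also applies to each blocking socket read."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retries(url, attempts=3, wait=5.0, fetcher=fetch, sleep=time.sleep):
    """Retry a download a bounded number of times, then give up loudly."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetcher(url)
        except OSError as error:  # URLError and socket timeouts are OSError subclasses
            last_error = error
            if attempt < attempts:
                sleep(wait)
    raise RuntimeError(f"gave up on {url} after {attempts} attempts") from last_error
```

Because the retry count is bounded and the final failure raises, a scheduled job built on this exits instead of hanging, and the next scheduled run can pick the file up again.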
For basic forecasting applications, you don’t even need 95% of the data you’re downloading. GRIB2 files for models like the NAM contain a massive number of variables at dozens of levels across the atmosphere that are used in its calculations. The vertical velocity of the atmosphere at 25,000 ft or the albedo of snow on the ground is likely of little use to you as opposed to the temperature, winds, and precipitation at the surface. NCEP is aware of this and thus has made .idx files available for many of the more bloated models. These files contain byte ranges for each band, including the variable name and level. Remember the issues above with pulling those variables and levels out of the GRIB2 files and making sure they are written in the correct order? The .idx file can not only make your downloads far more reasonable in size, but also address the uncertainty about the final order of bands in the raster.
Figure: Excerpt of an .idx file.
The HTTP Range header allows you to perform random access of files on the web with an otherwise normal GET request. Even better, GDAL has no issues decoding a GRIB2 file that consists of just a single band’s byte range. It seems that the header or auxiliary information (projection, etc.) is all self-contained in the band or described in the GRIB2 spec itself. By reading the .idx file and selecting the byte ranges pertaining to each desired variable and level, you can download just the data you need. Now each NAM NEST forecast hour may only require a few megabytes of data to be transferred as opposed to hundreds or thousands.
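Turning an .idx file into Range headers might look like the sketch below. The line layout is an assumption (colon-separated: record number, start byte, date, variable, level, forecast), and the sample text and offsets are made up. Each record's end byte is inferred from the start of the next record.

```python
def parse_idx(text):
    """Parse inventory lines into dicts with a start byte and an end byte.

    The end byte is inferred from the next record's start; the last record
    has end=None, meaning "to the end of the file".
    """
    records = []
    for line in text.strip().splitlines():
        fields = line.split(":")
        records.append({"start": int(fields[1]), "variable": fields[3],
                        "level": fields[4], "end": None})
    for current, following in zip(records, records[1:]):
        current["end"] = following["start"] - 1
    return records


def range_header(record):
    """Build the HTTP Range header value for one record."""
    end = "" if record["end"] is None else record["end"]
    return f"bytes={record['start']}-{end}"


# Hand-written sample in the assumed layout; the offsets are invented.
SAMPLE = """1:0:d=2019110100:PRMSL:mean sea level:12 hour fcst:
2:512000:d=2019110100:TMP:2 m above ground:12 hour fcst:
3:1024000:d=2019110100:UGRD:10 m above ground:12 hour fcst:"""

wanted = {("TMP", "2 m above ground")}
for record in parse_idx(SAMPLE):
    if (record["variable"], record["level"]) in wanted:
        print(range_header(record))
```

The resulting value goes into the `Range` header of an ordinary GET request for the GRIB2 file, for instance via `urllib.request.Request(url, headers={"Range": value})`.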
NCEP discusses some techniques for random access via HTTP using .idx files, particularly with a collection of Perl scripts they have devised. Their examples were not particularly helpful, but I found another resource that was. I point this repository out specifically as it highlights one of the main benefits of making your code open source. Even if your code is incomplete or you think it is not particularly useful, somebody may one day learn something from it!
Gribfilter
Finally, it should be mentioned that NCEP has a web-based utility called grib_filter available for most models, which lets you clip to an extent and download only specific variables and levels. The URL is programmatic, which allows you to automate the download of various models and forecast hours via the application. However, after extensive use I was less than enthused with its purported utility. grib_filter tends to be slow and a little buggy: not all variables are available for some models, or some levels are just “empty”. Most importantly, it is not possible to be specific enough unless you want just one band. Levels and variables are selected independently, so if you want “geopotential height at 850mb” and “temperature at 2m” you’ll get those in addition to “temperature at 850mb” and “geopotential height at 2m.”
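The over-selection described here is just a cross product of the chosen variables and the chosen levels. A small illustration, with variable and level labels that are placeholders rather than the filter's actual parameter names:

```python
from itertools import product

# The two bands actually wanted (labels are illustrative).
wanted = {("HGT", "850 mb"), ("TMP", "2 m above ground")}

# The filter takes a set of variables and a set of levels, not pairs.
variables = sorted({variable for variable, _ in wanted})
levels = sorted({level for _, level in wanted})

# Every variable is returned at every selected level.
returned = set(product(variables, levels))
unwanted = returned - wanted

print(len(returned), "bands returned for", len(wanted), "wanted")
print(sorted(unwanted))
```

With more variables and levels the overhead grows multiplicatively, which is part of why the .idx approach is more precise.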
Figure: NCEP is certainly not wasting tax dollars on silly things like pleasant UI/UX.

Leveraging the Data
The above information should hopefully provide enough guidance for the savvy scripter to write an automated means of acquiring weather model data on a regular basis, without too many issues. The next question is: what is the best way to distribute the data? If you have a raster, you should know what you can do with it: visualize it as a layer in GIS applications or web maps via WMS, distribute it in its raw form via WCS, query it on the server, etc. Things get a little more tricky since we’re likely dealing with hundreds of rasters.
Figure: A typical scene when opening a GRIB2 file in QGIS. By default, the first three bands are treated as RGB components, which looks interesting…
There are a multitude of valid schools of thought here. One might think about using GeoServer’s ImageMosaic capabilities to distribute the various forecast hours of a single model run as a time-queryable WMS-T (not to be confused with WMTS). How about distributing the GeoTIFF directly as a WCS, and using D3.js to generate wind streamline vectors from the data itself?

For my purposes, I wanted to have every forecast hour of every model run available, and make it easy to view specific variables, like surface temperature, from those models. The main considerations were:

Do I make every model, forecast hour, and variable its own raster (hundreds of rasters per model run)?
Do I cram all the variables into one raster per forecast hour?
Do I cram all the forecast hours into one raster per variable?
Do I maintain a library of GeoTIFF files that need to be looked up, or can I leverage PostGIS for better organization and querying?
In my case, I wanted to use MapServer to distribute these rasters as WMS layers. I had a separate endpoint for getting the value at a specific point that leveraged gdallocationinfo, as WMS’s GetFeatureInfo implementation turned out to be surprisingly obtuse in MapServer. At first, I went with a simple library of GeoTIFFs in various nested folders: one per variable, per forecast hour, per model. Using some neat programmatic features of MapServer mapfiles, I could retrieve and distribute the correct raster just by passing the model name, forecast hour, etc. in the WMS request, using only one layer in the mapfile.
PostGIS
But that’s not very mature, is it? A sprawling directory of GeoTIFF files doesn’t sound very web scale, now does it? The “enterprise” choice was of course to host them in a Postgres/PostGIS database, which can store rasters directly in a table column. Sure, leisurely loading these files into the database slowed down automated model processing drastically. And sure, the sheer number of timestamps, bands, variables, and other attributes that were needed for querying resulted in non-standard table schemas that differed from typical implementations in MapServer and GeoServer. That was fine. All I had to do was write a convoluted, ill-advised query in my mapfile to actually get the data back out of PostGIS. It only took the help of the MapServer listserv community, a fresh page of documentation, and a couple of stiff drinks to pull that one off.

Shortly after getting it to work successfully, I found that PostGIS isn’t great at any sort of serious raster processing. This became immediately apparent when I experimented with raster files that had a lot of bands. Retrieving a band from a GeoTIFF would take GDAL only a couple of milliseconds, no matter how many bands there were. In PostGIS, it would often take entire seconds if there were more than a handful of bands in the raster. This level of performance was atrocious, even with painstaking tiling and indexing (requiring even more convoluted table schemas), and the associated headaches of making it work for various integrations made it all too much to bear. I abandoned the PostGIS solution completely.
GeoTIFF Performance
That said, performance with a GeoTIFF library still wasn’t that great. For instance, I had an endpoint that would return the surface temperature from a certain model from forecast hour zero to forecast hour 60. The initial architecture of one raster per forecast hour (with a bunch of bands pertaining to each variable) meant that 61 raster files would need to be opened and queried to return the list of results. Even if the rasters were cropped to a small geographic area and contained a sensible number of bands, this would take 5–10 seconds under average conditions on my server. If I wanted to compare those temperatures against the same forecast hours from other models, the client would just sit there for an unbearable amount of time, waiting for everything to finish processing.

In this case, the overhead of opening and closing multiple files is the primary performance bottleneck. As corroborating evidence, I noted that the SREF model made all of its forecast hours available in a single massive raster, with thousands of bands in one file. To query surface temperature for every forecast hour, I would instead retrieve the value of band 35, 64, 93, 122, 151, … all the way into the thousands. The full list of values would be returned in milliseconds, even when looking up hundreds of bands. Using that knowledge, I modified my approach for my GeoTIFF library: each model run and variable had its own raster, but every forecast hour was represented by a separate band in the file. Thus, to retrieve temperature for forecast hours zero through 60, I would just query bands 1, 2, 3, 4 … 61. The data would be retrieved almost instantaneously. There is, of course, a bit of a memory usage trade-off there.
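With one band per forecast hour, a whole time series at a point can be read in a single gdallocationinfo call by repeating the `-b` option. A rough sketch, assuming forecast hour h lives in band h + 1 as described above; the function and file names are hypothetical:

```python
import subprocess


def location_query_command(raster, lon, lat, bands):
    """Build one gdallocationinfo call that reads many bands at one point."""
    cmd = ["gdallocationinfo", "-valonly", "-wgs84"]
    for band in bands:
        cmd += ["-b", str(band)]
    return cmd + [raster, str(lon), str(lat)]


def parse_values(stdout):
    """-valonly prints one value per line, in the order the bands were given."""
    return [float(line) for line in stdout.split()]


def query_forecast_hours(raster, lon, lat, first_hour=0, last_hour=60):
    """Return one value per forecast hour (requires GDAL's CLI tools on PATH)."""
    bands = range(first_hour + 1, last_hour + 2)  # hour h is assumed to be band h + 1
    result = subprocess.run(location_query_command(raster, lon, lat, bands),
                            capture_output=True, text=True, check=True)
    return parse_values(result.stdout)


# Build (but do not run) a query for forecast hours 0 through 2.
cmd = location_query_command("t2m.tif", -105.0, 39.7, range(1, 4))
print(" ".join(cmd))
```

The file is opened once regardless of how many hours are requested, which is the property the paragraph above relies on.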
Figure: Bringing it on home: an interactive, queryable web map built in Angular that can visualize model runs as soon as they are available. (Note: just a work in progress.)

Solutioning
The simplistic end result of all these painful iterations somewhat belies the number of failed strategies that were explored. For the web maps, a single MapServer mapfile with only one layer is able to retrieve any model run, forecast hour, and variable I want based on just the GET parameters passed to it. The location of an SLD file can be included in the request; this is just a bit of XML that tells MapServer how to style and visualize the raster when it is retrieved via WMS. MapServer can also provide the raw data via WCS, allowing the client to handle the styling and rendering. For time-based data, such as graphs of future weather conditions at a specific location, another endpoint can query multiple forecast hours, returning a list of values. This is accomplished through a simple web script (Python, Node, PHP) that calls gdallocationinfo or the GDAL library itself.

For retrieving and processing the weather models, the entire architecture is pretty simple: a Python script is run regularly by a cron job to check for new model runs, download them, and convert them into the GeoTIFF library. A Postgres database keeps track of how many scripts are running, what timestamps have been processed, etc. Configuration is provided by a JSON file, including a list of models, their URLs, their available hours, their variables/levels, etc. A few small server-side scripts allow for querying the metadata, such as the last completed timestamp for a model. The end product, which barely scrapes over a thousand lines of code, allows unrestricted access to gigabytes of weather model data almost as soon as they become available. But as described above, it took a very long time to finally narrow down the solution to something robust and performant. I imagine it will scale pretty well with appropriate caching, indexing, and hardware approaches.
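The "check for new model runs" step could be driven by the JSON configuration along these lines. Everything here is a hypothetical sketch: the config keys and function names are invented, and a plain set stands in for the Postgres bookkeeping table. It also ignores the delay between a run's initialization time and its files actually appearing on the server.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical config layout; the author's actual JSON schema is not shown.
CONFIG = json.loads("""
{
  "gfs":   {"run_hours": [0, 6, 12, 18], "last_forecast_hour": 384},
  "naefs": {"run_hours": [0, 12],        "last_forecast_hour": 384}
}
""")


def latest_run(model, now):
    """Return the most recent run initialization time at or before `now`."""
    hours = CONFIG[model]["run_hours"]
    candidate = now.replace(minute=0, second=0, microsecond=0)
    while candidate.hour not in hours:
        candidate -= timedelta(hours=1)
    return candidate


def runs_to_process(model, now, already_done):
    """A run still needs work if its timestamp has not been recorded as done."""
    run = latest_run(model, now)
    return [] if run in already_done else [run]


now = datetime(2019, 11, 1, 14, 30, tzinfo=timezone.utc)
done = set()  # stand-in for the processed-timestamps table
for model in CONFIG:
    print(model, [run.isoformat() for run in runs_to_process(model, now, done)])
```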
Conclusion
I hope this gave you some ideas on how to leverage the wealth of weather model data that is available. I believe that we haven’t seen anywhere close to the full potential of what can be done with that data, and I have no doubt that some powerful new ideas will come out of the FOSS4G sphere. We’ve seen smart, talented developers come out with wonderful weather data tools like SHARPpy and Earth Wind Map. What’s next?

Bonus: if you read this far, here is your reward: my backend solution is open source!
Figure: An unfinished time forecasting product built in Angular. The calculations and visualizations are performed entirely on the client side by weighing various models and ensembles based on their suitability for specific time ranges and conditions.