Abstract— Random forests (RFs), an advanced machine learning (ML) method, were used here to develop a robust and rapid quantitative precipitation estimates (QPEs) algorithm for the new-generation geostationary satellite Himawari-8. In this algorithm, the global precipitation measurement (GPM) product has been employed to train the QPE prediction model. The real-time multiband infrared brightness temperatures from Himawari-8, combined with the spatiotemporally matched numerical weather prediction (NWP) data from the global forecast system, have been used as predictor variables for QPE. Among the variables used in the RF learning model, total precipitable water and K-index from NWP data have the highest rankings, indicating the importance of the atmospheric environment for QPE. To enhance the accuracy of the RF models or to optimize model training, a sample-balance technique has been utilized to adjust the ratios of samples in the nonprecipitation/precipitation classification and quantitative precipitation regression data sets. Further sensitivity and validation analyses help determine the optimal RF classification and regression models for predicting nonprecipitation/precipitation pixels and rain rate. The selected RF classification model is found to predict precipitation area with an accuracy of 0.87. For the predicted QPE product, the mean absolute error and root-mean-square error of the RF regression model are 0.51 and 2.0 mm/h, respectively. Overall, the RF ML algorithm has a higher detection rate over the homogeneous ocean surface than over land. Meanwhile, this RF algorithm tends to underestimate rain rate, especially in the presence of heavy rainfall. Despite this, it still produces a reasonable pattern of rainfall area and intensity, which is highly consistent with GPM observations.
Index Terms— Himawari-8, machine learning (ML), quantitative precipitation estimates (QPEs), random forests (RFs).
Manuscript received July 16, 2018; revised September 8, 2018; accepted October 5, 2018. Date of publication November 1, 2018; date of current version April 22, 2019. This work was supported partly by the National Key R&D Program of China under Grants 2018YFB0504800 (2018YFB0504802), 2016YFA0600101, and 2017YFC1501401, in part by the Preresearch Project under Grant D040103, in part by the National Natural Science Foundation of China under Grant 41775045, Grant 41571348, Grant 41771399, Grant 41605030, and Grant 41601400, and in part by the Chinese Academy of Meteorological Sciences under Grant 2017Z005. (Corresponding authors: Jianping Guo; Fenglin Sun.)
M. Min, F. Sun, F. Wang, S. Tang, B. Li, D. Di, and L. Dong are with the Key Laboratory of Radiometric Calibration and Validation for Environmental Satellites, National Satellite Meteorological Center, China Meteorological Administration, Beijing 100081, China (e-mail: sunfl@cma.gov.cn).
C. Bai is with the School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China.
J. Guo and H. Xu are with the State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing 100081, China (e-mail: jpguocams@gmail.com).
C. Liu is with the Key Laboratory for Aerosol-Cloud-Precipitation of China Meteorological Administration, School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing 210044, China.
J. Li is with the Cooperative Institute for Meteorological Satellite Study, University of Wisconsin–Madison, Madison, WI 53706 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2018.2874950
0196-2892 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Hyderabad IG Memorial Library. Downloaded on September 13,2022 at 04:36:42 UTC from IEEE Xplore. Restrictions apply.
2558 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 57, NO. 5, MAY 2019
I. INTRODUCTION
SUMMERTIME precipitation systems in East Asia play an important role in the energy equilibrium, climatic system, and freshwater sustainability [1]. Long-term gage- and satellite-based precipitation observations have been widely used to characterize the interannual, interdecadal, or diurnal variabilities of precipitation, which are connected either to large-scale circulation or to aerosol pollution [1]–[7]. In particular, high-quality quantitative precipitation estimation (QPE) products are urgently needed for nowcasting of high-impact weather and for ecologically and hydrometeorologically oriented projects [8]. Traditionally, ground-based rain gages and weather radars measure precipitation well at high temporal resolution [4]. Nevertheless, these ground-based instruments are extremely sparse in many parts of the world, including oceans, mountainous regions, inland lakes, and sparsely populated remote areas [9]. As key complementary data, QPE products from space have emerged in recent decades, since a considerable number of weather satellites have been successfully launched and put into operation, such as geostationary (GEO) meteorological satellites (e.g., Himawari, Fengyun-2, GOES) and the low-earth-orbiting passive microwave satellites [i.e., the tropical rainfall measuring mission (TRMM) and the global precipitation measurement (GPM) mission] [8], [10]–[12], which to a great extent fill the observational gap that ground-based measurements have left.
It has been well documented that the QPE products from GEO meteorological satellites possess the advantage of relatively high temporal and spatial resolution, albeit with limited retrieval accuracy due to the indirect connection between surface rain rates and cloud top brightness temperatures (TBB) [10], [13]–[17]. Theoretically, compared with infrared (IR) radiation measurements, microwave radiation
can better penetrate clouds and interact more directly with precipitation, thereby yielding a superior QPE and even the 3-D structure of rainfall [12], [18]. More importantly, the microwave observations have been well assimilated into numerical weather prediction (NWP) models, greatly improving the prediction products [19]. However, the microwave satellites have relatively low temporal resolution. Currently, the instantaneous and operational QPE product from NOAA GEO satellite IR measurements is implemented using a self-calibrating multivariate precipitation retrieval algorithm developed by Kuligowski [13], [20], which is mainly based on the linear correlations between the estimates and ground-based rain gage observations.
In recent years, a wide spectrum of emerging machine learning (ML) techniques, such as support vector machines (SVMs) [21], decision trees (DTs) [22], random forests (RFs) [23], artificial neural networks [24], and deep learning (DL) [25], have been successfully and extensively applied in QPE [9], [26]–[28]. The computing efficiencies of ML techniques have been much improved, offering us unprecedented opportunities to process large-volume data sets, such as Earth-observing satellite data, in near real-time systems. Notably, the rapid developments in ML frameworks, such as scikit-learn [29], Theano [30], TensorFlow [31], and PyTorch (https://pytorch.org), make it easy to use advanced ML algorithms for model training and high-efficiency prediction. RFs [23], as a highly accurate and promising ML algorithm, have received increasing attention for remote sensing applications. As a bagging ensemble classification and regression technique, the RF algorithm can easily run in a parallel computing mode and capture nonlinear or complex relationships between predictor and predictand [23]. Kühnlein et al. [9] implemented QPE from the Meteosat Second Generation/Spinning Enhanced Visible and Infrared Imager data [32] with the RF algorithm, treating three typical scenarios (daytime, twilight, and nighttime) separately [9], [26].
Since the beginning of 2014, a series of new-generation GEO meteorological satellites have been successfully launched, such as FengYun-4 (FY-4) of China Meteorological Administration, Himawari-8/9 of Japan Meteorological Agency (JMA), Geostationary Operational Environmental Satellites-R (GOES-R) of U.S. NOAA, and so on [33]–[36]. Himawari-8, as the JMA next-generation operational geostationary satellite, was successfully launched on October 7, 2014, and the observation data began to be released on July 7, 2015. A 16-band Advanced Himawari Imager (AHI) is onboard Himawari-8 with diverse spatial resolutions ranging from 0.5 km (visible band) to 2.0 km (IR band) and a full-disk observation frequency of 10 min (http://www.jma-net.go.jp/msc/en/). QPE studies based on Himawari-8 are still quite limited, let alone those using ML. Therefore, the primary objective of this paper is to develop a rapid and unified (the differences between day, night, and twilight belts are not considered) retrieval algorithm for QPE from real-time Himawari-8/AHI (H08/AHI) observations, using RF. The implementation of this rapid and unified QPE algorithm is expected to improve the accuracy of QPE in East Asia, and to promote wider application of GEO meteorological satellite data in nowcasting.
The remainder of this paper proceeds as follows. Section II briefly introduces the satellite and ancillary data used for training the RF model and for QPE. Section III presents the algorithm in detail, including an introduction to RFs and the RF classification and regression models. In Section IV, major results of QPE based on the ML algorithm are presented, which are further validated against the Integrated Multisatellite Retrievals for GPM (IMERG). Finally, Section V provides a short summary.
II. DATA
To train the RF model, we employed three months (June, July, and August of 2016) of continuous quantitative precipitation data from the level 3 gridded GPM IMERG (version V04A) data [8], [18], [37]. The IMERG data have a time interval of half an hour and a maximum rain rate of 50.0 mm/h, and cover the whole area between the latitudes of 60°S and 60°N at a spatial resolution of 0.1° × 0.1° [8]. Technically, this is a uniform and merged precipitation product obtained by intercalibrating, merging, and interpolating satellite microwave precipitation estimates (e.g., NOAA Joint Polar Satellite System-Advanced Technology Microwave Sounder, JPSS-ATMS), together with microwave-calibrated IR satellite estimates (e.g., NOAA GOES-E/W), rain gage analyses, and potentially other precipitation estimators at fine time and space scales for the TRMM and GPM eras over the globe. Previous validation studies comparing this product with ground-based rain gages or radars over different regions all point to the good reliability of IMERG products [18], [38], [39]. Note that the freely released real-time IMERG V04A version data are delayed by about 3–4 months. Consequently, this time lag makes them incapable of supporting near real-time storm monitoring and nowcasting.
MIN et al.: ESTIMATING SUMMERTIME PRECIPITATION FROM HIMAWARI-8 AND GFS BASED ON ML 2559
TABLE I
HIMAWARI-8/AHI SPECIFICATIONS. SST: SEA SURFACE TEMPERATURE
In spite of the delayed dissemination time of the GPM IMERG data, the H08/AHI measurements are real time and publicly accessible, and they will be used to implement QPE with high spatial and temporal resolutions. Considering the common temporal coverage of these two satellite data sets, we choose the overlapped period from June to August 2016 to study summertime rainfall in this investigation. Table I shows the specifications, spatial resolutions, and radiometric calibration accuracy of H08/AHI (http://www.data.jma.go.jp/mscweb/en/himawari89/space_segment/doc/AHI8_performance_test_en.pdf), in addition to the primary applications of its different bands [35], [40], [41]. Bands 1–6, concentrated in the visible and near-IR wavelengths, are designed to measure earth-view surface-reflected solar radiation during daylight hours, and are typically used for retrieving cloud, aerosol, and vegetation properties or making true color pictures [42]. The thermal emissive bands (i.e., bands 7–16) observe thermal emission radiation from Earth targets during both daytime and nighttime. Note that the observed radiances at the 3.8-μm band are inevitably impacted by sunlight at daytime and twilight (the 3.8-μm band contains both reflected and emissive radiations at daytime and
twilight). In order to make a consensus algorithm and reduce the impacts of heterogeneous reflected sunlight, we only employ the TBBs observed by nine IR bands (Bands 8–16) of H08/AHI from 6.24 to 13.28 μm to predict or estimate quantitative precipitation based on the ML technique. The radiometric calibration accuracies of the aforementioned nine IR bands of H08/AHI reach around 0.25% (Table I), which can ensure the stability and consistency between training and predicting models. In addition, cloud phase (ice, water, mixed, and supercooled) and cloud top properties (height, pressure, and temperature) products retrieved from H08/AHI data are also used here, which are generated by the near real-time and robust FengYun Geostationary Algorithm Testbed system [35], [36], [41]–[44]. In this paper, all the TBBs observed by IR bands and cloud products of H08/AHI have a horizontal resolution of 2.0 km and a full-disk observation time interval of 10 min.
In addition to H08/AHI observation data, we also employ other dynamic and environmental or static ancillary data to further enhance the performance of the QPE algorithm of H08/AHI. As important atmospheric dynamic and environmental data, the NWP data from the National Centers for Environmental Prediction Global Forecast System (GFS), with a spatial resolution of 0.5° × 0.5°, 26 vertical layers from 1000 to 10 hPa, and a time interval of 3 h [45], can be regularly obtained or downloaded every day at four different initial forecast times (0000, 0600, 1200, and 1800 UTC) from the National Oceanic and Atmospheric Administration (NOAA, ftp://nomads.ncdc.noaa.gov/GFS/Grid4). Normally, the real-time GFS NWP data are used in operational weather forecasting and are seldom applied in nowcasting. In this investigation, we introduce the GFS NWP data initialized at 0600 UTC on the previous day to train this QPE algorithm and to make predictions. Our goal is to use the atmospheric dynamic and environmental information in NWP as background data to enhance the accuracy of predicted QPE. In addition, the high-resolution global surface elevation data (https://ngdc.noaa.gov/mgg/global/) are collocated with every H08/AHI pixel and used as one of the important predictor variables.
III. METHODOLOGY
A. Random Forests
The RF technique was originally proposed by Breiman [23] and has been widely used for both classification and regression analyses without much hyperparameter tuning. This method trains a number of DT predictors that are then averaged to improve the prediction accuracy and reduce overfitting. Furthermore, it can capture nonlinear association patterns between predictor and predictand variables well. RFs are also able to provide unbiased estimates (deviations) of the regression or classification models through the out-of-bag (OOB) score estimation. This DT-based ensemble ML method not only achieves high prediction accuracy but also gives the importance score (IS) of the predictor variables used for training the forests. Despite the aforementioned advantages, the RF algorithm still lacks interpretability and mathematical theory by nature, making it difficult or almost impossible to demonstrate how the predictions or decisions are made. In addition, it is not possible for the predictand values to go beyond the range of the training data values [9], [23], [46].
Generally, the RF algorithm draws n (the number of trees) bootstrap samples from the training data set to grow the DTs. For each bootstrap sample, a subset of the predictor variables is selected randomly at each node, and the split with the lowest residual sum of squares is chosen to help the forest grow. Each tree in the forest (with n trees) is grown to the largest extent. The final predictions are derived by averaging over all regression trees, each of which is evaluated by passing the test data set down the tree. During the forest growth, about one-third of the samples are left out to estimate the
OOB score and the importance of the variables used for constructing the aforementioned trees. For more theoretical details and working principles of the RF algorithm, the reader is referred to previous research studies [23], [46].
In this investigation, the free, simple, and efficient scikit-learn toolkit [29], a well-known Python module for ML, has been used to implement the training, parameter adjustment, and prediction within this RF algorithm. A range of typical classification, regression, and clustering algorithms are integrated into this Python ML toolkit, including RFs, SVMs, and k-means, among others (http://scikit-learn.org/stable/).
B. Processing Flow
In this paper, we use a two-step RF ML strategy to train on the input data and to ultimately make QPE. Fig. 1 shows the flowchart for QPE using the two-step RF ML method. First, we match the H08/AHI and NWP data with GPM IMERG rain rate data on the same spatiotemporal scales, and the matched data are then used to train and develop a nonprecipitation and precipitation classification model based on the RF algorithm described in Section III-A. This RF classification model is then employed to identify whether any given pixel of the H08/AHI satellite image produces rainfall or not. The second step is to develop a regression model for QPE after the precipitating/nonprecipitating pixels have been determined. For a real-time application, only those data with matched H08/AHI and NWP data will be used to estimate quantitative precipitation based on the two RF ML models (namely, the RF classification and regression models). Note that we also use a sample-balance technique in the training of both the RF classification and regression models in order to improve the performance of the RF prediction models. The details concerning how to train RF models based on the sample-balance technique will be discussed in Sections III-C and III-D.
C. Nonprecipitation and Precipitation Classification Training
Following the flowchart shown in Fig. 1, we developed an RF classification model to identify nonprecipitation and precipitation pixels observed by H08/AHI, by selecting predictor variables that pertain to QPE. Caution should be taken when using the conceptual framework or model of the rainfall retrieval for the selection of RF predictor variables, particularly in extratropical cyclones [9], [14], [15]. In addition to the well-known inherited dominant or sensitive variables for rainfall retrieval [9], more predictor variables, listed in Table II and derived from spatiotemporally matched GFS NWP data, are considered and used in this investigation for nowcasting applications. In addition to TBBs, TBB differences, and cloud property products from H08/AHI [9], we introduce some traditional important weather metrics calculated from time-space matched NWP data to further and better support QPE. These weather indexes can describe well the thermal (i.e., K-index, θse850/925), dynamic (i.e., CAPE, CIN, LI-index, EBS), and moisture (i.e., TPW) features of atmospheric environmental fields [45], [46], which are closely associated with the initiation and development of clouds that produce rain [48], [49].
For RF model training, we iteratively tune the parameters in order to find an optimal model, including the
TABLE IV
ISs OF PREDICTOR VARIABLES IN THE RF MODEL AND THEIR CORRESPONDING RANKINGS, BASED ON THE SAMPLES OF SCENARIO-C-1 (3:1, N_ESTIMATORS = 300, MAX_DEPTH = 20, AND MAX_FEATURES = 7) FOR NONPRECIPITATION/PRECIPITATION CLASSIFICATION
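As a minimal sketch of how the configuration named in the Table IV caption could be set up in scikit-learn, the snippet below trains an RF classifier with n_estimators = 300, max_depth = 20, and max_features = 7, reads the OOB score, and ranks the predictors by importance score. Several things here are assumptions rather than the authors' procedure: the predictor table is synthetic, the `balance_samples` helper is hypothetical, and "3:1" is read as a nonprecipitation-to-precipitation ratio obtained by random undersampling, since this excerpt does not spell out the sample-balance steps.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def balance_samples(X, y, ratio=3, seed=0):
    """Randomly undersample class 0 so that n(class 0) : n(class 1) is at most ratio : 1.

    Hypothetical helper; the paper's own sample-balance procedure may differ.
    """
    rng = np.random.default_rng(seed)
    idx_rain = np.flatnonzero(y == 1)
    idx_dry = np.flatnonzero(y == 0)
    n_keep = min(idx_dry.size, ratio * idx_rain.size)
    idx_dry = rng.choice(idx_dry, size=n_keep, replace=False)
    idx = np.concatenate([idx_dry, idx_rain])
    rng.shuffle(idx)
    return X[idx], y[idx]


# Synthetic stand-in for the matched predictor table (NOT real H08/AHI or GFS data).
rng = np.random.default_rng(42)
n_samples, n_features = 5000, 10
X = rng.normal(size=(n_samples, n_features))
# Assumed toy rule: "rain" is rare and driven mostly by features 0 and 1.
y = ((X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n_samples)) > 1.8).astype(int)

X_bal, y_bal = balance_samples(X, y, ratio=3)

# Hyperparameters taken from the Table IV caption; oob_score=True keeps the OOB estimate.
clf = RandomForestClassifier(
    n_estimators=300, max_depth=20, max_features=7,
    oob_score=True, n_jobs=-1, random_state=0,
)
clf.fit(X_bal, y_bal)

# Rank predictors from most to least important.
ranking = np.argsort(clf.feature_importances_)[::-1]
print("OOB accuracy:", round(clf.oob_score_, 3))
for rank, j in enumerate(ranking[:3], start=1):
    print(rank, f"feature_{j}", round(clf.feature_importances_[j], 3))
```

With real data, the columns of `X` would be the TBBs, TBB differences, cloud products, and NWP indexes, and the ranking would play the role of the IS column in the table.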
TABLE V
ISs OF PREDICTOR VARIABLES OF THE RF MODEL AND THEIR CORRESPONDING RANKINGS, BASED ON THE SAMPLES OF SCENARIO-R-2 (N_ESTIMATORS = 100, MAX_DEPTH = 40, AND MAX_FEATURES = 27) FOR QUANTITATIVE PRECIPITATION REGRESSION
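The regression configuration in the Table V caption can be sketched in the same way. The example below is illustrative only: it uses a synthetic, skewed target as a stand-in for rain rate (capped at 50 mm/h, the IMERG maximum quoted in Section II) and 30 made-up predictors so that max_features = 27 is valid. It also demonstrates the limitation noted in Section III-A, namely that an RF regressor averages training targets and therefore cannot predict beyond the range seen in training, which is one plausible reason for the underestimation of heavy rainfall.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_train, n_features = 2000, 30

# Synthetic predictors and a skewed "rain rate" target in mm/h (NOT real data).
X_train = rng.normal(size=(n_train, n_features))
y_train = np.clip(
    np.exp(1.2 * X_train[:, 0] + 0.2 * rng.normal(size=n_train)), 0.0, 50.0
)

# Hyperparameters taken from the Table V caption.
reg = RandomForestRegressor(
    n_estimators=100, max_depth=40, max_features=27,
    n_jobs=-1, random_state=0,
)
reg.fit(X_train, y_train)

# Test inputs shifted well outside the training distribution of feature 0,
# where the toy rule would imply much larger rain rates.
X_test = rng.normal(size=(500, n_features))
X_test[:, 0] += 3.0
y_pred = reg.predict(X_test)

print("max training target:", round(float(y_train.max()), 2))
print("max predicted value:", round(float(y_pred.max()), 2))
```

The largest prediction never exceeds the largest training target, however extreme the inputs are.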
Fig. 5. Comparisons of nonprecipitation/precipitation [(a)–(d) at 2230 UTC on June 15, 2016; (e)–(h) at 0200 UTC on July 15, 2016] between the GPM
IMERG QPE products (first column) and the predictions using three different RF classification models based on the samples of Scenario-C-0 (second column),
Scenario-C-1 (third column), and Scenario-C-2 (fourth column). Purple area: presence of rainfall.
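A comparison of rain masks such as the one in Fig. 5 is usually quantified with scores from the 2 × 2 contingency table. The sketch below computes POD, FAR, CSI, HSS, and HR, the scores named in Section IV-B, using their standard textbook definitions; the function name is hypothetical, and treating HR as the fraction of correctly classified pixels is an assumption, because the defining equations are not part of this excerpt.

```python
import numpy as np


def contingency_scores(ref_rain, pred_rain):
    """Scores from the 2x2 rain/no-rain contingency table (reference vs. prediction)."""
    ref = np.asarray(ref_rain, dtype=bool)
    pred = np.asarray(pred_rain, dtype=bool)
    hits = int(np.sum(ref & pred))
    misses = int(np.sum(ref & ~pred))
    false_alarms = int(np.sum(~ref & pred))
    correct_neg = int(np.sum(~ref & ~pred))
    total = hits + misses + false_alarms + correct_neg

    def ratio(num, den):
        # Return NaN for an empty category instead of dividing by zero.
        return num / den if den else float("nan")

    hss_den = ((hits + misses) * (misses + correct_neg)
               + (hits + false_alarms) * (false_alarms + correct_neg))
    return {
        "POD": ratio(hits, hits + misses),                 # probability of detection
        "FAR": ratio(false_alarms, hits + false_alarms),   # false alarm ratio
        "CSI": ratio(hits, hits + misses + false_alarms),  # critical success index
        "HSS": ratio(2 * (hits * correct_neg - false_alarms * misses), hss_den),
        "HR": ratio(hits + correct_neg, total),            # assumed: fraction correct
    }


# Tiny made-up example: 1 = precipitation pixel, 0 = nonprecipitation pixel.
ref_mask = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
pred_mask = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
scores = contingency_scores(ref_mask, pred_mask)
print({name: round(value, 3) for name, value in scores.items()})
```

In practice `ref_mask` would be the IMERG rain mask and `pred_mask` the RF classification output on the same grid.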
Fig. 6. Comparisons of QPE [(a) and (b) at 0730 UTC on July 15, 2016; (c) and (d) at 1930 UTC on June 15, 2016] between the (left) GPM IMERG
product and the (right) prediction using the RF classification and regression models.
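The two-step strategy of Section III-B, in which the classification model first flags precipitating pixels and the regression model then assigns a rain rate only to those pixels, can be sketched as follows. This is an illustrative reading of the flow rather than the authors' code: `two_step_qpe` is a hypothetical helper, the data are synthetic, and default RF settings are used instead of the tuned Scenario-C-1 and Scenario-R-2 models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def two_step_qpe(clf, reg, X):
    """Rain rate per sample: 0 where clf predicts no rain, reg output elsewhere."""
    is_rain = clf.predict(X).astype(bool)
    rate = np.zeros(X.shape[0], dtype=float)
    if is_rain.any():
        rate[is_rain] = reg.predict(X[is_rain])
    return rate


# Synthetic predictors and targets (NOT real H08/AHI, GFS, or IMERG data).
rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 8))
rain_flag = (X[:, 0] > 0.8).astype(int)
rain_rate = np.where(rain_flag == 1, 2.0 + 3.0 * np.abs(X[:, 1]), 0.0)

X_train, X_test = X[:2400], X[2400:]
flag_train, rate_train = rain_flag[:2400], rain_rate[:2400]

# Step 1: rain / no-rain classifier trained on all matched samples.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, flag_train)

# Step 2: rain-rate regressor trained on precipitating samples only.
wet = flag_train == 1
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train[wet], rate_train[wet])

qpe = two_step_qpe(clf, reg, X_test)
print("predicted rainy fraction:", round(float((qpe > 0).mean()), 3))
```

Training the regressor only on precipitating samples keeps the many zero-rain pixels from pulling its estimates toward zero.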
Fig. 8. Same as Fig. 7, but for the results over (left) land and (right) ocean.
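The kind of statistics behind Figs. 7 and 8 can be sketched with NumPy alone: MAE and RMSE computed separately for land and ocean pixels, and a 2-D occurrence-frequency histogram on a log scale with 0.5 mm/h bins. The function names, the `is_ocean` mask, and the four sample values are assumptions made for illustration; they are not taken from the paper's validation data.

```python
import numpy as np


def mae_rmse(ref, pred):
    """Mean absolute error and root-mean-square error of pred against ref."""
    err = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))


def log_frequency_hist(ref, pred, max_rate=50.0, step=0.5):
    """log10 of the 2-D occurrence count in (reference, predicted) rain-rate boxes."""
    edges = np.arange(0.0, max_rate + step, step)
    counts, _, _ = np.histogram2d(ref, pred, bins=[edges, edges])
    with np.errstate(divide="ignore"):
        log_counts = np.where(counts > 0, np.log10(counts), np.nan)  # empty boxes -> NaN
    return log_counts, edges


# Made-up rain rates in mm/h and a made-up surface-type mask.
ref = np.array([0.0, 1.0, 4.0, 10.0])
pred = np.array([0.5, 1.0, 2.0, 6.0])
is_ocean = np.array([False, False, True, True])

land_mae, land_rmse = mae_rmse(ref[~is_ocean], pred[~is_ocean])
ocean_mae, ocean_rmse = mae_rmse(ref[is_ocean], pred[is_ocean])
print("land  MAE/RMSE:", round(land_mae, 3), round(land_rmse, 3))
print("ocean MAE/RMSE:", round(ocean_mae, 3), round(ocean_rmse, 3))

log_counts, edges = log_frequency_hist(ref, pred)
print("histogram shape:", log_counts.shape)
```

The returned `log_counts` array can be passed to any image or mesh plotting routine, with `edges` giving the axis coordinates.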
based on Scenario-C-1 yield a better consistency with the IMERG QPE products. In contrast, the RF algorithm based on Scenario-C-2 tends to overestimate the precipitation pixels from H08/AHI observations. Intriguingly, some isolated precipitating cloud systems that are missed under Scenario-C-2 can be predicted when we use the optimal Scenario-C-1 classification model.
Fig. 6(a)–(d) illustrates the comparisons of QPE (0730 UTC on July 15, 2016 and 1930 UTC on June 15, 2016) between the GPM IMERG product and the prediction using the RF classification (Scenario-C-1) and regression (Scenario-R-2) models. The results show a consistent spatial pattern, or good correlation, of QPEs between the GPM IMERG product and the prediction. A large proportion of heavy rainfall areas in the full disk of the H08/AHI observation can be well captured by the two optimal RF models. However, the extremely heavy rainfall (>20 mm/h) areas cannot be predicted very well. The RF regression model can only predict the rough location of an extremely heavy rainfall area but cannot quantitatively and accurately predict the rain rate. To be specific, the RF regression model tends to significantly underestimate heavy rainfall, which is similar to the findings of previous studies [26]. Note that the total prediction time for an H08/AHI full-disk observation using the two optimal RF models is about 4 min based on 12-core parallel computing, which can basically meet the efficiency requirement of nowcasting applications from satellite observations.
Fig. 7 shows the comparison of QPEs between GPM IMERG and the prediction model, where the color bar represents the occurrence frequency in log scale with an interval of 0.5 mm/h. Apparently, most of the samples concentrate around the boxes with GPM IMERG QPE < 3.0 mm/h and predicted QPE < 0.5 mm/h, indicating a significantly underestimated QPE. Besides the extremely heavy rainfall events (>20 mm/h), the algorithm tends to underestimate the slight and moderate rain rates as well. Based on the same aforementioned validation data, the mean absolute error (MAE) and root-mean-square error (RMSE) of all predicted QPE are 0.51 and 2.0 mm/h, respectively. Overall, despite the fact that the algorithm may miss some rainfall areas and underestimate rain rate (or QPE), a highly consistent pattern can still be found between the GPM IMERG product and the near real-time prediction (Figs. 5 and 6).
B. Validations of Rainfall Over Land and Ocean
In this section, we show the validation results of predicted QPE values over land and ocean, respectively. Table VII shows the statistical mean scores of the nonprecipitation/precipitation discrimination algorithm (Scenario-C-1) and the mean MAE and RMSE of retrieved QPE (Scenario-R-2) over land and ocean, respectively. A POD of 0.59 is found over the ocean, much higher than over land. By comparison, the FAR of 0.33 over the ocean is much lower than over land. The relatively higher mean scores of CSI, HSS, and HR over the ocean are likely attributable to the homogeneous surface properties of the ocean. This finding indicates a better prediction of nonprecipitation/precipitation pixels over the homogeneous ocean surface. On the contrary, the mean MAE (0.52 mm/h) and RMSE (2.10 mm/h) of retrieved QPE are relatively higher over ocean than over land (0.44 and 1.72 mm/h), as shown in Table VII. The higher uncertainty in QPE is mainly attributed to more heavy rainfall events in the GPM IMERG data over the ocean, which can be easily seen in Figs. 6 and 7. These unpredictable heavy rainfall events inevitably increase the occurrence frequency of large errors in QPE over the ocean.
TABLE VII
STATISTICS ON THE MEAN SCORES OF NONPRECIPITATION/PRECIPITATION DISCRIMINATION ALGORITHM AND THE MEAN MAE AND RMSE OF QPE ALGORITHM OVER LAND AND OCEAN
As shown in the scatter plot in Fig. 8, the QPEs retrieved from GPM IMERG are validated against those from the RF prediction model over land and ocean, respectively. It is not
a surprise that more rainfall samples concentrate around the boxes >8.0 mm/h over the ocean and <3.0 mm/h over land. In general, the predicted QPE values are more consistent over land than over ocean, particularly for the cases with light rainfall. The accuracy of the prediction of extremely heavy rainfall events (>20 mm/h) is still low over both land and ocean.
V. CONCLUSION
This paper aims to investigate and develop a unified (a consistent retrieval between day, night, and twilight belts in a full-disk observation) QPE algorithm for nowcasting applications in summer by combining real-time Himawari-8/AHI observation data, cloud physical properties products, and GFS NWP data. The RF ensemble classification and regression technique was used here to implement near real-time precipitation prediction. This new algorithm is remarkably different from the traditional and existing QPE for GEO satellite imagers because it uses not a conventional parametric approach but an RF ML algorithm. Its key advantage is the ability to capture nonlinear association patterns between predictor and predictand, such as precipitation.
Compared with the existing ML approach [24] for QPE, the spatiotemporally matched real-time NWP data are introduced as additional predictors in the regression model. It is noteworthy that some high-rank parameters derived from NWP data directly indicate an important role of atmospheric background information in the RF classification and regression models. This new finding illustrates that the atmospheric environment field data are also valuable for nowcasting products such as QPE based on an advanced ML algorithm like RFs. As mentioned before, the RF algorithm is able to better capture nonlinear patterns between predictors from NWP data and precipitation.
In addition, a sample-balance technique was also used to significantly improve the RF classification and regression models based on the original sample data sets. Some sensitivity studies were conducted to test the effects of sample proportion and three important RF model parameters (number of trees, maximum tree depth, and number of random features) on the accuracy of the RF prediction model. The results show that the mean hit rate of nonprecipitation/precipitation classification is about 0.87, and the predicted QPE MAE and RMSE are, respectively, 0.51 and 2.0 mm/h. Although the predicted QPE shows a significant underestimation (in particular, it cannot retrieve extremely heavy rainfall values), we could still find highly consistent patterns of precipitation area and intensity with the GPM IMERG data. Nevertheless, the highly efficient and consistent patterns of QPE retrieved using near real-time satellite observation and NWP data can ensure successful nowcasting applications. In addition, better nonprecipitation/precipitation classification results are found over ocean than over land due to the homogeneous surface. However, we also find higher MAE and RMSE in the predicted QPE values over the ocean, which is closely associated with the high occurrence frequency of heavy rainfall events.
Better yet, we plan to use the GFS NWP data with a spatial resolution of 0.25° × 0.25° in the future, which have been released in real time to the China Meteorological Administration for a few months since 2017. In addition, we are also looking forward to training a new prediction model based on a whole year of data to better support nowcasting applications of the new-generation GEO satellite data, such as FY-4A and H08. Given the open source nature of such ML frameworks as TensorFlow, Keras, PyTorch, and scikit-learn, these algorithms under a DL framework richly deserve further studies to investigate the possibility of predicting or estimating rain rate from space.
ACKNOWLEDGMENT
The authors would like to thank NASA, JMA, and NOAA for freely providing the GPM IMERG, Himawari-8, and GFS NWP data online. They would also like to thank the Python and Scikit-Learn Groups for providing powerful computing tools, and the anonymous reviewers for their thoughtful and constructive suggestions and comments.
REFERENCES
[1] Y. Ding, Z. Wang, and Y. Sun, “Inter-decadal variation of the summer precipitation in East China and its association with decreasing Asian summer monsoon,” Int. J. Climatol., vol. 28, no. 9, pp. 1139–1161, 2008.
[2] J. Guo et al., “Declining frequency of summertime local-scale precipitation over eastern China from 1970 to 2010 and its potential link to aerosols,” Geophys. Res. Lett., vol. 44, no. 11, pp. 5700–5708, 2017.
[3] M. Min, P. Wang, J. R. Campbell, X. Zong, and Y. Li, “Midlatitude cirrus cloud radiative forcing over China,” J. Geophys. Res. Atmos., vol. 115, no. D20, p. D20210, 2010.
[4] R. Yu, T. Zhou, A. Xiong, Y. Zhu, and J. Li, “Diurnal variations of summer precipitation over contiguous China,” Geophys. Res. Lett., vol. 34, no. 1, pp. 223–234, 2007.
[5] M. Min, P. Wang, J. R. Campbell, X. Zong, and J. Xia, “Cirrus cloud macrophysical and optical properties over North China from CALIOP measurements,” Adv. Atmos. Sci., vol. 28, no. 3, pp. 653–664, 2011.
[6] J. Guo et al., “Delaying precipitation and lightning by air pollution over the Pearl River Delta. Part I: Observational analyses,” J. Geophys. Res. Atmos., vol. 121, no. 11, pp. 6472–6488, 2016.
[7] J. Guo et al., “Aerosol-induced changes in the vertical structure of precipitation: A perspective of TRMM precipitation radar,” Atmos. Chem. Phys., vol. 18, no. 18, pp. 13329–13343, 2018.
[8] G. J. Huffman, D. T. Bolvin, E. J. Nelkin, and D. B. Wolff, “The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-global, multiyear, combined-sensor precipitation estimates at fine scales,” J. Hydrometeorol., vol. 8, no. 1, pp. 38–55, Feb. 2007.
[9] M. Kühnlein, T. Appelhans, B. Thies, and T. Nauß, “Precipitation estimates from MSG SEVIRI daytime, nighttime, and twilight data with random forests,” J. Appl. Meteorol. Climatol., vol. 53, no. 11, pp. 2457–2480, 2014.
[10] G. A. Vicente, R. A. Scofield, and W. P. Menzel, “The operational GOES infrared rainfall estimation technique,” Bull. Amer. Meteorol. Soc., vol. 79, no. 9, pp. 1883–1898, 1998.
[11] R. Joyce, J. Janowiak, and G. Huffman, “Latitudinally and seasonally dependent zenith-angle corrections for geostationary satellite IR brightness temperatures,” J. Appl. Meteorol., vol. 40, no. 40, pp. 689–703, 2001.
[12] A. Y. Hou et al., “The global precipitation measurement mission,” Bull. Amer. Meteorol. Soc., vol. 95, pp. 701–722, May 2014.
[13] R. J. Kuligowski, “GOES-R Advanced Baseline Imager (ABI) algorithm theoretical basis document for rainfall rate (QPE), version 2.0,” Algorithm Theor. Basis Document (ATBD), Tech. Rep., 2010, pp. 1–44. [Online]. Available: https://www.goes-r.gov/products/ATBDs/baseline/Hydro_RRQPE_v2.0_no_color.pdf
[14] B. Thies, T. Nauss, and J. Bendix, “Discriminating raining from non-raining cloud areas at mid-latitudes using Meteosat Second Generation SEVIRI night-time data,” Meteorol. Appl., vol. 15, no. 8, pp. 219–230, 2008.
[15] B. Thies, T. Nauß, and J. Bendix, “Precipitation process and rainfall intensity differentiation using Meteosat Second Generation Spinning Enhanced Visible and Infrared Imager data,” J. Geophys. Res., Atmos., vol. 113, no. D23, 2008.
[16] M. Min et al., “An investigation of the implications of lunar illumination spectral changes for Day/Night Band-based cloud property retrieval due to lunar phase transition,” J. Geophys. Res., Atmos., vol. 122, no. 17, pp. 9233–9244, 2017.
[17] J. Li, J. Huang, K. Stamnes, T. Wang, Q. Lv, and H. Jin, “A global survey of cloud overlap based on CALIPSO and CloudSat measurements,” Atmos. Chem. Phys., vol. 15, no. 1, pp. 519–536, 2015.
[18] J. Tan, W. A. Petersen, and A. Tokay, “A novel approach to identify sources of errors in IMERG for GPM ground validation,” J. Hydrometeorol., vol. 17, no. 9, pp. 2477–2491, 2016.
[19] Q. Lu, W. Bell, P. Bauer, N. Bormann, and C. Peubey, “An evaluation of FY-3A satellite data for numerical weather prediction,” Quart. J. Roy. Meteorol. Soc., vol. 137, no. 658, pp. 1298–1311, 2011.
[20] R. J. Kuligowski, “A self-calibrating real-time GOES rainfall algorithm for short-term rainfall estimates,” J. Hydrometeorol., vol. 3, no. 2, pp. 112–130, 2002.
[21] G. Mountrakis, J. Im, and C. Ogole, “Support vector machines in remote sensing: A review,” ISPRS J. Photogramm. Remote Sens., vol. 66, no. 3, pp. 247–259, 2011.
[22] P. E. Utgoff, “Incremental induction of decision trees,” Mach. Learn., vol. 4, no. 2, pp. 161–186, 1989.
[23] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[24] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Netw., vol. 61, pp. 85–117, Jan. 2015.
[25] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[26] M. Kühnlein, T. Appelhans, B. Thies, and T. Nauss, “Improving the accuracy of rainfall rates from optical satellite sensors with machine learning—A random forests-based approach applied to MSG SEVIRI,” Remote Sens. Environ., vol. 141, pp. 129–143, Feb. 2014.
[27] D. I. F. Grimes, E. Coppola, M. Verdecchia, and G. Visconti, “A neural network approach to real-time rainfall estimation for Africa using satellite data,” J. Hydrometeorol., vol. 4, no. 6, pp. 1119–1133, 2003.
[28] M. Pal, “Random forest classifier for remote sensing classification,” Int. J. Remote Sens., vol. 26, no. 1, pp. 217–222, 2007.
[29] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[30] J. Bergstra et al., “Theano: A CPU and GPU math compiler in Python,” in Proc. Python Sci. Comput. Conf. (SciPy), 2010, pp. 1–7.
[31] M. Abadi et al., “TensorFlow: A system for large-scale machine learning,” presented at the 12th USENIX Symp. Oper. Syst. Design Implement., 2016.
[32] J. Schmetz et al., “An introduction to Meteosat Second Generation (MSG),” Bull. Amer. Meteorol. Soc., vol. 83, no. 7, pp. 977–992, 2002.
[33] T. J. Schmit, M. M. Gunshor, W. P. Menzel, J. Li, and A. S. Bachmeier, “Introducing the next-generation Advanced Baseline Imager on GOES-R,” Bull. Amer. Meteorol. Soc., vol. 86, no. 8, pp. 1079–1096, 2005.
[34] T. J. Schmit et al., “The GOES-R Advanced Baseline Imager and the continuation of current sounder products,” J. Appl. Meteorol. Climatol., vol. 47, no. 10, pp. 2696–2711, 2008.
[35] M. Min et al., “Developing the science product algorithm testbed for Chinese next-generation geostationary meteorological satellites: FengYun-4 series,” J. Meteorol. Res., vol. 31, no. 4, pp. 708–719, 2017.
[36] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bull. Amer. Meteorol. Soc., vol. 98, no. 8, pp. 1637–1658, Aug. 2017.
[37] G. J. Huffman et al., “NASA Global Precipitation Measurement (GPM) integrated multi-satellite retrievals for GPM (IMERG),” Algorithm Theoretical Basis Document (ATBD) Version 4.5, 2015, pp. 1–26. [Online]. Available: https://pmm.nasa.gov/sites/default/files/document_files/IMERG_ATBD_V4.5.pdf
[38] A. AghaKouchak, A. Mehran, H. Norouzi, and A. Behrangi, “Systematic and random error components in satellite precipitation data sets,” Geophys. Res. Lett., vol. 39, no. 9, p. L09406, 2012.
[39] V. Bharti and C. Singh, “Evaluation of error in TRMM 3B42V7 precipitation estimates over the Himalayan region,” J. Geophys. Res., Atmos., vol. 120, no. 24, pp. 12458–12473, 2015.
[40] T. J. Greenwald et al., “Real-time simulation of the GOES-R ABI for user readiness and product evaluation,” Bull. Amer. Meteorol. Soc., vol. 97, no. 2, pp. 245–261, 2016.
[41] D. Chen et al., “The cloud top distribution and diurnal variation of clouds over East Asia: Preliminary results from Advanced Himawari Imager,” J. Geophys. Res., Atmos., vol. 123, no. 7, pp. 3724–3739, Apr. 2018.
[42] S. D. Miller et al., “A sight for sore eyes: The return of true color to geostationary satellites,” Bull. Amer. Meteorol. Soc., vol. 97, no. 10, pp. 1803–1816, 2016.
[43] A. K. Heidinger, A. T. Evan, M. J. Foster, and A. Walther, “A naive Bayesian cloud-detection scheme derived from CALIPSO and applied within PATMOS-x,” J. Appl. Meteorol. Climatol., vol. 51, no. 6, pp. 1129–1144, 2012.
[44] J. Li et al., “The impact of atmospheric stability and wind shear on vertical cloud overlap over the Tibetan Plateau,” Atmos. Chem. Phys., vol. 18, no. 10, pp. 7329–7343, 2018.
[45] M. Kanamitsu, “Description of the NMC global data assimilation and forecast system,” Weather Forecasting, vol. 4, no. 3, pp. 335–342, 1989.
[46] L. Breiman and A. Cutler. (2013). Random Forests-Classification Manual. [Online]. Available: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
[47] A. G. Laing and J. M. Fritsch, “The large-scale environments of the global populations of mesoscale convective complexes,” Monthly Weather Rev., vol. 128, no. 8, pp. 2756–2776, 2000.
[48] G. J. Zhang, “Roles of tropospheric and boundary layer forcing in the diurnal cycle of convection in the U.S. Southern Great Plains,” Geophys. Res. Lett., vol. 30, no. 24, Dec. 2003.
[49] J. Roman, R. Knuteson, S. Ackerman, and H. Revercomb, “Estimating minimum detection times for satellite remote sensing of trends in mean and extreme precipitable water vapor,” J. Climate, vol. 29, no. 22, pp. 8211–8230, 2016.
[50] Y. Liu, N. V. Chawla, M. P. Harper, E. Shriberg, and A. Stolcke, “A study in machine learning from imbalanced data for sentence boundary detection in speech,” Comput. Speech Lang., vol. 20, no. 4, pp. 468–494, 2006.
[51] M. Min, Y. Zhang, Z. Rong, and L. Dong, “A method for monitoring the on-orbit performance of a satellite sensor infrared window band by using oceanic drifters,” Int. J. Remote Sens., vol. 35, no. 1, pp. 382–400, 2014.
[52] T. J. Schmit, J. Li, S. A. Ackerman, and J. J. Gurka, “High-spectral- and high-temporal-resolution infrared measurements from geostationary orbit,” J. Atmos. Ocean. Technol., vol. 26, no. 11, pp. 2273–2292, 2009.
[53] Y. Ai, J. Li, W. Shi, T. J. Schmit, C. Cao, and W. Li, “Deep convective cloud characterizations from both broadband imager and hyperspectral infrared sounder measurements,” J. Geophys. Res., Atmos., vol. 122, no. 3, pp. 1700–1712, 2017.
[54] J. Mecikalski, K. Bedka, S. Paech, and L. Litten, “A statistical evaluation of GOES cloud-top properties for nowcasting convective initiation,” Monthly Weather Rev., vol. 136, no. 12, pp. 4899–4914, 2008.
[55] M. Grecu and W. F. Krajewski, “A large-sample investigation of statistical procedures for radar-based short-term quantitative precipitation forecasting,” J. Hydrol., vol. 239, nos. 1–4, pp. 69–84, Dec. 2000.
Min Min received the B.S. degree in applied meteorology from the Nanjing University of Information Science and Technology, Nanjing, China, in 2005, and the Ph.D. degree in atmospheric physics and environment from the Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China, in 2010.
From 2013 to 2014, he was a Visiting Research Assistant with the Department of Physics, University of Maryland, Baltimore, MD, USA. He is currently an Associate Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing. His research interests include cloud and weather science algorithms of satellite remote sensing, atmospheric radiative transfer, and calibration of FengYun satellite sensor.

Chen Bai received the B.S. degree in optical engineering from the School of Optoelectronics, Beijing Institute of Technology, Beijing, China, in 2017, where he is currently pursuing the master’s degree with the School of Optoelectronics.

Fenglin Sun received the B.S. degree in communication engineering from the Ocean University of China, Qingdao, China, in 2009, and the Ph.D. degree in magnetic and microwave technology from the National Space Science Center, Chinese Academy of Sciences, Beijing, China, in 2014.
Since 2014, he has been an Assistant Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing. His research interests include convection and precipitation algorithms of geostationary satellite, mode recognition, and microwave calibration of FengYun satellite sensor.

Fu Wang received the B.S., M.S., and Ph.D. degrees in electronic technology from the University of Electronic Science and Technology of China, Chengdu, China, in 2008, 2011, and 2015, respectively.
He is currently an Assistant Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing, China. His research interests include cloud algorithms of FengYun satellite sensor and cloud-aerosol interaction.

Hui Xu received the B.S. degree from Central South University, Changsha, China, in 2010, and the Ph.D. degree from the Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing, China, in 2015.
She is currently an Assistant Professor with the Chinese Academy of Meteorological Sciences, China Meteorological Administration, Beijing. Her research interests include cloud radiative forcing and aerosol radiative forcing.

Bo Li received the B.S. degree in atmospheric science from Nanjing University, Nanjing, China, in 2006, and the Ph.D. degree in meteorology from the Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China, in 2011.
She is currently an Associate Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing. Her research interests include cloud phase algorithms of FengYun satellite and atmospheric circulation.
Lixin Dong received the B.S. degree in applied meteorology from the Nanjing Meteorological Institute, Nanjing, China, in 1995, and the Ph.D. degree from the Institute of Remote Sensing Applications of Chinese Academy of Sciences, Beijing, China, in 2008.
He is currently an Associate Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing, where he is involved in land surface temperature and soil moisture inversion algorithms of FengYun satellite image.

Jun Li received the B.S. degree in mathematics from Peking University, Beijing, China, in 1987, and the M.S. and Ph.D. degrees in atmospheric science from the Chinese Academy of Sciences, Beijing, in 1990 and 1996, respectively.
Since 1997, he has been with the Cooperative Institute for Meteorological Satellite Studies, Space Science and Engineering Center (SSEC), University of Wisconsin–Madison, Madison, WI, USA, where he was involved in advanced imager/sounder data processing and applications, especially on the synergistic use of high-spatial-resolution imager data and high-spectral-resolution infrared sounder data for deriving atmospheric temperature and moisture profiles, cloud and aerosol/dust properties, as well as surface properties. He was involved in the methodologies for improving the assimilation of hyperspectral infrared sounder measurements for tropical cyclone forecasts in regional numerical weather prediction models, and also in the development and application of a retrieval methodology for processing the Advanced TOVS data from the NOAA-15/16/17/18/19 polar-orbiting satellites and the geostationary sounder data from GOES-8/9/10/11/12/13/14/15 satellites to obtain the real-time atmospheric soundings and derived products. He is currently the Principal Investigator with SSEC, University of Wisconsin–Madison, where he is involved in several GOES/POES-related projects, including International ATOVS Processing Package, GOES-R trade studies, GOES-R legacy profile product development, GOES-R high-impact weather studies, JPSS application for tropical cyclone forecasts, and regional OSSE for future sounding measurement systems.