Estimating Summertime Precipitation From Himawari-8 and Global Forecast System Based On Machine Learning

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 57, NO. 5, MAY 2019
Min Min, Chen Bai, Jianping Guo, Fenglin Sun, Chao Liu, Fu Wang, Hui Xu, Shihao Tang, Bo Li, Di Di, Lixin Dong, and Jun Li
Abstract: Random forests (RFs), an advanced machine learning (ML) method, were used here to develop a robust and rapid quantitative precipitation estimation (QPE) algorithm for the new-generation geostationary satellite Himawari-8. In this algorithm, the Global Precipitation Measurement (GPM) product was employed to train the QPE prediction model. Real-time multiband infrared brightness temperatures from Himawari-8, combined with spatiotemporally matched numerical weather prediction (NWP) data from the Global Forecast System, were used as predictor variables for QPE. Among the variables used in the RF learning model, total precipitable water and K-index from the NWP data have the highest rankings, indicating the importance of the atmospheric environment for QPE. To enhance the accuracy of the RF models and to optimize model training, a sample-balance technique was used to adjust the ratios of samples in the nonprecipitation/precipitation classification and quantitative precipitation regression data sets. Further sensitivity and validation analyses helped determine the optimal RF classification and regression models for predicting nonprecipitation/precipitation pixels and rain rate. The selected RF classification model predicts the precipitation area with an accuracy of 0.87. For the predicted QPE product, the mean absolute error and root-mean-square error of the RF regression model are 0.51 and 2.0 mm/h, respectively. Overall, the RF ML algorithm has a higher detection rate over the homogeneous ocean surface than over land. Meanwhile, the RF algorithm tends to underestimate rain rate, especially in the presence of heavy rainfall. Despite this, it still produces reasonable patterns of rainfall area and intensity that are highly consistent with GPM observations.

Index Terms: Himawari-8, machine learning (ML), quantitative precipitation estimates (QPEs), random forests (RFs).

Manuscript received July 16, 2018; revised September 8, 2018; accepted October 5, 2018. Date of publication November 1, 2018; date of current version April 22, 2019. This work was supported in part by the National Key R&D Program of China under Grants 2018YFB0504800 (2018YFB0504802), 2016YFA0600101, and 2017YFC1501401, in part by the Preresearch Project under Grant D040103, in part by the National Natural Science Foundation of China under Grant 41775045, Grant 41571348, Grant 41771399, Grant 41605030, and Grant 41601400, and in part by the Chinese Academy of Meteorological Sciences under Grant 2017Z005. (Corresponding authors: Jianping Guo; Fenglin Sun.)

M. Min, F. Sun, F. Wang, S. Tang, B. Li, D. Di, and L. Dong are with the Key Laboratory of Radiometric Calibration and Validation for Environmental Satellites, National Satellite Meteorological Center, China Meteorological Administration, Beijing 100081, China (e-mail: sunfl@cma.gov.cn).

C. Bai is with the School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China.

J. Guo and H. Xu are with the State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing 100081, China (e-mail: jpguocams@gmail.com).

C. Liu is with the Key Laboratory for Aerosol-Cloud-Precipitation of China Meteorological Administration, School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing 210044, China.

J. Li is with the Cooperative Institute for Meteorological Satellite Studies, University of Wisconsin–Madison, Madison, WI 53706 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2018.2874950

0196-2892 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

SUMMERTIME precipitation systems in East Asia play an important role in the energy balance, the climate system, and freshwater sustainability [1]. Long-term gage- and satellite-based precipitation observations have been widely used to characterize the interannual, interdecadal, and diurnal variability of precipitation, which is connected either to large-scale circulation or to aerosol pollution [1]–[7]. In particular, high-quality quantitative precipitation estimation (QPE) products are urgently needed for nowcasting high-impact weather and for ecologically and hydrometeorologically oriented projects [8]. Traditionally, ground-based rain gages and weather radars measure precipitation well at high temporal resolution [4]. Nevertheless, these ground-based instruments are extremely sparse in many parts of the world, including oceans, mountainous regions, inland lakes, and sparsely populated remote areas [9]. As key complementary data, space-based QPE products have emerged in recent decades, since many weather satellites have been successfully launched and put into operation, such as geostationary (GEO) meteorological satellites (e.g., Himawari, Fengyun-2, and GOES) and low-earth-orbiting passive microwave satellites [i.e., the Tropical Rainfall Measuring Mission (TRMM) and the Global Precipitation Measurement (GPM) mission] [8], [10]–[12]. These products largely fill the observational gap left by ground-based measurements.

It has been well documented that QPE products from GEO meteorological satellites have the advantage of relatively high temporal and spatial resolution, albeit with limited retrieval accuracy due to the indirect connection between surface rain rates and cloud-top brightness temperatures (TBBs) [10], [13]–[17]. Theoretically, compared with infrared (IR) radiation measurements, microwave radiation
can better penetrate clouds and interact more directly with precipitation, thereby yielding superior QPE and even the 3-D structure of rainfall [12], [18]. More importantly, microwave observations have been successfully assimilated into numerical weather prediction (NWP) models, considerably improving the forecast products [19]. However, microwave satellites have relatively low temporal resolution. Currently, the instantaneous operational QPE product from NOAA GEO satellite IR measurements is generated with a self-calibrating multivariate precipitation retrieval algorithm developed by Kuligowski [13], [20], which is mainly based on linear correlations between the estimates and ground-based rain gage observations.

In recent years, a wide spectrum of emerging machine learning (ML) techniques, such as support vector machines (SVMs) [21], decision trees (DTs) [22], random forests (RFs) [23], artificial neural networks [24], and deep learning (DL) [25], have been successfully and extensively applied to QPE [9], [26]–[28]. The computing efficiency of ML techniques has improved greatly, offering unprecedented opportunities to process large-volume data sets, such as Earth-observing satellite data, in near-real-time systems. Notably, the rapid development of ML frameworks, such as scikit-learn [29], Theano [30], TensorFlow [31], and PyTorch (https://pytorch.org), makes it easy to use advanced ML algorithms for model training and efficient prediction. As a highly accurate and promising ML algorithm, RFs [23] have received increasing attention in remote sensing applications. As a bagging ensemble technique for classification and regression, the RF algorithm can easily run in parallel and can capture nonlinear or complex relationships between predictors and predictand [23]. Kühnlein et al. [9] implemented QPE with the RF algorithm using Meteosat Second Generation/Spinning Enhanced Visible and Infrared Imager data [32], treating daytime, twilight, and nighttime as three separate scenarios [9], [26].

Since the beginning of 2014, a series of new-generation GEO meteorological satellites have been successfully launched, such as FengYun-4 (FY-4) of the China Meteorological Administration, Himawari-8/9 of the Japan Meteorological Agency (JMA), and the Geostationary Operational Environmental Satellite-R (GOES-R) of U.S. NOAA [33]–[36]. Himawari-8, the JMA next-generation operational geostationary satellite, was successfully launched on October 7, 2014, and its observation data began to be released on July 7, 2015. The 16-band Advanced Himawari Imager (AHI) onboard Himawari-8 has spatial resolutions ranging from 0.5 km (visible band) to 2.0 km (IR band) and a full-disk observation interval of 10 min (http://www.jma-net.go.jp/msc/en/). Studies of QPE from Himawari-8 remain limited, let alone those using ML. Therefore, the primary objective of this paper is to develop, using RF, a rapid and unified (i.e., without distinguishing between day, night, and twilight) retrieval algorithm for QPE from real-time Himawari-8/AHI (H08/AHI) observations. This rapid and unified QPE algorithm is expected to improve the accuracy of QPE in East Asia and to promote wider application of GEO meteorological satellite data in nowcasting.

The remainder of this paper proceeds as follows. Section II briefly introduces the satellite and ancillary data used for training the RF models and for QPE. Section III presents the algorithm in detail, including an introduction to RFs and the RF classification and regression models. In Section IV, the major QPE results of the ML algorithm are presented and further validated against the Integrated MultisatellitE Retrievals for GPM (IMERG). Finally, Section V provides a short summary.

II. DATA

To train the RF models, we employed three months (June, July, and August 2016) of continuous quantitative precipitation data from the level-3 gridded GPM IMERG product (version V04A) [8], [18], [37]. The IMERG data have a time interval of half an hour, a maximum rain rate of 50.0 mm/h, and a spatial resolution of 0.1° × 0.1°, covering the whole area between 60°S and 60°N [8]. Technically, IMERG is a uniform, merged precipitation product produced by intercalibrating, merging, and interpolating satellite microwave precipitation estimates (e.g., from the NOAA Joint Polar Satellite System Advanced Technology Microwave Sounder, JPSS-ATMS), microwave-calibrated IR satellite estimates (e.g., NOAA GOES-E/W), rain gage analyses, and potentially other precipitation estimators at fine temporal and spatial scales over the globe for the TRMM and GPM eras. Previous validation studies comparing this product with ground-based rain gages or radars over different regions all point to the good reliability of IMERG products [18], [38], [39]. Note that the freely released IMERG V04A data are delayed by about 3–4 months. Consequently, this time lag makes them unable to support near-real-time storm monitoring and nowcasting.

TABLE I
HIMAWARI-8/AHI SPECIFICATIONS. SST: SEA SURFACE TEMPERATURE

Despite the delayed dissemination of the GPM IMERG data, the H08/AHI measurements are real time and publicly accessible, and they are used here to implement QPE with high spatial and temporal resolution. Considering the common temporal coverage of these two satellite data sets, we chose the overlapping period from June to August 2016 to study summertime rainfall in this investigation. Table I shows the specifications, spatial resolutions, and radiometric calibration accuracies of H08/AHI (http://www.data.jma.go.jp/mscweb/en/himawari89/space_segment/doc/AHI8_performance_test_en.pdf), in addition to the primary applications of the different bands [35], [40], [41]. Bands 1–6, concentrated in the visible and near-IR wavelengths, are designed to measure solar radiation reflected by the earth's surface during daylight hours and are typically used for retrieving cloud, aerosol, and vegetation properties or producing true-color images [42]. The thermal emissive bands (i.e., bands 7–16) observe thermal emission from Earth targets during both daytime and nighttime. Note that the observed radiances in the 3.8-μm band are inevitably affected by sunlight during daytime and twilight (the 3.8-μm band contains both reflected and emitted radiation during daytime and
twilight). To build a unified algorithm and reduce the impacts of heterogeneous reflected sunlight, we only employ the TBBs observed by the nine IR bands (Bands 8–16) of H08/AHI, from 6.24 to 13.28 μm, to estimate quantitative precipitation with the ML technique. The radiometric calibration accuracies of these nine IR bands are around 0.25% (Table I), which helps ensure stability and consistency between the training and prediction models. In addition, cloud phase (ice, water, mixed, and supercooled) and cloud-top property (height, pressure, and temperature) products retrieved from H08/AHI data are also used; these products are generated by the near-real-time and robust FengYun Geostationary Algorithm Testbed system [35], [36], [41]–[44]. In this paper, all the TBBs observed by the IR bands and the cloud products of H08/AHI have a horizontal resolution of 2.0 km and a full-disk observation interval of 10 min.

In addition to H08/AHI observations, we also employ dynamic, environmental, and static ancillary data to further enhance the performance of the H08/AHI QPE algorithm. As important atmospheric dynamic and environmental data, the NWP data from the National Centers for Environmental Prediction Global Forecast System (GFS), with a spatial resolution of 0.5° × 0.5°, 26 vertical layers from 1000 to 10 hPa, and a time interval of 3 h [45], can be obtained daily at four initial forecast times (0000, 0600, 1200, and 1800 UTC) from the National Oceanic and Atmospheric Administration (NOAA, ftp://nomads.ncdc.noaa.gov/GFS/Grid4). Real-time GFS NWP data are routinely used in operational weather forecasting but are seldom applied in nowcasting. In this investigation, we use GFS NWP data initialized at 0600 UTC on the previous day to train and run this QPE algorithm. Our goal is to use the atmospheric dynamic and environmental information in the NWP data as background to enhance the accuracy of the predicted QPE. In addition, high-resolution global surface elevation data (https://ngdc.noaa.gov/mgg/global/) are collocated with every H08/AHI pixel and used as one of the important predictor variables.

III. METHODOLOGY

A. Random Forests

The RF technique was originally proposed by Breiman [23] and has been widely used for both classification and regression without much hyperparameter tuning. This method trains a number of DT predictors whose outputs are then averaged to improve prediction accuracy and reduce overfitting. Furthermore, it captures nonlinear association patterns between predictor and predictand variables well. RFs can also provide unbiased error estimates for regression or classification models through the out-of-bag (OOB) score. This DT-based ensemble ML method not only achieves high prediction accuracy but also gives an importance score (IS) for each predictor variable used to train the forest. Despite these advantages, the RF algorithm by nature lacks interpretability and a rigorous mathematical theory, making it difficult to demonstrate how predictions or decisions are made. In addition, the predicted values cannot go beyond the range of the training data values [9], [23], [46].

In general, the RF algorithm randomly draws n (the number of trees) bootstrap samples from the training data set to build the DT models. For each bootstrap sample, a tree is grown in which a random subset of the predictor variables is considered at each node and the split with the lowest residual sum of squares is chosen. Each of the n trees is grown to its largest extent. The final prediction is obtained by passing the test data set down every tree and averaging the outputs of all regression trees. During forest growth, about one-third of the samples are left out of each bootstrap sample and used to estimate the
OOB score and the importance of the variables used in constructing the trees. For more theoretical details and working principles of the RF algorithm, the reader is referred to previous studies [23], [46].

In this investigation, the free, simple, and efficient scikit-learn toolkit [29], a well-known Python module for ML, was used to implement the training, parameter tuning, and prediction of the RF algorithm. A range of typical classification, regression, and clustering algorithms are integrated into this Python ML toolkit, including RFs, SVMs, and k-means, among others (http://scikit-learn.org/stable/).

B. Processing Flow

In this paper, we use a two-step RF ML strategy to train on the input data and ultimately produce QPE; Fig. 1 shows its flowchart. First, we match the H08/AHI and NWP data with the GPM IMERG rain rate data at the same spatiotemporal scales, and the matched data are used to train a nonprecipitation/precipitation classification model based on the RF algorithm described in Section III-A. This RF classification model is then used to identify whether any given pixel of the H08/AHI satellite image produces rainfall. The second step is to develop a regression model for QPE once the precipitating/nonprecipitating pixels have been determined. In real-time application, only pixels with matched H08/AHI and NWP data are used to estimate quantitative precipitation with the two RF ML models (namely, the RF classification and regression models). Note that we also use a sample-balance technique when training both the RF classification and regression models in order to improve the performance of the RF prediction models. The details of training RF models with the sample-balance technique are discussed in Sections III-C and III-D.

Fig. 1. Two-step RF ML strategy and flowchart for QPEs.

C. Nonprecipitation and Precipitation Classification Training

Following the flowchart shown in Fig. 1, we developed an RF classification model to identify nonprecipitation and precipitation pixels observed by H08/AHI, using predictor variables pertinent to QPE. Caution should be taken when using the conceptual framework or model of rainfall retrieval to select RF predictor variables, particularly for extratropical cyclones [9], [14], [15]. In addition to the well-known dominant or sensitive variables for rainfall retrieval [9], more predictor variables, listed in Table II and drawn from spatiotemporally matched GFS NWP data, are considered in this investigation for nowcasting applications. Besides the TBBs, TBB differences, and cloud property products from H08/AHI [9], we introduce several traditional weather indices calculated from the time–space matched NWP data to further support QPE. These indices describe the thermal (i.e., K-index, θse850/925), dynamic (i.e., CAPE, CIN, LI-index, EBS), and moisture (i.e., TPW) features of the atmospheric environment [45], [46], which are closely associated with the initiation and development of rain-producing clouds [48], [49].

For RF model training, we iteratively tune the parameters to find an optimal model, including the
total number of trees in the forest (n_estimators), the maximum depth of each tree (max_depth), and the number of predictor variables (or features, denoted by f) considered when looking for the best split (max_features) (http://scikit-learn.org/stable/modules/ensemble.html#forest). Previous studies [9], [23] have pointed out that the optimal values of max_features may differ from the suggested defaults for RF classification (= √f) and regression (= f/3), depending on the specific application. Therefore, max_features should be tested until the best value is found. Fortunately, the official scikit-learn documentation suggests that an empirically good default value of max_features for the RF regression model is the total number of predictor variables, f (http://scikit-learn.org/stable/modules/ensemble.html#forests-of-randomized-trees).

TABLE II
PREDICTOR VARIABLES OR FEATURES USED IN THE RF MODELS FOR NONPRECIPITATION/PRECIPITATION CLASSIFICATION AND QPE

TABLE III
THREE TYPICAL SAMPLE DATA SETS FOR RF TRAINING UNDER DIFFERENT SCENARIOS, WHICH ARE INDICATED BY THE RATIO OF THE NUMBER OF NONPRECIPITATION PIXELS TO THE NUMBER OF PRECIPITATION PIXELS

Overall, more than 30 billion samples were matched pixel by pixel from the H08/AHI and GPM IMERG data for the three months (June, July, and August 2016), far too many to train the RF classification model. To address this issue, we randomly select 1/1000 of the matched samples to train the RF classifier that determines whether a pixel is nonprecipitating or precipitating. Note that we exclude the data on June 15, July 15, and August 15, which are reserved for validation in Section IV.

The number of DTs in the forest strongly affects the OOB score [9], [26]. This score is an unbiased estimate of the RF model's prediction error obtained from the training data set itself. Namely, a relatively small total number of trees in the forest (e.g., n_estimators < 10) leads to a significant OOB error (= 1.0 − OOB score) [9]. Also, the balance between the numbers of nonprecipitation and precipitation samples can significantly influence the final predictive accuracy [9].

The original ratio of nonprecipitation to precipitation samples is 4.5:1, which is referred to as "Scenario-C-0." To enhance the prediction accuracy, we separately adjust the original nonprecipitation/precipitation sample ratio to 3:1 and 2:1, which are labeled "Scenario-C-1" and "Scenario-C-2," respectively. Table III details the number of samples used for RF classification training under these scenarios. During classification training, the sample-balance technique [9] was used to conduct several sensitivity studies. This technique randomly downsamples the majority class (nonprecipitation pixels) to reduce the imbalance between minority- and majority-class samples. Because only a subset of the majority-class samples is used in training, this may degrade performance on the majority class [50]; here, however, the technique is employed to increase the probability of detecting precipitation areas (the minority class).

Fig. 2. Effect of the total number of trees in the forest (n_estimators), the maximum depth of the tree (max_depth), and the number of random split predictor variables [max_features = (top left) 4, (top right) 5, (bottom left) 6, and (bottom right) 7] on OOB scores for the nonprecipitation and precipitation RF classification models using the Scenario-C-1 (3:1) samples. The number and "NaN" in every subbox represent the OOB score and a failed scenario, respectively.

TABLE IV
ISs OF PREDICTOR VARIABLES IN THE RF MODEL AND THEIR CORRESPONDING RANKINGS, BASED ON THE SAMPLES OF SCENARIO-C-1 (3:1, N_ESTIMATORS = 300, MAX_DEPTH = 20, AND MAX_FEATURES = 7) FOR NONPRECIPITATION/PRECIPITATION CLASSIFICATION

For the sample data set under Scenario-C-1, Fig. 2 shows the effects of the number of trees in the forest (n_estimators), the maximum depth of the tree (max_depth), and the number of random split predictor variables (max_features) on the OOB scores used to select an appropriate nonprecipitation/precipitation RF classification model (n_estimators = 100, 300, 500, and 1000; max_depth = 10, 20, 30, 40, and 50; max_features = 4, 5, 6, and 7, where √f ≈ 5 and f = 27 as detailed in Table II). "NaN" in a subbox denotes a failed (crashed) RF model, mainly caused by the large storage size of the final RF model; the storage size of an RF model increases significantly with n_estimators and max_depth [23]. The highest OOB score, 0.8193, is achieved with n_estimators = 500, max_depth = 50, and max_features = 7. However, this configuration produces an extremely large model (72.0 GB) that takes about 7 min to load, which seriously reduces the real-time efficiency of QPE. Given its much smaller size (8.6 GB) and shorter loading time (0.3 min), the RF model with n_estimators = 300, max_depth = 20, and max_features = 7 was selected as our optimal RF classification model for real-time QPE, even though its OOB score (0.8136) is slightly lower than 0.8193.
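The sample-balance step and the OOB-based parameter sweep described in this subsection can be sketched with scikit-learn, whose `RandomForestClassifier` reports the OOB accuracy as `oob_score_` when `oob_score=True`. This is a minimal illustration under stated assumptions, not the authors' code: synthetic data from `make_classification` stand in for the matched H08/AHI, GFS, and IMERG samples (label 1 = precipitation, label 0 = nonprecipitation), the helper names `downsample_majority` and `oob_grid` are hypothetical, and the grids are far smaller than those in Fig. 2 (n_estimators up to 1000, max_depth up to 50) so that the example runs in seconds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def downsample_majority(X, y, ratio, rng):
    """Hypothetical helper: randomly drop majority-class samples (label 0,
    nonprecipitation) so that at most `ratio` of them remain per minority
    sample (label 1, precipitation)."""
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    n_keep = min(len(idx_neg), int(ratio * len(idx_pos)))
    keep_neg = rng.choice(idx_neg, size=n_keep, replace=False)
    idx = np.concatenate([idx_pos, keep_neg])
    rng.shuffle(idx)  # mix the two classes before training
    return X[idx], y[idx]


def oob_grid(X, y, n_estimators_grid, max_depth_grid, max_features_grid, seed=0):
    """Hypothetical helper: fit one forest per hyperparameter combination and
    return {(n_estimators, max_depth, max_features): OOB accuracy}."""
    scores = {}
    for n_trees in n_estimators_grid:
        for depth in max_depth_grid:
            for n_feat in max_features_grid:
                rf = RandomForestClassifier(
                    n_estimators=n_trees,
                    max_depth=depth,
                    max_features=n_feat,  # integer = features tried per split
                    bootstrap=True,
                    oob_score=True,       # score each sample with trees that did not see it
                    random_state=seed,
                )
                rf.fit(X, y)
                scores[(n_trees, depth, n_feat)] = rf.oob_score_
    return scores


rng = np.random.default_rng(0)
# Synthetic stand-in: 27 predictors (f = 27) and roughly a 4.5:1
# nonprecipitation/precipitation imbalance, as in Scenario-C-0.
X, y = make_classification(n_samples=5500, n_features=27, n_informative=10,
                           weights=[4.5 / 5.5], random_state=0)
X_bal, y_bal = downsample_majority(X, y, ratio=3.0, rng=rng)  # 3:1, like Scenario-C-1

scores = oob_grid(X_bal, y_bal, n_estimators_grid=[50, 100],
                  max_depth_grid=[10, 20], max_features_grid=[4, 7])
best = max(scores, key=scores.get)
print("best (n_estimators, max_depth, max_features):", best,
      "OOB score:", round(scores[best], 4))
```

As in the selection above, the highest OOB score need not pick the final model; the size of the fitted forest, which grows with n_estimators and max_depth, can be weighed against the small differences in OOB score.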
Table IV shows the ISs of all the predictor variables and their rankings for this optimal nonprecipitation/precipitation RF classification model. As one of the key outputs of the RF algorithm, the IS here represents the relative weight of each predictor in the fitted prediction model. The TBBs and TBB differences observed by H08/AHI, such as T11.2−12.3, T7.3−12.3, and T10.4, obtain the top rankings, indicating the importance of cloud-top properties observed from space. It is known that these IR wavelengths are generally characterized by relatively high atmospheric transmission and weak atmospheric absorption [34], [50]. In addition, the K-index and TPW from real-time NWP data show relatively high rankings. Besides TPW, it is interesting that T8.6, which represents lower-tropospheric water vapor (WV) information, also obtains a high IS [51]. Overall, the results illustrate that not only real-time H08/AHI observations but also NWP data (representing environmental field information) are important for discriminating between nonprecipitation and precipitation pixels.

D. Regression Training for Quantitative Precipitation Estimates

Given that the number of random split variables cannot exceed the total number of predictor variables (i.e., 27), max_features for the QPE RF regression model is simply set to 27, in sharp contrast with the nonprecipitation/precipitation RF classification model. As such, the sensitivity test of max_features performed for the RF classification model is not repeated here.

Fig. 3. Same as Fig. 2, but for quantitative precipitation RF regression models based on three typical scenarios. (Top left) Scenario-R-0. (Top right) Scenario-R-1. (Bottom left) Scenario-R-2. The number of random split predictor variables is max_features = 27.

TABLE V
ISs OF PREDICTOR VARIABLES OF THE RF MODEL AND THEIR CORRESPONDING RANKINGS, BASED ON THE SAMPLES OF SCENARIO-R-2 (N_ESTIMATORS = 100, MAX_DEPTH = 40, AND MAX_FEATURES = 27) FOR QUANTITATIVE PRECIPITATION REGRESSION

Fig. 3 shows that the RF regression model cannot achieve a relatively high OOB score when the original rain rate samples are used. Note that about 6 million rainfall samples (about one-fifth of all the samples) were selected from the original data set for RF classification model training. Nevertheless, the relatively low OOB scores (about 0.23) indicate that the RF regression model yields more biased estimates than the classification model, which is likely to induce a significant underestimation of rain rate, given the strong dependence of the OOB score on the deviations between predicted and observed rain rates. Similar to previous studies [9], [26], [50] and to the nonprecipitation/precipitation RF classification model, the RF regression model also cannot predict relatively heavy rain events well. This is probably because the training samples are unevenly distributed, with only a small proportion of heavy rainfall samples. Therefore, to enhance the OOB score and the predictive accuracy for heavy rain events, the sample-balance technique is used to increase the number of high-rain-rate samples. Fig. 4 shows the sample number distributions of rain rate for three typical scenarios, Scenario-R-0, Scenario-R-1, and Scenario-R-2, which correspond to data sets with original (natural), declining, and equilibrium sample number distributions, respectively. The two new sample sets (Scenario-R-1 and Scenario-R-2) are derived from the original set (Scenario-R-0) by increasing the number of high-rain-rate samples. Most of the rainfall samples in the Scenario-R-0 data set are smaller than 4 mm/h, indicating a clear imbalance in the sample distribution.
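The text does not specify how the high-rain-rate samples were multiplied, so the sketch below assumes one simple option: random oversampling with replacement inside rain-rate bins until every populated bin is as large as the largest one, which flattens the histogram roughly in the spirit of the equilibrium Scenario-R-2. It then fits scikit-learn's `RandomForestRegressor` with `max_features` equal to all 27 predictors and `oob_score=True` (for regressors, `oob_score_` is an R² value). The helper `balance_by_rain_rate`, the bin edges, and the synthetic skewed rain rates are hypothetical stand-ins, not the authors' data or code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def balance_by_rain_rate(X, y, bin_edges, rng):
    """Hypothetical helper: oversample (with replacement) every populated
    rain-rate bin up to the size of the most populated bin, giving a flat
    rain-rate histogram."""
    bins = np.digitize(y, bin_edges)      # bin index for each sample
    target = np.bincount(bins).max()      # size of the largest bin
    idx = []
    for b in np.unique(bins):
        members = np.flatnonzero(bins == b)
        extra = rng.choice(members, size=target - len(members), replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    rng.shuffle(idx)
    return X[idx], y[idx]


rng = np.random.default_rng(0)
# Synthetic stand-in: 27 predictors and a right-skewed rain rate (mm/h),
# capped at 50 mm/h like IMERG, with most samples below 4 mm/h.
X = rng.normal(size=(2000, 27))
y = np.clip(0.5 * np.exp(X[:, 0] + 0.5 * X[:, 1]), 0.0, 50.0)

edges = np.array([1.0, 2.0, 4.0, 8.0, 16.0])  # hypothetical rain-rate bin edges
X_bal, y_bal = balance_by_rain_rate(X, y, edges, rng)

rf = RandomForestRegressor(
    n_estimators=30,
    max_depth=20,
    max_features=X.shape[1],  # consider all 27 predictors at every split
    oob_score=True,           # OOB R^2 on the balanced training set
    random_state=0,
)
rf.fit(X_bal, y_bal)
print("samples before/after balancing:", len(y), len(y_bal))
print("OOB R^2:", round(rf.oob_score_, 3))
```

One caveat applies to this particular sketch: because rows are duplicated, a copy of a sample can be in-bag for a tree while another copy is out-of-bag, so OOB scores computed on an oversampled set tend to be optimistic. Scoring on independent data, such as the withheld validation days used in Section IV, avoids that leakage.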
Fig. 4. Probability distributions of rain rate samples for three typical scenarios, namely, Scenario-R-0, Scenario-R-1, and Scenario-R-2. N represents the total number of training samples.

As in Fig. 2, Fig. 3 also shows the effects of the total number of trees in the forest and the maximum tree depth on the OOB scores of the quantitative precipitation RF regression models under the three scenarios. The OOB scores clearly increase with the maximum depth for Scenario-R-1 and Scenario-R-2. Considering the storage sizes and OOB scores of all RF models in Fig. 3, we choose the model with n_estimators = 100 and max_depth = 40 (OOB score = 0.9887) under Scenario-R-2 as the optimal RF regression model; its effective storage size is about 20.0 GB, and only 1–2 min is required to load it. Table V shows the ISs of all the predictor variables and their rankings for the QPE RF regression model under Scenario-R-2. Compared with Table IV, the rankings of the predictor variables from NWP data are significantly higher, indicating that atmospheric environmental field parameters cannot be ignored in near-real-time QPE. Meanwhile, the TBBs (T6.2, T6.9, T7.3, and T8.6) observed by the four WV absorption channels [52] drop sharply in the rankings, implying a weak connection between upper-level atmospheric moisture content and surface rain rate. However, T7.3−12.3 still keeps a relatively high ranking in the RF regression model. This could be because the precipitating areas are dominated by convection, which is characterized by deep clouds with tops reaching high into the troposphere [9]. The brightness temperature difference between the WV and IR channels (i.e., T7.3−12.3) reflects, to some extent, the cloud-top height relative to the tropopause [53]. In addition, the relatively low rankings of cloud phase and surface elevation in Tables IV and V imply their weak associations with rainfall events.

IV. VALIDATIONS

A. Validation of All Rainfall Prediction Results

As mentioned in Section III, three days (June 15, July 15, and August 15, 2016) of independent, spatiotemporally matched data are used to validate the performance of the RF classification and regression models. For the nonprecipitation/precipitation RF classification model, we use five classical metrics to quantitatively assess the classification results: probability of detection (POD, optimal = 1), false-alarm ratio (FAR, optimal = 0), critical success index (CSI, optimal = 1), Heidke skill score (HSS, optimal = 1), and hit rate (HR, optimal = 1), which are formulated as follows [54], [55]:

$$\mathrm{POD} = \frac{a}{a+c} \tag{1}$$

$$\mathrm{FAR} = \frac{b}{a+b} \tag{2}$$

$$\mathrm{CSI} = \frac{a}{a+b+c} \tag{3}$$

$$\mathrm{HSS} = \frac{2(ad - bc)}{(a+c)(c+d) + (a+b)(b+d)} \tag{4}$$

$$\mathrm{HR} = \frac{a+d}{a+b+c+d} \tag{5}$$

where a is the number of image pixels identified as precipitation by both the GPM IMERG product and the RF classification model; b is the number of pixels that the RF model identifies as precipitation but the GPM IMERG product does not; c is the number of pixels that the GPM IMERG product indicates as precipitation but the RF model does not; and d is the number of pixels that both the GPM IMERG product and the RF classification model identify as nonprecipitation.

Fig. 5. Comparisons of nonprecipitation/precipitation [(a)–(d) at 2230 UTC on June 15, 2016; (e)–(h) at 0200 UTC on July 15, 2016] between the GPM IMERG QPE products (first column) and the predictions using three different RF classification models based on the samples of Scenario-C-0 (second column), Scenario-C-1 (third column), and Scenario-C-2 (fourth column). Purple area: presence of rainfall.

TABLE VI
STATISTICS ON THE PERFORMANCE METRICS OF PRECIPITATION CLASSIFICATION USING THREE INDEPENDENT RF CLASSIFICATION MODELS BASED ON THE SCENARIOS USED TO TRAIN SAMPLES: SCENARIO-C-0/1/2 (N_ESTIMATORS = 300, MAX_DEPTH = 20, AND MAX_FEATURES = 7)

Table VI summarizes the statistics on the performance metrics of precipitation classification using three independent
Fig. 6. Comparisons of QPE [(a) and (b) at 0730 UTC on July 15, 2016; (c) and (d) at 1930 UTC on June 15, 2016] between the (left) GPM IMERG
product and the (right) prediction using the RF classification and regression models.
RF classification models (n_estimators = 300, max_depth =

20, and max_features = 7), based on the three scenarios used
to train samples. Compared with Scenario-C-0, the POD for
Scenario-C-2 increases significantly from 0.48 to 0.68 for pre-
cipitation pixel prediction due to the sharp increase in precipi-
tation samples. In addition, the CSI and HSS scores vary only
by a small margin when we use the samples of Scenario-C-2,
regardless of the ratio of nonprecipitation and precipitation
samples. The HRs decrease slightly with rising precipitation
samples. In contrast, the FARs significantly increase from
0.27 to 0.41, most likely due to the increasing detection
probability of precipitation pixel. Although the enhanced
POD can better predict precipitation pixel, the FAR increases
with the increasing ratio of precipitation pixels. Therefore,
due to the nearly invariable CSI and HSS scores between
Scenario-C-1 and Scenario-C-2 in Table VI, we ultimately
choose the RF classification model based on the Scenario-
C-1 data set (3:1, n_estimators = 300, max_depth = 20, and
max_features = 7), which is expected to better predict non-
Fig. 7. Comparisons of QPE between GPM IMERG and prediction model.
precipitation/precipitation pixels (see details in Section III-B). Color bar: occurrence frequency (in log scale) at intervals of 0.5 mm/h.
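As an illustration of Eqs. (1)–(5), the Python sketch below counts a, b, c, and d from paired IMERG and RF precipitation masks and then evaluates the five scores. It is not the authors' code: the function names and the six-pixel toy masks are hypothetical, and the sketch assumes every denominator is nonzero (for example, at least one precipitation pixel in each mask).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    a: int  # precipitation in both IMERG and the RF prediction (hits)
    b: int  # precipitation in the RF prediction only (false alarms)
    c: int  # precipitation in IMERG only (misses)
    d: int  # nonprecipitation in both (correct negatives)


def contingency(reference, predicted):
    """Count a, b, c, d from paired boolean masks (True = precipitation)."""
    if len(reference) != len(predicted):
        raise ValueError("masks must have the same length")
    a = b = c = d = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            a += 1
        elif pred:
            b += 1
        elif ref:
            c += 1
        else:
            d += 1
    return Contingency(a, b, c, d)


def classification_scores(t):
    """Eqs. (1)-(5); assumes no denominator is zero."""
    a, b, c, d = t.a, t.b, t.c, t.d
    return {
        "POD": a / (a + c),
        "FAR": b / (a + b),
        "CSI": a / (a + b + c),
        "HSS": 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
        "HR": (a + d) / (a + b + c + d),
    }


# Hypothetical toy masks: IMERG reference vs. RF classification.
imerg = [True, True, False, False, True, False]
rf = [True, False, True, False, True, False]
table = contingency(imerg, rf)
print(table)
print(classification_scores(table))
```

For the toy masks, a = 2, b = 1, c = 1, and d = 2, so POD = 2/3, FAR = 1/3, CSI = 0.5, HSS = 1/3, and HR = 2/3.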
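The sample-balance technique itself is detailed in Section III-B. The sketch below assumes its simplest possible form, random undersampling of the nonprecipitation class down to the 3:1 proportion of Scenario-C-1; the authors' actual procedure may differ, and the function name and toy data are hypothetical.

```python
import random


def balance_samples(samples, is_precip, ratio=3.0, seed=0):
    """Keep every precipitation sample and a random subset of nonprecipitation ones.

    At most ratio * n_precip nonprecipitation samples are kept, giving a
    nonprecipitation:precipitation ratio of ratio:1 when enough are available.
    Returns a shuffled list of (sample, is_precip) pairs.
    """
    rng = random.Random(seed)
    precip = [s for s, p in zip(samples, is_precip) if p]
    nonprecip = [s for s, p in zip(samples, is_precip) if not p]
    n_keep = min(len(nonprecip), int(ratio * len(precip)))
    kept = rng.sample(nonprecip, n_keep)
    balanced = [(s, True) for s in precip] + [(s, False) for s in kept]
    rng.shuffle(balanced)
    return balanced


# Toy data: 10 "raining" samples out of 100.
pixels = list(range(100))
labels = [i % 10 == 0 for i in pixels]
subset = balance_samples(pixels, labels, ratio=3.0)
print(len(subset), sum(1 for _, p in subset if p))
```

In scikit-learn, the balanced features and labels could then be passed to `RandomForestClassifier(n_estimators=300, max_depth=20, max_features=7)`, the settings listed for Table VI.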
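The selected classification model is used together with the Scenario-R-2 regression model when producing QPE maps. A minimal sketch of that two-step prediction follows, under the assumption that the regression model is applied only to pixels the classifier marks as raining and that all other pixels are assigned 0 mm/h. The toy threshold models are hypothetical placeholders, not the trained RFs.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]


def two_step_qpe(
    pixels: Sequence[Features],
    is_raining: Callable[[Features], bool],
    rain_rate: Callable[[Features], float],
) -> List[float]:
    """Step 1 classifies each pixel; step 2 estimates rain rate (mm/h) only where
    step 1 predicts precipitation. Nonprecipitating pixels are set to 0 mm/h."""
    return [rain_rate(x) if is_raining(x) else 0.0 for x in pixels]


# Hypothetical stand-ins for the trained models: one feature, an IR
# brightness temperature in kelvin; colder cloud tops rain harder.
def toy_classifier(x: Features) -> bool:
    return x[0] < 220.0


def toy_regressor(x: Features) -> float:
    return (240.0 - x[0]) / 10.0


print(two_step_qpe([(210.0,), (230.0,), (200.0,)], toy_classifier, toy_regressor))
```

With scikit-learn RF models, the same logic would typically be vectorized: call `predict` on the classifier for all pixels, then call `predict` on the regressor for the masked subset only. Both estimators accept an `n_jobs` parameter for parallel prediction, which is one way to use a multi-core machine for full-disk processing.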
To visually inspect the classification results, Fig. 5 shows two typical nonprecipitation/precipitation classification cases, at 2230 UTC (nighttime) on June 15 and at 0200 UTC (daytime) on July 15, 2016, comparing the GPM IMERG QPE products with the predictions of the three RF classification models of Scenario-C-0, Scenario-C-1, and Scenario-C-2 (the last with the highest POD). The nonprecipitation and precipitation predictions of the RF algorithm based on Scenario-C-1 are clearly more consistent with the IMERG QPE products. In contrast, the RF algorithm based on Scenario-C-2 tends to overestimate the precipitation pixels from H08/AHI observations. Interestingly, some isolated precipitating cloud systems that are missed under Scenario-C-2 can be predicted with the optimal Scenario-C-1 classification model.

Fig. 6(a)–(d) compares the QPE (at 0730 UTC on July 15, 2016 and at 1930 UTC on June 15, 2016) from the GPM IMERG product with the prediction using the RF classification (Scenario-C-1) and regression (Scenario-R-2) models. The results show a consistent spatial pattern, i.e., a good correlation, of QPEs between the GPM IMERG product and the prediction. A large proportion of the heavy rainfall areas in the H08/AHI full-disk observation is well captured by the two optimal RF models. However, extremely heavy rainfall (>20 mm/h) areas are not predicted well: the RF regression model can only predict the rough location of an extremely heavy rainfall area and cannot accurately quantify its rain rate. Specifically, the RF regression model tends to significantly underestimate heavy rainfall, similar to the findings of previous studies [26]. Note that the total prediction time for an H08/AHI full-disk observation using the two optimal RF models is about 4 min with 12 cores computing in parallel, which basically meets the efficiency requirement of nowcasting applications based on satellite observations.

Fig. 7 compares QPEs between GPM IMERG and the prediction model; the color bar represents the occurrence frequency in log scale at intervals of 0.5 mm/h. Most samples concentrate in the boxes with GPM IMERG QPE < 3.0 mm/h and predicted QPE < 0.5 mm/h, indicating a significant underestimation of QPE. Besides the extremely heavy rainfall events (>20 mm/h), the algorithm also tends to underestimate slight and moderate rain rates. Based on the same validation data, the mean absolute error (MAE) and root-mean-square error (RMSE) of all predicted QPE are 0.51 and 2.0 mm/h, respectively. Overall, although the algorithm may miss some rainfall areas and underestimate rain rate (or QPE), a highly consistent pattern is still found between the GPM IMERG product and the near real-time prediction (Figs. 5 and 6).

Fig. 8. Same as Fig. 7, but for the results over (left) land and (right) ocean.

B. Validations of Rainfall Over Land and Ocean

In this section, we show the validation results of the predicted QPE values over land and ocean separately. Table VII lists the mean scores of the nonprecipitation/precipitation discrimination algorithm (Scenario-C-1) and the mean MAE and RMSE of the retrieved QPE (Scenario-R-2) over land and ocean. The POD over the ocean (0.59) is much higher than over land, and the FAR over the ocean (0.33) is much lower than over land. The relatively higher mean CSI, HSS, and HR scores over the ocean are likely attributable to its homogeneous surface properties, indicating better prediction of nonprecipitation/precipitation pixels over the homogeneous ocean surface. By contrast, the mean MAE (0.52 mm/h) and RMSE (2.10 mm/h) of the retrieved QPE are relatively higher over the ocean than over land (0.44 and 1.72 mm/h), as shown in Table VII. The higher QPE uncertainty over the ocean is mainly due to the larger number of heavy rainfall events in the GPM IMERG data there, which can easily be seen in Figs. 6 and 7. These unpredictable heavy rainfall events inevitably increase the occurrence frequency of large QPE errors over the ocean.

As shown in the scatter plot in Fig. 8, the QPEs from the RF prediction model are validated against those from GPM IMERG over land and ocean, respectively.
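The land/ocean statistics in Table VII amount to computing MAE and RMSE separately for each surface type. A minimal Python sketch follows; the function name and toy values are hypothetical, and because this section does not state which pixels enter the average (all matched pixels or only those raining in IMERG), the sketch simply averages whatever pairs it is given.

```python
import math
from collections import defaultdict


def mae_rmse_by_group(reference, predicted, groups):
    """Return {group: (MAE, RMSE)} for paired rain rates in mm/h."""
    acc = defaultdict(lambda: [0, 0.0, 0.0])  # [count, sum |error|, sum error^2]
    for ref, pred, group in zip(reference, predicted, groups):
        err = pred - ref
        acc[group][0] += 1
        acc[group][1] += abs(err)
        acc[group][2] += err * err
    return {g: (s_abs / n, math.sqrt(s_sq / n)) for g, (n, s_abs, s_sq) in acc.items()}


# Hypothetical toy pairs (IMERG, RF) tagged by surface type.
imerg = [1.0, 2.0, 0.0, 4.0]
rf = [0.0, 2.0, 1.0, 1.0]
surface = ["land", "land", "ocean", "ocean"]
print(mae_rmse_by_group(imerg, rf, surface))
print(mae_rmse_by_group(imerg, rf, ["all"] * len(imerg)))
```

For these toy values, land gives MAE = 0.5 and RMSE ≈ 0.707 mm/h, while ocean gives MAE = 2.0 and RMSE ≈ 2.236 mm/h. Passing a single group label for every pixel yields the overall MAE and RMSE.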
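The log-scale occurrence frequency shown in Figs. 7 and 8 can be reproduced in outline by counting (IMERG, predicted) rain-rate pairs on a 0.5 mm/h grid and taking log10 of each nonzero count. The sketch below is a hypothetical illustration rather than the authors' plotting code; plotting is omitted, and empty cells are simply absent from the result.

```python
import math
from collections import Counter


def log_frequency_grid(reference, predicted, bin_width=0.5):
    """Bin (IMERG, predicted) rain-rate pairs into square cells; return log10 counts.

    Keys are (i, j) cell indices: cell i covers [i*bin_width, (i+1)*bin_width)
    mm/h of the IMERG value, and cell j the same range of the predicted value.
    """
    counts = Counter(
        (int(r // bin_width), int(p // bin_width))
        for r, p in zip(reference, predicted)
    )
    return {cell: math.log10(n) for cell, n in counts.items()}


# Hypothetical toy pairs: three light-rain pixels and one underestimated pixel.
grid = log_frequency_grid([0.2, 0.3, 0.4, 2.6], [0.1, 0.2, 0.45, 0.7])
print(grid)
```

With NumPy, `numpy.histogram2d` with explicit 0.5 mm/h bin edges produces essentially the same counts as a dense array; the log10 would then be applied only to the nonzero cells.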
As expected, Fig. 8 shows more rainfall samples concentrated around the boxes >8.0 mm/h over the ocean and <3.0 mm/h over land. In general, the predicted QPE values are more consistent over land than over ocean, particularly for light rainfall. The prediction accuracy for extremely heavy rainfall events (>20 mm/h) remains low over both land and ocean.

TABLE VII
STATISTICS ON THE MEAN SCORES OF NONPRECIPITATION/PRECIPITATION DISCRIMINATION ALGORITHM AND THE MEAN MAE AND RMSE OF QPE ALGORITHM OVER LAND AND OCEAN

V. CONCLUSION

This paper aims to investigate and develop a unified QPE algorithm (i.e., one that retrieves consistently across the day, night, and twilight belts of a full-disk observation) for summertime nowcasting applications by combining real-time Himawari-8/AHI observation data, cloud physical property products, and GFS NWP data. The RF ensemble classification and regression technique was used here to implement near real-time precipitation prediction. This new algorithm differs markedly from traditional and existing QPE algorithms for GEO satellite imagers because it uses an RF ML algorithm rather than a conventional parametric approach. Its key advantage is the ability to capture nonlinear association patterns between the predictors and the predictand, here precipitation.

Compared with the existing ML approach [24] for QPE, spatiotemporally matched real-time NWP data are introduced as additional predictors in the regression model. Notably, some high-ranking parameters derived from NWP data directly indicate an important role of atmospheric background information in the RF classification and regression models. This finding illustrates that atmospheric environment field data are also valuable for nowcasting products such as QPE based on an advanced ML algorithm like RFs. As mentioned before, the RF algorithm is better able to capture nonlinear patterns between the predictors from NWP data and precipitation.

In addition, a sample-balance technique was used to significantly improve the RF classification and regression models relative to those trained on the original sample data sets. Sensitivity studies were conducted to test the effects of the sample proportion and of three important RF model parameters (the number of trees, n_estimators; the maximum tree depth, max_depth; and the number of features considered at each split, max_features) on the accuracy of the RF prediction model. The results show that the mean hit rate of nonprecipitation/precipitation classification is about 0.87, and the MAE and RMSE of the predicted QPE are 0.51 and 2.0 mm/h, respectively. Although the predicted QPE shows a significant underestimation (in particular, it cannot retrieve extremely heavy rainfall values), the patterns of precipitation area and intensity are still highly consistent with the GPM IMERG data. Moreover, the efficient and consistent QPE retrieved from near real-time satellite observations and NWP data can support successful nowcasting applications. In addition, better nonprecipitation/precipitation classification results are found over ocean than over land owing to the homogeneous surface. However, we also find higher MAE and RMSE in the predicted QPE values over the ocean, which are closely associated with the high occurrence frequency of heavy rainfall events there.

In future work, we plan to use GFS NWP data with a spatial resolution of 0.25° × 0.25°, which have been released in real time to the China Meteorological Administration since 2017. We also look forward to training a new prediction model on a whole year of data to better support nowcasting applications of new-generation GEO satellite data, such as FY-4A and H08. Given the open-source nature of ML frameworks such as TensorFlow, Keras, PyTorch, and scikit-learn, algorithms under a deep learning (DL) framework deserve further study to investigate the possibility of predicting or estimating rain rate from space.

ACKNOWLEDGMENT

The authors would like to thank NASA, JMA, and NOAA for freely providing the GPM IMERG, Himawari-8, and GFS NWP data online. They would also like to thank the Python and scikit-learn groups for providing powerful computing tools, and the anonymous reviewers for their thoughtful and constructive suggestions and comments.

REFERENCES

[1] Y. Ding, Z. Wang, and Y. Sun, “Inter-decadal variation of the summer precipitation in East China and its association with decreasing Asian summer monsoon,” Int. J. Climatol., vol. 28, no. 9, pp. 1139–1161, 2008.
[2] J. Guo et al., “Declining frequency of summertime local-scale precipitation over eastern China from 1970 to 2010 and its potential link to aerosols,” Geophys. Res. Lett., vol. 44, no. 11, pp. 5700–5708, 2017.
[3] M. Min, P. Wang, J. R. Campbell, X. Zong, and Y. Li, “Midlatitude cirrus cloud radiative forcing over China,” J. Geophys. Res., Atmos., vol. 115, no. D20, p. D20210, 2010.
[4] R. Yu, T. Zhou, A. Xiong, Y. Zhu, and J. Li, “Diurnal variations of summer precipitation over contiguous China,” Geophys. Res. Lett., vol. 34, no. 1, pp. 223–234, 2007.
[5] M. Min, P. Wang, J. R. Campbell, X. Zong, and J. Xia, “Cirrus cloud macrophysical and optical properties over North China from CALIOP measurements,” Adv. Atmos. Sci., vol. 28, no. 3, pp. 653–664, 2011.
[6] J. Guo et al., “Delaying precipitation and lightning by air pollution over the Pearl River Delta. Part I: Observational analyses,” J. Geophys. Res., Atmos., vol. 121, no. 11, pp. 6472–6488, 2016.
[7] J. Guo et al., “Aerosol-induced changes in the vertical structure of precipitation: A perspective of TRMM precipitation radar,” Atmos. Chem. Phys., vol. 18, no. 18, pp. 13329–13343, 2018.
[8] G. J. Huffman, D. T. Bolvin, E. J. Nelkin, and D. B. Wolff, “The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-global, multiyear, combined-sensor precipitation estimates at fine scales,” J. Hydrometeorol., vol. 8, no. 1, pp. 38–55, Feb. 2007.
[9] M. Kühnlein, T. Appelhans, B. Thies, and T. Nauß, “Precipitation estimates from MSG SEVIRI daytime, nighttime, and twilight data with random forests,” J. Appl. Meteorol. Climatol., vol. 53, no. 11, pp. 2457–2480, 2014.
[10] G. A. Vicente, R. A. Scofield, and W. P. Menzel, “The operational GOES infrared rainfall estimation technique,” Bull. Amer. Meteorol. Soc., vol. 79, no. 9, pp. 1883–1898, 1998.
[11] R. Joyce, J. Janowiak, and G. Huffman, “Latitudinally and seasonally dependent zenith-angle corrections for geostationary satellite IR brightness temperatures,” J. Appl. Meteorol., vol. 40, no. 40, pp. 689–703, 2001.
[12] A. Y. Hou et al., “The global precipitation measurement mission,” Bull. Amer. Meteorol. Soc., vol. 95, pp. 701–722, May 2014.
[13] R. J. Kuligowski, “GOES-R Advanced Baseline Imager (ABI) algorithm theoretical basis document for rainfall rate (QPE), version 2.0,” Algorithm Theor. Basis Document (ATBD), Tech. Rep., 2010, pp. 1–44. [Online]. Available: https://www.goes-r.gov/products/ATBDs/baseline/Hydro_RRQPE_v2.0_no_color.pdf
[14] B. Thies, T. Nauss, and J. Bendix, “Discriminating raining from non-raining cloud areas at mid-latitudes using meteosat second generation SEVIRI night-time data,” Meteorol. Appl., vol. 15, no. 8, pp. 219–230, 2008.
[15] B. Thies, T. Nauß, and J. Bendix, “Precipitation process and rainfall intensity differentiation using Meteosat Second Generation Spinning Enhanced Visible and Infrared Imager data,” J. Geophys. Res., Atmos., vol. 113, no. D23, 2008.
[16] M. Min et al., “An investigation of the implications of lunar illumination spectral changes for Day/Night Band-based cloud property retrieval due to lunar phase transition,” J. Geophys. Res., Atmos., vol. 122, no. 17, pp. 9233–9244, 2017.
[17] J. Li, J. Huang, K. Stamnes, T. Wang, Q. Lv, and H. Jin, “A global survey of cloud overlap based on CALIPSO and CloudSat measurements,” Atmos. Chem. Phys., vol. 15, no. 1, pp. 519–536, 2015.
[18] J. Tan, W. A. Petersen, and A. Tokay, “A novel approach to identify sources of errors in IMERG for GPM ground validation,” J. Hydrometeorol., vol. 17, no. 9, pp. 2477–2491, 2016.
[19] Q. Lu, W. Bell, P. Bauer, N. Bormann, and C. Peubey, “An evaluation of FY-3A satellite data for numerical weather prediction,” Quart. J. Roy. Meteorol. Soc., vol. 137, no. 658, pp. 1298–1311, 2011.
[20] R. J. Kuligowski, “A self-calibrating real-time GOES rainfall algorithm for short-term rainfall estimates,” J. Hydrometeorol., vol. 3, no. 2, pp. 112–130, 2002.
[21] G. Mountrakis, J. Im, and C. Ogole, “Support vector machines in remote sensing: A review,” ISPRS J. Photogramm. Remote Sens., vol. 66, no. 3, pp. 247–259, 2011.
[22] P. E. Utgoff, “Incremental induction of decision trees,” Mach. Learn., vol. 4, no. 2, pp. 161–186, 1989.
[23] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[24] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Netw., vol. 61, pp. 85–117, Jan. 2015.
[25] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[26] M. Kühnlein, T. Appelhans, B. Thies, and T. Nauss, “Improving the accuracy of rainfall rates from optical satellite sensors with machine learning – A random forests-based approach applied to MSG SEVIRI,” Remote Sens. Environ., vol. 141, pp. 129–143, Feb. 2014.
[27] D. I. F. Grimes, E. Coppola, M. Verdecchia, and G. Visconti, “A neural network approach to real-time rainfall estimation for Africa using satellite data,” J. Hydrometeorol., vol. 4, no. 6, pp. 1119–1133, 2003.
[28] M. Pal, “Random forest classifier for remote sensing classification,” Int. J. Remote Sens., vol. 26, no. 1, pp. 217–222, 2007.
[29] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[30] J. Bergstra et al., “Theano: A CPU and GPU math compiler in Python,” in Proc. Python Sci. Comput. Conf. (SciPy), 2010, pp. 1–7.
[31] M. Abadi et al., “TensorFlow: A system for large-scale machine learning,” presented at the 12th USENIX Symp. Oper. Syst. Design Implement., 2016.
[32] J. Schmetz et al., “An introduction to Meteosat Second Generation (MSG),” Bull. Amer. Meteorol. Soc., vol. 83, no. 7, pp. 977–992, 2002.
[33] T. J. Schmit, M. M. Gunshor, W. P. Menzel, J. Li, and A. S. Bachmeier, “Introducing the next-generation Advanced Baseline Imager on GOES-R,” Bull. Amer. Meteorol. Soc., vol. 86, no. 8, pp. 1079–1096, 2005.
[34] T. J. Schmit et al., “The GOES-R Advanced Baseline Imager and the continuation of current sounder products,” J. Appl. Meteorol. Climatol., vol. 47, no. 10, pp. 2696–2711, 2008.
[35] M. Min et al., “Developing the science product algorithm testbed for Chinese next-generation geostationary meteorological satellites: FengYun-4 series,” J. Meteorol. Res., vol. 31, no. 4, pp. 708–719, 2017.
[36] J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo, “Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4,” Bull. Amer. Meteorol. Soc., vol. 98, no. 8, pp. 1637–1658, Aug. 2017.
[37] G. J. Huffman et al., “NASA Global Precipitation Measurement (GPM) integrated multi-satellite retrievals for GPM (IMERG),” Algorithm Theoretical Basis Document (ATBD) Version 4.5, 2015, pp. 1–26. [Online]. Available: https://pmm.nasa.gov/sites/default/files/document_files/IMERG_ATBD_V4.5.pdf
[38] A. AghaKouchak, A. Mehran, H. Norouzi, and A. Behrangi, “Systematic and random error components in satellite precipitation data sets,” Geophys. Res. Lett., vol. 39, no. 9, p. L09406, 2012.
[39] V. Bharti and C. Singh, “Evaluation of error in TRMM 3B42V7 precipitation estimates over the Himalayan region,” J. Geophys. Res., Atmos., vol. 120, no. 24, pp. 12458–12473, 2015.
[40] T. J. Greenwald et al., “Real-time simulation of the GOES-R ABI for user readiness and product evaluation,” Bull. Amer. Meteorol. Soc., vol. 97, no. 2, pp. 245–261, 2016.
[41] D. Chen et al., “The cloud top distribution and diurnal variation of clouds over East Asia: Preliminary results from Advanced Himawari Imager,” J. Geophys. Res., Atmos., vol. 123, no. 7, pp. 3724–3739, Apr. 2018.
[42] S. D. Miller et al., “A sight for sore eyes: The return of true color to geostationary satellites,” Bull. Amer. Meteorol. Soc., vol. 97, no. 10, pp. 1803–1816, 2016.
[43] A. K. Heidinger, A. T. Evan, M. J. Foster, and A. Walther, “A naive Bayesian cloud-detection scheme derived from CALIPSO and applied within PATMOS-x,” J. Appl. Meteorol. Climatol., vol. 51, no. 6, pp. 1129–1144, 2012.
[44] J. Li et al., “The impact of atmospheric stability and wind shear on vertical cloud overlap over the Tibetan Plateau,” Atmos. Chem. Phys., vol. 18, no. 10, pp. 7329–7343, 2018.
[45] M. Kanamitsu, “Description of the NMC global data assimilation and forecast system,” Weather Forecasting, vol. 4, no. 3, pp. 335–342, 1989.
[46] L. Breiman and A. Cutler. (2013). Random Forests-Classification Manual. [Online]. Available: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
[47] A. G. Laing and J. M. Fritsch, “The large-scale environments of the global populations of mesoscale convective complexes,” Monthly Weather Rev., vol. 128, no. 8, pp. 2756–2776, 2000.
[48] G. J. Zhang, “Roles of tropospheric and boundary layer forcing in the diurnal cycle of convection in the U.S. Southern great plains,” Geophys. Res. Lett., vol. 30, no. 24, Dec. 2003.
[49] J. Roman, R. Knuteson, S. Ackerman, and H. Revercomb, “Estimating minimum detection times for satellite remote sensing of trends in mean and extreme precipitable water vapor,” J. Climate, vol. 29, no. 22, pp. 8211–8230, 2016.
[50] Y. Liu, N. V. Chawla, M. P. Harper, E. Shriberg, and A. Stolcke, “A study in machine learning from imbalanced data for sentence boundary detection in speech,” Comput. Speech Lang., vol. 20, no. 4, pp. 468–494, 2006.
[51] M. Min, Y. Zhang, Z. Rong, and L. Dong, “A method for monitoring the on-orbit performance of a satellite sensor infrared window band by using oceanic drifters,” Int. J. Remote Sens., vol. 35, no. 1, pp. 382–400, 2014.
[52] T. J. Schmit, J. Li, S. A. Ackerman, and J. J. Gurka, “High-spectral- and high-temporal-resolution infrared measurements from geostationary orbit,” J. Atmos. Ocean. Technol., vol. 26, no. 11, pp. 2273–2292, 2009.
[53] Y. Ai, J. Li, W. Shi, T. J. Schmit, C. Cao, and W. Li, “Deep convective cloud characterizations from both broadband imager and hyperspectral infrared sounder measurements,” J. Geophys. Res., Atmos., vol. 122, no. 3, pp. 1700–1712, 2017.
[54] J. Mecikalski, K. Bedka, S. Paech, and L. Litten, “A statistical evaluation of GOES cloud-top properties for nowcasting convective initiation,” Monthly Weather Rev., vol. 136, no. 12, pp. 4899–4914, 2008.
[55] M. Grecu and W. F. Krajewski, “A large-sample investigation of statistical procedures for radar-based short-term quantitative precipitation forecasting,” J. Hydrol., vol. 239, nos. 1–4, pp. 69–84, Dec. 2000.
Min Min received the B.S. degree in applied meteorology from the Nanjing University of Information Science and Technology, Nanjing, China, in 2005, and the Ph.D. degree in atmospheric physics and environment from the Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China, in 2010.
From 2013 to 2014, he was a Visiting Research Assistant with the Department of Physics, University of Maryland, Baltimore, MD, USA. He is currently an Associate Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing. His research interests include cloud and weather science algorithms of satellite remote sensing, atmospheric radiative transfer, and calibration of FengYun satellite sensor.

Chen Bai received the B.S. degree in optical engineering from the School of Optoelectronics, Beijing Institute of Technology, Beijing, China, in 2017, where he is currently pursuing the master’s degree with the School of Optoelectronics.

Jianping Guo received the Ph.D. degree in cartography and geographical information system from the Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing, China, in 2007.
He is currently a Professor with the Chinese Academy of Meteorological Sciences, Beijing, where he leads a team working on aerosol-cloud-precipitation interaction.

Fenglin Sun received the B.S. degree in communication engineering from the Ocean University of China, Qingdao, China, in 2009, and the Ph.D. degree in magnetic and microwave technology from the National Space Science Center, Chinese Academy of Sciences, Beijing, China, in 2014.
Since 2014, he has been an Assistant Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing. His research interests include convection and precipitation algorithms of geostationary satellite, mode recognition, and microwave calibration of FengYun satellite sensor.

Chao Liu received the B.S. degree in physics from Tongji University, Shanghai, China, in 2009, and the Ph.D. degree in atmospheric science from Texas A&M University, College Station, TX, USA, in 2013.
He is currently a Professor with the School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing, China. His research interests include light scattering modeling of atmospheric particles, optical properties of aerosols and ice clouds, and radiative transfer and remote sensing of clouds.

Fu Wang received the B.S., M.S., and Ph.D. degrees in electronic technology from the University of Electronic Science and Technology of China, Chengdu, China, in 2008, 2011, and 2015, respectively.
He is currently an Assistant Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing, China. His research interests include cloud algorithms of FengYun satellite sensor and cloud-aerosol interaction.

Hui Xu received the B.S. degree from Central South University, Changsha, China, in 2010, and the Ph.D. degree from the Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing, China, in 2015.
She is currently an Assistant Professor with the Chinese Academy of Meteorological Sciences, China Meteorological Administration, Beijing. Her research interests include cloud radiative forcing and aerosol radiative forcing.

Shihao Tang received the M.S. degree in applied meteorology from the Nanjing University of Information Science and Technology, Nanjing, China, in 1996, and the Ph.D. degree in remote sensing sciences from Beijing Normal University, Beijing, China, in 2001.
He is currently a Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing. His research interests include land and ecology science algorithms of FengYun satellite sensor.

Bo Li received the B.S. degree in atmospheric science from Nanjing University, Nanjing, China, in 2006, and the Ph.D. degree in meteorology from the Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China, in 2011.
She is currently an Associate Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing. Her research interests include cloud phase algorithms of FengYun satellite and atmospheric circulation.

Di Di received the B.S. degree in atmospheric science from the Nanjing Meteorological Institute, Nanjing, China, in 2013, and the M.S. degree in atmospheric science from the Chinese Academy of Meteorological Sciences, Beijing, China, in 2016.
She is currently pursuing the Ph.D. degree with the Chinese Academy of Sciences, Beijing. Her research interests include high-spectral atmospheric radiative transfer and satellite data assimilation.

Lixin Dong received the B.S. degree in applied meteorology from the Nanjing Meteorological Institute, Nanjing, China, in 1995, and the Ph.D. degree from the Institute of Remote Sensing Applications of Chinese Academy of Sciences, Beijing, China, in 2008.
He is currently an Associate Professor with the National Satellite and Meteorological Center, China Meteorological Administration, Beijing, where he is involved in land surface temperature and soil moisture inversion algorithms of FengYun satellite image.

Jun Li received the B.S. degree in mathematics from Peking University, Beijing, China, in 1987, and the M.S. and Ph.D. degrees in atmospheric science from the Chinese Academy of Sciences, Beijing, in 1990 and 1996, respectively.
Since 1997, he has been with the Cooperative Institute for Meteorological Satellite Studies, Space Science and Engineering Center (SSEC), University of Wisconsin–Madison, Madison, WI, USA, where he was involved in advanced imager/sounder data processing and applications, especially on the synergistic use of high-spatial-resolution imager data and high-spectral-resolution infrared sounder data for deriving atmospheric temperature and moisture profiles, cloud and aerosol/dust properties, as well as surface properties. He was involved in the methodologies for improving the assimilation of hyperspectral infrared sounder measurements for tropical cyclone forecasts in regional numerical weather prediction models, and also in the development and application of a retrieval methodology for processing the Advanced TOVS data from the NOAA-15/16/17/18/19 polar-orbiting satellites and the geostationary sounder data from GOES-8/9/10/11/12/13/14/15 satellites to obtain the real-time atmospheric soundings and derived products. He is currently the Principal Investigator with SSEC, University of Wisconsin–Madison, where he is involved in several GOES/POES-related projects, including International ATOVS Processing Package, GOES-R trade studies, GOES-R legacy profile product development, GOES-R high-impact weather studies, JPSS application for tropical cyclone forecasts, and regional OSSE for future sounding measurement systems.
