Version 2020.11
MASTER: How to Choose the Right Dataset for Evaluation of Solar Projects
Table of Contents
01 Why is choosing the right solar dataset important?
• Not all solar datasets are equal
• What are the most common problems?
02 Current approaches for selecting datasets
• Weighted average vs median approach
• Real hourly vs synthetic hourly data
• Why is Typical Meteorological Year (TMY) not sufficient?
03 MASTER: 6 key properties to consider when choosing a reliable solar dataset
• Made for Solar
• Available
• Spatial resolution
• Time resolution
• Extensive validation
• Representative in time
04 Applying the selection criteria, plus a bonus tip!
• Comparison of the 7 most popular datasets
• Who are the winners?
© 2020 Solargis 2
01 Why is choosing the right solar dataset important?
Some data providers even offer data from multiple model versions at the same time (for example v1.1, v1.2, v2, etc.), further adding to the confusion.

Such an extensive choice of datasets creates several problems:

3. Lower productivity: Developers, technical advisors, and EPCs exchange endless emails and phone calls to agree on the choice of dataset. The process of downloading data from multiple websites and comparing them in spreadsheets takes hours for project engineers, wasting valuable time.
02 Current approaches for selecting datasets
There is not yet a standardised approach for identifying the best solar and meteorological dataset. Two approaches commonly used by technical advisors to justify estimates of solar resource are:

1. The weighted average approach, which takes a weighted mean of monthly averages of solar resource and air temperature data from multiple sources. Weights are assigned to the different datasets based on parameters such as spatial resolution and temporal coverage. The main argument for this method is that the weighted mean should give a resource estimate with lower uncertainty than that of a single dataset.

2. The median approach, where long-term monthly and yearly values from multiple data sources are compared. If any data source is inconsistent with the other datasets, it is identified as an outlier and removed from further analysis. From the remaining datasets, the one with the median GHI value is selected as input for yield estimation.

Whilst, at first glance, these approaches might seem more reliable than relying on a single data source, both have limitations and their use is strongly discouraged.

The weighted mean approach requires generating synthetic hourly data from monthly averages. Using a synthetically generated typical-year dataset creates multiple limitations for the evaluation of a solar project, as summarised in the table below. This approach is subjective and results in a dataset that can always be disputed by stakeholders in the project evaluation process. It reduces both transparency and efficiency in the evaluation of solar projects.

Table: Real hourly data vs synthetic hourly data, compared on:
• Minimum uncertainty
• Interannual variability of solar resource and expected generation
• Optimisation of PV system design
• Accurate revenue planning and self-consumption analysis
• Validating the models and adapting them to local conditions using high-quality ground measurements
• Calculating consistent metrics over the entire project lifetime
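To make the weighted average approach concrete, here is a minimal sketch in Python. The source names, monthly GHI values and weights are all hypothetical; the text only says that weights reflect parameters such as spatial resolution and temporal coverage, so how they are scored here is an assumption.

```python
from typing import Dict, List

# Hypothetical monthly GHI averages (kWh/m2 per month) from three data sources.
MONTHLY_GHI: Dict[str, List[float]] = {
    "source_a": [80, 95, 140, 170, 200, 210, 215, 195, 160, 120, 85, 70],
    "source_b": [78, 92, 138, 172, 205, 214, 220, 198, 158, 118, 82, 68],
    "source_c": [85, 100, 145, 165, 195, 205, 210, 190, 165, 125, 90, 75],
}

# Hypothetical weights, e.g. scored from spatial resolution and temporal coverage.
WEIGHTS: Dict[str, float] = {"source_a": 0.5, "source_b": 0.3, "source_c": 0.2}


def weighted_monthly_mean(data: Dict[str, List[float]],
                          weights: Dict[str, float]) -> List[float]:
    """Weighted mean of each calendar month across all sources.
    Weights are normalised, so they need not sum to 1."""
    total = sum(weights[name] for name in data)
    n_months = len(next(iter(data.values())))
    return [
        sum(weights[name] * values[m] for name, values in data.items()) / total
        for m in range(n_months)
    ]


blended = weighted_monthly_mean(MONTHLY_GHI, WEIGHTS)
print([round(v, 1) for v in blended])
print(f"Annual GHI: {sum(blended):.0f} kWh/m2")
```

The result is only twelve monthly numbers. To feed an hourly yield simulation they must be expanded into a synthetic hourly series, which is the step identified above as the main weakness of this approach.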
It is also worth noting that the median approach may fail to select the best models precisely because they capture what other models cannot. For example, an analysis of GHI estimates from multiple models for the Salt Lakes in South Australia shows that Solargis is the outlier. However, Solargis is the only data source that is able to distinguish between clouds and highly reflective surfaces. Employing the median approach in this case might lead to a suboptimal dataset being selected.
Image credit: J.K. Copper and A.G. Bruce, UNSW (2019)
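The median approach can be sketched as follows. This is a minimal illustration, not a standard procedure: the model names and annual GHI values are hypothetical, and the outlier rule (more than 3 % relative deviation from the median of all sources) is an assumption chosen for the example.

```python
import statistics
from typing import Dict, Optional

# Hypothetical long-term annual GHI (kWh/m2 per year) reported by five sources.
ANNUAL_GHI: Dict[str, float] = {
    "model_1": 2050.0,
    "model_2": 2080.0,
    "model_3": 2065.0,
    "model_4": 2100.0,
    "model_5": 1940.0,  # deviates from the rest, yet could be the most accurate
}


def select_by_median(values: Dict[str, float], tolerance: float = 0.03) -> Optional[str]:
    """Drop sources deviating more than `tolerance` (relative) from the overall
    median, then return the source holding the median of the remaining values.
    The 3 % default tolerance is an assumed outlier rule."""
    centre = statistics.median(values.values())
    kept = {k: v for k, v in values.items() if abs(v - centre) / centre <= tolerance}
    if not kept:
        return None
    # median_low always returns an actual member, so it maps back to one source.
    target = statistics.median_low(kept.values())
    return next(k for k, v in kept.items() if v == target)


print(select_by_median(ANNUAL_GHI))
```

Here model_5 is discarded by the outlier rule alone, regardless of whether it is the most accurate source, which is exactly the risk illustrated by the Salt Lakes example.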
03 MASTER: 6 key properties to consider when choosing a reliable solar dataset
M Made for Solar
The dataset must be specifically designed for simulation of solar power. Older datasets, such as NREL TMY2, TMY3 or Meteonorm, lack internal integrity and geographical representativeness. The NREL TMY2 and TMY3 datasets are created by selecting 12 representative months from a multi-year time series. However, when selecting representative months, equal weight is given to solar radiation and meteorological parameters (air temperature and wind), even though solar radiation parameters have a bigger influence on the performance of a solar power plant. Such an approach is optimal for building energy simulations but not for solar energy applications.

A Available
Availability of data covering the recent period is a feature that is often overlooked, but it is vital. If the satellite-derived data used in the development stage is updated regularly, on-site measurements can be compared with the modeled solar radiation estimates during the operation phase of the project.
This feedback is useful for:
• Re-evaluating the long-term yield
• Achieving the correct asset valuation in case of sale or refinancing of the project
• Improving the accuracy of long-term yield estimates for future projects
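To show how representative-month selection works, the sketch below ranks candidate months with the Finkelstein-Schafer (FS) statistic, the measure used by the Sandia method on which the NREL TMY procedure builds. It is simplified under stated assumptions: only four parameters, weights split evenly between solar (GHI, DNI) and meteorological (air temperature, wind) parameters to mirror the equal weighting described above, and no persistence checks. The daily data are random and hypothetical.

```python
import random
from bisect import bisect_right
from typing import Dict, List

# Simplified, assumed weights: half for solar radiation, half for meteorological
# parameters. The published NREL procedure uses more parameters and extra checks.
WEIGHTS: Dict[str, float] = {"ghi": 0.25, "dni": 0.25, "air_temp": 0.25, "wind": 0.25}


def fs_statistic(candidate: List[float], long_term: List[float]) -> float:
    """Finkelstein-Schafer statistic: mean absolute difference between the
    empirical CDFs of one candidate month and the long-term record, evaluated
    at each daily value of the candidate month."""
    cand_sorted = sorted(candidate)
    lt_sorted = sorted(long_term)
    total = 0.0
    for x in cand_sorted:
        cdf_cand = bisect_right(cand_sorted, x) / len(cand_sorted)
        cdf_lt = bisect_right(lt_sorted, x) / len(lt_sorted)
        total += abs(cdf_cand - cdf_lt)
    return total / len(cand_sorted)


def pick_typical_month(candidates: Dict[int, Dict[str, List[float]]]) -> int:
    """Return the year whose month has the lowest weighted sum of FS statistics.
    `candidates` maps year -> parameter -> daily values for one calendar month."""
    long_term = {p: [v for data in candidates.values() for v in data[p]] for p in WEIGHTS}
    scores = {
        year: sum(w * fs_statistic(data[p], long_term[p]) for p, w in WEIGHTS.items())
        for year, data in candidates.items()
    }
    return min(scores, key=scores.get)


# Demo with random daily values (hypothetical data, fixed seed for repeatability).
rng = random.Random(42)
demo = {
    year: {p: [rng.gauss(5.0, 1.0) for _ in range(31)] for p in WEIGHTS}
    for year in range(2000, 2010)
}
print(pick_typical_month(demo))
```

With half of the weight on temperature and wind, a month can be chosen as "typical" largely for its meteorological statistics even when its irradiance distribution is less representative, which is the mismatch the text describes for solar applications.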
S Spatial resolution
High spatial resolution is now a necessity, as projects are increasingly developed in mountainous terrain or at the boundary between land and water (lakes and seas).
Some data providers offer access to a low-resolution version of their datasets, e.g. 10 km × 10 km, and a higher-resolution version representing the native resolution of the satellite imagery. Note that when the lower-resolution dataset is used, the uncertainty of the estimates is higher than the advertised uncertainty of the high-resolution dataset.
Solar asset developers, owners and operators should also pay close attention to the spatial resolution of meteorological parameters. Meteorological parameters such as air temperature are often derived from numerical weather models with a coarse spatial resolution, such as 25 km × 25 km. This can increase the uncertainty of yield estimates in regions with variable terrain or close to lakes and seas. Datasets that include post-processed meteorological inputs, for example elevation-corrected and de-biased air temperature data, should be given preference when selecting a dataset for energy yield simulations.
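One common way to elevation-correct coarse model temperature is a lapse-rate adjustment to the site elevation. The sketch below assumes a constant lapse rate of 6.5 °C per km (the standard-atmosphere value); the grid-cell and site elevations are hypothetical, and actual post-processing, including de-biasing, differs between providers.

```python
# Standard-atmosphere tropospheric lapse rate: 6.5 degC per km (assumed constant).
LAPSE_RATE_C_PER_M = 0.0065


def elevation_corrected_temp(t_grid_c: float, z_grid_m: float, z_site_m: float,
                             lapse_rate: float = LAPSE_RATE_C_PER_M) -> float:
    """Shift a grid-cell mean air temperature to the site elevation.
    With a positive lapse rate, a site above the cell's mean elevation is cooler."""
    return t_grid_c - lapse_rate * (z_site_m - z_grid_m)


# Hypothetical case: a 25 km model cell averages 400 m, the site sits at 1,200 m.
t_site = elevation_corrected_temp(t_grid_c=22.0, z_grid_m=400.0, z_site_m=1200.0)
print(f"Site air temperature: {t_site:.1f} degC")
```

Left uncorrected, the 5.2 °C difference in this hypothetical case would enter the yield simulation as a systematic temperature bias.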
T Time resolution
E Extensive validation
The most accurate datasets undergo extensive validation across all geographical zones and climate regions, demonstrating low uncertainty. This is the single most important criterion for selecting the right dataset.
Many popular data sources only include typical-year files with artificially generated hourly values. These cannot be compared one-to-one with ground measurements and hence cannot be validated. When choosing between a validated dataset, whose margin of error is quantified, and a dataset that cannot be validated, the choice is clear. Applying this criterion alone significantly narrows down the long list of data sources.
Demonstrating consistent results in the validation of both GHI and DNI is critical, as it proves the integrity of the solar models used. A dataset only qualifies if its air temperature is also rigorously validated: a 1 °C bias in air temperature results in a systematic error of approximately 0.5% in energy yield.
Besides bias and RMSE, the consistency (representativeness) of modelled and measured values at sub-hourly time resolution, assessed for example with the Kolmogorov-Smirnov index, must be taken into account.
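The validation metrics named above can be computed as in the following sketch. The paired values are hypothetical, and real validation uses long sub-hourly records from many sites. The KSI here is the raw area between the two empirical CDFs, approximated on a uniform grid; published studies often normalise it, which this sketch does not do. The temperature-to-yield conversion simply applies the 0.5 % per °C rule of thumb quoted in the text.

```python
import math
from typing import List

# Rule of thumb quoted in the text: 1 degC of air-temperature bias ~ 0.5 % yield error.
YIELD_PCT_PER_DEG_C = 0.5


def bias(model: List[float], measured: List[float]) -> float:
    """Mean of (model - measured); positive means the model overestimates."""
    return sum(m - o for m, o in zip(model, measured)) / len(measured)


def rmse(model: List[float], measured: List[float]) -> float:
    """Root mean square error of paired model and measured values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, measured)) / len(measured))


def ksi(model: List[float], measured: List[float], bins: int = 100) -> float:
    """Kolmogorov-Smirnov integral: area between the two empirical CDFs,
    approximated with the midpoint rule on a uniform grid (not normalised)."""
    lo = min(min(model), min(measured))
    hi = max(max(model), max(measured))
    if hi == lo:
        return 0.0
    step = (hi - lo) / bins
    area = 0.0
    for i in range(bins):
        x = lo + (i + 0.5) * step
        cdf_model = sum(v <= x for v in model) / len(model)
        cdf_measured = sum(v <= x for v in measured) / len(measured)
        area += abs(cdf_model - cdf_measured) * step
    return area


def yield_error_pct(temp_bias_c: float) -> float:
    """Approximate magnitude of the systematic yield error caused by a temperature bias."""
    return YIELD_PCT_PER_DEG_C * abs(temp_bias_c)


# Hypothetical paired irradiance values (W/m2), for illustration only.
measured = [0.0, 120.0, 410.0, 650.0, 720.0, 540.0, 260.0, 30.0]
modelled = [0.0, 135.0, 395.0, 670.0, 700.0, 560.0, 250.0, 45.0]
print(f"bias = {bias(modelled, measured):.2f} W/m2")
print(f"RMSE = {rmse(modelled, measured):.2f} W/m2")
print(f"KSI  = {ksi(modelled, measured):.2f} W/m2")
print(f"1.5 degC bias -> ~{yield_error_pct(1.5):.2f} % yield error")
```

Bias and RMSE only compare values pair by pair; KSI compares the shapes of the two distributions. This is why a dataset can show a small bias yet still misrepresent the frequency of, for example, cloudy or clear-sky conditions.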
R Representative in time
The dataset must cover a long period, ideally 20+ years, and the data must be computed in real time so that it is available up to the present.
Length of GHI measurements              | 1 year  | 10 years  | 20 years
Typical interannual variability (STDEV) | 5–7 %   | 2–4 %     | 2 %
Uncertainty (80 % occurrence)           | 6.4–9 % | 2.6–5.1 % | 2.6 %
(Remund and Müller 2010)
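The uncertainty row of the table is consistent with multiplying the interannual standard deviation by the two-sided 80 % quantile of a normal distribution (about 1.28); for example, 1.28 × 5 % ≈ 6.4 %. The sketch below reproduces the row under that assumption of normally distributed annual GHI; it is an arithmetic check, not a statement of the exact method used by Remund and Müller.

```python
from statistics import NormalDist

# Two-sided 80 % interval of a normal distribution leaves 10 % in each tail.
Z_80 = NormalDist().inv_cdf(0.90)  # about 1.28

# Interannual variability (STDEV, %) by length of record, taken from the table above.
STDEV_PCT = {"1 year": (5.0, 7.0), "10 years": (2.0, 4.0), "20 years": (2.0, 2.0)}

for length, (low, high) in STDEV_PCT.items():
    print(f"{length}: {Z_80 * low:.1f}-{Z_80 * high:.1f} %")
```

This prints 6.4-9.0 %, 2.6-5.1 % and 2.6-2.6 %, matching the table once the upper bound for one year is rounded to 9 %.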
04 Applying the selection criteria, plus a bonus tip!
To demonstrate how to apply the above data selection criteria, we have compared 7 datasets used widely by the solar industry in North America:
(Comparison table: 7 datasets rated on validation and accuracy, long period, recent data, and spatial resolution.)
As the above comparison shows, the choice of solar datasets can easily be narrowed down to two options: Solargis and SolarAnywhere.
But how should solar asset owners choose between these two options if, for a specific region, both data sources are validated, show good accuracy, and meet the relevant criteria?
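The screening above amounts to a set of pass/fail checks, one per MASTER property. The sketch below expresses that idea in Python. Everything in it is hypothetical: the dataset names and field values are invented, the actual seven-dataset comparison is not reproduced, and the thresholds (5 km resolution, 20 years) and the mapping of "Time resolution" to "real, not synthetic, time series" are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical summary of one dataset's properties (illustrative fields)."""
    name: str
    made_for_solar: bool      # M: designed for solar simulation
    recent_data: bool         # A: updated up to the present
    resolution_km: float      # S: native spatial resolution
    real_time_series: bool    # T: real (not synthetic) hourly or sub-hourly data
    validated: bool           # E: validated against ground measurements
    years_covered: int        # R: length of the historical record


def passes_master(c: Candidate, max_resolution_km: float = 5.0, min_years: int = 20) -> bool:
    """Apply the six MASTER properties as pass/fail checks.
    The 5 km and 20-year thresholds are assumptions for this sketch."""
    return (c.made_for_solar and c.recent_data
            and c.resolution_km <= max_resolution_km
            and c.real_time_series and c.validated
            and c.years_covered >= min_years)


candidates: List[Candidate] = [
    Candidate("Dataset A", True, True, 0.25, True, True, 25),
    Candidate("Dataset B", False, False, 40.0, False, False, 30),
    Candidate("Dataset C", True, True, 10.0, True, True, 20),
]
print([c.name for c in candidates if passes_master(c)])
```

Checks like these only shortlist candidates; when more than one dataset passes, secondary factors such as those in the bonus tip decide.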
Bonus tip
Choose the solution that delivers additional commercial insights and improves the productivity
of your team. This may mean a better API or inputs for more accurate simulation of new
technologies, such as bifacial or floating solar PV systems. Factors such as higher spatial
resolution of the data, high integrity of the meteo parameters, global coverage, integration with
a wide range of software tools, and professional support are also increasingly important for solar
developers.
It’s surprising that lower-quality datasets developed in the 1990s are still being used by the solar
industry. These practices stifle the innovation and competitiveness of the solar industry, ultimately
hindering growth. This eBook shares our best-practice method, the MASTER approach, for
selecting the most reliable data for solar energy simulation. By mastering this process, we can
collectively improve profitability, transparency and productivity within the industry.
To learn how Solargis can help your business
improve your solar energy assessment,
contact us to schedule a demo.