
Data Driven Growing in Practice

Whitepaper

www.letsgrow.com
Introduction

Data is a hot topic in the horticultural sector. More and more companies are implementing data-based solutions. Much new research and development is based on data, and vacancies for data analysts in horticulture are starting to appear. But why has data become so essential? Is data analysis really that profitable? Which data should be collected? How important is data quality? How do you start with data analysis? Which tools should be used?
To answer these questions, understanding the fundamentals of data analysis is crucial. This whitepaper discusses some important requirements for getting started with data-driven growing in practice.

Glossary

1. Data cleaning: the process of detecting and correcting inaccurate values from a dataset
2. Descriptive statistics: values that summarize a dataset
3. Information: analysed data in a specific context
4. Linear interpolation: constructing new data points by assuming that there is a straight line between
known data points
5. Metadata: data that describes other data
6. Raw data: any data that is not processed either manually or by computer software
7. SMART criteria: goals which are Specific, Measurable, Achievable, Relevant and Time-bound
8. Typo: a small mistake in a text or number made when it was entered
9. Heatmap: a visualization to show spatial differences in a greenhouse, such as air temperature

Data collection

To get started with data-driven growing, data needs to be available. Therefore, the first step is the collection of data. Where does this data come from, and how should it all be stored?

Sensor-generated data

A modern greenhouse is equipped with a computer system for climate control. The climate computer controls the installations and equipment with the use of sensors. Besides the sensors needed by the climate computer for climate control, there can be additional sensors installed in a greenhouse. For instance, a network of wireless sensors for measuring variation in greenhouse temperature or a thermographic camera for recording differences in plant temperature.

Sensors collect data that can give useful insights into the growth strategy. Data analysis can help to find the optimal strategy, but only if the sensor-generated data is of high quality. For example, comparing greenhouse conditions from this year with those from last year requires regular calibration of the sensors. Another aspect is the absolute accuracy of a sensor.

Example of sensor-generated data: a thermographic photo which shows the differences in plant temperature

Using high-quality sensor-generated data will prevent drawing false conclusions at the end of an analysis.

When comparing data from two different types of air temperature sensors, it is important to check if the accuracy is within a similar range. Otherwise, a correction is needed, or the focus must be on trends instead of absolute values.
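The correction mentioned above can be illustrated with a minimal sketch. It assumes that the two sensors were read at the same moments and that they differ by a roughly constant offset; the function names are hypothetical and real sensors may need a more elaborate calibration.

```python
from statistics import mean

def mean_offset(reference, other):
    """Average difference (other - reference) over paired readings."""
    return mean(o - r for r, o in zip(reference, other))

def apply_offset(values, offset):
    """Shift readings so that they line up with the reference sensor."""
    return [v - offset for v in values]

# Made-up paired readings in °C from two sensor types at the same spot.
reference_sensor = [20.0, 21.0, 22.0]
other_sensor = [20.5, 21.5, 22.5]

offset = mean_offset(reference_sensor, other_sensor)
corrected = apply_offset(other_sensor, offset)
```

If the difference between the sensors is not constant, a single offset is not enough, and comparing trends rather than absolute values is the safer option.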

Crop registrations

To place the sensor-generated data in the right context, additional data has to be gathered from the crop. Crop registration is usually done manually, once per week. Crop registrations give further insight into crop growth and development. With data from crop registrations, the effects of short- and long-term changes in climate conditions become visible, and the right conclusions can be drawn. Moreover, if the climate strategy was changed because of a crop observation, the effectiveness of this adjustment should be questioned.

Data entry app for crop registrations

To be able to compare the observations from the crop with the climate registrations, the crop measurements should be taken close to the aspirator box and other relevant sensors. Just like the sensor-generated data, the quality of crop registration is critical. In contrast to automated data collection, manual data collection (crop registration) is subject to ‘human error’. To reduce the risk of errors, it is important to be very punctual with crop registration. Consistently measuring plant characteristics at regular intervals is the basis of a high-quality dataset.

Typos are also a common source of errors, for example when copying data from a handwritten notebook to a digital spreadsheet. Filling in the registrations with an app is therefore less error-prone. Another advantage is that a data entry app can be used for several purposes. It is possible to scout pests and diseases, but also to take photos of pests or other remarkable crop observations.

Metadata

During data collection, it is also important to focus on the metadata. This is data that provides information about other data. For example, a spreadsheet column with the name “temperature” raises many questions. Is it the greenhouse temperature or the outside temperature? What is the unit, Celsius or Fahrenheit? How and where is it measured? Having this information prevents misconceptions about the context of the data. Creating a high-quality dataset takes time and commitment but is indispensable for a correct analysis. After all, “garbage in is garbage out!”
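One possible way to keep the answers to the questions above together with the data is sketched below. The class, its field names and the example values are all illustrative assumptions, not part of any particular platform.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMetadata:
    """Describes one column of a dataset (all field names are illustrative)."""
    name: str       # column name as it appears in the spreadsheet
    quantity: str   # what is measured
    unit: str       # "°C" or "°F" in this sketch
    location: str   # where it is measured
    method: str     # how it is measured

temperature = ColumnMetadata(
    name="temperature",
    quantity="greenhouse air temperature",
    unit="°C",
    location="aspirator box, section 1",
    method="climate computer sensor",
)

def to_celsius(value: float, meta: ColumnMetadata) -> float:
    """Convert a reading to Celsius using the unit recorded in the metadata."""
    if meta.unit == "°C":
        return value
    if meta.unit == "°F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown temperature unit: {meta.unit!r}")
```

Because the unit travels with the column, a later analysis step can convert values explicitly instead of guessing what “temperature” means.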

Storage and Privacy

The rate at which new data is created is still increasing. It is often said that 90% of the data stored worldwide was collected within the past two years. In advanced greenhouses, too, more and more data is collected. For a climate computer in an advanced greenhouse, roughly 7,500 data samples are collected per hectare per day, and this number is increasing rapidly. Just think about data generation by new wired and wireless sensors, (thermal) cameras, packing machines and harvesting robots. However, how do we store this huge amount of data? There are different storage media available.

The greenhouse temperature measurement from an aspirator box is stored locally on the hard drive of the climate computer. Data from wireless sensors can be stored in the cloud on multiple servers. Crop registrations can be written down in a notebook or filled into a spreadsheet stored on a local computer or on an external disk. All these media have their own storage location and format; think about the different date and time formats or separators. Because of this incompatibility, the use of data from different sources is often problematic. The solution is a central data platform. By using one platform with one standard format for data storage, the risk of errors is greatly reduced, which makes data analysis much more efficient. Using one platform for all data sources also makes it easier to manage all the data at once.
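The incompatibility of date formats and separators can be made concrete with a small sketch that brings values from different sources into one standard form. The list of formats is an assumption chosen for illustration; a real platform would need to know the formats its sources actually use.

```python
from datetime import datetime

# Hypothetical formats that different sources might use for the same moment.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M", "%d-%m-%Y %H:%M", "%d/%m/%Y %H:%M")

def parse_timestamp(text: str) -> datetime:
    """Try each known format in turn; fail loudly on anything unexpected."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")

def parse_number(text: str) -> float:
    """Accept a decimal comma ("21,5") or a decimal point ("21.5").

    Assumes the text contains no thousands separators.
    """
    return float(text.strip().replace(",", "."))
```

Raising an error on an unknown format is a deliberate choice here: silently guessing between day-first and month-first dates would introduce exactly the kind of error a standard format is meant to prevent.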

The role of a central data platform

When the amount of data increases, storage becomes more and more complex. There are several key factors regarding data storage, such as capacity, performance, accessibility, and security. Significant amounts of data have to be stored, and this data needs to be accessible 24/7 by multiple users worldwide. It requires specific knowledge to set up and maintain a data storage environment which meets these requirements.

Besides the technical challenges of data storage, ethical aspects such as privacy are also significant. In this situation, the discussion is not about the privacy of search history on Google, Facebook or YouTube but about the business strategy of a company. Important subjects in the debate about data privacy are, for example, ownership, users and sharing. In this discussion, the user is not always automatically the owner of the data.

The producer of the sensor, or the owner of the platform on which the data is stored, could claim that they are also owners of the data and that they are allowed to use it for other purposes. Therefore, you should use a transparent platform that clearly states that the client is the owner of the client's data. The client should be the only one to decide who has access to the data and with whom the data is shared.

Data cleaning

As discussed, sensor calibration and consistent crop registrations are essential for good data quality. Before the dataset can be used for analysis, the data must be cleaned. The idea of data cleaning is simple; it is the process of making raw data “clean”. Even a calibrated sensor can break down and cause errors in the dataset that should not be taken into account when calculating descriptive statistics such as the mean value. Therefore, data cleaning is required to improve the quality of the dataset.

Data cleaning can be done with all kinds of tools, scripts or algorithms. Before doing this, it is important to think about which data is needed for the analysis. Keeping the dataset as small as possible can save much time. When all the required data is selected, it is time to focus on the actual data cleaning.

First, you have to get an idea of what the data looks like. A good start is to calculate some descriptive statistics such as mean, minimum and maximum values. These statistics can be compared with prior knowledge about the dataset. Therefore, it is crucial that the person responsible for data analysis has some background knowledge.

For example, if the maximum value for greenhouse relative humidity in the dataset is 10, then some column names are probably mixed up, such as humidity deficit with relative humidity. These statistics are helpful to detect possible errors. A second approach for data cleaning is to visualize the data. A simple graph with date and time on the X-axis gives a lot of information. Do the hourly, daily or even yearly fluctuations match your hypothesis? Is the outside temperature higher during the summer period? By answering these simple questions, errors can be detected in the dataset.
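The first check described above, comparing descriptive statistics with prior knowledge, could look roughly like the following sketch. The plausible ranges are made-up assumptions; real limits depend on the crop, the sensor and the site.

```python
from statistics import mean

# Hypothetical plausible ranges; real limits depend on crop, sensor and site.
PLAUSIBLE_RANGES = {
    "relative_humidity_pct": (30.0, 100.0),
    "greenhouse_temp_c": (5.0, 45.0),
}

def describe(values):
    """Return the descriptive statistics named in the text."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}

def looks_plausible(column, values):
    """True if the column's minimum and maximum fall inside its expected range."""
    low, high = PLAUSIBLE_RANGES[column]
    stats = describe(values)
    return low <= stats["min"] and stats["max"] <= high

# A column labelled relative humidity whose values never exceed 10 is suspicious.
suspect = [3.2, 4.8, 6.1, 10.0, 7.4]
is_ok = looks_plausible("relative_humidity_pct", suspect)
```

A failed check does not say what went wrong, only that the column deserves a closer look, for instance by plotting it against time.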

Example of a graph showing incorrect data

After creating a clear view of the observed errors in the dataset, it is time to find a solution. There are numerous possible dataset quality problems, but most of them concern incorrect and incomplete data. An example is shown in the graph above, where there is an outlier in the daily radiation sum because the sum was calculated over two days. Different techniques can be used to deal with these problems in order to make the dataset as clean as possible.

On the one hand, replacing incorrect or missing values with new values is a good way to go. When replacing an incorrect or missing value, the question is always how certain and representative the new value is. If there is a gap of five minutes in the greenhouse temperature, you can safely use the average of the values five minutes before and five minutes after the gap (linear interpolation).

However, if the gap with missing values is large, let's say a few hours, it might be better to remove (not use) the rows with missing values, even if this leads to a gap in the time series, which in turn can make the dataset useless for some analyses. When examining, for example, the relationship between greenhouse temperature and daily radiation sum, it might even be better to remove whole days with incorrect values. Also think about the days during which the old crop has been removed and the new crop has been planted.

There is no clear guidebook for data cleaning. Every dataset has its issues, and the possible solutions depend mainly on the goal of the analysis. Eventually, the data analysis should provide correct and reliable conclusions.
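The rule of thumb for gaps (interpolate short gaps, leave long gaps out) can be sketched as follows. The function name and the `max_gap` parameter are assumptions for illustration; missing samples are represented as `None` in an evenly spaced series.

```python
def fill_short_gaps(values, max_gap=1):
    """Linearly interpolate runs of None no longer than max_gap samples.

    Longer runs, and runs touching either end of the series, are left as None
    so that they can be excluded from the analysis instead of being guessed.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                      # first missing index of this run
        while i < n and filled[i] is None:
            i += 1
        end = i                        # first index after the run
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue                   # no neighbour on one side, or gap too long
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap + 1)
        for k in range(gap):
            filled[start + k] = left + step * (k + 1)
    return filled

# One missing 5-minute sample is filled; a longer gap is left untouched.
short_gap = fill_short_gaps([20.0, None, 21.0])
long_gap = fill_short_gaps([20.0, None, None, 22.0])
```

For a single missing sample, the interpolated value is simply the average of its two neighbours, which matches the five-minute example in the text.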

Data analysis

Data analysis is the step that converts a good-quality dataset into results. However, data analysis in itself is a broad concept. First of all, it is essential to define a clear goal and objectives to achieve good results. It is recommended to test the defined goals and objectives against the SMART criteria. This prevents vague goals such as “optimizing growth” or “saving energy”. Creating goals and objectives prevents getting stuck in data analysis.

In general, there are four types of data analysis, each related to one question: descriptive (what happened?), diagnostic (why did it happen?), predictive (what will happen?) and prescriptive (what is the best that could happen?). Answering the last question requires a more complex analysis than answering the first, but the business value of those answers is also greater. The results of each data analysis can be different, depending on the goal and type of analysis. The outcome can be a dashboard with graphs to show the results, a report with statistical analysis or even a predictive machine learning model. This whitepaper focuses on sense and response: what happened and why did it happen?

Different types of data analysis

Graphs

Visualization is an important aspect of descriptive analytics. Through visuals such as graphs and diagrams, it is possible to share and present the results of an event transparently.

There are different types of visualizations, such as line charts, bar charts, histograms and scatterplots, which can all be very useful but not in every situation. The appropriate visualization can be selected based on the data type, the number of variables, and whether the dataset is chronologically ordered or not.

A good visualization can be interpreted correctly without any additional information. This requires meaningful axis names, a title, and an explanation of the different colours or line/dot types (legend).

In most cases, a simple line chart in a spreadsheet is informative enough, for example to show the yearly variation in radiation sum. In the case of large datasets, which contain huge amounts of information, data visualization becomes an art project in itself.

To make data analysis more clear, let’s take a look at the following example. A grower observes that the production at location A is always higher than at location B. Now the grower wants to know why the production was higher at location A by using data analysis. In this situation, a line graph is a good start for visualizing what happened. The difference in production could be related to greenhouse climate. For example, plotting greenhouse temperatures for a month can show that there is a difference between these two locations.

A line graph with the greenhouse temperature of location A (red) and location B (blue)

However, it can be challenging to find trends in the line graph. For example, on some days the greenhouse temperature can be higher at location A, and on other days at location B. Specific tools have been developed to extract trends from a line graph. One example is a tool which groups data per hour. With this method, it becomes clear at what time of the day the average temperature at location B deviates from the average temperature at location A.

The greenhouse temperature of two locations grouped per hour. In May, the temperature at location A (red) was on average lower during the night and higher during the day.
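Grouping per hour, as described above, amounts to averaging all readings that share the same hour of the day. The sketch below shows the idea with made-up readings for two hypothetical locations; it is not the implementation of any specific tool.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def hourly_means(samples):
    """Average (timestamp, value) samples per hour of the day (0-23)."""
    by_hour = defaultdict(list)
    for timestamp, value in samples:
        by_hour[timestamp.hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}

def hourly_difference(samples_a, samples_b):
    """Mean of A minus mean of B for every hour present in both series."""
    a, b = hourly_means(samples_a), hourly_means(samples_b)
    return {hour: a[hour] - b[hour] for hour in a if hour in b}

# Made-up readings (°C) for two days at two hypothetical locations.
location_a = [(datetime(2023, 5, 1, 3), 16.0), (datetime(2023, 5, 2, 3), 17.0),
              (datetime(2023, 5, 1, 14), 26.0), (datetime(2023, 5, 2, 14), 27.0)]
location_b = [(datetime(2023, 5, 1, 3), 17.5), (datetime(2023, 5, 2, 3), 18.5),
              (datetime(2023, 5, 1, 14), 25.0), (datetime(2023, 5, 2, 14), 25.0)]

difference = hourly_difference(location_a, location_b)
```

In this made-up example the difference is negative at night and positive in the afternoon, the same kind of pattern the grouped graph makes visible.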

When looking at a line chart, it can be hard to detect deviations, especially when having many measurements. It is possible that the average greenhouse temperature at two locations is quite similar, but there can be a difference between single measurements. Combining all this data in one line graph is a so-called “spaghetti graph”.

A “spaghetti graph” with a lot of single measurements

Therefore, it is possible to display only the data that deviates from the average of all measurements in a graph. A set margin can be visualised as a shaded area in the graph. Only lines outside the margin are visible, and these mark the moments which require extra attention. With this tool, it is possible to see directly when the temperature of a single measurement deviates from the average greenhouse temperature by more than the set margin.

The shaded area shows the average (including a margin) of lines in the graph above. The lines outside the
shaded area represent deviations
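The selection behind such a graph can be sketched in a few lines: compute the average of all measurements per time step and keep only the readings that fall outside the margin. The sensor names, readings and margin below are illustrative assumptions.

```python
from statistics import mean

def deviations_outside_margin(series, margin):
    """For each time step, list the sensors outside mean ± margin.

    `series` maps a sensor name to a list of readings; all lists are assumed
    to have the same length and to be aligned in time.
    """
    names = list(series)
    length = len(series[names[0]])
    flagged = []
    for t in range(length):
        average = mean(series[name][t] for name in names)
        outside = [name for name in names
                   if abs(series[name][t] - average) > margin]
        flagged.append(outside)
    return flagged

# Made-up temperature readings (°C) from three hypothetical sensors.
readings = {
    "sensor_1": [20.0, 20.5, 21.0],
    "sensor_2": [20.2, 20.4, 21.2],
    "sensor_3": [19.8, 23.0, 20.8],
}

flagged = deviations_outside_margin(readings, margin=1.0)
```

Only the second time step is flagged here, because one sensor lies more than the margin away from the average of all three.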

Although graphs can be helpful by directly visualizing essential information, they can also be misleading, especially when the y-axis is scaled wrongly. A scale that is too large will mask differences, while a scale that is too small will overemphasize minor differences. Furthermore, adding too much information to a graph makes it hard to draw the right conclusions. “A diagram must tell a single story and not multiple stories at once”.

Dashboards

To observe “what happened”, making graphs is essential. For the next step, “why did it happen”, it is vital to combine data from different sources. This is where a dashboard is useful. A dashboard makes it possible to connect data from crop registrations with sensor-generated data. It is also possible to combine different types of visualizations, such as gauges, graphs, and a heatmap. A heatmap visualises the temperature distribution on a map of the greenhouse based on data from multiple (wireless) temperature and RH sensors. A heatmap can also be used for visualising the development of pests and diseases in the greenhouse.

A dashboard can provide answers to several questions at a single glance. For example, it can answer why there is a difference in yield between two greenhouse locations. There can be multiple answers to this question, but a dashboard makes it much easier to find those answers. A logical explanation can be that the two greenhouses were using different climate strategies. Typically, the climate strategy involves multiple interrelated factors. Therefore, it is essential to monitor the climate in such a way that this strategy becomes visible. This is explained in the next section.

Example of a dashboard showing different types of visualizations

Climate Monitor

To achieve high productivity in combination with efficient use of resources, it is essential to choose the optimal settings for controlling the abiotic factors in the greenhouse environment, such as temperature, humidity and CO2. This is also called the climate strategy. However, deviations often occur in practice. Such deviations cause a less ideal greenhouse climate and are not beneficial for crop growth and development. How can these deviations be tracked? And how can the quality of the realised climate be assessed against the climate strategy?

The crop experiences the growth climate as a combination of light, temperature, humidity, and CO2. At each radiation level, the other factors should be within specific limit values to form the best combination of conditions for photosynthesis and growth. This ensures that the plant remains in balance, with optimum production and quality. For this reason, the basic concept of the Climate Monitor is to focus on the right mix of growth factors in the greenhouse. Its visualisation provides quick and accessible insight into the quality of the greenhouse climate. It is possible to see at which moments which limit values are exceeded and, therefore, where improvements in climate control can be achieved.

Climate Monitor graph, showing the degree of balance of the growing climate. If the green vertical line reaches 100%, all growth factors are within
the specified limits, so the climate is in balance.
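The idea of checking whether the growth factors are within limits at a given radiation level can be illustrated with a simplified sketch. This is not the algorithm of the Climate Monitor: the radiation bands, the limit values and the simple "share of factors within limits" score are all assumptions made for illustration.

```python
# Hypothetical limits per radiation band (W/m²); not taken from the Climate Monitor.
LIMITS_BY_RADIATION = [
    # (upper bound of the radiation band, {factor: (low, high)})
    (200.0, {"temp_c": (17.0, 21.0), "rh_pct": (70.0, 90.0), "co2_ppm": (400.0, 700.0)}),
    (600.0, {"temp_c": (20.0, 25.0), "rh_pct": (65.0, 85.0), "co2_ppm": (600.0, 1000.0)}),
    (float("inf"), {"temp_c": (23.0, 28.0), "rh_pct": (60.0, 80.0), "co2_ppm": (700.0, 1000.0)}),
]

def limits_for(radiation):
    """Return the limit values that apply at this radiation level."""
    for upper, limits in LIMITS_BY_RADIATION:
        if radiation <= upper:
            return limits
    raise ValueError(f"no limits defined for radiation {radiation!r}")

def balance_percentage(radiation, measured):
    """Share of growth factors (in %) that lie within their limits."""
    limits = limits_for(radiation)
    within = sum(low <= measured[name] <= high
                 for name, (low, high) in limits.items())
    return 100.0 * within / len(limits)

in_balance = balance_percentage(450.0, {"temp_c": 22.0, "rh_pct": 75.0, "co2_ppm": 800.0})
too_warm = balance_percentage(100.0, {"temp_c": 24.0, "rh_pct": 75.0, "co2_ppm": 500.0})
```

In this sketch a score of 100% means that every factor is within its limits for the current radiation level; a lower score points to the factors that need attention.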

Plant Balance RTR

Besides monitoring the quality of the greenhouse climate, it is essential to get insight into the assimilate balance of the plant. It is known that the production of assimilates depends mainly on the total light sum per day, and that the plant growth rate largely depends on the average 24-hour temperature. The development of the plant, therefore, is determined mainly by the Radiation-Temperature-Ratio (RTR). With Plant Balance RTR, it is possible to visualise and monitor the assimilate balance and to see when and why the realised RTR deviates from the desired strategy. The ultimate goal is to have a balanced plant because this results in optimal production and quality and increases plant health and resilience.

Visualization of the Radiation-Temperature-Ratio (RTR)

Statistics

To observe ‘what happened’ and ‘why did it happen’, it is usually enough to use graphs and dashboards in data analysis. However, if important decisions are based on the observed differences, then statistics should be used to support decision making.

Regarding statistics, it is essential to be aware of some basic concepts. One is the relationship between the effect size, the sample size, and the significance criterion. A large effect can be detected with a small sample, whereas a small effect requires a large sample. The significance criterion is the threshold below which a difference between experiments is considered statistically significant.

By knowing all three components, the statistical power of a test can be calculated. The statistical power is the likelihood that a test detects an effect when there is an effect to be detected. In a greenhouse with a large population of plants, usually only a small sample of the population is measured for crop registrations. It simply takes too much time to assess, for instance, the fruit set on every plant. However, lowering the sample size reduces the power of a statistical test, which makes it hard to detect differences, especially when the differences are small. Therefore, it is important to be aware of the effect of the sample size on the accuracy of your crop registration values.

Statistics are useful in finding the right answers to questions such as “why did something happen”. However, working with statistics can be tricky. It is vital to be careful, for example regarding the difference between correlation and causation. Making a scatterplot of two variables may show that these variables are strongly correlated. However, correlation does not imply causation.

A well-known example is the high correlation between the sales of ice creams and sunglasses. This does not imply, however, that ice cream sales cause higher sales of sunglasses. The underlying cause, in this case, is the amount of sunshine. Understanding this difference is vital for drawing any conclusions.
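The relationship between effect size, sample size, significance criterion and power can be made tangible with a small calculation. The sketch below uses a normal approximation for a two-sided comparison of two equally sized groups; this is one common simplification chosen for illustration, not a method prescribed by this whitepaper.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the difference in means divided by the common standard
    deviation; n_per_group is the number of plants measured per group;
    alpha is the significance criterion.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * sqrt(n_per_group / 2.0)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# The same medium-sized effect, measured on 10 or on 64 plants per group.
power_small_sample = power_two_sample(0.5, 10)
power_large_sample = power_two_sample(0.5, 64)
```

With the larger sample the calculated power is clearly higher, which illustrates why small samples make it hard to detect small differences.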

Conclusion

Data analysis in horticulture is more than just making graphs in a spreadsheet. It all starts with the process of data collection, storage, and cleaning.

A high-quality dataset creates many possibilities for data analysis, such as answering the questions of what happened and why it happened. Visualising data can be essential in finding the correct answers. When combining data from multiple sources, tools such as a dashboard, the Climate Monitor or Plant Balance RTR are essential to present results in a clear way. Data-driven growing requires knowledge of data analysis. As a grower or consultant, it is essential to get in contact with the right parties that can help to make the transition to data-driven growing.

Important selection criteria are having a central data platform, knowledge of plants, practical experience in horticulture, available tools and transparency regarding privacy and data ownership. Only with the right platform is it possible to fully implement data-driven growing in practice.

Epilogue

Hoogendoorn Growth Management creates sustainable and user-friendly automation solutions for every kind of horticultural business worldwide. It provides growers with a complete solution to efficiently manage the greenhouse climate, irrigation and energy usage, regardless of the greenhouse structure, location and equipment.

The intelligent controls enable growers to produce high-quality crops and achieve maximum crop yields with minimum use of scarce resources such as water, energy and nutrients. Growth, continuity and innovation are the focus. LetsGrow.com has been operational since 2002. Together with Wageningen University Horticultural Research, we have developed models for forecasting crop yields.

However, the services of LetsGrow.com span horticulture in its entirety: we support all popular brands of climate computers. Our clients have 24/7 access to real-time data all around the world. The fact that there are already 1000 greenhouse growers with a subscription to LetsGrow.com is a testament to this. This includes both individual companies and major growers' associations. Both Hoogendoorn and LetsGrow offer the possibility to collect and analyse cultivation-related data.

Via an online platform, you can analyse data regarding, for example, greenhouse climate, crop, labour and energy consumption. Graphs and dashboards can easily be created to gain insight into the cultivation data. Calculations on real-time data can also be performed to deliver even more insight. This way, data is turned into meaningful information.

Westlandseweg 190
3131HX Vlaardingen
The Netherlands
T +31 10 460 81 08
Info@letsgrow.com

Westlandseweg 190
3131HX Vlaardingen
The Netherlands
T +31 10 460 80 80
Info@hoogendoorn.nl
