Open Foris Calc


Version 2.0, March 2016

Contents
1. Installation
2. Calc Home
3. Settings view
3.1. User interface
3.2. Create a new survey
3.3. Sampling design
3.3.1. Estimation Engine
3.3.2. Setting up a sampling design strategy
3.3.2.1. User Interface
3.3.2.2. Reporting Unit (AOI)
3.3.2.3. Reporting Unit (AOI) with predefined stratum areas
3.3.2.4. Base Unit
3.3.2.5. Double Sampling
3.3.2.6. Two Stage Sampling with simple random sampling (SRS)
3.3.2.7. Stratified
3.3.2.8. Cluster
3.3.2.9. Reporting Unit (AOI) join
3.3.3. Test data, 1-phase sampling design
3.3.4. Test data, 2-phase sampling design
4. Data view
5. Working with R scripts
5.1. Working process
5.1. Creating a new calculation module in Calc
5.2. Editing scripts in RStudio
5.3. Base unit weight script
5.4. Plot area scripts
5.5. Calculation scripts
5.6. Calculation using external equations
5.7. Calculation using Categorical variable
5.8. Executing scripts
5.9. Export / Import workspace
Annex 1. pgAdmin: get entities with result variables from Calc database into CSV

Preface

Open Foris Calc is robust, modular, browser-based software for the analysis and
reporting of results from sampling-based natural resource assessments.
It allows expert users to write custom R modules to perform country- or
inventory-specific calculations.
The scripts can be written and debugged in RStudio, an integrated development
environment (IDE) for R.

Calc supports several types of sample plot designs and sampling methods. The input
data and metadata come from Open Foris Collect, and Calc provides a flexible way to
produce aggregated results for any defined area of interest. The aggregated results
can be visualized and analyzed with the open source software Saiku.

http://www.openforis.org

Manual compiled by Marco Piazza, Lauri Vesa, Paul Patterson, Mino Togna

1. INSTALLATION

Calc is installed on your computer through the Calc installer. Calc works together with
other free and open source software, which must be installed if not already present. The supporting
software is:
1. Java SE Development Kit (JDK 8). The JDK includes tools for running programs written in the
Java programming language on the Java platform.
2. PostgreSQL: an object-relational database management system; as a database server, its
primary function is to store data and retrieve it later, as requested by other software
applications.
3. R is a free software environment for statistical computing and graphics.
4. *Optional: RStudio IDE (Integrated Development Environment) is a powerful and productive
user interface for R.

The installation process can be carried out following these steps:

1. Download and install JDK 8
[http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html].
2. Download and install PostgreSQL 9.4 or newer
[http://www.enterprisedb.com/products-services-training/pgdownload].

The graphical installer for PostgreSQL includes the PostgreSQL server; pgAdmin III, a
graphical tool for managing and developing your databases; and StackBuilder, a package
manager that can be used to download and install additional PostgreSQL applications and
drivers. You do not need StackBuilder for OF Calc.

Note: During the installation, set the required password to ‘postgres’.

3. Download and install R 3.2 (or newer) [http://cran.r-project.org/].

3.1. Install the packages required by Calc:
install.packages('RPostgreSQL')
install.packages('sqldf')

4. *Optional: Download and install the latest version of RStudio
[https://www.rstudio.com/products/rstudio/download/].
5. Download and install Open Foris Calc:

• Open Foris Calc Installer for Windows
• Open Foris Calc Installer for Linux

Step 5 will download the following executable file:

OpenForisCalc-{VERSION}-{PLATFORM}-installer.exe

Double-click the file to launch the Calc Setup Wizard.

Click Next.

Accept the license agreement and click Next.

Specify the directory where Open Foris Calc will be installed and click Next.

Confirm the password ‘postgres’ (or enter the postgres user password you set during the
PostgreSQL installation) by clicking Next.

The Setup Wizard is now ready to begin the installation. Click Next.

Wait while the Setup Wizard installs Open Foris Calc on your computer.

Installation is now complete.

After installation, the Open Foris Calc icon appears on your desktop (for Windows users, also in
the Start menu).

Double-click the icon to launch the Open Foris Calc Control Panel.

The Java/Tomcat window will open.

Note: the Tomcat window must be kept open (it can be minimized) so that it runs in the background.

The Open Foris Calc Control Panel window will open.

Note: the Control Panel automatically launches Calc in your default browser. Calc can then be
stopped and started from the Control Panel. The Log button shows the output of the Tomcat
commands.

Calc will open in your default browser at this URL: http://127.0.0.1:8081/calc
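
Before the first run, it can help to confirm that the R packages from installation step 3.1 are present. The following is a minimal sketch in R; the helper name missing_packages is hypothetical and not part of Calc, and the install call is left commented out so that nothing is downloaded unintentionally.

```r
# Packages listed in installation step 3.1 of this manual.
required <- c("RPostgreSQL", "sqldf")

# Hypothetical helper (not part of Calc): return the packages in
# `required` that do not appear in `installed`.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All packages required by Calc are installed.")
}
```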



2. CALC HOME

Open Foris Calc is a modular, browser-based tool for data analysis and results calculation. It
allows expert users to write custom R modules to perform country- or inventory-specific calculations,
and it can host multiple surveys. Calc produces results for forest inventories with a
variety of sampling designs: single or double sampling, and cluster, random, systematic and stratified
sampling. Read more about sampling design alternatives and data handling in chapter 3.3.

The input data and metadata come from Open Foris Collect, and Calc provides a flexible way to
produce aggregated results for any defined area of interest. The hierarchical structure defined in
Collect is converted into a relational database: the entities are converted into tables and their
attributes into columns.
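
To illustrate the conversion of entities into tables, here is a small sketch in R. The entity names (plot, tree), column names and values are made up for illustration and are not taken from the Calc database schema.

```r
# Hypothetical entities from a Collect survey: plot and its child entity tree.
# Each entity becomes one table; each attribute becomes one column.
plot_tbl <- data.frame(
  cluster_id = c(1, 1, 2),
  plot_no    = c(1, 2, 1),
  stratum    = c("1", "1", "2"),
  stringsAsFactors = FALSE
)
tree_tbl <- data.frame(
  cluster_id = c(1, 1, 1, 2),
  plot_no    = c(1, 1, 2, 1),
  tree_no    = c(1, 2, 1, 1),
  dbh        = c(23.5, 31.0, 18.2, 40.1)
)

# Each tree row carries the keys of its parent plot, so the hierarchy
# can be rebuilt with a relational join.
tree_with_plot <- merge(tree_tbl, plot_tbl, by = c("cluster_id", "plot_no"))
print(tree_with_plot)
```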

Calc has four main sections: Calculation, Data, Settings and Saiku.

• Calculation, the core of Calc, is where calculation steps are set up to analyze the
data set and produce results, which are then managed and displayed in Saiku.
• Data is the area of Calc where data can be displayed (in tabular or scatter plot format).
Raw data can be sorted, filtered and managed as needed, including upload and download
of data subsets.
• In Settings, the user can upload and manage an existing survey, as well as any other
information needed to produce meaningful calculations, such as areas of interest,
sampling design specifications and models.
• Saiku is open source software through which the aggregated results can be visualized
and analyzed.

A small icon in the lower right corner shows the version of Calc: click it and a
pop-up message with the version appears.

To help you become familiar with Calc, its components and its full potential, a test
data-set is available for download:
http://openforis.org/newwebsite/tools/calc/tutorials/calc-home.html

Test data-set
A test survey named Atlantis is available for download here:
• Click to download the test data-set Atlantis

The folder test-data contains two surveys: the first is set up as one-phase sampling, the
second with a double sampling design.
This tutorial is based on the double sampling example, which requires more
set-up steps.
• Open the folder atlantis-2phases-sampling
• Download collect-backup-atlantis.zip, which contains the data and metadata of
the survey created in Collect.
• Open the folder calc

The files in the folder calc let you experiment with two different types of set-up:
automatic or manual.
Option 1. Automatic set-up: download the workspace atlantis-calc-workspace.zip and import it into
Calc. This file already contains all the necessary information (CSV files) for the forest
inventory case.
Option 2. Manual set-up: download the following CSV files:
o calc-aois.csv: contains information on the “areas of interest”.
o calc-phase1-plots.csv: contains phase-1 information on the stratum,
cluster identification number, plot, and a code indicating the area of
interest.
o calc-strata.csv: contains information on the strata included
in the survey.
o calc-volume-models.csv: contains the volume equations that will be
used in the analysis.

In the following sections you will find test file boxes that explain the CSV files and
their structure and indicate how to set up the CSV files for your own survey.

3. SETTINGS VIEW

3.1. User interface


The Settings panel is composed of two sections: Inventory metadata and Workspace.

1. Inventory metadata contains the Sampling design settings (including ‘Areas of Interest’
settings), definitions for categories used in the calculation modules, and a function to
import external equations into the calculation system.
2. Workspace allows creating a new survey, switching from one survey to another,
deleting a survey and managing surveys (import and export functions). Calc is able to
host multiple surveys.

3.2. Create a new survey

In order to use Calc for a new survey case, you should have the following files ready:
1) Backup of data from Collect (i.e. collect-data file),
2) Area of Interest (AOI) csv file (see the next chapter).

The following steps describe how the test data “atlantis” can be read into Calc.
The steps to create a new survey are as follows:
1) In the Workspace section, click the first button on the left.

2) To add a new survey workspace, click the + button.

3) Enter a name for your survey. Use only lower case letters and no spaces. Save the
survey.

4) Click the Activate button.

5) To import your Collect data, click the Collect button.

A dialogue window will appear. Any errors will be shown in
the Log section.

6) Close the window and go back to the main Settings page by clicking the left arrow in
the footer at the bottom of the screen.

Note: In the test data case, the survey “atlantis” was successfully imported and its name is
shown in the command box, indicating that this is the active workspace.

7) Go to the Sampling design page. Define your inventory type and upload the information
about the areas of interest (see chapter 3.3, Sampling design).

8) Create and test calculation scripts (see chapter 5).

9) View results in Saiku.
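
The naming rule in step 3 can be checked before typing a name into Calc. The sketch below is in R; the function name is hypothetical, and it assumes that digits and underscores are also accepted after the first letter, since this manual later uses the workspace name ‘atlantis_2phase’. Calc's own validation may differ.

```r
# Hypothetical check for a survey (workspace) name.
# Assumption: lower case letters, digits and underscores are accepted,
# the name starts with a letter, and spaces are not allowed.
is_valid_survey_name <- function(name) {
  grepl("^[a-z][a-z0-9_]*$", name)
}

is_valid_survey_name(c("atlantis", "atlantis_2phase", "My Survey"))
# TRUE TRUE FALSE
```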



3.3. Sampling design


3.3.1. Estimation Engine

In Open Foris Calc, the smallest unit of measured area is called the base unit.
Each base unit has a weight. For a given class of interest, a base unit counts with its
weight if it is in that class, and with 0 otherwise.

Area Estimator
The Calc estimate of the area of the population in any class of interest is computed from
these class weights and a factor called the expansion factor (EXPF), which is explained
below for the different sampling design strategies that Calc can handle.
The weight of the base unit must be entered by the user in the calculation step ‘base-unit-script’.
The script is written as an R module.
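
As a rough illustration of the area estimator, the R sketch below assumes that the expansion factor is the known area divided by the sum of the base unit weights, and that the class area is the expansion factor times the sum of the weights in that class. Both are assumptions made for this example, not a statement of Calc's internal formula; all values are made up.

```r
# Hypothetical base units: weight and class of interest.
base_units <- data.frame(
  weight = c(1.0, 1.0, 0.5, 1.0),
  class  = c("forest", "forest", "other", "other")
)
total_area <- 1000  # known area of the reporting unit in hectares (made up)

# Assumed expansion factor: area represented by one unit of weight.
expf <- total_area / sum(base_units$weight)

# Area estimate per class: EXPF times the sum of the weights in the class.
area_by_class <- expf * tapply(base_units$weight, base_units$class, sum)
print(area_by_class)
```

Under these assumptions the class areas always add up to the known total area, which is a quick sanity check for a base unit weight script.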

Tree Attribute Estimator

For each tree, Calc uses the measurement of the tree-level attribute and the area of the base
unit where the tree is measured.
For a given class of interest, a tree counts with its measurement if it is in that class, and
with 0 otherwise. From these values, Calc estimates the total and the mean of the
tree-level attribute in any class of interest.

The measurement is associated with one calculation step, and the user must define its formula
in the corresponding R script.
The area of the base unit must be entered by the user in the related R script, which is named
‘$entityName-plot-area.R’ (where $entityName is the name of the entity to which the
measurement is linked).

All other factors are calculated automatically by Calc.

(See Working with R scripts for practical implementation).

Sampling methods

• SIMPLE RANDOM SAMPLING / SYSTEMATIC SAMPLING

• SIMPLE RANDOM SAMPLING / SYSTEMATIC SAMPLING WITH STRATIFICATION

EXPF is calculated separately for each stratum (h). The area estimate is the sum of the
estimated stratum areas; analogously, the estimated total of a tree-level attribute is the
sum of the stratum totals.

• DOUBLE (OR TWO PHASE) SAMPLING FOR STRATIFICATION

In double sampling for stratification, the stratum areas are not known and are
calculated as a proportion of the theoretical first-phase point data.
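
The idea of deriving stratum areas from first-phase points can be sketched in R as follows. This assumes that each stratum area is the total area multiplied by the share of first-phase points that fall in the stratum; the data and the total area are made up, and Calc's own computation may include further terms.

```r
# Hypothetical first-phase points: one stratum code per point.
phase1 <- data.frame(
  stratum = c("1", "1", "1", "2", "2", "3", "3", "3", "3", "3"),
  stringsAsFactors = FALSE
)
total_area <- 5000  # hectares (made up)

# Assumed rule: stratum area = total area x proportion of first-phase
# points that fall in the stratum.
proportion   <- table(phase1$stratum) / nrow(phase1)
stratum_area <- total_area * as.numeric(proportion)
names(stratum_area) <- names(proportion)
print(stratum_area)
```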

3.3.2. Setting up a sampling design strategy

3.3.2.1. User Interface

The sampling design UI has the following tool buttons:

• Upload data from a CSV file.
• View current data.
• Close current data view.
• Scroll to previous/next edit section.

Calc can handle different types of sampling strategies: Random, Systematic, Double (Two-Phase)
Sampling, Two-Stage, Cluster and Stratified. Within these types, you may then apply the point or
area method [1]. Technically, random and systematic sampling methods are handled in the same
way in the settings.
To set or edit a sampling design, click the edit button and the system will switch to the sampling
design edit mode.

The sampling design section has a question-and-answer type of user interface: choices are made by
clicking the appropriate button, which turns green to indicate that the selection has been
confirmed/activated.
The steps to follow are described below.

[1] The line method is not currently supported.

3.3.2.2. Reporting Unit (AOI)

(Reporting Units were called Areas of Interest (AOI) in Calc version 1.)

The first step is to upload a Reporting Unit CSV file.

This AOI file contains data on the main reporting area associated with the survey: it can be
the total area of a country and its subdivisions into regions, provinces, etc.
Each level of the Reporting Unit must be represented by two columns, containing the
respective code and label. The last level must, in addition, contain the area.

An example.
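
A hypothetical two-level file (Country, Region) can be built and checked in R as sketched below. The column names and values are assumptions for illustration and are not taken from the test data; only the layout (a code and a label per level, plus an area on the last level) follows the description above.

```r
# Hypothetical Reporting Unit file with two levels (Country, Region).
# Each level has a code column and a label column; the last level also
# has an area column (here in hectares).
csv_text <- "country_code,country_label,region_code,region_label,region_area
0,Exampleland,1,North,120000
0,Exampleland,2,South,80000"

aoi <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The area of the top level can be derived as the sum of the areas of
# its lowest-level units.
country_area <- aggregate(region_area ~ country_code + country_label,
                          data = aoi, FUN = sum)
print(country_area)
```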

Click the upload button to import the Reporting Unit CSV file.
You will then be asked to enter captions for the area levels, which could, for example, be
set as: Level 1 = Country; Level 2 = Region.

Then click Import.

View the current settings.

The screen will show a graphical representation of the areas covered by the survey. The biggest
circle indicates Level 1 (Country), while the smaller circles indicate Level 2 (Regions).
The radius of each circle is proportional to the area of the unit it represents.

If you move the cursor over a circle, its area will appear (in hectares).

Go back to the Sampling design view.

Click the right arrow button.

3.3.2.3. Reporting Unit (AOI) with predefined stratum areas

In the case of stratified sampling where the stratum areas are known, these areas are given in the
same CSV file as the Reporting Unit (AOI) labels. The following table shows a forest
inventory case where the reporting units (AOI) are provinces in Zambia and the stratification is
based on FAO-FRA classes (i.e., Forest, Other Wooded Land, Other Land, Water), whose
areas are taken from an auxiliary data source, such as remote sensing data. This forest inventory
uses 2-stage stratified (cluster) sampling. In this case only the areas of the strata are given (in
hectares, for example).

3.3.2.4. Base Unit

The base unit is the first level of aggregation (i.e. a plot, subplot, or a section within a plot).
In this step, the user must select the entity that represents the Base Unit. With test data
‘atlantis’: select plot.

The following screen will appear.

Click on right side arrow button.

Optionally, to use area weighting when the total area per stratum or reporting unit is known,
select ‘apply area weighted method (by base unit area)’.

3.3.2.5. Double Sampling


In double (or two-phase) sampling, a sample of units is first selected to obtain
auxiliary information only, and then a second sample is selected in which the variable of
interest is observed in addition to the auxiliary information.

The third step requires the user to select whether or not the sampling design has two phases.
If it does, select the Double sampling button; otherwise click the next button.

At this point, the system will require the user to upload a CSV file containing the first
phase points, which will be converted into a database table.

Click the import tool button.

Calc recognizes the file structure. Before importing the file, the user should select
which columns to import and define the data type of each column, choosing from Integer,
Real or String. Integer and Real refer to numerical values (without or with decimals, respectively);
String refers to a coded value (please note that even if a value is written as a number,
it may actually be a code).
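
The same distinction between Integer, Real and String can be tried out in R before uploading a file. In this sketch the column names and values are hypothetical; the point is that a code such as ‘01’ loses its leading zero unless the column is read as a string.

```r
# Hypothetical first-phase file: cluster and plot are integers, x and y
# are real numbers, and stratum is a code that only looks numeric.
csv_text <- "cluster,plot,stratum,x,y
1,1,01,250000.5,9100000.0
1,2,02,250500.5,9100000.0"

phase1 <- read.csv(
  text = csv_text,
  colClasses = c(cluster = "integer", plot = "integer",
                 stratum = "character", x = "numeric", y = "numeric")
)

# The stratum codes keep their leading zeros because they are strings.
str(phase1)
```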

Then click on Import.

A running import window will appear, showing when the CSV file has been successfully
imported (100%). Any errors or notifications will be displayed in the Log
window.

The next step requires defining the joins between tables. The rationale for this process is
that each table should have at least one column in common with another table, and that
column will be the join (see the join between table 1 and table 2 in the Data tables and
relational joins section). In our example, under the phase 1 table select cluster and make
the join with the plot_view table by selecting ‘cluster_id’. Additional joins can be added by
clicking the small blue [+] sign. In our example, proceed by selecting plot and joining
it with ‘plot_no’.
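
The two-column join described above corresponds to the following R sketch. The column names cluster, plot, cluster_id and plot_no come from the example in the text; the table contents and the plot_area column are made up.

```r
# Hypothetical phase 1 table (all first-phase points).
phase1 <- data.frame(
  cluster = c(1, 1, 2),
  plot    = c(1, 2, 1),
  stratum = c("1", "1", "2"),
  stringsAsFactors = FALSE
)
# Hypothetical plot_view table (only the points observed in the field).
plot_view <- data.frame(
  cluster_id = c(1, 2),
  plot_no    = c(1, 1),
  plot_area  = c(0.05, 0.05)
)

# Join on both column pairs: cluster = cluster_id and plot = plot_no.
# all.x = TRUE keeps first-phase points without a field observation.
joined <- merge(phase1, plot_view,
                by.x = c("cluster", "plot"),
                by.y = c("cluster_id", "plot_no"),
                all.x = TRUE)
print(joined)
```

Joining on cluster alone would match every plot in a cluster with every plot_view row of that cluster, which is why the second join on the plot number is needed.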

3.3.2.6. Two Stage Sampling with simple random sampling (SRS)


In the two-stage sampling design the population is partitioned into groups, like cluster
sampling, but in this design new samples are taken from each cluster sampled. The
clusters are the first stage units to be sampled, called primary or first sampling units
(PSU). The second-stage units are the elements of those clusters, called sub-units,
secondary or second sampling units (SSU).

This step requires the user to select whether or not the sampling design has two stages.
If yes, select the 2 stages w/SRS button; otherwise click the next button.

At this point the system will ask the user to upload a CSV file containing the Primary
Sampling Unit (PSU) data, and to select the columns that represent the area and the number
of theoretical base units.

Afterwards, the user must select which entity represents the Secondary Sampling Unit
and define which columns are used to join it with the PSU table.

3.3.2.7. Stratified
A stratified sampling design purposely partitions the target population into two or more
non-overlapping subpopulations, called strata, which are sampled separately.

Select whether the survey uses stratified sampling. If it does, click the
‘Stratified’ button, which turns green.
Then use the 'Upload Csv' button to import the strata and select the column that
represents the stratum field in the data.

If stratum labels are not imported together with the Reporting Unit CSV file, the system
will require uploading a CSV file containing the stratum labels. In this file, the first column
defines the stratum number and second column the stratum caption (see the next image).

Next, the user must define the stratum label join, i.e. the column that will be used to
identify the stratum for each record [in case of 1-phase sampling the column has to be
present in the sampling unit table].
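
A stratum label file and its join can be sketched in R as follows. The two-column layout (stratum number, caption) follows the description above; the column names, captions and sampling unit rows are hypothetical.

```r
# Hypothetical stratum label file: first column is the stratum number,
# second column is the stratum caption.
csv_text <- "stratum,caption
1,Forest
2,Other wooded land
3,Other land"
strata <- read.csv(text = csv_text, colClasses = c("character", "character"))

# Hypothetical sampling unit table with the stratum join column.
sampling_units <- data.frame(
  plot_no = 1:4,
  stratum = c("1", "3", "2", "1"),
  stringsAsFactors = FALSE
)

# Stratum label join: attach the caption to each record.
labelled <- merge(sampling_units, strata, by = "stratum")

# Stratum codes in the data that have no label would indicate a problem.
unmatched <- setdiff(sampling_units$stratum, strata$stratum)
print(labelled)
```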

3.3.2.8. Cluster
In cluster sampling the population is partitioned into groups, called clusters. Each element
should belong to one cluster only and none of the elements of the population should be
left out.

If the survey has a cluster sampling design, click the Cluster button to turn it green, and
select which column in the data represents the cluster.

3.3.2.9. Reporting Unit (AOI) join


Select which variable is used to join the data with the lowest level of the hierarchy of the
reporting unit.

3.3.3. Test data, 1-phase sampling design

In this chapter, we continue describing how to set up a sampling design step by step, using the
test data from the Base Unit chapter (3.3.2.4). After having selected the base unit, click the
right arrow to proceed to the next step.

1. Select whether the survey is designed with double sampling (2 phases). For this
example, just click the right arrow button.

2. Select whether the survey is designed with two stages with simple random sampling (SRS). For
this example, just click the right arrow button.

3. Select whether the survey uses stratified sampling. This survey is stratified, so click the
‘Stratified’ button, which turns green.

4. Next, upload the stratum labels from a CSV file.

The system will require uploading a CSV file containing the stratum labels. In this file,
the first column defines the stratum number and the second column the stratum caption (see
the next image).

5. View the current settings.

6. Close the view.

7. Next, define the stratum label join, i.e. the column that will be used to
identify the stratum for each record [in the case of 1-phase sampling, the column has to be
present in the sampling unit table].

Select ‘stratum’ as the column that serves as the join.

Then click the right arrow for the next step.

8. Select whether the survey uses cluster sampling. For this example, click the Cluster button to
turn it green and select the column ‘cluster_id’.

Then click the right arrow for the next step.

9. In the final step, Reporting Unit join, select the column that links the input data with the
Reporting Unit (AOI) data. In this case, select the column ‘stratum’.

10. Click the Save button.

The following screen will appear.

3.3.4. Test data, 2-phase sampling design

The following steps will guide you through defining the sampling design for the ‘atlantis’ survey
data using a 2-phase sampling design in Calc. This is a stratified sampling case with plots
falling into three strata.

1. First, create a new workspace named ‘atlantis_2phase’ and read in the Collect data.

2. Go to the Sampling design pages and read in the Reporting Unit (AOI) CSV file as previously
explained in chapter 3.3.2.2.

3. Next, select the entity that represents the Base Unit. For the test data ‘atlantis’, select
plot (see chapter 3.3.2.4). Click the right arrow button.

4. Select whether the survey is designed with double sampling (2 phases). Click this
option so that it turns green.

5. At this point, the system will require the user to upload a CSV file containing the first
phase points, which will be converted into a database table (table 2). This file is
‘calc-phase1-plots.csv’. See more information in the next info box.

Click the import tool button.

The user must define the columns that will be used to join with the sampling unit table.
This is necessary to link the first phase points with their observations (e.g., see the yellow
arrows in the 'Data tables and relational joins' section).

Calc recognizes the file structure. Before importing the file, the user should select
which columns to import and define the data type of each column, choosing from Integer,
Real or String. Integer and Real refer to numerical values (without or with decimals, respectively);
String refers to a coded value (please note that even if a value is written as a number,
it may actually be a code). In our example, make the selection as shown in the image
below.

Then click Import.

6. A running import window will appear, showing when the CSV file has been successfully
imported (100%). Any errors or notifications will be displayed in the Log
window.

7. The next step requires defining the joins between tables. The rationale for this process is
that each table should have at least one column in common with another table, and that
column will be the join (see the join between table 1 and table 2 in the Data tables and
relational joins section). In our example, under the phase 1 table select cluster and make
the join with the plot_view table by selecting ‘cluster_id’. Additional joins can be added by
clicking the small blue [+] sign. In our example, proceed by selecting plot and joining
it with ‘plot_no’.

8. Continue by clicking the right arrow button to skip 2 stages w/SRS.

9. This survey is stratified. Select this option and the button will turn green.

10. Next, upload calc-strata.csv, which contains the stratification table.

You can also view this data.

In this test data, the survey contains three strata, as can be seen in the uploaded file
calc-strata.csv.

The user must define the column that will be used as the stratum label join, which identifies
the stratum for each record.

Select the field stratum as the column that serves as the join.

Then click the right arrow for the next step.

11. This survey applies cluster sampling. The user must define the column that represents
the cluster code. For the test data set, select the column ‘cluster’.

Click the right arrow button.

12. In Reporting Unit (AOI) join, set the column that links the input data with the reporting
unit (AOI) areas. The system requires the user to indicate the column that represents the lowest
level of the administrative unit hierarchy previously imported. [In the case of double sampling,
the column has to be present in the phase 1 table.]

13. Click the Save button.

If successful, the following screen will appear.

4. DATA VIEW

Data is the section of Calc where the raw data (and computed result variables) can be viewed,
visualized, sorted, filtered, etc. Two visualization options are available: data can be shown
in Table format or as a Scatter chart.

To visualize the data, the user must first select the entity to display, then choose between
[Q] and [C] to select the type of variables to display: Quantitative or Categorical.

Then click View to visualize the data.

In the case of our test survey “Atlantis”, selecting the entity tree will show all the attributes
related to the entity tree.

The attributes that you wish to display can be selected by clicking on them (the related box will
turn green). It is also possible to filter the data by clicking the filter icon to the right of each
attribute. The following screenshots show some examples of the display and sorting options
in Calc.

In the top part of the screen in Table view you can see:

• An indicator of the number of records displayed and the total count. The left and right arrows
can be used to move from one page of the tabular results to another.

• The CSV button, which downloads a CSV file of the data displayed. Click it for automatic
download.

If you need more variables from the Calc database and find it too laborious to select the
variables in the Calc Data view, you can use pgAdmin to export data. See Annex 1 for how to use
pgAdmin to export data directly from the PostgreSQL database into CSV.

5. WORKING WITH R SCRIPTS

5.1. Working process

To start from scratch, the recommended working process with Calc and RStudio can be as
follows:
1. [Calc] Create at least one result variable (i.e. calculation module) for each entity (e.g. tree,
stump, etc.) that you want to analyze. The reason is that an entity's data is read into R only if
at least one calculation module uses this entity. Creating the module also gives you a Plot
area script to edit for each entity.
2. [Calc] Export the scripts into a zip file from Calc (see chapter 5.2).
3. Open the exported zip file.
4. Open user/5-base-unit-weight.R.
5. [RStudio] Write the R script for the base unit weight (see chapter 5.3). Test that you
can run it without errors in RStudio.
6. [RStudio] Write R scripts for Plot area for each entity. Test these.
7. [RStudio] Run calc.R by using the Source toolbar button.

If the R scripts are executed successfully, they are written back to the Calc database (i.e. into
PostgreSQL).
8. [Calc] Create result variables, i.e. calculation modules. Check the order of these modules in
Calc.
9. [Calc] Export the scripts into a zip file from Calc (as in steps 2 and 3). Open the exported zip
file.
10. [RStudio] Write R scripts for the result variable(s). Test these.
11. [RStudio] When all major scripts are ready, run the entire calc.R document by using
the Source toolbar button.
12. [Calc] Create Categorical variable(s). Export the scripts to a zip file again.
13. [RStudio] Write R scripts for the Categorical variable(s). Test these.
14. [RStudio] When the scripts are ready, run calc.R by using the Source toolbar button.
15. [Calc] Go back to Calc and run the whole data processing chain.

This will create files for the Saiku repository and for viewing results in Saiku.
16. [Calc] View results in Saiku.

--------------------------

The Calc export file (zip) can contain the following script types:
1. calc.R is the main calculation module. This file is created automatically by Calc.
2. System scripts under subfolder /system.
3. Base unit weight script in user/5-base-unit-weight.R. See more at chapter 5.3.
4. User-defined scripts. These can be split into the following groups:
• 4-common.R contains user-defined common scripts as R functions (optional),
• Plot area scripts for each entity included in the calculation,
• Categorical variable scripts, and
• Result variable scripts.

In the next chapters we will look in more detail at how to write and test R scripts.

5.1. Creating a new calculation module in Calc


To create a new calculation module, click on the + icon in the Calculation panel.

You will see this window:

1. Select Type: ‘R Script’ (default).

2. Write the ‘Caption’. This is the title of the calculation module in Calc.

3. Next select the entity for which this result variable is created. [2]

4. Create a new result variable.

5. Give a name for this result variable: a unique variable name is required. Use only lower case
letters, no spaces. Note: this name is also visible in Saiku, so use informative and clear
names. See the example below.

6. When ready, save this new module. You will later write R scripts for this module in
RStudio.

As seen above, in Calc the user only creates the metadata for the calculation modules, but for
actual writing and testing of R scripts you need to use RStudio.

When creating a calculation module you will see these three options:

These options are for the following purposes:

• ‘R script’ computes a new result variable for an entity in the survey data. (Examples
of result variables: tree count, tree basal area, tree above-ground biomass, bamboo
biomass, deadwood volume, etc.) A result variable can be seen and selected in the Saiku
reporting tools, and it can be used in subsequent calculation modules (of the same entity).

• ‘External equation’ allows the user to upload external equations for the calculation
of timber volume or biomass according to tree species or any other condition. This is
done by uploading a CSV file. Read more at chapter 5.6.

• ‘Category’ allows the user to aggregate data into classes. These aggregated classes can be
reported in Saiku. Categorical variables are typically run first, followed by the other
scripts.

Aggregate function: This option is available when the survey does not have a sampling design
assigned. In that case, the result variable will be aggregated using the selected functions
(SUM/MIN/MAX/AVG/COUNT/DISTINCT-COUNT) and the results will be available in Saiku. These
functions can be used, for example, with interview type data.
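
To make the aggregate functions concrete, the sketch below computes R equivalents of SUM, MIN, MAX, AVG, COUNT and DISTINCT-COUNT for one variable. It only illustrates what each function means; how Calc computes them internally is not described here, and the entity (household) and variable (fuelwood_kg) are hypothetical.

```r
# Toy interview-type data; entity and variable names are hypothetical
household <- data.frame(
  village = c("A", "A", "B", "B", "B"),
  fuelwood_kg = c(12, 8, 20, 20, 5)
)

x <- household$fuelwood_kg
aggregates <- c(
  SUM = sum(x),
  MIN = min(x),
  MAX = max(x),
  AVG = mean(x),
  COUNT = length(x),
  DISTINCT_COUNT = length(unique(x))   # the value 20 occurs twice but is counted once
)
print(aggregates)
```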

[2] Note: Currently you cannot create a result variable for the entity that is your base unit. This
can be worked around in R by querying the Calc database tables and views directly with SQL
clauses through the sqldf library. However, such results can only be written to e.g. CSV or
graphic files.

Once a new calculation module is saved it will appear in the main panel as shown below. Calculation
modules can also be deleted by dragging and dropping them onto the trash icon on the right
side.

Each calculation module can be activated or deactivated. Click on the small icon at the top right
within each calculation step to Activate/Deactivate. In the next example the module on the left
side is active, and the second module is inactive.

Note: Pay attention to the possible dependencies between the calculation modules. If a result
variable is needed by another calculation step, the module that creates it must be active; otherwise
the calculation chain will be interrupted and result in an error.

5.2. Editing scripts in RStudio


In order to export all calculation scripts for editing in RStudio, go to the Calculation panel in Calc.
Click on the RStudio icon in the upper left corner.


This function will export all calculation scripts as a .ZIP file into the default ‘downloads’
folder. Unzip it. Then open the unzipped folder and you will see two subfolders and the file
calc.R.

Double click file calc.R: this will start RStudio and open the file. You will see the list of all
calculation modules in RStudio (see the next image).

File calc.R contains the main program code, with references to files in two subfolders:
‘system’ and ‘user’. The user-defined scripts are all stored in the ‘user’ folder. Now RStudio
can be used for editing the calculation modules and for testing and debugging the code.

In calc.R, this line sets the default folder (i.e. working directory) when running the scripts in
RStudio:
setwd('.');

If you first unzipped the package and opened RStudio by double clicking file calc.R, then you
do not need to change this line in calc.R. But if you opened RStudio first and then
opened calc.R, you need to change this command so that it refers to your unzipped R script
folder on your computer. Check this path on your computer. As an example, it may
look like this:
setwd('D:/downloads/calc-atlantis-processing-chain-20160104-134014');
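
Before changing the working directory, it can help to confirm that a folder really is an unzipped Calc export, i.e. that it contains calc.R and the ‘system’ and ‘user’ subfolders described above. The following is a minimal sketch; the helper is_calc_export is hypothetical (not part of Calc), and a temporary folder stands in for your real path.

```r
# Hypothetical helper: TRUE if 'dir' looks like an unzipped Calc export
is_calc_export <- function(dir) {
  file.exists(file.path(dir, "calc.R")) &&
    dir.exists(file.path(dir, "system")) &&
    dir.exists(file.path(dir, "user"))
}

# Simulate an unzipped export in a temporary folder (stand-in for your real path)
export_dir <- file.path(tempdir(), "calc-export-demo")
dir.create(file.path(export_dir, "system"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(export_dir, "user"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(export_dir, "calc.R"))

if (is_calc_export(export_dir)) {
  old_wd <- setwd(export_dir)   # setwd() returns the previous working directory
  print(getwd())
  setwd(old_wd)                 # restore it after the demonstration
}
```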

When you want to edit and debug a particular module in RStudio, first run all the modules that
precede it in the main program calc.R. In the following example we need to edit and
test module ‘user/6-tree-plot-area.R’. Do it as follows:
1. Select all preceding modules.

2. And run them.

3. Go to module ‘user/6-tree-plot-area.R’ in RStudio, and you can run (and debug) it line
by line.

4. When the module works without errors, save the module.

Note: saving does not write the code to the PostgreSQL (Calc) database but only into this
file. Only a successful execution of the Source command in calc.R will do this.
5. Continue with the other modules similarly.
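
The idea of running every preceding module before debugging one can also be sketched in plain R. This is an illustration only: the helper run_modules_before and the file 7-tree-basal-area.R are hypothetical, the module contents are placeholders, and how calc.R itself calls the modules is not shown here.

```r
# Simulated module files (contents are placeholders, not real Calc scripts)
demo_dir <- file.path(tempdir(), "calc-modules-demo")
dir.create(demo_dir, showWarnings = FALSE)
modules <- c("5-base-unit-weight.R", "6-tree-plot-area.R", "7-tree-basal-area.R")
writeLines("plot <- data.frame(subplot = c('A', 'B')); plot$weight <- 1",
           file.path(demo_dir, modules[1]))
writeLines("stop('module under development')", file.path(demo_dir, modules[2]))
writeLines("never_reached <- TRUE", file.path(demo_dir, modules[3]))

# Hypothetical helper: source every module listed before 'target', in order
run_modules_before <- function(dir, modules, target, envir) {
  idx <- match(target, modules)
  if (is.na(idx)) stop("unknown module: ", target)
  done <- modules[seq_len(idx - 1)]
  for (m in done) source(file.path(dir, m), local = envir)
  done
}

work_env <- new.env()   # keeps the demo objects out of the global environment
ran <- run_modules_before(demo_dir, modules, "6-tree-plot-area.R", work_env)
print(ran)
```

After this call, the objects created by the earlier modules exist in work_env, and the target module can be stepped through line by line.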

For more about editing and executing code in RStudio, see e.g.
https://support.rstudio.com/hc/en-us/articles/200484448-Editing-and-Executing-Code

For more about using RStudio, see
https://support.rstudio.com/hc/en-us/sections/200107586-Using-RStudio

5.3. Base unit weight script


The base unit is the first level of aggregation (i.e. a plot, subplot, or a section within a plot) and it
is defined in the Sampling Design settings. The user needs to write an R script to calculate
the weight of each record of the base unit table (also called the sampling unit
table). The script will assign a weight to each record by adding a column (named weight) to the
base unit table (table 1). The weight must be a numeric value between 0 and 1.

Open file user/5-base-unit-weight.R in RStudio. The following R script will appear as default:

plot$"weight" <- 1;

If a plot cannot be split into subplots, then the weight is 1. If a plot is allowed to fall on the
border of two land use classes or two forest types, the plot falls in two estimation
domains. The proportion of each plot (or cluster) falling in a domain can be estimated either

1. by the proportion of plot centers falling in the domain [3], or

2. by the plot areas (as in the U.S. FIA method and FAO’s NFMA method) falling in the domain.

The next examples cover both ways of giving the base unit weight.

Example I. Plot center point/reference point method.

In the ‘atlantis’ test data case the calculation follows the so-called point method, so change the
script as follows:

plot$weight <- ifelse( is.na(plot$subplot) | plot$subplot == 'A', 1, 0 )

The script above means that a record gets a full weight of 1 if the subplot (i.e. plot
section) code is equal to ‘A’ or is missing. Code ‘A’ means that the plot’s center point is located in
this subplot (=record). If the subplot code is anything other than ‘A’, the record gets no weight.
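
The effect of the point method can be checked on a small table. The sketch below applies the same rule to made-up records; the column plot_no and the data values are assumptions for illustration.

```r
# Toy base unit table; 'plot_no' and the values are hypothetical
plot <- data.frame(
  plot_no = c(1, 1, 2, 3),
  subplot = c("A", "B", NA, "A"),
  stringsAsFactors = FALSE
)

# Same rule as above: full weight for subplot 'A' or a missing subplot code
plot$weight <- ifelse(is.na(plot$subplot) | plot$subplot == "A", 1, 0)
print(plot)
```

Plot 1 is split into sections A and B, so only section A (which holds the center point) gets weight 1.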

Example II. Plot area method.

The base unit weight is proportional to the plot area or the plot section’s area (as in the Zambian
National Forest Inventory case). In this Zambian forest survey the maximum plot area is 0.1 ha.

# Plot weight by plot section (lvs) areas. Plot dimensions are in meters.
# The weight must be between 0 and 1.
lvs$lvs_area <- lvs$width * lvs$slength
# if a dimension is missing, the area (and thus the weight) is 0
lvs$lvs_area[is.na(lvs$lvs_area)] <- 0
# convert m2 to hectares
lvs$weight <- lvs$lvs_area/10000
# rescale to the range 0-1 by dividing by the maximum plot area (0.1 ha)
lvs$weight <- lvs$weight / 0.1
# inaccessible plots get no weight
lvs$weight[lvs$accessibility > 0] <- 0;
lvs$weight[is.na(lvs$weight)] <- 0;
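
As a worked check of the plot area method, the sketch below runs the same steps on four made-up plot sections. The dimensions (a 20 m x 50 m full plot, which equals 0.1 ha) and the accessibility codes are assumptions chosen for illustration.

```r
# Toy plot section table; values are hypothetical
lvs <- data.frame(
  width = c(20, 20, NA, 20),
  slength = c(50, 25, 50, 50),
  accessibility = c(0, 0, 0, 1)
)
max_plot_area_ha <- 0.1   # maximum plot area stated in the text

lvs$lvs_area <- lvs$width * lvs$slength                    # section area in m2
lvs$lvs_area[is.na(lvs$lvs_area)] <- 0                     # missing dimension: no area
lvs$weight <- (lvs$lvs_area / 10000) / max_plot_area_ha    # m2 to ha, then rescale
lvs$weight[lvs$accessibility > 0] <- 0                     # inaccessible sections
print(lvs$weight)
```

The full section gets weight 1, the half section 0.5, and the sections with a missing dimension or no access get 0.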

[3] This method can typically be applied with circular, nested circular or relascope plots. It is
also applicable to other plot types that have a corresponding reference point.

5.4. Plot area scripts


Whenever you save a new calculation module for a new entity, Calc creates two script modules
for this entity: one for the actual calculation script and one for the “plot area” script. See next the
case where we have created a calculation module for entity ‘tree’ and result variable ‘tree_count’.

In this case, the file “6-tree-plot-area.R” should contain the calculation that represents
the formula of the plot area for this entity. This is called the Plot area script in Calc. The following
example shows how to calculate the plot area for three nested circular plots with areas of 0.1, 0.05
and 0.01 ha, respectively. Trees with a dbh of at least 40 are assigned to the 0.1 ha plot, trees with
a dbh of at least 20 to the 0.05 ha plot, and all other trees to the 0.01 ha plot.

tree$plot_area <- with(tree,
  ifelse(dbh >= 40, 0.1,
  ifelse(dbh >= 20, 0.05,
  0.01)));
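
The nested plot rule can be verified on a few made-up trees. The dbh values below are hypothetical and include the class limits and a missing value.

```r
# Toy tree table; dbh values are hypothetical
tree <- data.frame(dbh = c(45, 40, 25, 20, 12, NA))

# Same nested rule as above: the plot area depends on the tree's dbh
tree$plot_area <- with(tree,
  ifelse(dbh >= 40, 0.1,
  ifelse(dbh >= 20, 0.05,
  0.01)))
print(tree)
```

Note that a tree with a missing dbh gets a missing plot area, so such records need to be handled before further calculation.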

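Another example from Zambia with a 0.1 ha rectangular plot:

# plot area in hectares from plot dimensions in meters
tree$plot_area <- tree$width * tree$slength /10000
# limit the plot area to the maximum of 0.1 ha
tree$plot_area[tree$plot_area > 0.1] <- 0.1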

5.5. Calculation scripts


The following is an example of how to create a calculation step using R script to assign a height
value to each tree record according to a linear fit model.

Note: a useful source for learning about R is Quick-R (http://www.statmethods.net/)

Script for [Number]-tree-est_height.R:

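# sample trees: trees with a measured, positive total height
sample_trees <- tree[ !is.na( tree$total_height ) , ]
sample_trees <- sample_trees[ sample_trees$total_height > 0 , ];
# fit a third-degree polynomial of dbh to the measured heights
height_model <- with( sample_trees, lm(total_height ~ dbh + I(dbh^2) + I(dbh^3)) );
# predict the height of every tree from its dbh
tree$est_height <- predict( height_model, newdata = tree[ , c('dbh','total_height')] );
# keep the measured height where available, otherwise use the predicted one
tree$est_height <- ifelse( is.na(tree$total_height), tree$est_height, tree$total_height );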
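
To try the height model outside Calc, the sketch below applies the same approach to synthetic data. The data are simulated (the growth curve, noise level and the share of measured trees are assumptions), so only the mechanics are meaningful, not the numbers.

```r
set.seed(1)
n <- 60
tree <- data.frame(dbh = runif(n, 5, 60))   # simulated dbh in cm

# Simulated heights; only every third tree keeps a measured height
tree$total_height <- 1.3 + 25 * (1 - exp(-0.05 * tree$dbh)) + rnorm(n, sd = 1)
tree$total_height[seq_len(n) %% 3 != 0] <- NA

# Fit the cubic model on sample trees, as in the script above
sample_trees <- tree[!is.na(tree$total_height) & tree$total_height > 0, ]
height_model <- lm(total_height ~ dbh + I(dbh^2) + I(dbh^3), data = sample_trees)

# Predict for all trees, then prefer the measured height where it exists
predicted <- predict(height_model, newdata = tree)
tree$est_height <- ifelse(is.na(tree$total_height), predicted, tree$total_height)
print(summary(tree$est_height))
```

A polynomial model can behave poorly outside the dbh range of the sample trees, so it is worth checking predictions for the smallest and largest trees.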

Examples of R scripts
Here are some examples of R scripts used to perform calculations.
-----------------------------------------------------------------------------------------------
Caption: Tree - Basal area
Calculation type: R script
Script:
# tree basal area in m2, when dbh is in cm
tree$basal_area <- pi * (0.01* tree$dbh/2)^2
-----------------------------------------------------------------------------------------------
Caption: Tree - Volume
Calculation type: R script
Script:
# Basic form factor volume model
ff <- 0.67;
tree$volume <- with( tree, (0.1291+1.5984 * ff) * pi * (0.01 * dbh / 2)^2 * est_height^0.764 );
-----------------------------------------------------------------------------------------------
Caption: Stand - IPCC class
Calculation type: Category
Script:

# '1' Forest land, '2' Grass land, '3' Cropland, '4' Settlements, '5' Wetland, '-1' NA
stand$ipcc_class <- with ( stand,
ifelse(forest_status == '160' | forest_status == '630', '5',
ifelse(as.integer(forest_status) < 440, '1',
ifelse(forest_status == '440', '2',
ifelse(as.integer(forest_status) < 600, '3', '4')
))));
-----------------------------------------------------------------------------------------------
Caption: Tree - AG Biomass
Calculation type: R script
Script:
BEF_pinus <- 1.3;
tree$genus_code <- substr( tree$species_code, 1, 3 );
# compute AGB in kg, Pinus dry wood density is 500 kg/m3
tree$aboveground_biomass <- with ( tree,
ifelse( genus_code=='PIN',
BEF_pinus * volume * 500,
269.63396 * (((dbh/100)^2*est_height)^0.95193) # Evergreen forest
))
# convert kg -> tons
tree$aboveground_biomass <- tree$aboveground_biomass / 1000
-----------------------------------------------------------------------------------------------
Caption: Tree - BG Biomass
Calculation type: R script
Script:
# conversion factor source:
tree$belowground_biomass <- tree$aboveground_biomass * 0.265;
-----------------------------------------------------------------------------------------------
Caption: Tree - Total biomass
Calculation type: R script
Script:
tree$total_biomass <- tree$aboveground_biomass + tree$belowground_biomass ;
-----------------------------------------------------------------------------------------------
Caption: Stump - Count
Calculation type: R script
Script:
stump$quantity[is.na(stump$quantity)] <- 1
stump$quantity[stump$quantity ==''] <- 1
stump$count_stump <- stump$quantity
-----------------------------------------------------------------------------------------------
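
The tree scripts above depend on each other: volume needs basal area and est_height, above-ground biomass needs volume, and the total needs both biomass components. The sketch below runs them in that order on two made-up trees; the species codes and measurements are hypothetical, and the formulas are copied from the examples above.

```r
# Two toy trees; species codes and measurements are hypothetical
tree <- data.frame(
  species_code = c("PINSYL", "ACAALB"),
  dbh = c(30, 30),          # cm
  est_height = c(20, 20),   # m
  stringsAsFactors = FALSE
)
ff <- 0.67
BEF_pinus <- 1.3

# 1. basal area in m2 (dbh in cm)
tree$basal_area <- pi * (0.01 * tree$dbh / 2)^2
# 2. volume, which uses est_height
tree$volume <- with(tree, (0.1291 + 1.5984 * ff) * basal_area * est_height^0.764)
# 3. above-ground biomass in tons, which uses volume for Pinus
tree$genus_code <- substr(tree$species_code, 1, 3)
tree$aboveground_biomass <- with(tree, ifelse(genus_code == "PIN",
  BEF_pinus * volume * 500,
  269.63396 * ((dbh / 100)^2 * est_height)^0.95193)) / 1000
# 4. below-ground and total biomass
tree$belowground_biomass <- tree$aboveground_biomass * 0.265
tree$total_biomass <- tree$aboveground_biomass + tree$belowground_biomass
print(tree)
```

For a dbh of 30 cm the basal area is about 0.0707 m2. If the order of the steps is changed, a later step fails because its input column does not exist yet, which is why the order of modules in Calc matters.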

5.6. Calculation using external equations


This step allows the user to upload external equations, for example for computing timber
volume by tree species or any other condition. This is done with the help of a CSV file which
contains the link field (‘code’), equations and rules.

Click on External equations, then on Upload CSV. With the ‘atlantis’ test data, the file to upload
is calc-volume-models.csv.

The following figure shows how to read in external equations for computing the tree volume
with the test data ‘atlantis’.

The fields Caption, Entity and Variable have the same function as in the previous example.

• Equation list: this field is used to select the equation list that should be used to perform
the calculation. [In the ‘atlantis’ test example the only equation list that was uploaded is
the one called volume].

When an equation list is selected, Calc recognizes which variables are involved in the
calculation by analyzing the variable names written in the equations (you may want to have
another look at the equations listed in calc-volume-models.csv). New fields then appear, and the
appropriate variable should be selected in each. In the test data case these are:

• Code variable = species_code ;
• Variable ‘vegetation_type’ = vegetation_type ;
• Variable ‘h’ = est_height (the variable created in the example using R script) ;
• Variable ‘dbh’ = dbh
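
The principle behind external equations (one equation per code, evaluated with the mapped variables) can be sketched in R as follows. This is a conceptual illustration only: the table layout, the column name equation, the species codes and the coefficients are all hypothetical and do not reproduce the actual format of calc-volume-models.csv, and the vegetation_type rule is left out.

```r
# Hypothetical equation list: NOT the actual layout of calc-volume-models.csv
equations <- data.frame(
  code = c("ABC", "XYZ"),
  equation = c("0.00005 * dbh^2 * h", "0.00004 * dbh^2 * h + 0.01"),
  stringsAsFactors = FALSE
)

# Toy trees; 'species_code' is the code variable, 'est_height' is mapped to 'h'
tree <- data.frame(
  species_code = c("ABC", "XYZ", "ABC"),
  dbh = c(30, 20, 10),
  est_height = c(20, 15, 8),
  stringsAsFactors = FALSE
)

# Look up each tree's equation via the code variable, then evaluate it
# with 'dbh' and 'h' bound to that tree's values
eq_text <- equations$equation[match(tree$species_code, equations$code)]
tree$volume <- mapply(
  function(eq, dbh, h) eval(parse(text = eq), list(dbh = dbh, h = h)),
  eq_text, tree$dbh, tree$est_height,
  USE.NAMES = FALSE
)
print(tree)
```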

5.7. Calculation using Categorical variable


The following example shows how to create a new Categorical variable for reporting results by
breast height diameter (dbh) classes. Each tree needs to be assigned to a specific class
according to its dbh.

The fields Caption and Entity have the same function as in the previous examples.

In our test example we assign the text ‘Dbh class’ as the caption of the new category and then
select tree as the entity of interest.

Define the name of the new Variable.

Click Save.
The new Category can be added and defined by clicking on the Plus sign [+] on the right. [4]
A new dialog will open asking for a Caption for the new category as well as
a Code and a Caption for each class.

[4] New categories can be created and existing ones managed in the Settings panel, under Categories.

[In the test example we wish to create four dbh classes with codes ranging from 1 to 4 and dbh
classes of <10; 10-20; 20-30; 30+ cm].

The next steps are as follows: save these definitions, export the scripts to RStudio and write the R
script for computing the aggregated classes.

In the test data case, the corresponding script is written in file [Number]-tree_dbh_class.R.
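
A script for the four dbh classes described above could look like the following minimal sketch. It is not the script from the test data: the variable name dbh_class, the treatment of class limits (a dbh of exactly 10, 20 or 30 goes to the upper class) and the code '-1' for a missing dbh are assumptions.

```r
# Toy tree table; dbh values are hypothetical
tree <- data.frame(dbh = c(4, 10, 19.9, 20, 29.9, 30, 55, NA))

# Assumed class limits: '1' <10, '2' 10-19.9, '3' 20-29.9, '4' 30+ cm
tree$dbh_class <- as.character(cut(
  tree$dbh,
  breaks = c(-Inf, 10, 20, 30, Inf),
  labels = c("1", "2", "3", "4"),
  right = FALSE   # intervals are closed on the left: [10, 20), [20, 30), ...
))
tree$dbh_class[is.na(tree$dbh_class)] <- "-1"   # assumed code for a missing dbh
print(tree)
```

The class codes must match the Codes defined for the category in Calc.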



Here is a more efficient R script for a forest inventory case where 5 cm dbh classes (from 10
cm upward) are used in reporting:
# '-1' NA, '0' <10 cm, '1' 10-14.9 cm, '2' 15-19.9 cm, '3' 20-24.9 cm,
# '4' 25-29.9 cm, '5' 30-34.9 cm, '6' 35-39.9 cm, '7' 40+ cm
tree$dbh_05 <- trunc(((tree$dbh - 10.0)+5)/5 ,0)
tree$dbh_05 <- ifelse( tree$dbh_05 > 7, 7, tree$dbh_05)
tree$dbh_05 <- ifelse(tree$dbh_05 < 0, 0, tree$dbh_05)
tree$tree_dbh_class_05 <- as.integer(tree$dbh_05);
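
The class formula can be checked against the class limits listed in the comment. The sketch below uses made-up dbh values at and around the limits; pmin and pmax are used here instead of the two ifelse calls and give the same result.

```r
# Toy dbh values at and around the class limits (hypothetical)
tree <- data.frame(dbh = c(4, 9.9, 10, 14.9, 15, 39.9, 40, 75))

# Class number: 0 below 10 cm, then one class per 5 cm, capped at class 7 (40+ cm)
tree$dbh_05 <- trunc((tree$dbh - 10 + 5) / 5)
tree$dbh_05 <- pmin(pmax(tree$dbh_05, 0), 7)
tree$tree_dbh_class_05 <- as.integer(tree$dbh_05)
print(tree[, c("dbh", "tree_dbh_class_05")])
```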

5.8. Executing scripts

When all calculation scripts are ready and tested in RStudio, run the entire calc.R document by
using the Source toolbar button.

This will run all active calculation modules and update the scripts in the Calc (PostgreSQL) database.
Then go back to Calc. Because the results need to be published in the Saiku repository, run the whole
data processing chain.

A ‘Running’ window will open and the results will be displayed in Data view.

Next click Close.

The results are ready to be displayed in Saiku, which is web-based open source software for data
visualization and data querying. Access the Saiku interface by clicking on the Saiku button.

See the Saiku manual for further information. Download it from
http://www.openforis.org/tools/calc/tutorials/saiku.html

5.9. Export / Import workspace


The Export / Import buttons in the Settings panel can be used to export and import all the
metadata defined in Calc for a workspace. The following information will be exported or imported:
areas of interest, external equations, sampling design, categorical formulas, calculation modules
and error calculation code. The export file is in .ZIP format and it will be written into the default
download folder.

This functionality is needed for taking a backup copy of the workspace or for transferring the
calculation environment to another computer.

ANNEX 1. PGADMIN: GET ENTITIES WITH RESULT VARIABLES FROM CALC DATABASE INTO CSV

1) Open pgAdmin and connect to the calc database. Under Schemas, select your survey and
under Views check the name of your entity result view.

2) Run the following SQL query. In this example the workspace name is ‘ilua2_1’ and the entity
name is ‘tree’ (=tree data in Zambia NFI). Use the name from your View list.
Note: Running this query can take a long time!

3) When the results pop up, export them into a CSV file.
In the pgAdmin menu, select File, Export.
Change the column separator to comma if needed.
