You are on page 1of 43

1New image processing software for analyzing object size-frequency

2distributions, geometry, orientation, and spatial distribution*

3Ciarán Beggan1 and Christopher W. Hamilton2

41 Grant Institute, School of GeoSciences, University of Edinburgh, UK

52 Hawai‘i Institute of Geophysics and Planetology, University of Hawai‘i, USA

7Corresponding author: Ciarán Beggan, E-mail: ciaran.beggan@ed.ac.uk;

8Phone: +44 131 650 5916; Address: School of GeoSciences, Grant

9Institute, West Mains Road, Edinburgh, EH9 3JW, UK

10

11*Code available from server at http://www.geoanalysis.org

12

13Abstract

14Geological Image Analysis Software (GIAS) combines basic tools for calculating

15object area, abundance, radius, perimeter, eccentricity, orientation, and centroid

16location, with the first automated method for characterizing the aerial distribution of

17objects using sample-size-dependent nearest neighbor (NN) statistics. The NN

18analyses include tests for: (1) Poisson, (2) Normalized Poisson, (3) Scavenged k = 1,

19and (4) Scavenged k = 2 NN distributions. GIAS is implemented in MATLAB with a

20Graphical User Interface (GUI) that is available as pre-parsed pseudocode for use

21with MATLAB, or as a stand-alone application that runs on Windows and Unix

22systems. GIAS can process raster data (e.g., satellite imagery, photomicrographs, etc.)

23and tables of object coordinates to characterize the size, geometry, orientation, and

24spatial organization of a wide range of geological features. This information expedites

25quantitative measurements of 2D object properties, provides criteria for validating the

1 1
26use of stereology to transform 2D object sections into 3D models, and establishes a

27standardized NN methodology that can be used to compare the results of different

28geospatial studies and identify objects using non-morphological parameters.

29

30Key Words: Image analysis, software, size-frequency, spatial distribution, nearest

31neighbor, vesicles, volcanology

32

331. Introduction

34Geological image analysis extracts information from representations of natural objects

35that may either be captured by an imaging system (e.g., photomicrographs, aerial

36photographs, digital satellite imagery) or schematically rendered into visual form

37(e.g., geological maps). In addition to examining the properties of individual objects,

38spatial analysis may be used to quantify object distributions and investigate their

39formation processes.

40 ImageJ (Rasbald, 2005) is a commonly used image processing application that

41was developed as an open source Java-based program by the National Institutes of

42Health (NIH). Custom plug-in modules enable ImageJ to solve numerous tasks,

43including the analysis of vesicle size-frequency distributions within geological thin-

44sections (e.g., Szramek et al, 2006, Polacci et al., 2007). However, ImageJ and similar

45programs for Windows (e.g., Scion Image and ImageTool) and Macintosh (e.g., NIH

46Image) are not specifically designed for geological applications nor do they provide

47analytical tools for investigating patterns of spatial organization.

48 To take greater advantage of the information contained within geological

49images, we have developed Geological Image Analysis Software (GIAS). This

50program combines (1) an image processing module for calculating and visualizing

2 2
51object areas, abundance, radii, perimeters, eccentricities, orientations, and centroid

52locations, and (2) a spatial distribution module that automates sample-size dependent

53nearest neighbor (NN) analyses. Although other programs can perform the basic

54functions in the “Image Analysis" module, GIAS is the first program to automate

55sample-size-dependent analyses of NN distributions.

56

572. Motivation

58Nearest neighbor (NN) analysis is well-suited for investigating patterns of spatial

59distribution within intrinsically two-dimensional (2D) datasets, such as orthorectified

60aerial photographs and satellite imagery. Applications of NN analyses to remote

61sensing imagery include the study of volcanic landforms (Bruno et al., 2004; 2006;

62Baloga et al., 2007; Bishop, 2008; Hamilton et al., 2009; Bleacher et al., 2009),

63sedimentary mud volcanoes (Burr et al., 2009), periglacial ice-cored mounds (Bruno

64et al., 2006), glaciofluvial features (Burr et al., 2009), dune fields (Wilkins and Ford,

652007), and impact craters (Bruno et al., 2006). Despite widespread utilization of NN

66analyses, its value as a remote sensing tool and effectiveness as basis for comparison

67between different datasets is limited by the lack of a standardized NN methodology—

68particularly in terms of defining feature field areas, thresholds of significance, and

69criteria for apply higher-order NN methods.

70 In addition to remote sensing applications, NN analyses can be used to study

71objects in photomicrographs such as crystals (Jerram et al., 1996; 2003; Jerram and

72Cheadle, 2000) and vesicles. In this study, we emphasize the application of NN

73analyses to vesicle distributions to demonstrate how GIAS can be used to validate (or

74refute) the assumption of randomness, which is a prerequisite for effectively applying

3 3
75stereological techniques to derive vesicle volumes from photomicrographs and for

76selecting appropriate statistical models to characterize those vesicle populations.

77 Vesicle textures preserve information about the pre-eruptive history of

78magmas and can be used to investigate the dynamics of explosive and effusive

79volcanic eruptions (e.g., Mangan et al., 1993; Cashman et al., 1994; Mangan and

80Cashman 1996; Cashman and Kauahikaua, 1997; Polacci and Papale 1997; Blower et

81al., 2001; Blower et al., 2003; Gaonac'h et al., 2005; Shin et al., 2005; Lautze and

82Houghton, 2005; Adams et al., 2006; Polacci et al., 2006; Sable et al., 2006; Gurioli

83et al., 2008). Quantitative vesicle analyses stem from Marsh (1988), who explored the

84physics of crystal nucleation and growth dynamics to derive an analytical formulation

85for crystal size distributions. This research was then applied to numerous bubble size-

86frequency distribution studies (e.g., Sarda and Graham, 1990; Cashman and Mangan,

871994; Cashman et al., 1994; Blower et al., 2003). Early studies of vesicle distributions

88(e.g., Cashman and Mangan, 1994) were limited by their inability to characterize the

89full range of vesicle sizes because their methodology could not resolve the smallest

90vesicles. Nested photomicrographs solve this problem because photomicrographs

91captured at multiple magnifications enable the reconstruction of total vesicle size-

92frequency distributions (e.g., Adams et al., 2006; Gurioli et al., 2008).

93 In general, vesicle studies are limited by two major factors: transformations of

942D cross-sections into representative vesicle volumes (Mangan et al., 1993; Sahagian

95and Proussevitch, 1998; Higgins, 2000; Jerram and Cheadle, 2000); and development

96of reliable statistical characterizations of vesicle populations in terms of distribution

97functions and their spatial characteristics (Morgan and Jerram, 2007; Proussevitch et

98al., 2007a). These difficulties have been partially addressed by improved statistical

99techniques for investigating vesicle populations; however, a single cross-section

4 4
100cannot be used to reconstruct a representative 3D vesicle distribution unless the

101objects viewed in a 2D section can be characterized using 2D reference textures with

102known 3D distributions (Jerram et al., 1996; 2003; Jerram and Cheadle 2000;

103Proussevitch et al., 2007a). Although synchrotron X-ray tomography is increasingly

104being used to directly generate 3D vesicle distributions (e.g., Gualda and Rivers,

1052006; Polacci et al., 2007; Proussevitch et al., 2007b), nested datasets containing

106multiple Scanning Electron Microscope (SEM) images remain the most common

107input for vesicle studies because of their superior spatial resolution relative to X-ray

108tomography. To facilitate the analysis of vesicles in SEM imagery, GIAS can be used

109to determine the geometric properties of vesicles within the plane of a given

110photomicrograph and establish if objects fulfil the criteria of spatial randomness,

111which is required for transforming 2D sections into 3D models using stereology.

112

1133. Nearest neighbor (NN) analysis

114Clark and Evans (1954) proposed a simple test for spatial randomness in which the

115actual mean NN distance (ra) in a population of known density is compared to the

116expected mean NN distance (re) within a randomly distributed population of

117equivalent density. Following Clark and Evans (1954), re and expected standard error

118(σe) of the Poisson distribution are:

1 0.26136
119 re  ; e  , [1; 2]
2 0 N 0

120where the input population density, ρ0, equals the number of objects (N) divided by

121the area (A) of the feature field (ρ0 = N / A). The following two test statistics (termed

122R and c) are used to determine if ra follows a Poisson random distribution:

ra ra  re
123 R ; c . [3; 4]
re e

5 5
124 If a test population exhibits a Poisson random distribution, R should ideally

125have a value of 1, while c should equal 0. If R is approximately equal to 1, then the

126test population may have a Poisson random distribution. If R > 1, then the test

127population exhibits greater than random NN spacing (i.e., tends towards a maximum

128packing arrangement), whereas if R < 1, then the NN distances in the test case are

129more closely spaced than expected within a random distribution and thus exhibit

130clustering relative to the Poisson model.

131 To identify statistically significant departures from randomness at the 0.95 and

1320.99 confidence levels, |c| must exceed the critical values of 1.96 and 2.58,

133respectively (Clark and Evans, 1954); however, these critical values implicitly assume

134large sample populations (N > 104). Jerram et al. (1996) and Baloga et al. (2007) note

135that finite-sampling effects introduce biases into the variation of NN statistics. These

136biases become significant for small populations (N < 300) and thereby necessitate the

137use of sample-size-dependent calculations of R and c thresholds.

138 The NN methodology of Clark and Evans (1954) assumes that objects can be

139approximated as points. However, for frameworks of solid objects, NN statistics

140exhibit scale-dependent variations in randomness due to the restrictions imposed by

141avoiding overlapping object placements. For 2D sections through 3D distributions of

142equal-sized solid spheres, Jerram et al. (2003) define a porosity-dependent threshold

143for R, which they term the random sphere distribution line (RSDL). With increasing N

144(i.e., decreasing porosity), the RSDL threshold increases from a value just above 1 to

145a maximum of 1.65 at 37% porosity, based on the experimental spherical packing

146model of Finney (1970). Jerram et al. (1996, 2003) also note that compaction,

147overgrowth, and sorting can introduce systematic variations in R relative to the RSDL

6 6
148and thus, when correctly identified, can provide information relating to the relative

149importance of these processes during rock formation.

150 In addition to sample-size-dependent tests for Poisson randomness, Baloga et

151al. (2007) proposed the following three NN models:

152

153 1. Normalized Poisson NN distributions account for data resolution-limits, which

154 require normalization of re, σe, skewness, and kurtosis values based on a user-

155 defined minimum distance threshold.

156 2. Scavenged Poisson NN distributions assume features consume (or “scavenge”)

157 resources in their surroundings, which affects the kth nearest neighbor by

158 increasing re relative to the standard Poisson NN model. The order of the

159 scavenging model is given by the Poisson index, k, which identifies the

160 number of NN objects participating in the scavenging process and, if k = 0, the

161 standard Poisson NN distribution is obtained.

162 3. Logistic NN distributions apply the classical probability distribution for self-

163 limiting population growth due to the use of available resources. Under this

164 probability assumption, NN distances become more uniform, which leads to

165 smaller skewness, but larger kurtosis compared to the Poisson NN model.

166

167 Sample-size-dependent R and c test statistics, in addition to the expected

168skewness versus kurtosis values, allow for robust statistical tests of spatial

169organization. However, prior to this study, these extensions to the classical NN

170method have not been automated and freely available for use.

171

1724. Method

7 7
1734.1. Input data and processing parameters

174GIAS is designed to process two input data types: images and tables of object

175coordinates. In the first instance, images may include rectangular raster files (e.g.,

176*.tiff, *.jpg, *.gif, etc.) encoded in 8-bit grey-scale formats. All inputs are assumed to

177have an orthogonal projection with equal x and y pixel dimensions. If the input image

178pixels are not square, their area can be uniquely specified. To appropriately scale the

179output results, the pixel size must be defined by the analyst for each input image.

180 Image segmentation is achieved using Digital Number (DN) thresholding,

181such that inputs are converted to a binary threshold logical matrix. GIAS assumes

182input grey-scale pixel values >250 are white; however, the upper and lower thresholds

183can be adjusted in the input options. Within input images, black pixels are assumed to

184be the objects of interest (e.g., vesicles) whereas white pixels are assumed to be part

185of the background (e.g., non-vesicles). An option is provided for image inversion.

186 Other than performing the binary conversion, GIAS does not modify input

187images. Consequently, pre-processing may be necessary to remove image artifacts and

188segment objects into feature classes. To facilitate image analysis and pre-processing

189of inputs to the NN module, GIAS supplies labeling information and metadata for

190each object in the image analysis output file; however, the level of image modification

191will depend upon the particular scientific investigation and the preferences of the

192analyst.

193 Inputs to the NN module can be passed from the image analysis module or

194loaded as a two column table (tab- or space- delimited) with x-coordinates in the first

195column and y-coordinates in the second column. GIAS automatically checks input

196files for duplicate entries and retains only unique coordinate pairs.

8 8
197 Slider bars control the detection thresholds for the minimum and maximum

198object areas, in addition to the distance threshold used within the Normalized Poisson

199tests. A check-box excludes objects that touch the periphery of the image from

200subsequent processing. This option is enabled by default because meaningful cross-

201sectional properties (e.g., area, geometry, and orientation) cannot be calculated for

202truncated objects. Calculations of object abundance (as a percentage of total image

203area) are based on total image properties and are unaffected by peripheral object

204exclusion. Once the input options are satisfactorily set, the analyst simply presses the

205“Process” button to calculate the output graphs and statistics.

206

2074.2. Output results

208GIAS outputs its results to on-screen graphs and text boxes, which are presented in

209two tabbed panels. Statistics relating to the object area, geometry, and orientation are

210shown in the first panel, labeled “Image Analysis” (Figure 1). The frequency axis can

211be plotted in linear or log units with on-screen options for defining custom

212dimensions. Object areas, radii, perimeters, and eccentricities are presented as

213histograms, whereas object orientations are displayed using a rose plot. Object radii

214are estimated by calculating the radii of equal area circles. In addition to the graphical

215output, there are text-box summaries of the total number of objects, their abundance,

216minimum, maximum, mean, standard deviation (at 1 σ), skewness, and kurtosis based

217on the object area calculations.

218 Results of the spatial analyses are summarized in a second panel, labeled

219“Nearest Neighbor Analysis” (Figure 2). These results include histograms of the

220measured NN distances, in addition to graphs of R, c, and skewness versus kurtosis

221(Baloga et al., 2007). The R and c values are presented with sample-size-dependent

9 9
222thresholds for rejecting the null hypothesis (i.e., a given model of re) at significance

223levels of 1 and 2 σ. By default, the program assumes a Poisson random model for the

224expected spatial distribution; however, the user may also view results based on

225Normalized Poisson, or Scavenged (k = 1 or 2) models. Similarly, the skewness versus

226kurtosis plots are provided with sample-size-dependent significance levels of 0.90,

2270.95, and 0.99.

228 Textbox outputs provide the model-dependent value of re and summarize ra

229statistics, including the total number of objects, number within the convex hull, and

230NN distance statistics (minimum, maximum, mean, and standard deviation at 1 and 2

231σ). The convex hull is a polygon defined by the outermost points in a feature field

232such that each interior angle of the polygon measurers less than 180º (Graham, 1972).

233The convex hull thus forms a boundary around a feature field with the minimum

234possible perimeter length (see Appendix 1 for examples).

235 The input parameters and computed statistics from both tabbed panels can be

236output in separate user-named text files. The data are saved in a tab-delimited format,

237allowing the files to be viewed using a text editor or imported into a spreadsheet.

238

2394.3. Implementation

240GIAS is written and implemented in MATLAB with a Graphical User Interface (GUI)

241that is designed for use within the geological community. The program is available as

242preparsed pseudocode for use with MATLAB (provided both “Image Processing” and

243“Statistics” toolboxes are installed) or as a standalone application that can run on

244Windows and Unix systems without requiring MATLAB or its auxiliary toolboxes.

245These software products may be downloaded from the internet for free via

246http://www.geoanalysis.org.

10 10
247 The regionprops function within the MATLAB Image Processing toolbox

248calculates the following key properties of each object: area, centroid position,

249perimeter, eccentricity, orientation, equal-area circle equivalent diameter and a list of

250pixel positions within each vesicle. The pixel list is used to determine which objects

251are in contact with the image edge and the equal area circle diameters are used to

252calculate object radii. The eccentricity and orientation are calculated by fitting an

253ellipse about the object. The angle between the major axis of the ellipse and the x-axis

254of the image defines the orientation.

255 By default, the NN analysis utilizes centroid positions calculated in the image

256analysis module, however, object locations may be uploaded as a table of x- and y-

257coordinates. Within the NN module, the MATLAB function convhull is used to

258identify the vertices of the convex hull for all of the input objects. The function

259convhull is also used to calculate the area of the convex hull (Ahull) and to separate

260interior objects (Ni) from the hull vertices (see Appendix 1 for examples).

261 NN distances are determined by calculating the Euclidean distance between

262each interior object (centroid) and all other objects within the dataset, including the

263vertices of the hull. The results for each interior point are sorted in ascending order to

264identify the minimum NN distance and calculate the actual mean NN distance (ra) by

265averaging the NN distances for all interior objects (Ni). The interior population density

266(ρi = Ni / Ahull), provides a statistical estimate of the true density within an infinite

267domain and thus substitutes for ρ0. We then calculate R as the ratio of ra-to-re.

268 Using sample-size-dependent values of R, c, and their respective thresholds of

269significance, we may define the following scenarios. If the values of R and c fall

270within ±2 σ of their expected values, then the results suggest that the model correctly

271describes the input distribution. If the value of c is outside the range of ±2 σ, then the

11 11
272inference is that the input distribution is inconsistent with the model, regardless of the

273value of R. If c is outside ±2 σ, and R is greater than the +2 σ threshold, then the input

274distribution exhibits a statistically significant repelling relative to the model.

275Conversely, if c is outside the ±2 σ range, and R less then than the -2 σ threshold, then

276the input distribution is inferred to be clustered relative to the model. Lastly, if R is

277outside ±2 σ, and c is within the ±2 σ range, then the test results are ambiguous,

278which generally indicates that there is insufficient data to perform the analysis and/or

279the standard error (σe) of the input distribution is large.

280 One of the novel contributions of our application is the introduction of

281automated calculations of sample-size-dependent biases in NN statistics. Baloga et al.

282(2007) described how finite sampling would affect several NN models; however, they

283only implemented a solution for the Poisson NN case. In our application, we have

284automated the Poisson NN procedure, extended it to include the three other models of

285Baloga et al. (2007), and derived the k = 2 case for the scavenged NN distribution

286(Appendix 2). Automation of the NN method within GIAS enables users to rapidly

287determine if their data fulfills the size-dependent criteria of statistical significance for

288the following five spatial distribution models: (1) Poisson, (2) Normalized Poisson,

289(3) Scavenged k = 1; and (4) Scavenged k = 2. Table 2 summarizes the properties of

290these models.

291 To obtain sample-size-dependent values of R, c, and their confidence limits,

292we generated simulations in MATLAB. Following the method of Baloga et al. (2007),

293confidence limits for the Poisson and the Scavenged k = 1 and 2 models were

294simulated using 1000 random draws from a modeled distribution. However, for the

295Normalized Poisson model, limits for the standard Poisson case are employed because

296simulating unique limits for each user-defined distance threshold is computationally

12 12
297intensive in real-time (e.g., requiring approximately four minutes to perform the

298simulations using a computer with a 2 GHz processor and 1 GB of RAM).

299 GIAS does not include the logistic density distribution described by Baloga et

300al. (2007) because its parameters (i.e., the logistic mean (m) and the standard

301deviation (b), see Table 2) are estimated using a best-fit to the input data. The logistic

302case therefore differs substantially from the other NN analyses because they involve a

303comparison between input spatial distributions and expected model distributions.

304Additionally, the fit of a logistic curve to an input distribution does not necessarily

305imply an underlying logistic process because other processes could generate

306distributions that better fits the data. Thus given the differences between the logistic

307density distribution described by Baloga et al. (2007) and the other Poisson NN

308models, we have omitted the logistic case from the geospatial analysis tools

309implemented within GIAS.

310 To correctly interpret the results of the Scavenging NN models, it is important

311to recognize that the formulation of Baloga et al. (2007) is a scale-free distribution.

312Modification to expected NN distances depend upon the number of objects

313participating in the scavenging process and not on the location of the objects.

314Consequently, the mean and standard deviation statistics are not directly related to the

315radius of the scavenged area. This is an unrealistic condition for resource-scavenging

316phenomena in nature, which have a fixed upper limit to r. Alternative modeling

317scenarios, with a user-defined scavenging radius, may be more appropriate for some

318applications—provided that a scale for the scavenging process can be determined.

319 Distinguishing between Poisson NN and Scavenged Poisson NN models

320requires statistical separation of their respective NN distributions. The minimum

321number of objects required will depend on ρ0 and k. In Appendix 3, we show that

13 13
322separation of the Poisson and k = 1 Scavenged Poisson models, at the 95% confidence

323level, requires a minimum 50 to 60 objects for ρ0 values of 0.5 to 0.0005 objects / unit

324area, whereas 100 to 150 objects are required to distinguish between k = 1 and 2

325models.

326 In addition to sample-size-dependent R and c statistics, we expand upon the

327methods of Baloga et al. (2007) by simulating the sample-size-dependent range of

328expected skewness and kurtosis values for each NN hypothesis. Calculated

329confidence intervals—at 90%, 95% and 99%—are plotted within GIAS to illustrate

330the expected ranges of skewness versus kurtosis for each model. Plots of skewness

331versus kurtosis are also useful for discriminating between populations with otherwise

332similar R and c values, such as ice-cored mounds (e.g., pingos) and volcanic rootless

333cones (Bruno et al., 2006).

334

3355. Results

336To demonstrate the utility of GIAS, we have applied the program to vesicle patterns

337within proximal magmatic tephra deposits from Fissure 3 of the 1783-1784 Laki cone

338row, Iceland. Tables 3 and 4 summarize the results of the “Image Analysis” and

339“Nearest Neighbor Analysis” modules, respectively. In Appendix 1, we include

340additional examples of geospatial distributions that are clustered and repelled relative

341to the Poisson NN model. These examples are based on the Hamilton et al. (2010a,

3422010b) and demonstrate how statistical NN analyses can be combined with field

343observations to obtain information about the effects of environmental resource

344limitation on the spatial distribution of volcanic landforms. Appendix 1 also explores

345how the geospatial analysis tools that are provided within GIAS can be applied to

346remote sensing data to quantitatively compare the spatial distribution of landforms in

14 14
347different planetary environments, supply non-morphological evidence to discriminate

348between feature identities, and examine self-organization processes within geological

349systems.

350

3515.1. Vesicle analysis

352Vesicle characteristics can provide information about the pre-eruptive history of

353magmas, volatile exsolution dynamics, and conduit ascent processes—all of which are

354important factors in controlling the intensity of explosive volcanic eruptions.

355Quantification of these parameters relies largely upon the use of vesicle size, number,

356and volume distributions to establish links between natural volcanic samples and

357experimental data (Adams et al., 2006). Nested SEM images at several magnifications

358are typically used in combination with stereology (e.g., Sahagian and Proussevitch,

3591998) to transform 2D vesicle properties into 3D distributions. Although acquisition

360and analysis of a complete nested SEM dataset is outside the scope of this study, we

361present an example of how GIAS can be used to analyze vesicle properties within a

362single image. For this example, we have chosen a magmatic tephra sample (LAK-t-

36323c, Emma Passmore, unpublished data) from 1783-1784 Laki eruption in Iceland.

364The resolution of the image is 3.81 μm / pixel and we have followed the method of

365Gurioli et al. (2008) by using a minimum object area threshold of 20 pixels (76.2

366μm2) to avoid complications associated with poorly resolved objects near the limit of

367resolution within our data.

368 The output of the "Image Analysis" module (Table 3 and Figure 1) reveals that

369the sample has a vesicularity of 65.48% and—excluding objects in contact with the

370periphery of the image—there are 537 objects larger than the detection threshold of

37176.2 μm2. Assuming the properties of an equal area circle, the vesicle diameters range

15 15
372from 19.21 μm to 2.85 × 103 μm, with a mean of 3.01 × 102 μm ±6.89 × 102 (at 1 σ).

373The ratio of equal area circle circumference to object perimeter is 0.77 ±1.53 (at 1 σ);

374however; for the 21 vesicles larger than 3.10 × 105 μm2 the ratio decreases to 0.55

375±0.21, which indicates that the largest vesicles strongly deviate from a circular

376geometry (see the graph of “Objects Perimeters”). The vesicles have a mean

377eccentricity of 0.71 ± 0.18 and a mean orientation trending 7.27° to 187.27° ±43.91°

378(n.b., 90° points towards the top of the image; see “Object Eccentricity” and “Object

379Orientations” in Figure 1).

380 The NN module results (Table 4 and Figure 2) show that NN distances

381between vesicle centroids range from 28.05 to 1.53 ×103 μm, with a mean (ra) of 1.75

382× 102 ±1.15 ×102 (see “Nearest Neighbor Frequency Distribution” in Figure 2). The

383sample has R and c values within the 2 σ thresholds of significance (Figure 2) and

384thus we conclude that the vesicle centroids exhibit a Poisson (random) distribution.

385We have also analyzed the sample using a Normalized Poisson threshold of 5 pixels

386(19.24 μm), which corresponds to the diameter of a circle that is equal in area to our

387object area threshold of 20 pixels. Given this threshold, the sample exhibits a repelled

388distribution with R and c values greater than their respective upper 2 σ limits. The

389Scavenged k = 1 and 2 cases overestimate re relative to ra, and thus result in clustered

390results relative to both Scavenged models.

391 The GIAS outputs suggest that vesicles within this sample exhibit an overall

392Poisson distribution despite the presence of a vesicle fabric trending 7.27° to 187.27°

393±43.91°. The random distribution of vesicles supports the usage of a stereological

394transformation, however, deformation and coalescence have caused vesicles >3.10 ×

395105 μm2 to strongly deviate from a circular geometry. Consequently, if the vesicles

396represented within this thin-section were transformed into a 3D distribution using

16 16
397stereology, it would be necessary to acknowledge propagating errors associated with

398the non-circular geometry of the largest vesicles in the sample. Additionally, it is

399important to recognize the resolution limitations associated with analysis because we

400have applied a minimum object area threshold of 20 pixels, which excludes all

401vesicles <76.2 μm2 (i.e., 630 of 1167 objects). A detection threshold of 20 pixels is

402appropriate for studies of vesicle size-frequency distributions because an

403misclassification of 1 pixel results in an error of 5% of the estimate of total object area

404(Lucia Gurioli, personal communication, 2009). Nevertheless, this threshold appears

405insufficient for filtering artifacts associated with the geometric properties of the

406smallest vesicles in the input image. For instance, eccentricity values calculated for

407vesicles within the Laki magmatic tephra sample appear to be very high (0.71 ± 0.18),

408but this is a resolution-induced artifact resulting from pixel-scale asymmetries in a

409large population of small vesicles. Resolution limitations also affect the Normalized

410Poisson NN distribution because the large number of objects below the detection

411thresholds leads to underestimates of ρ0 and re, which in turn results in a repelled

412distribution relative to the model. In instances where the majority of objects are below

413the detection limit of analysis, it is therefore imperative to use nested image datasets,

414and/or high magnification mosaics to resolve the properties of the smallest objects in

415a given distribution.

416

4176. Discussion

418For randomly distributed spherical objects, stereological techniques may be used to

419convert cross-sectional areas into volumes (Sahagian and Proussevitch, 1998).

420Higgins (2000) and Morgan and Jerram (2007) have also applied 2D to 3D

421transformations for crystal shapes such as cubes and prisms based upon a large

17 17
422number of observed cross-sectional areas. Nevertheless, for irregular object

423geometries, transformations do not exist for generating an accurate 3D model from a

424single 2D image (Mock and Jerram, 2005; Sable et al., 2006). For vesicle studies,

425direct measurements of 3D vesicle distributions using X-ray microtomography can be

426used to circumvent the problems associated with stereological transformations (e.g.,

427Gualda and Rivers, 2006; Polacci et al., 2007; Proussevitch et al., 2007b), but the

428resolution of synchrotron-based X-ray tomography is insufficient to resolve micron-

429sized vesicle walls, which can result in erroneously large estimates of vesicle

430permeability (Gurioli et al., 2008). Tomographic resolution limitations similarly

431prohibit the detection of sub-micron vesicles, which decreases estimates of vesicle

432number densities relative to investigations of higher resolution SEM images (Gurioli

433et al., 2008). Consequently, even given the availability of 3D imaging techniques,

434thin-section analysis continues to supply unique information about vesiculation

435processes and, therefore, 2D image processing should continue to be explored.

436 Similarly, NN analysis of a single 2D image cannot resolve the spatial

437organization of the centroids in 3D, unless additional reference information is

438provided (e.g., Jerram et al., 1996; 2003; Jerram and Cheadle, 2000). Instead, 2D

439methods characterize the aerial distribution of objects within the plane of the image.

440When applying NN analyses to vesicle distributions in thin-sections, it is important to

441remember that object centroids are unlikely to coincide with the center of a vesicle in

4423D, especially when vesicles are irregularly shaped. To obtain a more meaningful

443statistical description of vesicle organization it may be necessary to analyze several

444photomicrographs—preferably three mutually perpendicular thin-sections that sample

445the principal component directions of the vesicle fabric. Vertical offsets in the position

446of objects in orthometric aerial photographs and satellite imagery may similarly

18 18
447introduce discrepancies between projected object separations and true NN distances,

448but the vertical offsets are generally small relative to the separation of objects in the

449plane of the image and hence the differences can be ignored.

450

4517. Conclusions

452We have developed Geological Image Analysis Software (GIAS) for calculating the

453geometric properties of objects and quantifying their spatial distribution. The program

454improves the efficiency and reproducibility of geological object characterizations by

455combining image processing tools for calculating object areas, abundance, radii,

456perimeters, eccentricity, orientation, and centroid location with sample-size-dependent

457NN tests for (1) Poisson; (2) Normalized Poisson; (3) Scavenged, k = 1; (4) and

458Scavenged k = 2 distributions. These geospatial analyses have not been implemented

459in other software packages and, consequently, GIAS represents a novel approach to

460characterizing vesicle distributions in thin-sections and examining the spatial

461organization of other geological features in datasets ranging from photomicrographs

462(e.g., Section 5) to remote sensing imagery of planetary surfaces (e.g., Appendix 1

463and Hamilton et al., 2010b).

464 Future work will focus on developing the following aspects of GIAS: (1)

465image segmentation using Kriging analysis of DN histograms (e.g., Oh and Lindquist,

4661999); (2) automated object recognition (e.g., Proussevitch and Sahagian, 2001;

467Lindquist et al., 1996; Shin et al, 2005); (3) geometric cluster analysis (e.g., Still et

468al., 2004); (4) statistical analyses to address truncated and multimodal populations

469(e.g., Proussevitch et al., 2007a); (5) processing nested image datasets (e.g., Gurioli et

470al., 2008); (6) incorporating stereological techniques for converting 2D data into 3D

471distributions and then using the 3D models to calculate vesicle number density (e.g.,

19 19
472Sahagian and Proussevitch, 1998; Proussevitch et al., 2007a); (7) developing NN

473analyses for 3D datasets (Jerram and Cheadle, 2000; Mock and Jerram, 2005); and (8)

474treatment of solid object interactions on NN statistics (e.g., Jerram et al.; 2003).

475

476Acknowledgements

477We thank Alexander Proussevitch, Steve Baloga, Sarah Fagents, Bruce Houghton,

478Lucia Gurioli, and Mark Naylor for insightful discussions relating to vesicle

479distributions and NN simulations; Emma Passmore for kindly providing the

480geological thin-sections examined within this study; and Dougal Jerram and

481Guilherme Gualda for their thorough reviews, which have greatly improved this

482manuscript. CDB wishes to acknowledge the assistance of Sue Rigby in providing

483funding for software purchases. CDB was funded under NERC studentship award

484NER/S/J/2005/13496. CWH is funded by the Icelandic Centre for Research

485(RANNÍS) and the NASA Mars Data Analysis Program (MDAP) grant

486NNG05GQ39G. SOEST publication number: 7808. HIGP publication number 1802.

487

488

489References

490Adams, N.K., Houghton, B.F., Fagents, S.A., Hildreth, W., 2006. The transition from 

491explosive to effusive eruptive regime: The example of the 1912 Novarupta eruption, 

492Alaska. Geological Society of America Bulletin 118, 620–634. 

493doi:10.1130/B25768.1.

494

20 20
495Baloga, S.M., Glaze, L.S., Bruno, B.C., 2007. Nearest-neighbor analysis of small

496features on Mars: Applications to tumuli and rootless cones. Journal of Geophysical

497Research 112, E03002, 1-17. doi:10.1029/2005JE002652.

498

499Bishop, M.A., 2008. Higher-order neighbor analysis of the Tartarus Colles cone

500groups, Mars: The application of geographical indices to the understanding of cone

501pattern evolution. Icarus 197, 73–83.

502

503Bleacher, J.E., Glaze, L.S., Greeley, R., Hauber E., Baloga, S.M., Sakimoto, S.E.H.,

504Williams, D.A., Glotch, T.D., 2009. Spatial and alignment analyses for a field of small

505volcanic vents south of Pavonis Mons and implications for the Tharsis province,

506Mars. Journal of Volcanology and Geothermal Research 185, 96-102.

507doi:10.1016/j.jvolgeores.2009.04.008.

508

509Blower, J.D., Keating, J.P., Mader, H.M., Phillips, J.C., 2001. Inferring

510volcanic degassing processes from vesicle size distributions.

511Geophysical Research Letters 28, 347–350.

512

513Blower, J.D., Keating, J.P., Mader, H.M., Phillips, J.C., 2003. The

514evolution of bubble size distributions in volcanic eruptions. Journal of

515Volcanology and Geothermal Research 120, 1–23.

516

517Bruno B.C., Fagents S.A., Thordarson T., Baloga S.M., Pilger E., 2004.

518Clustering within rootless cone groups on Iceland and Mars: Effect of

21 21
519nonrandom processes. Journal of Geophysical Research 109,

520E07009, 1-11. doi:10.1029/2004JE002273.

521

522Bruno, B.C., Fagents, S.A., Hamilton, C.W., Burr, D.M., Baloga, S.M.,

5232006. Identification of volcanic rootless cones, ice mounds, and

524impact craters on Earth and Mars: Using spatial distribution as a

525remote sensing tool. Journal of Geophysical Research 111, E06017,

5261-16. doi:10.1029/2005JE002510.

527

528Burr, D.M., Bruno, B.C., Lanagan, P.D., Glaze, L.S., Jaeger, W.L., Soare, R.J., Wan

529Bun Tseung, J.M., Skinner, J.A., Baloga, S.M., 2009. Mesoscale raised-rim

530depressions (MRRDs) on Earth: A review of the characteristics, processes and spatial

531distributions of analogs for Mars. Planetary and Space Science 57, 579-596.

532

533Cashman, K.V., Mangan, M.T., 1994. Physical aspects of magmatic degassing: II.

534Constraints on vesiculation processes from textural studies of eruption products. In:

535Carroll, M.R. and Holloway, J.R. (Eds.), Volatiles in Magmas. Reviews in

536Mineralogy, Mineralogical Society of America, pp. 447–478.

537

538Cashman, K.V., Kauahikaua, J.P., 1997. Reevaluation of vesicle

539distributions in basaltic lava flows. Geology 25, 419–422.

540

541Clark, P.J., Evans, F.C., 1954. Distance to nearest neighbor as a measure of spatial

542relationships in populations. Ecology 35, 445–453.

543

22 22
544Finney J., 1970. The geometry of random packing. Proceedings of the Royal Society

545of London, Series A 318, 479-493.

546

547Gaonac'h, H., Lovejoy, S., Schertzer, D., 2005. Scaling vesicle

548distributions and volcanic eruptions. Bulletin of Volcanology 67, 350–

549357.

550

551Gualda, G.A.R., Rivers, M., 2006. Quantitative 3D petrography using X-ray

552tomography: Application to Bishop Tuff pumice clasts. Journal of Volcanology and

553Geothermal Research 154, 48–62.

554Gurioli, L., Harris, A.J.L., Houghton, B.F., Polacci, M., Ripepe, M., 2008. Textural 

555and geophysical characterization of explosive basaltic activity at Villarrica volcano, 

556Journal of Geophysical Research 113, B08206, 1­16. doi:10.1029/2007JB005328.

557

558Hamilton, C.W., Thordarson, T., Fagents, S.A., 2010a, Explosive lava-water

559interactions I: architecture and emplacement chronology of volcanic rootless cone

560groups in the 1783-1784 Laki lava flow, Iceland. Bulletin of Volcanology, (in press)

561

562Hamilton, C.W., Fagents, S.A., Thordarson, T., 2010b, Explosive lava­water 

563interactions II: self­organization processes among volcanic rootless eruption sites in 

564the 1783­1784 Laki lava flow, Iceland. Bulletin of Volcanology, (in press).

565

23 23
566Higgins, M.D., 2000. Measurement of crystal size distributions. American

567Mineralogist 85, 1105-1116.

568

569Jerram, D.A., Cheadle, M.J., Hunter, R.H., Elliot, M.T., 1996. The spatial distribution

570of crystals in rocks. Contributions to Mineralogy and Petrology 125(1), 60-74.

571

572Jerram, D. A., Cheadle, M. J., 2000. On the cluster analysis of rocks. American

573Mineralogist 84, 47-67.

574

575Jerram, D.A., Cheadle, M.J., Philpotts, A.R., 2003. Quantifying the building blocks of

576igneous rocks: are clustered crystal frameworks the foundation? Journal of Petrology

57744, 2033-2051. doi:10.1093/petrology/egg069.

578

579Lautze, N.C., Houghton, B.F., 2005. Physical mingling of magma and complex

580eruption dynamics in the shallow conduit at Stromboli volcano, Italy. Geology 33

581425–428.

582

583Lindquist, W.B., Lee, S-M., Coker, D.A., Jones, K.W., Spanne, P., 1996. Medial axis

584analysis of void structure in three-dimensional tomographic images of porous media.

585Journal of Geophysical Research 101, 8297–8310.

586

587Mangan, M.T., Cashman, K.V., 1996. The structure of basaltic scoria and reticulite

588and inferences for vesiculation, foam formation, and fragmentation in lava fountains.

589Journal of Volcanology and Geothermal Research 73, 1–18.

590

24 24
591Mangan, M.T., Cashman, K.V., Newman S., 1993. Vesiculation of basaltic magma

592during eruption, Geology 21, 157–160.

593

594Mangan, M.T., Mastin, M., Sisson, T., 2004. Gas evolution in eruptive conduits:

595combining insights from high temperature and pressure decompression experiments

596with steady-state flow modelling. Journal of Volcanology and Geothermal Research

597129, 23–36.

598

599Marsh, B.D., 1988. Crystal size distributions (CSD) in rocks and the

600kinetics and dynamics of crystallization: 1. Theory. Contributions to

601Mineralogy and Petrology 99, 277–291.

602

603Mock, A., Jerram, DA., 2005. Crystal size distributions (CSD) in three dimensions:

604Insights from the 3D reconstruction of a highly porphyritic rhyolite. Journal of

605Petrology 46, 1525-1541.

606

607Morgan, D., Jerram, D.A., 2006. On estimating crystal shape for crystal size

608distribution analysis. Journal of Volcanology and Geothermal Research 154, 1-7.

609

610Oh W., Lindquist W., 1999. Image thresholding by indicator Kriging. IEEE

611Transactions on Pattern Analysis and Machine Intelligence 21, 590–602.

612

613Polacci, M., Baker, D.R., Bai, L., Mancini L., 2007. Large vesicles record pathways

614of degassing at basaltic volcanoes. Bulletin of Volcanology 70, 1023-1029. doi

61510.1007/s00445-007-0184-8.

25 25
616

617Polacci, M., Baker, D.R., Mancini, L., Tromba, G., Zanini, F., 2006.

618Three-dimensional investigation of volcanic textures by X-ray

619microtomography and implications for conduit processes.

620Geophysical Research Letters 33, L13312, 1-5.

621doi:10.1029/2006GL026241.

622

623Polacci, M., Papale, P., 1997. The evolution of lava flows from

624ephemeral vents at Mount Etna: insights from vesicle distribution

625and morphological studies. Journal of Volcanology and Geothermal

626Research 76, 1–17.

627

628Proussevitch, A., Sahagian, D., Tsentalovich, E.P., 2007a. Statistical analysis of

629bubble and crystal size distributions: Formulations and procedures. Journal of

630Volcanology and Geothermal Research 164, 95-111.

631

632Proussevitch, A., Sahagian, D., Carlson, W., 2007b. Statistical analysis of bubble and

633crystal size distributions: Application to Colorado Plateau Basalts. Journal of

634Volcanology and Geothermal Research 164, 112-126.

635

636Proussevitch, A., Sahagian, D., 2001. Recognition and separation of discrete objects

637within complex 3D voxelized structures. Computers & Geosciences 27, 441–454.

638

639Rasband, W.S., 2005. ImageJ, U. S. National Institutes of Health, Bethesda,

640Maryland, USA, http://rsb.info.nih.gov/ij/ [accessed November 28, 2009].

26 26
641

642Sahagian, D., Proussevitch, A., 1998. 3D particle size distributions from 2D

643observations: stereology for natural applications. Journal of Volcanology and

644Geothermal Research 84, 173–196.

645

646Sable, J., Houghton, B.F., Del Carlo, P., Coltelli, M., 2006. Changing conditions of

647magma ascent and fragmentation during the Etna 122 BC basaltic Plinian eruption:

648evidence from clast microtextures. Journal of Volcanology and Geothermal Research

649158, 333–354.

650

651Sarda, P., Graham, D., 1990. Mid-ocean ridge popping rocks:

652implication for degassing at ridge crests. Earth and Planetary

653Science Letters 97, 268–289.

655Shin, H., Lindquist, W.B., Sahagian, D.L., Song, S-R., 2005. Analysis of the vesicular

656structure of basalts. Computers & Geosciences 31, 473–487.

657

658Stephenson, G., 1992. Mathematical Methods for Science Students, 2nd edn.,

659Longman Scientific & Technical, London, 528pp.

660

661Still, S., Bialek, W., Bottou, L., 2004. Geometric clustering using the Information

662Bottleneck method. In: Thrun, S., Saul, L., and Schölkopf, B. (Eds.), Advances in

663Neural Information Processing Systems, 16, MIT Press, Cambridge, MA, pp. 1165-

6641172.

665

27 27
666Szramek, L., Gardner, J.E., Larsen, J., 2006. Degassing and microlite crystallization

667of basaltic andesite magma erupting at Arenal Volcano, Costa Rica. Journal of

668Volcanology and Geothermal Research 157, 182–201.

669

670Wilkins, D.E., Ford, R.L., 2007. Nearest neighbor methods applied to dune field

671organization: The Coral Pink Sand Dunes, Kane County, Utah, USA. Geomorphology

67283, 48-57.

28 28
673Appendix 1: Additional nearest neighbor analysis examples

674Explosive interactions between lava and groundwater can generate volcanic rootless

675cones (VRCs). VRCs are well suited to nearest neighbor (NN) analyses because they

676occur in groups, which consist of a large number of landforms that share a common

677formation mechanism. Geological Image Analysis Software (GIAS) includes NN

678analysis tools that were used by Hamilton et al. (2010b) to investigate self-

679organization processes within VRC groups and quantitatively compare the spatial

680distribution of VRCs in the Laki lava flow in Iceland to analogous landforms in the

681Tartarus Colles Region of Mars.

682 Figure 1 presents examples of rootless eruption sites within the Hnúta and

683Hrossatungur groups of the Laki lava flow. In the field, Differential Global

684Positioning System (DGPS) measurements were used to delimit rootless crater floors

685and their centroids were assumed to represent the surface projection of underlying

686rootless eruption sites (Hamilton et al., 2010a, 2010b). Hamilton et al. (2010a) used

687tephrochronology and stratigraphic relationships to divide the complete study area

688(Figure 1A) into chronologically and spatially distinct groups, domains, and

689subdomains. Figure 1B provides an example of a VRC subdomain (Hnúta Subdomain

6901.1). Figures 1C and D show the convex hull boundaries of the complete study area

691and Subdomain 1.1 of the Hnúta Group, respectively. Figure 1 also shows the

692locations of the convex hull vertices (Nv) and interior rootless eruption sites (Ni).

693 Identification of internal boundaries within the Hnúta and Hrossatungur

694groups allowed Hamilton et al. (2010b) to examine the effects of scale and resource

695availability on the spatial distribution explosive lava-water interactions. Relative to

696the Poisson NN model, rootless eruption sites in both the complete study area (Ni =

6972193) and Hnúta Subdomain 1.1 (Ni = 98) have |c| and |R| values that exceed their

29 29
698respective sample-size-dependent limits of confidence at 2 σ. Consequently, neither of

699these distributions are constant with a Poisson random model. Within the complete

700study area, R is 0.72, which is lower than the 2 σ confidence limit of R (0.98) and

701implies that the distribution is clustered with respect to the Poisson NN model. In

702Hnúta Subdomain 1.1, R is 1.23, which is above the upper 2 σ confidence limit of

7031.16 and implies that the distribution is repelled relative to a Poisson NN model.

704Hamilton et al. (2010b) interpret these patterns of spatial distribution within the

705context of resource abundance and depletion though competitive utilization. On

706regional scales, VRCs cluster in locations that contain sufficient lava and groundwater

707to initiate rootless eruptions, but on local scales rootless eruption sites tend to self-

708organize into distributions that maximize the utilization of limited groundwater

709resources. Hamilton et al. (2010b) interpret topography to be the primary influence on

710regional clustering because it would control initial groundwater abundance and

711regions of lava inundation. In contrast, they attribute local repelling to the non-

712initiation, or early cessation, of rootless eruptions at unfavorable localities due to

713groundwater depletion in a competitive system that is analogous to an aquifer

714perforated by a network of extraction wells.

715 Baloga et al. (2007) showed that rootless eruption sites in the Tartarus Colles

716Region of Mars have R = 1.22 and c = 6.14. These NN statistics are similar to those

717observed within Hnúta Subdomain 1.1 (R = 1.23 and c = 4.45), which suggests that a

718similar formation process in both environments. Thus the statistical NN analysis tools

719provided within GIAS can be combined with field observations and remote sensing to

720quantify patterns of spatial distribution and infer underlying mechanisms of formation

721within a diverse range of geological systems.

722

30 30
723Appendix 1 Figure 1:

724A and B depict rootless eruption sites (red circles) superimposed on a 0.5 m/pixel

725color aerial photograph showing 1783-1784 Laki eruption lava flows in Iceland. A:

726Full extent of Hnúta and Hrossatungur groups and an inset (white rectangle) denoting

727Hnúta Subdomain 1.1 (shown in B). C and D correspond to identical regions shown in

728A and B, respectively, and provide examples of convex hull boundaries, rootless

729eruption sites that form convex hull vertices (Nv), and interior points (Ni).

730

731Appendix 1 Figure 1

31 31
732Appendix 2: Derivation for the Scavenged Nearest Neighbor (k = 2) case

733The probability of finding exactly k points within a circle of radius r on a plane can be

734found using the Poisson distribution:

( 0 r 2 ) k 0 r 2
735 P(r , k )  e , [0]
k!

736where ρ0 is the spatial density—defined by the number of objects (N) divided by the

737area of the feature field (A). If the process of creating a point on the plane consumes

738(or “scavenges”), resources that would have formed the kth nearest neighbor, then the

739process can be modelled as (Baloga et al., 2007):

740 P( r ; k scavenges)  1  P (0)  P (1)   P (k ) .

741 [0]

742For k = 2, the density distribution of the function can be written as:


2
743 p(r )  ( 0 ) 3 r 5 e 0 r . [0]

744For convenience, we write    0 , leading to:


2
745 p ( r )   3 r 5 e  r . [0]

746To find the mean of this distribution, we multiply by r, integrate from 0 to ∞, and

747normalise the resulting equation, to obtain first moment of the distribution r :

748

 3  r  r 5  e r dr
2

749 r  0
 . [0]
r
r 2
5
e dr
0

750

751 The integrals can be solved using integration by parts and two mathematical

752identities for the odd and even terms of r (e.g., Stevenson, 1992). For the even terms,

753the following identity can be used,

754

32 32
755

1.3.5 (2n  1) 
r
2
756 2n
 e r dr  , [0]
2 n 1  n  2
1

757while odd integrals of the type,


r
2

758
2 n 1
 e r dr [0]
0

759can be solved using the substitution x2 = u and solving using integration by parts,

760e.g.,
 
1
 r  e dr   ue
2
3 r u
761 2
du . [0]
0 0

762This leads to the following solution for the mean distribution of the nearest neighbour

763distances for k = 2:

1 3  5  
7
2    15    0  15
4 3 3
764 2 . [0]
r 
1 3 7 7 1 16 0
 16 0 2 2

765The second moment r2 can be calculated in a similar manner:

r
2
2
 r 2  r 3  e r dr
3
766 r2  0
 . [0]

 0
 r  e dr
2
5 r

767 The standard deviation is calculated using the identity:  :


2
 r2  r

2
3 15 0.27572
768    2  . [0]
 0 16  0 0

769Thus, the expected Standard Error (σe) is:

0.27572
770 e  , [0]
N  0

33 33
771where N is the number of points in the area of interest. The third and fourth moments

772can be derived as above, leading to:

773

r
2
3
 r 5  e r dr
105
774 r3  0

 3 ; [0]
32 0 2
r
r 2
5
e dr
0

r
2
4
 r 5  e r dr
12
775 r4  0
 . [14]

( 0 ) 2
 r  e dr
2
5 r

776Using the moment formulae for skewness and kurtosis:


3 2 4
777 skew  r 3 3 r r2 2 r ; kurt  r 4 4 r r3 6 r r2 3 r ,

778 [15]

779 we obtain:

0.006663 0.0174838
780 skew  3 ; kurt  . [16; 17]
 4 0
2
 0
3 2

34 34
781Appendix 3: Separation of population means

782To examine the approximate number of samples required to find a statistically

783significant separation of two Gaussian distributions, where the expected means and

784standard deviations are known, a commonly-used test is to examine if zero lies within

785the following confidence interval:

786

     
787 P s 0  s1  1.96 0  1   0  1  s 0  s 1  1.96 0  1   0.95 , [1]
 N N N N 

788where s0 and s1 are the sample means, μ0 and μ1 are the expected means and σ0 and σ1

789are the expected standard deviations. N can be varied to estimate the number of

790samples required to separate the distributions. In particular, if zero is within the

791interval, then there is no discernible difference between the means of the distributions

792at the 95% confidence level.

793 This method is applied to simulate two different means based upon the

794expected values of the Poisson and the Scavenged NN when k = 1 and k = 2. To

795simplify the problem, an approximation to a normal distribution is assumed.

796 Simulation from two normal distributions with increasing numbers of N was

797coded in MATLAB. We found the number of samples required to statistically

798differentiate between the distributions by varying ρ0.

799 For PNN and Scavenged NN, k = 1:

800
ρ0 Minimum N required to separate
(objects / unit area) Poisson NN and Scavenged NN (k = 1)
0.5 50
0.05 60
0.005 60
0.0005 60
801

802For Scavenged NN, k = 1 and Scavenged NN, k = 2:

35 35
ρ0 Minimum N required to separate
(objects / unit area) Poisson NN and Scavenged NN (k = 2)
0.5 100
0.05 120
0.005 130
0.0005 150
803

804 Given that these simulations assume Normal distributions, it would be wise to

805regard them as indicative of the minimum number of data points required to separate

806the competing models. A much larger number of samples should be used separate the

807Scavenged NN, k = 1 and k = 2 models because their expected means are relatively

808close together.

36 36
809Electronic Supplements

810(1) MATLAB preparsed pseudo code and standalone MATLAB executable file,

811available to the public at http://www.geoanalysis.org.

812(2) GIAS help file

813

814Figures and Tables

815Tables

816Table 1: Notation.

817

818Table 2: Nearest neighbor model properties (after Baloga et al., 2007).

819

820Table 3: “Image Analysis” module: summary of results.

821

822Table 4: “Nearest Neighbor Analysis” module: summary of results.

823

824Figures

825Figure 1: Image analysis program in “Image Analysis” mode, with maximum object

826areas truncated to ~104 pixels for visualization purposes which excludes very large

827(and least circular) vesicles from displayed plots and histograms. Note that object

828thresholding does not affect results presented in Tables 3 and 4.

829

830Figure 2: Image analysis program in “Nearest Neighbor Analysis” mode with outputs

831displaying results for a Poisson NN test.

37 37
832Table 1.

Variable Definition
A Area of a feature field
Ahull Area of the convex hull
C Statistic for comparing observed (actual) to expected mean NN distances
k Poisson index
N Number of features within a sample population
Ni Number of features within the convex hull
Nv Number of features forming the vertices of the convex hull
NN Abbreviation for nearest neighbor
P Probability
R Statistic for comparing actual (ra) to expected (re) mean NN distances
R Radial distance
ro Threshold distance used to calculate Normalized Poisson NN distributions
ra Mean NN distance observed in the data
re Mean NN distance calculated from a given NN model (e.g., Poisson)
P Probability density
ρ0 Population (spatial) density of the input features (ρ0 = N / A)
ρi Population (spatial) density of the input features (ρi = Ni / Ahull)
Σ Standard deviation
σe Standard error
833

38 38
834Table 2.

835
Statistic Poisson Scavenge k = 1 Scavenge k = 2 Normalized NN Logistic
Probability 2 0 re  0r
2 2
2(0 ) 2 r 3e  0r 2(0 ) 3 r 5 e  0r
2 2 2
e( r  m) b
Density (P)
20 re   0 ( r0 r )

b[1  e  ( r  m ) b ]2
Mean (re) 1 3 15 2 0  m
 r e 0 dr
2
2  r
2 0 4 0 16  0 2
e  0 r0 r0

Mode (rmode) 1 3 5 1 m
2 0 2 0 2 0 2 0
Standard Error 0.26136 0.272249 0.27572 r02  r 2  1  0 N b
(σe) N 0 N 0 N 0 3
Skewness (  3) 0.008187 0.0066638 1  2  3
r (r0  3r )  r 

 2r 2 
0
3  0
3 3 3    20 
4 3 0 2  30 2  30 2

Kurtosis 2  3 2 16 0.016807 0.0174838 1  2 2 2  3  6


 r  r  4 r r  6 r 2
   3r 4

 4 ( 0 ) 2  4 0 2
 4 0 2
 4  
0 0 0
0   0  2  5

39 39
836Table 3.

Image Analysis Magmatic vesicles


(base unit: μm)
Resolution (pixel size) 3.81 μm
Minimum object area 20 pixels = 76.2 μm2
Total number of objects 537
(excluding boundary objects)
Abundance (% of total) 65.48
Area (min. to max.) 2.90 ×10 to 6.40 ×106
2

Mean area (±1 σ) 7.11 ×104 (±3.73 ×105)


Eccentricity (min. to max.) 0.00 to 0.97
Mean eccentricity (±1 σ) 0.71 (±0.18)
Perimeter (min. to max.) 56.50 to 3.55 ×104
Mean perimeter (±1 σ) 8.64 ×102 (±2.29 ×103)
Orientation (min. to max.) -89.32 to 90.00
Mean orientation (±1 σ) 7.27 (±43.91)

40 40
837Table 4.

Nearest Neighbor (NN) Magmatic vesicles


Analysis (base unit: μm)
Input distribution properties
Resolution (pixel size) 3.81 μm
Minimum object area 20 pixels = 76.2 μm2
Objects inside convex hull (Ni) 520
Interior object density (ρi) 7.79 ×10-6
Range of NN distances 28.05 to 1.53 ×103
Mean NN distance (ra; ±1 σ) 1.75 ×102 (±1.15 ×102)
Skewness 2.48
Kurtosis 16.13

Poisson model
Expected mean NN distance (re) 1.79 ×102
R with thresholds at 2 σ 0.98 (0.97 and 1.07)
c with thresholds at 2 σ -0.91 (-1.33 and 2.83)
Conclusions (c value) Model supported (<2 σ)

Normalized Poisson model


Distance threshold 5 pixels = 19.05 μm
Expected mean NN distance (re) 1.50 ×102
R with thresholds at 2 σ 1.17 (0.97 and 1.07)
c with thresholds at 2 σ 3.51 (-1.34 and 2.88)
Conclusions (c value) Model rejected (>2 σ)
Repelled vs. Normalized
Scavenged (k = 1) model
Expected mean NN distance (re) 2.69 ×102
R with thresholds at 2 σ 0.65 (0.99 and 1.06)
c with thresholds at 2 σ -21.82 (-0.69 and 3.91)
Conclusions (c value) Model rejected (>2 σ)
Clustered vs. k = 1
Scavenged (k = 2) model
Expected mean NN distance (re) 3.36 ×102
R with thresholds at 2 σ 0.52 (1.00 and 1.06)
c with thresholds at 2 σ -37.05 (-0.14 and 4.78)
Conclusions (c value) Model rejected (>2 σ)
Clustered vs. k = 2
838

41 41
839Figure 1

840

42 42
841Figure 2

842

43 43

You might also like