lacunr
lacunr
is a package designed to measure the lacunarity
of LiDAR data and other 3-dimensional spatial structures. This vignette
provides a brief introduction to the concept of lacunarity, as well as a
walkthrough of the standard workflow for using the package.
Lacunarity — from the Latin lacuna, meaning “lake” or “gap” — was introduced by Benoit Mandelbrot (1982) to describe the space-filling properties of fractal patterns. It was later formalized by Gefen and colleagues (1983) as a measure of translational invariance — put simply, how much do the various subregions of a spatial pattern differ from one another? Translational invariance depends both on the heterogeneity or “clumpiness” of the pattern and on the scale at which it is measured.
Lacunarity captures this scale-dependent variation using a gliding-box algorithm (Allain and Cloitre 1991). For a given scale \(r\), an \(r\)-sized box glides across the spatial pattern one cell at a time, producing a set of overlapping boxes. The total amount or “mass” of occupied space within each box is tallied, and then the lacunarity at that scale, \(\Lambda(r)\), is simply the ratio of the variance of the box masses to their squared mean, plus one:
\[\Lambda(r) = \frac{\mathrm{var}(masses_{r})}{\mathrm{mean}(masses_{r})^{2}} + 1\]
These \(\Lambda(r)\) values are typically calculated over a range of box sizes, and the results plotted as a lacunarity curve. The shape of the curve, particularly its rate of decay toward translational invariance, depends on the heterogeneity of the spatial pattern.
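To make the algorithm concrete, here is a minimal, deliberately naive sketch of the gliding-box calculation for a 3-dimensional binary array. The helper name gliding_box is ours, and note that base R's var() computes the sample variance, so results may differ very slightly from implementations that use the population variance:

# naive gliding-box lacunarity for a 3D binary array at box size r
gliding_box <- function(a, r) {
  d <- dim(a)
  masses <- numeric(0)
  # glide an r*r*r box one cell at a time along each axis
  for (i in seq_len(d[1] - r + 1)) {
    for (j in seq_len(d[2] - r + 1)) {
      for (k in seq_len(d[3] - r + 1)) {
        # tally the "mass" of occupied cells within the box
        masses <- c(masses, sum(a[i:(i + r - 1), j:(j + r - 1), k:(k + r - 1)]))
      }
    }
  }
  var(masses) / mean(masses)^2 + 1
}

The lacunarity() function introduced below performs this same computation far more efficiently, so this sketch is purely illustrative.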
To gain an intuition for how lacunarity changes across scales and spatial patterns, let’s create four example datasets with varying properties. These will all take the form of 3-dimensional binary maps — arrays containing only values of 1 (for occupied space) or 0 (for empty space) — and all of them will contain exactly the same proportion of occupied space (50%).
The first pattern is a perfectly uniform field of ones and zeroes, like a checkerboard. We can imagine this representing a maximally homogeneous distribution (Feagin 2005).
# 16*16*16 uniform array
uniform <- array(data = c(rep(c(rep(c(1,0),8), rep(c(0,1),8)),8),
                          rep(c(rep(c(0,1),8), rep(c(1,0),8)),8)),
                 dim = c(16,16,16))
The second is a perfectly segregated array, with one half filled with ones and the other half with zeroes. We might think of this as a maximally heterogeneous distribution:
# 16*16*16 segregated array
segregated <- array(data = c(rep(1, 2048), rep(0, 2048)),
                    dim = c(16,16,16))
The third array has been filled by drawing at random, without replacement, from a pool of 50% ones and 50% zeroes:
# 16*16*16 random array
set.seed(245)
random <- array(data = sample(c(rep(1,2048), rep(0,2048)), 4096, replace = FALSE),
                dim = c(16,16,16))
The fourth is once again drawn from an equal pool, but this time each 1 is given a much higher probability of being drawn than each 0. Since the array fills from front to back as values are drawn without replacement, most of the ones end up in the front of the array, and most of the zeroes in the back:
# 16*16*16 gradient array
set.seed(245)
gradient <- array(data = sample(c(rep(1,2048), rep(0,2048)), 4096, replace = FALSE,
                                prob = c(rep(0.9,2048), rep(0.1,2048))),
                  dim = c(16,16,16))
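As a quick sanity check, we can verify that all four arrays contain exactly 50% occupied space by taking the mean of each binary array:

# each value should be exactly 0.5
sapply(list(uniform, segregated, random, gradient), mean)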
We can plot cross-sections of these arrays to get a clearer sense of how their spatial patterns differ. The plots below show occupied cells as black and empty ones as white:
par(mfrow = c(1, 4), mar = c(0.5, 0.5, 0.5, 0.5), bg = "gray90")
image(t(uniform[1,,]), col = c("white","black"), axes = FALSE, asp = 1)
image(t(segregated[1,,]), col = c("white","black"), axes = FALSE, asp = 1)
image(t(random[1,,]), col = c("white","black"), axes = FALSE, asp = 1)
image(t(gradient[1,,]), col = c("white","black"), axes = FALSE, asp = 1)
Keep in mind that the arrays themselves are 3 dimensional — these
figures merely show a 2D slice of each of them. We can see very clearly
that each spatial pattern is quite different, but lacunr
offers a means of quantifying those differences. This is accomplished
using the lacunarity()
function:
# calculate lacunarity at all box sizes for each array
lac_unif <- lacunarity(uniform, box_sizes = "all")
lac_segregated <- lacunarity(segregated, box_sizes = "all")
lac_random <- lacunarity(random, box_sizes = "all")
lac_grad <- lacunarity(gradient, box_sizes = "all")
lacunarity()
returns a data.frame
containing \(\Lambda(r)\) values at the
desired box sizes. These can be plotted using one of
lacunr
’s convenience plotting functions:
# plot all four lacunarity curves
lac_plot(lac_segregated, lac_grad, lac_random, lac_unif,
         group_names = c("Segregated","Gradient","Random","Uniform"))
There are several key takeaways from this figure:

- All four curves begin at the same value, because \(\Lambda(1)\) depends only on the proportion of occupied space (for binary data, \(\Lambda(1) = 1/p\), which equals 2 for each of our arrays).
- The more heterogeneous the pattern, the higher its curve sits and the more slowly it decays: the segregated array stays far from translational invariance across nearly all box sizes, while the uniform array collapses to \(\Lambda = 1\) almost immediately.
- The gradient and random arrays fall in between, with the gradient array remaining heterogeneous across a wider range of scales than the random one.
Lacunarity curves thus give us the ability to quantify both how heterogeneous a given spatial pattern is, and also at what range of spatial scales it remains heterogeneous.
Now let’s use lacunr
to analyze some real LiDAR data.
The package comes with an example dataset, glassfire
, which
contains point cloud data from two scans of a 24*24 m forest plot in
Northern California. These scans cover the exact same section of oak
forest, but they were taken before and after a major disturbance in the
form of a wildfire.
We can see the impact this has had on the forest stand by viewing the point cloud from above:
library(ggplot2)
# plot point cloud data at each time point
plot <- ggplot(data = glassfire, aes(x = X, y = Y)) +
  geom_raster(aes(fill = Z)) +
  facet_grid(cols = vars(Year)) +
  scale_fill_viridis_c(option = "plasma") +
  theme(panel.grid = element_blank(),
        panel.background = element_rect(fill = "black"),
        aspect.ratio = 1)
print(plot)
Even when represented in two dimensions, the change in the canopy structure is obvious — one of the trees has burned down. We might guess intuitively that this new gap has made the forest stand more heterogeneous. But calculating the change in lacunarity will give us a quantitative measure of that heterogeneity.
voxelize() point clouds

The first task when working with LiDAR data is to convert the raw point cloud into a grid of binned values. These bins are known as voxels, the 3D analogue of pixels in a 2D raster image. Voxelization is necessary because it controls for variations in point density: in most LiDAR scans, objects that are closer to the scanner receive a denser coating of points, so using the raw number of points to measure occupancy would unfairly weight regions of the point cloud that were close to the scanner. Instead, we use voxelization to bin the point cloud into an even grid of occupied space.
In lacunr
, this step is performed using the
voxelize()
function. For this example, we will voxelize the
point data for each time point at a resolution of 0.5m.
# voxelize the pre-fire point cloud
voxpre <- voxelize(glassfire[glassfire$Year == "2020",], edge_length = c(0.5,0.5,0.5))
# voxelize the post-fire point cloud
voxpost <- voxelize(glassfire[glassfire$Year == "2021",], edge_length = c(0.5,0.5,0.5))
Now we have a data.table
for each time point containing
both the coordinates of the voxels and the number of points binned
within them.
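For instance, we can peek at the first few rows of the pre-fire table:

# view the first few voxels and their point counts
head(voxpre)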
bounding_box()
Next we need to take the voxel data and turn them into a pair of 3-dimensional arrays like in the example above. Processing the data this way accomplishes two things:

- It converts the voxel point counts into binary values, with 1 for occupied space and 0 for empty space.
- It embeds the occupied voxels within a complete rectangular grid, restoring the empty space between and around them.
We can create these arrays using the bounding_box()
function. This comes with the optional argument threshold
,
which we can use to filter out any voxels with an extremely low number
of points. For instance, we can decide in this example that voxels with
only one point shouldn’t really be counted as occupied space. We can
create the arrays like so:
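# convert the voxel data into binary arrays
# (we assume here that threshold = 1 excludes voxels containing a single point)
boxpre <- bounding_box(voxpre, threshold = 1)
boxpost <- bounding_box(voxpost, threshold = 1)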
pad_array()
If we examine the dimensions of the resulting arrays, we will notice a problem:
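# examine the dimensions of each array
dim(boxpre)
dim(boxpost)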
The dimensions are not quite the same. Specifically, the third
dimension — which in lacunr
represents the vertical axis —
is one value shorter in the pre-fire array than it is in the post-fire
array.
Why did this happen? It’s because bounding_box()
uses
the largest and smallest spatial coordinates in the voxel data to
determine the size of the resulting array. It looks like the canopy in
the second time point has gotten slightly taller, just enough to be
binned into an extra layer of voxels.
But this presents a problem if we want to compare the lacunarity
values from these two time points. Remember, this is exactly the same
stand of forest, so ideally we want lacunarity()
to explore
the same volume of space with each scan. The simplest way to accomplish
this is to add an additional layer of empty space to the top of the
pre-fire array so it matches the post-fire one.
We can do this with the pad_array()
function, which will
add as much extra space as we want to any side of the array:
# pad the top of the pre-fire array with one layer of empty space
boxpre <- pad_array(boxpre, z = 1)
Now when we compare the two arrays, their dimensions match exactly.
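# the two arrays now have identical dimensions
identical(dim(boxpre), dim(boxpost))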
We now have what we need to measure lacunarity for both time points.
It is as simple as calling the lacunarity()
function like we
did with the synthetic data:
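# calculate lacunarity for the pre- and post-fire arrays
lac_pre <- lacunarity(boxpre, box_sizes = "all")
lac_post <- lacunarity(boxpost, box_sizes = "all")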
Then we can plot the lacunarity curves and examine the difference. However, recall from the previous example that the value of \(\Lambda(1)\) is dependent on the proportion of occupied space. A quick check of our pre- and post-fire arrays shows that they have very different occupancy values:
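# proportion of occupied voxels in each binary array
mean(boxpre)
mean(boxpost)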
This means that each lacunarity curve will start at a different Y value, which makes it harder to compare the curves to one another. Luckily, we can factor out the effect of occupancy by using normalized lacunarity (Plotnick et al. 1996). This is typically calculated by log-transforming the raw lacunarity values and dividing them by the log-transformed lacunarity at the smallest box size (or, in mathematical terms, \(\log\Lambda(r)/\log\Lambda(1)\)). Normalized lacunarity thus always starts at 1 and decays to 0.
lacunarity()
automatically calculates normalized
lacunarity, and we can plot it using lacnorm_plot()
:
# plot normalized lacunarity pre- and post-fire
lacnorm_plot(lac_pre, lac_post, group_names = c("Pre-fire", "Post-fire"))
Note that the values on the x axis are in units of voxels. To get the box sizes in meters, simply multiply the values by the voxel resolution, which in our case is 0.5 m.
The plot shows that heterogeneity has increased substantially after the fire, with the post-fire curve remaining well above the pre-fire curve until around box size 32 (or 16 meters). This aligns with what we would expect from a major disturbance like a whole tree burning down. We might imagine a forest stand in which only the understory burned, in which case the post-fire curve might only remain elevated for box sizes up to, say, 5 meters instead of 16.
In addition to raw and normalized lacunarity curves,
lacunarity()
also calculates a metric called \(\mathrm{H}(r)\). \(\mathrm{H}(r)\), first introduced by Feagin (2003), is a
transformed version of normalized lacunarity. Rather than measuring a spatial
pattern’s decay toward translational invariance, \(\mathrm{H}(r)\) instead measures its
deviation from standard Brownian motion. The letter \(\mathrm{H}\) is a reference to the
Hurst exponent, a metric used to quantify the autocorrelation
of neighboring samples. It is commonly used for time series but in fact
has general applications to probability theory as a component of
fractional Brownian motion (Mandelbrot
and Van Ness 1968).
We can plot the \(\mathrm{H}(r)\)
curves for our pre- and post-fire data using the function
hr_plot()
:
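# plot H(r) curves pre- and post-fire
# (we assume hr_plot() takes the same arguments as lacnorm_plot())
hr_plot(lac_pre, lac_post, group_names = c("Pre-fire", "Post-fire"))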
The Hurst exponent can range between 0 and 1, with 0.5 corresponding to standard Brownian decay, where any two regions of the spatial pattern tend to have no correlation with each other. Values of \(\mathrm{H}(r)\) greater than 0.5 indicate that neighboring spatial values are autocorrelated/persistent — local variations tend to remain local — while values less than 0.5 mean that neighboring values are anticorrelated/anti-persistent — local variations tend to repeat across the whole spatial pattern.
We can see in the above plot that the post-fire data is more autocorrelated up to a little past box size 16, while the pre-fire data remains close to or slightly below the neutral zone of 0.5 for all box sizes. \(\mathrm{H}(r)\) thus enables us not only to compare spatial patterns against one another, but also to measure them against a theoretical spatial distribution that is neither heterogeneous nor homogeneous.