Handling Large Acquisitions¶
The pyfor collections
module allows for efficient handling of large acquisitions with paralell
processing of user-define functions. This document presents a couple examples of large acquisition
processing.
Creating Project-Level Canopy Height Models¶
A CloudDataFrame
is the integral part of managing large point cloud acquisitions. It is initialized
by passing a directory that contains .las
or .laz
files:
import pyfor
my_collection = pyfor.collection.from_dir("my_collection_dir", n_jobs=3)
Here we have instantiated a collection such that the processing function will be done in parallel across three cores.
It will be useful to set the coordinate reference system for the project. This is done via the .crs
attribute:
import pyproj
crs = pyproj.proj({'init': 'epsg:26910'}).srs
my_collection.crs = crs
Here we must grapple with a few things. First, we will want to process buffered tiles to eliminate edge effects from normalization. This requires defining a few things for the collection.
First, we want to set the collection tiles such
that the output rasters will be consistent for a specified grid cell size. This is done by manipulating the internal
geometries stored in my_collection.tiles
. By default, these describe the .las/z
bounding boxes, but we want to
ensure the output rasters are exactly the correct size so that they line up correctly. We can do this easily with the
.retile_raster
helper function. This is done in place for the collection.
my_collection.retile_raster(0.5, 500, buffer=20)
The first argument is the desired resolution of the output raster. The second argument is the resolution of the new
tile sets. 500
means they will be processed in 500m x 500m chunks. Finally buffer=20
means that each tile will
be buffered by 20 meters, such that we can process a large buffered tile to eliminate edge effects.
Next, we want to define a function to process each buffered tile. This is where the flexibility of the collection enters in.
def my_process_func(buffered_cloud, tile):
buffer_dist = 20
buffered_cloud.normalize(1)
# Generate CHM of buffered cloud
chm = buffered_cloud.chm(0.5, interp_method="nearest", pit_filter="median")
# Define output bounding box (remove the buffered part)
coords = list(tile.exterior.coords)
bbox = coords[0][0] + buffer_dist, coords[2][0] - buffer_dist, coords[0][1] + buffer_dist, coords[1][
1] - buffer_dist
chm.force_extent(bbox)
# Make a readable name for this particular tile
flat_coords = [int(np.floor(coord)) for coord in bbox]
tile_str = '{}_{}_{}_{}'.format(*flat_coords)
# Write out the canopy height model
chm.write('{}.tif'.format(tile_str))
The above function is lengthy, but really quite simple. First, we normalize and create a canopy height model for some
given buffered point cloud. Then, we restrict its output to remove the buffer using .force_extent
. Finally, we write
out the canopy height model to file with a nicely formatted name.
Finally, we can execute the processing job with the following:
my_collection.par_apply(my_process_func)