PDAL + Python = use the official bindings? Not so fast!
So you want to use PDAL to process point clouds, and you are also using Python? For me this is a pretty common scenario: even when I only want to call pdal (without any additional processing), I often end up using Python. Using Python in a Jupyter notebook is a much better experience than just using the shell, as you have a proper scripting language and you can keep track of the commands' outputs. In this scenario the first idea that comes to mind is to use the official Python bindings of PDAL (https://github.com/PDAL/python); however, they have some important limitations that are not immediately clear. An alternative is to use the CLI interface of pdal and call it with `subprocess.run`.
Pros and Cons of the Python bindings
Cons:
- If there is an error in PDAL, it can crash the Python interpreter! This is a pretty poor experience, as you also don't know why it crashed.
- You cannot interrupt the process with ctrl+c (or a kernel interrupt), because the PDAL bindings block the Python interpreter and do not check for signals.[1] `subprocess.run`, by contrast, handles signals properly and kills the child process when you interrupt the Python interpreter.
- They don't support multiple processes/threads. In my experience this just crashes the Python interpreter, making it pretty hard to run in parallel.
- Some commands, like `pdal tindex`, are not available in the Python bindings, so you need the CLI anyway.
Pros:
- You can pass data from Python to PDAL without writing to disk (see the sketch below).
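As a quick illustration of that pro, here is a minimal sketch, assuming a recent version of the bindings where stages expose a `pipeline()` helper that accepts numpy arrays; the array contents and output filename are made up for the example:

```python
import numpy as np
import pdal

# A structured array with the dimension names PDAL expects (X, Y, Z).
points = np.array(
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)],
    dtype=[("X", np.float64), ("Y", np.float64), ("Z", np.float64)],
)

# Build a pipeline whose input is the in-memory array: no temporary file needed.
pipeline = pdal.Writer.las(filename="points.las").pipeline(points)
pipeline.execute()
```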
Solution
This is the function that I use to run pdal from Python. I still use the PDAL bindings to build the pipeline (i.e. `Pipeline`, `Writer`, `Reader`, `Filter`), as that is nicer than manually creating the JSON file. However, the pipeline is then executed in a subprocess instead of in the Python process.
```python
from pdal import Pipeline
import subprocess


def run_pdal(pipeline: Pipeline, pipe_name="pdal", args=[]):
    # Serialize the pipeline built with the bindings to a JSON file
    # that the pdal CLI can execute.
    with open(f"pipeline_{pipe_name}.json", "w") as f:
        f.write(pipeline.toJSON())
    cmd = ["pdal", "pipeline", f"pipeline_{pipe_name}.json"]
    cmd.extend(args)  # overrides pipeline attrs, for example ["--writer.las.filename", "new_name"]
    # check=True raises CalledProcessError on failure instead of crashing the
    # interpreter, and the subprocess handles ctrl+c properly.
    subprocess.run(cmd, check=True)
```

Extra: you can easily run this in parallel with a progress bar by using `tqdm.contrib.concurrent.thread_map`, as in the sketch below.
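A minimal sketch of that parallel pattern; the pipelines and tile names here are hypothetical placeholders for whatever you build with the bindings:

```python
from tqdm.contrib.concurrent import thread_map

# Hypothetical: (pipeline, name) pairs built elsewhere with the PDAL bindings.
jobs = [(pipeline_a, "tile_a"), (pipeline_b, "tile_b")]

# Runs run_pdal in a thread pool with a progress bar. Each worker just waits
# on its pdal subprocess, so threads (rather than processes) are enough.
thread_map(lambda job: run_pdal(job[0], pipe_name=job[1]), jobs, max_workers=4)
```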
Conclusion
If your pdal pipeline doesn't take data from Python, don't use the bindings; use the CLI interface instead.
Footnotes
1. This is what you should do to properly handle signals in pybind11: https://pybind11.readthedocs.io/en/stable/faq.html#how-can-i-properly-handle-ctrl-c-in-long-running-functions