PDAL + python = use official bindings? not so fast!

when calling PDAL from python using the cli interface is often a better choice than using the python bindings
Author

Simone Massaro

Published

March 10, 2025

So you want to use PDAL to process point clouds and are also using python? Well for me this is a pretty common scenario as even when I only want to call pdal (without any additional process), I often end up using python. Using python in a jupyter notebook is a much better experience that just using the bash, as you have a proper scripting language and you can keep track of the commands outputs. In this scenario the first ideas that comes to my mind is using the official python bindings of PDAL https://github.com/PDAL/python, however htye have some important limitations that are not immediatly clear. An alternative is to use the cli interface of pdal and calling it using subprocess.run

Pro and Cons of python bindings

Cons:

  1. if there is an error in pdal it can crash the python interpreter! This is a pretty poor experience as you also don’t know why it crashed.
  2. you cannot interrupt the process using ctrl+c (or kernel interrupt) as the pdal bindings block the python interpreter and doesn’t check for signals 1 . subprocess.run does handle signals properly and kills the process when you interrupt the python interpreter
  3. it doesn’t support multiple processes/threads. In my experience it just crashes the python interpreter, making it pretty hard to run in parallel.
  4. some commands, like pdal tindex are not available in the python bindings and need to use the cli anyway

Pros:

  • can pass data from python to pdal without writing to disk

Solution

This is the function that I use to run pdal from python. I still use the pdal bindings to build the pipeline (i.e. Pipeline, Writer, Reader, Filter) as it is nicer than manually creating the json file. However it is the executed in a subprocess instead in the python process.

from pdal import Pipeline
import subprocess
def run_pdal(pipeline: Pipeline, pipe_name="pdal", args=[]):
    with open(f"pipeline_{pipe_name}.json", "w") as f:
        f.write(pipeline.toJSON())
    cmd = ["pdal", "pipeline", f"pipeline_{pipe_name}.json"]
    cmd.extend(args) # overwrites pipeline attrs, for example ["--writer.las.filename", "new_name"]
    subprocess.run(cmd, check=True)

Extra: You can easily run this in parallel with a progress bar by using tqdm.contrib.concurrent.thread_map

Conclusion

If your pdal pipeline doesn’t take data from python, don’t use the bindings but the cli interface

Footnotes

  1. This is what you should do to properly handles signals inpybind11 https://pybind11.readthedocs.io/en/stable/faq.html#how-can-i-properly-handle-ctrl-c-in-long-running-functions↩︎