Generate PDF using python, docx templating and libreoffice deployed on Scalingo

What I need to achieve ?

For one of my client project, I need to produce legal documents (named CERFA) filled with information entered by users. Those documents have a specific template including macro such as page numerotation, number of page, marging left on odd pages and right on even pages, borders, filigramme… To summarize, a complex layout.

Furthermore, I need to generate it in docx and in pdf format, depending of the usecase.

The project is written in Django (python) and is executed in a sovereign french PaaS : Scalingo

Generate a complex page setup document

To by pass setting-up the complex layout of the document by code, we decide to start with a template, written in docx format and we use the package docx-template to fill it with user infromations.

Docx-template allow you to execute Jinja2 template resolution in a docx file.

For example, you can write this kind of condition directly in your document:

Convention n° {% if convention.numero %}{{convention.numero}}{% else %}Le numéro de la convention sera défini et ajouté ici une fois la convention validée{% endif %}

Which will be resolved by :

Convention n°123-456-789

Convention n° Le numéro de la convention sera défini et ajouté ici une fois la convention validée

Docx-template allow also ti make iteration to fill a table or insert an image.

From here, we were able to generate the document in docx format as it is expected : we created the doc with all the expected layout and doc macros, use it as a template with Jinja2 syntax then we resolve it using image and data from our application.

However, we need to generate a PDF from this docx.

3. Convert it to PDF

3.1. Using API service

At the beginning, we use a service to generate a PDF from a docx file. It exists a lot of internet service which is able to do it. In our application we used ConvertAPI.

It works like a charm, however, our solution depends of this external service and we haven’t any debate when they decide to increase their price of 300%. It was still sustainable but it wasn’t the best choice for our independence, keep our solution sovereign and for the security of the document.

3.2. Using libreoffice headless

Then we made the migration, based on heroku buildpack which are compatible with Scalingo PaaS and we install libreoffice in our instance and execute a headless command to generate the pdf from docx file.

3.2.1. Installation

.buildpack

https://github.com/Scalingo/apt-buildpack
…
https://github.com/BlueTeaLondon/heroku-buildpack-libreoffice-for-heroku-18.git

AptFile : debian package to install

libsm6
libice6
libxinerama1
libdbus-glib-1-2
libharfbuzz0b
libharfbuzz-icu0
libx11-xcb1
libxcb1

We decide to store as env variable the path to libreoffice

LIBREOFFICE_EXEC = get_env_variable(
    "LIBREOFFICE_EXEC",
    default="/app/vendor/libreoffice/opt/libreoffice7.3/program/soffice",
)

3.2.2. Execution code

We just need to call a subprocess:

def run_pdf_convert_cmd(
    src_docx_path: Path, dst_pdf_path: Path
) -> subprocess.CompletedProcess:
    return subprocess.run(
        [
            settings.LIBREOFFICE_EXEC,
            "--headless",
            "--convert-to",
            "pdf:writer_pdf_Export",
            "--outdir",
            dst_pdf_path.parent,
            src_docx_path,
        ],
        check=True,
        capture_output=True,
    )

Here is the PR to migrate from ConvertAPI service to Libreoffice headless solution : https://github.com/MTES-MCT/apilos/pull/1202