Handling CI/CD in a Mono Repo With Multiple Python Packages

In this article I'll share how I handled CI/CD in a monorepo with multiple Python packages using GitHub Actions.

Managing a monorepo with multiple Python packages presents unique challenges, particularly in keeping CI/CD efficient. Below I walk through the workflows I created for our monorepo of Python packages.

We will delve into the workflows themselves, the matrix strategy that drives them, and the script that detects changed paths, alongside the other custom scripts and composite actions that make this process work.

Our primary CI workflow (on-pr-build-push.yaml) triggers on pull requests, ensuring that all changes are validated before merging. This workflow comprises several jobs: ChangedPaths, Lint, Typecheck, Test, and Build-Push.

Job: ChangedPaths

This job determines which packages have been modified and outputs a matrix that drives the subsequent jobs. For example, if a PR modifies package1 and package2, the matrix will contain two entries: package1 and package2.

name: Build & Push Python Package

on:
  pull_request:

jobs:
  ChangedPaths:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.get_changed_paths.outputs.matrix }}
    timeout-minutes: 10
    permissions:
      id-token: write
      contents: read
    
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          token: ${{ secrets.GLOBAL_GITHUB_TOKEN }}
          fetch-depth: 0

      - name: Run init
        uses: ./.github/actions/init # Custom composite action for initializing the environment
      
      - name: Get Changed Paths
        id: get_changed_paths
        shell: bash
        run: |
          python $GITHUB_WORKSPACE/.github/scripts/internal/get_changed_paths.py
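
The init action referenced above is a custom composite action we reuse across jobs to prepare the environment. Its exact contents are specific to our repo, but a minimal sketch of what such an action might contain (the change_path input handling and the Poetry install step here are assumptions, not the exact implementation) looks like this:

name: Init
description: "Prepare the environment for monorepo jobs"

inputs:
  change_path:
    description: "Package directory to prepare (optional)"
    required: false

runs:
  using: composite

  steps:
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    - name: Install dependencies
      shell: bash
      run: |
        pip install --upgrade pip poetry
        # Only install a package's dependencies when a specific package path is provided
        if [ -n "${{ inputs.change_path }}" ]; then
          cd packages/${{ inputs.change_path }} && poetry install
        fi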

The get_changed_paths.py script identifies the changed paths and filters them to include only those within the packages directory. It is quite straightforward: it runs the git diff command through the built-in subprocess module and parses the result.

import subprocess
import os
import json

output_file = os.getenv('GITHUB_OUTPUT')


def get_changed_paths():
    # Diff the latest commit against its parent to list the changed files
    cmd = 'git diff --name-only $(git rev-parse HEAD) HEAD~1'
    result = subprocess.run(cmd, shell=True, capture_output=True)
    if result.returncode != 0:
        raise Exception("Error getting changed paths")

    changed_paths = result.stdout.decode("utf-8").strip().split("\n")
    print(f"Changed paths: {changed_paths}")
    return changed_paths


def get_filtered_paths(changed_paths):
    # Keep only the package directory names, e.g. "packages/package1/foo.py" -> "package1"
    filtered_paths = set()
    for path in changed_paths:
        if path.startswith("packages/"):
            filtered_paths.add(path.split("/")[1])
    print(f"Filtered paths: {filtered_paths}")
    return list(filtered_paths)


if __name__ == "__main__":
    changed_paths = get_changed_paths()
    filtered_paths = get_filtered_paths(changed_paths)
    print(filtered_paths)
    # Serialize as JSON so the workflow can parse the output with fromJson()
    with open(output_file, "a") as file:
        file.write(f"matrix={json.dumps(filtered_paths)}\n")

Note that the capture_output=True argument lets us capture the command's output. We then decode and split it to get a list of changed paths. The get_filtered_paths function keeps only the paths that start with packages/, since that is where the packages live in my repo.

Also note that GitHub Actions uses environment files to pass data between steps and jobs. By writing the matrix (serialized as JSON) to the GITHUB_OUTPUT file, the list of changed paths becomes an output of the ChangedPaths job, which subsequent jobs can access and iterate over.
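
Using the example above, after a PR that touched two packages the GITHUB_OUTPUT file would contain a line like the following, which fromJson() later turns back into a list for the matrix strategy:

matrix=["package1", "package2"]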

Job: Lint, Typecheck, and Test

These jobs run composite actions for linting (flake8), type checking (mypy), and tests (pytest) on the changed packages using a matrix strategy. They all depend on the ChangedPaths job; only the Lint job is shown below, as the other two follow the same pattern.

  Lint:
    needs: ChangedPaths
    if: ${{ needs.ChangedPaths.outputs.matrix != '[]' }}
    strategy:
      matrix: 
        path: ${{ fromJson(needs.ChangedPaths.outputs.matrix) }}
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      id-token: write
      contents: read

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Get path
        run: echo ${{ matrix.path }}
      
      - name: Run init
        uses: ./.github/actions/init
        with: 
          change_path: ${{ matrix.path }}

      - name: Run Lint action
        uses: ./.github/actions/lint
        env: 
          paths: ${{ matrix.path }}
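
The lint composite action itself is not reproduced here. As a rough sketch, assuming it simply installs flake8 and runs it against the package directory passed in through the paths environment variable (the exact flake8 invocation is an assumption), it could look like this:

name: Lint
description: "Run flake8 on a changed package"

runs:
  using: composite

  steps:
    - name: Run flake8
      shell: bash
      run: |
        pip install flake8
        # The calling job exposes the package directory via the "paths" env var
        flake8 ./packages/${paths}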

Job: Build-Push

The final job in the CI pipeline builds and pushes the packages if all previous jobs succeed.

  Build-Push:
    needs: [ChangedPaths, Lint, Typecheck, Test]
    if: ${{ needs.ChangedPaths.outputs.matrix != '[]' }}
    strategy:
      matrix:
        path: ${{ fromJson(needs.ChangedPaths.outputs.matrix) }}
    runs-on: ubuntu-latest
    timeout-minutes: 15
    permissions:
      id-token: write
      contents: read

    steps:
      - name: Checkout
        uses: actions/checkout@v3
        id: checkout

      - name: Run init
        id: run-init
        uses: ./.github/actions/init
        with: 
          change_path: ${{ matrix.path }}

      - name: Build & Publish Python Packages
        uses: ./.github/actions/build-push-package
        env: 
          paths: ${{ matrix.path }}

This job calls a custom composite action (build-push-package) that builds the package and uploads its distribution as a GitHub Actions artifact, which acts as a cache that the release workflow and users can retrieve it from.

name: Build & Release Package
description: "Build & Push Python Package"
author: "Or Kazaz"

runs:
  using: composite

  steps:
    - name: Checkout
      uses: actions/checkout@v3
      with:
        fetch-depth: 0

    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    - name: Install Python Dependencies
      shell: bash
      run: |
        pip install --upgrade pip
        pip install poetry twine check-wheel-contents

    - name: Run Poetry Check for pyproject.toml
      working-directory: ./packages/${{ env.paths }}
      shell: bash
      run: poetry check

    - name: Build Package Distribution
      working-directory: ./packages/${{ env.paths }}
      shell: bash
      run: poetry build

    - name: Check Distribution Descriptions
      working-directory: ./packages/${{ env.paths }}
      shell: bash
      run: |
        twine check dist/*

    - name: Check Wheel Contents
      working-directory: ./packages/${{ env.paths }}
      shell: bash
      run: |
        check-wheel-contents dist/*.whl

    - name: Cache Package Distribution
      uses: actions/upload-artifact@v3
      with:
        name: ${{ env.paths }}
        path: ./packages/${{ env.paths }}/dist/

The secondary workflow (tag-and-release.yaml) is triggered manually on demand using workflow_dispatch. It tags the package, updates its version in pyproject.toml, and publishes the package to our internal PyPI repository (on JFrog Artifactory). It consists of multiple jobs to ensure a smooth release process.
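
For context, the trigger section of this workflow declares the two manual inputs that the jobs below reference (the workflow name and input descriptions here are illustrative; the input names match the github.event.inputs references in the snippets):

name: Tag and Release Python Package

on:
  workflow_dispatch:
    inputs:
      package_name:
        description: "Package to release"
        required: true
      commit_hash:
        description: "Commit to tag and release"
        required: true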

Job: Tag and Release

This job tags the repository and creates a GitHub release: it zips the package directory and uses a release action to publish it, making the archive available for download.

jobs:
  tag-and-release:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    outputs:
      package_version: ${{ steps.dynamic_version.outputs.package_version }}

    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.inputs.commit_hash }}

      - name: Run Dynamic-Versioning
        id: dynamic_version
        uses: ./.github/actions/dynamic-versioning
        with:
          package_name: ${{ github.event.inputs.package_name }}

      - name: Archive package directory
        working-directory: ./packages
        run: |
          zip -r ${{ github.event.inputs.package_name }}.zip ./${{ github.event.inputs.package_name }}
          echo "ARTIFACT_FILE_PATH=./packages/${{ github.event.inputs.package_name }}.zip" >> $GITHUB_ENV
        env:
          ARTIFACT_FILE_PATH: ${{ env.ARTIFACT_FILE_PATH }}

      - name: Create Release
        uses: ncipollo/release-action@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          artifacts: "${{ env.ARTIFACT_FILE_PATH }}"
          commit: ${{ github.event.inputs.commit_hash }}
          tag: ${{ github.event.inputs.package_name }}-v${{ steps.dynamic_version.outputs.package_version }}
          name: ${{ github.event.inputs.package_name }}-v${{ steps.dynamic_version.outputs.package_version }}
          token: ${{ secrets.GLOBAL_GITHUB_TOKEN }}
          skipIfReleaseExists: true

Note that the dynamic-versioning action is a custom action that updates the version in the pyproject.toml file.

name: "Dynamic Versioning"
description: "Bump version in python package"
author: "Or Kazaz"

inputs:
  package_name:
    description: "Package name"
    required: true

outputs:
  package_version:
    description: 'bumped package version'
    value: ${{ steps.bump_version.outputs.package_version }}

runs:
  using: composite

  steps:
    - name: Checkout
      uses: actions/checkout@v3
      with:
        fetch-depth: 0

    - name: Bump version
      id: bump_version
      shell: bash
      run: |
          PACKAGE_VERSION=$($GITHUB_WORKSPACE/.github/scripts/internal/bump_version.sh ${{ inputs.package_name }})
          echo "${{ inputs.package_name }} package will be bumped to version: ${PACKAGE_VERSION}"
          echo "package_version=$PACKAGE_VERSION" >> $GITHUB_OUTPUT

bump_version.sh is a very simple bash script that increments the patch version of the package in its pyproject.toml file.

#!/bin/bash

# Usage: bump_version.sh <package_name>
PACKAGE_NAME=$1

# Grab the current version line from the package's pyproject.toml, e.g. version = "1.2.3"
VERSION_LINE=$(grep version ./packages/$PACKAGE_NAME/pyproject.toml)
OLD_VERSION=$(echo "$VERSION_LINE" | awk -F'"' '{print $2}')

# Increment the patch component (third dot-separated field)
NEW_VERSION=$(echo "$OLD_VERSION" | awk -F. -v OFS=. '{$3=$3+1}1')

# Rewrite the version line in place
sed -i "s/$VERSION_LINE/version = \"$NEW_VERSION\"/g" ./packages/$PACKAGE_NAME/pyproject.toml

# Print the new version so callers can capture it
VERSION=$(grep version ./packages/$PACKAGE_NAME/pyproject.toml | awk -F'\"' '{print $2}')
echo $VERSION
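
For example, assuming a package named package1 currently at version "1.2.3", running the script bumps the patch version in its pyproject.toml and prints the new version so callers can capture it:

$ bash .github/scripts/internal/bump_version.sh package1
1.2.4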

Job: Build

This job builds the package distribution using Poetry.

  build:
    needs: tag-and-release
    name: Build Python Package
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:

      - name: Checkout
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.inputs.commit_hash }}

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install Python Dependencies
        working-directory: ./packages/${{ github.event.inputs.package_name }}
        run: |
          pip install --upgrade pip
          pip install --upgrade poetry twine check-wheel-contents

      - name: Run Dynamic-Versioning
        id: dynamic_version
        uses: ./.github/actions/dynamic-versioning
        with:
          package_name: ${{ github.event.inputs.package_name }}

      - name: Build Package Distribution
        working-directory: ./packages/${{ github.event.inputs.package_name }}
        run: |
          poetry build

Job: Publish to JFrog

This job publishes the built package to JFrog by retrieving the cached package distribution and using the official gh-action-pypi-publish action.

  publish-jfrog:
    needs: build
    name: Publish Package to JFrog
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - name: Retrieve Cached Package Distribution
        uses: actions/download-artifact@v3
        with:
          name: ${{ github.event.inputs.package_name }}
          path: ./packages/${{ github.event.inputs.package_name }}/dist

      - name: Publish to JFrog
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: ./packages/${{ github.event.inputs.package_name }}/dist
          user: ${{ secrets.secret_user }}
          password: ${{ secrets.secret_password }}
          repository-url: https://the.internal.repo.url/artifactory/api/pypi/pypi

Job: Commit Updated Version

Finally, this job commits the updated version to the repository while skipping CI.

  commit-version:
    needs: [tag-and-release, build, publish-jfrog]
    name: Commit updated version
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Commit updated version
        shell: bash
        run: |
          git config --global user.name $GITHUB_USER
          git config --global user.email $GITHUB_EMAIL
          git pull
          bash $GITHUB_WORKSPACE/.github/scripts/internal/bump_version.sh ${{ github.event.inputs.package_name }}
          git add ./packages/${{ github.event.inputs.package_name }}/pyproject.toml
          git commit -m "Bump ${{ github.event.inputs.package_name }} to version ${{ needs.tag-and-release.outputs.package_version }} [skip ci]"
          git push origin ${GITHUB_REF#refs/heads/}

Screenshot: a pull request CI run with 3 modified packages.

This setup ensures that only the affected packages are processed, reducing build times and resource usage. The combination of matrix strategies, custom scripts, and composite actions provides a flexible, generic, and scalable approach to managing CI/CD workflows for this kind of monorepo.


