Managing a monorepo with multiple Python packages presents unique challenges, particularly in ensuring efficient CI/CD. In this article, I will share how I created CI/CD workflows for our monorepo of Python packages using GitHub Actions.
We will delve into the workflows themselves, the matrix strategy, the script that detects changed paths, and the other custom scripts and composite actions that make this process work.
Our primary CI workflow (on-pr-build-push.yaml) triggers on pull requests, ensuring that all changes are validated before merging. This workflow comprises several jobs: ChangedPaths, Lint, Typecheck, Test, and Build-Push.
Job: ChangedPaths
This job determines which packages have been modified and outputs a matrix that drives the subsequent jobs. So, for example, if a PR modifies package1 and package2, the matrix will contain two entries: package1 and package2.
name: Build & Push Python Package
on:
  pull_request:

jobs:
  ChangedPaths:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.get_changed_paths.outputs.matrix }}
    timeout-minutes: 10
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          token: ${{ secrets.GLOBAL_GITHUB_TOKEN }}
          fetch-depth: 0
      - name: Run init
        uses: ./.github/actions/init # Custom composite action for initializing the environment
      - name: Get Changed Paths
        id: get_changed_paths
        shell: bash
        run: |
          python $GITHUB_WORKSPACE/.github/scripts/internal/get_changed_paths.py
The get_changed_paths.py script identifies the changed paths and filters them to include only those within the packages directory. The script is quite straightforward: it runs the git diff command via the built-in subprocess module to identify the changed paths.
import json
import os
import subprocess

output_file = os.getenv("GITHUB_OUTPUT")

def get_changed_paths():
    cmd = "git diff --name-only $(git rev-parse HEAD) HEAD~1"
    result = subprocess.run(cmd, shell=True, capture_output=True)
    if result.returncode != 0:
        raise Exception("Error getting changed paths")
    changed_paths = result.stdout.decode("utf-8").strip().split("\n")
    print(f"Changed paths: {changed_paths}")
    return changed_paths

def get_filtered_paths(changed_paths):
    filtered_paths = set()
    for path in changed_paths:
        if path.startswith("packages/"):
            filtered_paths.add(path.split("/")[1])
    print(f"Filtered paths: {filtered_paths}")
    return list(filtered_paths)

if __name__ == "__main__":
    changed_paths = get_changed_paths()
    filtered_paths = get_filtered_paths(changed_paths)
    print(filtered_paths)
    with open(output_file, "a") as file:
        # fromJson in the workflow expects strict JSON, so serialize with json.dumps
        file.write(f"matrix={json.dumps(filtered_paths)}\n")
Note that the capture_output=True argument lets us capture the command's output. We then decode and split that output to get a list of changed paths. The get_filtered_paths function keeps only the paths that start with packages/, which is where the packages live in my repository.
Also note that GitHub Actions uses environment files to pass data between steps and jobs. By writing the output matrix to a file (GITHUB_OUTPUT), we can ensure that the list of changed paths is available as an output in the ChangedPaths job. This allows subsequent jobs to access and iterate over the paths efficiently.
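To make that mechanism concrete, here is a minimal sketch of how the environment-file handoff works, using a temporary file to stand in for the path GitHub provides via the GITHUB_OUTPUT variable (the temp file is an assumption for demo purposes only):

```python
import json
import os
import tempfile

# Simulate the GITHUB_OUTPUT environment file with a temp file
# (on a real runner, GitHub supplies the path via the GITHUB_OUTPUT env var).
fd, output_file = tempfile.mkstemp()
os.close(fd)

packages = ["package1", "package2"]

# A step writes its outputs as key=value lines, one per line.
with open(output_file, "a") as f:
    f.write(f"matrix={json.dumps(packages)}\n")

# The runner later parses those lines into step outputs.
outputs = {}
with open(output_file) as f:
    for line in f:
        key, _, value = line.strip().partition("=")
        outputs[key] = value

print(outputs["matrix"])  # a JSON string that fromJson can consume
os.remove(output_file)
```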
Job: Lint, Typecheck, and Test
These jobs run composite actions for linting (flake8), type checking (mypy), and testing (pytest) on the changed packages using a matrix strategy. They all depend on the ChangedPaths job.
Lint:
  needs: ChangedPaths
  if: ${{ needs.ChangedPaths.outputs.matrix != '[]' }}
  strategy:
    matrix:
      path: ${{ fromJson(needs.ChangedPaths.outputs.matrix) }}
  runs-on: ubuntu-latest
  timeout-minutes: 10
  permissions:
    id-token: write
    contents: read
  steps:
    - name: Checkout
      uses: actions/checkout@v3
    - name: Get path
      run: echo ${{ matrix.path }}
    - name: Run init
      uses: ./.github/actions/init
      with:
        change_path: ${{ matrix.path }}
    - name: Run Lint action
      uses: ./.github/actions/lint
      env:
        paths: ${{ matrix.path }}
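One detail worth emphasizing: fromJson expects strict JSON. Python's default string conversion of a list produces single-quoted items, which a JSON parser rejects, so the matrix value written to GITHUB_OUTPUT should be serialized with json.dumps. A quick sketch of the difference:

```python
import json

packages = ["package1", "package2"]

# Python's default repr uses single quotes -- not valid JSON:
print(str(packages))         # ['package1', 'package2']
# json.dumps produces double-quoted, strictly valid JSON:
print(json.dumps(packages))  # ["package1", "package2"]

# Only the json.dumps form round-trips through a JSON parser
# (which is effectively what GitHub's fromJson is):
assert json.loads(json.dumps(packages)) == packages
try:
    json.loads(str(packages))
except json.JSONDecodeError:
    print("repr form is rejected")
```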
Job: Build-Push
The final job in the CI pipeline builds and pushes the packages if all previous jobs succeed.
Build-Push:
  needs: [ChangedPaths, Lint, Typecheck, Test]
  if: ${{ needs.ChangedPaths.outputs.matrix != '[]' }}
  strategy:
    matrix:
      path: ${{ fromJson(needs.ChangedPaths.outputs.matrix) }}
  runs-on: ubuntu-latest
  timeout-minutes: 15
  permissions:
    id-token: write
    contents: read
  steps:
    - name: Checkout
      uses: actions/checkout@v3
      id: checkout
    - name: Run init
      id: run-init
      uses: ./.github/actions/init
      with:
        change_path: ${{ matrix.path }}
    - name: Build & Publish Python Packages
      uses: ./.github/actions/build-push-package
      env:
        paths: ${{ matrix.path }}
This job calls a custom composite action (build-push-package) that builds the package and uploads its distribution as a GitHub Actions artifact, so it can be retrieved and installed from there later.
name: Build & Release Package
description: "Build & Push Python Package"
author: "Or Kazaz"
runs:
  using: composite
  steps:
    - name: Checkout
      uses: actions/checkout@v3
      with:
        fetch-depth: 0
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'
    - name: Install Python Dependencies
      shell: bash
      run: |
        pip install --upgrade pip
        pip install poetry twine check-wheel-contents
    - name: Run Poetry Check for pyproject.toml
      working-directory: ./packages/${{ env.paths }}
      shell: bash
      run: poetry check
    - name: Build Package Distribution
      working-directory: ./packages/${{ env.paths }}
      shell: bash
      run: poetry build
    - name: Check Distribution Descriptions
      working-directory: ./packages/${{ env.paths }}
      shell: bash
      run: |
        twine check dist/*
    - name: Check Wheel Contents
      working-directory: ./packages/${{ env.paths }}
      shell: bash
      run: |
        check-wheel-contents dist/*.whl
    - name: Cache Package Distribution
      uses: actions/upload-artifact@v3
      with:
        name: ${{ env.paths }}
        path: ./packages/${{ env.paths }}/dist/
The secondary workflow (tag-and-release.yaml) is triggered manually on demand using workflow_dispatch. It tags the package, updates its version in pyproject.toml, and publishes the package to our internal PyPI repository (on JFrog Artifactory). It consists of multiple jobs to ensure a smooth release process.
Job: Tag and Release
This job tags the repository and creates a release on GitHub by zipping it and using an action to create a release to be available for download.
jobs:
  tag-and-release:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    outputs:
      package_version: ${{ steps.dynamic_version.outputs.package_version }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.inputs.commit_hash }}
      - name: Run Dynamic-Versioning
        id: dynamic_version
        uses: ./.github/actions/dynamic-versioning
        with:
          package_name: ${{ github.event.inputs.package_name }}
      - name: Archive package directory
        working-directory: ./packages
        run: |
          zip -r ${{ github.event.inputs.package_name }}.zip ./${{ github.event.inputs.package_name }}
          echo "ARTIFACT_FILE_PATH=./packages/${{ github.event.inputs.package_name }}.zip" >> $GITHUB_ENV
      - name: Create Release
        uses: ncipollo/release-action@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          artifacts: "${{ env.ARTIFACT_FILE_PATH }}"
          commit: ${{ github.event.inputs.commit_hash }}
          tag: ${{ github.event.inputs.package_name }}-v${{ steps.dynamic_version.outputs.package_version }}
          name: ${{ github.event.inputs.package_name }}-v${{ steps.dynamic_version.outputs.package_version }}
          token: ${{ secrets.GLOBAL_GITHUB_TOKEN }}
          skipIfReleaseExists: true
Note that the dynamic-versioning action is a custom action that updates the version in the pyproject.toml file.
name: "Dynamic Versioning"
description: "Bump version in python package"
author: "Or Kazaz"
inputs:
package_name:
description: "Package name"
required: true
outputs:
package_version:
description: 'bumped package version'
value: ${{ steps.bump_version.outputs.package_version }}
runs:
using: composite
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Bump version
id: bump_version
shell: bash
run: |
PACKAGE_VERSION=$($GITHUB_WORKSPACE/.github/scripts/internal/bump_version.sh ${{ inputs.package_name }})
echo "${{ inputs.package_name }} package will be bumped to version: ${PACKAGE_VERSION}"
echo "package_version=$PACKAGE_VERSION" >> $GITHUB_OUTPUT
bump_version.sh is a very simple bash script that increments the package's patch version in the pyproject.toml file.
#!/bin/bash
# Bumps the patch version of the given package's pyproject.toml and prints the new version.
PACKAGE_NAME=$1
# Find the current version line, e.g. version = "1.2.3"
VERSION_LINE=$(grep version ./packages/$PACKAGE_NAME/pyproject.toml)
OLD_VERSION=$(echo "$VERSION_LINE" | awk -F'"' '{print $2}')
# Increment the patch (third) component
NEW_VERSION=$(echo "$OLD_VERSION" | awk -F. -v OFS=. '{$3=$3+1}1')
sed -i "s/$VERSION_LINE/version = \"$NEW_VERSION\"/g" ./packages/$PACKAGE_NAME/pyproject.toml
# Read back and print the bumped version
VERSION=$(grep version ./packages/$PACKAGE_NAME/pyproject.toml | awk -F'"' '{print $2}')
echo $VERSION
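For clarity, the same patch-bump logic can be sketched in Python. This is purely an illustrative equivalent of the bash script above, not part of the actual pipeline, and the function names are my own:

```python
import re

def bump_patch(version: str) -> str:
    """Increment the patch (third) component of an x.y.z version string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"

def bump_pyproject(text: str) -> str:
    """Rewrite the version = "x.y.z" line in a pyproject.toml string."""
    match = re.search(r'version\s*=\s*"(\d+\.\d+\.\d+)"', text)
    if match is None:
        raise ValueError("no version line found")
    new_version = bump_patch(match.group(1))
    return text.replace(match.group(0), f'version = "{new_version}"')

toml = 'name = "package1"\nversion = "1.2.3"\n'
print(bump_pyproject(toml))  # version becomes 1.2.4
```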
Job: Build
This job builds the package distribution using Poetry.
build:
  needs: tag-and-release
  name: Build Python Package
  runs-on: ubuntu-latest
  timeout-minutes: 30
  steps:
    - name: Checkout
      uses: actions/checkout@v3
      with:
        ref: ${{ github.event.inputs.commit_hash }}
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'
    - name: Install Python Dependencies
      working-directory: ./packages/${{ github.event.inputs.package_name }}
      run: |
        pip install --upgrade pip
        pip install --upgrade poetry twine check-wheel-contents
    - name: Run Dynamic-Versioning
      id: dynamic_version
      uses: ./.github/actions/dynamic-versioning
      with:
        package_name: ${{ github.event.inputs.package_name }}
    - name: Build Package Distribution
      working-directory: ./packages/${{ github.event.inputs.package_name }}
      run: |
        poetry build
Job: Publish to JFrog
This job publishes the built package to JFrog by retrieving the cached package distribution and using the official gh-action-pypi-publish action.
publish-jfrog:
  needs: build
  name: Publish Package to JFrog
  runs-on: ubuntu-latest
  timeout-minutes: 15
  steps:
    - name: Retrieve Cached Package Distribution
      uses: actions/download-artifact@v3
      with:
        name: ${{ github.event.inputs.package_name }}
        path: ./packages/${{ github.event.inputs.package_name }}/dist
    - name: Publish to JFrog
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        packages-dir: ./packages/${{ github.event.inputs.package_name }}/dist
        user: ${{ secrets.secret_user }}
        password: ${{ secrets.secret_password }}
        repository-url: https://the.internal.repo.url/artifactory/api/pypi/pypi
Job: Commit Updated Version
Finally, this job commits the updated version to the repository while skipping CI.
commit-version:
  needs: [tag-and-release, build, publish-jfrog]
  name: Commit updated version
  runs-on: ubuntu-latest
  timeout-minutes: 15
  steps:
    - name: Checkout
      uses: actions/checkout@v3
    - name: Commit updated version
      shell: bash
      run: |
        git config --global user.name $GITHUB_USER
        git config --global user.email $GITHUB_EMAIL
        git pull
        bash $GITHUB_WORKSPACE/.github/scripts/internal/bump_version.sh ${{ github.event.inputs.package_name }}
        git add ./packages/${{ github.event.inputs.package_name }}/pyproject.toml
        git commit -m "Bump ${{ github.event.inputs.package_name }} to version ${{ needs.tag-and-release.outputs.package_version }} [skip ci]"
        git push origin ${GITHUB_REF#refs/heads/}
(Screenshot: a pull-request CI run with three modified packages)
This setup ensures that only the affected packages are processed, reducing build times and resource usage. The use of matrix strategies, custom scripts, and composite actions provides a flexible, generic, and scalable approach to managing CI/CD workflows for this kind of monorepo.