Some general recommendations on setting up an advanced GitHub Actions pipeline

1/23/2024 by Stefan Bauer

We recently migrated from CircleCI to GitHub Actions, and I must say that I'm quite happy with the result. The community doesn't seem to have a unified view on Actions: some Hacker News discussions are mostly positive, whereas others are overwhelmingly negative. I assume this is just another case where it boils down to whether the tool suits your specific use case.

Since I invested a lot of hours migrating two major and some minor projects to GitHub Actions, I now feel I can give some advice to people who intend to do the same or are thinking about it.

How to approach building a pipeline with GitHub Actions

Step 1: Draw the basic workflow, then draw your dream workflow – then ask for feedback and do it again

I like drawing stuff like that on my iPad, but a photo of something you scribbled down on paper will also do. Send it to your coworkers and discuss whether they share your vision or have alternative recommendations or wishes. In general, asking for feedback is good advice. I also created a quick Mermaid diagram afterward and asked Copilot to evaluate my workflow. It had some decent points to make (e.g. extracting unrelated steps and putting them in another YAML file), and I'm happy I asked. I mean, we have those AI tools now, right?

If you're happy with the final iteration of your draft, get to work. It's not what you'll eventually end up with, but it will serve as an excellent and helpful outline.

Step 2: Think about what belongs together and what does not

I'm convinced that making your workflows understandable and readable is crucial. Don't just dump everything into one single configuration file. Your future self will thank you for it.

Here is an example of some workflows that you could use:

  • backup.yaml – Save your source code to another repository. For example, every time we commit and push code, we want to have it in both GitHub and AWS CodeCommit (you could also use GitLab or something self-hosted). You know, just to be safe.
  • labeler.yaml – There is some merit to automatically assigning labels to your pull requests. It makes it easy to filter for them or assign reviewers. (Okay, we use code owners for that, but that's another story.)
  • create-docker-image.yaml – Using your own Docker image can be much faster than using something already available. We have a workflow that creates a Docker image and uploads it to the GitHub Container Registry.
  • security.yaml – Some security checks can (and probably should) be performed automatically. For example, we use Gitleaks both in a pre-commit hook and as a workflow step that checks if we're leaking secrets. You can also run npm audits and other security checks in such workflows.
  • main-workflow.yaml – Feel free to find a better name for this. At our company, this workflow is the most important one. Many steps are in here because they rely on each other. For a Symfony PHP application, the workflow could look as follows.
    • Step: Static analysis (Psalm, PHPStan, ESLint, ...)
    • Step: Build Node assets
    • Step: Run Node tests and create a coverage report
    • Step: Install composer dependencies
    • Step: Run backend tests and create a coverage report
    • Step: Download a DB dump and check if your migrations will break anything or if data will be lost
    • Step: Run E2E tests (e.g. with Playwright)
    • Step: Prepare the application for deployment
    • Step: Deploy the application to staging
    • Step: Ask in Slack if any developer approves the deployment to prod (if staging works)
    • Step: Deploy prod
    • Step: Send an automatic update notification including new entries in the changelog
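
To make one of the smaller workflows from the list concrete, here is a minimal sketch of what a security.yaml could look like. It assumes the gitleaks/gitleaks-action action and a Node project with a package-lock.json; the triggers, job names, Node version, and audit threshold are assumptions chosen for illustration, so check the action's README (organization accounts may need a license key) before copying anything.

```yaml
# .github/workflows/security.yaml (illustrative sketch, not a drop-in config)
name: Security checks

on:
  pull_request:
  push:
    branches: [main]

jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history, so every commit can be scanned
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  npm-audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20 # assumed version
      # Fail only on high or critical advisories (assumed threshold)
      - run: npm audit --audit-level=high
```

The two jobs have no dependency on each other, so they run in parallel.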

When writing the main workflow, you have to make many interesting decisions. Which steps can run in parallel? Which files to persist? Which cache keys to use? Can some of the steps be extracted and put into independent workflow files?
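
The parallelism question can be sketched with the needs keyword: jobs without needs start right away and run side by side, while jobs that list other jobs wait for them. The skeleton below is only an outline under assumptions. The job names, triggers, and echo placeholders are hypothetical and stand in for the real commands.

```yaml
# .github/workflows/main-workflow.yaml (illustrative skeleton)
name: Main workflow

on:
  pull_request:
  push:
    branches: [main]

jobs:
  # These three jobs have no "needs", so they run in parallel.
  static-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Run Psalm, PHPStan, ESLint here"

  frontend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Build Node assets and run Node tests here"

  backend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Install Composer dependencies and run backend tests here"

  # Waits for both the frontend and the backend job.
  e2e:
    needs: [frontend, backend]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Run Playwright tests here"

  # Runs only for pushes to main, after everything else succeeded.
  deploy-staging:
    needs: [static-analysis, e2e]
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploy to staging here"
```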

In one of our workflows, a related microservice is also deployed together with a Symfony application. We have reasons for this, but in general, I recommend having separate workflows for that. However, this hinges on so, so many factors. Are you using a monorepo (yay)? Do microservices have to be in sync? And so on.

Step 3: Start writing from small to big

I think it's much easier to get started with... well, easy tasks. Maybe start with the pull request labeler. Or start with the security checker. Just be sure to make one pull request per workflow because that will make reviews faster and easier. If you start small, you can also educate your colleagues via code review, which will prepare all of you for the bigger and more difficult tasks.
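
A pull request labeler is about as small as a workflow gets. The sketch below assumes the actions/labeler action, which by default reads its label rules from a .github/labeler.yml file in the repository; the workflow and job names are placeholders.

```yaml
# .github/workflows/labeler.yaml (minimal sketch)
name: Pull request labeler

on:
  pull_request_target:

jobs:
  label:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write # required to add labels to the pull request
    steps:
      - uses: actions/labeler@v5
```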

Step 4: Start optimizing

Okay, you have created all workflows, and they work well. But you may have a lot of code duplication, some tasks fail occasionally, and some tasks are painfully slow (I'm looking at you, E2E tests!). This is when you can start optimizing your workflows. Here are some tips.

GitHub composite actions can help you reduce code duplication for things you have to do repeatedly. For example, we have one that waits for our MySQL server to be ready.

Think about the "when" of workflow steps. For example, we don't run E2E tests for every PR update. We run them each time something wants to enter an important branch (main or prod, for example).
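
As an illustration of the composite action idea, here is a sketch of an action that polls MySQL until it responds. It is not the action mentioned above, just one way to write such a thing: the file path, input names, defaults, and the use of mysqladmin (which has to be available on the runner) are all assumptions.

```yaml
# .github/actions/wait-for-mysql/action.yml (hypothetical path and input names)
name: Wait for MySQL
description: Polls the MySQL server until it accepts connections or the retries run out
inputs:
  host:
    description: Host name of the MySQL server
    default: 127.0.0.1
  port:
    description: TCP port of the MySQL server
    default: "3306"
  retries:
    description: How many times to poll before giving up
    default: "30"
runs:
  using: composite
  steps:
    - shell: bash # run steps in a composite action must declare their shell
      env:
        MYSQL_HOST: ${{ inputs.host }}
        MYSQL_PORT: ${{ inputs.port }}
        RETRIES: ${{ inputs.retries }}
      run: |
        for attempt in $(seq 1 "$RETRIES"); do
          if mysqladmin ping --host="$MYSQL_HOST" --port="$MYSQL_PORT" --silent; then
            echo "MySQL is ready after $attempt attempt(s)"
            exit 0
          fi
          sleep 2
        done
        echo "MySQL did not become ready in time" >&2
        exit 1
```

A job can then call it after checking out the repository with a step like uses: ./.github/actions/wait-for-mysql. For the "when" part, a trigger such as on: pull_request with branches: [main, prod] limits a workflow to pull requests that target those branches.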

If a step is flaky, it's probably not a good step, and you don't want it in your workflow. If you have to hit that repeat button three times before you can deploy, that's an annoying waste of time. Fix it now before you forget all the details you need.

Cache keys are prime candidates for bugs in your workflow. Maybe they're too broad, so a stale cache gets restored and some required assets are outdated. You know Phil Karlton's saying about cache invalidation and naming things that everybody keeps bringing up? Not caching enough is also a common reason for slow workflows. Have a good hard look at your steps, their dependencies, and what you could cache.
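
A common way to keep a cache key narrow enough is to derive it from the lockfile hash. The fragment below is a sketch that assumes actions/cache, an npm project, and a package-lock.json in the repository root.

```yaml
# Fragment of a job (illustrative); lockfile name and cache path are assumptions
steps:
  - uses: actions/checkout@v4
  - uses: actions/cache@v4
    with:
      path: ~/.npm
      # Exact key: changes whenever the lockfile changes
      key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
      # Fallback prefix: restores the most recent older cache on a miss
      restore-keys: |
        npm-${{ runner.os }}-
  - run: npm ci
```

The restore-keys fallback is exactly the kind of broad match that can hand you outdated data, so it is only safe when the following step (here npm ci) can cope with a partially stale cache.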

Think about when you want your workflow to crash. Some things may be more important to you than others. For example, you probably want to be able to deploy your application even if you cannot send that Slack message that informs others about the new version.
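
For steps that are allowed to fail, continue-on-error is one option. The sketch below assumes a Slack incoming webhook stored in a secret named SLACK_WEBHOOK_URL and a hypothetical deploy.sh script; neither comes from our actual pipeline.

```yaml
# Fragment of a deploy job (illustrative)
steps:
  - uses: actions/checkout@v4
  - name: Deploy
    run: ./deploy.sh # hypothetical script; a failure here fails the job
  - name: Notify Slack
    continue-on-error: true # a failed notification must not fail the job
    env:
      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} # assumed secret name
    run: |
      curl --fail -X POST \
        -H 'Content-Type: application/json' \
        --data '{"text":"A new version was deployed"}' \
        "$SLACK_WEBHOOK_URL"
```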

Lastly, I think adding a step summary can make a lot of sense. For example, show a security report or your coverage report.
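
A step summary is written by appending Markdown to the file that the GITHUB_STEP_SUMMARY environment variable points to. In this sketch, coverage/summary.md is a hypothetical file that an earlier test step would have to produce.

```yaml
# Fragment of a job (illustrative)
steps:
  - name: Publish coverage summary
    if: always() # show the report even when an earlier step failed
    run: |
      {
        echo "## Coverage report"
        echo ""
        cat coverage/summary.md # hypothetical report file
      } >> "$GITHUB_STEP_SUMMARY"
```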

Tips on sharing data between steps

If you want to share data between steps in the same GitHub Actions job, you have two options available.

Cache – If you need to persist larger amounts of data, e.g. asset bundles created with npm run build, using cache is the right choice. Cached data persists across workflow runs (so you don't have to npm install for every commit).
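
One way to hand a build result on via the cache is to key it by commit SHA, so a later job in the same run finds exactly that build. This is a sketch under assumptions: the job names and the public/build output directory are placeholders.

```yaml
# Fragment of a workflow (illustrative)
jobs:
  build-assets:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - id: assets
        uses: actions/cache@v4
        with:
          path: public/build # assumed build output directory
          key: assets-${{ github.sha }}
      # Build only if this commit's assets are not cached yet
      - if: steps.assets.outputs.cache-hit != 'true'
        run: npm ci && npm run build

  e2e:
    needs: build-assets
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: public/build
          key: assets-${{ github.sha }}
          fail-on-cache-miss: true # stop early if the assets are missing
      - run: echo "Run E2E tests against the restored assets here"
```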

Output variables – Those are useful when sharing information like computed values, configuration details, or any data that needs to be used within the same workflow run. For example, one step generates a ZIP file with a dynamic name and another step needs to know about that. Output variables exist only for the duration of the workflow run.
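
The ZIP file case can be sketched as follows. A step writes name=value to the file behind GITHUB_OUTPUT, and later steps read it through the steps context. The step id, the output name, and the archived directory are assumptions.

```yaml
# Fragment of a job (illustrative)
steps:
  - uses: actions/checkout@v4
  - name: Build archive
    id: package
    run: |
      # Dynamic name based on the short commit SHA
      ZIP_NAME="release-${GITHUB_SHA::7}.zip"
      zip -r "$ZIP_NAME" public/build # assumed directory
      echo "zip_name=$ZIP_NAME" >> "$GITHUB_OUTPUT"
  - name: Use the archive name
    run: echo "Uploading ${{ steps.package.outputs.zip_name }}"
```

To use the value in another job, the producing job additionally has to declare it under its outputs key.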

So, do I like GitHub Actions more than CircleCI?

CircleCI isn't all bad. Actually, it's not bad at all. It has some nice features. Debugging via SSH, for example, is super easy with CircleCI. With GitHub Actions you have to take some extra steps to get this working (e.g. with tmate). Also, CircleCI features some nice integrations with other tools (although GitHub Actions is very quickly catching up on this front). Still, I find the integration of GitHub Actions into, well, GitHub, very compelling, and working with it in general is easy because the documentation is solid and up-to-date.
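
For reference, the tmate route usually means adding a third-party action such as mxschmitt/action-tmate. The fragment below is a sketch that assumes this action; run-tests.sh is a hypothetical script and the timeout is an arbitrary choice.

```yaml
# Fragment of a job (illustrative)
steps:
  - uses: actions/checkout@v4
  - run: ./run-tests.sh # hypothetical script
  - name: Open a tmate session for debugging
    if: ${{ failure() }} # only when an earlier step failed
    uses: mxschmitt/action-tmate@v3
    timeout-minutes: 15 # don't keep the runner busy forever
    with:
      limit-access-to-actor: true # only the user who triggered the run may connect
```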

Wrap up

The transition from CircleCI to GitHub Actions was surprisingly easy, and I am happy with the new pipeline. I am also happy that there were so few pitfalls and unexpected behaviors. Granted, I have been working with pipelines for years now, and could probably also build a cool CircleCI pipeline from scratch. But I love reducing complexity, and using one less service seems great to me.

I hope you enjoyed this post. Maybe it gave you a good idea about building pipelines with Actions or some ideas about how to optimize a pipeline's planning phase. Cheers.