It Started with People: PRs in the World of AI

Sometimes, even when you’re close to your team, patterns or underlying issues emerge over time. Some of these are insights you may wish you had recognised earlier. The important thing is to talk to your team and work on a solution together. Sometimes, by resolving one challenge together, you improve something else at the same time.

Like many teams, we’ve been through many Pull Request (PR) process evolutions. We started with requests in Microsoft Teams chats - they often got lost. We tried automated notifications in Teams, but they were only possible in an odd place - so they got ignored, and lost. Then we created a chat for our team just for PRs, with a strict no-chat policy and an emoji system, so engineers could see when someone was looking at their PR, when there were comments, and when it was approved. The engineer could even mark a PR as urgent if necessary.

It worked really well, until something unforeseen happened.

The Challenge

I realised that one of my engineers was taking on the majority of the PR reviews. As a result, they weren’t able to focus on their own work as much as they should have. It also created a bottleneck, and the rest of the team missed out on the benefits of reviewing PRs themselves: learning from others’ code, becoming more familiar with the codebase, and understanding the reasoning behind decisions. I know not every engineer naturally sees these as benefits, but they are.

When I spoke with them about it, they shared a few underlying concerns that had influenced this approach. They had noticed that some PR approvals appeared quite brief, with quick “Looks Good To Me” responses, and at times the speed of approvals raised questions about how thoroughly changes were being reviewed. As a result, they felt it was important to help ensure that each PR received at least one in-depth review.

They also observed that participation in PR reviews wasn’t always evenly distributed across the team. While some engineers were regularly involved, others were less engaged. Combined with this, they felt that, despite having clear coding guidelines, the consistency of approved code varied, suggesting that reviews weren’t always applied in the same way.

Finally, they noted that differences in how code was written across the team could sometimes lead to inconsistencies or duplication. From their perspective, a bit more shared scrutiny could help maintain alignment and cohesion in the codebase.

I got the team together to work out a solution.

The Solution

My engineer identified two primary concerns: an uneven distribution of PR reviews across the team, and deficiencies in the quality of both the reviews and the code produced. Both required focused improvement efforts.

Evening Out PR Reviews

Another of my engineers told the rest of the team, “when I’m in the zone I don’t look at the PRs”, which reflects a natural tendency among software engineers to prioritise focused development work over interruptions. A few others in the team agreed, which made it clear that relying on ad-hoc engagement can lead to gaps in review coverage and consistency.

We discussed some options and decided to establish a model where specific people are on PR duty, with explicit permission to pause their own work and focus on reviews. There was reluctance, as few engineers enjoy doing PRs, but there was agreement to give it a try. We decided to pick the PR Sheriffs, as they came to be known, randomly each day using https://wheelofnames.com/, with the deliberate intention that the duty would be distributed evenly and that no more than one of the pair would be a junior.

Most of the time, this is how things work, but there are now occasions when engineers volunteer for one reason or another, or - often when I’m not able to attend standup - other methods are used. The important thing is that it works.
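For anyone who would rather script the draw than spin a wheel, the rules above (a pair each day, duty spread evenly, no more than one junior in the pair) could be sketched roughly as follows. This is not what we run; we use the wheel. The function name pick_sheriffs, the team names, and the "fewest duties so far" tie-breaking are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def pick_sheriffs(team, duty_counts, rng=random):
    """Pick today's pair of PR Sheriffs.

    team: dict mapping engineer name -> True if that engineer is a junior.
    duty_counts: Counter of how many times each engineer has been sheriff.
    rng: source of randomness (anything with a .choice method).
    """
    # Keep only pairs that contain at most one junior.
    valid_pairs = [
        pair
        for pair in combinations(sorted(team), 2)
        if not (team[pair[0]] and team[pair[1]])
    ]
    if not valid_pairs:
        raise ValueError("no valid pair available")

    def load(pair):
        # Combined number of duties already served by the pair.
        return sum(duty_counts[name] for name in pair)

    # Assumption: "evenly" is approximated by drawing at random
    # among the pairs that have served the least so far.
    lightest = min(load(pair) for pair in valid_pairs)
    candidates = [pair for pair in valid_pairs if load(pair) == lightest]
    pair = rng.choice(candidates)
    duty_counts.update(pair)  # record today's duty for both sheriffs
    return pair


# Hypothetical team: two juniors and three more experienced engineers.
team = {"Ana": True, "Ben": True, "Cat": False, "Dev": False, "Eli": False}
history = Counter()
print(pick_sheriffs(team, history))
```

A pure random draw, like the wheel, only evens out over a long run. Weighting towards whoever has served least trades a little of the fun for a more even spread in the short term, and it still leaves room for volunteers to override the result.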

Building Quality

The organisation the team works in has been very proactive with AI in general, and we’ve been using Copilot for PR reviews for some time. The Copilot reviewer is a GitHub-hosted AI agent that orchestrates multiple large language models from different providers (OpenAI, Anthropic, Google, and Microsoft), typically defaulting to GPT‑5–level coding-optimised models.

Initially, the reviews were not particularly helpful, but they improved very quickly. Unfortunately, adoption across the team was uneven, and some Copilot comments were being resolved without much consideration. The first rule that we put in place was that Copilot comments should not be resolved without an accompanying justification comment. We then extended this rule to say that PRs should not be submitted for sheriff review until Copilot feedback had been reviewed and appropriately addressed.

We were aware that these changes could increase the time taken for PR reviews, so we looked to streamline the process as much as possible. Our coding guidelines are in a GitHub wiki, and the whole team has access to Claude Code. We agreed to use those guidelines to support automated PR reviews via Claude Code, including reviewing and responding to Copilot feedback, and to make sure we still reviewed the code manually before approving or requesting changes.

At this point, we had an agreed plan to make sure quality was as consistent as it could be, but there was some natural apprehension within the team, which could only really be addressed by putting the approach into practice and iterating based on feedback.

In The New World

Since introducing the PR Sheriff model and embedding AI into the review workflow, the team has seen clear and measurable progress. Code reviews have become more thorough and structured, with fewer cursory approvals and more meaningful engagement. While there are still variances in reviewer confidence and consistency across individuals, the overall standard has improved noticeably. Reviews now tend to include deeper commentary, better questioning, and more evidence of thoughtful consideration, even if not yet uniformly applied by everyone.

One of the most immediate and successful outcomes has been the distribution of PR workload. The Sheriff approach has removed the previous bottleneck and ensured that responsibility is shared more evenly across the team. No single individual is now overwhelmed with reviews, and engineers have clearer expectations about when they should prioritise reviewing over development work. This has reduced delays and made the flow of PRs more predictable, while also giving more team members exposure to the review process and the codebase as a whole.

Interestingly, while standardisation of code was one of the original goals, much of the progress in this area appears to have come indirectly through the use of AI tooling rather than deliberate enforcement. Copilot and Claude Code have helped reinforce consistent patterns and highlight deviations in a way that is both immediate and scalable. Although the team operates with common coding guidelines and rules, individual differences in skill levels and the varied prompts used during PR reviews mean code is examined from multiple perspectives, helping to uncover a broader range of issues, potential bugs, and regressions. As a result, code across the team is becoming more aligned, even without heavy-handed governance, suggesting that AI is acting as a subtle but powerful equaliser.

Engagement within PRs has also increased. There is noticeably more discussion, more comments, and more visible interaction around changes. Whether this is driven by engineers becoming more diligent or simply enabled by AI surfacing more points to discuss is still open to interpretation. However, the outcome is positive: PRs are no longer passive checkpoints but active conversations about code quality, design decisions, and shared understanding.

Another unexpected but welcome development has been an increase in ownership. The visibility and accountability that come with being the designated PR Sheriff for the day appear to have encouraged individuals to take their role more seriously. This sense of responsibility has extended beyond reviews themselves, influencing how engineers think about the quality of their own submissions and their role within the team.

There are still areas that need attention. Some inconsistencies remain, particularly in reviewer confidence and depth of analysis, but these gaps are now more visible and better understood. Encouragingly, the team is already looking to address these through further use of AI support, refining processes, and continued iteration. The foundation is now much stronger, and the direction of travel is clearly positive.


Paul Grenyer
