Azure DevOps pipelines not triggering, why...?

How do you move a pipeline from one Azure DevOps organisation to another? What happens when you enforce SSO on GitHub? Not a lot of fun, but here are our learnings.

We were busy enabling GitHub SSO on our GitHub account. We had run it for a month or so without enforcing it and had even onboarded all our bot accounts with corporate accounts.

All looked great: about 70% of our users had linked their accounts, so we gave up nudging people and decided to enforce it.

Within hours we had 90+% of people successfully using it and just a few user support issues, mostly about authorising SSH keys for use with an SSO account.

A few things started breaking though, mostly around bot accounts and OAuth connections. The SSO docs don't warn you about this, but there's a note on a different page about authentication that says you must reauthorise any OAuth apps after SSO is enforced. This was tedious but not too difficult.

A day or so later we received a report from a team that their builds in Azure DevOps weren't triggering any more and when they tried to build it manually they got this error:

Request failed. GitHub responded with result: 'Resource protected by organization SAML enforcement. You must grant your OAuth token access to this organization.'

They were asked to switch to the Azure Pipelines GitHub app from the personal account service connection that was there before. That allowed the pipeline to check out the code when builds were run manually, but CI builds still didn't trigger on pull requests.

After some more investigation we found out this team also had their own Azure DevOps organisation and was running it alongside ours. Their old one still worked but builds in our organisation didn't trigger.

If you commented /azp where on a pull request, it only reported the old organisation, although this had previously worked fine. Swapping back to the old service connection didn't fix it either.

At this point we weren't sure where to go next so we raised a support request with Microsoft. They pointed us towards: When I select a repository during pipeline creation, I get an error "The repository {repo-name} is in use with the Azure Pipelines GitHub App in another Azure DevOps organization."

The guidance there said you had to uninstall the GitHub app and then reinstall it to update the mapping. Err what. We have 15+ Azure DevOps projects, 100s of repositories using it and multiple organisations. We have to uninstall the GitHub app to fix one team's issue?

This wasn't something I was very keen to do. We tried to work around it by deleting the service connection in Azure DevOps and re-installing it from GitHub without uninstalling the app, but it didn't help.

We played support tag with Microsoft for about 2 months trying to get another option, but they were adamant this was the only way.

Finally I caved in and uninstalled the app in GitHub.

The first scary moment was when GitHub showed a notification saying the uninstallation had been scheduled as a job. Great, so I don't know when things might break.

I tried to re-install it. When I searched Google for 'github install azure pipelines app' I landed on the Azure Pipelines marketplace page, where as far as I could tell I couldn't install it because it was already set up:

timja-org (already setup)

Eventually I found the app page, from which you can configure it.

In the meantime I was getting reports from people that their builds weren't triggering and they couldn't manually trigger the builds. If they clicked on the "Edit" button on the pipeline page they got this scary error:

An unexpected error has occurred within this region of the page. You can try reloading this component or refreshing the entire page. A button with "Refresh page", another button with "Reload component". A link with text "Show more info".

The "Refresh page" and "Reload component" buttons don't do anything useful. We had seen this before when someone who had set up some service connections left; their account deactivation invalidated the service connections.

There's a workaround in the UI:

  • go to a working pipeline
  • click edit
  • click the triple dots and then triggers
  • edit the query parameter id=$id_of_working_pipeline in the URL to be the ID of the broken pipeline

You can then click YAML -> Get Sources -> Change and select a different service connection. Then save the pipeline.
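The URL edit in that workaround is easy to get wrong by hand when repeating it for many pipelines. A minimal sketch of the substitution, assuming the pipeline ID appears as a numeric id= query parameter as described above; the function name swap_pipeline_id and the example URL in the usage note are hypothetical, not the real Azure DevOps URL:

```shell
# swap_pipeline_id URL NEW_ID
# Prints URL with the value of its "id" query parameter replaced by NEW_ID.
swap_pipeline_id() {
  local url="$1" new_id="$2"
  # Match "id=<digits>" only when it directly follows "?" or "&", so that
  # other parameters whose names merely end in "id" are left alone.
  printf '%s\n' "$url" | sed -E "s/([?&]id=)[0-9]+/\1${new_id}/"
}
```

For example, swap_pipeline_id 'https://example.invalid/edit?view=triggers&id=12' 345 prints the same URL with id=345; paste the result into the browser while still on the triggers view of the working pipeline.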

Ok well that somewhat works but:

  • we have 100s of pipelines across many projects
  • the new service connection has a different name from the old one

The naming issue was a pain because service connections are referred to by name in YAML pipelines when referencing resources, e.g.:

resources:
  repositories:
    - repository: shared-library
      type: github
      ref: master
      name: org/repo
      endpoint: 'org'
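
Before renaming anything, it helps to know which connection names a repository actually references. A rough sketch, assuming each reference sits on its own endpoint: line as in the snippet above; the function name list_endpoints is made up for illustration, and the matching is plain text rather than real YAML parsing:

```shell
# list_endpoints [DIR]
# Prints each distinct "endpoint:" value found in .yml/.yaml files under DIR
# (default: current directory), with surrounding quotes stripped.
list_endpoints() {
  local dir="${1:-.}"
  find "$dir" -type f \( -name '*.yml' -o -name '*.yaml' \) \
    ! -path '*/.git/*' -exec grep -hE '^[[:space:]]*endpoint:' {} + |
    sed -E -e 's/^[[:space:]]*endpoint:[[:space:]]*//' \
           -e 's/[[:space:]]*$//' \
           -e "s/^'(.*)'\$/\1/" \
           -e 's/^"(.*)"$/\1/' |
    LC_ALL=C sort -u
}
```

Running list_endpoints in a checkout gives a short list of names to compare against the service connections that exist in the Azure DevOps project. Lines with trailing comments are not handled.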

We had previously had an issue with our connection and someone had managed to install another one, so it was called org (1). Okay, well, I guess I'll just try to rename the service connection:

Edit service connection. This service connection is to a GitHub-InstallationToken and cannot be changed.

Well that's unfortunate. The next attempt was sharing a connection from one project to another. Similar issue: we ended up with $org-$project-name.

I thought I would be cheeky and try to rename the shared service connection via the API. The API accepted the update but the name didn't actually change, and after I deleted the shared service connection I couldn't set up the service connection at all. I got an error during installation and didn't manage to figure out a way around it, so we ended up moving all pipelines in that project to a new project manually :(.

For the name issue ($org (1)) I ended up deleting the old service connections and installing a new one. This got me back to $org. Someone from the team that wanted us to make the change helpfully wrote a couple of scripts that could be used for the next painful parts.

The first part was to replace all references in GitHub. The amazing multi-gitter tool was used for this: it ran over every repository in the GitHub organisation and applied the replace.sh script to it, creating a pull request if there were any changes:

multi-gitter run -O $ORG \
  ./replace.sh \
  -m "Updating endpoint" \
  --token $GITHUB_TOKEN \
  --pr-title "Updating Endpoint" \
  --author-name <GH_Username> \
  --author-email <GH_User_Email>

replace.sh:

#!/bin/bash
set -euo pipefail

# Neither value may contain characters that are special to sed (such as / or &)
search_string="org (1)"
replacement="org"

# Find all .yml and .yaml files below the current directory, skipping .git.
# NUL-delimited output keeps file names containing spaces intact.
find . -type f \( -name "*.yml" -o -name "*.yaml" \) ! -path "./.git/*" -print0 |
while IFS= read -r -d '' file; do
  echo "Processing file: $file"

  # Check if the file contains the search string (fixed string, not a regex)
  if grep -qF "$search_string" "$file"; then
    echo "String found in file: $file"

    # Perform the string replacement using GNU sed (on macOS use: sed -i '')
    sed -i "s/$search_string/$replacement/g" "$file"

    echo "String replaced in file: $file"
  fi
done

echo "String replacement completed for all .yml and .yaml files below the current directory."

We then had to update all service connection references in Azure DevOps to the new service connection. Another script to the rescue:

$ado_org = "org"
$ado_project = "<Project>"
$old_svc_id = "30ecbc3f-0ff7-4f24-b8d8-1a31dde70340"
$new_svc_id = "f2c7e1a2-8c64-4783-8ff9-0fdb25ae0e82"
$PAT = "<PAT>"

# Basic auth header: empty username, PAT as the password
$encodedPAT = [Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("`:$PAT"))
$headers = @{"Authorization" = "Basic $encodedPAT"}

# List every pipeline in the project
$pipelines = Invoke-RestMethod -Method GET -Uri "https://dev.azure.com/$ado_org/$ado_project/_apis/pipelines?api-version=7.0" -Headers $headers

foreach ($pipeline in $pipelines.value) {
    $id = $pipeline.id
    $name = $pipeline.name
    Write-Host "`n$id | $name"

    # Fetch the full build definition to see which service connection it uses
    $pipelineDetails = Invoke-RestMethod -Method GET -Uri "https://dev.azure.com/$ado_org/$ado_project/_apis/build/definitions/$($id)?api-version=7.0" -Headers $headers
    $connectedServiceId = $pipelineDetails.repository.properties.connectedServiceId
    Write-Host "SC: $connectedServiceId"
    if ($connectedServiceId -eq $old_svc_id) {
        Write-Host "Needs updating to new SC" -ForegroundColor Red
        # -Depth is required: the default depth of 2 would truncate the nested definition
        $body = ($pipelineDetails | ConvertTo-Json -Depth 100) -replace $old_svc_id, $new_svc_id
        Invoke-RestMethod -Method PUT -ContentType "application/json" -Body $body -Uri "https://dev.azure.com/$ado_org/$ado_project/_apis/build/definitions/$($id)?api-version=7.0" -Headers $headers
    } elseif ($connectedServiceId -eq $new_svc_id) {
        Write-Host "Already updated to new SC" -ForegroundColor Blue
    } else {
        Write-Host "Does not use 'org' SC" -ForegroundColor Green
    }

}

For each pipeline in a project that uses the service connection with the old ID, the script updates it to the new ID.
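
The update itself is a plain text substitution on the serialised definition. A small sketch of just that step in shell, which can be handy for previewing the change on a saved definition before sending anything back; the function names is_guid and swap_service_connection are invented for this example and the input is assumed to be the definition JSON on stdin:

```shell
# is_guid VALUE
# Succeeds only if VALUE looks like a GUID (8-4-4-4-12 hex digits).
is_guid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# swap_service_connection OLD_ID NEW_ID
# Reads a pipeline definition (JSON) on stdin and writes it to stdout with
# every occurrence of OLD_ID replaced by NEW_ID.
swap_service_connection() {
  local old_id="$1" new_id="$2"
  # GUIDs contain only hex digits and dashes, so they are safe to place
  # directly in a sed pattern; refuse anything else rather than guess.
  if ! is_guid "$old_id" || ! is_guid "$new_id"; then
    echo "both arguments must be GUIDs" >&2
    return 1
  fi
  sed "s/${old_id}/${new_id}/g"
}
```

Piping a saved definition through swap_service_connection and diffing the result against the original shows exactly which fields would change. Unlike the PowerShell -replace operator, this match is case-sensitive.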

This worked quite well, and was a lot quicker than doing it manually. There were a few issues though:

  • the IDs were different in every project for old and new service connections
  • some projects had up to 10 service connections which were all invalid now; it was tedious getting the IDs of each, running the script and deleting them
  • having to note down IDs, uninstall the old service connections and re-install the Azure Pipelines app in every project took quite a bit of time
  • sometimes I had deleted the old service connection before making a note of its ID, so I had to look up the ID via the API:

curl -s -u "$USERNAME:${AZURE_DEVOPS_TOKEN}" "https://dev.azure.com/$ORG/$PROJECT/_apis/build/definitions/$PIPELINE_ID?api-version=7.0" | \
  jq -r .repository.properties.connectedServiceId

We did finally get around to retiring 3 of our old projects that it didn't make sense to get back up and running.

After all this was done we were fully up and running.

It took about 2 days' worth of time from 3 people to get this all sorted.

Key takeaways

  • Microsoft, this is ridiculous. Find a better way of migrating projects between orgs; we should be able to safely install the app to a new one without breaking the old one, even if you have to go into the database. This cost us far too much time.
  • Don't try to update shared service connections through the API.
  • It's too easy to accidentally set up extra service connections, and it shouldn't be. Why did some projects have 10 connections, generally set up by different people and mostly using the Azure Pipelines app?

What would I do differently next time?

If I had known in advance what would happen, the best options I can think of are:

  • Pressure Microsoft more to fix it in the "backend" in some way for us.
  • Delete the GitHub repository and re-create it; this might have broken the link and allowed it to work.
  • Create new GitHub repositories with different names for the affected team.