Drupal's Migrate API Won't Update Path Aliases

2022-03-14

In our Special Collections Web Portal we use ARKs for persistent identifiers as well as path aliases. This means that a special collections finding aid with the ARK ark:/62930/f10c99 will be available at https://special.library.unlv.edu/ark%3A/62930/f10c99.
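
If the %3A in that URL looks odd, it is just the colon from the ARK after percent-encoding. Here is a minimal sketch in plain PHP that encodes each path segment while leaving the slashes alone. The function name encode_alias_path is made up for this example, and it only approximates what happens when an alias is rendered as a URL; it is not Drupal's own code.

```php
<?php
// Hypothetical helper (not part of Drupal): percent-encode every path
// segment but keep the "/" separators intact.
function encode_alias_path(string $alias): string {
  $segments = explode('/', $alias);
  return implode('/', array_map('rawurlencode', $segments));
}

// The alias as stored, built from the ARK ark:/62930/f10c99.
echo 'https://special.library.unlv.edu' . encode_alias_path('/ark:/62930/f10c99') . "\n";
// prints https://special.library.unlv.edu/ark%3A/62930/f10c99
```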

At the time of writing, our ARKs are stored in the ArchivesSpace EAD Location field. When we synchronize ArchivesSpace with the Portal using the Drupal Migrate API, the migration takes the ARK from that field and populates the resource record’s “path” property, which creates the path alias.

Recently we discovered that we had synced some resource records that didn’t have their ARK yet, which meant no path alias was created. “No problem,” I thought, “I’ll simply add the ARKs to ArchivesSpace and run the sync again, which will populate the path alias.” This, unfortunately, didn’t work. I was rather confused, because all the other node properties were being updated. Why not the alias?

I went looking for answers and the only hint I found was in a small note appended to a Stack Overflow answer:

Note: Path aliases get imported the first time you import the data. However, if you run drush migrate-import MIGRATION-ID --update, the path aliases do not get updated.

I haven’t found out why this is the case, but their statement matched my experience, so I switched my attention to how to fix the broken records.

Fortunately, we store the ARK in another field on these records, so the solution was simply to check each resource record to see whether it has an ARK but no alias, and to create the alias if so:

<?php
// Load every archival_resource node.
$resource_nodes = \Drupal::entityTypeManager()->getStorage('node')->loadByProperties(['type' => 'archival_resource']);
// Fetch the alias manager once instead of on every loop iteration.
$alias_manager = \Drupal::service('path_alias.manager');
foreach ($resource_nodes as $node) {
  $path = '/node/' . $node->id();
  $alias = $alias_manager->getAliasByPath($path, 'en');
  // Oddly enough, getAliasByPath() returns the original path instead of an empty value when no alias is found.
  // So instead of checking whether the alias is empty, we check whether it is the same as the path we looked up. 🤷‍♂️
  if (!empty($node->field_finding_aid_link->uri) && $path === $alias) {
    $alias = preg_replace('#https?://n2t\.net#', '', $node->field_finding_aid_link->uri);
    print("Creating alias $alias for path $path\n");
    $path_alias = \Drupal\path_alias\Entity\PathAlias::create([
      'path' => $path,
      'alias' => $alias,
    ]);
    $path_alias->save();
  }
}

This worked well enough, and the affected records now have their aliases. Just be careful when you create your records for the first time!

Side note: Someone asked why I didn’t use the Pathauto bulk regeneration feature. That is because we don’t use Pathauto! 😉 Why? We store our ARKs using the full resolver URL, e.g. https://n2t.net/ark:/62930/f10c99, rather than just the ARK portion. That requires us to take a substring of the field, something that Pathauto doesn’t support (to my knowledge), as it relies on tokens.
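
For what it's worth, the sub-stringing itself is small. Here is a rough sketch in plain PHP of turning the stored resolver URL into an alias. The function name ark_alias_from_resolver_url is hypothetical, and the sketch assumes the stored value always begins with the n2t.net resolver shown above; anything else is reported as null rather than guessed at.

```php
<?php
// Hypothetical helper: strip the n2t.net resolver prefix from a stored URL,
// leaving only the ARK portion to use as a path alias.
// Returns null when the URL does not look like an n2t.net ARK URL.
function ark_alias_from_resolver_url(string $url): ?string {
  // Anchor at the start and require "/ark:/" right after the host, so
  // unrelated URLs are not silently turned into aliases.
  $alias = preg_replace('#^https?://n2t\.net(?=/ark:/)#', '', $url, 1, $count);
  return $count === 1 ? $alias : null;
}

echo ark_alias_from_resolver_url('https://n2t.net/ark:/62930/f10c99') . "\n";
// prints /ark:/62930/f10c99
```

Returning null for anything that doesn't match keeps a stray or mistyped URL from becoming a bogus alias.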