Fedora 5→6 Migration w/ Islandora

2022-04-06

Back in January we discovered that, although our original files stored in Fedora were being served through our Islandora site, metadata updates were failing. (See the thread on the Fedora Slack #tech channel.) After a bit of debugging we determined that the best solution would be to upgrade our Fedora 5 instance to Fedora 6.

Migrating from Fedora 5 to Fedora 6 is theoretically simple, but because it involves a complete reworking of the persistent storage, it can be resource intensive for larger repositories. We had more than 12 terabytes of files to migrate. The migration consists of three stages:

  1. export the Fedora 5 repository,
  2. transform the export into the Oxford Common File Layout (OCFL),
  3. install the Fedora 6 code and configure it to use the transformed repository content.

Drupal's Migrate API Won't Update Path Aliases

2022-03-14

In our Special Collections Web Portal we use ARKs for persistent identifiers as well as path aliases. This means that a special collections finding aid with the ARK ark:/62930/f10c99 will be available at https://special.library.unlv.edu/ark%3A/62930/f10c99.
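As a rough illustration of the mapping described above, the sketch below percent-encodes an ARK into a path alias and a full URL. The function names are hypothetical and this is not the Portal's actual code; it only assumes that the colon is percent-encoded while the slashes are kept, as in the example URL.

```python
from urllib.parse import quote

# Base URL taken from the example above.
BASE_URL = "https://special.library.unlv.edu"


def ark_to_alias(ark: str) -> str:
    """Return a path alias for an ARK (hypothetical helper).

    The colon is percent-encoded, while slashes are left as-is.
    """
    return "/" + quote(ark, safe="/")


def ark_to_url(ark: str) -> str:
    """Return the full URL at which the ARK would be served."""
    return BASE_URL + ark_to_alias(ark)


print(ark_to_url("ark:/62930/f10c99"))
# prints https://special.library.unlv.edu/ark%3A/62930/f10c99
```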

At the time of writing, our ARKs are stored in the ArchivesSpace EAD Location field. When we synchronize ArchivesSpace with the Portal using the Drupal Migrate API, the migration takes the ARK from that field and populates the resource record’s “path” property, creating the path alias.

Recently we discovered that we had synced some resource records that didn’t have their ARKs yet, which meant no path aliases were created. “No problem,” I thought, “I’ll simply add the ARKs to ArchivesSpace and run the sync again, which will populate the path alias.” This, unfortunately, didn’t work. I was rather confused, because all the other node properties were being updated. Why not the alias?


Content Access Control Solutions Investigation

2021-02-19

My recent investigation into abysmal digital asset management site performance under load identified the Permissions by Term module (permissions_by_term) as the primary culprit. Some preliminary exploration found at least one optimization that improved performance, but not sufficiently. Others have also noted performance issues and made some suggestions, although the suggestions appear abandoned. I could spend time developing more optimizations but, given that a patch I submitted in June 2020 was never reviewed, I have little hope that my patches would be merged.

This report documents my investigations into alternate solutions.


Islandora Performance Analysis

2021-02-18

Currently, our new Islandora-based digital asset management system (yet to be released) is returning pages from a cold cache (meaning Drupal hasn’t pre-computed parts of the page) at an unacceptably slow rate. Some slowness is acceptable from a cold cache, as most pages will be cached and return quickly (within a fraction of a second). However, our site contains a long tail of content and we can’t anticipate what users will search for. These long-tail pages do not have the benefit of page caching and so need a passable loading speed without it.


Permissions By Entity & File Access Fix

2020-06-23

Some of our digital repository content is not immediately accessible to the public, and some never will be. The Drupal 8 module permissions_by_term and its sub-module permissions_by_entity allow us to create a ‘Staff-only’ term that, when applied to nodes and media, restricts access to them (and, by extension, to the media's files) to Special Collections staff members.

However, access to files will only be checked if they are in a managed filesystem such as the Drupal private filesystem or one provided by a Flysystem adapter (like the Fedora Adapter provided by Islandora 8). When Islandora 8 creates derivative media, it places them in the public filesystem by default. This is generally a good idea because it is more performant than other filesystems, but then derivatives of restricted items become publicly accessible, which is not what we want. A repository manager can change the derivative actions to save them into either the private or Fedora filesystems, but this comes at a performance cost when accessing files that are publicly accessible.

Enter file_access_fix. (Credit to Jonathan Hunt, who introduced me to this module!) This handy module will check an entity (node, media, etc.) on update to see if it has any referenced files. If so, it will check whether the entity is accessible to anonymous users and will move the file to the appropriate filesystem, either public or private, based on its availability. So, if I create a derivative that is saved in the private filesystem by default, this module will check to see if the media is publicly accessible. If so, it will move it to the more performant public filesystem for me! If I ever decide that the media actually should be restricted, the module will move the file from the public filesystem back to the private one.
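The decision described above can be sketched roughly as follows. This is not the module's code (file_access_fix is a Drupal module written in PHP); it is a minimal Python illustration with a hypothetical function name, assuming Drupal-style `public://` and `private://` URIs and assuming files in any other filesystem are left where they are.

```python
def target_uri(uri: str, anonymous_can_view: bool) -> str:
    """Return the URI a file should live at, given its entity's visibility.

    Hypothetical helper: publicly viewable files go to public://,
    restricted files go to private://.
    """
    scheme, sep, path = uri.partition("://")
    if not sep:
        raise ValueError(f"not a stream wrapper URI: {uri!r}")
    if scheme not in ("public", "private"):
        # Assumption of this sketch: other filesystems are not touched.
        return uri
    wanted = "public" if anonymous_can_view else "private"
    return f"{wanted}://{path}"


# A derivative saved privately by default, attached to public media:
print(target_uri("private://derivatives/thumb.jpg", anonymous_can_view=True))
# The same file after the media is marked 'Staff-only':
print(target_uri("public://derivatives/thumb.jpg", anonymous_can_view=False))
```

When the returned URI equals the current one, no move is needed, which is why re-saving an entity whose visibility has not changed would be harmless under this sketch.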
