To say NASA has a lot of web content would be quite the understatement. As one of the first government websites to come online, NASA’s site has amassed almost 20 years’ worth of content. That also means 20 years of different content management systems (CMSs), image managers, re-platformings, fixes, and conversions. Even a complete, one-to-one migration would have been tricky under those conditions, but luckily, that wasn’t our situation. Because some of the content was outdated or had been rewritten more recently, we could work under the constraint that not all content would be migrated. This migration was also a consolidation, so we began by identifying which content was intended to come over at all.
To help move the content, we wrote command-line scripts using the WP-CLI framework. The custom scripts took as their input JSON data files we generated from the Drupal database. During processing, each piece of content was checked against what had already been brought in, to avoid duplicates. Because the new CMS did not have all the authors from the Drupal site, migrated content was assigned to a dedicated migration author, which also made it easier to identify later on.
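The duplicate check and author assignment can be sketched in a few lines. This is a minimal illustration in Python, not the WP-CLI commands we actually ran. The field names (`drupal_id`, `title`, `author`, `source_author`), the `MIGRATION_AUTHOR` login, and the in-memory set of already-imported IDs are all assumptions made for the example, as is the choice to keep the original byline alongside the placeholder author.

```python
import json

# Hypothetical login for the placeholder account; not a name from the project.
MIGRATION_AUTHOR = "migration-author"


def plan_import(records, already_imported):
    """Return the records that still need importing, skipping duplicates.

    records          -- list of dicts parsed from an exported JSON file
    already_imported -- set of source IDs already present in the new CMS
    """
    seen = set(already_imported)  # copy so the caller's set is not mutated
    planned = []
    for record in records:
        source_id = record["drupal_id"]
        if source_id in seen:
            continue  # already brought over, or repeated within the export
        seen.add(source_id)
        planned.append({
            **record,
            "author": MIGRATION_AUTHOR,              # placeholder author
            "source_author": record.get("author"),   # original byline, kept for review
        })
    return planned


# Stand-in for one exported JSON data file.
export = json.loads("""
[
  {"drupal_id": 100, "title": "Old feature", "author": "jdoe"},
  {"drupal_id": 101, "title": "Mission update", "author": "jdoe"},
  {"drupal_id": 102, "title": "Image of the day", "author": "retired-user"},
  {"drupal_id": 101, "title": "Mission update", "author": "jdoe"}
]
""")

planned = plan_import(export, already_imported={100})
for item in planned:
    print(item["drupal_id"], item["author"], item["source_author"])
```

Here item 100 is skipped because it was imported earlier, and the second copy of item 101 is skipped because it repeats within the export. In a real run the set of imported IDs would come from the target CMS rather than from a literal.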
At this stage, everything brought over went automatically into a dedicated content type used only for migration, so that details like the original publish date could be preserved and embedded or linked media assets could be ingested into the WordPress media library. This also gave authors and editors time to check their content before it was converted into its final post type.
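As a rough sketch of what staging a single item might involve, the Python below builds a staging record that carries the original publish date over unchanged and collects the media URLs that would need ingesting. It is an illustration only: the post type name `migrated_content`, the input keys (`title`, `published`, `body`), the `media_to_ingest` key, and the list of file extensions are assumptions, and the actual download into the media library is left out.

```python
from html.parser import HTMLParser

STAGING_TYPE = "migrated_content"  # hypothetical name for the staging post type


class MediaCollector(HTMLParser):
    """Collect URLs of embedded images and linked files from an HTML body."""

    # Assumed set of extensions that count as media for this example.
    MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".mp3")

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.urls.append(attributes["src"])  # embedded image
        elif tag == "a":
            href = attributes.get("href") or ""
            if href.lower().endswith(self.MEDIA_EXTENSIONS):
                self.urls.append(href)  # link pointing at a media file


def stage(record):
    """Build a staging record from one exported item."""
    collector = MediaCollector()
    collector.feed(record["body"])
    collector.close()
    return {
        "post_type": STAGING_TYPE,
        "post_title": record["title"],
        "post_date": record["published"],  # original date carried over as-is
        "post_content": record["body"],
        "media_to_ingest": collector.urls,
    }


record = {
    "title": "Hubble view",
    "published": "2009-05-11 14:30:00",
    "body": (
        '<p><img src="https://example.com/hubble.jpg"> See the '
        '<a href="https://example.com/report.pdf">report</a> and the '
        '<a href="https://example.com/about">about page</a>.</p>'
    ),
}

staged = stage(record)
print(staged["post_date"])
print(staged["media_to_ingest"])
```

Only the image and the PDF are collected; the ordinary page link is left alone. Keeping the staged item separate from its final post type is what leaves room for the editorial review step.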
With NASA moving to a completely new design system and information architecture, the content needed human eyes to confirm that it was still accurate and reliable, and to apply any new formatting or style options the new site offered. In all, we migrated 70,000 pages, which included 36,430 content items, 30,731 image features, and 845 podcasts, on top of over 100,000 media assets. As a result, amazingly, we never had a situation where users had to enter content into two different CMSs.