WordPress migration to Statamic v3 (part 2)

Arlind Musliu

June 24, 2022 · 5 min read

In our previous article, we outlined the initial steps for migrating content from WordPress to Statamic v3 using the Laravel package Corcel. If you haven't already, be sure to read that post before continuing here, as it sets the foundation for the more complex migration tasks we'll address.

Use Case: Stripping Visual Composer tags

Many WordPress users eventually look for alternatives, whether because of performance issues, the desire for a more streamlined interface, or the need for greater flexibility. For our client, the breaking point was how cumbersome Visual Composer had become. Visual Composer is a powerful page builder, but it stores its layout as shortcodes (tags enclosed in square brackets []) inside the post content, which complicates the migration. We need to strip those tags before saving the content to Statamic. We also need to relink internal links and images, because Visual Composer references them by id. Finally, the client wanted to rename or remove some category names.

We are going to tackle the following issues:

  • Strip all Visual Composer tags from the HTML content

  • Relink internal page links to replace Visual Composer id links

  • Relink internal image URLs to replace Visual Composer id links

  • Import the thumbnail image and use it as a featured image

  • Replace old category names with new categories

Keep in mind

As we explained in our previous article, we fetch all the posts and save each one as a Statamic entry. The code snippets in the following sections all live inside this foreach statement, so we won't repeat it each time. You can also check the final code at the end of this post.

foreach ($posts as $post) {
    // everything else
}

Stripping Visual Composer Tags

We encountered content laden with Visual Composer-specific tags like [vc_row][vc_column][vc_column_text]....

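Using the preg_replace() PHP function, we filter these tags out of the $content variable, which is then ready for Statamic entry creation. Posts that end up without content are skipped to avoid errors. In the final command this step runs after the relinking steps described below, because those steps need the shortcodes to still be in place.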

<?php

// get the content of this post
$content = $post->post_content;

// remove all plugin data: each pass strips shortcodes the previous one missed
$content = preg_replace("~(?:\[/?)[^/\]]+/?\]~s", '', $content);
$content = preg_replace('~(?:\[/?).*?"]~s', '', $content);
$content = preg_replace('(\\[([^[]|)*])', '', $content);
$content = preg_replace('/\[(.*?)\]/', '', $content);

// skip posts that have nothing left once the tags are removed
if (trim($content) === '') {
    continue;
}
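
The four passes above overlap quite a bit. As a rough illustration of the core idea, here is a minimal, self-contained sketch that removes bracketed tags in a single pass. The stripShortcodes() helper and the sample string are hypothetical and not part of the original command, and the pattern assumes that tags never contain nested square brackets.

```php
<?php

/**
 * Hypothetical helper: removes every [tag] and [/tag] from the given HTML.
 * Assumes shortcodes never contain nested square brackets.
 */
function stripShortcodes(string $html): string
{
    // \[ opens the tag, [^\[\]]* consumes everything up to the closing \]
    return trim(preg_replace('/\[[^\[\]]*\]/', '', $html));
}

$sample = '[vc_row][vc_column width="1/2"][vc_column_text]<p>Hello world</p>[/vc_column_text][/vc_column][/vc_row]';

echo stripShortcodes($sample), PHP_EOL;
```

Keep in mind that a pattern like this also removes legitimate bracketed text such as footnote markers, so it is worth checking a few migrated posts by hand.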

Relinking Images

Since Visual Composer uses shortcodes to handle images, simply stripping the tags would lose the images. Visual Composer looks images up by their ids and renders them on the page. These ids mean nothing to Statamic, so the images have to be relinked while the shortcodes are still in the content. Our solution is the following:

  1. Find every instance of image_url="12345", where 12345 stands for the image id.

  2. Loop through all the matches.

  3. Get the id by splitting the match with explode.

  4. Query the WordPress database for the attachment with that id.

  5. Read the image URL from the attachment's guid.

  6. Convert the absolute URL to a relative path.

  7. Replace the Visual Composer image tag with a standard <img> tag.

<?php

// match all instances where the image_url is found and get the id
preg_match_all('/image_url="\d{4,5}/', $content, $matches);

foreach ($matches[0] as $match) {
    // $match looks like image_url="12345, so the id is the part after the quote
    $id = explode('"', $match);

    // find the image with this id and get the url
    $attachment = Post::where('id', $id[1])->first();

    // skip ids that no longer exist in the WordPress database
    if (! $attachment) {
        continue;
    }

    $url = Str::replace('https://domain.com/wp-content/', 'assets/wp/', $attachment->guid);

    // replace the plugin text with img tag and url; the closing quote after
    // the id keeps 1234 from also matching 12345
    $content = preg_replace('~\[image_with_animation image_url="' . preg_quote($id[1], '~') . '"[^\]]*\]~', '<img class="w-full h-auto" alt="image" src="' . $url . '">', $content);
}
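
The same steps can also be done in a single pass with preg_replace_callback(). The sketch below is a self-contained illustration, not the code we ran: relinkImages() is a hypothetical helper, and the $guids array stands in for the Corcel database lookup so the example runs without WordPress.

```php
<?php

/**
 * Hypothetical helper: swaps [image_with_animation image_url="ID" ...]
 * shortcodes for <img> tags.
 *
 * @param array<int, string> $guids attachment id => absolute image URL
 */
function relinkImages(string $content, array $guids): string
{
    return preg_replace_callback(
        '/\[image_with_animation image_url="(\d+)"[^\]]*\]/',
        function (array $m) use ($guids): string {
            $id = (int) $m[1];

            // leave the shortcode untouched when the attachment is unknown
            if (!isset($guids[$id])) {
                return $m[0];
            }

            // absolute WordPress URL => relative asset path
            $src = str_replace('https://domain.com/wp-content/', 'assets/wp/', $guids[$id]);

            return '<img class="w-full h-auto" alt="image" src="' . $src . '">';
        },
        $content
    );
}

$guids = [12345 => 'https://domain.com/wp-content/uploads/2022/06/photo.jpg'];
$html = '<p>Intro</p>[image_with_animation image_url="12345" alignment="center"]';

echo relinkImages($html, $guids), PHP_EOL;
```

Capturing the id with a group avoids the explode step, and the callback runs once per shortcode, so each image is replaced with its own URL.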

Relinking Internal Links

Visual Composer also uses IDs for internal links, which we needed to convert to slugs for Statamic. This process is similar to image relinking:

  • Find every instance of /?p=12345 (12345 is an example id)

  • Loop through all the matches

  • Get the id by splitting the match with explode

  • Search the WordPress database for the post with that id

  • Use the post title to create a slug

  • Replace the id in the link with the slug

<?php

// match all instances where the url has the post id and get the id
preg_match_all('/"\/\?p=\d{4,5}/', $content, $matches);

foreach ($matches[0] as $match) {
    // $match looks like "/?p=12345, so the id is the part after the equals sign
    $id = explode('=', $match);

    // find the post with this id and get the url
    $oldPost = Post::where('id', $id[1])->first();

    // skip links that point to posts that no longer exist
    if (! $oldPost) {
        continue;
    }

    $url = Str::slug($oldPost->post_title);

    // replace the plugin url with slug url; (?!\d) keeps 1234 from also matching 12345
    $content = preg_replace('~\?p=' . preg_quote($id[1], '~') . '(?!\d)~', $url, $content);
}
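
To make the id-to-slug rewrite easier to follow, here is a minimal, self-contained sketch. Both helpers are hypothetical: slugify() is only a rough stand-in for Laravel's Str::slug() (it handles plain ASCII titles), and the $titles array replaces the database lookup.

```php
<?php

/** Hypothetical, simplified stand-in for Str::slug(): ASCII titles only. */
function slugify(string $title): string
{
    $slug = strtolower(trim($title));
    // collapse every run of non-alphanumeric characters into one dash
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    return trim($slug, '-');
}

/**
 * Hypothetical helper: rewrites "/?p=ID" links to "/slug" links.
 *
 * @param array<int, string> $titles post id => post title
 */
function relinkPosts(string $content, array $titles): string
{
    return preg_replace_callback(
        '~"/\?p=(\d+)"~',
        function (array $m) use ($titles): string {
            $id = (int) $m[1];

            // keep the original link when the post is unknown
            if (!isset($titles[$id])) {
                return $m[0];
            }

            return '"/' . slugify($titles[$id]) . '"';
        },
        $content
    );
}

$titles = [12345 => 'Hello World: Part 1'];
$html = '<a href="/?p=12345">Read more</a>';

echo relinkPosts($html, $titles), PHP_EOL;
```

Whatever slug function is used here must be the same one used for the entry slug, otherwise the rewritten links will not resolve.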

Inserting Featured Images

The client's site redesign required a featured image for each post. We imported thumbnail images from the old posts and saved them in a new wp folder. If a post lacked a thumbnail, we defaulted to no_image.jpg.

Calling ->thumbnail->size('invalid_size') is a Corcel idiom: when the requested size does not exist, size() falls back to the URL of the original image, which is what we want here.

<?php

if ($post->thumbnail) {
    $featured_image = Str::replace('https://domain.com/wp-content/', 'wp/', $post->thumbnail->size('invalid_size'));
} else {
    $featured_image = 'wp/no_image.jpg';
}
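
The fallback logic can be isolated into a small function, which also makes it easy to test. This is only a sketch: featuredImagePath() is a hypothetical helper, and it assumes the thumbnail URL has already been read from Corcel as a string (or null when the post has no thumbnail).

```php
<?php

/**
 * Hypothetical helper: returns the relative path of the featured image,
 * or the default image when the post has no thumbnail.
 */
function featuredImagePath(?string $thumbnailUrl): string
{
    if ($thumbnailUrl === null || $thumbnailUrl === '') {
        return 'wp/no_image.jpg';
    }

    // absolute WordPress URL => path inside the new wp folder
    return str_replace('https://domain.com/wp-content/', 'wp/', $thumbnailUrl);
}

echo featuredImagePath('https://domain.com/wp-content/uploads/2022/06/cover.jpg'), PHP_EOL;
echo featuredImagePath(null), PHP_EOL;
```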

Converting HTML to ProseMirror

Statamic's Bard fieldtype stores content as a ProseMirror document, a structured tree of nodes rather than an HTML string. To ensure compatibility, we converted the HTML content to ProseMirror. For those unfamiliar with ProseMirror or TipTap, Statamic provides a helpful guide.

<?php

$content = (new \HtmlToProseMirror\Renderer)->render($content);

$content = $content['content'];
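
For readers who have not seen a ProseMirror document before, the sketch below shows roughly what the structure looks like for a one-paragraph post. The array is written by hand as an approximation, so the exact output of the renderer may differ in detail.

```php
<?php

// Hand-written approximation of a ProseMirror document for:
// <p>Hello <strong>world</strong></p>
$document = [
    'type' => 'doc',
    'content' => [
        [
            'type' => 'paragraph',
            'content' => [
                ['type' => 'text', 'text' => 'Hello '],
                ['type' => 'text', 'text' => 'world', 'marks' => [['type' => 'bold']]],
            ],
        ],
    ],
];

// Only the list of nodes is kept for the entry, not the 'doc' wrapper.
$nodes = $document['content'];

echo json_encode($nodes, JSON_PRETTY_PRINT), PHP_EOL;
```

This is why the snippet above keeps only $content['content']: the entry field holds the list of nodes, not the surrounding document.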

Rename/remove category names

Our client wanted to merge several old categories into new ones and rename some others. We therefore map each old category name to its new name through an array before saving the post as a Statamic entry.

<?php

// inside the handle() function
$category = $this->getCategories($post->main_category);

// extracted into a separate method of the command class
public function getCategories($category)
{
    // old WordPress category name => new Statamic category
    $categories = [
        'Uncategorized' => 'default',
        'Other' => 'default',
        'Spin' => 'disinformation',
        'Corona' => 'disinformation',
        'Coronavirus' => 'disinformation',
        'COVID-19' => 'disinformation',
        'DISINFORMATION' => 'disinformation',
    ];

    // assumption: names missing from the list fall back to 'default'
    // instead of raising an undefined key error
    return $categories[$category] ?? 'default';
}

Each post had more than one category, but the client wanted to reduce that to a single category. Corcel already provides $post->main_category. Despite its name, it returns the first term of the first taxonomy it finds, which is not necessarily a category, as you can see:

<?php

// corcel/src/model/Post.php

/**
 * Gets the first term of the first taxonomy found.
 *
 * @return string
 */
public function getMainCategoryAttribute()
{
    $mainCategory = 'Uncategorized';

    if (!empty($this->terms)) {
        $taxonomies = array_values($this->terms);

        if (!empty($taxonomies[0])) {
            $terms = array_values($taxonomies[0]);
            $mainCategory = $terms[0];
        }
    }

    return $mainCategory;
}

We modified the code to specifically search for the first term of the category taxonomy.

<?php

/**
 * Gets the first term of the category taxonomy.
 *
 * @return string
 */
public function getMainCategoryAttribute()
{
    $mainCategory = 'Uncategorized';

    if (!empty($this->terms)) {
        if (array_key_exists('category', $this->terms)) {
            $category = array_values($this->terms['category']);
            $mainCategory = $category[0];
        }
    }

    return $mainCategory;
}

$this->terms is another Corcel attribute that returns all the taxonomies of a given post, grouped by taxonomy name.
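
The difference between the two versions is easiest to see on sample data. The sketch below is self-contained and does not use Corcel: the $terms array is an assumed shape (taxonomy name => [slug => name]) and both functions are hypothetical re-implementations of the accessor logic.

```php
<?php

// Assumed shape of $post->terms: taxonomy name => [term slug => term name].
$terms = [
    'tag' => ['covid' => 'COVID-19'],
    'category' => ['spin' => 'Spin', 'other' => 'Other'],
];

/** Original behaviour: first term of whichever taxonomy comes first. */
function firstTermOfFirstTaxonomy(array $terms): string
{
    $taxonomies = array_values($terms);

    if (empty($taxonomies[0])) {
        return 'Uncategorized';
    }

    return array_values($taxonomies[0])[0];
}

/** Modified behaviour: first term of the category taxonomy only. */
function firstCategory(array $terms): string
{
    if (empty($terms['category'])) {
        return 'Uncategorized';
    }

    return array_values($terms['category'])[0];
}

echo firstTermOfFirstTaxonomy($terms), PHP_EOL; // COVID-19 (a tag, not a category)
echo firstCategory($terms), PHP_EOL;            // Spin
```

When a tag happens to come first, the original accessor returns a tag name, which would then be looked up in the category map by mistake.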

Using a progress bar

To enhance the user experience during the migration process, we implemented a progress bar. As each post was processed, the bar advanced, providing visual feedback on the migration's progress.

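<?php

// one step per post
$bar = $this->output->createProgressBar(count($posts));

$bar->start();

foreach ($posts as $post) {
    // process the post, then move the bar one step forward
    $bar->advance();
}

$bar->finish();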

The final solution to our problem

<?php

namespace App\Console\Commands;

use Corcel\Model\Post;
use Illuminate\Console\Command;
use Illuminate\Support\Str;
use Statamic\Facades\Entry;

class ImportWordPress extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'import:wp';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Import posts from WordPress';

    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }

    public function handle()
    {
        $lang = 'default';

        // $ids must already hold the WordPress post ids to import
        $posts = Post::whereIn('id', $ids)
              ->type('post')
              ->orderBy('post_date', 'desc')
              ->published()
              ->get();

        $bar = $this->output->createProgressBar(count($posts));

        $bar->start();

        foreach ($posts as $post) {

            // get the content of this post
            $content = $post->post_content;

            // match all instances where the image_url is found and get the id
            preg_match_all('/image_url="\d{4,5}/', $content, $matches);
            
            foreach ($matches[0] as $match) {
                $id = explode('"', $match);

                // find the image with this id and get the url
                $attachment = Post::where('id', $id[1])->first();

                // skip ids that no longer exist in the WordPress database
                if (! $attachment) {
                    continue;
                }

                $url = Str::replace('https://domain.com/wp-content/', 'assets/wp/', $attachment->guid);

                // replace the plugin text with img tag and url; the closing quote
                // after the id keeps 1234 from also matching 12345
                $content = preg_replace('~\[image_with_animation image_url="' . preg_quote($id[1], '~') . '"[^\]]*\]~', '<img class="w-full h-auto" alt="image" src="' . $url . '">', $content);
            }

            // match all instances where the url has the post id and get the id
            preg_match_all('/"\/\?p=\d{4,5}/', $content, $matches);

            foreach ($matches[0] as $match) {
                $id = explode('=', $match);

                // find the post with this id and get the url
                $oldPost = Post::where('id', $id[1])->first();

                // skip links that point to posts that no longer exist
                if (! $oldPost) {
                    continue;
                }

                $url = Str::slug($oldPost->post_title);

                // replace the plugin url with slug url; (?!\d) keeps 1234 from also matching 12345
                $content = preg_replace('~\?p=' . preg_quote($id[1], '~') . '(?!\d)~', $url, $content);
            }

            // remove all plugin data
            $content = preg_replace("~(?:\[/?)[^/\]]+/?\]~s", '', $content);
            $content = preg_replace('~(?:\[/?).*?"]~s', '', $content);
            $content = preg_replace('(\\[([^[]|)*])', '', $content);
            $content = preg_replace('/\[(.*?)\]/', '', $content);

            if (trim($content) === '') {
                // nothing left to import, but keep the progress bar in sync
                $bar->advance();
                continue;
            }

            $content = (new \HtmlToProseMirror\Renderer)->render($content);
            $content = $content['content'];

            if ($post->thumbnail) {
                $featured_image = Str::replace('https://domain.com/wp-content/', 'wp/', $post->thumbnail->size('invalid_size'));
            } else {
                $featured_image = 'wp/no_image.jpg';
            }

            $category = $this->getCategories($post->main_category);

            $entry = Entry::make()
                ->collection('posts')
                ->locale($lang)
                // use the same slug as the relinked internal links
                ->slug(Str::slug($post->post_title))
                ->date($post->post_date)
                ->data([
                    'title' => $post->title,
                    'image' => $featured_image,
                    'categories' => $category,
                    'content' => $content,
                ]);

            $entry->save();

            $bar->advance();
        }

        $bar->finish();

        // print the message to the console and exit with a success code
        $this->info('WordPress posts successfully migrated to Statamic');

        return 0;
    }

    public function getCategories($category)
    {

        $categories = [
            'Uncategorized' => 'default',
            'Other' => 'default',
            'Spin' => 'disinformation',
            'Corona' => 'disinformation',
            'Coronavirus' => 'disinformation',
            'COVID-19' => 'disinformation',
            'DISINFORMATION' => 'disinformation',
        ];

        // assumption: names missing from the list fall back to 'default'
        return $categories[$category] ?? 'default';
    }
}

Quick note: If you have a large number of posts to migrate, the command may stop before it finishes. You should limit the number of posts you retrieve per run. With Laravel you can do this by modifying the query like this:

<?php

// first run: get the first 500 rows
$posts = Post::whereIn('id', $ids)
      ->type('post')
      ->orderBy('post_date', 'desc')
      ->published()
      ->skip(0)
      ->take(500)
      ->get();

// next run: change skip(0) to skip(500) to get rows 501 to 1000,
// then skip(1000) for rows 1001 to 1500, and so on
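
If you prefer not to edit the numbers by hand, the skip and take values for every run can be computed up front. The sketch below is plain PHP and only illustrates the arithmetic: batchOffsets() is a hypothetical helper and the total of 1200 posts is made up.

```php
<?php

/**
 * Hypothetical helper: lists the skip/take pair for each batch.
 *
 * @return array<int, array{skip: int, take: int}>
 */
function batchOffsets(int $total, int $size): array
{
    $batches = [];

    for ($skip = 0; $skip < $total; $skip += $size) {
        // the last batch may be smaller than $size
        $batches[] = ['skip' => $skip, 'take' => min($size, $total - $skip)];
    }

    return $batches;
}

foreach (batchOffsets(1200, 500) as $batch) {
    echo 'skip(' . $batch['skip'] . ')->take(' . $batch['take'] . ')', PHP_EOL;
}
```

Each pair can then be passed to skip() and take() in the query above, one run at a time.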


Arlind Musliu

Cofounder and CFO of Lucky Media
