WordPress migration to Statamic v3 (part 2)

Published: June 24, 2022

Updated: December 2, 2022

statamic
wordpress
migration
cms
laravel
corcel

WordPress to Statamic

In a previous post we explained how to install Corcel for migrating your WordPress data to Statamic v3. Read that post first before continuing.

Use Case: Stripping Visual Composer tags

Our client had a WordPress website built with Visual Composer. After many struggles, they decided it was time to move on from WordPress and look for alternatives. We proposed migrating their website to Statamic CMS and the client agreed. However, Visual Composer adds its own tags (shortcodes wrapped in square brackets, [] ) to the HTML content, and we need to strip those out before saving the content to Statamic. We also need to relink internal links, because Visual Composer references them by id. The client also wanted to rename or remove some category names.

We are going to tackle the following issues:

  • Strip all Visual Composer tags from the HTML content
  • Relink internal page links to replace Visual Composer id links
  • Relink internal image URLs to replace Visual Composer id links
  • Import the thumbnail image and use it as a featured image
  • Replace old category names with new categories

Keep in mind

In our previous post we fetched all the posts and saved each one of them as an entry. The code snippets in the following sections all live inside the foreach statement, so we won't mention this again. You can also check the final code at the end of this post.

foreach ($posts as $post) {
    // everything else
}

Strip Visual Composer tags

Upon retrieving the data we noticed that the content had some tags such as:

[vc_row][vc_column][vc_column_text]...

To remove them we used the PHP function preg_replace(). The post content is filtered step by step and saved to the $content variable, which we use when creating our Statamic entries. If a post has no content left, we skip it.

<?php

// get the content of this post
$content = $post->post_content;
        
$content = preg_replace("~(?:\[/?)[^/\]]+/?\]~s", '', $content);
$content = preg_replace('~(?:\[/?).*?"]~s', '', $content);
$content = preg_replace('(\\[([^[]|)*])', '', $content );
$content = preg_replace('/\[(.*?)\]/', '', $content );

if ($content === ""){
  continue;
}
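
As a standalone illustration of the stripping step, here is a minimal sketch that can be run outside Laravel. The stripShortcodes() helper and the sample string are hypothetical, and the single pattern is a simplification of the four patterns above. It removes anything between square brackets, so it is only suitable when the content has no legitimate bracketed text.

```php
<?php

/**
 * Hypothetical helper: removes every [shortcode ...] and [/shortcode] tag
 * but keeps the text between them.
 */
function stripShortcodes(string $content): string
{
    // non-greedy match from "[" to the next "]"; the "s" flag lets a tag span several lines
    $stripped = preg_replace('/\[.*?\]/s', '', $content);

    // preg_replace() returns null on failure, so fall back to the original text
    return trim($stripped ?? $content);
}

$sample = '[vc_row][vc_column][vc_column_text]<p>Hello</p>[/vc_column_text][/vc_column][/vc_row]';

echo stripShortcodes($sample), PHP_EOL; // <p>Hello</p>
```

Trimming the result means a post that contained nothing but Visual Composer tags and whitespace ends up as an empty string, which makes the empty-content check straightforward.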

Relink the images

Images are wrapped inside Visual Composer image tags. When we strip Visual Composer tags, we also remove all images. That's why we need to do the tag stripping in the end.

Visual Composer retrieves the images through their ids and displays them on the page. These ids have no value for Statamic and that's why the images are lost. Our solution is the following:

  • Find all instances of image_url="12345" (12345 is an example id)
  • Loop through all found instances of image_url
  • Get the id by splitting the result with explode
  • Search the WordPress database for that id
  • Get the image name
  • Change from absolute to relative URL
  • Replace Visual Composer image tag with <img>

<?php

// match all instances where the image_url is found and get the id
preg_match_all('/image_url="\d{4,5}/', $content, $matches);

foreach ($matches[0] as $match) {
    $id = explode('"', $match);

    // find the image with this id and get the url
    $attachment = Post::where('id', $id[1])->first();

    // skip ids without a matching attachment, otherwise ->guid fails on null
    if (! $attachment) {
        continue;
    }

    $url = Str::replace('https://domain.com/wp-content/', 'assets/wp/', $attachment->guid);

    // replace the plugin text with img tag and url
    $content = preg_replace('(\[image_with_animation image_url="' . preg_quote($id[1], '/') . '[^\]]+])', '<img class="w-full h-auto" alt="image" src="' . $url . '">', $content);
}
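
The same steps can also be done in one pass with preg_replace_callback(). The following is a minimal sketch, not the code we ran: relinkImages() is a hypothetical helper, the $urlsById array stands in for the WordPress database lookup, the animation attribute in the sample is made up, and we assume the id is followed directly by a closing quote.

```php
<?php

/**
 * Hypothetical helper: swaps each [image_with_animation ...] shortcode for an <img> tag.
 * $urlsById stands in for the WordPress lookup (attachment id => absolute URL).
 */
function relinkImages(string $content, array $urlsById): string
{
    // capture the whole id, assuming it is followed directly by a closing quote
    $pattern = '/\[image_with_animation image_url="(\d+)"[^\]]*\]/';

    $result = preg_replace_callback($pattern, function (array $m) use ($urlsById) {
        $id = (int) $m[1];

        // leave the shortcode untouched when the attachment is unknown
        if (!isset($urlsById[$id])) {
            return $m[0];
        }

        // absolute WordPress URL -> relative asset path
        $src = str_replace('https://domain.com/wp-content/', 'assets/wp/', $urlsById[$id]);

        return '<img class="w-full h-auto" alt="image" src="' . $src . '">';
    }, $content);

    // preg_replace_callback() returns null on failure
    return $result ?? $content;
}

$content = '<p>Intro</p>[image_with_animation image_url="12345" animation="Fade In"]';
$urls = [12345 => 'https://domain.com/wp-content/uploads/2022/06/photo.jpg'];

echo relinkImages($content, $urls), PHP_EOL;
// <p>Intro</p><img class="w-full h-auto" alt="image" src="assets/wp/uploads/2022/06/photo.jpg">
```

Capturing the id inside the pattern removes the need for explode(), and matching the closing quote keeps an id such as 1234 from also matching 12345.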

Relink internal links

Visual Composer uses ids for creating relative internal links to other pages within our website. We need the following steps, similar to the section above when we replaced the image link:

  • Check all instances of p=12345 (12345 is an example id)
  • Loop through all found instances of p
  • Get the id by splitting the result with explode
  • Search the WordPress database for that id
  • Use the post title to create a slug
  • Change the URL from id to slug

<?php

// match all instances where the url has the post id and get the id
preg_match_all('/"\/\?p=\d{4,5}/', $content, $matches);
foreach ($matches[0] as $match) {
    $id = explode('=', $match);

    // find the post with this id and get the url
    $oldPost = Post::where('id', $id[1])->first();

    // skip links to posts that no longer exist, otherwise ->post_title fails on null
    if (! $oldPost) {
        continue;
    }

    $url = Str::slug($oldPost->post_title);

    // replace the plugin url with slug url
    $content = preg_replace('(\?p=' . preg_quote($id[1], '/') . ')', $url, $content);
}
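
To try the link rewriting without a database, here is a minimal sketch in plain PHP. Both helpers are hypothetical: slugify() is a rough stand-in for Laravel's Str::slug() (it only handles ASCII titles), and the $titlesById array stands in for the WordPress lookup.

```php
<?php

/**
 * Hypothetical stand-in for Str::slug(): lowercase, with dashes replacing
 * every run of characters that is not a letter or a digit.
 */
function slugify(string $title): string
{
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($title));

    return trim($slug, '-');
}

/**
 * Hypothetical helper: rewrites "?p=<id>" in links to the slug of that post.
 * $titlesById stands in for the WordPress lookup (post id => post title).
 */
function relinkPosts(string $content, array $titlesById): string
{
    // (\d+) captures the whole id, whatever its length
    $result = preg_replace_callback('/\?p=(\d+)/', function (array $m) use ($titlesById) {
        $id = (int) $m[1];

        // keep the original link when the post is unknown
        return isset($titlesById[$id]) ? slugify($titlesById[$id]) : $m[0];
    }, $content);

    return $result ?? $content;
}

$content = '<a href="/?p=12345">Read more</a>';

echo relinkPosts($content, [12345 => 'Hello World, Again!']), PHP_EOL;
// <a href="/hello-world-again">Read more</a>
```

The generated slug only works as a link target if the Statamic entry is saved with the same slug, so the entry slug should be built from the title in the same way.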

Insert featured image

The redesign of the website has a featured image when displaying all posts. That's why we had to import the thumbnail image of older posts and save them to a new folder named wp. If there is no thumbnail image, then we use no_image.jpg.

We call ->thumbnail->size('invalid_size') on purpose: when the requested size doesn't exist, Corcel returns the URL of the original image.

<?php

if ($post->thumbnail) {
    $featured_image = Str::replace('https://domain.com/wp-content/', 'wp/', $post->thumbnail->size('invalid_size'));
} else {
    $featured_image = 'wp/no_image.jpg';
}
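
The path rewriting itself can be checked in isolation. This is a small sketch with a hypothetical featuredImagePath() helper that takes the thumbnail URL as a plain string (or null when the post has no thumbnail) in place of the Corcel object:

```php
<?php

/**
 * Hypothetical helper: turns an absolute thumbnail URL into a path inside the "wp" folder,
 * or returns the placeholder image when the post has no thumbnail.
 */
function featuredImagePath(?string $thumbnailUrl): string
{
    if ($thumbnailUrl === null || $thumbnailUrl === '') {
        return 'wp/no_image.jpg';
    }

    return str_replace('https://domain.com/wp-content/', 'wp/', $thumbnailUrl);
}

echo featuredImagePath('https://domain.com/wp-content/uploads/2022/06/cover.jpg'), PHP_EOL; // wp/uploads/2022/06/cover.jpg
echo featuredImagePath(null), PHP_EOL; // wp/no_image.jpg
```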

Convert HTML to ProseMirror

Before saving our HTML content to a Statamic entry, we need to convert it into ProseMirror's document format. If you want to learn more about ProseMirror and TipTap, check out this awesome guide from Statamic.

<?php

$content = (new \HtmlToProseMirror\Renderer)->render($content);

$content = $content['content'];
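
To show why we keep only the content key, here is a hand-written sketch of the structure. The array below is our assumption of what the renderer returns for a single paragraph, based on the general shape of a ProseMirror document (a doc node wrapping a list of child nodes). It is not output copied from the library.

```php
<?php

// hand-written stand-in for what the renderer is assumed to return for "<p>Hello</p>"
$document = [
    'type' => 'doc',
    'content' => [
        [
            'type' => 'paragraph',
            'content' => [
                ['type' => 'text', 'text' => 'Hello'],
            ],
        ],
    ],
];

// keep only the list of nodes and drop the wrapping "doc" node
$nodes = $document['content'];

echo json_encode($nodes), PHP_EOL;
// [{"type":"paragraph","content":[{"type":"text","text":"Hello"}]}]
```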

Rename/remove category names

Our client wanted to group a couple of old category names into new names and rename some other categories. That's why we had to go through this array of old and new category names before saving the post as a Statamic entry.

<?php

// inside the handle() function
$category = $this->getCategories($post->main_category);

// extracted into a separate function
public function getCategories($category){
  
    $categories = [
      'Uncategorized' => 'default',
      'Other' => 'default',
      'Spin' => 'disinformation',
      'Corona' => 'disinformation',
      'Coronavirus' => 'disinformation',
      'COVID-19' => 'disinformation',
      'DISINFORMATION' => 'disinformation',
    ];

   return $categories[$category];
}
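
Note that the lookup above has no entry for category names that are missing from the array. If your data may contain such names, one option is a fallback value. This is a minimal sketch with a hypothetical mapCategory() function. Falling back to 'default' is our assumption here, not a client requirement.

```php
<?php

/**
 * Hypothetical variant of getCategories() with a fallback for unmapped names.
 */
function mapCategory(string $oldName, array $map, string $fallback = 'default'): string
{
    // the null coalescing operator avoids an "undefined array key" warning
    return $map[$oldName] ?? $fallback;
}

$map = [
    'Uncategorized' => 'default',
    'Corona' => 'disinformation',
];

echo mapCategory('Corona', $map), PHP_EOL;   // disinformation
echo mapCategory('Politics', $map), PHP_EOL; // default
```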

Each post had more than one category, but the client wanted to reduce that to a single category. Corcel already provides $post->main_category, but despite its name it returns the first term of the first taxonomy it finds, which is not necessarily a category, as you can see:

<?php

// corcel/src/model/Post.php

/**
 * Gets the first term of the first taxonomy found.
 *
 * @return string
 */
public function getMainCategoryAttribute()
{
    $mainCategory = 'Uncategorized';

    if (!empty($this->terms)) {
        $taxonomies = array_values($this->terms);

        if (!empty($taxonomies[0])) {
            $terms = array_values($taxonomies[0]);
            $mainCategory = $terms[0];
        }
    }

    return $mainCategory;
}

We modified the code to specifically search for the first term of the category taxonomy.

<?php

/**
 * Gets the first term of the category taxonomy.
 *
 * @return string
 */
public function getMainCategoryAttribute()
{
    $mainCategory = 'Uncategorized';

    if (!empty($this->terms)) {
        if (array_key_exists('category', $this->terms)) {
            $category = array_values($this->terms['category']);
            $mainCategory = $category[0];
        }
    }
    return $mainCategory;
}

$this->terms is another Corcel attribute. It returns all taxonomies of a given post together with their terms.
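
The selection logic can be tried on a plain array. In this sketch, mainCategory() is a hypothetical helper, and the shape of $terms (taxonomy name => term slug => term name) is our assumption about what Corcel returns:

```php
<?php

/**
 * Hypothetical helper mirroring the modified accessor, using a plain array.
 * $terms is assumed to look like: taxonomy name => [term slug => term name].
 */
function mainCategory(array $terms): string
{
    // covers both a missing "category" key and an empty list of terms
    if (empty($terms['category'])) {
        return 'Uncategorized';
    }

    return array_values($terms['category'])[0];
}

$terms = [
    'tag' => ['vaccines' => 'Vaccines'],
    'category' => ['spin' => 'Spin', 'corona' => 'Corona'],
];

echo mainCategory($terms), PHP_EOL; // Spin
echo mainCategory(['tag' => ['vaccines' => 'Vaccines']]), PHP_EOL; // Uncategorized
```

Even though the tag taxonomy comes first in the array, the category term is the one returned.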

Using a progress bar

To make the import easier to follow, we use a progress bar. Its total is the number of retrieved WordPress posts. For each post, we strip out Visual Composer tags, relink the internal page links, relink the internal image links, save the thumbnail image, convert HTML to ProseMirror, etc. Once a post has been processed and saved, we advance the bar. In the end, the bar is finished and the process is done. These are the progress bar lines that you will find in the final solution:

<?php

$bar = $this->output->createProgressBar(count($posts));

$bar->start();

$bar->advance();

$bar->finish();

The final solution to our problem

<?php

namespace App\Console\Commands;

use Corcel\Model\Post;
use Illuminate\Console\Command;
use Illuminate\Support\Str;
use Statamic\Facades\Entry;

class ImportWordPress extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'import:wp';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Import posts from WordPress';

    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }

    public function handle()
    {
        $lang = 'default';

        // $ids is not defined in this command: set it to the WordPress post ids you want to import
        $posts = Post::whereIn('id', $ids)
              ->type('post')
              ->orderBy('post_date', 'desc')
              ->published()
              ->get();

        $bar = $this->output->createProgressBar(count($posts));

        $bar->start();

        foreach ($posts as $post) {

            // get the content of this post
            $content = $post->post_content;

            // match all instances where the image_url is found and get the id
            preg_match_all('/image_url="\d{4,5}/', $content, $matches);
            
            foreach ($matches[0] as $match) {
                $id = explode('"', $match);
            
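                // find the image with this id and get the url
                $attachment = Post::where('id', $id[1])->first();

                // skip ids without a matching attachment, otherwise ->guid fails on null
                if (! $attachment) {
                    continue;
                }

                $url = Str::replace('https://domain.com/wp-content/', 'assets/wp/', $attachment->guid);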
            
                // replace the plugin text with img tag and url
                $content = preg_replace('(\[image_with_animation image_url="' . preg_quote($id[1], '/') . '[^\]]+])', '<img class="w-full h-auto" alt="image" src="' . $url . '">', $content);
            }

            // match all instances where the url has the post id and get the id
            preg_match_all('/"\/\?p=\d{4,5}/', $content, $matches);

            foreach ($matches[0] as $match) {
                $id = explode('=', $match);
            
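                // find the post with this id and get the url
                $oldPost = Post::where('id', $id[1])->first();

                // skip links to posts that no longer exist, otherwise ->post_title fails on null
                if (! $oldPost) {
                    continue;
                }

                $url = Str::slug($oldPost->post_title);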
            
                // replace the plugin url with slug url
                $content = preg_replace('(\?p=' . preg_quote($id[1], '/') . ')', $url, $content);
            }

            // remove all plugin data
            $content = preg_replace("~(?:\[/?)[^/\]]+/?\]~s", '', $content);
            $content = preg_replace('~(?:\[/?).*?"]~s', '', $content);
            $content = preg_replace('(\\[([^[]|)*])', '', $content);
            $content = preg_replace('/\[(.*?)\]/', '', $content);

            if ($content === "") {
                continue;
            }

            $content = (new \HtmlToProseMirror\Renderer)->render($content);
            $content = $content['content'];

            if ($post->thumbnail) {
                $featured_image = Str::replace('https://domain.com/wp-content/', 'wp/', $post->thumbnail->size('invalid_size'));
            } else {
                $featured_image = 'wp/no_image.jpg';
            }

            $category = $this->getCategories($post->main_category);

            $entry = Entry::make()
                ->collection('posts')
                ->locale($lang)
                ->slug(Str::slug($post->post_title)) // same slug format as the relinked internal URLs
                ->date($post->post_date)
                ->data([
                    'title' => $post->title,
                    'image' => $featured_image,
                    'categories' => $category,
                    'content' => $content,
                ]);

            $entry->save();

            $bar->advance();
        }

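        $bar->finish();

        // print the message on its own line; a returned string would never be shown
        $this->line('');
        $this->info('WordPress posts successfully migrated to Statamic');

        // exit code 0 signals success
        return 0;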
    }

    public function getCategories($category)
    {

        $categories = [
            'Uncategorized' => 'default',
            'Other' => 'default',
            'Spin' => 'disinformation',
            'Corona' => 'disinformation',
            'Coronavirus' => 'disinformation',
            'COVID-19' => 'disinformation',
            'DISINFORMATION' => 'disinformation',
        ];

        return $categories[$category];
    }
}

If you need to migrate your WordPress site to Statamic, let's get in touch.

Quick note: If you have a large number of posts to migrate, the command may stop before it finishes. In that case, limit the number of posts you retrieve per run. With Laravel you can do this by modifying the query like this:

<?php

// first batch: rows 1 to 500
$posts = Post::whereIn('id', $ids)
      ->type('post')
      ->orderBy('post_date', 'desc')
      ->published()
      ->skip(0)
      ->take(500)
      ->get();

// next batch: change only the offset to get rows 501 to 1000
$posts = Post::whereIn('id', $ids)
      ->type('post')
      ->orderBy('post_date', 'desc')
      ->published()
      ->skip(500)
      ->take(500)
      ->get();
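
If you would rather not edit the numbers by hand for every run, the offsets can be generated. The following is a minimal sketch in plain PHP. batchOffsets() is a hypothetical helper, and the total of 1200 posts is a made-up example.

```php
<?php

/**
 * Hypothetical helper: yields [skip, take] pairs that cover $total rows in batches of $size.
 */
function batchOffsets(int $total, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    for ($skip = 0; $skip < $total; $skip += $size) {
        // the last batch may be smaller than $size
        yield [$skip, min($size, $total - $skip)];
    }
}

foreach (batchOffsets(1200, 500) as [$skip, $take]) {
    // in the command, these values would feed ->skip($skip)->take($take)
    echo "skip {$skip}, take {$take}", PHP_EOL;
}
// skip 0, take 500
// skip 500, take 500
// skip 1000, take 200
```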
