June 24, 2022
In our previous article, we outlined the initial steps for migrating content from WordPress to Statamic v3 using the Laravel package Corcel. If you haven't already, be sure to read that post before continuing here, as it sets the foundation for the more complex migration tasks we'll address.
Many WordPress users eventually look for alternatives, whether because of performance issues, the desire for a more streamlined interface, or the need for greater flexibility. For our client, the breaking point was the cumbersome nature of Visual Composer. While Visual Composer is a powerful page builder, it injects its own tags (enclosed in square brackets []) into the HTML content, which complicates the migration. We need to strip those tags out of the content before saving it to Statamic. We also need to relink internal links and images, because Visual Composer references them by id. Finally, the client wanted to rename or remove some categories.
We are going to tackle the following issues:
Strip all Visual Composer tags from the HTML content
Relink internal page links to replace Visual Composer id links
Relink internal image URLs to replace Visual Composer id links
Import the thumbnail image and use it as a featured image
Replace old category names with new categories
As we explained in our previous article, we get all the posts and save each one of them as an entry. The code snippets in the following sections all sit inside the foreach statement, but we won't mention this again. You can also check the final code at the end of this post.
foreach ($posts as $post) {
// everything else
}
We encountered content laden with Visual Composer-specific tags like [vc_row][vc_column][vc_column_text]... Using the preg_replace() PHP function, we filtered out these tags from the $content variable, which is then ready for Statamic entry creation. Posts without content are skipped to avoid errors.
<?php
// get the content of this post
$content = $post->post_content;
// strip the Visual Composer tags in several passes
$content = preg_replace("~(?:\[/?)[^/\]]+/?\]~s", '', $content);
$content = preg_replace('~(?:\[/?).*?"]~s', '', $content);
$content = preg_replace('(\\[([^[]|)*])', '', $content);
$content = preg_replace('/\[(.*?)\]/', '', $content);
// skip posts that have no content left
if ($content === "") {
    continue;
}
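To see what the stripping does on a small input, here is a minimal, self-contained sketch. It uses one simplified pattern instead of the four passes above, and both the `stripShortcodes` helper and the sample markup are assumptions made for illustration, not part of the migration command.

```php
<?php
// Hypothetical helper: removes anything that looks like a shortcode tag,
// i.e. "[name ...]" or "[/name]". Simplified compared to the passes above.
function stripShortcodes(string $html): string
{
    return preg_replace('/\[\/?[a-zA-Z_][^\]]*\]/', '', $html);
}

$sample = '[vc_row][vc_column width="1/2"][vc_column_text]<p>Hello</p>[/vc_column_text][/vc_column][/vc_row]';

echo stripShortcodes($sample), PHP_EOL;
```

Keep in mind that a pattern like this also removes ordinary bracketed text such as [sic], and that it stops at the first ] it finds, so an attribute value containing ] would leave debris behind.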
Since Visual Composer uses shortcodes to handle images, simply stripping the tags would lose the images. Visual Composer retrieves images through their ids and displays them on the page. These ids mean nothing to Statamic, so the image references have to be resolved before the tags are stripped (the final code runs the relinking first). Our solution is the following:
Identify instances of image_url="12345", where 12345 is a placeholder for the image id.
Loop through all image_url instances.
Get the id by splitting each match with explode.
Query the WordPress database for the associated image.
Get the image URL (its guid).
Convert the image URL from absolute to relative.
Replace the Visual Composer image tag with a standard <img> tag.
<?php
// match all instances where the image_url is found and get the id
preg_match_all('/image_url="\d{4,5}/', $content, $matches);
foreach ($matches[0] as $match) {
    // $match looks like image_url="12345, so the id is the part after the quote
    $id = explode('"', $match);
    // find the image with this id and get the url
    $attachment = Post::where('id', $id[1])->first();
    if (!$attachment) {
        continue; // no attachment with this id, nothing to relink
    }
    $url = Str::replace('https://domain.com/wp-content/', 'assets/wp/', $attachment->guid);
    // replace the plugin text with img tag and url
    $content = preg_replace('(\[image_with_animation image_url="' . preg_quote($id[1], '/') . '[^\]]+])', '<img class="w-full h-auto" alt="image" src="' . $url . '">', $content);
}
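The same relinking can be tried outside Laravel with a small sketch. It is a variation rather than the code used in the migration: it does the match and the replacement in one pass with preg_replace_callback(), and the hypothetical `$guidsById` array stands in for the Corcel database lookup. The `relinkImages` name and the sample data are assumptions.

```php
<?php
// Hypothetical helper: $guidsById maps attachment id => absolute URL and
// stands in for the Corcel query against the WordPress database.
function relinkImages(string $content, array $guidsById): string
{
    $pattern = '/\[image_with_animation image_url="(\d+)"[^\]]*\]/';

    return preg_replace_callback($pattern, function (array $m) use ($guidsById) {
        $id = (int) $m[1];
        if (!isset($guidsById[$id])) {
            return $m[0]; // unknown attachment: leave the shortcode untouched
        }
        // absolute WordPress URL -> relative asset path
        $url = str_replace('https://domain.com/wp-content/', 'assets/wp/', $guidsById[$id]);

        return '<img class="w-full h-auto" alt="image" src="' . htmlspecialchars($url) . '">';
    }, $content);
}

$content = '<p>Intro</p>[image_with_animation image_url="12345" animation="Fade In"]';
$guids = [12345 => 'https://domain.com/wp-content/uploads/2020/01/photo.jpg'];

echo relinkImages($content, $guids), PHP_EOL;
```

Capturing the id with (\d+) avoids the explode step and does not depend on the id having a particular number of digits.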
Visual Composer also uses IDs for internal links, which we needed to convert to slugs for Statamic. This process is similar to image relinking:
Locate instances of p=12345 (12345 is an example id).
Loop through all found instances of p.
Get the id by splitting each match with explode.
Search the WordPress database for that id.
Use the post title to create a slug.
Update the link from id to slug.
<?php
// match all instances where the url has the post id and get the id
preg_match_all('/"\/\?p=\d{4,5}/', $content, $matches);
foreach ($matches[0] as $match) {
    // $match looks like "/?p=12345, so the id is the part after the equals sign
    $id = explode('=', $match);
    // find the post with this id and build its slug
    $oldPost = Post::where('id', $id[1])->first();
    if (!$oldPost) {
        continue; // no post with this id, keep the link as it is
    }
    $url = Str::slug($oldPost->post_title);
    // replace the plugin url with slug url
    $content = preg_replace('(\?p=' . preg_quote($id[1], '/') . ')', $url, $content);
}
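As a quick check of the link rewriting, here is a self-contained sketch that runs without Laravel. The `simpleSlug` function is a hypothetical, ASCII-only stand-in for Str::slug(), and `$titlesById` replaces the database lookup; neither is part of the actual command.

```php
<?php
// Hypothetical stand-in for Str::slug(); it only handles plain ASCII titles.
function simpleSlug(string $title): string
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');
}

// $titlesById maps post id => post title and stands in for the Corcel lookup.
function relinkPosts(string $content, array $titlesById): string
{
    return preg_replace_callback('/\/\?p=(\d+)/', function (array $m) use ($titlesById) {
        $id = (int) $m[1];
        // keep the original link when the target post is unknown
        return isset($titlesById[$id]) ? '/' . simpleSlug($titlesById[$id]) : $m[0];
    }, $content);
}

$content = '<a href="/?p=12345">Read more</a>';

echo relinkPosts($content, [12345 => 'My First Post!']), PHP_EOL;
```

Whatever slug function is used here should be the same one used when the entries are saved, otherwise the rewritten links will not resolve.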
The client's site redesign required a featured image for each post. We imported thumbnail images from the old posts and saved them in a new wp folder. If a post lacked a thumbnail, we defaulted to no_image.jpg. Calling ->thumbnail->size('invalid_size') is how we retrieve the thumbnail URL with Corcel: when the requested size does not exist, Corcel returns the URL of the original image.
<?php
if ($post->thumbnail) {
$featured_image = Str::replace('https://domain.com/wp-content/', 'wp/', $post->thumbnail->size('invalid_size'));
} else {
$featured_image = 'wp/no_image.jpg';
}
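The fallback logic can be isolated into a small function that is easy to test on its own. This is only a sketch: `featuredImagePath` is a hypothetical helper, and it takes the thumbnail URL as a plain string (or null) instead of a Corcel object.

```php
<?php
// Hypothetical helper: $thumbnailUrl is the URL returned for the thumbnail,
// or null when the post has no thumbnail.
function featuredImagePath(?string $thumbnailUrl): string
{
    if ($thumbnailUrl === null || $thumbnailUrl === '') {
        return 'wp/no_image.jpg';
    }

    return str_replace('https://domain.com/wp-content/', 'wp/', $thumbnailUrl);
}

echo featuredImagePath('https://domain.com/wp-content/uploads/2021/05/cover.jpg'), PHP_EOL;
echo featuredImagePath(null), PHP_EOL;
```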
Statamic uses ProseMirror, a format different from HTML. To ensure compatibility, we converted the HTML content to ProseMirror. For those unfamiliar with ProseMirror or TipTap, Statamic provides a helpful guide.
<?php
$content = (new \HtmlToProseMirror\Renderer)->render($content);
$content = $content['content'];
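For orientation, a ProseMirror document is a tree of typed nodes rather than a string of markup. The array below is written by hand to show roughly what `<p>Hello <strong>world</strong></p>` corresponds to. It is not captured output from the renderer, and the exact node and mark names depend on the converter and editor schema in use.

```php
<?php
// Hand-written illustration of a ProseMirror document (not renderer output).
$document = [
    'type' => 'doc',
    'content' => [
        [
            'type' => 'paragraph',
            'content' => [
                ['type' => 'text', 'text' => 'Hello '],
                ['type' => 'text', 'text' => 'world', 'marks' => [['type' => 'bold']]],
            ],
        ],
    ],
];

// Only the inner node list is kept, mirroring $content['content'] above.
$nodes = $document['content'];

echo json_encode($nodes, JSON_PRETTY_PRINT), PHP_EOL;
```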
Our client wanted to group a couple of old category names into new names and rename some other categories. That's why we had to go through this array of old and new category names before saving the post as a Statamic entry.
<?php
// inside the handle() function
$category = $this->getCategories($post->main_category);
// extracted into a separate function
public function getCategories($category){
$categories = [
'Uncategorized' => 'default',
'Other' => 'default',
'Spin' => 'disinformation',
'Corona' => 'disinformation',
'Coronavirus' => 'disinformation',
'COVID-19' => 'disinformation',
'DISINFORMATION' => 'disinformation',
];
return $categories[$category];
}
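One thing to watch with a plain array lookup like $categories[$category]: a name that is missing from the array triggers an undefined key warning and yields null. Below is a minimal sketch of a lookup with a fallback. The `mapCategory` helper and the choice of 'default' as the fallback value are assumptions, not the client's rules.

```php
<?php
// Hypothetical variant of the lookup that does not fail on unmapped names.
function mapCategory(string $name, array $map, string $fallback = 'default'): string
{
    return $map[$name] ?? $fallback;
}

$map = [
    'Uncategorized' => 'default',
    'Spin' => 'disinformation',
    'COVID-19' => 'disinformation',
];

echo mapCategory('Spin', $map), PHP_EOL;   // a mapped name
echo mapCategory('Sports', $map), PHP_EOL; // a name that is not in the map
```

During a real migration it can also be worth logging the unmapped names, so that the list can be reviewed with the client.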
There was more than one category name for each post, but the client wanted to reduce that to a single category. Corcel already has $post->main_category, but despite its name it returns the first term of whichever taxonomy comes first, which is not necessarily a category, as you can see:
<?php
// corcel/src/model/Post.php
/**
* Gets the first term of the first taxonomy found.
*
* @return string
*/
public function getMainCategoryAttribute()
{
$mainCategory = 'Uncategorized';
if (!empty($this->terms)) {
$taxonomies = array_values($this->terms);
if (!empty($taxonomies[0])) {
$terms = array_values($taxonomies[0]);
$mainCategory = $terms[0];
}
}
return $mainCategory;
}
We modified the code to specifically search for the first term of the category taxonomy.
<?php
/**
* Gets the first term of the category taxonomy.
*
* @return string
*/
public function getMainCategoryAttribute()
{
$mainCategory = 'Uncategorized';
if (!empty($this->terms)) {
if (array_key_exists('category', $this->terms)) {
$category = array_values($this->terms['category']);
$mainCategory = $category[0];
}
}
return $mainCategory;
}
$this->terms is another Corcel accessor; it returns all taxonomies of a given post together with their terms.
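The difference between the two versions shows up as soon as a post has terms in more than one taxonomy. The sketch below reproduces both lookups as plain functions. The sample `$terms` array and its taxonomy => [slug => name] shape are assumptions for illustration, and the function names are hypothetical.

```php
<?php
// Hypothetical sample of the terms structure: taxonomy => [slug => name].
$terms = [
    'post_tag' => ['news' => 'News'],
    'category' => ['spin' => 'Spin', 'other' => 'Other'],
];

// Original behaviour: first term of whichever taxonomy comes first.
function firstTermOfFirstTaxonomy(array $terms): string
{
    $taxonomies = array_values($terms);
    if (empty($taxonomies[0])) {
        return 'Uncategorized';
    }

    return array_values($taxonomies[0])[0];
}

// Modified behaviour: first term of the "category" taxonomy only.
function firstCategoryTerm(array $terms): string
{
    if (empty($terms['category'])) {
        return 'Uncategorized';
    }

    return array_values($terms['category'])[0];
}

echo firstTermOfFirstTaxonomy($terms), PHP_EOL; // News
echo firstCategoryTerm($terms), PHP_EOL;        // Spin
```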
To enhance the user experience during the migration process, we implemented a progress bar. As each post was processed, the bar advanced, providing visual feedback on the migration's progress.
<?php
$bar = $this->output->createProgressBar(count($posts));
$bar->start();
foreach ($posts as $post) {
    // ... migrate the post ...
    $bar->advance(); // one step per processed post
}
$bar->finish();
<?php
namespace App\Console\Commands;
use Corcel\Model\Post;
use Illuminate\Console\Command;
use Illuminate\Support\Str;
use Statamic\Facades\Entry;
class ImportWordPress extends Command
{
/**
* The name and signature of the console command.
*
* @var string
*/
protected $signature = 'import:wp';
/**
* The console command description.
*
* @var string
*/
protected $description = 'Import posts from WordPress';
/**
* Create a new command instance.
*
* @return void
*/
public function __construct()
{
parent::__construct();
}
public function handle()
{
$lang = 'default';
// $ids is not defined in this listing: it must hold the WordPress post ids to import
$posts = Post::whereIn('id', $ids)
->type('post')
->orderBy('post_date', 'desc')
->published()
->get();
$bar = $this->output->createProgressBar(count($posts));
$bar->start();
foreach ($posts as $post) {
// get the content of this post
$content = $post->post_content;
// match all instances where the image_url is found and get the id
preg_match_all('/image_url="\d{4,5}/', $content, $matches);
foreach ($matches[0] as $match) {
$id = explode('"', $match);
// find the image with this id and get the url
$attachment = Post::where('id', $id[1])->first();
if (!$attachment) {
continue; // no attachment with this id, nothing to relink
}
$url = Str::replace('https://domain.com/wp-content/', 'assets/wp/', $attachment->guid);
// replace the plugin text with img tag and url
$content = preg_replace('(\[image_with_animation image_url="' . preg_quote($id[1], '/') . '[^\]]+])', '<img class="w-full h-auto" alt="image" src="' . $url . '">', $content);
}
// match all instances where the url has the post id and get the id
preg_match_all('/"\/\?p=\d{4,5}/', $content, $matches);
foreach ($matches[0] as $match) {
$id = explode('=', $match);
// find the post with this id and get the url
$oldPost = Post::where('id', $id[1])->first();
if (!$oldPost) {
continue; // no post with this id, keep the link as it is
}
$url = Str::slug($oldPost->post_title);
// replace the plugin url with slug url
$content = preg_replace('(\?p=' . preg_quote($id[1], '/') . ')', $url, $content);
}
// remove all plugin data
$content = preg_replace("~(?:\[/?)[^/\]]+/?\]~s", '', $content);
$content = preg_replace('~(?:\[/?).*?"]~s', '', $content);
$content = preg_replace('(\\[([^[]|)*])', '', $content);
$content = preg_replace('/\[(.*?)\]/', '', $content);
if ($content === "") {
continue;
}
$content = (new \HtmlToProseMirror\Renderer)->render($content);
$content = $content['content'];
if ($post->thumbnail) {
$featured_image = Str::replace('https://domain.com/wp-content/', 'wp/', $post->thumbnail->size('invalid_size'));
} else {
$featured_image = 'wp/no_image.jpg';
}
$category = $this->getCategories($post->main_category);
$entry = Entry::make()
->collection('posts')
->locale($lang)
->slug(Str::slug($post->post_title))
->date($post->post_date)
->data([
'title' => $post->title,
'image' => $featured_image,
'categories' => $category,
'content' => $content,
]);
$entry->save();
$bar->advance();
}
$bar->finish();
$this->info('WordPress posts successfully migrated to Statamic');
return 0;
}
public function getCategories($category)
{
$categories = [
'Uncategorized' => 'default',
'Other' => 'default',
'Spin' => 'disinformation',
'Corona' => 'disinformation',
'Coronavirus' => 'disinformation',
'COVID-19' => 'disinformation',
'DISINFORMATION' => 'disinformation',
];
return $categories[$category];
}
}
Quick note: if you have a very large number of posts to migrate, the command may stop before it finishes, so you should limit the number of posts you retrieve per run. With Laravel you can do this by changing the query to this:
<?php
// first run: rows 1 to 500
$posts = Post::whereIn('id', $ids)
    ->type('post')
    ->orderBy('post_date', 'desc')
    ->published()
    ->skip(0)
    ->take(500)
    ->get();
// for the next run, only the offset changes:
// ->skip(500)->take(500)->get(); // rows 501 to 1000
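To work out how many runs are needed and which offsets to use, the arithmetic can be sketched in a few lines. `batchOffsets` is a hypothetical helper, and the total of 1200 posts is just an example figure.

```php
<?php
// Hypothetical helper: lists the skip()/take() pairs needed to cover $total rows.
function batchOffsets(int $total, int $size): array
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batches = [];
    for ($skip = 0; $skip < $total; $skip += $size) {
        $batches[] = ['skip' => $skip, 'take' => $size];
    }

    return $batches;
}

foreach (batchOffsets(1200, 500) as $batch) {
    echo "skip({$batch['skip']})->take({$batch['take']})", PHP_EOL;
}
```

The last batch simply returns fewer rows than requested, so take() does not need to be adjusted for it.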
Are you planning a new Statamic project or thinking about migrating your WordPress site to Statamic? Learn more about our expertise as a renowned Statamic development agency.