How we did it: Moving The Miami Hurricane from College Publisher to WordPress

This post also appears on the Innovation in College Media blog.

The question we’ve heard most often since launching the new TheMiamiHurricane.com is, “How did you do it?” Below, Webmaster Brian Schlansky offers a comprehensive explanation of the process, from setting up our own Web server to installing WordPress to importing our College Publisher archives.

For more background, check out these posts:

Enjoy!

Greg Linch
Editor at Large for Online and Multimedia
Former Editor in Chief (fall 2007 to spring 2008)
The Miami Hurricane

To contact me, visit www.greglinch.com or e-mail greglinch[at]gmail.com.

How we did it: Moving from College Publisher to WordPress

By Brian Schlansky
Webmaster
The Miami Hurricane
webmaster[at]themiamihurricane.com 

 

Our migration from College Publisher to WordPress was a very interesting experience. I will try to explain as much as I can, so anyone thinking about doing a similar migration has a guide. Feel free to contact me with any questions.

The new site uses WordPress 2.6.1, SlideShowPro Director 1.2.3, Vimeo and Issuu. We originally started with WordPress 2.5 and SSP Director 1.1.9 and have since upgraded.

Our total cost, excluding man hours, was about $205.

PART 1: SETTING UP THE TECHNICAL ASPECTS

Hardware Setup (skip to Domain Setup if you already have or plan to purchase hosting)

Once we decided to use WordPress and the LiveWire 2.0 Premium News Theme (now part of WooThemes), I began the four-month-long process of getting us from a state of complete dependence on College Publisher for all of our needs to an entirely independent system.

The first thing anyone needs if they want to build a Web site is a host. There are a lot of very good hosts and most offer the same basic features, including a decent amount of storage space, FTP access and bandwidth for a relatively low monthly rate.

For the Hurricane, we had a special situation. We were getting a new file server for all of our data, freeing up the old one for Web use. So instead of paying a monthly fee for a host, we ended up with a very powerful dedicated server hosted on campus.

If you have a dedicated server available, this will be the cheapest way to build a site because there are no recurring costs and the university pays for bandwidth. It also gives you complete control of the hardware.

After I transferred all the data to the new file server, I wiped Windows Server 2003 and installed Ubuntu Linux 8.04 Hardy Server Edition. I opted to install a LAMP (Linux Apache MySQL PHP) server along with an SSH server.

This edition does not include a graphical user interface and is controlled entirely from the command line. To make things easier to manage, I split our already RAIDed drives into two partitions: one for the Ubuntu install and one for all the Web files.

At first, I tried to install the graphical interface from the Ubuntu servers, but I could not log in after it booted back up. I ended up erasing the install and starting over.

If you are uncomfortable working with a command line interface, there are a few ways around this. If you are familiar with Unix-based systems, secure shell (SSH) is a way to connect to a server through a command line without needing to be at the physical location. Nearly everything I did with the server was through SSH.

After establishing an SSH connection, I installed Webmin, which is available at www.webmin.com. This is a self-contained Web server that runs independently from any existing Apache server. It allows you to log in and control all the aspects of Ubuntu Server that you’d need without having to know any terminal commands.

This is as close to cPanel (a hosting control panel) as I could get. If you purchase hosting elsewhere, cPanel will provide you with the same functionality. I’m not sure how many other pieces of Linux software I installed, but I did need to install ProFTPD for FTP access.

Once everything is configured, you will have a fully functional Web server (Apache), PHP and a database server (MySQL). All of these are WordPress requirements.

Domain Setup

I set up a virtual Apache host for WordPress and proceeded to install it using their “Famous 5-Minute Install” method. Once it was up and running, I added the LiveWire 2.0 theme and we were set with the WordPress install.

At this point in time, everything was running within the University of Miami network. Nothing was talking to the outside world and everything else we had was still controlled by College Publisher.

Since I would be building the site from home, I needed access to the server from off campus. Our IT person at UM opened the necessary ports for me to connect to the server. This opened the server to the world and allowed me to connect our domain name to it later.

It was only after I had successfully set up the Web server and had a working WordPress install that we notified College Publisher that we’d be terminating our contract at the end of the summer.

This began a series of phone calls. Many things had to be dealt with. College Publisher controlled our Web site, our two domain names and our e-mail forwarders, and they hosted our blogs. All of these needed to be transferred to us.

We started by requesting our archives, which they provided in mid-May as a massive 1 GB zip file containing our database — with the articles — and a folder of every piece of media we have ever uploaded to the site. A separate database contained the attachment info for each piece of media and the article it corresponded to.

Because this attachment structure is incompatible with how WordPress manages attachments, we ended up dumping all images from the new site once all the articles were imported, which I’ll explain later.

The next step was to take control of our domain name. If you have a site with a registered domain, there are three pieces of the package: the domain registrar, the domain name system (DNS) host and the Web host.

In our case, I chose GoDaddy to manage the domain names and DNS information, with the Ubuntu server as the host. In order to make beta.themiamihurricane.com lead you to our beta, many connections needed to be made.

In June, I purchased a domain transfer from GoDaddy and College Publisher was sent some codes needed to initiate the transfer. After about a week and many phone calls, I had themiamihurricane.com in our GoDaddy account; however, the DNS information was still on College Publisher’s servers.

Many things need to be taken into consideration before making any changes to a domain’s DNS records. Everything from where www.themiamihurricane.com points to how to route webmaster@themiamihurricane.com to our e-mail is stored in the DNS information.

College Publisher provided us with our DNS records and I switched from their servers to GoDaddy’s. The change can take up to 48 hours, but ours was up and running within a few hours. I immediately entered the DNS information at GoDaddy to prevent any e-mail from bouncing while the servers updated.
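Before switching name servers, it helps to confirm that every record the old DNS host held has been re-entered at the new one. The following is only a sketch of that check in Python, not something we actually ran: the host names, the 192.0.2.10 address and mail.example.com are placeholders, and the records would in practice be typed in from whatever list your old provider sends you.

```python
# Each record is (host, record type, value). All values are placeholders.
old_records = {
    ("www", "A", "192.0.2.10"),
    ("blogs", "A", "192.0.2.10"),
    ("@", "MX", "mail.example.com"),
}

# Records entered so far at the new DNS host (also placeholders).
new_records = {
    ("www", "A", "192.0.2.10"),
    ("@", "MX", "mail.example.com"),
}


def missing_records(old, new):
    """Return the records present at the old host but not yet at the new one."""
    return sorted(old - new)


missing = missing_records(old_records, new_records)
for host, record_type, value in missing:
    print(f"Still to add: {host} {record_type} {value}")
```

An empty result means the new host covers everything the old one did, including the MX records that keep e-mail from bouncing.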

During this time, I also transferred the files and database of blogs.themiamihurricane.com to the new server. I pointed the “blogs” DNS entry to the server’s IP address and, within an hour or two, our blogs were running on our own server, independent of College Publisher.

After I knew everything was still working, I went ahead and set up a Google Apps account for the domain. All of the Hurricane e-mail accounts are Gmail accounts with @themiamihurricane.com addresses.

I created the account names we already had on College Publisher and switched the DNS records from College Publisher to Google. We now had our blogs and our e-mail out of their systems, leaving only the Web site remaining.

After a months-long issue regarding our other domain, thehurricaneonline.com, it joined themiamihurricane.com in our GoDaddy account. I didn’t change the DNS to GoDaddy until after we launched the new site. It now forwards to themiamihurricane.com.

I know this all seems like a lot of work — and it was — but the Hurricane chose to take complete control of our Web operation, from top to bottom, DNS to Ubuntu. I’m assuming that most other newspapers will elect to pay a company to take care of the site hosting.

PART 2: BUILDING THE SITE

Customizing WordPress is very simple. If a feature is missing, it can probably be added with a plugin. The Hurricane has nearly 20 plugins providing additional functionality.

Since WP is written in PHP, any customizations to the theme require a basic knowledge of the PHP syntax and the PHP variables WordPress understands. I don’t know PHP by heart, but it is very easy to customize once you recognize the patterns and relationships.

WordPress.org contains a slew of documentation and forums. Other customizations involve editing CSS. The entire site is basically the stringing together of about 10 different PHP files that the server processes into a single HTML document.

Our server is actually hosting three separate sites: WordPress, our blogs and SlideShowPro Director. We needed a solution for expanded multimedia capabilities, and SSP was a no-brainer.

While other journalism students may be familiar with SoundSlides, SlideShowPro (and its companion product, Director) provides many more customization options and allows everything to be edited from a Web browser. I built a few different player configurations and the source information is fed to the slideshow in the embed code.

College Publisher actually uses SSP for its slideshows, but they offer no customization options for the presentation.

Database

As for importing archives into WordPress, a basic understanding of MySQL is necessary. The database came in the format of a CSV (Comma Separated Value) file. I’ll try to explain the process as simply as possible.

To avoid breaking our development site, I set up a local server on my MacBook using MAMP (Mac Apache MySQL PHP). I then installed a fresh copy of WordPress on it, just enough to create a database.

Using phpMyAdmin, I browsed the structure of the WP database. Basically the only tables you need to be concerned with are the wp_posts table and the wp_term_relationships table. Create at least one test post, and all the categories you need represented on the site, so they appear in the database.

All of the articles are stored in the wp_posts table, but their category assignment is in the wp_term_relationships table. To import everything from College Publisher to WordPress, you must rearrange the CP data into the structure WP uses.

I first imported the CSV into Microsoft Access 2007. I have a Mac and needed to run Access 2007 and Excel 2007 in Parallels Desktop because Mac Excel 2008 could not handle the large file efficiently.

Once I had it imported, I eliminated any unnecessary information. For instance, I replaced the ID numbers with a fresh column running from 1 to roughly 9,000. All the equivalent columns need to be renamed to match WordPress and the dates need to be formatted to match the WP date format. Columns that are missing must be added.
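For anyone who would rather script this step than use Access, the renaming, renumbering and date formatting can be sketched in Python. This is an illustration and not the procedure I followed: the College Publisher column names (Headline, Body, PublishDate) and the incoming date format are assumptions, while ID, post_title, post_content, post_date and post_excerpt are real columns in the wp_posts table.

```python
from datetime import datetime

# Hypothetical College Publisher column names mapped to real wp_posts columns.
COLUMN_MAP = {
    "Headline": "post_title",
    "Body": "post_content",
    "PublishDate": "post_date",
}


def to_wp_row(cp_row, new_id, cp_date_format="%m/%d/%Y %H:%M"):
    """Rename columns, assign a fresh ID and reformat the date for one article."""
    # Only mapped columns are copied, so unneeded CP columns are dropped.
    wp_row = {wp_col: cp_row[cp_col] for cp_col, wp_col in COLUMN_MAP.items()}
    wp_row["ID"] = new_id
    # WordPress stores dates as MySQL datetimes: YYYY-MM-DD HH:MM:SS.
    parsed = datetime.strptime(cp_row["PublishDate"], cp_date_format)
    wp_row["post_date"] = parsed.strftime("%Y-%m-%d %H:%M:%S")
    # Example of a column the source lacks, added with an empty value.
    wp_row["post_excerpt"] = ""
    return wp_row


def convert(cp_rows):
    """Number the articles from 1 in their existing order."""
    return [to_wp_row(row, i) for i, row in enumerate(cp_rows, start=1)]


sample = [
    {"Headline": "Canes win opener", "Body": "Story text.",
     "PublishDate": "09/02/2008 14:30", "OldID": "48213"},
]
converted = convert(sample)
```

The date format string is the part most likely to need changing, since it depends on how your archive was exported.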

It’s tricky because all the section names need to be replaced with the WP category ID number. Then, keeping everything in the same order, you need to separate the category IDs and the article IDs into their own table that matches the wp_term_relationships table. To prevent a discrepancy, I eliminated blank or broken stories from the database before lining up the category information.
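The category step can be sketched the same way. The section names and ID numbers below are made up, and the article keys (ID, Section, Body) are hypothetical; object_id, term_taxonomy_id and term_order are the real columns of wp_term_relationships. I am assuming the category ID you see in WordPress matches term_taxonomy_id, which is usually true on a fresh install but worth confirming in the wp_term_taxonomy table.

```python
# Hypothetical section names and category IDs. Look up the real IDs in
# phpMyAdmin after creating the categories in WordPress.
CATEGORY_IDS = {"News": 3, "Sports": 4, "Opinion": 5}


def split_articles(articles):
    """Drop blank stories, then build matching rows for wp_term_relationships.

    Returns (kept_articles, relationship_rows) in the same order, so the
    posts table and the relationships table cannot drift out of step.
    """
    kept, relationships = [], []
    for article in articles:
        if not article["Body"].strip():
            continue  # blank story: leave it out of both tables
        kept.append(article)
        relationships.append({
            "object_id": article["ID"],
            "term_taxonomy_id": CATEGORY_IDS[article["Section"]],
            "term_order": 0,
        })
    return kept, relationships


articles = [
    {"ID": 1, "Section": "News", "Body": "Story text."},
    {"ID": 2, "Section": "Sports", "Body": ""},
    {"ID": 3, "Section": "Opinion", "Body": "Column text."},
]
kept, relationships = split_articles(articles)
```

A section name missing from CATEGORY_IDS raises a KeyError here, which is deliberate in this sketch: it is better to stop than to import stories with no category.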

Once through with Access, I exported to Excel 2007. This allowed me to clean up the database structure and also add repeating information such as post type (post) and publishing status (publish). Also, I needed to add the author ID, which in my case was 1. We lost the author information in the migration process because it would have been too difficult to match up everyone with a WordPress user from the past seven years.
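As a rough Python equivalent of the Excel step, the repeating values can be filled in and the rows written out as CSV. The column names are real wp_posts columns and author ID 1 is the value mentioned above, but the column list is deliberately partial and only illustrative; a real file would need every column of wp_posts, in the order the table defines them.

```python
import csv
import io

# Values that are identical for every imported article.
DEFAULTS = {"post_author": 1, "post_status": "publish", "post_type": "post"}

# Partial, illustrative column order (assumption: a real export lists all
# wp_posts columns in table order).
COLUMNS = ["ID", "post_author", "post_date", "post_content", "post_title",
           "post_status", "post_type"]


def rows_to_csv(rows):
    """Fill in the repeating columns and return the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    for row in rows:
        # Values already present in the row win over the defaults.
        writer.writerow({**DEFAULTS, **row})
    return buffer.getvalue()


csv_text = rows_to_csv([
    {"ID": 1, "post_date": "2008-09-02 14:30:00",
     "post_content": "Story text.", "post_title": "Canes win opener"},
])
```

No header row is written, on the assumption that the import reads every line as data.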

Once the tables were formatted in Excel, I exported to a CSV. I then opened phpMyAdmin on the local server running on my MacBook and went to the wp_posts table. I clicked Import and directed it to my .csv file. It is important to tell phpMyAdmin to import as “CSV using LOAD DATA.”

It took about 10 failures to realize that. Then import the category information into the wp_term_relationships table using the same procedure. When the import is complete, you should find all of your archives in the temporary WordPress install. After some further cleaning up, I used WordPress’s export function to export all the stories into an XML file. I then imported the file into our real site.

It took the equivalent of about three to four days of actual work, but I spread this out over a couple of weeks as I optimized the process.

One important note: I was unable to import all 8,000 articles in one shot; it failed after about 4,000 articles. I ended up splitting the database into eight groups of 1,000 articles and imported one group at a time.
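Splitting the archive into groups is easy to script if you are working with the rows in code. A minimal sketch in Python, with the batch size of 1,000 taken from the numbers above and a plain list of integers standing in for the article rows:

```python
def split_into_batches(rows, batch_size=1000):
    """Return consecutive slices of `rows`, each at most `batch_size` long."""
    return [rows[start:start + batch_size]
            for start in range(0, len(rows), batch_size)]


# Stand-in for 8,000 article rows.
all_rows = list(range(1, 8001))
batches = split_into_batches(all_rows)
print(len(batches), "batches of up to 1000 rows each")
```

Each batch can then be exported and imported on its own, so a failure only costs you one group rather than the whole archive.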

14 thoughts on “How we did it: Moving The Miami Hurricane from College Publisher to WordPress”

  1. Hey Greg,

    Nice article! We’re happy you found Webmin helpful. I will point out, however, that if you’re comparing Open Source tools to cPanel, you’d want to compare to Virtualmin. It’s actually quite a bit more powerful than cPanel (even in the GPL version), while providing the ease of use for virtual hosting tasks that cPanel has always been good at.

    Webmin is more targeted to general purpose system administration tasks, and is certainly the top of the field in that regard (nothing else really tries to address as wide a range of tasks as Webmin, so nothing is really comparable), but it’s not really designed to make managing a virtual hosting web server easy. Virtualmin is by the same folks who created Webmin (Jamie wrote most of the code in both, and I work on themes, docs and UI work), and it runs on top of Webmin and uses Webmin extensively as a library of general functionality, but makes common tasks like “build a website” really easy–instead of having to use the Apache module to setup the VirtualHost, the BIND module to setup the zone, the Postfix or Sendmail module to create virtual maps and aliases, the MySQL or PostgreSQL module to create databases and assign them to the new virtual server owner, you merely click “Create Virtual Server” and fill out a form with a few fields…which then does all the other stuff for you. It’s a pretty big shift towards abstracting away the complexities of managing your site, while still keeping most of the strengths of Webmin (like the ability to hit the command line without fear of confusing Virtualmin or Webmin, since they both work directly with the configuration files rather than generate them from a database on each change, as most configuration GUIs do).

    I made a chart comparing Virtualmin GPL and the commercial version to cPanel and Plesk here:

    http://www.virtualmin.com/compare.html

    And, of course, we support all of our projects and products on the Webmin mailing list, and the Virtualmin.com forums…we don’t concern ourselves with whether someone has bought any of our commercial products. We’re always happy to have new users.

    Regards,
    Joe

  2. Hey Greg,

    Congrats on the move and fantastic to see you basing the new design on one of our WooThemes templates.

    Live Wire was one of my designs and I always get really excited seeing it being used in an innovative way.

    Keep up the great work.

  3. As Mark mentioned – this is fantastic work and we’re adding your design to our modification showcase, which we’re going to launch this week. So be on the lookout for that! :)

    Also – we’d love to sponsor you with a copy of any of our other themes. So drop us an e-mail and we’ll hook you up!

  4. I wondered what was the best or easiest hosting platform to use. I have used quite a lot of dedicated servers, linux, windows using both apache and iis but at the moment I am liking the Dreamhost system. For web developers who need to cut costs and time this is a must. If anyone has a better solution please let me know.

    Also in your opinion what do you think is better i.e. a Virtual server, dedicated or using cloud which I think is the same as virtual.

