Importing vocabularies and terms

We'll begin by importing your site's taxonomy data: that is, its vocabularies and the terms within them. Go to administer -> category legacy, and you should see a page that looks like this:

The legacy taxonomy import screen
The legacy taxonomy import screen looks like this, and is the default screen that you see in category_legacy.

The taxonomy import screen is the default screen in the category_legacy administration interface, but all of the screens have quite a lot in common, and we'll talk about the common elements now.

First of all, there is quite a large amount of help text on the page, which is different for each page in category_legacy, and which you should read carefully before proceeding. Next, there is a table listing all of the top-level data elements (in this case, vocabularies) that are available on your site to import. If no appropriate elements can be found, then you will see a message indicating this, and there will be nothing more displayed on the page. But if there are appropriate elements available, then you will see them listed in the table, and below them you will see a series of options that you can configure before performing your import. At the very bottom of the page, you will find the all-important 'import' button itself.

Those are the common elements of the category_legacy interface, in a nutshell. Now let's look at the settings that are shown on this page. To begin, open up the general settings box, and we'll have a look at the settings inside it.

The general taxonomy import settings
These are general settings that you can apply when importing your old taxonomy data.

The first option, import descriptions into, allows you to choose where you want the descriptions for your vocabularies and terms to go. Most users will want to import their descriptions into the description field in the category system. This is particularly important if you're using the glossary module, as glossary definitions must be kept in the description field; otherwise, glossary is unable to use them. Descriptions are also used as caption text by some modules, such as forum and image.

However, if your terms have particularly long descriptions (say, a paragraph or more in length), you may be better off importing them as body text. People who use the taxonomy_context module may wish to do this, as taxonomy_context can be configured to display term descriptions almost as if they were 'body text', even though terms do not actually have a body field, since they are not nodes. In the category module, you no longer need to pretend that your description text is body text: you can make a clean start, and do things the right way.

The next setting, redirect existing legacy paths, will make category_legacy 'take control' of your site's taxonomy term URLs. Category_legacy will record a mapping of old taxonomy term IDs to new node IDs, and will redirect requests for taxonomy terms to the new corresponding node pages, based on these mappings. It is important to understand that this setting relies on category_legacy remaining enabled after you complete your imports. As soon as you disable category_legacy, the redirection will no longer be in effect (but the mappings will still be in the database). Also, if you re-enable the core taxonomy module, then category_legacy will not perform any redirects, and will instead return control of term URLs to the taxonomy module. Redirection is, however, fully compatible with the taxonomy wrapper.
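As a rough illustration of the redirect behaviour described above, here is a minimal Python sketch. It is not category_legacy's actual code: the `TERM_TO_NODE` mapping, the `redirect_target` function and its two flags are all hypothetical, and the `taxonomy/term/N` and `node/N` path formats are assumed.

```python
import re

# Hypothetical mapping of old taxonomy term IDs to new node IDs,
# standing in for the mappings that category_legacy records.
TERM_TO_NODE = {5: 42, 7: 43}

# Assumed format of a legacy taxonomy term path.
TERM_PATH = re.compile(r"^taxonomy/term/(\d+)$")


def redirect_target(path, legacy_enabled=True, taxonomy_enabled=False):
    """Return the node path to redirect to, or None if no redirect applies."""
    # No redirect if category_legacy is disabled, or if the core
    # taxonomy module has been re-enabled and owns term URLs again.
    if not legacy_enabled or taxonomy_enabled:
        return None
    match = TERM_PATH.match(path)
    if match is None:
        return None  # not a taxonomy term path
    node_id = TERM_TO_NODE.get(int(match.group(1)))
    return f"node/{node_id}" if node_id is not None else None


print(redirect_target("taxonomy/term/5"))
print(redirect_target("taxonomy/term/5", legacy_enabled=False))
```

The mapping itself is untouched when redirection is switched off, which mirrors the point that the mappings stay in the database even after category_legacy is disabled.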

The change existing legacy path aliases option is one that you will only see if you have the path module enabled. So if you're wondering why you don't see this option, try enabling the path module, and it should appear straight away. This setting does not perform any redirection: it simply searches for aliases to taxonomy paths in the database, and changes those aliases to point to their new corresponding node paths. So you don't need category_legacy to remain enabled after your import for this setting, but you do need the path module to stick around.
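The difference from redirection can be sketched in Python as follows. This is only an illustration under assumptions: the `rewrite_aliases` function and the dictionary layout (alias mapped to system path) are invented for the example, and do not reflect how the path module actually stores its data.

```python
import re

# Assumed format of a legacy taxonomy term path.
TERM_PATH = re.compile(r"^taxonomy/term/(\d+)$")


def rewrite_aliases(aliases, term_to_node):
    """Return a copy of aliases with taxonomy term targets replaced by node paths.

    aliases maps an alias to the system path it points at;
    term_to_node maps old term IDs to new node IDs (both hypothetical).
    """
    updated = {}
    for alias, target in aliases.items():
        match = TERM_PATH.match(target)
        if match and int(match.group(1)) in term_to_node:
            # Point the alias at the node that replaced the term.
            target = f"node/{term_to_node[int(match.group(1))]}"
        updated[alias] = target
    return updated


aliases = {"topics/news": "taxonomy/term/5", "about": "node/1"}
print(rewrite_aliases(aliases, {5: 42}))
```

Because the aliases themselves are changed once, nothing needs to intercept requests afterwards, which is why this option does not depend on category_legacy staying enabled.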

Finally, the backup all selected data option takes all of your taxonomy data, and backs it up in a separate table in the database before proceeding with the import. At the time of writing, there is no mechanism for viewing or restoring these backups, but the actual backing up does work, and the rest is under development.

That's really all of the category_legacy-specific stuff on the page. The rest of the settings are basically all of the settings which: (a) can be found on the add category / container page; and (b) do not have equivalent settings in the taxonomy system that can be imported. So in the container information box, for example, you will find only a small subset of the options that are available in the same box on the add container page. This is because most of the settings in this box have equivalents in the taxonomy system.

But the boxes for the other settings, such as those defined by category_display and category_menu, are virtually identical to their counterparts on the add container form. Unlike the general container settings, these ones have no equivalent in the taxonomy system. Also, the additional settings are defined by each module in the category package that you have installed. So if you install more modules, it is likely that you will have more options available on this page. For example, installing category_export will give you an additional category export settings box.

Now, before you hit 'import', it is crucial that you understand this fact: only the vocabularies that you have selected will be imported, and nothing will happen to those that you don't select. If you select no vocabularies and hit 'import', you will get an error. The beauty of this approach is that if you wish to configure different vocabularies in different ways during the import, you can easily do so. Simply select a group of vocabularies, set all the options to your desired values, and import them. Then select the remaining vocabularies, set the options differently, and hit 'import' again. So you can import 5 vocabularies in one way, and 3 vocabularies in another way.
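The select-and-import workflow above can be pictured with a small Python sketch. Everything in it is hypothetical: the `import_vocabularies` function, the vocabulary names and the `descriptions` setting key are made up for illustration, and are not part of category_legacy.

```python
def import_vocabularies(available, selected, settings):
    """Import only the selected vocabularies, applying one settings dict to all."""
    if not selected:
        # Mirrors the error you get when nothing is selected.
        raise ValueError("No vocabularies selected for import.")
    unknown = set(selected) - set(available)
    if unknown:
        raise ValueError(f"Unknown vocabularies: {sorted(unknown)}")
    # Each imported vocabulary gets its own copy of this run's settings;
    # unselected vocabularies are simply left alone.
    return {name: dict(settings) for name in selected}


available = ["Topics", "Forums", "Glossary"]
containers = {}

# First run: two vocabularies imported one way.
containers.update(
    import_vocabularies(available, ["Topics", "Forums"], {"descriptions": "body"})
)
# Second run: the remaining vocabulary imported another way.
containers.update(
    import_vocabularies(available, ["Glossary"], {"descriptions": "description"})
)
print(containers)
```

Each call corresponds to one press of the 'import' button with a particular selection and a particular set of option values.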

Also, if you wish to make one of your new containers a distant parent of another, then you will need to import the distant parent before its children. Once you import the parent container, it will be listed as an available distant parent when you return to the import screen. It is currently not possible to configure distant parents for the terms that you import: you must configure the distant parents for your new categories manually, after you import them. Additionally, there is currently no facility to import distant parent taxonomy relationships as defined through the distant parent module, although this feature may be added in the future.

Now that you're an expert on the topic of the taxonomy import screen, you should be able to tweak all the settings to your liking, and then to go right ahead and hit 'import'. When you do, you should see a whole series of notification messages:

Completion of legacy taxonomy import
When the legacy taxonomy import completes successfully, you will see a series of notification messages similar to these.

Assuming that these messages are all positive (apart from the one that reads: all crucial and highly personal data permanently and irrevocably deleted from site in all entirety, bwahaha :-p), you should be able to safely assume that all went well, and that your import was successful! But don't just assume: check for yourself. Your new containers should be listed on the administer -> categories page; and if you configured menu items to be generated, they should be displayed in a block on the side.

Well now, that wasn't such a hair-greying experience, was it? And don't worry: if you thought that was easy, wait until you perform a book import - it makes a taxonomy import look like rocket science!

Import suggestions

I have done repeated imports into a site with a large number of taxonomy terms and nodes, and have a couple other comments for anyone else with a lot of data:

1) If you have a large number of nodes/terms the import could take several minutes, so alter your PHP timeout settings before you start; preferably, set the timeout period to indefinite (a zero value). I didn't do that once or twice and had the import time out on me, leaving a few things high and dry.

2) As mentioned above, the import can take quite a while if you have a lot of items, so do it offline, then move the updated database online. Some of my vocabularies took 15-20 minutes to import (and that was on a dedicated machine with no one else on it), and then I wanted to have some time to look at things and make sure they all looked as I expected before making them public.

3) If you use taxonomy_menu, turn it off *before* you import. It won't break anything, but will make your menu huge - with an old-style taxonomy menu item and a new category menu item for everything on your menu. In my case, the taxonomy menu items all had meaningless links, and there were so many items on my menu I had trouble getting the menu page to load so I could remove them. (After spending lots of time trying, I figured out all I had to do was disable taxonomy_menu to make them go away, but it would have been easier to just have turned it off ahead of time.)

4) If you have an existing site you will certainly want to be sure to have a database map showing you the old and new node numbers for each item. That means you really *must* select the option to 'redirect existing legacy paths'. I missed that on one or two imports and had to redo them since I had no way to get people from the old link to the new one.

5) I can't over-emphasize the need to do a backup before starting. I actually did the import several times because once I worked with things a bit I realized I could save a lot of time by making different selections during the import, but it's hard to know how you will use it without just diving in and importing some data, so you get into a vicious cycle. I recommend doing a backup of a good freeze point, then importing one or two vocabularies or books and playing with things a bit to see how it works best on your site. Once you have a feel for that, you'll start to see what options you should have taken to make it work better. After that you may find it easiest to just reload your backup and start the whole thing over.

All the above being said, I really really like category. It is much better suited for the way my site works than either taxonomy or book alone ever were. I really like the table of contents on each page, for instance, and the back/next links from one category to the next.

Thanks

Thanks for your comments, Karen. I've never tried doing an import on more than about a dozen items, so your experiences are very valuable in helping me to determine how well category_legacy performs on a large dataset. From what you've said, it is currently not particularly well-equipped to handle a large number of items. I consider this to be a big problem, since most people will be using it for the very reason that they have a lot of data that needs to be imported.

I have submitted a feature request to the issue tracker for splitting large imports into multiple requests (and exports too, when they become available). This probably won't be developed just yet - there are many other things in that queue that need doing - but when it is ready, hopefully it will make a big import much easier.
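To illustrate the general idea only, here is a small Python sketch of splitting a large import into batches. It does not describe an existing or planned category_legacy implementation; the `chunked` and `import_in_batches` functions and the batch size are invented for the example.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def import_in_batches(term_ids, size, import_batch):
    """Call import_batch once per chunk; return the number of batches run."""
    count = 0
    for batch in chunked(term_ids, size):
        # On a real site, each call would be handled in its own request,
        # so that no single request runs long enough to time out.
        import_batch(batch)
        count += 1
    return count


processed = []
batches = import_in_batches(list(range(1, 6)), 2, processed.append)
print(batches, processed)
```

Keeping each batch small would bound the time any single request takes, at the cost of having to track which items have already been imported between requests.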