Drupal Links in Content

Submitted by tomo on October 11, 2012 - 3:01am

Problem: Your blog posts contain a lot of links to other pages on your site. One day you decide to change the URL pattern for all nodes on the site, either all at once or gradually, as nodes pick up their new paths when you edit and save them. Unfortunately, the old paths were hard-coded in your content bodies, so those links now lead to 404 pages. What to do?

1) The Link checker (linkchecker) module will find the broken links for you. You will probably want this whichever of the fixes below you choose.

2) Path Finder is a module that turns your node links into permalinks with the node id (slug) at the front of the URL, so that any later change to the title that produces a different URL still leads to the same node via the node id. Example: http://www.example.com/837/latest-news/my-descriptive-seo-friendly-url

Of course, this doesn't help once you already have a bunch of nodes and content linking to them, but it's one strategy to start with. If you later change the titles or patterns of your nodes, you won't get any 404 errors as long as the node id / slug is still at the beginning of the URL. You will, however, end up with multiple URLs pointing to the same content, so this isn't ideal.
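
To make the idea concrete, here is a minimal sketch in Python (not Drupal code, and not how the module itself is implemented) of why such a URL survives a title change: only the leading number is used to look up the node, and everything after it is ignored. The function name node_id_from_path is made up for this illustration.

```python
import re

# Match a leading numeric segment such as "/837/..." or "/837".
NODE_ID_PREFIX = re.compile(r"^/(\d+)(?:/|$)")


def node_id_from_path(path):
    """Return the node id at the front of the path, or None if there is none."""
    match = NODE_ID_PREFIX.match(path)
    return int(match.group(1)) if match else None


# The descriptive part can change freely; the lookup key stays the same.
print(node_id_from_path("/837/latest-news/my-descriptive-seo-friendly-url"))
print(node_id_from_path("/837/latest-news/a-completely-new-title"))
print(node_id_from_path("/about-us"))
```

The first two calls both give 837 and the last gives None, which is also why you end up with several URLs for the same node.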

3) Turn on Pathologic: "Pathologic is an input filter which can correct paths in links and images in your Drupal content in situations which would otherwise cause them to “break;” for example, if the URL of the site changes, or the content was moved to a different server. Pathologic is designed to be a simple, set-it-and-forget-it utility. You don't need to enter any special “tags,” path prefixes, or other non-content noise into your content to trigger Pathologic to work; it finds paths it can manage in your content automatically."

4) If you just need to remove a base path (like http://www.domain.com:8080) from all URLs, then URL Replace Filter (url_replace_filter) will suffice.

5) If you need to do more complicated search and replace on URLs, and want to use regular expressions, then use Search and Replace Scanner (scanner).
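
For a rough picture of what options 4 and 5 amount to, here is a small standalone sketch in Python. It is not the code of either module, just the same two kinds of text replacement applied to a content body. The function names and the old /blog/YYYY/MM/slug and new /posts/slug patterns are assumptions invented for the example.

```python
import re


def strip_base_url(body_html, base_url):
    """Turn absolute href/src values under base_url into root-relative ones."""
    base = base_url.rstrip("/")
    for attr in ("href", "src"):
        for quote in ('"', "'"):
            body_html = body_html.replace(
                f"{attr}={quote}{base}/", f"{attr}={quote}/"
            )
    return body_html


# Hypothetical old pattern /blog/YYYY/MM/slug, rewritten to /posts/slug.
OLD_LINK = re.compile(r'href="/blog/\d{4}/\d{2}/([a-z0-9-]+)"')


def rewrite_old_links(body_html):
    """Rewrite links that follow the old pattern, keeping the slug."""
    return OLD_LINK.sub(r'href="/posts/\1"', body_html)


body = '<a href="http://www.domain.com:8080/blog/2012/10/drupal-links">post</a>'
relative = strip_base_url(body, "http://www.domain.com:8080")
print(relative)
print(rewrite_old_links(relative))
```

The first print shows the link with the base path removed, and the second shows it rewritten to the new pattern. Whatever tool you use, try your expression on a copy of the content first, since a regular expression that is too greedy can rewrite links you did not mean to touch.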

Finally, use Global Redirect to always have a single canonical path for each piece of content.
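
The idea of a single canonical path can be sketched like this, again in plain Python rather than Drupal code. This is only an illustration of the concept, not the module's actual logic; the aliases mapping and the function name canonical_redirect are assumptions made up for the example.

```python
def canonical_redirect(request_path, aliases):
    """Return (301, canonical_path) if the request is not canonical, else None."""
    # Treat "/foo/" and "/foo" as the same path, but keep "/" itself.
    path = request_path.rstrip("/") or "/"
    # If the path has an alias, the alias is the canonical form.
    canonical = aliases.get(path, path)
    if canonical != request_path:
        return (301, canonical)
    return None


aliases = {"/node/837": "/latest-news/my-descriptive-seo-friendly-url"}
print(canonical_redirect("/node/837", aliases))
print(canonical_redirect("/latest-news/my-descriptive-seo-friendly-url/", aliases))
print(canonical_redirect("/latest-news/my-descriptive-seo-friendly-url", aliases))
```

The first two requests are redirected to the alias, and the third is already canonical, so it is served as-is.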

© 2010-2014 Saigonist.