seo

Drupal Links in Content

Submitted by tomo on October 11, 2012 - 3:01am

Problem: Your blog posts contain a lot of links to other pages on your site. Then one day you decide to change the URL pattern for all nodes, or you change the pattern and gradually update node paths by editing and saving them. Unfortunately, you hard-coded paths in your content bodies, and now they all lead to 404 pages. What to do?

1) The Link checker (linkchecker) module will find the broken links. You may well need it.

2) Path Finder is a module that turns your node links into permalinks with node id (slug) at the front of the URL so that any future change to title which results in a different URL still leads to the same node via the node id. Example: http://www.example.com/837/latest-news/my-descriptive-seo-friendly-url

Of course, this doesn't help you once you already have a bunch of nodes and content linking to them, but it's one strategy to start with. Then if you do change the titles or patterns of your nodes, as long as the node id / slug is still at the beginning of the url, then you won't get any 404 errors, although you'll then have multiple URLs pointing to the same content. So this isn't ideal.

3) Turn on Pathologic: "Pathologic is an input filter which can correct paths in links and images in your Drupal content in situations which would otherwise cause them to “break;” for example, if the URL of the site changes, or the content was moved to a different server. Pathologic is designed to be a simple, set-it-and-forget-it utility. You don't need to enter any special “tags,” path prefixes, or other non-content noise into your content to trigger Pathologic to work; it finds paths it can manage in your content automatically."

4) If you just need to remove a base path (like http://www.domain.com:8080) from all URLs, then URL Replace Filter (url_replace_filter) will suffice.

5) If you need to do more complicated search and replace on URLs, and want to use regular expressions, then use Search and Replace Scanner (scanner).

Finally, use Global Redirect to always have a single canonical path for each piece of content.
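
For a sense of what options 4 and 5 amount to, here is a minimal sketch in plain PHP of stripping a hard-coded base URL from links in a body. It is not how URL Replace Filter or Scanner are implemented; the function name example_strip_base_url() is made up for illustration, and it only touches href and src attributes.

```php
<?php
/**
 * Hypothetical helper (not part of any Drupal module): removes a hard-coded
 * base URL from href/src attributes so the links become root-relative.
 */
function example_strip_base_url($html, $base_url) {
  // Match href="BASE or src="BASE only when BASE is followed by a slash,
  // so that http://www.domain.com:8080 does not match ...:80801/.
  $pattern = '/\b(href|src)=(["\'])' . preg_quote($base_url, '/') . '(?=\/)/i';
  // Keep the attribute name and the opening quote, drop the base URL.
  return preg_replace($pattern, '$1=$2', $html);
}
```

Calling it with a body containing href="http://www.domain.com:8080/node/12" and the base URL http://www.domain.com:8080 leaves href="/node/12". Try anything like this on a copy of your content first, since a regular expression knows nothing about HTML structure.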

Dealing with Spam Comments on a Drupal Site

Submitted by tomo on October 11, 2012 - 1:26am

Drupal sites, like any content site, naturally receive a lot of spam comments. Any blog or blog-like site that lets unauthenticated (anonymous) users comment on content is open to spam, because spammers want your website to display links to their websites for link building, which is important for SEO. Spam comments used to be much more obvious, as in blatant ads for penis-enhancing drugs. Nowadays spammers use more sophisticated scripts, though the results are still easy to detect, both by humans and usually by spam-detection software. New spam might praise your blog, quote something you said in the post, or just say something generic, perhaps on a related topic and perhaps not. The spam website then appears in the URL field, if not also in the comment text. If that website has keywords totally unrelated to your blog, you can bet it's spam.

What can a Drupal website do about spam?

One old solution for spam, a common plugin on WordPress sites, is called Akismet. Akismet is a service and requires you to create an account and get an API key from them. Then they will help you detect spam.

Drupal has something like Akismet, but improved, and it was created by Drupal's creator himself. It's called Mollom; it's also a cloud-based service, and you also create an account with them. Mollom is free to use, although there are some paid services. I use Mollom on this blog. It catches hundreds of spam comments for me but still lets through about one false negative per day. I wouldn't have been able to turn on comments without turning on Mollom, as I would immediately have been deluged with spam. But I hope that Mollom continues to improve so that on most days I don't get even one spam comment.

Since I still get some ham and spam comments, I have to process them somehow and report the spam to Mollom so they can improve their algorithms and heuristics. Unfortunately, Drupal's default comment management view doesn't easily let me see whether a comment is spam. To decide, I need to see the URL and the comment text that were saved. If the URL field is empty, the comment usually isn't spam, and if there is no URL in the comment text either, it almost certainly isn't. But if there is a link in the text, I need to see what site it links to and whether the comment is at all intelligent or relevant. The same goes for any link saved in the URL field. Since Drupal's comment management page doesn't show any of this, I created a view, which you can import (this is Drupal 6) below.
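
Separately from the view export further down, the rule of thumb above can be sketched in plain PHP. This is only an illustration of the manual triage I described, not something Drupal or Mollom provides; the function name is hypothetical and the link pattern is deliberately crude.

```php
<?php
/**
 * Hypothetical triage helper (not part of Drupal or Mollom).
 *
 * Returns TRUE when a comment carries a link in the URL field or in the
 * body and therefore deserves a human look, FALSE when it has neither.
 */
function example_comment_needs_review($homepage, $body) {
  // Anything entered in the comment's URL (homepage) field counts.
  $has_homepage = trim($homepage) !== '';
  // Crude check for something that looks like a link in the comment text.
  $has_body_link = (bool) preg_match('#(https?://|www\.)\S+#i', $body);
  return $has_homepage || $has_body_link;
}
```

A FALSE result only means "almost certainly not spam" in the sense used above. A TRUE result does not mean spam; it means you still have to look at where the link goes.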

The one thing Drupal's comment admin page does offer is bulk operations. Mollom provides operations such as reporting spam to Mollom and deleting, which is usually what I want to do. Unfortunately, there's no way to use Mollom's actions in Views with Views Bulk Operations! So until this issue gets resolved (http://drupal.org/node/655846), I've done the next best thing, which is to link to the Mollom management page for each comment. You still have to open that page, select the reporting action you want, and submit the form. But this workflow has at least saved me some time, and I have now removed all the really spammy spam from this site.

$view = new view;
$view->name = 'comments_more';
$view->description = '';
$view->tag = '';
$view->view_php = '';
$view->base_table = 'comments';
$view->is_cacheable = FALSE;
$view->api_version = 2;
$view->disabled = FALSE; /* Edit this to true to make a default view disabled initially */
$handler = $view->new_display('default', 'Defaults', 'default');
$handler->override_option('fields', array(
  'name' => array(
    'label' => 'Author',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'link_to_user' => 1,
    'exclude' => 0,
    'id' => 'name',
    'table' => 'comments',
    'field' => 'name',
    'relationship' => 'none',
  ),
  'homepage' => array(
    'label' => 'Author\'s website',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'display_as_link' => 1,
    'exclude' => 0,
    'id' => 'homepage',
    'table' => 'comments',
    'field' => 'homepage',
    'relationship' => 'none',
  ),
  'comment' => array(
    'label' => 'Body',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'exclude' => 0,
    'id' => 'comment',
    'table' => 'comments',
    'field' => 'comment',
    'relationship' => 'none',
  ),
  'delete_comment' => array(
    'label' => 'Delete link',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'text' => '',
    'exclude' => 0,
    'id' => 'delete_comment',
    'table' => 'comments',
    'field' => 'delete_comment',
    'relationship' => 'none',
  ),
  'edit_comment' => array(
    'label' => 'Edit link',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'text' => '',
    'exclude' => 0,
    'id' => 'edit_comment',
    'table' => 'comments',
    'field' => 'edit_comment',
    'relationship' => 'none',
  ),
  'hostname' => array(
    'label' => 'Hostname',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'exclude' => 0,
    'id' => 'hostname',
    'table' => 'comments',
    'field' => 'hostname',
    'relationship' => 'none',
  ),
  'status' => array(
    'label' => 'In moderation',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'type' => 'yes-no',
    'not' => 0,
    'exclude' => 0,
    'id' => 'status',
    'table' => 'comments',
    'field' => 'status',
    'relationship' => 'none',
  ),
  'timestamp' => array(
    'label' => 'Post date',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'date_format' => 'small',
    'custom_date_format' => '',
    'exclude' => 0,
    'id' => 'timestamp',
    'table' => 'comments',
    'field' => 'timestamp',
    'relationship' => 'none',
  ),
  'view_comment' => array(
    'label' => 'View link',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'text' => '',
    'exclude' => 0,
    'id' => 'view_comment',
    'table' => 'comments',
    'field' => 'view_comment',
    'relationship' => 'none',
  ),
  'cid' => array(
    'label' => 'Mollom',
    'alter' => array(
      'alter_text' => 1,
      'text' => '<a href="/mollom/report/comment/[cid]">Report to Mollom</a>',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'link_to_comment' => 0,
    'exclude' => 0,
    'id' => 'cid',
    'table' => 'comments',
    'field' => 'cid',
    'override' => array(
      'button' => 'Override',
    ),
    'relationship' => 'none',
  ),
));
$handler->override_option('access', array(
  'type' => 'none',
));
$handler->override_option('cache', array(
  'type' => 'none',
));
$handler->override_option('css_class', 'view-comments-more');
$handler->override_option('header', '<style>
.view-comments-more table {
background: white;
position: relative;
z-index: 100;
}
</style>');
$handler->override_option('header_format', '2');
$handler->override_option('header_empty', 0);
$handler->override_option('items_per_page', 30);
$handler->override_option('use_pager', '1');
$handler->override_option('style_plugin', 'table');
$handler->override_option('style_options', array(
  'grouping' => '',
  'override' => 1,
  'sticky' => 0,
  'order' => 'desc',
  'columns' => array(
    'name' => 'name',
    'homepage' => 'homepage',
    'comment' => 'comment',
    'delete_comment' => 'delete_comment',
    'edit_comment' => 'edit_comment',
    'hostname' => 'hostname',
    'status' => 'status',
    'timestamp' => 'timestamp',
    'view_comment' => 'view_comment',
  ),
  'info' => array(
    'name' => array(
      'sortable' => 1,
      'separator' => '',
    ),
    'homepage' => array(
      'sortable' => 1,
      'separator' => '',
    ),
    'comment' => array(
      'separator' => '',
    ),
    'delete_comment' => array(
      'separator' => '',
    ),
    'edit_comment' => array(
      'separator' => '',
    ),
    'hostname' => array(
      'sortable' => 0,
      'separator' => '',
    ),
    'status' => array(
      'sortable' => 0,
      'separator' => '',
    ),
    'timestamp' => array(
      'sortable' => 1,
      'separator' => '',
    ),
    'view_comment' => array(
      'separator' => '',
    ),
  ),
  'default' => 'timestamp',
));
$handler = $view->new_display('page', 'Page', 'page_1');
$handler->override_option('path', 'views/comments_more');
$handler->override_option('menu', array(
  'type' => 'none',
  'title' => '',
  'description' => '',
  'weight' => 0,
  'name' => 'navigation',
));
$handler->override_option('tab_options', array(
  'type' => 'none',
  'title' => '',
  'description' => '',
  'weight' => 0,
  'name' => 'navigation',
));

Drupal has many SEO features, some built in and others available as contributed modules. One of the main ones is Clean URLs. But it's often said that Drupal gives you "just enough rope to hang yourself".

Drupal has a powerful module called Pathauto that lets you create flexible URL patterns based on tokens. Any number of modules can provide tokens via the Token API for you to use in constructing URLs, and you can easily create your own tokens with a bit of PHP code. With Pathauto you can also easily change the URLs for all your nodes, users, and taxonomy terms at once. This is the rope.
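
As a rough idea of what "a bit of PHP code" means here, the sketch below assumes the Drupal 6 Token module's hook_token_list() and hook_token_values() hooks. The module name yourmodulename and the token title-initial are made up for illustration, and the description string is left untranslated so the snippet stands on its own.

```php
<?php
/**
 * Implementation of hook_token_list() (Token module, Drupal 6).
 */
function yourmodulename_token_list($type = 'all') {
  $tokens = array();
  if ($type == 'node' || $type == 'all') {
    // A real module would wrap this description in t().
    $tokens['node']['title-initial'] = 'First letter of the node title, lowercased (ASCII only).';
  }
  return $tokens;
}

/**
 * Implementation of hook_token_values() (Token module, Drupal 6).
 */
function yourmodulename_token_values($type, $object = NULL, $options = array()) {
  $values = array();
  if ($type == 'node' && isset($object->title) && $object->title !== '') {
    // Hypothetical token: plain substr() is byte-based, so this is only
    // safe for titles that start with an ASCII character.
    $values['title-initial'] = strtolower(substr($object->title, 0, 1));
  }
  return $values;
}
```

With something like this enabled, a node pattern such as blog/[title-initial]/[title-raw] should become possible. Treat that pattern as an example rather than a recommendation.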

A big part of SEO is the link building you do outside your site to get other sites to link to your site and to its inner pages. The problem comes when you've done all this, have thousands of links to hundreds of your pages, and then decide to change all of your URLs. Suddenly, visitors to your pages see 404 error pages, and Google no longer thinks you have any linked-to pages.

So what can you do if you ever want to change the path of a node after you've already created the node and saved the URL? What can you do once you rebuild your paths from a new pattern? What if your maximum URL length limit was too short and you now need to fix all the long node URLs?

Fortunately, new versions of Pathauto integrate with Path Redirect, a module for managing lists of 301 URL redirects. You could create 301 redirects from each previous alias to its new alias by hand, but luckily you can now do this automatically. A new option was added to Pathauto: "Create a new alias. Redirect from old alias." Choose this instead of just deleting old aliases, and your old paths will redirect to the new node paths, managed by Path Redirect.
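
To picture what a list of old-alias-to-new-alias redirects does, here is a toy lookup in plain PHP. It is not Path Redirect's code or storage format; the function name and the array layout are assumptions made for the example.

```php
<?php
/**
 * Hypothetical lookup: follows old alias => new alias entries until it
 * reaches a path that is not redirected any further.
 */
function example_redirect_target($path, array $redirects) {
  $seen = array();
  // Follow chains (a => b, b => c) but stop if we ever loop back.
  while (isset($redirects[$path]) && !isset($seen[$path])) {
    $seen[$path] = TRUE;
    $path = $redirects[$path];
  }
  return $path;
}
```

If the returned path differs from the requested one, the site would answer with something like header('Location: /' . $target, TRUE, 301), which tells both browsers and search engines that the move is permanent.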

Just remember that bulk update from within Path Auto isn't the only way to update URLs. You can also bulk update selected nodes from the default content management page and also edit the URL from each node's edit page.

To use this, just install both Pathauto and Path Redirect.

Related Issues: http://drupal.org/project/issues/pathauto?text=redirect+from+old+alias&status=All

When manually changing titles, and thus URLs, make sure to check "Automatically create redirects when URL aliases are changed." in Path Redirect's settings.

Regarding Path Redirect's fixes and Path Auto's fixes:

http://drupal.org/node/629742#comment-4336624


In short: the two work independently.

The option "Automatically create redirect" (on the Path Redirect admin screen) only governs situations where you manually change a path setting. Whether pathauto is or isn't installed, does not change this behavior.

Detail: when saving a node from the edit screen, the Path Redirect code executes before Pathauto does. It has no knowledge of whether Pathauto will or will not change the path alias, later on.

If pathauto runs afterwards and decides that the path needs changing, it will look only at its own "Update action" to determine what to do. If that is "Delete the old alias", it will indeed get deleted, and no redirect will be created (regardless of the "Automatically create redirect" option in the Path Redirect admin screen).
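
The order of events described in that comment can be written down as a toy model. This is only my reading of the explanation above, not code from either module; the function, its parameters, and the 'delete' and 'redirect' labels are invented for the sketch.

```php
<?php
/**
 * Toy model (hypothetical names, not the modules' APIs): does saving a node
 * leave a redirect from the old alias behind?
 *
 * @param $manual_change          The editor changed the path by hand.
 * @param $path_redirect_auto     Path Redirect's "Automatically create redirect" option.
 * @param $pathauto_changes_alias Pathauto decided the alias needs changing.
 * @param $pathauto_action        Pathauto's update action: 'delete' or 'redirect'.
 */
function example_redirect_created($manual_change, $path_redirect_auto, $pathauto_changes_alias, $pathauto_action) {
  // Step 1: Path Redirect runs first and only reacts to a manual change.
  $created = $manual_change && $path_redirect_auto;
  // Step 2: Pathauto runs afterwards and consults only its own update action.
  if ($pathauto_changes_alias && $pathauto_action === 'redirect') {
    $created = TRUE;
  }
  return $created;
}
```

The case that bites people is example_redirect_created(FALSE, TRUE, TRUE, 'delete'): Path Redirect's option is on, yet no redirect is created, because Pathauto never looks at it.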

Drupal URLs can be pretty long. In practice, though, the database limits an alias to 128 characters, even though browsers and web servers support much longer URLs. New versions of Pathauto check the schema for the size of the alias column, so aliases longer than 128 characters are supported once that column is larger. This requires no hacking!

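To change this without just manually altering the length of the column in your database, you can use the following code in your own module. Note that hook_schema_alter() only changes the schema definition Drupal reports to other modules; the column itself still has to be widened, for example with db_change_field() in an update function: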

/**
 * Implementation of hook_schema_alter().
 */
function yourmodulename_schema_alter(&$schema) {
  // Report the alias column (url_alias.dst) as 255 characters wide.
  $schema['url_alias']['fields']['dst']['length'] = 255;
}

Long URLs aren't very human friendly but even the oldest browsers can support URLs more than 2048 characters in length. Browsers support long paths, web servers support them, proxies support them, and mail clients either support them or break around 80 characters anyways. This doesn't mean you should carelessly throw around long URLs, but if you're generating them automatically then you don't need to cut them at an artificially low limit anymore.

© 2010-2014 Saigonist.