spam

Use your own mail server or outsource it?

Let's say you have a rather large mailing list of a few thousand email addresses and, well, sending to it from Gmail is no longer cutting it. What are your options?

One recent option that's pretty cool for many reasons is outsourcing the sending to MailChimp. MailChimp is actually a pretty comprehensive solution for mailing lists. They will help you design your HTML email template by providing some base themes. Most importantly, and perhaps underrated, they will do their best to keep your emails from being flagged as spam and your sending address from being blacklisted on a Real-time Blackhole List (RBL, also called a DNSBL). There are hundreds of public RBLs, constantly updated with the IP addresses of hosts on the Internet being used to send spam. Mail coming from those IP addresses is likely to be flagged as spam or rejected by any server that consults the list, so you really do not want to end up on one. If you suspect that you're on a blacklist, you can look up your mail server's IP address on them.
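Checking a blacklist is a plain DNS lookup: reverse the octets of the IPv4 address, append the list's zone, and see whether the name resolves. Here is a minimal Python sketch of that. The zone zen.spamhaus.org is just one well-known example, the function names are made up for illustration, and treating 127.255.255.x answers as error codes rather than listings is an assumption based on how some lists respond to queries sent through big public resolvers.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed octets followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """Return True if the list answers with a 127.x.x.x 'listed' code."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # name does not resolve: not listed (or the lookup failed)
    # Assumption: 127.255.255.x answers signal a query error, not a listing.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")
```

A "not listed" answer from one zone says nothing about the others, so you would loop over whichever lists you care about.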

It's unlikely that MailChimp's servers will ever end up on an RBL, since they would fight hard to protect their reputation. But if you choose to run your own mail server (rather than offloading it to Google Apps for domains, which is free for most small businesses and organizations) and also use it to pump out large mailings, then the onus is on you to set it up correctly. For example, don't let it be an open relay that anyone can send mail through. You'll also want to set up SPF records for your domain, so that receiving servers can check which hosts are allowed to send mail for it.
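An SPF record is just a TXT record on your domain listing who may send mail for it. Here is a small Python sketch of how one is put together. The IP address and the included domain are placeholders, and the function name is made up for illustration.

```python
def build_spf_record(ip4=(), includes=(), policy="~all"):
    """Assemble an SPF TXT record value from its mechanisms."""
    if policy not in ("-all", "~all", "?all", "+all"):
        raise ValueError("unknown SPF policy: " + policy)
    parts = ["v=spf1"]
    parts += ["ip4:" + addr for addr in ip4]
    parts += ["include:" + domain for domain in includes]
    parts.append(policy)  # what to do with mail from any other host
    return " ".join(parts)


# Placeholder values: your mail server's IP and a hypothetical third-party sender
print(build_spf_record(ip4=["192.0.2.10"], includes=["spf.example.net"]))
```

This prints v=spf1 ip4:192.0.2.10 include:spf.example.net ~all. The trailing ~all asks receivers to treat other senders as suspicious (soft fail), while -all asks them to reject outright, which is safer to switch to only once you are sure the record covers every host that sends for you.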

If you mess up and your mails look like spam to someone like Google or Yahoo or Hotmail (does anybody still use Hotmail to receive email rather than to send spam themselves?) then they will block mail from you to all of their users. Or they will accept your mails but drop them straight into the recipients' spam folders. Then you will be forced to thrust yourself into the Kafkaesque world of customer support at companies which don't have retail stores for you to visit, and which you aren't really a customer of to begin with.

If you do run it yourself: first, are you sure your host allows you to send out a lot of emails? Is it your own server, a VPS, or a shared host? A shared host will probably rate-limit your email sending, meaning it could take hours or days to send a single mailing to your entire list.
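To get a feel for what a rate limit means, here is the arithmetic as a short Python sketch. The limit of 200 emails per hour is a made-up number, so check what your own host actually allows.

```python
import math


def hours_to_send(list_size: int, emails_per_hour: int) -> int:
    """Whole hours needed to send one message to every address on the list."""
    if emails_per_hour <= 0:
        raise ValueError("emails_per_hour must be positive")
    return math.ceil(list_size / emails_per_hour)


# Hypothetical numbers: 5,000 subscribers, host allows 200 emails per hour
print(hours_to_send(5000, 200))  # 25
```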

Next, since you're not using MailChimp's cool interface (I really should get paid for plugging MailChimp so much even though I don't use it) you need to run some mailing list manager software.

First up is Mailman. Mailman is a GNU project. It's by hackers for hackers and thus requires perhaps more manual configuration inside config files than most non-technical people can handle. The interface is not easy to use either. Side note: There are several Drupal modules for connecting Drupal to Mailman.

An alternative to that is phpList, "the world's most popular open source mailing list manager". OK. It's PHP and web-based, and might integrate with your existing PHP website, although there's not much reason to. It's popular.

Then there's poMMo. poMMo is a basic piece of software with a decent web interface, except when there are errors. Unfortunately, poMMo didn't get fully developed before being abandoned by its developers. But somebody else has created their own poMMo project on GitHub: https://github.com/soonick/poMMo

Neither poMMo nor Mailman does click-through tracking, so you would need another solution, like a URL shortener, to create links whose clicks you can keep track of.
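The idea behind click tracking is simple: every link in the email points at a redirect on a server you control (or at a shortener), which logs the click and then forwards the reader. Here is a rough Python sketch of both halves. The tracker URL and the parameter names u and c are hypothetical, not something poMMo, Mailman or any shortener provides.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical redirect endpoint that logs the click, then forwards the reader
TRACKER = "https://example.com/r"


def tracked_link(target_url: str, campaign: str, tracker: str = TRACKER) -> str:
    """Wrap a destination URL in a redirect link that carries a campaign id."""
    return tracker + "?" + urlencode({"u": target_url, "c": campaign})


def unwrap(link: str) -> tuple:
    """What the redirect endpoint would do: recover (destination, campaign)."""
    query = parse_qs(urlparse(link).query)
    return query["u"][0], query["c"][0]
```

A real endpoint should only redirect to destinations it issued itself, otherwise spammers can abuse it as an open redirect.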

Dealing with Spam Comments on a Drupal Site

Submitted by tomo on October 11, 2012 - 1:26am

Drupal sites, like any content sites, naturally receive a lot of spam comments. Any blog or blog-like site that lets unauthenticated or anonymous users comment on posts or other content is open to spam, because spammers want your website to display links to their websites for link building, which matters for SEO. Spam comments used to be much more obvious, as in blatant ads for penis-enhancing drugs. Nowadays spammers use more sophisticated scripts, though the results are still easy to detect, by humans and usually also by spam-detection software. New spam might praise your blog, quote something you said in your post, or just say something generic that may or may not be related to the topic. Then they will have their spam website in the URL field, if not also in the comment text. If that website's keywords are totally unrelated to your blog, you can bet it's spam.

What can a Drupal website do about spam?

One old solution for spam, a common plugin on WordPress sites, is called Akismet. Akismet is a service and requires you to create an account and get an API key from them. Then they will help you detect spam.

Drupal has something like Akismet but improved and it was created by the Drupal creator himself. It's called Mollom, and it's also a cloud-based service, and you also create an account with them. Mollom is free to use although there are some paid services. I use Mollom on this blog. It catches hundreds of spam comments for me but also lets in a single false negative per day or so. I wouldn't have been able to turn on comments without turning on Mollom as I would immediately be deluged with spam. But I hope that Mollom continues to improve so that I don't even get 1 spam on most days.

Since I still get some ham and spam comments, I have to process them somehow and report the spam to Mollom so they can improve their algorithms and heuristics for the future. Unfortunately, Drupal's default comment management view doesn't easily let me see whether a comment is spam. To make that decision I need to see the URL and the comment text that were saved. If nothing was entered in the URL field, the comment usually isn't spam, and if there is no URL in the comment text either, it almost certainly isn't. But if there is a link in the text, I need to see what site it links to and whether the comment is at all intelligent or relevant. The same goes for any link saved in the URL field. Since Drupal's comment management page doesn't show these, I created a view (this is Drupal 6), which you can import below.
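The rule of thumb above could be sketched in a few lines of Python. This is only an illustration of the manual triage, not what Mollom does, and the function name, the labels and the link pattern are all made up.

```python
import re

# Anything that looks like the start of a link in the comment text
LINK_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)


def triage(homepage: str, body: str) -> str:
    """Rough first-pass sorting of a comment by where URLs appear."""
    has_homepage = bool(homepage.strip())
    has_link_in_body = bool(LINK_PATTERN.search(body))
    if has_link_in_body:
        return "check the link"      # see where it points and whether the text is relevant
    if has_homepage:
        return "check the homepage"  # the URL field is the usual hiding place
    return "probably ham"            # no URL anywhere


print(triage("", "Thanks, this fixed my problem."))  # probably ham
```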

The one thing that Drupal's comment admin page does offer is bulk operations. Mollom provides some operations there, such as reporting spam to Mollom and deleting it, which is usually what I want to do. Unfortunately, there's no way to use Mollom's actions in Views with Views Bulk Operations! So until this issue gets resolved (http://drupal.org/node/655846) I've done the next best thing, which is to link to the Mollom management page for each comment. You still have to open that page, select the reporting action you want, and submit the form. But this workflow has at least saved me some time, and I have now removed all the really spammy spam from this site.

$view = new view;
$view->name = 'comments_more';
$view->description = '';
$view->tag = '';
$view->view_php = '';
$view->base_table = 'comments';
$view->is_cacheable = FALSE;
$view->api_version = 2;
$view->disabled = FALSE; /* Edit this to true to make a default view disabled initially */
$handler = $view->new_display('default', 'Defaults', 'default');
$handler->override_option('fields', array(
  'name' => array(
    'label' => 'Author',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'link_to_user' => 1,
    'exclude' => 0,
    'id' => 'name',
    'table' => 'comments',
    'field' => 'name',
    'relationship' => 'none',
  ),
  'homepage' => array(
    'label' => 'Author\'s website',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'display_as_link' => 1,
    'exclude' => 0,
    'id' => 'homepage',
    'table' => 'comments',
    'field' => 'homepage',
    'relationship' => 'none',
  ),
  'comment' => array(
    'label' => 'Body',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'exclude' => 0,
    'id' => 'comment',
    'table' => 'comments',
    'field' => 'comment',
    'relationship' => 'none',
  ),
  'delete_comment' => array(
    'label' => 'Delete link',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'text' => '',
    'exclude' => 0,
    'id' => 'delete_comment',
    'table' => 'comments',
    'field' => 'delete_comment',
    'relationship' => 'none',
  ),
  'edit_comment' => array(
    'label' => 'Edit link',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'text' => '',
    'exclude' => 0,
    'id' => 'edit_comment',
    'table' => 'comments',
    'field' => 'edit_comment',
    'relationship' => 'none',
  ),
  'hostname' => array(
    'label' => 'Hostname',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'exclude' => 0,
    'id' => 'hostname',
    'table' => 'comments',
    'field' => 'hostname',
    'relationship' => 'none',
  ),
  'status' => array(
    'label' => 'In moderation',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'type' => 'yes-no',
    'not' => 0,
    'exclude' => 0,
    'id' => 'status',
    'table' => 'comments',
    'field' => 'status',
    'relationship' => 'none',
  ),
  'timestamp' => array(
    'label' => 'Post date',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'date_format' => 'small',
    'custom_date_format' => '',
    'exclude' => 0,
    'id' => 'timestamp',
    'table' => 'comments',
    'field' => 'timestamp',
    'relationship' => 'none',
  ),
  'view_comment' => array(
    'label' => 'View link',
    'alter' => array(
      'alter_text' => 0,
      'text' => '',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'text' => '',
    'exclude' => 0,
    'id' => 'view_comment',
    'table' => 'comments',
    'field' => 'view_comment',
    'relationship' => 'none',
  ),
  'cid' => array(
    'label' => 'Mollom',
    'alter' => array(
      'alter_text' => 1,
      'text' => '<a href="/mollom/report/comment/[cid]">Report to Mollom</a>',
      'make_link' => 0,
      'path' => '',
      'link_class' => '',
      'alt' => '',
      'prefix' => '',
      'suffix' => '',
      'target' => '',
      'help' => '',
      'trim' => 0,
      'max_length' => '',
      'word_boundary' => 1,
      'ellipsis' => 1,
      'html' => 0,
      'strip_tags' => 0,
    ),
    'empty' => '',
    'hide_empty' => 0,
    'empty_zero' => 0,
    'link_to_comment' => 0,
    'exclude' => 0,
    'id' => 'cid',
    'table' => 'comments',
    'field' => 'cid',
    'override' => array(
      'button' => 'Override',
    ),
    'relationship' => 'none',
  ),
));
$handler->override_option('access', array(
  'type' => 'none',
));
$handler->override_option('cache', array(
  'type' => 'none',
));
$handler->override_option('css_class', 'view-comments-more');
$handler->override_option('header', '<style>
.view-comments-more table {
background: white;
position: relative;
z-index: 100;
}
</style>');
$handler->override_option('header_format', '2');
$handler->override_option('header_empty', 0);
$handler->override_option('items_per_page', 30);
$handler->override_option('use_pager', '1');
$handler->override_option('style_plugin', 'table');
$handler->override_option('style_options', array(
  'grouping' => '',
  'override' => 1,
  'sticky' => 0,
  'order' => 'desc',
  'columns' => array(
    'name' => 'name',
    'homepage' => 'homepage',
    'comment' => 'comment',
    'delete_comment' => 'delete_comment',
    'edit_comment' => 'edit_comment',
    'hostname' => 'hostname',
    'status' => 'status',
    'timestamp' => 'timestamp',
    'view_comment' => 'view_comment',
  ),
  'info' => array(
    'name' => array(
      'sortable' => 1,
      'separator' => '',
    ),
    'homepage' => array(
      'sortable' => 1,
      'separator' => '',
    ),
    'comment' => array(
      'separator' => '',
    ),
    'delete_comment' => array(
      'separator' => '',
    ),
    'edit_comment' => array(
      'separator' => '',
    ),
    'hostname' => array(
      'sortable' => 0,
      'separator' => '',
    ),
    'status' => array(
      'sortable' => 0,
      'separator' => '',
    ),
    'timestamp' => array(
      'sortable' => 1,
      'separator' => '',
    ),
    'view_comment' => array(
      'separator' => '',
    ),
  ),
  'default' => 'timestamp',
));
$handler = $view->new_display('page', 'Page', 'page_1');
$handler->override_option('path', 'views/comments_more');
$handler->override_option('menu', array(
  'type' => 'none',
  'title' => '',
  'description' => '',
  'weight' => 0,
  'name' => 'navigation',
));
$handler->override_option('tab_options', array(
  'type' => 'none',
  'title' => '',
  'description' => '',
  'weight' => 0,
  'name' => 'navigation',
));

Improved Google Spam Filter?

Submitted by tomo on February 26, 2011 - 10:08pm

Google, in response to the recent flood of concern about spam and content farms showing up in its results, has just announced a big change to the algorithms that calculate page rankings. It had previously published a Chrome extension that lets you manually block results, and Google says the new algorithm blocks some 84% of the same sites that people were blocking with the extension. The charitable reading of the missing 16% is that some people were blocking sites that aren't spammy, rather than that Google's algorithm isn't good enough. Or is it the algorithm?

Matt Cutts, the main anti-spam guy at Google, says the new algorithm change affects 11.8% of queries. Since the change is only in effect in the US right now and I can browse from both Vietnam and the US, I can compare results, and roughly one in eight queries should be improved.

So I tested "dog shampoo" out of the blue. I have never had a dog because I think they smell.

In Vietnam, high-ranking results included drnaturalvet.com, which had a low-quality page of filler about dog shampoo, and dogshampoo.info, which is clearly a made-for-AdSense site. In the US, the drnaturalvet link is much lower, but dogshampoo.info keeps the same high position. A link to content farm ehow.com is also lower now. And a link to dogshampoo.co.uk, a made-for-AdSense site with nothing about dog shampoo at the time of indexing (see cache), is now gone too.

A search for winrar came up with fairly similar results in either country, and both maintained links to spam sites like software.informer.com.

A search for "tightvnc server authentication successful closed connection" punished duplicate content site pinoytech.org slightly but another duplicate/copy site efreedom.com maintained its position in the top 20. Both copy the StackExchange site SuperUser.com.
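Comparing the two result pages by eye gets tedious, so here is a rough Python sketch of the comparison. The result lists are placeholders, not the actual rankings from the searches above, and the function name is made up.

```python
def rank_changes(before, after):
    """Map each domain in `before` to its change in position (positive = moved down).

    Domains missing from `after` map to None, meaning they dropped out.
    """
    new_rank = {domain: i for i, domain in enumerate(after)}
    return {
        domain: (new_rank[domain] - i if domain in new_rank else None)
        for i, domain in enumerate(before)
    }


# Placeholder result lists: old algorithm (Vietnam) vs. new algorithm (US)
vietnam = ["farm.example", "good.example", "adsense.example", "copy.example"]
usa = ["good.example", "adsense.example", "farm.example"]
print(rank_changes(vietnam, usa))
```

Here farm.example moves down two places, the other two move up one, and copy.example disappears from the results.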

So it seems that the new algorithm change is an improvement, but I don't think it goes far enough to filter spammy results. While it may be a slight setback for those guys, they are still in the running and will be emboldened to try to rank higher.

There may still be a need for users to crowdsource a database of filtered spam sites until further algorithm improvements.

Note: The Atlantic did a similar test from India on "is botox safe" and "drywall dust" and found their results to be much improved.

Google Spam / Content Farm Filter

Submitted by tomo on January 21, 2011 - 3:06pm

There's been a lot of talk about the decrease in quality of Google search results over the years due to spammers / content farms with strong SEO skills. I'm glad I'm not the only one who's been annoyed by this.

Google should know which sites are spam, content farms, or duplicated content. That they aren't properly filtering or demoting them could be due to a conflict of interest - they make money from the ads on those crap sites.

But we, as individuals, can easily distinguish the spam results from the quality ones, and we do so every day. If only there were a way to stop duplicating this effort.

If Google won't do this for us, then we can do this ourselves.

Here's what I want:
1. When I've been tricked into opening an ad-filled page without meaningful content, I want to go back to Google and mark that link as "spam", have that noted somewhere in the cloud so I can access it from any computer, and have future search queries filter out that link.

2. I probably don't want to see any pages from that domain show up on any other queries.

3. I probably don't want to see any pages that my friends have also marked as spam.

4. I probably don't want to see any pages that friends of my friends have also marked as spam.

5. I may even want to befriend / "follow" strangers just because they're good at marking spam.
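The first four wishes boil down to a small data structure: a set of blocked domains per user, plus a graph of who follows whom. Here is a minimal in-memory Python sketch of it. The class and method names are made up, and a real version would keep this in a shared store rather than in a dictionary.

```python
from urllib.parse import urlparse


class SpamMarks:
    """In-memory stand-in for the shared "cloud" store of spam marks."""

    def __init__(self):
        self.blocked = {}  # user -> set of domains marked as spam
        self.friends = {}  # user -> set of users they follow

    def mark_spam(self, user, url):
        # Wish 2: marking one page blocks its whole domain
        self.blocked.setdefault(user, set()).add(urlparse(url).hostname)

    def follow(self, user, other):
        self.friends.setdefault(user, set()).add(other)

    def blocked_for(self, user, depth=2):
        """Domains blocked by the user, friends (depth 1) and friends of friends (depth 2)."""
        seen, frontier = {user}, {user}
        for _ in range(depth):
            frontier = {f for u in frontier for f in self.friends.get(u, ())} - seen
            seen |= frontier
        return set().union(*(self.blocked.get(u, set()) for u in seen))

    def filter_results(self, user, urls):
        """Drop every result whose domain is blocked for this user."""
        blocked = self.blocked_for(user)
        return [u for u in urls if urlparse(u).hostname not in blocked]
```

Wish 5 needs nothing extra here: following a stranger who is good at marking spam is just another follow() call.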

© 2010-2014 Saigonist.