Terminak #3 has bad keyboard. Pkease fix.
 
April 26th, 2008

ALIPR Captchas

Captchas are so 2007. There are enough good captcha-breaking bots in the wild now that they’re pushing 10-15% success rates at decoding images, and can generate a new attempt every six seconds. Mail systems at Yahoo!, GMail and Hotmail have all been cracked in the past year. And Google’s Blogger service is under siege from spambots creating hundreds of thousands of splogs without human interaction – and they’re doing it through automated captcha cracking.

A new visual authentication system called IMAGINATION, from Penn State’s ALIPR (Automatic Linguistic Indexing of Pictures) program, takes a very different approach. Working with random images rather than characters means the pool of possibilities is not finite (image recognition is far more difficult than character recognition). And the two-part process refines the human requirement further: Find a center, then describe.

But while traditional captchas have had problems with accessibility, ALIPR is going to be completely off-limits to the blind. Oh, and it takes up a whole screen, rather than a few hundred pixels². That sounds like a deal-breaker right there. Or at least a deal-breaker until we get so fed up with being cracked that interaction designers are willing to give up an entire page to make it stop.

Once you solve the captcha, the site invites you to throw your best bot at it. I’m thinking maybe five years before the bots crack this one.

Music: David Byrne :: (The Gift Of Sound) Where The Sun Never Goes Down
April 2nd, 2008

Spam J-Curve

Weblog comment spam rates continue to surge. This chart is from one installation of anti-blog-spam tool Defensio, showing an insane uptick through the last part of March, 2008:

(Thanks ViperBond). Akismet’s charts show more than 5 billion blog spams identified in the past two years. I’ve personally noticed a dramatic increase in hand-written blog spam recently. Knowing that tools like Defensio and Akismet are going to get spammers banned from blogs net-wide within minutes, the method is now one of social engineering – getting bloggers to consciously allow spammy comments to go live by making them highly relevant to the post they’re attached to, and plausibly written. All that distinguishes this latest form are the author URLs — which no longer point to cialis and poker sites, but to tile shops, beauty parlors, commercial art galleries, pool-cleaning supply houses, etc. Human blog spammers have been around almost as long as bots (to defeat captchas, etc.) but this latest form amazes me because it’s written so carefully. I really have to puzzle over some of these recent ones to decide whether to push them through or not.

Because Akismet is less likely to have identified these as spammy, the moderation burden falls back onto blog authors. It’s no longer possible to identify spam at a glance – we now have to study each message carefully to ascertain sincerity.

January 2nd, 2008

When Good Mail Goes Bad

Great way to wrap up a holiday: Agreed to take on a new Birdhouse client – a mid-size company who’s had a horrible email experience with their previous “top tier” provider. They had a dozen or so addresses; could we take them on? No problem. The old host had been storing a couple weeks worth of their mail, but there was no way to get it through to the mail exchanger for delivery. The old host agreed to relay it all to Birdhouse for processing.

That’s when things turned ugly.

Turns out the previous host didn’t have the basic common sense to discard mail to unknown addresses on the domain (it hasn’t been feasible to accept mail for unknown names, like balloon345@domain.com, for years). But they were not only accepting it all; they relayed it ALL to Birdhouse.

300,000 messages worth, 95% of which was theoretically discardable.

Unfortunately, discarding crap mail isn’t trivial when parsing a queue that large. Needless to say, things came to a grinding halt. Complicating matters was the fact that Birdhouse actually utilizes two mail queues: One for MailScanner, which pre-processes spam, and another for Exim, which is the actual mail transfer agent. The MailScanner queue was so large we couldn’t even get things out into the Exim queue. Exim documentation assumes a single queue, and MailScanner doesn’t offer the same range of queue management options that Exim does.

Which meant I got to script a solution, examining each message on the pre-queue to determine whether it was destined for a valid or invalid address, and dropping it if invalid.
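
A minimal sketch of that kind of filter, in Python. This is not the actual Birdhouse script: the queue layout (a message id mapped to its recipient list) and the `VALID_ADDRESSES` set are assumptions made up for illustration, and a real version would have to read recipients out of the MailScanner queue files.

```python
# Hypothetical sketch: split queued messages into keep/drop by recipient.
# VALID_ADDRESSES and the queue structure are illustrative assumptions.
VALID_ADDRESSES = {"sales@domain.com", "info@domain.com"}

def has_valid_recipient(recipients, valid=VALID_ADDRESSES):
    """Return True if at least one recipient is a known mailbox."""
    return any(r.strip().lower() in valid for r in recipients)

def partition_queue(queue):
    """queue: dict of message id -> list of recipient addresses."""
    keep, drop = [], []
    for msg_id, recipients in queue.items():
        (keep if has_valid_recipient(recipients) else drop).append(msg_id)
    return keep, drop

sample_queue = {
    "msg-001": ["Sales@domain.com"],        # known mailbox, keep
    "msg-002": ["balloon345@domain.com"],   # unknown name, drop
}
keep, drop = partition_queue(sample_queue)
print("deliver:", keep)
print("discard:", drop)
```

Checking membership in a set keeps each lookup cheap, which matters when the queue holds hundreds of thousands of messages.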

The script is running now, but will take a while. All spectacularly unpleasant. Once again, wanting to skewer a spammer or two and painfully aware of how much of my time is consumed by fighting bad guys.

Progress updates on the Birdhouse System Status page.

Music: Andy Bey :: I Let a Song Go Out of My Heart
December 17th, 2007

Spam Poster Art

Thumbtack Press hosts a gazillion great pieces of indie art (whatever that means), available as posters. A friend tracked down this excellent collection of art posters made from common spam taglines. Also loved “Realize your dreams with our help for a short time.” How promising! Also: “This secret weapon will give more power to you little soldier.” Little soldier thanks you. See also: Spam Plants and Spamland.

Music: Leaders Of The New School :: Mt. Airy Groove
September 5th, 2007

Blackmail

The creativity of spammers never ceases to amaze me. Received this overnight, smack in the middle of a dozen spammy comments that made it through Akismet (but not through the moderation layer):

Hello , my name is Richard and I know you get a lot of spammy comments , I can help you with this problem . I know a lot of spammers and I will ask them not to post on your site. It will reduce the volume of spam by 30-50%. In return Id like to ask you to put a link to my site on the index page of your site. The link will be small and your visitors will hardly notice it, its just done for higher rankings in search engines.

I feel so vulnerable, so helpless. I don’t know who to turn to for help. OK Richard, you’ve got a deal!

Music: Steely Dan :: Black Cow
July 25th, 2007

Phishing Quiz

How good are you at identifying phishing scams? Interesting quiz at siteadvisor.com showing screenshots of 10 real sites and their phished counterparts side by side. I consider myself pretty well versed at picking out the tell-tale signs, but only got 8/10. What’s really scary is the fact that the quiz called me a “guru” for getting that score – which means that 20% of phishing sites are good enough to fool pretty much everyone (although the screenshots from the two I missed didn’t show the URLs, which is probably the most critical clue, though even those can be made to look convincing, or wholly spoofed in various ways).

How’d you score, and what threw you off?

Music: The Meters :: Chug Chug Chug-A-Lug-(Push N’ Shove)-Part II-(w Meters)
May 24th, 2007

Spamland

The spam I (secretly) most appreciate is the sort that uses randomly generated text cut-ups to bust spam filters, some of them fully worthy of the cut-up experiments Burroughs and Gysin were doing in the late 50s / early 60s.

In a gravitation without warning the face of rubbing grew sullen Black angry mouths, the clouds swallowed up the horsehair The air was religion with suppressed excitement

The Brothers McLeod are doing wonderful things with cut-up spam, having developed a series of animated characters to read aloud and act out the impossible, often mythical scenarios.

nodded. The door was closed and sealed again. Quietly forward. Hands extended, fingers lightly bowed. Iron John was Thats why there is no record of them

My own spam filters seem to have wised up to this form of spam in the past year, but every now and then one will seep through the multiple gatekeepers that mostly protect me from scarybig Spamland, discreetly dropping special treats at my door in the middle of the night, causing me to feel the tiniest bit hopeful.

Accidental art committed for all the wrong reasons can still be beautiful, right?

Music: Will Oldham :: Ode #1b
May 1st, 2007

Volume, Volume, Volume

At IT Conversations, an interesting discussion (podcast) with Mikko Hypponen, director of anti-virus research for F-Secure. Hypponen threw out two sets of numbers that seem to collide, but don’t.

1) Spammers consider a response rate of 0.001% to be a “good” email spam campaign.

2) 40% of Americans (and 60% of Brazilians) report having made a purchase as a result of a spam sales pitch at least once.

How to square the difference? Volume, volume, volume.
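
The arithmetic behind that is easy to sketch. The 0.001% rate is the figure quoted above; the campaign sizes below are hypothetical round numbers chosen only to show how the response count scales with volume.

```python
# A response rate of 0.001% means one response per 100,000 messages.
RESPONSE_RATE = 0.001 / 100  # the quoted "good" rate, as a fraction

def expected_responses(messages_sent, rate=RESPONSE_RATE):
    """Expected number of responses for a campaign of the given size."""
    return messages_sent * rate

# Hypothetical campaign sizes, for illustration only.
for volume in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{volume:>14,} messages -> about {expected_responses(volume):,.0f} responses")
```

A vanishingly small rate still adds up to a large absolute number of buyers once the message count runs into the billions.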

I confess to having bought something from a spam once (and only once): A targeted pitch for a T-shirt bearing a big retro “Shacker” logo. It appeared that the spammer in that case had blasted their message to shacker@everydomain.com. No matter that “shacker” in the marketer’s context referred to college students who sleep in a different dorm room every night — I had to have it.

Music: Derek Bailey :: Gone With the Wind
November 16th, 2006

Botnets on the Rampage

“There has been a 67 percent increase in overall spam volume and a 500 percent increase in image spam since Aug. 2006.”

Illuminating (but seriously depressing) series of articles at eWEEK on botnets – arrays of 0wnz0r3d Windows computers assembled under the control of sophisticated “bot herders,” silently pumping every orifice of the interweb full of spam in all its forms. The virus that makes a machine part of a botnet does not cause harm to its host – like all successful viruses, it wants to assure its own survival. Amazingly, the latest generation of botnet software even installs antivirus software (a pirated copy of Kaspersky Anti-Virus, to be specific) to eradicate competing malware, so it can have the full resources of the infected host to itself.

For a while, it looked like botnet activity was shrinking, but lately it’s seen a huge uptick. vnunet reports that a million-bot botnet is quietly being assembled around the world, and that we’ll soon see an even more massive onslaught of phishing and spam attacks.

The sophistication of these systems is amazing — the botnets even come with their own self-contained DNS system. “This allows a bot herder to dynamically change IP addresses without changing a DNS record or the hosting—and constant moving around—of phishing Web sites on bot computers.”

So can’t botnet hunters just focus on nailing the central command and control machines? Nope – that’s the “beauty” of using a peer-to-peer model:

Control is still maintained by a central server, but in case the control server is shut down, the spammer can update the rest of the peers with the location of a new control server, as long as he/she controls at least one peer.

One of the many factors that makes fighting back so hard is that infected bots expect incoming commands to be digitally signed. Commands from the bot herders to members of the botnet are securely encrypted, and virtually impossible to decipher or reverse-engineer.

The sophistication of modern spammers is impressive on so many levels. Image spam (e.g. Viagra ads that appear as graphics rather than text) has been especially vexing lately, as it seems to elude all filters. Since almost all anti-spam mechanisms — even collaborative ones like Akismet — rely to some extent on the ability to deduce unique “signatures” from a message, every single image sent by machines on a botnet has slightly different dimensions and characteristics, making it nearly impossible to nail down. I’ve even noticed random graphical noise splattered in the background of image spam lately – which prevents any two images from producing identical signatures.
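
Why a speck of noise is enough can be shown with a small sketch. It assumes the "signature" is an exact cryptographic hash of the image bytes, which is a simplification; real filters may use fuzzier fingerprints. The byte strings here are stand-ins, not real image data.

```python
import hashlib

def signature(data: bytes) -> str:
    """Exact-match signature: SHA-256 of the raw bytes (an assumed scheme)."""
    return hashlib.sha256(data).hexdigest()

base_image = bytes(range(256)) * 4      # stand-in for an image's bytes
noisy_image = bytearray(base_image)
noisy_image[100] ^= 0x01                # flip a single bit of "noise"
noisy_image = bytes(noisy_image)

print("original:", signature(base_image)[:16])
print("noisy:   ", signature(noisy_image)[:16])
print("match:   ", signature(base_image) == signature(noisy_image))
```

One flipped bit produces an unrelated hash, so a blocklist of exact signatures never sees the same image twice.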

I think I was wrong when I said recently that my IP firewalling script was becoming less effective because spammers had learned to spoof IPs. I believe now that the problem is that the botnets are so widely distributed that the same IPs don’t come up with enough repetition to be useful. Rather than spam spewing from a volcano somewhere in the Ukraine for a few days, it’s now more like a steady mist that suffuses the atmosphere – an endless acid rain emanating from everywhere at once.

What amazes me is that articles like this never seem to point out the obvious: The botnets are comprised entirely of Windows machines. There are currently approximately 5.7 million infected Windows computers out there, ready and able to join a botnet at any time. If I were the sysadmin of a Windows network, this would be significant information to me. It’s not that OS X or Linux are theoretically incapable of this kind of takeover, but the plain reality is that it doesn’t happen. And yet, articles like this never make a recommendation that admins consider a platform shift. Why?

Sadly, experts are starting to feel hopeless about their prospects of staying in front of the game.

“We’ve known about [the threat from] botnets for a few years, but we’re only now figuring out how they really work, and I’m afraid we might be two to three years behind in terms of response mechanisms,” said Marcus Sachs, a deputy director in the Computer Science Laboratory of SRI International, in Arlington, Va.

Amazon is having serious issues with spam, as is del.icio.us. Of course one would expect large services to be constantly hammered with spam, but if the largest and best-funded commercial entities on the web can’t keep spam off their public doorsteps, you know things are getting serious out there.

It’s becoming increasingly popular for admins to block entire nations, either at the apache or at the firewall level. I’ve been tempted to do the same myself, but haven’t. Yet.

All of this applies to the interactive aspect of the web as much as it does to email. I deal with it on wikis, discussion boards, blogs, and apache logs (referrer spam). In recent months, I’ve seen them stuffing personal contact forms, and even the public jobs database at the j-school (which is absurd, since no job ever gets published without human review, but that doesn’t stop them from trying). Amidst all the Web 2.0 talk of participatory journalism, the wisdom of crowds, the read/write web, and two-way communication, it’s those very features that are being exploited by spammers and the massive botnets.

I worry that the openness that made the internet possible will ultimately become the sword upon which it impales itself. I see a future where everything is so locked down that all of the fun participatory stuff becomes impossibly difficult. I worry that someday email will only be feasible with whitelisting, that registration with identity verification will be required for all participatory web features, and that the concept of anonymity will ultimately become untenable.

Compare the atmosphere of the internet to the ecology of the earth. It took us millions of years to get to industrial civilization, then just a few decades to pollute our environment to the brink of sustainability. I worry that the internet is following a similar course – 30 years to become mainstream and five years to become so polluted it’s unusable.

Thanks Mal

November 2nd, 2006

TinyTuring

John Battelle’s SearchBlog, which is hosted by Birdhouse, has been undergoing a constant (and brutal) deluge of weblog comment spam over the past few days. It’s always been bad for him, but I’ve never seen anything like this. Akismet is still the bomb, but even the mighty Akismet couldn’t stay out in front of this wave. Since Akismet only knows about spam that’s been submitted to it by the hive mind, the first blogs to receive a new wave of spam are unprotected by it.

The script I wrote a while ago to query blog databases for spammy behavior and shunt IP addresses into the firewall works wonderfully when IP addresses are legitimate, but it seems that most spammers know how to fake their IPs these days, rendering it ineffective.

Ever wondered what a comment spam blitzkrieg looks like from a server load perspective? Take a look at the load average graph from today (snapshot every 6 minutes):

Those spikes, some representing fairly long blocks of time, represent thousands of bogus comments being submitted into battellemedia.com simultaneously. For reference, load averages shouldn’t spike above 1.0 too often, or things get uncomfortable. This is why spammers – especially weblog comment spammers — make me insane.

Decided Battelle needed a second line of defense. We were reluctant about using a captcha for the usual accessibility reasons, so I went looking for a good Turing test system and found TinyTuring by Kevin Shay. As human detectors go, it doesn’t get much simpler than this – it requires commenters to enter just a single randomly selected letter. A hidden salt prevents algorithmic detection. Required modification of three MT templates. So far, 100% effective. Yes, armies of underpaid Malaysian human spambots can still jam crap into the system manually, but those comments will still have Akismet to deal with.
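
The general idea of a single-letter challenge with a hidden salt can be sketched as below. This is not TinyTuring's code, and its internals may well differ: the `SECRET_SALT` value, the HMAC-based token, and the function names are all assumptions for illustration.

```python
import hashlib
import hmac
import secrets
import string

SECRET_SALT = b"change-me"  # hypothetical server-side secret, never sent to the browser

def _token(letter: str) -> str:
    """Salted digest of the expected letter, safe to place in a hidden form field."""
    return hmac.new(SECRET_SALT, letter.lower().encode(), hashlib.sha256).hexdigest()

def make_challenge():
    """Pick a random letter; return it with the token that goes in the form."""
    letter = secrets.choice(string.ascii_lowercase)
    return letter, _token(letter)

def verify(answer: str, token: str) -> bool:
    """Check the typed letter against the token submitted with the comment."""
    return hmac.compare_digest(_token(answer.strip()[:1]), token)

letter, token = make_challenge()
print("Please type the letter:", letter)
print("accepted:", verify(letter, token))
```

Without the salt, a bot could hash all 26 letters itself and match the token. A production version would also want a per-request nonce or expiry so a solved token can't be replayed.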

The cat and mouse game continues.

Music: Billy Martin :: Strangulation
October 21st, 2006

MailScanner

Recently installed an update/add-on to cPanel for Birdhouse Hosting – a package called MailScanner which integrates the usual complement of open source spam and virus controls (SpamAssassin, ClamAV, Razor, DCC) into a combined package, provides more spam config controls for individual hosting accounts, and provides the admin with a bunch of reporting tools. I can now see at a glance (graphically) how many messages are passing through the server each day, what percentage of them have been flagged as spam or viruses, or drill down and get similar reports for individual domains or users. At left: A snapshot of mail and spam traffic on Birdhouse over the past week:

Highlights:
10,000 total messages processed on 10/16
77.8% of mail was flagged as spam today
(read that last one another way: less than 23% of the mail we’re spending money to process and handle is legitimate)

If you’re wistful for the good old days when you could use a “catch-all” address to receive mail bound for anything@yourdomain.com, note: 5,016 out of 6,449 messages received today were addressed to unknown email accounts on domains we handle. Which is why most hosts (including Birdhouse) strongly recommend against using catch-all addresses any more. Spammers 0wnz0r the ozone.

Music: Tom Glazer & Dottie Evans :: Constellation Jig
August 10th, 2006

Extreme Telemarketing

Never thought I’d feel sympathy for a telemarketer, but get an earful of this. My heart goes out to the poor guy. Kind of. Despite the caller’s general craziness, she does raise a point with him that I’ve tried before in conversations with telemarketers: The practice violates the categorical imperative, from which all moral action derives (according to Kant, and I agree):

Act only according to that maxim by which you can at the same time will that it would become a universal law.

In other words, don’t do anything that you don’t think all other people should also be allowed to do in the same situation. In context, one should engage in telemarketing only if one believes that all marketers should be allowed to call people in their homes. Consider the massive amount of advertising around us at all times, and imagine that every advertiser pushed their product by calling people at home. Universalizing the practice of telemarketing to all practitioners would make the telephone utterly useless, since it would never stop ringing, much as e-mail spam has diminished the viability of e-mail (which is only rescued through the application of great piles of technology).

When faced with the categorical imperative (though of course the caller does not call it that :), the telemarketer starts to lie to cover his position, saying that marketers do call his home phone all day every day, and that he doesn’t mind a bit.

Unfortunately, the caller’s philosophically sound position is completely blown out of the water by absolutely insane levels of hysteria.

Music: Sylford Walker :: Deuteronomy
July 26th, 2006

Spam Plants

Romanian-born computer artist Alex Dragulescu turns crap into gold – he’s developed a computational analysis system to transform ordinary spam into renderings of organic-looking plants (though some look more like sea anemones to me). via c|net:


For the Spam Plants, he parsed the data within junk e-mail–including subject lines, headers and footers–to detect relationships between that data. For example, the program draws on the numeric address of an e-mail sender and matches those numbers to a color chart, from 0 to 225. It needs three numbers to define a color, such as teal, so the program breaks down the IP address to three numbers so it can determine the color of the plant. The time a message is sent also plays a role. If it’s sent in the early morning, the plant is smaller, or the time might stunt the plant’s ability to grow.
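
As a rough illustration of the IP-to-color step described in that quote, here is a sketch that takes the first three octets of an IPv4 address as red, green, and blue. Dragulescu's actual mapping isn't given in the article, so the choice of octets and the `ip_to_rgb` name are assumptions.

```python
def ip_to_rgb(ip: str):
    """Map an IPv4 address to an (r, g, b) tuple using its first three octets."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return tuple(octets[:3])

# 203.0.113.5 is a documentation-only address, used here as sample input.
r, g, b = ip_to_rgb("203.0.113.5")
print(f"plant color: #{r:02x}{g:02x}{b:02x}")
```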

Dragulescu has also done similar projects with architecture, weblog text, transit, etc.

Music: Lou Reed and John Cale :: Nobody But You
July 1st, 2006

reducer: bad ips –> firewall

At the end of my rope with server loads caused by weblog and email spammers. SpamAssassin and Akismet etc. may keep spam away from users, but all that stuff still needs to be processed (and we’re talking about a huge percentage of all traffic).

Recently switched from the APF firewall to ConfigServer’s excellent CSF, which is integrated into WebHost Manager (the admin back-end for cPanel systems), and got thinking — the most heavily trafficked blogs here are already using spam rating systems that track IPs. The right script could harvest and rank those IPs and load them into the firewall in near real-time. Spent the past few evenings building a shell script to do just that.

reducer: Harvests bad IP addresses from multiple sources and adds them to the CSF firewall for cPanel systems. This version works with WordPress and Movable Type weblogs, and optionally the exim ACL deny system. Future versions will scan other sources for bad IPs as well.
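
The harvest-and-rank idea can be sketched in a few lines of Python. This is not reducer itself (which is a shell script): the input lists, the threshold of 3, and the `rank_bad_ips` name are illustrative assumptions. The sketch only prints `csf -d` deny commands rather than running them.

```python
import ipaddress
from collections import Counter

def rank_bad_ips(sources, threshold=3):
    """Count IPs across all sources; return those seen at least `threshold` times."""
    counts = Counter()
    for ips in sources:
        for raw in ips:
            try:
                counts[str(ipaddress.ip_address(raw.strip()))] += 1
            except ValueError:
                continue  # skip anything that isn't a valid IP address
    return [ip for ip, n in counts.most_common() if n >= threshold]

# Hypothetical harvests from two blog spam tables (documentation-range addresses).
wordpress_ips = ["203.0.113.5", "203.0.113.5", "198.51.100.7"]
movabletype_ips = ["203.0.113.5", "not-an-ip", "198.51.100.7"]

for ip in rank_bad_ips([wordpress_ips, movabletype_ips]):
    # Print the firewall command instead of executing it.
    print(f"csf -d {ip} repeat spam source")
```

A threshold keeps one-off offenders, which may be shared or dynamic addresses, out of the firewall.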

Update, April 2008: Birdhouse Hosting has been running reducer system-wide for almost two years now, with great success. At this point, we wouldn’t even consider running a hosting business without it.

Download reducer here.

April 11th, 2006

Blogosphere Suffers Spam Explosion

c|net on the increasingly difficult problem of fighting spam on weblogs:

Boing Boing would allow its readers to leave comments and engage in a discussion on the wildly popular blog, if it weren’t for spam.

The piece focuses more on problems bloggers themselves face:

“It is a major hassle,” Frauenfelder said. “It is just getting worse and worse. My fantasies of violent revenge against spammers become more lurid every week.”

than on problems caused for their web hosts, and is a superficial overview in many respects, but it’s good to see some mainstream attention to the problem, which consumes more of my time than I had ever imagined it would.

At this point, I’ve tried every approach under the sun for the Birdhouse bloggers: standard blacklists (a moving target), moderation and authentication (chilling effect on conversation), mod_security blacklists (hard to keep updated, resource intensive), javascript (ultimately hackable), referrer tracking (shuts out commenters behind certain firewalls)…

But I’ve never had it as easy as I have since switching to WordPress and setting up the distributed Akismet system, which has blocked more than 1,000 spams from this blog in the past two weeks without a single false positive, and while requiring very minimal system resources. Sounds like a lot, but some of my users average around one spam/trackback submission attempt per minute, 24×7. You do the math.

Music: The Flaming Lips :: What Is The Light?

December 28th, 2005

Who Gets No Spam?

Lebkowsky posts about his mostly-rosy transition from Outlook to Thunderbird, but wonders why the spam controls aren’t more robust. “… and though the junk mail filters are clearly catching a large percentage of the umpty hundreds of spams that fall into my mail bucket every day, there’s a bunch more that the filters miss.”

What I don’t get is why people are still dealing with daily buckets of spam on the client side at all. It’s been years since most mail hosts began offering excellent server-side spam handling (Birdhouse included). I’ve found the combination of SpamAssassin + ClamAV + RulesDuJour to be tremendously effective. And don’t forget to disable your “catch-all” address – probably the most powerful single spam magnet you can have. After months of not landing a single false positive, I finally stopped using a server-side “Junk box” for monitoring at all – now I just set my spam threshold to 2.5 and let the systems delete spam before it ever hits my server-side mailbox.

Result: About 90% of the mail bound for my addresses is discarded without ever being seen by a human or handled by a mail client. What finally slips through the net is a grand total of about 3-5 spams a day.

On the TWiT podcast, John Dvorak gets teased regularly — by industry experts, no less — for his claim “I get no spam.” What’s so outlandish about that? If you’re still getting spam in your mail client, you probably just need to turn on the controls your mail host probably already has set up for you. And if your mail host doesn’t offer server-side spam controls, find one that does.

Music: Half Man Half Biscuit :: Bottleneck at Capel Curig
April 29th, 2005

SpamLookup

Just installed Brad Choate’s SpamLookup for John Battelle’s MT installation, ditching MT-Blacklist for the time being. Looks simple on the surface, but dig into the options and you start to realize this is the next generation comment/trackback-fighting tool. Actually, it’s a whole toolbelt, including realtime distributed blacklists (which probably accomplish 95% of the dirty work alone), moderation levels and exceptions for various types of commenters, bannable wordlists, and a built-in “Passphrase” feature you can use as a human detector. This last being similar in concept to a captcha, but text-based rather than graphical. The commenter is required to answer a dirt-simple question such as “What is John’s name?,” which a bot would be hard-pressed to do. If I wasn’t having such great success with MT-Keystrokes on birdhouse, I’d install it here as well…

Music: Roland Kirk :: Bag’s Groove
April 14th, 2005

NonJunk

Studying email headers of a spam turdlet that slipped through the net, found this in the headers, trying to pass as header lines added by SpamAssassin:

X-IMAPbase: 1113505409 1 NonJunk
Status: O
X-Status:
X-Keywords: NonJunk

The cat-n-mouse game is never-ending.

Music: blur :: country house
March 30th, 2005

Tagging Non-English Spam

Have recently noticed a huge uptick in the amount of non-English (especially Chinese) spam, which slips through the SpamAssassin nets much more readily than English spam (at least it does in most Western SA setups; not sure how different things are for, say, Chinese hosts).

Turns out you can tell SpamAssassin to give higher ratings to messages written in languages other than those you’ve explicitly sanctioned. Higher ratings mean more likelihood of messages getting tossed to /dev/null or saved in a junk box. In your local.cf or user_prefs, just add:

ok_languages en de la th sv

(e.g.) to accept messages in English, German, Latin, Thai, and Swedish. Full list of language codes here. Works a treat.

Music: The Seeds :: 900 Million People Daily
March 19th, 2005

Field Notes on Comment Registration

In order to respond to Birdhouse customers who want an answer to the question: “Why are you enforcing comment registration on Movable Type weblogs? Have you really exhausted all other options?,” I’ve put together this Brief History of Our Battle With Comment Spammers to summarize what we’ve done in the past, why it hasn’t worked, and why we think comment registration is our only remaining recourse.
(more…)

March 18th, 2005

Comment Registration Required

I’ve had it with Movable Type comment spam blitzkriegs dropping available server CPU to 0 and broadsiding web and mail services. Last night we endured a comment spam attack so severe it knocked out the mail server overnight. If you’ve followed this space for a while, you know I’ve tried virtually every trick and upgrade at my disposal to deal with the problem. But it just keeps getting worse.

A few minutes ago, I switched this weblog to a comment-registration-required system. I know this will discourage a percentage (probably a good percentage) of casual comments, and that’s a bummer. But TypeKey registration is trivially easy, and your registration will work at any TypeKey-enabled blog on the internet.

I’ve also just announced the new comment registration policy on status.birdhouse and to the owners of our four most intensive MT users.

My hatred of spammers is boundless and bottomless.

Music: The Roches :: Hammond Song
January 18th, 2005

nofollow

If an anchor tag includes the rel="nofollow" attribute, well-behaved search engines won’t follow the link it represents when spidering. So if there were a way to automatically modify the links that comment spammers leave in comments, their chief goal – raising their standings in the search engines – would be deflated.

SixApart has just released the nofollow plugin, which scans incoming comments and adds rel="nofollow" to each embedded link automatically. Normal users are not affected – they can still click the links. But the simple presence of links to spammers’ sites will do nothing whatsoever for their GoogleRanks.
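
The rewriting step can be sketched like this. It is not the Six Apart plugin's code; the regex approach and the `add_nofollow` name are assumptions, and a regex is a simplification that a real HTML parser would handle more robustly.

```python
import re

# Matches an opening <a ...> tag and captures its attributes.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)

def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to every anchor tag that has no rel attribute yet."""
    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            # Simplification: existing rel values are left untouched here.
            return match.group(0)
        return f'<a{attrs} rel="nofollow">'
    return A_TAG.sub(fix, html)

comment = 'Nice post! <a href="http://example.com/pills">cheap pills</a>'
print(add_nofollow(comment))
```

The link still works for a human reader; only the hint to search engines changes. A stricter version would overwrite any rel value the commenter supplied rather than trusting it.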

The downside, as I see it, is that for this to be effective, it must be installed in the majority of weblogs. Spammers need to understand that their campaigns are flaccid, and that won’t be true until most of the world is using a solution like this.

Just installed nofollow at birdhouse and at the J-School.

Music: African Head Charge :: Far Away Chant
December 29th, 2004

Comment Spam Nihilism

Applying the MovableType 3.14 upgrade made a huge difference in server CPU usage when undergoing comment spam blitzkriegs, which now amount to barely a blip on the resource usage radar. Peace at last. Until…

A few days later we face a new anomaly: Someone out there has created a script that submits fake comments containing randomly generated URLs (all non-active and non-registered), randomly generated fake IPs, and randomly generated fake email addresses — they’re coming in locust clouds of one or two hundred at a time.

Because there are no recurring strings in these comment spams, blacklisting them is pointless, and would only fill a blacklist database with garbage. Because the domains advertised are non-existent, I can’t correctly classify them as spam – they don’t advertise anything. Their purpose is purely vandalistic; to annoy blog owners and admins.

Even though Blacklist doesn’t catch them, they’re still held for moderation (so resource usage is nil), but you do have to take the time to batch-delete the suckers.

Posted a query to see if anyone had advice on battling this form of nihilism, but nothing useful so far. I’m quickly coming closer to the last resort: Forced registration for untrusted commenters.

December 21st, 2004

eWeek on Comment Spam

Heard from a reporter at eWeek yesterday who wanted to interview me about Movable Type comment spam overloads and how they affect web hosts. Unfortunately I got the email too late and wasn’t interviewed for the story, which was published today.

Six Apart has released MT 3.14 to address a bug which was triggering rebuild behavior even in settings where it shouldn’t be necessary, such as when moderated comments are added (99% of comment spam is held as moderated by various mechanisms). We’ll be applying the patch to birdhouse blogs throughout the day.

December 16th, 2004

Comment Spam – Up Against the Wall

The weblog comment spam problem has implications beyond crowded inboxes for users. Even with tools such as the incredible MT-Blacklist (which has blocked or moderated tens of thousands of comment spams on birdhouse-hosted blogs in the past few months), each request still requires a CGI process and a database request. When the spambots launch their massive onslaughts, shared hosting environments reel from the resource requirements. The problem has reached a critical threshold, and the muckety mucks at SixApart are coming out of the woods to address it head-on:

Jay Allen (author of MT-Blacklist and Product Manager at Six Apart) and Anil Dash (big cheese at SixApart) have both posted “official” positions on MT comment spam in the past few days.

So it looks like patches will be released in the next few days to address the biggest issues for web hosts. I like the fact that they’re approaching this not just as an MT problem but as an issue that affects all online discussion forums. The key to satisfying frustrated web hosts will be in creating a solution that can somehow block comment spam blitzkriegs without having to make a CGI and/or database call for every incoming request. It’s a hard problem to solve.

Update: Very good read on the many aspects and dimensions of comment spam load issues over at photodude. Throwing more hardware at the problem doesn’t make it go away (drooling over the server described there). Long comment section, also worth reading. One comment on the question of whether dynamic or statically generated sites fare better under this kind of load:

Also, last month, my husband and I shut down WordPress on the colo server we share with 3 other people, because … hits from comment spammers were making everything so slow. So we installed prerendering, which, if I’m reading this correctly, takes away the advantage of WP being dynamic(?) [right - this would make a dynamic site behave like a static site; you can't win. -SFH].
Music: Mildred Bailey :: Squeeze Me