scot hacker’s foobar blog
Time is an invention.
April 26, 2008

ALIPR Captchas

Captchas are so 2007. There are enough good captcha-breaking bots in the wild now that they’re pushing 10-15% success rates at decoding images, and can generate a new attempt every six seconds. Mail systems at Yahoo!, Gmail and Hotmail have all been cracked in the past year. And Google’s Blogger service is under siege from spambots creating hundreds of thousands of splogs without human interaction, and they’re doing it through automated captcha cracking.

A new visual authentication system called IMAGINATION, from Penn State’s ALIPR (Automatic Linguistic Indexing of Pictures) program, takes a very different approach. Working with random images rather than characters means the pool of possibilities is not finite (image recognition is far more difficult than character recognition). And the two-part process refines the human requirement further: Find a center, then describe.

Imagination

But while traditional captchas have had problems with accessibility, ALIPR is going to be completely off-limits to the blind. Oh, and it takes up a whole screen, rather than a few hundred square pixels. That sounds like a deal-breaker right there. Or at least a deal-breaker until we get so fed up with being cracked that interaction designers are willing to give up an entire page to make it stop.

Once you solve the captcha, the site invites you to throw your best bot at it. I’m thinking maybe five years before the bots crack this one.

Music: David Byrne :: (The Gift Of Sound) Where The Sun Never Goes Down
April 2, 2008

Spam J-Curve

Weblog comment spam rates continue to surge. This chart is from one installation of anti-blog-spam tool Defensio, showing an insane uptick through the last part of March, 2008:

(Thanks ViperBond). Akismet’s charts show more than 5 billion blog spams identified in the past two years. I’ve personally noticed a dramatic increase in hand-written blog spam recently. Knowing that tools like Defensio and Akismet are going to get spammers banned from blogs net-wide within minutes, the method is now one of social engineering - getting bloggers to consciously allow spammy comments to go live by making them highly relevant to the post they’re attached to, and plausibly written. All that distinguishes this latest form are the author URLs — which no longer point to cialis and poker sites, but to tile shops, beauty parlors, commercial art galleries, pool-cleaning supply houses, etc. Human blog spammers have been around almost as long as bots (to defeat captchas, etc.) but this latest form amazes me because it’s written so carefully. I really have to puzzle over some of these recent ones to decide whether to push them through or not.

Because Akismet is less likely to have identified these as spammy, the moderation burden falls back onto blog authors. It’s no longer possible to identify spam at a glance - we now have to study each message carefully to ascertain sincerity.

January 2, 2008

When Good Mail Goes Bad

Great way to wrap up a holiday: Agreed to take on a new Birdhouse client - a mid-size company that’s had a horrible email experience with their previous “top tier” provider. They had a dozen or so addresses; could we take them on? No problem. The old host had been storing a couple of weeks’ worth of their mail, but there was no way to get it through to the mail exchanger for delivery. The old host agreed to relay it all to Birdhouse for processing.

That’s when things turned ugly.

Turns out the previous host didn’t have the basic common sense to discard mail to unknown addresses on the domain (it hasn’t been feasible to accept mail for unknown names, like balloon345@domain.com, for years). But they were not only accepting it all; they relayed it ALL to Birdhouse.

300,000 messages’ worth, 95% of which was theoretically discardable.

Unfortunately, discarding crap mail isn’t trivial when parsing a queue that large. Needless to say, things came to a grinding halt. Complicating matters was the fact that Birdhouse actually utilizes two mail queues: One for MailScanner, which pre-processes spam, and another for Exim, which is the actual mail transfer agent. The MailScanner queue was so large we couldn’t even get things out into the Exim queue. Exim documentation assumes a single queue, and MailScanner doesn’t offer the same range of queue management options that Exim does.

Which meant I got to script a solution, examining each message on the pre-queue to determine whether it was destined for a valid or invalid address, and dropping it if invalid.
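For the curious, the core decision can be sketched in a few lines of Python. This is not the actual script, just a minimal illustration under stated assumptions: the addresses in `VALID_ADDRESSES` are made up, and the `queue` dictionary stands in for whatever parsing is needed to pull envelope recipients out of the real spool files (not shown here, since that depends on the MTA).

```python
# Hypothetical list of real mailboxes on the domain (assumption for illustration).
VALID_ADDRESSES = {"info@domain.com", "sales@domain.com"}


def is_deliverable(recipients, valid):
    """Return True if at least one envelope recipient is a known mailbox."""
    return any(r.strip().lower() in valid for r in recipients)


def split_queue(queue, valid):
    """Partition queued message IDs into (keep, drop) lists.

    `queue` maps a message ID to its list of envelope recipients.
    Reading those recipients from spool files is deliberately left out.
    """
    keep, drop = [], []
    for msg_id, recipients in queue.items():
        if is_deliverable(recipients, valid):
            keep.append(msg_id)
        else:
            drop.append(msg_id)
    return keep, drop
```

A message is kept if any one of its recipients is real, which errs on the side of not losing legitimate mail. Only the IDs in `drop` would then be removed from the queue.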

The script is running now, but will take a while. All spectacularly unpleasant. Once again, wanting to skewer a spammer or two and painfully aware of how much of my time is consumed by fighting bad guys.

Progress updates on the Birdhouse System Status page.

Music: Andy Bey :: I Let a Song Go Out of My Heart
December 17, 2007

Spam Poster Art

Thumbtack Press hosts a gazillion great pieces of indie art (whatever that means), available as posters. A friend tracked down this excellent collection of art posters made from common spam taglines. Also loved “Realize your dreams with our help for a short time.” How promising! Also: “This secret weapon will give more power to you little soldier.” Little soldier thanks you. See also: Spam Plants and Spamland.

Music: Leaders Of The New School :: Mt. Airy Groove
September 5, 2007

Blackmail

The creativity of spammers never ceases to amaze me. Received this overnight, smack in the middle of a dozen spammy comments that made it through Akismet (but not through the moderation layer):

Hello , my name is Richard and I know you get a lot of spammy comments , I can help you with this problem . I know a lot of spammers and I will ask them not to post on your site. It will reduce the volume of spam by 30-50%. In return Id like to ask you to put a link to my site on the index page of your site. The link will be small and your visitors will hardly notice it, its just done for higher rankings in search engines.

I feel so vulnerable, so helpless. I don’t know who to turn to for help. OK Richard, you’ve got a deal!

Music: Steely Dan :: Black Cow
July 25, 2007

Phishing Quiz

How good are you at identifying phishing scams? Interesting quiz at siteadvisor.com showing screenshots of 10 real sites and their phished counterparts side by side. I consider myself pretty well versed at picking out the tell-tale signs, but only got 8/10. What’s really scary is the fact that the quiz called me a “guru” for getting that score - which means that 20% of phishing sites are good enough to fool pretty much everyone (although the screenshots from the two I missed didn’t show the URLs, which is probably the most critical clue, though even those can be made to look convincing, or wholly spoofed in various ways).

How’d you score, and what threw you off?

Music: The Meters :: Chug Chug Chug-A-Lug-(Push N’ Shove)-Part II-(w Meters)
May 24, 2007

Spamland

The spam I (secretly) most appreciate is the sort that uses randomly generated text cut-ups to bust spam filters, some of them fully worthy of the cut-up experiments Burroughs and Gysin were doing in the late 50s / early 60s.

In a gravitation without warning the face of rubbing grew sullen Black angry mouths, the clouds swallowed up the horsehair The air was religion with suppressed excitement

The Brothers McLeod are doing wonderful things with cut-up spam, having developed a series of animated characters to read aloud and act out the impossible, often mythical scenarios.

nodded. The door was closed and sealed again. Quietly forward. Hands extended, fingers lightly bowed. Iron John was Thats why there is no record of them

My own spam filters seem to have wised up to this form of spam in the past year, but every now and then one will seep through the multiple gatekeepers that mostly protect me from scarybig Spamland, discreetly dropping special treats at my door in the middle of the night, causing me to feel the tiniest bit hopeful.

Accidental art committed for all the wrong reasons can still be beautiful, right?

Music: Will Oldham :: Ode #1b
May 1, 2007

Volume, Volume, Volume

At IT Conversations, an interesting discussion (podcast) with Mikko Hypponen, director of anti-virus research for F-Secure. Hypponen threw out two sets of numbers that seem to collide, but don’t.

1) Spammers consider a response rate of 0.001% to be a “good” email spam campaign.

2) 40% of Americans (and 60% of Brazilians) report having made a purchase as a result of a spam sales pitch at least once.

How to square the difference? Volume, volume, volume.
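To put rough numbers on it, here is a back-of-envelope sketch. The 0.001% response rate comes from the podcast; the campaign sizes are hypothetical figures picked purely for illustration.

```python
def expected_responses(messages_sent, response_rate_percent):
    """Expected number of responses for a campaign of the given size."""
    return messages_sent * response_rate_percent / 100


# Hypothetical campaign sizes (assumptions, not figures from the podcast).
for volume in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{volume:>13,} messages -> about {expected_responses(volume, 0.001):,.0f} responses")
```

At 0.001%, a million messages yields around ten responses, which sounds pathetic until the volume grows by a few orders of magnitude. Repeat that across enough campaigns over enough years, and a large share of the population eventually bites once.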

I confess to having bought something from a spam once (and only once): A targeted pitch for a T-shirt bearing a big retro “Shacker” logo. It appeared that the spammer in that case had blasted their message to shacker@everydomain.com. No matter that “shacker” in the marketer’s context referred to college students who sleep in a different dorm room every night — I had to have it.

Music: Derek Bailey :: Gone With the Wind
November 16, 2006

Botnets on the Rampage

“There has been a 67 percent increase in overall spam volume and a 500 percent increase in image spam since Aug. 2006.”

Illuminating (but seriously depressing) series of articles at eWEEK on botnets: arrays of 0wnz0r3d Windows computers assembled under the control of sophisticated “bot herders,” silently pumping every orifice of the interweb full of spam in all its forms. The virus that makes a machine part of a botnet does not cause harm to its host - like all successful viruses, it wants to assure its own survival. Amazingly, the latest generation of botnet software even installs antivirus software (a pirated copy of Kaspersky Anti-Virus, to be specific) to eradicate competing malware, so it can have the full resources of the infected host to itself.

For a while, it looked like botnet activity was shrinking, but lately it’s seen a huge uptick. vnunet reports that a million-bot botnet is quietly being assembled around the world, and that we’ll soon see an even more massive onslaught of phishing and spam attacks.

The sophistication of these systems is amazing — the botnets even come with their own self-contained DNS system. “This allows a bot herder to dynamically change IP addresses without changing a DNS record or the hosting—and constant moving around—of phishing Web sites on bot computers.”

So can’t botnet hunters just focus on nailing the central command and control machines? Nope - that’s the “beauty” of using a peer-to-peer model:

Control is still maintained by a central server, but in case the control server is shut down, the spammer can update the rest of the peers with the location of a new control server, as long as he/she controls at least one peer.

One of the many factors that makes fighting back so hard is that infected bots expect incoming commands to be digitally signed. Commands from the bot herders to members of the botnet are securely encrypted, and virtually impossible to decipher or reverse-engineer.

The sophistication of modern spammers is impressive on so many levels. Image spam (e.g. Viagra ads that appear as graphics rather than text) has been especially vexing lately, as it seems to elude all filters. Since almost all anti-spam mechanisms — even collaborative ones like Akismet — rely to some extent on the ability to deduce unique “signatures” from a message, every single image sent by machines on a botnet has slightly different dimensions and characteristics, making it nearly impossible to nail down. I’ve even noticed random graphical noise splattered in the background of image spam lately - which prevents any two images from producing identical signatures.
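A small sketch shows why a sprinkle of noise is enough. It assumes the "signature" is an exact hash (SHA-256 here, chosen only as a stand-in; real filters may use fuzzier fingerprints), and it treats the image as a plain run of bytes rather than a real GIF or JPEG.

```python
import hashlib
import random


def signature(data):
    """Exact-match fingerprint of a blob of bytes (stand-in for a spam signature)."""
    return hashlib.sha256(data).hexdigest()


def add_noise(image, rng, flips=3):
    """Return a copy of `image` with a few distinct, randomly chosen bytes inverted."""
    noisy = bytearray(image)
    for i in rng.sample(range(len(noisy)), flips):
        noisy[i] ^= 0xFF  # inverting a byte guarantees it differs from the original
    return bytes(noisy)


# Stand-in for image data; any byte string works for the demonstration.
original = bytes(range(256)) * 4
variant = add_noise(original, random.Random())
print(signature(original))
print(signature(variant))
```

The two digests have nothing visibly in common even though only three bytes changed. An exact-match blacklist built from the first image will never catch the second.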

I think I was wrong when I said recently that my IP firewalling script was becoming less effective because spammers had learned to spoof IPs. I believe now that the problem is that the botnets are so widely distributed that the same IPs don’t come up with enough repetition to be useful. Rather than spam spewing from a volcano somewhere in the Ukraine for a few days, it’s now more like a steady mist that suffuses the atmosphere - an endless acid rain emanating from everywhere at once.

What amazes me is that articles like this never seem to point out the obvious: The botnets are comprised entirely of Windows machines. There are currently approximately 5.7 million infected Windows computers out there, ready and able to join a botnet at any time. If I were the sysadmin of a Windows network, this would be significant information to me. It’s not that OS X or Linux are theoretically incapable of this kind of takeover, but the plain reality is that it doesn’t happen. And yet, articles like this never make a recommendation that admins consider a platform shift. Why?

Sadly, experts are starting to feel hopeless about their prospects of staying in front of the game.

“We’ve known about [the threat from] botnets for a few years, but we’re only now figuring out how they really work, and I’m afraid we might be two to three years behind in terms of response mechanisms,” said Marcus Sachs, a deputy director in the Computer Science Laboratory of SRI International, in Arlington, Va.

Amazon is having serious issues with spam, as is del.icio.us. Of course one would expect large services to be constantly hammered with spam, but if the largest and best-funded commercial entities on the web can’t keep spam off their public doorsteps, you know things are getting serious out there.

It’s becoming increasingly popular for admins to block entire nations, either at the apache or at the firewall level. I’ve been tempted to do the same myself, but haven’t. Yet.

All of this applies to the interactive aspect of the web as much as it does to email. I deal with it on wikis, discussion boards, blogs, and apache logs (referrer spam). In recent months, I’ve seen them stuffing personal contact forms, and even the public jobs database at the j-school (which is absurd, since no job ever gets published without human review, but that doesn’t stop them from trying). Amidst all the Web 2.0 talk of participatory journalism, the wisdom of crowds, the read/write web, and two-way communication, it’s those very features that are being exploited by spammers and the massive botnets.

I worry that the openness that made the internet possible will ultimately become the sword upon which it impales itself. I see a future where everything is so locked down that all of the fun participatory stuff becomes impossibly difficult. I worry that someday email will only be feasible with whitelisting, that registration with identity verification will be required for all participatory web features, and that the concept of anonymity will ultimately become untenable.

Compare the atmosphere of the internet to the ecology of the earth. It took us millions of years to get to industrial civilization, then just a few decades to pollute our environment to the brink of unsustainability. I worry that the internet is following a similar course - 30 years to become mainstream and five years to become so polluted it’s unusable.

Thanks Mal

November 2, 2006

TinyTuring

John Battelle’s SearchBlog, which is hosted by Birdhouse, has been undergoing a constant (and brutal) deluge of weblog comment spam over the past few days. It’s always been bad for him, but I’ve never seen anything like this. Akismet is still the bomb, but even the mighty Akismet couldn’t stay out in front of this wave. Since Akismet only knows about spam that’s been submitted to it by the hive mind, the first blogs to receive a new wave of spam are unprotected by it.

The script I wrote a while ago to query blog databases for spammy behavior and shunt IP addresses into the firewall works wonderfully when IP addresses are legitimate, but it seems that most spammers know how to fake their IPs these days, rendering it ineffective.

Ever wondered what a comment spam blitzkrieg looks like from a server load perspective? Take a look at the load average graph from today (snapshot every 6 minutes):

Comment Spike-1

Those spikes, some representing fairly long blocks of time, represent thousands of bogus comments being submitted into battellemedia.com simultaneously. For reference, load averages shouldn’t spike above 1.0 too often, or things get uncomfortable. This is why spammers - especially weblog comment spammers — make me insane.

Decided Battelle needed a second line of defense. We were reluctant about using a captcha for the usual accessibility reasons, so I went looking for a good Turing test system and found TinyTuring by Kevin Shay. As human detectors go, it doesn’t get much simpler than this - requires commenters to enter just a single randomly selected letter. A hidden salt prevents algorithmic detection. Required modification of three MT templates. So far, 100% effective. Yes, armies of underpaid Malaysian human spambots can still jam crap into the system manually, but those comments will still have Akismet to deal with.
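The general idea of a one-letter challenge with a hidden salt can be sketched like this. To be clear, this is not TinyTuring's code (it is a Movable Type plugin and works its own way); the function names, the nonce, and the HMAC construction are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical server-side secret; it never appears in the page sent to the browser.
SECRET_SALT = b"change-me"


def _sign(letter, nonce):
    """Salted digest of the expected letter, safe to place in a hidden form field."""
    message = f"{nonce}:{letter}".encode()
    return hmac.new(SECRET_SALT, message, hashlib.sha256).hexdigest()


def make_challenge():
    """Return (letter to show the commenter, nonce, token for the hidden field)."""
    letter = secrets.choice(string.ascii_lowercase)
    nonce = secrets.token_hex(8)
    return letter, nonce, _sign(letter, nonce)


def check_answer(typed, nonce, token):
    """True if the typed letter matches the one the token was issued for."""
    expected = _sign(typed.strip().lower(), nonce)
    return hmac.compare_digest(expected, token)
```

The form carries only the nonce and the token, so a bot that scrapes hidden fields learns nothing about the right answer without the salt. It does nothing against a bot that can read the prompt itself, which is why a second layer such as Akismet still matters.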

The cat and mouse game continues.

Music: Billy Martin :: Strangulation
October 21, 2006

MailScanner

Recently installed an update/add-on to cPanel for Birdhouse Hosting - a package called MailScanner which integrates the usual complement of open source spam and virus controls (SpamAssassin, ClamAV, Razor, DCC) into a combined package, provides more spam config controls for individual hosting accounts, and provides the admin with a bunch of reporting tools. I can now see at a glance (graphically) how many messages are passing through the server each day, what percentage of them have been flagged as spam or viruses, or drill down and get similar reports for individual domains or users. At left: A snapshot of mail and spam traffic on Birdhouse over the past week:

Highlights:
10,000 total messages processed on 10/16
77.8% of mail was flagged as spam today
(read that last one another way: less than 23% of the mail we’re spending money to process and handle is legitimate)

If you’re wistful for the good old days when you could use a “catch-all” address to receive mail bound for anything@yourdomain.com, note: 5,016 out of 6,449 messages received today were addressed to unknown email accounts on domains we handle. Which is why most hosts (including Birdhouse) strongly recommend against using catch-all addresses any more. Spammers 0wnz0r the ozone.

Music: Tom Glazer & Dottie Evans :: Constellation Jig
August 10, 2006

Extreme Telemarketing

Never thought I’d feel sympathy for a telemarketer, but get an earful of this. My heart goes out to the poor guy. Kind of. Despite the caller’s general craziness, she does raise a point with him that I’ve tried before in conversations with telemarketers: The practice violates the categorical imperative, from which all moral action derives (according to Kant, and I agree):

Act only according to that maxim by which you can at the same time will that it would become a universal law.

In other words, don’t do anything that you don’t think all other people should also be allowed to do in the same situation. In context, one should engage in telemarketing only if one believes that all marketers should be allowed to call people in their homes. Consider the massive amount of advertising around us at all times, and imagine that every advertiser pushed their product by calling people at home. Universalizing the practice of telemarketing to all practitioners would make the telephone utterly useless, since it would never stop ringing, much as e-mail spam has diminished the viability of e-mail (which is only rescued through the application of great piles of technology).

When faced with the categorical imperative (though of course the caller does not call it that :), the telemarketer starts to lie to cover his position, saying that marketers do call his home phone all day every day, and that he doesn’t mind a bit.

Unfortunately, the caller’s philosophically sound position is completely blown out of the water by absolutely insane levels of hysteria.

Music: Sylford Walker :: Deuteronomy
July 26, 2006

Spam Plants

Romanian-born computer artist Alex Dragulescu turns crap into gold: he’s developed a computational analysis system to transform ordinary spam into renderings of organic-looking plants (though some look more like sea anemones to me). via c|net:


For the Spam Plants, he parsed the data within junk e-mail–including subject lines, headers and footers–to detect relationships between that data. For example, the program draws on the numeric address of an e-mail sender and matches those numbers to a color chart, from 0 to 225. It needs three numbers to define a color, such as teal, so the program breaks down the IP address to three numbers so it can determine the color of the plant. The time a message is sent also plays a role. If it’s sent in the early morning, the plant is smaller, or the time might stunt the plant’s ability to grow.
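One guess at how the address-to-color step might work, sketched in Python. The article doesn't say which three numbers are used, so taking the first three octets of an IPv4 address as red, green and blue is purely an assumption, and `ip_to_rgb` and `rgb_hex` are hypothetical names.

```python
import ipaddress


def ip_to_rgb(address):
    """Map an IPv4 address to an (R, G, B) tuple using its first three octets."""
    octets = ipaddress.IPv4Address(address).packed  # four bytes, each 0-255
    return octets[0], octets[1], octets[2]


def rgb_hex(rgb):
    """Format an (R, G, B) tuple as a web-style hex color."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(rgb_hex(ip_to_rgb("0.128.128.7")))  # prints "#008080"
```

The sample address was picked because its first three octets happen to be the RGB values for teal. The fourth octet is simply ignored in this version.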

Dragulescu has also done similar projects with architecture, weblog text, transit, etc.

Music: Lou Reed and John Cale :: Nobody But You
July 1, 2006

reducer: bad ips -> firewall

At the end of my rope with server loads caused by weblog and email spammers. SpamAssassin and Akismet etc. may keep spam away from users, but all that stuff still needs to be processed (and we’re talking about a huge percentage of all traffic).

Recently switched from the APF firewall to ConfigServer’s excellent CSF, which is integrated into WebHost Manager (the admin back-end for cPanel systems), and got thinking — the most heavily trafficked blogs here are already using spam rating systems that track IPs. The right script could harvest and rank those IPs and load them into the firewall in near real-time. Spent the past few evenings building a shell script to do just that.

reducer: Harvests bad IP addresses from multiple sources and adds them to the CSF firewall for cPanel systems. This version works with WordPress and Movable Type weblogs, and optionally the exim ACL deny system. Future versions will scan other sources for bad IPs as well.
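The harvest-and-rank step looks roughly like this when sketched in Python. reducer itself is a shell script and this is not its code; the threshold, the function names, and the sample addresses are assumptions. The `csf -d` deny command is printed rather than executed, and its exact syntax should be checked against your CSF version.

```python
import ipaddress
from collections import Counter


def rank_bad_ips(sources, threshold=3):
    """Count IP occurrences across all sources; return (ip, hits) at or above threshold.

    `sources` is an iterable of IP lists, e.g. one list per blog's spam table.
    Anything that doesn't parse as an IP address is skipped.
    """
    counts = Counter()
    for ips in sources:
        for ip in ips:
            try:
                counts[str(ipaddress.ip_address(ip.strip()))] += 1
            except ValueError:
                continue
    return [(ip, hits) for ip, hits in counts.most_common() if hits >= threshold]


def deny_commands(ranked):
    """Build (but do not run) firewall deny commands for the ranked IPs."""
    return [f"csf -d {ip} reducer: {hits} spam hits" for ip, hits in ranked]


# Made-up sample data using documentation-reserved address ranges.
wordpress_spam = ["203.0.113.5", "203.0.113.5", "198.51.100.7", "not-an-ip"]
movabletype_spam = ["203.0.113.5", "198.51.100.7"]
for command in deny_commands(rank_bad_ips([wordpress_spam, movabletype_spam], threshold=2)):
    print(command)
```

Requiring several hits before blocking keeps a single misfiled comment from locking out a legitimate reader.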

Update, April 2008: Birdhouse Hosting has been running reducer system-wide for almost two years now, with great success. At this point, we wouldn’t even consider running a hosting business without it.

Download reducer here.

April 11, 2006

Blogosphere Suffers Spam Explosion

c|net on the increasingly difficult problem of fighting spam on weblogs:

Boing Boing would allow its readers to leave comments and engage in a discussion on the wildly popular blog, if it weren’t for spam.

The piece focuses more on problems bloggers themselves face:

“It is a major hassle,” Frauenfelder said. “It is just getting worse and worse. My fantasies of violent revenge against spammers become more lurid every week.”

than on problems caused for their web hosts, and is a superficial overview in many respects, but it’s good to see some mainstream attention to the problem, which consumes more of my time than I had ever imagined it would.

At this point, I’ve tried every approach under the sun for the Birdhouse bloggers: standard blacklists (a moving target), moderation and authentication (chilling effect on conversation), mod_security blacklists (hard to keep updated, resource intensive), javascript (ultimately hackable), referrer tracking (shuts out commenters behind certain firewalls)…

But I’ve never had it as easy as I have since switching to WordPress and setting up the distributed Akismet system, which has blocked more than 1,000 spams from this blog in the past two weeks without a single false positive, and while requiring very minimal system resources. Sounds like a lot, but some of my users average around one spam/trackback submission attempt per minute, 24×7. You do the math.

Music: The Flaming Lips :: What Is The Light?

December 28, 2005

Who Gets No Spam?

Lebkowsky posts about his mostly-rosy transition from Outlook to Thunderbird, but wonders why the spam controls aren’t more robust. “… and though the junk mail filters are clearly catching a large percentage of the umpty hundreds of spams that fall into my mail bucket every day, there’s a bunch more that the filters miss.”

What I don’t get is why people are still dealing with daily buckets of spam on the client side at all. It’s been years since most mail hosts began offering excellent server-side spam handling (Birdhouse included). I’ve found the combination of SpamAssassin + ClamAV + RulesDuJour to be tremendously effective. And don’t forget to disable your “catch-all” address, probably the most powerful single spam magnet you can have. After months of not landing a single false positive, I finally stopped using a server-side “Junk box” for monitoring at all - now I just set my spam threshold to 2.5 and let the systems delete spam before it ever hits my server-side mailbox.

Result: About 90% of the mail bound for my addresses is discarded without ever being seen by a human or handled by a mail client. What finally slips through the net is a grand total of about 3-5 spams a day.

On the TWiT podcast, John Dvorak gets teased regularly — by industry experts, no less — for his claim “I get no spam.” What’s so outlandish about that? If you’re still getting spam in your mail client, you probably just need to turn on the controls your mail host probably already has set up for you. And if your mail host doesn’t offer server-side spam controls, find one that does.

Music: Half Man Half Biscuit :: Bottleneck at Capel Curig
April 29, 2005

SpamLookup

Just installed Brad Choate’s SpamLookup for John Battelle’s MT installation, ditching MT-Blacklist for the time being. Looks simple on the surface, but dig into the options and you start to realize this is the next generation comment/trackback-fighting tool. Actually, it’s a whole toolbelt, including realtime distributed blacklists (which probably accomplish 95% of the dirty work alone), moderation levels and exceptions for various types of commenters, bannable wordlists, and a built-in “Passphrase” feature you can use as a human detector. This last being similar in concept to a captcha, but text-based rather than graphical. The commenter is required to answer a dirt-simple question such as “What is John’s name?,” which a bot would be hard-pressed to do. If I wasn’t having such great success with MT-Keystrokes on birdhouse, I’d install it here as well…

Music: Roland Kirk :: Bag’s Groove
April 14, 2005

NonJunk

Studying email headers of a spam turdlet that slipped through the net, found this in the headers, trying to pass as header lines added by SpamAssassin:

X-IMAPbase: 1113505409 1 NonJunk
Status: O
X-Status:
X-Keywords: NonJunk
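One way to catch this trick is to look for such headers on mail as it arrives from outside. The sketch below assumes that headers of this kind should only ever be written by software on your own server, so finding them on a freshly received message is suspicious. The `SUSPECT_HEADERS` list and the function name are made up for illustration.

```python
from email import message_from_string

# Headers assumed to be written only by local mail software (illustrative list).
SUSPECT_HEADERS = ("X-IMAPbase", "X-Keywords", "X-Status", "Status")


def forged_local_headers(raw_message):
    """Return the suspect header names present in an incoming raw message."""
    msg = message_from_string(raw_message)
    # msg[name] is None only when the header is absent; an empty value still counts.
    return [name for name in SUSPECT_HEADERS if msg[name] is not None]


sample = (
    "From: someone@example.com\n"
    "X-IMAPbase: 1113505409 1 NonJunk\n"
    "Status: O\n"
    "X-Status:\n"
    "X-Keywords: NonJunk\n"
    "\n"
    "Buy now.\n"
)
print(forged_local_headers(sample))
```

A non-empty result could feed a filter rule that strips those headers or bumps the message's spam score.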

The cat-n-mouse game is never-ending.

Music: blur :: country house
March 30, 2005

Tagging Non-English Spam

Have recently noticed a huge uptick in the amount of non-English (especially Chinese) spam, which slips through the SpamAssassin nets much more readily than English spam (at least it does in most Western SA setups; not sure how different things are for, say, Chinese hosts).

Turns out you can tell SpamAssassin to give higher ratings to messages written in languages other than those you’ve explicitly sanctioned. Higher ratings mean more likelihood of messages getting tossed to /dev/null or saved in a junk box. In your local.cf or user_prefs, just add:

ok_languages en de la th sv

(e.g.) to accept messages in English, German, Latin, Thai, and Swedish. Full list of language codes here. Works a treat.

Music: The Seeds :: 900 Million People Daily
March 19, 2005

Field Notes on Comment Registration

In order to respond to Birdhouse customers who want an answer to the question: “Why are you enforcing comment registration on Movable Type weblogs? Have you really exhausted all other options?,” I’ve put together this Brief History of Our Battle With Comment Spammers to summarize what we’ve done in the past, why it hasn’t worked, and why we think comment registration is our only remaining recourse.
(more…)