They're generally smart enough to handle the invisible box these days, so the approach I've taken to cutting down botspam on my CAPTCHA-less contact form (because CAPTCHAs punish the humans for the sins of the bots) is to do some content-based filtering and present a "Please correct ... and re-submit" message if a message trips one of the rules.
(Because, even if it's some kind of Mechanical Turk thing, the submitter probably doesn't have the authority to make the requested changes and is probably being paid per submission, so they will want to keep the pace up.)
Basically, there's spell-check, and grammar-check, and then there's this.
For example:
- If the message contains HTML or BBCode link markup, reject it with a message saying to use bare links because I'll receive the message as plain text. (This keeps confused humans from forcing you to read through raw markup and stops SEO spambots dead in their tracks because working around it would de-optimize results for other sites.)
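As a rough sketch of that rule in Python (the regex and helper name here are mine, not from any particular implementation), one regex covers both markup styles:

```python
import re

# Matches <a href=...> HTML links and [url=...] / [url]...[/url] BBCode tags.
LINK_MARKUP_RE = re.compile(r'<a\s+[^>]*href\s*=|\[/?url[\]=]', re.IGNORECASE)

def contains_link_markup(body: str) -> bool:
    """True if the message body uses HTML or BBCode link markup."""
    return LINK_MARKUP_RE.search(body) is not None
```

Bare links like `http://example.com/` sail through untouched, which is exactly the point.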
- If the message contains links pointing to a blacklist of the most common URL shortening sites, ask the user to please use the full URL instead. (This prevents humans from disguising where their links go and prevents spammers from using link shorteners as a way to do that or wrap their URLs in click-through analytics. You can also detect URL shorteners in fancier ways, but a blacklist works well enough and requires no HTTP requests.)
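A minimal sketch of the blacklist approach, with a deliberately incomplete set of shortener domains standing in for a real list:

```python
import re
from urllib.parse import urlparse

# Deliberately incomplete blacklist -- real lists run much longer.
SHORTENER_DOMAINS = {"bit.ly", "t.co", "goo.gl", "tinyurl.com", "ow.ly", "is.gd"}

URL_RE = re.compile(r'https?://\S+', re.IGNORECASE)

def contains_shortener(body: str) -> bool:
    """True if any URL in the body points at a known URL shortener."""
    for url in URL_RE.findall(body):
        host = (urlparse(url).hostname or "").lower()
        # The endswith() check also catches subdomains like www.tinyurl.com.
        if host in SHORTENER_DOMAINS or any(
                host.endswith("." + domain) for domain in SHORTENER_DOMAINS):
            return True
    return False
```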
- Iterate through the message body as a list of whitespace-separated "words", count URLs and non-URL words, and refuse any message where
num_of_urls > num_of_non_url_words * 2
(This forces humans to provide descriptions and trips up the simpler SEO link-stuffing bots.)
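That ratio check might look something like this (the whitespace-splitting and the URL heuristic are simplifications):

```python
import re

# A word counts as a URL if it starts like one.
WORD_URL_RE = re.compile(r'^(?:https?://|www\.)', re.IGNORECASE)

def too_link_heavy(body: str) -> bool:
    """Refuse messages whose URLs vastly outnumber their actual words."""
    words = body.split()
    num_of_urls = sum(1 for word in words if WORD_URL_RE.match(word))
    num_of_non_url_words = len(words) - num_of_urls
    return num_of_urls > num_of_non_url_words * 2
```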
- Use your programming language's support for querying Unicode tables to require that at least one third of the characters in submitted messages be within the character set for one of the languages you are literate in. For Western European languages, "within 7-bit ASCII" is a good approximation. (If it fails, present an "I only speak ______. Please include translations." message for humans. This will trip up spambots from Russia and East Asia.)
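Here's one way to approximate that in Python, using the 7-bit-ASCII shortcut rather than real Unicode script tables (the threshold default and helper name are mine):

```python
def mostly_readable(body: str, threshold: float = 1 / 3) -> bool:
    """True if at least `threshold` of the characters are 7-bit ASCII.

    A crude stand-in for "within the character set of a language I can
    read"; for Western European languages it's a fair approximation.
    """
    if not body:
        return True
    ascii_count = sum(1 for ch in body if ord(ch) < 128)
    return ascii_count / len(body) >= threshold
```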
- Extend your word-counter to refuse messages with fewer words than the shortest desirable message you can imagine. (This blocks spambots that are either broken or entirely reliant on the URL field for blog comment forms. Either way, they post a word of random gibberish or two in the message body.)
- Use a regular expression to disallow URLs or e-mail addresses in the subject line. (For extra efficacy, incorporate the Public Suffix List so you can also reliably detect and refuse bare domains in subject lines.)
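A sketch of the subject-line check; note that the suffix set here is a tiny stand-in, and a real implementation would load the actual Public Suffix List from publicsuffix.org:

```python
import re

EMAIL_RE = re.compile(r'[^\s@]+@[^\s@]+\.[^\s@]+')
SUBJECT_URL_RE = re.compile(r'(?:https?://|www\.)\S+', re.IGNORECASE)

# Stand-in suffix set; load the full Public Suffix List in real code.
PUBLIC_SUFFIXES = {"com", "net", "org", "io", "co.uk"}

def suspicious_subject(subject: str) -> bool:
    """True if the subject contains a URL, e-mail address, or bare domain."""
    if EMAIL_RE.search(subject) or SUBJECT_URL_RE.search(subject):
        return True
    for token in re.findall(r'[A-Za-z0-9.-]+\.[A-Za-z.]+', subject):
        parts = token.lower().strip('.').split('.')
        # A bare domain is any dotted token ending in a recognized suffix.
        for i in range(1, len(parts)):
            if '.'.join(parts[i:]) in PUBLIC_SUFFIXES:
                return True
    return False
```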
Those kill the vast majority of messages all on their own but, if you want to chase the long tail, some other "I wouldn't want this from a human either" tricks include:
- Refuse message bodies containing e-mail addresses and instruct the user to use the reply-to field in the form. (This trips up a common type of advertising copy from "Hey, site administrator. We're offering services for you." spam and they can always tell you the e-mail in their second message if their first message convinces you to reply.)
- Refuse message bodies containing the same URL more than once. (This trips up ad copy that puts the URL at the top and the bottom.)
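For example (the URL extraction here is deliberately crude, with a little trailing-punctuation cleanup so "http://x.example." and "http://x.example" count as the same link):

```python
import re
from collections import Counter

BODY_URL_RE = re.compile(r'https?://\S+', re.IGNORECASE)

def has_repeated_url(body: str) -> bool:
    """True if any single URL appears in the body more than once."""
    urls = [url.rstrip('.,;)') for url in BODY_URL_RE.findall(body)]
    return any(count > 1 for count in Counter(urls).values())
```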
- Also blacklist links to common pastebin sites. (If the user wants to send you a pastebin link, they can send it to you after they've opened up a conversation.)
- Refuse message bodies containing the word "unsubscribe" and ask the user to replace it with either a phrase like "remove me from your mailing list" or some other word that any human will recognize to mean the same thing but which isn't in the dictionary, like "de-subscribe" or "ex-subscribe". (This puts the bot between a "pretend to be a legitimate, compliant-with-the-law mailing" rock and a spam-filter hard place.)
- Refuse messages which say the domain name in the subject line or message body. (This trips up entry-level form-letter stuff like "UREGENT RE:ssokolow.com SERVICE EXPIRATION." Tell humans to please use the site's name instead.)
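This one is nearly trivial; a case-insensitive substring check does the job (the domain is hard-coded here purely for illustration):

```python
SITE_DOMAIN = "ssokolow.com"  # the receiving site's own domain

def mentions_own_domain(subject: str, body: str) -> bool:
    """True if the sender typed out the site's domain instead of its name."""
    return SITE_DOMAIN in (subject + " " + body).lower()
```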
Notice how none of these are as fragile as the keyword-based blacklisting of the '90s, all of them are things I wouldn't want from a human either (shades of the "What if it stops a human from doing these things?" "Mission ****ing accomplished." exchange in the comic from the Robot 9000 introduction), and you can always combine this with a proper CAPTCHA if it's not thorough enough.
For me, without a CAPTCHA, this leaves maybe one botspam every month or two. That could be narrowed down further, without annoying the average user, with tricks like "If the message contains a URL that isn't on this list of heavily moderated things (e.g. Wikipedia pages, IMDB entries, etc.), then (and only then) display a CAPTCHA" or "If the message contains words like 'telegram', 'signal', or the names of other messaging networks, present the user with a 'Congratulations! You used the secret word! You win a CAPTCHA!' message".