
Honestly, whoever has an idea for a spam detection measure for Mastodon, and by that I do mean an implementation, get in touch with me, I'll pay for it.

I've been thinking about solutions for the past few days but the more I think about them the more they appear pointless.


my idea for spam is that you simply, do not allow it

delete all accounts

Can you explain in a bit more detail what kind of spam you expect? And how traditional social media handle it? And what options you have already looked in to?


Defining an account as suspicious when it has no local followers can be circumvented by just pre-following them, using account age can be circumvented with sleeper accounts, blacklisting URLs does nothing when the spam does not include URLs, checking for duplicate messages sent to different recipients can be circumvented by randomizing parts of the message...


Check for message similarity %?

Yes, correct. However, it is not a defence against all the servers that are not using it!

so essentially what you're stuck with is the problem of how to deal with *remote* spam?

well, that means whitelists or ocaps.

there is no other solution for push-based networks. email spam is just a thing we put up with. sms / phone spam is another thing that we can't really do anything about.

the only real way to *prevent* spam is to prevent unaudited and unapproved communications from being delivered to you... unfortunately. everything else is a half-measure.

check for duplicate mass messages by text matching over a period of time and if the same post happens more than x times in a row, mute the account for review and retroactively send a delete request to the duplicated posts.
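
A minimal sketch of this idea, assuming a per-account sliding window of recent posts; the window size, similarity cutoff, and repeat limit are all made-up thresholds, not anything Mastodon ships:

```python
from collections import deque
from difflib import SequenceMatcher

SIMILARITY_CUTOFF = 0.9   # 90% similar counts as a duplicate (assumption)
REPEAT_LIMIT = 3          # this many near-duplicates triggers review (assumption)
WINDOW = 20               # posts remembered per account (assumption)

class DuplicateDetector:
    def __init__(self):
        self.recent = {}  # account -> deque of recent post texts

    def observe(self, account, text):
        """Return True when the account should be muted for review."""
        posts = self.recent.setdefault(account, deque(maxlen=WINDOW))
        dupes = sum(
            1 for old in posts
            if SequenceMatcher(None, old, text).ratio() >= SIMILARITY_CUTOFF
        )
        posts.append(text)
        return dupes + 1 >= REPEAT_LIMIT  # +1 counts the new post itself
```

The retroactive delete request would then be issued for the posts stored in the account's window at the moment the detector trips.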

Actually Discourse does some basic prevention here by having different user levels which are bound to rate-limits and the ability to post external links.

Not perfect of course, but maybe also worth thinking about.

In general I guess the answer is: Small instances, because that tends to increase the number of moderators per user. And hope that the community can take care of it.

Have you looked into federating block lists? Possibly making instance wide blocks more transparent and allowing others to subscribe to them?

It's not an instant fix, but I don't even think this problem is NP-complete. There are too many variables and opinions.

I could easily see a few instances maintaining these lists and everyone else just following them.

Spam comes from innocent servers where the spammer signs up. This has little to do with domain blocks.

Can instance wide bans only target entire domains?

Trust me, you don't want a globally shared account blocklist. Nobody bothers to oversee those when copying/subscribing. Your name put on there by an enemy? That might actually ruin most of the network for you.

I suppose it could focus a lot of power if everyone followed one list.

What if the list isn't curated by individuals, but rather created from blocks happening on the instance?
Anyone getting blocked by a significant % of the population on an instance for example. Then removed from the list later.
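
A rough sketch of deriving such a list from organic blocks rather than manual curation; the 5% threshold is an arbitrary assumption:

```python
THRESHOLD = 0.05  # share of active local users blocking (assumption)

def derive_blocklist(blocks, active_users):
    """blocks: dict mapping a blocked account to the set of local users
    blocking it. Accounts drop off again as blocks are withdrawn, since
    the list is recomputed from current state rather than appended to."""
    return sorted(
        account for account, blockers in blocks.items()
        if len(blockers) / active_users >= THRESHOLD
    )
```

Because the list is recomputed, removal "later" happens automatically once the blocking share falls back under the threshold.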

I still like the concept of federating because the times I do see spam it's usually on another instance and I'm sure lots of people already blocked it, but I still have to block it too.

This was specifically the wilw problem on Twitter.

Corollary: blocklists of any sort need an appeals process.


I don't think it's realistic to think there can be a technical solution to completely eliminate spam. But raising its cost, which can be done by each of these solutions, is still worthwhile because they will make spamming harder.

The event that sparked this discussion is one dedicated person spamming the network. There is suspicion that the person is somehow keeping up with development discussions and changing tactics accordingly. Therefore, unless the solution can help against that type of spammer, it's kind of pointless. There are plenty of tools against more mundane spam.

Ok. Forcing spammers to create sleeper accounts and sleeper instances would still help reduce the rate of abuses after previous instances and accounts have been blocked. Especially if the amount of messages they can send to people they haven't interacted with before is made proportional to their age. Or am I missing something?
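
The age-proportional quota could look something like this; the per-day allowance and the cap are invented numbers for illustration:

```python
from datetime import datetime, timedelta

MESSAGES_PER_DAY_OF_AGE = 5  # assumption
MAX_QUOTA = 100              # assumption: cap for well-aged accounts

def stranger_quota(created_at, now):
    """How many messages this account may send to people it has never
    interacted with, scaled by account age so sleeper accounts must be
    aged before they become useful to a spammer."""
    age_days = max((now - created_at) // timedelta(days=1), 0)
    return min(age_days * MESSAGES_PER_DAY_OF_AGE, MAX_QUOTA)
```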

(We've also discussed shared blocklists already. I'm now convinced they come with a lot of problems.)

I wonder if such behavior should even be lumped in with "spam"? What you outline sounds particularly adversarial (but then again, of course all spam adapts to countermeasures)

One person gaming existing mechanisms definitely sounds more like a problem suited for (better?) moderation mechanisms to me.

Trying to combat a dedicated person with ML or regexes or anything like that sounds utterly hopeless to me.

E-mail deals with spam using Bayesian filters or machine learning. The more training data there is, the more accurate the results; a monolith like GMail benefits from this greatly. Mastodon's decentralization means everyone has separate training data and starts from scratch, which means high inaccuracy. It also means someone spamming a username could potentially lead to any mention of that username being considered spam due to the low overall volume of data, unless you strip usernames.


However, if you strip usernames from the checked text, the spammer could write messages using usernames...
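
A toy sketch of such a filter with mention stripping; the tokenizer, Laplace smoothing, and log-odds scoring are standard naive Bayes, but nothing here reflects an actual Mastodon implementation:

```python
import math
import re
from collections import Counter

MENTION = re.compile(r"@[\w.]+")  # strip @mentions before tokenizing

def tokens(text):
    return re.findall(r"[a-z']+", MENTION.sub(" ", text.lower()))

class BayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, text):
        for t in tokens(text):
            self.counts[label][t] += 1
            self.totals[label] += 1

    def spam_score(self, text):
        """Log-odds that text is spam; positive means more likely spam."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        score = 0.0
        for t in tokens(text):
            p_spam = (self.counts["spam"][t] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.counts["ham"][t] + 1) / (self.totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score
```

Stripping mentions before tokenizing is exactly what makes the username-in-body attack described in the next post possible: the classifier never sees the spammed name at all.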


just mute the account on the client side

do what WTDWTF does

there's no secret magic to it
  • users require a published post to edit their profile
  • users with zero or negative upvotes require mod approval to post
  • registering an account from an IP that is already associated with an account requires admin approval
about a month into this policy the spammers completely gave up

We don't have a true emergency with spammers signing up on a given instance. Approval-only registrations mode is a good tool for weeding those out. The problem we are experiencing is the spammer signing up on random open instances and sending spam remotely. Therefore, solutions based on IPs or captchas are not appropriate. Even if we release the perfect protection against local spammers, servers that haven't upgraded will continue to make this a problem.

We need to stop thinking about handling spam going out and start thinking about spam coming in, then. My instinct here is to read individual posts on their way in and handle spam detection at that level (likely on a separate lower-priority thread/task/whatever to prevent lagging out incoming posts).

I think one important aspect that has been pointed out by @ben is that users can be asked to classify the people they follow, and this can be used to compute some kind of credibility score for new profiles, in order to limit their activity on a timescale at which moderation is effective. I don't see any automated solution, tbh.

if an instance has open registration and refuses to update their service to deal with spam, I don't think it's unfair to defederate with them.

admins are responsible for the servers they run, and if those servers are the source of a disproportionate amount of spam, it doesn't matter whether the root cause is malice or simply inactivity from the admins. the end result is the same.

Honestly I'd pay to see someone do that, and then promptly ban them for it 😂

The more I think about email-like detection systems, the more I think that as long as the implementation is sound, it will help a lot with curbing common spam as the network grows and older instances and instances with lots of users amass bigger datasets and higher confidence levels in spam detection.

Imperfect? Yeah. An arms race? Yeah. But it's a start.

Content warning: An idea for spam containing links

would it be possible to provide some kind of built in trainable spam detector for Mastodon, and have an opt-in option to share data with a global pool of training data? that way instances could collaborate to fight spam

I'm going to hazard a guess that > 90% of spammers aren't going to try to be clever.

We don't need to start from scratch on each instance. Tools like rspamd and SpamAssassin come with pre-trained sets.

That means we can make community efforts to build a repository of spam messages in order to pre-train filters.

And of course it's not super effective, but if we really want spam protection, we have to start somewhere.

#wordpress is dealing with a lot of spam too, why don't you implement #akismet on #mastodon?
It's free for noncommercial use.

Another free alternative on wordpress is:

Affinity groups can train reliable classifiers with a smaller corpus than groups without affinities, so message quantity isn't an obstacle. I could build a milter in about a week. Markov-based classifiers are a bit dodgy on short messages, so I'd have to test some other algorithms.

Feature files can be transferable and non-reversible. An admin could use this data structure from one classifier to presort a training set for another, tuning for local requirements

Dealing with spam is hard. I worked at an email anti-spam company for 10 years. Bayesian was one method, but was troublesome and needed some hand-holding; we often cleared the bayesian data because it started to false-positive a lot. Most of our effective spam blocking came from: greylisting, DNS blacklists, a spam rules database that we added to based on the spam we saw, and users reporting messages (which we had tooling to analyse to then create more rules).

what about distributed model training, where models are trained on each instance and then shared between them? Kinda like Google does with Assistant on phones...

Content warning: not an implementation

you could federate the machine learning data too. But instances would have to have been seen by another instance for at least x days, or be weighted by the number/age of their users, before being accepted, to prevent abuse of this.

Why not just federate the training data?

The @matrix folks are also still looking for a decentralised reputation system against spam.
So you might ask them about their ideas so far (not sure their instance is live again, so better drop by in one of their :matrix: rooms)

there's a protocol for federated learning. It's hard though.

note that i know very little about the implementation, but is there any way to flag only replies that are consistently similar (perhaps 50% or more identical)? i think it'll be impossible to completely eradicate them, but if there's a way to at least have a system in place that notifies admins of "suspicious" behavior that can be detected, then it's at least getting the lazy ones

This is often the difficulty of designing deterrents versus walls; deterrents aren't meant to be uncircumventable. They're meant to be a pain in the ass to circumvent, at least a bit. And circumventions can be classed as a possible sign, because human review is still a part of the process.

Duplicate messages, frex -- if the last few dozen messages of a user have extremely similar word use, that's measurable without admins monitoring directly, by checking differences between messages.

Every measure can be circumvented. Facebook is banning 8M accounts per day and still some Russian spies get through. No offense, but you won't invent anything better than Facebook. Just don't reject measures that aren't perfect. It will never be perfect. In spam, making it harder, or more expensive, is good enough.

what about all of these things, and a meta-scoring system that needs a message to not fail x number of criteria (perhaps criteria that could be modified per instance)?

i.e. if you have almost no followers AND the account is very new AND hit a few keywords on a list AND are making repetitive messages, you get blocked/filtered, or maybe just a couple of those metrics is enough, etc.
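
One way to sketch such a per-instance meta-score: each heuristic contributes a weight, and a message is held once the total crosses a configurable threshold. The weights and the threshold of 4 here are pure assumptions, exactly the "devil in the details" the next reply points out:

```python
DEFAULT_WEIGHTS = {
    "no_local_followers": 1,
    "account_under_a_week_old": 1,
    "keyword_hit": 2,
    "repetitive_messages": 3,
}

def meta_score(signals, weights=DEFAULT_WEIGHTS):
    """signals: set of heuristic names that fired for this message."""
    return sum(weights[name] for name in signals)

def should_filter(signals, threshold=4, weights=DEFAULT_WEIGHTS):
    return meta_score(signals, weights) >= threshold
```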

Great but the devil is in the details. Which criteria and which scores and which thresholds? I certainly have no clue

For invite-created accounts: track the source of the invite? If one user keeps on handing out invites to spam accounts, this feels like something I'd like to know as an admin.

Can't we just increase the number of moderators?
Or whenever a new user signs up, assign an older user to mentor them. They would be responsible for introducing the new user and making sure that they do not violate any rule.

These filters could still be effective, even if they aren't 100% effective. It would cut down on spam even if it doesn't completely remove it. Sleeper accounts might be dealt with (in part) by removing accounts with no activity after a certain amount of time. Not foolproof, but a mitigation.

wouldn't these measures at least take out the script kiddies and the inexperienced? 🤔

If you can't solve the problem permanently, making it harder still seems like a step forward.

Personally I think enabling admins to auto-report based on a filter for certain words in names/toots (possibly with fuzzy matching) would probably fix a lot of the problem, but I imagine it'd be tricky to implement.
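
The fuzzy-matching part could be sketched with stdlib string similarity, tolerating small obfuscations like "v1agra"; the 0.8 cutoff is an assumption:

```python
from difflib import SequenceMatcher

def fuzzy_hit(text, banned_words, cutoff=0.8):
    """True when any word in text closely matches a banned word, so
    lightly obfuscated spellings still trigger an auto-report."""
    for word in text.lower().split():
        for banned in banned_words:
            if SequenceMatcher(None, word, banned).ratio() >= cutoff:
                return True
    return False
```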

Pretty much this. "Spam" is highly contextual, and it's ultimately a matter of *effects of behaviour*, which is difficult to pre-vet.

Not impossible, but hard.

Volume is a strong signal, as is keyword similarity and source. Graph analysis helps a lot here.

Dupe checks can be based on tuple matches, which mitigates the randomisation defence, though storing longer patterns (3, 4+ words) is very expensive.

There's also reboost attacks -- distinguishing legit from nefarious becomes interesting.

Also: collectively, these measures all increase spammer costs.
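
The tuple-match idea can be sketched by hashing every 3-word shingle and comparing sets: randomizing a few words only changes the shingles that touch them, so near-duplicates still overlap heavily. Storing hashes rather than the tuples themselves is one way to blunt the storage cost mentioned above. The 0.5 threshold is an assumption:

```python
def shingles(text, n=3):
    """Hashes of every n-word window of the text."""
    words = text.lower().split()
    return {hash(tuple(words[i:i + n])) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicate(text1, text2, threshold=0.5):
    return jaccard(shingles(text1), shingles(text2)) >= threshold
```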

someone posted a GitHub link a day or so ago to a bot designed to take care of spam and the account it came from - seemed pretty effective. Maybe something can be gleaned from its methods, or it could be included in the default install.

If there's an account that has posted the same message three times in a row, automatically report it, and if it posts it ten times in a day, automatically silence
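
As a sketch, the two thresholds above (three in a row → report, ten per day → silence) could be tracked like this; day bucketing is simplified to a caller-supplied key:

```python
from collections import defaultdict

class RepeatWatch:
    def __init__(self):
        self.last = {}                 # account -> (text, consecutive count)
        self.daily = defaultdict(int)  # (account, day, text) -> count

    def observe(self, account, text, day):
        """Return 'report', 'silence', or None for this post."""
        prev_text, streak = self.last.get(account, (None, 0))
        streak = streak + 1 if text == prev_text else 1
        self.last[account] = (text, streak)
        self.daily[(account, day, text)] += 1
        if self.daily[(account, day, text)] >= 10:
            return "silence"           # ten identical posts in a day
        if streak >= 3:
            return "report"            # three identical posts in a row
        return None
```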

I think especially in a federated system we might be able to learn a thing or two about spam detection and handling from e-mail - messages are far shorter here but spam patterns remain similar. That said, it's still a race to learn and counteract spammer patterns.

The biggest problem from a moderation perspective is that replies are harder to spot - while proper public toots show up in TLs, you're only gonna see the replies if it's you - we need to be able to flag users for that kind of behavior.

Even @lain agrees MRFs are not a sufficient tool against the kind of spam we've been seeing recently.

are we talking about the spam for that one specific site that has identical copy-pasted messages from multiple accounts?

because if MRF can't handle that I'm not sure what MRF can actually handle

The spammer has first changed URLs, then used shorteners, then simply gave up on linking to anything--they are just textual messages now.

Alone, maybe. Modern anti-spam systems are, I assume, a lot more than just one big mathy system like an ML network or one smart statistics algorithm.

Spam is going to happen - it's happening now, in a big way - and one way or another we're going to have to implement a variety of systems to counter it. I suppose the trial by fire of a lot of individual third-party systems is what is going to dictate what really fixes our spam problem at the pre-report level.

Just thinking out loud here, but have you considered looking into existing research papers on the subject? A quick search for “spam detection research paper” brings up many relevant results.

you can make a spam filter and become rich and smug like this

History has proven that there is only one antispam method that works: whitelisting all input.

Second to that, you need to be the size of Google or Microsoft and run an AI learning company via data you've collected by completely monopolizing the entire ecosystem and forcing all legitimate traffic to flow through your training model.

Please lurk the NANOG mailing list for proof of this being an unsolved problem.

If the spammer is abusing naively maintained instances, the only solution I can see is treating a sudden high volume of incoming messages from said type of instance as a red flag, and temporarily flagging the instance/messages for moderation before accepting more from there.
Of course determining a "naive instance" beyond checking if they're invitation only is a bit tricky itself.
And then there's the question of what qualifies as "high traffic".
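
One way to make "high traffic" concrete is a per-instance sliding window over the last hour, flagging when arrivals jump far above that instance's norm; the burst multiplier here is an invented constant:

```python
from collections import deque

class InstanceRateFlag:
    def __init__(self, baseline_per_hour, burst_factor=10):
        """burst_factor is an assumption: how far above its usual rate a
        remote instance may go before deliveries are held for review."""
        self.limit = baseline_per_hour * burst_factor
        self.arrivals = deque()  # timestamps (seconds) within the last hour

    def accept(self, timestamp):
        """Return False when the instance should be flagged for moderation."""
        cutoff = timestamp - 3600
        while self.arrivals and self.arrivals[0] < cutoff:
            self.arrivals.popleft()
        self.arrivals.append(timestamp)
        return len(self.arrivals) <= self.limit
```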

They're separate concerns and not necessarily mutually exclusive, but I think having a solid moderation API so folks can easily build tooling around spam and abuse response would go a long way. Automated detection would be nice, sure, but that's a much more difficult problem to solve.

Mastodon posts are the same general size as WordPress comments. Akismet may work here.

But if not Akismet, some protocol to allow like-minded instances to share information about spam may also work.

One of the technologies developed in Japan is a spam mail filtering technology called "Selective SMTP Rejection (S25R)".

This is filtering using regular expression for reverse DNS lookup.

There is a possibility that it can be used for Social Network.

Perhaps a system based on the `ratio` between messages sent/received from a server that progressively degrades the throughput at which messages are accepted if below a certain threshold? I would expect a "normal" ratio to probably be close to one
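
A sketch of that throttle: scale the accepted delivery rate by the sent/received balance with a server, with a floor so brand-new peers stay usable. The clamp values are assumptions:

```python
def accepted_rate(base_rate, msgs_sent_to_them, msgs_received_from_them):
    """Throttle a push-heavy peer: a server we receive from far more than
    we send to gets its accepted rate degraded toward a 10% floor."""
    if msgs_received_from_them == 0:
        return base_rate                   # no history yet: full rate
    ratio = msgs_sent_to_them / msgs_received_from_them
    return base_rate * max(min(ratio, 1.0), 0.1)
```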

The current network's users aren't completely naive: maybe send some educational messages telling users not to follow the instructions or links in any spam post? When the spam gets the spammer zero clicks, they will eventually stop.

Hashcash? Maybe require it protocol-wise from new users, until they have somehow proven they are legitimate (by number of posts, interactions, followers/followees)?

Clients, including the web frontend, will have to implement it (which will not be super hard for most client authors), and it will place a (configurably large) burden on the client for mass posting.
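
The core mechanism is tiny: the client hunts for a nonce whose hash has a required prefix, while the server verifies with a single hash. This is a simplified hex-prefix variant for illustration, not the original hashcash header format:

```python
import hashlib

def verify(message, nonce, difficulty):
    """Cheap server-side check: one SHA-256 per message."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

def mint(message, difficulty):
    """Client-side work: expected cost grows 16x per difficulty step."""
    nonce = 0
    while not verify(message, nonce, difficulty):
        nonce += 1
    return nonce
```

The "configurably large burden" from the post above maps directly to `difficulty`: each extra required hex zero multiplies the expected minting work by 16 while verification stays constant.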

(I'm talking about the original idea from 1990, cf.

back in 2000 I used heuristic spam detection for my emails, it worked well enough. A system could mark messages with a spam probability, paired with the possibility to whitelist accounts. Moderation could either be dealt with by an admin/moderator, or by individual users on a per-user basis (per-user whitelist and spam threshold setting). Contents of toots by whitelisted accounts could be used for training.

This is a bit complicated usability wise but would probably work rather well.

a class of service like relays that maintains a ruleset for a spamc/spamd like implementation?

have you looked into the way WordPress does it? They have plugins which do spamdetection with central infrastructure.

Maybe it would be an option to provide an API endpoint, where admins could connect their own scripts to e.g. check IPs, Email-Adresses against known Spam-Databases or the like. That way, admins could find their own solution or could use solutions provided by other admins.

Keep your money. We went that route with email and the best we ever came up with was heuristics (learning algorithms), but the spammers soon found ways around even that. The only way to stop spam is to not allow it in the first place. You achieve this by closing off any communications path that isn't controlled by whitelist or moderation. There is no other way. Maybe you can find one but I've been fighting these guys for 25 years now(*) including my work in this space for large commercial providers(**) and that's the conclusion I arrived at.

* Google "green card spam".
** Google "America Online". We blocked spammers. We applied learning algorithms using ~100 billion samples of known spam to seed the algorithms. We tracked them down and took them to court. We took their ISPs to court. And still they came.

Secure Scuttlebutt and Bitmessage lend themselves to whitelist-first communication, and are appealing for that reason.

The best spam filter I've found is going back to written communication sent through the postal system. The time it takes to write a letter and the few cents it takes to send it are tall barriers for most people.

:jrbd: 📫

That appeal is also its weakness: It's unfitting for reaching larger audiences and makes it more difficult to join

We agree - different tools for different jobs. Some suited for smaller and closed groups, some suited for larger and open groups.

For instance, the Church of the SubGenius has a presence on Mastodon with its degrees of openness, but there's also "ScrubGenius" at another location on another platform that is entirely closed to non-members.


do you know the "Jodel" Smartphone app ? They have a pretty effective system of up- and downvotes, basically the community decides what they want to see. but for Mastodon a system like that would be a major paradigm change and, as such, cause havoc in the community (as any major change)

make any toot cost 1 ct. Regular use will be single-digit dollars. Spamming will be hella expensive. At least half-joking.

Any longer write-up on the current problem and what you're looking at?

Also: inverting the question and considering what's #notSpam may be useful.