Threat detection and ML backtesting

I’m asking this question on the assumption that our threat detection rate will improve over time (and our false positive rate will fall), given 1) the growing developer and security research community around Forta, 2) improving docs and best practices for writing bots, and 3) underlying improvements in our machine learning algorithms.

With all that in mind, do we currently do any backtesting of our bots and ML algos against known hacks and exploits covered on Forta Network? Having subscribed to Forta bots for a few months at this point, I have noticed a fairly high false positive rate, even for the “high-signal” bots and their “HIGH” and “CRITICAL” alerts.

If we have done such a study, it would be great to publish the results here for the benefit of users and security researchers. If not, now might be a good time to consider such a backtesting study and/or implementing these studies on a regular, recurring basis to see where there might be areas of improvement etc.


I think testing detectors on past attacks is important. I would go further, though: one needs to analyze each attack to understand the gaps in each bot. Currently, a group of researchers within the Forta community conducts this analysis to determine:

  • did the attack detector feed trigger? If not, why not?
  • what alerts should have been raised by base bots (e.g. a flashloan was used, but the flashloan bot didn’t trigger)?
  • what gaps exist (e.g. a new attack technique was used for which no bot exists)?

In addition, for high precision bots (e.g. the attack detector) each alert raised is analyzed to further increase precision.

So backtesting and analysis is being performed. I believe the insights ought to be shared publicly to ensure the precision/recall of the bots is understood.
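For anyone unfamiliar with how such a backtest would score a detector, here is a minimal sketch. The addresses and counts are invented for illustration; a real study would replay archived chain data and match actual bot alerts against confirmed incidents.

```python
# Hypothetical backtest scoring: compare a bot's alerted addresses against a
# labeled set of attacker addresses from confirmed historical incidents.
# All data below is made up for illustration purposes.

def precision_recall(known_attackers, alerted_addresses):
    """Return (precision, recall) for one bot over one backtest window.

    known_attackers:   set of attacker addresses from confirmed incidents.
    alerted_addresses: set of addresses the bot raised alerts on.
    """
    true_positives = known_attackers & alerted_addresses
    precision = len(true_positives) / len(alerted_addresses) if alerted_addresses else 0.0
    recall = len(true_positives) / len(known_attackers) if known_attackers else 0.0
    return precision, recall

# Toy example: 3 confirmed attacks; the bot alerted on 4 addresses, 2 correctly.
attacks = {"0xAttackerA", "0xAttackerB", "0xAttackerC"}
alerts = {"0xAttackerA", "0xAttackerB", "0xBenign1", "0xBenign2"}
p, r = precision_recall(attacks, alerts)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Run on a recurring basis, trend lines of these two numbers per bot would show exactly where detection is improving and where false positives remain a problem.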

In addition, I was thinking we should engage a public 3rd party testing vendor (AVTest, MITRE, etc.) to evaluate product offerings within the web3 threat monitoring space objectively. Usually these companies will artificially generate an attack and assess the detection efficacy of products, so products that don’t share precision/recall numbers publicly can be brought into the fold. This is pretty standard in the web2 space.


Not sure if this would be a good use of treasury/foundation funds, but sponsoring data analytics grants (similar to bot-building contests / hackathons) may not be a bad idea, if you think crowdsourcing insights and approaches from the security community could be additive to the internal work already being done.

Yes, absolutely. Some kind of independent, third-party benchmarking service would be great for crypto security products. And on the “independence” front, ideally this vendor or service provider should not also have a competing security product. I’m not as familiar as you are with the security vendor landscape and who might fit the bill here… that said, I think this is an awesome idea, Christian!
