A group of open source developers has published code on Codeberg that can be used in defense of your web server (or home network). It’s called konterfAI, it runs anywhere Docker (or Ollama itself) runs, even on a Raspberry Pi, and it is amazingly simple.
konterfAI is a proof-of-concept model-poisoner for LLMs (Large Language Models) that generates nonsense (“bullshit”) content designed to degrade these models. Please note that AI crawlers or other tools that follow the guidelines in a robots.txt will not be affected by konterfAI.
Although it’s still a work in progress and not yet ready for production, it already demonstrates the concept of fighting fire with fire: the backend queries a tiny LLM running in Ollama with a high temperature setting to generate hallucinatory content.
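To make that concrete, here is a minimal Go sketch that asks a small model running in Ollama for text with an intentionally high temperature, using Ollama’s standard /api/generate endpoint. The model name and prompt are placeholders, and this is a sketch of the idea, not konterfAI’s actual code.

```go
// Minimal sketch (not konterfAI's code): query a tiny local model via Ollama
// with a very high temperature so the output drifts into incoherent nonsense.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type generateRequest struct {
	Model   string         `json:"model"`
	Prompt  string         `json:"prompt"`
	Stream  bool           `json:"stream"`
	Options map[string]any `json:"options"`
}

type generateResponse struct {
	Response string `json:"response"`
}

func hallucinate(prompt string) (string, error) {
	body, _ := json.Marshal(generateRequest{
		Model:   "tinyllama", // any tiny model will do; the choice here is an assumption
		Prompt:  prompt,
		Stream:  false,
		Options: map[string]any{"temperature": 2.0}, // high temperature -> hallucinatory output
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}

func main() {
	text, err := hallucinate("Write a short article about the history of garden gnomes.")
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```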
No well-behaved AI crawler or scraper will come to any harm; only the ill-behaved ones that do not respect the robots.txt will get stuck in this tar-pit-style AI trap. Amazing.
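The trick behind that guarantee can be sketched in a few lines: robots.txt disallows the trap path, so compliant crawlers never enter it, and anything that requests it anyway is by definition ignoring robots.txt. The /trap/ path and the handlers below are assumptions for illustration, not konterfAI’s implementation.

```go
// Sketch of the tar-pit idea: well-behaved crawlers read robots.txt and stay
// away from /trap/, so everything that ends up there is a robots.txt violator.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Compliant crawlers read this and never request anything under /trap/.
	http.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "User-agent: *\nDisallow: /trap/\n")
	})

	// Only clients that ignore robots.txt ever reach this handler.
	http.HandleFunc("/trap/", func(w http.ResponseWriter, r *http.Request) {
		log.Printf("robots.txt violator: ip=%s ua=%q path=%s", r.RemoteAddr, r.UserAgent(), r.URL.Path)
		fmt.Fprint(w, "<html><body><p>Endless generated nonsense goes here.</p></body></html>")
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```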
To keep performance at a reasonable level, konterfAI generates its hallucinations at startup, which takes 100% CPU or GPU for a short time but keeps the service lightweight during normal use; new texts are only created once an internal counter hits zero. Users can set up konterfAI to fit their needs and environment, from a Raspberry Pi protecting the home network to a full web server.
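A rough sketch of that pre-generation scheme, with hypothetical names rather than konterfAI’s actual implementation: a pool of hallucinated texts is filled in one expensive batch, a counter is decremented for every page served, and the LLM is only queried again once the counter reaches zero.

```go
// Sketch of counter-based pre-generation: the expensive LLM call happens in
// batches, and pages are served from the pool until the counter runs out.
package main

import (
	"fmt"
	"sync"
)

type hallucinationPool struct {
	mu      sync.Mutex
	texts   []string
	counter int
	refill  func() []string // e.g. a batch of calls to the local LLM
}

func (p *hallucinationPool) next() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.counter <= 0 || len(p.texts) == 0 {
		p.texts = p.refill()     // CPU/GPU-heavy, but only once in a while
		p.counter = len(p.texts) // serve this many pages before regenerating
	}
	p.counter--
	return p.texts[p.counter%len(p.texts)]
}

func main() {
	pool := &hallucinationPool{refill: func() []string {
		return []string{"nonsense A", "nonsense B", "nonsense C"} // stand-in for LLM output
	}}
	for i := 0; i < 5; i++ {
		fmt.Println(pool.next())
	}
}
```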
Update (2024-08-27): In recent weeks, konterfAI has progressed to version 0.3.2, a bugfix and stability release. “We have introduced a new logger (based on slog) which can output log as text and json. It can also completely disable the output (needed for tests, otherwise it would clog woodpecker)”. According to the developers, they have also “pumped up (unit test) coverage of the code to 90%.”
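The quoted changelog refers to Go’s standard log/slog package; the snippet below shows the three modes mentioned (text, JSON, and fully silenced output) using plain stdlib calls. It is ordinary slog usage, not konterfAI’s own logger wrapper.

```go
// Plain log/slog usage illustrating text output, JSON output, and a fully
// discarded logger (useful to keep test and CI output clean).
package main

import (
	"io"
	"log/slog"
	"os"
)

func newLogger(format string) *slog.Logger {
	switch format {
	case "json":
		return slog.New(slog.NewJSONHandler(os.Stdout, nil))
	case "none":
		// Discard everything, e.g. during tests.
		return slog.New(slog.NewTextHandler(io.Discard, nil))
	default:
		return slog.New(slog.NewTextHandler(os.Stdout, nil))
	}
}

func main() {
	logger := newLogger("json")
	logger.Info("request served", "path", "/trap/page-42", "ua", "BadBot/1.0")
}
```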
Prior milestones included Prometheus metrics and tracing with OpenTelemetry (in version 0.3.0).
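As a hedged illustration of what such Prometheus instrumentation typically looks like with the standard client_golang library (the metric and label names here are assumptions, not konterfAI’s actual metrics):

```go
// Counting tar-pit requests per user agent and exposing them on /metrics
// for Prometheus to scrape.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var trapRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "tarpit_requests_total",
		Help: "Requests served by the tar-pit, by client user agent.",
	},
	[]string{"user_agent"},
)

func main() {
	prometheus.MustRegister(trapRequests)

	http.HandleFunc("/trap/", func(w http.ResponseWriter, r *http.Request) {
		trapRequests.WithLabelValues(r.UserAgent()).Inc()
		w.Write([]byte("<html><body>generated nonsense</body></html>"))
	})
	http.Handle("/metrics", promhttp.Handler()) // scraped by Prometheus

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```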
Update (2024-08-12): In its newest version (source code archive), konterfAI now has a pretty detailed statistics function: you can see who is scraping your data for training, how often, what they got, and whether they respect the robots.txt.
The README for konterfAI 0.2.0 has all the details; a small code sketch of the grouping follows the excerpt:
“We now run a separate webserver for providing statistics.
This server shows you:
- the current configuration of your konterfAI instance
- requests grouped by
  - IP
  - User-Agent
- which of those above violate the rules defined in robots.txt”
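To show what such per-IP and per-User-Agent grouping amounts to, here is a rough Go sketch. The types and function names are hypothetical and not taken from konterfAI’s statistics code.

```go
// Grouping requests by (IP, User-Agent) and flagging robots.txt violations.
package main

import (
	"fmt"
	"strings"
	"sync"
)

type clientKey struct {
	IP        string
	UserAgent string
}

type stats struct {
	mu     sync.Mutex
	counts map[clientKey]int
}

func newStats() *stats { return &stats{counts: make(map[clientKey]int)} }

func (s *stats) record(ip, ua string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[clientKey{IP: ip, UserAgent: ua}]++
}

// violatesRobots reports whether a requested path falls under a Disallow rule.
func violatesRobots(path string, disallowed []string) bool {
	for _, prefix := range disallowed {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

func main() {
	s := newStats()
	s.record("203.0.113.7", "BadBot/1.0")
	s.record("203.0.113.7", "BadBot/1.0")
	for k, n := range s.counts {
		fmt.Printf("%s %q -> %d requests, robots.txt violated: %v\n",
			k.IP, k.UserAgent, n, violatesRobots("/trap/page-1", []string{"/trap/"}))
	}
}
```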
Apart from that, there are Docker images available on Docker Hub and Quay.io, and konterfAI now supports a number of languages for its stories.
Here are two screenshots of konterfAI at work:
This is a standard “story” with endless deep links to other great stories that an AI crawler will find on konterfAI-protected websites.
The statistics module is new in version 0.2.0 and shows active prompts, requests by user-agent, and requests by IP.
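The endless-link pattern from the first screenshot can be sketched as follows; the /trap/ prefix, page layout, and link paths are assumptions for illustration, not konterfAI’s actual output. Every generated page links to several further generated pages, so a crawler that follows them never runs out of URLs to fetch.

```go
// Every trap page embeds links to more trap pages, forming an endless chain
// of "stories" for crawlers that ignore robots.txt.
package main

import (
	"fmt"
	"log"
	"math/rand"
	"net/http"
)

func trapPage(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "<html><body><h1>Great story %s</h1><p>(hallucinated text would go here)</p><ul>", r.URL.Path)
	for i := 0; i < 10; i++ {
		// Each link points to another page under the same trap prefix.
		fmt.Fprintf(w, `<li><a href="/trap/story-%d">another great story</a></li>`, rand.Int63())
	}
	fmt.Fprint(w, "</ul></body></html>")
}

func main() {
	http.HandleFunc("/trap/", trapPage)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```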
Update (2024-08-05): konterfAI has received a deployment directory with example configurations for nginx and traefik reverse proxies.
Update (2024-08-04): German tech podcast TechnikTechnik has an interview on konterfAI with open-source journalist Markus Feilner.