There are many aspects of operational security (opsec) to be mindful of when posting in online communities or revealing any part of your identity. In this community, there is an immediate need for good operational security, and those who aren’t careful enough will face the consequences of their mistakes.
There are obvious points to secure, such as not linking real identities to those in this space, but one of the most overlooked issues is a form of fingerprinting through matching vocabulary.
Each person has a unique writing style, and it becomes more apparent the more they write. If you only write short sentences or keep discussions brief, it’s harder to match your language patterns with another identity, as there are fewer chances for distinctive patterns to emerge. However, if you contribute extensively to this space (and such individuals are often targeted using this approach), you are likely to write large amounts of text, such as forum posts or PDFs. This can provide ample material to cross-reference your anonymous identity in this sphere with your real-life identity or other online personas. This kind of matching (stylometry) used to be done manually, but nowadays it is increasingly automated with specialized AI.
This isn’t a theoretical concern; it has actively been used to identify members of the 3D-printed gun community. Because they lacked knowledge of this threat, it was only a matter of time before their real identities were exposed.
If you are in a location where you will be actively persecuted for believing in the personal freedom to bear arms, you need to keep all aspects of opsec in mind, including language pattern matching.
Scrambling
There is a method of language anonymization (referred to as “scrambling” here) where a tool (either LLM-based or algorithmic) is used to reword and rearrange text to remove identifying features. Identifying features include how often you use specific words, the breadth of your vocabulary, regional language (British English, American English, etc.), symbol and punctuation habits, and spacing (for example, the number of spaces after a period).
If you intentionally change your writing style, it may be harder to identify you, but many identifying features are habits you don’t notice in your own writing, so they go unchanged.
The most consistent way to change language patterns is with tools. A recent option is LLMs. Instructing an LLM to rewrite text in formal or informal American English (for example) will make the text significantly harder to attribute to you.
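To make the list of identifying features more concrete, here is a rough sketch of how a few of them could be measured. It is not any specific attribution tool; the function name `style_profile` and the tiny `BRITISH_MARKERS` word list are made up for illustration, and real stylometry uses far more features than this.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of British spellings (illustration only).
BRITISH_MARKERS = {"colour", "favour", "organise", "centre", "defence"}


def style_profile(text):
    """Collect a few crude stylistic features from a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {
        "word_count": len(words),
        # Share of distinct words: a rough proxy for vocabulary breadth.
        "vocab_richness": len(counts) / total,
        # Most frequently used words.
        "top_words": counts.most_common(5),
        # How often each punctuation mark appears.
        "punctuation": Counter(ch for ch in text if ch in ",.;:!?-()\"'"),
        # Non-overlapping occurrences of two consecutive spaces.
        "double_spaces": text.count("  "),
        # Regional spelling hints found in the text.
        "british_spellings": sorted(set(words) & BRITISH_MARKERS),
    }
```

Running this on two long texts by the same author will often give similar profiles, which is the kind of overlap that scrambling is meant to break.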
However, LLMs introduce a new privacy concern: as with any online service, you should not take an AI provider’s word that it will not store or share what you submit. There are a couple of workarounds, though.
The best and most straightforward option in terms of privacy is hosting a local model. Ideally, you want a language model that is uncensored (nothing GPT-based) and that works well without requiring extreme hardware. You should be able to run a decent model on a good CPU with 32GB of RAM or a good GPU with 16GB+ of VRAM.
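As a minimal sketch of the local approach: the post does not name a runtime, so this assumes Ollama is serving a model on its default local endpoint. The model name `your-local-model`, the prompt wording, and the function names are placeholders, not recommendations.

```python
import json
import urllib.request

# Assumption: an Ollama server is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_scramble_request(text, model="your-local-model",
                           register="formal American English"):
    """Build the JSON payload asking the model to rewrite `text`."""
    prompt = (
        f"Rewrite the following text in {register}. "
        "Preserve the meaning, but change word choice, sentence structure, "
        "and punctuation habits. Return only the rewritten text.\n\n"
        f"{text}"
    )
    # stream=False asks for a single JSON reply instead of streamed chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def scramble(text, **kwargs):
    """Send the text to the local model and return the rewritten version."""
    body = json.dumps(build_scramble_request(text, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

Because the request only goes to `localhost`, the text never leaves your machine. Always read the output before posting, since a model can change the meaning as well as the style.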
If your text is not intended to be private, such as being posted to a forum, you can use a site that works on Tor, such as duck.ai (DuckDuckGo’s AI service), which works without needing to provide any personal information. This is also a good option but should not be used for any private content not intended for public release.
Scrambling doesn’t necessarily have to be used for all text, even if you are a high-risk individual. However, it is wise to have it as a defense, especially for longer forms of writing.
Another good tip is to always remove EXIF data from pictures you post. https://imagy.app/remove-exif-data is a pretty good one, though I personally do research before trusting what people on the internet say, cuz there are a lot of bad actors.
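For anyone who would rather do it offline than trust a website, here is a rough sketch in plain Python (no extra libraries). It only handles JPEG files and only drops the Exif APP1 segment; other metadata such as XMP or IPTC is left in place, so treat it as an illustration rather than a complete scrubber. The function name `strip_exif_jpeg` is made up for this example.

```python
import struct


def strip_exif_jpeg(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its Exif APP1 segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("malformed JPEG segment header")
        marker = data[i + 1]
        if marker == 0xDA:
            # Start of scan: the rest is image data, copy it unchanged.
            out += data[i:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        segment = data[i:i + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        i += 2 + length
    out += data[i:]
    return bytes(out)
```

Usage would be something like reading the file with `open(path, "rb")`, passing the bytes through `strip_exif_jpeg`, and writing the result to a new file. Check the output with a metadata viewer before posting.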
Slightly outdated but works well:
ty, I usually just remove it myself since I know it’s done correctly, but an app/website that does it is much easier, which makes it more likely to be used. I’ll save that GitHub one n try it out
Interested in what AI models you would recommend for local hosting.
DeepSeek models or Hermes models may be suitable.