MTRNord d38f4dac17
Move bot to seperate repo
2023-01-07 16:13:48 +01:00


Matrix-Spam ML

This project consists of tooling to generate a spam detection model for the Matrix protocol.

It uses TensorFlow to build the model in Python, and provides a Rust server that exposes a simple API for interacting with the model and extending it.

The code base is moving fast, so expect it to change rapidly.

Usage

Training

To train the model, you need a set of labeled data. It is located at ./input/MatrixData and is a TSV file.

Once the data is in place, run python3 model_v2.py. This trains the model and saves it to ./model/. Make sure TensorFlow is installed first.
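
The README does not document the column layout of the TSV, so the following is only a minimal sketch of reading it with Python's standard csv module. It assumes (hypothetically) a label in the first column, the message text in the second, and no header row. The function name load_labeled_tsv is illustrative and not part of this repository.

```python
import csv
import io
from typing import Iterable, List, Tuple


def load_labeled_tsv(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated (label, message) rows.

    Hypothetical layout: column 0 is the label, column 1 is the message text,
    and there is no header row. Lines with fewer than two columns are skipped.
    """
    # QUOTE_NONE keeps quote characters inside messages as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows: List[Tuple[str, str]] = []
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed line
        rows.append((row[0].strip(), row[1].strip()))
    return rows


if __name__ == "__main__":
    sample = io.StringIO("spam\tWin a free prize now\nham\tSee you at the meetup\n")
    for label, message in load_labeled_tsv(sample):
        print(f"{label} -> {message}")
```

To read the real file, pass an open handle, for example with open("./input/MatrixData", newline="", encoding="utf-8") as f: rows = load_labeled_tsv(f), after checking that the assumed layout actually matches.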

Notes about the data

Remove all URLs, HTML tags, and newlines from the data, and collapse repeated whitespace into a single space. Each of these can easily reduce accuracy if left in.
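
The cleanup steps above can be sketched with regular expressions. This is a rough approximation, not necessarily the preprocessing that model_v2.py performs: the patterns are deliberately simple (for example, a stray "<" ... ">" pair in ordinary text would also be removed), and the name clean_message is hypothetical.

```python
import re

# Deliberately simple patterns; they approximate, not fully parse, HTML and URLs.
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
WHITESPACE_RE = re.compile(r"\s+")


def clean_message(text: str) -> str:
    """Strip HTML tags, URLs and newlines, then collapse repeated whitespace."""
    # Remove tags first so URLs inside attributes (e.g. href) go with the tag.
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    # \s also matches newlines, so this turns line breaks into single spaces.
    text = WHITESPACE_RE.sub(" ", text)
    return text.strip()


if __name__ == "__main__":
    print(clean_message("Hello <b>world</b>\n\nvisit https://example.com  now"))
    # prints: Hello world visit now
```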

Running the server

To start the server, run cargo run --release. The server listens on port 3000.

If you don't see any log output, try prepending RUST_LOG=info to the command, i.e. RUST_LOG=info cargo run --release.

API

POST /test

This endpoint takes a JSON body with the following format:

{
    "input_data": "This is a message to be classified"
}
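
A minimal client sketch for this endpoint using only Python's standard library. It assumes the server started with cargo run --release is reachable at http://localhost:3000; the helper names build_request and classify are hypothetical.

```python
import json
import urllib.request

# Assumption: a local server on the default port. Adjust for your deployment.
SERVER_URL = "http://localhost:3000/test"


def build_request(message: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the /test input format."""
    body = json.dumps({"input_data": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(message: str, url: str = SERVER_URL, timeout: float = 10.0) -> dict:
    """Send the message to the server and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(message, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, classify("This is a message to be classified") should return a dictionary containing the input_data and score keys of the endpoint's response format.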

It returns a JSON response in the following format, where score is a floating-point number:

{
    "input_data": "This is a message to be classified",
    "score": 1.1349515e-24
}
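
If you consume the response in Python, a small typed wrapper keeps the two documented fields explicit. This is a sketch: the names ClassificationResult and parse_response are hypothetical, and because the README does not specify the range or direction of score, the code does not interpret it.

```python
import json
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """Typed view of a POST /test response."""

    input_data: str
    score: float


def parse_response(raw: str) -> ClassificationResult:
    """Decode the JSON body returned by POST /test."""
    payload = json.loads(raw)
    # json.loads already yields a float for 1.1349515e-24; float() also
    # normalises an integer literal such as 0 if the server ever sends one.
    return ClassificationResult(
        input_data=str(payload["input_data"]),
        score=float(payload["score"]),
    )


if __name__ == "__main__":
    result = parse_response(
        '{"input_data": "This is a message to be classified", "score": 1.1349515e-24}'
    )
    print(result.score)
```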

Unlike with the training data, you do not have to strip URLs from the input. However, stripping HTML tags may yield better results.
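
As a sketch of stripping HTML before sending a message, the following extracts the text of an HTML fragment (such as the formatted body of a Matrix message) with Python's built-in html.parser. The helper name html_to_text is hypothetical. It keeps the text of every element, so it does not drop nested content you might prefer to exclude.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and similar
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    # Text nodes are joined with spaces, so a tag inside a word splits it.
    return " ".join(" ".join(parser.parts).split())
```

Joining text nodes with spaces keeps words from separate elements (for example consecutive paragraphs) apart, at the cost of splitting a word that is interrupted by an inline tag.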

Ethical use/How the usage of the model is intended

The model is trained mainly on SMS and Matrix spam. It is not currently checked for racism or other forms of discrimination against groups, and it has not yet been evaluated how exactly it reacts in various scenarios. Please keep this in mind when using the model.

Additionally, the model was designed as a warning system for admins, not as an automod. This is important because a warning system can have much wider tolerances, while an automod should be very certain before taking action against people.
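
To illustrate the difference in tolerances, here is a sketch of how a consumer of the score might gate its reactions. Everything here is an assumption: the README does not state the score's range or direction, so this treats it as a spam probability in [0, 1] where higher means more likely spam, and the threshold values and the names Action and decide are hypothetical, not values recommended for this model.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    WARN_ADMINS = "warn_admins"
    AUTOMOD = "automod"


# Hypothetical thresholds, assuming a higher score means "more likely spam".
# A warning system can afford a low bar; an automated action needs a much
# higher one because false positives hit real people.
WARN_THRESHOLD = 0.5
AUTOMOD_THRESHOLD = 0.99


def decide(score: float, automod_enabled: bool = False) -> Action:
    """Map a score to an action; automated action is opt-in."""
    if automod_enabled and score >= AUTOMOD_THRESHOLD:
        return Action.AUTOMOD
    if score >= WARN_THRESHOLD:
        return Action.WARN_ADMINS
    return Action.NONE
```

Keeping automod_enabled off by default mirrors the intended use of the model as a warning system for admins.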

While following this section isn't mandatory, I hope you keep it in mind and use this model ethically, not for discrimination in the network.

Future Plans

  • Mjolnir plugin/patch for collecting spam and using it to retrain the model.
  • Balancing the sample data
  • Synapse plugin to use as an automute bot across the whole homeserver (HS).
  • Mjolnir plugin that allows applying the suggestions

Support

For support, you can join #matrix-spam-ml:midnightthoughts.space on Matrix.