Commit Graph

8 Commits

Author SHA1 Message Date
c65fb94ad6
feat: better handle punctuation
Certain symbols are turned into one space so the model sees multiple
words instead of one. Previously "[RP]Hi" would turn into "RPHi" and
be its own token. Now it turns into "RP" and "Hi", counting as two
tokens. This change increased the model's accuracy.

Also make "18", "http", "https", and LGBT-related words into stop
words (meaning they're ignored). Each of these stop words made the
model more accurate and reduced unwanted bias.

Messages destined for ML are now normalised by the plugin in the same
way the model's input is normalised for training. This should bring the
plugin's results closer to what is expected.
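
The normalisation described in this commit can be sketched roughly as below. This is a minimal illustration, not the plugin's actual code: the function name `normalise`, the exact symbol set in `SYMBOLS`, and the case-insensitive stop-word match are all assumptions, and only the stop words named explicitly in the message are listed.

```python
import re
from typing import List

# Assumption: the commit does not list which symbols are replaced,
# so this set is purely illustrative.
SYMBOLS = re.compile(r"[\[\](){}<>|/\\:]+")

# Only the stop words spelled out in the commit message; the real list
# contains more entries that are not enumerated there.
STOP_WORDS = {"18", "http", "https"}


def normalise(text: str) -> List[str]:
    """Split a message into tokens the way the commit describes."""
    # A run of symbols becomes one space, so "[RP]Hi" yields two
    # tokens ("RP", "Hi") rather than the single token "RPHi".
    spaced = SYMBOLS.sub(" ", text)
    tokens = spaced.split()
    # Stop words are dropped entirely (matched case-insensitively here,
    # which is an assumption).
    return [t for t in tokens if t.casefold() not in STOP_WORDS]


print(normalise("[RP]Hi"))
```

Running the same function over both the training data and the messages the plugin sends for classification is what keeps the two inputs consistent.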
2021-02-17 20:01:34 -05:00
fd9e9330fc
refactor: use new new syntax 2021-02-16 12:15:00 -05:00
deac55d19b
feat(data): add more data 2021-01-02 13:09:00 -05:00
7593b42e00
refactor: fix up some code and prepare for sdk 2020-12-19 20:19:03 -05:00
34c679b189
refactor: update naming 2020-12-18 00:43:07 -05:00
fc079a6553
chore: update to new Lumina 2020-11-23 13:22:19 -05:00
49cd0b1a28
refactor: make normalisation faster 2020-09-05 14:31:29 -04:00
7b8d2fa4bc
feat: add option to filter unjoinable ilvl PFs 2020-08-23 10:59:48 -04:00