NoSoliciting

Author	SHA1	Message	Date
Anna	fe0f4d8232	fix: spacify before counting words	2021-05-03 16:24:57 -04:00
Anna	0af172e0a6	feat: stop trying to separate static sub messages	2021-03-04 16:48:11 -05:00
Anna	33f5421e79	chore: fix modes	2021-02-24 20:23:28 -05:00
Anna	c3df0a1f8e	feat: add normalisation to pipeline. Add a step to normalise messages to the ML pipeline. This ensures computed properties run on the raw data (which is in fact partially normalised by the compute context). Properties that rely on symbols (e.g. "B>") therefore keep working, instead of breaking because normalisation removed those symbols before the properties could see the input.	2021-02-17 21:45:09 -05:00
Anna	d00b3b0845	feat: better handle punctuation. Certain symbols are turned into a single space so the model sees multiple words instead of one. Previously "[RP]Hi" would turn into "RPHi" and be its own token; now it turns into "RP" and "Hi", counting as two tokens. This change increased the model's accuracy. Also make "18", "http", "https", and LGBT-related words into stop words (meaning they are ignored). Each of these stop words made the model more accurate and reduced unwanted bias. Messages destined for ML are now normalised by the plugin in the same way the model's input is normalised for training, which should bring the results closer to what is expected.	2021-02-17 20:01:34 -05:00
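The symbol handling and stop words from d00b3b0845, the pipeline ordering from c3df0a1f8e, and the "spacify before counting words" fix from fe0f4d8232 can be sketched together as below. This is a minimal Python illustration under stated assumptions, not the plugin's code. The symbol set (anything that is not a word character or whitespace), the lowercasing step, and the names `spacify`, `word_count`, `normalise`, `featurise`, `Features`, and `has_b_arrow` are all hypothetical. The LGBT-related stop words are left out because the log does not list them.

```python
import re
from dataclasses import dataclass

# Assumption: any run of characters that are neither word characters nor
# whitespace counts as "certain symbols". The real symbol set is not in the log.
SYMBOLS = re.compile(r"[^\w\s]+")

# Stop words named in commit d00b3b0845. The LGBT-related words it also
# mentions are not listed there, so they are omitted here.
STOP_WORDS = frozenset({"18", "http", "https"})


def spacify(text: str) -> str:
    """Turn each run of symbols into a single space."""
    return SYMBOLS.sub(" ", text)


def word_count(text: str) -> int:
    """Count words after spacifying, so "[RP]Hi" counts as two words."""
    return len(spacify(text).split())


def normalise(text: str) -> list[str]:
    """Spacify, lowercase (an assumption), split, and drop stop words."""
    return [w for w in spacify(text).lower().split() if w not in STOP_WORDS]


@dataclass
class Features:
    has_b_arrow: bool  # hypothetical computed property that needs a symbol
    tokens: list[str]  # normalised tokens passed on to the model


def featurise(raw: str) -> Features:
    # Computed properties read the raw text first, because normalise()
    # would turn the ">" in "B>" into a space.
    has_b_arrow = "B>" in raw
    return Features(has_b_arrow=has_b_arrow, tokens=normalise(raw))
```

For example, `featurise("B> [RP]Hi http://x")` keeps `has_b_arrow` true while its tokens become `["b", "rp", "hi", "x"]`. If `normalise` ran first, the "B>" check would only ever see "b" and always fail, which is the ordering problem c3df0a1f8e describes.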