NoSoliciting

Author	SHA1	Message	Date
Anna Clemens	79f4c702b2	feat(trainer): make test output more obvious	2021-03-02 12:53:19 -05:00
Anna Clemens	359b7cef3c	fix(trainer): use correct path for csv	2021-03-02 12:43:47 -05:00
Anna Clemens	e2a8e0154a	feat: add automated model creation	2021-03-02 12:38:30 -05:00
Anna Clemens	1836b6dad7	feat(trainer): run on net5 and accept csv path Hopefully will use this to automate model deployment.	2021-03-02 04:52:36 -05:00
Anna Clemens	7ef4a487e2	feat(data): ignore "blu" and add more	2021-02-26 12:07:19 -05:00
Anna Clemens	3eb7007186	fix(trainer): replace newlines automatically	2021-02-24 20:01:35 -05:00
Anna Clemens	da6accf432	feat: add "come" and "join" as stop words	2021-02-21 15:50:05 -05:00
Anna Clemens	84644d2806	feat(data): add more data Also pull out stop words into field.	2021-02-20 19:25:15 -05:00
Anna Clemens	b36377c16e	feat: add normalisation to pipeline Add a step to normalise messages to the ML pipeline. This ensures computed properties run on the raw data (which is actually partially normalised by the compute context). This prevents properties which rely on symbols (e.g. "B>") from being unable to work properly when normalisation happens before they have access to the input.	2021-02-17 21:45:09 -05:00
Anna Clemens	c65fb94ad6	feat: better handle punctuation Certain symbols are turned into one space so the model sees multiple words instead of one. Previously "[RP]Hi" would turn into "RPHi" and be its own token. Now it turns into "RP" and "Hi", counting as two tokens. This change increased the model's accuracy. Also make "18", "http", "https", and LGBT-related words into stop words (meaning they're ignored). Each of these stop words made the model more accurate and reduced unwanted bias. Messages destined for ML are now normalised by the plugin in the same way the model's input is for training. This should bring the results closer to what is expected.	2021-02-17 20:01:34 -05:00
Anna Clemens	2229a0534a	feat: use separate process for classifying	2021-01-30 16:02:37 -05:00
Anna Clemens	47f6da6ffb	fix(trainer): use LF newlines for real	2021-01-02 17:28:17 -05:00
Anna Clemens	1e8e512a7b	fix(trainer): use LF newlines	2021-01-02 16:59:40 -05:00
Anna Clemens	1e33ba0487	feat(trainer): have trainer sort data automatically	2021-01-02 16:59:00 -05:00
Anna Clemens	09995d3cf9	chore(trainer): only save model on full run	2021-01-02 07:31:34 -05:00
Anna Clemens	96ef48f9db	refactor(trainer): use correct schema, though it shouldn't matter	2020-12-28 22:04:50 -05:00
Anna Clemens	908133bdf8	refactor: put computation in interface This basically undoes the benefits of the previous commit. May end up being reverted.	2020-12-28 21:48:31 -05:00
Anna Clemens	e24c54cfbc	refactor(training): compute properties in pipeline Hopefully no longer requires the data structure to be updated when new computed properties are added. This should also reduce duplication and make it easier to make bigger changes to the model without needing to update the plugin.	2020-12-28 21:01:35 -05:00
Anna Clemens	482cc23c7d	feat(trainer): add trainer to actual repo	2020-12-28 20:14:19 -05:00

19 Commits
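
The punctuation handling described in c65fb94ad6 can be illustrated with a small sketch. It is not the project's code: the function names, the set of separator symbols, and the lowercasing step are assumptions made for illustration, and the stop-word list includes only the words named in the commit messages above ("18", "http", "https", "come", "join").

```python
import re

# Hypothetical separator set, chosen for illustration only; the commit
# message does not say which symbols the project treats as separators.
SEPARATORS = re.compile(r"[\[\]\(\)<>{}|/\\~*_\-+=:;,.!?]+")

# Only the stop words named in the commit messages; the real list is longer.
STOP_WORDS = {"18", "http", "https", "come", "join"}


def normalise(message: str) -> str:
    """Turn each run of separator symbols into one space, then collapse whitespace."""
    spaced = SEPARATORS.sub(" ", message)
    return " ".join(spaced.split())


def tokenise(message: str) -> list[str]:
    """Normalise, lowercase (an assumption), split on spaces, drop stop words."""
    return [
        word
        for word in normalise(message).lower().split()
        if word not in STOP_WORDS
    ]


if __name__ == "__main__":
    print(normalise("[RP]Hi"))
    print(tokenise("[RP]Hi, come join us: https://example.com"))
```

With this sketch, "[RP]Hi" yields the two words "RP" and "Hi" rather than the single token "RPHi", which matches the behaviour the commit message describes. Properties that depend on symbols such as "B>" would have to be computed before a step like this runs, which is the ordering that b36377c16e introduces.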