19 Commits

Author SHA1 Message Date
79f4c702b2
feat(trainer): make test output more obvious 2021-03-02 12:53:19 -05:00
359b7cef3c
fix(trainer): use correct path for csv 2021-03-02 12:43:47 -05:00
e2a8e0154a
feat: add automated model creation 2021-03-02 12:38:30 -05:00
1836b6dad7
feat(trainer): run on net5 and accept csv path
Hopefully will use this to automate model deployment.
2021-03-02 04:52:36 -05:00
7ef4a487e2
feat(data): ignore "blu" and add more 2021-02-26 12:07:19 -05:00
3eb7007186
fix(trainer): replace newlines automatically 2021-02-24 20:01:35 -05:00
da6accf432
feat: add "come" and "join" as stop words 2021-02-21 15:50:05 -05:00
84644d2806
feat(data): add more data
Also pull out stop words into field.
2021-02-20 19:25:15 -05:00
b36377c16e
feat: add normalisation to pipeline
Add a normalisation step for messages to the ML pipeline. Computed
properties now run on the raw data (which is actually partially
normalised by the compute context) before that step. Properties which
rely on symbols (e.g. "B>") previously could not work properly because
normalisation happened before they had access to the input.
2021-02-17 21:45:09 -05:00
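The ordering described in the commit above can be sketched roughly as follows. This is only an illustration in Python, not the repository's code: `normalise`, `contains_marker`, and `featurise` are hypothetical names, and the normalisation rule shown (lowercase, symbols to spaces) is an assumption.

```python
import re


def normalise(message: str) -> str:
    """Hypothetical normaliser: lowercase, then turn symbol runs into one space."""
    return re.sub(r"[^a-z0-9]+", " ", message.lower()).strip()


def contains_marker(raw: str) -> bool:
    """Hypothetical computed property that depends on the symbols in "B>"."""
    return "B>" in raw


def featurise(raw: str) -> dict:
    # Compute symbol-dependent properties first, while the raw text is intact.
    features = {"contains_marker": contains_marker(raw)}
    # Normalise afterwards, as a later step of the pipeline.
    features["text"] = normalise(raw)
    return features
```

Running `contains_marker` on already normalised text would always return `False` in this sketch, which is the kind of failure the commit message describes.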
c65fb94ad6
feat: better handle punctuation
Certain symbols are turned into one space so the model sees multiple
words instead of one. Previously "[RP]Hi" would turn into "RPHi" and
be its own token. Now it turns into "RP" and "Hi", counting as two
tokens. This change increased the model's accuracy.

Also make "18", "http", "https", and LGBT-related words into stop
words (meaning they're ignored). Each of these stop words made the
model more accurate and reduced unwanted bias.

Messages destined for ML are now normalised by the plugin in the same
way the model's input is for training. This should make the results
come closer to expected.
2021-02-17 20:01:34 -05:00
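A minimal sketch of the punctuation and stop-word handling described in the commit above, written in Python for illustration. The separator set and the `tokenize` helper are assumptions, not the repository's code, and only the stop words the message names explicitly are listed.

```python
from __future__ import annotations

import re

# Assumed set of symbols that act as word separators (hypothetical).
SEPARATORS = re.compile(r"[\[\]()<>{}.,!?:;/\\|~*_+\-]+")

# Stop words named in the commit message; the real list is longer.
STOP_WORDS = {"18", "http", "https"}


def tokenize(message: str) -> list[str]:
    """Turn each run of separator symbols into one space, split, drop stop words."""
    spaced = SEPARATORS.sub(" ", message)
    return [tok for tok in spaced.split() if tok.lower() not in STOP_WORDS]
```

With these assumptions, `tokenize("[RP]Hi")` yields the two tokens `"RP"` and `"Hi"` rather than the single token `"RPHi"`.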
Anna Clemens
2229a0534a
feat: use separate process for classifying 2021-01-30 16:02:37 -05:00
47f6da6ffb
fix(trainer): use LF newlines for real 2021-01-02 17:28:17 -05:00
1e8e512a7b
fix(trainer): use LF newlines 2021-01-02 16:59:40 -05:00
1e33ba0487
feat(trainer): have trainer sort data automatically 2021-01-02 16:59:00 -05:00
09995d3cf9
chore(trainer): only save model on full run 2021-01-02 07:31:34 -05:00
96ef48f9db
refactor(trainer): use correct schema, though it shouldn't matter 2020-12-28 22:04:50 -05:00
908133bdf8
refactor: put computation in interface
This basically undoes the benefits of the previous commit. May end up being reverted.
2020-12-28 21:48:31 -05:00
e24c54cfbc
refactor(training): compute properties in pipeline
Hopefully this no longer requires the data structure to be updated when new computed properties are added. It should also reduce duplication and make it easier to make bigger changes to the model without needing to update the plugin.
2020-12-28 21:01:35 -05:00
482cc23c7d
feat(trainer): add trainer to actual repo 2020-12-28 20:14:19 -05:00