33 Commits

Author SHA1 Message Date
Anna 7ba91fbd48
refactor: move to net5 2021-08-24 14:32:04 -04:00
Anna 7c4130288d
fix(trainer): sort import 2021-07-19 20:32:46 -04:00
Anna cdcc5bd0cc
feat(trainer): add import feature 2021-07-19 16:15:31 -04:00
Anna dd6b2aa64b
feat(trainer): add import mode 2021-07-17 22:20:21 -04:00
Anna 218f93b2ec
feat(trainer): add normalise mode and accept base64
Windows sucks.
2021-05-20 13:49:58 -04:00
Anna 238407c879
fix(trainer): ignore numbers 2021-04-23 13:21:49 -04:00
Anna 3767514d26
fix(trainer): ignore gg 2021-04-15 16:13:48 -04:00
Anna ce98e27056
refactor: use nameof and normal enum order 2021-04-02 15:11:39 -04:00
Anna adf77877c1
feat(trainer): ignore word ffxiv 2021-03-22 18:23:08 -04:00
Anna 2a0ef7a921
feat(trainer): ignore "mounts" 2021-03-05 13:50:16 -05:00
Anna 00817f97dd
fix(trainer): remove auto-translate textvalue artifacts 2021-03-03 20:36:53 -05:00
Anna bf34559e63
fix(trainer): clean up some warnings 2021-03-02 23:01:16 -05:00
Anna 56b652fad3
fix(trainer): accept more invalid input interactively 2021-03-02 22:58:22 -05:00
Anna a336ac3342
feat: add explainer to test results 2021-03-02 13:25:05 -05:00
Anna 79f4c702b2
feat(trainer): make test output more obvious 2021-03-02 12:53:19 -05:00
Anna 359b7cef3c
fix(trainer): use correct path for csv 2021-03-02 12:43:47 -05:00
Anna e2a8e0154a
feat: add automated model creation 2021-03-02 12:38:30 -05:00
Anna 1836b6dad7
feat(trainer): run on net5 and accept csv path
Hopefully will use this to automate model deployment.
2021-03-02 04:52:36 -05:00
Anna 7ef4a487e2
feat(data): ignore "blu" and add more 2021-02-26 12:07:19 -05:00
Anna 3eb7007186
fix(trainer): replace newlines automatically 2021-02-24 20:01:35 -05:00
Anna da6accf432
feat: add "come" and "join" as stop words 2021-02-21 15:50:05 -05:00
Anna 84644d2806
feat(data): add more data
Also pull out stop words into field.
2021-02-20 19:25:15 -05:00
Anna b36377c16e
feat: add normalisation to pipeline
Add a step to normalise messages to the ML pipeline. This ensures
computed properties run on the raw data (which is actually partially
normalised by the compute context). This prevents properties which
rely on symbols (e.g. "B>") from being unable to work properly when
normalisation happens before they have access to the input.
2021-02-17 21:45:09 -05:00
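
The ordering described in the commit above (computed properties first, normalisation second) can be illustrated with a minimal Python sketch. This is not the repository's code: `contains_buy_marker`, `normalise_text`, and `featurise` are hypothetical names, and the normalisation rule shown (every non-alphanumeric run becomes one space) is an assumption made only for illustration.

```python
import re


def contains_buy_marker(raw: str) -> bool:
    # Hypothetical computed property that depends on the symbol pair "B>".
    return "B>" in raw


def normalise_text(raw: str) -> str:
    # Assumed normalisation: replace runs of non-alphanumerics with a space,
    # then collapse whitespace. The real rules are not given in the commit.
    return " ".join(re.sub(r"[^0-9A-Za-z]+", " ", raw).split())


def featurise(raw: str) -> dict:
    # Compute symbol-dependent properties on the raw message first...
    features = {"contains_buy_marker": contains_buy_marker(raw)}
    # ...and only then normalise the text that the model will tokenise.
    features["text"] = normalise_text(raw)
    return features


if __name__ == "__main__":
    message = "B> cheap mounts"
    print(featurise(message))
    # Running the property after normalisation would miss the marker:
    print(contains_buy_marker(normalise_text(message)))
```

If the order were reversed, the ">" would already be stripped by the time the property ran, which is the failure the commit message describes.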
Anna c65fb94ad6
feat: better handle punctuation
Certain symbols are turned into one space so the model sees multiple
words instead of one. Previously "[RP]Hi" would turn into "RPHi" and
be its own token. Now it turns into "RP" and "Hi", counting as two
tokens. This change increased the model's accuracy.

Also make "18", "http", "https", and LGBT-related words into stop
words (meaning they're ignored). Each of these stop words made the
model more accurate and reduced unwanted bias.

Messages destined for ML are now normalised by the plugin in the same
way the model's input is for training. This should make the results
come closer to expected.
2021-02-17 20:01:34 -05:00
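
The tokenisation change described in the commit above can be sketched in Python as follows. This is an illustrative approximation, not the project's implementation: the set of symbols in `SYMBOLS` is an assumption (the commit only says "certain symbols"), `STOP_WORDS` contains only the three stop words the message names explicitly, and `tokenise` is a hypothetical helper name.

```python
import re

# Assumed symbol set; the commit does not list which symbols are replaced.
SYMBOLS = re.compile(r'[\[\](){}<>|/\\.,!?:;"*~_+=-]')

# Only the stop words spelled out in the commit message; the LGBT-related
# terms are not enumerated there, so they are left out of this sketch.
STOP_WORDS = {"18", "http", "https"}


def tokenise(message: str) -> list[str]:
    # Turn each symbol into one space so adjacent words split apart.
    spaced = SYMBOLS.sub(" ", message)
    # Split on whitespace and drop stop words (compared case-insensitively).
    return [t for t in spaced.split() if t.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print(tokenise("[RP]Hi"))
    print(tokenise("18+ https://example.com [RP]Hi"))
```

Under these assumptions "[RP]Hi" yields the two tokens "RP" and "Hi" rather than the single token "RPHi", matching the behaviour the commit message reports.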
Anna Clemens 2229a0534a
feat: use separate process for classifying 2021-01-30 16:02:37 -05:00
Anna 47f6da6ffb
fix(trainer): use LF newlines for real 2021-01-02 17:28:17 -05:00
Anna 1e8e512a7b
fix(trainer): use LF newlines 2021-01-02 16:59:40 -05:00
Anna 1e33ba0487
feat(trainer): have trainer sort data automatically 2021-01-02 16:59:00 -05:00
Anna 09995d3cf9
chore(trainer): only save model on full run 2021-01-02 07:31:34 -05:00
Anna 96ef48f9db
refactor(trainer): use correct schema, though it shouldn't matter 2020-12-28 22:04:50 -05:00
Anna 908133bdf8
refactor: put computation in interface
This basically undoes the benefits of the previous commit. May end up being reverted.
2020-12-28 21:48:31 -05:00
Anna e24c54cfbc
refactor(training): compute properties in pipeline
Hopefully this no longer requires the data structure to be updated when new computed properties are added. This should also reduce duplication and make it easier to make bigger changes to the model without needing to update the plugin.
2020-12-28 21:01:35 -05:00
Anna 482cc23c7d
feat(trainer): add trainer to actual repo 2020-12-28 20:14:19 -05:00