36 Commits

Author SHA1 Message Date
Anna 2804490fc6 refactor: update for api 9 2023-09-28 20:59:42 -04:00
Anna 576634fa03 chore(trainer): change l1 and l2 regularisation 2022-08-29 22:10:33 -04:00
Anna dd26a5738c fix(trainer): handle weirdly-formatted emails 2021-09-19 17:56:21 -04:00
Anna 3511693208 refactor: move to net5 2021-08-24 14:32:04 -04:00
Anna fc89e7df79 fix(trainer): sort import 2021-07-19 20:32:46 -04:00
Anna 21c7e01097 feat(trainer): add import feature 2021-07-19 16:15:31 -04:00
Anna 78c5f8f8d2 feat(trainer): add import mode 2021-07-17 22:20:21 -04:00
Anna b738f801c8 feat(trainer): add normalise mode and accept base64
Windows sucks.
2021-05-20 13:49:58 -04:00
Anna 0a96858447 fix(trainer): ignore numbers 2021-04-23 13:21:49 -04:00
Anna 7ee3550400 fix(trainer): ignore gg 2021-04-15 16:13:48 -04:00
Anna b403b4db93 refactor: use nameof and normal enum order 2021-04-02 15:11:39 -04:00
Anna 6f663d582c feat(trainer): ignore word ffxiv 2021-03-22 18:23:08 -04:00
Anna 20530c6290 feat(trainer): ignore "mounts" 2021-03-05 13:50:16 -05:00
Anna d78f2f6dd7 fix(trainer): remove auto-translate textvalue artifacts 2021-03-03 20:36:53 -05:00
Anna 51ccd9ea0f fix(trainer): clean up some warnings 2021-03-02 23:01:16 -05:00
Anna ce15b07636 fix(trainer): accept more invalid input interactively 2021-03-02 22:58:22 -05:00
Anna 81b54c35aa feat: add explainer to test results 2021-03-02 13:25:05 -05:00
Anna 090c4eff3c feat(trainer): make test output more obvious 2021-03-02 12:53:19 -05:00
Anna 348c610ca1 fix(trainer): use correct path for csv 2021-03-02 12:43:47 -05:00
Anna 2fd2e54401 feat: add automated model creation 2021-03-02 12:38:30 -05:00
Anna 6c9dd9164b feat(trainer): run on net5 and accept csv path
Hopefully will use this to automate model deployment.
2021-03-02 04:52:36 -05:00
Anna 65558fa199 feat(data): ignore "blu" and add more 2021-02-26 12:07:19 -05:00
Anna c038adc4e9 fix(trainer): replace newlines automatically 2021-02-24 20:01:35 -05:00
Anna 2181649b22 feat: add "come" and "join" as stop words 2021-02-21 15:50:05 -05:00
Anna 0dc0c2ef00 feat(data): add more data
Also pull out stop words into field.
2021-02-20 19:25:15 -05:00
Anna c3df0a1f8e feat: add normalisation to pipeline
Add a step to normalise messages to the ML pipeline. This ensures
computed properties run on the raw data (which is actually partially
normalised by the compute context). This prevents properties which
rely on symbols (e.g. "B>") from being unable to work properly when
normalisation happens before they have access to the input.
2021-02-17 21:45:09 -05:00
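
The ordering described in c3df0a1f8e can be sketched roughly as follows. This is a minimal illustration in Python, not the repository's code: the function names, the property name, and the normalisation rule are all assumptions; only the "B>" symbol comes from the commit message.

```python
import re
from typing import Dict, Union


def contains_b_marker(raw: str) -> bool:
    # Hypothetical computed property: relies on the "B>" symbol being present.
    return "B>" in raw


def normalise(message: str) -> str:
    # Assumed normalisation: punctuation becomes spaces, text is lowercased,
    # and runs of whitespace collapse to a single space.
    return " ".join(re.sub(r"[^\w\s]", " ", message).lower().split())


def featurise(raw: str) -> Dict[str, Union[str, bool]]:
    # Computed properties see the raw message first...
    props = {"contains_b_marker": contains_b_marker(raw)}
    # ...and normalisation runs afterwards, as a later pipeline step.
    return {"text": normalise(raw), **props}
```

If normalisation ran first, the ">" would already be gone and `contains_b_marker` could never match, which is the failure the commit message describes.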
Anna d00b3b0845 feat: better handle punctuation
Certain symbols are turned into one space so the model sees multiple
words instead of one. Previously "[RP]Hi" would turn into "RPHi" and
be its own token. Now it turns into "RP" and "Hi", counting as two
tokens. This change increased the model's accuracy.

Also make "18", "http", "https", and LGBT-related words into stop
words (meaning they're ignored). Each of these stop words made the
model more accurate and reduced unwanted bias.

Messages destined for ML are now normalised by the plugin in the same
way the model's input is for training. This should make the results
come closer to expected.
2021-02-17 20:01:34 -05:00
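
The tokenisation change in d00b3b0845 might look something like the sketch below. It is written in Python for illustration only; the symbol set and the helper name are assumptions, and the stop-word set lists just the three literal words the commit names.

```python
import re
from typing import List

# Assumption: an illustrative set of symbols; the commit does not list them.
SYMBOLS = re.compile(r"[\[\](){}<>!?.,:;/\\|*~_-]")

# Only the literal stop words named in the commit message.
STOP_WORDS = {"18", "http", "https"}


def tokenise(message: str) -> List[str]:
    # Replace each symbol with a space so adjacent words stay separate tokens.
    spaced = SYMBOLS.sub(" ", message)
    # Split on whitespace and drop stop words (compared case-insensitively).
    return [t for t in spaced.split() if t.lower() not in STOP_WORDS]
```

With this sketch, `tokenise("[RP]Hi")` yields `["RP", "Hi"]` rather than a single `"RPHi"` token.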
Anna 87c5602319 feat: use separate process for classifying 2021-01-30 16:02:37 -05:00
Anna df66d397ed fix(trainer): use LF newlines for real 2021-01-02 17:28:17 -05:00
Anna 081e670da4 fix(trainer): use LF newlines 2021-01-02 16:59:40 -05:00
Anna 9f15bb7d0d feat(trainer): have trainer sort data automatically 2021-01-02 16:59:00 -05:00
Anna 2f7761b9b0 chore(trainer): only save model on full run 2021-01-02 07:31:34 -05:00
Anna 753e0f710e refactor(trainer): use correct schema, though it shouldn't matter 2020-12-28 22:04:50 -05:00
Anna 1b8f7806f5 refactor: put computation in interface
This basically undoes the benefits of the previous commit. May end up being reverted.
2020-12-28 21:48:31 -05:00
Anna effe41a345 refactor(training): compute properties in pipeline
Hopefully no longer requires the data structure to be updated when new computed properties are added. This should also reduce duplication and make it easier to make bigger changes to the model without needing to update the plugin.
2020-12-28 21:01:35 -05:00
Anna bd05abb5e0 feat(trainer): add trainer to actual repo 2020-12-28 20:14:19 -05:00