Running experiments
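Run cross-validation using --fold and --split. The command below evaluates folds 1 and 2 of a 5-way split; with --evaluate, the baseline and model scores for each fold are logged and can afterwards be read from evaluation.csv in the output directory.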
[1]:
!minimel -v experiment 'wiki/iawiki-latest' --fold 1 2 --split 5 --cluster-threshold 0.5 --evaluate
INFO:root:Running experiments for wiki/iawiki-latest, outputs in wiki/iawiki-latest
INFO:root:Using wiki/iawiki-latest/index_iawiki-latest.dawg
INFO:root:Using wiki/iawiki-latest/ents-disambig.txt
INFO:root:Using wiki/iawiki-latest/disambig.json
INFO:root:Using wiki/iawiki-latest/iawiki-latest-paragraph-links
INFO:root:Sweeping parameters {'head': [None], 'stem': ('',), 'min_count': (2,), 'split': [5], 'fold': [1, 2]}
INFO:root:Sweeping parameters {'stem': [None], 'min_count': [2], 'freqnorm': (False,), 'badentfile': ('',), 'tokenscore_threshold': (0.1,), 'entropy_threshold': (1.0,), 'countratio_threshold': (0.5,), 'quantile_top_shadowed': (0,), 'cluster_threshold': [0.5]}
INFO:root:Sweeping parameters {'head': [None], 'stem': [None], 'vectorizer': ('',), 'ent_feats_csv': ('',), 'balanced': (False,), 'usenil': (False,), 'split': [5], 'fold': [1]}
INFO:root:Sweeping parameters {'bits': (20,)}
INFO:root:Sweeping parameters {'runfile': [PosixPath('wiki/iawiki-latest/iawiki-latest-paragraph-links')], 'use_fallback': (True,), 'split': [5], 'fold': [1]}
INFO:root:Running baseline...
Predicting: 100%|██████████████████████| 11972/11972 [00:00<00:00, 18242.27it/s]
INFO:root:micro precision 0.874295
recall 0.874295
fscore 0.874295
macro precision 0.824055
recall 0.803768
fscore 0.809093
support 38805.000000
dtype: float64
INFO:root:Running model...
Predicting: 100%|██████████████████████| 11972/11972 [00:00<00:00, 14345.83it/s]
INFO:root:micro precision 0.877799
recall 0.877799
fscore 0.877799
macro precision 0.827748
recall 0.807085
fscore 0.812927
support 38805.000000
dtype: float64
INFO:root:Sweeping parameters {'head': [None], 'stem': [None], 'vectorizer': ('',), 'ent_feats_csv': ('',), 'balanced': (False,), 'usenil': (False,), 'split': [5], 'fold': [1]}
INFO:root:Sweeping parameters {'bits': (20,)}
INFO:root:Sweeping parameters {'runfile': [PosixPath('wiki/iawiki-latest/iawiki-latest-paragraph-links')], 'use_fallback': (True,), 'split': [5], 'fold': [1]}
INFO:root:Running baseline...
Predicting: 100%|██████████████████████| 11972/11972 [00:00<00:00, 17616.30it/s]
INFO:root:micro precision 0.874295
recall 0.874295
fscore 0.874295
macro precision 0.824055
recall 0.803768
fscore 0.809093
support 38805.000000
dtype: float64
INFO:root:Running model...
Predicting: 100%|██████████████████████| 11972/11972 [00:00<00:00, 13261.47it/s]
INFO:root:micro precision 0.878134
recall 0.878134
fscore 0.878134
macro precision 0.827807
recall 0.807175
fscore 0.813038
support 38805.000000
dtype: float64
INFO:root:Sweeping parameters {'stem': [None], 'min_count': [2], 'freqnorm': (False,), 'badentfile': ('',), 'tokenscore_threshold': (0.1,), 'entropy_threshold': (1.0,), 'countratio_threshold': (0.5,), 'quantile_top_shadowed': (0,), 'cluster_threshold': [0.5]}
INFO:root:Sweeping parameters {'head': [None], 'stem': [None], 'vectorizer': ('',), 'ent_feats_csv': ('',), 'balanced': (False,), 'usenil': (False,), 'split': [5], 'fold': [2]}
INFO:root:Sweeping parameters {'bits': (20,)}
INFO:root:Sweeping parameters {'runfile': [PosixPath('wiki/iawiki-latest/iawiki-latest-paragraph-links')], 'use_fallback': (True,), 'split': [5], 'fold': [2]}
INFO:root:Running baseline...
Predicting: 100%|██████████████████████| 11957/11957 [00:00<00:00, 25108.09it/s]
INFO:root:micro precision 0.895722
recall 0.895722
fscore 0.895722
macro precision 0.877247
recall 0.861811
fscore 0.865458
support 38407.000000
dtype: float64
INFO:root:Running model...
Predicting: 100%|██████████████████████| 11957/11957 [00:01<00:00, 11525.34it/s]
INFO:root:micro precision 0.898690
recall 0.898690
fscore 0.898690
macro precision 0.880691
recall 0.865038
fscore 0.869222
support 38407.000000
dtype: float64
INFO:root:Sweeping parameters {'head': [None], 'stem': [None], 'vectorizer': ('',), 'ent_feats_csv': ('',), 'balanced': (False,), 'usenil': (False,), 'split': [5], 'fold': [2]}
INFO:root:Sweeping parameters {'bits': (20,)}
INFO:root:Sweeping parameters {'runfile': [PosixPath('wiki/iawiki-latest/iawiki-latest-paragraph-links')], 'use_fallback': (True,), 'split': [5], 'fold': [2]}
INFO:root:Running baseline...
Predicting: 100%|██████████████████████| 11957/11957 [00:00<00:00, 23652.54it/s]
INFO:root:micro precision 0.895722
recall 0.895722
fscore 0.895722
macro precision 0.877247
recall 0.861811
fscore 0.865458
support 38407.000000
dtype: float64
INFO:root:Running model...
Predicting: 100%|██████████████████████| 11957/11957 [00:01<00:00, 11056.60it/s]
INFO:root:micro precision 0.898951
recall 0.898951
fscore 0.898951
macro precision 0.880854
recall 0.865183
fscore 0.869402
support 38407.000000
dtype: float64
[2]:
import pandas as pd
pd.read_csv('wiki/iawiki-latest/evaluation.csv', index_col=0)
[2]:
| | model | count.min_count | count.split | count.fold | clean.min_count | clean.tokenscore_threshold | clean.entropy_threshold | clean.countratio_threshold | clean.cluster_threshold | vec.split | ... | run.split | run.fold | run.fallback | micro.precision | micro.recall | micro.fscore | macro.precision | macro.recall | macro.fscore | .support |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | baseline | 2 | 5 | 1 | 2 | 0.1 | 1.0 | 0.5 | 0.25 | 5 | ... | 5 | 1 | /Users/benno/Documents/postdoc/projects/minima... | 0.874295 | 0.874295 | 0.874295 | 0.824055 | 0.803768 | 0.809093 | 38805.0 |
| 1 | model | 2 | 5 | 1 | 2 | 0.1 | 1.0 | 0.5 | 0.25 | 5 | ... | 5 | 1 | /Users/benno/Documents/postdoc/projects/minima... | 0.877799 | 0.877799 | 0.877799 | 0.827748 | 0.807085 | 0.812927 | 38805.0 |
| 2 | baseline | 2 | 5 | 1 | 2 | 0.1 | 1.0 | 0.5 | 0.50 | 5 | ... | 5 | 1 | /Users/benno/Documents/postdoc/projects/minima... | 0.874295 | 0.874295 | 0.874295 | 0.824055 | 0.803768 | 0.809093 | 38805.0 |
| 3 | model | 2 | 5 | 1 | 2 | 0.1 | 1.0 | 0.5 | 0.50 | 5 | ... | 5 | 1 | /Users/benno/Documents/postdoc/projects/minima... | 0.878134 | 0.878134 | 0.878134 | 0.827807 | 0.807175 | 0.813038 | 38805.0 |
| 4 | baseline | 2 | 5 | 2 | 2 | 0.1 | 1.0 | 0.5 | 0.25 | 5 | ... | 5 | 2 | /Users/benno/Documents/postdoc/projects/minima... | 0.895722 | 0.895722 | 0.895722 | 0.877247 | 0.861811 | 0.865458 | 38407.0 |
| 5 | model | 2 | 5 | 2 | 2 | 0.1 | 1.0 | 0.5 | 0.25 | 5 | ... | 5 | 2 | /Users/benno/Documents/postdoc/projects/minima... | 0.898690 | 0.898690 | 0.898690 | 0.880691 | 0.865038 | 0.869222 | 38407.0 |
| 6 | baseline | 2 | 5 | 2 | 2 | 0.1 | 1.0 | 0.5 | 0.50 | 5 | ... | 5 | 2 | /Users/benno/Documents/postdoc/projects/minima... | 0.895722 | 0.895722 | 0.895722 | 0.877247 | 0.861811 | 0.865458 | 38407.0 |
| 7 | model | 2 | 5 | 2 | 2 | 0.1 | 1.0 | 0.5 | 0.50 | 5 | ... | 5 | 2 | /Users/benno/Documents/postdoc/projects/minima... | 0.898951 | 0.898951 | 0.898951 | 0.880854 | 0.865183 | 0.869402 | 38407.0 |
8 rows × 23 columns
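
The per-fold rows above can be summarised by averaging each configuration over the folds and comparing the model with the baseline. The following is a minimal sketch, not part of minimel itself: it rebuilds a small DataFrame from the values shown in the table so that it runs on its own, and it assumes the column names as they appear in the table header (reading the first, unnamed column as the row index). In practice the same steps could be applied directly to the frame returned by `pd.read_csv`.

```python
import pandas as pd

# Values copied from the evaluation table above; normally you would load them with
# pd.read_csv('wiki/iawiki-latest/evaluation.csv', index_col=0).
df = pd.DataFrame({
    "model": ["baseline", "model"] * 4,
    "count.fold": [1, 1, 1, 1, 2, 2, 2, 2],
    "clean.cluster_threshold": [0.25, 0.25, 0.50, 0.50] * 2,
    "micro.fscore": [0.874295, 0.877799, 0.874295, 0.878134,
                     0.895722, 0.898690, 0.895722, 0.898951],
    "macro.fscore": [0.809093, 0.812927, 0.809093, 0.813038,
                     0.865458, 0.869222, 0.865458, 0.869402],
})

# Mean score per (model, cluster threshold), averaged over the folds.
summary = (
    df.groupby(["model", "clean.cluster_threshold"])[["micro.fscore", "macro.fscore"]]
      .mean()
)
print(summary.round(4))

# Micro F-score gain of the model over the baseline, per fold and threshold.
pivot = df.pivot_table(
    index=["count.fold", "clean.cluster_threshold"],
    columns="model",
    values="micro.fscore",
)
gain = pivot["model"] - pivot["baseline"]
print(gain.round(4))
```

Averaging over folds gives one number per configuration, while the per-fold gain shows whether the improvement over the baseline is consistent across folds or driven by a single one.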