Running experiments

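Run cross-validation with --fold and --split. In the command below, --split 5 appears to partition the data into five parts and --fold 1 2 runs the experiment on folds 1 and 2. With --evaluate, precision, recall and F-score are logged for both the baseline and the model on each fold.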

[1]:
!minimel -v experiment 'wiki/iawiki-latest' --fold 1 2 --split 5 --cluster-threshold 0.5 --evaluate
INFO:root:Running experiments for wiki/iawiki-latest, outputs in wiki/iawiki-latest
INFO:root:Using wiki/iawiki-latest/index_iawiki-latest.dawg
INFO:root:Using wiki/iawiki-latest/ents-disambig.txt
INFO:root:Using wiki/iawiki-latest/disambig.json
INFO:root:Using wiki/iawiki-latest/iawiki-latest-paragraph-links
INFO:root:Sweeping parameters {'head': [None], 'stem': ('',), 'min_count': (2,), 'split': [5], 'fold': [1, 2]}
INFO:root:Sweeping parameters {'stem': [None], 'min_count': [2], 'freqnorm': (False,), 'badentfile': ('',), 'tokenscore_threshold': (0.1,), 'entropy_threshold': (1.0,), 'countratio_threshold': (0.5,), 'quantile_top_shadowed': (0,), 'cluster_threshold': [0.5]}
INFO:root:Sweeping parameters {'head': [None], 'stem': [None], 'vectorizer': ('',), 'ent_feats_csv': ('',), 'balanced': (False,), 'usenil': (False,), 'split': [5], 'fold': [1]}
INFO:root:Sweeping parameters {'bits': (20,)}
INFO:root:Sweeping parameters {'runfile': [PosixPath('wiki/iawiki-latest/iawiki-latest-paragraph-links')], 'use_fallback': (True,), 'split': [5], 'fold': [1]}
INFO:root:Running baseline...
Predicting: 100%|██████████████████████| 11972/11972 [00:00<00:00, 18242.27it/s]
INFO:root:micro  precision        0.874295
       recall           0.874295
       fscore           0.874295
macro  precision        0.824055
       recall           0.803768
       fscore           0.809093
       support      38805.000000
dtype: float64
INFO:root:Running model...
Predicting: 100%|██████████████████████| 11972/11972 [00:00<00:00, 14345.83it/s]
INFO:root:micro  precision        0.877799
       recall           0.877799
       fscore           0.877799
macro  precision        0.827748
       recall           0.807085
       fscore           0.812927
       support      38805.000000
dtype: float64
INFO:root:Sweeping parameters {'head': [None], 'stem': [None], 'vectorizer': ('',), 'ent_feats_csv': ('',), 'balanced': (False,), 'usenil': (False,), 'split': [5], 'fold': [1]}
INFO:root:Sweeping parameters {'bits': (20,)}
INFO:root:Sweeping parameters {'runfile': [PosixPath('wiki/iawiki-latest/iawiki-latest-paragraph-links')], 'use_fallback': (True,), 'split': [5], 'fold': [1]}
INFO:root:Running baseline...
Predicting: 100%|██████████████████████| 11972/11972 [00:00<00:00, 17616.30it/s]
INFO:root:micro  precision        0.874295
       recall           0.874295
       fscore           0.874295
macro  precision        0.824055
       recall           0.803768
       fscore           0.809093
       support      38805.000000
dtype: float64
INFO:root:Running model...
Predicting: 100%|██████████████████████| 11972/11972 [00:00<00:00, 13261.47it/s]
INFO:root:micro  precision        0.878134
       recall           0.878134
       fscore           0.878134
macro  precision        0.827807
       recall           0.807175
       fscore           0.813038
       support      38805.000000
dtype: float64
INFO:root:Sweeping parameters {'stem': [None], 'min_count': [2], 'freqnorm': (False,), 'badentfile': ('',), 'tokenscore_threshold': (0.1,), 'entropy_threshold': (1.0,), 'countratio_threshold': (0.5,), 'quantile_top_shadowed': (0,), 'cluster_threshold': [0.5]}
INFO:root:Sweeping parameters {'head': [None], 'stem': [None], 'vectorizer': ('',), 'ent_feats_csv': ('',), 'balanced': (False,), 'usenil': (False,), 'split': [5], 'fold': [2]}
INFO:root:Sweeping parameters {'bits': (20,)}
INFO:root:Sweeping parameters {'runfile': [PosixPath('wiki/iawiki-latest/iawiki-latest-paragraph-links')], 'use_fallback': (True,), 'split': [5], 'fold': [2]}
INFO:root:Running baseline...
Predicting: 100%|██████████████████████| 11957/11957 [00:00<00:00, 25108.09it/s]
INFO:root:micro  precision        0.895722
       recall           0.895722
       fscore           0.895722
macro  precision        0.877247
       recall           0.861811
       fscore           0.865458
       support      38407.000000
dtype: float64
INFO:root:Running model...
Predicting: 100%|██████████████████████| 11957/11957 [00:01<00:00, 11525.34it/s]
INFO:root:micro  precision        0.898690
       recall           0.898690
       fscore           0.898690
macro  precision        0.880691
       recall           0.865038
       fscore           0.869222
       support      38407.000000
dtype: float64
INFO:root:Sweeping parameters {'head': [None], 'stem': [None], 'vectorizer': ('',), 'ent_feats_csv': ('',), 'balanced': (False,), 'usenil': (False,), 'split': [5], 'fold': [2]}
INFO:root:Sweeping parameters {'bits': (20,)}
INFO:root:Sweeping parameters {'runfile': [PosixPath('wiki/iawiki-latest/iawiki-latest-paragraph-links')], 'use_fallback': (True,), 'split': [5], 'fold': [2]}
INFO:root:Running baseline...
Predicting: 100%|██████████████████████| 11957/11957 [00:00<00:00, 23652.54it/s]
INFO:root:micro  precision        0.895722
       recall           0.895722
       fscore           0.895722
macro  precision        0.877247
       recall           0.861811
       fscore           0.865458
       support      38407.000000
dtype: float64
INFO:root:Running model...
Predicting: 100%|██████████████████████| 11957/11957 [00:01<00:00, 11056.60it/s]
INFO:root:micro  precision        0.898951
       recall           0.898951
       fscore           0.898951
macro  precision        0.880854
       recall           0.865183
       fscore           0.869402
       support      38407.000000
dtype: float64
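In every run above, micro precision, recall and F-score are identical. The log does not say how the scores are computed, but this is the expected pattern when each mention has exactly one gold entity and receives exactly one prediction: every wrong prediction then counts once as a false positive and once as a false negative. The sketch below illustrates this under that assumption; the function name and the entity IDs are made up for illustration and are not part of minimel.

```python
def micro_scores(gold, pred):
    """Micro-averaged precision, recall and F1 for single-label predictions.

    Assumes one gold label and one predicted label per mention.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            tp += 1
        else:
            fp += 1  # the predicted entity is a false positive
            fn += 1  # the gold entity is a false negative
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical entity IDs: three of five mentions are linked correctly.
gold = ["Q1", "Q2", "Q2", "Q3", "Q4"]
pred = ["Q1", "Q2", "Q5", "Q3", "Q3"]
precision, recall, f1 = micro_scores(gold, pred)
print(precision, recall, f1)
```

Macro scores average per-entity scores instead, which is why they differ from each other and from the micro scores in the log.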
[2]:
import pandas as pd
pd.read_csv('wiki/iawiki-latest/evaluation.csv', index_col=0)
[2]:
model count.min_count count.split count.fold clean.min_count clean.tokenscore_threshold clean.entropy_threshold clean.countratio_threshold clean.cluster_threshold vec.split ... run.split run.fold run.fallback micro.precision micro.recall micro.fscore macro.precision macro.recall macro.fscore .support
0 baseline 2 5 1 2 0.1 1.0 0.5 0.25 5 ... 5 1 /Users/benno/Documents/postdoc/projects/minima... 0.874295 0.874295 0.874295 0.824055 0.803768 0.809093 38805.0
1 model 2 5 1 2 0.1 1.0 0.5 0.25 5 ... 5 1 /Users/benno/Documents/postdoc/projects/minima... 0.877799 0.877799 0.877799 0.827748 0.807085 0.812927 38805.0
2 baseline 2 5 1 2 0.1 1.0 0.5 0.50 5 ... 5 1 /Users/benno/Documents/postdoc/projects/minima... 0.874295 0.874295 0.874295 0.824055 0.803768 0.809093 38805.0
3 model 2 5 1 2 0.1 1.0 0.5 0.50 5 ... 5 1 /Users/benno/Documents/postdoc/projects/minima... 0.878134 0.878134 0.878134 0.827807 0.807175 0.813038 38805.0
4 baseline 2 5 2 2 0.1 1.0 0.5 0.25 5 ... 5 2 /Users/benno/Documents/postdoc/projects/minima... 0.895722 0.895722 0.895722 0.877247 0.861811 0.865458 38407.0
5 model 2 5 2 2 0.1 1.0 0.5 0.25 5 ... 5 2 /Users/benno/Documents/postdoc/projects/minima... 0.898690 0.898690 0.898690 0.880691 0.865038 0.869222 38407.0
6 baseline 2 5 2 2 0.1 1.0 0.5 0.50 5 ... 5 2 /Users/benno/Documents/postdoc/projects/minima... 0.895722 0.895722 0.895722 0.877247 0.861811 0.865458 38407.0
7 model 2 5 2 2 0.1 1.0 0.5 0.50 5 ... 5 2 /Users/benno/Documents/postdoc/projects/minima... 0.898951 0.898951 0.898951 0.880854 0.865183 0.869402 38407.0

8 rows × 23 columns
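To compare settings across folds, the per-fold rows can be averaged per model and cluster threshold. The sketch below is one way to do that with pandas. It rebuilds a small frame from the values shown in the table above so that it runs on its own; in the notebook, the frame returned by pd.read_csv could be passed to the same function instead. The helper name summarize is an assumption for illustration.

```python
import pandas as pd

# Values copied from the evaluation table above.
rows = [
    ("baseline", 1, 0.25, 0.874295, 0.809093),
    ("model",    1, 0.25, 0.877799, 0.812927),
    ("baseline", 1, 0.50, 0.874295, 0.809093),
    ("model",    1, 0.50, 0.878134, 0.813038),
    ("baseline", 2, 0.25, 0.895722, 0.865458),
    ("model",    2, 0.25, 0.898690, 0.869222),
    ("baseline", 2, 0.50, 0.895722, 0.865458),
    ("model",    2, 0.50, 0.898951, 0.869402),
]
columns = ["model", "count.fold", "clean.cluster_threshold",
           "micro.fscore", "macro.fscore"]
df = pd.DataFrame(rows, columns=columns)


def summarize(results):
    """Mean and standard deviation of the F-scores over folds,
    grouped by model and cluster threshold."""
    return (
        results
        .groupby(["model", "clean.cluster_threshold"])[["micro.fscore", "macro.fscore"]]
        .agg(["mean", "std"])
    )


summary = summarize(df)
print(summary)
```

With only two folds the standard deviation is a rough indication at best; running all five folds would give a more reliable estimate.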