minimel.run module

minimel.run.vectorize_text(texts, vectorizer=None, dim=None)

minimel.run.get_scores(golds, preds, per_name=False)

Bases: object

Parameters:

dawgfile (Path)
candidatefile (Optional[Path])
modelfile (Optional[Path])
vectorizer (Optional[Path])
ent_feats_csv (Optional[Path])
lang (Optional[str])
fallback (Optional[Path])

predict(text: str, name: str, upperbound=None, all_scores=False)

Make a named entity disambiguation (NED) prediction.

Parameters:

text (str) – Text containing the entity mention
name (str) – An entity name occurring in text

Keyword Arguments:

all_scores – Output all candidate scores
upperbound – Create upper bound on performance

Returns:

Wikidata ID
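
To make the predict contract concrete, the sketch below is a context-free stand-in, not minimel's implementation. It assumes a hypothetical in-memory COUNTS mapping from a name to per-entity link counts (the kind of information the DAWG file is described as holding) and ignores the text argument entirely. The function name predict_by_count, the IDs, and the counts are all placeholders for illustration.

```python
from typing import Dict, Optional, Union

# Hypothetical toy data: name -> {entity ID: link count}.
# IDs and counts are placeholders, not real statistics.
COUNTS: Dict[str, Dict[str, int]] = {
    "Paris": {"Q_CITY": 950, "Q_OTHER": 50},
}


def predict_by_count(
    text: str, name: str, all_scores: bool = False
) -> Union[None, str, Dict[str, float]]:
    """Pick the most frequently linked entity for `name`.

    `text` is accepted only to mirror the documented signature; this
    baseline does not use context. Returns None for unknown names.
    """
    candidates: Optional[Dict[str, int]] = COUNTS.get(name)
    if not candidates:
        return None
    total = sum(candidates.values())
    # Normalize counts into relative frequencies.
    scores = {ent: n / total for ent, n in candidates.items()}
    if all_scores:
        return scores
    return max(scores, key=scores.get)
```

With all_scores left at its default the function returns a single ID; with all_scores=True it returns the full candidate-to-score mapping, mirroring the two output modes the keyword argument describes.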

minimel.run.run(dawgfile: Path, candidatefile: Path | None = None, modelfile: Path | None = None, *runfiles: Path, outfile: Path | None = None, vectorizer: Path | None = None, ent_feats_csv: Path | None = None, lang: str | None = None, fallback: Path | None = None, evaluate: bool = False, evalfile: Path | None = None, evalfile_per_name: Path | None = None, predict_only: bool = True, all_scores: bool = False, upperbound: bool = False, split: int | None = None, fold: int | None = None)

Perform entity disambiguation

Parameters:

dawgfile (Path) – DAWG trie file of Wikipedia -> Wikidata counts
candidatefile (Optional[Path]) – Candidate {name -> [ID]} JSON
modelfile (Optional[Path]) – Vowpal Wabbit model
runfiles (Path) – Input files (- or absent for standard input). TSV rows of (ID, {name -> ID}, text), ({name -> ID}, text), or (text)

Keyword Arguments:

outfile (Optional[Path]) – Write outputs to file (default: stdout)
vectorizer (Optional[Path]) – Scikit-learn vectorizer .pickle or fastText .bin word embeddings. If unset, use HashingVectorizer.
ent_feats_csv (Optional[Path]) – CSV of (ent_id, space-separated feature list) entity features
lang (Optional[str])
fallback (Optional[Path]) – Additional deterministic fallback {name -> ID} JSON
evaluate (bool) – Report evaluation scores instead of predictions
evalfile (Optional[Path]) – Write evaluation results to file
evalfile_per_name (Optional[Path]) – Write per-name evaluation results to file
predict_only (bool) – Only print predictions, not the original text
all_scores (bool) – Output all candidate scores
upperbound (bool) – Create upper bound on performance
split (Optional[int]) – Split the data into several parts
fold (Optional[int]) – Use only this fold of the split data
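
The three accepted row shapes for runfiles can be illustrated with a small parser. This is a sketch under stated assumptions rather than minimel's own reader: it assumes the {name -> ID} column is JSON-encoded and that the number of tab-separated fields alone distinguishes the shapes. The helper name parse_row and the placeholder ID are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple

# (document ID or None, name -> ID mapping, text)
Row = Tuple[Optional[str], Dict[str, str], str]


def parse_row(line: str) -> Row:
    """Split one TSV line into (doc_id, name_to_id, text).

    Assumes the mapping column, when present, is a JSON object.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 1:
        # (text) only
        return None, {}, fields[0]
    if len(fields) == 2:
        # ({name -> ID}, text)
        return None, json.loads(fields[0]), fields[1]
    # (ID, {name -> ID}, text); re-join any extra tabs into the text
    doc_id, names = fields[0], fields[1]
    text = "\t".join(fields[2:])
    return doc_id, json.loads(names), text
```

Rows without a mapping carry no gold labels, so under this reading they can be used for prediction but not for evaluation.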

minimel.run.evaluate(goldfile: Path, *predfiles: Path, agg: List[Path] = (), evalfile: Path | None = None)

Evaluate predictions

Parameters:

goldfile (Path) – Gold data TSV
predfiles (Path) – Prediction TSVs

Keyword Arguments:

agg (List[Path]) – Aggregation JSONs (TODO: depend on data…?)
evalfile (Optional[Path]) – Write evaluation results to file
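
The metrics that evaluate and get_scores report are not specified here. As one plausible reading, the sketch below computes micro-averaged precision, recall, and F1 over gold and predicted mappings keyed by (document ID, name). The function name micro_scores, the key layout, and the placeholder IDs are assumptions for illustration, not minimel's API.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (document ID, name); an assumed key layout


def micro_scores(golds: Dict[Key, str], preds: Dict[Key, str]) -> Dict[str, float]:
    """Micro-averaged precision, recall, and F1 for entity disambiguation."""
    # A prediction is correct when it matches the gold ID for the same key.
    correct = sum(1 for key, ent in preds.items() if golds.get(key) == ent)
    precision = correct / len(preds) if preds else 0.0
    recall = correct / len(golds) if golds else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder IDs; one of two predictions is correct, three gold mentions.
golds = {("d1", "Paris"): "Q_A", ("d1", "Texas"): "Q_B", ("d2", "Paris"): "Q_A"}
preds = {("d1", "Paris"): "Q_A", ("d2", "Paris"): "Q_C"}
scores = micro_scores(golds, preds)
```

A per-name variant would group the keys by name before scoring, which is roughly what a per_name flag suggests.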