minimel.run module
- minimel.run.vectorize_text(texts, vectorizer=None, dim=None)
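vectorize_text has no docstring here. The `vectorizer` keyword argument of `run` says that a HashingVectorizer is used when no vectorizer is set, so the following is a minimal plain-Python sketch of that hashing trick. It assumes `dim` acts as the number of hashed features. `hashing_vectorize` is a hypothetical stand-in, not minimel's code. It mirrors scikit-learn's HashingVectorizer defaults (lowercasing, `\w\w+` tokens, alternating sign, L2 norm) but hashes with MD5 instead of MurmurHash3, so its indices will not match scikit-learn's.

```python
import hashlib
import math
import re
from typing import Dict, List

# Same token pattern as scikit-learn's default: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def hashing_vectorize(texts: List[str], dim: int = 2**20) -> List[Dict[int, float]]:
    """Hypothetical hashing-trick vectorizer returning sparse {index: value} dicts."""
    vectors = []
    for text in texts:
        counts: Dict[int, float] = {}
        for token in TOKEN_RE.findall(text.lower()):
            # Stable 32-bit hash (Python's built-in hash() is salted per process).
            h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:4], "little")
            index = h % dim
            # The top bit chooses the sign, so hash collisions cancel out in expectation.
            sign = -1.0 if (h & 0x80000000) else 1.0
            counts[index] = counts.get(index, 0.0) + sign
        counts = {i: v for i, v in counts.items() if v != 0.0}
        # L2-normalise each row, leaving empty rows empty.
        norm = math.sqrt(sum(v * v for v in counts.values()))
        vectors.append({i: v / norm for i, v in counts.items()} if norm else counts)
    return vectors
```

A sparse dict per text keeps the sketch dependency-free. A real implementation would return a SciPy sparse matrix, as HashingVectorizer.transform does.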
- minimel.run.get_scores(golds, preds, per_name=False)
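get_scores is documented only by its signature. The following is a minimal sketch of micro-averaged precision, recall and F1 over (name, ID) pairs. It assumes that `golds` and `preds` are parallel lists of {name -> ID} dicts, one per document (the same shape as the gold links in the `run` input rows), and that `per_name` adds a breakdown by surface name. `micro_scores` and its return shape are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def _prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and F1 from raw counts, with 0.0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def micro_scores(golds: List[Dict[str, str]], preds: List[Dict[str, str]], per_name: bool = False):
    """Hypothetical micro-averaged P/R/F1 over (name, ID) pairs, per document."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(golds, preds):
        for name, ent in pred.items():
            if gold.get(name) == ent:
                tp[name] += 1  # correct link
            else:
                fp[name] += 1  # wrong ID, or a name absent from the gold data
        for name, ent in gold.items():
            if pred.get(name) != ent:
                fn[name] += 1  # gold link missed or mislinked
    overall = _prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    if not per_name:
        return overall
    names = set(tp) | set(fp) | set(fn)
    return overall, {n: _prf(tp[n], fp[n], fn[n]) for n in sorted(names)}
```

For example, with gold {"Paris": "Q90", "France": "Q142"} and prediction {"Paris": "Q90", "France": "Q1"}, there is one true positive, one false positive and one false negative, so all three scores are 0.5.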
- class minimel.run.MiniNED(dawgfile: Path, candidatefile: Path | None = None, modelfile: Path | None = None, vectorizer: Path | None = None, ent_feats_csv: Path | None = None, lang: str | None = None, fallback: Path | None = None)
Bases: object
- minimel.run.run(dawgfile: Path, candidatefile: Path | None = None, modelfile: Path | None = None, *runfiles: Path, outfile: Path | None = None, vectorizer: Path | None = None, ent_feats_csv: Path | None = None, lang: str | None = None, fallback: Path | None = None, evaluate: bool = False, evalfile: Path | None = None, evalfile_per_name: Path | None = None, predict_only: bool = True, all_scores: bool = False, upperbound: bool = False, split: int | None = None, fold: int | None = None)
Perform entity disambiguation.
- Parameters:
dawgfile (Path) – DAWG trie file of Wikipedia > Wikidata count
candidatefile (Optional[Path]) – Candidate {name -> [ID]} JSON
runfiles (Path) – Input file ("-" or absent for standard input). TSV rows of (ID, {name -> ID}, text) or ({name -> ID}, text) or (text)
evaluate (bool)
predict_only (bool)
all_scores (bool)
upperbound (bool)
- Keyword Arguments:
outfile – Write outputs to file (default: stdout)
vectorizer – Scikit-learn vectorizer (.pickle) or fastText word embeddings (.bin). If unset, a HashingVectorizer is used.
ent_feats_csv – CSV of entity features, with rows of (ent_id, space-separated feature list)
fallback – Additional deterministic fallback {name -> ID} JSON
evaluate – Report evaluation scores instead of predictions
evalfile – Write evaluation results to file
evalfile_per_name – Write evaluation results per name to file
predict_only – Only print predictions, not original text
all_scores – Output all candidate scores
upperbound – Compute an upper bound on performance
split – Number of parts to split the data into
fold – Use only this fold of the split data
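The `runfiles` parameter allows three TSV row shapes, and `split`/`fold` restrict a run to part of the data. The following is a minimal sketch of both steps under stated assumptions. It assumes that the {name -> ID} column is JSON-encoded, that the text column contains no tab characters, and that rows are assigned to folds round-robin by row index. `parse_row` and `select_fold` are hypothetical helpers, and minimel may parse or partition the data differently.

```python
import json
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (document ID or None, gold {name -> ID} links, text)
Row = Tuple[Optional[str], Dict[str, str], str]


def parse_row(line: str) -> Row:
    """Parse one TSV row in any of the three documented shapes."""
    # Assumption: the text column itself contains no tabs.
    cols = line.rstrip("\n").split("\t")
    if len(cols) == 3:  # (ID, {name -> ID}, text)
        doc_id, links, text = cols
        return doc_id, json.loads(links), text
    if len(cols) == 2:  # ({name -> ID}, text)
        links, text = cols
        return None, json.loads(links), text
    if len(cols) == 1:  # (text)
        return None, {}, cols[0]
    raise ValueError(f"expected 1 to 3 columns, got {len(cols)}")


def select_fold(rows: Iterable[Row], split: Optional[int] = None,
                fold: Optional[int] = None) -> Iterator[Row]:
    """Yield all rows, or only those in `fold` when split round-robin into `split` parts."""
    for i, row in enumerate(rows):
        if split is None or fold is None or i % split == fold:
            yield row
```

Round-robin assignment keeps folds balanced in size without loading the whole input first. A contiguous split would instead need the total row count up front.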