minimel.count module

Count targets per anchor text in Wikipedia paragraphs

minimel.count.count(paragraphlinks: Path, *, outfile: Path | None = None, min_count: int = 2, stem: str | None = None, head: int | None = None, split: int | None = None, fold: int | None = None)

Count targets per anchor text in Wikipedia paragraphs.

Writes count.min{min_count}[.stem-{LANG}].json
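
The output name pattern above can be read as follows. This is a minimal sketch, not minimel's own code: the helper name count_filename is hypothetical, and it assumes the stem suffix is simply left out when no stemming language is given.

```python
from typing import Optional


def count_filename(min_count: int = 2, stem: Optional[str] = None) -> str:
    """Build a name following the count.min{min_count}[.stem-{LANG}].json pattern.

    Hypothetical helper for illustration; assumes the bracketed part is
    omitted when stem is None.
    """
    stem_part = f".stem-{stem}" if stem else ""
    return f"count.min{min_count}{stem_part}.json"


default_name = count_filename()
stemmed_name = count_filename(min_count=5, stem="nl")
```

Under these assumptions, the defaults give count.min2.json, and min_count=5 with stem="nl" gives count.min5.stem-nl.json.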

Parameters:
Keyword Arguments:
  • outfile – Output file or directory (default: count.json)

  • stem – ISO 639-1 (two-letter) code of the stemming language

  • min_count – Minimum number of occurrences of an (anchor-text, target) pair

  • head – Use only the first N lines of each partition

  • split – Split the data into this many parts

  • fold – Ignore this fold of the split data
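
To illustrate what counting targets per anchor text with a min_count threshold might look like, here is a minimal sketch. It is not minimel's implementation: the function count_targets, the in-memory list of (anchor text, target) pairs, and the sample page titles are all hypothetical, and it assumes a pair is kept when it occurs at least min_count times.

```python
from collections import Counter, defaultdict


def count_targets(links, min_count=2):
    """Count link targets per anchor text.

    links: iterable of (anchor_text, target) pairs (hypothetical input shape).
    Returns {anchor_text: {target: count}} with rare pairs removed.
    """
    counts = defaultdict(Counter)
    for anchor, target in links:
        counts[anchor][target] += 1

    # Assumption: keep pairs seen at least min_count times,
    # and drop anchor texts left with no targets.
    filtered = {}
    for anchor, targets in counts.items():
        kept = {t: n for t, n in targets.items() if n >= min_count}
        if kept:
            filtered[anchor] = kept
    return filtered


# Hypothetical sample data: anchor text paired with a target page title.
links = [
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris_(mythology)"),
    ("Berlin", "Berlin"),
]
result = count_targets(links, min_count=2)
```

With this sample, only the ("Paris", "Paris") pair occurs twice, so result holds that single entry; the other two pairs fall below the threshold.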