minimel.count module
Count targets per anchor text in Wikipedia paragraphs
- minimel.count.count_links(lines, stem=None, head=None, split=None, fold=None)
- minimel.count.count(paragraphlinks: Path, *, outfile: Path | None = None, min_count: int = 2, stem: str | None = None, head: int | None = None, split: int | None = None, fold: int | None = None)
Count targets per anchor text in Wikipedia paragraphs.
Writes count.min{min_count}[.stem-{LANG}].json
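The output name pattern above can be read as follows. This is a minimal sketch, not the module's own code: the helper name `count_filename` is hypothetical, and it assumes the stem code is inserted exactly as passed and that the bracketed part is omitted when no stemming language is given.

```python
from typing import Optional


def count_filename(min_count: int = 2, stem: Optional[str] = None) -> str:
    """Build a name following count.min{min_count}[.stem-{LANG}].json.

    Hypothetical helper for illustration only.
    """
    # Assumption: the optional ".stem-{LANG}" part appears only when a
    # stemming language is set.
    stem_part = f".stem-{stem}" if stem else ""
    return f"count.min{min_count}{stem_part}.json"


print(count_filename())         # default min_count, no stemming
print(count_filename(5, "nl"))  # min_count=5, stemming language "nl"
```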
- Parameters:
- Keyword Arguments:
  - outfile – Output file or directory (default: count.json)
  - stem – Stemming language, as an ISO 639-1 (2-letter) code
  - min_count – Minimum number of occurrences of an (anchor text, target) pair
  - head – Use only the first N lines of each partition
  - split – Number of parts to split the data into
  - fold – Index of the fold of the split data to ignore
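The idea of counting targets per anchor text, with a `min_count` threshold and a held-out fold, can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function name `count_anchor_targets` is hypothetical, each input line is assumed to be a JSON object mapping anchor text to target, and the fold is assumed to be chosen by line index modulo `split`. The actual paragraph-links format and fold assignment in minimel may differ.

```python
import json
from collections import Counter, defaultdict


def count_anchor_targets(lines, min_count=2, split=None, fold=None):
    """Count (anchor text, target) pairs over an iterable of lines.

    Hypothetical helper: assumes each line is a JSON object of the form
    {"anchor text": "target", ...}.
    """
    counts = defaultdict(Counter)
    for i, line in enumerate(lines):
        # Assumption: a line belongs to fold (i % split); skip the ignored fold.
        if split and fold is not None and i % split == fold:
            continue
        for anchor, target in json.loads(line).items():
            counts[anchor][target] += 1

    # Keep only pairs seen at least min_count times, and drop anchors
    # that have no remaining targets.
    result = {}
    for anchor, targets in counts.items():
        kept = {t: n for t, n in targets.items() if n >= min_count}
        if kept:
            result[anchor] = kept
    return result


example_lines = [
    '{"Paris": "Paris_(city)"}',
    '{"Paris": "Paris_(city)", "Nice": "Nice_(city)"}',
    '{"Paris": "Paris_(mythology)"}',
]
print(count_anchor_targets(example_lines, min_count=2))
```

With `min_count=2`, only the pair that occurs twice survives in the example. Passing `split=3, fold=1` would additionally leave out the second line before counting, which is one way to keep held-out data from influencing the counts.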