dariah.mallet package

Submodules

dariah.mallet.api module

dariah.mallet.api

This module implements the high-level API for communicating with the command-line interface (CLI) of MALLET.

class dariah.mallet.api.MALLET(executable)

Bases: object

Machine Learning for Language Toolkit (MALLET).

bulk_load(**parameters)

Efficiently prune the vocabulary and import documents from large input files.

classify_dir(**parameters)

Classify the contents of a directory with a saved classifier.

classify_file(**parameters)

Classify data from a single file with a saved classifier.

classify_svmlight(**parameters)

Classify data from a single file in SVMLight format.

evaluate_topics(**parameters)

Estimate the probability of new documents under a trained model.

import_dir(**parameters)

Load contents of a directory into MALLET instances.

import_file(**parameters)

Load a file into MALLET instances.

import_svmlight(**parameters)

Load SVMLight data files into MALLET instances.

infer_topics(**parameters)

Use a trained topic model to infer topics for new documents.

info(**parameters)

Get information about MALLET instances.

prune(**parameters)

Remove features based on frequency or information gain.

split(**parameters)

Divide data into testing, training, and validation portions.

train_classifier(**parameters)

Train a classifier from MALLET data files.

train_topics(**parameters)

Train a topic model from MALLET data files.
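Each method above takes only keyword parameters, which suggests a thin wrapper around the matching MALLET command. The following is a minimal sketch of that layering, not the package's actual code: `MalletSketch` and `fake_call` are hypothetical names, and the idea that each method forwards to a function with the `call(command, executable, **parameters)` signature is an assumption. The command names `import-dir` and `train-topics` are MALLET's own.

```python
class MalletSketch:
    """Hypothetical stand-in for the documented MALLET class."""

    def __init__(self, executable, call):
        self.executable = executable
        # Assumption: a callable with the signature
        # call(command, executable, **parameters) does the real work.
        self._call = call

    def import_dir(self, **parameters):
        # Method name with underscores -> MALLET command with hyphens.
        return self._call("import-dir", self.executable, **parameters)

    def train_topics(self, **parameters):
        return self._call("train-topics", self.executable, **parameters)


recorded = []


def fake_call(command, executable, **parameters):
    """Record the request instead of running MALLET, so the sketch runs anywhere."""
    recorded.append((command, executable, parameters))
    return True


mallet = MalletSketch("mallet", call=fake_call)
imported = mallet.import_dir(input="corpus/", output="corpus.mallet")
trained = mallet.train_topics(input="corpus.mallet", num_topics=20)
```

Injecting the callable keeps the sketch runnable without a MALLET installation; a real wrapper would presumably bind the function that starts the process.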

dariah.mallet.core module

dariah.mallet.core

This module implements the core functions of the MALLET sub-package.

dariah.mallet.core.call(command, executable, **parameters)

Call MALLET.

Parameters:

command (str): Command for MALLET.

executable (str): Path to the MALLET executable.

**parameters: Additional parameters for MALLET.

Returns:

True if the call was successful.
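How the keyword parameters become command-line arguments is not documented here. One plausible translation, sketched below under stated assumptions, turns each keyword into a `--flag value` pair, replaces underscores with hyphens, and emits a bare flag for `True`. The helper name `build_args` and these conversion rules are hypothetical; `--input`, `--output`, and `--keep-sequence` are existing MALLET options.

```python
from typing import List


def build_args(command: str, executable: str, **parameters) -> List[str]:
    """Hypothetical helper: turn keyword parameters into a MALLET argument list."""
    args = [executable, command]
    for name, value in parameters.items():
        # Assumption: Python-style names map to hyphenated MALLET flags.
        flag = "--" + name.replace("_", "-")
        if value is True:
            # A bare flag switches a boolean option on.
            args.append(flag)
        elif value is False or value is None:
            # Disabled or unset options are left out entirely.
            continue
        else:
            args.extend([flag, str(value)])
    return args


args = build_args(
    "import-dir",
    "mallet",
    input="corpus/",
    output="corpus.mallet",
    keep_sequence=True,
    remove_stopwords=False,
)
```

The resulting list can be handed to a subprocess call unchanged, which avoids shell quoting issues with paths that contain spaces.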

dariah.mallet.utils module

dariah.mallet.utils

This module implements general helper functions.

dariah.mallet.utils.call(args: list) → bool

Call a subprocess.

Parameter:

args (list): The subprocess’ arguments.

Returns:

True if the call was successful.
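A function with this contract can be sketched with the standard library's `subprocess.run`. This is an illustration, not the package's implementation: the name `run_subprocess` is hypothetical, and treating a zero exit code as success and a missing executable as failure are assumptions.

```python
import subprocess
import sys
from typing import List


def run_subprocess(args: List[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    try:
        # check=False: inspect the return code instead of raising on failure.
        completed = subprocess.run(args, check=False)
    except OSError:
        # Assumption: an executable that cannot be started counts as a failed call.
        return False
    return completed.returncode == 0


# The running Python interpreter serves as a portable stand-in for MALLET.
ok = run_subprocess([sys.executable, "-c", "pass"])
failed = run_subprocess([sys.executable, "-c", "raise SystemExit(3)"])
```

Passing the arguments as a list, as the documented signature does, means no shell is involved and each argument reaches the program verbatim.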
