vsm.model.TfMulti

class vsm.model.TfMulti(corpus=None, context_type=None)

Trains a term-frequency model.

A term-frequency model counts, for every word type and every context, how many times that word type occurs in that context. Word types correspond to matrix rows and contexts correspond to matrix columns.

The data structure is a sparse integer matrix.
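
As an illustration of this layout, the following is a minimal sketch that builds a small term-frequency matrix directly with scipy.sparse.coo_matrix. It does not use TfMulti or reproduce its internals; the toy contexts and the names vocab, word_index and tf are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy data: each context is a list of word tokens.
contexts = [["a", "b", "a"], ["b", "c"]]

# Assign each word type a row index.
vocab = sorted({word for tokens in contexts for word in tokens})
word_index = {word: i for i, word in enumerate(vocab)}

# One (row, column) pair per token occurrence.
rows, cols = [], []
for j, tokens in enumerate(contexts):
    for word in tokens:
        rows.append(word_index[word])
        cols.append(j)

# Each occurrence contributes 1; repeated (row, column) pairs are
# summed when the COO matrix is converted to another representation.
data = np.ones(len(rows), dtype=np.int64)
tf = coo_matrix((data, (rows, cols)), shape=(len(vocab), len(contexts)))

dense = tf.toarray()
print(vocab)
print(dense)
```

Here the row for "a" holds a count of 2 in the first column because "a" occurs twice in the first context.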

See Also: vsm.model.base.BaseModel, vsm.corpus.Corpus, scipy.sparse.coo_matrix

Methods

__init__([corpus, context_type]) Initialize TfMulti.
load(f) Takes a filename or file object and loads it as an npz archive.
save(f) Takes a filename or file object and saves self.matrix in an npz archive.
train(n_procs) Trains the model, mapping and reducing over n_procs processes.
__init__(corpus=None, context_type=None)

Initialize TfMulti.

Parameters:
  • corpus (Corpus, optional) – A Corpus object containing the training data.
  • context_type (string, optional) – A string specifying the type of context over which the model trainer is applied.
static load(f)

Takes a filename or file object and loads it as an npz archive into a BaseModel object.

Parameters: f (str-like or file-like object) – Designates the file to read. If f is a string ending in .gz, the file is first gunzipped. See numpy.load for further details.
Returns: A dictionary storing the data found in f.
See Also: numpy.load()
save(f)

Takes a filename or file object and saves self.matrix in an npz archive.

Parameters: f (str-like or file-like object) – Designates the file to which to save data. See numpy.savez for further details.
Returns: None
See Also: numpy.savez()
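
The npz round trip that save and load rely on can be sketched with numpy.savez and numpy.load alone. This is only an illustration of the archive format, not the library's own code: the key name "matrix" and the dense toy array are assumptions, and how TfMulti lays out its sparse matrix inside the archive is not shown here.

```python
import io

import numpy as np

# Hypothetical dense stand-in for a small term-frequency matrix.
matrix = np.array([[2, 0], [1, 1], [0, 1]])

# numpy.savez accepts a filename or a file-like object; an in-memory
# buffer is used here so the example leaves no file behind.
buffer = io.BytesIO()
np.savez(buffer, matrix=matrix)

# Rewind and read the archive back. numpy.load returns a
# dictionary-like object keyed by the names given to savez.
buffer.seek(0)
with np.load(buffer) as archive:
    names = list(archive.files)
    restored = archive["matrix"]

print(names)
print(restored)
```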
train(n_procs)

Trains the model, taking a number of processes n_procs over which to map and reduce.

Parameters: n_procs (int) – Number of processes.
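
To make the map and reduce steps concrete, here is a rough sketch of the general pattern using only the Python standard library. It is not the implementation behind train: the functions count_context, merge_entries and count_parallel, and the (word, context index, count) entry format, are hypothetical names chosen for the example.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_context(item):
    """Map step: turn one (index, tokens) pair into (word, index, count) entries."""
    j, tokens = item
    return [(word, j, n) for word, n in sorted(Counter(tokens).items())]


def merge_entries(left, right):
    """Reduce step: concatenate two lists of matrix entries."""
    return left + right


def count_parallel(contexts, n_procs):
    """Run the map step in a pool of n_procs worker processes, then reduce."""
    with Pool(processes=n_procs) as pool:
        partials = pool.map(count_context, list(enumerate(contexts)))
    return reduce(merge_entries, partials, [])


# Hypothetical toy contexts, processed serially here to show the two steps.
contexts = [["a", "b", "a"], ["b", "c"]]
serial_entries = reduce(merge_entries, map(count_context, enumerate(contexts)), [])
print(serial_entries)
```

Calling count_parallel(contexts, 2) from a script, under an `if __name__ == "__main__":` guard, should produce the same entries as the serial version, since pool.map preserves input order. Each entry corresponds to one nonzero cell of the term-frequency matrix.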
