vsm.model.LdaCgsMulti

class vsm.model.LdaCgsMulti(corpus=None, context_type=None, K=20, V=0, alpha=[], beta=[])

An implementation of latent Dirichlet allocation (LDA) using collapsed Gibbs sampling, with training distributed over multiple processes.
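
The sketch below shows the standard collapsed Gibbs update for LDA in plain Python. It is not vsm's code. It runs in a single process, uses symmetric scalar priors alpha and beta for simplicity, and the function name lda_cgs is hypothetical. Each token's topic is resampled from p(z = k | rest) ∝ (n_dk + alpha)(n_kw + beta) / (n_k + V·beta), where n_dk, n_kw and n_k are the document-topic, topic-word and topic-total counts with the current token removed.

```python
import random


def lda_cgs(docs, K, V, alpha=0.01, beta=0.01, n_iterations=50, seed=0):
    """Hypothetical single-process collapsed Gibbs sampler for LDA.

    docs is a list of documents, each a list of word ids in range(V).
    Uses symmetric scalar priors alpha and beta.
    Returns the (doc_topic, topic_word) count matrices.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * K for _ in docs]       # n_dk
    topic_word = [[0] * V for _ in range(K)]  # n_kw
    topic_total = [0] * K                     # n_k

    # Assign every token a random initial topic and record the counts.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(zd)

    for _ in range(n_iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional for each topic t.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 0, 1]]
doc_topic, topic_word = lda_cgs(docs, K=2, V=4, n_iterations=20)
```

Each sweep over all tokens corresponds to one iteration; the counts always sum to the corpus size because every token is removed and re-added exactly once per sweep.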

Methods

  • __init__([corpus, context_type, K, V, ...]) – Initialize LdaCgsMulti.
  • load(filename) – A static method for loading a saved LdaCgsMulti model.
  • save(filename) – Saves the model in an .npz file.
  • train([n_iterations, verbose, n_proc, seeds]) – Updates the model n_iterations times; all arguments are optional.

__init__(corpus=None, context_type=None, K=20, V=0, alpha=[], beta=[])

Initialize LdaCgsMulti.

Parameters:
  • corpus (Corpus, optional) – Source of observed data.
  • context_type (string, optional) – Name of tokenization stored in corpus whose tokens will be treated as documents.
  • K (int, optional) – Number of topics. Default is 20.
  • alpha (list, optional) – Context priors. Default is a flat prior of 0.01 for all contexts.
  • beta (list, optional) – Topic priors. Default is a flat prior of 0.01 for all topics.
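
The signature uses an empty list as the default for alpha and beta, which stands for the flat 0.01 prior described above. A minimal sketch of that convention follows; expand_prior is a hypothetical helper, not part of vsm, and the length n it expands to is left to the caller because this page does not state it.

```python
def expand_prior(prior, n, default=0.01):
    """Hypothetical helper: turn a prior argument into a list of n values.

    An empty list (the signature's default) becomes a flat prior of
    `default`; otherwise the supplied list must already have length n.
    """
    if len(prior) == 0:
        return [default] * n
    if len(prior) != n:
        raise ValueError("prior has length %d, expected %d" % (len(prior), n))
    return [float(x) for x in prior]


flat = expand_prior([], 3)             # [0.01, 0.01, 0.01]
custom = expand_prior([0.5, 0.1], 2)   # [0.5, 0.1]
```

Using an empty list rather than None as the sentinel is safe here because the helper never mutates its argument.
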
static load(filename)

A static method for loading a saved LdaCgsMulti model.

Parameters: filename (string) – Name of a saved model to be loaded.
Returns: m : LdaCgsMulti object
See Also: numpy.load
save(filename)

Saves the model in an .npz file.

Parameters: filename (string) – Name of the file the model is saved to.
See Also: numpy.savez
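
The See Also entries for load and save point to numpy.load and numpy.savez. The round trip below shows that .npz mechanism on its own: a dict of arrays is written to a single archive and read back. The helper names save_state and load_state and the keys K and top_word are hypothetical; they are not the attributes LdaCgsMulti actually stores.

```python
import os
import tempfile

import numpy as np


def save_state(filename, state):
    """Hypothetical helper: write a dict of arrays to one .npz archive."""
    np.savez(filename, **state)


def load_state(filename):
    """Hypothetical helper: read every array in an .npz archive into a dict."""
    with np.load(filename) as npz:
        return {key: npz[key] for key in npz.files}


# Illustrative model state; these key names are not vsm's.
state = {
    "K": np.array(2),
    "top_word": np.array([[1.0, 0.0, 3.0], [0.0, 2.0, 1.0]]),
}

path = os.path.join(tempfile.mkdtemp(), "model.npz")
save_state(path, state)
restored = load_state(path)
print(sorted(restored))  # ['K', 'top_word']
```

Copying the arrays out inside the with block matters because the NpzFile handle is closed when the block exits.
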
train(n_iterations=500, verbose=True, n_proc=2, seeds=None)

Updates the model n_iterations times. All arguments are optional.

Parameters:
  • n_iterations (int, optional) – Number of iterations. Default is 500.
  • verbose (boolean, optional) – If True, the current iteration number is printed to notify the user of progress. Default is True.
  • n_proc (int, optional) – Number of processes used for training. Default is 2.
  • seeds (list of integers, optional) – Random seeds, one for each process. The length of the list should equal n_proc. Default is None.
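
One common way to use n_proc and seeds is to split the documents into n_proc chunks and give each worker its own seeded random number generator, so a run with the same seeds is reproducible. The sketch below assumes that scheme; it is not taken from vsm, the helpers split_into_chunks and worker_rngs are hypothetical, and the chunks are not actually dispatched to processes (a real implementation might hand them to something like multiprocessing.Pool).

```python
import random


def split_into_chunks(n_docs, n_proc):
    """Split document indices 0..n_docs-1 into n_proc contiguous, near-equal chunks."""
    base, extra = divmod(n_docs, n_proc)
    chunks, start = [], 0
    for i in range(n_proc):
        # The first `extra` chunks get one additional document.
        size = base + (1 if i < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def worker_rngs(n_proc, seeds=None):
    """Return one independent random.Random per worker.

    If seeds is given it must hold exactly one integer per worker,
    which makes every worker's random stream reproducible.
    """
    if seeds is None:
        return [random.Random() for _ in range(n_proc)]
    if len(seeds) != n_proc:
        raise ValueError("expected %d seeds, got %d" % (n_proc, len(seeds)))
    return [random.Random(s) for s in seeds]


chunks = split_into_chunks(10, 3)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = [rng.random() for rng in worker_rngs(3, seeds=[1, 2, 3])]
b = [rng.random() for rng in worker_rngs(3, seeds=[1, 2, 3])]
```

Because a and b come from generators built with the same seeds, they are identical; passing seeds=None gives a fresh, unseeded generator per worker instead.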
