vsm.model.LdaCgsMulti

class vsm.model.LdaCgsMulti(corpus=None, context_type=None, K=20, V=0, alpha=[], beta=[])

An implementation of latent Dirichlet allocation (LDA) that is trained with collapsed Gibbs sampling, with the sampling distributed across multiple processes.
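The technique named above can be illustrated with a minimal, single-process sketch of collapsed Gibbs sampling for LDA. This is not vsm's implementation. It assumes symmetric scalar priors and documents given as lists of integer word ids in range(V), and the function name `lda_gibbs` is hypothetical. LdaCgsMulti additionally splits the sampling across processes, which this sketch omits.

```python
import random

def lda_gibbs(docs, K, V, alpha=0.01, beta=0.01, n_iterations=50, seed=0):
    """Hypothetical collapsed Gibbs sampler for LDA with symmetric priors.

    docs: list of documents, each a list of word ids in range(V).
    Returns (z, n_dk, n_kw): per-token topic assignments and count tables.
    """
    rng = random.Random(seed)
    n_dk = [[0] * K for _ in docs]      # topic counts per document (context)
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens assigned to each topic
    z = []
    # Assign every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(n_iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove token i so the counts describe all *other* tokens.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional p(z_i = t | other assignments, words).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

if __name__ == "__main__":
    docs = [[0, 0, 1], [2, 3, 3], [0, 1, 1]]
    z, n_dk, n_kw = lda_gibbs(docs, K=2, V=4, n_iterations=20, seed=1)
    print(z)
```

Because the topic-word and document-topic distributions are integrated out ("collapsed"), only the integer count tables need to be updated on each step.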

Methods

`__init__`([corpus, context_type, K, V, ...])	Initialize LdaCgsMulti.
`load`(filename)	A static method for loading a saved LdaCgsMulti model.
`save`(filename)	Saves the model in an .npz file.
`train`([n_iterations, verbose, n_proc, seeds])	Trains the model by updating it n_iterations times.

__init__(corpus=None, context_type=None, K=20, V=0, alpha=[], beta=[])

Initialize LdaCgsMulti.

Parameters:

corpus (Corpus) – Source of observed data.
context_type (string, optional) – Name of the tokenization stored in corpus whose tokens are treated as documents.
K (int, optional) – Number of topics. Default is 20.
V (int, optional) – Number of words in the vocabulary. Default is 0.
alpha (list, optional) – Context priors. Default is a flat prior of 0.01 for all contexts.
beta (list, optional) – Topic priors. Default is 0.01 for all topics.
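For orientation, in the standard collapsed Gibbs update for LDA the two priors enter the probability of assigning token i of context d to topic k as shown below. Here alpha smooths each context's topic counts and beta smooths each topic's word counts, and every count n excludes token i. The indexing (alpha_k per topic, beta_v per vocabulary word) follows the usual convention. It is an assumption, because this page does not state how long the alpha and beta lists are expected to be.

```latex
p(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w}) \;\propto\;
\left(n^{\neg i}_{d,k} + \alpha_k\right)
\frac{n^{\neg i}_{k,w_i} + \beta_{w_i}}{n^{\neg i}_{k} + \sum_{v=1}^{V} \beta_v}
```

With flat priors of 0.01, the denominator's prior term reduces to 0.01 V.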

static load(filename)

A static method for loading a saved LdaCgsMulti model.

Parameters:	filename (string) – Name of a saved model to be loaded.
Returns:	m : LdaCgsMulti object
See Also:	`numpy.load`
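The static-method loading pattern that the reference to `numpy.load` points to can be sketched as follows. `ToyModel` and its stored fields (`K`, `alpha`, `beta`) are hypothetical stand-ins, not the actual contents of a saved LdaCgsMulti file.

```python
import os
import tempfile

import numpy as np

class ToyModel:
    """Hypothetical stand-in for a topic model; not vsm's LdaCgsMulti."""

    def __init__(self, K, alpha, beta):
        self.K = K
        self.alpha = np.asarray(alpha, dtype=float)
        self.beta = np.asarray(beta, dtype=float)

    @staticmethod
    def load(filename):
        """Rebuild a ToyModel from named arrays stored in an .npz archive."""
        # numpy.load returns a lazy, dict-like NpzFile for .npz input;
        # the with-block closes the file once the arrays have been read.
        with np.load(filename) as arrays:
            return ToyModel(K=int(arrays["K"]),
                            alpha=arrays["alpha"],
                            beta=arrays["beta"])

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.npz")
    np.savez(path, K=3, alpha=np.full(3, 0.01), beta=np.full(4, 0.01))
    m = ToyModel.load(path)
    print(m.K, m.alpha.shape, m.beta.shape)  # 3 (3,) (4,)
```

Making `load` a static method lets callers write `ToyModel.load(path)` without first constructing an empty instance.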

save(filename)

Saves the model in an .npz file.

Parameters:	filename (string) – Name of the file to which the model is saved.
See Also:	`numpy.savez`
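One practical detail of `numpy.savez` is worth noting: when given a string path without the `.npz` extension, it appends that extension. If `save` passes its filename straight to `numpy.savez` (an assumption, not confirmed by this page), the file on disk may therefore be named differently from the argument. The helper `save_arrays` below is hypothetical and shows the behavior.

```python
import os
import tempfile

import numpy as np

def save_arrays(filename, **arrays):
    """Hypothetical save helper: write named arrays to one .npz archive.

    numpy.savez appends ".npz" to a str filename that lacks it, so the
    path that was actually written is returned.
    """
    np.savez(filename, **arrays)
    return filename if filename.endswith(".npz") else filename + ".npz"

if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    path = save_arrays(os.path.join(out_dir, "lda_model"),
                       K=np.array(20), alpha=np.full(20, 0.01))
    print(path)                  # ends with lda_model.npz
    print(os.path.exists(path))  # True
```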

train(n_iterations=500, verbose=True, n_proc=2, seeds=None)

Trains the model by updating it n_iterations times; each update is one iteration of collapsed Gibbs sampling.

Parameters:

n_iterations (int, optional) – Number of iterations. Default is 500.
verbose (boolean, optional) – If True, the current iteration number is printed to keep the user informed of progress. Default is True.
n_proc (int, optional) – Number of worker processes used for training. Default is 2.
seeds (list of integers, optional) – Random seeds, one for each process. The length of the list should equal n_proc. Default is None.
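The one-seed-per-process contract for `seeds` can be sketched with the standard `multiprocessing` module. The worker `sample_chunk` and the driver `run_workers` are hypothetical. They only draw random numbers instead of sampling topics, and vsm's own process setup may differ.

```python
import multiprocessing
import random

def sample_chunk(seed, n_draws=5):
    """Hypothetical worker: draws numbers from an RNG seeded only by `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(1000) for _ in range(n_draws)]

def run_workers(n_proc=2, seeds=None):
    """Run one hypothetical worker per process, each with its own seed."""
    if seeds is None:
        # No seeds given: draw fresh ones, so the run is not reproducible.
        seeds = [random.SystemRandom().randrange(2**32) for _ in range(n_proc)]
    if len(seeds) != n_proc:
        raise ValueError("len(seeds) must equal n_proc")
    # Prefer the 'fork' start method where the platform offers it, so this
    # snippet also runs without an `if __name__ == "__main__"` guard.
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else None)
    with ctx.Pool(processes=n_proc) as pool:
        # map() returns results in seed order, one list per process.
        return pool.map(sample_chunk, seeds)

if __name__ == "__main__":
    print(run_workers(2, seeds=[1, 2]) == run_workers(2, seeds=[1, 2]))  # True
```

Giving each process its own seed keeps the random streams independent of how the operating system schedules the workers, so the same seeds reproduce the same draws.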