vsm

The vsm module provides tools and a workflow for producing semantic models of textual corpora and analyzing and visualizing these models.

The vsm module is designed to work within the SciPy ecosystem. In a typical workflow, a collection of texts is first transformed into a Corpus object, whose underlying data structures are NumPy arrays. The user then passes the Corpus object to one of the model classes, which implement the training algorithms for models such as TF, TF-IDF, LSA, BEAGLE, and LDA using NumPy, SciPy, and IPython.parallel. Finally, the user examines the results with a Viewer class specialized to a particular model type. A Viewer object provides a variety of methods for analysis and visualization and achieves its full functionality within an IPython notebook session extended with matplotlib and scikit-learn.
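To make the idea of a corpus backed by NumPy arrays concrete, here is a minimal sketch in plain NumPy. It does not use the vsm API: the toy documents and the names `docs`, `words`, `corpus`, `doc_ends`, and `tf` are assumptions chosen for illustration, and the actual Corpus and model classes may organize their data differently.

```python
import numpy as np

# Hypothetical toy input: two documents that are already tokenized.
docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]

# Vocabulary: the sorted unique words; each word is identified by its index.
words = np.unique([w for doc in docs for w in doc]).tolist()
word_to_id = {w: i for i, w in enumerate(words)}

# The whole corpus as one flat integer array, plus the offset at which
# each document ends (an illustrative layout, not necessarily vsm's).
corpus = np.array([word_to_id[w] for doc in docs for w in doc])
doc_ends = np.cumsum([len(doc) for doc in docs])

# Term-frequency matrix: rows are words, columns are documents.
tf = np.zeros((len(words), len(docs)), dtype=int)
for j, doc_tokens in enumerate(np.split(corpus, doc_ends[:-1])):
    tf[:, j] = np.bincount(doc_tokens, minlength=len(words))

print(words)
print(corpus)
print(tf)
```

Storing the corpus as integers rather than strings is what lets the counting step reduce to a single `np.bincount` call per document.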

Classes

BeagleComposite(ctx_corp, ctx_matrix, ...[, ...]) BeagleComposite combines the BEAGLE order and context models.
BeagleContextMulti(corpus, env_corpus, ...)
BeagleContextSeq(corpus, env_corpus, env_matrix)
BeagleEnvironment(corpus[, n_cols, dtype, ...]) BeagleEnvironment is a set of randomly generated fixed vectors.
BeagleOrderMulti(corpus, env_matrix[, ...]) BeagleOrderMulti stores word order information in the context.
BeagleOrderSeq(corpus, env_matrix[, ...]) BeagleOrderSeq stores word order information in the context.
BeagleViewer(corpus, model) A class for viewing BEAGLE models.
Corpus(corpus[, context_types, ...]) The goal of the Corpus class is to provide an efficient representation of a textual corpus.
LdaCgsSeq([corpus, context_type, K, V, ...]) An implementation of LDA using collapsed Gibbs sampling.
LdaCgsMulti([corpus, context_type, K, V, ...]) An implementation of LDA using collapsed Gibbs sampling with multi-processing.
LdaCgsViewer(corpus, model) A class for viewing a topic model estimated by one of vsm's LDA classes.
Lsa([td_matrix, dtype, context_type])
LsaViewer(corpus, model) A class for viewing an LSA model.
TfIdf([tf_matrix, dtype, context_type]) Transforms a term-frequency model into a term-frequency inverse-document-frequency model.
TfIdfViewer(corpus, model) A class for viewing a term-frequency inverse-document-frequency model.
TfMulti([corpus, context_type]) Trains a term-frequency model.
TfSeq([corpus, context_type]) Trains a term-frequency model.
TfViewer(corpus, model) A class for viewing a term-frequency model.
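As a rough illustration of the transformation that the TfIdf entry describes, the sketch below reweights a words-by-documents count matrix in plain NumPy. It is not the vsm implementation: the function name `tfidf`, the sample matrix, and the choice of `log(N / df)` as the inverse-document-frequency weight are assumptions, and TfIdf itself may use a different formula.

```python
import numpy as np

def tfidf(tf):
    """Weight a words-by-documents count matrix by log(N / df).

    N is the number of documents and df is the number of documents
    in which each word occurs. This is one common weighting and is
    assumed here for illustration only.
    """
    tf = np.asarray(tf, dtype=float)
    n_docs = tf.shape[1]
    df = np.count_nonzero(tf, axis=1)

    # Words that occur in no document keep a weight of zero,
    # which also avoids dividing by zero.
    idf = np.zeros(tf.shape[0])
    seen = df > 0
    idf[seen] = np.log(n_docs / df[seen])

    # Scale each word's row by its weight.
    return tf * idf[:, np.newaxis]

# Hypothetical term-frequency matrix: 4 words, 2 documents.
tf_matrix = np.array([[1, 0], [0, 1], [1, 2], [1, 1]])
weighted = tfidf(tf_matrix)
print(weighted)
```

Under this weighting, a word that occurs in every document receives a weight of zero, so only words that distinguish documents from one another contribute to the resulting matrix.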
