The vsm module provides tools and a workflow for producing semantic models of textual corpora and analyzing and visualizing these models.
The vsm module has been conceived within the SciPy ecosystem. In a typical workflow, a collection of texts is first transformed into a Corpus object, whose underlying data structures are NumPy numerical arrays. The user may then feed a Corpus object to one of the model classes, which contain the algorithms, implemented in NumPy, SciPy and IPython.parallel, for training models such as TF, TFIDF, LSA, BEAGLE, or LDA. Finally, the user may examine the results with a Viewer class specialized to a particular model type. A Viewer object contains a variety of methods for analysis and visualization and achieves its full functionality within an IPython notebook session extended with matplotlib and scikit-learn.
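The corpus-to-model steps described above can be illustrated with a minimal sketch in plain NumPy. It deliberately does not call vsm's own classes: the function names (build_corpus, term_frequency, tf_idf), the toy documents, and the particular IDF formula (log of the number of documents over the document frequency) are assumptions made for illustration only, and vsm's actual data layout and weighting may differ.

```python
import numpy as np

# Hypothetical toy corpus: three tiny "documents".
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]


def build_corpus(docs):
    """Encode tokens as integers, in the spirit of an array-backed corpus."""
    tokens = [doc.split() for doc in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # One flat integer array for the whole corpus, plus document end offsets.
    flat = np.array([index[w] for doc in tokens for w in doc], dtype=np.int64)
    doc_ends = np.cumsum([len(doc) for doc in tokens])
    return flat, doc_ends, vocab


def term_frequency(flat, doc_ends, n_words):
    """Return a (words x documents) matrix of raw counts."""
    starts = np.concatenate(([0], doc_ends[:-1]))
    tf = np.zeros((n_words, len(doc_ends)), dtype=np.int64)
    for j, (start, end) in enumerate(zip(starts, doc_ends)):
        tf[:, j] = np.bincount(flat[start:end], minlength=n_words)
    return tf


def tf_idf(tf):
    """Weight counts by log(N / document frequency), one common IDF variant."""
    n_docs = tf.shape[1]
    doc_freq = np.count_nonzero(tf, axis=1)  # never zero: every word occurs somewhere
    idf = np.log(n_docs / doc_freq)
    return tf * idf[:, np.newaxis]


flat, doc_ends, vocab = build_corpus(documents)
tf = term_frequency(flat, doc_ends, len(vocab))
tfidf = tf_idf(tf)
```

In this sketch a word such as "the", which occurs in two of the three documents, receives a lower weight per occurrence than "cat", which occurs in only one. The resulting matrices play the role of the trained model that a viewer would then inspect.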
Classes
BeagleComposite(ctx_corp, ctx_matrix, ...[, ...]) | BeagleComposite combines the BEAGLE order and context models. |
BeagleContextMulti(corpus, env_corpus, ...) | |
BeagleContextSeq(corpus, env_corpus, env_matrix) | |
BeagleEnvironment(corpus[, n_cols, dtype, ...]) | BeagleEnvironment is a set of randomly generated fixed vectors. |
BeagleOrderMulti(corpus, env_matrix[, ...]) | BeagleOrderMulti stores word order information in the context. |
BeagleOrderSeq(corpus, env_matrix[, ...]) | BeagleOrderSeq stores word order information in the context. |
BeagleViewer(corpus, model) | A class for viewing BEAGLE models. |
Corpus(corpus[, context_types, ...]) | The goal of the Corpus class is to provide an efficient representation of a textual corpus. |
LdaCgsSeq([corpus, context_type, K, V, ...]) | An implementation of LDA using collapsed Gibbs sampling. |
LdaCgsMulti([corpus, context_type, K, V, ...]) | An implementation of LDA using collapsed Gibbs sampling with multi-processing. |
LdaCgsViewer(corpus, model) | A class for viewing a topic model estimated by one of vsm’s LDA |
Lsa([td_matrix, dtype, context_type]) | |
LsaViewer(corpus, model) | A class for viewing an LSA model. |
TfIdf([tf_matrix, dtype, context_type]) | Transforms a term-frequency model into a term-frequency inverse-document-frequency model. |
TfIdfViewer(corpus, model) | A class for viewing a term-frequency inverse-document-frequency model. |
TfMulti([corpus, context_type]) | Trains a term-frequency model. |
TfSeq([corpus, context_type]) | Trains a term-frequency model. |
TfViewer(corpus, model) | A class for viewing a term-frequency model. |