A class for viewing a topic model estimated by one of vsm’s LDA classes using collapsed Gibbs sampling (CGS).
Methods
__init__(corpus, model) | Initialize LdaCgsViewer. |
dismat_doc([docs, dist_fn]) | Calculates the distance matrix for a given list of documents. |
dismat_top([topics, dist_fn]) | Calculates the distance matrix for a given list of topics. |
dist_doc_doc(doc_or_docs[, print_len, ...]) | Computes and sorts the distances between a document or list of documents and every document in the topic space. |
dist_top_doc(topic_or_topics[, weights, ...]) | Takes a topic or list of topics (by integer index) and returns a list of documents sorted by distance. |
dist_top_top(topic_or_topics[, weights, ...]) | Takes a topic or list of topics (by integer index) and returns a list of topics sorted by the distances between a given topic and every topic. |
dist_word_top(word_or_words[, weights, ...]) | Sorts topics according to their distance to the query word_or_words. |
doc_topics(doc_or_docs[, sort_by_entropy, ...]) | Returns the distribution over topics for the given documents. |
logp_plot([range, step, show, grid]) | Returns a plot of log probabilities for the specified range of the MCMC chain used to fit a topic model by LDAGibbs. |
topic_entropies([print_len]) | Returns the entropies of the topics of the model as an array sorted by entropy. |
topic_hist([topic_indices, d_indices, show]) | Draws a histogram showing the proportion of topics within a set of documents specified by d_indices. |
topics([print_len, topic_indices, ...]) | Returns a list of topics estimated by the model. |
word_topics(word[, as_strings]) | Searches for every occurrence of word in the entire corpus and returns the document, relative position, and assigned topic number of each occurrence. |
Initialize LdaCgsViewer.
Parameters: |
|
---|
Calculates the distance matrix for a given list of documents.
Parameters: |
|
---|---|
Returns: | an instance of IndexedSymmArray. n x n matrix containing floats where n is the number of documents. |
See Also: | vsm.viewer.wrapper.dismat_documents() |
Calculates the distance matrix for a given list of topics.
Parameters: |
|
---|---|
Returns: | an instance of IndexedSymmArray. n x n matrix containing floats where n is the number of topics considered. |
See Also: | vsm.viewer.wrapper.dismat_top() |
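As an illustration of what a topic distance matrix contains, here is a minimal standalone sketch that does not use vsm. It assumes Jensen-Shannon divergence as a stand-in for dist_fn (the default is not stated here), and the names js_div, distance_matrix and toy_topics are hypothetical. The result is a plain list of lists, not an IndexedSymmArray.

```python
import math

def kl_div(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms of p are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_div(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 when using log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)

def distance_matrix(rows, dist_fn=js_div):
    """Return an n x n symmetric matrix (list of lists) of pairwise distances."""
    n = len(rows)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            mat[i][j] = mat[j][i] = dist_fn(rows[i], rows[j])
    return mat

# Hypothetical topic-word distributions: 3 topics over a 4-word vocabulary.
toy_topics = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
toy_dismat = distance_matrix(toy_topics)
```

In this toy data the two sharply peaked topics are farther from each other than either is from the uniform topic, which is the kind of structure such a matrix is meant to expose.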
Computes and sorts the distances between a document or list of documents and every document in the topic space.
Parameters: | doc_or_docs (string/integer or list of strings/integers) – Query document(s) relative to which distances are computed. |
---|---|
Returns: | an instance of LabeledColumn. A 2-dim array containing documents and their distances to doc_or_docs. |
See Also: | vsm.viewer.wrapper.dist_doc_doc() |
Takes a topic or list of topics (by integer index) and returns a list of documents sorted by distance.
Parameters: |
|
---|---|
Returns: | an instance of LabeledColumn. A 2-dim array containing documents and their posterior probabilities to topic_or_topics. |
See Also: | def_label_fn(), vsm.viewer.wrapper.dist_top_doc() |
Takes a topic or list of topics (by integer index) and returns a list of topics sorted by the distances between a given topic and every topic.
Parameters: |
|
---|---|
Returns: | an instance of LabeledColumn. A 2-dim array containing topics and their distances to topic_or_topics. |
See Also: | vsm.viewer.wrapper.dist_top_top() |
Sorts topics according to their distance to the query word_or_words.
A pseudo-topic is constructed from word_or_words as follows. If weights are not provided, the word list is represented in the space of topics as a topic which assigns equal non-zero probability to each word in word_or_words and 0 to every other word in the corpus. Otherwise, each word in word_or_words is assigned the provided weight.
Parameters: |
|
---|---|
Returns: | an instance of LabeledColumn. A structured array of topics sorted by their distances to word_or_words. |
See Also: | vsm.viewer.wrapper.dist_word_top() |
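The pseudo-topic construction described above can be sketched in plain Python as follows. This is not vsm's implementation: pseudo_topic, rank_topics and the toy data are hypothetical names, and Euclidean distance (math.dist) is used only as a stand-in distance function.

```python
import math

def pseudo_topic(query_words, vocabulary, weights=None):
    """Represent a word list as a vector over the vocabulary."""
    if weights is None:
        # Without weights, each query word gets equal non-zero probability.
        weights = [1.0 / len(query_words)] * len(query_words)
    index = {w: i for i, w in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)  # every other word keeps probability 0
    for word, weight in zip(query_words, weights):
        vec[index[word]] = weight
    return vec

def rank_topics(query_vec, topics, dist_fn=math.dist):
    """Return (topic_index, distance) pairs sorted by ascending distance."""
    pairs = [(k, dist_fn(query_vec, topic)) for k, topic in enumerate(topics)]
    return sorted(pairs, key=lambda pair: pair[1])

# Hypothetical vocabulary and topic-word distributions.
toy_vocab = ["cat", "dog", "stock", "bond"]
toy_topics = [
    [0.45, 0.45, 0.05, 0.05],  # animal-like topic
    [0.05, 0.05, 0.45, 0.45],  # finance-like topic
]
query = pseudo_topic(["stock", "bond"], toy_vocab)
ranking = rank_topics(query, toy_topics)
```

Here the query words are spread evenly over "stock" and "bond", so the finance-like topic (index 1) ranks first.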
Returns the distribution over topics for the given documents.
Parameters: |
|
---|---|
Returns: | an instance of LabeledColumn or of DataTable. A structured array of topics (represented by their number) and their corresponding probabilities, or a list of such arrays. |
Returns a plot of log probabilities for the specified range of the MCMC chain used to fit a topic model by LDAGibbs. The function requires the matplotlib package.
Parameters: |
|
---|---|
Returns: | a matplotlib.pyplot object. Contains the log probability plot. |
Returns the entropies of the topics of the model as an array sorted by entropy.
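For reference, the entropy of a topic can be computed as the Shannon entropy of its word distribution. The sketch below is a standalone illustration, not vsm code. It assumes log base 2 and ascending sort order, neither of which is specified here, and the function names are hypothetical.

```python
import math

def shannon_entropy(dist):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def sorted_topic_entropies(topics):
    """Return (topic_index, entropy) pairs sorted by ascending entropy."""
    pairs = [(k, shannon_entropy(t)) for k, t in enumerate(topics)]
    return sorted(pairs, key=lambda pair: pair[1])

# Hypothetical topic-word distributions over a 4-word vocabulary.
toy_topics = [
    [0.25, 0.25, 0.25, 0.25],  # uniform: maximal entropy (2 bits)
    [1.0, 0.0, 0.0, 0.0],      # single word: zero entropy
    [0.5, 0.5, 0.0, 0.0],      # two equally likely words: 1 bit
]
toy_entropies = sorted_topic_entropies(toy_topics)
```

Low-entropy topics concentrate their probability on a few words, while high-entropy topics spread it broadly across the vocabulary.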
Draws a histogram showing the proportion of topics within a set of documents specified by d_indices.
Parameters: |
|
---|---|
Returns: | a matplotlib.pyplot object. Contains the topic proportion histogram. |
Returns a list of topics estimated by the model. Each topic is represented by a list of words and the corresponding probabilities.
Parameters: |
|
---|---|
Returns: | an instance of DataTable. A structured array of topics. |
Searches for every occurrence of word in the entire corpus and returns a list with one row per occurrence. Each row contains the name or ID number of the document, the relative position of the occurrence in that document, and the topic number assigned to it.
Parameters: |
|
---|---|
Returns: | an instance of LabeledColumn. A structured array consisting of three columns: (1) the name/ID of the document containing word, (2) the relative position of word in the document, and (3) the topic number assigned to the token. |
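The lookup described above can be illustrated with a small standalone sketch. It is not vsm's implementation: the data layout (a dict of token lists plus a parallel dict of per-token topic assignments) and the name find_word_topics are assumptions made for the example, and "relative position" is taken to mean the token index within the document.

```python
def find_word_topics(word, docs, assignments):
    """Return (doc_label, position, topic) for each occurrence of word.

    docs maps a document label to its list of tokens; assignments maps the
    same labels to a parallel list of topic numbers, one per token.
    """
    rows = []
    for label, tokens in docs.items():
        for pos, token in enumerate(tokens):
            if token == word:
                rows.append((label, pos, assignments[label][pos]))
    return rows

# Hypothetical two-document corpus with per-token topic assignments.
toy_docs = {
    "doc_a": ["bank", "river", "bank"],
    "doc_b": ["money", "bank"],
}
toy_assignments = {
    "doc_a": [1, 1, 0],
    "doc_b": [0, 0],
}
toy_rows = find_word_topics("bank", toy_docs, toy_assignments)
```

Because topics are assigned per token, the same word can carry different topic numbers in different positions, as "bank" does in doc_a here.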