vsm.viewer.LsaViewer

class vsm.viewer.LsaViewer(corpus, model)

A class for viewing an LSA model.

Methods

__init__(corpus, model) Initialize LsaViewer.
dismat_doc(doc_list[, dist_fn]) Calculates a distance matrix for a given list of documents.
dismat_word(word_list[, dist_fn]) Calculates a distance matrix for a given list of words.
dist_doc_doc(doc_or_docs[, weights, ...]) Computes and sorts the distances between a document or list of documents and every document.
dist_word_doc(word_or_words[, weights, ...]) Computes and sorts the distances between a word or list of words and every document.
dist_word_word(word_or_words[, weights, ...]) Computes and sorts the distances between a word or list of words and every word.
__init__(corpus, model)

Initialize LsaViewer.

Parameters:
  • corpus (Corpus) – Source of observed data.
  • model (Lsa) – An LSA model.
dismat_doc(doc_list, dist_fn=angle)

Calculates a distance matrix for a given list of documents.

Parameters:
  • doc_list – A list of documents whose distance matrix is to be computed.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle().
Returns:

An instance of IndexedSymmArray: an n x n matrix of floats, where n is the number of documents in doc_list.

See Also:

vsm.viewer.wrappers.dismat_doc()
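
As a rough illustration of what a distance matrix over a list of items looks like, the sketch below builds a symmetric n x n matrix in plain Python. It does not call vsm. The angle helper is an assumed stand-in for vsm.spatial.angle (taken here to be the arccosine of cosine similarity, in radians), and distance_matrix and doc_vectors are hypothetical names introduced only for this example.

```python
import math

def angle(u, v):
    """Assumed stand-in for vsm.spatial.angle: angle in radians between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return float("nan")  # the angle is undefined for a zero vector
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

def distance_matrix(vectors, dist_fn=angle):
    """Hypothetical helper: symmetric n x n matrix of pairwise distances."""
    n = len(vectors)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            d = dist_fn(vectors[i], vectors[j])
            mat[i][j] = d
            mat[j][i] = d
    return mat

# Hypothetical document vectors in a 2-dimensional latent space.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mat = distance_matrix(doc_vectors)
```

Only the upper triangle is computed, since a distance matrix of this kind is symmetric with a zero diagonal.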

dismat_word(word_list, dist_fn=angle)

Calculates a distance matrix for a given list of words.

Parameters:
  • word_list (list) – A list of words whose distance matrix is to be computed.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle().
Returns:

An instance of IndexedSymmArray: an n x n matrix of floats, where n is the number of words in word_list.

See Also:

vsm.viewer.wrappers.dismat_word()

dist_doc_doc(doc_or_docs, weights=[], print_len=10, filter_nan=True, label_fn=def_label_fn, as_strings=True, dist_fn=angle, order='i')

Computes and sorts the distances between a document or list of documents and every document.

Parameters:
  • doc_or_docs (string/integer or list of strings/integers) – Query document(s) to which distances are calculated.
  • weights (list of floats, optional) – Weights for each query document in doc_or_docs. Default uses equal weights (i.e. arithmetic mean).
  • print_len (int, optional) – Number of documents to be displayed. Default is 10.
  • filter_nan (boolean, optional) – If True, NaN (not-a-number) entries are filtered out. Default is True.
  • label_fn (function, optional) – A function that defines how documents are represented. Default is def_label_fn(), which retrieves the labels from corpus metadata.
  • as_strings (boolean, optional) – If True, returns a list of documents as strings rather than their integer representations. Default is True.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle().
  • order (string, optional) – Order of sorting. ‘i’ for increasing and ‘d’ for decreasing order. Default is ‘i’.
Returns:

An instance of LabeledColumn: a 2-dimensional array containing documents and their distances to doc_or_docs.

See Also:

vsm.viewer.wrappers.dist_doc_doc()
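
The interplay of weights, filter_nan and order can be sketched in plain Python as follows. This is not the vsm implementation: rank_documents is a hypothetical helper, angle is an assumed stand-in for vsm.spatial.angle, and the sketch assumes that the weights average the per-query distances (the documentation above only says "arithmetic mean", so the library may combine queries differently).

```python
import math

def angle(u, v):
    """Assumed stand-in for vsm.spatial.angle: arccosine of cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0:
        return float("nan")  # undefined for zero vectors
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def rank_documents(query_ids, doc_vectors, weights=None, filter_nan=True, order="i"):
    """Hypothetical helper: rank every document by weighted mean distance to the queries."""
    if not weights:
        # Equal weights reduce to the arithmetic mean of the per-query distances.
        weights = [1.0 / len(query_ids)] * len(query_ids)
    ranked = []
    for doc_id, vec in enumerate(doc_vectors):
        dist = sum(w * angle(doc_vectors[q], vec) for w, q in zip(weights, query_ids))
        if filter_nan and math.isnan(dist):
            continue  # drop documents whose distance is undefined
        ranked.append((doc_id, dist))
    # 'i' sorts nearest first, 'd' sorts farthest first.
    ranked.sort(key=lambda pair: pair[1], reverse=(order == "d"))
    return ranked

# Hypothetical document vectors; the last one is a zero vector and yields NaN.
doc_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
increasing = rank_documents([0], doc_vectors)
decreasing = rank_documents([0], doc_vectors, order="d")
```

With filter_nan left at True, the zero-vector document is dropped from both rankings instead of ending up at an arbitrary position in the sort.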

dist_word_doc(word_or_words, weights=[], label_fn=def_label_fn, filter_nan=True, print_len=10, as_strings=True, dist_fn=angle, order='i')

Computes and sorts the distances between a word or list of words and every document.

Parameters:
  • word_or_words (string/integer or list of strings/integers) – Query word(s) from which a pseudo-document is created for computing distances.
  • weights (list of floats, optional) – Weights for each query word in word_or_words. Default uses equal weights (i.e. arithmetic mean).
  • print_len (int, optional) – Number of documents to be displayed. Default is 10.
  • filter_nan (boolean, optional) – If True, NaN (not-a-number) entries are filtered out. Default is True.
  • label_fn (function, optional) – A function that defines how documents are represented. Default is def_label_fn(), which retrieves the labels from corpus metadata.
  • as_strings (boolean, optional) – If True, returns a list of documents as strings rather than indices. Default is True.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle().
  • order (string, optional) – Order of sorting. ‘i’ for increasing and ‘d’ for decreasing order. Default is ‘i’.
Returns:

An instance of LabeledColumn: a 2-dimensional array containing documents and their distances to word_or_words.

See Also:

vsm.viewer.wrappers.dist_word_doc()
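
One way to picture the pseudo-document mentioned above is as a weighted mean of the query words' vectors, which is then compared against every document vector. The sketch below makes that assumption explicit. pseudo_document, word_vecs and doc_vecs are hypothetical names, angle is an assumed stand-in for vsm.spatial.angle, and the way vsm actually builds the pseudo-document may differ.

```python
import math

def angle(u, v):
    """Assumed stand-in for vsm.spatial.angle: arccosine of cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0:
        return float("nan")
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def pseudo_document(word_vectors, weights=None):
    """Hypothetical helper: weighted mean of word vectors (assumed construction)."""
    n = len(word_vectors)
    if not weights:
        weights = [1.0 / n] * n  # equal weights, i.e. arithmetic mean
    dim = len(word_vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, word_vectors)) for k in range(dim)]

# Hypothetical word and document vectors sharing one latent space.
word_vecs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
doc_vecs = {"doc_a": [1.0, 1.0], "doc_b": [1.0, 0.0], "doc_c": [-1.0, -1.0]}

query = pseudo_document([word_vecs["cat"], word_vecs["dog"]])

# Rank documents by increasing distance to the pseudo-document (order='i').
ranked = sorted(
    ((label, angle(query, vec)) for label, vec in doc_vecs.items()),
    key=lambda pair: pair[1],
)
```

Because the angle ignores vector length, doc_a ranks first even though its norm differs from that of the pseudo-document.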

dist_word_word(word_or_words, weights=[], filter_nan=True, print_len=10, as_strings=True, dist_fn=angle, order='i')

Computes and sorts the distances between a word or list of words and every word.

Parameters:
  • word_or_words (string or list of strings) – Query word(s) to which distances are calculated.
  • weights (list of floats, optional) – Weights for each query word in word_or_words. Default uses equal weights (i.e. arithmetic mean).
  • filter_nan (boolean, optional) – If True, NaN (not-a-number) entries are filtered out. Default is True.
  • print_len (int, optional) – Number of words to be displayed. Default is 10.
  • as_strings (boolean, optional) – If True, returns a list of words as strings rather than their integer representations. Default is True.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle().
  • order (string, optional) – Order of sorting. ‘i’ for increasing and ‘d’ for decreasing order. Default is ‘i’.
Returns:

An instance of LabeledColumn: a 2-dimensional array containing words and their distances to word_or_words.

See Also:

vsm.viewer.wrappers.dist_word_word()
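
The effect of as_strings and print_len on what is shown can be pictured with the small sketch below. It is only an illustration: format_rows, vocabulary and ranked are hypothetical, the sketch assumes a word's integer representation is its index in the vocabulary, and LabeledColumn may well keep all rows and limit only what is printed.

```python
def format_rows(ranked, vocabulary, as_strings=True, print_len=10):
    """Hypothetical helper: render at most print_len (label, distance) rows."""
    lines = []
    for index, dist in ranked[:print_len]:
        # as_strings=True shows the word itself; otherwise its integer index.
        label = vocabulary[index] if as_strings else index
        lines.append(f"{label}\t{dist:.3f}")
    return lines

# Hypothetical vocabulary and an already sorted list of (word index, distance) pairs.
vocabulary = ["apple", "banana", "cherry", "date"]
ranked = [(2, 0.0), (0, 0.3), (3, 0.9), (1, 1.2)]

as_words = format_rows(ranked, vocabulary, print_len=2)
as_ids = format_rows(ranked, vocabulary, as_strings=False, print_len=2)
```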
