vsm.viewer.TfViewer

class vsm.viewer.TfViewer(corpus, model)

A class for viewing a term-frequency model.

Methods

__init__(corpus, model) – Initialize TfViewer.
coll_freq(word) – Returns the frequency of word in all documents.
coll_freqs([print_len, as_strings]) – Returns the frequency of all words in all documents.
dismat_doc(doc_list[, dist_fn]) – Calculates a distance matrix for a given list of documents.
dismat_word(word_list[, dist_fn]) – Calculates a distance matrix for a given list of words.
dist_doc_doc(doc_or_docs[, weights, ...]) – Computes and sorts the distances between a document or list of documents and every document.
dist_word_doc(word_or_words[, weights, ...]) – Computes and sorts the distances between a word or list of words and every document.
dist_word_word(word_or_words[, weights, ...]) – Returns words sorted by the distances between word(s) and every word.
__init__(corpus, model)

Initialize TfViewer.

Parameters:
  • corpus (Corpus) – Source of observed data.
  • model (TfSeq or TfMulti object) – A term-frequency model.
coll_freq(word)

Returns the frequency of word in all documents.

Parameters: word (string or integer) – Word whose frequency is retrieved.
Returns: the frequency, as an integer.
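
As a rough illustration of what a collection frequency is (this is not the vsm implementation), the sketch below counts how often a word occurs across every document of a toy corpus. The function name collection_frequency and the list-of-token-lists corpus are assumptions made for the example only.

```python
from collections import Counter


def collection_frequency(docs, word):
    """Count occurrences of `word` across every tokenized document."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    # Counter returns 0 for words that never occur.
    return counts[word]


# Hypothetical toy corpus: each document is a list of tokens.
toy_docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "ran"]]
cat_freq = collection_frequency(toy_docs, "cat")
```

With a real viewer, the equivalent lookup would be a single coll_freq(word) call on the TfViewer instance.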
coll_freqs(print_len=20, as_strings=True)

Returns the frequency of all words in all documents.

Parameters:
  • print_len (integer, optional) – Number of words to display. Default is 20.
  • as_strings (boolean, optional) – If True, words are represented as strings rather than their integer representation.
Returns:

An instance of LabeledColumn: a table of words and their frequencies.

dismat_doc(doc_list, dist_fn=angle_sparse)

Calculates a distance matrix for a given list of documents.

Parameters:
  • doc_list – A list of documents whose distance matrix is to be computed.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle_sparse().
Returns:

An instance of IndexedSymmArray: an n x n matrix of floats, where n is the number of documents in doc_list.

See Also:

vsm.viewer.wrappers.dismat_doc()
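
To make the shape of the result concrete, here is a minimal sketch of a symmetric n x n distance matrix built from a pairwise distance function. It is not the vsm code: it uses plain Python lists rather than IndexedSymmArray, and the angle helper is a stand-in that assumes, from the name angle_sparse, that the default distance is the angle between two vectors.

```python
import math


def angle(u, v):
    """Angle in radians between two dense vectors (NaN for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return float("nan")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


def distance_matrix(vectors, dist_fn=angle):
    """Fill an n x n matrix, computing each off-diagonal pair only once."""
    n = len(vectors)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist_fn(vectors[i], vectors[j])
            mat[i][j] = d
            mat[j][i] = d  # mirror the value to keep the matrix symmetric
    return mat


# Hypothetical term-frequency vectors for three documents.
doc_vectors = [[1, 0], [0, 1], [1, 1]]
dm = distance_matrix(doc_vectors)
```

Because the matrix is symmetric, only the upper triangle needs to be computed; the diagonal is left at zero.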

dismat_word(word_list, dist_fn=angle_sparse)

Calculates a distance matrix for a given list of words.

Parameters:
  • word_list (list of strings/integers) – A list of words whose distance matrix is to be computed.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle_sparse().
Returns:

An instance of IndexedSymmArray: an n x n matrix of floats, where n is the number of words in word_list.

See Also:

vsm.viewer.wrappers.dismat_word()

dist_doc_doc(doc_or_docs, weights=[], print_len=10, filter_nan=True, label_fn=def_label_fn, as_strings=True, dist_fn=angle_sparse, order='i')

Computes and sorts the distances between a document or list of documents and every document.

Parameters:
  • doc_or_docs (string/integer or list of strings/integers) – Query document(s) to which distances are calculated.
  • weights (list of floats, optional) – Weights for each query document in doc_or_docs. Default uses equal weights (i.e., the arithmetic mean).
  • print_len (int, optional) – Number of documents to be displayed. Default is 10.
  • filter_nan (boolean, optional) – If True, NaN (not-a-number) entries are filtered out. Default is True.
  • label_fn (function, optional) – A function that defines how documents are represented. Default is def_label_fn(), which retrieves the labels from the corpus metadata.
  • as_strings (boolean, optional) – If True, returns a list of documents as strings rather than indices. Default is True.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle_sparse().
  • order (string, optional) – Order of sorting. ‘i’ for increasing and ‘d’ for decreasing order. Default is ‘i’.
Returns:

An instance of LabeledColumn: a 2-dimensional array containing documents and their distances to doc_or_docs.

See Also:

vsm.viewer.wrappers.dist_doc_doc()
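
The parameters above interact in a fairly simple way, which the following sketch tries to show under stated assumptions: query vectors are combined by a weighted mean, distances to every candidate are computed, NaN entries are optionally dropped, and the result is sorted and truncated. The helpers weighted_mean, angle and rank_by_distance are hypothetical names for this example, and the angle distance stands in for angle_sparse; none of this is taken from the vsm source.

```python
import math


def weighted_mean(vectors, weights=None):
    """Combine query vectors; empty weights means equal weights."""
    if not weights:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) / total
            for k in range(dim)]


def angle(u, v):
    """Angle in radians between two dense vectors (NaN for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return float("nan")
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def rank_by_distance(query, candidates, print_len=10, filter_nan=True, order="i"):
    """Return up to print_len (label, distance) pairs sorted by distance."""
    pairs = [(label, angle(query, vec)) for label, vec in candidates.items()]
    valid = [p for p in pairs if not math.isnan(p[1])]
    nans = [p for p in pairs if math.isnan(p[1])]
    # 'i' sorts in increasing order, 'd' in decreasing order.
    valid.sort(key=lambda p: p[1], reverse=(order == "d"))
    ranked = valid if filter_nan else valid + nans
    return ranked[:print_len]


# Hypothetical term-frequency vectors; d3 is empty, so its distance is NaN.
docs = {"d0": [2, 0, 1], "d1": [0, 3, 0], "d2": [1, 0, 1], "d3": [0, 0, 0]}
query = weighted_mean([docs["d0"], docs["d2"]])
ranked = rank_by_distance(query, docs)
```

In this sketch, NaN entries that are not filtered are appended after the sorted values, since NaN has no meaningful position in a sort; how vsm orders them is not stated in this page.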

dist_word_doc(word_or_words, weights=[], label_fn=def_label_fn, filter_nan=True, print_len=10, as_strings=True, dist_fn=angle_sparse, order='i')

Computes and sorts the distances between a word or list of words and every document.

Parameters:
  • word_or_words (string/integer or list of strings/integers) – Query word(s) from which a pseudo-document is created for computing distances.
  • weights (list of floats, optional) – Weights for each query word in word_or_words. Default uses equal weights (i.e., the arithmetic mean).
  • print_len (int, optional) – Number of documents to be displayed. Default is 10.
  • filter_nan (boolean, optional) – If True, NaN (not-a-number) entries are filtered out. Default is True.
  • label_fn (function, optional) – A function that defines how documents are represented. Default is def_label_fn(), which retrieves the labels from the corpus metadata.
  • as_strings (boolean, optional) – If True, returns a list of documents as strings rather than indices. Default is True.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle_sparse().
  • order (string, optional) – Order of sorting. ‘i’ for increasing and ‘d’ for decreasing order. Default is ‘i’.
Returns:

An instance of LabeledColumn: a 2-dimensional array containing documents and their distances to word_or_words.

See Also:

vsm.viewer.wrappers.dist_word_doc()
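
One plausible reading of the "pseudo-document" mentioned above is a vector over the vocabulary that carries weight only at the query words, which can then be compared with each document's term-frequency vector. The sketch below follows that reading; it is an assumption for illustration, not the documented vsm behavior, and pseudo_document, angle, vocabulary and doc_term_counts are all hypothetical names.

```python
import math


def angle(u, v):
    """Angle in radians between two dense vectors (NaN for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return float("nan")
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def pseudo_document(query_words, vocabulary, weights=None):
    """Build a vocabulary-length vector with weight at each query word."""
    if not weights:
        weights = [1.0] * len(query_words)  # equal weights by default
    index = {word: i for i, word in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for word, weight in zip(query_words, weights):
        vec[index[word]] += weight
    return vec


# Hypothetical vocabulary and per-document term counts.
vocabulary = ["cat", "dog", "fish"]
doc_term_counts = {"d0": [3, 0, 0], "d1": [1, 1, 0], "d2": [0, 0, 2]}

query = pseudo_document(["cat", "dog"], vocabulary)
distances = {label: angle(query, vec) for label, vec in doc_term_counts.items()}
nearest = min(distances, key=distances.get)
```

Here the document that uses both query words in equal proportion is the closest, and a document sharing no words with the query sits at a right angle to it.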

dist_word_word(word_or_words, weights=[], filter_nan=True, print_len=10, as_strings=True, dist_fn=angle_sparse, order='i')

Returns words sorted by the distances between word(s) and every word.

Parameters:
  • word_or_words (string or list of strings) – Query word(s) to which distances are calculated.
  • weights (list of floats, optional) – Weights for each query word in word_or_words. Default uses equal weights (i.e., the arithmetic mean).
  • filter_nan (boolean, optional) – If True, NaN (not-a-number) entries are filtered out. Default is True.
  • print_len (int, optional) – Number of words to be displayed. Default is 10.
  • as_strings (boolean, optional) – If True, returns a list of words as strings rather than their integer representations. Default is True.
  • dist_fn (function, optional) – A distance function from vsm.spatial. Default is angle_sparse().
  • order (string, optional) – Order of sorting. ‘i’ for increasing and ‘d’ for decreasing order. Default is ‘i’.
Returns:

An instance of LabeledColumn: a 2-dimensional array containing words and their distances to word_or_words.

See Also:

vsm.viewer.wrappers.dist_word_word()
