topicexplorer launch

Starts a web server and displays the Topic Explorer visualizations.

Visualizations

Often, a topic is represented by the 10 highest-probability words in its word distribution. However, it is important to recognize that these words do not fully represent the topic. The Topic Explorer visualizations instead show topics using the full distributions of the model. The topic map shows topics in relation to other topics through their complete word distributions. The hypershelf shows topics in relation to documents.
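To see why a top-10 list is only a partial picture, the following minimal sketch measures how much of a topic's probability mass its 10 highest-probability words cover. The function name top_word_coverage and the toy distribution are illustrative assumptions, not part of Topic Explorer.

```python
def top_word_coverage(topic, n=10):
    """Return the n highest-probability words and the share of mass they cover.

    `topic` is a dict mapping word -> probability (hypothetical input format).
    """
    ranked = sorted(topic.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:n]
    total = sum(topic.values())
    covered = sum(prob for _, prob in top)
    return [word for word, _ in top], covered / total


# Toy topic over a 50-word vocabulary: a few prominent words and a long tail.
# The weights are made up and are normalized inside the function.
toy_topic = {f"word{i}": 1.0 / (i + 1) for i in range(50)}

words, coverage = top_word_coverage(toy_topic, n=10)
print(words[:3], f"{coverage:.0%} of the probability mass")
```

In this toy case roughly a third of the mass sits outside the top 10 words, which is the part of the topic that a word list hides.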

Topic Map

The topic map places the topics from all the trained models on a two-dimensional map, attempting to place similar topics close to each other. It uses the Isomap method to reduce the multi-dimensional topic space to two dimensions.

The clusters and colors are determined automatically by an algorithm and provide only a rough guide to groups of topics that have similar themes. The axes likewise have no intrinsic meaning, but are often interpretable as representing historical or thematic dimensions in the underlying corpus.

Checking the collision detection checkbox will minimize overlap among the nodes but distort the underlying similarity relationships.

The nodes are scaled according to the number of topics in the corresponding model. Larger circles correspond to models with fewer topics. You can control which models are included in the map by clicking on the numbers on the left to toggle the corresponding models off and on.

You may also enter words in the search box; the map then changes its shading to help you find topics related to those words.

Clicking on any topic circle will take you to the hypershelf with the top documents for that topic already selected.

Hypershelf

The hypershelf shows up to 40 documents that are most similar to the focal document. Each document is represented by a bar whose colors show the mixture and proportions of topics assigned to that document by the training process. The relative lengths of the bars indicate the degree of similarity to the focal document according to the topic mixtures.
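The ranking described above can be sketched as follows. This is a minimal illustration that assumes similarity is derived from the Jensen-Shannon distance between topic mixtures; this section does not state which measure Topic Explorer uses, and the names js_distance, most_similar and the toy mixtures are hypothetical.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two topic mixtures, in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    midpoint = [(x + y) / 2 for x, y in zip(p, q)]
    divergence = (kl(p, midpoint) + kl(q, midpoint)) / 2
    return math.sqrt(max(0.0, divergence))


def most_similar(focal_id, mixtures, limit=40):
    """Rank documents by similarity to the focal document, keeping at most `limit`."""
    focal = mixtures[focal_id]
    scored = [(doc_id, 1.0 - js_distance(focal, mix))
              for doc_id, mix in mixtures.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Made-up topic mixtures for three documents in a three-topic model.
mixtures = {
    "doc_a": [0.7, 0.2, 0.1],
    "doc_b": [0.6, 0.3, 0.1],
    "doc_c": [0.1, 0.1, 0.8],
}

ranked = most_similar("doc_a", mixtures)
for doc_id, similarity in ranked:
    print(doc_id, round(similarity, 3))
```

In this sketch the focal document is kept in the result and ranks first with similarity 1.0; each similarity value would correspond to the relative length of that document's bar.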

Rolling over a colored segment shows the highest-probability words associated with the topic. The key on the right shows all the topics identified by the model. If you click on a topic in the bar or the key, the display re-sorts the currently displayed documents according to that topic. In this topic-sorted mode, a Top Documents button appears at the top that lets you retrieve the documents from the entire corpus that are most similar to that topic.
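The difference between topic-sorted mode and the Top Documents button can be illustrated with a small sketch. It assumes that ranking by a topic means ordering by that topic's proportion in each document's mixture; the function names, the limit parameter and the sample data are hypothetical.

```python
def sort_by_topic(displayed_ids, mixtures, topic):
    """Re-order only the documents already on screen by one topic's proportion."""
    return sorted(displayed_ids, key=lambda d: mixtures[d][topic], reverse=True)


def top_documents(mixtures, topic, limit=40):
    """Search the entire corpus for the documents strongest in one topic."""
    ranked_ids = sorted(mixtures, key=lambda d: mixtures[d][topic], reverse=True)
    return ranked_ids[:limit]


# Made-up corpus of four documents in a two-topic model.
corpus = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.4, 0.6],
    "doc_c": [0.2, 0.8],
    "doc_d": [0.05, 0.95],
}
on_screen = ["doc_a", "doc_b", "doc_c"]  # doc_d is not currently displayed

print(sort_by_topic(on_screen, corpus, topic=1))
print(top_documents(corpus, topic=1, limit=2))
```

The first call can only reshuffle what is on screen, while the second may surface documents such as doc_d that were not displayed before.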

Focal Document

To select a new focal document you can:

  • Start typing a few letters in the focal document entry area;
  • Click the crossed arrows button to the right of the focal document entry area for a random document;
  • Refocus on one of the already-displayed documents by moving the cursor just to the left of the topic bar and clicking on the arrow that appears.

You may use the button to the right of the random document button to visualize the focal document. The dropdown menu attached to that button lets you switch to a model with a different number of topics.

Other Options

Below the key are some additional display options that let you sort the displayed documents alphabetically or normalize the bar lengths so that you can compare the document mixtures more directly.
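The effect of normalizing the bar lengths can be sketched as below. This assumes a bar's total length is proportional to the document's similarity to the focal document when normalization is off, and equal for every document when it is on; bar_segments is a hypothetical helper, not a Topic Explorer function.

```python
def bar_segments(mixture, similarity, normalize=False):
    """Return the length of each colored segment of one document's bar.

    mixture    -- topic proportions for the document
    similarity -- similarity to the focal document, in [0, 1]
    normalize  -- if True, every bar gets the same total length (1.0)
    """
    total = sum(mixture)
    bar_length = 1.0 if normalize else similarity
    return [bar_length * share / total for share in mixture]


mixture = [0.5, 0.25, 0.25]
print(bar_segments(mixture, similarity=0.5))                  # scaled by similarity
print(bar_segments(mixture, similarity=0.5, normalize=True))  # full-length bar
```

With normalization on, segment lengths depend only on the topic proportions, so mixtures of two documents can be compared segment by segment regardless of how similar each is to the focal document.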

Other icons to the left of each topic bar allow you to view the document contents or see a “fingerprint” of the topic mixtures for that document in all the available models with different numbers of topics. Clicking on a bar in the fingerprint will take you to a hypershelf focused on the selected document in the corresponding model.

The numbers in the menu on the left can be used to navigate directly to a model with that number of topics.

Above the numbers on the left, the topic cluster button will take you to a different interface that lets you explore topic similarity across the models.

The home button at the top left will take you to a general information page about the corpus and models.

Command Line Arguments

Hostname (--host)

Hostname for the server instance. Set to 0.0.0.0 to listen on all network interfaces.

Default: 127.0.0.1 (localhost)

Port (--port)

Port number for the server instance.

Default: 8000

No browser launch (--no-browser)

By default, topicexplorer launch will open the server instance in the default browser. With --no-browser, only the server daemon will run.

Quiet Mode (-q)

Suppresses all user input requests. Uses default values unless otherwise specified by other argument flags. Very useful for scripting automated pipelines.
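For scripted pipelines, the flags above can be assembled programmatically. The sketch below only builds and prints the command line; it does not start the server. The helper build_launch_command and its extra_args parameter are hypothetical, and any further arguments your installation requires (not covered in this section) would need to be passed through extra_args.

```python
import shlex


def build_launch_command(host="127.0.0.1", port=8000, browser=True,
                         quiet=False, extra_args=()):
    """Assemble a `topicexplorer launch` command from the documented flags."""
    cmd = ["topicexplorer", "launch", "--host", host, "--port", str(port)]
    if not browser:
        cmd.append("--no-browser")  # run the server only
    if quiet:
        cmd.append("-q")            # suppress user input requests
    cmd.extend(extra_args)          # assumption: anything else the CLI needs
    return cmd


# Listen on all interfaces on port 8080, with no browser and no prompts.
cmd = build_launch_command(host="0.0.0.0", port=8080, browser=False, quiet=True)
print(shlex.join(cmd))
# prints: topicexplorer launch --host 0.0.0.0 --port 8080 --no-browser -q
```

The resulting list can be handed to subprocess.run(cmd) once the command looks right for your setup.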