Topic Explorer Introduction
The InPhO Topic Explorer provides an integrated system for text modeling, making it simple to go from a set of documents to an interactive visualization of LDA (Latent Dirichlet Allocation) topic models. More advanced analysis is made possible by a built-in pipeline to Jupyter notebooks.
Live demos are available at https://www.hypershelf.org. They are trained on the Stanford Encyclopedia of Philosophy, a selection of books from the HathiTrust Digital Library, a collection of Chinese-language texts, and the original LDA training set of Associated Press articles.
Installation
1. Install Anaconda for Python 3.6. During installation, under “Advanced Options”, choose “Add Anaconda to my PATH environment variable”.
2. Open a Terminal (macOS and Linux) or PowerShell (Windows).
3. Run:
pip install --pre topicexplorer
Note: --pre begins with two hyphens.
4. Test the installation by typing the following command, which prints usage instructions:
topicexplorer -h
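If `topicexplorer -h` reports that the command cannot be found, one likely cause is that the PATH option in step 1 was not selected. A minimal sketch for checking this from Python, using only the standard library's `shutil.which`; the helper name `find_command` is hypothetical and is not part of the Topic Explorer:

```python
import shutil
from typing import Optional


def find_command(name: str = "topicexplorer") -> Optional[str]:
    """Return the full path to `name` if it is found on PATH, otherwise None."""
    # shutil.which searches the directories listed in the PATH environment
    # variable, which is why the Anaconda "Add to PATH" option matters.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_command()
    if path is None:
        print("topicexplorer was not found on PATH")
    else:
        print("topicexplorer found at", path)
```

If the command is not found, rerunning the Anaconda installer with the PATH option selected, or opening a new terminal after installation, is usually the first thing to try.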
Example Workflow
The Topic Explorer workflow has four steps. Each step creates or modifies a .ini file that stores the locations of the corpus and model files, along with other configuration options.
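Because every step reads and writes this .ini file, it can be useful to inspect it between steps. A minimal sketch with Python's standard `configparser` module; it assumes only that the file uses standard INI syntax and does not rely on particular section or option names, which this page does not document. The helpers `parse_config` and `load_config` are hypothetical, not part of the Topic Explorer:

```python
import configparser
from pathlib import Path


def parse_config(text: str) -> dict:
    """Parse INI-formatted text into {section: {option: value}}."""
    # interpolation=None keeps values containing '%' from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Options from a [DEFAULT] section, if any, are merged into every section.
    return {section: dict(parser[section]) for section in parser.sections()}


def load_config(path: str) -> dict:
    """Read and parse a configuration file such as example.ini."""
    return parse_config(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    config_path = "example.ini"  # assumed output of `topicexplorer init example`
    if Path(config_path).exists():
        for section, options in load_config(config_path).items():
            print(f"[{section}]")
            for key, value in options.items():
                print(f"  {key} = {value}")
```

This only reads the file; editing it by hand is not covered here, since the commands below are the documented way to modify it.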
The Topic Explorer is run from the Terminal (macOS or Linux) or PowerShell (Windows).
1. Initialize the Topic Explorer on a file, a folder of text files, or a folder of folders:
topicexplorer init example
Replace example with the path to the file or folder you select. For the command above, a configuration file called example.ini will be generated.
2. Prepare the corpus for modeling by removing common words, which improves topic quality, and uncommon words, which improves modeling speed:
topicexplorer prep example
If you are unsure what to select, we encourage experimentation but recommend the following settings:
topicexplorer prep example --high-percent 50 --low-percent 10 --min-word-len 3 -q
3. Train LDA models by following the on-screen instructions:
topicexplorer train example
4. Launch the Topic Explorer:
topicexplorer launch example
Press Ctrl+C to quit the server instance.
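The prep step filters the vocabulary by word frequency and word length. The sketch below is a simplified illustration of that idea, not the Topic Explorer's own code. It assumes that --high-percent and --low-percent refer to shares of the total token count (the most frequent words are removed while they account for at most that share of all tokens, and likewise for the rarest words), and that documents are already tokenized into lists of words. The function name `filter_vocabulary` is hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Set


def filter_vocabulary(
    docs: Iterable[List[str]],
    high_percent: float,
    low_percent: float,
    min_word_len: int,
) -> Set[str]:
    """Return the words kept after frequency- and length-based filtering.

    Assumption: high_percent and low_percent are shares of the total token
    count, not of the number of distinct words.
    """
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    ranked = counts.most_common()  # (word, count) pairs, most frequent first

    def take_until(pairs, budget):
        """Collect words from `pairs` while their cumulative count fits `budget`."""
        taken, covered = set(), 0
        for word, n in pairs:
            if covered + n > budget:
                break
            taken.add(word)
            covered += n
        return taken

    # Common words: removed to improve topic quality.
    too_common = take_until(ranked, total * high_percent / 100)
    # Rare words: removed to improve modeling speed.
    too_rare = take_until(reversed(ranked), total * low_percent / 100)

    return {
        word
        for word in counts
        if word not in too_common
        and word not in too_rare
        and len(word) >= min_word_len
    }
```

Under these assumptions, `filter_vocabulary(docs, high_percent=50, low_percent=10, min_word_len=3)` mirrors the recommended settings above. Measuring the cutoffs in tokens rather than distinct words means a handful of very frequent words can account for a large share of the corpus.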
See also:
- topicexplorer init: more details on the data import step.
- topicexplorer prep: more details on the data preparation step.
- topicexplorer train: more details on topic modeling.
- topicexplorer launch: more details on the visualization interfaces.
- topicexplorer notebook: more details on the notebook interface.
Licensing and Attribution
The project is released under an MIT License.
Visualizations generated with the Topic Explorer are licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
The project may be cited as:
Jaimie Murdock and Colin Allen. (2015). Visualization Techniques for Topic Model Checking. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI-15). Austin, Texas, USA, January 25-29, 2015. https://hypershelf.org/
Collaboration and Maintenance
The InPhO Topic Explorer is maintained by Jaimie Murdock:
- E-mail: jammurdo@indiana.edu
- Twitter: @JaimieMurdock
- GitHub: @JaimieMurdock
- Homepage: http://jamram.net
Please report issues on the issue tracker or contact Jaimie directly.