Topic Explorer Introduction
The InPhO Topic Explorer provides an integrated system for text modeling, making it simple to go from a set of documents to an interactive visualization of LDA topic models. A built-in pipeline to Jupyter notebooks supports more advanced analysis.
Live demos are available at https://www.hypershelf.org. They are trained on the Stanford Encyclopedia of Philosophy, a selection of books from the HathiTrust Digital Library, a collection of Chinese-language texts, and the original LDA training set of Associated Press articles.
Installation
- Install Anaconda for Python 3.6. At the “Advanced Options” step, select “Add Anaconda to my PATH environment variable”.
- Open a Terminal (macOS or Linux) or PowerShell (Windows).
- Run:
    pip install --pre topicexplorer
  Note: --pre begins with two - characters.
- Test the installation by typing topicexplorer -h to print usage instructions.
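The final check can also be scripted. The following is a minimal sketch, assuming only that the install step placed a topicexplorer executable on the PATH; the helper names find_topicexplorer and print_usage are hypothetical and not part of the Topic Explorer itself.

```python
import shutil
import subprocess


def find_topicexplorer():
    """Return the full path of the topicexplorer executable, or None if it is not on PATH."""
    return shutil.which("topicexplorer")


def print_usage():
    """Run `topicexplorer -h` and return its exit code, or None if it is not installed."""
    path = find_topicexplorer()
    if path is None:
        print("topicexplorer was not found on PATH; re-check the pip install step.")
        return None
    # -h prints the usage instructions, as described in the installation steps.
    return subprocess.run([path, "-h"]).returncode


if __name__ == "__main__":
    print_usage()
```

If the script reports that the executable is missing, the most likely cause is that Anaconda was not added to the PATH during installation.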
Example Workflow
Using the Topic Explorer is a four-step process. Each step creates or modifies a
.ini file that defines the links to corpus and model files, along with
other configuration options.
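Because the configuration is an ordinary .ini file, it can be inspected between steps. The sketch below uses Python's standard configparser module and makes no assumption about which sections or keys the Topic Explorer writes; it simply lists whatever is present. The function name summarize_config is hypothetical.

```python
import configparser


def summarize_config(path):
    """Return {section: {key: value}} for every section found in an .ini file."""
    # Interpolation is disabled so that values containing '%' are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return {name: dict(parser[name]) for name in parser.sections()}


def print_config(path):
    """Print each section and its key/value pairs, one per line."""
    for section, options in summarize_config(path).items():
        print(f"[{section}]")
        for key, value in options.items():
            print(f"  {key} = {value}")
```

For instance, print_config("example.ini") would show how the file changes after each step, assuming a configuration file of that name has already been generated.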
The Topic Explorer is run from the Terminal (macOS or Linux) or PowerShell (Windows).
- Initialize the Topic Explorer on a file, folder of text files, or folder of folders:
    topicexplorer init example
  Here, example stands for the folder you select. A configuration file called example.ini will be generated.
- Prepare the corpus for modeling by removing common words to improve topic quality and removing uncommon words to improve modeling speed:
    topicexplorer prep example
  If you are unsure of what to select, we encourage experimentation but recommend the following settings:
    topicexplorer prep example --high-percent 50 --low-percent 10 --min-word-len 3 -q
- Train LDA models using the on-screen instructions:
    topicexplorer train example
- Launch the Topic Explorer:
    topicexplorer launch example
- Press Ctrl+C to quit the server instance.
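The four steps above can be chained in a small driver script. This is a sketch rather than an official interface: it only assembles the same command lines shown in the steps, using the recommended prep settings, and the function names workflow_commands and run_workflow are hypothetical. By default it prints the commands without running them.

```python
import subprocess


def workflow_commands(corpus, high_percent=50, low_percent=10, min_word_len=3):
    """Build the init, prep, train, and launch command lines for one corpus."""
    return [
        ["topicexplorer", "init", corpus],
        ["topicexplorer", "prep", corpus,
         "--high-percent", str(high_percent),
         "--low-percent", str(low_percent),
         "--min-word-len", str(min_word_len),
         "-q"],
        ["topicexplorer", "train", corpus],
        ["topicexplorer", "launch", corpus],
    ]


def run_workflow(corpus, dry_run=True):
    """Print each command; when dry_run is False, also execute it in order."""
    for command in workflow_commands(corpus):
        print(" ".join(command))
        if not dry_run:
            # train follows on-screen instructions, and launch keeps running
            # until Ctrl+C, so both need an interactive terminal.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_workflow("example")
```

Calling run_workflow("example", dry_run=False) would execute the steps for real; because check=True is set, the script stops at the first step that fails.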
See also
- topicexplorer init: More details on the data import step.
- topicexplorer prep: More details on the data preparation step.
- topicexplorer train: More details on topic modeling.
- topicexplorer launch: More details on the visualization interfaces.
- topicexplorer notebook: More details on the notebook interface.
Licensing and Attribution
The project is released under an MIT License.
Visualizations generated with the Topic Explorer are licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
The project may be cited as:
Jaimie Murdock and Colin Allen. (2015). Visualization Techniques for Topic Model Checking. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI-15). Austin, Texas, USA, January 25-29, 2015. https://hypershelf.org/
Collaboration and Maintenance
The InPhO Topic Explorer is maintained by Jaimie Murdock:
- E-mail: jammurdo@indiana.edu
- Twitter: @JaimieMurdock
- GitHub: @JaimieMurdock
- Homepage: http://jamram.net
Please report issues on the issue tracker or contact Jaimie directly.