Cookbook

HathiTrust Collection to Topic Explorer

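The following script downloads a HathiTrust collection of Vonnegut’s works and associated criticism, trains topic models on it, and launches the Topic Explorer. It can be adapted to other collections by changing the collection URL, the working folder, and the corpus name.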

#!/bin/bash

# install the tools (the quotes keep the shell from treating [htrc] as a glob pattern)
pip install "topicexplorer[htrc]" htrc

# make a folder
mkdir -p /tmp/vonnegut
cd /tmp/vonnegut

# download the list of identifiers from IDAH's collection
htrc export "https://babel.hathitrust.org/cgi/mb?a=listis;c=1100976828" > vonnegut.txt

# train the models
topicexplorer init vonnegut.txt -q --htrc --name "Vonnegut's works (and criticism thereof)"
topicexplorer prep vonnegut.txt -q --high-percent 70 --low-percent 5 --lang en
topicexplorer train vonnegut.txt -q -k 25 50 100 --iter 200 -p 4
topicexplorer metadata vonnegut.txt --htrc

# launch the explorer
topicexplorer launch vonnegut.txt
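
To adapt the recipe to another collection, the collection-specific values can be pulled out into arguments. The following is a minimal sketch, not part of topicexplorer or htrc: the function name build_collection, its four arguments, and the DRY_RUN switch are all illustrative assumptions. It assumes both tools are already installed and reuses the flags from the Vonnegut script unchanged.

```shell
#!/bin/bash
# Sketch only: build_collection, its arguments, and DRY_RUN are hypothetical
# names introduced for illustration; they are not provided by topicexplorer or htrc.

# Usage: build_collection <workdir> <collection-url> <ids-file> <display-name>
build_collection() {
    local workdir="$1" url="$2" ids="$3" name="$4"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show the collection-specific steps without running anything.
        echo "mkdir -p $workdir && cd $workdir"
        echo "htrc export $url > $ids"
        echo "topicexplorer init $ids -q --htrc --name $name"
        return 0
    fi

    # make a folder and download the list of identifiers
    mkdir -p "$workdir" && cd "$workdir" || return 1
    htrc export "$url" > "$ids" || return 1

    # Same flags as the Vonnegut recipe; tune them per collection.
    topicexplorer init "$ids" -q --htrc --name "$name" &&
    topicexplorer prep "$ids" -q --high-percent 70 --low-percent 5 --lang en &&
    topicexplorer train "$ids" -q -k 25 50 100 --iter 200 -p 4 &&
    topicexplorer metadata "$ids" --htrc &&
    topicexplorer launch "$ids"
}
```

Calling build_collection with /tmp/vonnegut, the collection URL, vonnegut.txt, and the corpus name as its four arguments should reproduce the steps of the script above, apart from the pip install. Setting DRY_RUN=1 first prints the collection-specific commands so they can be checked before anything is downloaded.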

Import and export

The following commands export the corpus configured in workset.ini to a single archive file, workset.tez, and then import that archive.

topicexplorer export workset.ini -o workset.tez
topicexplorer import workset.tez