
Generators

Generate a simulated text corpus

peak_alpha()
Alpha parameter with a single peak
expected_entropy()
Expected entropy for samples from a Dirichlet distribution
rdirichlet()
Sample from the Dirichlet distribution
draw_corpus()
Draw a collection of documents
journal_specific()
"Journal-specific" simulation scenario
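The generators rest on the Dirichlet distribution. The following is a minimal Python sketch of two standard facts, not the package's own R code. First, a Dirichlet(α) draw can be made by normalizing independent Gamma(α_i, 1) variates. Second, the expected Shannon entropy of such a draw has the closed form E[H] = ψ(α₀ + 1) − Σ_i (α_i / α₀) ψ(α_i + 1), where α₀ = Σ_i α_i and ψ is the digamma function. The sketch works in nats. Whether expected_entropy() uses nats or bits is not stated in this index, so that choice is an assumption. The function names mirror the index, but the implementations here are hypothetical.

```python
import math
import random


def rdirichlet(alpha, rng=random):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def digamma(x):
    """Digamma for x > 0: shift x up to >= 6 with the recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def expected_entropy(alpha):
    """Closed-form expected entropy (nats) of a single draw from Dirichlet(alpha)."""
    a0 = sum(alpha)
    return digamma(a0 + 1) - sum(a / a0 * digamma(a + 1) for a in alpha)
```

With a small symmetric α (say 0.1 on each of 50 components), draws put most of their mass on a few components. The average entropy of many rdirichlet() draws approaches expected_entropy() as the number of draws grows.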

Information gain

Tools for information-theoretic vocabulary selection

ndH()
Information gain (uniform distribution)
ndR()
Information gain (length-proportional distribution)
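The following reading of the two scores is an assumption, not the package's documented definition. Take a word with total count n spread over D documents, and let p be that word's distribution over the documents. Under this reading, ndH multiplies n by the information gain ΔH = log₂ D − H(p), which is the Kullback–Leibler divergence of p from the uniform distribution. ndR multiplies n by the Kullback–Leibler divergence of p from the distribution of document lengths. Both scores are high for words concentrated in a few documents; they differ only in the baseline. The Python helpers below are hypothetical illustrations of that reading.

```python
import math


def ndH(counts):
    """n * (log2 D - H(p)): gain relative to a uniform spread over the D documents."""
    n = sum(counts)
    p = [c / n for c in counts]
    h = -sum(x * math.log2(x) for x in p if x > 0)
    return n * (math.log2(len(counts)) - h)


def ndR(counts, doc_lengths):
    """n * KL(p || q), where q is proportional to document length."""
    n = sum(counts)
    total_len = sum(doc_lengths)
    kl = 0.0
    for c, length in zip(counts, doc_lengths):
        if c > 0:  # documents where the word is absent contribute 0 to KL(p || q)
            p_i = c / n
            q_i = length / total_len
            kl += p_i * math.log2(p_i / q_i)
    return n * kl
```

Here counts holds the word's count in every document, zeros included. Under this reading, when all documents have equal length q is uniform, and the two scores coincide.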

tmfast

Fitting topic models with PCA+varimax

tmfast()
Fit a topic model using PCA+varimax
insert_topics()
Insert a topic model into a fitted tmfast
varimax_irlba()
Fit a varimax-rotated PCA using irlba
fit_varimax()
Given a rank-n PCA fit, return a rank-k varimax fit (k < n)
predict(<varimaxes>)
Project new data into PCA score space
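tmfast() rotates PCA loadings with varimax, and varimax_irlba() gets its PCA from irlba. This Python sketch does not reproduce that pipeline. It shows only the rotation step, using Kaiser's classic pairwise varimax algorithm without Kaiser row normalization. Each sweep rotates every pair of columns (x, y) by the angle φ that maximizes the variance of the squared loadings. With u = x² − y² and v = 2xy over the p rows, A = Σu, B = Σv, C = Σ(u² − v²), and D = 2Σuv, that angle satisfies tan 4φ = (D − 2AB/p) / (C − (A² − B²)/p). The function name and its return values are hypothetical.

```python
import math


def varimax(loadings, tol=1e-10, max_sweeps=100):
    """Pairwise (Kaiser) varimax without row normalization.

    Returns (rotated, rotation) such that rotated == loadings @ rotation.
    """
    L = [list(row) for row in loadings]
    p, k = len(L), len(L[0])
    R = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(max_sweeps):
        max_angle = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in L]
                v = [2 * row[a] * row[b] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # atan2 picks the quadrant of 4*phi that maximizes (not minimizes) the criterion.
                phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
                max_angle = max(max_angle, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                # Apply the same plane rotation to the loadings and to the accumulated rotation.
                for M in (L, R):
                    for row in M:
                        x, y = row[a], row[b]
                        row[a], row[b] = c * x + s * y, -s * x + c * y
        if max_angle < tol:
            break
    return L, R
```

Because the rotation is orthogonal, each row's sum of squared loadings is unchanged. Only the split of that sum across columns changes, with each item pushed toward loading heavily on fewer columns.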

Tidiers

Extract beta and gamma matrices from tmfast objects

tidy(<tmfast>)
Extract beta and gamma matrices from tmfast objects
tidy_all()
Extract gamma or beta matrices for all topics

Renormalization

Renormalize a distribution to match a desired expected entropy

solve_power()
Solve for the exponent that produces the desired entropy
target_power()
Find target power for renormalization
renorm()
Renormalize tidied distributions
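Here is one way to renormalize a distribution toward a target entropy. This is a hedged Python sketch, not the package's exact procedure. Raise each probability to a power β and renormalize: q_i = p_i^β / Σ_j p_j^β. Entropy is non-increasing in β:

- β = 1 returns p unchanged.
- β → 0 flattens p toward uniform over its support.
- Large β concentrates p on its most probable entries.

Because of this monotonicity, β can be found by bisection. The names power_renorm() and solve_beta() are hypothetical. This index does not specify how solve_power() and target_power() set up their equation, or whether they target a single entropy or an average across topics.

```python
import math


def entropy_bits(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def power_renorm(p, beta):
    """q_i proportional to p_i**beta; zero entries stay zero."""
    m = max(p)
    # Dividing by the max first avoids underflow to an all-zero vector for large beta.
    q = [(x / m) ** beta if x > 0 else 0.0 for x in p]
    total = sum(q)
    return [x / total for x in q]


def solve_beta(p, target, iters=200):
    """Bisection for beta >= 0 with entropy_bits(power_renorm(p, beta)) == target."""
    h_max = entropy_bits(power_renorm(p, 0.0))  # uniform over the support
    if not 0.0 <= target <= h_max:
        raise ValueError("target entropy is out of the reachable range")
    lo, hi = 0.0, 1.0
    while entropy_bits(power_renorm(p, hi)) > target:
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("target is below the entropy of the tied modes")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy_bits(power_renorm(p, mid)) > target:
            lo = mid  # still too flat: sharpen with a larger exponent
        else:
            hi = mid
    return (lo + hi) / 2
```

For p = (0.5, 0.25, 0.125, 0.125), whose entropy is 1.75 bits, a target below 1.75 gives β > 1 (sharper), and a target above it gives β < 1 (flatter).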

Hellinger distances

Calculate Hellinger distances between distributions

hellinger()
Hellinger distances
compare_betas()
Compare topic-word distributions using Hellinger distance
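For discrete distributions p and q on the same support, one common normalization of the Hellinger distance is H(p, q) = sqrt(1 − Σ_i sqrt(p_i q_i)). It lies in [0, 1]: it is 0 for identical distributions and 1 for distributions with disjoint support. Whether hellinger() uses this scaling or the unscaled ‖√p − √q‖₂ is an assumption here. The minimal Python sketch below computes the distance. It then adds a hypothetical pairwise comparison, in the spirit of compare_betas(), that matches each topic of one model to its nearest topic in another.

```python
import math


def hellinger(p, q):
    """Hellinger distance in [0, 1] between two discrete distributions on the same support."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against tiny negative rounding


def pairwise_hellinger(betas1, betas2):
    """Distance matrix: rows are topics of model 1, columns are topics of model 2."""
    return [[hellinger(b1, b2) for b2 in betas2] for b1 in betas1]


def nearest_topics(betas1, betas2):
    """For each topic in betas1, the index of the closest topic in betas2."""
    dist = pairwise_hellinger(betas1, betas2)
    return [min(range(len(row)), key=row.__getitem__) for row in dist]
```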

Discursive space visualizations

tsne()
Discursive space using t-SNE
umap()
Discursive space using UMAP

Utilities

Utility functions

build_matrix()
Convert a long dataframe to a wide (sparse) matrix
entropy()
Entropy of a distribution
loadings()
Extract a PCA/varimax loadings matrix
scores()
Extract item scores from a fitted PCA/varimax model
rotation()
Extract the varimax rotation matrix
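build_matrix() turns long data into a wide sparse matrix. Long data here means one row per document-term pair, with a count. The Python sketch below applies the same idea. It builds compressed sparse row (CSR) arrays (indptr, indices, data) from (row, column, value) triples. The function name and return layout are hypothetical. Summing duplicate pairs is one common convention, not documented behavior of build_matrix().

```python
def long_to_csr(rows, cols, values):
    """Build CSR arrays plus sorted row/column labels from long-format triples."""
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    row_idx = {r: i for i, r in enumerate(row_labels)}
    col_idx = {c: j for j, c in enumerate(col_labels)}
    # Sum duplicate (row, col) pairs into one cell.
    cells = {}
    for r, c, v in zip(rows, cols, values):
        key = (row_idx[r], col_idx[c])
        cells[key] = cells.get(key, 0) + v
    # Walk cells in row-major order; count nonzeros per row for indptr.
    per_row = [0] * len(row_labels)
    indices, data = [], []
    for i, j in sorted(cells):
        per_row[i] += 1
        indices.append(j)
        data.append(cells[(i, j)])
    indptr = [0]
    for n in per_row:
        indptr.append(indptr[-1] + n)
    return indptr, indices, data, row_labels, col_labels
```

Row i's nonzero entries sit at positions indptr[i] to indptr[i + 1] − 1 of indices (column numbers) and data (values). Terms a document never uses take no storage, which is why sparse storage suits document-term matrices.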

Package

tmfast-package
Fitting "topic models" with PCA+varimax