Generates a corpus of Mj documents from each of k journals, where each journal has a characteristic topic. Fits a varimax topic model of rank k, rotates the fitted word-topic distributions to align with the true values, and reports Hellinger distances between the true and fitted distributions for each topic (word-topic) and each document (topic-doc).
Usage
journal_specific(
k = 5,
Mj = 100,
topic_peak = 0.8,
topic_scale = 10,
word_beta = 0.01,
vocab = 10 * Mj * k,
size = 3,
mu = 300,
bigjournal = FALSE,
verbose = TRUE
)
Arguments
- k
Number of topics/journals
- Mj
Number of documents from each journal
- topic_peak
Peak value for the asymmetric Dirichlet prior for true topic-doc distributions
- topic_scale
Scale for the asymmetric Dirichlet prior for true topic-doc distributions
- word_beta
Parameter for the symmetric Dirichlet prior for true word-topic distributions
- vocab
Size of the vocabulary
- size
Size parameter for the negative binomial distribution of document lengths
- mu
Mean parameter for the negative binomial distribution of document lengths
- bigjournal
Should the first journal have documents 10x as long (on average) as the others?
- verbose
When TRUE, sends messages about the progress of the simulation
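The following base-R sketch illustrates how the arguments above could shape the simulated data. It is not the package's implementation: `peaked_alpha()` and `draw_dirichlet()` are hypothetical helpers written for this illustration, and the exact form of the asymmetric prior (peak on the journal's own topic, remainder spread evenly, all multiplied by the scale) is an assumption. The treatment of `bigjournal` as a tenfold increase in the mean length for journal 1 follows the argument description.

```r
set.seed(42)
k <- 5
Mj <- 100

# Hypothetical helper (an assumption, not the package's peak_alpha()):
# put `peak` on topic i, spread the rest evenly, then multiply by `scale`.
peaked_alpha <- function(k, i, peak = 0.8, scale = 10) {
  alpha <- rep((1 - peak) / (k - 1), k)
  alpha[i] <- peak
  scale * alpha
}

# Hypothetical helper: one Dirichlet draw via normalized gamma variates.
draw_dirichlet <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

# Topic-doc distribution for a single document from journal 1.
alpha_1 <- peaked_alpha(k, i = 1)
theta_1 <- draw_dirichlet(alpha_1)

# Document lengths: negative binomial with size = 3 and mean mu = 300.
# With bigjournal = TRUE, journal 1 gets a mean ten times larger.
bigjournal <- TRUE
journal <- rep(seq_len(k), each = Mj)
mu_doc <- ifelse(bigjournal & journal == 1, 10 * 300, 300)
doc_len <- rnbinom(k * Mj, size = 3, mu = mu_doc)
```

Under these assumptions, a larger `topic_scale` concentrates each document more tightly around its journal's topic, while a small `size` gives document lengths a long right tail around `mu`.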
Value
A one-row tibble::tibble() with columns:
- phi
Mean Hellinger distance between true and fitted word-topic distributions
- phi_vec
List-column of per-topic Hellinger distances
- theta
Mean Hellinger distance between true and fitted document-topic distributions
- theta_vec
List-column of per-document Hellinger distances
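For orientation, here is a minimal sketch of how the reported quantities relate to one another, using the standard definition of Hellinger distance between discrete distributions. The `hellinger()` helper and the toy matrices are illustrative assumptions; the package may compute or normalize the distance differently.

```r
# Hypothetical helper: Hellinger distance between two discrete
# distributions p and q, scaled to lie in [0, 1].
hellinger <- function(p, q) {
  stopifnot(length(p) == length(q))
  sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)
}

# Toy "true" and "fitted" word-topic distributions (one row per topic).
true_phi <- rbind(c(0.7, 0.2, 0.1),
                  c(0.1, 0.1, 0.8))
fitted_phi <- rbind(c(0.6, 0.3, 0.1),
                    c(0.2, 0.1, 0.7))

# Per-topic distances (analogous to phi_vec) and their mean (analogous to phi).
phi_vec <- sapply(seq_len(nrow(true_phi)), function(i) {
  hellinger(true_phi[i, ], fitted_phi[i, ])
})
phi <- mean(phi_vec)
```

The same pattern, applied row by row to the document-topic distributions, would give the analogues of theta_vec and theta.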
See also
Other generators:
draw_corpus(),
peak_alpha(),
rdirichlet()