Generates a corpus with Mj documents from each of k journals, where each journal has a characteristic topic. Fits a varimax topic model of rank k, rotates the fitted word-topic distributions to align with the true values, and reports Hellinger distance comparisons for each topic (word-topic) and each document (topic-doc).
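The Hellinger distance used in these comparisons can be illustrated with a short base-R sketch. For discrete distributions p and q it is H(p, q) = sqrt(1 - sum(sqrt(p * q))), which equals the Euclidean distance between sqrt(p) and sqrt(q) divided by sqrt(2) and lies in [0, 1]. The helper names hellinger() and hellinger_cols() are hypothetical and not part of this package. The column-wise helper also assumes that distributions are stored as columns and are already aligned, which may not match the package's own matrix orientation.

```r
# Hypothetical helper (not the package's API): Hellinger distance between two
# discrete probability vectors, H(p, q) = sqrt(1 - sum(sqrt(p * q))).
hellinger <- function(p, q) {
  stopifnot(length(p) == length(q))
  bc <- sum(sqrt(p * q))   # Bhattacharyya coefficient
  sqrt(max(0, 1 - bc))     # clamp tiny negative values caused by rounding
}

# Hypothetical helper: distances between matching columns of two matrices,
# assuming each column holds one distribution (e.g. a topic) and the columns
# of A and B are already aligned.
hellinger_cols <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  vapply(seq_len(ncol(A)), function(j) hellinger(A[, j], B[, j]), numeric(1))
}

p <- c(0.7, 0.2, 0.1)
q <- c(0.1, 0.2, 0.7)
hellinger(p, q)                  # about 0.52
hellinger(c(1, 0), c(0, 1))      # 1: disjoint supports
hellinger_cols(cbind(p, q), cbind(p, p))
```

A distance near 0 means the fitted distribution closely matches the true one. A distance near 1 means the two distributions put their mass on nearly disjoint sets of words (or topics).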
Usage
journal_specific(
  k = 5,
  Mj = 100,
  topic_peak = 0.8,
  topic_scale = 10,
  word_beta = 0.01,
  vocab = 10 * Mj * k,
  size = 3,
  mu = 300,
  bigjournal = FALSE,
  verbose = TRUE
)
Arguments
- k
Number of topics/journals
- Mj
Number of documents from each journal
- word_beta
Parameter for the symmetric Dirichlet prior for true word-topic distributions
- vocab
Size of the vocabulary
- bigjournal
Should the first journal have documents 10x as long (on average) as the others?
- verbose
When TRUE, sends messages about the progress of the simulation
- topic_peak, topic_scale
Parameters for the asymmetric Dirichlet prior for true topic-doc distributions
- size, mu
Parameters for the negative binomial distribution of document lengths
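The generative model described by these arguments can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation. The helpers rdirichlet_sketch(), peak_alpha_sketch(), and doc_lengths_sketch() are hypothetical. The asymmetric prior puts topic_peak of the mass on the journal's own topic, splits the remainder evenly, and scales the result by topic_scale; this is an assumption about what peak_alpha() does. bigjournal is modeled by multiplying mu by 10 for the first journal, which is also an assumption. Document lengths use R's rnbinom() in its size/mu parametrization, whose mean is mu and whose variance is mu + mu^2 / size.

```r
set.seed(42)

# Hypothetical stand-in for rdirichlet(): n draws from Dirichlet(alpha),
# built by normalizing independent gamma variates. Returns n x length(alpha).
rdirichlet_sketch <- function(n, alpha) {
  x <- matrix(rgamma(n * length(alpha), shape = alpha),
              nrow = n, byrow = TRUE)
  x / rowSums(x)
}

# Assumed form of the asymmetric prior for journal j: topic_peak of the mass
# on topic j, the remainder split evenly, all scaled by topic_scale.
peak_alpha_sketch <- function(k, j, topic_peak = 0.8, topic_scale = 10) {
  alpha <- rep((1 - topic_peak) / (k - 1), k)
  alpha[j] <- topic_peak
  topic_scale * alpha
}

# Document lengths from rnbinom() with mean mu and dispersion size.
# Assumption: bigjournal is modeled by giving journal 1 a mean of 10 * mu.
doc_lengths_sketch <- function(k, Mj, size = 3, mu = 300, bigjournal = FALSE) {
  mus <- rep(mu, k)
  if (bigjournal) mus[1] <- 10 * mu
  unlist(lapply(mus, function(m) rnbinom(Mj, size = size, mu = m)))
}

k <- 5
Mj <- 100
vocab <- 10 * Mj * k   # default vocabulary size: 5000

# True word-topic distributions: one symmetric Dirichlet(word_beta) draw per topic
phi <- rdirichlet_sketch(k, rep(0.01, vocab))        # k x vocab

# True topic-doc distributions: Mj documents per journal, journal j peaked on topic j
theta <- do.call(rbind, lapply(seq_len(k), function(j)
  rdirichlet_sketch(Mj, peak_alpha_sketch(k, j))))   # (k * Mj) x k

doc_len <- doc_lengths_sketch(k, Mj)                 # length k * Mj
```

Each row of theta is one document's topic mixture, and each row of phi is one topic's word distribution. The sketch stops before sampling tokens. A natural next step would be to draw doc_len[i] words from the mixture theta[i, ] %*% phi.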
See also
Other generators: draw_corpus(), peak_alpha(), rdirichlet()