
Generates a corpus with Mj documents from each of k journals, where each journal has a characteristic topic. Fits a varimax topic model of rank k, rotates the fitted word-topic distributions to align with the true values, and reports Hellinger distances between the fitted and true distributions for each topic (word-topic) and each document (topic-doc).
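
For readers unfamiliar with the comparison metric, here is a minimal base-R sketch of the Hellinger distance between two discrete distributions. The helper name `hellinger` and the 1/sqrt(2) normalization (which bounds the distance in [0, 1]) are assumptions for illustration; the package's own implementation may differ.

```r
# Hellinger distance between two probability vectors p and q.
# ASSUMPTION: normalized by 1/sqrt(2) so the result lies in [0, 1];
# this helper is illustrative and not part of the documented API.
hellinger <- function(p, q) {
  stopifnot(length(p) == length(q))
  sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)
}

d_same     <- hellinger(c(0.5, 0.5), c(0.5, 0.5))  # identical distributions
d_disjoint <- hellinger(c(1, 0), c(0, 1))          # no shared support
```

Under this convention, identical distributions are at distance 0 and distributions with disjoint support are at distance 1.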

Usage

journal_specific(
  k = 5,
  Mj = 100,
  topic_peak = 0.8,
  topic_scale = 10,
  word_beta = 0.01,
  vocab = 10 * Mj * k,
  size = 3,
  mu = 300,
  bigjournal = FALSE,
  verbose = TRUE
)

Arguments

k

Number of topics/journals

Mj

Number of documents from each journal

word_beta

Parameter for the symmetric Dirichlet prior for true word-topic distributions

vocab

Size of the vocabulary

bigjournal

Should the first journal have documents 10x as long (on average) as the others?

verbose

When TRUE, sends messages about the progress of the simulation

topic_peak, topic_scale

Parameters for the asymmetric Dirichlet prior for true topic-doc distributions

size, mu

Parameters for the negative binomial distribution of document lengths
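
To make the roles of these arguments concrete, the following base-R sketch draws the three random ingredients they control, using the default values from the usage above. It is not the package's code. The helper `rdirichlet_sketch` is hypothetical, and the form of the asymmetric prior (a share `topic_peak` on the journal's own topic, the remainder split evenly across the other topics, all multiplied by `topic_scale`) is an assumption about how the two parameters combine.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

k <- 5
Mj <- 100
vocab <- 10 * Mj * k  # default vocabulary size

# Hypothetical helper: one Dirichlet draw via normalized gamma variates
rdirichlet_sketch <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

# word_beta: symmetric prior over the vocabulary, one draw per topic.
# Small values such as 0.01 concentrate each topic on relatively few words.
word_beta <- 0.01
phi <- t(replicate(k, rdirichlet_sketch(rep(word_beta, vocab))))

# topic_peak, topic_scale: ASSUMED asymmetric prior for documents in journal 1
topic_peak <- 0.8
topic_scale <- 10
alpha <- topic_scale * c(topic_peak, rep((1 - topic_peak) / (k - 1), k - 1))
theta <- rdirichlet_sketch(alpha)  # topic mixture for one document

# size, mu: negative binomial document lengths for one journal
doc_lengths <- rnbinom(Mj, size = 3, mu = 300)
```

With this parameterization of `rnbinom()`, the mean length is `mu` and the variance is `mu + mu^2 / size`, so the defaults give lengths averaging 300 words with a standard deviation of about 174.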

See also

Other generators: draw_corpus(), peak_alpha(), rdirichlet()