"Journal-specific" simulation scenario

Generates a corpus with Mj documents from each of k journals, where each journal has a characteristic topic. Fits a varimax topic model of rank k, rotates the fitted word-topic distributions to align with the true values, and reports Hellinger distances between the fitted and true distributions for each topic (word-topic) and each document (topic-doc).
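
The comparisons are reported as Hellinger distances, which range from 0 (identical distributions) to 1 (disjoint support). A minimal base-R sketch of that distance follows; the helper name `hellinger` is hypothetical and is not claimed to be the function this package uses internally.

```r
# Hellinger distance between two discrete probability distributions
# p and q, given as numeric vectors over the same support.
# `hellinger` is an illustrative name, not part of the documented API.
hellinger <- function(p, q) {
  stopifnot(length(p) == length(q))
  sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)
}

p <- c(0.8, 0.1, 0.1)
q <- c(0.1, 0.8, 0.1)

hellinger(p, p)              # identical distributions
hellinger(p, q)              # partially overlapping distributions
hellinger(c(1, 0), c(0, 1))  # disjoint support
```

A fitted distribution that closely recovers the true one should give a distance near 0.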

Usage

journal_specific(
  k = 5,
  Mj = 100,
  topic_peak = 0.8,
  topic_scale = 10,
  word_beta = 0.01,
  vocab = 10 * Mj * k,
  size = 3,
  mu = 300,
  bigjournal = FALSE,
  verbose = TRUE
)

Arguments

k: Number of topics/journals
Mj: Number of documents from each journal
word_beta: Parameter for the symmetric Dirichlet prior for true word-topic distributions
vocab: Size of the vocabulary; defaults to 10 * Mj * k (5000 words with the default k and Mj)
bigjournal: Should the first journal have documents 10x as long (on average) as the others?
verbose: When TRUE, sends messages about the progress of the simulation
topic_peak, topic_scale: Parameters for the asymmetric Dirichlet prior for true topic-doc distributions
size, mu: Parameters for the negative binomial distribution of document lengths
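
To give a feel for how the default prior parameters behave, the base-R sketch below draws document lengths and one topic-doc distribution. Only `rnbinom()` and `rgamma()` are real R functions here; `peak_alpha_sketch` and `rdirichlet_one` are hypothetical helpers, and the way `topic_peak` and `topic_scale` are combined into a concentration vector is an assumption, not something this page confirms. Under R's `size`/`mu` parameterization the negative binomial has variance mu + mu^2/size, so the defaults give a mean length of 300 words with a variance of 30300.

```r
set.seed(1)  # arbitrary seed, for reproducibility of the sketch only

k <- 5; Mj <- 100
topic_peak <- 0.8; topic_scale <- 10
size <- 3; mu <- 300

# Document lengths: one negative binomial draw per document.
doc_lengths <- rnbinom(k * Mj, size = size, mu = mu)

# ASSUMPTION: the concentration vector for journal j puts weight
# `peak` on topic j, splits the remainder evenly over the other
# topics, and multiplies everything by `scale`.
peak_alpha_sketch <- function(k, j, peak, scale) {
  alpha <- rep((1 - peak) / (k - 1), k)
  alpha[j] <- peak
  scale * alpha
}

# One Dirichlet draw, obtained by normalizing independent gamma draws.
rdirichlet_one <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

alpha_1 <- peak_alpha_sketch(k, 1, topic_peak, topic_scale)
theta <- rdirichlet_one(alpha_1)  # topic mixture for one journal-1 document

summary(doc_lengths)
round(theta, 3)
```

Under this assumed parameterization, a larger `topic_scale` concentrates each document's topic mixture more tightly around its journal's characteristic topic, while `topic_peak` sets how dominant that topic is on average. The sketch does not model `bigjournal`.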
