Generates a corpus with Mj documents from each of k journals, where each journal has a characteristic topic. Fits a varimax topic model of rank k, rotates the fitted word-topic distributions to align with the true values, and reports Hellinger distances between the true and fitted distributions for each topic (word-topic) and each document (topic-doc).

Usage

journal_specific(
  k = 5,
  Mj = 100,
  topic_peak = 0.8,
  topic_scale = 10,
  word_beta = 0.01,
  vocab = 10 * Mj * k,
  size = 3,
  mu = 300,
  bigjournal = FALSE,
  verbose = TRUE
)

Arguments

k

Number of topics/journals

Mj

Number of documents from each journal

topic_peak

Peak value for the asymmetric Dirichlet prior for true topic-doc distributions

topic_scale

Scale for the asymmetric Dirichlet prior for true topic-doc distributions

word_beta

Parameter for the symmetric Dirichlet prior for true word-topic distributions

vocab

Size of the vocabulary

size

Size parameter for the negative binomial distribution of document lengths

mu

Mean parameter for the negative binomial distribution of document lengths

bigjournal

Should the first journal have documents 10x as long (on average) as the others?

verbose

When TRUE, sends messages about the progress of the simulation
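
As a rough illustration of how size, mu, and bigjournal interact, the sketch below draws document lengths with base R's stats::rnbinom() in its mean parameterization. It is not the package's implementation: the helper name sim_doc_lengths and the way the 10x multiplier is applied to the first journal's mean are assumptions made for this example.

```r
# Hypothetical helper, not part of the package: draw one length per document.
sim_doc_lengths <- function(k = 5, Mj = 100, size = 3, mu = 300,
                            bigjournal = FALSE) {
  # One mean length per journal. Assumption: when bigjournal is TRUE,
  # the first journal's mean is simply multiplied by 10.
  journal_mu <- rep(mu, k)
  if (bigjournal) journal_mu[1] <- 10 * mu
  # Repeat each journal's mean for its Mj documents, then draw the lengths.
  doc_mu <- rep(journal_mu, each = Mj)
  n_words <- stats::rnbinom(k * Mj, size = size, mu = doc_mu)
  data.frame(journal = rep(seq_len(k), each = Mj), length = n_words)
}

set.seed(1)
lens <- sim_doc_lengths(k = 3, Mj = 200, bigjournal = TRUE)
# Mean length by journal; the first should be roughly ten times the others.
tapply(lens$length, lens$journal, mean)
```

With a small size such as 3, the lengths are heavily overdispersed, so individual documents can be far shorter or longer than mu.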

Value

A one-row tibble::tibble() with columns:

phi

Mean Hellinger distance between true and fitted word-topic distributions

phi_vec

List-column of per-topic Hellinger distances

theta

Mean Hellinger distance between true and fitted document-topic distributions

theta_vec

List-column of per-document Hellinger distances
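
All four columns are built from Hellinger distances. For two discrete distributions p and q over the same support, a common definition is the Euclidean distance between sqrt(p) and sqrt(q), divided by sqrt(2), which gives a value between 0 (identical) and 1 (disjoint support). The sketch below uses that definition in base R. The helper name hellinger and the toy matrices are hypothetical, and the package may compute or normalize the distance differently.

```r
# Hypothetical helper: Hellinger distance between two probability vectors.
hellinger <- function(p, q) {
  stopifnot(length(p) == length(q))
  sqrt(sum((sqrt(p) - sqrt(q))^2)) / sqrt(2)
}

# Two toy "true" and "fitted" topics over a four-word vocabulary
# (rows are topics, columns are words).
true_phi   <- rbind(c(0.7, 0.1, 0.1, 0.1), c(0.1, 0.1, 0.1, 0.7))
fitted_phi <- rbind(c(0.6, 0.2, 0.1, 0.1), c(0.1, 0.1, 0.2, 0.6))

# Per-topic distances (analogous to phi_vec) and their mean (analogous to phi).
per_topic <- vapply(seq_len(nrow(true_phi)),
                    function(i) hellinger(true_phi[i, ], fitted_phi[i, ]),
                    numeric(1))
mean(per_topic)
```

The same pattern applies row by row to document-topic distributions, giving one distance per document, as in theta_vec.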

See also

Other generators: draw_corpus(), peak_alpha(), rdirichlet()

Examples

journal_specific(k = 2, Mj = 10, vocab = 50, verbose = FALSE)
#>  Rotating scores
#> # A tibble: 1 × 4
#>     phi phi_vec    theta theta_vec 
#>   <dbl> <list>     <dbl> <list>    
#> 1 0.612 <dbl [2]> 0.0713 <dbl [20]>