Project new data into PCA score space

Usage

# S3 method for class 'varimaxes'
predict(object, newdata, ...)

Arguments

object

Fitted varimaxes or tmfast object

newdata

Document-term matrix (observations x terms) to project

...

Not used; included for S3 method compatibility.

Value

Matrix of PCA scores (n_obs x max_k)

Details

Projects newdata through the PCA rotation stored in object and returns raw PCA scores, not varimax-rotated scores. Intended for use in pipelines that combine new data with an existing fitted model (e.g., insert_topics()). This method is fragile: newdata must share the vocabulary of the training DTM, and the centering/scaling stored in object must match how the training data were prepared.
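The projection described above can be illustrated with base R alone. The following is a minimal sketch that uses stats::prcomp as a stand-in, not the tmfast internals; the toy matrices and the names train, new, and fit are assumptions made for illustration. It shows the two requirements from the paragraph above: columns are aligned to the training vocabulary, and the training centering (and scaling, if any) is reused before rotating.

```r
# Illustration with stats::prcomp; this is NOT the tmfast implementation.
set.seed(1)
vocab = paste0("w", 1:10)

# Toy training DTM: 20 documents x 10 terms
train = matrix(rpois(200, 3), nrow = 20, dimnames = list(NULL, vocab))
fit   = prcomp(train, center = TRUE, scale. = FALSE)

# Toy new DTM with the same terms, but in a different column order
new = matrix(rpois(30, 3), nrow = 3, dimnames = list(NULL, vocab))
new = new[, sample(vocab), drop = FALSE]

# 1. Align columns to the training vocabulary
new_aligned = new[, rownames(fit$rotation), drop = FALSE]

# 2. Apply the training centering/scaling, then rotate
scores = scale(new_aligned, center = fit$center, scale = fit$scale) %*%
    fit$rotation

# Same result as the built-in prcomp prediction
all.equal(scores, predict(fit, new_aligned))
```

If step 1 is skipped, the matrix product still runs when the dimensions happen to match, but each term is multiplied by the loading of a different term, so the scores are silently wrong.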

Memory warning: scale() coerces sparse matrices to dense, so projecting a large DTM can require substantial memory. This mirrors the behavior of prcomp_irlba itself, which is why PCA scores are computed once at fit time and not re-projected on demand.
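To make the memory cost concrete, the sketch below compares the dense route with an algebraically equivalent sparse route for the centering-only case, using the identity (X - 1c')R = XR - 1(c'R). This is an illustration under stated assumptions, not something this method does: X, center, and rotation are made-up stand-ins for a sparse DTM and the quantities stored in a fitted object, and the Matrix package is assumed to be installed.

```r
library(Matrix)
set.seed(1)

# Stand-ins (hypothetical): sparse DTM, centering vector, 50 x 3 rotation
X        = rsparsematrix(1000, 50, density = 0.05)
center   = colMeans(X)
rotation = qr.Q(qr(matrix(rnorm(50 * 3), nrow = 50)))

# Dense route: every cell is materialised as a double (8 bytes each)
dense_bytes  = nrow(X) * ncol(X) * 8
dense_scores = scale(as.matrix(X), center = center, scale = FALSE) %*%
    rotation

# Sparse route: rotate first, then subtract the rotated centre from each row
rotated_center = drop(center %*% rotation)
sparse_scores  = sweep(as.matrix(X %*% rotation), 2, rotated_center)

all.equal(dense_scores, sparse_scores)
```

Only the n_obs x k score matrix is ever dense in the second route. When a scaling vector is also stored, the same idea applies after dividing the rotation rows and the centre by that vector.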

Examples

# \donttest{
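set.seed(42)
# Simulate a training corpus: 50 documents, 3 topics, 20-word vocabulary
theta   = rdirichlet(50, 1, k = 3)
phi     = rdirichlet(3, 0.1, k = 20)
corpus  = draw_corpus(rep(50L, 50), theta, phi)
model   = tmfast(corpus, n = 3)
# Simulate 5 new documents from the same topics and cast them to a sparse DTM
theta2  = rdirichlet(5, 1, k = 3)
newdocs = draw_corpus(rep(200L, 5), theta2, phi) |>
    tidytext::cast_sparse(doc, word, n)
# Project the new documents into the fitted PCA score space
predict(model, newdocs)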
#>        PC1        PC2         PC3
#> 1 73.45987 -10.315999  -2.8267179
#> 2 26.61557  22.533310 -12.8102579
#> 3 24.01242 -19.028145  -0.8185281
#> 4 30.52516  31.905962  -7.9864402
#> 5 60.10890   7.934742 -10.6114665
# }