Posts dated prior to 2018 were migrated from an old Tumblr site during a website redesign in September-October 2018. I selected the posts to migrate based on a quick skim of their contents; they generally reflect my current views, but I did not necessarily read them carefully. In a few cases I noticed and updated some language that I now recognize as problematic (mostly some trans-insensitive phrasing). But there are probably cases that I overlooked in my quick review. Also, the posts contain Tumblr timestamps and tags, and sometimes some uncorrected formatting errors.
Tumblr HTML files were converted to Markdown using Pandoc, then the following R script was used to adjust the Markdown formatting for Jekyll.
library(tidyverse)
= tibble(file = list.files(path = '.', '*.md'))
dataf
%>%
dataf mutate(text = map_chr(file, read_file)) %>%
separate(col = text, into = c('title', 'body'),
sep = '[=]{2,}') %>%
mutate(title = str_replace(title, '\n', '')) %>%
## Parsing
mutate(title_san = str_replace_all(tolower(title), '[^[:word:]]+', '-'),
body_nobreaks = str_replace_all(body, '\n', ' '),
dt_str = str_extract(body_nobreaks,
regex('::: \\{#footer\\}.*')),
dt_str = str_extract(dt_str, '\\[[^\\]]*\\]'),
dt_parsed = parse_datetime(dt_str,
format = '[ %B %d%.%., %Y %I:%M%p ]'),
date = lubridate::date(dt_parsed)) %>%
## Output values
mutate(text_out = str_c('---\n',
'layout: post\n',
'title: "', title, '"\n',
'---\n',
body),filename = str_c(date, '-', title_san, '.md')) %>%
rowwise() %>%
mutate(out = write_file(text_out, path = filename))