Text

Misc

Count the most common words

library(dplyr)
library(tidytext)
# one row per word, then tally occurrences, most frequent first
data %>%
    unnest_tokens(word, text_var) %>%
    count(word, sort = TRUE)

Average value of the outcome variable associated with each word

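library(dplyr)
library(tidytext)
library(forcats)
library(ggplot2)

data %>%
    unnest_tokens(word, text_var) %>%
    group_by(word) %>%
    # mean outcome and number of occurrences for each word
    summarize(avg_outcome = mean(outcome),
              n = n()) %>%
    # keep the 30 most frequent words
    arrange(desc(n)) %>%
    head(30) %>%
    # order words by average outcome so the plot reads top to bottom
    mutate(word = fct_reorder(word, avg_outcome)) %>%
    ggplot(aes(avg_outcome, word, size = n)) +
    geom_point()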

The pattern in the example (Airbnb listings) suggests that the words in a listing probably have predictive power for price, the outcome variable.
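
A minimal sketch of the per-word average on toy data, to make the placeholders concrete. The data frame `listings` and its columns `description` and `price` are made up for illustration; they stand in for `data`, `text_var`, and `outcome` above and are not the original Airbnb data.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: three listings, each with a description and a price
listings <- data.frame(
  description = c("cozy studio downtown", "luxury loft downtown", "cozy room"),
  price = c(100, 300, 60),
  stringsAsFactors = FALSE
)

word_summary <- listings %>%
  # one row per (listing, word); the price is repeated for every word
  unnest_tokens(word, description) %>%
  group_by(word) %>%
  # average price of the listings that contain each word, plus word frequency
  summarize(avg_price = mean(price), n = n()) %>%
  arrange(desc(n), word)

word_summary
```

Here "cozy" appears in the listings priced 100 and 60, so its average is 80, while "downtown" (100 and 300) averages 200. With real data, words that appear only a handful of times give noisy averages, which is why the snippet above keeps only the most frequent words before plotting.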