Text

Misc

  • Also see

  • Count most popular words

    data %>%
        unnest_tokens(word, text_var) %>%
        count(word, sort = TRUE)
  • Avg value of outcome variable that associated with words

    data %>%
        unnest_tokens(word, text_var) %>%
        group_by(word) %>%
        summarize(avg_outcome = mean(outcome),
                  n = n()) %>%
        arrange(desc(n)) %>%
        head(30) %>%
        mutate(word = fct_reorder(word, avg_outcome)) %>%
        ggplot(aes(avg_outcome, word, size = n)) +
        geom_point()
    • Pattern in the example shows that words in an airbnb listing probably have predictive power on price (outcome variable)