Text
Misc
Also see
Count most popular words
data %>% unnest_tokens(word, text_var) %>% count(word, sort = TRUE)Avg value of outcome variable that associated with words
.png)
data %>% unnest_tokens(word, text_var) %>% group_by(word) %>% summarize(avg_outcome = mean(outcome), n = n()) %>% arrange(desc(n)) %>% head(30) %>% mutate(word = fct_reorder(word, avg_outcome)) %>% ggplot(aes(avg_outcome, word, size = n)) + geom_point()- Pattern in the example shows that words in an airbnb listing probably have predictive power on price (outcome variable)