Text
Misc
Also see
Count most popular words
data %>% unnest_tokens(word, text_var) %>% count(word, sort = TRUE)
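A minimal sketch of the word count on toy data, assuming `dplyr` and `tidytext` are installed. The `listings` tibble and its `description` column are hypothetical stand-ins for `data` and `text_var`. The `anti_join(stop_words)` step is an optional addition, not part of the snippet above.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for `data` / `text_var`
listings <- tibble(
  id = 1:3,
  description = c("Cozy studio near the beach",
                  "Luxury beach villa with pool",
                  "Cozy room near downtown")
)

word_counts <- listings %>%
  unnest_tokens(word, description) %>%      # one row per lowercased word
  anti_join(stop_words, by = "word") %>%    # optional: drop common stop words
  count(word, sort = TRUE)                  # most frequent words first

word_counts
```

Without the stop-word filter, words such as "the" and "with" tend to dominate the top of the list.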
Average value of the outcome variable associated with each word
data %>%
  unnest_tokens(word, text_var) %>%
  group_by(word) %>%
  summarize(avg_outcome = mean(outcome), n = n()) %>%
  arrange(desc(n)) %>%
  head(30) %>%
  mutate(word = fct_reorder(word, avg_outcome)) %>%
  ggplot(aes(avg_outcome, word, size = n)) +
  geom_point()
- In the example, the pattern suggests that the words in an Airbnb listing probably have predictive power for price (the outcome variable)
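The per-word average can be tried on toy data as a rough sketch. The `listings` tibble, its `description` and `price` columns, and the values in them are made up for illustration; they are not the Airbnb data referred to above.

```r
library(dplyr)
library(tidytext)
library(forcats)
library(ggplot2)

# Hypothetical listings: text plus a numeric outcome (price)
listings <- tibble(
  description = c("cozy studio", "luxury villa", "cozy room", "luxury studio"),
  price = c(80, 400, 60, 300)
)

word_summary <- listings %>%
  unnest_tokens(word, description) %>%                  # one row per word
  group_by(word) %>%
  summarize(avg_price = mean(price), n = n()) %>%       # mean outcome and frequency per word
  arrange(desc(n)) %>%
  head(30) %>%                                          # keep the 30 most frequent words
  mutate(word = fct_reorder(word, avg_price))           # order factor levels by mean price

# Dot plot: x = mean price, y = word, point size = word frequency
p <- ggplot(word_summary, aes(avg_price, word, size = n)) +
  geom_point()
```

Printing `p` draws the plot. Words that appear only once (here "villa" and "room") have averages based on a single listing, so the `n` column is worth checking before reading much into them.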