Text
Getting started
For this exercise, you’ll download some books from Project Gutenberg and visualize patterns in the words.
You should use an RStudio Project to keep your files well organized (either on your computer or on RStudio.cloud). Either create a new project for this exercise only, or make a project for all your work in this class.
To help you, I’ve created a skeleton R Markdown file with a template for this exercise, along with some helpful starter code. Download that here and include it in your project:
In the end, the structure of your project directory should look something like this:
your-project-name\
  13-exercise.Rmd
  your-project-name.Rproj
To check that you put everything in the right places, you can download and unzip this file, which contains everything in the correct structure:
The example from today’s session will be incredibly helpful for this exercise.
This can be as simple or as complex as you want. You don’t need to make your plots super fancy, but if you’re feeling brave, experiment with changing colors or modifying themes and theme elements.
You’ll need to insert your own code chunks where needed. Rather than typing them by hand (that’s tedious and you might miscount the number of backticks!), use the “Insert” button at the top of the editing window, or press Ctrl + Alt + I on Windows or ⌘ + ⌥ + I on macOS.
Task 1: Reflection
Write your reflection for the day’s readings.
Task 2: Word frequencies
Use the gutenbergr package to download at least four books by a single author on Project Gutenberg: Jane Austen, Victor Hugo, Emily Brontë, Lucy Maud Montgomery, Arthur Conan Doyle, Mark Twain, Henry David Thoreau, Fyodor Dostoyevsky, Leo Tolstoy, or anyone else. Just make sure all the books are by the same author. The example page shows how to do that.
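As a rough sketch of what the download step might look like (not the template's own code), the chunk below looks up an author in the metadata bundled with gutenbergr and then downloads four books by ID. The author (Jane Austen), the four IDs, and the object names `austen_works`, `austen_ids`, and `books_raw` are assumptions chosen for illustration; check the IDs against the metadata for whichever author you pick. The download itself needs an internet connection.

```r
library(gutenbergr)
library(dplyr)

# Browse the works available for one author (uses metadata shipped with the package)
austen_works <- gutenberg_works(author == "Austen, Jane")

# Assumed IDs: Pride and Prejudice, Sense and Sensibility, Emma, Persuasion
austen_ids <- c(1342, 161, 158, 105)

# Download the full text, one row per line, and attach each book's title
books_raw <- gutenberg_download(austen_ids, meta_fields = "title")

# Quick check: how many lines of text came back for each book?
books_raw %>%
  count(title)
```

It can be worth saving `books_raw` to a CSV file in your project after the first download so that knitting the document later does not depend on the network.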
Alternatively, you can try using text from a source other than Project Gutenberg. Check out, for instance, harrypotter (the full text of all 7 Harry Potter books), quRan (the full text of the Qur’an; here’s an example of some text analysis with it), or scriptuRs (the full text of the King James Version of the Bible; here’s an example of some text analysis with it).
Make these two plots and describe what each tells you about your author’s books (you’ll probably want to facet by book):
- Top 10 most frequent words in each book
- Top 10 most unique words in each book (i.e. tf-idf)
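The two summaries above can be sketched with tidytext roughly as follows. This is a minimal illustration rather than the required approach: it uses a tiny made-up `books_raw` table (a stand-in for your downloaded books, with assumed `title` and `text` columns) so that it runs without a download, and all object names are hypothetical.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Toy stand-in for downloaded books: one row per line of text, plus a title
books_raw <- tibble(
  title = c("Book A", "Book A", "Book B", "Book B"),
  text = c("The whale swam past the ship",
           "The ship chased the white whale",
           "The garden was quiet and the roses bloomed",
           "She walked through the garden to the ship")
)

# One row per word (unnest_tokens lowercases and strips punctuation)
book_words <- books_raw %>%
  unnest_tokens(word, text)

# Top 10 most frequent words per book, with common stop words removed
top_words <- book_words %>%
  anti_join(stop_words, by = "word") %>%
  count(title, word, sort = TRUE) %>%
  group_by(title) %>%
  slice_max(order_by = n, n = 10, with_ties = FALSE) %>%
  ungroup()

# tf-idf for every word in every book; a word that appears in all books gets 0
book_tf_idf <- book_words %>%
  count(title, word) %>%
  bind_tf_idf(word, title, n)

# Top 10 most distinctive words per book
top_tf_idf <- book_tf_idf %>%
  group_by(title) %>%
  slice_max(order_by = tf_idf, n = 10, with_ties = FALSE) %>%
  ungroup()

# Faceted bar chart; reorder_within() sorts the bars separately inside each facet
ggplot(top_tf_idf,
       aes(x = tf_idf, y = reorder_within(word, tf_idf, title), fill = title)) +
  geom_col() +
  scale_y_reordered() +
  guides(fill = "none") +
  facet_wrap(vars(title), scales = "free_y") +
  labs(x = "tf-idf", y = NULL)
```

The frequency plot follows the same pattern with `top_words` and `n` in place of `top_tf_idf` and `tf_idf`. Stop words are not removed before computing tf-idf here because words shared by every book already score zero.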
100% optional bonus fun tasks
If you want, do some other things with the text you’ve downloaded. Make a “he verbs vs. she verbs” plot. Tag the parts of speech and find the most common verbs or nouns. Try some sentiment analysis. Do something fun.
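For the “he verbs vs. she verbs” idea, one possible starting point is to split the text into two-word sequences and keep the ones that begin with “he” or “she”. The sketch below uses a made-up `text_df` table and hypothetical object names, and it only approximates verbs: the word after a pronoun is not always a verb, so you would still need to filter or tag the results.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy stand-in for book text, one row per line
text_df <- tibble(
  text = c("He ran to the door and she laughed",
           "She ran away but he stayed",
           "She laughed again")
)

pronoun_pairs <- text_df %>%
  # Split each line into overlapping two-word sequences (bigrams)
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  # Put the two words of each bigram in their own columns
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  # Keep only the pairs that start with a pronoun
  filter(word1 %in% c("he", "she")) %>%
  count(word1, word2, sort = TRUE)

pronoun_pairs
```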
Turning everything in
When you’re all done, click on the “Knit” button at the top of the editing window and create an HTML or Word version (or PDF if you’ve installed tinytex) of your document. Upload that file to iCollege.