After completing these lessons, you should be able to:
- Use RStudio to write basic R commands.
- Know the distinctions between different R operators and data types,
including numeric, string, and logical data.
- Use tidyverse functions to wrangle and manipulate
data in R.
- Use the ggplot2 library to create plots in R.
- Reproducibly import, export, wrangle, and visualize data.
It can be very helpful to see these skills in practice, so here are a
few excellent examples of using the Tidyverse tools to wrangle,
visualize, and gain insights about data:
- Survivors
of the Titantic: A great overview of getting started with the
tidyverse.
- Is
Usain Bolt really the fastest man on earth?, by Anindya Mozumdar. This analysis
uses some statistical modeling (which won’t cover), but the use of
visualizations alone are effective for addressing the research questions
posed.
- F1
racing data analysis, by Jonathan Bouchet. This is an extensive
exploratory data analysis of a variety of data on races, drivers, and
constructors across the history of Formula 1 racing. This would
probably be overkill for a course project, but sections of it
(e.g. section 3 on the races, or sections 4 and 5 together on the
drivers and constructors) would make a nice project. It involves
mutliple data sets, some not-so-simple data cleaning, and a wide variety
of nice visualizations.
- Increasing
Intensity of Strong Atlantic Hurricanes, by James B. Elsner. This is
a short analysis with the goal of validating the predictions of this study
published in Nature in 2008, which predicted that Atlantic tropical
cyclones would continue to get stronger on average over time. While this
analysis is too short to be considered adequate for a final project, it
is still a good example of good data source documentation and one
effective visualization. An expanded inquiry into the questions
addressed here would make a good course project.
- Death:
Reality vs Reported, by Owen Shen: This is an interesting original
analysis in which the author collected the data himself. The analysis
compares whether the proportion of actual causes of death is consistent
with the proportion of death causes that the media reports. This
analysis also uses some interactive graphics so that the reader can view
different versions of the same plot, such as scrolling through the plot
over time.
- Are
first babies more likely to be late?, by Allen Downey. This is
another relatively short analysis, but the author does a good job of
documenting his data sources, stating the assumptions he makes, and also
describes what data were dropped from the analysis (i.e. babies born via
C-section).