A Whole World of Data
Statistics and data science training needs to include a range of realistic data to prepare students for the real world. R provides easy access to an incredibly rich collection of data sets.
This activity introduces ways you can access real data to use in R-Instat.
First watch the video:
Video script prepared by Rachel Kirk-Gushowaty and Roger Stern. Video constructed and voiced by Beryl Waswa.
- 00:00 Introduction
- 01:17 Introduction to R-Instat
- 01:34 General Datasets
- 02:00 Diamonds data
- 02:55 Graphs
- 04:08 mydata data
- 05:28 efc data
- 06:24 happy data
- 06:57 Datasets for Specific points
- 07:22 Anscombe data
- 07:53 Datasaurusdozen dataset
- 08:09 Graphs
- 09:25 Simpson's paradox dataset
- 11:34 Wikipedia explanation of Simpson's paradox
- 12:18 UCBAdmissions dataset
- 12:47 Datasets from books
- 12:53 Introduction to Data science book
- 13:41 Movie ratings data
- 14:04 dslabs datasets
- 14:47 Datasets from book references
- 15:39 Agriculture data
- 16:35 Gomezsplitssplit dataset
- 16:51 Graphs
- 17:45 Data from lists
- 18:47 Data from outside R packages: The MICS data
- 20:19 Reflections
Then use this practice document to follow along with part or all of the activity.
All of this data is easily accessible through the Import from Library dialog:
![](https://ecampus.r-instat.org/pluginfile.php/897/mod_page/content/16/2023-11-27_20-17-44%20%282%29.png)
Simply use the dropdown menu to select the package and then choose the dataset you want to explore. Click on the R Help button to learn more about the data or click OK to open it.
Package | Datasets
---|---
Agricolae |
Agridat | split.split
Agritutorial |
datasauRus | datasaurus_dozen, simpsons_paradox
datasets | anscombe, UCBAdmissions
dslabs | movielens, historic_co2, divorce_margarine, murders, trump_tweets
ggplot2 | diamonds
openair | mydata
questionr | happy
sjlabelled | efc
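
For readers who also work in plain R, the following is a minimal sketch of roughly what choosing a package and dataset amounts to. It is not necessarily the code R-Instat runs behind the dialog. It uses only the base `datasets` package, so nothing needs to be installed, and the variable names (`available`, `x_means`, `y_means`, `correlations`) are illustrative.

```r
# List the data sets shipped in the base "datasets" package
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Load one data set by name, much as selecting it in a dialog would
data("anscombe", package = "datasets")

# Open its help page (uncomment in an interactive session)
# help("anscombe", package = "datasets")

# Anscombe's quartet: four x-y pairs with near-identical summaries
x_means <- sapply(anscombe[, c("x1", "x2", "x3", "x4")], mean)
y_means <- sapply(anscombe[, c("y1", "y2", "y3", "y4")], mean)

# Correlation of each x column with its matching y column
correlations <- mapply(cor, anscombe[, 1:4], anscombe[, 5:8])

round(x_means, 2)
round(y_means, 2)
round(correlations, 2)
```

The four pairs share almost the same means and correlations, which is why the video follows the numbers with graphs. The same pattern should work for the other packages in the table once they are installed, for example `data("diamonds", package = "ggplot2")`.
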
The data from lists, from the rcorpora package, is accessed through the New Data Frame dialog.
![](https://ecampus.r-instat.org/pluginfile.php/897/mod_page/content/16/2023-11-28_10-49-42.png)
From here you can use the dropdown lists to browse the available categories and the lists within each one.