Overview of running R scripts using RStudio

 

'R' and RStudio

 

'R' is a programming language widely used in data science for statistical computing and graphics (https://www.r-project.org). At the Sydney Cytometry facility, we routinely run R scripts using RStudio (https://www.rstudio.com). Download R and RStudio using the links above; install R first, as RStudio needs it in order to run.

To use an R script in RStudio, right click on the script (e.g. demo.R), select 'Open with', and choose RStudio. When RStudio opens, it will look similar to the image below (the colours depend on your theme settings).

Top left = R script. This is a text editor where lines or segments of code can be 'run', which will send commands to R.
Bottom left = console. When commands are sent to R, the console will show the progress/output/result.
Top right = workspace. Whenever you create an object in R (such as saving a set of data) it will show up here.
Bottom right = various. This is mainly used for displaying plots (under 'Plots'), investigating the packages ('Packages'), or using the help section ('Help').
Screen Shot 2018-06-29 at 9.25.00 AM.png

In RStudio you can click on a single line and press CMD + return (on Mac) or CTRL + enter (on Windows) to run the command on that line. Alternatively, you can highlight any section of text over one or more lines and press the same shortcut to run the highlighted code. Our scripts are designed so that the code can be run one 'module' at a time.

Text following a '#' on any line is a comment. R does not run this text as a command; it is there to guide the user.
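
As a small illustration of how comments behave, the sketch below mixes comments and commands. The object name 'x' is just a placeholder chosen for this example and is not part of our scripts.

```r
# This whole line is a comment, so R ignores it
x <- 1 + 2   # text after a '#' on a line of code is ignored as well
print(x)     # only the commands themselves are executed
```

If you run all three lines, only the assignment and the print command are executed.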

 

DEMO using the iris dataset

 
Screen Shot 2018-06-29 at 10.24.17 AM.png

As a demo, we will load, plot, and save the 'iris' dataset. Open RStudio, and create a new R script (using the button on the top left). 

Copy all of the following lines into the script (you can scroll left and right to see all of the text).

You can then save the script anywhere you like by selecting 'File' and 'Save As'.

 
 
## DEMO SCRIPT
    # We will use the 'iris' dataset, which consists of measurements of 150 flowers
    # Each row represents one flower, and each column represents a different measurement of that flower.


## Part 1: read the dataset
    
    # Use the 'iris' dataset (150 flowers, one per row) with various measurements (one per column)
    dat <- iris
    
    # Determine the number of rows and columns in the dataset
    dim(dat)
    
    # Examine the first few lines of dataset
    head(dat)

    
## Part 2: plot the dataset
    
    # Plot iris dataset (all plots)
    plot(dat)
    
    # Plot iris dataset (chosen X and Y parameters)
    plot(x = dat$Sepal.Length, y = dat$Sepal.Width, main = "Test Plot", type = "p")

    
## Part 3: save the dataset

    # Determine the current working directory
    getwd()                                     
      
    # Set working directory to new location (this path is relative to the current working directory)
    setwd("Desktop/Demo_folder/")
    
    # Confirm that the working directory has changed
    getwd()
    
    # Write a .csv file of the dataset
    write.csv(x = dat, file = "iris_dataset.csv")
      
 
 

Part 1: read the dataset

 

The first command we will run loads the 'iris' dataset and saves it as the object 'dat'. The lines starting with '#' are only comments, and will not execute as commands (even if you select them and press CMD + return or CTRL + enter).

Select the following (in RStudio) and press CMD + return (on Mac) or CTRL + enter (on Windows).

 
## Part 1: read the dataset
    
    # Use the 'iris' dataset (150 flowers, one per row) with various measurements (one per column)
    dat <- iris
 

After executing, you should see a new object in the workspace (top right). It will be called 'dat' and will contain 150 observations of 5 variables.

 
Screen Shot 2018-06-29 at 10.42.08 AM.png
 

Next we will review the dimensions of 'dat' (how many rows and columns) and preview data from the first 6 rows of dat. 

Select the following (in RStudio) and press CMD + return (on Mac) or CTRL + enter (on Windows).

 
    # Determine the number of rows and columns in the dataset
    dim(dat)
    
    # Examine the first few lines of dataset
    head(dat)
 

You should now see the following in the console. Lines starting with '>' are the commands that were executed; lines without '>' are the output. As shown below, dim(dat) reports that our dataset has 150 rows and 5 columns, and head(dat) prints the contents of the first 6 rows.

 

> # Determine the number of rows and columns in the dataset
>     dim(dat)
[1] 150   5
>     
>     # Examine the first few lines of dataset
>     head(dat)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
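
If you would like to inspect the dataset further, the sketch below uses a few more base R functions. These are optional extras rather than part of the demo script, and they make the same assumption as above: that 'dat' holds the built-in 'iris' dataset.

```r
# Assumes 'dat' holds the built-in iris dataset, as in Part 1
dat <- iris

nrow(dat)          # number of rows only
ncol(dat)          # number of columns only
colnames(dat)      # names of the columns
str(dat)           # type of each column plus a preview of its values
summary(dat)       # per-column summary (min, median, mean, max; counts for Species)
tail(dat, n = 3)   # the last 3 rows, the counterpart of head()
```

str() is a quick way to confirm that the four measurement columns are numeric and that 'Species' is stored as a factor (a categorical variable).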

 

 

Part 2: plot the dataset

 

Next, we will plot some of the dataset. Select the following (in RStudio) and press CMD + return (on Mac) or CTRL + enter (on Windows).

 
## Part 2: plot the dataset
    
    # Plot iris dataset (all plots)
    plot(dat)
 

After executing, you should see the following under 'Plots'.

 
Screen Shot 2018-06-29 at 10.36.28 AM.png
 

To be a little more specific, let's try plotting one column of the dataset against another.

 
    # Plot iris dataset (chosen X and Y parameters)
    plot(x = dat$Sepal.Length, y = dat$Sepal.Width, main = "Test Plot", type = "p")
 

After executing, you should see a scatter plot of sepal width against sepal length under 'Plots'.
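
The same plot can be made easier to read with axis labels and one colour per species. The following is a minimal sketch using only base R graphics; the title, labels, and colour choices are our own suggestions for this example, not part of the demo script.

```r
# Assumes 'dat' holds the built-in iris dataset, as in Part 1
dat <- iris

# Scatter plot with axis labels, coloured by species
plot(x = dat$Sepal.Length, y = dat$Sepal.Width,
     main = "Sepal width vs sepal length",
     xlab = "Sepal length (cm)",
     ylab = "Sepal width (cm)",
     col = as.integer(dat$Species),   # factor levels 1, 2, 3 become three colours
     pch = 19)                        # solid circles

# Add a legend that matches the colours to the species names
legend("topright",
       legend = levels(dat$Species),
       col = seq_along(levels(dat$Species)),
       pch = 19)
```

Because 'Species' is a factor, as.integer() converts it to the numbers 1 to 3, which R interprets as the first three colours of its default palette.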

 
 
 

Part 3: save the dataset

 

Now, let's save the dataset as a .csv file. A .csv (comma-separated values) file is like a stripped-down .xlsx file: it stores a table as plain text, one row per line, with commas separating the columns. When it is opened in Excel or read into R, it is displayed as a table.
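
To see what a .csv file looks like as raw text, the sketch below writes the first three rows of 'iris' to a temporary file and prints the file line by line. The temporary file and the object names 'csv_path' and 'csv_lines' are assumptions made for this illustration only.

```r
# Write the first 3 rows of iris to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
write.csv(x = head(iris, n = 3), file = csv_path, row.names = FALSE)

# Read the file back as plain text: one header line, then one line per row
csv_lines <- readLines(csv_path)
writeLines(csv_lines)
```

The first line holds the column names, and each following line holds one flower, with the values separated by commas.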

Run the following lines to determine the current working directory (where you will read files from and write files to):

 
## Part 3: save the dataset

    # Determine the current working directory
    getwd()         
 

This will return the location of your current working directory. In my case:

[1] "/Users/thomasashhurst"

To change the working directory, we can take advantage of a trick in RStudio. In the line which contains setwd("") you can click in the middle of the quotation marks and hit tab -- this will show you a list of the folders that are under your current working directory.

 
Screen Shot 2018-06-29 at 10.52.59 AM.png
 

Select an option using your mouse, or up and down arrows, and press return or tab to select. You should now see something similar to the following (in my example I have selected Desktop).

 
Screen Shot 2018-06-29 at 10.52.40 AM.png
 

You can now click in between the / and " towards the end of the line, press tab again and see the list of possible directories. In my case, I have made a folder on my desktop called 'Demo_folder' -- I want to use this as my working directory.

 
Screen Shot 2018-06-29 at 10.56.07 AM.png
 

Press return or tab to select.

 
Screen Shot 2018-06-29 at 10.56.54 AM.png
 

Select the line (or highlight the text) and press CMD + return (on Mac) or CTRL + enter (on Windows) to set this as your working directory.

You can run the following line to check that the working directory has been changed correctly.

 
    # Determine the current working directory
    getwd()    
 
[1] "/Users/thomasashhurst/Desktop/Demo_folder"
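
If you prefer to set the working directory entirely from code, the sketch below shows one possible pattern: check that the folder exists, switch to it, and switch back afterwards. To keep the example safe to run anywhere, it uses a folder inside R's temporary directory; 'demo_dir' and 'old_wd' are hypothetical names, and you would substitute the path to your own folder.

```r
# Hypothetical folder for this example (inside R's temporary directory)
demo_dir <- file.path(tempdir(), "Demo_folder")

# Create the folder if it does not exist yet
if (!dir.exists(demo_dir)) {
  dir.create(demo_dir)
}

# setwd() returns the previous working directory, so we can keep it
old_wd <- setwd(demo_dir)

getwd()        # confirm the new working directory
list.files()   # list the files currently in that folder

# Return to the original working directory when finished
setwd(old_wd)
```

file.path() joins folder names using the correct separator for your operating system, which avoids typing slashes by hand.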

Now we will write the dataset to a .csv file (which will be saved in the working directory). We will use the function 'write.csv'. The arguments here are the dataset we want to write (x = dat) and the name we want to give the file (file = "iris_dataset.csv").

Execute the following, and check the folder (set as your working directory) to see that the new file has been created.

 
    # Write a .csv file of the dataset
    write.csv(x = dat, file = "iris_dataset.csv")
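
To double-check the result from within R, you can read the file back in and compare it with the original. The sketch below is one way to do this, under a few assumptions of our own: it writes to R's temporary directory rather than your working directory, and it sets row.names = FALSE, which the demo script does not. By default, write.csv also writes the row names (here the numbers 1 to 150) as an extra first column.

```r
# Assumes 'dat' holds the built-in iris dataset, as in Part 1
dat <- iris

# Hypothetical output location for this example
out_path <- file.path(tempdir(), "iris_dataset.csv")

# Write the dataset without the extra column of row names
write.csv(x = dat, file = out_path, row.names = FALSE)

# Check that the file now exists
file.exists(out_path)

# Read the file back in and inspect it
dat_check <- read.csv(out_path, stringsAsFactors = TRUE)
dim(dat_check)
head(dat_check)
```

If dim(dat_check) reports the same number of rows and columns as dim(dat), the file was written and read back in full.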