# Introduction to R - Yale University

Jiang Du, Jan 17th 2008

## What is R?

- A software package for data analysis and graphical representation
- Scripting language
- Flexible and customizable
- Free

Weaknesses:

- Not particularly efficient in handling large data sets
- Slow in executing big loops

## Where to get R?

http://www.r-project.org/

## Basic operations

    > 1+2*3
    [1] 7
    > log(10)
    [1] 2.302585
    > 4^2
    [1] 16
    > sqrt(16)
    [1] 4
    > pi
    [1] 3.141593
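As the result of `log(10)` suggests, `log` computes the natural logarithm. A minimal sketch of the other bases available in base R (`log10`, `log2`, and the `base` argument of `log`); the variable names are illustrative only:

```r
# log() is the natural logarithm; other bases are available in base R.
natural <- log(exp(1))        # natural log of e
base10  <- log10(1000)        # base-10 logarithm
base2   <- log2(8)            # base-2 logarithm
custom  <- log(81, base = 3)  # arbitrary base via the `base` argument

print(c(natural, base10, base2, custom))
```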

    > x = pi * 2
    > x
    [1] 6.283185
    > floor(x)
    [1] 6
    > ceiling(x)
    [1] 7

## Data type: vector

    > x = c(1,2,3,5,4)
    > x
    [1] 1 2 3 5 4
    > y = 1:5
    > y
    [1] 1 2 3 4 5
    > x+2
    [1] 3 4 5 7 6
    > x+y
    [1] 2 4 6 9 9
    > length(x)
    [1] 5
    > sorted_x = sort(x)
    > sorted_x
    [1] 1 2 3 4 5
    > x
    [1] 1 2 3 5 4
    > x[3]
    [1] 3
    > x[1:2]
    [1] 1 2
    > x[-3]
    [1] 1 2 5 4
    > x[x > 3]
    [1] 5 4
    > x>3
    [1] FALSE FALSE FALSE  TRUE  TRUE
    > which(x > 3)
    [1] 4 5

## Data type: matrix

    > m = matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
    > m
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9
    > m[1, 2]
    [1] 2
    > m[1:2, 2:3]
         [,1] [,2]
    [1,]    2    3
    [2,]    5    6
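The `byrow = TRUE` argument matters because `matrix` fills column by column by default. A short sketch contrasting the two fill orders (the names `by_row` and `by_col` are illustrative, not from the slides):

```r
# matrix() fills columns first unless byrow = TRUE is given.
by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
by_col <- matrix(1:9, nrow = 3, ncol = 3)

print(by_row[1, ])  # first row of the row-filled matrix: 1 2 3
print(by_col[1, ])  # first row of the column-filled matrix: 1 4 7

# The two layouts are transposes of each other.
print(identical(by_col, t(by_row)))
```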

    > m2 = matrix(c(2,0,0,0,2,0,0,0,2), nrow = 3, byrow = TRUE)
    > m2
         [,1] [,2] [,3]
    [1,]    2    0    0
    [2,]    0    2    0
    [3,]    0    0    2
    > m * m2
         [,1] [,2] [,3]
    [1,]    2    0    0
    [2,]    0   10    0
    [3,]    0    0   18
    > m %*% m2
         [,1] [,2] [,3]
    [1,]    2    4    6
    [2,]    8   10   12
    [3,]   14   16   18

## Data type: data frame

    > a = c(1:5)
    > b = a^2
    > df = data.frame(a,b)
    > df
      a  b
    1 1  1
    2 2  4
    3 3  9
    4 4 16
    5 5 25
    > df$b
    [1]  1  4  9 16 25
    > df[3, 2]
    [1] 9
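A data frame column can be reached in several equivalent ways. The sketch below rebuilds the same `a` and `b` columns (under the name `squares`, chosen here only to avoid masking R's built-in `df` function) and checks that `$`, `[[ ]]`, and `[ , ]` indexing agree:

```r
squares <- data.frame(a = 1:5, b = (1:5)^2)

col_dollar <- squares$b        # by name with $
col_double <- squares[["b"]]   # by name with [[ ]]
col_index  <- squares[, 2]     # by column position

print(identical(col_dollar, col_double))
print(identical(col_dollar, col_index))

# A single cell: row 3, column "b".
cell <- squares[3, "b"]
print(cell)
```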

    > dim(df)
    [1] 5 2
    > subset(df, a > 2)
      a  b
    3 3  9
    4 4 16
    5 5 25
    > subset(df, a > 2 & b < 10)
      a b
    3 3 9

## Visualization of data

    > x = 1:10
    > y = x^2
    > plot(x, y)
    > z = c(rep(1, 3), rep(5:6, 10), 1:10)
    > hist(z)
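To see what `hist(z)` is summarizing, it can help to tabulate `z` first. A minimal sketch using base R's `table`, plus `hist` with `plot = FALSE` to obtain the bin boundaries and counts without drawing anything:

```r
z <- c(rep(1, 3), rep(5:6, 10), 1:10)

# Frequency of each distinct value in z.
counts <- table(z)
print(counts)

# hist() can return the binned data without drawing a plot.
h <- hist(z, plot = FALSE)
print(h$breaks)
print(h$counts)
```

The values 5 and 6 dominate because `rep(5:6, 10)` repeats the pair ten times, which is what produces the tall bars in the histogram.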

    > x = seq(-10, 10, length = 30)
    > y = x
    > f = function(x,y) { r [...]
    > z = outer(x, y, f)
    > persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue")
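The definition of `f` on this slide did not survive intact. The sketch below assumes the radially symmetric surface 10 * sin(r) / r with r = sqrt(x^2 + y^2), which is the function used in the examples of R's `persp` help page with these same plotting arguments; the original slide may have used a different body.

```r
x <- seq(-10, 10, length.out = 30)
y <- x

# ASSUMPTION: this body is borrowed from the persp() help-page example,
# not recovered from the slide.
f <- function(x, y) {
  r <- sqrt(x^2 + y^2)
  10 * sin(r) / r
}

# outer() evaluates f on every (x, y) pair, giving a 30 x 30 grid.
z <- outer(x, y, f)

# Draw only in an interactive session so the sketch also runs under Rscript.
if (interactive()) {
  persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue")
}
```

`theta` and `phi` set the viewing angles, and `expand` scales the z axis relative to x and y.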

## Loops, functions, etc.

    > x = c(1, 2, 3, 4, 5)
    > y = x
    > for (i in 1:length(x)) {y[i] = x[i]^2}
    > y
    [1]  1  4  9 16 25
    > apply(as.array(x), 1, "^", 2)
    [1]  1  4  9 16 25
    > x^2
    [1]  1  4  9 16 25
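The three approaches on this slide give the same result, but the vectorized `x^2` avoids the explicit loop, which matters given that R is slow in big loops. A small sketch comparing them, with `sapply` shown as a common alternative to `apply(as.array(x), ...)` (an addition here, not from the slide):

```r
x <- c(1, 2, 3, 4, 5)

# 1. Explicit loop, preallocating the result vector.
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i]^2
}

# 2. sapply() applies a function to each element and simplifies to a vector.
y_sapply <- sapply(x, function(v) v^2)

# 3. Vectorized arithmetic operates on the whole vector at once.
y_vec <- x^2

print(all.equal(y_loop, y_sapply))
print(all.equal(y_loop, y_vec))
```

`seq_along(x)` is used instead of `1:length(x)` because it yields an empty sequence when `x` is empty, whereas `1:0` would iterate over 1 and 0.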

    > x = 1:5
    > f3 = function(x) {return(x^3)}
    > apply(as.array(x), 1, f3)
    [1]   1   8  27  64 125
    > source("~/test.r")
    [1] -1 -1  9 16 25
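`source` runs the R code in a file as if it had been typed at the prompt. The contents of `~/test.r` are not shown on the slide, so the sketch below writes a hypothetical script to a temporary file and sources it. The script body is invented for illustration and merely happens to print the same vector.

```r
# HYPOTHETICAL script: the real contents of ~/test.r are not in the slides.
script_path <- tempfile(fileext = ".r")
writeLines(c(
  "v <- (1:5)^2",
  "v[v < 5] <- -1",
  "print(v)"
), script_path)

# By default source() evaluates the file in the global environment,
# so `v` is still available after the call.
source(script_path)

unlink(script_path)  # remove the temporary file
```

Values are not auto-printed inside a sourced file unless `echo = TRUE` is passed, which is why the script calls `print` explicitly.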

## One of the most useful commands: ?

    > ?apply

## Practice: on Bordeaux wines

Problem: Bordeaux wine vintage quality and the weather

- Bordeaux wines in different vintage years have different qualities (reflected in prices)
- The older the better?
- Weather is an important factor
- Hot, dry summer preferred

## Practice: the data

- WRAIN: Winter (Oct.-March) Rain ML
- DEGREES: Average Temperature (Deg Cent.) April-Sept.
- HRAIN: Harvest (August and Sept.) ML
- TIME_SV: Time since Vintage (Years)

## Practice: load the data

    > wine_data = read.table("~/wine.data",
                             header = TRUE, na.strings = ".");

## Practice: visualization

    > plot(wine_data$TIME_SV, wine_data$LPRICE2);

    avg_price = median(wine_data$LPRICE2, na.rm = TRUE);
    plot(wine_data$DEGREES, wine_data$HRAIN, type = "n",
         xlab = "Temperature", ylab = "Harvest rain");
    points(wine_data$DEGREES[wine_data$LPRICE2 >= avg_price],
           wine_data$HRAIN[wine_data$LPRICE2 >= avg_price],
           pch = 19, col = "blue");
    points(wine_data$DEGREES[wine_data$LPRICE2 < avg_price],
           wine_data$HRAIN[wine_data$LPRICE2 < avg_price],
           pch = 19, col = "red");
    legend(15, 250, c(">= avg price", "< avg price"),
           pch = 19, col = c("blue", "red"));

## Practice: linear regression

Find a set of parameters a, ..., e such that:

    LPRICE2 ~ a * WRAIN + b * DEGREES + c * HRAIN + d * TIME_SV + e + error_term

The overall error should be minimized. In this case, that is the sum (or average) of squared errors:

    Sum((prediction - actual_price)^2)

    > lmfit = lm(LPRICE2 ~ WRAIN + DEGREES + HRAIN + TIME_SV,
                 wine_data);
    > lmfit
    Coefficients:
    (Intercept)  -12.145334
    WRAIN          0.001167
    DEGREES        0.616392
    HRAIN         -0.003861
    TIME_SV        0.023847
    > cat("RMS: ", sqrt(sum(lmfit$residuals^2)/length(lmfit$residuals)), "\n");
    RMS: 0.2586167

    plot(wine_data$VINT, wine_data$LPRICE2,
         xlab = "Vintage year", ylab = "log2 rel. price", pch = 19,
         col = "black");
    points(wine_data$VINT[30:38],
           predict(lmfit, wine_data[30:38,]), pch = 19, col = "red");
    legend(1965, -0.2, c("old data", "prediction"),
           pch = 19, col = c("black", "red"));

Using fewer parameters in the model?

    LPRICE2 ~ b * DEGREES + c * HRAIN + d + error_term

    lmfit2 = lm(LPRICE2 ~ DEGREES + HRAIN, wine_data);
    RMS: 0.349513
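Dropping predictors can never lower the in-sample RMS of a least-squares fit, which is consistent with the reduced model's RMS (0.349513) being higher than the full model's (0.2586167). The sketch below illustrates this on synthetic data, since the wine file is not included here; the data frame `toy`, its columns, and the helper `rms` are all made up for the illustration.

```r
set.seed(1)

# SYNTHETIC data: not the wine data set, just three predictors and a response.
n   <- 40
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - 0.5 * toy$x2 + 0.3 * toy$x3 + rnorm(n, sd = 0.2)

# Root mean squared residual, the same quantity the slides print as "RMS".
rms <- function(fit) sqrt(mean(residuals(fit)^2))

fit_full    <- lm(y ~ x1 + x2 + x3, data = toy)
fit_reduced <- lm(y ~ x1 + x2, data = toy)

cat("RMS full:   ", rms(fit_full), "\n")
cat("RMS reduced:", rms(fit_reduced), "\n")
```

A lower in-sample RMS does not by itself mean better predictions on new vintages, so comparing the two models on held-out years would be a reasonable next step.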

## Links

- Classesv2: http://classesv2.yale.edu/
- Course wiki: http://lab.zoo.cs.yale.edu/cs445-wiki/
- R: http://www.r-project.org/
- Bordeaux wine analysis: http://www.liquidasset.com/orley.htm
