R Quick Guide - OSR

This is a quick reference guide for R.

Vectors, Variables and Arithmetic

x=c(n1,n2,n3) create a vector containing the list n1 n2 n3 and assign it to x
x=c("word1","word2") create a vector of strings (character values) assigned to x; wrap it in factor() to treat them as categories
y=x assign the contents of x to y (overwriting whatever was in y before)
dataframe$variablename refers to a specific column (vector) within a data frame (eg. cars$mileage), which is annoying to type, so:
attach(dataframe) use the columns without the dollar sign notation (eg. attach(cars)), example:
variablename now you can just type mileage
detach(dataframe) then clean up the variables when you're done (eg. detach(cars))
data.entry(x=c(NA)) enter elements into x using a spreadsheet
x[2] what is the 2nd element?
x[-2] all but the 2nd element?
x[1:5] first 5 elements?
x[(length(x)-4):length(x)] last 5? (tail(x, 5) does the same)
x[c(1,3,5)] slice of 1st, 3rd, 5th?
x[x>3] list of elements greater than 3
x[x < -2 | x > 2] list of elements less than -2 or bigger than 2 (the space matters: x<-2 would assign 2 to x)
x[3]=5 add (or change) data value in 3rd position to 5
x=c(x,...) adds additional data elements to end of x
x[17:20]=c(n1,n2,n3,n4) adds (or changes) data values in the 17th to 20th positions
x==5 returns T/F for each position on whether or not it equals 5
which(x==5) returns place numbers that have a value of 5
x+p new vector...each element of x added to the corresponding element of p
x-p new vector...each element of p subtracted from the corresponding element of x
x*p new vector...each element of x multiplied by the corresponding element of p
x^2 new vector...each element of x squared
sqrt(x) new vector...the square root of each element of x
mean(x) arithmetic mean of all the values in x
mean(x[5:9]) mean of positions 5 to 9
median(x) median value as if data were sorted
var(x) sample variance
sd(x) sample standard deviation
sum(x) sum of the values of each element in x
sum(x>5) how many total elements have a value larger than 5?
sum(x[x>5]) sum of those elements with a value larger than 5
diff(x) difference between adjacent elements (i.e. x[2]-x[1], ..., x[n]-x[n-1])
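
A quick sketch of the indexing and counting commands above, run on a made-up vector (the values are only for illustration):

```r
# Hypothetical sample data, chosen only for illustration
x <- c(-4, 7, 1, 5, -3, 9, 2, 5)

x[2]                                        # 2nd element
x[-2]                                       # everything except the 2nd element
tail_five <- x[(length(x) - 4):length(x)]   # last 5 elements
outside <- x[x < -2 | x > 2]                # spaces matter: "x<-2" would assign 2 to x

which(x == 5)                               # positions that hold the value 5
count_big <- sum(x > 5)                     # how many elements are larger than 5
total_big <- sum(x[x > 5])                  # sum of the elements larger than 5
steps <- diff(x)                            # x[2]-x[1], x[3]-x[2], ...
```

Note the difference between the last two sums: sum(x > 5) counts TRUE values, while sum(x[x > 5]) adds up the selected elements themselves.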

Data Manipulation

length(x) returns the number of values in x
sort(x) sorts values in increasing order, rev(sort(x)) will sort descending
rev(x) reverses the elements of x
max(x) gives the maximum value among the elements
which.max(x) gives the index of the maximum value
cummax(x) lists cumulative maximum in vector...useful in finance
min(x) gives the minimum value among the elements
cummin(x) lists cumulative minimum in vector
table(x) lists values of x in 1st row and frequencies in 2nd row
factor(x) converts x to a factor; printing it lists x on 1st row and its levels on 2nd
cut(x, breaks) sorts each value of x into the interval (class) set by breaks; table(cut(x, breaks)) then gives the frequency distribution
names(x)=c("name1","name2") creates a name vector for the elements in x
subset( df, foo==bar ) select all rows from a data frame where column foo is equal to bar (or any other expression).
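
A small sketch of cut(), table() and subset() working together. The scores, the break points and the column names are invented for the example:

```r
# Hypothetical exam scores, made up for illustration
scores <- c(55, 62, 68, 71, 74, 79, 83, 88, 91, 97)

# cut() assigns each value to an interval; by default intervals look like (lo, hi]
classes <- cut(scores, breaks = c(50, 60, 70, 80, 90, 100))
freq <- table(classes)          # frequency of each class
print(freq)

# subset() keeps the rows of a data frame that satisfy a condition
df <- data.frame(score = scores, class = classes)
high <- subset(df, score > 80)
print(high)
```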

Input and Output

install.packages("packagename") installs a package (the name must be quoted)
library(packagename) loads a package
read.csv(file="filename") loads data from a csv file
x=scan() create a vector by typing one value per line...or separating values with a space
x=scan(file="filename") read in a vector from a file, where each value is on a separate line
x=scan(file="filename", what="character") for a file that contains a vector of strings
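
To try the file commands without hunting for a data file, here is a sketch that writes small made-up files to a temporary location and reads them back. The column names and values are assumptions for the example only:

```r
# Write a small, made-up data frame to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(year = c(1990, 2000, 2010), population = c(10, 14, 19)),
          file = csv_path, row.names = FALSE)
df <- read.csv(file = csv_path)
print(df)

# Write one word per line, then read the words back as a character vector
txt_path <- tempfile(fileext = ".txt")
writeLines(c("red", "green", "blue"), txt_path)
words <- scan(file = txt_path, what = "character")
print(words)
```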

Plotting

barplot( table(x), xlab="values", ylab="frequency" ) 1 bar for each value w/frequency vertically
barplot( table(x)/length(x), xlab="values", ylab="proportion" ) 1 bar for each value w/proportion vertically
DOTplot(x) gives a beautiful dotplot, but you have to install and load the UsingR package
stripchart(x, method="stack", pch=1, offset=1, cex=2) another decent dotplot (of stacked circles) where pch is the shape and cex is the size
stem(x) a stem and leaf chart
hist(x) a decent histogram
truehist(x, h=5) better histogram, with bin widths as 5 (you have to load the MASS package first)
pie(x) pie chart
boxplot(x) box and whiskers of x using lower/upper hinges
pdf( "/tmp/plotname.pdf" ); plot( .... ); dev.off(); plot.new() Generate a PDF of the plot and recreate the interactive plot window
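
A sketch of sending several of these plots to one PDF file. The data and the file name are made up, and only base R plots are used so nothing extra needs installing:

```r
# Made-up survey answers, for illustration only
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

plot_path <- file.path(tempdir(), "quick_guide_plots.pdf")   # hypothetical file name
pdf(plot_path)                       # open a PDF graphics device
barplot(table(x) / length(x), xlab = "values", ylab = "proportion")
hist(x)                              # each new plot starts a new PDF page
boxplot(x)
dev.off()                            # close the device so the file gets written
```

Nothing shows up on screen while the PDF device is open; the plots only land in the file once dev.off() is called.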

Regression Analysis

Basic Commands

The best tutorial I've found so far is here. I'll add commands as I get comfortable with them. (Maybe not.. this one also assumes a deep prior knowledge of regression analysis.)

Plot my bivariate data:

plot( x, y ) plot(independent, dependent): the first argument goes on the horizontal axis. Ex: plot(year, population). The formula form plot( y ~ x ) draws the same picture.

Give me the intercept and slope for a line modeling my data:

linear.model = lm( y ~ x ) There can be many independent, or predictor (x) variables for multivariate data, eg. lm( y ~ x1 + x2 ). Read: response "is modeled by" predictor. Example: lm( population ~ year ), although that relationship should be exponential rather than linear.
linear.model will show only the intercept and slope.
summary(linear.model) will show all the available info.

Add the line for the linear model to my plot:

abline( linear.model ) either this
abline( lm( y ~ x ) ) or this, basically: abline( a, b ) is y = bx + a
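
Putting the plot, lm and abline steps together, as a sketch on made-up data that lies exactly on the line y = 3 + 2x, so the fitted intercept and slope are easy to check by eye. The PDF file name is hypothetical; skip the pdf() and dev.off() lines when working interactively:

```r
# Made-up data lying exactly on the line y = 3 + 2x
x <- c(1, 2, 3, 4, 5)
y <- 3 + 2 * x

linear.model <- lm(y ~ x)       # response ~ predictor
print(coef(linear.model))       # intercept first, then slope

pdf(file.path(tempdir(), "lm_line.pdf"))   # plot to a file instead of a window
plot(x, y)                      # predictor on the horizontal axis
abline(linear.model)            # overlay the fitted line
dev.off()
```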

Transform the data to fit a linear model, assuming the relationship is y = x^2, and plot it:

plot( y ~ I(x^2) ) (that's a capital I (eye), which tells R to treat x^2 "as is", so that we can do math operations inside a formula)
abline( lm(y ~ I(x^2)) ) or, if:
new.model = lm( y ~ I(x^2) )
abline( new.model )
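
Why the I() matters, sketched on made-up data that follows y = 1 + 0.5x^2 exactly. The name plain.model is just for this example:

```r
# Made-up data following y = 1 + 0.5 * x^2 exactly
x <- c(1, 2, 3, 4, 5, 6)
y <- 1 + 0.5 * x^2

new.model <- lm(y ~ I(x^2))     # I() makes ^ mean ordinary arithmetic
print(coef(new.model))

# Without I(), the ^ is read as formula syntax and the model is fit on plain x
plain.model <- lm(y ~ x^2)
print(names(coef(plain.model)))
```

The second model silently fits a straight line in x, which is an easy mistake to miss because R gives no warning.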

Other commands (that I haven't tried yet) that extract information from a linear model:

coef( lm() ) returns the coefficients
residuals( lm() ) returns the residuals (vertical distance of point from best-fit line)
predict( lm() ) performs predictions
anova( lm() ) finds various sums of squares
AIC( lm() ) is used for model selection
fitted( lm() ) returns fitted values for y
deviance( lm() ) returns the residual sum of squares (RSS)
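
A sketch of these extractor functions on one fitted model. The data is made up and slightly noisy, and the new x values 7 and 8 are arbitrary:

```r
# Made-up, slightly noisy data for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)

fit <- lm(y ~ x)
print(coef(fit))                   # intercept and slope
res <- residuals(fit)              # observed y minus fitted y
rss <- deviance(fit)               # residual sum of squares
rebuilt <- fitted(fit) + res       # fitted values plus residuals give back y
new.y <- predict(fit, newdata = data.frame(x = c(7, 8)))   # predictions at new x
print(new.y)
print(anova(fit))                  # table of sums of squares
print(AIC(fit))                    # lower is better when comparing models
```

Note that predict() wants the new values in a data frame (or list) whose column name matches the predictor used in the formula.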

Examples of Use from the Homework

NOTE: This one below isn't giving the correct answer yet.

Q: Find an exponential model y=a*b^x for (a data set).

A: Enter your two vectors, x and y, and then:

 df=data.frame(x,y)
 fit=nls( y ~ a*b^x, data=df )
 summary(fit) Will give you the "Estimate" values for a and b, or:
 coef(fit)  will also give them.

To predict at the x value 4 or the values 4, 5 and 6:

 predict( fit, list( x=4 ) )
 predict( fit, list( x=c(4,5,6) ) )
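
One possible reason a fit like this goes wrong is that nls() was given no starting values, so it has to guess them. Below is a sketch of one common workaround, not necessarily what the homework intended: fit a straight line to log(y) first and use it to seed nls(). The data is invented (roughly y = 2 * 1.5^x with a little noise), and the names log.fit and start are only for this example:

```r
# Made-up data close to y = 2 * 1.5^x, with a little noise; illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- 2 * 1.5^x * c(1.02, 0.98, 1.01, 0.99, 1.03, 0.97)
df <- data.frame(x, y)

# Taking logs gives log(y) = log(a) + x * log(b), a straight line in x,
# so an ordinary lm() fit supplies rough starting values for nls()
log.fit <- lm(log(y) ~ x, data = df)
start <- list(a = exp(coef(log.fit)[[1]]), b = exp(coef(log.fit)[[2]]))

fit <- nls(y ~ a * b^x, data = df, start = start)
print(coef(fit))
pred <- predict(fit, list(x = c(4, 5, 6)))
print(pred)
```

The estimates should land near a = 2 and b = 1.5 here because that is how the data was built; with real data, compare the fitted curve against a plot of the points.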