R Quick Guide
From OSR
This is a quick reference guide for R.
Vectors, Variables and Arithmetic
- x=c(n1,n2,n3) create a vector containing the list n1 n2 n3 and assign it to x
- x=c("word1","word2") create a vector of categorical values assigned to x
- y=x assign the contents of x to y (overwriting whatever was in y before)
- dataframe$columnname refers to a specific column (vector) within a data frame (eg. cars$mileage), which is annoying to type, so:
- attach(dataframe) lets you use the columns without the dollar sign notation (eg. attach(cars)), example:
- columnname now you can just type mileage
- detach(dataframe) then clean up the variables when you're done (eg. detach(cars))
- data.entry(x=c(NA)) enter elements into x using a spreadsheet
- x[2] what is the 2nd element?
- x[-2] all but the 2nd element?
- x[1:5] first 5 elements?
- x[(length(x)-4):length(x)] last 5? (tail(x, 5) does the same)
- x[c(1,3,5)] slice of 1st, 3rd, 5th?
- x[x>3] list of elements greater than 3
- x[x< -2|x>2] list of elements less than -2 or greater than 2 (the space in x< -2 matters: x<-2 is an assignment)
- x[3]=5 add (or change) data value in 3rd position to 5
- x=c(x,...) adds additional data elements to end of x
- x[17:20]=c(n1,n2,n3,n4) adds (or changes) data values in the 17th to 20th positions
- x==5 returns T/F for each position on whether or not it equals 5
- which(x==5) returns place numbers that have a value of 5
- x+p new vector...each element of x added to the corresponding element of p
- x-p new vector...each element of p subtracted from the corresponding element of x
- x*p new vector...each element of x multiplied by the corresponding element of p
- x^2 new vector...each element of x squared
- sqrt(x) new vector...the square root of each element of x
- mean(x) arithmetic mean of all the elements
- mean(x[5:9]) mean of positions 5 to 9
- median(x) median value as if data were sorted
- var(x) sample variance
- sd(x) sample standard deviation
- sum(x) sum of the values of each element in x
- sum(x>5) how many total elements have a value larger than 5?
- sum(x[x>5]) sum of those elements with a value larger than 5
- diff(x) differences between adjacent elements (i.e. x[2]-x[1], ..., x[n]-x[n-1])
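The indexing and arithmetic commands above are easier to remember with a small worked example. This is only a sketch: the vectors x and p hold made-up values chosen so the results can be checked by hand.

```r
# Hypothetical sample data, chosen only for illustration
x <- c(-5, -1, 0, 2, 3, 7)
p <- c(1, 1, 1, 1, 1, 1)

second     <- x[2]                          # -1
last_three <- x[(length(x) - 2):length(x)]  # 2 3 7, same as tail(x, 3)
extremes   <- x[x < -2 | x > 2]             # -5 3 7; note the space in "< -2"
n_big      <- sum(x > 2)                    # 2 elements are larger than 2
sum_big    <- sum(x[x > 2])                 # 3 + 7 = 10
where_7    <- which(x == 7)                 # position 6
shifted    <- x + p                         # element-wise: -4 0 1 3 4 8
steps      <- diff(x)                       # 4 1 2 1 4
```

Note the difference between sum(x > 2), which counts the TRUE values, and sum(x[x > 2]), which adds up the selected elements.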
Data Manipulation
- length(x) returns the number of values in x
- sort(x) sorts values in increasing order, rev(sort(x)) will sort descending
- rev(x) reverses the elements of x
- max(x) gives the maximum value among the elements
- which.max(x) gives the index of the maximum value
- cummax(x) lists cumulative maximum in vector...useful in finance
- min(x) gives the minimum value among the elements
- cummin(x) lists cumulative minimum in vector
- table(x) lists values of x in 1st row and frequencies in 2nd row
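- factor(x) converts x to a factor; printing it lists the values on the 1st row and the levels on the 2nd
- cut(x, breaks) assigns each value to an interval defined by breaks, so table(cut(x, breaks)) gives frequency distribution classes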
- names(x)=c("name1","name2") creates a name vector for the elements in x
- subset( df, foo==bar ) select all rows from a data frame where column foo is equal to bar (or satisfies any other logical expression).
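A short sketch of how the summary commands above fit together, including using cut() with table() to build frequency classes. The data values, the break points and the data frame columns (id, value) are made up for illustration.

```r
# Hypothetical data, unsorted on purpose
x <- c(7, 3, 18, 12, 7, 25, 18, 18)

sorted  <- sort(x)        # 3 7 7 12 18 18 18 25
top_pos <- which.max(x)   # 6, the position of the value 25
running <- cummax(x)      # 7 7 18 18 18 25 25 25
counts  <- table(x)       # frequency of each distinct value

# Bin the values into the classes (0,10], (10,20], (20,30] and count them
classes <- cut(x, breaks = c(0, 10, 20, 30))
freq    <- table(classes) # 3, 4 and 1 values per class

# Keep only the rows of a data frame that satisfy a condition
df  <- data.frame(id = seq_along(x), value = x)
big <- subset(df, value > 10)   # the 5 rows with value above 10
```

By default cut() makes intervals that are closed on the right, so a value of exactly 10 would land in (0,10].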
Input and Output
- install.packages("packagename") installs a package (the name must be in quotes)
- library(packagename) loads a package
- read.csv(file="filename") loads data from a csv file
- x=scan() create a vector by typing one value per line...or separating values with a space
- x=scan(file="filename") read in a vector from a file, where each value is on a separate line
- x=scan(file="filename", what="character") for a file that contains a vector of strings
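To try the reading commands without a data file at hand, the sketch below writes a small made-up data frame to a temporary csv file and reads it back. write.csv() and the text= argument of scan() are not covered in the list above; they are used here only so the example is self-contained.

```r
# Hypothetical data frame, written to a temporary file so there is something to read
path    <- tempfile(fileext = ".csv")
cars_df <- data.frame(model = c("a", "b", "c"), mileage = c(21.5, 30.1, 18.0))
write.csv(cars_df, path, row.names = FALSE)

loaded <- read.csv(file = path)   # a data frame with columns model and mileage

# scan() can also read from a string via text=, which is handy for testing
nums  <- scan(text = "4 8 15 16", quiet = TRUE)
words <- scan(text = "red green blue", what = "character", quiet = TRUE)

unlink(path)                      # remove the temporary file
```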
Plotting
- barplot( table(x), xlab="values", ylab="frequency" ) 1 bar for each value w/frequency vertically
- barplot( table(x)/length(x), xlab="values", ylab="proportion" ) 1 bar for each value w/proportion vertically
- DOTplot(x) gives a beautiful dotplot, but you have to install and load the UsingR package
- stripchart(x, method="stack", pch=1, offset=1, cex=2) another decent dotplot (of stacked circles) where pch is the shape and cex is the size
- stem(x) a stem and leaf chart
- hist(x) a decent histogram
- truehist(x, h=5) better histogram, with bin widths of 5 (you have to load the MASS package)
- pie(x) pie chart
- boxplot(x) box and whiskers of x using lower/upper hinges
- pdf( "/tmp/plotname.pdf" ); plot( .... ); dev.off(); plot.new() Generate a PDF of the plot and recreate the interactive plot window
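The pdf() / dev.off() pattern works for any number of plots: everything drawn between the two calls goes into the file, one plot per page. A minimal sketch using made-up data and a temporary output path instead of /tmp/plotname.pdf:

```r
# Hypothetical data
x   <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
out <- tempfile(fileext = ".pdf")   # temporary file; use your own path in practice

pdf(out)                            # open the PDF device
barplot(table(x) / length(x), xlab = "values", ylab = "proportion")
hist(x)
boxplot(x)
dev.off()                           # close the device so the file is written

file.exists(out)                    # TRUE once the device has been closed
```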
Regression Analysis
Basic Commands
The best tutorial I've found so far is here. I'll add commands as I get comfortable with them. (Maybe not... this one also assumes a deep prior knowledge of regression analysis.)
Plot my bivariate data:
- plot( x, y ) plot(independent, dependent): the first argument goes on the horizontal axis. Ex: plot( year, population ). The formula form plot( y ~ x ) reads the other way around, dependent ~ independent.
Give me the intercept and slope for a line modeling my data:
- linear.model = lm( y ~ x ) There can be many independent, or predictor (x), variables for multivariate data. Read: response "is modeled by" predictor. Example: lm( population ~ year ), although population data should be exponential rather than linear.
- linear.model will show only the intercept and slope.
- summary(linear.model) will show all the available info.
Add the line for the linear model to my plot:
- abline( linear.model ) either this
- abline( lm( y ~ x ) ) or this, basically: abline( a, b ) is y = bx + a
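Putting the plot, lm and abline steps together on a small made-up data set (the values are hypothetical and chosen to lie close to y = 2x + 1):

```r
# Hypothetical bivariate data
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 8.8, 11.0)

plot(x, y)                     # independent variable on the horizontal axis
linear.model <- lm(y ~ x)      # response y modeled by predictor x
abline(linear.model)           # draw the fitted line on the current plot

b <- coef(linear.model)        # intercept 1.09, slope 1.97 for this data
# abline(a = b[1], b = b[2]) would draw the same line from intercept and slope
```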
Transform the data to fit a linear model, assuming the relationship is y = x^2, and plot it:
- plot( y ~ I(x^2) ) (that's a capital I (eye), which makes R treat x^2 as ordinary arithmetic rather than as formula syntax)
- abline( lm(y ~ I(x^2)) ) or, if:
- new.model = lm( y ~ I(x^2) )
- abline( new.model )
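A sketch of why the I() wrapper matters, using made-up data that lies close to y = x^2 + 1. Inside a formula, ^ is a formula operator, so y ~ x^2 without I() quietly fits the same model as y ~ x.

```r
# Hypothetical data, roughly y = x^2 + 1
x <- 1:6
y <- c(2.1, 4.9, 10.2, 17.0, 25.8, 37.1)

new.model <- lm(y ~ I(x^2))    # regress y on the squared values of x
plot(y ~ I(x^2))               # against x^2 the points fall near a straight line
abline(new.model)

# Without I(), the formula y ~ x^2 reduces to y ~ x
same_as_linear <- isTRUE(all.equal(coef(lm(y ~ x^2)), coef(lm(y ~ x))))  # TRUE
```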
Other commands (that I haven't tried yet) that extract information from a linear model:
- coef( lm() ) returns the coefficients
- residuals( lm() ) returns the residuals (vertical distance of point from best-fit line)
- predict( lm() ) performs predictions
- anova( lm() ) finds various sums of squares
- AIC( lm() ) is used for model selection
- fitted( lm() ) returns fitted values for y
- deviance( lm() ) returns the residual sum of squares (RSS)
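A minimal sketch of these extractor functions applied to a fitted model, so the relationships between them can be checked. The data is made up, and the new x values passed to predict() are hypothetical.

```r
# Hypothetical data and model
x   <- c(1, 2, 3, 4, 5)
y   <- c(3.1, 4.9, 7.2, 8.8, 11.0)
fit <- lm(y ~ x)

b    <- coef(fit)        # intercept 1.09, slope 1.97
res  <- residuals(fit)   # observed y minus fitted y
yhat <- fitted(fit)      # fitted values at the original x
rss  <- deviance(fit)    # equals sum(res^2)

# Predict at new x values; the data frame column must be named like the predictor
new_y <- predict(fit, data.frame(x = c(6, 7)))   # 12.91 and 14.88

anova(fit)               # table of sums of squares
AIC(fit)                 # lower is better when comparing models on the same data
```

Called without new data, predict(fit) returns the same values as fitted(fit).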
Examples of Use from the Homework
NOTE: This one below isn't giving the correct answer yet.
Q: Find an exponential model y=a*b^x for (a data set).
A: Enter your two vectors, x and y, and then:
- df=data.frame(x,y)
- fit=nls( y ~ a*b^x, data=df )
- summary(fit) will give you the "Estimate" values for a and b
- coef(fit) will also give them
To predict at the x value 4 or the values 4, 5 and 6:
- predict( fit, list( x=4 ) )
- predict( fit, list( x=c(4,5,6) ) )
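One possible reason the nls() call misbehaves is that it has no starting values: when start is missing, nls() warns and initializes every parameter to 1, which may be far from the answer. The sketch below is an assumption about the fix, not a confirmed solution to the homework. It takes rough starting values from a linear fit on log(y), since log(y) = log(a) + x*log(b), and uses made-up data that lies close to y = 3 * 2^x.

```r
# Hypothetical data, roughly y = 3 * 2^x
x  <- 1:6
y  <- c(6.1, 11.8, 24.5, 47.6, 97.0, 191.0)
df <- data.frame(x, y)

# Rough starting values from the log-linear fit: log(y) = log(a) + x * log(b)
loglin <- lm(log(y) ~ x, data = df)
start  <- list(a = exp(coef(loglin)[[1]]), b = exp(coef(loglin)[[2]]))

fit <- nls(y ~ a * b^x, data = df, start = start)
est <- coef(fit)                               # estimates of a and b

pred <- predict(fit, list(x = c(4, 5, 6)))     # predictions at x = 4, 5 and 6
```

Note that nls() is documented as unsuitable for artificial data with exactly zero residuals, which is why the made-up y values above include a little noise.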