Wednesday, January 27, 2010

Merge two data frames by a key column

Total <- merge(d3, docd3, by = "Gene")


Reference: http://www.statmethods.net/management/merging.html
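The sketch below shows what merge() does on two small made-up tables. The gene names and values are hypothetical stand-ins for d3 and docd3. By default merge() keeps only the keys found in both tables (an inner join). With all = TRUE it keeps every key and fills the gaps with NA.

```r
# Two toy tables sharing a "Gene" key column (hypothetical values)
a <- data.frame(Gene = c("ACTB", "GAPDH", "UBB"), Ecombo = c(3802, 6717, 5845),
                stringsAsFactors = FALSE)
b <- data.frame(Gene = c("GAPDH", "UBB", "KRT8"), Edoc = c(1500, 2100, 900),
                stringsAsFactors = FALSE)

# Inner join: only genes present in both tables survive
inner <- merge(a, b, by = "Gene")
print(inner)

# Full outer join: keep every gene; missing measurements become NA
outer <- merge(a, b, by = "Gene", all = TRUE)
print(outer)
```

Use all.x = TRUE or all.y = TRUE instead if you only want to keep the unmatched rows from one side.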

Learn R from YouTube:

http://www.youtube.com/watch?v=W2GZFeYGU3s&feature=related

Sort a data frame by a column

d3[order(d3$Ecombo), ]     # ascending by Ecombo
d3[order(-d3$Ecombo), ]    # descending by Ecombo
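Here is a small sketch of the same idea on a made-up table (the values are hypothetical). The minus-sign trick only works for numeric columns. order(..., decreasing = TRUE) also works for text columns such as gene names.

```r
# Toy data frame (hypothetical values)
d <- data.frame(Gene = c("UBB", "ACTB", "KRT8", "GAPDH"),
                Ecombo = c(5845, 3802, 10506, 6717),
                stringsAsFactors = FALSE)

asc        <- d[order(d$Ecombo), ]                     # smallest Ecombo first
desc       <- d[order(d$Ecombo, decreasing = TRUE), ]  # largest Ecombo first
byName     <- d[order(d$Gene), ]                       # alphabetical by gene
byNameDesc <- d[order(d$Gene, decreasing = TRUE), ]    # -d$Gene would fail on text

print(desc)
```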

Statistics: paired Wilcoxon signed-rank test

> wilcox.test(d3$Ecombo,d3$Edoc,paired=TRUE, conf.level = 0.95)

Wilcoxon signed rank test with continuity correction

data: d3$Ecombo and d3$Edoc
V = 25425, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

> help("t.test")
> p-value    # R reads this as p minus value, so it looks for an object named p
Error: object 'p' not found
> x = wilcox.test(d3$Ecombo,d3$Edoc,paired=TRUE, conf.level = 0.95)
> x

Wilcoxon signed rank test with continuity correction

data: d3$Ecombo and d3$Edoc
V = 25425, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

> x[1]
$statistic
V
25425

> x[2]
$parameter
NULL

> x[3]
$p.value
[1] 1.157085e-38

> x[0]
list()
> x$p.value
[1] 1.157085e-38
>
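wilcox.test() returns a list of class "htest". Its parts are easiest to pull out by name. x[1] gives back a one-element list, while x$p.value (or x[["p.value"]]) gives the number itself. Also note that conf.level only has an effect when conf.int = TRUE. The sketch below uses made-up paired vectors; the names before and after are hypothetical.

```r
set.seed(1)
before <- rnorm(30, mean = 10)
after  <- before + rnorm(30, mean = 1)   # shifted upward by about 1

# conf.int = TRUE is what makes conf.level matter
res <- wilcox.test(after, before, paired = TRUE, conf.int = TRUE, conf.level = 0.95)

names(res)       # components available for extraction
res$statistic    # V statistic (a named number)
res$p.value      # the p-value as a plain number
res$conf.int     # confidence interval for the location shift
```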

R wiki

http://wiki.r-project.org/rwiki/doku.php?id=tips:data-frames:sort

An example of loading and subsetting data

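## load data (space-separated text files with a header row)
gefdata <- read.table(file = "gef-combo-commoneffects.txt", sep = " ", header = TRUE)
docdata <- read.table(file = "doc-combo-commoneffects.txt", sep = " ", header = TRUE)
## keep rows where Ecombo exceeds Egef and is above 2000
d1 <- gefdata[gefdata$Ecombo > gefdata$Egef & gefdata$Ecombo > 2000, ]
## then restrict to the 20um gef dose and the 1.2um doc dose
d2 <- d1[d1$DoseGef == "20um", ]
d3 <- d2[d2$DoseDoc == "1.2um", ]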

> d3
Gene Egef Ecombo DoseGef DoseDoc
2074 STMN1 268.9287 2806.164 20um 1.2um
2094 AREG 2252.8792 2300.873 20um 1.2um
2175 ACTB 545.9648 3802.323 20um 1.2um
2229 ARHGEF6 2394.9913 2396.652 20um 1.2um
2263 UBB 1610.2985 5845.116 20um 1.2um
2265 ACTG1 261.1798 4023.568 20um 1.2um
2287 XRCC6 815.7438 2749.311 20um 1.2um
2289 KRT8 2966.2987 10506.174 20um 1.2um
2290 KRT18 2470.1614 10265.253 20um 1.2um
2291 NCOR2 681.8869 2255.428 20um 1.2um
2296 STATIP1 1418.3960 4557.809 20um 1.2um
2310 CALM2 481.1912 2865.798 20um 1.2um
2403 GAPDH 2970.3675 6717.153 20um 1.2um
2421 ATP1A1 725.7277 2798.023 20um 1.2um
2502 SLC9A3R1 1838.9419 6886.804 20um 1.2um
2521 HMGN2 117.6828 3845.859 20um 1.2um
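A logical condition can index rows directly, so an explicit "== TRUE" is not needed. Missing values need more care. Bracket indexing turns an NA in the condition into a row of NAs. subset() treats an NA condition as FALSE and drops that row. The sketch below uses a tiny made-up table with one missing Egef value; all the values are hypothetical.

```r
g <- data.frame(Gene    = c("A", "B", "C", "D"),
                Egef    = c(100, 2500, NA, 900),
                Ecombo  = c(2800, 2300, 3000, 4000),
                DoseGef = c("20um", "20um", "20um", "10um"),
                stringsAsFactors = FALSE)

keep <- g$Ecombo > g$Egef & g$Ecombo > 2000 & g$DoseGef == "20um"

# Bracket indexing: the NA in `keep` yields an all-NA row
bracket <- g[keep, ]
print(bracket)

# subset() treats NA as FALSE, so only gene A is kept
viaSubset <- subset(g, Ecombo > Egef & Ecombo > 2000 & DoseGef == "20um")
print(viaSubset)

# which() drops NAs too, while keeping bracket indexing
safe <- g[which(keep), ]
print(safe)
```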

Another site for learning R:

http://www.medstatstar.com/r/r2.htm

Monday, January 25, 2010

Conditioning plots in R

coplot(Ecombo ~ Egef | DoseGef * DoseDoc, data = gefdata, panel = function(x, y, ...) panel.smooth(x, y, span = .8, ...))
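coplot() draws one scatterplot of y against x for each combination of the conditioning variables written after "|" in the formula. Here panel.smooth adds a lowess curve to each panel. The sketch below builds a hypothetical stand-in for gefdata with simulated values. It writes the plot to a temporary PDF so that it also runs outside an interactive session.

```r
set.seed(42)
# Hypothetical stand-in for gefdata: 20 simulated points per dose combination
toy <- expand.grid(rep = 1:20,
                   DoseGef = c("10um", "20um"),
                   DoseDoc = c("0.6um", "1.2um"))   # dose columns become factors
toy$Egef   <- runif(nrow(toy), 0, 3000)
toy$Ecombo <- toy$Egef * 1.5 + rnorm(nrow(toy), sd = 300)

out <- tempfile(fileext = ".pdf")
pdf(out)   # draw into a file instead of a screen window
coplot(Ecombo ~ Egef | DoseGef * DoseDoc, data = toy,
       panel = function(x, y, ...) panel.smooth(x, y, span = 0.8, ...))
invisible(dev.off())
```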

Friday, January 22, 2010

Link Python with R

It took a lot of time, but I finally got it working.
There are still some problems, though. For example, only the Wilcoxon rank-sum test is available and the other tests are missing, and I'm not sure why. Here is how I set it up:

(1) Download Python 2.6 and R 2.10.0
http://www.python.org/
http://www.r-project.org/
(2) Download pywin32 (Python for Windows extensions) and NumPy
http://starship.python.net/crew/mhammond/win32/Downloads.html
http://numpy.scipy.org/
(3) Download rpy2
http://pypi.python.org/pypi/rpy2/
(4) Install all of the above.
(5) Add the folder containing R.exe to the PATH environment variable, for example C:\Program Files\R\R-2.10.1\bin (adjust the version to match your installation):
My Computer -> Properties -> Advanced -> Environment Variables -> System variables -> PATH -> Edit
(6) Open the Python GUI (IDLE).
(7) Run the following:
>>> import rpy2.robjects as robjects
>>> pi = robjects.r['pi']
>>> pi[0]
3.14159265358979

For more commands, please refer to
http://rpy.sourceforge.net/rpy2/doc-2.1/html/introduction.html#getting-started

Thursday, January 21, 2010

Learning R

Change the working directory:
> getwd()
[1] "C:/Documents and Settings/tmhgxj/My Documents"
> setwd("C:/Documents and Settings/tmhgxj/Desktop/SOFT WARE/Python dominates R")
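setwd() invisibly returns the directory it just left. You can use that value to switch back afterwards. R also accepts forward slashes in Windows paths, as in the setwd() call above. A minimal sketch, using the session's temporary directory as a hypothetical target:

```r
old  <- getwd()            # remember where we started
prev <- setwd(tempdir())   # setwd() returns the previous directory
print(getwd())             # now inside the session's temp directory
setwd(prev)                # go back to the original directory
```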
Read a file:
> data = read.table(file="test.dat",sep = '\t',header=TRUE)

> data
   gene num blastbomb           T1
   BCL2 197 21.666928 104450.60392
  PRKCA  13  4.056293  14053.44213
   TP53  48  9.358030   9479.08444

> data$blastbomb
[1] 21.666928 4.056293 9.358030 24.279278 184.807035 78.335393 36.393178
[8] 438.538598 26.723373 58.529915 29.266824 628.262061 0.000000 18.468964
[15] 1233.334068 94.502689 278.558548 26.920042 230.457498 296.849626 63.847165
[22] 9.956780 1271.750782 590.197136 668.536534
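Below is a self-contained version of the read.table() step above. It writes a small tab-separated file with a header row to a temporary path, then reads it back. The rows are copied from the printout above, with the T1 column omitted. The path and the name dat are hypothetical. read.delim() is a shortcut with sep = "\t" and header = TRUE already set.

```r
# Write a small tab-separated file with a header row
path <- tempfile(fileext = ".dat")
writeLines(c("gene\tnum\tblastbomb",
             "BCL2\t197\t21.666928",
             "PRKCA\t13\t4.056293",
             "TP53\t48\t9.358030"), path)

dat <- read.table(file = path, sep = "\t", header = TRUE)
str(dat)                      # column names and types
dat$blastbomb                 # one column as a numeric vector
dat[dat$num > 20, "gene"]     # genes with num above 20
```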

Comments: R is similar to MATLAB, but it is more powerful than the latter for statistics. R is well suited to working with column-oriented data and running statistical analyses on it. If R can be combined with Python, data analysis becomes much easier. A useful reference: http://math.illinoisstate.edu/dhkim/Rstuff/Rtutor.html