Data Analysis, R




Instructions:

The project aims to analyze a real data set. You are free to perform any analyses that you consider relevant and informative. Use R to perform calculations and computations. Do not submit pages of raw computer output, but you should include important tables and attach graphs to your report. Write concise answers that clearly describe the steps in your analysis and your conclusions. You will be graded on how well your analyses reveal interesting aspects of the data set, the interpretation of your results, how well you justify your methods of analysis, and the overall quality of your writing.

 

Your report should contain the following:

  1. Provide a one-paragraph summary of your major findings. This should not contain any formulas or mathematical symbols, and it should be written so that it can be easily understood by someone who is not a statistician.
  2. Provide a description of the steps taken to identify your best model (or models). Do not give all the details of your search, and do not submit any computer output in this section. Simply outline the issues you considered, your decisions, and the sequence of steps you took to develop a model. (Not to exceed one typed page.)
  3. Provide a description of your best model (or models), including estimates of parameters and their standard errors, and the statistical inference you performed. You may copy small parts of computer output into this part of the report. Discuss and interpret any important features of your model, and state your conclusions in the context of the problem. (Not to exceed two typed pages.)
  4. Provide convincing evidence that your analysis is based on a good model. Discussion of residual plots and other diagnostic checks would be appropriate. You may attach graphs, but lists of raw computer output should not be submitted and will be ignored. (Not to exceed one typed page, excluding graphs.)
  5. You may submit one more paragraph outlining additional analyses that you would have done if you had more time. You will earn points for good suggestions and lose points for suggestions with little potential value.
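The model-building steps asked for in items 2 to 4 can be sketched in R roughly as follows. This is a minimal sketch on simulated data for a single gene, not an analysis of the real data set: the simulated values, the factor names (individual, treatment, time) and the choice of a linear model with individual as a blocking factor are all assumptions made for illustration. It only shows where parameter estimates, standard errors, an F-test and residual diagnostics come from in base R.

```r
# Simulated single-gene example (NOT the real data): 6 individuals x
# 4 beverages x 5 time points = 120 samples. The real data have 108.
set.seed(1)
time_levels <- paste0("time_h", c(0, 1, 2, 4, 12))
d <- expand.grid(
  individual = factor(paste0("ind", 1:6)),
  treatment  = factor(paste0("trt", 1:4)),
  time       = factor(time_levels, levels = time_levels)
)
# Hypothetical expression values: a per-individual shift plus noise,
# with no true beverage or time effect built in.
ind_shift <- rnorm(6, sd = 0.3)
d$expr <- 7 + ind_shift[as.integer(d$individual)] + rnorm(nrow(d), sd = 0.2)

# Linear model with individual as a blocking factor (cross-over design)
# and a beverage-by-time interaction.
fit <- lm(expr ~ individual + treatment * time, data = d)

# Parameter estimates with standard errors, t statistics and p-values.
coef_table <- summary(fit)$coefficients
head(coef_table)

# Sequential F-tests; the interaction is tested after the main effects.
anova(fit)

# Residual diagnostics: standardized residuals and a normality test.
r <- rstandard(fit)
shapiro.test(r)
if (interactive()) plot(fit, which = 1:2)  # residuals vs fitted, normal Q-Q
```

With the real data, the response would be one gene's row of the expression matrix, and because only 108 of the 120 planned samples are available the design is not fully balanced. A mixed model with a random individual effect is a possible alternative to the fixed blocking factor used here. If the same model is fitted to many genes, the p-values would need a multiple-testing adjustment, for example with p.adjust(method = "BH").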

 

Data set description:

This data set comes from a study conducted by Baty et al. (2006). The original purpose of the study was to measure the influence of beverages on blood gene expression, in order to explore the underlying mechanisms of the cardioprotective effects of beverages (you are not restricted to this study purpose). Six healthy individuals participated in the randomized, controlled cross-over experiment. On 4 independent days they drank 4 different beverages (500 mL each: grape juice, red wine, 40 g diluted ethanol, or water). Blood samples were taken at baseline (0 h, before drinking the beverage) and 1, 2, 4, and 12 hours after the drink, which was consumed together with standardized nutrition. RNA from 120 samples was hybridized on Affymetrix microarrays, and gene expression data were obtained for 108 blood samples. The data set is called the Beverage Study Data Set and is contained in the file “Alldata.Rdata”. After setting the working directory to the folder where you saved the file, you can load it into R with the command

load(file = "Alldata.Rdata")
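The loading step can be sketched as follows. Because the real Alldata.Rdata file is not available here, this minimal sketch saves a tiny hypothetical stand-in (with made-up sample IDs S1 and S2) to a temporary file and loads it back; the folder path in the comment is likewise a placeholder. It illustrates that load() recreates objects under their saved names and returns those names.

```r
# Minimal sketch of the loading step. With the real file you would run:
#   setwd("path/to/folder")   # hypothetical path: wherever you saved the file
#   load(file = "Alldata.Rdata")
# Here a tiny hypothetical stand-in is saved to a temporary file instead,
# so that the example runs without the real data.
Alldata <- list(trt1 = c("S1", "S2"), time_h0 = "S1", ind1 = c("S1", "S2"))
path <- file.path(tempdir(), "Alldata.Rdata")
save(Alldata, file = path)
rm(Alldata)

# load() recreates the saved objects in the current environment and
# invisibly returns their names.
loaded <- load(file = path)
loaded            # the name "Alldata"
names(Alldata)    # components of the list
str(Alldata)      # compact overview of each component
```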

Within the data set, “Alldata” is a list, which includes the following objects:

“originaldata”, “trt1”, “trt2”, “trt3”, “trt4”,
“time_h0”, “time_h1”, “time_h2”, “time_h4”, “time_h12”,
“ind1”, “ind2”, “ind3”, “ind4”, “ind5”, “ind6”

The objects in the list are:

originaldata: all the gene expression data (transformed count data), together with gene IDs and gene annotation;

trt1: IDs of the observations (samples) from the Alcohol group;

trt2: IDs of the observations from the Grape juice group;

trt3: IDs of the observations from the Red wine group;

trt4: IDs of the observations from the Water group;

time_h0: IDs of the observations measured at baseline;

time_h1: IDs of the observations measured 1 hour after the drink;

time_h2: IDs of the observations measured 2 hours after the drink;

time_h4: IDs of the observations measured 4 hours after the drink;

time_h12: IDs of the observations measured 12 hours after the drink;

ind1: IDs of the observations obtained from individual 1;

ind2: IDs of the observations obtained from individual 2;

ind3: IDs of the observations obtained from individual 3;

ind4: IDs of the observations obtained from individual 4;

ind5: IDs of the observations obtained from individual 5;

ind6: IDs of the observations obtained from individual 6.
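Since each sample comes from exactly one beverage day, one time point and one individual, a quick sanity check on these ID vectors can be worthwhile before any modelling. The sketch below is hypothetical: it uses made-up IDs (S1 to S6) in place of the real list and checks only the four trt vectors; the same idea applies to the time_h and ind vectors.

```r
# Hypothetical stand-in with made-up sample IDs; use the real Alldata instead.
Alldata <- list(trt1 = c("S1", "S2"), trt2 = "S3",
                trt3 = c("S4", "S5"), trt4 = "S6")

trt_groups <- Alldata[paste0("trt", 1:4)]
sizes <- lengths(trt_groups)            # number of samples per beverage group
all_ids <- unlist(trt_groups, use.names = FALSE)

# In a cross-over design each sample comes from exactly one beverage day,
# so no ID should appear in more than one trt group.
no_overlap <- anyDuplicated(all_ids) == 0
sizes
no_overlap
# sum(sizes) can then be compared with the number of expression columns.
```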

 

You can access each object with the $ operator. For example, to get the data contained in trt1, type:

Alldata$trt1
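The $ operator has a close relative, [[ ]], which is handy when component names are built in code, for example when looping over the time points. This minimal sketch uses a hypothetical stand-in list with made-up sample IDs (S1 to S3) rather than the real data.

```r
# Hypothetical stand-in with made-up sample IDs; use the real Alldata instead.
Alldata <- list(trt1 = c("S1", "S2"), trt2 = "S3",
                time_h0 = c("S1", "S3"), time_h1 = "S2")

Alldata$trt1        # access a component by name with $
Alldata[["trt1"]]   # equivalent, using [[ ]]

# [[ ]] also accepts a name stored in a variable, which $ does not,
# so it is convenient when names are built programmatically.
time_names <- paste0("time_h", c(0, 1))
n_per_time <- sapply(time_names, function(nm) length(Alldata[[nm]]))
n_per_time          # number of samples at each time point
```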

The original data (originaldata) is a matrix with 22283 rows and 130 columns. Each row corresponds to one gene. Columns 3 through 110 contain the gene expression data (one column per sample, 108 in total), and the remaining columns contain gene IDs or gene annotation information. The following is a very small part of the data:

  GSM87863 GSM87887 GSM87896 GSM87934 GSM87943 GSM87853
1  6.96959  6.84646  6.99376   7.0678  7.07566  7.18618
2  4.94771  4.63228  4.47609  4.41107   4.6249  4.61241
3  7.38956  7.21881  7.62192  7.76446   7.4327   7.3896
4  7.73394  7.73069  8.12781  7.92782  7.96697  7.96694
5  3.10916   3.3146   3.4218  3.46084  3.29934  3.35648
6  6.93594  7.23465  6.63625  6.72077     7.15  7.07379
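Following the column layout described above, the expression values can be separated from the ID and annotation columns before analysis. The sketch below builds a small hypothetical miniature of originaldata: the ID_REF, IDENTIFIER and Annotation column names and the g1 to g3 gene IDs are invented, while the nine expression values are copied from the excerpt above. It also assumes, for illustration only, that the first column holds gene IDs.

```r
# Hypothetical miniature of originaldata: 2 ID columns, 3 expression
# columns (the real data have 108, in columns 3 to 110) and 1 annotation column.
originaldata <- data.frame(
  ID_REF     = c("g1", "g2", "g3"),
  IDENTIFIER = c("GENE_A", "GENE_B", "GENE_C"),
  GSM87863   = c(6.96959, 4.94771, 7.38956),
  GSM87887   = c(6.84646, 4.63228, 7.21881),
  GSM87896   = c(6.99376, 4.47609, 7.62192),
  Annotation = c("a", "b", "c")
)

expr_cols <- 3:5   # use 3:110 with the real data
expr <- as.matrix(originaldata[, expr_cols])
# Coerce to numeric in case the columns were stored as text
# (e.g. if originaldata is a character matrix); dimnames are kept.
storage.mode(expr) <- "double"
rownames(expr) <- originaldata[, 1]   # assumed: column 1 holds gene IDs

dim(expr)                  # genes x samples
summary(as.vector(expr))   # overall distribution of expression values
```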

The first column in the example above corresponds to the observation (sample) ID GSM87863. This ID appears in Alldata$trt1, Alldata$time_h0, and Alldata$ind1, which means that the gene expressions in this column were obtained from individual 1 at baseline (0 h) in treatment 1 (the Alcohol group). Each row gives the expression of one gene; rows 1-6 above show the expressions of the first 6 genes.
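The ID lookup described above can be automated to label every expression column with its beverage, time point and individual, which is the kind of sample table most models of this data set would need. The sketch below is hypothetical: it uses made-up sample IDs (S1 to S4) in place of the real list, and label_samples is a helper written for this example, not part of the data set.

```r
# Hypothetical stand-in with made-up sample IDs; in practice use the real
# Alldata and the sample IDs from the expression columns.
Alldata <- list(
  trt1 = c("S1", "S2"), trt2 = c("S3", "S4"),
  time_h0 = c("S1", "S3"), time_h1 = c("S2", "S4"),
  ind1 = c("S1", "S2"), ind2 = c("S3", "S4")
)
samples <- c("S1", "S2", "S3", "S4")

# Hypothetical helper: label each sample with the name of the ID vector
# that contains it (NA if none does). 'groups' is a named list of IDs.
label_samples <- function(ids, groups) {
  lab <- rep(NA_character_, length(ids))
  for (g in names(groups)) lab[ids %in% groups[[g]]] <- g
  lab
}

trt_groups  <- Alldata[grep("^trt", names(Alldata))]
time_groups <- Alldata[grep("^time_h", names(Alldata))]
ind_groups  <- Alldata[grep("^ind", names(Alldata))]

# Keep factor levels in list order so time is chronological, not alphabetical.
design <- data.frame(
  sample     = samples,
  treatment  = factor(label_samples(samples, trt_groups), levels = names(trt_groups)),
  time       = factor(label_samples(samples, time_groups), levels = names(time_groups)),
  individual = factor(label_samples(samples, ind_groups), levels = names(ind_groups))
)
# Numeric hours after the drink, parsed from names such as "time_h12".
design$hours <- as.numeric(sub("time_h", "", as.character(design$time)))
design
```

Keeping time both as a factor and as numeric hours leaves open whether it is modelled as categorical or as a trend.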

 

References:

Baty F, Facompre M, Wiegand J, Schwager J et al. Analysis with respect to instrumental variables for the exploration of microarray data structures. BMC Bioinformatics, 2006 Sep 29;7:422.
