R
Exercise 1
Both read.csv() and read.table() can be used to load spreadsheet data into R.
.txt
.txt files (delimited by tabs) can be loaded with read.table():
The data from the file becomes a data.frame object.
The first argument is not always a filename; it can also be the URL of a webpage that contains data.
The header argument specifies whether the first line of the data file contains the column names.
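As an illustration of the points above, here is a minimal sketch of loading a tab-delimited file. The file is a hypothetical one written to a temporary path, and the column names `id` and `value` are assumptions made only for this example.

```r
# Hypothetical tab-delimited file, written to a temporary path for illustration
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue", "1\t10.5", "2\t11.2"), txt_path)

# header = TRUE: the first line holds the column names
# sep = "\t":    fields are separated by tabs
txt_df <- read.table(txt_path, header = TRUE, sep = "\t")

class(txt_df)  # the result is a data.frame
str(txt_df)    # one line per column, with its type
```

The same call should also accept a URL in place of `txt_path`, provided the remote file has the same layout.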
.csv
.csv (for files whose fields are delimited by a separator character such as a comma, rather than by tabs)
read.csv() differs from read.table() in the following ways:
* The separator symbol defaults to a comma.
* The header argument defaults to TRUE, which indicates that the first line of the file being read contains the header with the variable names.
* The fill argument also defaults to TRUE, which means that if rows have unequal length, blank fields are added implicitly.
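The defaults listed above can be checked with a small sketch. The file contents are hypothetical; the second data row is deliberately one field short so that the effect of `fill = TRUE` is visible.

```r
# Hypothetical comma-separated file; the last row is missing its third field
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,value", "1,a,10.5", "2,b"), csv_path)

# No extra arguments: read.csv() already uses sep = ",", header = TRUE, fill = TRUE
csv_df <- read.csv(csv_path)
print(csv_df)

# Inspect the documented defaults directly
formals(read.csv)[c("header", "sep", "fill")]
```

The short row is padded, so `value` in the second row comes back as NA instead of raising an error.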
Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1
Exercise 2
The str() command reports 19 observations of 2 variables, i.e. 19 rows and 2 columns.
"$Smokers..mmHg" shows that the column is a numeric vector.
"$Non.smokers..mmHg" shows that the second-to-last row of the column prevented R from creating a numeric vector.
Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1
Explain a solution for missing data, is.na()
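A minimal sketch of how is.na() can be used to find and handle missing values. The vector `bp` is hypothetical and is not the assignment data.

```r
# Hypothetical blood pressure readings with two missing entries
bp <- c(120, NA, 135, 128, NA)

is.na(bp)        # logical vector marking the missing entries
sum(is.na(bp))   # number of missing entries

# Option 1: drop the missing entries before analysis
bp_complete <- bp[!is.na(bp)]

# Option 2: keep them and ask the summary function to skip them
bp_mean <- mean(bp, na.rm = TRUE)
```

Which option is preferable depends on whether later steps need the vectors to keep their original length.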
Exercise 3
As R reads the data, R will classify the variables into types:
Columns with only numbers are made into numeric or integer variables.
Columns with non-numeric characters are made into factors unless you specify that they should remain characters using the stringsAsFactors = FALSE option in the read command (from R 4.0.0 onward, stringsAsFactors = FALSE is already the default).
A factor is a categorical variable whose categories are called levels. These levels are named, like characters, but they are additionally stored as integer codes. It is easy to convert character data to factors later, when you need them.
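The integer codes behind a factor are what make a mixed column risky. The sketch below uses a hypothetical character vector to show the difference between converting the codes and converting the labels.

```r
# Hypothetical column that mixes numbers and text
raw <- c("120", "135", "equal", "128")
f <- factor(raw)

levels(f)       # the named levels, sorted
as.integer(f)   # the underlying integer codes, NOT the original numbers

# Converting via the labels recovers the numbers and turns text into NA
# (R warns about the coercion, so the warning is suppressed here)
values <- suppressWarnings(as.numeric(as.character(f)))
values
```

Calling as.numeric() directly on a factor returns the codes, so the conversion has to go through as.character() first.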
**In the smokers data, the strings equal and unequal in column 2 can be turned into spurious numeric values in later data analysis, causing inaccuracy.** Two methods were suggested in Exercise 4 to overcome this:
First, eliminate the last two rows completely by choosing only the first 14 rows.
Second, replace the strings with NA.
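Both methods can be sketched as follows. The data frame `bp_data` is a small hypothetical stand-in for the smokers data (shortened column names, made-up values), so the row count used here is 3 rather than the 14 mentioned above.

```r
# Hypothetical stand-in for the smokers data; values are made up
bp_data <- data.frame(
  Smokers     = c(122, 131, 140, 128, 135),
  Non.smokers = c("118", "121", "115", "equal", "unequal"),
  stringsAsFactors = FALSE
)

# Method 1: keep only the rows that hold measurements
n_valid <- 3  # assumption for this toy data; the exercise used 14
bp_rows <- bp_data[seq_len(n_valid), ]
bp_rows$Non.smokers <- as.numeric(bp_rows$Non.smokers)

# Method 2: keep every row, but turn the non-numeric strings into NA
bp_na <- bp_data
bp_na$Non.smokers <- suppressWarnings(as.numeric(bp_na$Non.smokers))

str(bp_rows)
str(bp_na)
```

Method 1 also discards the Smokers values in the dropped rows, whereas method 2 keeps them, which is one reason the two approaches can give different summaries.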
Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1
Exercise 4
Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1
Exercise 5
Error in quantile.default(x, prob = 0.25): missing values and NaN's not allowed if 'na.rm' is FALSE
Traceback:
1. data.frame(Label, DescriptiveStatistics(smokers1), DescriptiveStatistics(nonsmokers1))
2. DescriptiveStatistics(smokers1)
3. quantile(x, prob = 0.25) # at line 4 of file <text>
4. quantile.default(x, prob = 0.25)
5. stop("missing values and NaN's not allowed if 'na.rm' is FALSE")
**INSTRUCTOR Feedback:** How would you solve this problem? See solutions for some ideas.
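One possible fix for the quantile() error above is to pass na.rm = TRUE to every summary call. The body of the original DescriptiveStatistics() is not shown, so the version below is a hypothetical reconstruction, and the `smokers1` values are made up.

```r
# Hypothetical reconstruction of the helper; the original body is not shown
DescriptiveStatistics <- function(x) {
  c(
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    q25    = unname(quantile(x, probs = 0.25, na.rm = TRUE)),
    q75    = unname(quantile(x, probs = 0.75, na.rm = TRUE)),
    sd     = sd(x, na.rm = TRUE)
  )
}

# Made-up readings with one missing value
smokers1 <- c(118, 125, NA, 140, 132)
stats_smokers <- DescriptiveStatistics(smokers1)
stats_smokers
```

An alternative would be to remove the NA entries once, before calling the helper, which leaves the helper itself unchanged.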
**Clean data**
**With missing values**
**Without missing values**
The histograms make it easier to see the shift in central tendency (mean, median) and the spread of blood pressure between the two groups (smokers and non-smokers) and between the two versions of the data (with and without missing values).
The histogram of smokers without missing values has its mean and median shifted to the right, i.e. the effect of the data from the last two rows and the NA values.
The histogram of non-smokers without missing values has its mean and median shifted to the left, and its shape went from bimodal to approximately normal. This is caused by the strings in the last two rows of the data and the NA values.
Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1
Error in eval(expr, envir, enclos): object 'smokers2' not found
Traceback:
1. densityplot(~smokers2, plot.points = FALSE, auto.key = TRUE)
2. densityplot.formula(~smokers2, plot.points = FALSE, auto.key = TRUE)
3. latticeParseFormula(formula, data, subset = subset, groups = groups,
. multiple = allow.multiple, outer = outer, subscripts = TRUE,
. drop = drop.unused.levels)
4. eval(varsRHS[[1]], data, env)
5. eval(expr, envir, enclos)
Error in density.default(d, na.rm = TRUE): argument 'x' must be numeric
Traceback:
1. density(d, na.rm = TRUE)
2. density.default(d, na.rm = TRUE)
3. stop("argument 'x' must be numeric")
With the log transformation (blue and black), the range of BP is compressed compared with the untransformed data (yellow and green).
Overall clarity (score = 0.25)
Correctness of the code (score = 0)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 0.75
INSTRUCTOR Feedback: You need to resolve the errors when they happen. I have not run the notebook again, but you can check whether the correct way of implementing this is min(c(O$x,P$x,Q$x,R$x)),
which will give you the min of all those values. The same goes for max.
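The suggestion above can be sketched as follows. It assumes that O, P, Q and R are density objects, which is a guess based on the `$x` components in the feedback; the samples they are built from here are simulated and purely illustrative.

```r
set.seed(1)
# Simulated stand-ins for the four groups; not the assignment data
O <- density(rnorm(50, mean = 130, sd = 10))
P <- density(rnorm(50, mean = 120, sd = 10))
Q <- density(rnorm(50, mean = 125, sd = 12))
R <- density(rnorm(50, mean = 115, sd = 12))

# Common axis limits that cover all four curves
x_range <- c(min(c(O$x, P$x, Q$x, R$x)), max(c(O$x, P$x, Q$x, R$x)))
y_range <- c(0, max(c(O$y, P$y, Q$y, R$y)))

plot(O, xlim = x_range, ylim = y_range, main = "Density comparison")
lines(P, col = "green")
lines(Q, col = "blue")
lines(R, col = "red")
```

Combining the vectors with c() before calling min() or max() ensures that a single overall value is computed for the shared axis.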
Exercise 7
Overall clarity (score = 0)
Correctness of the code (score = 0)
Exhaustive cover of required analysis (score= 0)
Interpretation of the results (score = 0)
Total Score = 0
$$M = \log_2(x) - \log_2(y), \qquad A = \tfrac{1}{2}\left(\log_2(x) + \log_2(y)\right)$$
Error in parse(text = x, srcfile = src): <text>:4:13: unexpected input
3: A <- 0.5(log2(x)+log2(y))
4: M <- log2(x) <e2>
^
Traceback:
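The parse error above has two likely causes visible in the echoed lines: `0.5(` lacks an explicit `*`, and the `<e2>` byte suggests the minus sign was a non-ASCII character pasted from formatted text. A minimal corrected sketch, using made-up intensities for `x` and `y`:

```r
# Made-up intensity values for two samples
x <- c(100, 200, 400, 800)
y <- c(110, 100, 800, 790)

# Multiplication must be written explicitly, and the minus must be a plain ASCII "-"
M <- log2(x) - log2(y)
A <- 0.5 * (log2(x) + log2(y))

# MA plot: average log intensity against log ratio
plot(A, M, xlab = "A", ylab = "M", main = "MA plot")
abline(h = 0, lty = 2)
```

Retyping the operators by hand instead of pasting the formula is usually enough to remove the hidden character.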
Error in eval(substitute(groups), data, environment(formula)): object 'Chem97' not found
Traceback:
1. densityplot(~gcsescore | factor(score), Chem97, groups = gender,
. plot.points = FALSE, auto.key = TRUE)
2. densityplot.formula(~gcsescore | factor(score), Chem97, groups = gender,
. plot.points = FALSE, auto.key = TRUE)
3. eval(substitute(groups), data, environment(formula))
INSTRUCTOR FEEDBACK: See solutions for ideas.
Exercise 8
Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1
Exercise 9
Reorganization of data
Error in model.frame.default(formula = geneexp ~ Ct_full): object is not a matrix
Traceback:
1. boxplot(geneexp ~ Ct_full, col = "light grey", xlab = "Genes",
. ylab = "Gene expression", main = "Bioconductor Experiment")
2. boxplot.formula(geneexp ~ Ct_full, col = "light grey", xlab = "Genes",
. ylab = "Gene expression", main = "Bioconductor Experiment")
3. eval(m, parent.frame())
4. eval(expr, envir, enclos)
5. stats::model.frame(formula = geneexp ~ Ct_full)
6. model.frame.default(formula = geneexp ~ Ct_full)
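The code cell that produced this error is not shown, so its cause can only be guessed. One possibility is that `geneexp` or `Ct_full` was a whole data frame rather than a single vector, which a formula cannot use directly. The sketch below shows one way to reorganize a wide table into long format before plotting; `geneexp_wide` and its column names are hypothetical.

```r
# Hypothetical wide table: one column of expression values per gene
geneexp_wide <- data.frame(
  geneA = c(5.1, 5.4, 4.9),
  geneB = c(7.2, 6.8, 7.5),
  geneC = c(3.3, 3.9, 3.6)
)

# stack() returns a long data frame with columns "values" and "ind"
geneexp_long <- stack(geneexp_wide)

# Both sides of the formula are now plain columns of one data frame
boxplot(values ~ ind, data = geneexp_long, col = "light grey",
        xlab = "Genes", ylab = "Gene expression",
        main = "Bioconductor Experiment")
```

Checking `str(geneexp)` and `str(Ct_full)` before plotting would confirm whether this reshaping is actually needed.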
Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1
**INSTRUCTOR FEEDBACK:** Make sure you always resolve the errors. Don't leave them. Score: 0.75
Exercise 11
Overall clarity (score = 0.25)
Correctness of the code (score = 0.25)
Exhaustive cover of required analysis (score= 0.25)
Interpretation of the results (score = 0.25)
Total Score = 1
INSTRUCTOR SCORE: 0.5
Final score = 9.75/11
**INSTRUCTOR FEEDBACK:** You really tried hard to work out the solutions. Please make sure you keep it going when you have real data. It is more problematic but not at all hard. Refer to the solutions for finishing the notebook. So far, well done. Score: 8/12