Module #3 Assignment




Code from R Workspace



#Declaring vector data sets entitled set_1 and set_2


set_1 <- c(10,2,3,2,4,2,5)

set_2 <- c(20,12,13,12,14,12,15)


#Central tendency and spread computations using set_1 (mean, median, standard deviation and quantiles)


mean(set_1, trim = 0, na.rm = FALSE)


median(set_1, na.rm = FALSE)


sd(set_1, na.rm = FALSE)


quantile(set_1, na.rm = FALSE)


#Coefficient of Variation represented as a percentage for set_1


sd(set_1, na.rm = TRUE)/mean(set_1, na.rm = TRUE)*100


#summary() gives a general breakdown of where the values fall relative to each other: minimum, quartiles, median, mean and maximum

summary (set_1)


#Central tendency and spread computations using set_2 (mean, median, standard deviation and quantiles)


mean(set_2, trim = 0, na.rm = FALSE)


median(set_2, na.rm = FALSE)


sd(set_2, na.rm = FALSE)


quantile(set_2, na.rm = FALSE)


#Coefficient of Variation represented as a percentage for set_2


sd(set_2, na.rm = TRUE)/mean(set_2, na.rm = TRUE)*100
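Since the same coefficient of variation expression is typed out once per set, it could be wrapped in a small helper. This is only a sketch: `coef_var` is a hypothetical name chosen for illustration and is not a base R function.

```r
# Hypothetical helper (not part of base R): coefficient of variation as a percentage
coef_var <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm) * 100
}

set_1 <- c(10, 2, 3, 2, 4, 2, 5)
set_2 <- c(20, 12, 13, 12, 14, 12, 15)

# Apply the helper to both sets at once; the result is a named numeric vector
cv <- sapply(list(set_1 = set_1, set_2 = set_2), coef_var)
print(cv)
```

The values should agree with the two results computed directly in the console.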




R-Console Results

> set_1 <- c(10,2,3,2,4,2,5)

> set_2 <- c(20,12,13,12,14,12,15)

> mean(set_1, trim = 0, na.rm = FALSE)

[1] 4

> median(set_1, na.rm = FALSE)

[1] 3

> sd(set_1, na.rm = FALSE)

[1] 2.886751

> quantile(set_1, trim = 0, na.rm = FALSE)

  0%  25%  50%  75% 100% 

 2.0  2.0  3.0  4.5 10.0 

> sd(set_1, na.rm = TRUE)/mean(set_1, na.rm = TRUE)*100

[1] 72.16878

> summary (set_1)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 

    2.0     2.0     3.0     4.0     4.5    10.0 

> mean(set_2, trim = 0, na.rm = FALSE)

[1] 14

> median(set_2, na.rm = FALSE)

[1] 13

> sd(set_2, na.rm = FALSE)

[1] 2.886751

> quantile(set_2, trim = 0, na.rm = FALSE)

  0%  25%  50%  75% 100% 

12.0 12.0 13.0 14.5 20.0 

> sd(set_2, na.rm = TRUE)/mean(set_2, na.rm = TRUE)*100

[1] 20.61965


Discussion:

Both data sets follow the same pattern: every value lies within 3 of the set's median, apart from one outlier per set (10 in set_1 and 20 in set_2). The sets share this shape but sit on different numerical intervals, since each entry in set_1 is exactly 10 less than the corresponding entry in set_2. Because the raw values in set_2 are larger entry for entry, the location statistics computed from it (mean, median and quantiles) are larger as well. Despite the different values, the two sets have the same standard deviation, 2.886751, because the spread of the entries around their mean is identical in both. The coefficients of variation, however, differ vastly (72.17% for set_1 against 20.62% for set_2) even though the sets are otherwise so similar. This remains a mystery to me, but in time I will understand why this happens in statistics; sometimes the intuitive parts of mathematics take a little longer to reveal themselves.
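One way to approach the question about the coefficients of variation is to look at how the statistic is built. The coefficient of variation is the standard deviation divided by the mean. Adding a constant to every value shifts the mean by that constant but leaves the deviations from the mean, and so the standard deviation, unchanged. The check below is a sketch using only the two sets from this assignment; the variable names `shift`, `same_sd`, `cv_1` and `cv_2` are introduced here for illustration.

```r
set_1 <- c(10, 2, 3, 2, 4, 2, 5)
set_2 <- c(20, 12, 13, 12, 14, 12, 15)

# Every entry of set_2 is the matching entry of set_1 plus the same constant
shift <- unique(set_2 - set_1)
print(shift)

# Shifting the data moves the mean but not the spread around it
same_sd <- isTRUE(all.equal(sd(set_1), sd(set_2)))
print(same_sd)

# Same numerator (sd), different denominators (mean of 4 versus mean of 14)
cv_1 <- sd(set_1) / mean(set_1) * 100
cv_2 <- sd(set_2) / mean(set_2) * 100
print(c(cv_1 = cv_1, cv_2 = cv_2))
```

If this reading is right, the same spread is simply being measured against a larger mean in set_2, which would account for its smaller coefficient of variation.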

