R Statistics

Posts

Showing posts from March, 2023

Module 11: Tufte Visuals

March 24, 2023

##Module 11: Tufte Visualizations
##install.packages(c("CarletonStats", "devtools", "epanetReader", "fmsb", "ggplot2", "ggthemes", "latticeExtra", "MASS", "PerformanceAnalytics", "psych", "plyr"))
##install.packages(c("prettyR", "plotrix", "proto", "RCurl", "reshape", "reshape2"))
library(CarletonStats)
library(devtools)
library(epanetReader)
library(fmsb)
library(ggplot2)
library(ggthemes)
library(latticeExtra)
library(MASS)
library(PerformanceAnalytics)
library(psych)
library(plyr)
library(prettyR)
library(plotrix)
library(proto)
library(RCurl)
library(reshape)
library(reshape2)
#Example R Script 1
library(devtools)
source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
x <- faithful$waiting
y <- faithful$eruptions
plot(x, y, mai...

Module 11: R Debugging and Defensive Programming

March 24, 2023

##Module 11 Debugging
#Libraries
library(dplyr)
library(plyr)
library(tidyverse)
#Raw code with first edit
tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in 1:ncol(x)) {
    outliers[,j] <- outliers[,j] ##&& tukey.outlier(x[,j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in 1:nrow(x)) {
    outlier.vec[i] <- all(outliers[i,])
  }
  return(outlier.vec)
}
##Second edit
tukey_multiple1 <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in 1:ncol(x)) {
    outliers[,j] <- outliers[,j] ##&& tukey.outlier(x[,j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in 1:nrow(x)) {
    outlier.vec[i] <- all(outliers[i,])
  }
  return(outlier.vec)
}
#test block
#data() loads a built-in dataset; load() expects the path of a saved .RData file
data(airquality)
ac <- airquality
str(ac)
head(ac)
tukey_multiple(ac)...
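The excerpt above comments out the `&& tukey.outlier(x[,j])` call instead of repairing it. A minimal sketch of one possible repair follows. It assumes the intent is to flag rows that are outliers in every column, and it supplies a hypothetical `tukey.outlier()` based on the common 1.5 * IQR rule, because the helper is not shown in the excerpt. The name `tukey_multiple_fixed` is also made up for illustration.

```r
# Hypothetical helper: the post does not show tukey.outlier(), so this
# version assumes the usual 1.5 * IQR fences around the quartiles.
tukey.outlier <- function(v) {
  q <- quantile(v, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  # Missing values are treated as "not an outlier"
  !is.na(v) & (v < q[1] - fence | v > q[2] + fence)
}

tukey_multiple_fixed <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x))  # defensive check: numeric input only
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # `&` compares element by element; `&&` is meant for single values
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  # A row is flagged only when it is an outlier in every column
  apply(outliers, 1, all)
}

data(airquality)
flags <- tukey_multiple_fixed(airquality)
```

The central change is `&` in place of `&&`: the single-ampersand form is vectorised, while `&&` is intended for length-one conditions and raises an error on longer vectors in recent versions of R.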

Module 10: Description Files

March 20, 2023

##Module # 10 Building your own R package assignment
install.packages("roxygen2")
install.packages("devtools")
library(roxygen2)
library(devtools)
#Package
write.dcf(list(Package = "Tyler",
               Title = "This package is a test file of competency",
               Description = "To tackle this problem",
               Version = "0.0.0.9000",
               License = "CC0 1.0 Universal",
               AuthorsR = "Tyler House <tahouse@usf.edu>",
               Depends = "R (>= 3.1.3)",
               LazyData = "TRUE"))
##This is my first, very brief, but complete description file using devtools
##are there other ways to forma...
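The `write.dcf()` call above comes from base R and prints to the console unless a `file` is given. The sketch below writes the same fields to a temporary DESCRIPTION file and reads them back. Three details are my assumptions, not the author's: that `AuthorsR` was meant to be the standard `Authors@R` field, that its value should be a `person()` call, and that the license should be written as `CC0`. A data frame with `check.names = FALSE` is used so that the `@` in the field name is kept.

```r
# Sketch: build the DESCRIPTION fields as a one-row data frame.
# check.names = FALSE keeps the non-syntactic name "Authors@R" intact.
desc_fields <- data.frame(
  Package = "Tyler",
  Title = "This package is a test file of competency",
  Description = "To tackle this problem",
  Version = "0.0.0.9000",
  # Assumed form of the author entry (a person() call stored as text)
  `Authors@R` = 'person("Tyler", "House", email = "tahouse@usf.edu", role = c("aut", "cre"))',
  License = "CC0",          # assumed short form of the license name
  Depends = "R (>= 3.1.3)",
  LazyData = "true",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write to a temporary location instead of a real package directory
desc_path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc_fields, file = desc_path)

# Read the file back as a character matrix, one column per field
desc <- read.dcf(desc_path)
colnames(desc)
```

In a real package the file would sit at the package root; `tempdir()` is used here only so the example leaves nothing behind.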

Module 10: Graph Analysis

March 20, 2023

##Module 10
#Libraries
library(ggplot2)
library(tidyverse)
library(dplyr)
library(plyr)
library(grid)
#I will provide each example and then my take on it, in matching order: 1 and 1, 2 and 2, etc.
#Load data
hotdogs <- read_csv("http://datasets.flowingdata.com/hot-dog-contest-winners.csv")
#Take a quick peek
str(hotdogs)
head(hotdogs)
##Model 1
#Note: New.record must be written as the backtick-quoted name `New record` for the column to be found
colors <- ifelse(hotdogs$`New record` == 1, "darkred", "grey")
barplot(hotdogs$`Dogs eaten`, names.arg = hotdogs$Year, col = colors, border = NA,
        main = "Nathan's Hot Dog Eating Contest Results, 1980-2010",
        xlab = "Year", ylab = "Hot dogs and buns (HDBs) eaten")
#This is a great starting point, but here is where I would take it. First, let's take note of why some columns are red and some grey
#colors <- ...
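The backtick note and the `ifelse()` colouring in Model 1 can be tried without downloading the file. This is a small sketch on made-up rows: the values and the names `hotdogs_demo` and `bar_cols` are hypothetical, and only the column names follow the excerpt.

```r
# Hypothetical rows; check.names = FALSE keeps the spaces in the column
# names, which is why backticks are needed to reference them below.
hotdogs_demo <- data.frame(
  Year = c(2000, 2001, 2002),
  `Dogs eaten` = c(25, 50, 50.5),
  `New record` = c(0, 1, 1),
  check.names = FALSE
)

# One colour per bar: dark red where a new record was set, grey otherwise
bar_cols <- ifelse(hotdogs_demo$`New record` == 1, "darkred", "grey")

barplot(hotdogs_demo$`Dogs eaten`,
        names.arg = hotdogs_demo$Year,
        col = bar_cols,
        border = NA,
        xlab = "Year",
        ylab = "Hot dogs and buns (HDBs) eaten")
```

Because `col` takes one colour per bar, the red bars mark exactly the years in which `New record` equals 1.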

Module 9: Multivariate Analysis

March 10, 2023

##Module 9: Multivariate Analysis: Tobacco Statistics from the CDC
#Libraries
library(ggplot2)
library(dplyr)
library(tidyverse)
library(plyr)
library(stats)
#Read CSV file
smokedata <- read.csv("C:\\Users\\tyler\\Desktop\\Spring 2023\\Visual Analytics\\SmokeBan.csv")
#Omit numbering column
smokedata <- smokedata[,-1]
head(smokedata)
#Clean Column Names
colnames(smokedata) <- c("Smoker","Ban","Age","Education","African American","Hispanic","Gender")
#factor statement to differentiate every education level
smokedata$Education <- factor(smokedata$Education, labels = c("hs dropout","hs","somecollege","college","master"))
#Multivariate Visual
ggplot(smokedata, aes(Education, Age, color = Smoker)) +
  geom_boxplot() +
  labs(x = "Level Of Education", y = "Age") +
  ggtitle("Age versus Education in Smokers versus Non...
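One point worth checking in the `factor()` statement above: when only `labels` is supplied, R sorts the unique values first and attaches the labels in that sorted order, so labels can end up on the wrong categories. The sketch below shows a more defensive form that names the `levels` explicitly. The raw strings are an assumption, since the contents of SmokeBan.csv are not shown in the excerpt.

```r
# Hypothetical education values as they might arrive from a CSV file
education <- c("hs", "college", "hs drop out", "master", "some college", "hs")

# Without `levels`, factor() uses the sorted unique values as levels,
# and `labels` are matched to that sorted order.
sort(unique(education))

# Naming the levels explicitly fixes both the order and the label mapping
edu <- factor(
  education,
  levels = c("hs drop out", "hs", "some college", "college", "master"),
  labels = c("hs dropout", "hs", "somecollege", "college", "master")
)

# Cross-tabulate raw values against labels to confirm the mapping
table(education, edu)
```

The cross-tabulation at the end is a quick way to verify that every raw value landed under the intended label before plotting.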

Module 9 Data Visualization

March 10, 2023

#Module 9: 3 Visualizations
#Libraries
library(ggplot2)
library(dplyr)
library(tidyverse)
library(plyr)
library(stats)
#import data .csv file
arrests <- read.csv("C:\\Users\\tyler\\Desktop\\Spring 2023\\Visual Analytics\\Arrests.csv")
str(arrests)
head(arrests)
#Visual 1 Boxplot
arrests$sex <- factor(arrests$sex, labels = c("Female","Male"))
arrests$year <- factor(arrests$year, labels = c("1997","1998","1999","2000","2001","2002"))
ggplot(arrests, aes(sex, age, color = colour)) +
  geom_boxplot()
#Visual 2 Histogram
ggplot(arrests, aes(checks)) +
  geom_histogram(binwidth = 0.5) +
  labs(x = "Checks", y = "Number of Inmates with that number of Checks")
#Visual 3 Scatterplot
ggplot(arrests, aes(year, age, color = sex)) +
  geom_point()
#Note I added this visual in to the assignment because finding data to use for making different visualizations
#Ca...

Module 8: Input/Output, String Manipulation and 'plyr' Package

March 02, 2023

In this week's assignment, we imported, modified, and re-saved a dataset that was provided to us. The members were first organized by sex and by average grade, and then filtered with grepl() down to the students of either sex who have an I or i in their name. Using the same syntax, a resulting table entitled "DataSubset" was created and saved. I compiled the work into a .docx that I will submit in Canvas, but here is the R script from RStudio:
##Module # 8 Input/Output, String Manipulation and plyr package.
#This week's assignment deals with a short process of importing, changing, cleaning and resaving data within R from a non-native filetype
##Add your libraries first
##Then, we must first import the data from what was provided to us in the instructions
library(dplyr)
library(plyr)
librar...
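The grouping and `grepl()` filtering steps described in this Module 8 post can be sketched in base R as follows. The data frame, its column names, and the output file name are hypothetical stand-ins, since the assignment file is not part of the excerpt, and `aggregate()` is used here in place of the plyr call from the original script.

```r
# Hypothetical data standing in for the assignment file
students <- data.frame(
  Name  = c("Raul", "Lilla", "Booker", "Irene", "Matt", "Claudia"),
  Sex   = c("Male", "Female", "Male", "Female", "Male", "Female"),
  Grade = c(80, 92, 75, 88, 95, 70),
  stringsAsFactors = FALSE
)

# Average grade by sex
grade_by_sex <- aggregate(Grade ~ Sex, data = students, FUN = mean)

# Keep students whose name contains "i" or "I";
# ignore.case = TRUE makes the match case-insensitive
data_subset <- students[grepl("i", students$Name, ignore.case = TRUE), ]

# Save the filtered table (file name and location are assumptions)
out_path <- file.path(tempdir(), "DataSubset.csv")
write.csv(data_subset, out_path, row.names = FALSE)
```

`grepl()` returns one TRUE or FALSE per name, so it can be used directly as a row index. With these sample rows, Lilla, Irene, and Claudia are kept.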

Module 8: Linear Regression Visualizations in ggplot2

March 02, 2023

DISCLAIMER: In this week's submission, I will post the code and my comments here, but I will also submit a separate HTML .rmd file in Canvas for your convenience, since a compiled document makes my work easier to view. In this week's module, we explored different styles of statistical analysis of a dataset in R. The key when dealing with data is to remember when it is appropriate to separate it into groups (using factor()) and when to visualize unfiltered data directly from the source. I used related columns from the data frame to visualize similarities and trends among characteristics of automobiles that I know for a fact can tell us about one another, even without knowledge of their data. Without data, one might assume a larger engine means a quicker quarter mile or lower fuel mileage, but that is not always the case. This data does not ta...