Module 10: Graph Analysis

##Module 10


#Libraries


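library(plyr) #Loaded first so that it does not mask dplyr functions such as summarise() and mutate()

library(tidyverse) #Attaches ggplot2, dplyr, and readr (which provides read_csv())

library(grid)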


#I will present each example followed by my own take on it, in order: example 1 then my version of 1, example 2 then my version of 2, and so on.

#Load data


hotdogs <- read_csv("http://datasets.flowingdata.com/hot-dog-contest-winners.csv")


#Take a quick peek


str(hotdogs)

head(hotdogs)


##Model 1

#Note: The base example refers to this column as New.record, the name that read.csv() produces. read_csv() keeps the original header, New record, so the column name has to be wrapped in backticks (`New record`) wherever it is used

colors <- ifelse(hotdogs$`New record` == 1, "darkred", "grey")


barplot(hotdogs$`Dogs eaten`, names.arg = hotdogs$Year, col=colors, border=NA,

        main = "Nathan's Hot Dog Eating Contest Results, 1980-2010",  xlab="Year", ylab="Hot dogs and buns (HDBs) eaten")


#This is a great starting point, but here is where I would take it. First, let's take note of why some columns are red and some are grey

      

    #colors <- ifelse(hotdogs$`New record` == 1, "darkred", "grey")



#This ifelse() statement says that if a new record was set (== 1), the bar appears in darkred; otherwise, it is grey.
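#A minimal sketch of how ifelse() builds one color per bar. The names toy_record and toy_colors and the bar heights are made up for illustration and are not part of the hot dog dataset:

```r
# Hypothetical 0/1 flags standing in for hotdogs$`New record`
toy_record <- c(0, 1, 0, 1, 1)

# ifelse() is vectorized: it tests every element and returns one value per element
toy_colors <- ifelse(toy_record == 1, "darkred", "grey")

print(toy_colors)

# barplot() paints bar i with toy_colors[i]; the heights here are made up
barplot(c(10, 20, 15, 25, 30), col = toy_colors, border = NA)
```

#Because the color vector has the same length as the data, each bar picks up the color computed from its own row.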

##We can make this example more specific by giving the colors statement a second condition, so that the graph represents a narrower slice of the data


##Model 1//My Model: New Records Set by American Competitors, by Number of Hot Dogs Eaten per Competition

#American competitors are the second condition in the statement, which creates a different visual

#This new variable is titled 'altcolor'

#Use & to join the second condition to the first in altcolor so that both must be true for a bar to be dark red
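#A small sketch of how & combines two conditions row by row. The data frame toy and its column names are hypothetical stand-ins for the hotdogs columns:

```r
# Hypothetical toy data standing in for two hotdogs columns
toy <- data.frame(
  new_record = c(1, 1, 0, 1),
  country    = c("Japan", "United States", "United States", "United States")
)

# & compares element by element, so both conditions must hold in the same row
both <- toy$new_record == 1 & toy$country == "United States"
print(both)

# Only rows where both conditions are TRUE get the highlight color
toy_altcolor <- ifelse(both, "darkred", "grey")
print(toy_altcolor)

# sum() counts the TRUE values, i.e. how many bars would be dark red
print(sum(both))
```

#Note that the single & is the vectorized operator; the double && is meant for single TRUE/FALSE values, as in an if() condition.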


altcolor <- ifelse(hotdogs$`New record` == 1 & hotdogs$Country == "United States", "darkred", "grey")


barplot(hotdogs$`Dogs eaten`, names.arg = hotdogs$Year, col=altcolor, border=NA,

        main = "Nathan's Hot Dog Eating Contest American Record Setters, 1980-2010",  xlab="Year", ylab="Hot dogs and buns (HDBs) eaten")


##This visual demonstrates a much narrower result: there are only four years between 1980 and 2010 in which an American competitor set a record at this competition

#----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

##Model 2: Use ggplot2 package

#This ggplot2 model depicts the same idea as the first, but using the package. Let's change the argument again, but this time with a twist.....


ggplot(hotdogs) + geom_bar(aes(x=Year, y=`Dogs eaten`, fill=factor(`New record`)), stat="identity") + labs(title="Nathan's Hot Dog Eating Contest Results, 1980-2010", fill="New Record") + xlab("Year") + ylab("Hot dogs and buns (HDBs) eaten")


##I will clean the code up to my own formatting. And the twist? Adding another argument that alters the visuals


ggplot(hotdogs, aes(Year,`Dogs eaten`, fill = factor(Country))) +

         geom_bar(stat = "identity") +

         labs(

           title = "Nation of Winners of the Nathan's Hot Dog Eating Contest 1980-2010",

           x = "Year",

           y = "Hot Dogs Eaten",

           fill = "Winner's Nationality")


##This new model reflects the nationality of the winners and the respective numbers of hot dogs eaten by them at the time of their win.

#----------------------------------------------------------------------------------------------------------------------------------------

##Model 3: Stacked Bar Plots in Base R Graphics

#Base Code:


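#Assumption: hotdog_places is the companion file for this example (top three finishers per year, 2000-2010); the script never loaded it, so it is read here

hotdog_places <- read.csv("http://datasets.flowingdata.com/hot-dog-places.csv", sep = ",", header = TRUE)

hotdog_places <- as.matrix(hotdog_places)

colnames(hotdog_places) <- as.character(2000:2010)

barplot(hotdog_places, border=NA, main="Hot Dog Eating Contest Results, 2000-2010", xlab="Year", ylab="Hot dogs and buns (HDBs) eaten")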


#I was not sure how to edit this one and would like to learn more about how to approach stacked visuals like it
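#One possible way to approach a stacked visual, sketched with a made-up matrix (toy_places, its values, and the color choices are all hypothetical, not the contest data). It shows how a matrix maps to stacked bars and how colors and a legend can be added:

```r
# Hypothetical counts: rows are finishing places, columns are years
toy_places <- matrix(
  c(25, 50, 51,
    24, 31, 26,
    22, 23, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1st", "2nd", "3rd"), c("2000", "2001", "2002"))
)

# One color per row (per stacked segment)
place_colors <- c("darkred", "orange", "grey")

# With a matrix, barplot() draws one bar per column and stacks the rows within it
barplot(toy_places, col = place_colors, border = NA,
        main = "Stacked bars (toy data)", xlab = "Year", ylab = "HDBs eaten",
        legend.text = rownames(toy_places),
        args.legend = list(x = "topleft", bty = "n"))

# beside = TRUE places the same segments side by side instead of stacking them
barplot(toy_places, col = place_colors, border = NA, beside = TRUE)

# The height of each stacked bar is the column total
print(colSums(toy_places))
```

#The main design choice is stacked versus side by side: stacking emphasizes the yearly total, while beside = TRUE makes it easier to compare individual places across years.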

#----------------------------------------------------------------------------------------------------------------------------------------

##Example 2: Economics

#----------------------------------------------------------------------------------------------------------------------------------------

##Model 4: Economics Data

#Base Code

head(economics)


year <- function(x) as.POSIXlt(x)$year + 1900


economics$year <- year(economics$date) # add a year column to use in the analysis below


plot1 <- qplot(date, unemploy / pop, data = economics, geom = "line")


plot1


##My code

#I added axis labels that describe the graph. I have a treat for the final example. My base R graphics knowledge is weaker than my ggplot2 knowledge; are there other ways to graph a time series like this one?


plot2 <- qplot(date, 100 * unemploy / pop, data = economics, geom = "line", xlab = "Year", ylab = "Unemployed (% of population)")


plot2
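#On other ways to graph a time series: two base R options, sketched with a made-up monthly series (toy_dates and toy_rate are hypothetical stand-ins for economics$date and the unemployment values):

```r
# Hypothetical monthly series: 24 months starting January 2000
toy_dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 24)
toy_rate  <- 4 + sin(seq_along(toy_dates) / 4)

# Option 1: plot() with type = "l" draws a line; Date values on x get a date axis
plot(toy_dates, toy_rate, type = "l",
     xlab = "Date", ylab = "Rate (%)", main = "Base R line plot (toy data)")

# Option 2: a ts object stores the start and frequency, and plot() dispatches to plot.ts()
toy_ts <- ts(toy_rate, start = c(2000, 1), frequency = 12)
plot(toy_ts, ylab = "Rate (%)")
```

#The ts route is convenient when the observations are evenly spaced, because the time axis is derived from start and frequency rather than from a separate date column.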


#Final Example:: Switch Formats :: ggplot and proportion scatterplot

#Base Code: This base code failed for me with a message that the grid.arrange() function was missing. The cause is that grid.arrange() belongs to the gridExtra package rather than grid, so the code runs once gridExtra is installed and loaded


library(gridExtra) # provides grid.arrange()

plot2 <- qplot(date, uempmed, data = economics, geom = "line")

grid.arrange(plot1, plot2, ncol=2)

plot1 <- qplot(unemploy/pop, uempmed, data = economics, geom = c("point", "path"))

plot2 <- qplot(unemploy/pop, uempmed, data = economics, geom = c("point", "path"), color=year)

grid.arrange(plot1, plot2, ncol=2)

plot2


#My edit changes the format back to a ggplot() call and offers a different kind of visual: a proportion scatter plot of the median duration of unemployment (uempmed) against year, where the size of each point shows the proportion of that year's monthly observations that share the same value


#Grouping by year makes prop the share of observations within each year

ggplot(economics, aes(year, uempmed, group = year)) +

  stat_sum(aes(size = after_stat(prop))) +

  labs(

    x = "Year",

    y = "Median Duration of Unemployment (weeks)",

    size = "Proportion of Year's Observations"

  ) +

  theme_gray()

##Regarding the first base-code example in the previous section: it is a non-continuous visual representation of how the data changed over the time period. Through color and location, the data paints a different picture, and it actually indicates the times when there were recessions, such as 1991 and 2008, as well as the stagflation era that ran into the early 1980s, which is what caused my father to find work internationally. He told me himself that there was not a job to be found, and the visuals in both cases reflect this

#Additional thoughts on these visuals include wondering how the qplot() function works. The console stated that it had been consolidated into ggplot. Is this true? I ask because the syntax is slightly different.
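#On the qplot() question: qplot() is a shorthand wrapper that ships with ggplot2, and it has been deprecated since ggplot2 3.4.0 in favor of ggplot(). A minimal sketch of the earlier line plot written with ggplot(), assuming the economics dataset bundled with ggplot2 (the axis labels are my own choice):

```r
library(ggplot2)

# Equivalent of: qplot(date, unemploy / pop, data = economics, geom = "line")
# aes() takes the place of qplot()'s positional x and y arguments,
# and geom_line() takes the place of geom = "line"
p <- ggplot(economics, aes(x = date, y = unemploy / pop)) +
  geom_line() +
  labs(x = "Year", y = "Unemployed / population")

print(p)
```

#The syntax differs because qplot() guesses the layers from its arguments, while ggplot() has each layer added explicitly with +.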

##I also want to discuss how visuals like these are great for distinguishing outliers in seemingly stable data. Even though changes in the economy are felt at all levels, I am certain that examining the data both numerically and visually makes it easier to find significant values that could lead to new conclusions and inferences.
