Data Wrangling And Visualization In R
We learned the basics of the R programming language in our previous article, Basic Intro to R. Today, we’ll build on that foundation and learn about Data Wrangling and Data Visualization in R. R is used extensively for statistics and visualization, and it is a near and dear language for most statisticians. Data Wrangling helps us get appropriate data for our visualization, and visualization in turn brings meaning to our data. R supports aesthetically pleasing graphs that showcase and represent our hard work through numerous libraries, which we discuss in this article.
Data Visualization
A picture is worth a thousand words. Data Visualization represents information in a graphical form so that one can easily understand the gist of the data. Humans are innately visual creatures, and visualizing data with the proper tools conveys a lot more meaning than the same data in tables and arrays.
Data frames and Wrangling
Data frames
A data frame is a two-dimensional, heterogeneous tabular data structure: each column can hold a different data type. It is also mutable in size, i.e., rows and columns can be added or removed.
You can learn more about data frames in R, with examples, in my previous article Basic Intro to R.
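As a quick illustration of both properties, here is a minimal sketch in base R. The students data and its column names are made up for this example.

```r
# A small, made-up data frame mixing character, numeric, and logical columns
students <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  score  = c(88, 92, 79),
  passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Add a column: the data frame grows from 3 to 4 columns
students$grade <- c("B", "A", "C")

# Add a row: the data frame grows from 3 to 4 rows
students <- rbind(
  students,
  data.frame(name = "Dev", score = 95, passed = TRUE, grade = "A",
             stringsAsFactors = FALSE)
)

dim(students)  # number of rows and columns after both changes
str(students)  # one line per column, showing each column's type
```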
Data Wrangling
Data Wrangling can be understood as the process of mapping or transforming raw data into a format that is appropriate for our operations, so that it becomes valuable for our analytics.
Some useful functions in R for Data Wrangling, all from the dplyr package:
left_join()
It adds information from another table: it keeps all rows from the left table and adds the columns from the matching rows of the right table. Left rows without a match get NA in the new columns. It is similar to VLOOKUP in Excel.
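A minimal sketch of a left join, assuming the dplyr package is installed. The orders and customers tables and their column names are hypothetical.

```r
library(dplyr)

# Hypothetical example tables
orders <- data.frame(
  order_id    = c(1, 2, 3),
  customer_id = c(10, 20, 30),
  stringsAsFactors = FALSE
)
customers <- data.frame(
  customer_id   = c(10, 20),
  customer_name = c("Asha", "Ben"),
  stringsAsFactors = FALSE
)

# Keep every row of orders; add customer_name where customer_id matches.
# Customer 30 is not in customers, so its customer_name is NA.
orders_named <- left_join(orders, customers, by = "customer_id")
orders_named
```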
count()
This function counts the rows in a data frame for each unique value of one variable, or for each unique combination of several variables.
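For instance, using the built-in mtcars dataset (chosen here only for illustration) and assuming dplyr is installed:

```r
library(dplyr)

# Number of cars for each cylinder count; the result has columns cyl and n
cyl_counts <- count(mtcars, cyl)
cyl_counts

# Number of cars for each combination of cylinders and gears
cyl_gear_counts <- count(mtcars, cyl, gear)
cyl_gear_counts
```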
mutate()
This function creates new columns: it adds new variables while preserving the existing ones. The dplyr package needs to be installed and loaded in order to access this function in R.
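A short sketch with the built-in mtcars dataset, where the wt column stores weight in units of 1000 lbs. The new column name wt_kg is our own choice for this example.

```r
library(dplyr)

# Add a weight-in-kilograms column (1 lb is roughly 0.4536 kg).
# All original mtcars columns are kept unchanged.
cars_kg <- mutate(mtcars, wt_kg = wt * 1000 * 0.4536)

head(cars_kg)
```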
group_by()
It takes an existing tbl, which is the generic class for tabular data that dplyr functions accept as an argument, and converts it into a grouped tbl. Subsequent operations, such as summaries, are then performed per group.
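Grouping only becomes visible once we summarise. The sketch below pairs group_by() with dplyr's summarise() on the built-in mtcars dataset; the output column names mean_mpg and cars are our own.

```r
library(dplyr)

# Group the cars by number of cylinders
by_cyl <- group_by(mtcars, cyl)

# Summaries are now computed once per group instead of once overall
mpg_by_cyl <- summarise(
  by_cyl,
  mean_mpg = mean(mpg),  # average miles per gallon in each group
  cars     = n()         # number of rows in each group
)
mpg_by_cyl
```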
Packages
library(tidyverse)
It is an R package designed for Data Science to facilitate the conversation about data between user and computer. It is a collection of R packages that share a high-level design philosophy, a low-level grammar, and data structures, so understanding one of the packages makes it convenient to learn the others.
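To give a feel for how the packages fit together, here is a small sketch, assuming the tidyverse is installed. It passes a tibble (from the tibble package) through two dplyr functions using the %>% pipe; the heavy column and its 3000 lbs threshold are arbitrary choices for illustration.

```r
library(tidyverse)

# Convert the built-in mtcars data frame to a tibble,
# keeping the row names in a column called model
cars <- as_tibble(mtcars, rownames = "model")

# Chain dplyr steps with the pipe: flag heavy cars, then count them per cyl
heavy_counts <- cars %>%
  mutate(heavy = wt > 3) %>%  # wt is in units of 1000 lbs
  count(cyl, heavy)

heavy_counts
```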
library(ggplot2)
To achieve our Data Visualization goals in R, we use ggplot2. It is a dedicated visualization package that helps improve the aesthetics of graphs in R.
Basic ggplot
ggplot2 works with a data frame. Let’s begin with our first ggplot:
ggplot(dataframe, aes(var1,var2))
Here, aes() maps variables in the data frame (var1 and var2) to “aesthetics”, i.e., visual properties such as the x and y positions.
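As a concrete version of the template above, this sketch substitutes the built-in mtcars dataset for dataframe, and its wt and mpg columns for var1 and var2. These substitutions are ours, chosen only for illustration.

```r
library(ggplot2)

# Map car weight to the x axis and miles per gallon to the y axis
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Printing the plot draws the axes only: no layer has been added yet
p
```

The plot stays empty until we add at least one layer to it.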
Geoms
A geom defines the geometric object that a ggplot2 layer draws. A layer is what we add to the plot with +. Different geoms create different charts such as bar charts, scatterplots, and many more.
geom_point()
The point geom creates scatterplots, which are useful for displaying the relationship between two continuous variables.
e.g., x + geom_point(aes(size = qsec)), where x is an existing ggplot object and qsec is a column in its data.
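A runnable sketch of this example, assuming x is a ggplot object built from the built-in mtcars dataset (which has a qsec column). The choice of wt and mpg for the axes is our assumption.

```r
library(ggplot2)

# Base plot: weight on the x axis, miles per gallon on the y axis
x <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add a point layer; point size reflects qsec (quarter-mile time)
scatter <- x + geom_point(aes(size = qsec))
scatter
```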
geom_col()
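It creates column-based charts such as stacked bars, equal-size columns, dodged columns, and more.
e.g., ggplot(data = as.data.frame(Titanic), aes(x = Class, y = Freq, fill = Survived)) + geom_col()
Here the built-in Titanic table is first converted to a data frame with as.data.frame(), since ggplot() expects a data frame.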
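A runnable sketch of this chart. The built-in Titanic dataset is stored as a table rather than a data frame, so we convert it first; the variable names titanic_df, bars, and dodged are our own.

```r
library(ggplot2)

# Convert the Titanic table to a data frame with columns
# Class, Sex, Age, Survived, and Freq
titanic_df <- as.data.frame(Titanic)

# Stacked columns (the default): one column per class, split by survival
bars <- ggplot(titanic_df, aes(x = Class, y = Freq, fill = Survived)) +
  geom_col()
bars

# Dodged columns: survivors and non-survivors side by side
dodged <- ggplot(titanic_df, aes(x = Class, y = Freq, fill = Survived)) +
  geom_col(position = "dodge")
dodged
```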
To read the full article, check it out at: https://bit.ly/2RHCOvz