Crime Analysis with Shiny & R


Introduction

Shiny is a web application framework for R that lets us turn our analyses into interactive web applications without needing to know HTML, CSS, or JavaScript. It is a free, contributed R package that makes it easy to deliver interactive data summaries and queries to end users through any modern web browser. Shiny comes with a variety of widgets for rapidly building user interfaces and does all of the heavy lifting of wiring those interfaces to R code.

This is a follow-up to the post SFO City Crime Analysis with R, which was developed entirely in the R console. Here we reuse the data prepared in that post and turn it into a web application with Shiny. This post shows the power of Shiny and how it exposes R's functionality to data analysts who are not programmers.

Use Case

This use case is built with Shiny and demonstrates several capabilities of the framework. We start with a basic summary and the structure of the dataset and some observations on individual fields, then plot the data in several ways using the googleVis and ggplot2 packages. We highly recommend reading the previous post first: it describes the multi-step transformation of the raw data that serves as the input to this use case.

What we want to do:

  • Prerequisites
  • Dataset Summary
  • Advanced Data Table
  • Observations of Dataset
  • Crime Plot with GoogleVis
  • Crime HeatMap with ggplot2
  • Crime Calendar HeatMap with GoogleVis
  • Subset Dataset

Solution

Prerequisites

  • Download and Install RStudio: Download the open-source edition of RStudio from the link below.

http://www.rstudio.com/products/rstudio/#Desk

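  • Download and Install Shiny: Install the shiny package from the RStudio console with install.packages("shiny"). The googleVis, ggplot2, and gridExtra packages used later can be installed the same way.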
  • Get Pre-Processed Data: This dataset is based on the SFO City Crime Analysis with R blog post, which applies a number of transformations to the raw data. Either follow that post to create the RData file or download it here: crime_analysis_shiny_r

Dataset Summary

  • Structure of a Shiny App: Shiny programs are easiest to build and understand when split into two scripts named ui.R and server.R. The ui.R file describes the user interface and is often the shortest and simplest part of a Shiny application. The server.R file is where all the processing happens, including data loading, cleansing, transformation, graphing, and anything else that is possible in standalone R.
  • We will build the app by adding relevant code at each step. The files ui.R, server.R, and the required processed_sfo_data_crime.rdata must all be located in the same directory. In this first step, the app displays a date range that can be used to subset the data, and shows the summary, structure, dimensions, and some statistics on crime categories.
  • Run the App: Running the initial app from the R console is easy. First set the working directory (for example with setwd()) to the folder containing ui.R, server.R, and the RData file, then load the shiny package and call runApp().
  • Shiny App: runApp() starts the Shiny server and opens the app in the browser. The Date Range input subsets the dataset and dynamically updates the dimensions, structure, summary, and statistics of the crime categories that fall within the selected range.
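A minimal sketch of the two scripts for this first step is shown below as one listing: in the two-file layout, the ui definition is the last expression in ui.R and the server function is the last expression in server.R. It assumes the RData file defines a data frame named crime_data with a Date-class column Date and a factor column Category; those names, the input and output IDs, and the default dates are illustrative assumptions rather than the post's original code.

```r
library(shiny)

# Assumption: the RData file defines a data frame `crime_data` with a
# Date-class column `Date` and a factor column `Category`.
# server.R would load it once at start-up, e.g.:
# load("processed_sfo_data_crime.rdata")

# ---- ui.R: the last expression is the UI definition ----
ui <- fluidPage(
  titlePanel("SFO Crime Analysis"),
  sidebarLayout(
    sidebarPanel(
      # Two-element date input used to subset the data
      dateRangeInput("dates", "Date Range",
                     start = "2013-01-01", end = "2013-12-31")
    ),
    mainPanel(
      verbatimTextOutput("dimension"),
      verbatimTextOutput("structure"),
      verbatimTextOutput("summary"),
      verbatimTextOutput("category_stats")
    )
  )
)

# ---- server.R: the last expression is the server function ----
server <- function(input, output) {
  # Reactive subset: re-evaluated whenever the date range changes
  filtered <- reactive({
    crime_data[crime_data$Date >= input$dates[1] &
                 crime_data$Date <= input$dates[2], ]
  })
  output$dimension <- renderPrint(dim(filtered()))
  output$structure <- renderPrint(str(filtered()))
  output$summary <- renderPrint(summary(filtered()))
  # Number of incidents per crime category, most frequent first
  output$category_stats <- renderPrint(
    sort(table(filtered()$Category), decreasing = TRUE)
  )
}
```

Every output reads the single filtered() reactive, so changing the date range re-runs the subset once and refreshes all four outputs. For quick experiments, the same two objects can also be passed to shinyApp(ui, server).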

Advanced Data Table

Shiny has built-in support for the jQuery DataTables library, which provides per-column search, global search, sorting, and pagination through the whole dataset. From now on, to keep things short, only the relevant code pieces added to ui.R and server.R are discussed.

  • jQuery DataTable:
  • Output: Refresh the browser or re-run the app with runApp(). The app now has two tabs: Summary and Data Table.
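A sketch of the additions for the Data Table tab: the main panel's outputs move into a tabsetPanel(), and the server renders the filtered rows with renderDataTable(), which hands them to the jQuery DataTables widget in the browser. The data frame name crime_data, its Date column, and the output IDs are illustrative assumptions. Note that recent Shiny releases deprecate dataTableOutput() and renderDataTable() in favor of DTOutput() and renderDT() from the DT package.

```r
library(shiny)

# ui.R fragment: replace the mainPanel() contents with a tabset so the
# summary outputs and the new table each get their own tab
main_tabs <- tabsetPanel(
  tabPanel("Summary", verbatimTextOutput("summary")),
  tabPanel("Data Table", dataTableOutput("crime_table"))
)

# server.R: `crime_data` (with a Date column) is an assumed name
server <- function(input, output) {
  filtered <- reactive({
    crime_data[crime_data$Date >= input$dates[1] &
                 crime_data$Date <= input$dates[2], ]
  })
  output$summary <- renderPrint(summary(filtered()))
  # The DataTables widget adds search boxes, sorting and pagination;
  # pageLength sets how many rows are shown per page
  output$crime_table <- renderDataTable(filtered(),
                                        options = list(pageLength = 10))
}
```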

Observations of Dataset

We can make some observations on the dataset to understand how the data is spread between factor (categorical) and continuous variables. This gives quick insight into how many times each value of a factor variable appears in the dataset. We can also show how many unique values each field in the dataset has.

  • Factor & Unique Value Observation:
  • Output: Refresh the browser or re-run the app with runApp(). The app now has three tabs: Summary, Data Table, and Observations. The dropdowns are populated with the factor variables and can be used to observe how often each of their values occurs in the dataset. The number of unique values in each field is displayed on the right; for example, DayofWeek shows 7 unique values (the days of the week) and Incident_Month shows 12.
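The counts behind this tab need only base R. A minimal sketch with hypothetical helper names and a tiny made-up sample in place of the real data: in the app, factor_columns() could supply the choices of a selectInput(), and the other two helpers would be called inside renderPrint() or renderTable() on the filtered data.

```r
# Names of the factor (categorical) columns, e.g. to fill a dropdown
factor_columns <- function(df) {
  names(df)[vapply(df, is.factor, logical(1))]
}

# How often each value of one factor column occurs, most frequent first
factor_occurrences <- function(df, column) {
  sort(table(df[[column]]), decreasing = TRUE)
}

# Number of distinct values in every column of the data frame
unique_counts <- function(df) {
  vapply(df, function(col) length(unique(col)), integer(1))
}

# Tiny illustrative sample; real column names may differ
crime_sample <- data.frame(
  Category = factor(c("THEFT", "ASSAULT", "THEFT", "VANDALISM")),
  DayofWeek = factor(c("Monday", "Monday", "Friday", "Sunday")),
  Incident_Month = c(1L, 1L, 2L, 3L)
)

factor_columns(crime_sample)  # "Category" "DayofWeek"
factor_occurrences(crime_sample, "Category")
unique_counts(crime_sample)
```

On the full dataset, unique_counts() is what would report 7 for DayofWeek and 12 for Incident_Month.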

Crime Plot with GoogleVis

The googleVis chart library is an excellent package for interactive plotting, and it can combine two or more plots into a single display with gvisMerge() (together with R's Reduce() when there are more than two charts). The code for this step plots the number of incidents per crime category, and crime categories by incident time, by day of week, and by month of year. Add the following library call to server.R:

library("googleVis")

  • Plot Crime Categories:
  • Output: The app now shows a fourth tab, Crime Plot, containing multiple interactive plots.
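googleVis chart functions take a data frame with one column for the x axis and one for the value, so each plot starts with a simple count. A minimal base-R sketch, using a hypothetical count_by() helper and an assumed Incident_Hour column:

```r
# Count incidents per value of one column, as a two-column data frame
# (value column named after `column`, counts in `Count`)
count_by <- function(df, column) {
  counts <- as.data.frame(table(df[[column]]), stringsAsFactors = FALSE)
  names(counts) <- c(column, "Count")
  counts
}

# Tiny illustrative sample; Incident_Hour is an assumed column name
crime_sample <- data.frame(
  Category = factor(c("THEFT", "ASSAULT", "THEFT", "VANDALISM")),
  Incident_Hour = c(14L, 2L, 14L, 23L)
)

by_category <- count_by(crime_sample, "Category")
by_hour <- count_by(crime_sample, "Incident_Hour")
```

In server.R each result could then be passed to gvisColumnChart(by_category, xvar = "Category", yvar = "Count"), the charts combined with gvisMerge() (or Reduce(gvisMerge, charts) for more than two), and the merged chart rendered with renderGvis() into an htmlOutput() placed in ui.R.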

Crime HeatMap with ggplot2

ggplot2 is another excellent plotting library that can produce heat maps with some tweaking. It has no built-in support for combining multiple charts, but the gridExtra package provides that functionality. Add the following two library calls to server.R:

library("ggplot2")
library("gridExtra")

  • Plot Crime HeatMap:
  • Output: The app now shows a fifth tab, Crime Heatmap, with multiple ggplot2 heat maps arranged together using the gridExtra library. The heat maps plot crime categories against time of day, day of week, and month of year, giving a quick visual sense of how crime incidents are distributed across these periods.
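A ggplot2 heat map is easiest to draw from a long table with one row per (category, period) cell. A minimal base-R sketch of that reshaping, with a hypothetical heatmap_data() helper; cross-tabulating with table() also keeps zero-count cells, so empty combinations still get a tile:

```r
# Cross-tabulate two columns into long format: one row per
# (row_var, col_var) combination, including combinations with zero counts
heatmap_data <- function(df, row_var, col_var) {
  counts <- as.data.frame(table(df[[row_var]], df[[col_var]]),
                          stringsAsFactors = FALSE)
  names(counts) <- c(row_var, col_var, "Count")
  counts
}

# Tiny illustrative sample; real column names may differ
crime_sample <- data.frame(
  Category = factor(c("THEFT", "ASSAULT", "THEFT")),
  DayofWeek = factor(c("Monday", "Monday", "Friday"))
)

hm <- heatmap_data(crime_sample, "Category", "DayofWeek")
```

With ggplot2 loaded, one heat map would then be ggplot(hm, aes(x = DayofWeek, y = Category, fill = Count)) + geom_tile(), and gridExtra's grid.arrange(p1, p2, p3, ncol = 1) inside renderPlot() would stack the time-of-day, day-of-week, and month plots.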

Crime Calendar HeatMap with GoogleVis

We use the googleVis chart library again, this time to plot the crime categories on a calendar, which gives a different kind of heat map. This is useful for understanding how each crime category is distributed across the whole calendar year. The server.R code for this step loops through each crime category, subsets the data for that category, and aggregates the incidents by day of the calendar to plot the heat map.

  • Plot Calendar Heat Map of Crime Categories:
  • Output: The app now shows the final tab, Crime Calendar, with one calendar heat map per crime category spanning the whole calendar.
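The loop described above can be sketched in base R as follows, assuming a Date-class Date column and a Category column (hypothetical names) and a hypothetical calendar_data() helper. Each per-category result is a Date/Count data frame, the shape that googleVis's gvisCalendar() takes through its datevar and numvar arguments.

```r
# For each crime category, subset the data and count incidents per day,
# returning a named list with one Date/Count data frame per category
calendar_data <- function(df) {
  result <- list()
  for (category in sort(unique(as.character(df$Category)))) {
    crimes <- df[df$Category == category, , drop = FALSE]
    # format() turns dates into "YYYY-MM-DD" strings, so table() groups
    # by calendar day and sorts the days chronologically
    daily <- as.data.frame(table(format(crimes$Date)),
                           stringsAsFactors = FALSE)
    names(daily) <- c("Date", "Count")
    daily$Date <- as.Date(daily$Date)  # back to Date class for the chart
    result[[category]] <- daily
  }
  result
}

# Tiny illustrative sample
crime_sample <- data.frame(
  Category = factor(c("THEFT", "THEFT", "ASSAULT", "THEFT")),
  Date = as.Date(c("2013-01-01", "2013-01-01", "2013-01-01", "2013-01-02"))
)

cal <- calendar_data(crime_sample)
```

In the app, the charts could then be built with lapply(cal, gvisCalendar, datevar = "Date", numvar = "Count") and stacked into one display with Reduce(gvisMerge, ...).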

Subset Dataset

This is the final change to the code, and it enables fuller subsetting. Much of R's power lies in data manipulation, and subsetting is one of the key manipulation techniques in R. Shiny makes such manipulation easy through different types of UI widgets. The Date Range control, present since the beginning of this post, already subsets the dataset to the chosen range of dates. Because Shiny is reactive, all outputs and plots automatically update to reflect the subset.

We now add two more controls, Crime Category and Day of Week, for subsetting the dataset; they can be used individually or together. This is where Shiny really shines: thanks to its reactivity, everything updates dynamically as the dataset is subset.

  • Subset Crime Categories & Day of Week:
  • Output: With the dataset subset as follows, the Crime Calendar output shows only the data that matches the subset:

Date Range: 2013-01-01 to 2013-08-21
Day of Week: Wednesday
Crime Category: THEFT

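The combined filter can be written as one small function that every output reads through a single reactive expression. A sketch under assumed names: Date, Category, and DayofWeek columns, a hypothetical subset_crimes() helper, and an "All" choice meaning that a field is not filtered.

```r
# Filter by date range plus optional category and day-of-week choices.
# "All" is an assumed sentinel meaning "do not filter on this field";
# %in% also accepts several selected values at once.
subset_crimes <- function(df, start, end, category = "All", day = "All") {
  keep <- df$Date >= as.Date(start) & df$Date <= as.Date(end)
  if (!identical(category, "All")) keep <- keep & df$Category %in% category
  if (!identical(day, "All")) keep <- keep & df$DayofWeek %in% day
  df[keep, , drop = FALSE]
}

# Tiny illustrative sample
crime_sample <- data.frame(
  Date = as.Date(c("2013-01-02", "2013-01-09", "2013-01-03", "2013-09-04")),
  DayofWeek = c("Wednesday", "Wednesday", "Thursday", "Wednesday"),
  Category = c("THEFT", "ASSAULT", "THEFT", "THEFT")
)

# Same filter as the example above: 2013-01-01 to 2013-08-21, Wednesday, THEFT
subset_crimes(crime_sample, "2013-01-01", "2013-08-21",
              category = "THEFT", day = "Wednesday")
```

In server.R the reactive subset would then become something like reactive(subset_crimes(crime_data, input$dates[1], input$dates[2], input$category, input$day)), with selectInput("category", "Crime Category", choices = c("All", levels(crime_data$Category))) and a similar control for the day in ui.R. Every output that reads that reactive updates automatically.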

Conclusion

  • R is a popular programming language capable of statistical analysis, text analysis, recommendation, classification, clustering, and other predictive modeling.
  • Shiny brings the power of R to the web, and it does so very well. It has a small learning curve and enables data analysis over the web.
  • Treselle Systems has developed multiple curator tools with R and Shiny for analysts who don't know R. These web apps let them quickly understand the structure of a dataset and subset and plot it in different ways to gain deep insights.
