SFO City Crime Analysis with R

Introduction

The crime rate in the United States has been declining and has come under better control over the past few years. This is due to improved law enforcement strategies, especially the use of technology to deploy police forces effectively and efficiently. Many police departments have turned to data science to translate vast amounts of data into actionable insights. Ranging from trend reports and correlation analysis to behavioral modeling, developments in crime analysis have paved the way for predictive policing and strategic foresight. Predictive policing is a growing area of research in which statistical techniques are used to identify criminal hot spots in order to facilitate anticipatory and precautionary deployment of police forces.

However, designing an effective predictive policing strategy involves many challenges. The most pertinent challenge for statisticians and analysts relates to data: how to gather, process, cleanse, manipulate, disambiguate, enrich, and visualize it so that predictive engines are simple enough to understand and, at the same time, accurate enough to be useful.

Use Case

This use case is based on San Francisco incidents derived from the SFPD (San Francisco Police Department) Crime Incident Reporting system for calendar year 2013. The dataset contains close to 130K records, each with the type of crime, the date and time of the incident, the day of the week, and the latitude and longitude of the incident. We will analyze this dataset and extract some meaningful insights.

What we want to do:

  • Prerequisites
  • Download Crime Incident Dataset
  • Data Extraction & Exploration
  • Data Manipulation
  • Data Visualization

Solution

Prerequisites

  • Download and Install RStudio: Download the open source edition of RStudio from the link below.

http://www.rstudio.com/products/rstudio/#Desk

Download Crime Incident Dataset

  • Download Dataset: Open the SFPD Incidents 2013 dataset at the link below, click the Export button, and choose the CSV format. The file is around 25 MB and will take a few minutes to download. Save the dataset in your working directory.

https://data.sfgov.org/Public-Safety/SFPD-Incidents-2013/n4e2-etve

  • Understanding Dataset: This dataset contains the following columns. We will drop some columns, rename a few, and add a few extra columns that will help in our analysis.

    Column      Description
    IncidntNum  Incident number. Unfortunately, some incident numbers are duplicated.
    Category    Category of the crime, e.g. Robbery, Fraud, Theft, Arson, and so on. This use case combines similar crimes to keep the number of categories manageable.
    Descript    Description of the crime. Not needed; this column will be dropped.
    DayOfWeek   Day of the week on which the incident happened.
    Date        Date of the incident.
    Time        Time of the incident, in hours and minutes.
    PdDistrict  Police district the location belongs to. Not needed; this column will be dropped.
    Resolution  What happened to the culprits in the incident.
    Address     Street address of the incident.
    X           Longitude of the location. This column will be renamed to longitude.
    Y           Latitude of the location. This column will be renamed to latitude.
    Location    Comma-separated latitude and longitude. Not needed; this column will be dropped.

 

  • Install Packages: This use case requires the “chron” package for date and time manipulation and the ggplot2 package for visualization.
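
One way to install and load the two packages is sketched below. It installs a package only when it is not already present; the variable names are illustrative.

```r
# Packages named in this use case.
required <- c("chron", "ggplot2")

# Install only the ones that are not already available locally.
to_install <- setdiff(required, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Load both packages for the current session.
library(chron)
library(ggplot2)
```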

Data Extraction & Exploration

Read the CSV file and convert it to RData to keep the source data in a binary format, which is compressed compared to the CSV file.

  • Open RStudio and Set Working Directory: Open RStudio and set the working directory to the folder where the CSV file was downloaded.
  • Extract Data: Read the CSV file into R and save it as an RData file.
  • Explore Data: Before we start playing with the data, it is important to understand its structure: what fields are present and how they are stored.
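
The three steps above can be sketched as follows. This is a minimal example rather than the original code: the helper name `load_incidents`, the directory, and the file names are assumptions, so substitute the name under which you saved the export.

```r
# Read the incident CSV into a data frame and cache it as an RData file.
# Both paths are supplied by the caller; nothing is assumed about their names.
load_incidents <- function(csv_path, rdata_path) {
  incidents <- read.csv(csv_path, stringsAsFactors = FALSE)
  save(incidents, file = rdata_path)  # RData is a compressed binary format
  incidents
}

# Example session (hypothetical directory and file names):
# setwd("~/crime-analysis")
# incidents <- load_incidents("SFPD_Incidents_2013.csv",
#                             "SFPD_Incidents_2013.RData")
# str(incidents)      # column names, types, and the first few values
# head(incidents)     # first six rows
# summary(incidents)  # per-column summary
```

In later sessions, `load("SFPD_Incidents_2013.RData")` restores the `incidents` data frame without parsing the CSV again.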

Data Manipulation

The dataset needs to be manipulated to yield deeper insights: ordering records, removing duplicates, dropping unwanted columns, adding new columns, combining data, enriching data, and so on.

  • Data Manipulation Step 1: [figures: Data Manipulation, crime data, drop_cols]

  • Data Manipulation Step 2: [figure: tag_timing]

  • Data Manipulation Step 3: [figures: data, table]
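
The manipulation described in this section (ordering, removing duplicates, dropping and renaming columns, adding time columns, combining categories) could look roughly like the sketch below. It is not the original code. It uses base R rather than chron so that it has no dependencies, and the following are assumptions: the function name `prepare_incidents`, the new column names `time_tag`, `month`, and `crime`, the six-hour time bins, the "HH:MM" and "MM/DD/YYYY" formats, and the example category grouping.

```r
prepare_incidents <- function(incidents) {
  # Step 1: order by incident number, remove duplicated incident numbers,
  # drop the unused columns, and rename the coordinate columns.
  incidents <- incidents[order(incidents$IncidntNum), , drop = FALSE]
  incidents <- incidents[!duplicated(incidents$IncidntNum), , drop = FALSE]
  unused <- c("Descript", "PdDistrict", "Location")
  incidents <- incidents[, !(names(incidents) %in% unused), drop = FALSE]
  names(incidents)[names(incidents) == "X"] <- "longitude"
  names(incidents)[names(incidents) == "Y"] <- "latitude"

  # Step 2: tag each incident with a time-of-day bin and a month.
  hour <- as.integer(substr(as.character(incidents$Time), 1, 2))  # assumes "HH:MM"
  incidents$time_tag <- cut(hour, breaks = c(0, 6, 12, 18, 24), right = FALSE,
                            labels = c("00-06", "06-12", "12-18", "18-00"))
  date <- as.Date(as.character(incidents$Date), format = "%m/%d/%Y")  # assumed format
  incidents$month <- factor(month.abb[as.integer(format(date, "%m"))],
                            levels = month.abb)
  incidents$DayOfWeek <- factor(incidents$DayOfWeek,
                                levels = c("Monday", "Tuesday", "Wednesday",
                                           "Thursday", "Friday", "Saturday",
                                           "Sunday"))

  # Step 3: combine similar categories. This mapping is only an illustration;
  # categories without an entry keep their original name.
  groups <- c("LARCENY/THEFT" = "THEFT",
              "VEHICLE THEFT" = "THEFT",
              "FORGERY/COUNTERFEITING" = "FRAUD",
              "BAD CHECKS" = "FRAUD")
  category <- toupper(as.character(incidents$Category))
  grouped <- unname(groups[category])
  incidents$crime <- ifelse(is.na(grouped), category, grouped)

  incidents
}

# Example: crimes <- prepare_incidents(incidents)
#          table(crimes$crime, crimes$time_tag)  # counts per crime and time bin
```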

Data Visualization

  • Crimes by each Category
  • Crimes by Time of the Day
  • Crimes by Day of the Week
  • Crimes by Month of the Year
  • Heat Map of Crime Incidents by Time of the Day
  • Heat Map of Crime Incidents by Day of the Week
  • Heat Map of Crime Incidents by Month of the Year
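
Charts like the ones listed above could be produced along these lines. This is a minimal ggplot2 sketch, not the original plotting code. It assumes a prepared data frame whose column names (`crime`, `time_tag`, `month`) are hypothetical, and the helper names `plot_counts`, `count_by`, and `plot_heatmap` are introduced only for illustration.

```r
library(ggplot2)

# Bar chart of incident counts for one grouping column, e.g. "crime",
# "time_tag", "DayOfWeek", or "month".
plot_counts <- function(incidents, column, title) {
  ggplot(incidents, aes(x = .data[[column]])) +
    geom_bar() +
    labs(title = title, x = column, y = "Number of incidents") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Count incidents for every combination of crime and one grouping column.
count_by <- function(incidents, column) {
  as.data.frame(table(crime = incidents$crime, bin = incidents[[column]]),
                responseName = "count")
}

# Heat map of the counts: one tile per (crime, bin) pair, shaded by count.
plot_heatmap <- function(counts, title) {
  ggplot(counts, aes(x = crime, y = bin, fill = count)) +
    geom_tile() +
    scale_fill_gradient(low = "white", high = "steelblue") +
    labs(title = title, x = "Crime", y = NULL, fill = "Incidents") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Example calls, given a prepared data frame named crimes:
# plot_counts(crimes, "crime", "Crimes by each Category")
# plot_counts(crimes, "time_tag", "Crimes by Time of the Day")
# plot_heatmap(count_by(crimes, "time_tag"),
#              "Crime Incidents by Time of the Day")
# plot_heatmap(count_by(crimes, "month"),
#              "Crime Incidents by Month of the Year")
```

Counting first and plotting second keeps the aggregated table available for inspection, which helps when a tile looks unexpectedly dark or empty.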

Conclusion

  • R is a popular programming language capable of statistical analysis, text analysis, recommendations, classification, clustering, and other predictive modeling.
  • Many public datasets, similar to these crime records, can be used for data mining with R.
  • The next blog in this series will use Shiny with R to bring the power of R to the web.
