[Original] R Language Case Study: Data Analysis and Visualization Report (with Code and Data)

Weather Events Effects in terms of Human Casualties and Economic Losses in the United States, From 1950 to 2011

Pablo Rosales 11/11/2017

Synopsis

This analysis focuses on answering two questions: 1) Across the United States, which types of events are most harmful with respect to population health? 2) Across the United States, which types of events have the greatest economic consequences? To answer them, several analyses were performed using data gathered by the National Weather Service from 1950 to 2011 across all the states in the United States. The analyses were structured along two main dimensions: time and geography (at the state level). Using these two dimensions as pivots, three metrics were aggregated to measure the impact of events by type: a) human casualties; b) property losses; and c) crop losses. The insights provided by these results may help local governors take preventive measures to reduce the impact of the weather events that prevail in their geographic area.

Data Processing

For this analysis, two data sets were merged: 1) the weather events data provided as part of the assignment; 2) the geographic regions, divisions and states, as defined by the United States Census Bureau (United States Census Bureau, 2015). The merge was done to make the results easier for the reader to understand, by providing a geographical structure as well as more descriptive names for geographical entities. Loading and merging the data constitute stages 1 and 2 of the present analysis.
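As a minimal sketch of the kind of merge described above, the toy example below left-joins a few made-up event rows to a small state lookup table and labels the rows that find no match. The data frames `events` and `states` and all their values are illustrative assumptions, not the actual data sets.

```r
# Toy data (hypothetical values, for illustration only)
events <- data.frame(
  STATE__ = c(1, 6, 99),  # 99 has no entry in the lookup table
  EVTYPE  = c("TORNADO", "FLOOD", "HAIL"),
  stringsAsFactors = FALSE
)
states <- data.frame(
  state_id   = c(1, 6),
  state_name = c("Alabama", "California"),
  stringsAsFactors = FALSE
)

# Left join: keep every event, even when its state code is unknown
merged <- merge(events, states,
                by.x = "STATE__", by.y = "state_id", all.x = TRUE)

# Unmatched rows come back with NA names; label them explicitly
merged$state_name <- replace(merged$state_name,
                             is.na(merged$state_name),
                             "NOT DEFINED")

print(merged)
```

Using `all.x = TRUE` keeps events whose state code is missing from the lookup table, so they can still be counted under an explicit placeholder rather than being silently dropped.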

Stage 3 calculates the most frequent events by state from 1950 to 2011. Stages 4, 5 and 6 summarise the top 3 events in terms of mortality, property losses and crop losses, respectively. Stage 7 aggregates the events of the last 10 years by region. Finally, stages 8 to 10 sum up the deadliest events and the costliest property and crop damages, distributed across the months of the year.
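A "top 3 events per state" summary such as the one in stages 4 to 6 could be sketched as follows. This is not the author's code: the toy data, the `top_n_events` parameter and the `total_fatalities` column are hypothetical, and the `EVTYPE` and `FATALITIES` column names are assumed to match the storm data set. It relies on `slice_max()`, available in dplyr 1.0.0 and later.

```r
library(dplyr)

# Toy data (hypothetical values, for illustration only)
toy_events <- data.frame(
  state_name = c("Texas", "Texas", "Texas", "Texas", "Texas", "Ohio", "Ohio"),
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HAIL", "FLOOD", "TORNADO"),
  FATALITIES = c(5, 3, 4, 2, 1, 6, 1),
  stringsAsFactors = FALSE
)

top_n_events <- 3  # hypothetical parameter: how many event types to keep per state

top_fatal <- toy_events %>%
  # Total fatalities per state and event type
  group_by(state_name, EVTYPE) %>%
  summarise(total_fatalities = sum(FATALITIES), .groups = "drop") %>%
  # Keep the deadliest event types within each state
  group_by(state_name) %>%
  slice_max(total_fatalities, n = top_n_events, with_ties = FALSE) %>%
  ungroup()

print(top_fatal)
```

The same pattern would apply to property or crop losses by swapping the summed column. States with fewer than three event types simply return fewer rows.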

# ---------------------- Setup ------------------------ #
library(dplyr)
library(ggplot2)
library(lubridate)
library(knitr)

# ---------------------- Constants Definition ------------------------ #
RECENCY_SPAN_IN_YEARS <- 10 # Last X years Top Events by frequency, by geographic area
C_NOT_DEFINED_STR <- 'NOT DEFINED'
C_NOT_DEFINED_INT <- -1

# ---------------------- Stage 1: Load Source Data ------------------------ #
setwd('/Users/prosales/Documents/Capacitaciones/Certificaciones/Coursera DS Certificate - Course 5 - Reproducible Research/Final Project/')
natural_events_df <- read.csv('repdata/data/StormData.csv.bz2')
state_geocodes_df <- read.csv('state-geocodes-v2015.csv')

# ---------------------- Stage 2: Data Preparation: Enhancements and Restructuring ------------------------ #
regions_df <- state_geocodes_df %>%
  filter(division == 0 & state_fips == 0) %>%
  select(region, name)
colnames(regions_df) <- c('region_id', 'region_name')

divisions_df <- state_geocodes_df %>%
  filter(division != 0 & state_fips == 0) %>%
  select(division, name)
colnames(divisions_df) <- c('division_id', 'division_name')

states_df <- state_geocodes_df %>%
  filter(state_fips != 0) %>%
  select(region, division, state_fips, name)
colnames(states_df) <- c('region_id', 'division_id', 'state_id', 'state_name')

complete_geography_df <- merge(states_df, regions_df, by = 'region_id')
complete_geography_df <- merge(complete_geography_df, divisions_df, by = 'division_id')
complete_geography_df <- complete_geography_df %>%
  select('region_id', 'region_name', 'division_id', 'division_name', 'state_id', 'state_name')

geography_structured_events_df <- merge(natural_events_df, complete_geography_df,
                                        by.x = 'STATE__', by.y = 'state_id', all.x = TRUE)

geography_structured_events_df <- geography_structured_events_df %>%
  mutate(region_name = as.character(region_name))
geography_structured_events_df <- geography_structured_events_df %>%
  mutate(division_name = as.character(division_name))
geography_structured_events_df <- geography_structured_events_df %>%
  mutate(state_name = as.character(state_name))

geography_structured_events_df <- geography_structured_events_df %>%
  mutate(region_name = replace(region_name, is.na(region_name), C_NOT_DEFINED_STR))
geography_structured_events_df <- geography_structured_events_df %>%
  mutate(division_name = replace(division_name, is.na(division_name), C_NOT_DEFINED_STR))
geography_structured_events_df <- geography_structured_events_df %>%
  mutate(state_name = replace(state_name, is.na(state_name), C_NOT_DEFINED_STR))
geography_structured_events_df <- geography_structured_events_df %>%
  mutate(region_id = replace(region_id, is.na(region_id), C_NOT_DEFINED_INT))
geography_structured_events_df <- geography_structured_events_df %>%
  mutate(division_id = replace(division_id, is.na(division_id), C_NOT_DEFINED_INT))

geography_structured_events_df <- geography_structured_events_df %>%
  mutate(BGN_DATE = as.Date(BGN_DATE, format = '%m/%d/%Y'))
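The date conversion at the end of Stage 2, together with the `RECENCY_SPAN_IN_YEARS` constant, suggests how a "last 10 years" window might be selected. The sketch below is only one possible approach, in base R: the toy data frame, the `year` column, the assumed raw date layout (month/day/year followed by a time) and the choice of measuring the window back from the latest year in the data are all assumptions.

```r
RECENCY_SPAN_IN_YEARS <- 10

# Toy data (hypothetical values); the raw date layout is an assumption
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/2003 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# as.Date() reads only what the format needs; trailing characters are ignored
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")
toy$year <- as.integer(format(toy$BGN_DATE, "%Y"))

# Keep events from the most recent RECENCY_SPAN_IN_YEARS years in the data
latest_year <- max(toy$year)
recent <- toy[toy$year > latest_year - RECENCY_SPAN_IN_YEARS, ]

print(recent)
```

Anchoring the window to the latest year present in the data, rather than to the current date, keeps the result reproducible regardless of when the script is run.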