In this analysis, we combine the U.S. FPA-FOD and NIFC data sets to create a merged data set spanning 1986-2013. Two issues arise in doing so: 1) creating a common set of variables (which will also be used in merging the U.S. and CNFDB (Canadian) data sets), and 2) reconciling the different fire-start “cause” codes in the two data sets.
The two data sets use different systems for identifying the causes of individual fire starts. The FPA-FOD data include 13 cause categories (in the numeric variable STAT_CAUSE_CODE), while the NIFC data contain 10 categories (in the factor variable GENERAL_CA). The categories do not have a one-to-one relationship, but they are close. We therefore created two new cause categorizations, cause1 and cause2, where cause1 is a coarse-resolution categorization (1 lightning/natural, 2 human, and 3 unknown), and cause2 is a finer, 10-category list of causes: 1 Lightning/Natural, 2 Equipment, 3 Smoking, 4 Campfire, 5 Deliberate, 6 Railroads, 7 Juveniles, 8 Miscellaneous, 9 Incendiary, and 10 Unknown.
The specific remappings of causes to the Merged set (cause1 and cause2) are shown in the code below, and can also be inferred from the table. Note that this system also accommodates the CNFDB data, which contain only a coarse-resolution categorization of causes.
Creating the merged data set involves two main steps: 1) creating a data frame with the set of common variables to be written out, and 2) recoding the original cause codes into the cause1 and cause2 variables.
Load the cleaned-up data. (These are the “working” .RData data sets that were created in an earlier step.)
load("e:/Projects/fire/DailyFireStarts/data/RData/fpafod.RData")
load("e:/Projects/fire/DailyFireStarts/data/RData/nifc.RData")
List the variables in the two different data sets.
str(fpafod)
## 'data.frame': 1727476 obs. of 15 variables:
## $ FOD_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ NWCG_REPORTING_AGENCY: Factor w/ 11 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ FIRE_YEAR : int 2005 2004 2004 2004 2004 2004 2004 2005 2005 2004 ...
## $ DISCOVERY_DATE : Date, format: "2005-02-02" "2004-05-12" "2004-05-31" ...
## $ DISCOVERY_DOY : int 33 133 152 180 180 182 183 67 74 183 ...
## $ STAT_CAUSE_CODE : num 9 1 5 1 1 1 1 5 5 1 ...
## $ CONT_DATE : POSIXct, format: "2005-02-02" "2004-05-12" "2004-05-31" ...
## $ CONT_DOY : int 33 133 152 185 185 183 184 67 74 184 ...
## $ FIRE_SIZE : num 0.1 0.25 0.1 0.1 0.1 0.1 0.1 0.8 1 0.1 ...
## $ LATITUDE : num 40 38.9 39 38.6 38.6 ...
## $ LONGITUDE : num -121 -120 -121 -120 -120 ...
## $ STATE : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ startday : num 2 12 31 28 28 30 1 8 15 1 ...
## $ startmon : num 2 5 5 6 6 6 7 3 3 7 ...
## $ AREA_HA : num 0.0405 0.1012 0.0405 0.0405 0.0405 ...
str(nifc)
## 'data.frame': 405124 obs. of 22 variables:
## $ AGENCY_COD : Factor w/ 5 levels "BIA","BLM","FWS",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ UNIT_ID : Factor w/ 881 levels "1002","1003",..: 361 361 361 361 361 361 361 361 361 361 ...
## $ FIRE_TYPE : Factor w/ 27 levels "0","1","11","12",..: NA NA NA NA NA NA NA NA NA NA ...
## $ FIRE_NUMBE : Factor w/ 31568 levels "0","1","10","100",..: NA NA NA NA NA NA NA NA NA NA ...
## $ FIRE_NAME : Factor w/ 151091 levels "''67''","''67'' 2",..: NA NA NA NA NA NA NA NA NA NA ...
## $ STATE : Factor w/ 51 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
## $ DATE_DISCO : Factor w/ 8553 levels "19140716","19150424",..: 1241 1244 1247 1249 1252 1260 1264 1492 1497 1497 ...
## $ DATE_CONTR : Factor w/ 8577 levels "02031231","19800000",..: NA NA NA NA NA NA NA NA NA NA ...
## $ GENERAL_CA : Factor w/ 10 levels "0","1","2","3",..: 3 3 3 4 3 3 2 2 2 2 ...
## $ SPECIFIC_C : Factor w/ 33 levels "0","1","10","11",..: 33 33 33 3 33 32 2 2 2 2 ...
## $ YEAR_DISCO : int 1983 1983 1983 1983 1983 1983 1983 1984 1984 1984 ...
## $ LATITUDE : num 44.3 44.5 43.9 44 43.8 ...
## $ LONGITUDE : num -119 -119 -119 -119 -119 ...
## $ ACRES_CONT : num 0.1 1 0.1 1 0.1 1 0.1 0.1 0.1 0.1 ...
## $ SIZE_CLASS : Factor w/ 7 levels "A","B","C","D",..: 1 2 1 2 1 2 1 1 1 1 ...
## $ GEOGRAPHIC : Factor w/ 11 levels "Alaska","Eastern",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ STARTDATED : Date, format: "1983-10-05" "1983-10-08" "1983-10-11" ...
## $ YEAR : num 1983 1983 1983 1983 1983 ...
## $ startday : num 5 8 11 13 16 24 29 18 23 23 ...
## $ startmon : num 10 10 10 10 10 10 10 7 7 7 ...
## $ startdaynum: num 278 281 284 286 289 297 302 200 205 205 ...
## $ AREA : num 0.0405 0.4047 0.0405 0.4047 0.0405 ...
## - attr(*, "data_types")= chr "C" "C" "C" "C" ...
nobs <- length(fpafod[,1])
# direct copy
datasource <- rep("fpafod",nobs)
head(datasource)
## [1] "fpafod" "fpafod" "fpafod" "fpafod" "fpafod" "fpafod"
sourceid <- as.integer(fpafod$FOD_ID)
head(sourceid)
## [1] 1 2 3 4 5 6
latitude <- fpafod$LATITUDE
head(latitude)
## [1] 40.03694 38.93306 38.98417 38.55917 38.55917 38.63528
longitude <- fpafod$LONGITUDE
head(longitude)
## [1] -121.0058 -120.4044 -120.7356 -119.9133 -119.9331 -120.1036
year <- fpafod$FIRE_YEAR
head(year)
## [1] 2005 2004 2004 2004 2004 2004
mon <- fpafod$startmon
head(mon)
## [1] 2 5 5 6 6 6
day <- fpafod$startday
head(day)
## [1] 2 12 31 28 28 30
daynum <- fpafod$DISCOVERY_DOY
head(daynum)
## [1] 33 133 152 180 180 182
area_ha <- fpafod$AREA_HA
head(area_ha)
## [1] 0.0404686 0.1011715 0.0404686 0.0404686 0.0404686 0.0404686
cause_original <- as.integer(fpafod$STAT_CAUSE_CODE)
head(cause_original)
## [1] 9 1 5 1 1 1
stateprov <- as.character(fpafod$STATE)
head(stateprov)
## [1] "CA" "CA" "CA" "CA" "CA" "CA"
agency <- as.character(fpafod$NWCG_REPORTING_AGENCY)
head(agency)
## [1] "FS" "FS" "FS" "FS" "FS" "FS"
# fill cause1 and cause2 with 0's
cause1 <- rep(0,nobs)
cause2 <- rep(0,nobs)
# make data frame
fpafod_out <- data.frame(datasource, sourceid, latitude, longitude, year, mon, day, daynum,
area_ha, cause_original, cause1, cause2, stateprov, agency)
summary(fpafod_out)
## datasource sourceid latitude longitude year mon
## fpafod:1727476 Min. : 1 Min. :17.94 Min. :-178.80 Min. :1992 Min. : 1.000
## 1st Qu.: 465673 1st Qu.:32.83 1st Qu.:-109.83 1st Qu.:1998 1st Qu.: 3.000
## Median : 985582 Median :35.40 Median : -91.18 Median :2003 Median : 6.000
## Mean : 32179232 Mean :36.79 Mean : -95.29 Mean :2003 Mean : 5.935
## 3rd Qu.: 1761114 3rd Qu.:40.77 3rd Qu.: -82.25 3rd Qu.:2008 3rd Qu.: 8.000
## Max. :201940182 Max. :70.14 Max. : -65.26 Max. :2013 Max. :12.000
##
## day daynum area_ha cause_original cause1 cause2
## Min. : 1.0 Min. : 1.0 Min. : 0.00 Min. : 1.000 Min. :0 Min. :0
## 1st Qu.: 8.0 1st Qu.: 89.0 1st Qu.: 0.04 1st Qu.: 3.000 1st Qu.:0 1st Qu.:0
## Median :15.0 Median :164.0 Median : 0.40 Median : 5.000 Median :0 Median :0
## Mean :15.5 Mean :164.9 Mean : 29.55 Mean : 5.921 Mean :0 Mean :0
## 3rd Qu.:23.0 3rd Qu.:230.0 3rd Qu.: 1.45 3rd Qu.: 9.000 3rd Qu.:0 3rd Qu.:0
## Max. :31.0 Max. :366.0 Max. :245622.14 Max. :13.000 Max. :0 Max. :0
##
## stateprov agency
## CA :173634 ST/C&L :1254551
## GA :162479 FS : 206731
## TX :125227 BIA : 108423
## NC :104263 BLM : 90801
## FL : 85576 IA : 21841
## SC : 78127 NPS : 19571
## (Other):998170 (Other): 25558
Recode cause_original into new causes (cause1, cause2)
# STAT_CAUSE_CODE
# 1 Lightning; 2 Equipment Use; 3 Smoking; 4 Campfire; 5 Debris Burning; 6 Railroad; 7 Arson; 8 Children;
# 9 Miscellaneous; 10 Fireworks; 11 Power Line; 12 Structure; 13 Missing/Undefined
# cause1
# 1 Lightning; 2 Human; 3 Unknown
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
fpafod_out$cause1[fpafod_out$cause_original == 1] <- 1
fpafod_out$cause1[fpafod_out$cause_original != 1 & fpafod_out$cause_original != 13] <- 2
fpafod_out$cause1[fpafod_out$cause_original == 13] <- 3
fpafod_out$cause2[fpafod_out$cause_original == 1] <- 1
fpafod_out$cause2[fpafod_out$cause_original == 2] <- 2
fpafod_out$cause2[fpafod_out$cause_original == 3] <- 3
fpafod_out$cause2[fpafod_out$cause_original == 4] <- 4
fpafod_out$cause2[fpafod_out$cause_original == 5] <- 5
fpafod_out$cause2[fpafod_out$cause_original == 6] <- 6
fpafod_out$cause2[fpafod_out$cause_original == 7] <- 5
fpafod_out$cause2[fpafod_out$cause_original == 8] <- 7
fpafod_out$cause2[fpafod_out$cause_original == 9] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 10] <- 9
fpafod_out$cause2[fpafod_out$cause_original == 11] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 12] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 13] <- 10
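The thirteen cause2 assignments above can be collapsed into a single vectorized lookup, because STAT_CAUSE_CODE runs from 1 to 13 and can therefore index a vector by position. A sketch (the names `fpafod_to_cause2` and `recode_fpafod` are ours) that should reproduce the same cause2 values:

```r
# Position i holds the cause2 code for STAT_CAUSE_CODE i (1-13), following the
# assignments above: 7 Arson -> 5 Deliberate, 10 Fireworks -> 9 Incendiary,
# 11 Power Line and 12 Structure -> 8 Miscellaneous, 13 Missing -> 10 Unknown.
fpafod_to_cause2 <- c(1, 2, 3, 4, 5, 6, 5, 7, 8, 9, 8, 8, 10)

recode_fpafod <- function(stat_cause_code) {
  fpafod_to_cause2[stat_cause_code]  # NA or codes above 13 give NA
}

recode_fpafod(c(9, 1, 5, 1, 7, 13))
# [1]  8  1  5  1  5 10
```

Applied to the real data this would read `fpafod_out$cause2 <- recode_fpafod(fpafod_out$cause_original)`; keeping the mapping in one vector makes it easier to compare against the table than a chain of assignments.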
Compare cause classifications
# STAT_CAUSE_CODE
# 1 Lightning; 2 Equipment Use; 3 Smoking; 4 Campfire; 5 Debris Burning; 6 Railroad; 7 Arson; 8 Children;
# 9 Miscellaneous; 10 Fireworks; 11 Power Line; 12 Structure; 13 Missing/Undefined
table(fpafod$STAT_CAUSE_CODE)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## 260311 137575 49633 68684 390702 32569 267987 58354 291678 10350 11315 3178 145140
# cause1
# 1 Lightning; 2 Human; 3 Unknown
table(fpafod_out$cause1)
##
## 1 2 3
## 260311 1322025 145140
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
table(fpafod_out$cause2)
##
## 1 2 3 4 5 6 7 8 9 10
## 260311 137575 49633 68684 658689 32569 58354 306171 10350 145140
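Beyond comparing marginal counts, one way to audit a many-to-one recoding is to cross-tabulate the original and new codes and confirm that every original code lands in exactly one new category. A sketch with a hypothetical helper name and toy data:

```r
# Hypothetical audit helper: TRUE if every original code maps to exactly one new code.
is_many_to_one <- function(original, recoded) {
  xt <- table(original, recoded)   # rows: original codes, columns: new codes
  all(rowSums(xt > 0) == 1)        # exactly one non-empty cell per original code
}

orig <- c(1, 5, 7, 7, 9, 13)
new  <- c(1, 5, 5, 5, 8, 10)
is_many_to_one(orig, new)          # TRUE
is_many_to_one(c(1, 1), c(1, 2))   # FALSE: code 1 was split across two categories
```

On the real data the call would be `is_many_to_one(fpafod_out$cause_original, fpafod_out$cause2)`, and printing `table(fpafod_out$cause_original, fpafod_out$cause2)` shows the same mapping laid out as a matrix.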
str(fpafod_out)
## 'data.frame': 1727476 obs. of 14 variables:
## $ datasource : Factor w/ 1 level "fpafod": 1 1 1 1 1 1 1 1 1 1 ...
## $ sourceid : int 1 2 3 4 5 6 7 8 9 10 ...
## $ latitude : num 40 38.9 39 38.6 38.6 ...
## $ longitude : num -121 -120 -121 -120 -120 ...
## $ year : int 2005 2004 2004 2004 2004 2004 2004 2005 2005 2004 ...
## $ mon : num 2 5 5 6 6 6 7 3 3 7 ...
## $ day : num 2 12 31 28 28 30 1 8 15 1 ...
## $ daynum : int 33 133 152 180 180 182 183 67 74 183 ...
## $ area_ha : num 0.0405 0.1012 0.0405 0.0405 0.0405 ...
## $ cause_original: int 9 1 5 1 1 1 1 5 5 1 ...
## $ cause1 : num 2 1 2 1 1 1 1 2 2 1 ...
## $ cause2 : num 8 1 5 1 1 1 1 5 5 1 ...
## $ stateprov : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ agency : Factor w/ 11 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
Write out a .csv file
merged_data_path <- "e:/Projects/fire/DailyFireStarts/data/MergedData/"
outfilename <- "fpafod_1992-2013.csv"
# write.table() defaults to sep=" "; sep="," is needed for a comma-delimited file
write.table(fpafod_out, paste0(merged_data_path, outfilename), sep=",", row.names=FALSE)
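`write.table()` separates fields with a single space unless `sep` is given, so a file named .csv needs `sep = ","` (as in the NIFC `write.table()` calls in this section) to be comma-delimited; `write.csv()` is the base-R shorthand that fixes the separator at a comma. A small sketch using temporary files and a toy data frame shows the difference:

```r
toy <- data.frame(datasource = c("fpafod", "nifc"), year = c(2005, 1986))

space_file <- tempfile(fileext = ".csv")
comma_file <- tempfile(fileext = ".csv")

write.table(toy, space_file, row.names = FALSE)             # default sep = " "
write.table(toy, comma_file, sep = ",", row.names = FALSE)  # comma-delimited

readLines(space_file)[1]  # "\"datasource\" \"year\""
readLines(comma_file)[1]  # "\"datasource\",\"year\""

# read.csv() splits only the comma-delimited file into two columns
ncol(read.csv(space_file))  # 1
ncol(read.csv(comma_file))  # 2
```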
load("e:/Projects/fire/DailyFireStarts/data/RData/nifc.RData")
summary(nifc)
## AGENCY_COD UNIT_ID FIRE_TYPE FIRE_NUMBE FIRE_NAME STATE
## BIA : 91480 304 : 8420 11 :125642 1 : 1872 FA 1 : 550 CA : 63643
## BLM : 88307 H50H58 : 6958 13 : 14425 2 : 1696 FA 2 : 512 AZ : 55535
## FWS : 0 H50H52 : 6385 16 : 14118 3 : 1556 FA 3 : 473 OR : 38566
## NPS : 26809 AKAFS : 5910 51 : 10310 4 : 1457 FA 4 : 426 ID : 29029
## USFS:198437 312 : 5387 15 : 9658 5 : 1378 FA 5 : 412 MT : 28450
## NA's: 91 (Other):371973 (Other): 32443 (Other):198637 (Other):286270 (Other):189810
## NA's : 91 NA's :198528 NA's :198528 NA's :116481 NA's : 91
## DATE_DISCO DATE_CONTR GENERAL_CA SPECIFIC_C YEAR_DISCO LATITUDE
## 19890726: 1078 19940000: 1554 1 :182663 30 :144501 Min. :1981 Min. :19.27
## 19860810: 861 19920000: 1422 9 : 44554 1 : 76258 1st Qu.:1989 1st Qu.:35.31
## 19870830: 852 19930000: 1396 5 : 42126 0 : 72892 Median :1994 Median :39.89
## 19890808: 698 19960000: 1393 4 : 33951 19 : 15034 Mean :1994 Mean :40.26
## 19940723: 663 19900000: 1307 2 : 31932 27 : 11503 3rd Qu.:1999 3rd Qu.:44.45
## (Other) :400881 (Other) :397184 (Other): 69807 (Other): 84845 Max. :2003 Max. :69.85
## NA's : 91 NA's : 868 NA's : 91 NA's : 91 NA's :91 NA's :91
## LONGITUDE ACRES_CONT SIZE_CLASS GEOGRAPHIC STARTDATED
## Min. :-176.67 Min. : 0.0 A :217738 Southwest : 78712 Min. :1981-01-01
## 1st Qu.:-118.45 1st Qu.: 0.1 B :130527 Northwest : 51789 1st Qu.:1989-07-26
## Median :-112.00 Median : 0.2 C : 35503 Northern Rockies : 47645 Median :1994-07-22
## Mean :-110.62 Mean : 178.5 D : 8481 Rocky Mountain : 41947 Mean :1994-05-03
## 3rd Qu.:-106.68 3rd Qu.: 2.0 E : 6193 Eastern Great Basin: 41205 3rd Qu.:1999-07-09
## Max. : -67.06 Max. :606945.0 (Other): 6591 (Other) :143735 Max. :2003-12-31
## NA's :91 NA's :91 NA's : 91 NA's : 91 NA's :188
## YEAR startday startmon startdaynum AREA
## Min. :1981 Min. : 1.00 Min. : 1.000 Min. : 1.0 Min. : 0.00
## 1st Qu.:1989 1st Qu.: 8.00 1st Qu.: 6.000 1st Qu.:159.0 1st Qu.: 0.04
## Median :1994 Median :16.00 Median : 7.000 Median :201.0 Median : 0.08
## Mean :1994 Mean :15.73 Mean : 6.851 Mean :192.9 Mean : 72.23
## 3rd Qu.:1999 3rd Qu.:24.00 3rd Qu.: 8.000 3rd Qu.:231.0 3rd Qu.: 0.81
## Max. :2003 Max. :31.00 Max. :12.000 Max. :366.0 Max. :245622.14
## NA's :188 NA's :188 NA's :188 NA's :188 NA's :91
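Remove the 91 records that are missing nearly all values (identified here by a missing LATITUDE); this reduces the data set from 405,124 to 405,033 records.
nifc <- nifc[!is.na(nifc$LATITUDE), ]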
summary(nifc)
## AGENCY_COD UNIT_ID FIRE_TYPE FIRE_NUMBE FIRE_NAME STATE
## BIA : 91480 304 : 8420 11 :125642 1 : 1872 FA 1 : 550 CA : 63643
## BLM : 88307 H50H58 : 6958 13 : 14425 2 : 1696 FA 2 : 512 AZ : 55535
## FWS : 0 H50H52 : 6385 16 : 14118 3 : 1556 FA 3 : 473 OR : 38566
## NPS : 26809 AKAFS : 5910 51 : 10310 4 : 1457 FA 4 : 426 ID : 29029
## USFS:198437 312 : 5387 15 : 9658 5 : 1378 FA 5 : 412 MT : 28450
## F50F52 : 5282 (Other): 32443 (Other):198637 (Other):286270 NM : 22249
## (Other):366691 NA's :198437 NA's :198437 NA's :116390 (Other):167561
## DATE_DISCO DATE_CONTR GENERAL_CA SPECIFIC_C YEAR_DISCO LATITUDE
## 19890726: 1078 19940000: 1554 1 :182663 30 :144501 Min. :1981 Min. :19.27
## 19860810: 861 19920000: 1422 9 : 44554 1 : 76258 1st Qu.:1989 1st Qu.:35.31
## 19870830: 852 19930000: 1396 5 : 42126 0 : 72892 Median :1994 Median :39.89
## 19890808: 698 19960000: 1393 4 : 33951 19 : 15034 Mean :1994 Mean :40.26
## 19940723: 663 19900000: 1307 2 : 31932 27 : 11503 3rd Qu.:1999 3rd Qu.:44.45
## 19960813: 653 (Other) :397184 0 : 22246 8 : 11138 Max. :2003 Max. :69.85
## (Other) :400228 NA's : 777 (Other): 47561 (Other): 73707
## LONGITUDE ACRES_CONT SIZE_CLASS GEOGRAPHIC STARTDATED
## Min. :-176.67 Min. : 0.0 A:217738 Southwest : 78712 Min. :1981-01-01
## 1st Qu.:-118.45 1st Qu.: 0.1 B:130527 Northwest : 51789 1st Qu.:1989-07-26
## Median :-112.00 Median : 0.2 C: 35503 Northern Rockies : 47645 Median :1994-07-22
## Mean :-110.62 Mean : 178.5 D: 8481 Rocky Mountain : 41947 Mean :1994-05-03
## 3rd Qu.:-106.68 3rd Qu.: 2.0 E: 6193 Eastern Great Basin: 41205 3rd Qu.:1999-07-09
## Max. : -67.06 Max. :606945.0 F: 4336 Southern : 35210 Max. :2003-12-31
## G: 2255 (Other) :108525 NA's :97
## YEAR startday startmon startdaynum AREA
## Min. :1981 Min. : 1.00 Min. : 1.000 Min. : 1.0 Min. : 0.00
## 1st Qu.:1989 1st Qu.: 8.00 1st Qu.: 6.000 1st Qu.:159.0 1st Qu.: 0.04
## Median :1994 Median :16.00 Median : 7.000 Median :201.0 Median : 0.08
## Mean :1994 Mean :15.73 Mean : 6.851 Mean :192.9 Mean : 72.23
## 3rd Qu.:1999 3rd Qu.:24.00 3rd Qu.: 8.000 3rd Qu.:231.0 3rd Qu.: 0.81
## Max. :2003 Max. :31.00 Max. :12.000 Max. :366.0 Max. :245622.14
## NA's :97 NA's :97 NA's :97 NA's :97
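Because a count of removed records is easy to misstate, it can help to have the code report it. A minimal sketch (the helper name `drop_missing` is hypothetical) that drops rows with a missing value in one key column and reports how many were removed:

```r
# Hypothetical helper: drop rows with NA in column `key` and report how many were removed.
drop_missing <- function(data, key) {
  keep <- !is.na(data[[key]])
  message(sum(!keep), " records removed (NA in ", key, ")")
  data[keep, , drop = FALSE]
}

toy <- data.frame(LATITUDE = c(44.3, NA, 43.9, NA), AREA = c(0.04, NA, 0.04, NA))
toy_clean <- drop_missing(toy, "LATITUDE")  # message: 2 records removed (NA in LATITUDE)
nrow(toy_clean)                             # 2
```

Called as `drop_missing(nifc, "LATITUDE")`, it should report 91 records, matching the row counts in this section (405,124 before, 405,033 after).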
Create new (common) variables
nobs <- length(nifc[,1])
# direct copy
datasource <- rep("nifc",nobs)
head(datasource)
## [1] "nifc" "nifc" "nifc" "nifc" "nifc" "nifc"
sourceid <- as.integer(seq(1,nobs,by=1))
head(sourceid)
## [1] 1 2 3 4 5 6
latitude <- nifc$LATITUDE
head(latitude)
## [1] 44.28000 44.55000 43.87667 44.03333 43.79333 44.28333
longitude <- nifc$LONGITUDE
head(longitude)
## [1] -118.9167 -118.9167 -119.3683 -118.7467 -118.9700 -118.9217
year <- nifc$YEAR
head(year)
## [1] 1983 1983 1983 1983 1983 1983
mon <- nifc$startmon
head(mon)
## [1] 10 10 10 10 10 10
day <- nifc$startday
head(day)
## [1] 5 8 11 13 16 24
daynum <- nifc$startdaynum
head(daynum)
## [1] 278 281 284 286 289 297
area_ha <- nifc$AREA
head(area_ha)
## [1] 0.0404686 0.4046860 0.0404686 0.4046860 0.0404686 0.4046860
cause_original <- as.numeric(nifc$GENERAL_CA)
head(cause_original)
## [1] 3 3 3 4 3 3
stateprov <- as.character(nifc$STATE)
head(stateprov)
## [1] "OR" "OR" "OR" "OR" "OR" "OR"
agency <- as.character(nifc$AGENCY_COD)
head(agency)
## [1] "USFS" "USFS" "USFS" "USFS" "USFS" "USFS"
# fill cause1 and cause2 with 0's
cause1 <- rep(0,nobs)
cause2 <- rep(0,nobs)
# make data frame
nifc_out <- data.frame(datasource, sourceid, latitude, longitude, year, mon, day, daynum,
area_ha, cause_original, cause1, cause2, stateprov, agency)
summary(nifc_out)
## datasource sourceid latitude longitude year mon
## nifc:405033 Min. : 1 Min. :19.27 Min. :-176.67 Min. :1981 Min. : 1.000
## 1st Qu.:101259 1st Qu.:35.31 1st Qu.:-118.45 1st Qu.:1989 1st Qu.: 6.000
## Median :202517 Median :39.89 Median :-112.00 Median :1994 Median : 7.000
## Mean :202517 Mean :40.26 Mean :-110.62 Mean :1994 Mean : 6.851
## 3rd Qu.:303775 3rd Qu.:44.45 3rd Qu.:-106.68 3rd Qu.:1999 3rd Qu.: 8.000
## Max. :405033 Max. :69.85 Max. : -67.06 Max. :2003 Max. :12.000
## NA's :97 NA's :97
## day daynum area_ha cause_original cause1 cause2
## Min. : 1.00 Min. : 1.0 Min. : 0.00 Min. : 1.000 Min. :0 Min. :0
## 1st Qu.: 8.00 1st Qu.:159.0 1st Qu.: 0.04 1st Qu.: 2.000 1st Qu.:0 1st Qu.:0
## Median :16.00 Median :201.0 Median : 0.08 Median : 2.000 Median :0 Median :0
## Mean :15.73 Mean :192.9 Mean : 72.23 Mean : 4.164 Mean :0 Mean :0
## 3rd Qu.:24.00 3rd Qu.:231.0 3rd Qu.: 0.81 3rd Qu.: 6.000 3rd Qu.:0 3rd Qu.:0
## Max. :31.00 Max. :366.0 Max. :245622.14 Max. :10.000 Max. :0 Max. :0
## NA's :97 NA's :97
## stateprov agency
## CA : 63643 BIA : 91480
## AZ : 55535 BLM : 88307
## OR : 38566 NPS : 26809
## ID : 29029 USFS:198437
## MT : 28450
## NM : 22249
## (Other):167561
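A caution about the NIFC cause codes: GENERAL_CA is a factor whose levels are the labels "0" through "9", and `as.numeric()` applied to a factor returns the internal level index (1-10), not the label. That is why `head(cause_original)` above prints 3 3 3 4 3 3 for fires whose GENERAL_CA labels are 2 and 3. Converting through `as.character()` recovers the codes, and a lookup keyed on the labels can then give cause2 directly. A sketch on a toy factor (the name `nifc_to_cause2` is ours):

```r
ga <- factor(c("2", "2", "3", "1", "0", "9"), levels = as.character(0:9))
as.numeric(ga)                 # 3 3 4 2 1 10  (level indices, not the codes)
as.numeric(as.character(ga))   # 2 2 3 1 0 9   (the GENERAL_CA codes themselves)

# cause2 for each GENERAL_CA label, following the NIFC mapping in this section:
# 1 Natural -> 1, 2 Campfire -> 4, 3 Smoking -> 3, 4 Fire use -> 5, 5 Incendiary -> 9,
# 6 Equipment -> 2, 7 Railroads -> 6, 8 Juveniles -> 7, 9 Miscellaneous -> 8, 0 Unknown -> 10
nifc_to_cause2 <- c("0" = 10, "1" = 1, "2" = 4, "3" = 3, "4" = 5,
                    "5" = 9, "6" = 2, "7" = 6, "8" = 7, "9" = 8)
unname(nifc_to_cause2[as.character(ga)])  # 4 4 3 1 10 8
```

Keying on the labels rather than on positions avoids the off-by-one problem entirely, since code 0 cannot be used as a positional index in R.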
Recode cause_original into new causes (cause1, cause2)
# GENERAL_CA
# 1 Natural; 2 Campfire; 3 Smoking; 4 Fire use; 5 Incendiary; 6 Equipment
# 7 Railroads; 8 Juveniles; 9 Miscellaneous; 0 Unknown
# cause1
# 1 Lightning; 2 Human; 3 Unknown
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
# GENERAL_CA is a factor with levels "0"-"9", so as.numeric() above returned the
# level index (1-10) rather than the cause code; convert via as.character() to
# recover the 0-9 codes before recoding (nifc and nifc_out have the same rows)
nifc_out$cause_original <- as.numeric(as.character(nifc$GENERAL_CA))
nifc_out$cause1[nifc_out$cause_original == 1] <- 1
nifc_out$cause1[nifc_out$cause_original != 1 & nifc_out$cause_original != 0] <- 2
nifc_out$cause1[nifc_out$cause_original == 0] <- 3
nifc_out$cause2[nifc_out$cause_original == 1] <- 1
nifc_out$cause2[nifc_out$cause_original == 2] <- 4
nifc_out$cause2[nifc_out$cause_original == 3] <- 3
nifc_out$cause2[nifc_out$cause_original == 4] <- 5
nifc_out$cause2[nifc_out$cause_original == 5] <- 9
nifc_out$cause2[nifc_out$cause_original == 6] <- 2
nifc_out$cause2[nifc_out$cause_original == 7] <- 6
nifc_out$cause2[nifc_out$cause_original == 8] <- 7
nifc_out$cause2[nifc_out$cause_original == 9] <- 8
nifc_out$cause2[nifc_out$cause_original == 0] <- 10
Compare cause classifications
# GENERAL_CA
# 1 Natural; 2 Campfire; 3 Smoking; 4 Fire use; 5 Incendiary; 6 Equipment
# 7 Railroads; 8 Juveniles; 9 Miscellaneous; 0 Unknown
table(nifc$GENERAL_CA)
##
## 0 1 2 3 4 5 6 7 8 9
## 22246 182663 31932 11672 33951 42126 15596 3357 16936 44554
table(nifc_out$cause_original)
##
## 0 1 2 3 4 5 6 7 8 9
## 22246 182663 31932 11672 33951 42126 15596 3357 16936 44554
# cause1
# 1 Lightning; 2 Human; 3 Unknown
table(nifc_out$cause1)
##
## 1 2 3
## 182663 200124 22246
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
table(nifc_out$cause2)
##
## 1 2 3 4 5 6 7 8 9 10
## 182663 15596 11672 31932 33951 3357 16936 44554 42126 22246
str(nifc_out)
## 'data.frame': 405033 obs. of 14 variables:
## $ datasource : Factor w/ 1 level "nifc": 1 1 1 1 1 1 1 1 1 1 ...
## $ sourceid : int 1 2 3 4 5 6 7 8 9 10 ...
## $ latitude : num 44.3 44.5 43.9 44 43.8 ...
## $ longitude : num -119 -119 -119 -119 -119 ...
## $ year : num 1983 1983 1983 1983 1983 ...
## $ mon : num 10 10 10 10 10 10 10 7 7 7 ...
## $ day : num 5 8 11 13 16 24 29 18 23 23 ...
## $ daynum : num 278 281 284 286 289 297 302 200 205 205 ...
## $ area_ha : num 0.0405 0.4047 0.0405 0.4047 0.0405 ...
## $ cause_original: num 2 2 2 3 2 2 1 1 1 1 ...
## $ cause1 : num 2 2 2 2 2 2 1 1 1 1 ...
## $ cause2 : num 4 4 4 3 4 4 1 1 1 1 ...
## $ stateprov : Factor w/ 50 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
## $ agency : Factor w/ 4 levels "BIA","BLM","NPS",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ stateprov : Factor w/ 50 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
## $ agency : Factor w/ 4 levels "BIA","BLM","NPS",..: 4 4 4 4 4 4 4 4 4 4 ...
# Recode area_ha NA's to 0.0
nifc_out$area_ha[is.na(nifc_out$area_ha) == TRUE] <- 0
summary(nifc_out$area_ha)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.04 0.08 72.23 0.81 245622.14
Write out a .csv file
outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/nifc_1981-2003.csv"
write.table(nifc_out, outfilename, sep=",", row.names=FALSE)
Select just the records from 1986 through 1991, the years before the FPA-FOD record begins in 1992.
nifc2_out <- subset(nifc_out, year >= 1986 & year <= 1991)
length(nifc2_out[,1])
## [1] 121576
table(nifc2_out$year)
##
## 1986 1987 1988 1989 1990 1991
## 17760 21873 21950 19857 20606 19530
sum(table(nifc2_out$year))
## [1] 121576
summary(nifc2_out)
## datasource sourceid latitude longitude year mon
## nifc:121576 Min. : 22 Min. :19.30 Min. :-176.67 Min. :1986 Min. : 1.000
## 1st Qu.: 32583 1st Qu.:35.42 1st Qu.:-119.09 1st Qu.:1987 1st Qu.: 6.000
## Median : 63529 Median :39.83 Median :-112.80 Median :1988 Median : 7.000
## Mean :144894 Mean :40.26 Mean :-110.85 Mean :1989 Mean : 6.928
## 3rd Qu.:269221 3rd Qu.:44.47 3rd Qu.:-106.72 3rd Qu.:1990 3rd Qu.: 8.000
## Max. :405004 Max. :69.63 Max. : -68.22 Max. :1991 Max. :12.000
##
## day daynum area_ha cause_original cause1 cause2
## Min. : 1.0 Min. : 1.0 Min. : 0.00 Min. :0.000 Min. :1.000 Min. :0.0
## 1st Qu.: 8.0 1st Qu.:165.0 1st Qu.: 0.04 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:3.0
## Median :15.0 Median :203.0 Median : 0.08 Median :1.000 Median :2.000 Median :4.0
## Mean :15.8 Mean :195.2 Mean : 70.54 Mean :2.987 Mean :1.956 Mean :3.9
## 3rd Qu.:24.0 3rd Qu.:232.0 3rd Qu.: 0.81 3rd Qu.:5.000 3rd Qu.:2.000 3rd Qu.:4.0
## Max. :31.0 Max. :365.0 Max. :219028.61 Max. :9.000 Max. :2.000 Max. :9.0
##
## stateprov agency
## CA :23342 BIA :20448
## AZ :14770 BLM :21262
## OR :12066 NPS : 8479
## ID : 9392 USFS:71387
## MT : 8006
## NM : 5744
## (Other):48256
Write out a .csv file
outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/nifc_1986-1991.csv"
write.table(nifc2_out, outfilename, sep=",", row.names=FALSE)
us_merged <- rbind(fpafod_out,nifc2_out)
table_us_year <- table(us_merged$year)
table_us_year
##
## 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
## 17760 21873 21950 19857 20606 19530 67964 62022 75989 71496 75604 61472 68388 89398 96454
## 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
## 86069 75136 67380 68616 87391 113242 94681 84654 77262 78485 89897 71768 64108
sum(table_us_year)
## [1] 1849052
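Restricting NIFC to 1986-1991 is meant to ensure that no year is drawn from both sources and that the merged record has no gaps. A minimal sketch of a check for both conditions, shown on toy vectors rather than the real data (the function name is hypothetical):

```r
# Hypothetical check: each year comes from exactly one source, and the years
# form an unbroken annual sequence.
check_year_coverage <- function(year, source) {
  sources_per_year <- tapply(source, year, function(s) length(unique(s)))
  yrs <- sort(unique(year))
  all(sources_per_year == 1) && all(diff(yrs) == 1)
}

check_year_coverage(c(1990, 1991, 1991, 1992, 1993),
                    c("nifc", "nifc", "nifc", "fpafod", "fpafod"))  # TRUE
check_year_coverage(c(1991, 1992, 1992),
                    c("nifc", "nifc", "fpafod"))                    # FALSE: 1992 in both
```

Given the year table above, `check_year_coverage(us_merged$year, us_merged$datasource)` would be expected to return TRUE.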
summary(us_merged)
## datasource sourceid latitude longitude year mon
## fpafod:1727476 Min. : 1 Min. :17.94 Min. :-178.80 Min. :1986 Min. : 1
## nifc : 121576 1st Qu.: 359278 1st Qu.:32.99 1st Qu.:-111.48 1st Qu.:1996 1st Qu.: 4
## Median : 908962 Median :35.62 Median : -92.91 Median :2002 Median : 6
## Mean : 30072960 Mean :37.01 Mean : -96.31 Mean :2002 Mean : 6
## 3rd Qu.: 1663336 3rd Qu.:41.05 3rd Qu.: -82.54 3rd Qu.:2008 3rd Qu.: 8
## Max. :201940182 Max. :70.14 Max. : -65.26 Max. :2013 Max. :12
##
## day daynum area_ha cause_original cause1 cause2
## Min. : 1.00 Min. : 1.0 Min. : 0.00 Min. : 0.000 Min. :1.000 Min. : 0.000
## 1st Qu.: 8.00 1st Qu.: 92.0 1st Qu.: 0.04 1st Qu.: 2.000 1st Qu.:2.000 1st Qu.: 3.000
## Median :15.00 Median :170.0 Median : 0.40 Median : 5.000 Median :2.000 Median : 5.000
## Mean :15.52 Mean :166.9 Mean : 32.24 Mean : 5.728 Mean :1.935 Mean : 5.043
## 3rd Qu.:23.00 3rd Qu.:231.0 3rd Qu.: 1.29 3rd Qu.: 9.000 3rd Qu.:2.000 3rd Qu.: 8.000
## Max. :31.00 Max. :366.0 Max. :245622.14 Max. :13.000 Max. :3.000 Max. :10.000
##
## stateprov agency
## CA : 196976 ST/C&L :1254551
## GA : 163298 FS : 206731
## TX : 126100 BIA : 128871
## NC : 105113 BLM : 112063
## FL : 87589 USFS : 71387
## AZ : 79969 NPS : 28050
## (Other):1090007 (Other): 47399
str(us_merged)
## 'data.frame': 1849052 obs. of 14 variables:
## $ datasource : Factor w/ 2 levels "fpafod","nifc": 1 1 1 1 1 1 1 1 1 1 ...
## $ sourceid : int 1 2 3 4 5 6 7 8 9 10 ...
## $ latitude : num 40 38.9 39 38.6 38.6 ...
## $ longitude : num -121 -120 -121 -120 -120 ...
## $ year : num 2005 2004 2004 2004 2004 ...
## $ mon : num 2 5 5 6 6 6 7 3 3 7 ...
## $ day : num 2 12 31 28 28 30 1 8 15 1 ...
## $ daynum : num 33 133 152 180 180 182 183 67 74 183 ...
## $ area_ha : num 0.0405 0.1012 0.0405 0.0405 0.0405 ...
## $ cause_original: num 9 1 5 1 1 1 1 5 5 1 ...
## $ cause1 : num 2 1 2 1 1 1 1 2 2 1 ...
## $ cause2 : num 8 1 5 1 1 1 1 5 5 1 ...
## $ stateprov : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ agency : Factor w/ 12 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/us_1986-2013.csv"
write.table(us_merged, outfilename, sep=",", row.names=FALSE)
save(us_merged, file="e:/Projects/fire/DailyFireStarts/data/RData/us_1986-2013.RData")