In this analysis, we combine the U.S. FPA-FOD and NIFC data sets to create a merged data set spanning 1986-2013. Doing so raises two issues: 1) creating a common set of variables (which will also be used when merging the U.S. and CNFDB (Canadian) data sets), and 2) reconciling the different fire-start "cause" codes in the two data sets.
The two data sets use different systems for identifying the causes of individual fire starts. The FPA-FOD data include 13 cause categories (in the numeric variable STAT_CAUSE_CODE), while the NIFC data contain 10 categories (in the factor variable GENERAL_CA). The categories do not map one-to-one, but they are close. We therefore created two new cause categorizations, cause1 and cause2, where cause1 is a coarse-resolution categorization (lightning/natural, human, and unknown) and cause2 is a finer, 10-category list of causes:

The specific remappings of causes to the merged set (cause1 and cause2) are shown in the code below, and can also be inferred from the table. Note that this system also accommodates the CNFDB data, which contain only a coarse-resolution categorization of causes.
There are two main steps in creating the merged data set: 1) creating a data frame with the set of common variables to be written out, and 2) recoding the original cause codes into the cause1 and cause2 variables.
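Since cause1 is a coarsening of cause2 (class 1 is lightning/natural, class 10 is unknown, and classes 2-9 are all human causes), one option is to derive cause1 from cause2 so the two can never disagree. The following is a minimal sketch of that idea; `cause1_from_cause2()` is a hypothetical helper, and the code in this analysis instead recodes each variable directly from the original codes.

```r
# Hypothetical helper (not part of the original workflow): collapse the
# 10-class cause2 into the 3-class cause1.
#   cause2: 1 = Lightning/Natural, 2-9 = human causes, 10 = Unknown
#   cause1: 1 = Lightning, 2 = Human, 3 = Unknown
cause1_from_cause2 <- function(cause2) {
  out <- rep(2L, length(cause2))      # default: human-caused
  out[cause2 == 1] <- 1L              # lightning/natural
  out[cause2 == 10] <- 3L             # unknown
  out[is.na(cause2)] <- NA_integer_   # keep missing values missing
  out
}

cause1_from_cause2(c(1, 5, 10, 8, NA))   # returns c(1L, 2L, 3L, 2L, NA)
```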
Load the cleaned-up data. (These are the “working” .RData data sets that were created in an earlier step.)
load("e:/Projects/fire/DailyFireStarts/data/RData/fpafod.RData")
load("e:/Projects/fire/DailyFireStarts/data/RData/nifc.RData")
List the variables in the two data sets.
str(fpafod)
## 'data.frame': 1727476 obs. of 15 variables:
## $ FOD_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ NWCG_REPORTING_AGENCY: Factor w/ 11 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ FIRE_YEAR : int 2005 2004 2004 2004 2004 2004 2004 2005 2005 2004 ...
## $ DISCOVERY_DATE : Date, format: "2005-02-02" "2004-05-12" "2004-05-31" ...
## $ DISCOVERY_DOY : int 33 133 152 180 180 182 183 67 74 183 ...
## $ STAT_CAUSE_CODE : num 9 1 5 1 1 1 1 5 5 1 ...
## $ CONT_DATE : POSIXct, format: "2005-02-02" "2004-05-12" "2004-05-31" ...
## $ CONT_DOY : int 33 133 152 185 185 183 184 67 74 184 ...
## $ FIRE_SIZE : num 0.1 0.25 0.1 0.1 0.1 0.1 0.1 0.8 1 0.1 ...
## $ LATITUDE : num 40 38.9 39 38.6 38.6 ...
## $ LONGITUDE : num -121 -120 -121 -120 -120 ...
## $ STATE : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ startday : num 2 12 31 28 28 30 1 8 15 1 ...
## $ startmon : num 2 5 5 6 6 6 7 3 3 7 ...
## $ AREA_HA : num 0.0405 0.1012 0.0405 0.0405 0.0405 ...
str(nifc)
## 'data.frame': 405124 obs. of 22 variables:
## $ AGENCY_COD : Factor w/ 5 levels "BIA","BLM","FWS",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ UNIT_ID : Factor w/ 881 levels "1002","1003",..: 361 361 361 361 361 361 361 361 361 361 ...
## $ FIRE_TYPE : Factor w/ 27 levels "0","1","11","12",..: NA NA NA NA NA NA NA NA NA NA ...
## $ FIRE_NUMBE : Factor w/ 31568 levels "0","1","10","100",..: NA NA NA NA NA NA NA NA NA NA ...
## $ FIRE_NAME : Factor w/ 151091 levels "''67''","''67'' 2",..: NA NA NA NA NA NA NA NA NA NA ...
## $ STATE : Factor w/ 51 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
## $ DATE_DISCO : Factor w/ 8553 levels "19140716","19150424",..: 1241 1244 1247 1249 1252 1260 1264 1492 1497 1497 ...
## $ DATE_CONTR : Factor w/ 8577 levels "02031231","19800000",..: NA NA NA NA NA NA NA NA NA NA ...
## $ GENERAL_CA : Factor w/ 10 levels "0","1","2","3",..: 3 3 3 4 3 3 2 2 2 2 ...
## $ SPECIFIC_C : Factor w/ 33 levels "0","1","10","11",..: 33 33 33 3 33 32 2 2 2 2 ...
## $ YEAR_DISCO : int 1983 1983 1983 1983 1983 1983 1983 1984 1984 1984 ...
## $ LATITUDE : num 44.3 44.5 43.9 44 43.8 ...
## $ LONGITUDE : num -119 -119 -119 -119 -119 ...
## $ ACRES_CONT : num 0.1 1 0.1 1 0.1 1 0.1 0.1 0.1 0.1 ...
## $ SIZE_CLASS : Factor w/ 7 levels "A","B","C","D",..: 1 2 1 2 1 2 1 1 1 1 ...
## $ GEOGRAPHIC : Factor w/ 11 levels "Alaska","Eastern",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ STARTDATED : Date, format: "1983-10-05" "1983-10-08" "1983-10-11" ...
## $ YEAR : num 1983 1983 1983 1983 1983 ...
## $ startday : num 5 8 11 13 16 24 29 18 23 23 ...
## $ startmon : num 10 10 10 10 10 10 10 7 7 7 ...
## $ startdaynum: num 278 281 284 286 289 297 302 200 205 205 ...
## $ AREA : num 0.0405 0.4047 0.0405 0.4047 0.0405 ...
## - attr(*, "data_types")= chr "C" "C" "C" "C" ...
nobs <- nrow(fpafod)
# direct copy
datasource <- rep("fpafod",nobs)
head(datasource)
## [1] "fpafod" "fpafod" "fpafod" "fpafod" "fpafod" "fpafod"
sourceid <- as.integer(fpafod$FOD_ID)
head(sourceid)
## [1] 1 2 3 4 5 6
latitude <- fpafod$LATITUDE
head(latitude)
## [1] 40.03694 38.93306 38.98417 38.55917 38.55917 38.63528
longitude <- fpafod$LONGITUDE
head(longitude)
## [1] -121.0058 -120.4044 -120.7356 -119.9133 -119.9331 -120.1036
year <- fpafod$FIRE_YEAR
head(year)
## [1] 2005 2004 2004 2004 2004 2004
mon <- fpafod$startmon
head(mon)
## [1] 2 5 5 6 6 6
day <- fpafod$startday
head(day)
## [1] 2 12 31 28 28 30
daynum <- fpafod$DISCOVERY_DOY
head(daynum)
## [1] 33 133 152 180 180 182
area_ha <- fpafod$AREA_HA
head(area_ha)
## [1] 0.0404686 0.1011715 0.0404686 0.0404686 0.0404686 0.0404686
cause_original <- as.integer(fpafod$STAT_CAUSE_CODE)
head(cause_original)
## [1] 9 1 5 1 1 1
stateprov <- as.character(fpafod$STATE)
head(stateprov)
## [1] "CA" "CA" "CA" "CA" "CA" "CA"
agency <- as.character(fpafod$NWCG_REPORTING_AGENCY)
head(agency)
## [1] "FS" "FS" "FS" "FS" "FS" "FS"
# fill cause1 and cause2 with 0's (placeholders until the recoding step)
cause1 <- rep(0,nobs)
cause2 <- rep(0,nobs)
# make data frame
fpafod_out <- data.frame(datasource, sourceid, latitude, longitude, year, mon, day, daynum,
area_ha, cause_original, cause1, cause2, stateprov, agency)
summary(fpafod_out)
## datasource sourceid latitude longitude year mon
## fpafod:1727476 Min. : 1 Min. :17.94 Min. :-178.80 Min. :1992 Min. : 1.000
## 1st Qu.: 465673 1st Qu.:32.83 1st Qu.:-109.83 1st Qu.:1998 1st Qu.: 3.000
## Median : 985582 Median :35.40 Median : -91.18 Median :2003 Median : 6.000
## Mean : 32179232 Mean :36.79 Mean : -95.29 Mean :2003 Mean : 5.935
## 3rd Qu.: 1761114 3rd Qu.:40.77 3rd Qu.: -82.25 3rd Qu.:2008 3rd Qu.: 8.000
## Max. :201940182 Max. :70.14 Max. : -65.26 Max. :2013 Max. :12.000
##
## day daynum area_ha cause_original cause1 cause2
## Min. : 1.0 Min. : 1.0 Min. : 0.00 Min. : 1.000 Min. :0 Min. :0
## 1st Qu.: 8.0 1st Qu.: 89.0 1st Qu.: 0.04 1st Qu.: 3.000 1st Qu.:0 1st Qu.:0
## Median :15.0 Median :164.0 Median : 0.40 Median : 5.000 Median :0 Median :0
## Mean :15.5 Mean :164.9 Mean : 29.55 Mean : 5.921 Mean :0 Mean :0
## 3rd Qu.:23.0 3rd Qu.:230.0 3rd Qu.: 1.45 3rd Qu.: 9.000 3rd Qu.:0 3rd Qu.:0
## Max. :31.0 Max. :366.0 Max. :245622.14 Max. :13.000 Max. :0 Max. :0
##
## stateprov agency
## CA :173634 ST/C&L :1254551
## GA :162479 FS : 206731
## TX :125227 BIA : 108423
## NC :104263 BLM : 90801
## FL : 85576 IA : 21841
## SC : 78127 NPS : 19571
## (Other):998170 (Other): 25558
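The same 14-column layout is rebuilt for the NIFC data further on. A sketch, assuming each source's fields have already been extracted into vectors as above, of wrapping the assembly in one function so the column names and order cannot drift between sources. `make_common()` is a hypothetical name, and `stringsAsFactors = FALSE` is a choice made here (the original output shows factor columns).

```r
# Hypothetical helper (not from the original workflow): build the 14 common
# variables in a fixed order so every source yields identical columns.
make_common <- function(datasource, sourceid, latitude, longitude, year, mon,
                        day, daynum, area_ha, cause_original, stateprov, agency) {
  n <- length(latitude)
  data.frame(datasource = rep(datasource, n),
             sourceid = as.integer(sourceid),
             latitude = latitude, longitude = longitude,
             year = year, mon = mon, day = day, daynum = daynum,
             area_ha = area_ha,
             cause_original = cause_original,
             cause1 = rep(0, n),   # placeholders, filled in by the recoding step
             cause2 = rep(0, n),
             stateprov = as.character(stateprov),
             agency = as.character(agency),
             stringsAsFactors = FALSE)
}

# Usage with two made-up records:
toy_common <- make_common("fpafod", 1:2, c(40.04, 38.93), c(-121.01, -120.40),
                          c(2005, 2004), c(2, 5), c(2, 12), c(33, 133),
                          c(0.04, 0.10), c(9L, 1L), c("CA", "CA"), c("FS", "FS"))
str(toy_common)
```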
Recode cause_original into new causes (cause1, cause2)
# STAT_CAUSE_CODE
# 1 Lightning; 2 Equipment Use; 3 Smoking; 4 Campfire; 5 Debris Burning; 6 Railroad; 7 Arson; 8 Children;
# 9 Miscellaneous; 10 Fireworks; 11 Power Line; 12 Structure; 13 Missing/Undefined
# cause1
# 1 Lightning; 2 Human; 3 Unknown
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
fpafod_out$cause1[fpafod_out$cause_original == 1] <- 1
fpafod_out$cause1[fpafod_out$cause_original != 1 & fpafod_out$cause_original != 13] <- 2
fpafod_out$cause1[fpafod_out$cause_original == 13] <- 3
fpafod_out$cause2[fpafod_out$cause_original == 1] <- 1
fpafod_out$cause2[fpafod_out$cause_original == 2] <- 2
fpafod_out$cause2[fpafod_out$cause_original == 3] <- 3
fpafod_out$cause2[fpafod_out$cause_original == 4] <- 4
fpafod_out$cause2[fpafod_out$cause_original == 5] <- 5
fpafod_out$cause2[fpafod_out$cause_original == 6] <- 6
fpafod_out$cause2[fpafod_out$cause_original == 7] <- 5
fpafod_out$cause2[fpafod_out$cause_original == 8] <- 7
fpafod_out$cause2[fpafod_out$cause_original == 9] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 10] <-9
fpafod_out$cause2[fpafod_out$cause_original == 11] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 12] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 13] <- 10
Compare cause classifications
# STAT_CAUSE_CODE
# 1 Lightning; 2 Equipment Use; 3 Smoking; 4 Campfire; 5 Debris Burning; 6 Railroad; 7 Arson; 8 Children;
# 9 Miscellaneous; 10 Fireworks; 11 Power Line; 12 Structure; 13 Missing/Undefined
table(fpafod$STAT_CAUSE_CODE)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## 260311 137575 49633 68684 390702 32569 267987 58354 291678 10350 11315 3178 145140
# cause1
# 1 Lightning; 2 Human; 3 Unknown
table(fpafod_out$cause1)
##
## 1 2 3
## 260311 1322025 145140
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
table(fpafod_out$cause2)
##
## 1 2 3 4 5 6 7 8 9 10
## 260311 137575 49633 68684 658689 32569 58354 306171 10350 145140
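The thirteen cause2 assignments above amount to a lookup from STAT_CAUSE_CODE (1-13) to a cause2 class. A sketch of the same mapping as an indexed lookup vector; `fpafod_cause2_lookup` and `recode_fpafod_cause2()` are hypothetical names. Pushing the STAT_CAUSE_CODE counts printed above through the lookup reproduces the cause2 table, which is a cheap check that the vector matches the assignments.

```r
# Hypothetical lookup vector: element k is the cause2 class assigned above to
# STAT_CAUSE_CODE k (1-13).
fpafod_cause2_lookup <- c(1, 2, 3, 4, 5, 6, 5, 7, 8, 9, 8, 8, 10)

recode_fpafod_cause2 <- function(code) {
  out <- rep(NA_real_, length(code))
  ok <- !is.na(code) & code %in% 1:13   # codes outside 1-13 stay NA
  out[ok] <- fpafod_cause2_lookup[code[ok]]
  out
}

recode_fpafod_cause2(c(9, 1, 5, 7, 13))   # returns c(8, 1, 5, 5, 10)

# Check against the tables above: sum the STAT_CAUSE_CODE counts within
# each cause2 class.
stat_counts <- c(260311, 137575, 49633, 68684, 390702, 32569, 267987,
                 58354, 291678, 10350, 11315, 3178, 145140)
cause2_counts <- tapply(stat_counts, fpafod_cause2_lookup, sum)
print(cause2_counts)
```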
str(fpafod_out)
## 'data.frame': 1727476 obs. of 14 variables:
## $ datasource : Factor w/ 1 level "fpafod": 1 1 1 1 1 1 1 1 1 1 ...
## $ sourceid : int 1 2 3 4 5 6 7 8 9 10 ...
## $ latitude : num 40 38.9 39 38.6 38.6 ...
## $ longitude : num -121 -120 -121 -120 -120 ...
## $ year : int 2005 2004 2004 2004 2004 2004 2004 2005 2005 2004 ...
## $ mon : num 2 5 5 6 6 6 7 3 3 7 ...
## $ day : num 2 12 31 28 28 30 1 8 15 1 ...
## $ daynum : int 33 133 152 180 180 182 183 67 74 183 ...
## $ area_ha : num 0.0405 0.1012 0.0405 0.0405 0.0405 ...
## $ cause_original: int 9 1 5 1 1 1 1 5 5 1 ...
## $ cause1 : num 2 1 2 1 1 1 1 2 2 1 ...
## $ cause2 : num 8 1 5 1 1 1 1 5 5 1 ...
## $ stateprov : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ agency : Factor w/ 11 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
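`write.table()` uses a single space as its default field separator (`sep = " "`), so a file with a .csv extension is only comma-separated if `sep = ","` is passed explicitly; `write.csv()` fixes the separator to a comma. A small round-trip sketch through a temporary file, using a made-up data frame:

```r
# Toy data frame with made-up values.
toy <- data.frame(stateprov = c("CA", "OR"),
                  agency = c("ST/C&L", "Other Fed"),
                  area_ha = c(0.04, 0.40),
                  stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")

# write.csv() is write.table() with sep = "," and CSV-style quoting fixed.
write.csv(toy, path, row.names = FALSE)

header <- readLines(path, n = 1)   # first line holds the quoted column names
back <- read.csv(path, stringsAsFactors = FALSE)
unlink(path)

print(header)
print(back)
```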
Write out a .csv file
merged_data_path <- "e:/Projects/fire/DailyFireStarts/data/MergedData/"
outfilename <- "fpafod_1992-2013.csv"
write.table(fpafod_out, paste0(merged_data_path, outfilename), sep=",", row.names=FALSE)
load("e:/Projects/fire/DailyFireStarts/data/RData/nifc.RData")
str(nifc)
## 'data.frame': 405124 obs. of 22 variables:
## $ AGENCY_COD : Factor w/ 5 levels "BIA","BLM","FWS",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ UNIT_ID : Factor w/ 881 levels "1002","1003",..: 361 361 361 361 361 361 361 361 361 361 ...
## $ FIRE_TYPE : Factor w/ 27 levels "0","1","11","12",..: NA NA NA NA NA NA NA NA NA NA ...
## $ FIRE_NUMBE : Factor w/ 31568 levels "0","1","10","100",..: NA NA NA NA NA NA NA NA NA NA ...
## $ FIRE_NAME : Factor w/ 151091 levels "''67''","''67'' 2",..: NA NA NA NA NA NA NA NA NA NA ...
## $ STATE : Factor w/ 51 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
## $ DATE_DISCO : Factor w/ 8553 levels "19140716","19150424",..: 1241 1244 1247 1249 1252 1260 1264 1492 1497 1497 ...
## $ DATE_CONTR : Factor w/ 8577 levels "02031231","19800000",..: NA NA NA NA NA NA NA NA NA NA ...
## $ GENERAL_CA : Factor w/ 10 levels "0","1","2","3",..: 3 3 3 4 3 3 2 2 2 2 ...
## $ SPECIFIC_C : Factor w/ 33 levels "0","1","10","11",..: 33 33 33 3 33 32 2 2 2 2 ...
## $ YEAR_DISCO : int 1983 1983 1983 1983 1983 1983 1983 1984 1984 1984 ...
## $ LATITUDE : num 44.3 44.5 43.9 44 43.8 ...
## $ LONGITUDE : num -119 -119 -119 -119 -119 ...
## $ ACRES_CONT : num 0.1 1 0.1 1 0.1 1 0.1 0.1 0.1 0.1 ...
## $ SIZE_CLASS : Factor w/ 7 levels "A","B","C","D",..: 1 2 1 2 1 2 1 1 1 1 ...
## $ GEOGRAPHIC : Factor w/ 11 levels "Alaska","Eastern",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ STARTDATED : Date, format: "1983-10-05" "1983-10-08" "1983-10-11" ...
## $ YEAR : num 1983 1983 1983 1983 1983 ...
## $ startday : num 5 8 11 13 16 24 29 18 23 23 ...
## $ startmon : num 10 10 10 10 10 10 10 7 7 7 ...
## $ startdaynum: num 278 281 284 286 289 297 302 200 205 205 ...
## $ AREA : num 0.0405 0.4047 0.0405 0.4047 0.0405 ...
## - attr(*, "data_types")= chr "C" "C" "C" "C" ...
summary(nifc)
## AGENCY_COD UNIT_ID FIRE_TYPE FIRE_NUMBE FIRE_NAME STATE
## BIA : 91480 304 : 8420 11 :125642 1 : 1872 FA 1 : 550 CA : 63643
## BLM : 88307 H50H58 : 6958 13 : 14425 2 : 1696 FA 2 : 512 AZ : 55535
## FWS : 0 H50H52 : 6385 16 : 14118 3 : 1556 FA 3 : 473 OR : 38566
## NPS : 26809 AKAFS : 5910 51 : 10310 4 : 1457 FA 4 : 426 ID : 29029
## USFS:198437 312 : 5387 15 : 9658 5 : 1378 FA 5 : 412 MT : 28450
## NA's: 91 (Other):371973 (Other): 32443 (Other):198637 (Other):286270 (Other):189810
## NA's : 91 NA's :198528 NA's :198528 NA's :116481 NA's : 91
## DATE_DISCO DATE_CONTR GENERAL_CA SPECIFIC_C YEAR_DISCO LATITUDE
## 19890726: 1078 19940000: 1554 1 :182663 30 :144501 Min. :1981 Min. :19.27
## 19860810: 861 19920000: 1422 9 : 44554 1 : 76258 1st Qu.:1989 1st Qu.:35.31
## 19870830: 852 19930000: 1396 5 : 42126 0 : 72892 Median :1994 Median :39.89
## 19890808: 698 19960000: 1393 4 : 33951 19 : 15034 Mean :1994 Mean :40.26
## 19940723: 663 19900000: 1307 2 : 31932 27 : 11503 3rd Qu.:1999 3rd Qu.:44.45
## (Other) :400881 (Other) :397184 (Other): 69807 (Other): 84845 Max. :2003 Max. :69.85
## NA's : 91 NA's : 868 NA's : 91 NA's : 91 NA's :91 NA's :91
## LONGITUDE ACRES_CONT SIZE_CLASS GEOGRAPHIC STARTDATED
## Min. :-176.67 Min. : 0.0 A :217738 Southwest : 78712 Min. :1981-01-01
## 1st Qu.:-118.45 1st Qu.: 0.1 B :130527 Northwest : 51789 1st Qu.:1989-07-26
## Median :-112.00 Median : 0.2 C : 35503 Northern Rockies : 47645 Median :1994-07-22
## Mean :-110.62 Mean : 178.5 D : 8481 Rocky Mountain : 41947 Mean :1994-05-03
## 3rd Qu.:-106.68 3rd Qu.: 2.0 E : 6193 Eastern Great Basin: 41205 3rd Qu.:1999-07-09
## Max. : -67.06 Max. :606945.0 (Other): 6591 (Other) :143735 Max. :2003-12-31
## NA's :91 NA's :91 NA's : 91 NA's : 91 NA's :188
## YEAR startday startmon startdaynum AREA
## Min. :1981 Min. : 1.00 Min. : 1.000 Min. : 1.0 Min. : 0.00
## 1st Qu.:1989 1st Qu.: 8.00 1st Qu.: 6.000 1st Qu.:159.0 1st Qu.: 0.04
## Median :1994 Median :16.00 Median : 7.000 Median :201.0 Median : 0.08
## Mean :1994 Mean :15.73 Mean : 6.851 Mean :192.9 Mean : 72.23
## 3rd Qu.:1999 3rd Qu.:24.00 3rd Qu.: 8.000 3rd Qu.:231.0 3rd Qu.: 0.81
## Max. :2003 Max. :31.00 Max. :12.000 Max. :366.0 Max. :245622.14
## NA's :188 NA's :188 NA's :188 NA's :188 NA's :91
Remove the 91 records that are entirely missing (identified by a missing LATITUDE)
nifc <- nifc[!is.na(nifc$LATITUDE), ]
summary(nifc)
## AGENCY_COD UNIT_ID FIRE_TYPE FIRE_NUMBE FIRE_NAME STATE
## BIA : 91480 304 : 8420 11 :125642 1 : 1872 FA 1 : 550 CA : 63643
## BLM : 88307 H50H58 : 6958 13 : 14425 2 : 1696 FA 2 : 512 AZ : 55535
## FWS : 0 H50H52 : 6385 16 : 14118 3 : 1556 FA 3 : 473 OR : 38566
## NPS : 26809 AKAFS : 5910 51 : 10310 4 : 1457 FA 4 : 426 ID : 29029
## USFS:198437 312 : 5387 15 : 9658 5 : 1378 FA 5 : 412 MT : 28450
## F50F52 : 5282 (Other): 32443 (Other):198637 (Other):286270 NM : 22249
## (Other):366691 NA's :198437 NA's :198437 NA's :116390 (Other):167561
## DATE_DISCO DATE_CONTR GENERAL_CA SPECIFIC_C YEAR_DISCO LATITUDE
## 19890726: 1078 19940000: 1554 1 :182663 30 :144501 Min. :1981 Min. :19.27
## 19860810: 861 19920000: 1422 9 : 44554 1 : 76258 1st Qu.:1989 1st Qu.:35.31
## 19870830: 852 19930000: 1396 5 : 42126 0 : 72892 Median :1994 Median :39.89
## 19890808: 698 19960000: 1393 4 : 33951 19 : 15034 Mean :1994 Mean :40.26
## 19940723: 663 19900000: 1307 2 : 31932 27 : 11503 3rd Qu.:1999 3rd Qu.:44.45
## 19960813: 653 (Other) :397184 0 : 22246 8 : 11138 Max. :2003 Max. :69.85
## (Other) :400228 NA's : 777 (Other): 47561 (Other): 73707
## LONGITUDE ACRES_CONT SIZE_CLASS GEOGRAPHIC STARTDATED
## Min. :-176.67 Min. : 0.0 A:217738 Southwest : 78712 Min. :1981-01-01
## 1st Qu.:-118.45 1st Qu.: 0.1 B:130527 Northwest : 51789 1st Qu.:1989-07-26
## Median :-112.00 Median : 0.2 C: 35503 Northern Rockies : 47645 Median :1994-07-22
## Mean :-110.62 Mean : 178.5 D: 8481 Rocky Mountain : 41947 Mean :1994-05-03
## 3rd Qu.:-106.68 3rd Qu.: 2.0 E: 6193 Eastern Great Basin: 41205 3rd Qu.:1999-07-09
## Max. : -67.06 Max. :606945.0 F: 4336 Southern : 35210 Max. :2003-12-31
## G: 2255 (Other) :108525 NA's :97
## YEAR startday startmon startdaynum AREA
## Min. :1981 Min. : 1.00 Min. : 1.000 Min. : 1.0 Min. : 0.00
## 1st Qu.:1989 1st Qu.: 8.00 1st Qu.: 6.000 1st Qu.:159.0 1st Qu.: 0.04
## Median :1994 Median :16.00 Median : 7.000 Median :201.0 Median : 0.08
## Mean :1994 Mean :15.73 Mean : 6.851 Mean :192.9 Mean : 72.23
## 3rd Qu.:1999 3rd Qu.:24.00 3rd Qu.: 8.000 3rd Qu.:231.0 3rd Qu.: 0.81
## Max. :2003 Max. :31.00 Max. :12.000 Max. :366.0 Max. :245622.14
## NA's :97 NA's :97 NA's :97 NA's :97
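Filtering on a missing LATITUDE is used above as a proxy for "the whole record is missing". A sketch of a direct check, assuming a helper named `all_missing_rows()` (hypothetical): applied to `nifc` before filtering, `sum(all_missing_rows(nifc))` could be compared with the 91 missing LATITUDE values. A row that lacks only LATITUDE would be dropped by the proxy but not flagged by this check.

```r
# Hypothetical helper: TRUE for rows in which every column is NA.
all_missing_rows <- function(df) {
  unname(rowSums(!is.na(df)) == 0)
}

# Made-up example: row 2 is entirely missing; row 3 lacks only LATITUDE.
toy_nifc <- data.frame(LATITUDE = c(44.3, NA, NA),
                       STATE = c("OR", NA, "CA"),
                       stringsAsFactors = FALSE)

print(all_missing_rows(toy_nifc))   # FALSE TRUE FALSE
print(is.na(toy_nifc$LATITUDE))     # FALSE TRUE TRUE
```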
Create new (common) variables
nobs <- nrow(nifc)
# direct copy
datasource <- rep("nifc",nobs)
head(datasource)
## [1] "nifc" "nifc" "nifc" "nifc" "nifc" "nifc"
# no single record ID is carried over from NIFC, so number the records sequentially
sourceid <- seq_len(nobs)
head(sourceid)
## [1] 1 2 3 4 5 6
latitude <- nifc$LATITUDE
head(latitude)
## [1] 44.28000 44.55000 43.87667 44.03333 43.79333 44.28333
longitude <- nifc$LONGITUDE
head(longitude)
## [1] -118.9167 -118.9167 -119.3683 -118.7467 -118.9700 -118.9217
year <- nifc$YEAR
head(year)
## [1] 1983 1983 1983 1983 1983 1983
mon <- nifc$startmon
head(mon)
## [1] 10 10 10 10 10 10
day <- nifc$startday
head(day)
## [1] 5 8 11 13 16 24
daynum <- nifc$startdaynum
head(daynum)
## [1] 278 281 284 286 289 297
area_ha <- nifc$AREA
head(area_ha)
## [1] 0.0404686 0.4046860 0.0404686 0.4046860 0.0404686 0.4046860
# note: GENERAL_CA is a factor with levels "0"-"9", so as.numeric() returns the
# level index (1-10), i.e. the GENERAL_CA code plus one, not the code itself
cause_original <- as.numeric(nifc$GENERAL_CA)
head(cause_original)
## [1] 3 3 3 4 3 3
stateprov <- as.character(nifc$STATE)
head(stateprov)
## [1] "OR" "OR" "OR" "OR" "OR" "OR"
agency <- as.character(nifc$AGENCY_COD)
head(agency)
## [1] "USFS" "USFS" "USFS" "USFS" "USFS" "USFS"
# fill cause1 and cause2 with 0's (placeholders until the recoding step)
cause1 <- rep(0,nobs)
cause2 <- rep(0,nobs)
# make data frame
nifc_out <- data.frame(datasource, sourceid, latitude, longitude, year, mon, day, daynum,
area_ha, cause_original, cause1, cause2, stateprov, agency)
summary(nifc_out)
## datasource sourceid latitude longitude year mon
## nifc:405033 Min. : 1 Min. :19.27 Min. :-176.67 Min. :1981 Min. : 1.000
## 1st Qu.:101259 1st Qu.:35.31 1st Qu.:-118.45 1st Qu.:1989 1st Qu.: 6.000
## Median :202517 Median :39.89 Median :-112.00 Median :1994 Median : 7.000
## Mean :202517 Mean :40.26 Mean :-110.62 Mean :1994 Mean : 6.851
## 3rd Qu.:303775 3rd Qu.:44.45 3rd Qu.:-106.68 3rd Qu.:1999 3rd Qu.: 8.000
## Max. :405033 Max. :69.85 Max. : -67.06 Max. :2003 Max. :12.000
## NA's :97 NA's :97
## day daynum area_ha cause_original cause1 cause2
## Min. : 1.00 Min. : 1.0 Min. : 0.00 Min. : 1.000 Min. :0 Min. :0
## 1st Qu.: 8.00 1st Qu.:159.0 1st Qu.: 0.04 1st Qu.: 2.000 1st Qu.:0 1st Qu.:0
## Median :16.00 Median :201.0 Median : 0.08 Median : 2.000 Median :0 Median :0
## Mean :15.73 Mean :192.9 Mean : 72.23 Mean : 4.164 Mean :0 Mean :0
## 3rd Qu.:24.00 3rd Qu.:231.0 3rd Qu.: 0.81 3rd Qu.: 6.000 3rd Qu.:0 3rd Qu.:0
## Max. :31.00 Max. :366.0 Max. :245622.14 Max. :10.000 Max. :0 Max. :0
## NA's :97 NA's :97
## stateprov agency
## CA : 63643 BIA : 91480
## AZ : 55535 BLM : 88307
## OR : 38566 NPS : 26809
## ID : 29029 USFS:198437
## MT : 28450
## NM : 22249
## (Other):167561
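The summary shows cause_original running from 1 to 10 rather than 0 to 9. GENERAL_CA is a factor, and `as.numeric()` on a factor returns the internal level index, not the label, so these values are the GENERAL_CA codes shifted by one. A minimal demonstration on a made-up factor with the same levels, showing two ways to recover the actual codes:

```r
# Made-up factor with the same levels as GENERAL_CA ("0" through "9").
ca <- factor(c("2", "2", "3", "1", "0"), levels = as.character(0:9))

idx   <- as.numeric(ca)                # level positions: 3 3 4 2 1
codes <- as.integer(as.character(ca))  # cause codes:     2 2 3 1 0
codes_via_levels <- as.integer(levels(ca))[ca]  # same codes, converting only the levels

print(idx)
print(codes)
stopifnot(identical(codes, codes_via_levels))
```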
Recode cause_original into new causes (cause1, cause2)
# GENERAL_CA
# 1 Natural; 2 Campfire; 3 Smoking; 4 Fire use; 5 Incendiary; 6 Equipment
# 7 Railroads; 8 Juveniles; 9 Miscellaneous; 0 Unknown
# cause1
# 1 Lightning; 2 Human; 3 Unknown
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
# GENERAL_CA is a factor with levels "0"-"9", so as.numeric() above returned the
# level index (1-10) rather than the cause code; recover the actual 0-9 codes
# before recoding
nifc_out$cause_original <- as.integer(as.character(nifc$GENERAL_CA))
nifc_out$cause1[nifc_out$cause_original == 1] <- 1
nifc_out$cause1[nifc_out$cause_original != 1 & nifc_out$cause_original != 0] <- 2
nifc_out$cause1[nifc_out$cause_original == 0] <- 3
nifc_out$cause2[nifc_out$cause_original == 1] <- 1
nifc_out$cause2[nifc_out$cause_original == 2] <- 4
nifc_out$cause2[nifc_out$cause_original == 3] <- 3
nifc_out$cause2[nifc_out$cause_original == 4] <- 5
nifc_out$cause2[nifc_out$cause_original == 5] <- 9
nifc_out$cause2[nifc_out$cause_original == 6] <- 2
nifc_out$cause2[nifc_out$cause_original == 7] <- 6
nifc_out$cause2[nifc_out$cause_original == 8] <- 7
nifc_out$cause2[nifc_out$cause_original == 9] <- 8
nifc_out$cause2[nifc_out$cause_original == 0] <- 10
Compare cause classifications
# GENERAL_CA
# 1 Natural; 2 Campfire; 3 Smoking; 4 Fire use; 5 Incendiary; 6 Equipment
# 7 Railroads; 8 Juveniles; 9 Miscellaneous; 0 Unknown
table(nifc$GENERAL_CA)
##
## 0 1 2 3 4 5 6 7 8 9
## 22246 182663 31932 11672 33951 42126 15596 3357 16936 44554
table(nifc_out$cause_original)
##
## 0 1 2 3 4 5 6 7 8 9
## 22246 182663 31932 11672 33951 42126 15596 3357 16936 44554
# cause1
# 1 Lightning; 2 Human; 3 Unknown
table(nifc_out$cause1)
##
## 1 2 3
## 182663 200124 22246
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
table(nifc_out$cause2)
##
## 1 2 3 4 5 6 7 8 9 10
## 182663 15596 11672 31932 33951 3357 16936 44554 42126 22246
str(nifc_out)
## 'data.frame': 405033 obs. of 14 variables:
## $ datasource : Factor w/ 1 level "nifc": 1 1 1 1 1 1 1 1 1 1 ...
## $ sourceid : int 1 2 3 4 5 6 7 8 9 10 ...
## $ latitude : num 44.3 44.5 43.9 44 43.8 ...
## $ longitude : num -119 -119 -119 -119 -119 ...
## $ year : num 1983 1983 1983 1983 1983 ...
## $ mon : num 10 10 10 10 10 10 10 7 7 7 ...
## $ day : num 5 8 11 13 16 24 29 18 23 23 ...
## $ daynum : num 278 281 284 286 289 297 302 200 205 205 ...
## $ area_ha : num 0.0405 0.4047 0.0405 0.4047 0.0405 ...
## $ cause_original: int 2 2 2 3 2 2 1 1 1 1 ...
## $ cause1 : num 2 2 2 2 2 2 1 1 1 1 ...
## $ cause2 : num 4 4 4 3 4 4 1 1 1 1 ...
## $ stateprov : Factor w/ 50 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
## $ agency : Factor w/ 4 levels "BIA","BLM","NPS",..: 4 4 4 4 4 4 4 4 4 4 ...
# Recode area_ha NA's to 0.0
nifc_out$area_ha[is.na(nifc_out$area_ha)] <- 0
summary(nifc_out$area_ha)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.04 0.08 72.23 0.81 245622.14
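For either data set, a recoding is only well defined if every original code lands in exactly one new class (several codes may share a class). A sketch of a cross-tabulation check; `ambiguous_codes()` is a hypothetical helper, and calling it as `ambiguous_codes(nifc_out$cause_original, nifc_out$cause2)` should return an empty result when the recoding is consistent.

```r
# Hypothetical check: return the original codes that were recoded into more
# than one new class (an empty result means the recoding is a function).
ambiguous_codes <- function(original, recoded) {
  tab <- table(original, recoded)        # cross-tabulate old codes vs new classes
  classes_per_code <- rowSums(tab > 0)   # number of distinct new classes per code
  names(classes_per_code)[classes_per_code != 1]
}

# Made-up example in which each code is sent to a single class.
orig <- c(0, 1, 1, 2, 5, 9)
new  <- c(10, 1, 1, 4, 9, 8)
print(ambiguous_codes(orig, new))               # character(0)

# A deliberately inconsistent recoding: code 1 goes to two classes.
print(ambiguous_codes(c(1, 1, 3), c(1, 2, 5)))  # "1"
```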
Write out a .csv file
outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/nifc_1981-2003.csv"
write.table(nifc_out, outfilename, sep=",", row.names=FALSE)
Select just the records from 1986 through 1991, the years before the FPA-FOD record begins in 1992
nifc2_out <- subset(nifc_out, year >= 1986 & year <= 1991)
nrow(nifc2_out)
## [1] 121576
table(nifc2_out$year)
##
## 1986 1987 1988 1989 1990 1991
## 17760 21873 21950 19857 20606 19530
sum(table(nifc2_out$year))
## [1] 121576
summary(nifc2_out)
## datasource sourceid latitude longitude year mon
## nifc:121576 Min. : 22 Min. :19.30 Min. :-176.67 Min. :1986 Min. : 1.000
## 1st Qu.: 32583 1st Qu.:35.42 1st Qu.:-119.09 1st Qu.:1987 1st Qu.: 6.000
## Median : 63529 Median :39.83 Median :-112.80 Median :1988 Median : 7.000
## Mean :144894 Mean :40.26 Mean :-110.85 Mean :1989 Mean : 6.928
## 3rd Qu.:269221 3rd Qu.:44.47 3rd Qu.:-106.72 3rd Qu.:1990 3rd Qu.: 8.000
## Max. :405004 Max. :69.63 Max. : -68.22 Max. :1991 Max. :12.000
##
## day daynum area_ha cause_original cause1 cause2
## Min. : 1.0 Min. : 1.0 Min. : 0.00 Min. :0.000 Min. :1.000 Min. :0.0
## 1st Qu.: 8.0 1st Qu.:165.0 1st Qu.: 0.04 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:3.0
## Median :15.0 Median :203.0 Median : 0.08 Median :1.000 Median :2.000 Median :4.0
## Mean :15.8 Mean :195.2 Mean : 70.54 Mean :2.987 Mean :1.956 Mean :3.9
## 3rd Qu.:24.0 3rd Qu.:232.0 3rd Qu.: 0.81 3rd Qu.:5.000 3rd Qu.:2.000 3rd Qu.:4.0
## Max. :31.0 Max. :365.0 Max. :219028.61 Max. :9.000 Max. :2.000 Max. :9.0
##
## stateprov agency
## CA :23342 BIA :20448
## AZ :14770 BLM :21262
## OR :12066 NPS : 8479
## ID : 9392 USFS:71387
## MT : 8006
## NM : 5744
## (Other):48256
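`subset()` treats rows whose condition evaluates to NA as not selected, so the 97 NIFC records with a missing year are excluded by the 1986-1991 filter without any warning. Plain logical indexing with `[` behaves differently and returns an all-NA row for each NA in the condition. A toy illustration with made-up years:

```r
# Made-up years, including one missing value.
toy_years <- data.frame(year = c(1985, 1986, NA, 1991, 1992), id = 1:5)

# subset(): rows where the condition is NA are treated as FALSE and dropped.
kept <- subset(toy_years, year >= 1986 & year <= 1991)
print(kept$id)                 # 2 4

# Logical indexing with [ returns an all-NA row for the NA condition.
bracket <- toy_years[toy_years$year >= 1986 & toy_years$year <= 1991, ]
print(nrow(bracket))           # 3 (one row is entirely NA)

# An explicit, NA-safe equivalent of the subset() call:
explicit <- toy_years[!is.na(toy_years$year) &
                        toy_years$year >= 1986 & toy_years$year <= 1991, ]
stopifnot(identical(explicit$id, kept$id))
```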
Write out a .csv file
outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/nifc_1986-1991.csv"
write.table(nifc2_out, outfilename, sep=",", row.names=FALSE)
Merge the FPA-FOD (1992-2013) and NIFC (1986-1991) records
us_merged <- rbind(fpafod_out, nifc2_out)
table_us_year <- table(us_merged$year)
table_us_year
##
## 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
## 17760 21873 21950 19857 20606 19530 67964 62022 75989 71496 75604 61472 68388 89398 96454
## 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
## 86069 75136 67380 68616 87391 113242 94681 84654 77262 78485 89897 71768 64108
sum(table_us_year)
## [1] 1849052
summary(us_merged)
## datasource sourceid latitude longitude year mon
## fpafod:1727476 Min. : 1 Min. :17.94 Min. :-178.80 Min. :1986 Min. : 1
## nifc : 121576 1st Qu.: 359278 1st Qu.:32.99 1st Qu.:-111.48 1st Qu.:1996 1st Qu.: 4
## Median : 908962 Median :35.62 Median : -92.91 Median :2002 Median : 6
## Mean : 30072960 Mean :37.01 Mean : -96.31 Mean :2002 Mean : 6
## 3rd Qu.: 1663336 3rd Qu.:41.05 3rd Qu.: -82.54 3rd Qu.:2008 3rd Qu.: 8
## Max. :201940182 Max. :70.14 Max. : -65.26 Max. :2013 Max. :12
##
## day daynum area_ha cause_original cause1 cause2
## Min. : 1.00 Min. : 1.0 Min. : 0.00 Min. : 0.000 Min. :1.000 Min. : 0.000
## 1st Qu.: 8.00 1st Qu.: 92.0 1st Qu.: 0.04 1st Qu.: 2.000 1st Qu.:2.000 1st Qu.: 3.000
## Median :15.00 Median :170.0 Median : 0.40 Median : 5.000 Median :2.000 Median : 5.000
## Mean :15.52 Mean :166.9 Mean : 32.24 Mean : 5.728 Mean :1.935 Mean : 5.043
## 3rd Qu.:23.00 3rd Qu.:231.0 3rd Qu.: 1.29 3rd Qu.: 9.000 3rd Qu.:2.000 3rd Qu.: 8.000
## Max. :31.00 Max. :366.0 Max. :245622.14 Max. :13.000 Max. :3.000 Max. :10.000
##
## stateprov agency
## CA : 196976 ST/C&L :1254551
## GA : 163298 FS : 206731
## TX : 126100 BIA : 128871
## NC : 105113 BLM : 112063
## FL : 87589 USFS : 71387
## AZ : 79969 NPS : 28050
## (Other):1090007 (Other): 47399
str(us_merged)
## 'data.frame': 1849052 obs. of 14 variables:
## $ datasource : Factor w/ 2 levels "fpafod","nifc": 1 1 1 1 1 1 1 1 1 1 ...
## $ sourceid : int 1 2 3 4 5 6 7 8 9 10 ...
## $ latitude : num 40 38.9 39 38.6 38.6 ...
## $ longitude : num -121 -120 -121 -120 -120 ...
## $ year : num 2005 2004 2004 2004 2004 ...
## $ mon : num 2 5 5 6 6 6 7 3 3 7 ...
## $ day : num 2 12 31 28 28 30 1 8 15 1 ...
## $ daynum : num 33 133 152 180 180 182 183 67 74 183 ...
## $ area_ha : num 0.0405 0.1012 0.0405 0.0405 0.0405 ...
## $ cause_original: num 9 1 5 1 1 1 1 5 5 1 ...
## $ cause1 : num 2 1 2 1 1 1 1 2 2 1 ...
## $ cause2 : num 8 1 5 1 1 1 1 5 5 1 ...
## $ stateprov : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ agency : Factor w/ 12 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
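`rbind()` on data frames matches columns by name, not by position, and stops with an error if the names differ, so the two pieces only need the same set of column names. A sketch of a pre-merge check, with `check_mergeable()` as a hypothetical helper; for the actual merge the row counts add up as expected, since 1,727,476 FPA-FOD records plus 121,576 NIFC records give 1,849,052.

```r
# Hypothetical pre-merge check (not from the original workflow).
check_mergeable <- function(a, b) {
  # rbind() matches data-frame columns by name, so the name sets must agree.
  stopifnot(setequal(names(a), names(b)))
  merged <- rbind(a, b)
  stopifnot(nrow(merged) == nrow(a) + nrow(b))
  merged
}

# Made-up pieces with the same columns in a different order.
a <- data.frame(datasource = "fpafod", year = 2005, cause2 = 8,
                stringsAsFactors = FALSE)
b <- data.frame(year = 1986, datasource = "nifc", cause2 = 1,
                stringsAsFactors = FALSE)
m <- check_mergeable(a, b)
print(m)   # columns follow the order of the first data frame
```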
outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/us_1986-2013.csv"
write.table(us_merged, outfilename, sep=",", row.names=FALSE)
save(us_merged, file="e:/Projects/fire/DailyFireStarts/data/RData/us_1986-2013.RData")