1 Merging the FPA-FOD and NIFC data

In this analysis, we combine the U.S. FPA-FOD and NIFC data sets to create a merged data set spanning 1986-2013. Two issues arise in doing so: 1) creating a common set of variables (which will also be used in merging the U.S. and CNFDB (Canadian) data sets), and 2) reconciling the different fire-start “cause” codes in the two data sets.

2 Cause codes

The two data sets use different systems for identifying the causes of individual fire starts. The FPA-FOD data include 13 cause categories (in the numeric variable STAT_CAUSE_CODE), while the NIFC data contain 10 (in the factor variable GENERAL_CA). The categories do not correspond one-to-one, but they are close. We therefore created two new cause categorizations: cause1, a coarse-resolution categorization (lightning/natural, human, and unknown), and cause2, a finer 10-category list of causes:

The specific remappings of the original causes to the merged categories (cause1 and cause2) are shown in the code below and can also be inferred from the table. This system also accommodates the CNFDB data, which contain only a coarse-resolution categorization of causes.
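The FPA-FOD remapping can also be written as lookup vectors indexed by the original code. This puts each mapping on one line, which makes it easier to check against the table. The sketch below assumes STAT_CAUSE_CODE is stored as an integer 1 to 13. The names fpafod_to_cause1, fpafod_to_cause2 and recode_fpafod_cause() are hypothetical and not part of the original workflow. The vectors reproduce the assignments made in the FPA-FOD recoding code in Section 3.1.3.

```r
# Hypothetical lookup vectors: position i holds the new value for STAT_CAUSE_CODE i
# cause2: 1 Lightning/Natural ... 10 Unknown (see the cause2 legend in 3.1.3)
fpafod_to_cause2 <- c(1, 2, 3, 4, 5, 6, 5, 7, 8, 9, 8, 8, 10)

# cause1: 1 Lightning (code 1), 3 Unknown (code 13), 2 Human (all others)
fpafod_to_cause1 <- c(1, rep(2, 11), 3)

recode_fpafod_cause <- function(code) {
  # Guard against codes outside the documented 1-13 range
  stopifnot(all(code %in% 1:13))
  data.frame(cause_original = code,
             cause1 = fpafod_to_cause1[code],
             cause2 = fpafod_to_cause2[code])
}

# Codes 9 (Miscellaneous), 1 (Lightning), 5 (Debris Burning), 7 (Arson), 13 (Missing)
res <- recode_fpafod_cause(c(9, 1, 5, 7, 13))
res
```

Applied to the full data, `fpafod_to_cause2[fpafod_out$cause_original]` would replace the thirteen separate assignment lines with one vectorized lookup.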

3 Data set preparation

Creating the merged data set involves two main steps: 1) creating a data frame containing the set of common variables to be written out, and 2) recoding the original cause codes into the cause1 and cause2 variables.
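The first of these steps amounts to copying source columns into a fixed set of common variable names. It could be generalized into a small helper.

This is a minimal sketch. make_common() and its colmap argument (a named character vector of common names mapped to source column names) are hypothetical, and the toy data frame only mimics a few FPA-FOD columns.

```r
# Hypothetical helper: copy source columns into common variable names.
# colmap: names = common variable names, values = source column names.
make_common <- function(src, source_name, colmap) {
  out <- data.frame(datasource = rep(source_name, nrow(src)),
                    stringsAsFactors = FALSE)
  for (v in names(colmap)) {
    out[[v]] <- src[[colmap[[v]]]]
  }
  # Placeholders, filled in later by the cause-recoding step
  out$cause1 <- 0
  out$cause2 <- 0
  out
}

# Toy stand-in for a few FPA-FOD columns
toy <- data.frame(FOD_ID = 1:3, LATITUDE = c(40.0, 38.9, 39.0),
                  FIRE_YEAR = c(2005L, 2004L, 2004L),
                  STAT_CAUSE_CODE = c(9, 1, 5))
res <- make_common(toy, "fpafod",
                   c(sourceid = "FOD_ID", latitude = "LATITUDE",
                     year = "FIRE_YEAR", cause_original = "STAT_CAUSE_CODE"))
res
```

Because the same helper would be called with a different colmap for each source, the FPA-FOD and NIFC outputs would share column names by construction.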

Load the cleaned-up data. (These are the “working” .RData data sets that were created in an earlier step.)

load("e:/Projects/fire/DailyFireStarts/data/RData/fpafod.RData")
load("e:/Projects/fire/DailyFireStarts/data/RData/nifc.RData")

List the variables in the two different data sets.

str(fpafod)
## 'data.frame':    1727476 obs. of  15 variables:
##  $ FOD_ID               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ NWCG_REPORTING_AGENCY: Factor w/ 11 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ FIRE_YEAR            : int  2005 2004 2004 2004 2004 2004 2004 2005 2005 2004 ...
##  $ DISCOVERY_DATE       : Date, format: "2005-02-02" "2004-05-12" "2004-05-31" ...
##  $ DISCOVERY_DOY        : int  33 133 152 180 180 182 183 67 74 183 ...
##  $ STAT_CAUSE_CODE      : num  9 1 5 1 1 1 1 5 5 1 ...
##  $ CONT_DATE            : POSIXct, format: "2005-02-02" "2004-05-12" "2004-05-31" ...
##  $ CONT_DOY             : int  33 133 152 185 185 183 184 67 74 184 ...
##  $ FIRE_SIZE            : num  0.1 0.25 0.1 0.1 0.1 0.1 0.1 0.8 1 0.1 ...
##  $ LATITUDE             : num  40 38.9 39 38.6 38.6 ...
##  $ LONGITUDE            : num  -121 -120 -121 -120 -120 ...
##  $ STATE                : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ startday             : num  2 12 31 28 28 30 1 8 15 1 ...
##  $ startmon             : num  2 5 5 6 6 6 7 3 3 7 ...
##  $ AREA_HA              : num  0.0405 0.1012 0.0405 0.0405 0.0405 ...
str(nifc)
## 'data.frame':    405124 obs. of  22 variables:
##  $ AGENCY_COD : Factor w/ 5 levels "BIA","BLM","FWS",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ UNIT_ID    : Factor w/ 881 levels "1002","1003",..: 361 361 361 361 361 361 361 361 361 361 ...
##  $ FIRE_TYPE  : Factor w/ 27 levels "0","1","11","12",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ FIRE_NUMBE : Factor w/ 31568 levels "0","1","10","100",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ FIRE_NAME  : Factor w/ 151091 levels "''67''","''67'' 2",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ STATE      : Factor w/ 51 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
##  $ DATE_DISCO : Factor w/ 8553 levels "19140716","19150424",..: 1241 1244 1247 1249 1252 1260 1264 1492 1497 1497 ...
##  $ DATE_CONTR : Factor w/ 8577 levels "02031231","19800000",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ GENERAL_CA : Factor w/ 10 levels "0","1","2","3",..: 3 3 3 4 3 3 2 2 2 2 ...
##  $ SPECIFIC_C : Factor w/ 33 levels "0","1","10","11",..: 33 33 33 3 33 32 2 2 2 2 ...
##  $ YEAR_DISCO : int  1983 1983 1983 1983 1983 1983 1983 1984 1984 1984 ...
##  $ LATITUDE   : num  44.3 44.5 43.9 44 43.8 ...
##  $ LONGITUDE  : num  -119 -119 -119 -119 -119 ...
##  $ ACRES_CONT : num  0.1 1 0.1 1 0.1 1 0.1 0.1 0.1 0.1 ...
##  $ SIZE_CLASS : Factor w/ 7 levels "A","B","C","D",..: 1 2 1 2 1 2 1 1 1 1 ...
##  $ GEOGRAPHIC : Factor w/ 11 levels "Alaska","Eastern",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ STARTDATED : Date, format: "1983-10-05" "1983-10-08" "1983-10-11" ...
##  $ YEAR       : num  1983 1983 1983 1983 1983 ...
##  $ startday   : num  5 8 11 13 16 24 29 18 23 23 ...
##  $ startmon   : num  10 10 10 10 10 10 10 7 7 7 ...
##  $ startdaynum: num  278 281 284 286 289 297 302 200 205 205 ...
##  $ AREA       : num  0.0405 0.4047 0.0405 0.4047 0.0405 ...
##  - attr(*, "data_types")= chr  "C" "C" "C" "C" ...

3.1 FPA-FOD

3.1.1 FPA-FOD common variables

nobs <- nrow(fpafod)

# direct copy
datasource <- rep("fpafod",nobs)
head(datasource)
## [1] "fpafod" "fpafod" "fpafod" "fpafod" "fpafod" "fpafod"
sourceid <- as.integer(fpafod$FOD_ID)
head(sourceid)
## [1] 1 2 3 4 5 6
latitude <- fpafod$LATITUDE
head(latitude)
## [1] 40.03694 38.93306 38.98417 38.55917 38.55917 38.63528
longitude <- fpafod$LONGITUDE
head(longitude)
## [1] -121.0058 -120.4044 -120.7356 -119.9133 -119.9331 -120.1036
year <- fpafod$FIRE_YEAR
head(year)
## [1] 2005 2004 2004 2004 2004 2004
mon <- fpafod$startmon
head(mon)
## [1] 2 5 5 6 6 6
day <- fpafod$startday
head(day)
## [1]  2 12 31 28 28 30
daynum <- fpafod$DISCOVERY_DOY
head(daynum)
## [1]  33 133 152 180 180 182
area_ha <- fpafod$AREA_HA
head(area_ha)
## [1] 0.0404686 0.1011715 0.0404686 0.0404686 0.0404686 0.0404686
cause_original <- as.integer(fpafod$STAT_CAUSE_CODE)
head(cause_original)
## [1] 9 1 5 1 1 1
stateprov <- as.character(fpafod$STATE)
head(stateprov)
## [1] "CA" "CA" "CA" "CA" "CA" "CA"
agency <- as.character(fpafod$NWCG_REPORTING_AGENCY)
head(agency)
## [1] "FS" "FS" "FS" "FS" "FS" "FS"
# fill cause1 and cause2 with 0's
cause1 <- rep(0,nobs)
cause2 <- rep(0,nobs)

3.1.2 FPA-FOD data frame

# make data frame
fpafod_out <- data.frame(datasource, sourceid, latitude, longitude, year, mon, day, daynum,
  area_ha, cause_original, cause1, cause2, stateprov, agency)
summary(fpafod_out)
##   datasource         sourceid            latitude       longitude            year           mon        
##  fpafod:1727476   Min.   :        1   Min.   :17.94   Min.   :-178.80   Min.   :1992   Min.   : 1.000  
##                   1st Qu.:   465673   1st Qu.:32.83   1st Qu.:-109.83   1st Qu.:1998   1st Qu.: 3.000  
##                   Median :   985582   Median :35.40   Median : -91.18   Median :2003   Median : 6.000  
##                   Mean   : 32179232   Mean   :36.79   Mean   : -95.29   Mean   :2003   Mean   : 5.935  
##                   3rd Qu.:  1761114   3rd Qu.:40.77   3rd Qu.: -82.25   3rd Qu.:2008   3rd Qu.: 8.000  
##                   Max.   :201940182   Max.   :70.14   Max.   : -65.26   Max.   :2013   Max.   :12.000  
##                                                                                                        
##       day           daynum         area_ha          cause_original       cause1      cause2 
##  Min.   : 1.0   Min.   :  1.0   Min.   :     0.00   Min.   : 1.000   Min.   :0   Min.   :0  
##  1st Qu.: 8.0   1st Qu.: 89.0   1st Qu.:     0.04   1st Qu.: 3.000   1st Qu.:0   1st Qu.:0  
##  Median :15.0   Median :164.0   Median :     0.40   Median : 5.000   Median :0   Median :0  
##  Mean   :15.5   Mean   :164.9   Mean   :    29.55   Mean   : 5.921   Mean   :0   Mean   :0  
##  3rd Qu.:23.0   3rd Qu.:230.0   3rd Qu.:     1.45   3rd Qu.: 9.000   3rd Qu.:0   3rd Qu.:0  
##  Max.   :31.0   Max.   :366.0   Max.   :245622.14   Max.   :13.000   Max.   :0   Max.   :0  
##                                                                                             
##    stateprov          agency       
##  CA     :173634   ST/C&L :1254551  
##  GA     :162479   FS     : 206731  
##  TX     :125227   BIA    : 108423  
##  NC     :104263   BLM    :  90801  
##  FL     : 85576   IA     :  21841  
##  SC     : 78127   NPS    :  19571  
##  (Other):998170   (Other):  25558

3.1.3 FPA-FOD cause codes

Recode cause_original into new causes (cause1, cause2)

# STAT_CAUSE_CODE
# 1 Lightning; 2 Equipment Use; 3 Smoking; 4 Campfire; 5 Debris Burning; 6 Railroad; 7 Arson; 8 Children; 
# 9 Miscellaneous; 10 Fireworks; 11 Power Line; 12 Structure; 13 Missing/Undefined
# cause1
# 1 Lightning; 2 Human; 3 Unknown
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown

fpafod_out$cause1[fpafod_out$cause_original == 1] <- 1
fpafod_out$cause1[fpafod_out$cause_original != 1 & fpafod_out$cause_original != 13] <- 2
fpafod_out$cause1[fpafod_out$cause_original == 13] <- 3

fpafod_out$cause2[fpafod_out$cause_original == 1] <- 1
fpafod_out$cause2[fpafod_out$cause_original == 2] <- 2
fpafod_out$cause2[fpafod_out$cause_original == 3] <- 3
fpafod_out$cause2[fpafod_out$cause_original == 4] <- 4
fpafod_out$cause2[fpafod_out$cause_original == 5] <- 5
fpafod_out$cause2[fpafod_out$cause_original == 6] <- 6
fpafod_out$cause2[fpafod_out$cause_original == 7] <- 5
fpafod_out$cause2[fpafod_out$cause_original == 8] <- 7
fpafod_out$cause2[fpafod_out$cause_original == 9] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 10] <- 9
fpafod_out$cause2[fpafod_out$cause_original == 11] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 12] <- 8
fpafod_out$cause2[fpafod_out$cause_original == 13] <- 10

Compare cause classifications

# STAT_CAUSE_CODE
# 1 Lightning; 2 Equipment Use; 3 Smoking; 4 Campfire; 5 Debris Burning; 6 Railroad; 7 Arson; 8 Children; 
# 9 Miscellaneous; 10 Fireworks; 11 Power Line; 12 Structure; 13 Missing/Undefined
table(fpafod$STAT_CAUSE_CODE)
## 
##      1      2      3      4      5      6      7      8      9     10     11     12     13 
## 260311 137575  49633  68684 390702  32569 267987  58354 291678  10350  11315   3178 145140
# cause1
# 1 Lightning; 2 Human; 3 Unknown
table(fpafod_out$cause1)
## 
##       1       2       3 
##  260311 1322025  145140
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
table(fpafod_out$cause2)
## 
##      1      2      3      4      5      6      7      8      9     10 
## 260311 137575  49633  68684 658689  32569  58354 306171  10350 145140
str(fpafod_out)
## 'data.frame':    1727476 obs. of  14 variables:
##  $ datasource    : Factor w/ 1 level "fpafod": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sourceid      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ latitude      : num  40 38.9 39 38.6 38.6 ...
##  $ longitude     : num  -121 -120 -121 -120 -120 ...
##  $ year          : int  2005 2004 2004 2004 2004 2004 2004 2005 2005 2004 ...
##  $ mon           : num  2 5 5 6 6 6 7 3 3 7 ...
##  $ day           : num  2 12 31 28 28 30 1 8 15 1 ...
##  $ daynum        : int  33 133 152 180 180 182 183 67 74 183 ...
##  $ area_ha       : num  0.0405 0.1012 0.0405 0.0405 0.0405 ...
##  $ cause_original: int  9 1 5 1 1 1 1 5 5 1 ...
##  $ cause1        : num  2 1 2 1 1 1 1 2 2 1 ...
##  $ cause2        : num  8 1 5 1 1 1 1 5 5 1 ...
##  $ stateprov     : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ agency        : Factor w/ 11 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
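The tables above can be turned into an automatic check. After recoding, no record should still carry the 0 placeholder, and each original code should map to exactly one cause2 value.

This is a sketch under stated assumptions. check_recode() is a hypothetical helper, and the toy data frame stands in for fpafod_out.

```r
# Hypothetical post-recoding check
check_recode <- function(df) {
  # All values must be valid merged codes (no leftover 0 placeholders)
  ok_range <- all(df$cause1 %in% 1:3) && all(df$cause2 %in% 1:10)
  # Number of distinct cause2 values per original code (should be 1)
  n_targets <- tapply(df$cause2, df$cause_original,
                      function(x) length(unique(x)))
  ok_range && all(n_targets == 1)
}

# Toy stand-in for fpafod_out
toy <- data.frame(cause_original = c(9, 1, 5, 7, 13),
                  cause1 = c(2, 1, 2, 2, 3),
                  cause2 = c(8, 1, 5, 5, 10))
check_recode(toy)   # TRUE
toy$cause2[1] <- 0  # simulate a code that was never recoded
check_recode(toy)   # FALSE
```

On the real data, the same call would be check_recode(fpafod_out).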

3.1.4 FPA-FOD .csv file

Write out a .csv file

merged_data_path <- "e:/Projects/fire/DailyFireStarts/data/MergedData/"
outfilename <- "fpafod_1992-2013.csv"
write.table(fpafod_out, paste0(merged_data_path, outfilename), sep=",", row.names=FALSE)
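write.table() separates fields with a space by default. A file given a .csv name therefore needs sep="," (or write.csv(), which fixes the separator to a comma).

A quick round trip on a toy data frame shows that the file parses back with the same shape. The toy frame and the temporary file are illustrative only.

```r
# Toy stand-in for fpafod_out
toy <- data.frame(datasource = "fpafod", sourceid = 1:3,
                  stateprov = c("CA", "CA", "OR"))
f <- tempfile(fileext = ".csv")

# write.csv() is write.table() with sep = "," and dec = "."
write.csv(toy, f, row.names = FALSE)

# Read it back as comma-separated and compare dimensions
back <- read.csv(f, stringsAsFactors = FALSE)
stopifnot(identical(dim(back), dim(toy)))

# Inspect the first two lines of the file
readLines(f, n = 2)
```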

3.2 NIFC

load("e:/Projects/fire/DailyFireStarts/data/RData/nifc.RData")
str(nifc)
## 'data.frame':    405124 obs. of  22 variables:
##  $ AGENCY_COD : Factor w/ 5 levels "BIA","BLM","FWS",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ UNIT_ID    : Factor w/ 881 levels "1002","1003",..: 361 361 361 361 361 361 361 361 361 361 ...
##  $ FIRE_TYPE  : Factor w/ 27 levels "0","1","11","12",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ FIRE_NUMBE : Factor w/ 31568 levels "0","1","10","100",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ FIRE_NAME  : Factor w/ 151091 levels "''67''","''67'' 2",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ STATE      : Factor w/ 51 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
##  $ DATE_DISCO : Factor w/ 8553 levels "19140716","19150424",..: 1241 1244 1247 1249 1252 1260 1264 1492 1497 1497 ...
##  $ DATE_CONTR : Factor w/ 8577 levels "02031231","19800000",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ GENERAL_CA : Factor w/ 10 levels "0","1","2","3",..: 3 3 3 4 3 3 2 2 2 2 ...
##  $ SPECIFIC_C : Factor w/ 33 levels "0","1","10","11",..: 33 33 33 3 33 32 2 2 2 2 ...
##  $ YEAR_DISCO : int  1983 1983 1983 1983 1983 1983 1983 1984 1984 1984 ...
##  $ LATITUDE   : num  44.3 44.5 43.9 44 43.8 ...
##  $ LONGITUDE  : num  -119 -119 -119 -119 -119 ...
##  $ ACRES_CONT : num  0.1 1 0.1 1 0.1 1 0.1 0.1 0.1 0.1 ...
##  $ SIZE_CLASS : Factor w/ 7 levels "A","B","C","D",..: 1 2 1 2 1 2 1 1 1 1 ...
##  $ GEOGRAPHIC : Factor w/ 11 levels "Alaska","Eastern",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ STARTDATED : Date, format: "1983-10-05" "1983-10-08" "1983-10-11" ...
##  $ YEAR       : num  1983 1983 1983 1983 1983 ...
##  $ startday   : num  5 8 11 13 16 24 29 18 23 23 ...
##  $ startmon   : num  10 10 10 10 10 10 10 7 7 7 ...
##  $ startdaynum: num  278 281 284 286 289 297 302 200 205 205 ...
##  $ AREA       : num  0.0405 0.4047 0.0405 0.4047 0.0405 ...
##  - attr(*, "data_types")= chr  "C" "C" "C" "C" ...
summary(nifc)
##  AGENCY_COD       UNIT_ID         FIRE_TYPE        FIRE_NUMBE       FIRE_NAME          STATE       
##  BIA : 91480   304    :  8420   11     :125642   1      :  1872   FA 1   :   550   CA     : 63643  
##  BLM : 88307   H50H58 :  6958   13     : 14425   2      :  1696   FA 2   :   512   AZ     : 55535  
##  FWS :     0   H50H52 :  6385   16     : 14118   3      :  1556   FA 3   :   473   OR     : 38566  
##  NPS : 26809   AKAFS  :  5910   51     : 10310   4      :  1457   FA 4   :   426   ID     : 29029  
##  USFS:198437   312    :  5387   15     :  9658   5      :  1378   FA 5   :   412   MT     : 28450  
##  NA's:    91   (Other):371973   (Other): 32443   (Other):198637   (Other):286270   (Other):189810  
##                NA's   :    91   NA's   :198528   NA's   :198528   NA's   :116481   NA's   :    91  
##     DATE_DISCO        DATE_CONTR       GENERAL_CA       SPECIFIC_C       YEAR_DISCO      LATITUDE    
##  19890726:  1078   19940000:  1554   1      :182663   30     :144501   Min.   :1981   Min.   :19.27  
##  19860810:   861   19920000:  1422   9      : 44554   1      : 76258   1st Qu.:1989   1st Qu.:35.31  
##  19870830:   852   19930000:  1396   5      : 42126   0      : 72892   Median :1994   Median :39.89  
##  19890808:   698   19960000:  1393   4      : 33951   19     : 15034   Mean   :1994   Mean   :40.26  
##  19940723:   663   19900000:  1307   2      : 31932   27     : 11503   3rd Qu.:1999   3rd Qu.:44.45  
##  (Other) :400881   (Other) :397184   (Other): 69807   (Other): 84845   Max.   :2003   Max.   :69.85  
##  NA's    :    91   NA's    :   868   NA's   :    91   NA's   :    91   NA's   :91     NA's   :91     
##    LONGITUDE         ACRES_CONT         SIZE_CLASS                   GEOGRAPHIC       STARTDATED        
##  Min.   :-176.67   Min.   :     0.0   A      :217738   Southwest          : 78712   Min.   :1981-01-01  
##  1st Qu.:-118.45   1st Qu.:     0.1   B      :130527   Northwest          : 51789   1st Qu.:1989-07-26  
##  Median :-112.00   Median :     0.2   C      : 35503   Northern Rockies   : 47645   Median :1994-07-22  
##  Mean   :-110.62   Mean   :   178.5   D      :  8481   Rocky Mountain     : 41947   Mean   :1994-05-03  
##  3rd Qu.:-106.68   3rd Qu.:     2.0   E      :  6193   Eastern Great Basin: 41205   3rd Qu.:1999-07-09  
##  Max.   : -67.06   Max.   :606945.0   (Other):  6591   (Other)            :143735   Max.   :2003-12-31  
##  NA's   :91        NA's   :91         NA's   :    91   NA's               :    91   NA's   :188         
##       YEAR         startday        startmon       startdaynum         AREA          
##  Min.   :1981   Min.   : 1.00   Min.   : 1.000   Min.   :  1.0   Min.   :     0.00  
##  1st Qu.:1989   1st Qu.: 8.00   1st Qu.: 6.000   1st Qu.:159.0   1st Qu.:     0.04  
##  Median :1994   Median :16.00   Median : 7.000   Median :201.0   Median :     0.08  
##  Mean   :1994   Mean   :15.73   Mean   : 6.851   Mean   :192.9   Mean   :    72.23  
##  3rd Qu.:1999   3rd Qu.:24.00   3rd Qu.: 8.000   3rd Qu.:231.0   3rd Qu.:     0.81  
##  Max.   :2003   Max.   :31.00   Max.   :12.000   Max.   :366.0   Max.   :245622.14  
##  NA's   :188    NA's   :188     NA's   :188      NA's   :188     NA's   :91

Remove the 91 records that are entirely NA (identified here by a missing LATITUDE).

nifc <- nifc[!is.na(nifc$LATITUDE), ]
summary(nifc)
##  AGENCY_COD       UNIT_ID         FIRE_TYPE        FIRE_NUMBE       FIRE_NAME          STATE       
##  BIA : 91480   304    :  8420   11     :125642   1      :  1872   FA 1   :   550   CA     : 63643  
##  BLM : 88307   H50H58 :  6958   13     : 14425   2      :  1696   FA 2   :   512   AZ     : 55535  
##  FWS :     0   H50H52 :  6385   16     : 14118   3      :  1556   FA 3   :   473   OR     : 38566  
##  NPS : 26809   AKAFS  :  5910   51     : 10310   4      :  1457   FA 4   :   426   ID     : 29029  
##  USFS:198437   312    :  5387   15     :  9658   5      :  1378   FA 5   :   412   MT     : 28450  
##                F50F52 :  5282   (Other): 32443   (Other):198637   (Other):286270   NM     : 22249  
##                (Other):366691   NA's   :198437   NA's   :198437   NA's   :116390   (Other):167561  
##     DATE_DISCO        DATE_CONTR       GENERAL_CA       SPECIFIC_C       YEAR_DISCO      LATITUDE    
##  19890726:  1078   19940000:  1554   1      :182663   30     :144501   Min.   :1981   Min.   :19.27  
##  19860810:   861   19920000:  1422   9      : 44554   1      : 76258   1st Qu.:1989   1st Qu.:35.31  
##  19870830:   852   19930000:  1396   5      : 42126   0      : 72892   Median :1994   Median :39.89  
##  19890808:   698   19960000:  1393   4      : 33951   19     : 15034   Mean   :1994   Mean   :40.26  
##  19940723:   663   19900000:  1307   2      : 31932   27     : 11503   3rd Qu.:1999   3rd Qu.:44.45  
##  19960813:   653   (Other) :397184   0      : 22246   8      : 11138   Max.   :2003   Max.   :69.85  
##  (Other) :400228   NA's    :   777   (Other): 47561   (Other): 73707                                 
##    LONGITUDE         ACRES_CONT       SIZE_CLASS               GEOGRAPHIC       STARTDATED        
##  Min.   :-176.67   Min.   :     0.0   A:217738   Southwest          : 78712   Min.   :1981-01-01  
##  1st Qu.:-118.45   1st Qu.:     0.1   B:130527   Northwest          : 51789   1st Qu.:1989-07-26  
##  Median :-112.00   Median :     0.2   C: 35503   Northern Rockies   : 47645   Median :1994-07-22  
##  Mean   :-110.62   Mean   :   178.5   D:  8481   Rocky Mountain     : 41947   Mean   :1994-05-03  
##  3rd Qu.:-106.68   3rd Qu.:     2.0   E:  6193   Eastern Great Basin: 41205   3rd Qu.:1999-07-09  
##  Max.   : -67.06   Max.   :606945.0   F:  4336   Southern           : 35210   Max.   :2003-12-31  
##                                       G:  2255   (Other)            :108525   NA's   :97          
##       YEAR         startday        startmon       startdaynum         AREA          
##  Min.   :1981   Min.   : 1.00   Min.   : 1.000   Min.   :  1.0   Min.   :     0.00  
##  1st Qu.:1989   1st Qu.: 8.00   1st Qu.: 6.000   1st Qu.:159.0   1st Qu.:     0.04  
##  Median :1994   Median :16.00   Median : 7.000   Median :201.0   Median :     0.08  
##  Mean   :1994   Mean   :15.73   Mean   : 6.851   Mean   :192.9   Mean   :    72.23  
##  3rd Qu.:1999   3rd Qu.:24.00   3rd Qu.: 8.000   3rd Qu.:231.0   3rd Qu.:     0.81  
##  Max.   :2003   Max.   :31.00   Max.   :12.000   Max.   :366.0   Max.   :245622.14  
##  NA's   :97     NA's   :97      NA's   :97       NA's   :97
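The filter above assumes that a missing LATITUDE marks an empty record. One way to test that assumption is to count rows with no non-missing values at all.

The summary also shows that 97 of the retained records still lack a start date. Those records are kept, with NA year, mon, day and daynum.

The toy data frame below is illustrative only. Its row 2 is entirely NA, and row 3 is missing only its date.

```r
# Toy data: row 2 is empty, row 3 lacks only a start date
toy <- data.frame(LATITUDE   = c(44.3, NA, 43.9),
                  GENERAL_CA = factor(c("2", NA, "3")),
                  STARTDATED = as.Date(c("1983-10-05", NA, NA)))

# Rows in which every field is NA
all_na <- rowSums(!is.na(toy)) == 0
which(all_na)

# Filtering on LATITUDE removes only the empty row here
toy_clean <- toy[!is.na(toy$LATITUDE), ]
nrow(toy_clean)                   # 2

# Retained records that still lack a start date
sum(is.na(toy_clean$STARTDATED))  # 1
```

On the real data, comparing sum(rowSums(!is.na(nifc)) == 0) with the number of rows dropped would confirm that only empty records were removed.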

3.2.1 NIFC common variables

Create new (common) variables

nobs <- nrow(nifc)

# direct copy
datasource <- rep("nifc",nobs)
head(datasource)
## [1] "nifc" "nifc" "nifc" "nifc" "nifc" "nifc"
sourceid <- seq_len(nobs)  # sequential record number; NIFC has no ID field comparable to FOD_ID
head(sourceid)
## [1] 1 2 3 4 5 6
latitude <- nifc$LATITUDE
head(latitude)
## [1] 44.28000 44.55000 43.87667 44.03333 43.79333 44.28333
longitude <- nifc$LONGITUDE
head(longitude)
## [1] -118.9167 -118.9167 -119.3683 -118.7467 -118.9700 -118.9217
year <- nifc$YEAR
head(year)
## [1] 1983 1983 1983 1983 1983 1983
mon <- nifc$startmon
head(mon)
## [1] 10 10 10 10 10 10
day <- nifc$startday
head(day)
## [1]  5  8 11 13 16 24
daynum <- nifc$startdaynum
head(daynum)
## [1] 278 281 284 286 289 297
area_ha <- nifc$AREA
head(area_ha)
## [1] 0.0404686 0.4046860 0.0404686 0.4046860 0.0404686 0.4046860
cause_original <- as.numeric(nifc$GENERAL_CA)
head(cause_original)
## [1] 3 3 3 4 3 3
stateprov <- as.character(nifc$STATE)
head(stateprov)
## [1] "OR" "OR" "OR" "OR" "OR" "OR"
agency <- as.character(nifc$AGENCY_COD)
head(agency)
## [1] "USFS" "USFS" "USFS" "USFS" "USFS" "USFS"
# fill cause1 and cause2 with 0's
cause1 <- rep(0,nobs)
cause2 <- rep(0,nobs)
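GENERAL_CA is a factor, so as.numeric(nifc$GENERAL_CA) returns the factor's internal level indices (1 to 10), not the cause codes 0 to 9. The index equals the code plus 1 only because all ten levels "0" to "9" are present, and the recoding step has to account for that offset.

A toy factor with a missing level shows the difference. It also shows the two standard ways to recover the labels, described in R FAQ 7.10. The factor values are illustrative only.

```r
# Toy factor: levels are the text codes "0", "1", "2", "3", "9" (no "4"-"8")
general_ca <- factor(c("2", "2", "3", "1", "0", "9"))

# as.numeric() on a factor returns level indices, not labels
as.numeric(general_ca)                      # 3 3 4 2 1 5

# Converting through the labels recovers the actual codes
as.numeric(as.character(general_ca))        # 2 2 3 1 0 9

# Same result, converting each level only once (cheaper on large factors)
as.numeric(levels(general_ca))[general_ca]  # 2 2 3 1 0 9
```

Here "9" has index 5, so subtracting 1 from the indices would not recover the codes. Converting through the labels does not depend on which levels happen to be present.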

3.2.2 NIFC data frame

# make data frame
nifc_out <- data.frame(datasource, sourceid, latitude, longitude, year, mon, day, daynum,
  area_ha, cause_original, cause1, cause2, stateprov, agency)
summary(nifc_out)
##  datasource       sourceid         latitude       longitude            year           mon        
##  nifc:405033   Min.   :     1   Min.   :19.27   Min.   :-176.67   Min.   :1981   Min.   : 1.000  
##                1st Qu.:101259   1st Qu.:35.31   1st Qu.:-118.45   1st Qu.:1989   1st Qu.: 6.000  
##                Median :202517   Median :39.89   Median :-112.00   Median :1994   Median : 7.000  
##                Mean   :202517   Mean   :40.26   Mean   :-110.62   Mean   :1994   Mean   : 6.851  
##                3rd Qu.:303775   3rd Qu.:44.45   3rd Qu.:-106.68   3rd Qu.:1999   3rd Qu.: 8.000  
##                Max.   :405033   Max.   :69.85   Max.   : -67.06   Max.   :2003   Max.   :12.000  
##                                                                   NA's   :97     NA's   :97      
##       day            daynum         area_ha          cause_original       cause1      cause2 
##  Min.   : 1.00   Min.   :  1.0   Min.   :     0.00   Min.   : 1.000   Min.   :0   Min.   :0  
##  1st Qu.: 8.00   1st Qu.:159.0   1st Qu.:     0.04   1st Qu.: 2.000   1st Qu.:0   1st Qu.:0  
##  Median :16.00   Median :201.0   Median :     0.08   Median : 2.000   Median :0   Median :0  
##  Mean   :15.73   Mean   :192.9   Mean   :    72.23   Mean   : 4.164   Mean   :0   Mean   :0  
##  3rd Qu.:24.00   3rd Qu.:231.0   3rd Qu.:     0.81   3rd Qu.: 6.000   3rd Qu.:0   3rd Qu.:0  
##  Max.   :31.00   Max.   :366.0   Max.   :245622.14   Max.   :10.000   Max.   :0   Max.   :0  
##  NA's   :97      NA's   :97                                                                  
##    stateprov       agency      
##  CA     : 63643   BIA : 91480  
##  AZ     : 55535   BLM : 88307  
##  OR     : 38566   NPS : 26809  
##  ID     : 29029   USFS:198437  
##  MT     : 28450                
##  NM     : 22249                
##  (Other):167561

3.2.3 NIFC cause codes

Recode cause_original into new causes (cause1, cause2)

# GENERAL_CA
# 1 Natural; 2 Campfire; 3 Smoking; 4 Fire use; 5 Incendiary; 6 Equipment
# 7 Railroads; 8 Juveniles; 9 Miscellaneous; 0 Unknown
# cause1
# 1 Lightning; 2 Human; 3 Unknown
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown

# GENERAL_CA is a factor with levels "0"-"9", so as.numeric() above returned
# level indices 1-10; subtract 1 to recover the GENERAL_CA codes before recoding
nifc_out$cause_original <- nifc_out$cause_original - 1

nifc_out$cause1[nifc_out$cause_original == 1] <- 1
nifc_out$cause1[nifc_out$cause_original != 1 & nifc_out$cause_original != 0] <- 2
nifc_out$cause1[nifc_out$cause_original == 0] <- 3

nifc_out$cause2[nifc_out$cause_original == 1] <- 1
nifc_out$cause2[nifc_out$cause_original == 2] <- 4
nifc_out$cause2[nifc_out$cause_original == 3] <- 3
nifc_out$cause2[nifc_out$cause_original == 4] <- 5
nifc_out$cause2[nifc_out$cause_original == 5] <- 9
nifc_out$cause2[nifc_out$cause_original == 6] <- 2
nifc_out$cause2[nifc_out$cause_original == 7] <- 6
nifc_out$cause2[nifc_out$cause_original == 8] <- 7
nifc_out$cause2[nifc_out$cause_original == 9] <- 8
nifc_out$cause2[nifc_out$cause_original == 0] <- 10

Compare cause classifications

# GENERAL_CA
# 1 Natural; 2 Campfire; 3 Smoking; 4 Fire use; 5 Incendiary; 6 Equipment
# 7 Railroads; 8 Juveniles; 9 Miscellaneous; 0 Unknown
table(nifc$GENERAL_CA)
## 
##      0      1      2      3      4      5      6      7      8      9 
##  22246 182663  31932  11672  33951  42126  15596   3357  16936  44554
table(nifc_out$cause_original)
## 
##      0      1      2      3      4      5      6      7      8      9 
##  22246 182663  31932  11672  33951  42126  15596   3357  16936  44554
# cause1
# 1 Lightning; 2 Human; 3 Unknown
table(nifc_out$cause1)
## 
##      1      2      3 
## 182663 200124  22246
# cause2
# 1 Lightning/Natural; 2 Equipment; 3 Smoking; 4 Campfire; 5 Deliberate; 6 Railroads;
# 7 Juveniles; 8 Miscellaneous; 9 Incendiary; 10 Unknown
table(nifc_out$cause2)
## 
##      1      2      3      4      5      6      7      8      9     10 
## 182663  15596  11672  31932  33951   3357  16936  44554  42126  22246
str(nifc_out)
## 'data.frame':    405033 obs. of  14 variables:
##  $ datasource    : Factor w/ 1 level "nifc": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sourceid      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ latitude      : num  44.3 44.5 43.9 44 43.8 ...
##  $ longitude     : num  -119 -119 -119 -119 -119 ...
##  $ year          : num  1983 1983 1983 1983 1983 ...
##  $ mon           : num  10 10 10 10 10 10 10 7 7 7 ...
##  $ day           : num  5 8 11 13 16 24 29 18 23 23 ...
##  $ daynum        : num  278 281 284 286 289 297 302 200 205 205 ...
##  $ area_ha       : num  0.0405 0.4047 0.0405 0.4047 0.0405 ...
##  $ cause_original: num  2 2 2 3 2 2 1 1 1 1 ...
##  $ cause1        : num  2 2 2 2 2 2 1 1 1 1 ...
##  $ cause2        : num  4 4 4 3 4 4 1 1 1 1 ...
##  $ stateprov     : Factor w/ 50 levels "AK","AL","AR",..: 38 38 38 38 38 38 38 38 38 38 ...
##  $ agency        : Factor w/ 4 levels "BIA","BLM","NPS",..: 4 4 4 4 4 4 4 4 4 4 ...
# Recode area_ha NA's to 0.0
nifc_out$area_ha[is.na(nifc_out$area_ha) == TRUE] <- 0
summary(nifc_out$area_ha)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##      0.00      0.04      0.08     72.23      0.81 245622.14
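As with the FPA-FOD codes, the NIFC remapping can be expressed as lookup vectors. Because GENERAL_CA codes start at 0, the vectors are indexed by code + 1.

This is a minimal sketch. The vector names are hypothetical, and the values reproduce the cause1 and cause2 assignments in the NIFC recoding code above.

```r
# Hypothetical lookup vectors: position code + 1 holds the new value for
# GENERAL_CA code 0-9 (0 Unknown, 1 Natural, 2 Campfire, 3 Smoking,
# 4 Fire use, 5 Incendiary, 6 Equipment, 7 Railroads, 8 Juveniles,
# 9 Miscellaneous)
nifc_to_cause2 <- c(10, 1, 4, 3, 5, 9, 2, 6, 7, 8)

# cause1: code 0 -> 3 Unknown, code 1 -> 1 Lightning, codes 2-9 -> 2 Human
nifc_to_cause1 <- c(3, 1, rep(2, 8))

codes <- c(2, 2, 3, 1, 0, 9)
nifc_to_cause1[codes + 1]   # 2 2 2 1 3 2
nifc_to_cause2[codes + 1]   # 4 4 3 1 10 8
```

The explicit + 1 offset is where a 0-based code system meets R's 1-based indexing. Keeping it in one place avoids scattering the offset across the recoding lines.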

3.2.4 NIFC .csv file (all data)

Write out a .csv file

outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/nifc_1981-2003.csv"
write.table(nifc_out, outfilename, sep=",", row.names=FALSE)

3.2.5 Subset NIFC data

Select only the records from 1986 through 1991, the years before the FPA-FOD record begins in 1992.

nifc2_out <- subset(nifc_out, year >= 1986 & year <= 1991)
nrow(nifc2_out)
## [1] 121576
table(nifc2_out$year)
## 
##  1986  1987  1988  1989  1990  1991 
## 17760 21873 21950 19857 20606 19530
sum(table(nifc2_out$year))
## [1] 121576
summary(nifc2_out)
##  datasource       sourceid         latitude       longitude            year           mon        
##  nifc:121576   Min.   :    22   Min.   :19.30   Min.   :-176.67   Min.   :1986   Min.   : 1.000  
##                1st Qu.: 32583   1st Qu.:35.42   1st Qu.:-119.09   1st Qu.:1987   1st Qu.: 6.000  
##                Median : 63529   Median :39.83   Median :-112.80   Median :1988   Median : 7.000  
##                Mean   :144894   Mean   :40.26   Mean   :-110.85   Mean   :1989   Mean   : 6.928  
##                3rd Qu.:269221   3rd Qu.:44.47   3rd Qu.:-106.72   3rd Qu.:1990   3rd Qu.: 8.000  
##                Max.   :405004   Max.   :69.63   Max.   : -68.22   Max.   :1991   Max.   :12.000  
##                                                                                                  
##       day           daynum         area_ha          cause_original      cause1          cause2   
##  Min.   : 1.0   Min.   :  1.0   Min.   :     0.00   Min.   :0.000   Min.   :1.000   Min.   :0.0  
##  1st Qu.: 8.0   1st Qu.:165.0   1st Qu.:     0.04   1st Qu.:1.000   1st Qu.:2.000   1st Qu.:3.0  
##  Median :15.0   Median :203.0   Median :     0.08   Median :1.000   Median :2.000   Median :4.0  
##  Mean   :15.8   Mean   :195.2   Mean   :    70.54   Mean   :2.987   Mean   :1.956   Mean   :3.9  
##  3rd Qu.:24.0   3rd Qu.:232.0   3rd Qu.:     0.81   3rd Qu.:5.000   3rd Qu.:2.000   3rd Qu.:4.0  
##  Max.   :31.0   Max.   :365.0   Max.   :219028.61   Max.   :9.000   Max.   :2.000   Max.   :9.0  
##                                                                                                  
##    stateprov      agency     
##  CA     :23342   BIA :20448  
##  AZ     :14770   BLM :21262  
##  OR     :12066   NPS : 8479  
##  ID     : 9392   USFS:71387  
##  MT     : 8006               
##  NM     : 5744               
##  (Other):48256
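subset() treats rows where the condition evaluates to NA as FALSE, so the 97 nifc_out records with a missing year are silently excluded from nifc2_out. Plain logical indexing behaves differently: it returns a row of NAs for each missing year.

The toy data frame below illustrates the difference and is not drawn from the NIFC data.

```r
# Toy data with one missing year
toy <- data.frame(year = c(1985, 1986, NA, 1991, 1992), id = 1:5)

# subset() drops rows where the condition is NA
nrow(subset(toy, year >= 1986 & year <= 1991))           # 2

# Logical indexing keeps an all-NA row for the missing year
nrow(toy[toy$year >= 1986 & toy$year <= 1991, ])         # 3

# which() discards NAs, matching subset()
nrow(toy[which(toy$year >= 1986 & toy$year <= 1991), ])  # 2
```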

3.2.6 NIFC .csv file (1986-1991)

Write out a .csv file

outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/nifc_1986-1991.csv"
write.table(nifc2_out, outfilename, sep=",", row.names=FALSE)

3.3 Make the US data set

3.3.1 Merge data
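Before stacking the two sources, two cheap checks are worthwhile. The column names should agree, because rbind() on data frames matches columns by name. The year ranges should not overlap, so that no fire is counted twice.

This is a sketch under stated assumptions. check_merge_inputs() is a hypothetical helper, and the two toy frames stand in for fpafod_out (1992-2013) and nifc2_out (1986-1991).

```r
# Hypothetical pre-merge check for two common-variable data frames
check_merge_inputs <- function(a, b) {
  same_cols <- setequal(names(a), names(b))
  ya <- range(a$year, na.rm = TRUE)
  yb <- range(b$year, na.rm = TRUE)
  # Ranges are disjoint if one ends before the other begins
  no_overlap <- ya[2] < yb[1] || yb[2] < ya[1]
  same_cols && no_overlap
}

# Toy stand-ins for the two inputs
a <- data.frame(datasource = "fpafod", year = c(1992, 2013))
b <- data.frame(datasource = "nifc",   year = c(1986, 1991))
check_merge_inputs(a, b)   # TRUE

merged <- rbind(a, b)
table(merged$year)
```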

us_merged <- rbind(fpafod_out,nifc2_out)
table_us_year <- table(us_merged$year)
table_us_year
## 
##   1986   1987   1988   1989   1990   1991   1992   1993   1994   1995   1996   1997   1998   1999   2000 
##  17760  21873  21950  19857  20606  19530  67964  62022  75989  71496  75604  61472  68388  89398  96454 
##   2001   2002   2003   2004   2005   2006   2007   2008   2009   2010   2011   2012   2013 
##  86069  75136  67380  68616  87391 113242  94681  84654  77262  78485  89897  71768  64108
sum(table_us_year)
## [1] 1849052
summary(us_merged)
##   datasource         sourceid            latitude       longitude            year           mon    
##  fpafod:1727476   Min.   :        1   Min.   :17.94   Min.   :-178.80   Min.   :1986   Min.   : 1  
##  nifc  : 121576   1st Qu.:   359278   1st Qu.:32.99   1st Qu.:-111.48   1st Qu.:1996   1st Qu.: 4  
##                   Median :   908962   Median :35.62   Median : -92.91   Median :2002   Median : 6  
##                   Mean   : 30072960   Mean   :37.01   Mean   : -96.31   Mean   :2002   Mean   : 6  
##                   3rd Qu.:  1663336   3rd Qu.:41.05   3rd Qu.: -82.54   3rd Qu.:2008   3rd Qu.: 8  
##                   Max.   :201940182   Max.   :70.14   Max.   : -65.26   Max.   :2013   Max.   :12  
##                                                                                                    
##       day            daynum         area_ha          cause_original       cause1          cause2      
##  Min.   : 1.00   Min.   :  1.0   Min.   :     0.00   Min.   : 0.000   Min.   :1.000   Min.   : 0.000  
##  1st Qu.: 8.00   1st Qu.: 92.0   1st Qu.:     0.04   1st Qu.: 2.000   1st Qu.:2.000   1st Qu.: 3.000  
##  Median :15.00   Median :170.0   Median :     0.40   Median : 5.000   Median :2.000   Median : 5.000  
##  Mean   :15.52   Mean   :166.9   Mean   :    32.24   Mean   : 5.728   Mean   :1.935   Mean   : 5.043  
##  3rd Qu.:23.00   3rd Qu.:231.0   3rd Qu.:     1.29   3rd Qu.: 9.000   3rd Qu.:2.000   3rd Qu.: 8.000  
##  Max.   :31.00   Max.   :366.0   Max.   :245622.14   Max.   :13.000   Max.   :3.000   Max.   :10.000  
##                                                                                                       
##    stateprov           agency       
##  CA     : 196976   ST/C&L :1254551  
##  GA     : 163298   FS     : 206731  
##  TX     : 126100   BIA    : 128871  
##  NC     : 105113   BLM    : 112063  
##  FL     :  87589   USFS   :  71387  
##  AZ     :  79969   NPS    :  28050  
##  (Other):1090007   (Other):  47399
str(us_merged)
## 'data.frame':    1849052 obs. of  14 variables:
##  $ datasource    : Factor w/ 2 levels "fpafod","nifc": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sourceid      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ latitude      : num  40 38.9 39 38.6 38.6 ...
##  $ longitude     : num  -121 -120 -121 -120 -120 ...
##  $ year          : num  2005 2004 2004 2004 2004 ...
##  $ mon           : num  2 5 5 6 6 6 7 3 3 7 ...
##  $ day           : num  2 12 31 28 28 30 1 8 15 1 ...
##  $ daynum        : num  33 133 152 180 180 182 183 67 74 183 ...
##  $ area_ha       : num  0.0405 0.1012 0.0405 0.0405 0.0405 ...
##  $ cause_original: num  9 1 5 1 1 1 1 5 5 1 ...
##  $ cause1        : num  2 1 2 1 1 1 1 2 2 1 ...
##  $ cause2        : num  8 1 5 1 1 1 1 5 5 1 ...
##  $ stateprov     : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ agency        : Factor w/ 12 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...
outfilename <- "e:/Projects/fire/DailyFireStarts/data/MergedData/us_1986-2013.csv"
write.table(us_merged, outfilename, sep=",", row.names=FALSE)
save(us_merged, file="e:/Projects/fire/DailyFireStarts/data/RData/us_1986-2013.RData")