also here on github

it’s not R sacrilege if nobody knows

Even the little stuff benefits from some organizational scripting, even if it’s just to catalog one’s actions. Here are some examples for common tasks.

Get all the source data into a R-friendly format like csv. ogr2ogr has a nifty option -lco GEOMETRY=AS_WKT (Well-Known-Text) to keep track of spatial data throughout abstractions- we can add the WKT as a cell until it is time to write the data out again.

# define a shapefile conversion to csv from system's shell:
sys_SHP2CSV <- function(shp) {
  csvfile <- paste0(shp, '.csv')
  shpfile <-paste0(shp, '.shp')
  if (!file.exists(csvfile)) {
    # use -lco GEOMETRY to maintain location
    # for reference, shp --> geojson would look like:
    # system('ogr2ogr -f geojson output.geojson input.shp')
    # keeps geometry as WKT:
    cmd <- paste('ogr2ogr -f CSV', csvfile, shpfile, '-lco GEOMETRY=AS_WKT')
    system(cmd)  # executes command
  } else {
    print(paste('output file already exists, please delete', csvfile, 'before converting again'))
  }
  return(csvfile)
}

Read the new csv into R:

# for file 'foo.shp':
foo_raw <- read.csv(sys_SHP2CSV(shp='foo'), sep = ',')

One might do any number of things now, some here lets snag some columns and rename them:

# rename the subset of data "foo" we want in a data.frame:
foo <- data.frame(foo_raw[1:5])
colnames(foo) <- c('bar', 'eggs', 'ham', 'hello', 'world')

We could do some more careful parsing too, here a semicolon in cell strings can be converted to a comma:

# replace ` ; ` to ` , ` in col "bar":
foo$bar <- gsub(pattern=";", replacement=",", foo$bar)

Do whatever you do for an output directory:

# make a output file directory if you're into that
# my preference is to only keep one set of output files per run
# here, we'd reset the directory before adding any new output files
redir <- function(outdir) {
  if (dir.exists(outdir)) {
    system(paste('rm -rf', outdir))
  }
  dir.create(outdir)
}

Of course, once your data is in R there are countless "R things" one could do…

# iterate to fill empty cells with preceding values
for (i in 1:length(foo[,1])) {
  if (nchar(foo$bar[i]) < 1) {
    foo$bar[i] <- foo$bar[i-1]
  }
  # fill incomplete rows with NA values:
  if (nchar(foo$bar[i]) < 1) {
    foo[i,] <- NA  
  }
}

# remove NA rows if there is nothing better to do:
newfoo <- na.omit(foo)

Even though this is totally adding a level of complexity to what could be a single ogr2ogr command, I’ve decided it is still worth it- I’d definitely rather keep track of everything I do over forget what I did…. xD

# make some methods to write out various kinds of files via gdal:
to_GEO <- function(target) {
  print(paste('converting', target, 'to geojson .... '))
  system(paste('ogr2ogr -f', " geojson ",  paste0(target, '.geojson'), paste0(target, '.csv')))
}

to_SHP <- function(target) {
  print(paste('converting ', target, ' to ESRI Shapefile .... '))
  system(paste('ogr2ogr -f', " 'ESRI Shapefile' ",  paste0(target, '.shp'), paste0(target, '.csv')))
}

# name files:
foo_name <- 'output_foo'

# for table data 'foo', first:
write.csv(foo, paste0(foo_name, '.csv'))

# convert with the above csv:
to_SHP(foo_name)

Cheers!
-Jess