Under the hood, many function of operation included in the rbms
package are done using the data.table
package. The
data.table
package offer fast and powerful operation on large data set with a very
compact syntax.
In this session, you will:
You will need to load the following R packages and the data found in
the bms_workshop_data folder that can be
downloaded
here. Once you have downloaded the data, unzip the folder and add
the data in your R project directory, your current R working directory,
or set your working directly accordingly setwd()
.
## data.table 1.14.8 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
All data are stored in rds
format, this format is highly
efficient for storing R object, but you could also have them in any
other format. To load .rds data
, we use the function
readRDS()
b_count_sub <- readRDS("bms_workshop_data/work_count.rds")
m_visit_sub <- readRDS("bms_workshop_data/work_visit.rds")
m_site_sub <- readRDS("bms_workshop_data/work_site.rds")
## to read csv into a data.table use fread()
bcz_class <- data.table::fread("bms_workshop_data/GEnS_v3_classification.csv")
b_count_sub
## bms_id transect_id_serial visit_date year month day species_name
## 1: FRBMS FRBMS.1 2009-05-22 2009 5 22 Maniola jurtina
## 2: FRBMS FRBMS.1 2009-08-01 2009 8 1 Maniola jurtina
## 3: FRBMS FRBMS.1 2009-08-01 2009 8 1 Polyommatus icarus
## 4: FRBMS FRBMS.1 2009-09-19 2009 9 19 Polyommatus icarus
## 5: FRBMS FRBMS.1 2010-07-02 2010 7 2 Maniola jurtina
## ---
## 59598: UKBMS UKBMS.300 2018-07-03 2018 7 3 Maniola jurtina
## 59599: UKBMS UKBMS.300 2018-07-10 2018 7 10 Maniola jurtina
## 59600: UKBMS UKBMS.300 2018-07-16 2018 7 16 Maniola jurtina
## 59601: UKBMS UKBMS.300 2018-07-16 2018 7 16 Polyommatus icarus
## 59602: UKBMS UKBMS.300 2018-07-23 2018 7 23 Maniola jurtina
## count
## 1: 5
## 2: 5
## 3: 2
## 4: 6
## 5: 20
## ---
## 59598: 26
## 59599: 16
## 59600: 5
## 59601: 1
## 59602: 2
When you look at a data.table object, you can will see the first and last five row, with all column names.
## bms_id transect_id_serial year visit_date
## 1: FRBMS FRBMS.1 2009 2009-05-22
## 2: FRBMS FRBMS.1 2009 2009-08-01
## 3: FRBMS FRBMS.1 2009 2009-08-01
## 4: FRBMS FRBMS.1 2009 2009-09-19
## 5: FRBMS FRBMS.1 2010 2010-07-02
## ---
## 59598: UKBMS UKBMS.300 2018 2018-07-03
## 59599: UKBMS UKBMS.300 2018 2018-07-10
## 59600: UKBMS UKBMS.300 2018 2018-07-16
## 59601: UKBMS UKBMS.300 2018 2018-07-16
## 59602: UKBMS UKBMS.300 2018 2018-07-23
## [1] "FRBMS" "NLBMS" "UKBMS"
## bms_id year
## 1: FRBMS 2009
## 2: FRBMS 2010
## 3: FRBMS 2011
## 4: FRBMS 2012
## 5: FRBMS 2013
## 6: FRBMS 2014
## 7: FRBMS 2015
## 8: FRBMS 2017
## 9: FRBMS 2018
## 10: FRBMS 2016
## 11: FRBMS 2008
## 12: NLBMS 2008
## 13: NLBMS 2009
## 14: NLBMS 2010
## 15: NLBMS 2011
## 16: NLBMS 2012
## 17: NLBMS 2013
## 18: NLBMS 2014
## 19: NLBMS 2015
## 20: NLBMS 2016
## 21: NLBMS 2017
## 22: NLBMS 2018
## 23: UKBMS 2008
## 24: UKBMS 2009
## 25: UKBMS 2010
## 26: UKBMS 2011
## 27: UKBMS 2012
## 28: UKBMS 2013
## 29: UKBMS 2014
## 30: UKBMS 2015
## 31: UKBMS 2016
## 32: UKBMS 2017
## 33: UKBMS 2018
## bms_id year
## bms_id transect_id_serial year visit_date
## 1: NLBMS NLBMS.1 2008 2008-05-23
## 2: NLBMS NLBMS.1 2008 2008-06-07
## 3: NLBMS NLBMS.1 2008 2008-07-26
## 4: NLBMS NLBMS.1 2008 2008-07-26
## 5: NLBMS NLBMS.1 2008 2008-08-15
## ---
## 26200: NLBMS NLBMS.300 2016 2016-07-03
## 26201: NLBMS NLBMS.300 2016 2016-07-25
## 26202: NLBMS NLBMS.300 2017 2017-06-26
## 26203: NLBMS NLBMS.300 2017 2017-07-14
## 26204: NLBMS NLBMS.300 2018 2018-06-15
my_new_dt[ , month := month(visit_date)][ , c("day", "year") := .(mday(visit_date), year(visit_date))]
my_new_dt
## bms_id transect_id_serial year visit_date month day
## 1: FRBMS FRBMS.1 2009 2009-05-22 5 22
## 2: FRBMS FRBMS.1 2009 2009-08-01 8 1
## 3: FRBMS FRBMS.1 2009 2009-08-01 8 1
## 4: FRBMS FRBMS.1 2009 2009-09-19 9 19
## 5: FRBMS FRBMS.1 2010 2010-07-02 7 2
## ---
## 59598: UKBMS UKBMS.300 2018 2018-07-03 7 3
## 59599: UKBMS UKBMS.300 2018 2018-07-10 7 10
## 59600: UKBMS UKBMS.300 2018 2018-07-16 7 16
## 59601: UKBMS UKBMS.300 2018 2018-07-16 7 16
## 59602: UKBMS UKBMS.300 2018 2018-07-23 7 23
## bms_id N
## 1: FRBMS 1074
## 2: NLBMS 26204
## 3: UKBMS 32324
## change names
setnames(b_count_sub, "transect_id_serial", "SITE_ID")
setnames(b_count_sub, c("species_name", "bms_id"), c("SPECIES", "BMS_ID"))
names(b_count_sub) <- toupper(names(b_count_sub))
b_count_sub
## BMS_ID SITE_ID VISIT_DATE YEAR MONTH DAY SPECIES COUNT
## 1: FRBMS FRBMS.1 2009-05-22 2009 5 22 Maniola jurtina 5
## 2: FRBMS FRBMS.1 2009-08-01 2009 8 1 Maniola jurtina 5
## 3: FRBMS FRBMS.1 2009-08-01 2009 8 1 Polyommatus icarus 2
## 4: FRBMS FRBMS.1 2009-09-19 2009 9 19 Polyommatus icarus 6
## 5: FRBMS FRBMS.1 2010-07-02 2010 7 2 Maniola jurtina 20
## ---
## 59598: UKBMS UKBMS.300 2018-07-03 2018 7 3 Maniola jurtina 26
## 59599: UKBMS UKBMS.300 2018-07-10 2018 7 10 Maniola jurtina 16
## 59600: UKBMS UKBMS.300 2018-07-16 2018 7 16 Maniola jurtina 5
## 59601: UKBMS UKBMS.300 2018-07-16 2018 7 16 Polyommatus icarus 1
## 59602: UKBMS UKBMS.300 2018-07-23 2018 7 23 Maniola jurtina 2
## merge data set
setnames(m_site_sub, "transect_id_serial", "SITE_ID")
setkey(b_count_sub, SITE_ID)
setkey(m_site_sub, SITE_ID)
merge(b_count_sub, m_site_sub[bms_id == "FRBMS" , .(SITE_ID, transect_lon_1km, transect_lat_1km)])
## SITE_ID BMS_ID VISIT_DATE YEAR MONTH DAY SPECIES COUNT
## 1: FRBMS.1 FRBMS 2009-05-22 2009 5 22 Maniola jurtina 5
## 2: FRBMS.1 FRBMS 2009-08-01 2009 8 1 Maniola jurtina 5
## 3: FRBMS.1 FRBMS 2009-08-01 2009 8 1 Polyommatus icarus 2
## 4: FRBMS.1 FRBMS 2009-09-19 2009 9 19 Polyommatus icarus 6
## 5: FRBMS.1 FRBMS 2010-07-02 2010 7 2 Maniola jurtina 20
## ---
## 1070: FRBMS.9 FRBMS 2014-07-18 2014 7 18 Maniola jurtina 30
## 1071: FRBMS.9 FRBMS 2014-08-21 2014 8 21 Maniola jurtina 33
## 1072: FRBMS.9 FRBMS 2014-08-21 2014 8 21 Polyommatus icarus 6
## 1073: FRBMS.9 FRBMS 2014-09-16 2014 9 16 Maniola jurtina 2
## 1074: FRBMS.9 FRBMS 2014-09-16 2014 9 16 Polyommatus icarus 18
## transect_lon_1km transect_lat_1km
## 1: 3953000 2967000
## 2: 3953000 2967000
## 3: 3953000 2967000
## 4: 3953000 2967000
## 5: 3953000 2967000
## ---
## 1070: 3737000 2835000
## 1071: 3737000 2835000
## 1072: 3737000 2835000
## 1073: 3737000 2835000
## 1074: 3737000 2835000
merge(b_count_sub, m_site_sub[bms_id == "FRBMS" , .(SITE_ID, transect_lon_1km, transect_lat_1km)], all.x = TRUE)
## SITE_ID BMS_ID VISIT_DATE YEAR MONTH DAY SPECIES COUNT
## 1: FRBMS.1 FRBMS 2009-05-22 2009 5 22 Maniola jurtina 5
## 2: FRBMS.1 FRBMS 2009-08-01 2009 8 1 Maniola jurtina 5
## 3: FRBMS.1 FRBMS 2009-08-01 2009 8 1 Polyommatus icarus 2
## 4: FRBMS.1 FRBMS 2009-09-19 2009 9 19 Polyommatus icarus 6
## 5: FRBMS.1 FRBMS 2010-07-02 2010 7 2 Maniola jurtina 20
## ---
## 59598: UKBMS.99 UKBMS 2018-08-28 2018 8 28 Polyommatus icarus 8
## 59599: UKBMS.99 UKBMS 2018-09-02 2018 9 2 Maniola jurtina 25
## 59600: UKBMS.99 UKBMS 2018-09-02 2018 9 2 Polyommatus icarus 7
## 59601: UKBMS.99 UKBMS 2018-10-01 2018 10 1 Maniola jurtina 14
## 59602: UKBMS.99 UKBMS 2018-10-01 2018 10 1 Polyommatus icarus 2
## transect_lon_1km transect_lat_1km
## 1: 3953000 2967000
## 2: 3953000 2967000
## 3: 3953000 2967000
## 4: 3953000 2967000
## 5: 3953000 2967000
## ---
## 59598: NA NA
## 59599: NA NA
## 59600: NA NA
## 59601: NA NA
## 59602: NA NA
data_m <- merge(b_count_sub, m_site_sub[ , .(SITE_ID, transect_lon_1km, transect_lat_1km)], all.x = TRUE)
data_m
## SITE_ID BMS_ID VISIT_DATE YEAR MONTH DAY SPECIES COUNT
## 1: FRBMS.1 FRBMS 2009-05-22 2009 5 22 Maniola jurtina 5
## 2: FRBMS.1 FRBMS 2009-08-01 2009 8 1 Maniola jurtina 5
## 3: FRBMS.1 FRBMS 2009-08-01 2009 8 1 Polyommatus icarus 2
## 4: FRBMS.1 FRBMS 2009-09-19 2009 9 19 Polyommatus icarus 6
## 5: FRBMS.1 FRBMS 2010-07-02 2010 7 2 Maniola jurtina 20
## ---
## 59598: UKBMS.99 UKBMS 2018-08-28 2018 8 28 Polyommatus icarus 8
## 59599: UKBMS.99 UKBMS 2018-09-02 2018 9 2 Maniola jurtina 25
## 59600: UKBMS.99 UKBMS 2018-09-02 2018 9 2 Polyommatus icarus 7
## 59601: UKBMS.99 UKBMS 2018-10-01 2018 10 1 Maniola jurtina 14
## 59602: UKBMS.99 UKBMS 2018-10-01 2018 10 1 Polyommatus icarus 2
## transect_lon_1km transect_lat_1km
## 1: 3953000 2967000
## 2: 3953000 2967000
## 3: 3953000 2967000
## 4: 3953000 2967000
## 5: 3953000 2967000
## ---
## 59598: 3604000 3177000
## 59599: 3604000 3177000
## 59600: 3604000 3177000
## 59601: 3604000 3177000
## 59602: 3604000 3177000
There is many way to manipulate tabular data sets in R, dplyr and data.table are especially efficient with large data set, data.table can manipulate millions of row with ease. Learn one of these tool will help you seriously in every R task and analysis as 80% of the work is to organize the data to make them fit for the analysis or build a graph.