Chapter 9 Data repositories

Archiving geolocator data is not only good scientific practice, it is a precondition for publication in a growing number of journals and for receiving funding from many organizations. Proper archiving retains the value of the data for future work, allowing for (re‐)analysis with new analytical methods or overarching syntheses. To enable appropriate re-use of geolocation data, it is crucial to archive raw data and location estimates in conjunction with documentation needed to understand how the data were collected and what processing and analysis methods led to any reported results.

Although several databases exist for sharing and archiving animal movement data, the foremost as a general research platform for animal movement data is Movebank. Movebank is particularly well suited for geolocator data as it includes features tailored to tracking based on ambient light levels, supporting storage of raw light recordings, twilight annotation tables, location estimates with credibility intervals, and custom files such as R scripts or model output, along with deployment‐ and study‐level metadata. Researchers are encouraged to publish their data in the public domain, which can be done by publishing in the Movebank Data Repository, wherein a dataset is provided with a digital object identifier (DOI), persistent link, licence and citation after undergoing review. For in‐progress work, Movebank provides a range of sharing options: owners can choose to give access to specific researchers, and allow the public to see a written summary only, view the tracks on a map, or download the data. We recommend storing data in Movebank or similar repositories even during preliminary analysis stages when the results and analyses remain unpublished. In these situations, the online archive provides a secure backup and can be discovered by other researchers who can contact the owner about potential collaboration using the data.

9.1 Basic principles of storing data in Movebank

Movebank (movebank.org) is a free online database and research platform for animal movement and other animal-borne sensor data hosted by the Max Planck Institute of Animal Behavior. Researchers retain ownership of their data and choose whether and with whom to share their data. All data are stored within user-owned studies. You can access and analyze Movebank data directly within R using the move package. These are the basic steps to archiving a dataset on Movebank:

Log in to Movebank. If you are not registered already, create an account at https://www.movebank.org/cms/webapp/register.
Create a study. Typically users create a study for each project, but for large projects you may want to create studies for each species, field season or deployment site. Consider this carefully because there is no automated way to move data between studies later. Provide sufficient detail in your study name, citations and objectives to inform others about your project. You can also set License Terms that must be accepted by collaborators with permission to download your data.
Set the access permissions. Define what level of access to give to the public, and assign other users as collaborators (to whom you can give extra viewing and download access) or data managers (who can help upload and organize). You can update these settings at any time.
Add data. You can import tables of raw light recordings, annotated twilights, location estimates and deployment information, and upload and store other relevant files. More detailed instructions follow in Section 9.2.
Submit for publication. The Movebank Data Repository is a public archive of Movebank studies that provides authors with a DOI and research citation for their datasets. When you have completed your data analysis we recommend submitting your dataset for publication. Researchers typically submit when their related manuscripts are in review.

Section 9.3 provides examples of published geolocator studies. Movebank’s user manual and archiving best practices provide more instructions and guidance for organizing studies. For help, contact support@movebank.org.

9.2 What can (and should) be stored in Movebank

Besides the results (location estimates), Movebank enables users to store all relevant information and data needed to repeat an analysis or reanalyze the data using different methods. This is done by storing four data tables that contain (1) raw light recordings, (2) annotated twilights, (3) location estimates and (4) deployment-level information about the tags, animals, deployments and data processing steps. Movebank relates these tables using a data model that mimics typical tag deployments:

After creating a study (steps 1 and 2 above), you can begin importing files. For each file import, you will map each column in the table to a Movebank data attribute. All attributes are defined in the Movebank Attribute Dictionary and a persistent, machine-readable version is published at http://vocab.nerc.ac.uk/collection/MVB/current. While each study and set of files is a bit different, we will walk through the import steps in a recommended order below. Also see Movebank’s instructions for importing custom tabular data. We are preparing a complete example of all steps in the study “Light-Level Geolocation Manual” (study ID 788381694).

9.2.1 Raw light recordings

These are the files you obtain from your geolocators. Movebank can accept most delimited tabular data files with or without a header row. Any rows in the header other than column names must be removed before importing. Information stored in the header can be integrated into the reference data, compiled and added as a file attachment to the study, and/or included in the readme if the data are published in the Movebank Data Repository (more below). You can import one file per tag or one file with data from multiple tags. If you have many files, it can be easier to merge the light-level data into one or a few files that contain a tag ID column prior to importing to Movebank (see 9.2.6 Merging files). To import a light-level file,

From your study select Upload Data > Import Data > Light-level data > Raw light-level data.
Browse to the file and select Upload. A window will open previewing “What Movebank sees in your file” and “How Movebank will save the data”. If nothing appears under “How Movebank will save the data”, or if the first row is not read correctly, select CSV Parameter to adjust how the file is read.
Set Reference to Animal/Tag: If the file contains data for a single tag and does not contain an ID column, check the box next to “All rows belong to the same Animal/Tag”. You can either select the IDs from a list (if you have already imported the reference data, see 9.2.4 Reference data) or create a new ID. If the file contains an ID column, select the relevant column/s. You are only required to map a Tag ID—you can either map an Animal ID in this step or relate the tags to animals with your reference data table. Read more. When you are done select Save.

Map Timestamp: You can map one or more date-time columns to Movebank’s “Timestamp” attribute. Use “+” to add additional columns if needed. Provide the format string for each column based on this key. Read more. When you are done select Save. The format displayed under “How Movebank will save the data” is yyyy-MM-dd HH:mm:ss.SSS. Please check your timestamp format carefully! In particular, Movebank sometimes guesses months and days incorrectly if no days above 12 are in the top portion of the file. It can be easy to overlook this mistake with data from non-location sensors.
Map Light Level: Choose the column that contains the light values and select Save. Now your file should be ready to import.
Twilight: If twilight selections are noted in a column of your light-level data, you can map this to twilight.

Select Finish. You will have the option to filter duplicate records (typically not present in light level files) and to save a copy of your file format. Save the format, giving it a name like “lig files” or “glf files”, so that you can import additional light-level files to the study without having to repeat these steps each time.
Select Ok. Depending on the file size, processing may take a few seconds or several minutes. Once the file is imported, you can import more light recording files or move on to annotated twilights (below).

9.2.2 Annotated twilights

Twilight selections made while processing your light-level data can be reported as a tabular file containing the twilight times and results of editing and filtering steps. For example, TAGS will provide a file that looks like

datetime,light,twilight,interp,excluded
2014-06-17T14:34:13.000Z,5.76547632856179,0,FALSE,FALSE

and TwGeos will provide a file that looks like

"Twilight","Rise","Deleted","Marker","Inserted","Twilight3"
2015-07-16 19:43:53,FALSE,FALSE,0,FALSE,2015-07-16 19:43:53

If you have many files, it can be easier to merge the twilight data into one file containing a tag ID column prior to importing to Movebank (see 9.2.6 Merging files). To import a twilight file,

From your study select Upload Data > Import Data > Light-level data > Twilight data.
Browse to the file and select Upload. If nothing appears under “How Movebank will save the data”, or if the first row is not read correctly, select CSV Parameter to adjust how the file is read. Map the following columns:
Sensor Type: If this is not mapped already, check the box next to “Set fixed Sensor Type for all rows” and choose “Solar Geolocator Twilight”.
Animal/Tag IDs: Each location must be associated with a Tag ID, and can then be associated with an Animal ID using this file (Read more) or by defining deployments in the reference data (see 9.2.4 Reference data).
Timestamps: For details on mapping timestamp values read here. In the example from TwGeos above, the format is yyyy-MM-dd HH:mm:ss. Please check your timestamp format carefully!
Rise: Define whether the twilight represents sunrise or sunset using the attribute geolocator rise, which accepts values TRUE (sunrise) and FALSE (sunset). If values are stored in a different format, use geolocator fix type. Be sure that both TRUE and FALSE values are mapped as shown below—Movebank might not automatically “see” and map values that are not present toward the beginning of large files.

Deleted: If available, map to the attribute twilight excluded.
Inserted: If available, map to the attribute twilight inserted.
Twilight3: If available, map to the attribute geolocator twilight3. As described above, please define and check your timestamp format carefully. In the example from TwGeos above, the format is yyyy-MM-dd HH:mm:ss.
Once you are done mapping attributes, your preview should look something like this:

Select Finish. You will have the option to filter duplicate records (typically not present in twilight files) and to save a copy of your file format. Save the format if you will import additional twilight files to the study so that you don’t have to repeat the mapping process each time.
Select Ok. Depending on the file size, processing may take a few seconds or several minutes. Once the file is imported, you can import more twilight files or move on to location estimates (below).

9.2.3 Location estimates

The results of geolocator analysis can be reported as a tabular file containing estimated location coordinates for each animal over the deployment period and which may include additional columns with information such as location accuracy or migration stage. When possible we recommend including upper and lower bounds for the location estimates. The functions FLightR2Movebank and SGAT2Movebank can be used to create location estimates ready to import to Movebank. To import this file, follow the instructions for importing custom tabular data in Movebank’s data manual. The example below supplements these instructions with details specific to geolocation-based location estimates.

The times and locations of deployment and recapture can be represented in two ways in Movebank: (1) in this file with the location estimates and (2) in the reference data (see 9.2.4 Reference data). If you include these records in this file, be sure to indicate this because they are of course more accurate than the estimates from geolocation analysis. The easiest way to do this is to add a “comments” column to the file and a note “known deployment location”, or something similar, to those records.

From your study select Upload Data > Import Data > Processed solar geolocator data > other geolocator data.
Browse to the file and select Upload. If nothing appears under “How Movebank will save the data”, or if the first row is not read correctly, select CSV Parameter to adjust how the file is read. Map the following columns:
Sensor Type: If this is not mapped already, check the box next to “Set fixed Sensor Type for all rows” and choose “Solar Geolocator”.
Animal/Tag IDs: Each location must be associated with a Tag ID, and can then be associated with an Animal ID using this file (Read more) or by defining deployments in the reference data (see 9.2.4 Reference data).
Timestamps: For your location estimates, you might have either a single timestamp or start and end timestamps associated with each record. If you have one timestamp per location, map this to the “Timestamp” field. If it represents the beginning of the period for which the estimate is valid, also map this to “Start Timestamp”. If you have start and end timestamps for each location, map the start timestamp to “Timestamp” and “Start Timestamp” and the end timestamp to “End Timestamp”. For details on mapping timestamp values read here. Please check your timestamp format carefully!
Location coordinates: Select the columns that contain the latitude and longitude estimates, to be stored in the Movebank attributes “Location Lat” and “Location Long”.
Upper and lower coordinate bounds: Where possible we recommend including the upper and lower location bounds. To do this, select Map other Attributes, Map Column, or the relevant column header under “What Movebank sees in your file”. The related Movebank attributes are listed as under “Other attributes” as “Latitude Lower”, “Latitude Upper”, “Longitude Upper” and “Longitude Lower”. You can further define these bounds using “Location Accuracy Comments” in the reference data (see below).
Geolocator Fix Type: Provide the period of day (e.g. “sunrise”) used to estimate the location.
Comments: As described above, here you can indicate high-accuracy records from known locations that were not derived from the geolocation analysis.
Additional attributes: To map other attributes you might want to include, see the Movebank Attribute Dictionary for available attributes.
Once you are done mapping attributes, your preview should look something like this:

After finishing the import, if you mapped both Tag and Animal IDs, you might be directed to the Deployment Manager to accept deployments created based on the data in your file. If you haven’t yet added deployments, it is fine to accept these, however for your final deployments, we recommend importing a reference data file that contains additional information important to understanding geolocation studies.

9.2.4 Reference data

In Movebank, “reference data” includes information about animals, tags and deployments. This can be prepared as a table with one row per deployment, and allows you to document additional details about your geolocator analysis methods. The following reference data for bird 14SA in our example study “Light-Level Geolocation Manual” includes recommended reference data attributes to include for geolocation studies, with notes in italics.

tag-id: 14SA (tag, animal and deployment IDs can be the same or different)
animal-id: 14SA
deployment-id: 14SA_2015
animal-taxon: Merops apiaster (use a taxon that is valid at itis.gov)
deploy-on-date: 2015-07-13 00:00:00.000
deploy-off-date: 2016-07-15 00:00:00.000
animal-life-stage: adult
animal-mass: 108.0 (grams)
animal-reproductive-condition: breeding
animal-ring-id: SA40838
animal-sex: f
attachment-type: harness
data-processing-software: TwGeos v0.0-1, SGAT v0.1.3
deploy-off-latitude: 51.32
deploy-off-longitude: 11.96
deploy-on-latitude: 51.32
deploy-on-longitude: 11.96
deployment-end-type: removal
duty-cycle: light sampled every 5 minutes
geolocator-calibration: on-bird calibration 2015-07-20 to 2015-08-29
geolocator-light-threshold: 0.5
location-accuracy-comments: upper/lower coordinates are the 95% credible interval calculated with locationSummary, SGAT package
manipulation-type: none
study-site: Saxony Anhalt, Germany
tag-manufacturer-name: Swiss Ornithological Institute
tag-mass: 1.3 (grams)
tag-model: PAM GDL3
tag-readout-method: tag-retrieval

Once you have prepared this file, follow the instructions for importing reference data in Movebank’s user manual.

9.2.5 Additional files

Other data sensors: You can import additional sensor data, e.g. from conductivity or temperature sensors on the tag. Follow the instructions in Section 9.2.1, but select the appropriate sensor type, using “accessory measurements” if the sensor is not listed.

Other kinds of files: Movebank allows you to store additional files in your study that are not imported to the database. You can use this to store things like R scripts, a text file with the configuration details stored in the header of original light-level files, or more detailed capture information. From your study select Upload Data > Add Attachments and follow instructions to upload file attachments from Movebank’s user manual.

9.2.6 Merging files

Here is an example of how to merge files of light recordings or twilights prior to importing them to Movebank. This allows you to import just one file rather than one for each tag. With files exceeding ~500 MB the import process can become slow, so see these tips for importing large csv files. We have successfully imported files up to at least 3 GB.

To merge multiple tabular files in the same format, first get a list of all files in the directory.

filenames <- list.files(path = ".", pattern = NULL, all.files = FALSE, 
                        full.names = FALSE, recursive = FALSE, ignore.case = FALSE)

Make a function to read the files and add a column with the filename (where the tag identifiers are typically stored).

read_txt_filename <- function(filename){
ret <- read.table(filename, header=TRUE, sep=";", quote='"', dec=".", na.strings="", 
                  colClasses="character", fill=TRUE, strip.white=TRUE, blank.lines.skip=TRUE) 
                  #, colClasses=c(rep("character", 7), rep("NULL", 6)), blank.lines.skip=TRUE)
                  ret$Filename <- filename
                    ret}

Read in the contents of all files to a new data frame.

library(plyr)
dat <- ldply(filenames, read_txt_filename)

Remove extension from the filenames. Adjust as needed so that the filename values become the Tag IDs you want to use in Movebank.

dat <- lapply(dat, function(findreplace){
      gsub(".csv","",findreplace)})

Save the file with settings that will minimize file size and leave NAs blank.

write.csv(dat, file="merged_files.csv", quote=F, na="", row.names=F)

9.3 Examples of geolocator studies in Movebank

The following Movebank studies follow many of the best practices described above and can be publicly downloaded, so you can use them to help organize your own datasets. To search Movebank for publicly visible geolocator studies that include location estimates, go to the Tracking Data Map select “All Sensor Types” and choose “Solar Geolocator”. Geolocator studies published in the Movebank Data Repository are listed at https://www.datarepository.movebank.org/discover?query=geolocator. For more on how to properly use and cite these examples, as well as other geolocator studies in Movebank, see Movebank’s citation guidelines, general terms of use and user agreement.

Light-Level Geolocation Manual (study 788381694): This study contains sample data used in this manual, imported following the recommendations above. See Lisovski et al. (2019).
Red-necked Phalarope southern Chukotka (data from Mu et al. 2018) (study 411155573 published with DOI 10.5441/001/1.p41784h5): This study includes measurements from conductivity and temperature sensors. Movebank could use input from more data owners with conductivity sensors to improve how we document these. See Mu et al. (2018a, 2018b).
Migration of male and female red-backed shrikes from southern Scandinavia (data from Pedersen et al. 2019) (study [890556697] (https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study890556697)) published with DOI 10.5441/001/1.j71640kh: This study includes light recordings, annotated twilights, location estimates and reference data. See Pedersen et al. (2019a, 2019b).
Vermont Center for Ecostudies Grasshopper Sparrow Migration Project (study 348283339, published with DOI 10.5441/001/1.c6b47s0r: This study includes light recordings and location estimates following the recommendations above, and the published dataset also includes an R script and original .lux file to further document the analysis. See Hill and Renfrew (2019a, 2019b).

References

Hill JM and Renfrew RB (2019a) Data from: Migratory patterns and connectivity of two North American grassland bird species [grasshopper sparrows]. Movebank Data Repository. https://doi.org/10.5441/001/1.c6b47s0r

Hill JM, Renfrew RB (2019b) Migratory patterns and connectivity of two North American grassland bird species. Ecology and Evolution 9(1): 680–692. https://doi.org/10.1002/ece3.4795

Lisovski S, Bauer S, Briedis M, Davidson SC, Dhanjal-Adams KL, Hallworth MT, Karagicheva J, Meier CM, Merkel B, Ouwehand J, Pedersen L, Rakhimberdiev E, Roberto-Charron A, Seavy NE, Sumner MD, Taylor CM, Wotherspoon SJ, Bridge ES (2020) Light-level geolocator analyses: a user’s guide. Journal of Animal Ecology 89(1): 221-36. https://doi.org/10.1111/1365-2656.13036

Mu T, Tomkovich PS, Loktionov EY, Syroechkovskiy EE, Wilcove DS (2018a) Data from: Migratory routes of red-necked phalaropes Phalaropus lobatus breeding in southern Chukotka revealed by geolocators. Movebank Data Repository. https://doi.org/10.5441/001/1.p41784h5

Mu T, Tomkovich PS, Loktionov EY, Syroechkovskiy EE, Wilcove DS (2018b) Migratory routes of red-necked phalaropes Phalaropus lobatus breeding in southern Chukotka revealed by geolocators. Journal of Avian Biology 49(7): e01853. https://doi.org/10.1111/jav.01853

Pedersen L, Jakobsen NM, Strandberg R, Thorup K, Tøttrup AP (2019a) Data from: Sex-specific difference in migration schedule as a precursor of protandry in a long-distance migratory bird. Movebank Data Repository. https://doi.org/10.5441/001/1.j71640kh

Pedersen L, Jakobsen NM, Strandberg R, Thorup K, Tøttrup AP (2019b) Sex-specific difference in migration schedule as a precursor of protandry in a long-distance migratory bird. The Science of Nature 106:45. https://doi.org/10.1007/s00114-019-1637-6