On Preserving Coral Reef Imagery and Related Data – by James W Porter

James Porter’s coral photo monitoring project in Discovery Bay

In preparation for an upcoming Data Science for Coral Reefs: Data Rescue workshop, Dr. James W. Porter of the University of Georgia spoke eloquently about his own efforts to preserve historic coral reef imagery captured in Discovery Bay, Jamaica, from as early as 1976. It’s a story from the trenches with a senior scientist’s perspective, outlining the effort and steps needed to preserve critical data, in this case a record characterizing a healthy reef over 40 years ago.

Enjoy this insightful 26-min audio description, recorded on 2018-01-04.


Transcript from 2018-01-04 (lightly edited):

This is Dr. Jim Porter from the University of Georgia. I’m talking about the preservation of a data set that is at least 42 years old now and started with a photographic record that I began making in Discovery Bay, on the north coast of Jamaica, in 1976. I always believed that the information photographs would reveal would be important, specifically because I had tried other techniques of line transecting and those were very ephemeral. They were hard to relocate in exactly the same place, and in addition they only captured a line’s worth of data. And yet coral reefs are three-dimensional and have a great deal of material on them not well captured in a linear transect. So those data were… I was very consistent about photographing from 1976 to 1986.

But eventually funding ran out and I began focusing on physiological studies. But toward the end of my career I realized that I was sitting on a gold mine. So, the first thing that’s important when considering a dataset and whether it should be preserved or not is the individual’s belief in the material. Now it’s not always necessary for the material to be your own for you to believe in it. For instance, I’m working on Tom Goreau, Sr.’s collection which I have here at the University of Georgia. I neither made it nor in any way contributed to its preservation but I’ve realized that it’s extremely important and therefore I’m going to be spending a lot of time on it. But in both cases, the photographic record from Jamaica, as well as the coral collection itself – those two activities have in common my belief in the importance of the material.

The reason that belief in the material is so important is that the effort required to capture and preserve it is high, and you’ve got to have a belief in the material in order to take the steps to assure the QA/QC of the data you’re preserving, as well as the many hours required to put it into digital format. And believing in the material then should take another step, which is a very self-effacing review of whether you believe the material to be of real significance to others. There’s nothing wrong with memorabilia. We all keep scrapbooks and photographs that we like – things relating to friends and family, and times that made us who we are as scientists and people. However, the kind of data preservation that we’re talking about here goes beyond that – it could have 50 or 100 years’ worth of utility.

Those kinds of data really do need to be of some kind of value, and the value could be global, regional, or possibly even local. Many local studies can be of importance in a variety of ways: the specialness of the environment, or the possibility that people will come back to that same special environment in the future. That brings us to number two on the list – first is belief in the material – second is you’ve got to understand that the context in which you place your data is much more important to assure its survival and utility than the specificity of the data. Numbers for their own sake are numbers. Numbers in the service of science become science. It is the context in which you place your data that will assure its future utility and preservation.



CRESCYNT Data Science for Coral Reefs Workshop 1 – Data Rescue


We’re extremely pleased to be able to offer two workshops in March 2018 at NCEAS. The first is CRESCYNT Data Science for Coral Reefs Workshop 1: Data Rescue. Apply here.

When: March 7-10, 2018
Where: NCEAS, Santa Barbara, California, USA

Workshop description:

Recommended for senior scientists with rich “dark” data on coral reefs that needs to be harvested and made accessible in an open repository. Students or staff working with senior scientists are also encouraged to apply. Days 1 and 2 of the workshop will cover the basic principles of data archiving and data repositories, including Darwin Core and EML metadata formats, how to write good metadata, how to archive data on the KNB data repository and elsewhere, data preservation workflow and best practices, and how to improve data discoverability and reusability. Participants will then spend approximately 2 days working in pairs to archive their own data using these principles, so applying with a team member from your research group is highly recommended.
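To give a feel for the metadata-writing portion, here is a minimal sketch of an EML-style dataset record built with Python’s standard library. The structure shown (title, creator, abstract) is only a bare skeleton and the `packageId` is a hypothetical placeholder; real EML documents follow the full EML 2.2.0 schema and are usually produced with repository tooling such as the KNB editor.

```python
# Minimal sketch of an EML-like dataset record (illustrative, not schema-complete).
import xml.etree.ElementTree as ET

def minimal_eml(title, creator_surname, abstract):
    """Build a bare-bones EML-style dataset record and return it as an XML string."""
    eml = ET.Element("eml:eml", {
        "xmlns:eml": "https://eml.ecoinformatics.org/eml-2.2.0",
        "packageId": "example.1.1",   # hypothetical identifier
        "system": "knb",
    })
    dataset = ET.SubElement(eml, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname
    abstract_el = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract_el, "para").text = abstract
    return ET.tostring(eml, encoding="unicode")
```

Even this skeleton illustrates the workshop’s core point: the title, creator, and abstract are the context that makes a pile of images findable and reusable decades later.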

The workshop is limited to 20 participants. We encourage you to apply via this form. Workshop costs will be covered with support from the NSF EarthCube CRESCYNT RCN. Participants will publish data during the workshop process, and we anticipate widely sharing workshop outcomes, including workflows and recommendations. Because coral reef science embodies a wide range of data types (spreadsheets, images, videos, field notes, large ‘omics text files, etc.), anticipate some significant pre-workshop prep effort.

Related post: CRESCYNT Toolbox – Estate Planning for Your Data

UPDATE: HERE IS THE AGENDA FOR THE WORKSHOP, WITH TRAINING LINKS.

>>>Go to the blog Masterpost or the CRESCYNT website or NSF EarthCube.<<<


CRESCYNT Toolbox – Data Repositories – Estate Planning for your Data

“Hypotheses come and go but data remain.” – Ramón y Cajal

Taking care of our data for the long term is not just good practice, allowing us to share our data, defend our work, reassess conclusions, collaborate with colleagues, and examine broader scales of space and time – it’s also estate planning for our data, and a primary way of communicating with future scientists and managers.


Here are some great options for long-term data storage, highlighting repositories friendly to coral reef science.

First, there are some important repository networks useful for coral reef data – these can unify standards and offer collective search portals: we like DataONE (members here) and bioCaddie (members here).

KNB – the Knowledge Network for Biocomplexity offers open and private data uploads; ecological orientation. DataONE network.

NOAA CoRIS: Coral Reef Information System – often free to use and can accept coral reef related data beyond NOAA’s own data; contact them first.

BCO-DMO – Biological and Chemical Oceanography Data Management Office – if you have an NSF grant that requires data storage here, you’re fortunate. Good data management guidelines and metadata templates, excellent support staff. Now a DataONE member.

Dataverse – supported by Harvard endowments. There are multiple organizational dataverses – the Harvard Dataverse is free to use. bioCaddie member.

Zenodo – free to use, supported by the European Commission (this is a small slice of CERN’s enormous repository for the Large Hadron Collider). Assigns DOIs. We invite you to include the “Coral Reef” community when you upload. bioCaddie member.

NCBI – the National Center for Biotechnology Information is very broadly accepted for ‘omics data of all types. A bioCaddie member.

DataCite – not a repository, but if you upload a dataset at a repository that does not assign its own DOIs, you can get one at DataCite and include it when publishing your datasets.

We’ve not listed more costly repositories such as Dryad (focused on journal requirements) or repositories restricted to institutions. What about other storage options such as GitHub, Amazon Web Services, websites? Those have important uses, but are not curated repositories with long-term funding streams, so are not the best data legacy options.

Most of these repositories allow either private (closed) or public (open) access, or later conversion to open access. Some have APIs for automated access within workflows. These are repositories we really like for storing and accessing coral reef work. Share your favorite long-term data repository – or experiences with any of the repositories listed here – in the comments.
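As an example of the API access mentioned above, here is a hedged sketch of assembling deposition metadata for Zenodo’s REST API. The endpoint URL and field names below are assumptions based on Zenodo’s public developer documentation, and the example only builds the JSON payload; an actual upload would be an authenticated POST with your own access token.

```python
# Sketch: build a Zenodo deposition payload (field names assumed from Zenodo's public API docs).
import json

ZENODO_DEPOSITIONS = "https://zenodo.org/api/deposit/depositions"  # assumed endpoint

def build_deposit(title, creators, description, keywords=None):
    """Assemble deposition metadata in the shape Zenodo's API expects (illustrative)."""
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": n} for n in creators],
            "keywords": keywords or ["coral reef"],
        }
    }

payload = build_deposit(
    "Discovery Bay reef photo transects 1976-1986",
    ["Porter, James W."],
    "Photographic monitoring of coral reefs in Discovery Bay, Jamaica.",
)
print(json.dumps(payload, indent=2))
# A real upload would then be something like:
#   requests.post(ZENODO_DEPOSITIONS, params={"access_token": TOKEN}, json=payload)
```

Scripting uploads this way is what makes a repository usable inside a workflow rather than only through its web form.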


CRESCYNT Toolbox – Disaster Planning and Recovery

With computers, the question is not whether they will fail, but when.

tl;dr – It’s very practical to have cloud storage backup in addition to still-useful external hard drive backup routines. Here are some secure cloud alternatives.

Personal note. I’ve had hard drive failures due to lightning strike; simultaneous death of mirrored hard drives within a RAID; drenching from an upper-floor emergency shower left flowing by a disgruntled chemistry student; and most recently, demise of my laptop by sudden immersion in salt water (don’t ask). By some intersection of luck and diligence, on each occasion recent backups were available for data recovery. In the most recent episode, it was a revelation how much work is now backed up via regular entry into the casual cloud.

This latest digital landing was mercifully soft (…cloudlike). Because of work portability, my recent sequential backup habit has been to a paid unshared Dropbox account; $10/mo is a bargain for peace of mind (beyond a certain size, restoration is not drag-and-drop). A surprising number of files these days are embedded in multiple team projects – much on Google Drive – so all of that was available, with revision history. Group conversations and files were on Slack and email. One auxiliary brain (iPhone) was in a waterproof case with cloud backup, and another auxiliary brain (project/task tracking) was in a web app, KanbanFlow. Past years of long-term archives were already on external hard drives in two different cities. GitHub is an amazing place to develop, document, recover and share work in progress and products, but it is not a long-term curated data repository. For valuable datasets, the rule is to simplify formats, attach metadata, and update media periodically.

Thinking about your own locations for data storage and access? Check out this review of more secure alternatives to – and apps on top of – Dropbox. Some, like OwnCloud, can serve as both storage and linked access for platforms like Agave. A strength of some current analytical platforms is that they can access multiple data storage locations; for example, Open Science Framework can access Dropbox, Google Drive, GitHub, Box, figshare, and now Dataverse and Amazon Web Services as well.

A collaborator recently pointed out that the expense of any particular type of data storage is really the expense of its backup processes: frequency, automation, security, and combination of archiving media. Justifying the expense can come down to this question: What would it cost to replace these data? Some things are more priceless than others.

Disaster Planning and Recovery tools.  To go beyond data recovery in your planning, here’s an online guide for IT disaster recovery planning and cyberattacks. How much of a problem is this really? See Google’s real-time attack map (hit “play”). Better to plan than fear. You did update those default passwords on your devices, yes?

Feel free to share your own digital-disaster-recovery story in the comments.
