Thursday, December 15, 2016

Backing up climate-science data

It is nearly inconceivable that Trump would order the deletion of climate-science data -- a modern book burning -- but one can imagine large budget cuts for climate-science research, making it impossible to maintain and update this sort of public data.

Climate scientists have kicked off at least two projects to create backup copies of their research and data.

One is Climate Mirror, an ad hoc project to mirror public climate datasets before the Trump administration takes office, to make sure these datasets remain freely and broadly accessible.

Another is a hackathon that will be hosted on December 17th at the University of Toronto in collaboration with the Internet Archive End of Term project, which seeks to archive the federal online pages and data that are in danger of disappearing during the Trump administration. (Note that they have done the same for earlier administrations.)

For example, NASA recently released data showing how temperature and rainfall patterns worldwide may change through the year 2100 because of growing concentrations of greenhouse gases in Earth’s atmosphere.


The post announcing the dataset states:
The dataset, which is available to the public, shows projected changes worldwide on a regional level in response to different scenarios of increasing carbon dioxide simulated by 21 climate models. The high-resolution data, which can be viewed on a daily timescale at the scale of individual cities and towns, will help scientists and planners conduct climate risk assessments to better understand local and global effects of hazards, such as severe drought, floods, heat waves and losses in agriculture productivity.

“NASA is in the business of taking what we’ve learned about our planet from space and creating new products that help us all safeguard our future,” said Ellen Stofan, NASA chief scientist. “With this new global dataset, people around the world have a valuable new tool to use in planning how to cope with a warming planet.”

The climate-science community is obviously alarmed by Donald Trump's appointments: Ryan Zinke, who characterizes climate change as “unsettled science,” as Secretary of the Interior; Rick Perry, who once could not recall the name of the department but remembered that he did want to eliminate it, as Secretary of Energy; and Scott Pruitt, who consistently opposes regulation, to head the Environmental Protection Agency.

These men are all supporters of and supported by the oil industry.

The Trump transition team also requested the names of Energy Department employees and contractors who have worked on climate science, as well as the professional-society memberships of lab workers.


-----
Update 12/29/2016

Check out this excellent, short (5:14) interview with Internet Archive founder Brewster Kahle. The interview begins with climate scientist Eric Holthaus talking about the effort to archive climate research, then Kahle goes on to say more about how and why they archive government (.gov and .mil) Web sites when a new administration takes over.

He said 83% of the .pdf files that were on government sites in 2008 were gone by 2012. In addition to Web pages, they will be archiving virtual machine versions of interactive government services and databases. (As noted above, those are vulnerable to defunding.)

When asked for an example of the value of the archive, Kahle mentioned the press release announcing George W. Bush's ironically famous "Mission Accomplished" speech on the deck of an aircraft carrier. As shown below, the headline reads "President Bush announces combat operations in Iraq have ended," and the first sentence qualifies the headline by saying "major" combat operations have ended. Kahle said that a couple of weeks later "major" was added to the title, and a couple of months later the press release was deleted.


Excerpt from press release on "Mission Accomplished" speech

The Internet is potentially providing raw data for historians -- it should be complete and accurate.

If you would like to see a video of the entire speech, the Internet Archive has preserved that as well:




-----
Update 12/30/2016

The following is a transcript of Bob Garfield, co-host of the podcast On the Media, interviewing Brewster Kahle, founder of the Internet Archive and a partner in the End of Term project. The segment opens with a lead-in on climate-science research featuring Eric Holthaus of Slate magazine.

Bob: Meanwhile, a small army of volunteer archivists, scientists and advocates have been working to save the government climate change research that already exists...

Eric: ...at NASA and NOAA that takes the temperature of the planet from weather stations, from satellites, from ocean buoys.

Bob: Meteorologist Eric Holthaus spoke to NPR about his effort to save government climate data.

Eric: Sometimes these data sets are only stored on United States government servers, so there hasn't really been an effort to catalog those in other countries, because we haven't thought it was necessary before.

Bob: The Internet Archive, on the other hand, has given a lot of thought to what gets lost in presidential transitions. Every week the archive tapes three hundred million Web pages, and every four years it enlists a bunch of volunteers to make copies of government Web sites as a hedge against what the next administration may choose to delete. It's called the End of Term Web Archive, and for some reason this year the organizers are getting a lot more offers of help. Brewster Kahle, founder of the Internet Archive, says that this year his team is also backing up its data to Canada.

Brewster: When the election went the way that it did, it was a bit of a surprise, so we looked through the television archive at what President-elect Trump said about freedom of the press and about the Internet, and what we found was shocking. He wanted to close up parts of the Internet, and there was mocking of freedom of the press. This was kind of a wake-up call, and we said, let's make sure we have a copy in some other location.

Bob: What are your priorities? How does it work?

Brewster: So the Internet Archive works with the Library of Congress, the University of North Texas -- now a growing list of groups -- to try to do as best we can to record the information that's available on the Web sites, and now the web services, that have been made available on .gov and .mil Web sites. We found that 83 percent of the PDFs that were available in 2008 were no longer available by 2012. So with an 83 percent loss rate when the Obama administration came on board, we're likely to see something like it, maybe even more, with the Trump administration.

So we're coordinating activities to go and archive web pages, and we're reaching out to federal webmasters to go and see if we can keep whole services up and running. Can we take virtual machine versions of the databases that they're running and be able to run them in snapshot form, so that we can keep these services going in the future as they were in 2016?

Bob: Give me some examples of when the federal web archive has come in handy. Was there something that disappeared that you were super glad to have archived?

Brewster: Oh, the anecdotes go on and on. Example -- there is a press release from the White House during the George W. Bush administration, when he stood on an aircraft carrier and declared “mission accomplished.” The headline of that press release was that combat operations in Iraq had ceased, but a couple of weeks later they changed the headline to say major combat operations had ceased, with no notice that it had changed. The only reason we know is because we had archived both versions. And then a couple of months later the press release went away completely from the web. You know, what is more Orwellian? Is it changing a press release that's in the past, or is it disappearing completely?

Bob: What are you most worried is going to disappear in a Trump administration?

Brewster: Frankly, we have no idea. This upcoming administration is very aware of the power of the Internet and how it can be manipulated -- how you can go and push things out in the middle of the night and use the journalist system in ways that are really pretty blatant. So let's at least keep a record of it.

Bob: We have just experienced the interference in a political campaign by outsiders. Is this archive secure -- I mean really secure against hacking, against intrusion?

Brewster: The history of libraries is a history of loss. Libraries are burned. That's what happened in the Library of Alexandria. It'll be what happens to us -- just don't know when. So let's design for it. Let's go and make copies in other places. Let's make sure people want universal access to all knowledge, that they want education based on facts. Let's go and make sure that there is an environment that supports libraries. That's the only way that in the long term we're going to survive. And the copies that are maybe now unique at the Internet Archive will survive based on all sorts of changes whether it's earthquakes or institutional failure or law changes.

Bob: Brewster, as always, many thanks.

Brewster: Thank you very much.

Bob: Mr. Kahle is the founder of the Internet Archive and a partner on the End of Term project.

Kahle's interview was part of a longer podcast episode called "Hurry Up." They discussed other steps President Obama could take during the last weeks of his term. The suggestions included disclosing information on contributions by government contractors, on surveillance, and on the drone program; closing Guantanamo; and granting clemency. The episode ends with science writer James Gleick discussing the nature of time.

Finally, I created the interview transcript using a nifty service called PopUpArchive. You simply upload a sound file and wait for the text version to be posted, ready for download. It takes a little proofreading and editing, but it is a lot faster than manual transcription, and as this Microsoft Research report shows, we can look forward to more accurate speech recognition in the future.