Preserving the scholarly record with WebCite(R) (www.webcitation.org): An archiving system for long-term digital preservation of cited webpages
Last modified: 2008-06-13
Abstract
Objective
Authors increasingly cite webpages and other digital objects on the Internet, which can "disappear" overnight. In one study published in the journal Science, 13% of Internet references in scholarly articles were inactive after only 27 months. Another problem is that cited webpages may change, so that readers see something different from what the citing author saw. The problem of unstable webcitations and the lack of routine digital preservation of cited digital objects has been referred to as an issue "calling for an immediate response" by publishers and authors.
The objective of this paper is to present a proposed solution, an on-demand archiving system that authors, journal and book editors, and publishers can use for the long-term preservation of cited webreferences, and to discuss its societal implications.
Methodology
WebCite(R), a member of the International Internet Preservation Consortium, is an on-demand archiving system for non-journal webreferences (cited webpages and websites, or other kinds of Internet-accessible digital objects). It can be used by authors, editors, and publishers of scholarly papers and books who want to ensure that cited webmaterial will remain available to readers in the future. If cited webreferences in journal articles, books, etc. are not archived, future readers may encounter a "404 File Not Found" error when clicking on a cited URL.
A WebCite(R)-enhanced reference is a reference that contains, in addition to the original live URL (which can, and probably will, disappear in the future, or whose content may change), a link to an archived copy of the material exactly as the citing author saw it when accessing it. Example reference: Lessig, Lawrence: "this is a fantastically cool idea" (Blog). Sept 8, 2006. http://lessig.org/blog/2006/09/ Archived by WebCite(R) at http://www.webcitation.org/5UzgHmsS7 on 20-01-2008.
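The two-part structure of such a reference (live URL plus archived copy and archiving date) can be sketched as a small formatting function. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, and the output simply mirrors the layout of the example reference, not an official WebCite(R) citation style.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WebReference:
    """A cited web resource plus its archived snapshot (hypothetical fields)."""
    author: str
    title: str
    kind: str           # e.g. "Blog"
    published: str      # publication date as given by the cited source
    live_url: str       # original URL; may later break or change
    archive_url: str    # URL of the archived copy
    archived_on: date   # date the snapshot was taken


def format_enhanced_reference(ref: WebReference) -> str:
    """Render a reference carrying both the live URL and the archived copy."""
    return (
        f'{ref.author}: "{ref.title}" ({ref.kind}). {ref.published}. '
        f"{ref.live_url} Archived by WebCite(R) at {ref.archive_url} "
        f"on {ref.archived_on:%d-%m-%Y}"
    )


ref = WebReference(
    author="Lessig, Lawrence",
    title="this is a fantastically cool idea",
    kind="Blog",
    published="Sept 8, 2006",
    live_url="http://lessig.org/blog/2006/09/",
    archive_url="http://www.webcitation.org/5UzgHmsS7",
    archived_on=date(2008, 1, 20),
)
print(format_enhanced_reference(ref))
```

Keeping the live URL and the archive URL as separate fields reflects the point of the design: readers can try the original first and fall back to the snapshot if it has disappeared or changed.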
The archiving process can be initiated by citing authors, editors, publishers, or cited authors. Authors usually use the archiving form or the WebCite(R) bookmarklet, or upload an entire citing manuscript to the WebCite(R) server via the comb page, which triggers the WebCite(R) tool to comb through the manuscript and archive all cited non-journal URLs.
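The "combing" step can be illustrated by a simple extractor that finds URLs in a plain-text manuscript and skips journal links. This is only a sketch of the idea, not the WebCite(R) implementation: the regular expression is a rough heuristic, and treating DOI resolver hosts as the only "journal" links is a hypothetical rule chosen for illustration. Submitting each URL for archiving is omitted.

```python
import re
from urllib.parse import urlparse

# Rough pattern for http(s) URLs in plain text; stops at whitespace and
# common closing delimiters such as quotes, parentheses, and brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

# Hypothetical rule: links to DOI resolvers count as journal references
# and are skipped; a real system would need a far more careful test.
JOURNAL_HOSTS = {"doi.org", "dx.doi.org"}


def comb_manuscript(text: str) -> list[str]:
    """Return unique non-journal URLs in order of first appearance."""
    seen: set[str] = set()
    found: list[str] = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(".,;:")  # drop trailing sentence punctuation
        host = urlparse(url).netloc.lower()
        if host in JOURNAL_HOSTS or url in seen:
            continue
        seen.add(url)
        found.append(url)
    return found
```

For example, a manuscript citing a blog post twice, a DOI, and a dataset URL would yield the blog post and the dataset URL once each, ready to be queued for archiving.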
Participating journal or book editors, publishers, and copyeditors insert a note in their "Instructions for authors" asking authors to use webcitation.org to permanently archive all cited webpages and websites before manuscript submission, and to cite the archived copy in addition to the original link.
Participating publishers (such as BioMed Central) submit manuscript XML files to WebCite(R) at the time of publication, so that WebCite(R) can comb through the manuscript and archive cited webpages automatically.
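When the manuscript arrives as XML, web links are already marked up, so combing becomes a matter of walking the document tree. The sketch below assumes a JATS-style schema in which cited URLs appear as `ext-link` elements with `ext-link-type="uri"` and an `xlink:href` attribute; whether any particular publisher's XML looks like this is an assumption, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fully qualified name of the xlink:href attribute as ElementTree reports it.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def cited_urls_from_xml(xml_text: str) -> list[str]:
    """Collect unique URI links from JATS-style ext-link elements."""
    root = ET.fromstring(xml_text)
    urls: list[str] = []
    for link in root.iter("ext-link"):  # JATS elements carry no namespace
        if link.get("ext-link-type") != "uri":
            continue
        href = link.get(XLINK_HREF)
        if href and href not in urls:
            urls.append(href)
    return urls
```

Compared with scanning plain text, relying on explicit markup avoids guessing where a URL ends, which is one reason publisher-side XML submission can be fully automatic.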
Citable authors, i.e. academic bloggers and authors of non-journal scholarly webpages who anticipate being cited in the scholarly literature ("citable Web-authors", subsequently called "cited authors") but are concerned about the persistence and citability of their work, can add a "WebCite this!" link to their work, which links dynamically to the archiving form.
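A "WebCite this!" link of this kind can be generated per page by passing the page's own address to the archiving form. In the sketch below, the form address and the `url` query parameter are placeholders, not taken from WebCite(R) documentation, and the function name is hypothetical; a blog template could call it for each post.

```python
import html
from urllib.parse import urlencode

# Placeholder only: the real archiving-form address and its query
# parameter names are assumptions, not documented WebCite(R) values.
ARCHIVE_FORM = "https://archive.example.org/archive-form"


def webcite_this_link(page_url: str, label: str = "WebCite this!") -> str:
    """Build an HTML anchor that sends page_url to the archiving form."""
    query = urlencode({"url": page_url})  # percent-encodes the page URL
    href = f"{ARCHIVE_FORM}?{query}"
    return f'<a href="{html.escape(href)}">{html.escape(label)}</a>'


print(webcite_this_link("http://lessig.org/blog/2006/09/"))
```

Percent-encoding the page address keeps characters such as `?` and `&` in the cited URL from being misread as parameters of the archiving form itself.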
Data and interpretation
Since 2005, WebCite(R) has been used by over 200 scholarly journals and has archived over 3 million files and webpages of scholarly importance.
Conclusions
The current state of scholarly communication on the web can be characterized by the following paradox:
1) blogs (and other Internet venues such as wikis) are, at least in theory, important venues for scholars to publish hypotheses, analyses, etc. outside the traditional journal publishing system;
2) yet, they are not considered "citable" or "publications", which in turn affects their use, usefulness, and acceptance among researchers as tools for scholarly communication.
WebCite(R) aims to make Internet material (any sort of digital object) more "citable", accessible in the long term, and hence more acceptable for scholarly purposes. Without WebCite(R), Internet citations are deemed ephemeral and are therefore often frowned upon by authors and editors. However, it makes little sense to ignore opinions, ideas, draft papers, or data published on the Internet (including wikis and blogs) and to leave them unacknowledged only because they are not "formally" published and are difficult to cite. In the age of the Internet, "publication" is a continuum, and it makes little sense not to cite (and therefore acknowledge), for example, the idea of a scholarly blogger, the collective wisdom of a wiki, ideas from an online discussion paper, or data from an online dataset only because online material is not deemed "citable". By making Internet material more "citable" (and by creating incentives, such as mechanisms and metrics for measuring the "impact" of online material through a calculated and published WebCite(R) impact factor), we hope to encourage scholars to publish ideas and data online in a wide range of formats, which in turn should accelerate and facilitate the exchange of scientific ideas. While we see the value of peer-reviewed scholarly journals for publishing research results, we also acknowledge that much of the scientific discourse takes place before it is "formally" published, and that peer review can take other forms (e.g. post-publication peer review, which WebCite(R) plans to implement).
Another, broader societal aspect of the WebCite(R) initiative is advocacy and research in the area of copyright. We aim to develop a system that balances the legitimate rights of copyright holders (e.g. cited authors and publishers) against the "fair use" rights of society to archive and access important material. We also advocate and lobby for a non-restrictive interpretation of copyright that does not impede the digital preservation of our cultural heritage or the free and open flow of ideas. This should not be seen as a threat by copyright holders: we aim to keep material that is currently openly accessible online accessible for future generations without causing economic harm to the copyright holder. This is a challenging but feasible goal, and future iterations of this service may include some form of revenue-sharing mechanism for copyright holders.
Yet another angle is that WebCite(R) enables "one-click self-archiving", making it very easy for scholarly authors to create a permanent, openly accessible record of their own work and ideas. While the primary pathway in the WebCite(R) system is third-party-initiated archiving (triggered by a citing author), WebCite(R) also provides a very simple mechanism for authors to self-archive their own work.