Revision as of 00:09, 30 October 2013
The use of Archive.is is under discussion at WP:Archive.is RFC.
This help page is a how-to guide. It explains concepts or processes used by the Wikipedia community. It is not one of Wikipedia's policies or guidelines, and may reflect varying levels of consensus.
This page gives information about using Archive.is, an on-demand web archiving service, at http://archive.is/. A web archiving service allows Wikipedia editors to reduce link rot by preserving a copy of an online source that can be accessed if the original page is moved, changes, or disappears. Not all web pages can be archived using Archive.is.
Archive.is can archive HTML web pages, style sheets, JavaScript, and digital images.
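The limits quoted from the Archive.is FAQ in the notes can be expressed as a quick pre-check. The following is a minimal Python sketch, not an Archive.is tool. The function name archiving_obstacles is hypothetical, and two details are assumptions: the MIME types used to recognise PDF, audio and video files, and reading "50mb" as 50 × 1024 × 1024 bytes.

```python
# Limits paraphrased from the Archive.is FAQ quoted in the notes.
# Assumption: "50mb" is taken as 50 * 1024 * 1024 bytes.
MAX_PAGE_BYTES = 50 * 1024 * 1024


def archiving_obstacles(content_type: str, size_bytes: int) -> list[str]:
    """Return FAQ-based reasons why Archive.is may refuse a resource.

    An empty list does not guarantee success: the FAQ also mentions content
    unreachable from the Archive.is network and overly complex JavaScript,
    which cannot be detected from these two values.
    """
    reasons = []
    # Drop parameters such as "; charset=utf-8" before comparing MIME types.
    mime = content_type.split(";")[0].strip().lower()
    if mime == "application/pdf":
        reasons.append("PDF files are not supported")
    elif mime.startswith(("audio/", "video/")):
        reasons.append("audio and video are not supported")
    if size_bytes > MAX_PAGE_BYTES:
        reasons.append("page exceeds the 50 MB limit")
    return reasons


if __name__ == "__main__":
    print(archiving_obstacles("text/html; charset=utf-8", 120_000))
    print(archiving_obstacles("application/pdf", 2_000_000))
    print(archiving_obstacles("video/mp4", 80 * 1024 * 1024))
```

Checks like this only flag the obvious cases. Whether a page was actually captured still has to be confirmed by viewing the archived copy.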
Differences from other archivers
Other web archiving services include the Wayback Machine and WebCite. The three services operate differently, and some pages can be archived by one but not by the others. The Wayback Machine takes snapshots of web pages at certain times and also archives pages on user request; WebCite requires someone to actively archive a link. User:RotlinkBot, a bot that has not yet been approved, would monitor the recent changes of many wiki projects (including all language editions of Wikipedia) and automatically archive new links as soon as possible after editors add them to articles. A similar feature for the Wayback Machine is under development.
Copyright and robots.txt
Archive.is removes archived pages at the request of copyright holders under the U.S. DMCA; requests can be made with the "Report abuse" link on archived pages.
Archive.is does not abide by the Robots exclusion standard, while other archivers do: the Wayback Machine uses it to avoid archiving material which site owners do not want archived, and WebCite uses it to address "copyright issues". Web sites use "robot" tags to inform archives that their content is not to be re-hosted on any other site, a convention agreed by consensus on the mailing list robots-request@nexor.co.uk. Re-hosting U.S.-copyrighted material without permission may violate the U.S. Digital Millennium Copyright Act (DMCA). For this reason, to avoid implicating Wikipedia in copyright violations and incurring DMCA takedown requests, Archive.is should be used with some caution for U.S.-copyrighted content.
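The check that a robots-honouring archiver performs can be sketched with Python's standard urllib.robotparser module. The sketch shows how an archiver such as the Wayback Machine or WebCite might decide whether to fetch a page; Archive.is, as noted above, does not do this. The robots.txt content and the bot name SomeOtherBot are hypothetical. ia_archiver is assumed to be the crawler name the Wayback Machine historically honoured.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: it blocks "ia_archiver" (assumed here to be the
# crawler name the Wayback Machine honoured) from the whole site, and blocks
# all other crawlers from /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_archive(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honours robots.txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/pageyouwantoarchive.html"
    print(may_archive("ia_archiver", page))   # whole site disallowed
    print(may_archive("SomeOtherBot", page))  # only /private/ is disallowed
```

The exclusion is purely voluntary. Nothing in robots.txt stops a service that ignores the file from copying the page.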
How to archive
There are several ways to submit a web page to Archive.is for archiving. For new users, the website form is suggested. The other methods are better suited to those who use Archive.is regularly.
Website form
This method is easy to use. It requires going to the Archive.is website in order to archive a web page.
- At http://archive.is/, enter the URL of the web page you wish to archive into the "My url is alive and I want to archive its content" field (the red one).
- Click the "Submit" button. When the archiving process completes (it usually takes 5-15 seconds), you will be sent to the archived page.
- It is recommended that you view the archived page to check whether the archive process was successful.
Bookmarklet
A bookmarklet is a web browser bookmark that performs a certain function. The Archive.is bookmarklet, when clicked, takes the URL of the page you are currently viewing and submits it to Archive.is for archiving. This method is straightforward to set up and convenient to use. It is recommended that you keep your Bookmarks/Favorites bar visible, or at least keep your bookmarks within a click or two. This method only archives the page you are currently viewing; to archive a different web page, use another method.
- To set up the bookmarklet, navigate to http://archive.is/.
- Drag the gray archive.is button into your Bookmarks/Favorites bar. You may need to hold the Shift key if you use Opera.
- To use the bookmarklet, simply click on it when you are on a web page you wish to archive. It initiates the archiving process. When the process is complete (it usually takes 5-15 seconds) you will be sent to the archived page.
- It is recommended that you view the archived page to check if the archive process was successful.
Firefox smart keyword
Firefox smart keywords are commonly used to perform searches through the Firefox address bar or to open a bookmark by typing a keyword into the Firefox address bar. Here we are going to use a smart keyword to submit a URL to Archive.is for archiving. This method is moderately simple to set up.
- To set up the smart keyword, press Ctrl+Shift+B to open your Bookmarks Library (or click the orange Firefox button in the top left of the window, then go to "Bookmarks", then "Show All Bookmarks").
- Browse to a location you would like to save the smart keyword bookmark in.
- In the menu at the top of the window, click "Organize", then "New Bookmark".
- Enter a name for the bookmark (e.g. Archive.is).
- Enter http://archive.is/?run=1&url=%s into the Location field.
- Enter a keyword for the bookmark. Choose something short that is not already used as the keyword of another bookmark (e.g. wc).
- Click the "Add" button, then close the Bookmarks Library.
- To use the smart keyword, type the keyword you chose (wc in the above example) followed by a space in front of the URL of the web page you would like to archive in the Firefox address bar. For example, with the keyword wc, type: wc http://www.example.com/pageyouwantoarchive.html
- Press Enter. This initiates the archiving process. When the process completes (it usually takes 5-15 seconds), you will be sent to the archived page.
- It is recommended that you view the archived page to check if the archive process has been successful.
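When the keyword is used, Firefox substitutes the text typed after it for %s in the Location template. The following minimal Python sketch approximates that substitution. The function name expand_keyword is hypothetical, and percent-encoding the typed text with urllib.parse.quote is an assumption: Firefox's own escaping may differ in detail.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Location template entered in the Firefox steps above.
FIREFOX_TEMPLATE = "http://archive.is/?run=1&url=%s"


def expand_keyword(template: str, typed_text: str) -> str:
    """Approximate the URL Firefox opens for "keyword typed_text".

    Assumption: the typed text is percent-encoded much like
    urllib.parse.quote(..., safe=""), so ':', '/', '?' and '&' are escaped.
    """
    return template.replace("%s", quote(typed_text, safe=""))


if __name__ == "__main__":
    page = "http://www.example.com/page.php?id=1&lang=en"
    submit_url = expand_keyword(FIREFOX_TEMPLATE, page)
    print(submit_url)
    # Because '&' inside the page URL was escaped, the url parameter
    # round-trips intact instead of being split at the page's own '&'.
    print(parse_qs(urlsplit(submit_url).query)["url"][0] == page)
```

Escaping the typed URL matters mainly for pages whose own URLs contain query strings. Unescaped, their "&" characters would be read as extra parameters of the Archive.is request.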
Chrome search engine
Although it is created through Chrome's search engine feature, this method works just like a smart keyword in Firefox. It is moderately simple to set up.
- To set up the "search engine", right-click the address bar and select "Edit search engines...". At the bottom of the list that appears, you can add a "search engine".
- Enter a name for the "search engine" in the first field (e.g. Archive.is).
- Enter a keyword for the "search engine" in the second field. Choose something short that is not already in use (e.g. wc).
- Enter http://archive.is/?run=1&url=%s& into the third field.
- Press Enter to save the "search engine".
- To use the "search engine", type the keyword you chose (wc in the above example) followed by a space in front of the URL of the web page you would like to archive in the Chrome address bar. For example, with the keyword wc, type: wc http://www.example.com/pageyouwantoarchive.html
- Press Enter. You will be sent to a page containing a link to the archive URL of the web page you wished to archive.
- It is recommended that you view the archived page to check if the archive process has been successful.
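The Chrome template differs from the Firefox one only by a trailing "&". The following minimal Python sketch compares the two by filling each template and parsing the resulting query string. Python's parse_qs stands in for whatever parser Archive.is actually uses, which this guide does not document. The percent-encoding step and the function name submitted_params are assumptions.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Templates from the Firefox and Chrome steps; Chrome's ends with '&'.
FIREFOX_TEMPLATE = "http://archive.is/?run=1&url=%s"
CHROME_TEMPLATE = "http://archive.is/?run=1&url=%s&"


def submitted_params(template: str, page_url: str) -> dict[str, list[str]]:
    """Fill the template (percent-encoding assumed) and parse its query."""
    submit_url = template.replace("%s", quote(page_url, safe=""))
    # parse_qs skips the empty field left by a trailing '&'.
    return parse_qs(urlsplit(submit_url).query)


if __name__ == "__main__":
    page = "http://www.example.com/pageyouwantoarchive.html"
    print(submitted_params(CHROME_TEMPLATE, page))
    print(submitted_params(CHROME_TEMPLATE, page) == submitted_params(FIREFOX_TEMPLATE, page))
```

Under a standard parser, both templates send the same run and url parameters, so the trailing ampersand appears to be harmless.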
Use within Wikipedia
Links archived with Archive.is may appear in two formats. The first uses a 4- or 5-character "snapshot ID", similar to URL shortening services, to provide a more convenient link: http://archive.is/XXXX. The second displays the date of archiving and the original URL within the URL itself: http://archive.is/YYYYMMDDhhmmss/http://www.example.com or http://archive.is/YYYYMMDD/http://www.example.com. Either format is appropriate for use within Wikipedia.
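The following minimal Python sketch tells the two link formats apart and, for the dated form, recovers the archiving date and original URL. The regular expressions encode only the patterns shown above. Two details are assumptions: that snapshot IDs are alphanumeric, and the function name parse_archive_url.

```python
import re
from datetime import datetime

# Short form: http://archive.is/XXXX, a 4- or 5-character snapshot ID
# (alphanumeric IDs are an assumption).
SHORT_RE = re.compile(r"^https?://archive\.is/([A-Za-z0-9]{4,5})$")
# Dated form: http://archive.is/YYYYMMDDhhmmss/<original> or .../YYYYMMDD/<original>
DATED_RE = re.compile(r"^https?://archive\.is/(\d{14}|\d{8})/(.+)$")


def parse_archive_url(archive_url: str):
    """Classify an Archive.is link; return None if neither format matches."""
    match = SHORT_RE.match(archive_url)
    if match:
        return {"format": "snapshot-id", "id": match.group(1)}
    match = DATED_RE.match(archive_url)
    if match:
        stamp, original = match.groups()
        pattern = "%Y%m%d%H%M%S" if len(stamp) == 14 else "%Y%m%d"
        return {
            "format": "dated",
            "archived": datetime.strptime(stamp, pattern),
            "original": original,
        }
    return None


if __name__ == "__main__":
    info = parse_archive_url("http://archive.is/20130926093655/http://www.example.com")
    print(info["archived"].strftime("%Y-%m-%d"), info["original"])
```

Only the dated form carries the archiving date. With a snapshot-ID link, the date shown on the archived page itself has to be used for archivedate=.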
This archive URL can be inserted into the archiveurl= parameter, together with its supporting archivedate= and deadurl= parameters, in any of the citation templates. If the original URL is no longer accessible, set deadurl= to yes; if the original URL is still accessible, set deadurl= to no.
<ref>{{cite web |last= |first= |title= |work= |publisher= |date= |url= |archiveurl= |archivedate= |deadurl= }}</ref>
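The following minimal Python sketch fills in a reduced version of the template above, setting deadurl= from whether the original page still works. The function name cite_web and its arguments are hypothetical. The sketch emits only the url, title and archive parameters; the other fields would be added the same way.

```python
def cite_web(url: str, title: str, archive_url: str, archive_date: str,
             original_still_live: bool) -> str:
    """Build a minimal {{cite web}} reference with the 2013-era parameters.

    deadurl= is "no" while the original URL still works and "yes" once it
    no longer does, as described above.
    """
    params = [
        ("url", url),
        ("title", title),
        ("archiveurl", archive_url),
        ("archivedate", archive_date),
        ("deadurl", "no" if original_still_live else "yes"),
    ]
    body = " |".join(f"{name}={value}" for name, value in params)
    return "<ref>{{cite web |" + body + " }}</ref>"


if __name__ == "__main__":
    print(cite_web(
        "http://www.example.com/pageyouwantoarchive.html",
        "Example page",
        "http://archive.is/20130926/http://www.example.com/pageyouwantoarchive.html",
        "2013-09-26",
        original_still_live=True,
    ))
```

Later versions of the citation templates replaced deadurl= with url-status= (values such as live and dead). With those versions, the last pair would become ("url-status", "live" if original_still_live else "dead").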
Searching for previously archived web pages
Web pages previously archived through Archive.is are accessible through a searchable database. Users may search by URL or by domain, and may use wildcards in either.
See also
- Wikipedia:Link rot, how-to guide for prevention of link rot
- Wikipedia:Using the Wayback Machine, how-to guide
- Wikipedia:Using WebCite, how-to guide
- User:RotlinkBot, automatic URL archiving bot
References
- Harihareswara, Sumana (3 September 2013). "Wikitech-l - format of Recent Changes feed". Wikimedia.org technical mail list. Archived from the original on 26 October 2013.
- "How can I delete an archived page". Blog. Archive.is. 24 January 2013. Archived from the original on 26 September 2013.
- Example DMCA removal: "Archive.is: saved from ugotposted.com". Archive.is. "In response to a complaint we received under the US Digital Millennium Copyright Act we have removed content from this page."
- Dascalescu, Dan (18 February 2013). "Web page archiving". Wiki. Dan Dascalescu. Archived from the original on 22 September 2013.
- "Removing Documents From the Wayback Machine". Archive.org. Archived from the original on 15 October 2002.
- "Some sites are not available because of robots.txt or other exclusions. What does that mean?". FAQ. Archive.org. Archived from the original on 4 October 2002.
- Eysenbach, Gunther; Trudel, Mathieu (30 December 2005). "Going, Going, Still There: Using the WebCite Service to Permanently Archive Cited Web Pages". Journal of Medical Internet Research. doi:10.2196/jmir.7.5.e60. Via NIH.gov. "Copyright issues are addressed by honouring respective Internet standards (robot exclusion files, no-cache and no-archive tags)."
- "A Standard for Robot Exclusion". Robotstxt.org. Archived from the original on 23 July 2011.
Notes
- "FAQ". Archive.is. "A page may not be archived for a number of reasons. Archive.is does not support archiving Portable Document Format files, audio and video. The page may be too big (there is 50mb limit for a single page). The content may be inaccessible from the Archive.is network (this is particularly likely if you are attempting to access subscription based content which your institution subscribes to on its users' behalf). Also, the content may be unreadable by the Archive.is archiver (too complex JavaScript based pages can crash its browser or be executed too long time, or ones involving browser checks sometimes cause our archive engine to fail)."