OpenWayback redirects to live web


Dominik Frey

Dec 8, 2015, 7:31:45 AM
to openwayback-dev

Hello,

I need some help setting up OpenWayback. I have created a CDX index and a path index file. OpenWayback shows the search results as it should, but when I open a URL from the search results, it redirects me to the live web.

Many thanks and kind regards
Dominik



Mohamed Elsayed

Dec 24, 2015, 7:17:42 AM
to openwayback-dev
Hi Dominik,

Sorry for the late reply. If this problem still exists, could you let me check your wayback.xml and CDXCollection.xml, please? Thank you.

Dominik Frey

Jun 16, 2016, 10:28:14 AM
to openwayback-dev
Hi,

Sorry for coming back to this issue so late!

Here are the wayback.xml and CDXCollection.xml:

https://drive.google.com/file/d/0ByKYV-RMeYHuLTFCVjg4R3hCOGM/view
https://drive.google.com/file/d/0ByKYV-RMeYHuRi1GVzMxNWVkZlE/view

In addition, the redirect of archived swr.de pages to the live web also happens at archive.org, for example:

Any ideas what is wrong?

Could it be related to http://www.swr.de/robots.txt or to the meta tag <meta name="robots" content="noarchive,index,follow,noodp"/>?

Many thanks and kind regards
Dominik

Lauren Ko

Jun 16, 2016, 1:34:00 PM
to openway...@googlegroups.com
Hi Dominik,
Was your initial question specific to the swr.de host? If the issue is limited to that site and also occurs at the archive.org Wayback, the replay problem is likely not in your OpenWayback setup but in something on the swr.de site itself. When I disable JavaScript in my browser and view the archived site via https://web.archive.org/web/20160616065056/http://www.swr.de/ and https://web.archive.org/web/*/http://www.swr.de/, I am not redirected to the live web. Looking at the source code of the site, there is an included JavaScript file called redirector.js (https://web.archive.org/web/20160616071021js_/http://www.swr.de/-/id=13886966/property=jslib/pubVersion=18/6lc7gx/redirector.js) that could be causing the redirect from the archive to the live web.

Lauren Ko
UNT Libraries


Dominik Frey

Jun 17, 2016, 5:52:47 AM
to openwayback-dev
Hi Lauren,

Many thanks for your investigation! The issue is specific to swr.de, and you are right: http://www.swr.de/-/id=13886966/property=jslib/pubVersion=18/6lc7gx/redirector.js causes the trouble. Before this file was included in August 2014, everything worked fine. Is there a way to prevent OpenWayback from including or rendering this URL?

Best wishes
Dominik


However, I also use Wayback 1.6.0 (https://sourceforge.net/projects/archive-access/files/wayback/), and it does not redirect the same swr.de pages.

Lauren Ko

Jun 17, 2016, 5:33:26 PM
to openway...@googlegroups.com
I believe it would work to add the URL of the redirector.js file to an exclusion file. 

You could do that as follows:

- Uncomment this in your wayback.xml file:

<!--
  <bean id="excluder-factory-static" class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
    <property name="file" value="/var/tmp/os-cdx/exclusion-2008-09-22-cleaned.txt" />
    <property name="checkInterval" value="600000" />
  </bean>
-->
- Change the value for the file being used in that bean to one you create.
- Put the URL for redirector.js in the file you just created.
- In your standardaccesspoint bean, uncomment:
<!-- See the LiveWeb.xml import above.
    <property name="exclusionFactory" ref="excluder-factory-static" />
-->
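
For illustration, after those steps the two fragments in wayback.xml might look roughly like the sketch below. The path /path/to/exclusions.txt is a placeholder I made up, not a value from this thread; substitute the file you create. I am also assuming the exclusion file is plain text holding the redirector.js URL mentioned earlier in the thread on a line of its own, so please check the exact format against the OpenWayback documentation for your version.

```xml
<!-- Sketch only: the file path is a hypothetical placeholder. -->
<bean id="excluder-factory-static" class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
  <property name="file" value="/path/to/exclusions.txt" />
  <property name="checkInterval" value="600000" />
</bean>

<!-- This property goes inside your existing standardaccesspoint bean. -->
<property name="exclusionFactory" ref="excluder-factory-static" />
```

The ref value has to match the id of the excluder bean, otherwise the access point will not pick up the exclusion list.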

Hope it helps,
Lauren

