Posts

Showing posts from August, 2015

2015-08-28: Original Header Replay Considered Coherent

Introduction

As web archives have advanced over time, their ability to capture and play back web content has grown. The Memento Protocol, specified in RFC 7089, is an HTTP extension that bridges the present and past web by allowing time-based content negotiation. Now that Memento is operational at many web archives, analysis of archive content is simplified. Over the past several years, I have analyzed the temporal coherence of web archives. Some of the results of this analysis will be published at Hypertext'15. This blog post discusses one implication of the research: the benefits achieved when web archives play back original headers.

Archive Headers and Original Headers

Consider the headers (Figure 1) returned for a logo from the ODU Computer Science Home Page as archived on Wed, 29 Apr 2015 15:15:23 GMT.

HTTP/1.1 200 OK
Content-Type: image/gif
Last-Modified: Wed, 29 Apr 2015 15:15:23 GMT

Figure 1. No Original Header Playback

Try to answer the

2015-08-20: ODU, L3S, Stanford, and Internet Archive Web Archiving Meeting

Two weeks ago (on Aug 3, 2015), I was glad to be invited to visit the Internet Archive in San Francisco to share our latest work with a group of web archiving pioneers from around the world. The attendees were Jefferson Bailey and Vinay Goel from IA, Nicholas Taylor and Ahmed AlSum from Stanford, and Wolfgang Nejdl, Ivana Marenzi, and Helge Holzmann from L3S. First, we briefly introduced ourselves, each describing the purpose and nature of our work to IA. Then, Nejdl introduced the Alexandria project and demoed the ArchiveWeb project, which aims to develop tools and techniques to explore and analyze Web archives in a meaningful way. In the project, they develop tools that will allow users to visualize and collaboratively interact with Archive-It collections by adding new resources in the form of tags and comments. The project also includes a collaborative search and sharing platform. I presented the off-topic detection work with a live demo for the

2015-08-18: Three WS-DL Classes Offered for Fall 2015

The Web Science and Digital Libraries Group is offering three classes this fall. Unfortunately there are no undergraduate offerings this semester, but there are three graduate classes covering the full WS-DL spectrum:

CS 695 - NoSQL Databases (CRN 21159) will be offered by Dr. Cartledge. While we've used NoSQL databases in a variety of classes in the past, this is the first time we've offered a class entirely on this topic. This is a good complement to the CS 495/595 Big Data class he offered last spring.

CS 734/834 - Introduction to Information Retrieval (CRNs 19986 & 20004) will be offered by Dr. Nelson. Although the number and name have changed slightly, this will be similar to previous offerings of this class (e.g., see CS 895 spring 2014). This class will broadly cover the foundations of information retrieval.

CS 791/891 - Visualization Seminar (CRNs 12619 & 12620) will be taught by Dr. Weigle. This P/F course will cover the fund