Tech startup Webaroo wants to offer the internet on a flash drive, according to Networkworld.com:
Webaroo does it, [the firm's Brad Husick] says, through “a server farm that is of Web scale” and a set of proprietary search algorithms that whittle the million gigabytes down to more manageable chunks that will fit on a hard drive: up to 256 megabytes for a growing menu of “Web packs” on specific topics — your favorite Web sites, city guides, news summaries, Wikipedia and the like — that make up the service’s initial offerings; and something in the neighborhood of 40 gigabytes for the full-Web version the company intends to release later this year.
“We’ve developed these algorithms that give you a set of meaningful, relevant results for anything on which you search,” Husick says. “In effect, we give you the first couple pages of results.”
That’s all you really need, the company argues, because studies show that most people rarely look beyond the first 10 to 20 results returned by a typical search. With Webaroo you’re being returned not just a list of pages, but the pages themselves — with all graphics intact — as well as key live links from those pages and the pages to which they lead. They’re talking roughly 10,000 pages per “Web pack,” or plenty to provide a meaningful search experience for whatever the subject matter at hand, Husick says.
So it’s site scraping, just like countless offline browsing packages before it. What I’m wondering about, though, is the “Web packs”. Could site owners claim copyright infringement? I can’t tell whether Webaroo is paying publishers for the right to include their content in its Web packs, but if it isn’t, I can imagine firms being a bit miffed at the prospect of someone else selling their content.