Wednesday, November 17, 2010

Week 11 Reading Notes

Web Search Engines
Thanks to Sarah Denzer, I found the article. Apparently, using the full citation of IEEE Computer threw the citation linker system off entirely.

Reading through these articles gives me a different kind of appreciation for the systems behind web crawlers. I was familiar with the concept from a book for LIS2000 (Barabási's Linked), but I didn't realize how much hardware is involved. Learning that a full crawl would take even some of our better net connections 10 days made me do a double take, and seeing the numbers made my head swim.

This is a MUCH different idea from the simple "find it" programs I wrote as an undergrad: a crawler will find the information, index it, and, in a way, learn from it. As I said, I have a new appreciation for what is done and how it is done.
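The readings describe crawling at a massive scale, but the basic loop is simple enough to sketch. Here's a toy version (my own, not from the articles) of the fetch-page, extract-links, index-words cycle, using only the Python standard library; the seed URL is a placeholder, and a real engine would add politeness rules, distributed work queues, ranking, duplicate detection, and much more.

```python
# A minimal sketch of the crawl-then-index idea, standard library only.
# The seed URL below is a placeholder, not a real crawl target.
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkAndTextParser(HTMLParser):
    """Collects outgoing links and visible text from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl_and_index(seed_url, max_pages=10):
    """Breadth-first crawl from seed_url, building a tiny inverted index."""
    index = defaultdict(set)      # word -> set of URLs containing it
    frontier = deque([seed_url])
    seen = set()

    while frontier and len(seen) < max_pages:
        url = frontier.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue              # skip unreachable pages

        parser = LinkAndTextParser()
        parser.feed(html)
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            frontier.append(urljoin(url, link))   # resolve relative links

    return index


if __name__ == "__main__":
    idx = crawl_and_index("https://example.com/")  # placeholder seed URL
    print(idx.get("example", set()))
```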

Did anyone else have a similar feeling?

Current Development and Future Trends for the OAI Protocol for Metadata Harvesting
Once again, we’re referring back to Dublin Core and other metadata standards as the way to sift through and organize data. While the article offers a few inspiring notes on how this could come about, with examples of organizations and consortia attempting it, I still have to wonder, as I did before: can this really be done?

I mean, honestly: even with a “standard” set of metadata, how viable will this be? Will we actually have a comprehensive set of usable search terms to actively search the “deep web” (including databases), or are we just going to add more clutter to the already vast amount of data hidden on the Internet?
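The article stays fairly abstract about what a harvest actually looks like, so here is a rough sketch (mine, not the article's) of the basic OAI-PMH exchange: a ListRecords request asking for Dublin Core (oai_dc) records, with the titles pulled out of the XML response. The repository URL is made up, and a real harvester would also follow resumptionTokens to page through large result sets.

```python
# A minimal sketch of one OAI-PMH harvest request. The verb and
# metadataPrefix parameters are part of the protocol; the endpoint
# below is a hypothetical placeholder.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.edu/oai"   # hypothetical repository

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_record_titles(base_url):
    """Issue one ListRecords request and return the Dublin Core titles."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    )
    with urllib.request.urlopen(f"{base_url}?{query}", timeout=30) as resp:
        root = ET.fromstring(resp.read())

    titles = []
    for record in root.findall(".//oai:record", NS):
        for title in record.findall(".//dc:title", NS):
            if title.text:
                titles.append(title.text.strip())
    return titles


if __name__ == "__main__":
    for t in list_record_titles(BASE_URL):
        print(t)
```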

The Deep Web
Some parts of this article made me think of Barabási’s book Linked, especially the early section on how sites tend to be connected and how a crawler finds its data.

Thankfully, the article covered more than that, offering statistics (which give me a newfound respect for the amount of digital data on the internet, measured in thousands of terabytes) and a comparison of “surface” and “deep” web searches, which explains why the general “sweep” done by standard search engines just doesn’t cut it when you need to find something specific.
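As a toy illustration of that point (mine, not the article's), the snippet below mimics what a link-following crawler sees on a page: it harvests hrefs and nothing else, so content that only exists behind a search form, i.e. the database-backed “deep” web, never enters its queue. The URLs here are placeholders.

```python
# A toy illustration of why a link-following sweep misses database-backed
# content: the parser below only ever sees what is in an href, so a results
# page that exists only after a form is submitted is never reached.
from html.parser import HTMLParser

SURFACE_PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <form action="/search" method="post">  <!-- deep-web entry point -->
    <input name="q"><input type="submit">
  </form>
</body></html>
"""


class HrefOnlyParser(HTMLParser):
    """Mimics a simple crawler: it harvests hrefs and ignores forms."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs += [v for k, v in attrs if k == "href"]


parser = HrefOnlyParser()
parser.feed(SURFACE_PAGE)
print(parser.hrefs)   # ['/about.html'] -- the /search results never surface
```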

There isn’t much to say about the article beyond definitions and numbers (and the feeling that it was a plug for certain technologies), but it does make me interested to learn what is really out there hidden away in the depths of the web.

3 comments:

  1. For the first reading, you need to go to the Library website and search for them through the electronic journal link.

  2. The Deep Web article got me thinking a bit about the future of the internet, or, well, the world wide web part of the internet, I suppose. The fewer small websites get discovered, the more chance the big corporations have to take over. Already I find myself looking more at corporate-sponsored websites than independently owned sites, and I don't remember that being the case just a few years ago. Could this lead to the end of the "free" internet, where anyone can make a website? Just my thoughts....

  3. Anthony, I had that exact feeling when I was reading the first set of articles. After I was done, I did a Google search for "beagle." I got 9.9 million hits in .15 seconds. Knowing what I know now, I had a deeper appreciation for the whole search process.
