Friday, October 31, 2008

Week 10 Readings

I'm discussing chapter 8 from Digital Libraries by William Arms, "Evaluation of Digital Libraries: an Overview" by Tefko Saracevic, and "Digital Library Design for Usability" by Rob Kling and Margaret Elliott.

I'm very interested in the discussion and debates about how end-users access information via the web and digital libraries, as you may have guessed from my posts on similar subjects. That being said, I have to admit that I found all three of these readings rather dry.
Arms provides such a broad look at user interfaces that it's hard to get a detailed grasp of usability and its importance. I think Arms' most important and relevant point is actually the one he makes at the very beginning of chapter 8: usability depends on the whole system working together smoothly and appropriately. Without a seamless system, it is nearly impossible for users to get a clear idea of what they should expect from their digital library.
Accessing specific pages by their page numbers using structural metadata is something I never considered a "big deal" aspect of a digital library. But Arms points out that these small details are essential for seamless operation.
It's interesting that Arms talks about DLITE and CORBA, considering we use the DLITE interface for past class lectures (and I do not think it is particularly easy to use...). These programs tie into interface design and the ways in which an interface can be functional and easy for users to "get", or hard for them to get.
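To see why page numbers need structural metadata, here is a minimal Python sketch. The structural map, page labels, and file names are hypothetical, not an example from Arms. Without a map like this, a request for "page 1" could only fetch the first image file, which in this made-up book is the cover.

```python
# Hypothetical structural metadata for one scanned book: the page order,
# each page's printed label, and the image file that holds its scan.
STRUCTURAL_MAP = [
    ("cover", "book_0001.tif"),
    ("i", "book_0002.tif"),
    ("ii", "book_0003.tif"),
    ("1", "book_0004.tif"),
    ("2", "book_0005.tif"),
    ("3", "book_0006.tif"),
]


def find_page(label):
    """Return the position in reading order of the page with this printed label."""
    for position, (page_label, _image) in enumerate(STRUCTURAL_MAP):
        if page_label == label:
            return position
    raise KeyError(f"no page labelled {label!r}")


def image_for(label):
    """Return the scan file to display when a reader asks for a printed page."""
    return STRUCTURAL_MAP[find_page(label)][1]


def next_page(label):
    """Return the label of the following page, or None at the end of the book."""
    position = find_page(label) + 1
    if position < len(STRUCTURAL_MAP):
        return STRUCTURAL_MAP[position][0]
    return None


print(image_for("1"))   # book_0004.tif: page 1 is the fourth scan, not the first
print(next_page("ii"))  # 1
```

The point of the sketch is only that "go to page 1" and "next page" are lookups in metadata, not arithmetic on file positions.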

Rob Kling and Margaret Elliott spoke to me a little more than Arms did, although after finishing their article I still do not feel like I fully grasp the ideas and concepts of usability and how they will affect the way I create a digital library (software-wise). "Design for usability," as Kling and Elliott propose, is the new way to design digital libraries. According to these authors, interface usability and organizational usability are two different things. But why? In my humble opinion, learnability, efficiency, memorability, and errors all seem to be aspects of both interface usability and organizational usability.
Kling and Elliott also point out that a vast digital library could lead users to dread using it, because there are too many results and it's hard to narrow them down to the ones the user actually wants. This leads back to Arms' point about seamless organization in a digital library. If the digital library that Kling and Elliott refer to, Gopher, had been built in a well-organized and accessible (aka seamless) manner, then the size of the library should not be daunting to the user. After all, the user has no idea how big it is: it is digital. They are not walking into a monstrous hall and looking at rows and rows of books. All the user needs to do is enter search terms and rely on the DL to pull the items they need. Sounds easy... maybe too easy to be true.
I'm so glad that Tefko Saracevic decided to write about the evaluation of digital libraries. His writing did not make the article easy for me to follow, but the challenge lay more in how Saracevic described evaluation than in the meaning behind it. At the beginning of the article he mainly surveys, subject by subject, how digital libraries have been evaluated. This is not important for me to know, is it? After all, the only thing I truly learned from the beginning of this article is that digital libraries created by universities, museums, and similar institutions have been exempt from this evaluation process for far too long.
Interesting fact: both the system-centered and the human-centered approaches to evaluation are based on finding results for users.
The criteria for evaluation are as follows:
  • content
  • process
  • format
  • overall assessment
  • technology performance
  • process/algorithm performance
  • overall system
Usage of the digital library was also considered to be part of the criteria for evaluation in some studies.
Saracevic offers some thoughts on why digital library evaluation is not more widespread, and I agree with his reasoning on most points. The relative newness of digital libraries is obviously a large reason why they have not yet been evaluated extensively. Also, who is to say that creators want their digital libraries evaluated by outside sources? If a small college DL provides information only for its own small group of users, it may have no interest in an evaluation from someone outside that user group.
Clearly, all of these evaluation and usability studies show that DLs have a long way to go before they have established standards for input, upkeep, and evaluation.

Muddiest Points
Since we did not have class last week, I only have a quick reminder question for my muddiest point: how many blog postings do we need to do in total for the semester?

Thursday, October 16, 2008

Week 8 Readings

I am commenting on:

Definition and Origins of OAI-PMH (Chapter 1).
Federated Searching: Put It in Its Place by Todd Miller
The Truth About Federated Searching by Paula J. Hane

The OAI-PMH gives me mixed feelings. While I think it is ground-breaking that the OAI is working toward making metadata from many different d-libs available through one search, I am not convinced that it works very well. Based on this chapter's description of the two sides of OAI-PMH, data providers and service providers, moving metadata appears to be very easy: OAI service providers harvest the metadata from OAI data providers and determine which data is wanted and which needs to be delivered to the end-user. (I know that in digital library speak the end-user isn't called the end-user, but I'm still used to library speak!)
This method of transporting data that is described in chapter 1 seems complicated. One-stop shopping? That only works at Walmart. (For now.)
While the majority of this chapter covers the history of OAI-PMH, there is still some discussion of the differences between OAI-PMH and plain open access or Dublin Core. OAI-PMH is not a protocol for searching. Instead, it supports searching of harvested metadata: a service provider copies the metadata into its own local store, where it can be searched and controlled directly.
Overall, OAI-PMH provides a starting point for this type of federated searching, but I do not believe it plays out the way it is presented.
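The harvesting model can be sketched from the service provider's side. This is a minimal, offline Python sketch: the base URL, identifier, title, and resumption token are made up, and a real harvester would fetch each page over HTTP instead of parsing a canned response. It requests records in simple Dublin Core (`oai_dc`) with the `ListRecords` verb, copies them into a local store, and returns the resumption token that asks for the next page. Any searching then happens against that local copy.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH responses and simple Dublin Core records.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; follow-up requests carry only the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def harvest_page(xml_text, local_store):
    """Copy each record's identifier and titles into the harvester's own store.

    Returns the resumption token for the next page, or None when finished."""
    root = ET.fromstring(xml_text)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [title.text for title in record.iterfind(".//dc:title", NS)]
        local_store[identifier] = titles
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return token or None


# A made-up, one-record response standing in for what a data provider returns.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

store = {}
token = harvest_page(SAMPLE_RESPONSE, store)
print(store)                                          # {'oai:example.org:1': ['A Sample Title']}
print(list_records_url("https://example.org/oai", token))
```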

Todd Miller immediately begins his article by stating that federated searching should start with the library catalog. I agree. However, federated searching does not seem to yield the results that a regular catalog search can. When you search a catalog, you are searching for very specific content in a specific manner. Often this is not possible with federated searching, where searches are much broader and less defined.
By going above and beyond the catalog and reaching into databases and other internet sources with one federated search, the user is not really aware of where they are searching. Because this is the case, they really have no idea how to make their search yield the results they need.
While it is obvious that libraries need to update their catalogs and the way to search them to provide more access, federated searching is not the way to do it, unless the goal is to confuse the end-user even more.

When I first started to read The Truth About Federated Searching, I was not hopeful. I have a lot of experience with WebFeat, and my experience is this: if librarians are not able to perform an accurate WebFeat search, how do we expect users to? We shouldn't expect WebFeat to change libraries! I agree with a lot of the points that Ms. Hane makes about the realities of federated searching.
Under point 1: even if federated searching could search every database, that doesn't mean it has the tools to search each database to its full potential, so the search isn't a very good one.

Under point 2: Agreed. De-duping doesn't work. At all. And not only is it impossible to de-dup, but federated searching also cannot distinguish how significant each of the searched words is.
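Why de-duping is so hard can be shown with a naive approach: treat two hits as duplicates when their titles match after lowercasing and stripping punctuation. This is a minimal Python sketch with made-up database names and titles; it is not how WebFeat or any other product actually de-dups. The second variant of the same title slips through as a "new" result.

```python
import re


def normalize(title):
    """Crude match key: lowercase, drop punctuation, collapse whitespace."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(title.split())


def merge_results(*result_lists):
    """Combine hits from several sources in order, keeping the first copy of each title."""
    seen = set()
    merged = []
    for hits in result_lists:
        for source, title in hits:
            key = normalize(title)
            if key not in seen:
                seen.add(key)
                merged.append((source, title))
    return merged


db_a = [("Database A", "The Truth About Federated Searching")]
db_b = [("Database B", "The truth about federated searching.")]
db_c = [("Database C", "Truth About Federated Searching, The")]

for source, title in merge_results(db_a, db_b, db_c):
    print(source, "|", title)
# Database A | The Truth About Federated Searching
# Database C | Truth About Federated Searching, The
```

Catching the inverted title would need yet another rule, and every new rule risks merging records that really are different.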

Under point 3: Refer to my point in point 2.

Under point 5: Federated searching cannot make a regular database search better. In fact, it seems to make it worse, because the nuances of searching a particular database are not built into a federated search engine.

Overall I think federated searching is.... well, it needs some work.

Muddiest Points for week 7:

Does the test cover all materials in the readings (including the optional readings) as well as in the slides?

Will we all be able to fit our proposals into the time after the test?

Friday, October 10, 2008

Week 7 Readings

I'll be commenting on:
Challenges in Web Search Engines by Monika Henzinger, Rajeev Motwani, and Craig Silverstein
How Things Work: Web Search Engines: Part 1 by David Hawking
How Things Work: Web Search Engines: Part 2 by David Hawking

Both the Hawking articles have been very informative. What I've learned:

  • Indexing the web is not impossible, although to my uneducated mind it still seems like it should be
  • Crawling and indexing take many terabytes of storage and require very dependable servers
  • Speed, politeness, and content (both excluded and duplicate) must be accounted for when crawling
  • Decoding is a big obstacle that crawlers have to deal with
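Politeness and excluded content can be sketched as a small scheduler that a crawler consults before each request. This is a minimal Python sketch, assuming a fixed minimum delay per host and robots.txt rules supplied as text instead of downloaded; the delay, bot name, and hosts are hypothetical, not figures from Hawking. It uses the standard library's `urllib.robotparser` to honor exclusions.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteScheduler:
    """Decide whether a crawler may fetch a URL now without overloading its host."""

    def __init__(self, min_delay_seconds, robots_rules):
        self.min_delay = min_delay_seconds
        self.robots = {}      # host -> parsed robots.txt rules
        for host, lines in robots_rules.items():
            parser = RobotFileParser()
            parser.parse(lines)
            self.robots[host] = parser
        self.last_fetch = {}  # host -> time of the most recent request

    def may_fetch(self, url, now, agent="ExampleBot"):
        host = urlparse(url).netloc
        parser = self.robots.get(host)
        if parser is not None and not parser.can_fetch(agent, url):
            return False      # excluded content: the site asked crawlers to stay out
        last = self.last_fetch.get(host)
        if last is not None and now - last < self.min_delay:
            return False      # too soon: give this server a rest
        return True

    def record_fetch(self, url, now):
        self.last_fetch[urlparse(url).netloc] = now


scheduler = PoliteScheduler(2.0, {"example.org": ["User-agent: *", "Disallow: /private/"]})
print(scheduler.may_fetch("http://example.org/index.html", now=0.0))  # True
scheduler.record_fetch("http://example.org/index.html", now=0.0)
print(scheduler.may_fetch("http://example.org/about.html", now=1.0))  # False, too soon
print(scheduler.may_fetch("http://example.com/", now=1.0))            # True, different host
```

A real crawler would keep many hosts busy in parallel, which is how it can stay fast overall while being slow and polite toward each individual server.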
Part 2
  • Postings lists are essential for indexing: each one records which documents contain a given term, and without them a query could not find the IDs of matching pages
  • Scanning and inversion are the two main phases of indexing
  • Scale: crawling and indexing at web size is a huge challenge
  • Handling the terms used in web pages is daunting because there is no reliable way to tell whether a word is a spelling error or a legitimate nondictionary word. Example: "teh"
  • Anchor text helps to link pages together
  • Queries play a large part in how and why web pages are indexed: they show how often a page is searched for and which terms are used to find it
  • Query processors need to be sophisticated to yield good results
  • Caching: I am hesitant to believe that caching will really help provide good results for a query. Just because a page contains the word doesn't mean it uses it in the context the searcher is looking for
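Scanning, inversion, and postings lists fit together roughly like this. It is a minimal in-memory Python sketch with three made-up documents; at web scale the inversion step cannot fit in one dictionary and has to be done by sorting and merging on disk, which this sketch does not attempt.

```python
from collections import defaultdict


def scan(documents):
    """Scanning: emit a (term, doc_id) pair for every word occurrence."""
    for doc_id, text in documents.items():
        for term in text.lower().split():
            yield term, doc_id


def invert(pairs):
    """Inversion: group the pairs by term into sorted postings lists of doc IDs."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, *terms):
    """Return the IDs of documents whose postings appear under every query term."""
    lists = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []


docs = {1: "digital library design", 2: "web search engines", 3: "digital search tools"}
index = invert(scan(docs))
print(index["digital"])                    # [1, 3]
print(search(index, "digital", "search"))  # [3]
print(search(index, "teh"))                # []: a typo simply has no postings here
```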

Challenges in Web Search Engines
  • Spam on the web is addressed first, and I think that is because it is such a real problem: people don't often look past the first page of results, and if that page is loaded with spam, the search has essentially yielded no results for the patron
  • Text spam and link spam both try to manipulate rankings. Text spam stuffs a page with as many of the keywords people would search for as possible, often hidden; link spam adds links for the same purpose, for example in a link farm at the bottom of the page
  • Cloaking is used to deceive search engines like Google, but I don't really understand how it works
  • Quality should be judged on more than just how often a certain word appears on a page, but it seems impossible to employ human indexers to help make these distinctions. One possible way to do it: have users give feedback.
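Why counting words alone rewards text spam can be shown with two made-up pages. This minimal Python sketch scores pages by raw term frequency, then by a log-damped frequency (a common textbook tweak, assumed here, not something the article prescribes). Damping shrinks the stuffed page's advantage, but the stuffed page still wins, which is why word counts alone cannot judge quality.

```python
import math


def raw_tf_score(text, term):
    """Score a page by how many times the query term appears in it."""
    return text.lower().split().count(term)


def damped_tf_score(text, term):
    """Same count, but each extra repetition adds less: 1 + log(count)."""
    count = raw_tf_score(text, term)
    return 1 + math.log(count) if count else 0.0


relevant = "a review of digital cameras covering lenses sensors and battery life"
stuffed = "cameras " * 50 + "buy now"  # keyword stuffing

for name, page in [("relevant", relevant), ("stuffed", stuffed)]:
    print(name, raw_tf_score(page, "cameras"), round(damped_tf_score(page, "cameras"), 2))
# relevant 1 1.0
# stuffed 50 4.91
```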

Overall, it seems that what I learned in my Indexing class here in the Information Science school can only loosely be related to web indexing. Searching the web is a complicated process that relies mostly on machine-run indexing methods and does not involve a human who can determine the things only humans can determine, such as how patrons perceive words and which words on a page are actually relevant.

Web search engines can only work in particular ways, leaving the rest of the guidance to the user. But if the user doesn't know how search engines work, it will be hard for him or her to get the best results. Patience is key.

Muddiest Point for Last Week:

Are there written requirements for the proposal on Courseweb?
Where are the written requirements for Assignment 3 located on Courseweb?

Friday, October 3, 2008

Week 6 Readings

Thoughts on:
"Research Challenges in Digital Archives and Long-term Preservation" by Margaret Hedstrom
"Actualized Preservation Threats: Practical Lessons from Chronicling America" by Justin Littman
"Technology Watch Report: the Open Archival Information System Reference Model: Introductory Guide" by Brian F. Lavoie

Margaret Hedstrom's article on the challenges that people face when creating and maintaining digital libraries is very interesting because it comes from a preservationist's point of view. While many people think of digitization as a bonus because it creates access for a larger group of people, Hedstrom eloquently points out that preservation is also a reason for digitizing. But while digitizing can spare rare or old documents and images from handling, it does not guarantee that they will be accessible forever.

What's Needed for Maintenance:
  • Tools for analysis
  • Tools for preservation
  • Security against attacks
  • A system that can withstand change in an electronic environment
While Hedstrom points out all these facts about the difficulties in preserving for a long period of time, Brian F. Lavoie uses some of these negatives to discuss how open access can create a stable and more secure digital environment for preservation.

Lavoie gives a detailed and interesting summary of the OAIS and how it works. Many different Designated Communities are involved with many different kinds of archived content. It appears that Designated Communities determine the types of archived content added to the OAIS, and from there the materials are accessible to that Designated Community. While making sure documents and images are digitized and archived properly is important, the OAIS treats identifying the Designated Community as even more important from a preservation standpoint.
For archived information to be understood, it has to be set up for, and used by, people who are familiar with it. Otherwise, it isn't really accessible. This is a very interesting way to look at preservation that I had not thought of before. It had occurred to me that a lack of appropriate technology could hinder the use of a digitized object, but not that the lack of a person who understands what the object means or why it is there could do the same.

"Actualized Preservation Threats" provides another take on preservation. Media failures, hardware failures, software failures, and operation failures are the four kinds of threats that constantly endanger digital preservation. Because this article uses much more technical language, it was harder for me to absorb the information with my computer-illiterate mind. But even though it took longer to digest, it spelled out more clearly the types of threats that someone creating a digital library would have to watch for.

While all three authors brought very specific and relevant points to the table, Littman went into much more detail about the types of threats that digitization in the name of preservation faces.
While all of these threats are of great concern, the most serious, in my opinion, are those related to hardware failure. The thought of losing all trace of a digitized item because the hardware fails reinforces the fact that one should always have a backup storage system, and often several. One of them should be fireproof.
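Having several copies only helps if you can tell which ones are still intact. A common technique is fixity checking: record a checksum when an object is archived, then periodically recompute it for every copy. This is a minimal Python sketch using SHA-256; the copy names and contents are hypothetical, and it is a general technique rather than a procedure taken from Littman's article.

```python
import hashlib


def sha256_of(data):
    """Hex SHA-256 digest of a stored object's bytes."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(expected_digest, copies):
    """Return the names of copies whose bytes no longer match the recorded digest."""
    return [name for name, data in copies.items() if sha256_of(data) != expected_digest]


def repair(expected_digest, copies):
    """Overwrite every damaged copy with a copy that still verifies."""
    good = next((data for data in copies.values() if sha256_of(data) == expected_digest), None)
    if good is None:
        raise RuntimeError("no intact copy left; the object is lost")
    for name in audit_copies(expected_digest, copies):
        copies[name] = good


original = b"scanned newspaper page, 1908"
recorded = sha256_of(original)  # computed once, when the object is first archived
copies = {
    "primary disk": original,
    "second disk": b"scanned newspaper page, 1909",  # simulated silent corruption
    "fireproof offsite copy": original,
}
print(audit_copies(recorded, copies))  # ['second disk']
repair(recorded, copies)
print(audit_copies(recorded, copies))  # []
```

The `RuntimeError` branch is the case every backup plan is trying to avoid: if all copies fail the check, there is nothing left to restore from.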

Overall I think I learned a lot about the long-term goals of digitization and how it can be used to help preserve the past- if it is done correctly.

Muddiest Points for Week 5 Lecture
How well does Javascript relate to XML?