Friday, November 14, 2008

Week 12 Readings

Week 12 (or: almost done with the semester)
Readings for this week:
Implementing Policies for Access Management by Arms
Chapter 6 Economics by Arms
Chapter 10 Economics by Lesk

Security and economics, I have to admit, never really occurred to me as concerns when I thought about the tasks of starting and maintaining a digital library. Although they are obviously extremely important, I did not realize that I or my institution would really need to take extra measures to ensure security for a digital library.
According to Arms (who has a lot to say on this topic) in Implementing Policies for Access Management, security is usually motivated by payments. Institutions wish to have control over who is accessing digital materials because they require payment for access. I find this to be very true at the University of Pittsburgh. Because a log-in is required to access the internet and the academic research databases, it can get a little hectic at the reference desk when patrons who are not university-affiliated wish to use the computers.
The general access model that Arms describes appears overly simple, but this is only in appearance. What allows for complexity is what Arms calls "dynamic evaluation of roles, attributes, and policies." Keeping all the important information in a container that can be stored in a repository allows for better control and security.
Authentication is one way to make certain that only users who are allowed access will actually receive access to digital publications. Payment, followed by agreements about password access, also helps to establish rules and boundaries about who is allowed in.
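To make the flow concrete for myself, here is a minimal sketch (entirely my own, not Arms' actual model) of how a dynamic access decision might combine authentication with roles, attributes, and policies. All the names and data in it are made up.

```python
# Toy sketch of "dynamic evaluation of roles, attributes, and policies":
# the access decision is computed at request time from who the user is
# and what the current policy says. Everything here is hypothetical.

USERS = {
    # username -> (password, role, attributes)
    "jsmith": ("secret", "student", {"affiliation": "pitt"}),
    "guest":  ("guest",  "visitor", {"affiliation": "none"}),
}

POLICIES = {
    # resource -> roles allowed and required attributes
    "research_database": {"roles": {"student", "faculty"},
                          "attrs": {"affiliation": "pitt"}},
}

def authenticate(username: str, password: str) -> bool:
    """Step 1: verify identity (here, a naive password check)."""
    user = USERS.get(username)
    return user is not None and user[0] == password

def authorize(username: str, resource: str) -> bool:
    """Step 2: evaluate the user's role and attributes against the policy."""
    _, role, attrs = USERS[username]
    policy = POLICIES.get(resource)
    if policy is None:
        return False  # no policy on record means no access
    return role in policy["roles"] and all(
        attrs.get(k) == v for k, v in policy["attrs"].items()
    )

if __name__ == "__main__":
    if authenticate("jsmith", "secret") and authorize("jsmith", "research_database"):
        print("access granted")   # jsmith is an affiliated student
    if not authorize("guest", "research_database"):
        print("guest denied")     # the visitor role fails the policy
```

The point of the "dynamic" part, as I understand it, is that nothing is decided ahead of time: the decision is recomputed from the policy on every request, so changing the policy immediately changes who gets in.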
A great point that Arms brings up is interoperability. Access can be granted through the attributes a digital library uses, but if not all digital libraries use the same attribute system, access across libraries may not work seamlessly.
Enforcement of access policies can be technical, legal, contractual, or institutional, and all of these can be helped along by an access statement issued to current and potential users.

While Lesk takes his chapter on Economics in a different direction, explaining the investments libraries make and the money that is saved by institution-wide subscriptions, he also comments fundamentally on access and usage. Both the cost and the quality of materials are based on the simple supply-and-demand model.
Costs for libraries are going up as publishers decide to stop printing journals and offer them electronically only. While this often means a higher subscription rate for libraries, plus paper costs for printing that either the library or its users have to cover, it is also easier for patrons. Therefore the system is in high demand.
Digitally, this makes interoperability extremely important. But is it working? According to Lesk, the funding for libraries is an important aspect of the system. (Well, not according to Lesk ONLY; it's rather obvious without his input that it is an important aspect of the system.) Bundling journals helps to alleviate extra costs, but it creates the problem of repetition: paying for journals and databases that a library may already have paid for through someone else.
Lesk mentions the idea of the digital cash register, attributed to Ted Nelson. Nelson dubbed the inclusion of one author's text inside another's document "transclusion," with the patron or user paying a small fee for viewing the quoted material. The idea of micropayments is relatively new, and I'm not sure that, where it is widespread, it goes by the same name.
The issue of access is not only one that patrons have to deal with, but libraries as well. While libraries used to own all of their materials because they were physically purchased and placed within the building, now borrowing and access are the main privileges being paid for. Publishers generally only allow access to materials covered by a current subscription; once a library stops paying, it loses access even to the previous issues it "paid for." This is a potentially negative position for libraries.
While Lesk's take on the economics of digital libraries does not seem particularly positive, he does point out some major aspects that libraries need to be aware of and deal with.

Arms also writes a chapter on Economics, and he is slightly more hopeful when speaking about the ways that libraries can overcome their problems with ownership and access. Arms really focuses on how library environments are changing quickly and how libraries need to accept and adapt to all the digital changes being made so fast. Greed and fear are, according to Arms, the driving factors behind publishers' business decisions. While he does not have particularly positive things to say about how publishing houses run their business and their dealings with libraries, he is fairly positive when he returns to libraries themselves. Libraries strive for access for the people: many different kinds of users, and access for all, hopefully.
Arms cites different research organizations that are open to working interoperably with libraries to create access to their materials. This provides an open framework for communication and interchange between "publishers" and libraries. How this relates to economics is a slightly different matter.
Arms brings up the interesting point of TV and how people pay for it: they are fine with regular payments that are not one-time or spontaneous. While physical publishing is no longer the norm for journals, it is also becoming slightly less prominent for books. Companies like ebrary promote online access to books where users have to follow parameters for printing. Copyright restrictions are the reason that patrons cannot print more than a few pages at a time, depending on the book.
All of these readings had some enlightening aspects, though not many for me.

Muddiest Point
Have you seen social impacts on libraries from personal experience?

Friday, November 7, 2008

Week 11 Readings

This week's readings:
William Arms' A Viewpoint Analysis of the Digital Library

Wade Roush's The Infinite Library

and Social Aspects of Digital Libraries, the final report from UCLA.

In Arms' chapter, the viewpoint that I found most interesting was the viewpoint of the user. While the organizational viewpoint and the technical viewpoint were enlightening, I do not think they are as important as the viewpoint of the user. After all, it's the user who is the sole focus when creating a digital library. Technology, irrelevant? Organizations, not important? As a user of digital libraries, I would say this is true: users who know nothing about the way a digital library is put together do view things this way. I'm wondering how UCLA found this information about users.

While the user's viewpoint is very limited, users do seem to see the importance of the Google Book Project. Google is digitizing many of the world's old books that are both valuable and out of copyright. It is even digitizing books that are still within copyright, which is a big controversy. The idea of digitization has become much more mainstream than ever before, which means that librarians and those adept at working with digital libraries will have more job security. And why not? Larger digital libraries will lead to more jobs, though different kinds of jobs. I think the whole point of this reading is to express how long it can take for users to gather information, even from a digital library. The author talks about the Bodleian Library at Oxford and how its old texts are not easily accessible: requests have to be sent to view the materials, and materials cannot be checked out. With a digital library, access to materials can be nearly instant. But making materials digital is not an instant process; it can take a very long time to scan and upload a book to a digital environment.

"Social Aspects of Digital Libraries" is a very long document that goes over different points dealign with art in digital libraries. Because art is in image format it is handled a little differently in the digital environment. Based off of a workshop that took place, the goals being to inform researchers about what takes place regarding social aspects of digital libraries and and assess how well digital libraries are working out. The workshop discussed the same user-centered, organizational-centered and technical-centered aspects of DLs. The seamless interaction between all parts of a digital library are what were stressed at this meeting. They talked about allowing users to have a part in constructing digital libraries. I think this is a great idea! By allowing users to get behind the scenes and give suggestions from more than a creative viewpoint it will allow those that are familiar with the technical aspects of DLs to see it from a truly end-user perspective. This will also provide ways for users to gather information on how better to use a DL.

Muddiest Points from Week 10 Lecture:
How necessary is it for our groups to have a set schedule/timeline for our DL project? After viewing other DL proposals I am thinking that we may be behind, and while working with a professor gives us access to a lot of great materials, it will make it harder to get things done before they are due. What is a suggested timeframe for staying on top of all the work?

Friday, October 31, 2008

Week 10 Readings

I'm discussing chapter 8 from Digital Libraries by William Arms, "Evaluation of Digital Libraries: an Overview" by Tefko Saracevic, and "Digital Library Design for Usability" by Rob Kling and Margaret Elliott.

I'm very interested in the discussion and debates about how end-users access information via the web and digital libraries, as you may have guessed by my discussions on similar subjects in previous posts. With that being said I have to admit that I found all three of these readings rather dry.
Arms provides such a broad look into user interfaces that it's hard to get a detailed grasp of usability and its importance. I think Arms' most important and relevant point is actually the one he makes at the very beginning of chapter 8: usability depends on the whole system working together smoothly and appropriately. Without a seamless system, it is nearly impossible for users to get an idea of what they should expect from their digital library.
Accessing different page numbers using structural metadata is something I never really considered to be a "big deal" aspect of a digital library. But in fact, Arms points out that these small details are essential for seamless operation.
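As a toy illustration of what that structural metadata might look like (my own sketch; Arms doesn't prescribe a format, and all the file names here are invented), consider a simple map from page labels to scan files:

```python
# Toy example of structural metadata: a map from logical page labels
# to scanned image files, so "go to page 42" can be resolved without
# the user knowing how the files happen to be named.

structural_metadata = {
    "book_id": "example-001",
    "pages": {  # page label as the reader sees it -> image file
        "i":  "scan_0001.tif",   # front matter
        "ii": "scan_0002.tif",
        "1":  "scan_0003.tif",   # body text starts at the third scan
        "2":  "scan_0004.tif",
    },
}

def image_for_page(page_label: str) -> str:
    """Resolve a page label to the scan file that contains it."""
    try:
        return structural_metadata["pages"][page_label]
    except KeyError:
        raise ValueError(f"no scan recorded for page {page_label!r}")

print(image_for_page("1"))  # -> scan_0003.tif
```

Even this tiny example shows why the detail matters: page "1" is the third scan because of the front matter, and without the map a "go to page" feature would land the user two images off.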
It's interesting that Arms talks about DLITE and CORBA, considering we use the DLITE interface for past class lectures. (And I do not think it is particularly easy to use...) These programs tie into interface design and the ways in which interfaces can be functional and easy for users, or hard for users to "get."

Rob Kling and Margaret Elliott spoke to me a little more than Arms did, although after finishing their article I still do not feel that I fully grasp the ideas and concepts of usability and how it will affect the way I create a digital library (software-wise). "Design for usability," as Kling and Elliott propose, is the new way to design digital libraries. Interface usability and organizational usability are apparently two different things, according to these authors. But why? In my humble opinion, learnability, efficiency, memorability, and errors all seem to be aspects of both interface usability and organizational usability.
Kling and Elliott also point out that a vast digital library could lead users to dread using it, because there are too many results and it's hard to narrow them down to those the user actually wants. This leads back to Arms' point about seamless organization in a digital library. If the digital library that Kling and Elliott refer to, Gopher, was created in an organized and accessible (aka seamless) manner, then the size of the library should not be daunting to the user. After all, the user has no idea how big it is: it is digital. They are not walking into a monstrous hall of a building and looking at vast rows and rows of books. All the user need do is enter their search terms and rely on the DL to pull the items they need. Sounds easy..... maybe too easy to be true.
I'm so glad that Tefko Saracevic decided to write about the evaluation of digital libraries, though his writing was not necessarily the most conducive to my full understanding: it was more of a challenge to follow how Saracevic described evaluation than to grasp the meaning behind the article. At the beginning he mainly describes how digital libraries have been evaluated, subject by subject. This is not important for me to know, is it? After all, the only thing I truly learned from the beginning of this article is that digital libraries created by universities, museums, etc., have been exempt from this evaluation process for far too long.
Interesting fact: both the system-centered approach and the human-centered approach to evaluation are based on finding results for users.
The criteria for evaluation are as follows:
  • content
  • process
  • format
  • overall assessment
  • technology performance
  • process/algorithm performance
  • overall system
Usage of the digital library was also considered to be part of the criteria for evaluation in some studies.
Saracevic offers some thoughts on why digital library evaluation is not more widespread, and I have to say that I agree with his reasoning on most points. The relative youth of digital libraries is obviously a large reason why they have not been evaluated extensively yet. Also, who is to say that creators want their digital libraries to be evaluated by outside sources? If a small college DL is providing information merely for its own small group of users, then its creators may not have any interest in evaluation from someone who is not a member of the user group.
Obviously all of these evaluation and usability studies show that DLs have a long way to go before they can be considered to have established standards for input, upkeep, and evaluation.

Muddiest Points
Since we did not have class last week I just have a refresher for my muddiest point. I am wondering how many blog postings we need to do total for the semester.

Thursday, October 16, 2008

Week 8 Readings

I am commenting on:

Definition and Origins of OAI-PMH (Chapter 1).
Federated Searching: Put It in Its Place by Todd Miller
The Truth About Federated Searching by Paula J. Hane

The OAI-PMH gives me mixed feelings. While I think it is ground-breaking that the OAI is working toward providing metadata for many different d-libs through one search, I am not convinced that it works very well. Based on the information this chapter gives about the two roles in OAI-PMH, moving metadata appears to be very easy: OAI service providers take the metadata from the OAI data providers and interpret which data is wanted and needs to be delivered to the end-user. (I know that in digital library speak the end-user isn't called the end-user, but I'm still used to library speak!)
This method of transporting data that is described in chapter 1 seems complicated. One-stop shopping? That only works at Walmart. (For now.)
While the majority of this chapter delves into the history of OAI-PMH, there is still some discussion of the differences between OAI-PMH and plain open access or Dublin Core. OAI-PMH is not a protocol for searching; it is a protocol for harvesting, which means the metadata is copied to the service provider and searches run locally against that harvested copy.
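To see why harvesting is easy in principle, here is a minimal sketch of an OAI-PMH request. The verb and metadataPrefix parameters are part of the actual protocol; the repository URL is a placeholder, and a real harvester would also have to follow resumption tokens for large result sets.

```python
# Minimal OAI-PMH harvesting sketch: issue a ListRecords request and
# pull the Dublin Core titles out of the XML response.
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(base_url: str):
    """Ask a data provider for its records and yield each dc:title."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    for record in tree.iter(OAI_NS + "record"):
        for title in record.iter(DC_NS + "title"):
            yield title.text

# Hypothetical repository; any OAI-PMH base URL works the same way:
# for t in harvest_titles("http://example.org/oai"):
#     print(t)
```

The service provider runs something like this against each data provider it knows about, stores the results, and then answers user searches from its own local copy.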
Overall OAI-PMH provides a starting point for this type of federated searching, but I do not believe that it plays out the way it is presented.

Todd Miller immediately begins his article by stating that federated searching should start with the library catalog. I agree, except that federated searching does not seem to yield the results that a regular catalog search can. When you search a catalog, you are searching for very specific content in a specific manner. Oftentimes this is not possible with federated searching; the searches are much broader and less defined.
By going above and beyond the catalog and reaching into databases and other internet sources with one federated search, the user is not really aware of where they are searching. Because this is the case, they really have no idea how to make their search yield the results they need.
While it is obvious that libraries need to update their catalogs and the way to search them to provide more access, federated searching is not the way to do it, unless the goal is to confuse the end-user even more.

When I first started to read The Truth About Federated Searching I was not hopeful. I have a lot of experience with Webfeat and my experience is this: if the librarians are not able to perform an accurate Webfeat search, then how do we expect the user to be able to? We shouldn't expect Webfeat to change libraries! I agree with a lot of the points that Ms. Hane makes about the realities of federated searching.
Under point 1: even if federated searching could search every database, that doesn't mean it has the tools to search each database to its greatest potential, so it isn't a very good search.

Under point 2: Agreed. De-duping doesn't work. At all. Not only is de-duping impossible in practice, but federated searching also cannot distinguish the significance of the words being searched. (A toy illustration of the de-duping problem follows these points.)

Under point 3: Refer to my point in point 2.

Under point 5: Federated searching cannot make a regular database search better. In fact, it seems to make it worse, because the nuances of searching a particular database are not built into a federated search engine.
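As promised under point 2, here is a toy sketch (my own, not from the article) of why naive de-duping fails: normalizing titles catches trivial variants but misses the kinds of differences real databases actually produce.

```python
# Toy de-duplication: collapse titles by crude normalization and see
# which "duplicates" are caught. The records are invented examples.
import re

def normalize(title: str) -> str:
    """Lowercase and strip punctuation/spacing -- a naive match key."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

records = [
    "The Truth about Federated Searching",
    "truth about federated searching, The",   # inverted leading article
    "The Truth About Federated Searching.",   # trailing period
]

seen = set()
for title in records:
    key = normalize(title)
    if key in seen:
        print("duplicate:", title)
    else:
        seen.add(key)
```

Only the trailing-period variant gets flagged; the inverted-article record slips through as a "new" result, which is exactly the kind of miss that makes merged federated result lists so messy.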

Overall I think federated searching is.... well, it needs some work.

Muddiest Points for week 7:

Does the test cover all materials in the readings (including the optional readings) as well as in the slides?

Will we all be able to fit our proposals into the time after the test?

Friday, October 10, 2008

Week 7 Readings

I'll be commenting on:
Challenges in Web Search Engines by Monika Henzinger, Rajeev Motwani, and Craig Silverstein
How Things Work: Web Search Engines: Part 1 by David Hawking
How Things Work: Web Search Engines: Part 2 by David Hawking

Both the Hawking articles have been very informative. What I've learned:

  • indexing the web is not impossible, although it still seems like it should be to my uneducated mind
  • Crawling and indexing consume terabytes of storage and require very dependable servers
  • Speed, politeness, and content (both excluded and duplicate) must be accounted for when crawling
  • Decoding the many document formats is a big hurdle that crawlers have to deal with
Part 2
  • postings lists are important for a document or web page to be indexed: they map each term to the ids of the pages containing it, and without them a page's id can't be found
  • scanning and inversion make up the two main parts of indexing a page (see the sketch after this list)
  • scale: web-sized crawling requires a huge scale. A big challenge.
  • Searching terms and words used in web pages is a daunting task because there is no real way to know whether something is a spelling error or a non-dictionary word. Example: "teh"
  • Anchor text helps to link pages together
  • Queries shape a large portion of how and why web pages are indexed: they show how often a page is searched for and which terms are used to find it
  • Query processors need to be advanced to yield good results
  • Caching: I am hesitant to believe that caching really will help to provide good results from a query. Just because a page contains the word doesn't mean it is provided in the context the searcher is looking for
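Here is the promised sketch of scanning and inversion, boiled down to a few lines of Python (my own toy version, nothing like production scale): scan each document to emit (term, id) pairs, then invert the pairs into postings lists.

```python
# Toy "scanning and inversion": build an inverted index from a handful
# of made-up documents, then look terms up in the postings lists.
from collections import defaultdict

docs = {  # document id -> text (invented pages)
    1: "web search engines crawl the web",
    2: "indexing the web at scale",
    3: "anchor text links pages together",
}

# Scanning: walk each document and emit (term, doc_id) pairs.
pairs = [(term, doc_id)
         for doc_id, text in docs.items()
         for term in text.split()]

# Inversion: group the pairs by term into postings lists.
index = defaultdict(set)
for term, doc_id in pairs:
    index[term].add(doc_id)

def search(term: str) -> set:
    """Return the postings list for a term; empty set if unindexed."""
    return index.get(term, set())

print(sorted(search("web")))  # -> [1, 2]
```

The hard part at web scale isn't this logic; it's that the pairs don't fit in memory, so the scanning and inversion phases have to be split up and run across many machines.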

Challenges in Web Search Engines
  • The idea of spam on the web is addressed first, I think, because it represents such a real problem: people don't often look past the first page of results, and if that page is loaded with spam then the search has essentially yielded no results for the patron
  • Both Text Spam and Link Spam try to manipulate the search using keywords that would be used to search for the page and trying to add as many as possible- often hidden or in a link farm on the bottom of the page.
  • Cloaking is used to deceive search engines like Google, but I don't really understand how it works (my rough guess at the mechanism is sketched after this list)
  • Quality should be based on more than just the count of a certain word on a page, but it seems impossible for human indexers to be employed to help differentiate these types of things. One way to do it: have users give feedback.
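Since cloaking confused me, here is my rough understanding as a toy sketch: the server checks who is asking and serves the crawler different content than it serves a person. The user-agent strings here are illustrative, and real cloakers reportedly also key off IP addresses.

```python
# Toy cloaking: return keyword-stuffed bait to anything that looks
# like a crawler, and the real page to everyone else.

CRAWLER_SIGNATURES = ("Googlebot", "Slurp")  # hypothetical crawler names

def respond(user_agent: str) -> str:
    """Serve different content depending on who appears to be asking."""
    if any(sig in user_agent for sig in CRAWLER_SIGNATURES):
        return "cheap books library digital free download ..."  # bait for the index
    return "Welcome to our completely unrelated storefront."    # what users see

print(respond("Googlebot/2.1"))  # the crawler indexes the bait
print(respond("Mozilla/5.0"))    # the person sees something else entirely
```

If that's roughly right, it explains why cloaking is so hard to catch from the index alone: the search engine never sees the page a user will actually get.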

Overall it seems that what I learned in my Indexing class here in the Information Science school can only loosely be related to web indexing. Searching the web is a complicated process that relies mostly on machine-run indexing methods and does not involve a human who can determine the things only humans can: how words are perceived by patrons and which of the words on a page are actually relevant.

Web search engines can only operate in a particular way, leaving guidance to the user; if the user is uneducated about how search engines work, it will be hard for him/her to get the best results. Patience is key.

Muddiest Point for Last Week:

Are there written requirements for the proposal on Courseweb?
Where are the written requirements for Assignment 3 located on Courseweb?

Friday, October 3, 2008

Week 6 Readings

Thoughts on:
"Research Challenges in Digital Archives and Long-term Preservation" by Margaret Hedstrom
"Actualized Preservation Threats: Practical Lessons from Chronicling America" by Justin Littman
"Technology Watch Report: the Open Archival Information System Reference Model: Introductory Guide" by Brian F. Lavoie

Margaret Hedstrom's article on the challenges that people face when creating and maintaining digital libraries is very interesting because it comes from a preservationist's point of view. While many people think of digitizing materials as a bonus because it creates access for a larger group of people, Hedstrom eloquently points out that preservation is also a reason for digitization. But while digitizing can be easier on rare or old documents and images, it does not lead to guaranteed accessibility forever.

What's Needed for Maintenance:
  • Tools for analysis
  • Tools for preservation
  • Security against attacks
  • A system that can withstand change in an electronic environment
While Hedstrom points out all these difficulties in preserving materials for a long period of time, Brian F. Lavoie uses some of these negatives to discuss how an open archival model can create a stable and more secure digital environment for preservation.

Lavoie gives a detailed and interesting summation of the OAIS and how it works. Many different Designated Communities are involved with many different kinds of archived content. It appears that Designated Communities determine the types of archived content added to the OAIS, and from there the materials are made accessible to that Designated Community. While making sure documents and images are digitized and archived properly is important, the OAIS deems the designation of the community to be even more important from a preservation standpoint.
In order for the information that has been archived to be understood by users, it has to be described and set up by people who are familiar with it; otherwise, it wouldn't be accessible. This is a very interesting way to look at preservation that I had not previously considered. It had occurred to me that a lack of appropriate technology could hinder the use of a digitized object, but not that the lack of a person who understands what the digitized object means, or why it is there, could do the same.
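As a note to myself, here is a toy rendering (mine, not Lavoie's, with made-up field names) of the OAIS idea that archived content must travel with the representation information its Designated Community needs:

```python
# Toy OAIS-style package: the bits plus the information needed to make
# sense of them. All field names and values here are invented.

archival_package = {
    "content": "scan_0001.tif",            # the digitized object itself
    "representation_info": {
        "format": "TIFF image, 600 dpi",   # how to render the bits
        "context": "Page 1 of a 1908 county land register",
        "designated_community": "local-history researchers",
    },
}

def understandable_by(package: dict, community: str) -> bool:
    """The preservation test isn't just 'do the bits survive' but
    'does the intended community still have what it needs to read them'."""
    return package["representation_info"]["designated_community"] == community

print(understandable_by(archival_package, "local-history researchers"))  # True
```

Stripped of its representation information, the package above is just an unexplained TIFF file, which is exactly the failure mode Lavoie's model is trying to prevent.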

"Actualized Preservation Threats" provides another take on the preservation front. Media failure, hardware failure, software failures and operation failures make up the 4 aspects that pose a constant threat to digital preservation. While this article has a lot more technical language in it, it is harder for me to absorb the information with my computer-illiterate mind. But even though it took longer to digest, the information was more clearly spelled out about the types of threats that one who is creating a digital library would have to watch for.

While all three authors brought very specific and relevant points to the table, Littman did so with much more detail about the types of threats that digitization faces in the name of preservation.
While all of the threats are of great concern, the most threatening, in my opinion, are those related to hardware failures. The thought of losing all trace of a digitized item because the hardware fails just reinforces that one should always have a backup storage system, and very often multiple backup storage systems. One of them should be fireproof.

Overall I think I learned a lot about the long-term goals of digitization and how it can be used to help preserve the past- if it is done correctly.

Muddiest Points for Week 5 Lecture
How does JavaScript relate to XML?

Friday, September 26, 2008

Flickr Assignment

Here is the link to the URL of my Flickr page, where you will find only the photos for the assignment.

http://www.flickr.com/photos/30803613@N06/sets/72157607475719861/