Submitted by Karen G. Schneider on April 3, 2006 - 3:02pm
In my first article in this series, I wrassled with the biggest bear in the forest: how most online catalogs lack relevance ranking. That's one big hairy bear, but as some readers pointed out, it's a little forced to pick on relevance ranking out of the context of all the other important features most online catalogs don't offer, or implement so badly that librarians disable them rather than further confuse the poor user, who just wants to find a book or DVD, for crying out loud.
So rather than plunge into another specific feature, I'm backtracking just enough to give you the Checklist of Shame: key features common to most search engines (even the least expensive) that are often missing from online catalogs. Even this is an abbreviated list; the search-engine test instrument I've developed for My Place Of Work (MPOW) is seven pages long.
I agree with Eric Lease Morgan's comment on my last piece that librarians tend to ask for esoteric features at the expense of core functionality. I continue to be surprised at the people who tell me how a catalog "should" work but haven't done a lick of user analysis (forensic, heuristic, academic, or otherwise) to back their theories.
But here's a rule of thumb: in general, if the 800-pound gorillas, such as Google and Ask.com, offer a feature (like a default setting), you should mimic the gorillas, offer the same feature, and give that feature priority in your considerations. Furthermore, it's common-sense usability practice to offer that big-gorilla feature the way the gorillas offer it, because users will come to your catalog with search behavior learned from engines such as Google and Ask.com. (Don't ever rely on help files to "teach" people. In last year's usability testing at MPOW, the only person who read our help files, out of a group of techies, librarians, and academics, was the 25-year-old soccer mom.)
I also list features used primarily by aficionados. This group—ranging from in-house librarians to information super-users—can be influential, and they are often engaged with your catalog at a level that can prove hugely informative. So many search engines support aficionado features that it's easy enough to support their preferences. Furthermore, in my experience, aficionados will also tell you when an esoteric feature is completely pointless, even for them. Just don't let the aficionado input drown out common-sense decisions.
Features Your OPAC Wishes It Had
- Relevance ranking—As I explained earlier, relevance ranking, typically based on TF/IDF (term frequency/inverse document frequency), is the essential building block that ensures the most likely search results rise to the top. Every search engine on the planet relies on relevance ranking. Many online catalogs don't offer it ("system sorted," anyone?) or implement it bizarrely. (I agree with comments that relevance ranking can be hard to do well in online catalogs, but I disagree that it cannot be done at all; the NCSU catalog makes that clear.)
- Stemming—To steal from a couple of good Web definitions, stemming is "a method by which Search Engines associate words with prefixes and suffixes to [a] word stem to make the search broader," such as returning the same results for "applies, applying, and applied."
After relevance ranking, stemming is arguably one of the most important search features for an online catalog, where search success hinges precipitously on searching the relatively scanty metadata of MARC records. Yet even huge search engines such as Google, with the luxury of massive amounts of full text to improve matching, use stemming. (I've watched Google turn stemming off and on and tinker with it; clearly they think about stemming a lot.)
- Field weighting—First runner-up for second most important feature in a search engine. Field weighting lets you give more or less prominence to particular fields. For example, titles are often given more importance, so that the first few hits for the search term million are books with million in the title.
- Spell-checking—Essential, not because people are dumb, but because people make mistakes. If anyone gets snobby with you when you bring up spell-check, just tell them Jane Austen was a notoriously bad speller; she misspelled one of her teenage works as “Love and Freindship.” (Thank goodness for
- Refining original queries—If you search for a term such as butterfly, you may want, after viewing the results, to refine that search by adding a term such as conservation. A good search engine will keep the search terms in the search box on the results page, or otherwise make it very easy to view and modify the original search.
- Support for popular query operators—For example, does the search engine support + and - for "required" and "not"? It's also okay to offer older query operators, such as AND, for backward compatibility for people who have been searching your catalog since Melvil Dewey was a circ clerk, but those older operators are not substitutes for what people use today. For that matter, things change over time, so the ability to add a new query operator synonym is valuable.
- The Boolean bag o' goods—Can the search engine support quoted searching ("declaration of independence"), wildcard searching (appl*), proximity searching (cheese near cheddar), and preference for matching case (AIDS versus aids)? Most people don't use these features, but your aficionado users will look for them, and nearly all search engines, even the entry-level products, offer them. Any vendor who moans that these are difficult and expensive to offer is blowing smoke in your ear.
- Flexible default query processing—Basically, can you decide whether search terms will be "anded" (meaning that all terms must match) or "orred" (meaning that any one term is enough for a match)? Google changes its features over time, but Google's settings might not be the best choice for your catalog (something to keep in mind if you evaluate the Google Appliance). You'll only know through usability testing, and the search engine shouldn't make that decision for you.
- In-line query limiters—The ability to limit a search by field right in the search box, the way Google lets you limit a search with site:edu. This capability will be used by a tiny fraction of your users. I wouldn't trade it for relevance ranking and field weighting, but then, every search engine I've evaluated this spring offers it. Extra credit for being able to select and label the limiters any way you want.
- Duplicate detection—This is an interesting search-engine feature to discuss for online catalogs. It raises the issue of FRBR (Functional Requirements for Bibliographic Records, pronounced FER-ber), which is, to be grossly reductive, duplicate management for online catalogs, so that a user isn't stumped by five records for what is essentially the same item. In a search engine, though, duplicate detection simply flags multiple records for the same item and ideally gives you control over how search results are handled when duplicates are detected.
- Sort flexibility—You don't want to overwhelm users with options for sorting search results, but can you at least let them switch between relevance and date? Can you also offer other sorting that might be a nice local option (the way some store Web sites offer sorting by price or user rating)? Even more crucially, can you control where the search engine gets its date information, so that the indexed "date" comes from a locally controlled field rather than simply from the HTTP header?
- Character sets—Although most search engines offer flexible support for other languages, many online catalogs can barely handle one character set. I recently watched ALA Council debate a resolution on non-Roman characters in online catalogs that was ultimately shot down because it didn't come from ALCTS, a classic example of NIH (Not Invented Here). Forget ALA subcommittees: the pressure needs to come from you, gentle reader.
- Faceting—This is a "21st-century search engine" feature that some search engines grew up around and that older search engines are scrambling to add. Faceting manipulates search results to make it easy to browse by category. Search the NCSU catalog for the phrase civil war, and browse by LCSH or publisher; search Landsend.com with the term pants, and see choices arranged by size, cost, and other metadata.
Avi Rappoport, search guru extraordinaire, explains faceting thoroughly at www.searchtools.com/info/faceted-metadata.html. Online catalogs have such rich metadata that it's a shame not to offer faceting.
- Advanced search—My favorite chimera! In most search engines, most notably Google (www.google.com/advanced_search?hl=en), the "advanced search" page is largely a "junior" search page that walks the user through fielded and Boolean searches. At MPOW, we shamelessly stole Google's page for our own (http://lii.org/pub/htdocs/adv_search_home.htm). There's nothing wrong with that, and the "advanced search" page can be a convenient place to offer popular date-searching options or other nice tweaks. But users should be able to perform most truly advanced searches through inline operators in the basic search box, so that the handful of hopeless nerds like me who think it's bang-up fun to do a search such as wine -cheese site:edu won't have to plod through a fielded page to do so.
- Easily customized search-result pages—The word easily should be understood to refer to people with respectable HTML skills, not to people who pay others to do that kind of work (for about the same reason I don't give myself root canals). Still, good search engines provide strong templating systems for developing search-results pages that integrate well with your overall design. Extra credit for default templates that validate to published HTML standards and meet WCAG Priority 2 accessibility requirements.
- Human suggestions (also called "best bets," etc.)—Can you force an item to the top of search results? (Can you then charge publishers for premium results? Just kidding, just kidding…) This smart discussion of best bets (www.steptwo.com.au/papers/cmb_bestbets/) has a great screen capture of this feature in action. Best bets are particularly nice when you have good search analysis to indicate what people are searching for most frequently, which brings up…
- Search logging and reports—You need to know what's working for your users and what isn't. Your basic transaction logs (how many hits to the server and where the hits come from) aren't adequate for this. A good search engine will, at minimum, log top queries by frequency and top queries with no hits. Also look for trend reports you can use to tweak the search engine, for example, by adding terms to records to make them more findable (the way I saw librarians add Brokeback Mountain to the notes field for records for Annie Proulx's short story collection Close Range, www.worldcatlibraries.org/wcpa/ow/e4d1df37de10d114a19afeb4da09e526.html).
- Well-rounded administrative interface—Does every tweak to the search engine require begging some techie to tweak a feature, observing the results, and then begging some more until it's right? Are the search engine's features hidden in largely undocumented mystery meat? Is it impossible to determine the settings at a glance, or at least through intelligent perusal of the administrative section? (Yes, this is a roman à clef…one of several drivers in our search for a better search engine at MPOW.)
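The relevance-ranking item in the checklist above leans on TF/IDF. To make that concrete, here is a minimal Python sketch, assuming one common textbook variant (raw term frequency multiplied by log(N/df)) and a few made-up records keyed by hypothetical IDs. Real engines layer on length normalization and much more, so treat this as an illustration rather than any vendor's formula.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real engine would also handle punctuation."""
    return text.lower().split()


def tfidf_rank(query, docs):
    """Return record IDs sorted by a simple TF/IDF score for the query terms.

    docs maps a (hypothetical) record ID to its indexed text.
    score(doc) = sum over query terms t of tf(t, doc) * log(N / df(t))
    """
    n = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: how many records contain each term at least once.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    scores = {}
    for doc_id, counts in term_counts.items():
        score = sum(counts[t] * math.log(n / df[t]) for t in tokenize(query) if df[t])
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by ID so the output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A term that appears in every record gets an IDF of log(1) = 0 and contributes nothing, which is the point: words common to everything don't help separate good hits from bad ones.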
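The stemming item above describes treating applies, applying, and applied as the same word. Production engines generally use an established algorithm (the Porter stemmer is the classic one for English); the sketch below is only a toy suffix-stripper with rules made up for illustration, showing the basic move of indexing and querying on stems instead of surface forms.

```python
# Toy suffix rules (illustrative only, NOT the Porter algorithm), checked in order.
# Each entry is (suffix to strip, replacement to append).
SUFFIX_RULES = [
    ("ies", "y"),
    ("ied", "y"),
    ("ing", ""),
    ("ed", ""),
    ("es", ""),
    ("s", ""),
]


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, as long as at least min_stem characters remain."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + replacement
    return word
```

Rules this crude will overstem (notes becomes not, for instance), which is one reason stemming needs tuning and testing rather than a simple on/off switch.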
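The field-weighting item above can be sketched in a few lines of Python. The weights (title 3, subject 2, notes 1) and the record structure are assumptions for illustration, not values from any particular product; the point is only that the same term counts for more when it lands in a heavily weighted field.

```python
# Hypothetical per-field weights: a title hit counts three times as much as a notes hit.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "notes": 1.0}


def weighted_score(term, record, weights=FIELD_WEIGHTS):
    """Count the term in each field and multiply by that field's weight."""
    term = term.lower()
    score = 0.0
    for field, weight in weights.items():
        words = record.get(field, "").lower().split()
        score += weight * words.count(term)
    return score


def rank_by_fields(term, records):
    """Return IDs of records with a nonzero weighted score, best first (ties by ID)."""
    scored = {rid: weighted_score(term, rec) for rid, rec in records.items()}
    hits = [rid for rid, s in scored.items() if s > 0]
    return sorted(hits, key=lambda rid: (-scored[rid], rid))
```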
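For the spell-checking item above, one simple approach (among many) is a "did you mean" suggestion drawn from the words actually in your index, using Levenshtein edit distance. The vocabulary counts and the two-edit threshold below are assumptions for illustration; the code suggests the closest indexed term and breaks ties in favor of the more frequent one.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def did_you_mean(word, vocabulary, max_distance=2):
    """Suggest the closest indexed term, preferring more frequent terms on ties.

    vocabulary maps term -> frequency (a dict or collections.Counter) built from
    the index. Returns None if the word is already indexed or nothing is close enough.
    """
    word = word.lower()
    if word in vocabulary:
        return None
    candidates = [(edit_distance(word, term), -freq, term)
                  for term, freq in vocabulary.items()]
    best = min(candidates, default=None)
    if best is None or best[0] > max_distance:
        return None
    return best[2]
```

With a vocabulary containing friendship, a search for Jane Austen's "Freindship" is two substitutions away and gets the suggestion.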
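The query-operator, quoted-search, and in-line limiter items above all come down to parsing what the user typed into the one search box. Here is a minimal Python sketch of such a parser, assuming Google-style syntax: + for required, - for excluded, double quotes for phrases, and field:value for limiters such as site:edu. It is a simplified illustration (no nesting, no OR, no escaping), and the output structure is my own invention.

```python
import re

# One token: an optional +/- prefix, then a "quoted phrase", a field:value
# limiter, or a bare word (tried in that order).
TOKEN_RE = re.compile(r'([+-]?)(?:"([^"]+)"|(\w+):(\S+)|(\S+))')


def parse_query(query):
    """Split a query into required, excluded, and optional terms plus field limiters."""
    parsed = {"required": [], "excluded": [], "optional": [], "limits": {}}
    for prefix, phrase, field, value, word in TOKEN_RE.findall(query):
        if field:
            # In-line limiter such as site:edu (field names are up to the catalog).
            parsed["limits"][field.lower()] = value.lower()
            continue
        term = (phrase or word).lower()
        if prefix == "+":
            parsed["required"].append(term)
        elif prefix == "-":
            parsed["excluded"].append(term)
        else:
            parsed["optional"].append(term)
    return parsed
```

For example, parse_query("wine -cheese site:edu") yields wine as an optional term, cheese as excluded, and a site limiter of edu.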
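The flexible-default item above, together with wildcard searching from the Boolean bag o' goods, can be sketched like this. The default_operator switch is the setting the checklist argues you, not the engine, should control. Wildcards are handled here with Python's standard fnmatch module (which also treats ? and [ ] as special), an implementation choice made for brevity rather than a claim about how any real engine does it.

```python
from fnmatch import fnmatchcase


def term_matches(term, words):
    """True if the term matches any indexed word; * acts as a wildcard (appl*)."""
    if "*" in term:
        return any(fnmatchcase(w, term) for w in words)
    return term in words


def matches(query_terms, record_text, default_operator="AND"):
    """Apply a configurable default operator to a list of plain query terms."""
    words = set(record_text.lower().split())
    hits = [term_matches(t.lower(), words) for t in query_terms]
    if default_operator == "AND":
        return all(hits)  # every term must match
    if default_operator == "OR":
        return any(hits)  # one matching term is enough
    raise ValueError("default_operator must be 'AND' or 'OR'")
```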
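The duplicate-detection item above distinguishes FRBR-style work clustering from the simpler job of flagging likely duplicates. A minimal Python sketch of the simpler job, assuming that records sharing a normalized title, author, and year are duplicates (a crude rule chosen for illustration; the sample records in the test are made up for this example), might look like this:

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a crude duplicate-detection key from normalized title, author, and year.

    The choice of fields is an assumption for illustration; FRBR-style work
    clustering is far more involved.
    """
    def norm(s):
        # Lowercase, drop punctuation, collapse runs of whitespace.
        return " ".join(re.sub(r"[^\w\s]", "", s.lower()).split())

    return (norm(record.get("title", "")), norm(record.get("author", "")), record.get("year"))


def group_duplicates(records):
    """Return lists of record IDs that share a match key (only groups larger than one)."""
    groups = defaultdict(list)
    for rid, rec in records.items():
        groups[match_key(rec)].append(rid)
    return [ids for ids in groups.values() if len(ids) > 1]
```

What to do with a flagged group (collapse it, show one record with a "see also" link, or leave it alone) is the part the checklist says you should be able to control.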
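For the sort-flexibility item above, the key design point is that date sorting should read a field you control. This Python sketch assumes each result carries a relevance score and a locally maintained year field; the field name pub_year is hypothetical, and records lacking it sort last rather than disappearing.

```python
def sort_results(results, sort_by="relevance", date_field="pub_year"):
    """Sort result dicts by relevance score or by a locally chosen date field.

    date_field names the record field trusted for dates (hypothetical here),
    rather than something like an HTTP header date.
    """
    if sort_by == "relevance":
        return sorted(results, key=lambda r: r["score"], reverse=True)
    if sort_by == "date":
        dated = [r for r in results if r.get(date_field) is not None]
        undated = [r for r in results if r.get(date_field) is None]
        # Newest first; records with no local date go to the end.
        return sorted(dated, key=lambda r: r[date_field], reverse=True) + undated
    raise ValueError("sort_by must be 'relevance' or 'date'")
```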
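Full support for multiple scripts is mostly a matter of storing and indexing Unicode properly, but one small, concrete piece of the character-set item above is normalization, so that the same word typed with precomposed or combining accents still matches. This Python sketch uses the standard unicodedata module to decompose text, drop combining marks, and case-fold it. Whether to strip diacritics at all is a policy choice (in some languages they mark different letters), so treat it as an illustration, not a recommendation.

```python
import unicodedata


def fold(text):
    """Decompose, strip combining marks, and case-fold text for accent-insensitive matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

Applied at both index time and query time, fold makes "Dvořák" and "Dvorak" land on the same index entry.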
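Under the hood, the faceting item above mostly means counting metadata values across the current result set. A minimal Python sketch, assuming results are dictionaries whose fields hold either a single value or a list (the sample data in the test is made up), could be:

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of a field (values may be lists)."""
    counts = Counter()
    for record in results:
        value = record.get(field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        # Count each distinct value once per record.
        counts.update(set(values))
    # Most common first, ties alphabetical, the way many facet sidebars list them.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```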
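The human-suggestions item above amounts to pinning staff-chosen records ahead of the engine's own ranking. This Python sketch assumes a simple lookup table from a normalized query to a list of record IDs (a hypothetical structure; real products store and match best bets in their own ways):

```python
def apply_best_bets(query, ranked_ids, best_bets):
    """Put hand-picked records for a query ahead of the engine's ranked list.

    best_bets maps a normalized query (lowercase, single spaces) to record IDs
    chosen by staff. Pinned IDs are not repeated further down the list.
    """
    key = " ".join(query.lower().split())
    pinned = best_bets.get(key, [])
    rest = [rid for rid in ranked_ids if rid not in pinned]
    return pinned + rest
```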
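The search-logging item above asks for, at minimum, top queries and top zero-hit queries. Here is a minimal in-memory Python sketch of those two reports; a real system would persist the log and add date ranges for trend reports, and the class name is my own.

```python
from collections import Counter


class SearchLog:
    """Minimal in-memory query log with top-query and zero-hit reports."""

    def __init__(self):
        self.queries = Counter()
        self.zero_hits = Counter()

    def record(self, query, hit_count):
        key = " ".join(query.lower().split())  # normalize case and spacing
        self.queries[key] += 1
        if hit_count == 0:
            self.zero_hits[key] += 1

    def top_queries(self, n=10):
        return self.queries.most_common(n)

    def top_zero_hit_queries(self, n=10):
        return self.zero_hits.most_common(n)
```

A zero-hit report topped by something like "brokeback mountain" is exactly the cue to add that term to the relevant records.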
The features in this checklist are just the high notes of search functionality, and the list doesn't cover how well, or badly, vendors provide these features (or how well or badly customers implement them), topics I'll tackle in future installments of this series. Beyond all that, this checklist doesn't address a much more difficult problem: the sparse, hard-to-search nature of citation indexes. People are now accustomed to full-text searching. Can we make them like an OPAC, no matter how much we fix its search functions?
But think about your own catalog: are these features available? It may well be, as some users wrote me privately, that the OPAC (as separate software purchased by local libraries) is near death's door. I think that's very likely. But if so, anything else we use for a catalog—who's betting on Open WorldCat?—will need good search functionality as well, or it too will suck, only more consistently and on a much larger scale. In the end, as uber-librarian and user champion Marvin Scilken told me many times, the bottom line is public service.