More on Denton’s How to Make a Faceted Classification and Put It On the Web

During a recent slog up treadmill hill I read through Wm. Denton’s How to Make a Faceted Classification and Put It on the Web.

I feel better now. After my grad school déjà vu experience with Simplified Facet Classification, I was despairing of ever being able to bring facets to the masses, or, in my case, to intelligent engineers and other tech types who aren’t LIS qualified.

You’ll still need some background in classification to understand the details, but it’s a great overview that people with some background in knowledge modeling can follow. Most web and software dev types have, often unknowingly, done a fair amount of informal knowledge modeling.

Denton’s straightforward style makes his discussion clear enough that, barring a meltdown over the technicalities of the entity/instance analysis and facet creation in steps 2 and 3, you can hand this essay out as background reading.

Section 1: When to Make a Faceted Classification gives a nice overview of where faceted classification systems fit into the field of classification and organizing schemes in general. Denton provides useful questions to ask when considering faceted classification. It’s refreshing to see a treatment of facets that also discusses when facets are not the best answer to your questions.

In the second section Denton divides the actual work of creating the classification system (facets and foci) into 7 steps, beginning with Domain Collection and ending with Revision, Testing, and Maintenance. (Love to see that word maintenance laid out in black on white!) It’s a simple methodology that will get you through the process and give you a workable system at the end of the day.

What follows are a few notes on his description of the process.

I’ll start with what I believe is the missing first step.

Defining Your Domain

Step 0: Define the Domain

You must have a solid, agreed-on definition of the subject and scope of the domain before you start. We all know the danger of assuming that what’s already in the system marks the limit of the domain that needs to be considered, but be equally careful not to assume that everything your website or software deals with belongs in the domain for which you are building the classification.

Building the Faceted Classification

The next five steps make up the “building the facets and foci” part of the process.

Step 1: Domain Collection.
Step 2: Entity Listing.
Step 3: Facet Creation.

Note that you’ll have to do quite a bit of iterating over steps 1, 2, and 3. The process of collecting, analyzing, and defining always brings to light bits and pieces that were missed in the first (or latest) go ’round. It’s just a fact of life, so plan for it.
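To make that loop a little more concrete, here is a minimal Python sketch of the bookkeeping for one pass through steps 1 to 3. It assumes a made-up recipe domain; the terms, the facet names, and the helpers `unplaced_terms` and `duplicated_foci` are hypothetical and not taken from Denton’s essay. It checks two things that usually send you back around: entity terms that no facet accounts for yet, and foci that have landed in more than one facet (a common rule of thumb is that facets should not overlap).

```python
from collections import Counter

# Hypothetical domain: a small recipe collection. All terms and facet names are made up.

# Step 2 output: terms pulled from the material gathered in Step 1.
entity_terms = {
    "dessert", "soup", "chicken", "lentils",
    "baking", "braising", "smoked paprika",
}

# Step 3 output so far: each facet (characteristic) holds its foci (terms).
facets = {
    "Course": {"dessert", "soup"},
    "Main ingredient": {"chicken", "lentils"},
    "Cooking method": {"baking", "braising"},
}


def unplaced_terms(terms: set[str], facets: dict[str, set[str]]) -> set[str]:
    """Entity terms that no facet accounts for yet: candidates for the next pass."""
    placed = set().union(*facets.values())
    return terms - placed


def duplicated_foci(facets: dict[str, set[str]]) -> set[str]:
    """Foci filed under more than one facet, a sign that the facets overlap."""
    counts = Counter(focus for foci in facets.values() for focus in foci)
    return {focus for focus, n in counts.items() if n > 1}


print(unplaced_terms(entity_terms, facets))  # {'smoked paprika'}
print(duplicated_foci(facets))               # set()
```

Each leftover term means another trip around the loop: find it a facet, create a new facet, or decide it falls outside the domain you defined in Step 0.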

Denton does not discuss the analysis techniques that can get you from the entity (things) list to the facets (characteristics) list. This sort of analysis is complex and domain-dependent, and IMHO it is the most common point of breakdown for many attempts. I’m always on the lookout for material that describes these techniques, as well as “real” world examples.

Once you have a system of facets and foci, you have to decide how to arrange its pieces:

Step 4: Facet Arrangement

I find Denton’s explanation of this step not entirely clear. You may have to explain that you are now working with both the foci (terms) and the facets. The end result of this step will be a list of facets and an arrangement of the foci within each facet. You will definitely have to reiterate that the foci within each facet are arranged in whatever way best reflects the subject of that individual facet. Once again: the things inside one facet don’t have to be arranged in the same way as the things inside another facet. (Can you tell that I’ve had trouble getting this across? Database jockeys are the worst. Facets and foci do not map well onto tables, fields, and joins.)
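One way to get this across to the database crowd is to show that the ordering rule belongs to each facet, not to the system as a whole. The Python sketch below is a hypothetical illustration: the `Facet` class, the recipe facets, and their orderings are invented here. One facet follows the sequence of a meal, one is plain alphabetical, and one runs from shortest to longest cooking time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Facet:
    """A facet and its foci. Each facet carries its own ordering rule (hypothetical design)."""
    name: str
    foci: list[str]
    # Default rule: case-insensitive alphabetical. Individual facets can override it.
    sort_key: Callable[[str], object] = str.lower

    def arranged(self) -> list[str]:
        return sorted(self.foci, key=self.sort_key)


# Meals run in a fixed sequence, so the Course facet follows that sequence.
MEAL_ORDER = ["starter", "soup", "main", "dessert"]
# Cooking-time ranges make sense shortest to longest, not alphabetically.
TIME_ORDER = ["under 15 min", "15-60 min", "over 1 hour"]

course = Facet("Course", ["dessert", "main", "starter", "soup"], MEAL_ORDER.index)
ingredient = Facet("Main ingredient", ["lentils", "Chicken", "beef"])  # alphabetical default
cook_time = Facet("Cooking time", ["over 1 hour", "under 15 min", "15-60 min"], TIME_ORDER.index)

for facet in (course, ingredient, cook_time):
    print(facet.name, "->", facet.arranged())
# Course -> ['starter', 'soup', 'main', 'dessert']
# Main ingredient -> ['beef', 'Chicken', 'lentils']
# Cooking time -> ['under 15 min', '15-60 min', 'over 1 hour']
```

There is no shared sort column and no join here; the arrangement is a property of each facet, which is exactly the part that tends not to map onto tables and fields.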

Step 5: Citation Order

Citation order is of less importance in an electronic system than it was in the paper systems in use when faceted classification was first invented. That said, you may find yourself in a situation where, for technical or budgetary reasons, you can’t take full advantage of a computer-based system’s flexibility to mix and match your facets. And no matter how flexible your system is, you are going to have to decide on a default display order and behavior, so don’t skip this step entirely, but don’t allow anyone to get hung up on it either.
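In a computer-based system, citation order can shrink to a single list that sets the default, while users remain free to re-cite the facets in another order. A minimal Python sketch, assuming the same invented recipe facets; `DEFAULT_CITATION_ORDER`, `class_string`, and `filing_order` are hypothetical names.

```python
# Hypothetical default citation order: most significant facet first.
DEFAULT_CITATION_ORDER = ["Course", "Main ingredient", "Cooking method"]

recipes = [
    {"title": "Lentil soup", "Course": "soup", "Main ingredient": "lentils", "Cooking method": "simmering"},
    {"title": "Roast chicken", "Course": "main", "Main ingredient": "chicken", "Cooking method": "roasting"},
    {"title": "Braised lentils", "Course": "main", "Main ingredient": "lentils", "Cooking method": "braising"},
]


def class_string(item: dict[str, str], order: list[str], sep: str = " / ") -> str:
    """Cite an item's foci in the given facet order, skipping facets it lacks."""
    return sep.join(item[facet] for facet in order if facet in item)


def filing_order(items: list[dict[str, str]], order: list[str]) -> list[str]:
    """Default display order: group by the first-cited facet, then the second, and so on.
    For brevity, foci compare alphabetically here rather than by each facet's own arrangement."""
    ranked = sorted(items, key=lambda item: tuple(item.get(f, "") for f in order))
    return [item["title"] for item in ranked]


print(class_string(recipes[2], DEFAULT_CITATION_ORDER))  # main / lentils / braising
print(filing_order(recipes, DEFAULT_CITATION_ORDER))
# ['Roast chicken', 'Braised lentils', 'Lentil soup']

# The same item re-cited with ingredient first, as a flexible interface might allow.
print(class_string(recipes[2], ["Main ingredient", "Cooking method", "Course"]))  # lentils / braising / main
```

The default list is the decision you can’t skip; swapping in a different list is cheap, which is why nobody should get hung up on it.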

Applying the Faceted Classification

Step 6: Classification

And now we get to the whole point: applying the classification system to the stuff. And it’s about as simple as Denton makes it sound. Sometimes…

Before handing this task off to the nearest convenient, unoccupied, warm body, take a clear-eyed look at how many of your facets include terms (foci) that will require judgment calls to get things labeled properly. Who’s best qualified to make these judgment calls in a way that will serve your users?
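Before the warm body starts labeling, it can help to let software catch the mechanical errors and route the judgment calls to someone qualified. This Python sketch assumes an invented scheme and a hand-picked set of facets flagged as needing judgment; `SCHEME`, `JUDGMENT_FACETS`, and `check_assignment` are hypothetical names, and deciding which facets belong in that set is still a human decision.

```python
# Hypothetical scheme: facet -> allowed foci.
SCHEME = {
    "Course": {"starter", "soup", "main", "dessert"},
    "Main ingredient": {"beef", "chicken", "lentils"},
    "Cuisine": {"french", "indian", "fusion"},
}

# Hypothetical: facets whose foci need a judgment call (is this dish "fusion"?),
# so assignments touching them go to a reviewer with subject knowledge.
JUDGMENT_FACETS = {"Cuisine"}


def check_assignment(assignment: dict[str, str]) -> tuple[list[str], bool]:
    """Return (problems, needs_review) for one item's facet -> focus labels."""
    problems = []
    for facet, focus in assignment.items():
        if facet not in SCHEME:
            problems.append(f"unknown facet: {facet}")
        elif focus not in SCHEME[facet]:
            problems.append(f"{focus!r} is not a focus of {facet}")
    needs_review = any(facet in JUDGMENT_FACETS for facet in assignment)
    return problems, needs_review


print(check_assignment({"Course": "main", "Main ingredient": "tofu"}))
# (["'tofu' is not a focus of Main ingredient"], False)
print(check_assignment({"Course": "main", "Cuisine": "fusion"}))
# ([], True)
```

The vocabulary check is the easy part; the review flag is where the question of who is best qualified actually gets answered.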

Checking It Twice, Getting It on the Road, and Keeping It Running

Step 7: Revision, Testing, and Maintenance.

Note that you have been doing iterative testing and revision throughout the creation of the system. If you’ve gotten here without having to rethink or redo any part of your system, you are either working in a very limited, well-known domain, very lucky, or not paying attention.

Actually, only revision and testing should be included in step 7. Maintenance should be step 8. Any classification system that doesn’t include maintenance as a separate, ongoing phase is bound to suffer ROT.

The final section, How to Put the Classification on the Web, tackles the question of how to use your new faceted classification scheme to help your users navigate the stuff. I’ll be discussing Denton’s helpful suggestions in another essay.

Conclusions:

If I were handing this out ’round the table in a conference room packed with devs and coders, I’d leave out the third section, titled “How to store the faceted system in a computer”. The technical Ways, Whys, and Wherefores of storing and accessing metadata such as a faceted classification system go far beyond what is covered by Denton’s couple of pages of X(F)ML and SQL examples. It never pays to drop a shallow solution to a problem into a room full of people who are trained to take any problem laid before them and debate the best way to do it.

TQR — Introductory Tutorial on Thesaurus Construction

If thesaurus construction is something that comes up only occasionally in the course of your work, you should bookmark this tutorial created by Dr. Tim Craven of the University of Western Ontario for his LIS students.

Eight sections take you quickly through the basic concepts and considerations for building a thesaurus. It’s a handy refresher that is software agnostic.

In fact, the section headings would make a good outline for a set of questions to ask software vendors if you are considering purchasing a thesaurus management system.

Speaking of thesaurus software, Dr. Craven also has a handful of freeware programs to assist in indexing and thesaurus construction. I haven’t checked them out yet and so can’t offer an opinion.

TQR- Berry Picking Time (with apologies to both Ms. Bates and Great Big Sea)

Once in a while it is a good and refreshing thing to revisit some of the classics. In this case, a paper that I consider to be a primary lens for looking at information-seeking behaviours.

Something struck me as I was rereading Marcia Bates’ “The Design of Browsing and Berrypicking Techniques for the On-Line Search Interface” (published in 1989, a time when online searching was awkward, expensive, and the preserve of academics and scientists; whether the situation has actually improved is an argument for another day).

The berrypicking (or evolving search) model that she describes is now a widely used shorthand for a set of user behaviors. Unfortunately, as with many abbreviated terms, we forget the full complexity of the ideas that the shorthand represents.

Five of the six specific information-chasing strategies that she describes academic searchers using are used every day by bloggers and blog readers. Blogs have evolved tools for their own versions of:

  • Footnote Chasing (also known as backward chaining): No need to write that citation down and go to the library to look up the cited material; just click on the link in the blog post and get an immediate look at it.
  • Citation Chasing (forward chaining): Most non-academics never learn to use a citation index, but it’s one of the best ways to move your search for information forward through time. Now, with trackbacks, everyone can do citation chasing without even knowing that they are engaging in one of the rituals of graduate school. Also have a look at Technorati’s blog reactions for links to blog posts that refer to another post.
  • Journal Run: Instead of sitting on the floor of the periodicals stacks running your finger down the table of contents of each issue of the Journal of Cat-like Things for the last two years, just click on the handy archive links in the left- (or right-) hand navigation pane of the blog.
  • Author Searching: Most blog writers who publish in more than one place add links to their other blogs or guest-writing spots in their “home” blogs.
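Footnote chasing and citation chasing are the two directions of travel over the same link graph: follow a post’s outgoing links backward, or invert the graph to see who links to it, which is roughly what trackbacks surface. A minimal Python sketch with made-up post IDs; `links`, `cites`, `cited_by`, and `chase` are hypothetical names, not any blog platform’s API.

```python
from collections import defaultdict
from typing import Callable

Graph = dict[str, list[str]]

# Hypothetical link graph: each post maps to the posts it links to (its "footnotes").
links: Graph = {
    "post-a": [],
    "post-b": ["post-a"],
    "post-c": ["post-a", "post-b"],
    "post-d": ["post-c"],
    "post-e": ["post-c", "post-a"],
}


def cites(post: str, graph: Graph) -> list[str]:
    """Footnote chasing (backward chaining): the posts this one links to."""
    return graph.get(post, [])


def cited_by(post: str, graph: Graph) -> list[str]:
    """Citation chasing (forward chaining): the posts that link to this one."""
    index: dict[str, list[str]] = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            index[target].append(source)
    return sorted(index.get(post, []))


def chase(start: str, step: Callable[[str, Graph], list[str]], graph: Graph, max_hops: int) -> set[str]:
    """Follow `step` outward for up to max_hops hops; return every post reached."""
    seen, frontier = {start}, {start}
    for _ in range(max_hops):
        frontier = {nxt for post in frontier for nxt in step(post, graph)} - seen
        seen |= frontier
    return seen - {start}


print(sorted(chase("post-d", cites, links, max_hops=2)))     # ['post-a', 'post-b', 'post-c']
print(sorted(chase("post-a", cited_by, links, max_hops=1)))  # ['post-b', 'post-c', 'post-e']
```

A real berrypicker would switch direction, or strategy, at every hop depending on what turned up; the fixed `step` here is a simplification.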

The fifth technique is a little harder to place in the blog world. At least I thought it was, until I spent some time looking at a handful of blogs trying to find good examples of the first four.

  • Area Scanning: the habit of looking at the adjoining shelves. Once you have found Audubon’s Birds of North America (DDC 598 AUD), you will find Kale’s Florida’s Birds (DDC 598.2975 KAL) as well as Garrido’s Field Guide to the Birds of Cuba (DDC 598.097291 GAR) on nearby shelves. Handy if you’re looking for information on birds you might see in the Florida Keys. The blog equivalent is looking at the blog rolls. Perhaps not as tidy as the library shelf model, but nonetheless, titles co-located by being placed on the same list are likely to have useful relationships to one another. (This blog is the sad counterexample; my blog roll is exactly a list of things that are not related to the primary topic of my essays.)
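Area scanning works on the shelf because call numbers impose a linear order, so “nearby” is just a window around your find. The Python sketch below uses the three call numbers from the bird example; treating the Dewey class number as a decimal and then breaking ties on the author letters is a simplification of real shelving rules, and `shelf_key` and `area_scan` are hypothetical helper names.

```python
from decimal import Decimal

# (DDC class number, author letters, title) for the three books in the example.
shelf = [
    ("598.2975", "KAL", "Kale, Florida's Birds"),
    ("598", "AUD", "Audubon, Birds of North America"),
    ("598.097291", "GAR", "Garrido, Field Guide to the Birds of Cuba"),
]


def shelf_key(entry: tuple[str, str, str]) -> tuple[Decimal, str]:
    """Simplified shelving rule: class number as a decimal, then author letters."""
    class_number, author_letters, _title = entry
    return Decimal(class_number), author_letters


def area_scan(shelf: list[tuple[str, str, str]], found: str, window: int = 1) -> list[str]:
    """Titles within `window` positions of the one you found, in shelf order."""
    titles = [title for _, _, title in sorted(shelf, key=shelf_key)]
    i = titles.index(found)
    return titles[max(0, i - window): i + window + 1]


print(area_scan(shelf, "Garrido, Field Guide to the Birds of Cuba"))
# ['Audubon, Birds of North America', 'Garrido, Field Guide to the Birds of Cuba', "Kale, Florida's Birds"]
```

A blog roll has no such ordering, so its “window” is simply membership on the same list.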

For the next couple of days I’ll be more aware of which search habits I might be dragging from the paper-based past into the digital present, and thinking about whether or not they are still useful and, if useful, whether they are well provided for.

Semantic Zooming, Oh, I Thought You Said Semantic Zoning.

Just the usual reading-too-quickly this a.m., and I got semantic zoning instead of semantic zooming.

On reflection, semantic zoning may be a more useful concept. Think of semantic zoning like county zoning. You know, urban planning. Perhaps (?) our classification schemes need to be a bit more like a city. Some bits of a city we zone and design for easy, certain, and sure access, like a central core. Some we let ramble a bit, like the residential neighborhoods. Some areas we intentionally push toward chaos and surprise, like parks and gardens. How we zone and how rigidly we grid an area depends on what people are likely to be doing or seeking in each area. The courthouse and hospital should be immediately and distinctly findable. The bench under the willow in the arboretum, not so much.