Video: What libraries and search engines have in common

Type a few words, get back a ton of results, ranked by relevance -- how, exactly, does that work?


As the original huge data depots, libraries had to solve the problem of locating the data an individual wants within that depot. Thus came indexes and catalogs to tell you where to look. (Thank you, Dewey, for your decimal system!)

When we started searching with computers, we mimicked what was being done in libraries: Create an index of all the stuff, then refer to the index to find out where the thing you're looking for is. Search engines scale this process up further, this time for the Web, but a lot goes into making a search engine functional or useful. In the video below, Max L. Wilson, Assistant Professor in Human-Computer Interaction and Information Seeking at the University of Nottingham, gives a rundown of how search engines work and how they've evolved.

Term frequency, meaning how often a search term appears in a document, is a very rough sorting method. Then comes inverse document frequency, which lessens the impact of commonly used words. Scoring every document for every query is not fast, though -- and that's where indexes come in, as it's faster to search an index than it is to search all the documents. Then come techniques like stop words (ignoring very common words such as "the") and stemming (reducing words to a common root), which help surface the most relevant results for a given search term. On and on it goes.
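To make those pieces concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under stated assumptions, not the method shown in the video: the function names (`stem`, `tokenize`, `build_index`, `search`), the tiny stop-word list, and the crude suffix-stripping stemmer are all hypothetical, and the scoring is the plain textbook form of term frequency times the log of inverse document frequency.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def stem(word):
    """Crude suffix stripping; real stemmers such as Porter's are far more careful."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, strip basic punctuation, drop stop words, then stem."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [stem(w) for w in words if w and w not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(query, index, n_docs):
    """Rank documents by summed tf * idf, consulting only the index."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms found in fewer documents get a larger weight;
        # a term found in every document gets a weight of zero.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `build_index` once on a dictionary of documents and then `search("some query", index, len(docs))` returns `(doc_id, score)` pairs, best match first. The query only ever touches the posting lists for its own terms, which is why looking things up in the index beats rescanning every document.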

We take it for granted that we can type a few words into a field and get back a bunch of relevant info, but getting those algorithms right is no small task -- especially as the sheer volume of data continues to grow by leaps and bounds.
