Here Comes Santa Claus(e)

Digital legal research is amazing, but the hype – particularly around AI – is a bit like Christmas magic: it involves a lot of wishing.

[Image: Santa's boots coming down a chimney]

As the Great Library’s staff heads off to spend the holidays with family and friends and the doors close for the year, I can sneak in this guest post unobserved, like a night-time intruder in a chimney flue. As visions of sugar plums dance in my head, I can’t help but wonder whether the AI in legal publishers’ research databases dreams of Santa, or whether it’s not quite clear on the concept. Sometimes Santa’s little legal research helpers need a bit of guidance.

I don’t really believe legal publishers are using artificial intelligence in anything but marketing materials yet.  Maybe I’ll find a steam train waiting outside my house this Christmas Eve, and it will take me to a place where I can see the AI at work, putting the final touches on its machine learning before the reindeer fly into the sky. #BELIEVE

It’s interesting to see how legal publishers’ search engines tweak search terms in an attempt to provide that certain Christmas magic a legal researcher needs. Sometimes it feels as if the engine isn’t SURE what I’m looking for, like a Christmas gone awry.

I suppose a “Santa class” would be a hard class action to join.  I mean, there is only one Santa Claus, isn’t there?

In Canada, where so many cultures that celebrate Christmas come together, we don’t always use the same term for the same thing.  Are you looking for Santa Claus? Father Christmas? Père Noël?  It can be important to know when you’ve got a term that has more than one option.  My favorite example has always been (marijuana OR marihuana) but I’m sure that’s not what ol’ Kris Kringle has in his pipe!
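
For the technically curious, the (marijuana OR marihuana) habit can be automated: keep a list of variant terms and swap each one for an OR group before the search runs. Here’s a minimal Python sketch, assuming a small hand-built synonym table; SYNONYMS and expand() are hypothetical names, not any publisher’s feature.

```python
# Hypothetical, hand-built synonym table (not any publisher's thesaurus).
SYNONYMS = {
    "santa claus": ["santa claus", "father christmas", "pere noel"],
    "marijuana": ["marijuana", "marihuana"],
}

def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term

def expand(term: str) -> str:
    """Return an OR group covering every known variant of term, or the term alone."""
    variants = SYNONYMS.get(term.lower(), [term])
    if len(variants) == 1:
        return quote(variants[0])
    return "(" + " OR ".join(quote(v) for v in variants) + ")"

print(expand("Santa Claus"))  # ("santa claus" OR "father christmas" OR "pere noel")
print(expand("marijuana"))    # (marijuana OR marihuana)
print(expand("reindeer"))     # reindeer
```

The table is the hard part: someone still has to know that Père Noël and Father Christmas belong together.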

I was curious, so I ran Santa through each database and, unsurprisingly to a law librarian, the results weren’t exactly the same. (Okay, the disparity in the French results was a bit surprising.) I was glad that the search “alias Santa Claus” didn’t bring up any bad rap sheets. If I’d spent a bit more time investigating which cases were most heavily cited or preferred, I’m sure I would have seen the same top results in each database. But this is just a look by the numbers.

Search | Database 1 | Database 2 | Database 3
“Santa Claus” | 166 | 240 | 239
“Father Christmas” | 24 | 25 | 25
“Pere Noel” | 209 | 169 | 111
All 3 (connected by OR) | 396 | 414 | 347
Δ between the individual terms and all 3 together (should be zero?)

You may be wondering whether Kris Kringle is worth a try as a search term. Not unless you’re looking for cases about office-party Secret Santa exchanges. I hadn’t heard of Kris Kringle gifts, but apparently that’s a thing. Also, if you’re considering handing out stun guns as Kris Kringle gifts, you may want to think again.

What I can’t figure out is why the combined search (all three terms connected by OR) retrieves MORE or FEWER results than the individual searches added together. At least it was inconsistent across platforms. I even reran the searches (here’s a LexisNexis Advance example) to try to dedupe the results:

"santa claus" % "father christmas"
"santa claus" % "pere noel"
"father christmas" % "pere noel"
&c.
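
Part of the puzzle may be plain set arithmetic. An OR search returns the union of the individual result sets, so it matches the sum only when no case mentions more than one of the terms; a case that mentions two gets counted twice when you add the searches up. (An OR count larger than the sum would suggest the engine is doing something beyond simple Boolean matching.) The % (but not) searches above are set differences, and chaining them rebuilds the union without double counting. A minimal Python sketch, using made-up case IDs rather than the real results:

```python
# Made-up case IDs standing in for the three result sets (not the real results).
santa = {"c1", "c2", "c3", "c4", "c5"}
father = {"c4", "c6"}
pere = {"c5", "c7", "c8"}

# Adding the three searches together counts c4 and c5 twice.
total = len(santa) + len(father) + len(pere)

# An OR search returns the union, where each case appears once.
union = len(santa | father | pere)

# Inclusion-exclusion: the gap is the pairwise overlaps minus the three-way overlap.
gap = (len(santa & father) + len(santa & pere) + len(father & pere)
       - len(santa & father & pere))

# NOT-style dedupe: A, then B but not A, then C but not A or B.
deduped = len(santa) + len(father - santa) + len(pere - santa - father)

print(total, union, gap, deduped)  # 10 8 2 8
```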

It was a bit more difficult on WestlawNext Canada because the search engine kept interrupting.  It was like an elf that wants to be a dentist; just give me results that respond to the search I gave you.  Finally, I had to explicitly tell it to leave out cases that had to do with parental visitation over the holidays:

advanced: ("pere noel" "father christmas" "santa claus") % ("father's Christmas")
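
Why would a search for “father christmas” pull in “father’s Christmas” at all? One plausible explanation, and it is only an assumption about how these engines work, is that indexing folds accents and possessives, so Père Noël, pere noel and father’s all normalize to plain words. The toy Python model below (tokenize(), has_phrase() and search() are hypothetical) shows the effect, and also a catch: once possessives are folded, excluding “father’s christmas” knocks out the genuine Father Christmas case too.

```python
import re
import unicodedata

def tokenize(text: str) -> list[str]:
    """Toy normalizer: fold accents, lowercase, drop possessive 's, split into words."""
    # NFKD splits "è" into "e" plus a combining accent, which is then discarded.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = folded.lower().replace("\u2019", "'")  # curly apostrophe -> straight
    folded = re.sub(r"'s\b", "", folded)             # "father's" -> "father"
    return re.findall(r"[a-z0-9]+", folded)

def has_phrase(doc: str, phrase: str) -> bool:
    """True if the phrase's tokens appear consecutively in the document's tokens."""
    d, p = tokenize(doc), tokenize(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))

def search(docs: list[str], any_of: list[str], but_not: list[str]) -> list[str]:
    """Keep docs matching at least one phrase in any_of and none in but_not."""
    return [
        doc for doc in docs
        if any(has_phrase(doc, p) for p in any_of)
        and not any(has_phrase(doc, p) for p in but_not)
    ]

docs = [
    "The accused, dressed as Père Noël, entered the mall.",
    "Access was granted for the father\u2019s Christmas with the children.",
    "A Santa Claus costume was seized.",
    "He appeared as Father Christmas at the office party.",
]
any_of = ["pere noel", "father christmas", "santa claus"]

print(len(search(docs, any_of, [])))                      # 4, including the access case
print(len(search(docs, any_of, ["father's christmas"])))  # 2, real Father Christmas gone too
```

Whether a real engine folds possessives this way is worth checking before leaning on an exclusion like this one.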

I feel like artificial intelligence should be able to know when terms mean the same thing.

Sometimes we look up the chimney to see whether that elf really is listening to whether we’re good or bad, and we don’t find the magic. For example, a search for “patriot act” across US materials doesn’t retrieve the USA Patriot Act. (It was faster to find it using Google and Wikipedia.) A new legal researcher could spend an hour looking through federal materials without realizing that the Act wasn’t in their result set.

A similar example of this intelligence appeared when I looked for cases involving miscreants entering homes and getting stuck in chimneys. Anyone entering that on LexisNexis Advance is going to end up with st!ck results. It would have been nice to be asked whether I meant stuck or stick. The ubiquity of “did you mean?” prompts reflects a certain humility in machine learning: the system might be mistaken. Omitting that check suggests the system thinks it knows and doesn’t need to ask anymore.
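
Asking isn’t hard to sketch. This toy Python version finds index words one letter away from the search term and adds each one only if the user says yes. VOCABULARY, one_letter_variants() and expand_query() are hypothetical, and this is not a description of how LexisNexis Advance handles it.

```python
from typing import Callable

# Hypothetical slice of an index vocabulary; a real engine would use its own index.
VOCABULARY = ["stack", "stick", "stock", "struck", "stuck", "chimney"]

def one_letter_variants(word: str, vocabulary: list[str]) -> list[str]:
    """Index words of the same length that differ from word in exactly one letter."""
    return [
        w for w in vocabulary
        if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1
    ]

def expand_query(word: str, vocabulary: list[str],
                 confirm: Callable[[str], bool]) -> list[str]:
    """Add a look-alike term only if the user confirms it, instead of merging silently."""
    terms = [word]
    for candidate in one_letter_variants(word, vocabulary):
        if confirm(f"Did you also mean {candidate!r}?"):
            terms.append(candidate)
    return terms

# Stand-in for a user who answers yes to "stick" and no to everything else.
terms = expand_query("stuck", VOCABULARY, confirm=lambda q: "stick" in q)
print(terms)  # ['stuck', 'stick']
```

In a real interface, confirm could be something like lambda q: input(q + " [y/n] ").lower().startswith("y").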

This algorithm is at risk of going on the naughty list.

The real intelligence in legal research is the person doing the work. And while the technology powering the tools we use – whether paper or silicon – is amazing, the impact of artificial intelligence or machine learning on the researcher appears, for the moment, to be mostly hype. Perhaps we should simply embrace the fact that natural language searching is much better than it used to be, which is something tangible that researchers can get behind.

I was comforted to realize that, while we may not be driving our own cars in the near future, legal research faces many of the same challenges it always has: knowing where all the chimneys are, and making sure every child gets a gift.