lawtomated

Supervised Learning vs Unsupervised Learning. Which is better?

by info@lawtomated.com April 8, 2019

A.I. systems, including legal ones, typically use a form of artificial intelligence known as machine learning (sometimes alongside rules and search). For the machine learning elements, a distinction is drawn between supervised learning and unsupervised learning.

We’ll explain:

  • what each of these means;
  • how they work, plus an example of each in a legal context;
  • when to use each, and which of supervised learning vs unsupervised learning is better; and
  • the “out of the box = unsupervised learning” misconception.

Supervised Learning

learning from labels

Supervised learning requires labelled data. That data is typically labelled by a domain expert, i.e. someone who is expert at identifying what labels go with what data. In the legal context, this will be a lawyer or legally trained individual.

In the consumer space, this is often you! For instance, Facebook is great at automatically tagging your friends in photos.

Why is that? It is because of the historical training you provided – and continue to provide – when manually tagging photos of your friends. Over time, with more examples of your friends in different conditions (lighting, angles and obscuring detail), Facebook’s algorithms learn how to tag photo A as “Arnold” and photo B as “Linda”.

Legal A.I. systems identifying and extracting clauses (or intra-clause data, e.g. a financial number such as rent amount) also achieve this via supervised learning.

For example, a legal A.I. due diligence tool may extract governing law from share purchase agreements (SPAs). To do so, either the vendor or the user provides the system with labelled examples of governing law clauses.

This process is known as training. During training, a supervised machine learning algorithm uses the labelled examples to generate a predictive model.

A predictive model is a mathematical formula able to map a given input to the desired output, in this case, its predicted classification, i.e. the correct governing law. The model is predictive because it relies on statistical and probabilistic techniques to predict the correct governing law based on historical data.

A basic workflow describing the above process for the governing law example is shown below:

The above generates a predictive model mathematically optimised to predict whether a given combination of words is more or less likely to belong to a particular label.

In machine learning terms this type of supervised learning is known as classification, because we are building a system to classify something into one of two or more classes (here, governing laws).
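To make the training and prediction steps concrete, here is a minimal sketch of a classifier in Python. It assumes a simple word-counting technique (Naive Bayes), which is our choice for illustration and not necessarily what any legal A.I. product uses. The example clauses, labels and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled examples: (clause text, governing law label).
TRAINING_DATA = [
    ("This Agreement shall be governed by the laws of England and Wales", "England and Wales"),
    ("This Agreement is governed by and construed in accordance with English law", "England and Wales"),
    ("This Agreement shall be governed by the laws of the State of New York", "New York"),
    ("This Agreement is governed by the laws of the State of New York without regard to conflict of laws", "New York"),
]


def tokenize(text):
    """Lower-case the text and split it into words."""
    return text.lower().split()


def train(examples):
    """Training step: count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest (log) probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Prior: how common the label was in the training data.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out the score.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(predict(model, "This Deed is governed by the laws of New York"))  # prints: New York
```

The model here is nothing more than word counts and arithmetic, yet it labels a clause it has never seen. A real system would need far more examples and a more careful treatment of text than splitting on spaces.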

Accurate though it might become, the model never understands either the labels or what it is labelling. As we always like to stress at lawtomated, machine learning is maths not minds.

If you are interested in digging deeper, check out our forthcoming guide to training, testing and cross-validation of machine learning systems. Each is a fundamental concept in any machine learning system, albeit usually abstracted away from, or unavailable to, users via the UI of legal A.I. systems.

Unsupervised Learning

Pattern spotting

Unlike supervised learning, unsupervised learning does not require labelled data. This is because unsupervised learning techniques serve a different purpose: they are designed to identify patterns inherent in the structure of the data.

A typical non-legal use case is a technique called clustering. This is used to segment customers into groups by distinct characteristics (e.g. age group) in order to better target marketing campaigns, improve product recommendations or prevent churn.

A common legal use case for this technique is diagrammed below in the case of A.I. powered contract due diligence:

As the above illustrates we start with a disorganised bag of governing law clauses. An unsupervised technique such as clustering can be used to identify statistical patterns inherent in the data, clustering similar governing law clause formulations together but separate from dissimilar items.

In this example, the data scientist – or in some cases the end user to the extent such controls are exposed via a UI – can adjust the similarity threshold, typically a value between 0 and 1.

If set to 1 the algorithm will cluster together only identical items, i.e. identifying duplicates. This turns data – random clauses – into information we can use, i.e. we now understand the dataset contains duplicate data, which in turn may be a valuable insight.

If set to 0 the algorithm will cluster apart items that are entirely distinct from one another.

A setting between 0 and 1 will cluster data into varying cluster sizes and groupings. For example, a setting of 0.8 would cluster together clauses that are at least 80% similar. Users might use this to detect near duplicates, i.e. documents that are virtually but not entirely identical.
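The effect of the similarity threshold can be sketched in a few lines of Python. This is an illustration under our own assumptions, not how any particular product works: similarity is measured as the share of distinct words two clauses have in common (Jaccard similarity), and a clause joins the first cluster whose first member is similar enough. The clauses and function names are hypothetical.

```python
def jaccard_similarity(a, b):
    """Share of distinct words two clauses have in common (0 = none, 1 = all)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def cluster(clauses, threshold):
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters = []
    for clause in clauses:
        for group in clusters:
            if jaccard_similarity(clause, group[0]) >= threshold:
                group.append(clause)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([clause])
    return clusters


# Hypothetical, unlabelled governing law clauses.
CLAUSES = [
    "This Agreement is governed by English law",
    "This Agreement is governed by English law",
    "This Deed is governed by English law",
    "This Agreement is governed by the laws of the State of New York",
]

for threshold in (1.0, 0.7):
    sizes = [len(group) for group in cluster(CLAUSES, threshold)]
    print(threshold, sizes)
```

At a threshold of 1.0 only the two identical clauses are grouped (cluster sizes 2, 1, 1). At 0.7 the "Deed" variant, which scores 0.75 against the first clause on this measure, joins them as a near duplicate (cluster sizes 3, 1). No labels were supplied at any point.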

Which is better: supervised or unsupervised?

(Hint: You’re asking the wrong question)

Here’s a helpful analogy for the supervised learning vs unsupervised learning question.

Ask yourself: which is better, screwdriver or hammer?

The answer is neither. They serve related but different purposes, although they sometimes work hand in hand (literally) to achieve a bigger outcome, e.g. a set of shelves.

In the same way, when people ask “Which is better, supervised or unsupervised learning?”, the answer is neither, although they are often combined to achieve an end result.

For example, unsupervised learning is sometimes used to automatically preprocess data into logical groupings based on the distribution of the data, such as in the clause clustering example above. This might result in groupings based on the type of paperwork used for a contract type, e.g. all the contracts stemming from template A may fall into one cluster, with those stemming from other templates falling into separate clusters. This turns data into useful information to the extent it was not previously known, nor immediately identifiable, by a human reviewer.

This may, in turn, assist human domain experts with their dataset labelling, e.g. by identifying which documents will most likely contain representative examples of the data points they wish to label at a more granular level and those which won’t. The subsequent labelling will then feed into a supervised learning algorithm that produces the final result, e.g. a due diligence report summary of red flag clauses in an M&A data room.
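The hand-off from unsupervised to supervised learning described above can be sketched as follows. This is a simplified illustration under our own assumptions: clauses are grouped by word overlap, one representative per group is queued for expert labelling (largest groups first), and the expert's labels, simulated here by a hypothetical dictionary, become the training data for a supervised model. All names and example clauses are hypothetical.

```python
def word_overlap(a, b):
    """Share of distinct words two texts have in common (0 = none, 1 = all)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def group_similar(documents, threshold):
    """Unsupervised step: group documents whose wording is similar enough."""
    groups = []
    for doc in documents:
        for group in groups:
            if word_overlap(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:
            groups.append([doc])
    return groups


def labelling_queue(documents, threshold=0.5):
    """One representative per group, largest groups first, with the group size."""
    groups = sorted(group_similar(documents, threshold), key=len, reverse=True)
    return [(group[0], len(group)) for group in groups]


UNLABELLED = [
    "This Agreement is governed by English law",
    "This Deed is governed by English law",
    "This Lease is governed by English law",
    "Either party may terminate this Agreement on thirty days notice",
    "Either party may terminate this Lease on thirty days notice",
]

queue = labelling_queue(UNLABELLED)

# Supervised step (simulated): a domain expert labels only the representatives.
# EXPERT_LABELS is hypothetical input standing in for that human review.
EXPERT_LABELS = {
    "This Agreement is governed by English law": "governing law",
    "Either party may terminate this Agreement on thirty days notice": "termination",
}
training_data = [(text, EXPERT_LABELS[text]) for text, _ in queue]
print(training_data)
```

Here the expert reviews two representative clauses instead of five, and the resulting `training_data` is in the shape a supervised learning algorithm expects: examples paired with labels.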

To recap: unsupervised learning (left) finds logical groupings; supervised learning (right) identifies a boundary between 2 classes.

Out of the box vs. Unsupervised Learning

good vendors distinguish, bad vendors disguise

Any legal team buying an A.I. system will want to know which is best for them. Vendors in the crowded A.I. contract due diligence space typically provide one or both of two features:

  1. OOTB Extractors: these are product features pre-trained by the vendor to identify and extract popular contract provisions, e.g. governing law, termination, indemnity etc.
  2. Self-trained extractors: these are product features that the user can train to generate a user-specific predictive model for a contract provision of their choosing and design.

In either case, someone has to train the system with labelled data. This is because both techniques are supervised learning techniques of the sort described above.

Unfortunately, some vendors deliberately or by omission lead people (media, buyers and users) to believe that because something comes ready and working “out of the box” (aka “OOTB”) this means it uses unsupervised learning.

This is patently false: it will have been trained by the vendor if it is performing a classification task such as extracting clauses from contracts.

By extension, vendors who conflate OOTB Extractors with unsupervised learning usually intend to suggest their solution is superior to products without such features, because it “requires no training” or, worse, because the system “just learns by itself”. Again, this is inaccurate and misleading.

OOTB Extractors vs. Self-trained Extractors

Another bake-off!

Flowing from the above, and as with the earlier point about which of supervised vs. unsupervised learning is better, the same applies to the question of OOTB Extractors vs. Self-trained Extractors: neither is objectively better.

Recall both are supervised learning techniques. The differences however are these:

Who

  • OOTB Extractors: vendor trained.
  • Self-trained Extractors: user trained.

What

  • OOTB Extractors: public data, e.g. filings at the SEC, Companies House etc.
  • Self-trained Extractors: the user’s data, e.g. from a document management system (“DMS”), but also public data to the extent users curate a dataset from public sources.

How

  • OOTB Extractors: good vendors actively disclose this in some detail. It usually involves a senior lawyer deciding on the initial labelling methodology and examples, which is then replicated by junior lawyers / law students across a wider dataset. This is then iterated alongside the vendor’s technical team to gradually improve performance. Bad vendors will not disclose this process in any detail. Understanding this is vital as a buyer: you are trusting someone else, their methodology and their dataset. Are you sure the who, what and how meet your needs and quality controls?
  • Self-trained Extractors: depends on the application and the user’s own methodology. In theory, it should mirror what the vendor has done (to the extent the vendor has invested effort to train and sell its own provisions). Unfortunately, many self-training features lack a clear workflow, are poorly explained, or are not robust enough to manage in the way knowledge lawyers expect to create and curate knowledge (e.g. versioning, access and editing permissions etc).

Pros

  • OOTB Extractors: ready to use out of the box.
  • Self-trained Extractors: tailored to the user’s needs, not the market’s.

Cons

  • OOTB Extractors: trained on public data, which may be biased toward certain languages, jurisdictions and / or document types. For instance, many vendors use data sources from the SEC filing system in the USA and UK Companies House, both of which bias toward English language documents with a UK or US centric focus and, with regard to the SEC, only certain types of companies and documents. Vendor-side trainers may be less experienced, and their training methodology less rigorous, than the user’s, either in general or with regard to the specific domain challenges of a user’s practice area or business need. Often these provisions are locked, in the sense that the user cannot “top up” the provision with additional user training, either to improve its accuracy in general or to tailor it toward a specific variation on a data point, e.g. tuning an OOTB Extractor trained on lease assignment clauses to work with leveraged finance agreement assignment clauses (which are quite different!).
  • Self-trained Extractors: requires training twice over: users must first be trained in how to train the system, and the trained users must then train the system itself. Self-training features are usually underdeveloped and do not provide the typical controls a knowledge management lawyer might expect to find, e.g. version control, access / read / write permissions for model curation and sharing purposes. Nor do such tools provide a UI with the degree of finesse available to a data scientist for tuning the test, training and cross-validation datasets. This can be a blocker to uptake and adoption of the product and / or the self-training features, because it’s hard to properly tune models, let alone version them to protect their integrity.

Conclusion

Hopefully, you’ve learnt:

  • What supervised learning is.
  • What unsupervised learning is.
  • How each of the above works (at a high level).
  • A basic use case example of supervised learning vs unsupervised learning.
  • The key difference for most legal use cases: that supervised learning requires labelled data to predict labels for new data objects whereas unsupervised learning does not require labels and instead mathematically infers groupings.
  • That neither supervised learning nor unsupervised learning is objectively better; each serves different purposes, although they can be (and often are) used in combination to achieve a larger goal.
  • That unsupervised learning and OOTB pre-trained extractors are not the same, that the latter is, in fact, supervised learning (albeit trained by the vendor) and doesn’t simply “learn by itself”!
  • The who, what, how, pros and cons of OOTB pre-trained extractors vs. self-trained extractors.

If you want to learn more about artificial intelligence, check out this article. If you’d like to understand the differences between machine learning and deep learning, head over here.

@2020 - All Rights Reserved Lawtomated