
Quality of Classification
What to achieve ?
• Optimum:
All documents pertaining to a specific technical area
(concept) are found by classification search

Recall = (# retrieved relevant documents) / (# existing relevant documents) = 1

For concepts defined in IPC:
Priority 1: documents have all appropriate symbols

• < > Efficiency:
Priority 2: documents have no inappropriate symbols
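
The recall target above can be sketched in Python. The slide does not name a measure for the efficiency side, so reading Priority 2 as precision is an assumption, and the document IDs are hypothetical.

```python
def recall(retrieved: set, relevant: set) -> float:
    """Share of the existing relevant documents that the search retrieved."""
    if not relevant:
        raise ValueError("no relevant documents defined")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set, relevant: set) -> float:
    """Share of the retrieved documents that are relevant (assumed efficiency measure)."""
    if not retrieved:
        raise ValueError("nothing retrieved")
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical search result: D4 lacks an appropriate symbol (hurts recall),
# D5 carries an inappropriate symbol (hurts precision).
relevant = {"D1", "D2", "D3", "D4"}
retrieved = {"D1", "D2", "D3", "D5"}

print(f"recall    = {recall(retrieved, relevant):.2f}")
print(f"precision = {precision(retrieved, relevant):.2f}")
```

A missing symbol lowers recall, an inappropriate symbol lowers precision, which is one way to read the ordering of Priority 1 before Priority 2.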
Phenomenology of quality issues
• document is unclassified
• has wrong / inappropriate classification
• has outdated / invalid classification
• non-exhaustive / incomplete classification
> appropriate symbols are missing
> given symbols are not specific enough
• varying classifications of family members
• excessive classification
Different aspects
• individual document / publication
- classification by publishing IPO
- and by other IPOs, e.g. EPO > ECLA
DPMA > "ICP"
JPO, … ?
> examiners create their own search files
• different publication levels:
- unexamined (unsearched) applications
- granted patents
• families: in MCD reclassification at family level
• data in different databases
Unclassified documents
Published before 1.1.2006:
many documents in MCD still unclassified / not reclassified:
92% of all documents in MCD*
87% of all documents of EPO members

Published after 1.1.2006:
97% of all documents in MCD
91% of all WO

each week 6 - 8% of WO publications are not classified at all

*cf. IPC/CE/40/4
Unclassified WO documents

[Figure: percentage (left axis, 0.0% to 12.0%) and number (right axis, 0 to 400) of unclassified WO documents per publication week, 06.07.06 to 17.01.08]
Unclassified WO documents

Publication week 50 (13.12.2007): 260 of 3272 (7.9%)

ISA              Receiving Office
EP  218 (84%)    US  177
KR   27 (10%)    IB   31
AU    5          EP   26
US    5          GB    9
RU    2          KR    3
SE    2          DE    2
CA    1          FR    2
                 IL    2
                 :
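
The percentages on this slide can be checked with a few lines of Python. The counts are taken from the slide; the helper name `share` is only illustrative.

```python
def share(part: int, whole: int) -> float:
    """Fraction of `whole` accounted for by `part`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


total_wo = 3272      # WO publications in publication week 50 (13.12.2007)
unclassified = 260   # of which without any classification

# Unclassified WO documents by International Searching Authority (from the slide)
isa_counts = {"EP": 218, "KR": 27, "AU": 5, "US": 5, "RU": 2, "SE": 2, "CA": 1}

print(f"unclassified: {share(unclassified, total_wo):.1%}")
for office, count in isa_counts.items():
    print(f"{office}: {count} ({share(count, unclassified):.0%})")
```

The ISA counts add up to the 260 unclassified documents, which is why the last row of the listing (IL) is read as a Receiving Office entry.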
Lesson : There are still many documents without any valid classification
> Top priority: All documents should have at least one valid classification
courtesy of M. Meier (Audi)

Wrong classification

A61N 1/00 Electrotherapy; Circuits therefor
courtesy of M. Meier (Audi)

Wrong classification

B60K Arrangement or mounting of propulsion units or of transmissions in vehicles

Lesson : Completely wrong classifications do occur


Wrong classification
Example: WO2007126503
ISR: G01L 19/02
Espacenet: G10L 19/02

Lesson : Typos may occur; flaws of concordance tables
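
One way such typos could be flagged automatically is to compare the symbols that two sources give for the same document and look for swapped neighbouring characters. This is a minimal sketch, not a description of any existing tool; the function names are hypothetical.

```python
def normalize(symbol: str) -> str:
    """Remove whitespace and upper-case a classification symbol."""
    return "".join(symbol.split()).upper()


def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if the two symbols differ only by two swapped neighbouring characters."""
    a, b = normalize(a), normalize(b)
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


# Symbols given for WO2007126503 by the ISR and by Espacenet
print(is_adjacent_transposition("G01L 19/02", "G10L 19/02"))
```

A match is only a hint that one of the two sources contains a typing error; which symbol is correct still has to be decided by reading the document.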

Wrong classifications:
• difficult to investigate because difficult to find
• feedback by users needed
Outdated / invalid classification
Business methods: G06F 17/60 → G06Q [2006.01]
in Espacenet: 0 WO docs with a:G06F17/60
in Patentscope: 1506 WO docs with G06F17/60
- e.g. WO2007004271 reclassified in Espacenet only to ECLA

Lesson : Classification data may be different in different databases

in Espacenet: much non-PCT minimum documentation is not reclassified
- e.g. CZ, UY, NZ, AR
not all PCT minimum documentation is reclassified
- e.g. only 678 of 14543 KR docs reclassified in ECLA/IPC

Lesson : Reclassification following revision is still incomplete
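
A database could be screened for outdated symbols with a lookup table of superseded entries. The sketch below assumes a hand-made table that contains only the business-methods example from this slide; the names `SUPERSEDED` and `find_outdated` are hypothetical.

```python
# Assumption: a hand-maintained table of symbols superseded by IPC 2006.01.
# Only the example from the slide is included here.
SUPERSEDED = {"G06F17/60": "G06Q"}


def normalize(symbol: str) -> str:
    """Remove whitespace and upper-case a classification symbol."""
    return "".join(symbol.split()).upper()


def find_outdated(symbols):
    """Map each outdated symbol of a document to the area that replaced it."""
    return {
        s: SUPERSEDED[normalize(s)]
        for s in symbols
        if normalize(s) in SUPERSEDED
    }


# Hypothetical document record with one outdated and one valid symbol
print(find_outdated(["G06F 17/60", "G06Q 10/00"]))
```

The table only points to the target area; picking the right group within G06Q still requires intellectual reclassification.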


Outdated / invalid classification
Traditional medicine: A61K 35/78 → A61K 36/.. [2006.01]

in Espacenet: 10413 docs still have 35/78 as ECLA
only 7412 thereof have 36/..

Lesson : Reclassification to valid IPC incomplete
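
The degree of reclassification behind these lessons can be expressed as a simple coverage ratio. The counts are those quoted on the slides; the function name `coverage` is only illustrative.

```python
def coverage(reclassified: int, total: int) -> float:
    """Fraction of documents that already carry the new classification."""
    if total <= 0:
        raise ValueError("total must be positive")
    return reclassified / total


# (reclassified, total) pairs quoted on the slides
cases = {
    "KR docs reclassified in ECLA/IPC": (678, 14543),
    "A61K 35/78 docs that also have 36/..": (7412, 10413),
}

for label, (done, total) in cases.items():
    print(f"{label}: {coverage(done, total):.1%} done, {total - done} outstanding")
```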

Further example: WO1998039019
in Espacenet: A61K 36/02 as IPC-AL
A61K 35/80 as ECLA
Patentscope: A61K 35/80 as IPC

Lesson : Classification data may be different in different databases


Varying classifications in family
Example: Aircraft cargo loading logistics system
US 2005246132 A1 (3.11.2005)
US 7100827 B2 (5.9.2006)
DE 102005019194 A1 (24.11.2005)
FR 2871269 A1 (9.12.2005)

Classification data on front page


US A1 US B2 DE A1 FR A1
B64C 1/22 G06F 19/00 G06F 17/60 G06F 19/00
G06K 15/00 G07C 11/00 G06F 17/60

Lesson : Classification of granted patents may be very different

Lesson : Assessment of main classification varies


Varying classifications in family
US A1 US B2 DE A1 FR A1 Espacenet IPC Espacenet ECLA Depatis PatFT
B64C 1/20 X X X
B64C 1/22 X X X
B64D 9/00 X X X
B64D 9/00A X
G06K 15/00 X X
G06Q 10/00
G06Q 10/00D X
G06F 17/60 X X X
G06F 19/00 X X X X X
G07C 11/00 X X X

Lesson : classification data from subsequent publications may not be in MCD

Lesson : some reclassification data may not be in MCD; exist as ECLA only
Varying classifications of single document
Example: WO2007126503
ECLA: G01L 19/00B (roll up to IPC: G01L 19/00)
IPC: G01L 19/02

Lesson : different views of different classifiers

US7258017 B1 (granted family member)
IPC: G01L 19/04

Lesson : classification of granted patents may be different
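
The roll-up from ECLA to IPC mentioned above can be sketched as follows. The sketch assumes that an ECLA symbol is an IPC group followed by a suffix that starts with a letter, which holds for the examples on these slides but is a simplification; the function name is hypothetical.

```python
import re

# IPC part of a symbol: subclass (e.g. G01L), main group, "/", subgroup digits
_IPC_PREFIX = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d{1,4})/(\d{2,6})")


def roll_up_to_ipc(ecla: str) -> str:
    """Drop the ECLA suffix and return the underlying IPC group."""
    match = _IPC_PREFIX.match(ecla.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable symbol: {ecla!r}")
    subclass, main_group, subgroup = match.groups()
    return f"{subclass} {main_group}/{subgroup}"


for symbol in ("G01L 19/00B", "B64D 9/00A", "G06Q 10/00D"):
    print(symbol, "->", roll_up_to_ipc(symbol))
```

For WO2007126503 the rolled-up ECLA symbol (G01L 19/00) still differs from the IPC symbol given (G01L 19/02), so the roll-up alone does not reconcile the two views.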


by courtesy of H. Wongel

Current problems in classification (I): IPC consistency

• KR20070005367 A (Prio.: KR20050060661)


• Multifocal lens and manufacture method thereof
• IPC (AL):G02B3/10

• JP2007017937 A (Prio.: KR20050060661)
• Multifocal lens and method for manufacturing the same
• IPC (AL):G02F1/13; G02B3/14; G02F1/1334

• US2007008599 A (Prio.: KR20050060661)
• Multifocal lens and method for manufacturing the same
• IPC (AL):G02B5/32

• CN1892258 A (Prio.: KR20050060661)
• Multifocal lens and method for manufacturing the same
• IPC (AL):G02B3/10

• EP1742100 A1 (Prio.: KR20050060661)
• Multifocal lens and method for manufacturing the same
• IPC (AL):G02F1/1334

Lesson : classifiers may have different views of subject matter to be classified or interpret IPC groups differently
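
The spread of classifications in this family can be quantified with a short Python sketch. The symbols are those listed above; the Jaccard overlap is merely one possible agreement measure, chosen here as an assumption.

```python
from itertools import combinations

# IPC (AL) symbols of the family members listed above (Prio.: KR20050060661)
family = {
    "KR20070005367 A": {"G02B3/10"},
    "JP2007017937 A": {"G02F1/13", "G02B3/14", "G02F1/1334"},
    "US2007008599 A": {"G02B5/32"},
    "CN1892258 A": {"G02B3/10"},
    "EP1742100 A1": {"G02F1/1334"},
}


def jaccard(a: set, b: set) -> float:
    """Overlap of two symbol sets: shared symbols divided by all symbols."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# All symbols given anywhere in the family
pooled = set().union(*family.values())

# Pairs of family members that share at least one symbol
agreeing = [(p, q) for p, q in combinations(family, 2) if family[p] & family[q]]

print(f"{len(pooled)} distinct symbols for one invention")
for p, q in agreeing:
    print(f"{p} / {q}: overlap {jaccard(family[p], family[q]):.2f}")
```

Only two of the ten possible pairs of family members share any symbol at all.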
Non-exhaustive classification
Example: Secondary scheme A01P [2006.01]
"Biocidal, pest repellant, … activity of chemical compounds"
not in ECLA !

Espacenet:

         A01P     EP           A01N     EP
total    43361    1054 (2%)    99994    23330 (24%)
2007     2104     114 (5%)     10328    1040 (10%)

Lesson : incompatibility of IPC and ECLA may cause non-exhaustive classification


Non-exhaustive classification
Example: A61K 36/..
ECLA: 22440 documents
IPC: only 17847 thereof have a:A61K 36/..

Lesson : relevant classifications may not be given / available as IPC

Example: EP1881839
ECLA: A61K 36/487
IPC: A61K 36/00

Example: C12Q 1/68
Espacenet: > 100,000 docs
ECLA: > 40 subgroups
IPC: 0 subgroups

Lesson : classifications could be more specific
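
Documents whose IPC symbol is less specific than their ECLA symbol could be picked out with a rough heuristic like the one below. It only compares the written form of the symbols and does not model the real hierarchy of either scheme, so it is a sketch under that assumption; the function names are hypothetical.

```python
import re

# subclass, main group, subgroup digits, optional ECLA suffix
_SYMBOL = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d+)/(\d+)(\S*)$")


def parse(symbol: str):
    """Split a symbol into (subclass, main group, subgroup, suffix)."""
    match = _SYMBOL.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable symbol: {symbol!r}")
    return match.groups()


def ipc_less_specific(ipc: str, ecla: str) -> bool:
    """True if IPC stops at the main group (/00) while ECLA goes further."""
    ipc_sub, ipc_main, ipc_grp, _ = parse(ipc)
    ecla_sub, ecla_main, ecla_grp, ecla_suffix = parse(ecla)
    same_main_group = (ipc_sub, ipc_main) == (ecla_sub, ecla_main)
    ecla_goes_further = ecla_grp != "00" or ecla_suffix != ""
    return same_main_group and ipc_grp == "00" and ecla_goes_further


# EP1881839 as shown on the slide
print(ipc_less_specific("A61K 36/00", "A61K 36/487"))
```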


Causes/sources for deficiencies
• "wrong" or varying intellectual classification:
- rules too complicated
- drawbacks of classification scheme (too much overlap)
- interpretation of subject matter
- differing national practice
- lack of expertise or diligence; time pressure
• granted claims may differ
• incompatibility ECLA - IPC; USPC concordance tables
• lack or delay of reclassification:
- insufficient resources for intellectual reclassification
• data exchange / management problems
• data input (typos)
Options for improvement
• on IPO level:
- allocate resources
- adapt / harmonize classification practice / training
- develop classification assistance tools
• on user level:
- knowing deficiencies > adapt search strategies
• on IPC level:
- improve user-friendliness (e.g. definitions)
- simplify IPC scheme, rules

More liberal approach when classifying ?
One more symbol better than one symbol missing ?
Do we need to be worried about varying classifications ?
Options for improvement
On MCD / database level:
• crosscheck content of databases
• pooling / compiling of classification data (in one searchable field / on family level ?) of
- classification data of family members
- subsequent publications
- other sources (DE: ICP, …)
• processing such compilations of classifications of different origin, e.g.:
compare classification of subsequent publications (A, B, ..)
> create "trusted" classifications (e.g. class (A) = class (B)) ?


Learn from / go WEB 2.0 ?
• "Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community ?
e.g. any searcher ?
> implement feedback channels ?
Are you satisfied with classification in A61N 1/00 ? Yes / No
Would you like to suggest further classifications:
.......................
.......................
.......................
Submit
Click opens
Learn from / go WEB 2.0 ?
• "Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community
> compile varying views, i.e. classifications
• process such data; create "trusted" classifications
• broader participation in scheme development, in particular definitions ? Tagging of IPC entries ?

Thank you
Top priority: all documents should have at least one valid classification
Priority 1: documents have all appropriate symbols
Priority 2: documents have no inappropriate symbols

More liberal approach when classifying ?
One more symbol better than one symbol missing ?
Do we need to be worried about varying classifications ?
Include broader user community ?
e.g. any searcher ?
Implement feedback channels ?
Create "trusted" classifications (e.g. class (A) = class (B)) ?
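
The idea of "trusted" classifications (class (A) = class (B)) could look like this in Python: symbols given to both the A and the B publication are treated as trusted, all others as open. The symbol sets below are hypothetical illustrations, not the data of any document discussed above, and the function names are assumptions.

```python
def pool(*publications: set) -> set:
    """All symbols given to any publication of a document or family."""
    return set().union(*publications)


def trusted(a_symbols: set, b_symbols: set) -> set:
    """Symbols on which the A and the B publication agree."""
    return a_symbols & b_symbols


# Hypothetical symbol sets for the A and B publication of one application
a_publication = {"B64C 1/22", "G06F 19/00"}
b_publication = {"G06F 19/00", "G07C 11/00"}

pooled = pool(a_publication, b_publication)
agreed = trusted(a_publication, b_publication)
open_symbols = pooled - agreed

print("pooled :", sorted(pooled))
print("trusted:", sorted(agreed))
print("open   :", sorted(open_symbols))
```

The pooled set serves recall (no symbol is lost), while the trusted subset serves efficiency; symbols in the open set would be candidates for review or user feedback.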
