INTRODUCTION
1.1 ARTIFICIAL INTELLIGENCE
Artificial intelligence is the simulation of human intelligence by
machines. Computers can be made to think like humans and imitate their
actions. In Artificial Intelligence, the machine exhibits the characteristics
of humans such as learning, understanding and problem solving.
Artificial intelligence breaks down human intelligence to a form that can
be understood by machines. This is then used to make the machines
perform tasks. These tasks may be simple or very complex tasks that
require hours of work. AI has countless applications from self-driving
cars to performing complex surgeries.
Natural language processing (NLP) is a subfield of computer
science, information engineering, and artificial intelligence concerned
with the interactions between computers and human (natural) languages.
Computers are made to process huge volumes of natural language data
and perform tasks and derive conclusions from it.
1.2 SYSTEM OVERVIEW
The objective is to create a legal information system that helps anyone, from a
lawyer to a common person, obtain legal information quickly.
Right now, lawyers have to go through all the laws in our constitution to find
the ones that are relevant to the case. This process can be made much faster
by picking out the laws that seem relevant to the topic at hand and then
displaying them to the user. This can help them come to a decision much
more quickly. This sort of system can also help other people who are not well
versed in law. They can put in their problems and check whether their case is
worth pursuing or not before deciding to hire a legal consultant.
In addition to this, the system can also help laymen to find who to
approach in case they require legal assistance. In case some of the legal
documents (such as contracts) are not clear, a summarization tool can be
provided so that the key points of the given contract are summarized and
presented to the user.
similar to the current case at hand and then verify what laws were used to
argue the case.
This process of looking through the laws wastes time that the
lawyer can use preparing their case. The time of the human experts can be
directed to much more useful tasks than manually looking through large
volumes of text. Thus, this AI system comes into play.
The proposed system will take a problem statement from the user.
After analysing the problem, it will filter out the relevant laws in the
constitution. After doing this, it will display them to the user. From this,
the lawyers can pick out the laws that they think will be most useful for
them. From this, we can see that the system will search through the
constitution for all the laws that can be applied to the problem. The
lawyer finally has to decide which of the laws will be most suitable for
him to use. The final decision is still left in the hands of the human
experts and they can frame their case however they want. The goal of the
system is to reduce the effort and speed up the time taken to frame the
case.
The input to the system will be in the form of natural language (i.e.,
the issue as stated by the lawyer’s client). It need not be polished too
much or have legal terms added to it in order to be fed into the system.
courts. In order to accurately determine this, it is usually necessary to
have some sort of legal knowledge or an understanding of laws present in
the constitution.
Now, since anyone is able to use this system, they can get all
relevant laws just by stating their case to the system. Since the system
pulls up all the relevant laws, the user can read them and get an idea
whether the judgement will be in his favour or not. After that he can opt
to approach the human experts or drop the case if he is in the wrong.
government documents, get information from the right authority and
other similar issues.
CHAPTER 2
LITERATURE SURVEY
and similarities among the sentences. Sentence importance measures are also
assigned to highlight which sentences are essential for the summary. The full steps
required to summarize the document start with using the documents and query
for document pre-processing. After that a sentence splitter is used and
importance is assigned to the sentences. This is passed through a stack decoder
algorithm along with the constraints and a summary is generated.
This work also describes a methodology that uses AI and neural networks
to summarize the passage. In this, the raw data is collected and sent for pre-
processing. After doing this, it is sent for model creation and evaluation. 50,387
essays between 1970 and 2017 are used as a raw dataset. The title of the data is
mined using sentiment analysis and opinion mining. Essay titles and essay
abstracts were extracted, filtered of special characters, re-encoded and
converted into the format of “title-abstract” pairs. Finally, the candidate title with
the highest score is selected. LibSVM is used to predict whether a token is
part of the candidate title or not. Parameters such as dropout, loss function and
optimizer are used. The loss function gives the relevancy between the chosen
candidate title and the content being evaluated. The optimizer uses different
algorithms to raise accuracy. As per the system architecture, the pre-processed
data is put into the three models and evaluated with the ROUGE method. 80% of the data
is used for training and 20% is used for testing, and the accuracy is found to be
82.47%.
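The 80/20 split and the accuracy measure mentioned above can be sketched as follows. This is a minimal illustration only: the helper names `train_test_split` and `accuracy`, the fixed seed and the toy "title-abstract" pairs are assumptions made for the example and do not come from the surveyed work.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if not expected:
        raise ValueError("expected must not be empty")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Hypothetical "title-abstract" pairs standing in for the essay dataset.
pairs = [(f"title {i}", f"abstract {i}") for i in range(10)]
train, test = train_test_split(pairs)
```

With ten pairs, eight land in the training part and two in the testing part; the same ratio applies to a dataset of any size.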
Authors: Annervaz K M, Jovin George, Shubhashis Sengupta
The first step is information setup and creating the training data set. This
takes advantage of the fact that most legal documents have sections and sub-
sections. A web interface is provided to the SME (subject matter experts) and
they are trained to format this data in a hierarchical manner. The next step is
annotation, where various training samples about the snippets in the client setup
are collected. Previously processed data can be collected here.
After this, the machine learning models are trained to understand this
data. Semi-supervised learning may take place. Some of the models that may be
used here include Support Vector Machines, Naïve Bayes, etc. For various
levels of granularity, both models may be used to provide the prediction. Next a
data profile and a rule inducer are used.
When all these steps have been performed, lease abstraction can be
performed. A feedback mechanism is also created so that the system learns from
mistakes. The final decision is taken by the user and he can evaluate if the result
obtained is right or wrong. The recorded data is used for further training cycles.
The various difficulties that are involved in creating an expert system for
Legal systems are outlined below. The steps involved are analysis and advice,
intake and assessment, intelligence workflow and document automation.
Generally, facts to populate the data are taken from databases, websites or the
human experts. The legal system tries to emulate the human experts in making
decisions for the legal processes.
The nature of the legal system itself poses a challenge to the system. The
legal system is very organic, i.e., the judgement may vary based on the nature of
the case, and strict ‘if-then’ rules alone can’t be used to model it. The complexity
of legal issues makes it necessary for the system to have a huge set of data
before it can make a decision. New cases and judgements are recorded every day
and these need to be added to the system.
Like any other system, the legal system may also have an error margin.
However, in this case it can have dire consequences to the lives of the users.
Therefore, a highly reliable system must be created. For this type of system,
acquiring the expert’s knowledge and updating it is crucial. The laws may be
modified and new acts may be implemented. So, the system must not run on
outdated information.
The Hague Navigation Tool is used as a case-study in this work. The tool
is used for cases regarding family law. The advantages and disadvantages of
this tool are explored.
[6] Fuzzy Bag-of-Words Model for Document Representation
One of the most popular models for document representation is the bag of
words model. This assigns a vector to the document and notes the normalized
occurrences of the basis terms and also the number of such terms. It should be
noted that the basis terms are the high frequency words in a corpus, and the
number of basis terms or the dimensionality of BoW vectors is less than the size
of vocabulary. BoW maps the document into a fixed length vector. It is simple,
but effective.
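A minimal sketch of the bag-of-words mapping described above, using only the Python standard library. The function names and the toy corpus are hypothetical; whitespace tokenization and normalization by the total count of basis terms are assumptions chosen for brevity.

```python
from collections import Counter


def build_basis_terms(corpus, size):
    """Pick the `size` most frequent words in the corpus as basis terms."""
    counts = Counter(word for doc in corpus for word in doc.lower().split())
    return [word for word, _ in counts.most_common(size)]


def bow_vector(document, basis_terms):
    """Map a document to a fixed-length vector of normalized basis-term counts."""
    counts = Counter(document.lower().split())
    raw = [counts[term] for term in basis_terms]
    total = sum(raw)
    # A document with no basis terms maps to the zero vector.
    return [count / total if total else 0.0 for count in raw]


# Hypothetical toy corpus.
corpus = [
    "the tenant pays the rent",
    "the landlord owns the house",
    "the tenant leaves the house",
]
basis = build_basis_terms(corpus, 3)
vector = bow_vector("the tenant pays the rent", basis)
```

Every document is mapped to a vector with one entry per basis term, so the vector length stays fixed no matter how long the document is.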
Fuzzy BoW models are proposed to learn more dense and robust
document representations encoding more semantics. The hard mapping in the
previous method is replaced by a fuzzy mapping. Fuzzy BoW introduces
vagueness in the mapping between the words and the basis terms.
This model works based on word embeddings. The core idea behind word
embeddings is to assign such a dense and low-dimensional vector representation
to each word that semantically similar words are close to each other in the
vector space. The merit of word embeddings is that the semantic similarity
between two words can be conveniently evaluated based on the cosine
similarity measure between corresponding vector representations of the two
words.
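The cosine similarity measure can be sketched as below. The three-dimensional vectors are made-up toy values, not trained embeddings. The `fuzzy_membership` helper shows one plausible form of a fuzzy mapping (negative similarities clipped to zero); that form is an assumption for illustration and is not confirmed by the text above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity with a zero vector is treated as 0
    return dot / (norm_u * norm_v)


def fuzzy_membership(word_vec, basis_vec):
    """Assumed fuzzy mapping: similarity to a basis term, clipped at zero."""
    return max(0.0, cosine_similarity(word_vec, basis_vec))


# Hypothetical toy embeddings.
embeddings = {
    "lawyer": [0.9, 0.1, 0.0],
    "advocate": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
close = cosine_similarity(embeddings["lawyer"], embeddings["advocate"])
far = cosine_similarity(embeddings["lawyer"], embeddings["banana"])
```

In this toy setting `close` is much larger than `far`, which is the behaviour expected of semantically similar words.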
Named Entity recognition (NER) is one of the most important aspects of
IE. NER finds the parts of the text that correspond to proper names and then
classifies them into their appropriate categories. IE techniques include extracting proper
nouns, commonly known as NER, relation detection and classification, temporal
and event processing, and template filling. There are two approaches to this:
the rule-based approach and the statistical approach.
The rule-based approach uses lookup lists and leverages the structure of the
language in order to classify the nouns. Four main approaches are dealt with
here. They are: Noun Extraction using Lookup List, Noun Extraction using
Morphological Rules (Noun Affixes), Noun Extraction using Morphological
Rules (Verb, Adjective and Noun Affixes), Noun Extraction using
Morphological Rules (Rayner’s Rule).
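The first two approaches (lookup list and noun affixes) can be illustrated with a small rule-based sketch. The lookup list, the English suffix rules and the minimum word length are all hypothetical choices for this example; the surveyed work may use a different language and different rules, and Rayner's Rule is not modelled here.

```python
# Hypothetical lookup list and noun suffixes, chosen only for illustration.
NOUN_LOOKUP = {"court", "judge", "contract", "tenant"}
NOUN_SUFFIXES = ("tion", "ment", "ness", "ship", "ity")


def extract_nouns(text):
    """Return (word, rule) pairs for words recognised as nouns."""
    nouns = []
    for token in text.split():
        word = token.strip(".,;:!?()\"'").lower()
        if not word:
            continue
        if word in NOUN_LOOKUP:
            nouns.append((word, "lookup"))  # matched by the lookup list
        elif len(word) > 5 and word.endswith(NOUN_SUFFIXES):
            nouns.append((word, "suffix"))  # matched by a noun affix rule
    return nouns


found = extract_nouns("The judge signed the agreement.")
```

Here `found` contains "judge" (matched by the lookup list) and "agreement" (matched by the "ment" suffix rule).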
Python needs fewer lines of code and provides high readability. Python lets
developers choose whether to follow an OOP approach or use scripting. It can
be used to link different data structures (DS) and can be used as a backend
language. Most of its code can be checked in the IDE. Python gives developers
the flexibility to provide an API from the current programming language.
Python has the ability to balance high-level programming with low-level.
Python lets developers use the correct data structure for the correct
program. NumPy, SciPy and pandas are all very useful. These open source
libraries of Python cover almost all our needs while building an AI project.
For creating the summary, first we tokenize the text and tag it with the
part of speech. After that pronoun resolution occurs. Then the lexical chains are
formed and the sentences are scored.
Some of the methods suggested by this work are: extraction based on our
Article Category, using Sentence Scoring, using strong Lexical Chains, using
Proper Noun Scoring.
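Sentence scoring and proper noun scoring can be sketched as a small extractive summarizer. This is a simplified stand-in, not the surveyed method: part-of-speech tagging, pronoun resolution and lexical chains are omitted, word frequency is used as the sentence score, and a capitalized word that is not sentence-initial is treated as a proper noun. The stopword list and the bonus value are assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list and proper-noun bonus.
STOPWORDS = {"the", "a", "an", "is", "was", "if", "may", "shall", "of", "to", "and", "in"}
PROPER_NOUN_BONUS = 1.0


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return the alphabetic words of a sentence."""
    return re.findall(r"[A-Za-z']+", sentence)


def score_sentence(sentence, frequencies):
    """Sum word frequencies, plus a bonus for likely proper nouns."""
    score = 0.0
    for position, word in enumerate(tokenize(sentence)):
        key = word.lower()
        if key in STOPWORDS:
            continue
        score += frequencies[key]
        if position > 0 and word[0].isupper():
            score += PROPER_NOUN_BONUS
    return score


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    frequencies = Counter(
        w.lower() for s in sentences for w in tokenize(s) if w.lower() not in STOPWORDS
    )
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], frequencies),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


text = (
    "The tenant shall pay rent monthly. The weather was pleasant. "
    "The tenant may end the lease if rent is refunded."
)
summary = summarize(text, max_sentences=2)
```

The off-topic middle sentence scores lowest because its words occur only once, so it is left out of the two-sentence summary.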
CHAPTER 3
SYSTEM ANALYSIS
Some applications use NLP to process the statements from the users and
present a judgement or a decision based on it. These applications will work for
some field or section of the legal system (such as family law, intellectual
property, etc.). Summarization programs are present for news articles and even
multi-document summarization is available. It is possible to obtain a law by
mentioning the section or referencing its content. Some expert systems use the
questionnaire format to obtain the data from the users. This data is then
processed and the results are given.
6. The solutions and guidance provided by the system should be tailor-made
towards the client.
The system provides a generic solution using which the relevant laws are
suggested to the legal experts. The legal experts can then take the final decision
on how they are going to present their case based on the laws suggested. The
layman should also be able to use this system in order to understand his problem
better and he should be able to make a decision whether to pursue his case or
not. The system should also be capable of suggesting the legal experts who the
layman can approach in order to file his case. It should be able to filter out the
right experts based on the case. Summarization features are also provided to
better understand contracts and legal documents.
3.3 REQUIREMENTS SPECIFICATION
RAM: 8 GB
Hard disk: 10 GB
Processor: Intel i3 and above
Python
Windows Operating system (7 and above)
3.4.1 Python
popular Python libraries are NumPy, pandas, NLTK, TensorFlow, scikit-learn
and Matplotlib.
Pandas is used to perform real-world data analysis. Developers use it to
load, manipulate and model data. Boolean indexing, checking for NaNs in a
dataset, and selecting and dropping a column from a dataset are some of the
operations that can be performed using this library. Its vectorized operations
reduce the need for explicit loops. TensorFlow can train and run deep neural networks.
CHAPTER 4
SYSTEM DESIGN
The above diagram depicts the flow of the system to be created. Here,
three major paths are represented. The initial step is collecting the problem
statement from the user. This may be the client or the lawyer. The input is in the
form of a document containing natural language. Mining is done by taking out
the most essential keywords from the details provided by splitting the words.
After the segregation of the required keywords, mining is performed on the
database so that the related laws and articles can be fetched. This process is
found to be much faster than manually performing the task. Past cases handled
and stored in the knowledge base are also mined based on those keywords and
the result needed.
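The keyword mining and law lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the in-memory `LAW_DATABASE`, its section titles and keyword lists are hypothetical placeholders, and simple keyword overlap stands in for whatever matching the real system uses.

```python
import re

# Hypothetical stopwords and a hypothetical in-memory stand-in for the law database.
STOPWORDS = {"the", "and", "even", "though", "with", "for", "has", "have", "that"}
LAW_DATABASE = {
    "Section A (hypothetical)": "tenant landlord rent eviction lease",
    "Section B (hypothetical)": "theft stolen property robbery",
    "Section C (hypothetical)": "divorce marriage custody maintenance",
}


def extract_keywords(problem_statement):
    """Split the statement into words and keep the non-trivial ones."""
    words = re.findall(r"[a-z]+", problem_statement.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}


def find_relevant_laws(problem_statement, database=LAW_DATABASE):
    """Return law titles ordered by how many keywords they share with the statement."""
    keywords = extract_keywords(problem_statement)
    scored = []
    for title, text in database.items():
        overlap = keywords & set(text.split())
        if overlap:
            scored.append((len(overlap), title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


laws = find_relevant_laws("My landlord wants eviction even though I paid the rent")
```

The same keyword set could be reused to mine past cases stored in the knowledge base.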
The associated laws and these cases provide a greater insight on the
requirements of the user and the goal to be achieved in the end. Based on this, it
is possible to recommend suitable consultants for the laymen. The consultants
are chosen in the field that the problem statement is about. These results can be
narrowed further by applying filters. The filters can be based on
several factors like price, location, etc.
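Recommending and filtering consultants might look like the sketch below. The `Consultant` record, its fields and the sample entries are hypothetical; the text only states that consultants are chosen by field and can be filtered by factors such as price and location.

```python
from dataclasses import dataclass


@dataclass
class Consultant:
    """Hypothetical record for a legal consultant."""
    name: str
    field: str
    location: str
    fee: int


def recommend(consultants, field, location=None, max_fee=None):
    """Select consultants in the given field, then apply the optional filters."""
    matches = [c for c in consultants if c.field == field]
    if location is not None:
        matches = [c for c in matches if c.location == location]
    if max_fee is not None:
        matches = [c for c in matches if c.fee <= max_fee]
    return sorted(matches, key=lambda c: c.fee)  # cheapest first


# Hypothetical sample data.
consultants = [
    Consultant("Consultant A", "family", "Chennai", 4000),
    Consultant("Consultant B", "family", "Mumbai", 3000),
    Consultant("Consultant C", "property", "Chennai", 2500),
    Consultant("Consultant D", "family", "Chennai", 9000),
]
shortlist = recommend(consultants, "family", location="Chennai", max_fee=5000)
```

Leaving `location` and `max_fee` unset returns every consultant in the field, so the filters stay optional for the user.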
Fig 4.2.1 USE CASE DIAGRAM
There are three major actors in this scenario. They are client (layman),
lawyer and the AI system. The lawyers and the clients can access the first
module to find the relevant laws that they can use. An additional feature by which
related cases can be viewed is also added. The laymen can access the second
module to get recommendations for legal consultants. These suggestions can
also be filtered to provide more appropriate ones. The document analyser
module gets a document as an input and provides the key points of the
document as an output.
Fig 4.3.1 SEQUENCE DIAGRAM
The sequence diagram depicts the messages passing between four main entities
namely the Lawyer, the Client, the System and the Database. The lawyer sends
the details gathered from the client to the system. The system fetches the
relevant laws from the database and displays them to the lawyer. The client can also
enter his problem into the system and read the relevant laws. If he decides that
his case is worth pursuing, he sends a request to the system to show him the
legal consultants that specialize in the field related to his problem. The system
fetches this information from the database. The search results can be fine-tuned
by additional details provided by the client. If a layman wishes to know the
meaning of the contents of a legal document or contract, he sends it to the
system. The system summarizes the document and displays it back to the client.
CHAPTER 5
MODULE DESCRIPTION
5.1 Modules
This module can take a file containing the problem of the client expressed
in natural language as the input. It produces a list of laws which are relevant to
the problem in question.
This module can be used by both the legal consultants and the common
man. These laws can be used to prepare a defence for the case. The system may
also provide relevant articles so that the legal experts are able to strengthen their
case. This reduces the time it takes to look through the constitution in order to
find the right laws. Some obscure laws which are not often used will also
be suggested if they are applicable. This helps ensure that the expert
has all the necessary information to present the case.
The system should be able to connect the clients with the right legal
advisor. The system takes in the client’s problem statement as an input and
provides a list of consultants that will be suitable for them as an output.
The client may not know how to contact a lawyer or approach the system
for justice. In such cases, the system will guide them. After analysing the
problem, if the client wishes to file a case, the system can recommend lawyers
that excel in the field that involves that case. This can be deduced by the system
based on the laws suggested to the client. The system can also change its
recommendations based on filters such as location, price range, etc., to suit the
needs of the client.