
01

AUTOMATIC CODE COMMENT GENERATION

BY
AARUSHI DUA 2K20/CO/005
ABHEEK KAUSHAL 2K20/CO/017
ANUBHAV GUPTA 2K20/CO/085
02

TODAY'S PRESENTATION INCLUDES

1. CODE CLONE DETECTION
2. AUTOMATIC COMMENT GENERATION ALGORITHMS
3. COMMENT EXTRACTION
4. RNN BASED ALGORITHMS
5. COMMON NLP TOOL
6. GLOBAL ENCODER


CHALLENGES

There are several challenges in code comment generation.

Natural language is weakly structured, while programming languages are formal languages and source code written in them is unambiguous and structured. This is the main challenge for the RNN model.

In natural language, the vocabulary is limited to the most common words, and words outside it are treated as unknown words. In code, the vocabulary consists of keywords, operators, and identifiers, so the dataset that stores the unique tokens after replacing numerals and strings becomes very large.
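
To make the vocabulary problem concrete, here is a minimal Python sketch (illustrative, not from the original slides) of the normalization step mentioned above: numeric and string literals are replaced with placeholder tokens, yet identifiers keep the vocabulary growing.

import re

# Replace numeric and string literals with placeholder tokens, a common
# normalization step before building a token vocabulary for source code.
def normalize_tokens(tokens):
    normalized = []
    for tok in tokens:
        if re.fullmatch(r"\d+(\.\d+)?", tok):
            normalized.append("<NUM>")   # numeral -> placeholder
        elif tok.startswith('"') and tok.endswith('"'):
            normalized.append("<STR>")   # string literal -> placeholder
        else:
            normalized.append(tok)       # keywords, operators, identifiers kept
    return normalized

tokens = ['int', 'maxRetries', '=', '3', ';', 'log', '(', '"retrying"', ')', ';']
print(normalize_tokens(tokens))
# ['int', 'maxRetries', '=', '<NUM>', ';', 'log', '(', '<STR>', ')', ';']
# Identifiers such as maxRetries stay unique, so the vocabulary still
# grows with every new project added to the dataset.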
03

CODE CLONE DETECTION
INTRODUCTION

Code clone detection based comment generation algorithms are concerned with utilizing code clone detection techniques to find similar code in a database; the corresponding comments of the matched code, or the associated discussion text, are then viewed as comments for the target code.

ALGORITHM

Token-based code clone detection tools can achieve a time complexity of O(n ∗ m) with token-based matching, where n is the number of lines of code in the input project and m is the number of lines of code in the database. Token-based matching algorithms are usually more efficient than tree-based matching algorithms.

TOKENISATION

Tokenization of the source code is based on the serialization of AST nodes. We utilized the Eclipse ASTParser to obtain the AST of the Java source code. ASTParser cannot fully resolve variable type bindings of code elements unless we manually compile each project in the database, which is infeasible because there are 1,005 Java projects from GitHub.

CONFIGURATION

The tool recognizes code clones with three or more statements, which is a common threshold similar to other previous work. The tool excludes two types of statements: return statements and switch statements. The return statement is less meaningful, and the switch statement can have many overlapped segments that introduce false positives.
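
Putting the slide together, the sketch below (a toy Python illustration; the database, tokens, and comment are hypothetical) performs the naive O(n ∗ m) token-based scan and reuses the comment of any clone of three or more statements. A real tool would produce the statement token sequences by serializing AST nodes as described above.

MIN_STATEMENTS = 3  # recognize clones of three or more statements

def match_clone(target_stmts, database):
    """Naive O(n * m) scan: compare every window of the target against
    every window of every database snippet, statement by statement."""
    n = len(target_stmts)
    for size in range(n, MIN_STATEMENTS - 1, -1):          # prefer longest clones
        for i in range(n - size + 1):
            window = tuple(target_stmts[i:i + size])
            for snippet_stmts, comment in database:
                for j in range(len(snippet_stmts) - size + 1):
                    if tuple(snippet_stmts[j:j + size]) == window:
                        return comment                     # reuse the clone's comment
    return None

# Each statement is a tuple of normalized tokens (identifiers abstracted).
database = [
    ([('for', '<ID>', 'in', '<ID>'),
      ('<ID>', '=', '<ID>', '+', '<ID>'),
      ('print', '(', '<ID>', ')')],
     "Sums the elements of a list and prints the running total."),
]
target = [('for', '<ID>', 'in', '<ID>'),
          ('<ID>', '=', '<ID>', '+', '<ID>'),
          ('print', '(', '<ID>', ')')]
print(match_clone(target, database))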
04

COMMENT EXTRACTION

JavaDoc Comment
A JavaDoc comment is delimited by /**...*/. It only appears in front of a public class, or a public/protected method/variable.

Single-line Comment
A single-line comment starts with a double slash (//). In the previous work, we were able to extract full sentences easily from Stack Overflow posts, which is not the case in this work with single-line comments.

Block Comment
A block comment is delimited by /*...*/. It is similar to the JavaDoc comment except that it can appear anywhere in the source code.
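
As a rough illustration of the extraction step, this Python sketch pulls the three comment kinds out of Java source with regular expressions. It is a simplification: it does not handle comment delimiters that appear inside string literals.

import re

# JavaDoc (/** ... */) is matched first; the plain block pattern uses a
# lookahead so it does not also consume JavaDoc comments.
JAVADOC = re.compile(r"/\*\*.*?\*/", re.DOTALL)
BLOCK = re.compile(r"/\*(?!\*).*?\*/", re.DOTALL)
SINGLE = re.compile(r"//[^\n]*")

def extract_comments(java_source):
    return {
        "javadoc": JAVADOC.findall(java_source),
        "block": BLOCK.findall(java_source),
        "single": SINGLE.findall(java_source),
    }

src = '''
/** Returns the larger of two ints. */
public static int max(int a, int b) {
    /* classic comparison */
    return a > b ? a : b; // ternary form
}
'''
print(extract_comments(src))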
05

RNN BASED ALGORITHMS
BRIEF INTRODUCTION

Encoder-Decoder framework: In deep neural network based comment generation systems, the encoder-decoder structure, also known as the sequence-to-sequence model, is generally exploited. In the encoder-decoder structure, the encoder plays the role of encoding source code into a fixed-sized vector, and the decoder is responsible for decoding the source code vector and predicting comments for the source code. The difference among various encoder-decoder structures lies in the form of the inputs and the type of neural network.
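
A minimal PyTorch sketch of this handoff (illustrative dimensions and vocabulary sizes, not any specific published model): a GRU encoder compresses the code token sequence into a fixed-sized vector, which seeds a GRU decoder that predicts comment tokens.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder's final hidden state is the
    fixed-sized code vector that conditions the comment decoder."""
    def __init__(self, code_vocab, comment_vocab, emb=64, hidden=128):
        super().__init__()
        self.code_emb = nn.Embedding(code_vocab, emb)
        self.comm_emb = nn.Embedding(comment_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, comment_vocab)

    def forward(self, code_ids, comment_ids):
        _, code_vec = self.encoder(self.code_emb(code_ids))  # (1, B, hidden)
        dec_states, _ = self.decoder(self.comm_emb(comment_ids), code_vec)
        return self.out(dec_states)                          # per-step logits

model = Seq2Seq(code_vocab=5000, comment_vocab=8000)
code = torch.randint(0, 5000, (2, 30))     # batch of 2 code token sequences
comment = torch.randint(0, 8000, (2, 12))  # teacher-forced comment prefixes
print(model(code, comment).shape)          # torch.Size([2, 12, 8000])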
06

COMMON NLP TOOL

Part-of-Speech (POS) Tagging is used for assigning parts of speech, such as noun, verb, adjective, etc., to each word observed in the target text. There are several tools available for this; one commonly used tool is the Stanford Log-linear Part-Of-Speech Tagger.

Language Parsers are used to work out the grammatical structure of a sentence. For example, they can find which words in the sentence act as the subject or object of a verb, or identify phrases in sentences.
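
For a quick hands-on view, the sketch below substitutes spaCy for the Stanford tagger (a plain swap, chosen because one spaCy pipeline demonstrates both POS tagging and dependency parsing); it assumes the en_core_web_sm model is installed.

import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Returns the index of the first matching element in the list.")
for token in doc:
    # token.pos_ is the part of speech; token.dep_ is the grammatical
    # relation (e.g. subject/object) linking the token to token.head.
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} -> {token.head.text}")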
07

GLOBAL ENCODER

GRAPH CONSTRUCTION - We aim to exploit the information available at the class level. To do so, we connect the target function to all other functions in the same class to form a class-level contextual graph.

VERTEX INITIALIZATION - To encode each vertex in the graph to a hidden vector, we apply the local encoder to each individual function and concatenate the last hidden states in both directions as the initial vertex representation.

GRAPH ATTENTION NETWORK - We feed the initial state of each vertex into a GAT, which applies different weights to different nodes when exchanging information with neighbors. In the l-th layer of the GAT network, each vertex sends its own current hidden state to all of its neighbors. A sketch of these three steps follows below.
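
A minimal PyTorch sketch of the three steps, with illustrative sizes: the adjacency matrix encodes the class-level graph, random vectors stand in for the local encoder's concatenated bidirectional states, and a single-head layer implements the standard GAT attention (LeakyReLU-scored, softmax-normalized over neighbors).

import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention: each vertex aggregates its neighbors'
    states, weighted by learned attention coefficients."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, h, adj):
        n = h.size(0)
        Wh = self.W(h)                                 # (N, dim)
        # Score every ordered pair (i, j): LeakyReLU(a^T [Wh_i || Wh_j]).
        pairs = torch.cat([Wh.repeat_interleave(n, 0), Wh.repeat(n, 1)], dim=1)
        e = F.leaky_relu(self.attn(pairs)).view(n, n)
        e = e.masked_fill(adj == 0, float("-inf"))     # attend to real edges only
        alpha = torch.softmax(e, dim=1)                # attention weights per vertex
        return F.elu(alpha @ Wh)                       # exchanged, aggregated states

# Step 1 - graph construction: the target function (vertex 0) is connected
# to all other functions in the same class; self-loops keep each vertex's state.
num_funcs, hidden = 4, 128
adj = torch.eye(num_funcs)
adj[0, :] = 1.0
adj[:, 0] = 1.0

# Step 2 - vertex initialization: stand-in for the local encoder's
# concatenated last bidirectional hidden states (2 * 64 = 128 dims).
h0 = torch.randn(num_funcs, hidden)

# Step 3 - one round of message exchange through the GAT layer.
print(GATLayer(hidden)(h0, adj).shape)  # torch.Size([4, 128])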
THANK YOU
