Transformers For Markdown Article Rewriting
Stanford CS224N Custom Project
Scott Hickmann
Department of Computer Science
Stanford University
hickmann@stanford.edu
Abstract
The goal of this project is to summarize and paraphrase full articles, which may
contain inline links, bold text, images, etc. Inline markdown can be challenging
to pass to transformers. In this paper, I describe how I am able to rewrite
articles containing rich text, including inline markdown, properly. More specifically,
this paper delves into fine-tuning transformer models pretrained on summarization
or paraphrasing by forcing them to learn markdown syntax, and into encoding rich-text
tags to reduce the amount of training data necessary. I then introduce a novel
method based on ROUGE and on the cosine similarity of words inside markdown tags to
evaluate the quality of an NLP model at producing proper rich text.
1 Key Information to include
• Mentor: Manan Rai
• External collaborators: No
• Sharing project: No
2 Introduction
As of 2022, the state-of-the-art approach to summarizing and paraphrasing text is to
use transformer-based encoder-decoder models. Indeed, T5 and BART models have proven very
performant at these tasks when pretrained on corpora of summarized and paraphrased text.
Unfortunately, these pretrained models were trained on text that was properly cleaned, devoid of
any form of style and formatting. Hence, trying to apply these models to paraphrase articles containing
rich text can bring up issues such as:
• “A flock of ==frogs== were ==roaming== around the park
in search of water once more.” becomes “A flock of
==frogs===roaming===roaming===roaming====roaming====roaming====roaming...”
• “Once, a group of **frogs** were **roaming** around the forest in search of water.”
becomes “A flock of **frogs** were **roaming** around the park in search of water once
more.” whereas the same sentence without the markdown “**” syntax would become “A
herd of frogs were wandering around the woods in search of water”.
Yet summarizing and rewriting articles is key today, in a world where everyone is bombarded with
information. Markdown helps make an article more digestible, and it is therefore important to
preserve it while paraphrasing and summarizing text. Hence the motivation behind this paper.
3 Related Work
This problem has been encountered in the past, but no good solution has yet been
determined. People like Edward Ross have experimented with several approaches using HTML tags to
parse markdown, but these approaches have remained inconclusive [1]. However, in recent years,
transformer models have proven very successful not only at performing various tasks, but also at
learning quickly through fine-tuning thanks to multi-head attention [2]. This paper will demonstrate
how, even with a limited amount of data, it is possible to correctly summarize and paraphrase text
containing inline markdown thanks to these transformers.
4 Approach
The pretrained T5 transformer model used for paraphrasing can be found here [3].
The pretrained BART transformer model used for summarization can be found here [4].
4.1 Scraping Data
4.1.1 Paraphrasing
Using a simple web parser, I collect markdown-rich sentences from the web. The scraping
algorithm first goes through several Medium articles (converted from HTML to markdown) and
downloads them. Then, for each article, it selects a few sentences that are rich in markdown (i.e.,
contain at least one piece of markdown syntax). All these sentences are then stored in a .csv file [5].
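A minimal sketch of this sentence-selection step is shown below; the regular expression and the helper names are illustrative rather than the exact scraper code.

```python
import csv
import re

# Rough patterns for inline markdown: bold, italics, highlight, links, inline code.
MD_PATTERN = re.compile(
    r"(\*\*[^*]+\*\*|\*[^*]+\*|==[^=]+==|_[^_]+_|\[[^\]]+\]\([^)]+\)|`[^`]+`)"
)

def select_rich_sentences(article_markdown: str) -> list:
    """Return sentences containing at least one piece of inline markdown."""
    # Naive sentence split; the real scraper can use a proper sentence tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", article_markdown)
    return [s for s in sentences if MD_PATTERN.search(s)]

def save_sentences(sentences, path="paraphrase_sentences.csv"):
    """Store the selected markdown-rich sentences in a .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["sentence"])
        writer.writerows([s] for s in sentences)
```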
4.1.2 Summarization
Using a simple web parser, I collect articles from the web. The scraping algorithm first goes
through several Medium articles (converted from HTML to markdown) and downloads them. Then,
for each article, it groups sentences together into several parts. All these parts are then stored in a
.csv file [5].
4.2 Preparing Data
4.2.1 Paraphrasing
For each sentence, the data preparation algorithm stores a copy of the sentence without any markdown.
It then encodes these sentences and runs them through the T5 transformer model pretrained on
paraphrasing diverse text but not yet fine-tuned. Then, a human (in this case me) is shown two
sentences: the original sentence with markdown, and the paraphrased sentence without markdown.
My task is simply to add the markdown from the original sentence back onto the paraphrased sentence.
For example, if the algorithm gives me:
• Input with markdown (fine-tune source): “Once, a group of **frogs** were **roaming**
around the forest in search of water.”
• Output without markdown: “A herd of frogs were wandering around the woods in search of
water”
• Human (fine-tune target): “A herd of **frogs** were **wandering** around the woods in
search of water”
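The annotation step above relies on first generating the markdown-free paraphrases automatically. A minimal sketch of that step is shown below, assuming the pretrained T5 paraphraser from [3] is loaded through HuggingFace; the checkpoint placeholder and the "paraphrase: " prefix are assumptions, not the exact setup used.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder name: the actual checkpoint is the pretrained paraphraser cited in [3].
MODEL_NAME = "t5-paraphraser-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def paraphrase_without_markdown(sentence_no_md: str) -> str:
    """Run a markdown-stripped sentence through the not-yet-fine-tuned paraphraser."""
    inputs = tokenizer("paraphrase: " + sentence_no_md,
                       return_tensors="pt", truncation=True, max_length=128)
    output_ids = model.generate(inputs.input_ids, max_length=128, num_beams=5)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```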
4.2.2 Summarization
Similarly to paraphrasing, for each part, the data preparation algorithm stores a copy of the part
without any markdown. It then encodes these parts and runs them through the BART transformer
model pretrained on summarizing diverse text but not yet fine-tuned. Then, a human (in this case me)
is shown two parts: the original part with markdown, and the summarized part without markdown.
My task is simply to add the markdown from the original part back onto the summarized part.
4.3 Fine-Tuning
Now that we have collected paraphrased and summarized markdown data in an efficient way, using
a hybrid of automated and manual paraphrasing, we can move on to fine-tuning the
transformer models. This is as simple as feeding in the input sentences and parts and
backpropagating the models' losses, calculated by comparing the predicted paraphrases and
summaries against the stored ones.
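A minimal sketch of such a fine-tuning loop is shown below, assuming a HuggingFace seq2seq model and tokenizer and a list of (source, target) string pairs; the optimizer choice and the label-masking detail are assumptions rather than the exact training code.

```python
import torch
from torch.utils.data import DataLoader

def fine_tune(model, tokenizer, pairs, epochs=5, lr=1e-4, batch_size=2, device="cuda"):
    """Fine-tune a seq2seq model on (md_encoded_source, md_encoded_target) pairs."""
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(pairs, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for sources, targets in loader:
            enc = tokenizer(list(sources), return_tensors="pt",
                            padding=True, truncation=True).to(device)
            labels = tokenizer(list(targets), return_tensors="pt",
                               padding=True, truncation=True).input_ids.to(device)
            labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
            loss = model(**enc, labels=labels).loss  # cross-entropy vs. stored rewrite
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```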
4.4 Improvements
While the above approach works fine, there are many different markdown tags, and some work
better than others with T5 and BART. In addition, we need enough fine-tuning data for every single
markdown syntax (“*”, “**”, “_”, links, etc.) for the models to perform well. Using HTML instead
of markdown barely helps, even when defining the tags as special tokens. There is, however, a trick
to only train the model on a single kind of tag. First, we convert each markdown symbol into a
special token with a number attached, such as “(MD1)”, “(MD2)”, etc. This way, the models only
need to learn to preserve this single kind of token in the correct positions, instead of all the various
markdown tags. I wrote an encoder and decoder to convert markdown to these special tokens and
back. Throughout this paper, we will refer to text encoded in this manner as “MD-encoded” and to
the tags as (MDn) tags.
There is one edge case with this method: links. Link labels are kept in the text and surrounded by
(MDn) tags just like everything else, but the URL itself is removed and can be added back after
summarization/paraphrasing by looking up the corresponding (MDn) tag in the output. This is done
to prevent the model from being confused by “?”, “.”, and other special characters in URLs, which
can, for example, trick the model into believing a sentence is over when it isn’t.
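Below is a simplified sketch of this encoder/decoder. It only handles bold, highlight, and link syntax, and the exact numbering scheme and tag-map bookkeeping are assumptions about the implementation rather than the code actually used.

```python
import re

def md_encode(text: str):
    """Replace markdown tag pairs with numbered (MDn) tokens.

    Returns the encoded text plus a map from tag number to the original symbol
    (or URL, for links) so md_decode can restore the markdown afterwards.
    """
    tag_map = {}
    counter = 1  # (MD0) is reserved to enclose the entire text

    def next_tag(symbol):
        nonlocal counter
        tag_map[counter] = symbol
        tag = f"(MD{counter})"
        counter += 1
        return tag

    # Links: keep the label in the text, stash the URL so it can be re-attached later.
    def encode_link(m):
        tag = next_tag(("link", m.group(2)))
        return f"{tag}{m.group(1)}{tag}"

    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", encode_link, text)

    # Bold and highlight spans: the same numbered token goes on both sides.
    for symbol in ("**", "=="):
        def encode_span(m, symbol=symbol):
            tag = next_tag(symbol)
            return f"{tag}{m.group(1)}{tag}"
        pattern = re.escape(symbol) + r"(.+?)" + re.escape(symbol)
        text = re.sub(pattern, encode_span, text)

    return f"(MD0){text}(MD0)", tag_map


def md_decode(encoded: str, tag_map) -> str:
    """Convert (MDn) tokens back into markdown using the stored tag map."""
    text = encoded.replace("(MD0)", "")
    for n, symbol in tag_map.items():
        if isinstance(symbol, tuple):  # link: rebuild [label](url)
            _, url = symbol
            text = re.sub(rf"\(MD{n}\)(.*?)\(MD{n}\)", rf"[\1]({url})", text)
        else:
            text = text.replace(f"(MD{n})", symbol)
    return text
```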
5 Experiments
5.1 Data
The article parts and sentences used to fine-tune the models were scraped from Medium articles.
The dataset I collected can be found here [5].
5.2 Evaluation method
Using raw markdown articles that have been reserved for evaluation, we divide each article into m
subparts that each get summarized or paraphrased individually.
We remove all markdown and feed the result through the original model without any fine-tuning
(text 1), and feed the MD-encoded version (as discussed in Section 4.4) through the final, fine-tuned
model, then remove the markdown (text 2). We then compute ROUGE-2 and ROUGE-L between these
two texts. Let's call these scores R and L respectively. These scores evaluate how well the original
summaries and paraphrases are preserved after fine-tuning, but not how well the markdown syntax is
preserved [6].
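One way to compute R and L is shown below, using the rouge_score package; the paper does not mandate a specific implementation, and using the F1 variant of each score is an assumption.

```python
from rouge_score import rouge_scorer

def rouge_preservation(text_1: str, text_2: str):
    """R (ROUGE-2) and L (ROUGE-L) between the original model's output on
    markdown-free input and the fine-tuned model's output with markdown removed."""
    scorer = rouge_scorer.RougeScorer(["rouge2", "rougeL"], use_stemmer=True)
    scores = scorer.score(text_1, text_2)
    return scores["rouge2"].fmeasure, scores["rougeL"].fmeasure
```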
Therefore, let's compute a score to evaluate how well the markdown is preserved. As there is no
standard metric for this, I developed and used the following custom metric. Let's define the function
f, which computes an enclosure check (it makes sure the output is surrounded by (MD0) tags, which
should be present given how the MD encoder and decoder work), and the function g, which computes
the sentence similarity between two phrases by averaging the word embeddings of all words in each
phrase and calculating the cosine similarity between the resulting embeddings [7]:
$$
f(v) =
\begin{cases}
1 & \text{if } v = \text{``(MD0)\{entire content\}(MD0)''} \\
0 & \text{otherwise}
\end{cases}
\tag{1}
$$

$$
g(v, w) = \frac{\operatorname{avg}(v) \cdot \operatorname{avg}(w)}{\lVert \operatorname{avg}(v) \rVert \, \lVert \operatorname{avg}(w) \rVert}
\tag{2}
$$
Using these two functions, let's now define M, the markdown similarity score between the original
text and the paraphrased/summarized text. Let's create v, a 3D array containing the markdown tag
contents for each (MDn) tag pair, for each defined n, for each paraphrased/summarized MD-encoded
text. Let's also create w, a 3D array containing the markdown tag contents for each (MDn) tag pair,
for each defined n, for each original MD-encoded text (before paraphrasing/summarizing). M can be
calculated in the following manner:

$$
M = \frac{1}{m} \sum_{i=1}^{m} \left[ f(v_{i,1,1}) + \frac{1}{\operatorname{len}(v_i)} \sum_{j=2}^{\operatorname{len}(v_i)} \frac{1}{\operatorname{len}(v_{i,j})} \sum_{k=1}^{\operatorname{len}(v_{i,j})} \max\bigl\{\, g(v_{i,j,k}, w_{i,j,l}) : l = 1 \ldots \operatorname{len}(w_{i,j}) \,\bigr\} \right]
\tag{3}
$$
Finally, we can combine the resulting scores to get an overall score S in the following manner:
$$
S = \frac{R + L + 2M}{4}
\tag{4}
$$
This is the overall score that will be reported in the results of the experiments.
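A sketch of this metric is shown below. The embed function, which maps a word to its vector (e.g., GloVe or spaCy word vectors), is an assumption, and f approximates the enclosure check with a prefix/suffix test; v and w are the 3D nested lists of tag contents defined above.

```python
import numpy as np

def f(text: str) -> float:
    """Enclosure check: 1 if the text is wrapped in (MD0) tags, 0 otherwise."""
    return 1.0 if text.startswith("(MD0)") and text.endswith("(MD0)") else 0.0

def g(phrase_v: str, phrase_w: str, embed) -> float:
    """Cosine similarity between the averaged word embeddings of two phrases."""
    a = np.mean([embed(t) for t in phrase_v.split()], axis=0)
    b = np.mean([embed(t) for t in phrase_w.split()], axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def markdown_similarity(v, w, embed) -> float:
    """M from equation (3); v and w are 3D nested lists of (MDn) tag contents."""
    total = 0.0
    for i in range(len(v)):
        score = f(v[i][0][0])  # enclosure check on the (MD0) entry
        inner = 0.0
        for j in range(1, len(v[i])):
            # For each tag content in the rewritten text, take its best match
            # among the tag contents of the original text.
            best = [max(g(vk, wl, embed) for wl in w[i][j]) for vk in v[i][j]]
            inner += sum(best) / len(v[i][j])
        total += score + inner / len(v[i])
    return total / len(v)

def overall_score(R: float, L: float, M: float) -> float:
    """S from equation (4)."""
    return (R + L + 2 * M) / 4
```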
5.3 Experimental details
Here are the hyperparameters that experimentally produced the best results for each task.
5.3.1 Paraphrasing
Learning Rate 1 × 10⁻⁴
Train & Val Batch Size 2
Train Epochs 5
Max Length 128
Num Beams 15
Diversity Penalty 0.99
5.3.2 Summarization
Learning Rate 1 × 10⁻⁴
Train & Val Batch Size 2
Train Epochs 3
Min Length 0
Max Length 512
Summary Max Length 142
Num Beams 4
Length Penalty 0.1
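A sketch of how these settings could map onto HuggingFace generation calls is shown below. Note that in the HuggingFace API, diversity_penalty only takes effect with grouped beam search, so the num_beam_groups value for the paraphraser is an assumption, as are the helper names.

```python
def paraphrase(model, tokenizer, md_encoded_sentence: str) -> str:
    """Diverse beam search with the fine-tuned T5 paraphraser."""
    inputs = tokenizer(md_encoded_sentence, return_tensors="pt",
                       truncation=True, max_length=128)
    ids = model.generate(inputs.input_ids, max_length=128,
                         num_beams=15, num_beam_groups=15, diversity_penalty=0.99)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

def summarize(model, tokenizer, md_encoded_part: str) -> str:
    """Beam search with the fine-tuned BART summarizer: inputs truncated to 512
    tokens, summaries capped at 142 tokens."""
    inputs = tokenizer(md_encoded_part, return_tensors="pt",
                       truncation=True, max_length=512)
    ids = model.generate(inputs.input_ids, min_length=0, max_length=142,
                         num_beams=4, length_penalty=0.1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```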
5.4 Results
The baseline charts were obtained using the non-fine-tuned models, while the best charts were
obtained using the fine-tuned models. The corresponding tables can be found in Appendix A, and
articles 1 through 10 in Appendix B.
Figure 1: Summary Baseline
Figure 2: Summary Best
Figure 3: Paraphrase Baseline
Figure 4: Paraphrase Best
The results are for the most part what I expected. By fine-tuning the models, we are able to significantly
improve the MD similarity score: markdown syntax is generally well preserved. However, for
summarization specifically, the ROUGE scores decrease, which indicates that the model has a hard
time producing the same summaries it produced when not using markdown. From human evaluation,
the summaries are still perfectly understandable, so we do not consider this a big problem, apart from
the fact that they differ slightly from the original summaries. This does not seem to be as much of an
issue for paraphrasing, where ROUGE scores remain high.
As a reminder, please note that these ROUGE scores do not compare the model's output to human-
generated text, but rather to machine-generated text with markdown omitted. Also note that to
collect the previous results, (MDn) tags were added as additional special tokens for the BART
summarization model, but not for the T5 paraphrasing model. Why adding them as special tokens
improves BART's performance but not T5's remains to be explained.
5.5 End-to-end
With the paraphraser and summarizer models fine-tuned, we can combine them to achieve a fuller
form of article rewriting. Here are the steps used to fully rewrite an article.
• Divide the article into subparts, categorizing each of these parts into paragraph vs. other
(image, code, figures, etc.).
• MD-encode the markdown in each paragraph subpart into (MDn) tags.
• Pass these MD-encoded parts into the BART fine-tuned summarizer model.
• Pass these results from BART into the T5 fine-tuned paraphrasing model.
• MD-decode the markdown in the resulting text from (MDn) tags to actual markdown.
• Combine all subparts back into a single document.
Examples of this method are available in Appendix B.
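A minimal sketch of this pipeline is shown below, reusing the md_encode/md_decode sketch from Section 4.4 and the summarize/paraphrase helpers from Section 5.3; split_into_subparts and the bart_model/t5_model objects (the fine-tuned models and their tokenizers) are assumed to exist and carry illustrative names.

```python
def rewrite_article(article_md: str) -> str:
    """End-to-end rewrite: summarize, then paraphrase, each paragraph subpart."""
    rewritten = []
    for part, is_paragraph in split_into_subparts(article_md):
        if not is_paragraph:
            # Images, code blocks, and figures are passed through untouched.
            rewritten.append(part)
            continue
        encoded, tag_map = md_encode(part)                        # markdown -> (MDn) tags
        summary = summarize(bart_model, bart_tokenizer, encoded)  # fine-tuned BART
        rewrite = paraphrase(t5_model, t5_tokenizer, summary)     # fine-tuned T5
        rewritten.append(md_decode(rewrite, tag_map))             # (MDn) tags -> markdown
    return "\n\n".join(rewritten)
```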
6 Analysis
The model fails most often when there is a lot of nested markdown, which is particularly common in
the footer of articles when referencing the author(s). Adding more fine-tuning data covering footer
text could address this issue.
Overall, the results are very promising considering the scarcity of (manually collected) data available
for proper article rewriting. Collecting more data would definitely improve performance, especially
the ROUGE scores for summarization, which is an inherently harder task than paraphrasing.
7 Conclusion
The methods developed in this paper can be used to simplify or generate new and unique article
rewrites. Collecting more data could help mitigate the slight decrease in ROUGE performance to
achieve top-grade article rewriting.
As an avenue for future work: fine-tuning BART for too many epochs results in the model producing
only (MD0), possibly because it interprets that token as an end token rather than a markdown token.
It could be worth investigating why, as fixing it could improve BART's performance at preserving
markdown syntax, which is worse than T5's paraphrasing performance.
References
[1] Edward Ross. Using HTML in NLP. Jul 2020.
[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez,
Łukasz Kaiser, and Illia Polosukhin. Attention Is All You Need. 2017.
[3] Ramsri Goutham. High-quality sentence paraphraser using Transformers in NLP. Sep 2021.
[4] Sam Shleifer. Distilbart CNN 6-6. Sep 2021.
[5] Scott Hickmann. Metrics and Datasets. Feb 2022.
[6] Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries. 2004.
[7] Yves Peirsman. Comparing Sentence Similarity Methods. May 2018.
[8] Scott Hickmann. Results. Mar 2022.
A Appendix
Here are the results in table rather than graph form. The articles can be found in Appendix B.
Article Number   Compression   ROUGE-2   ROUGE-L   MD Similarity   Overall
1 33.27% 70.65% 72.64% 4.17% 37.91%
2 17.55% 67.58% 70.05% 0.00% 34.41%
3 41.54% 74.50% 76.55% 7.69% 41.61%
4 21.41% 59.79% 64.29% 4.00% 33.02%
5 29.85% 74.56% 75.88% 0.00% 37.61%
6 27.90% 65.91% 68.70% 0.00% 33.65%
7 28.11% 74.02% 76.20% 0.00% 37.55%
8 25.32% 80.00% 83.12% 0.00% 40.78%
9 24.33% 75.99% 78.51% 0.00% 38.62%
10 31.17% 55.31% 59.61% 0.00% 28.73%
Average 28.04% 69.83% 72.56% 1.59% 36.39%
Table 1: Summary Baseline
Article Number   Compression   ROUGE-2   ROUGE-L   MD Similarity   Overall
1 45.22% 57.21% 60.63% 95.61% 77.27%
2 20.37% 35.91% 40.28% 94.74% 66.42%
3 35.30% 58.91% 62.01% 76.19% 68.32%
4 40.82% 38.74% 41.93% 82.42% 61.38%
5 31.83% 48.27% 53.62% 100.00% 75.47%
6 32.57% 34.69% 39.69% 91.23% 64.21%
7 27.05% 53.13% 57.37% 70.00% 62.63%
8 24.84% 40.66% 46.95% 100.00% 71.90%
9 24.43% 36.91% 42.86% 94.12% 67.00%
10 24.55% 29.88% 36.41% 92.86% 63.00%
Average 30.70% 43.43% 48.18% 89.72% 67.76%
Table 2: Summary Best
Article Number   Compression   ROUGE-2   ROUGE-L   MD Similarity   Overall
1 99.44% 72.37% 78.06% 69.52% 72.37%
2 95.16% 73.90% 76.39% 60.31% 67.73%
3 102.02% 70.73% 76.58% 68.35% 71.00%
4 103.71% 60.14% 68.42% 65.67% 64.97%
5 110.27% 73.54% 79.71% 84.85% 80.74%
6 97.26% 77.84% 82.48% 77.34% 78.75%
7 88.12% 67.61% 70.50% 44.73% 56.89%
8 99.98% 76.01% 80.18% 82.79% 80.44%
9 103.91% 56.65% 65.62% 81.32% 71.23%
10 101.38% 76.58% 80.79% 82.35% 80.52%
Average 100.12% 70.53% 75.87% 71.72% 72.46%
Table 3: Paraphrase Baseline
Article Number   Compression   ROUGE-2   ROUGE-L   MD Similarity   Overall
1 99.64% 70.06% 79.23% 94.64% 84.64%
2 97.24% 75.03% 79.46% 91.97% 84.61%
3 101.75% 73.05% 80.87% 87.77% 82.36%
4 102.04% 65.73% 78.92% 92.52% 82.42%
5 105.22% 73.20% 80.75% 96.97% 86.97%
6 98.12% 78.98% 84.14% 97.66% 89.61%
7 97.19% 73.03% 78.17% 70.02% 72.81%
8 98.73% 77.15% 81.39% 99.73% 89.50%
9 99.71% 60.12% 77.63% 98.90% 83.89%
10 98.40% 70.34% 81.70% 98.47% 87.25%
Average 99.80% 71.67% 80.22% 92.87% 84.41%
Table 4: Paraphrase Best
B Appendix
The 10 original articles used to evaluate the model can be found here [5]. Their summarized-only,
paraphrased-only, and summarized-then-paraphrased (end-to-end) versions can be found here [8]
under summarizer, paraphraser, and rewriter respectively.