Information

Paid Solutions:
https://ironpdf.com/blog/using-ironpdf/csharp-extract-text-from-pdf/
https://tika.apache.org/
(Apache Tika itself is open source and free to use)
https://api.inda.ai/hr/docs/v2/#tag/Resume-Parsing/operation/text_extraction__POST
(Open API; it may be limited. I am not 100% sure, because it
needs to be tested by a developer.)

Projects done:
https://github.com/KevM/tikaondotnet

Free Solutions:
https://stackoverflow.com/questions/1011234/how-to-extract-text-from-ms-office-documents-in-c-sharp
https://docx2txt.sourceforge.net/
https://stackoverflow.com/questions/10982156/how-to-extract-text-from-pdf-word-and-excel-documents

https://www.grapecity.com/blogs/how-to-programmatically-extract-data-from-pdf-using-net-c-sharp
(it shows samples)
https://blog.aspose.com/2020/05/16/extract-text-from-pdf-csharp-vb-net/

"…I would recommend Aspose Total for this. A few years ago I did a project doing
pretty much exactly what you are asking, and compared to using the Office Interop stuff
between different versions of Office (prior to the change to XML), Aspose was the most
robust library. You will probably have to do some OCR too, based on what you are talking
about. It's not cheap, but I found their APIs pretty solid, and it works on most versions
of the file types you are asking about. You should be able to use the free trial to see if it
will fit your project. I have no affiliation with Aspose other than that I used their tools
in a production environment…"

Aspose Total
https://purchase.aspose.com/pricing/total/net
https://www.aspsnippets.com/Articles/Convert-Image-to-Text-in-ASPNet-using-C-and-VBNet.aspx
