Test →

Using any open-source LLM, build an AI model (RAG, autoregressive, etc.) that takes in
an Excel sheet and fills in certain cells based on data found in a corpus of factual data.
You can convert the Excel sheet to a CSV file for processing and then back to an output
Excel file.

The corpus of data can be accessed through this MongoDB connection URI:

mongodb+srv://intern:JeUDstYbGTSczN4r@interntest.i7decv0.mongodb.net/

The input Excel file can be found at:

Genoshi Intern Test - Input Excel Sheet

Your code should take in this input Excel sheet and fill in every cell for which the
data is available in the corpus. The information required for a cell is described by
the row and column headings that cell falls under. This assignment combines structured
data retrieval with AI-based document querying.
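
The cell-filling loop described above can be sketched roughly as follows. This is a minimal illustration working on the CSV form of the sheet, not a reference solution: the `fill_grid` function, the `Lookup` signature, and the toy `FACTS` table are all hypothetical names introduced here, and the sketch assumes the first row holds column headings and the first column holds row headings. The real lookup would be the retrieval and LLM step.

```python
import csv
import io
from typing import Callable, Optional

# Hypothetical signature: given a row heading and a column heading, return the
# value found in the corpus, or None when the corpus has no supporting data.
Lookup = Callable[[str, str], Optional[str]]


def fill_grid(csv_text: str, lookup: Lookup) -> str:
    """Fill empty cells of a CSV grid; cells without data stay blank."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    header = rows[0]
    for row in rows[1:]:
        if not row:
            continue  # skip completely empty lines
        # Pad short rows so every column heading has a cell.
        row.extend([""] * (len(header) - len(row)))
        row_heading = row[0]
        for col in range(1, len(header)):
            if row[col].strip():
                continue  # keep values that are already present
            answer = lookup(row_heading, header[col])
            row[col] = answer if answer is not None else ""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Toy stand-in for the real retrieval step (assumed data, for illustration only).
FACTS = {("Paper A", "Year"): "2021"}


def toy_lookup(row_heading: str, col_heading: str) -> Optional[str]:
    return FACTS.get((row_heading, col_heading))


SAMPLE = "Title,Year,Venue\nPaper A,,\nPaper B,,\n"
FILLED = fill_grid(SAMPLE, toy_lookup)
```

Keeping the lookup behind a single function that may return `None` makes the "leave it blank" rule a property of the loop itself, so a missing answer never has to be guessed.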

Notes →

● You can use MongoDB Compass to view all the data in the intern/papers
collection, which is the data you have to work with for the AI model.
● For some row/column pairs, the required data does not exist in the corpus of
papers. In this case, the model should leave that cell blank.
● The model doesn’t have to be perfectly correct. Accuracy of the submission is
just one of many judging criteria. The criteria include design thinking, speed,
efficiency, model selection and logic, and the overall answer.
● The output Excel file generated by the model must be submitted in the relevant
docs section of the intern test form.
● Hallucinations and false data outputs will be marked down heavily.
● "Open-source LLM" means no OpenAI models, no Claude.ai, etc. Models can be
picked up from Hugging Face or other forums.
● You can clone the database to your own MongoDB Atlas cluster (free tier M0)
if you want to add data fields, such as embeddings, that the model needs in
order to work.
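
One way to combine the notes on blank cells, hallucinations, and stored embeddings is to accept a retrieved document only when its similarity to the query clears a threshold, and otherwise return nothing so the cell stays empty. The sketch below is an assumption-laden illustration: the `embedding` field name, the `best_match` helper, the two-dimensional toy vectors, and the 0.75 cutoff are all hypothetical. In practice the vectors would come from an open-source embedding model and the cutoff would need tuning.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    query_vec: Sequence[float],
    docs: Sequence[dict],
    min_score: float = 0.75,  # assumed cutoff, not taken from the brief
) -> Optional[dict]:
    """Return the most similar document, or None when nothing clears min_score."""
    best, best_score = None, min_score
    for doc in docs:
        # "embedding" is an assumed field added to a cloned copy of the data.
        score = cosine(query_vec, doc["embedding"])
        if score >= best_score:
            best, best_score = doc, score
    return best


# Toy documents with made-up embeddings, for illustration only.
DOCS = [
    {"title": "A", "embedding": [1.0, 0.0]},
    {"title": "B", "embedding": [0.0, 1.0]},
]

confident = best_match([0.9, 0.1], DOCS)  # close to document A
uncertain = best_match([1.0, 1.0], DOCS)  # equally far from both, below cutoff
```

Returning `None` for a weak match gives the caller an explicit signal to leave the cell blank instead of passing marginal context to the LLM.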

Test Submission Form: https://forms.gle/VD7d48GBXr5mb6mx7


Responsibilities:
Systems Design and Architecture:

● Lead the conceptualization and design of a robust and scalable backend system
for document OCR, summarization, and RAG processing.
● Develop a comprehensive architectural vision for AI systems, ensuring security,
scalability, and performance requirements are met.
● Optimize and develop an efficient backend client-facing API system.

AI Development and Implementation:

● Utilize LangChain and Transformers to build production-level RAG and AI
systems.
● Implement and optimize OCR and summarization algorithms for efficient
document processing.

Tech Stack:

● Proficiency in Python, FastAPI, LangChain, MongoDB, and Transformers.
● Experience with GCP and AWS for computing resources.

Innovation and Ideation:

● Drive innovation in AI technologies, staying abreast of the latest research and
trends.
● Ideate and implement creative solutions for backend systems, contributing to the
evolution of our product.
