Transforming Text and Image Processing with Gemini AI

Akriti Upadhyay · Published in Accredian on Medium · 15 min read · Dec 2023

Introduction

Gemini is a powerful tool for text and image processing through multimodal prompting. In text processing, it generates creative responses to prompts, from stories to poetry. For image processing, it can create visual representations from text prompts, enabling custom artwork and conceptual visualization. By combining these capabilities, Gemini excels at complex tasks such as generating stories that connect text and images or writing descriptive captions for images.

You can explore more about Gemini in my previous article. Let's begin the journey!

Multimodal Prompting

Multimodal prompting uses diverse inputs such as text, images, audio, and video to prompt a multimodal model that can process several modalities at once. This approach enables complex reasoning and communication across modalities, opening up new applications. Designing effective prompts remains a challenge; common methods include keywords, templates, examples, and natural-language instructions.

    Question: Which property do these two objects have in common?
    Context: Select the better answer.
    Options: (A) soft (B) salty
    Rationale: Look at each object.
    For each object, decide if it has that property. Potato chips have a salty taste. Both objects are salty. A soft object changes shape when you squeeze it. The fries are soft, but the cracker is not. The property that both objects have in common is salty.
    Answer: The answer is (B).

Figure 1. Example of the multimodal CoT task.

Recent advances, such as Lee et al.'s method for handling missing modalities and Chen's demonstration of Gemini on spatial reasoning and logic puzzles, highlight ongoing innovation. Khattak et al. introduced MaPLe, a technique for learning synergistic behavior across the vision and language modalities in multiple transformer blocks.

Model Parameters

As you experiment with your prompt prototype, you can adjust the model run settings shown on the right side of the application. Familiarize yourself with these key settings:

1. Model: Choose the model that best suits your prompt-response requirements.
2. Temperature: Controls the level of randomness in the model's responses. Increasing this value lets the model produce more unexpected and imaginative outputs.
3. Max outputs: Increases the number of responses the model generates for each request. This is useful for quickly testing prompts, since you get multiple responses to a single prompt.
4. Safety settings: Customize the safety settings that govern the model's responses. The Gemini API provides adjustable safety settings across four dimensions, so you can quickly assess whether your application needs a more or less restrictive configuration.

Getting API Key

Before implementing any code, you'll need an API key. Visit Google AI Studio, click Create API key, then copy and save the key for later use.
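Wherever you run your code, it is safer to load the key at runtime than to paste it into the notebook. The sketch below is a hypothetical helper (`load_api_key` is our own name, not part of any library) and assumes the key is saved under the name GOOGLE_API_KEY: it tries Colab Secrets first through `google.colab.userdata.get`, which is only importable inside a Colab runtime, and otherwise falls back to an environment variable with the same name.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Hypothetical helper: return the API key stored under `name`.

    Assumes the key is saved either as a Colab secret or as an
    environment variable with the same name.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        # Inside Colab: read the secret (Notebook access must be enabled).
        return userdata.get(name)

    # Outside Colab: read the key from the environment instead.
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"No API key found: set the {name} environment variable.")
    return key
```

With the key loaded, `genai.configure(api_key=load_api_key())` configures the client in either environment. Raising an error early, instead of passing `None` on to the client, makes a missing key easy to diagnose.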
Next, go to Google Colab and create a notebook. Go to Secrets > Add new secret.

[Screenshot: the Colab Secrets panel. It explains that secrets store environment variables, file paths, or keys; that values are private and visible only to you and the notebooks you select; that secret names cannot contain spaces; and that secrets are read in Python with from google.colab import userdata and userdata.get('secretName').]

Name the secret, paste the copied key into Value, and toggle on Notebook access. Use this same name when you read the key in your code. Let's get started with the code!

Code Implementation with Gemini

First, install the required library.

    !pip install -q -U google-generativeai

Import the packages you are going to use.

    import pathlib
    import textwrap

    import google.generativeai as genai
    from google.colab import userdata

    from IPython.display import display
    from IPython.display import Markdown

    import PIL.Image
    import google.ai.generativelanguage as glm

Write a helper function that renders text as Markdown.
    def to_markdown(text):
        # Turn bullet characters into Markdown list items
        text = text.replace('•', '  *')
        # Prefix every line with '> ' so the whole response renders as a blockquote
        return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

Pass the API key using the same name you saved in your Google Colab Secrets.

    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)

After configuring the API key, let's list the models that support content generation.

    for m in genai.list_models():
        if 'generateContent' in m.supported_generation_methods:
            print(m.name)

You'll see the following list:

    models/gemini-pro
    models/gemini-pro-vision

Generate text from text inputs

Let's start with the first model, which handles text-only prompts.

    model = genai.GenerativeModel('gemini-pro')

Pass a question and generate content. Let's also measure how long it takes.

    %%time
    response = model.generate_content("What is Markov Chain Rule")

The timing result:

    CPU times: user 199 ms, sys: 18.9 ms, total: 218 ms
    Wall time: 10.2 s

To read the answer, pass the response text to the helper function.

    to_markdown(response.text)

Here is the answer:

> The Markov Chain Rule, also known as the Chapman-Kolmogorov equations, is a fundamental theorem in probability theory that describes the behavior of stochastic processes over time. It states that the probability of transitioning from a given state to another state in a Markov chain depends only on the current state and the transition probabilities, and is independent of the history of the process.
>
> Mathematically, the Markov Chain Rule can be expressed as:
>
> $$P(X_n = x_n \mid X_0 = x_0, X_1 = x_1, \ldots, X_{n-1} = x_{n-1}) = P(X_n = x_n \mid X_{n-1} = x_{n-1})$$
>
> where:
>
> * $X_n$ represents the state of the Markov chain at time $n$.
> * $x_n$ is a particular value of the state at time $n$.
> * $P(X_n = x_n \mid X_0 = x_0, X_1 = x_1, \ldots, X_{n-1} = x_{n-1})$ is the probability of the Markov chain being in state $x_n$ at time $n$, given that it started in state $x_0$ at time 0 and visited states $x_1, x_2, \ldots, x_{n-1}$ in between.
> * $P(X_n = x_n \mid X_{n-1} = x_{n-1})$ is the transition probability of moving from state $x_{n-1}$ to state $x_n$ in one step.
>
> The Markov Chain Rule is derived from the assumption that the Markov chain is memoryless, meaning that the future behavior of the process depends only on the present state, and not on the past states. This assumption makes Markov chains a powerful tool for modeling a wide range of real-world phenomena, such as stock market fluctuations, disease transmission, and customer behavior.
>
> The Markov Chain Rule is useful for a variety of applications, including:
>
> * Predicting the future state of a system based on its current state.
> * Calculating the probability of reaching a particular state within a given number of steps.
> * Simulating the behavior of a Markov chain to generate synthetic data.
>
> In addition to its theoretical importance, the Markov Chain Rule has many practical applications in various fields, including:
>
> * Finance: Markov chains are used to model stock prices, interest rates, and other financial variables.
> * Operations research: Markov chains are used to analyze queuing systems, inventory management, and other logistical processes.
> * Biology: Markov chains are used to model the spread of diseases, the evolution of populations, and the behavior of biological systems.
> * Computer science: Markov chains are used to model the behavior of computer networks, speech recognition systems, and other complex systems.
>
> Overall, the Markov Chain Rule is a fundamental tool in probability theory and has wide-ranging applications in a variety of fields.

Let's check the safety ratings through prompt_feedback.

    response.prompt_feedback

The result:

    safety_ratings {
      category: HARM_CATEGORY_SEXUALLY_EXPLICIT
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_HATE_SPEECH
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_HARASSMENT
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_DANGEROUS_CONTENT
      probability: NEGLIGIBLE
    }

Gemini can produce several potential responses for a given prompt. These responses, called candidates, give you a range of options: you can assess them and choose the most fitting one as the final response.

    response.candidates

The result:

    [content {
      parts {
        text: "The Markov Chain Rule, also known as the Chapman-Kolmogorov equations ..."
      }
      role: "model"
    }
    finish_reason: STOP
    index: 0
    safety_ratings {
      category: HARM_CATEGORY_SEXUALLY_EXPLICIT
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_HATE_SPEECH
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_HARASSMENT
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_DANGEROUS_CONTENT
      probability: NEGLIGIBLE
    }
    ]

You can also stream the response as it is being generated: the model returns chunks of the response as soon as they are produced. Let's ask another question.

    %%time
    response = model.generate_content("What is sparse vector?", stream=True)

The timing result:

    CPU times: user 189 ms, sys: 18.8 ms, total: 208 ms
    Wall time: 10.7 s

Let's print the response chunk by chunk.

    for chunk in response:
        print(chunk.text)
        print("_" * 80)

The result:

    A sparse vector is a data structure that represents a vector with most of its el…
    set to zero. Sparse vectors are often used to represent high-dimensional vector…
    The key advantage of
    sparse vectors is that they can be stored and processed more efficiently than d…
    is stored as three arrays:

    * **Values:** This array stores the nonzero elements of the vector.
    * **Column indices:** This array stores the column indices of the nonzero elemen…
    * **Row pointers:** This array stores the starting index of each row in the valu…

    For example, consider the following sparse vector:

    x = [0, 3, 0, 0, 7, 0, 0, 0, 11]

    This vector has three nonzero elements: 3, 7, and 11. In CSR format, this vector can be represented as follows:

    values = [3, 7, 11]
    column_indices = [1, 4, 8]
    row_pointers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    The values array stores the three nonzero elements of the vector.
    The column ind…
    Sparse vectors can be used to represent a wide variety of data, including:

    * **Documents:** A document can be represented as a sparse vector of word counts…
    * **Images:** An image can be represented as a sparse vector of pixel values.
    * **Graphs:** A graph can be represented as a sparse vector of edge weights.

    Sparse vectors are a powerful tool for representing and processing high-dimensio…

    * **Machine learning:** Sparse vectors are used in many machine learning algorit…
    * **Data mining:** Sparse vectors are used in data mining tasks, such as clustering, classification, and anomaly dete…
    * **Signal processing:** Sparse vectors are used in signal processing applicatio…

Let's check its safety ratings.

    response = model.generate_content("What is sparse vector?", stream=True)
    response.prompt_feedback

The result:

    safety_ratings {
      category: HARM_CATEGORY_SEXUALLY_EXPLICIT
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_HATE_SPEECH
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_HARASSMENT
      probability: NEGLIGIBLE
    }
    safety_ratings {
      category: HARM_CATEGORY_DANGEROUS_CONTENT
      probability: NEGLIGIBLE
    }

Generate text from image and text inputs

Let's see what the model can do with image inputs. First, download the image.

    !curl -o image.jpg https://i.pinimg.com/736x/d0/0b/f8/d00bf86933543a764c971cca78

You'll see the download progress:
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 82583  100 82583    0     0   292k      0 --:--:-- --:--:-- --:--:--  293k

Open the image with Pillow and store it in img.

    img = PIL.Image.open('image.jpg')
    img

Running the cell displays the image.

For image inputs, we'll use the gemini-pro-vision model.

    model = genai.GenerativeModel('gemini-pro-vision')

Let's generate content from the image alone.

    response = model.generate_content(img)
    to_markdown(response.text)

The result:

> This image is of two kittens in a red wagon. The kittens are both orange and white, and they are looking at the camera. The wagon is sitting on a sandy surface, and the background is a blurry orange.

We can also pass text along with the image. Let's see the response.

    response = model.generate_content(["Write a short, engaging blog post based on t…
    response.resolve()
    to_markdown(response.text)

The result:

> Cats are one of the most popular pets in the world, and for good reason. They ar…
>
> There are many benefits to having a cat as a pet. Cats can provide companionship…
>
> Here are some of the reasons why cats are so friendly with humans:
>
> They are curious and playful. Cats are naturally curious creatures, and they lov…
>
> They are affectionate.
> Cats are very affectionate creatures, and they love to be…
>
> They are independent. Cats are very independent creatures, and they do not requi…
>
> If you are looking for a loving, loyal, and independent companion, a cat may be…

Chat Conversations

Gemini lets you hold open-ended conversations that span multiple turns. The ChatSession class simplifies this by managing the conversation state: unlike with generate_content, you don't need to store the conversation history as a list yourself. Let's start a conversation:

    model = genai.GenerativeModel('gemini-pro')
    chat = model.start_chat(history=[])
    chat

Let's send the first question.

    response = chat.send_message("In one sentence, explain the big bang theory for a 8 year old child.")
    to_markdown(response.text)

The answer:

> In the beginning, there was nothing, and then, with a big bang, all the stars and planets in the universe were made.

Send the second question.

    response = chat.send_message("In one sentence, explain the solar system for a 8 year old child.")
    to_markdown(response.text)

The answer:

> Our solar system is like a big family of planets, moons, and other objects that all orbit around the Sun, which is like the mom or dad of the family.

With chat.history, we can see the history of both questions and their responses.
    chat.history

The history:

    [parts {
      text: "In one sentence, explain the big bang theory for a 8 year old child."
    }
    role: "user",
    parts {
      text: "In the beginning, there was nothing, and then, with a big bang, all th…"
    }
    role: "model",
    parts {
      text: "In one sentence, explain the solar system for a 8 year old child."
    }
    role: "user",
    parts {
      text: "Our solar system is like a big family of planets, moons, and other obj…"
    }
    role: "model"]

Let's send another question and display the response in chunks.

    response = chat.send_message("Okay, how about a more detailed explanation to a high schooler?", stream=True)

    for chunk in response:
        print(chunk.text)
        print("_" * 80)

The result:

    The solar system, our cosmic neighborhood, consists of the Sun, eight planets,
    dwarf planets, moons, asteroids, comets, and meteoroids. The Sun, a massive sph…
    on the entire system. The planets, including Mercury, Venus, Earth, Mars, Jupit…
    too large to be classified as asteroids but do not meet the criteria to be cons…

Let's display the chat history in a readable format.

    for message in chat.history:
        display(to_markdown(f'**{message.role}**: {message.parts[0].text}'))

The result:

    user: In one sentence, explain the big bang theory for a 8 year old child.
    model: In the beginning, there was nothing, and then, with a big bang, all the s…
    user: In one sentence, explain the solar system for a 8 year old child.
    model: Our solar system is like a big family of planets, moons, and other object…
    user: Okay, how about a more detailed explanation to a high schooler?
    model: The solar system, our cosmic neighborhood, consists of the Sun, eight pla…

Isn't this interesting? Try it yourself and explore more Gemini use cases. Have fun!

Conclusion

In conclusion, Gemini's application in text and image processing represents an intriguing blend of creativity and technology. Its ability to generate contextually relevant text and to create images from descriptions broadens the scope of automated content creation and visual representation. Welcome to the Gemini era!

The synergy between text and image processing in Gemini enhances user experiences and unlocks diverse applications such as content generation, data visualization, and educational tools. Gemini's untapped potential in these domains makes it an exciting frontier for exploration and innovation.