You are on page 1of 26
ACKNOWLEDGEMENT The satisfaction that accompanies the successful completion of any task would be incomplete without the mention of people who made it possible and whose constant guidance and encouragement crown all the efforts with success, We would like to extend our heartfelt gratitude and a sincere thanks to Prof. PS Avadhani, Professor, Department of CSE, GVPCEW, former Director IIT Agartala, former Principal AU College of Engineering(A), Andhra University Visakhapatnam, Senior Professor Department of CSSE College of Engincering(A), Andhra University , for his valuable guidance and providing necessary help whenever needed, We feel elated to extend our sincere gratitude to Mr.P.Siva, Assistant Professor for encouragement all the way during analysis of the project. His annotations, insinuations and criticisms are the key behind the successful completion of the thesis and for providing us all the required facilities. We express our deep sense of gratitude and thanks to Dr.Dwiti Krishna Bebarta, Professor and Head of the Department of Computer Science and Engineering for her guidance and for expressing her valuable and grateful opinions in the project for its development and for providing lab sessions and extra hours to complete the project. We would like to take this opportunity to express our profound sense of gratitude to Vice Principal, Dr. G. Sudheer for allowing us to utilize the college resources thereby facilitating the successful completion of our project. We are also thankful to both teaching and non-teaching faculty of the Department of ‘Computer Science and Engineering for giving valuable suggestions for our project. We would like to take the opportunity to express our profound sense of gratitude to the revered Principal, Dr. R. K. Goswami for all the help and support towards the successful completion of our project, TABLE OF CONTENTS TOPICS Abstract 1. INTRODUCTION 1.1 Problem Statement 2 Technology Stack 1) Frameworks Used Uf Packages Used II) Datasets/Database Used IV) Software Requirement Specification \V)Hardware Requirement Specification 3. SYSTEM DESIGN 8. 10 3.1 Introduction 3.2 UML diagrams 3.2.1 Class diagram 3.2.2 Use case diagram 3.2.3 Sequence diagram 3.2.4 Activity diagram METHODOLOGY 4,1 Model Architecture Diagram 4.2. Modules Descriptions 4.3 Techniques/Algorithms Used IMPLEMENTATION AND CODE WITH COMMENTS RESULT / ANALYSIS 6.1 Output Screen CONCLUSION FUTURE SCOPE REFERENCES [ IN IEEE FORMAT ] Certificate Photo in Colour print PAGENO. u 20 2 24 25, ABSTRACT The manual analysis of meeting contributions is a time-consuming and inefficient process that hinders effective collaboration and decision-making within organizations. This problem necessitates the development of an automated solution: a Meeting Contribution Auto Analyzer, This analyzer would streamline the process of extracting, analyzing, and categorizing meeting contributions in real-time, providing accurate and valuable insights. By automating this task, organizations can enhance productivity, ‘enable data-driven decision-making, and identify trends and patterns within meetings. The absence of an efficient auto analyzer limits the potential for improvement and impedes the optimization of meeting effectiveness. This abstract highlights the need for a reliable and efficient Meeting Contribution Auto Analyzer to overcome these challenges and unlock the benefits of streamlined analysis and enhanced collaboration Introduction 1.1 Problem Statement The problem at hand is the inefficiency and lack of productivity in the manual analysis of meeting contributions. Currently, organizations rely on human efforts to review and analyze meeting discussions, notes, and actions, leading to time-consuming processes and potential errors, There is a need for an automated Meeting Contribution Auto Analyzer that can effectively extract, analyze, and categorize ‘meeting contributions in real-time, providing accurate and valuable insights. ‘The absence of such a solution not only hampers decision-making processes but also limits the ability to identify trends, pattems, and areas for improvement within meetings. Therefore, there is a pressing need for a reliable and efficient Meeting Contribution Auto Analyzer to enhance productivity, facilitate data-driven decision-making, and optimize the overall effectiveness of meetings In today's fast-paced business environment, effective meetings play a crucial role in decision-making and driving organizational progress. However, it can be challenging for meeting organizers and stakeholders to assess the level of engagement and contributions from individual participants Our problem is the absence of an automated contribution analyzer for meetings, Without such a tool, organizations face the following challenges: consuming Manual Analysis, + Inefficient Progress Tracking 1.2 Objective In the era of virtual communication and collaborative platforms, the dynamics of online meetings play a crucial role in effective information exchange. The project aims to revolutionize the understanding of online interactions by introducing a comprehensive system that analyzes audio inputs from virtual meetings. This innovative system focuses on two fundamental aspects ~ quantifying individual participation through speech analysis and converting spoken content into text format. Project Objectives: 1. Speech Analysis - Accurately measure the speaking duration of each participant during online meetings. - Provide insights into the distribution of speaking time, fostering equitable participation. 2. Textual Output: - Convert audio content into textual format for better documentation and reference. - Facilitate keyword extraction to identify key points and topics discussed. Significance: This project addres settings. By automating the process of tracking individual speaking time and transcribing spoken content, the system enhances meeting productivity and accountability. Additionally, the ability to extract meaningful insights from meetings can contribute to improved decision-making and collaborative outcomes, es the growing need for efficient meeting analysis, particularly in remote work 2. Technology Stack 2.1 Frameworks used Streamlit: It is an open-source Python library that simplifies the process of creating web applications for data science and machine learning. It allows you to tum data scripts into shareable web apps with minimal effort, using a simple Python script. It is known for its ease of use, as it automates many Ya aspects of web app development, such as layout and interactivity, enabling users to focus. Streamlit on data analysis and visualization 2.2 Packages Used Pandas: A data analysis library for Python that provides tools for data manipulation, cleaning, transformation, and visualization. It is used for processing and analyzing tabular data within te Pandas the web application. NumPy: A numerical operations library for Python that provides efficient operations on arrays. It is used_for performing numerical computations and calculations within the web jymPy application, Matplotlib: A plotting library for Python that provides tools for creating matplstlib visualizations. It is used for generating charts, graphs, and plots to display data insights within the web application NLTK: A natural language processing (NLP) library for Python that provides tools for understanding the meaning and context of text content within the web application, tokenization, tagging, parsing, and sentiment analys Keres: Is an open-source high-level neural networks API written in Python. It serves as an interface for the TensorFlow library, making it easier to design, build, and train deep leaning models. Keras provides a user-friendly and modular approach to constructing neural networks, facilitating both beginners and experienced researchers in the field of machine learning. TensorFlow: Is an open-source machine learning framework developed by the Google Brain 9p team. It allows for the creation and training of machine learning models, particularly deep TensorFlow neural networks. TensorFlow supports a variety of platforms and devices, making it versatile for tasks such as image and speech recognition, natural language processing, and more. Pydub: Is a Python library for audio processing. It simplifies tasks such as reading, Pydub ‘manipulating, and exporting audio files. Pydub supports various audio formats and provides a straightforward interface for common audio operations like concatenation, splitting, and applying effects. Librosa: Is a Python package designed for music and audio analysis. It provides tools for rip feature extraction, time-series analysis, and visualization of audio data. Librosa is widely ee used in applications such as music information retrieval, audio classification, and other tasks related to the analysis of sound. Speech Recognition: Is a Python library that allows developers to recognize speech and convert it into text. It provides a simple interface to work with different automatic speech recognition (ASR) engines. This library is useful for applications involving voice commands, transcription services, and other scenarios where converting spoken language to text is necessary. Count Vectorizer: Is a technique in natural language process 1g (NLP) for converting a collection of text documents to a matrix of token counts, Each document is represented as a vector, and the value in each position of the vector corresponds to the frequency of a specific word in the document. Count Vectorizer is commonly used in text analysis, machine learning, and information retrieval tasks. HTML: The Hyper Text Markup Language that defines the structure and content HTML css CSS: Cascading Style Sheets that control the appearance of web pages. It is used for styling the HTML elements and components within the web application. 2.3 Datasets/Database Used ‘The dataset for the Meeting Contribution AutoAnalyzer project is centered around audio files exclusively in the WAV format. These audio files form the bedrock of the system's learning process, providing the necessary material for training and evaluating the model's performance. The dataset is carefully curated to encapsulate a diverse array of speakers, ensuring that the model learns from a rich tapestry of voices, accents, and communication styles. The inclusion of various meeting scenarios, each reflecting distinct lengths, background noises, and conversational dynamics, enriches the dataset to foster adaptability to real- world conditions. Every audio file within the dataset is meticulously annotated with metadata, including speaker identities and timestamps, crucial for the supervised learning phases during model training, Contributing to this dataset involves the submission of audio contributions in the WAV format under the standardized file name “audio.way." Additionally, contributors are encouraged to provide supplementary information such as speaker identities and contextual details, enhancing the dataset's depth and relevance. Itis imperative that contributors adhere to privacy regulations, obtaining explicit consent for sharing audio data. In embracing these guidelines, contributors actively contribute to the dataset's comprehensiveness, thus playing a pivotal role in shaping the effectiveness of the Meeting Contribution AutoAnalyzer. 2.4 Software Requirement Specifications * Operating System: Windows, macOS, or Linux + Python: Version 3.6 or higher + Pandas: Version 1.0 or higher © NumPy: Version 1.19 or higher ‘© Matplotiib: Version 3.4 or higher + TensorFlow: Version 2.0 or higher * Scikit-leam: Version 0.24 or higher + NLTK: Version 3.7 or higher * HTMLS * css3 2.5 Hardware Requirement Specifications * Processor: Intel Core iS or equivalent + RAM: 8GB or more * Storage: S00GB or more * Graphics Processing Unit (GPU): project involves heavy computations, specify whether it's optimized for GPUs. 3. System Design 3.1 Introduction It sophisticated system designed to revolutionize the way we comprehend and analyze online meetings. At its core, the system employs state-of-the-art audio processing and machine learning techniques to scrutinize audio contributions within a meeting, offering insights into each participant's speaking patterns and textual contributions. The primary aim is to enhance meeting comprehension by providing a comprehensive overview of individual participation. This innovative tool is not only a testament to advancements in audio analytics but also a response to the growing reliance on virtual communication platforms. As businesses and teams increasingly operate in remote or hybrid settings, understanding the dynamics of online meetings becomes paramount. The Meeting Contribution AutoAnalyzer seeks to empower users with a muanced understanding of meeting interactions, ultimately fostering more effective and engaging virtual collaborations. 3.2 UML Diagrams 3.2.1 Usecase Diagram ‘Online Meeting Analyzer ~ & Ma Fig.3.1.1 Use Case Diagram 3.2.2 Class Diagram Fig.3.2.2 Class Diagram 3.2.3 Sequence Diagram 2.3 Sequence Diagram 3.2.4 Activity Diagram User \ @ Fig.3.2.4 Activity Diagram 4, Methodology 4.1. Architecture Diagram srareuenr =P coctecton “DP vaconocessnc—P" setecrion Y MODEL BEPLOMIENT AE onytzation —E caLcutarion —€"_oeTECON Fig 4.1 4.2, Modules, Descriptions 1.Data Colleetion: In this module, raw data is gathered from various sources relevant to the problem domain. This could include databases, APIs, files, or any other data repositories Key Responsibilities: Identify relevant data sources. Extract data using appropriate methods, Ensure data integrity and quality. 2. Data Preprocessing: This module involves cleaning and transforming raw data into a suitable format for analysis. It addresses issues like missing values, outliers, and ensures data consistency. Key Responsibilities: Handle missing or inconsistent data. Perform data normalization and scaling. Encode categorical variables. Detect and handle outliers 3. Model Selection: In this module, the appropriate machine learning or statistical model is chosen based on the nature of the problem, data characteristics, and project goals. Key Responsibilities: Understand the problem requirements, Evaluate various models for suitability. Consider factors like model complexity and interpretability. 4, Model Training: The selected model is trained on the preprocessed data to lear pattems and relationships, making it capable of making predictions or classifications. Key Responsibilities: Split data into training and validation sets Train the model using the training set. Validate and fine-tune model parameters. 5, Voice Detection: This module focuses on identifying and extracting voice signals from audio data. It is especially relevant in applications involving speech processing. Key Responsibilities: Implement voice activity detection algorithms, Extract relevant features from audio data. Differentiate voice from background noise. 6. Length Calculation: This module calculates the length or duration of specific elements, such as audio segments or time intervals, based on the requirements of the project. Key Responsibilities: Implement algorithms to measure length, Provide accurate duration calculations. 7. Model Optimization: Optimization techniques are applied to enhance the performance and efficiency of the trained model. This includes fine-tuning parameters and optimizing for speed or resource utilization. Key Responsibilities: - Fine-tune model hyperparameters. - Optimize model architecture, 8, Deployment: This module involves deploying the trained model and associated components to a production environment, making it accessible for real-world use. Key Responsibilities: Integrate the model into a production system. Monitor and maintain the deployed model. 4,3. Techniques/Algorithms Used 1.Load and Preprocess Audio: ~ Load the mixed audio file using Librosa (librosa.load), - Visualize the raw audio and trim unnecessary portions (librosa.effects.trim), - Display raw and trimmed audio plots, 2. Spectral Analysis: - Compute the Short-Time Fourier Transform (STFT) using Librosa (librosa.stf), - Convert the STFT to a Mel spectrogram (librosa.feature, melspectrogram). - Visualize the spectrogram and Mel spectrogram. 3. Speaker Recognition Model Training: - Prepare training data by extracting features from individual speaker audio files. - Normalize input and define a Convolutional Neural Network (CNN) model using Keras. 4. Combined Audio Speaker Identification: - Load the trained speaker recognition model, ~ Process the combined audio file in segments, - Apply Mel spectrogram transformations to each segment. - Predict the speaker using the trained model and K-Means clustering, 5, Speech to Text Transcription: = Use SpeechRecognition library to transcribe the audio to text - Tokenize the transcribed text into sentences. 6. Text Summarization: - Apply CountVectorizer and TF-IDF Transformer to calculate sentence scores. - Generate a summary by selecting the top-ranked sentences based on TF-IDF scores. 7, Evaluation and Refinement: Ifa reference summary is available, evaluate main points and generated summary using ROU - Optionally, refine the algorithm based on the evaluation results. 8. Display Results: - Display speaker durations in the combined audio. - Print the transcribed text, generated summary, and evaluation metries (if applicable), 10 5. Implementation and Code import pandas as pd import numpy as np import matplotlib.pylab as plt import seaborn as sns import librosa import librosa.display import IPython. display as ipd from itertools import eycle sns.set_theme(style="white”, palette=None) color_pal = plt.reParams[""axes.prop_cycle"].by_key()["color"] color_cycle = cycle(plt.reParams{"axes.prop_cycle").by_key()["color")) y, st= librosa.load(Mixed.wav') print(f'shape y: {y.shape}") print(f'sr: {st}") pd Series(y).plot(figsize=(10, 5), wl title"Raw Audio Example’, colorcolor_palf0]) plt.show0) y_trimmed, _ = librosa.effects.trimty, top_db=20) pd Series(y_trimmed).plot(figsize=(10, 5), Iw-l, titleRaw Audio Trimmed’, color=color_palf1}) pltshow(, pd Series(y[30000:30500)) plot(figsize=(10, 5), Raw Audio Zoomed’, color-color_pal[2]) pltshow(, D =librosa.stft(y) S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max) S_db.shape fig, ax — pltsubplots(figsize-(10, 5)) img = librosa.display.speeshow(S_db, x_axis='time’, y_axis='log’, ax=ax) ax.set_title(‘Spectogram’, fontsize-20) fig.colorbar(img, ax=ax, format=P%0.2) plt.show() S = librosa feature melspectrogram(y=y, sist, n_mels=128 *2,) a ibrosa.amplitude_to_db(S, ref-np.max) It. subplots(figsize(10, 5)) librosa.display.specshow(S_db_mel, ax.set_title(’Mel Spectogram Example’, fontsize=20) fig.colorbar(img, ax=ax, format=P%0.2) pltshow(, import librosa audio_path ="B1.wav’ duration_seconds = librosa.get_duration(filename=audio_path) duration_minutes = duration_seconds / 60 print(duration_minutes) audio_path = 'B1.wav’ duration_seconds = librosa.get_duration(filename=audio_path) duration_milliseconds = int(duration_seconds * 1000) print(duration_seconds) import librosa # Load the audio file audio_file = 'Mixed.wav' audio_data, st = librosa.load(audio_file) # Compute the Mel-Spectrogram ‘mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=st) audio_feature_size = mel_spectrogram.shape[0] print("Audio Feature Size:", audio_feature_size) import numpy as np import os import librosa from keras.models import Sequential from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense # Define the list of speakers and the corresponding labels speakers =['B','C 7, 'P''S'] ‘um_speakers = len(speakers) # Parameters for audio processing sampling_rate = 22050 # Adjust according to your audio files duration =2 # Adjust according to your audio files # Load and process audio files for training X train = [] (train = [] for speaker in speakers for iin range(1, 11): audio_file = f {speaker} {i}.wav' # Assuming your audio files are in WAV format audio_path = os.path join(‘audio_samples', audio_file) # Load audio file audio, _~librosa.load(audio_path, sr-sampling_rate, dur # Extract features (e.g., Mel spectrograms) ‘mel_spec — librosa.feature. melspectrogram(y—audio, srsampling_rate) mel_spec = librosa.power_to_db(mel_spec, ref=np.max) ‘mel_spee = np.expand_dims(mel_spec, axis=-1) ## Add channel dimension yn=duration) 2 X_train.append(mel_spec) y_train append(speakers. index(speaker)) # Convert lists to numpy arrays X_train = np array(X_train) y_train = np.array(y_train) # Normalize input X train = (X_train - np.mean(X_train)) / np.std(X_train) 4 Define the CNN model ‘model = Sequential() model.add(Conv2D(32, (3, 3), activatio ‘model.add(MaxPooling2D(2, 2))) model.add(Conv2D(64, (3, 3), activationrelu’)) model.add(MaxPooling2D(2, 2))) model.add(Flatten()) model.add(Dense(64, activation-'relu’)) ‘model.add(Dense(mum_speakers, activati # Compile and train the model ‘model.compile(optimizer~'adam’, loss~'sparse_categorical_crossentropy’, metries~['accuracy']) model.fit(X_train, y_train, epochs=10, batch_size=8) #Now you can save this trained model and use it for speaker recognition ‘model.save('speaker_recognition_model.h5') import numpy as np import os import librosa from keras.models import load_model # Define the list of peakers and the corresponding labels speakers =['B','C, , 'P''S'] ‘num_speakers = len(speakers) # Parameters for audio processing sampling_rate = 22050. # Adjust according to your audio files duration =2 # Adjust according to your audio files # Load the trained model ‘model = load_model('speaker_recognition_model.hs') # Load and process the combined audio file audio_file_combined = "Mixed.wav' audio_path_combined = os.path.join(‘combined_audio’, audio_file_combined) ‘audio_combined, _ = librosa.load(audio_path_combined, sr=sampling_rate, duration=duration) ‘mel_spec_combined = librosa feature. melspectrogram(y=audio_combined, sr=sampling_rate) ‘mel_spec_combined = librosa.power_to_db(mel_spec_combined, ref-np.max) ‘mel_spec_combined = np.expand_dims(mel_spec_combined, axis=-1) # Normalize input ‘mel_spec_combined = (mel_spee_combined - np.mean(X_train)) / np.std(X_train) # Make predictions using the trained model predictions = model.predict(np array([mel_spec_combined])) predicted_speaker = speakers{np.argmax(predictions)] # Determine the length of each speaker in the combined audio frame_length = len(audio_combined) frame_length // num_speakers o for i in range(num_speakers) “relu', input_shape=X_train[0].shape)) ‘oftmax’)) B start =i * segment_Jength end = (i+ 1) * segment_length speaker_lengths.append(speakersfi), start, end)) # print("Predicted speaker:”, predicted_speaker) print("Speaker lengths:") for speaker_length in speaker_lengths: print(f"Speaker {speaker_length{0}}: {speaker_length{1]} - {speaker_length{2]}") pip install pydub import numpy as np import os import librosa from pydub import AudioSegment from pydub silence import split_on_silence from sklear.cluster import KMeans # Function to split audio file into segments based on silence def split_audio_on_silence(audio_file, min_silence_duration=500, silence _threshold=-40): audio = AudioSegment from_file(audio_file) segments = split_on_silence(audio, min_silence_duration, silence_threshold) return segments ## Function to extract audio features using librosa def extract_features(audio_file) audio, st = librosa.load(audio_file) mfec = librosa.feature.mfee(y=audio, st=sr) return mfee.T 4 Function fo recognize speakers and determine speaking durations def recognize_speakers(audio_files, combined_file): # Step 1: Split the combined audio file into segments segments = split_audio_on_silence(combined_file) # Step 2: Save segments to temporary files temp_folder = "temp_segments” ‘os. makedirs(temp_folder, exist_ok—True) segment_paths = [] for i, segment in enumerate(segments): segment_path = os.path,join(temp_folder, f'segment_{i}.wav") segment.export(segment_path, format="wav") segment_paths.append(segment_path) # Step 3: Extract features from audio files audio_features =[extract_features(audio_file) for audio_file in audi # Step 4: Perform clustering to identify speakers 1n_speakers = len(audio_files) all_features = np.concatenate(audio_features) means = KMeans(n_clusters=n_speakers, random_state=0) fit(all_features) speaker_labels = kmeans.labels_ # Step 5: Group segments by speaker labels speaker_segments = {label: [] for label in range(n_speakers)} for segment, label in zip(segments, speaker_labels): speaker_segments{label].append(segment) # Step 6: Determine speaking durations for each speaker speaking_durations = {} for i, audio_file in enumerate(audio_files): 4 ‘otal_duration ~ sum(segment.duration_seconds for segment in speaker_segmentsfil) speaking_durations[audio_file] = total_duration # Step 7: Clean up temporary files for segment_path in segment_paths: os.remove(segment_path) ‘os.rmdir(temp_folder) return speaking_durations 4 Example usage audio_files = ["Bl.wav", "C1.wav", "B3.wav"] combined _file= "Mixed.wav" recognized_durations = recognize_speakers(audio_files, combined _file) for speaker, duration in recognized_durations.items() print(f"Speaker {speaker} spoke for {duration} seconds, import tensorflow as tf import librosa import numpy as np # Manually specify the number of classes based on your audio file mum_classes = 10 # Replace with the actual number of classes for your audio file # Define the model architecture model = tf keras.Sequential([ tfkeras layers. Input(shape=(audio_feature_size,)), ‘fkeras. layers. Dense(64, activation~'relu’), tfkeras layers. Dense(64, activation='relu’), ‘fkeras.layers.Dense(mum_classes, activation~'softmax’) Dd # Compile the model ‘model.compile(optimizer—adam', osscategorical_crossentropy’, metries=['accuracy']) # Load the audio file and preprocess it audio_file='Mixed.wav' # Load the audio file and extract audio features audio_data, st = librosa.load(audio_file) ‘mel_spectrogram = librosa.feature.melspectrogram(y=audio_data, sr=st) audio_feature_size = mel_spectrogram.shape[0] # Reshape the audio features for model input audio_features = mel_spectrogram.T # Transpose the mel spectrogram # Pad or truncate the audio features to a fixed size max_length= 128 # Set the desired maximum length for the audio features audio_features = np.pad(audio_features, ((0, 0), (0, max_length - audio_features.shape[1])), mode='constant’) # Create labels for the audio file (if applicable) labels = np.zeros((audio_features.shape[0], num_classes)) labels{0] = np.array([1, 0, 0, 0, 0, 0,0, 0, 0, 0]) # Replace with the actual labels for your audio file # Reshape the labels to match the shape of audio_features labels = labels.reshape((-1, num_classes)) # Train the model on the single audio file model-fit(audio_features, labels, epochs=10, batch _s pip install SpeechRecognition import re 32) 15 import speech_tecognition as st from sklearn.feature_extraction.text import CountVectorizer from sklearn.feature_extraction.text import TfidfTransformer # Step 1: Audio Transcription # Specify the path to your audio file audio_file_path ="Mixed.wav" # Initialize the recognizer srRecognizer() # Load the audio file with st AudioFile(audio_file_path) as source: audio = r.record(source, duration=10) # Read the first 10 seconds of the audio file 4 Transcribe the audio to text transcribed _text = r.recognize_google(audio) # Step 2: Sentence Tokenization sentences = re.split(r(2<-[!2))\s¥, transcribed _text) # Split the text into sentences using regular expressions # Step 3: Sentence Scoring using TF-IDF vectorizer = CountVectorizer() sentence_vectors = vectorizer.fit_transform(sentences) «fid_ transformer = TfidfTransformer() ‘fidf_matrix = tfidf_transformer.fit_transform(sentence_vectors) sentence_scores = tfidf_matrix.sum(axis=1),tolist() # Step 4: Summary Generation # Combine sentences with scores sentences_with_scores = zip(sentences, sentence_scores) sentences_with_scores = sorted(sentences_with_scores, key=lambda x: x[1], reverse=True) # Select top-ranked sentences as summary summary_sentences = [sent for sent, _in sentences_with_scores[:3]] #Select top 3 sentences as summary 4 Generate summary text summary =" 'join(summary_sentences) # Print summary print("Summary: print(summary) import speech_recognition as st from nltk.tokenize import sent_tokenize from nltk.corpus import stopwords import nltk # Step 6: Evaluation and Refinement # Evaluation and refinement steps depend on your specific requirements and dataset. # Download the necessary NLTK resources nitk.download(punkt’) alt download(’stopwords') # Original reference summary (if available) reference_summary = "instances of bias and discrimination across a number of machine Learning Systems has raised many questions regarding the use" # Load the audio file audio_file = "Mixed.wav” # Transcribe the audio r=srRecognizer() with sr AudioFile(audio_file) as source 16 audio =r.record(souree, duration=10) # Read the first 10 seconds of the audio file transcribed_text = r.recognize_google(audio) # Function to extract main points from transcribed text, def extract_main_points(text): # Your implementation to extract main points from text 4# Replace this with your actual implementation ‘mait_points = ["disctimination”, "instances", "raised"] return main_points # Function to generate a summary from the transcribed text def generate_summary(text) # Your implementation to generate a summary from text summary = "instances of bias and discrimination across a number of machine Learning Systems has raised many questions regarding the use” return summary # Function to calculate ROUGE scores for summary evaluation def rouge_evaluation(summary, reference_summary): # Your implementation or use an existing library for ROUGE evaluation # Replace this with your actual implementation or library code rouge_scores = {"rouge-1": {"f": 0.75, "p": 0.85, "1": 0.65}, "touge- return rouge_scores # Extract main points ‘main_points = extract_main_points(transcribed_text) # Generate summary summary = generate_summary(transcribed_text) # Evaluate main points against reference summary (if available) ifreference_summary: reference_sentences = sent_tokenize(reference_summary.lower()) ‘mait_points = [point.lower() for point in main_points] stop_words = set(stopwords.words("english")) filtered_main_points = [point for point in main_points if point not in stop_words] filtered_reference_sentences = [sent for sent in reference_sentences if sent not in stop_words] print("Filtered Main Points:",filtered_main_points) print("Filtered Reference Sentences:",filtered_reference_sentences) correct_main_points = [point for point in filtered_main_points if any(point in sent for sent in filtered_reference_sentences)] print(f"Main Points Accuracy: {main_points_accuracy}") # Evaluate summary against reference summary (if available) if reference_summary: rouge_scores = rouge_evaluation(summary, reference_summary) print(f"ROUGE Scores: {rouge_scores}") import numpy as np import os import librosa from keras.models import load_model # Define the list of speakers and the corresponding labels speakers =['B,'C,'T, 'P''S'] num_speakers = len(speakers) # Parameters for audio processing sampling_rate = 22050. # Adjust according to your audio files duration =2 # Adjust according to your audio files v7 (°P": 0.6, "p": 0.75, "r": 05}} # Load the trained model ‘model = load_model('speaker_recognition_model.h5') # Load and process the combined audio file audio_file_combined = 'Mixed.wav' audio_path_combined = os.path,join(‘combined_audio’, audio_file_combined) audio_combined, _=librosa.load(audio_path_combined, sr=sampling_rate, duration=duration) ‘mel_spec_combined = librosa.feature.melspectrogram(y=audio_combined, sr=sampling_rate) ‘mel_spec_combined = librosa.power_to_db(mel_spec_combined, ref=-np.max) ‘mel_spec_combined = np.expand_dims(mel_spec_combined, axis=-1) Normalize input ‘mel_spec_combined = (mel_spee_combined - np.mean(X_train)) / np.std(X_train) 4 Make predictions using the trained model predictions = model.predict(np array([mel_spec_combined])) predicted_speaker = speakers(np.argmax(predictions)] # Determine the length of each speaker in the combined audio frame_length = len(audio_combined) segment_length = frame_length // num_speakers speaker_lengths = [] for i in range(num_speakers): start =i * segment_length end = (i+ 1) * segment_length speaker_lengths.append((speakersfi), start, end)) # print("Predicted speaker:”, predicted_speaker) print("Speaker lengths:") for speaker_length in speaker_lengths: print(f"Speaker {speaker _length{0}}: {speaker_length(1]} - {speaker_length{2]}") import numpy as np import os import librosa from keras.models import load_model # Define the list of speakers and the corresponding labels speakers = ['B','CT, P'S] ‘num_speakers = len(speakers) # Parameters for audio processing sampling_rate = 22050 # Adjust according to your audio files duration = 2 # Adjust according to your audio files segment_length=3 i Length of individual segments to process # Load the trained model model = load_model/speaker_recognition_model.hs') # Load and process the combined audio file ‘audio_file_combined = 'Mixed.wav' audio_path_combined = os.path,join(‘combined_audio’, audio_file_combined) audio_combined, _~librosa.load(audio_path_combined, sr=sampling_rate) total_frames = len(audio_combined) segment_frames = int(segment_length * sampling_rate) # Calculate the number of segments ‘num_segments = total_frames // segment_frames # Make predictions for each segment predicted_speakers = [] for i in range(num_segments): 18 start_frame = i * segment_frames ‘end_frame = (i + 1) * segment_frames ‘audio_segment = audio_combined[start_frame:end_frame] tmel_spec = librosa. feature. melspectrogram(y=audio_segment, ssampling_rate, n_mels=128, hop_length=512) ‘mel_spec = librosa.power_| ‘mel_spec = np.expand_dims(mel_spec, axis- 4 Adjust the shape of the mel spectrogram ‘mel_spec = mel_specf:, :87, :] Normalize input ‘mel_spec = (mel_spec - np.mean(mel_spec)) / np.std(mel_spec) 4 Make prediction using the trained model prediction = model.predict(np.array({mel_spec})) predicted_speaker = speakers{np.argmax(prediction)] predicted_speakers. append(predicted_speaker) # Determine the start time and end time of each segment segment_durations = [] for iin range(mum_segments): start_time =i * segment_length end_time = (i + 1) * segment_length segment_durations.append((start_time, end_time, predicted_speakers(i))) print("Segment durations and predicted speakers:”) for segment_duration in segment_durations: print(f"Segment {segment_duration[0]}s~ {segment_duration{1]}s: Speaker {segment_duration[2]}") )_db(mel_spee, ref-np.max) 1) 19 6.Results Fig 6.1: Raw Audio Fig 6.2: Trimmed Audio Fig 6.4:Zoomed Audio Spectogram = Fig 6.5: Speetogram 20 ram Example > oso — ° Summary a 7.Conclusion Concluding an internship experience at Datai2i, under the mentorship of Thirumulesh Sir, in the field of data science has been a valuable and enriching journey. Throughout this internship, I had the opportunity to delve into the intricacies of data analysis, machine learning, and various aspects of the data science pipeline, Under the guidance of Thirumulesh Sir, I gained practical insights into the application of data science methodologies, exploring real-world data sets, and employing cutting-edge techniques to derive meaningful insights. The mentorship provided a structured leaming environment, fostering both technical proficiency and a deeper understanding of the strategic implications of data-driven decision-making Working on the “Online Meeting Analyzer” project allowed me to apply theoretical knowledge to a practical scenario, enhancing my skills in system design, algorithm implementation, and the integration of various components within a data science application. The inclusion of UML and ER diagrams provided a comprehensive understanding of the project's structure and relationships. ‘The experience not only honed my technical skills but also instilled a sense of collaboration and problem- solving within a team, Thirumulesh Sit’s mentorship played a pivotal role in bridging the gap between theoretical concepts and their real-world application, providing a solid foundation for my future endeavors in the field of data science. Overall, the internship at Datai2i has been a pivotal stepping stone in my professional development, equipping me with practical skills, industry insights, and a deeper appreciation for the dynamic and transformative nature of data science. I am grateful for the guidance and opportunities provided during this, internship, and I look forward to applying these skills in future projects and challenges. 2 8, Future Scope ‘The Meeting Contribution AutoAnalyzer project holds significant potential for future advancements, positioning itself as a versatile and indispensable tool for optimizing meeting experiences. One key avenue for exploration is the implementation of real-time analysis capabilities. By enabling the system to process ‘meeting contributions on-the-fly, users can gain immediate insights, fostering more dynamic and responsive discussions during live meetings. Additionally, the project's future scope involves enhancing its linguistic capabilities. Multilingual support is pivotal, as it extends the system's functionality to analyze meetings conducted in diverse languages. This expansion not only broadens the project's user base but also ensures its adaptability to global and ‘multicultural work environments. Furthermore, improvements in speaker recognition algorithms, integration with productivity tools, and a robust user feedback mechanism are integral aspects of the project's evolution. The goal is to create a comprehensive and user-centric solution that seamlessly integrates with users! workflows, providing accurate analyses, actionable insights, and a user-friendly experience 2B 9.References [I].Introduction to PyDub https: //projector-video-pdf-converter datacamp.com/17718/chapter3.pdf [2].Introduction to TensorFlow hitps://www tensorflow.org/leam [3].Fands-on Machine Learning with, Scikit-Learn, Keras, and TensorFlow ‘http://powerunit-ju.com/wp-content/uploads/202 1/04/Aurelien-Geron-Hands-On-Machine-Learning-with- Scikit-Leamn-Keras-and-Tensorflow_-Concepts-Tools-and-Techniques-to-Build-Intelligent-Systems- OReilly-Media-2019.pdf 24

You might also like