CHAPTER 1
1.1 COMPANY OVERVIEW
“AGIMUS Technologies Private Limited” is a technology company formed by veteran technology researchers who have more than two decades of experience in the IoT / AUTOSAR / VLSI / Embedded / RF / PCB verticals and in technology training. Innovation is the life of an enterprise, and AGIMUS pursues a sustainable team with a special focus on R&D innovation. Our company offers a wide range of electronics and mechanical products, solutions and services in the IoT / AUTOSAR / VLSI / Embedded / RF / PCB / Machine Learning / AI verticals.
We derive our strength from our past experience and our dedicated approach towards customers. Our team understands current trends and emerging technologies and always delivers best-in-market products and services. We work together with the world’s most distinguished technology companies and form strategic alliances to provide complete customer solutions, delivering value in every project or program.
1.2.1 VISION
To become the world’s preferred technology partner – applying innovation & quality to
create remarkable growth for business and society.
1.2.2 MISSION
To be recognized as the best in the industry by adopting honest business practices, thereby generating immense goodwill with the customers, employees and society we operate in.
1.3 SOLUTIONS
Our products and solutions comprise a wide range of in-house developed boards, evaluation kits, modules, testing and measurement equipment, PCB prototyping machines, robotic kits for the academic segment, and customized real-time IoT and embedded solutions targeting mega-trend industries.
The AGIMUS Academia Industry Alliance Program (AAIAP) focuses on bringing high-end electronics design, IoT, AI, ML, robotics, RF and embedded computing platforms to the labs of colleges and universities, enhancing the experience of technical education and enabling real-time practice on world-beating technologies. The induction of the latest technology platforms offers an immersive learning experience to students and trainers and ensures that forthcoming industry needs are met. AGIMUS enables academia to push the barriers of industrial learning, thanks to the substantive support of its technology partners and solution providers. Exposure to the latest technology platforms offers a comprehensive learning experience to students and academia and ensures that the industry-academia gap is reduced.
Completed Projects:
Ongoing Development:
CHAPTER 2
Figure 2.1: Branches of AI
Machine learning uses two types of techniques: supervised learning, which trains a model on
known input and output data so that it can predict future outputs, and unsupervised learning,
which finds hidden patterns or intrinsic structures in input data.
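As a toy illustration (the data and variable names are invented for this example, and NumPy is assumed to be available), both kinds of learning can be sketched in a few lines of Python:

```python
import numpy as np

# --- Supervised learning: known inputs AND known outputs ---
# Fit y = w*x + b by least squares on labelled pairs (x, y).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])           # labels follow y = 2x + 1
w, b = np.polyfit(x, y, 1)                   # learn the input->output mapping
prediction = w * 4.0 + b                     # predict the output for unseen x = 4

# --- Unsupervised learning: inputs only, no labels ---
# Find two cluster centres hidden in the data (Lloyd's k-means iterations).
data = np.array([0.9, 1.1, 1.0, 9.0, 9.2, 8.8])
centres = np.array([data.min(), data.max()])
for _ in range(5):
    assign = np.abs(data[:, None] - centres[None, :]).argmin(axis=1)
    centres = np.array([data[assign == k].mean() for k in range(2)])
```

The supervised model recovers the known mapping and predicts a new output; the unsupervised step discovers the two groups in the data without ever being told they exist.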
CHAPTER 3
TASKS PERFORMED
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms
and systems to extract knowledge and insights from structured and unstructured data, and
apply knowledge and actionable insights from data across a broad range of application
domains. Data science is related to data mining, machine learning and big data.
Data science is a "concept to unify statistics, data analysis, informatics, and their related
methods" in order to "understand and analyze actual phenomena" with data. It uses
techniques and theories drawn from many fields within the context of mathematics,
statistics,
computer science, information science, and domain knowledge. Turing
Award winner Jim Gray imagined data science as a "fourth paradigm" of science
(empirical, theoretical, computational, and now data-driven) and asserted that "everything
about science is changing because of the impact of information technology" and the data
deluge.
MATLAB makes data science easy with tools to access and preprocess data, build machine
learning and predictive models, and deploy models to enterprise IT systems.
• Access data stored in flat files, databases, data historians, and cloud storage, or connect to live sources such as data acquisition hardware and financial data feeds.
• Manage and clean data using datatypes and preprocessing capabilities for programmatic and interactive data preparation, including apps for ground-truth labelling.
• Document data analysis with MATLAB graphics and the Live Editor notebook environment.
• Apply domain-specific feature engineering techniques for sensor, text, image, video, and other types of data.
• Explore a wide variety of modeling approaches using machine learning and deep learning apps.
• Fine-tune machine learning and deep learning models with automated feature selection, model selection, and hyperparameter tuning algorithms.
• Deploy machine learning models to production IT systems, without recoding into another language.
You can pre-process image input with operations such as resizing by using datastores and
functions available in MATLAB and Deep Learning Toolbox. Other MATLAB toolboxes
offer functions, datastores, and apps for labelling, processing, and augmenting deep learning
data. Use specialized tools from other MATLAB toolboxes to process data for domains such
as image processing, object detection, semantic segmentation, signal processing, audio
processing, and text analytics.
Deep learning is a type of machine learning in which a model learns to perform classification
tasks directly from images, text, or sound. Deep learning is usually implemented using a
neural network architecture. The term “deep” refers to the number of layers in the network—
the more layers, the deeper the network. Traditional neural networks contain only 2 or 3
layers, while deep networks can have hundreds.
Deep learning (also known as deep structured learning) is part of a broader family
of machine learning methods based on artificial neural networks with representation
learning. Learning can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief networks, graph
neural networks, recurrent neural networks and convolutional neural networks have
been applied to fields including computer vision, speech recognition, natural language
processing, machine translation, bioinformatics, drug design, medical image analysis,
material inspection and board game programs, where they have produced results comparable
to and in some cases surpassing human expert performance.
These artificial networks may be used for predictive modeling, adaptive control and
applications where they can be trained via a dataset. Self-learning resulting from experience
can occur within networks, which can derive conclusions from a complex and seemingly
unrelated set of information.
A deep neural network combines multiple nonlinear processing layers, using simple elements
operating in parallel and inspired by biological nervous systems. It consists of an input layer,
several hidden layers, and an output layer. The layers are interconnected via nodes, or
neurons, with each hidden layer using the output of the previous layer as its input.
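The forward pass of such a network can be sketched in Python with NumPy (the layer sizes, weights and input here are arbitrary placeholders, not taken from any real model):

```python
import numpy as np

def relu(z):
    # Nonlinearity applied in each hidden layer
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# A tiny network: 4 inputs -> two hidden layers of 8 neurons -> 3 outputs.
layer_sizes = [4, 8, 8, 3]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # Each hidden layer consumes the previous layer's output as its input.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]      # output layer, no activation

out = forward(np.ones(4))                    # a 3-dimensional output vector
```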
Let’s say we have a set of images where each image contains one of four different categories
of object, and we want the deep learning network to automatically recognize which object is
in each image. We label the images in order to have training data for the network.
Using this training data, the network can then start to understand the object’s specific
features and associate them with the corresponding category.
Each layer in the network takes in data from the previous layer, transforms it, and
passes it on. The network increases the complexity and detail of what it is learning
from layer to layer.
Notice that the network learns directly from the data—we have no influence over
what features are being learned.
A convolutional neural network (CNN, or ConvNet) is one of the most popular algorithms for
deep learning with images and video.
Like other neural networks, a CNN is composed of an input layer, an output layer, and many
hidden layers in between.
These layers perform one of three types of operations on the data: convolution, pooling, or
rectified linear unit (ReLU).
Convolution puts the input images through a set of convolutional filters, each of
which activates certain features from the images.
Rectified linear unit (ReLU) allows for faster and more effective training by mapping
negative values to zero and maintaining positive values.
Pooling simplifies the output by performing nonlinear downsampling, reducing the number of parameters that the network needs to learn.
These three operations are repeated over tens or hundreds of layers, with each layer learning
to detect different features.
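The three operations can be sketched in plain NumPy (a toy 4×4 image and a hypothetical 2×2 filter, chosen purely for illustration):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)            # map negative values to zero

def max_pool(x, size=2):
    # Nonlinear downsampling: keep the maximum in each size x size block
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])   # toy horizontal-gradient filter
feature_map = max_pool(relu(conv2d(image, kernel)))
```

In a real CNN these steps are stacked over many layers, with the filter weights learned from data rather than fixed by hand.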
Classification Layers
The next-to-last layer is a fully connected layer (FC) that outputs a vector of K dimensions
where K is the number of classes that the network will be able to predict. This vector contains
the probabilities for each class of any image being classified.
The final layer of the CNN architecture uses a softmax function to provide the classification
output.
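As a sketch, the fully connected layer followed by softmax might look like this in NumPy (the feature vector and weights are invented placeholders, not learned values):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())       # subtract the max for numerical stability
    return e / e.sum()

K = 4                                     # number of classes
features = np.array([0.5, -1.2, 3.0])     # output of the last pooling/ReLU stage
W = np.zeros((3, K))                      # toy fully connected weights
W[2, 1] = 2.0
b = np.zeros(K)

scores = features @ W + b                 # K-dimensional score vector
probs = softmax(scores)                   # class probabilities, summing to 1
predicted_class = probs.argmax()
```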
Deep learning is a subtype of machine learning. With machine learning, you manually extract
the relevant features of an image. With deep learning, you feed the raw images directly into a
deep neural network that learns the features automatically.
Deep learning often requires hundreds of thousands or millions of images for the best results.
It’s also computationally intensive and requires a high-performance GPU.
Figure 3.4: ML vs DL
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0. Python 2.0, released in 2000, introduced new features such as list comprehensions and a cycle-detecting garbage collection system in addition to reference counting; Python 2 was discontinued with version 2.7.18 in 2020. Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3.
Python consistently ranks as one of the most popular programming languages; Stack Overflow's 2020 Developer Survey placed it second only to JavaScript.
pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for
manipulating numerical tables and time series. It is free software released under the three-
clause BSD license. The name is derived from the term "panel data", an econometrics term
for data sets that include observations over multiple time periods for the same individuals.
Its name is a play on the phrase "Python data analysis" itself. Wes McKinney started
building what would become pandas at AQR Capital while he was a researcher there from
2007 to 2010.
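A minimal sketch of pandas' table and time-series handling (assuming pandas is installed; the readings below are invented for the example):

```python
import pandas as pd

# A small "panel" of daily sensor readings, indexed by time.
idx = pd.date_range("2021-01-01", periods=4, freq="D")
df = pd.DataFrame({"temp": [20.0, 21.5, None, 23.0],
                   "humidity": [55, 60, 58, 62]}, index=idx)

df["temp"] = df["temp"].interpolate()        # fill the missing reading linearly
two_day = df["temp"].resample("2D").mean()   # time-series resampling
```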
There are a number of different data visualization libraries and modules compatible with
Python.
Most of the Python data visualization libraries can be placed into one of four groups,
separated based on their origins and focus.
The groups are:
• Matplotlib-based libraries
• JavaScript libraries
• JSON libraries
• WebGL libraries
Matplotlib-based Libraries
The first major group of libraries is those based on Matplotlib. Matplotlib is one of the oldest Python data visualization libraries, and thanks to its wealth of features and ease of use it is still one of the most widely used. Matplotlib was first released back in 2003 and has been continuously updated since.
Matplotlib contains a large number of visualization tools, plot types, and output types. It produces mainly static visualizations. While the library does have some 3D visualization options, these options are far more limited than those of other libraries like Plotly and VisPy. It is also limited in the field of interactive plots, unlike libraries such as Bokeh.
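A minimal static Matplotlib plot looks like this (assuming Matplotlib is installed; the Agg backend is used so no display is required, and the file name is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # static backend: render to an image file, no display needed
import matplotlib.pyplot as plt

xs = list(range(10))
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots()
line, = ax.plot(xs, ys, label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("quadratic.png")  # Matplotlib's output here is a static image
```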
JavaScript-based Libraries
There are a number of JavaScript-based libraries for Python that specialize in data
visualization. The adoption of HTML5 by web browsers enabled interactivity for graphs and
visualizations, instead of only static 2D plots. Styling HTML pages with CSS can net
beautiful visualizations.
These libraries wrap JavaScript/HTML5 functions and tools in Python, allowing the user to
create new interactive plots. The libraries provide high-level APIs for the JavaScript
functions, and the JavaScript primitives can often be edited to create new types of plots, all
from within Python.
JSON-based Libraries
JavaScript Object Notation (JSON) is a data interchange format, containing data in a simple
structured format that can be interpreted not only by JavaScript libraries but by almost any
language. It’s also human-readable.
There are various Python libraries designed to interpret and display JSON data. With JSON-
based libraries, the data is fully contained in a JSON data file. This makes it possible to
integrate plots with various visualization tools and techniques.
WebGL-based Libraries
The WebGL standard is a graphics standard that enables interactivity for 3D plots. Much like
how HTML5 made interactivity for 2D plots possible (and plotting libraries were developed
as a result), the WebGL standard gave rise to 3D interactive plotting libraries.
Python has several plotting libraries that are focused on the development of WebGL plots.
Most of these 3D plotting libraries allow for easy integration and sharing via Jupyter notebooks and remote manipulation through the web.
Hypothesis testing draws inferences or conclusions about the overall population or data by conducting statistical tests on a sample. The same inferences are drawn for different machine learning models through the t-test, discussed in this section. To draw these inferences, we make assumptions that lead to the two terms used in hypothesis testing.
Null hypothesis: the assumption that there is no anomalous pattern or real effect; any observed difference is attributed to chance.
Alternate hypothesis: contrary to the null hypothesis, it states that the observation is the result of a real effect.
3.5.1 P VALUE
The p-value quantifies the evidence against the null hypothesis; in machine learning, it indicates the significance of a predictor towards the target. Generally, we set the level of significance at 5%, although this is a topic of discussion in some cases; if you have strong prior knowledge about your data, you can choose a different level of significance. If the p-value of an independent variable in a machine learning model is less than 0.05, the variable is considered significant: it shows behaviour that is informative about the target and can be learned by the machine learning algorithm.
3.5.2 T-TEST
Suppose a shipping-container manufacturer claims that each container weighs exactly 1000 kg, not less, not more. Such a claim looks shady, so we gather data and create a sample. After gathering a sample of 30 containers, we find that the average weight is 990 kg with a standard deviation of 12.5 kg. Comparing the generated statistic with the critical value at the desired level of significance, we see that it falls in the rejection region, so we can reject the claim. You can calculate the critical t value using the stats.t.ppf() function from the stats module of the SciPy library.
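The container example can be checked with SciPy (assuming SciPy is installed; the numbers are the ones quoted above):

```python
from math import sqrt

from scipy import stats

mu0 = 1000.0   # claimed mean weight (kg)
xbar = 990.0   # sample mean
s = 12.5       # sample standard deviation
n = 30         # sample size

# One-sample t statistic for H0: mean == 1000 kg
t_stat = (xbar - mu0) / (s / sqrt(n))

# Lower critical value at the 5% level (two-sided test)
t_crit = stats.t.ppf(0.025, df=n - 1)

reject = t_stat < t_crit   # True: the 1000 kg claim is rejected
```

Here the statistic (about −4.38) lies well below the critical value (about −2.05), so the claim is rejected at the 5% level.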
3.5.3 Errors
Hypothesis testing is done on a sample of data rather than the entire population, due to limited resources in terms of data. Because inferences are drawn from sample data, hypothesis testing can lead to errors, which are classified into two types:
Type I error: rejecting the null hypothesis when it is true.
Type II error: accepting the null hypothesis when it is false.
Many different approaches to hypothesis testing of models exist, such as creating two models on the available features: one model comprising all the features and another with one feature removed, so that the significance of individual features can be tested. However, feature interdependency affects such simple methods. In regression problems, we generally follow the p-value rule: features that violate the significance level are removed, iteratively improving the model. Different approaches exist for each algorithm to test hypotheses on different features.
Bayes’ theorem is a relationship between the conditional probabilities of two events. For example, if we want to find the probability of selling ice cream on a hot and sunny day, Bayes’ theorem gives us the tools to use prior knowledge about the likelihood of selling ice cream on any other type of day (rainy, windy, snowy etc.).
In symbols, Bayes’ theorem states: P(H|E) = P(E|H) × P(H) / P(E),
where H and E are events, P(H|E) is the conditional probability that event H occurs given that
event E has already occurred. The probability P(H) in the equation is basically frequency
analysis; given our prior data what is the probability of the event occurring. The P(E|H) in the
equation is called the likelihood and is essentially the probability that the evidence is correct,
given the information from the frequency analysis. P(E) is the probability that the
actual evidence is true. Let H represent the event that we sell ice cream and E be the event of the weather. Then we might ask: what is the probability of selling ice cream on any given day, given the type of weather? Mathematically this is written as P(H = ice cream sale | E = type of
weather) which is equivalent to the left hand side of the equation. P(H) on the right hand side
is the expression that is known as the prior because we might already know the marginal
probability of the sale of ice cream. In our example this is P(H = ice cream sale), i.e. the
probability of selling ice cream regardless of the type of weather outside. For example, I could
look at data that said 30 people out of a potential 100 actually bought ice cream at some shop
somewhere. So my P(H = ice cream sale) = 30/100 = 0.3, prior to me knowing anything about
the weather. This is how Bayes’ Theorem allows us to incorporate prior information.
A classic use of Bayes’s theorem is in the interpretation of clinical tests. Suppose that during a
routine medical examination, your doctor informs you that you have tested positive for a rare
disease. You are also aware that there is some uncertainty in the results of these tests.
Assuming we have a Sensitivity (also called the true positive rate) result for 95% of the
patients with the disease, and a Specificity (also called the true negative rate) result for 95% of
the healthy patients.
If we let “+” and “−” denote a positive and negative test result, respectively, then the test accuracies are the conditional probabilities P(+|disease) = 0.95 and P(−|healthy) = 0.95. In Bayesian terms, we want to compute the probability of disease given a positive test, P(disease|+).
Importantly, Bayes’ theorem reveals that in order to compute the conditional probability that you have the disease given that the test was positive, you need to know the “prior” probability you have the disease, P(disease), given no information at all. That is, you need to know the overall incidence of the disease in the population to which you belong. Assuming these tests are applied to a population where the actual disease incidence is 0.5%, P(disease) = 0.005, which means P(healthy) = 0.995. So P(disease|+) = 0.95 × 0.005 / (0.95 × 0.005 + 0.05 × 0.995) ≈ 0.087. In other words, despite the apparent reliability of the test, the probability that you actually have the disease is still less than 9%. Getting a positive result increases the probability you have the disease, but it is incorrect to interpret the 95% test accuracy as the probability that you have the disease.
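The arithmetic above is easy to verify in plain Python (the function name is just for illustration):

```python
def posterior_disease(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_and_disease = sensitivity * prevalence              # P(+|disease) P(disease)
    p_pos_and_healthy = (1 - specificity) * (1 - prevalence)  # P(+|healthy) P(healthy)
    return p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy)

# 95% sensitivity, 95% specificity, 0.5% prevalence -> posterior under 9%
p = posterior_disease(0.95, 0.95, 0.005)
```

Note how strongly the low prevalence dominates the result: the same test applied to a population with 10% prevalence would give a posterior above 60%.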
Random forests generally outperform decision trees, but their accuracy is lower than that of gradient-boosted trees; however, data characteristics can affect their performance. The first algorithm for random decision forests was created in 1995 by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.
An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who
registered "Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.).
The extension combines Breiman's "bagging" idea and random selection of features,
introduced first by Ho and later independently by Amit and Geman in order to construct a
collection of decision trees with controlled variance. Random forests are frequently used as "black box" models in businesses, as they generate reasonable predictions across a wide range of data while requiring little configuration.
The dataset consists of accelerometer and gyroscope data captured at 50 Hz. The raw sensor data contain fixed-width sliding windows of 2.56 s (128 readings per window). The activities performed by the subject include:
'Walking', 'ClimbingStairs', 'Sitting', 'Standing', and 'Laying'
How to get the data: execute downloadSensorData and follow the instructions to download and extract the data from the source webpage. After the files have been extracted, run saveSensorDataAsMATFiles. This will create two MAT files, rawSensorData_train and rawSensorData_test, containing the raw sensor data.
1. total_acc_(x/y/z)_train : Raw accelerometer sensor data
2. body_gyro_(x/y/z)_train : Raw gyroscope sensor data
3. trainActivity : Training data labels
4. testActivity : Test data labels
Load data from individual files and save as a MAT file for reuse:
• saveSensorDataAsMATFiles: this function will load the data from the individual source files and save it in a single MAT file for easy access.
if ~exist('rawSensorData_train.mat','file') && ~exist('rawSensorData_test.mat','file')
    saveSensorDataAsMATFiles;
end
rawSensorDataTrain = table(
total_acc_x_train, total_acc_y_train, total_acc_z_train, ...
Let's start with a simple preprocessing technique. Since the raw sensor data contain fixed-width sliding windows of 2.56 s (128 readings per window), let's start with a simple average feature for every 128 points.
humanActivityData = varfun(@Wmean,rawSensorDataTrain);
humanActivityData.activity = trainActivity;
Use the new features to train a model and assess its performance
classificationLearner
load rawSensorData_test
T_pca = varfun(@Wpca1,rawSensorDataTest);
% ClassificationLearner
%
plotActivityResults(trainedmodel1,rawSensorDataTest,humanActivityData.activity,0.5)
chID = 213451;
ReadKey = 'PH7GKO1KG1TEMP9T';
d1 = datetime(2017, 4, 25, 8, 0, 0);
d2 = datetime(2017, 4, 25, 18, 0, 0);
T = thingSpeakRead(chID, 'Fields', 1:8, 'DateRange', [d1, d2], ...
    'OutputFormat', 'TimeTable', 'ReadKey', ReadKey);
head(T)
ans =
8×8 timetable
Visualize data
for i = 1 : 8
subplot(4, 2, i);
plot(T.Timestamps, T{:, i});
grid on
xlim(T.Timestamps([1 end]));
title(strrep(T.Properties.VariableNames{i}, '_', ' '));
end
plot(T.Timestamps, temp)
SOURCE CODE:
OUTPUT:
CHAPTER 4
REFLECTION NOTES
“An Insight into Artificial Intelligence and Machine Learning” with “AGIMUS Technologies Private Limited” gave me an opportunity to learn the basic principles of AI and ML and the areas in which they can be used. I also got the opportunity to work on some of the mini projects provided by “AGIMUS Technologies Private Limited”, which enhanced my understanding of the intelligence demonstrated by machines with the help of algorithms. I was also made to work on different platforms such as MATLAB, Colaboratory and ThingSpeak, which helped me gain knowledge and experience of working on different platforms. I hope the experience gained at “AGIMUS Technologies Private Limited” will help me in the future to work on different projects and produce better results.
• Learnt about many applications in which MATLAB, IoT and Python coding can be used.
• Built an AI project using Python.
• Learnt about the usage of Google Colaboratory, ThingSpeak, etc.