
QUESTION 1

The question of whether AI will come close to human-level intelligence is a complex
and fascinating one. Over the years, AI has made significant strides in various areas, but
achieving human-level intelligence remains a challenge. In the early days of AI, there were
ambitious expectations, exemplified by the sci-fi promises of robots with human-like
capabilities, such as house cleaning robots. However, progress in AI has been more gradual
and nuanced.

Historically, early computing and AI achieved notable milestones, such as cracking Nazi message encryption during World War II and beating humans at checkers in the 1950s. However, the initial optimism that AI would soon replicate human-like understanding of language and complex tasks did not fully materialize between the 1950s and the 1980s. Fast forward to today, and AI has made remarkable advancements. For instance, we now have Google Translate, a significant milestone in language translation, and speech interfaces have evolved from pre-recorded human audio prompts to sophisticated search-by-voice technologies.

AI has demonstrated its prowess in various domains, beating experts at complex games like Go, creating art, and performing classification tasks reminiscent of the Sorting Hat in Harry Potter. These feats are made possible by artificial neural networks, which are modeled after
the functioning of the brain's neurons and their connections. The heart of AI's "intelligence"
lies in the ability of artificial neural networks to learn from examples through machine
learning. Training adjusts the network's parameters, often pictured as sliders, until it finds weight values that produce accurate predictions. This learning approach has proven effective in tasks such as image recognition and classification, as demonstrated by projects like ImageNet and Google Brain.
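The idea of "finding the right weights" can be made concrete with a toy illustration (a made-up example of my own, not any of the systems mentioned above): a single weight is nudged repeatedly until the model's predictions match the training examples.

    # Toy sketch: fit a single weight w so that predictions w * x match the examples,
    # where the examples follow y = 3 * x.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 6.0, 9.0, 12.0]

    w, lr = 0.0, 0.01              # start from an arbitrary weight and a small step size
    for _ in range(200):
        for x, y in zip(xs, ys):
            error = w * x - y
            w -= lr * error * x    # nudge the "slider" to reduce the error
    print(w)                       # approaches 3.0

Real networks do the same thing, only with millions of weights adjusted at once.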

However, despite these impressive accomplishments, AI still faces challenges in achieving human-level intelligence. The brain's ability to perform one-shot learning (learning from a
single training example) is a stark contrast to machine learning's reliance on vast amounts of
data. Additionally, tasks requiring human intuition, creativity, and a deep understanding of
context remain difficult for AI. The future of AI is uncertain; while the field is growing exponentially, whether AI will surpass average human intelligence, let alone the intelligence of someone like Einstein, remains an open question.
QUESTION 2

Answer 1.1

With a linear activation function and linear inputs (X1 and X2), an Exclusive OR (XOR)
model cannot be effectively fit using a single hidden layer with a single neuron. The reason
for this limitation lies in the nature of the XOR problem, which is a classic example of a non-linearly separable problem: the classes cannot be separated by a straight line or a linear decision boundary.
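For reference, the XOR truth table is:

    X1  X2  XOR(X1, X2)
     0   0       0
     0   1       1
     1   0       1
     1   1       0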

As the truth table above shows, XOR returns 1 when exactly one of X1 or X2 is 1. If we try to plot
these data points on a graph, we will find that they cannot be separated by a single straight
line. A single-layer perceptron with a linear activation function can only learn linearly
separable patterns. In this case, it can only learn problems where the data points can be
separated by a single line. However, XOR requires a non-linear decision boundary to be
accurately modeled.

To effectively fit an XOR model, you need at least one hidden layer with non-linear
activation functions. Popular activation functions like sigmoid, tanh, or ReLU introduce non-
linearity into the neural network, allowing it to learn complex patterns and non-linear
decision boundaries. A neural network with a single hidden layer containing multiple neurons
and a non-linear activation function (e.g., sigmoid) can accurately learn the XOR function.
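As a minimal sketch of this idea (assuming plain NumPy rather than the Playground used in the assignment), a 2-2-1 network with sigmoid activations can be trained on the four XOR points by gradient descent:

    # Minimal sketch: 2 inputs -> 2 sigmoid hidden neurons -> 1 sigmoid output, trained on XOR.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W1, b1 = rng.normal(size=(2, 2)), np.zeros((1, 2))   # hidden layer weights and biases
    W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))   # output layer weights and bias

    lr = 1.0
    for _ in range(10000):
        h = sigmoid(X @ W1 + b1)                 # forward pass: hidden layer
        out = sigmoid(h @ W2 + b2)               # forward pass: output layer
        d_out = (out - y) * out * (1 - out)      # backprop through a squared-error loss
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out                   # gradient descent updates
        b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(out.round(2))   # should approach [[0], [1], [1], [0]]

Depending on the random seed, training may need more iterations or a restart, which also previews the run-to-run variability discussed in Answer 1.3.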

Answer 1.2
Yes, increasing the number of neurons in the hidden layer from 1 to 2 or 3 and changing the
activation function to a non-linear function like ReLU or Sigmoid can significantly improve
the model's ability to learn and effectively model the XOR function.

1. Increasing the number of neurons in the hidden layer: By adding more neurons to the
hidden layer, the model becomes more expressive and can learn more complex patterns. In
the case of XOR, a single neuron in the hidden layer might not be sufficient to capture the
non-linear decision boundary required to separate the classes. With two or more neurons, the
model gains the ability to learn more sophisticated relationships between the inputs.

2. Using a non-linear activation function: A linear activation function lacks the ability to
introduce non-linearity into the neural network, which is crucial for learning non-linearly
separable problems like XOR. Changing the activation function to a non-linear one, such as ReLU or Sigmoid, allows the neural network to learn complex mappings between inputs and outputs, as the hand-constructed sketch below illustrates.
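To make this concrete, here is a small hand-constructed sketch (an illustrative example with weights chosen by hand, not a trained Playground model) showing that two ReLU neurons in a single hidden layer are already enough to represent XOR exactly:

    # Hand-picked weights: two ReLU hidden neurons and a linear output represent XOR.
    def relu(z):
        return max(0.0, z)

    def xor_net(x1, x2):
        h1 = relu(x1 + x2)          # active on (0,1), (1,0) and (1,1)
        h2 = relu(x1 + x2 - 1)      # active only on (1,1)
        return h1 - 2 * h2          # output weights 1 and -2

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_net(x1, x2))   # prints the XOR outputs 0, 1, 1, 0

A single neuron with a linear activation cannot reproduce this, because combining the two ReLU outputs is what bends the decision boundary.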

Answer 1.3

When we continue to add/remove hidden layers, change the number of neurons per hidden
layer, or experiment with different sets of activation functions, the model's quality can vary
significantly from run to run. The performance of a neural network is affected by several
factors, and these changes can introduce variations in the training process and model
behavior. Some key reasons for the variability in model quality are as follows:

1. Random Initialization: Neural networks are initialized with random weights, so each run starts training from a different point. Since neural networks are trained using iterative optimization algorithms like gradient descent, different initial weights can lead to different convergence points and different solutions (see the sketch after this list).

2. Data Splitting: During training, the data is typically split into training and validation sets.
The split itself is random, and different runs can have different subsets of data in the training
and validation sets. This can lead to variations in how well the model generalizes to unseen
data.

3. Stochastic Nature of Optimization: Training a neural network involves iteratively adjusting the weights to minimize the loss function. The optimization process can be stochastic in
nature, meaning that it can take slightly different paths in different runs, leading to different
local minima.

4. Overfitting: If the model capacity (number of hidden layers and neurons) is too high or the
training duration is excessive, the model might overfit to the training data, performing well
on the training set but poorly on unseen data. The extent of overfitting can vary from run to
run, depending on the specific hyperparameters and data.

5. Hyperparameter Choices: The performance of the model is sensitive to hyperparameters like learning rate, batch size, regularization, and more. Different hyperparameter choices can
result in different model qualities.

6. Data Variability: The input data might contain inherent variations or noise, leading to
differences in performance for different model runs.
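As a rough illustration of points 1 and 3 (a sketch assuming scikit-learn is available; it is not the Playground setup), training the same tiny architecture with different random seeds gives noticeably different losses and predictions:

    # Same architecture, different random seeds: the outcome varies from run to run.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])

    for seed in range(5):
        clf = MLPClassifier(hidden_layer_sizes=(2,), activation='logistic',
                            max_iter=5000, random_state=seed)
        clf.fit(X, y)
        print(seed, round(clf.loss_, 3), clf.predict(X))   # loss and predictions differ per seed

Some seeds may not converge at all within the iteration limit, which is exactly the variability described above.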

Answer 1.4
Given the constraint of linear inputs (X1 and X2) and using the initial configuration (Fig 1),
we need to modify the activation function, the number of neurons, and the number of hidden
layers to achieve a test loss of 0.18 or lower.

Intuition behind the chosen architecture: Since we are dealing with a non-linearly separable
problem like XOR, we need to introduce non-linearity into the model. To do that, we can use
a combination of hidden layers and non-linear activation functions. One common choice of
activation function for binary classification problems is the sigmoid function, which squashes
the output between 0 and 1. Using a single hidden layer with multiple neurons and the
sigmoid activation function can help the model learn the XOR function effectively.

Chosen Architecture:

1. Number of Hidden Layers: 1

2. Number of Neurons in the Hidden Layer: 2

3. Activation Function: Sigmoid

4. Epoch: 000,294
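For reference, a rough out-of-Playground equivalent of this configuration (a sketch assuming TensorFlow/Keras is available; the exact loss and epoch count will differ from the Playground's readings) could look like:

    # Sketch of the chosen architecture: one hidden layer, 2 neurons, sigmoid activations.
    import numpy as np
    import tensorflow as tf

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(2, activation='sigmoid'),   # hidden layer
        tf.keras.layers.Dense(1, activation='sigmoid'),   # output layer
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss='mse')
    model.fit(X, y, epochs=300, verbose=0)                # roughly mirrors the ~294 epochs above
    print(model.evaluate(X, y, verbose=0))                # loss on the four XOR points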

Answer 1.5

Implications of Different Model Sizes on Convergence Time and Model Complexity:

1. Convergence Time: Larger model sizes often lead to longer convergence times during
training. This is because larger models have more parameters to optimize, and they may
require more iterations or epochs to find the optimal parameter values. Smaller models with
fewer parameters tend to converge faster as they have less complex optimization landscapes.

2. Model Complexity: Model size is directly related to model complexity. Larger models with
more neurons and layers can represent more complex functions and capture intricate patterns
in the data. However, this increased complexity can also make the model more prone to
overfitting, especially on smaller datasets.

Disadvantages of Increasing Model Size for Simpler Data Distributions:

1. Overfitting: For simpler data distributions, larger models can easily memorize the training
data and may fail to generalize well to unseen data. This results in overfitting, where the
model performs well on the training set but poorly on new data.
2. Computation and Memory Requirements: Larger models require more computational
power and memory to train and run predictions. This can be a significant disadvantage,
especially in resource-constrained environments.

Difference Between Two Hidden Layers with 3 and 2 Neurons vs. Single Hidden Layer
with 6 Neurons:

1. Two Hidden Layers (3 and 2 Neurons): This configuration allows the model to learn
hierarchical representations of the data. The first hidden layer with 3 neurons captures lower-
level features, and the second hidden layer with 2 neurons combines these features to learn
higher-level representations. This hierarchical approach can be beneficial for capturing
complex patterns in the data.

2. Single Hidden Layer (6 Neurons): A single hidden layer with 6 neurons can also learn
complex patterns but does so without hierarchical representations. This architecture might be
sufficient for some simpler problems but could be limited in capturing intricate structures.
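For a sense of scale (assuming the two inputs here and a single output neuron), the two options are also of similar size: the 3-and-2 configuration has (2*3 + 3) + (3*2 + 2) + (2*1 + 1) = 20 weights and biases, while the single layer of 6 has (2*6 + 6) + (6*1 + 1) = 25, so the difference between them is mainly about depth and hierarchy rather than raw parameter count.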

Practicality of Having More Neurons in Initial Layers: Having more neurons in the initial
layers allows the model to learn low-level and mid-level features effectively. These initial
layers function as feature extractors, and having more neurons allows the model to capture a
wider range of patterns and variations in the data. These learned features can then be
combined in subsequent layers to learn more complex representations.

Disadvantages of Very Small or Very Large Learning Rates:

1. Very Small Learning Rate: A very small learning rate leads to slow convergence during training. The model takes tiny steps in the parameter space, which can significantly increase the number of epochs needed to reach a good solution. It may also get stuck in local minima and struggle to escape them because each update is so small (see the numerical sketch after this list).

2. Very Large Learning Rate: On the other hand, a very large learning rate can cause
instability during training. The model might overshoot the optimal parameters, leading to
oscillations or even divergence, where the loss increases instead of decreasing. It can also
lead to missing the optimal solution altogether.
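These two failure modes can be seen in a tiny numerical sketch (a made-up one-dimensional example, not taken from the assignment): plain gradient descent on f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0.

    # Gradient descent on f(w) = w**2 with different learning rates.
    def descend(lr, steps=20, w=5.0):
        for _ in range(steps):
            w = w - lr * 2 * w      # gradient of w**2 is 2*w
        return w

    print(descend(0.001))   # very small lr: w barely moves from 5.0 (slow convergence)
    print(descend(0.1))     # moderate lr: w ends up close to the minimum at 0
    print(descend(1.1))     # very large lr: each step overshoots and w diverges (about 1.9e2 and growing)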

QUESTION 3
One example of a potential source of concern for algorithmic bias that we encounter every
day is in the context of online advertising algorithms. These algorithms are designed to
display personalized ads to users based on their browsing history, online behavior, and other
data. However, they can inadvertently perpetuate different types of bias as presented by
Datta.

1. Algorithmic Prejudice: Algorithmic prejudice occurs when AI algorithms discriminate against certain groups based on characteristics such as race, gender, age, or ethnicity. In the case of online advertising, the algorithm may show ads that reinforce stereotypes, unfairly target specific demographics, or engage in predatory pricing. For example, if the algorithm is biased towards showing ads for high-paying job opportunities only to certain racial or gender groups, it can reinforce existing societal disparities.

2. Negative Legacy: Negative legacy bias refers to the perpetuation of historical discrimination in the present through the use of biased historical data. Online advertising
algorithms heavily rely on historical user data to make predictions about user preferences. If
the historical data contains biased information from the past, the algorithm might continue to
promote products or services that were disproportionately targeted towards certain groups,
amplifying the negative impact of historical discrimination.

3. Underestimation: Underestimation bias occurs when AI algorithms fail to recognize the needs or preferences of certain user groups, leading to the underrepresentation of those groups in the content they receive. In the context of online advertising, if certain minority
groups in the content they receive. In the context of online advertising, if certain minority
groups are consistently underestimated or overlooked by the algorithm, they may not receive
advertisements that cater to their interests, limiting their access to relevant products and
services.
