
DEVOPS

INTERVIEW
PREPARATION
GUIDE
By DevOps Architect
• Remember, the goal of this course is not to teach you specific
answers to every possible question but to provide you with the
understanding, strategies, and confidence you need to excel in
DevOps interviews. Good luck with your journey into the exciting
world of DevOps!

All The Best


• Name and Role: Jai M, DevOps Architect
• Experience:
• Total years in IT: 15 years
• Specialization: DevOps for 7+ years
• Experience in training: 10 years

About Me
• DevOps
– Linux and Bash Scripting
– Git and GitHub
– GitLab CI/CD
– Azure Cloud
– Terraform
– Ansible
– Docker
– Kubernetes

Prerequisites
• Linux Interview Questions with Realtime Scenarios
• Bash Scripting Interview Questions with Realtime Scenarios
• Git and GitHub Interview Questions with Realtime Scenarios
• GitLab CI/CD Interview Questions with Realtime Scenarios
• Azure Interview Questions with Realtime Scenarios
• Terraform Interview Questions with Realtime Scenarios
• Ansible Interview Questions with Realtime Scenarios
• Docker Interview Questions with Realtime Scenarios
• Kubernetes Interview Questions with Realtime Scenarios

Content
• Scenario: You're unable to SSH into a Linux server. What steps would you take
to troubleshoot this?
– Answer: Confirm that the SSH service is running on the server, that the server is up and
reachable over the network, that the SSH port is not blocked by a firewall, and that the SSH
configuration file is properly set up. You could also look at the server's logs for any
SSH-related errors.

Check if the server is running and network accessible:


You can use the ping command to see if the server is reachable.
Replace your_server_ip with the IP address of your server.
ping your_server_ip

If the server is up and network accessible, you should see replies with the time it took to get
the reply. If you see something like "Request timed out" or "Destination Host Unreachable",
there could be a network issue.

Linux Interview Questions


Check if SSH service is running:
If you have access to the server's console, you can check if the SSH service is
running with:
sudo systemctl status ssh

If SSH is not running, you can start it with:


sudo systemctl start ssh

Then, enable it to start on boot with:


sudo systemctl enable ssh

Linux Interview Questions


Check if the SSH port is open:
The default port for SSH is 22. You can use telnet to check if it's open. Again, replace
your_server_ip with your server's IP address.
telnet your_server_ip 22
If the port is open, you should see a message similar to "SSH-2.0-OpenSSH...". If the
port is closed, the command will hang or say "Unable to connect to remote host".

Check firewall rules:


If you're using ufw as your firewall, you can check its status with:
sudo ufw status
This command will show you the current rules and you can see if SSH (or port 22) is
allowed. If not, you can allow it with:
sudo ufw allow ssh

Linux Interview Questions


Check SSH configuration:
The main configuration file for SSH is located at /etc/ssh/sshd_config. You can inspect it with:
sudo less /etc/ssh/sshd_config

Ensure settings such as "Port", "PermitRootLogin", and "PasswordAuthentication" are
configured correctly for your requirements. After making changes, restart the SSH service:
sudo systemctl restart ssh

Check logs for errors:


SSH logs can usually be found in /var/log/auth.log. You can check the latest entries in this file
with:
sudo tail -n 50 /var/log/auth.log
This might provide some clues if the issue is due to something like failed authentication.

Linux Interview Questions


Scenario: Your Linux server is running slower than usual. What steps can you
take to identify and resolve the issue?
Answer: Start by examining CPU usage with a tool like top, htop, or atop. Check
I/O operations and memory usage. Look for swapping, which could
significantly reduce performance. Check system logs in /var/log/ to identify any
software or hardware errors.
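
A minimal first-pass sketch, assuming commonly available tools (package names vary by distribution):
top                                  # overall load, CPU and memory; press M to sort by memory
vmstat 1 5                           # watch the si/so columns for swapping
iostat -x 1 3                        # per-device I/O utilisation and wait times (sysstat package)
free -h                              # memory and swap usage
sudo tail -n 100 /var/log/syslog     # recent errors (use /var/log/messages on RHEL-based systems)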

Scenario: An application is unable to write to a file on a Linux server. How
would you diagnose the problem?
Answer: Check the permissions on the file to ensure the application has write
access. You might also need to check if the file is locked by another process or
if the filesystem is read-only due to disk issues.
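
A possible sequence of checks (the application user and path are placeholders):
ls -l /path/to/file                  # owner, group and permission bits
sudo -u appuser test -w /path/to/file && echo writable || echo "not writable"
lsof /path/to/file                   # is another process holding the file open?
findmnt -T /path/to/file             # look for "ro" in the mount options (read-only filesystem)
df -h /path/to/file                  # a full filesystem also causes write failures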

Linux Interview Questions


Scenario: You have been tasked with automating an application-checking task
that needs to be run at a specific time every day. How would you achieve this
on a Linux server?
Answer: This can be done using the cron daemon in Linux. By editing the
crontab file, you can specify the exact time for the script to be executed.
* * * * * /path/to/command arg1 arg2
Where the *'s represent (in order): minute (0 - 59), hour (0 - 23), day of month
(1 - 31), month (1 - 12), and day of week (0 - 7 where both 0 and 7 are Sunday)

0 0 * * * /usr/local/bin/app_backup.sh   (runs every day at midnight, 12 AM)

Linux Interview Questions


Scenario: A Linux server is out of disk space. What steps do you take to rectify
this situation?
Answer: Use tools like df and du to determine which directories are consuming
the most space. Clean up unneeded files. Check if there are old logs or core
dumps that can be removed. Otherwise, raise an incident with the Linux team to
add disk space.
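
For example (the cleanup commands are illustrative only; verify before deleting anything):
df -h                                                           # which filesystem is full
sudo du -xh / --max-depth=2 2>/dev/null | sort -rh | head -20   # largest directories
sudo find /var/log -type f -name "*.gz" -mtime +30 -delete      # example: remove old rotated logs
sudo journalctl --vacuum-size=200M                              # trim the systemd journal if it applies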

Scenario: Your Linux server has rebooted unexpectedly. Where would you look
to find out why this happened?
Answer: Check the /var/log/messages and /var/log/syslog for any errors before
the reboot time. Another useful command is last reboot which shows the
reboot history.
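
A quick sketch, assuming systemd-journald keeps persistent logs:
last reboot | head                        # recent reboot history
sudo journalctl -b -1 -p err              # errors from the previous boot
sudo grep -iE "panic|oom|error" /var/log/syslog /var/log/messages 2>/dev/null | tail -50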

Linux Interview Questions


Scenario: How would you change the ownership of a file on a Linux system?
Answer: The chown command can be used to change the owner of a file. The
command chown user:group filename would change the owner and group of a file.

Scenario: How would you check for open ports on a Linux server?
Answer: The netstat command or ss command can be used to check the open ports.
The lsof -i command is another option.
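
For example:
sudo ss -tulpn                            # listening TCP/UDP sockets with owning processes
sudo lsof -i -P -n | grep LISTEN          # alternative view of listening sockets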

Scenario: A process on your Linux server is using too much CPU. How would you
manage this?
Answer: You could use the top or htop command to identify high CPU utilization
processes.
As a Linux admin task, the nice and renice commands can be used to lower the CPU
priority of the process.
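
A short illustration (the PID is a placeholder):
top -o %CPU                                     # sort by CPU usage (or use htop)
ps -eo pid,ppid,cmd,%cpu --sort=-%cpu | head    # top CPU consumers
sudo renice +10 -p 12345                        # lower the priority of PID 12345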

Linux Interview Questions


Scenario: How would you set up a basic firewall on a Linux server?
Answer: You can use iptables or ufw (Uncomplicated Firewall) to set up a basic
firewall. These tools allow you to define rules for incoming and outgoing
traffic.

ufw (Uncomplicated Firewall) is primarily used on Debian-based distributions,


such as Ubuntu. On Red Hat-based distributions like Red Hat Enterprise Linux
(RHEL) and CentOS, the default firewall utility is usually firewalld.
sudo systemctl status firewalld

To check the currently active firewalld rules on your Red Hat-based system, you
can run the following command:
sudo firewall-cmd --list-all

Linux Interview Questions


Scenario: How would you find a file named "example.txt" located somewhere on the
server?
Answer: The find command can be used for this. For example: find / -name
example.txt.

Scenario: How would you display the total amount of free and used physical and swap
memory in the system, as well as the buffers used by the kernel?
Answer: You can use the free command.

Scenario: You need to verify the integrity of a downloaded file on your Linux system.
How would you do it?
Answer: You can use checksum tools such as md5sum or sha256sum to generate a
checksum for the downloaded file and compare it with the provided checksum.

Linux Interview Questions


Here is how you can verify the integrity of a downloaded file using the md5sum command:

Download the file and obtain the checksum:


Let's assume you have downloaded a file called example.tar.gz and received an accompanying MD5 checksum
file called example.tar.gz.md5 that contains the expected checksum.

Generate the checksum of the downloaded file:

Open a terminal and navigate to the directory where the downloaded file is located. Run the following
command to generate the checksum of the downloaded file:
md5sum example.tar.gz

This command calculates the MD5 hash of the file and displays the checksum value.
Compare the calculated checksum with the provided checksum:
Open the checksum file (example.tar.gz.md5) using a text editor or use the cat command to display its contents.
It should contain the expected checksum value.
Compare the calculated checksum from the step above with the provided checksum. If they match, the
downloaded file is intact and hasn't been modified during the transfer. If they don't match, it indicates that the
file has been corrupted or tampered with.
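
If the .md5 file uses the standard "<hash>  <filename>" format, md5sum can do the comparison for you:
md5sum -c example.tar.gz.md5              # prints "example.tar.gz: OK" on a match
sha256sum example.tar.gz                  # same idea with SHA-256; compare against the published value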

Linux Interview Questions


Scenario: How would you determine the Linux distribution and version you are
running?
Answer: Use commands like lsb_release -a, cat /etc/os-release, or uname -a.

Scenario: How do you change the hostname on a Linux server?


Answer: Use the hostnamectl command or manually edit the /etc/hostname
file.

Scenario: How would you find which processes are using a specific file?
Answer: Use the lsof command.

Linux Interview Questions


Scenario: How would you check disk I/O on a Linux system?
Answer: Use tools such as iostat, iotop, or vmstat.

Scenario: How would you see the routing table on a Linux server?
Answer: Use the route or ip route command.

Scenario: How would you change the default shell for a user?
Answer: Use the chsh command.

Scenario: How would you check the status of a service and enable it at system
startup?
Answer: Use the systemctl command to check the status and enable services.
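
Example commands for the four questions above (the service and user names are placeholders):
iostat -x 1 3                             # disk I/O (sysstat package); iotop and vmstat also work
ip route show                             # routing table
sudo chsh -s /bin/bash username           # change a user's default shell
sudo systemctl status nginx               # check a service's status
sudo systemctl enable --now nginx         # enable it at boot and start it immediately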

Linux Interview Questions


Scenario: How would you check the existing crontabs for all the users on a
Linux system?
Answer: As root, you can check each user's crontab using crontab -u username
-l.

Scenario: How would you check system load average and active processes?
Answer: Use the top or uptime command.

Scenario: How would you change the permissions of a file/directory?


Answer: Use the chmod command.

Linux Interview Questions


Scenario: How would you kill a process that is not responding?
Answer: Use the kill or killall command.

Scenario: How would you check the available space in a filesystem?


Answer: Use the df command.

Scenario: How would you check which ports are listening on a Linux server?
Answer: Use the netstat -l or ss -l command.

Scenario: How would you check all the installed packages on a Linux server?
Answer: Depending on your package manager, you can use dpkg -l, rpm -qa, or
yum list installed.

Linux Interview Questions


Scenario: How would you check the logs for a specific service?
Answer: Use journalctl -u servicename if your system uses systemd.
Otherwise, check the logs in /var/log/.

Scenario: How would you find out which process is using a specific port?
Answer: Use the netstat -tulpn or lsof -i :portnumber command.

Scenario: How would you check the kernel version on a Linux server?
Answer: Use the uname -r command.

Linux Interview Questions


ADVANCED LINUX AND DEVOPS ENGINEERS
REALTIME SCENARIOS
Scenario: The system load of your Linux server has significantly spiked and
remained high for the last 30 minutes, even though there hasn't been a
substantial change in usage. What steps will you take to find out the cause and
how would you communicate this to the users?
Answer: You'd first need to identify the cause by checking system resources
(CPU, memory, disk I/O, network) using commands such as top, vmstat, iostat,
etc. If a particular process is consuming too many resources, you may consider
killing/restarting it. For persistent issues, consider escalating it to the relevant
team. Communicate with users via appropriate channels (emails, ticket
updates, etc.), detailing the issue and estimated resolution time following
Incident Management in ITIL.
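
One possible investigation sequence (a sketch, not an exhaustive checklist):
uptime                                          # confirm the load-average trend
top -o %CPU                                     # CPU-bound processes
iostat -x 1 3                                   # high %util/await suggests the load is I/O-bound
ps -eo pid,stat,wchan:20,cmd | awk '$2 ~ /D/'   # processes stuck in uninterruptible I/O wait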

Linux Interview Questions


Linux Admin Scenario:

Scenario: You need to upgrade the Linux kernel version on all your
organization's servers. What's your plan to do this and how do you ensure
minimal disruption to services?
Answer: Kernel upgrades, being risky, should first be tested in a non-
production environment. Create a rollback plan in case something goes wrong.
Schedule the upgrade during off-peak hours or maintenance window to
minimize disruption. Communicate the plan and potential risks to all
stakeholders, following ITIL Change Management.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: You received an alert that a partition on one of the servers is running
out of space. How would you handle this?
Answer: You can clear out temporary or unwanted files, compress files, or
move data to a less-used partition. If none of this is possible, consider adding
more disk space. This is a reactive approach and follows the Incident
Management process of ITIL.

Linux Interview Questions


Linux Admin Scenario:

Scenario: How would you prevent the previous scenario from recurring in the
future?
Answer: Set up monitoring and alerts for disk usage to catch the issue early.
Consider implementing automated log rotation and cleanup scripts. If the
partition consistently fills up, it may need to be resized or the service may
need to be moved to a server with more disk space. This proactive approach is
part of the Problem Management and Continual Service Improvement
processes of ITIL.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: A critical security vulnerability has been discovered in a software
package that's widely used across your Linux servers. What steps would you take?
Answer: If there's an update or patch available, plan a rollout across all servers.
Test the update in a non-production environment first, create a rollback plan,
and schedule the update for a time that minimally impacts users. Follow the
Change Management process as defined by ITIL.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: A server frequently becomes unresponsive and needs to be manually
rebooted. How do you handle this situation?
Answer: Collect and analyze log files, check system resources, and perform a
root cause analysis to identify the problem. You might need to escalate the
issue to the relevant team if it's not within your area of expertise. This
situation falls under ITIL Problem Management.

Linux Interview Questions


Linux Admin Scenario:

Scenario: A new application will be deployed on your Linux servers. What ITIL
processes are involved in this scenario?
Answer: The Service Design process will ensure the server architecture meets
the application's needs. The Change Management process will plan and
manage the deployment of the new application. The Release and Deployment
Management process will handle the physical act of deploying the application.

Linux Interview Questions


Linux Admin Scenario:

Scenario: How do you ensure that the services provided by your Linux servers
meet the users' needs and expectations?
Answer: Use monitoring tools to keep track of system performance and
availability. Regularly review system logs to detect potential problems. Actively
communicate with users to get feedback and make necessary adjustments.
These activities are part of the Service Level Management and Continual
Service Improvement processes of ITIL.

Linux Interview Questions


Linux Admin Scenario:

Scenario: You're planning a major upgrade on your Linux servers that will likely
cause extended downtime. What steps do you take?
Answer: Plan the upgrade following the ITIL Change Management process.
Communicate with all stakeholders well in advance about the downtime and
its impact. Schedule the upgrade for a maintenance window or low-usage
time.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: You are noticing performance degradation on your Linux servers.
What steps would you take to identify and resolve the issue?
Answer: Monitor system resources (like CPU, memory, disk I/O, network) to
identify any bottlenecks. Use tools like top, vmstat, iostat, netstat etc. If a
specific service is causing the issue, you might need to tune its configuration or
allocate more resources to it. If the degradation persists, consider
implementing more robust solutions such as load balancing or clustering.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: One of your servers has been compromised and is being used for
sending spam emails. What steps do you take to handle this issue?
Answer: First, isolate the compromised server to prevent further damage.
Analyze logs and system files to identify how the server was compromised.
Remove the malicious software and patch the vulnerability. In terms of ITIL,
this would fall under Incident Management. Following the incident, conduct a
root cause analysis as part of Problem Management and make necessary
changes to prevent future incidents.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: You are implementing a new backup strategy for your Linux servers.
What considerations should you take into account and how does this relate to
ITIL processes?
Answer: When designing a backup strategy, consider Recovery Point Objective
(RPO) and Recovery Time Objective (RTO). Test your backups and recovery
procedures regularly to ensure they are effective. This is part of ITIL's Service
Design and Continual Service Improvement.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: A Linux server is experiencing high load average and users are
reporting slow response. How would you identify the issue and restore
service?
Answer: Use performance monitoring tools like top, vmstat, iostat, and netstat
to identify bottlenecks. It could be due to a rogue process, insufficient system
resources, or even a hardware issue. Once the issue is identified, take
necessary action to restore normal service. Communicate updates to users.
This falls under ITIL's Incident Management process.

Linux Interview Questions


Linux Admin Scenario:

Scenario: You are tasked with decommissioning a Linux server. What steps do
you need to follow to ensure a smooth transition?
Answer: Identify the services running on the server and plan their migration to
another server. Inform users and stakeholders about the change and schedule
it for a suitable time. Test the services on the new server before
decommissioning the old one. Follow ITIL's Change Management process.

Linux Interview Questions


Linux Admin Scenario:

Scenario: Your Linux server needs to be updated with a critical security patch,
but the patch requires a system reboot which will disrupt ongoing operations.
How do you handle this situation?
Answer: Inform the stakeholders about the critical nature of the patch and the
need for a reboot. Schedule the patching and reboot during a maintenance
window or a period of low activity to minimize impact. Also, have a rollback
plan in case the patch causes issues. This is part of ITIL's Change Management
process.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: You need to maintain system performance while also ensuring high
availability of services in your Linux servers. How would you achieve this?
Answer: Implement load balancing to distribute workload among multiple
servers. Consider clustering for high availability. Regular performance
monitoring and capacity planning are essential to identify potential issues
before they become serious. This is part of ITIL's Service Design and Capacity
Management processes.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: Users are reporting that an application hosted on one of your Linux
servers is not accessible. What steps would you take to restore the service?
Answer: Check the status of the application and the server. The issue could
range from network problems to the application crashing. Resolve the issue
based on your findings. Keep the users informed about the progress. This is
part of ITIL's Incident Management process.

Linux Interview Questions


Linux and DevOps Engineers Scenario:

Scenario: You notice that a process on a Linux server is using most of the CPU
resources. What steps do you take?
Answer: If the process is not critical, it can be killed or its priority can be
changed using the nice or renice command. If the process is critical, you may
need to consider assigning more resources or optimizing the process. This falls
under ITIL's Event and Incident Management processes.

Linux Interview Questions


MIGRATION
• There are different migration procedures that are commonly used:

1. Lift and Shift: This migration procedure involves moving an application or workload from one environment to another with
minimal or no changes to the underlying architecture. It typically involves replicating the application or workload as-is on the
new environment, such as migrating virtual machines from on-premises to the cloud.

2. Rehosting: Rehosting, also known as "lift and shift," involves moving an application or workload to a different infrastructure
without making any significant modifications. For example, migrating a physical server to a virtual machine or moving a
workload from one cloud provider to another without making changes to the application's architecture.

3. Replatforming: Replatforming involves making some modifications to the application or workload during migration to optimize
it for the new environment. It may involve migrating an application from a self-managed infrastructure to a managed service,
such as moving a database from a self-hosted server to a database as a service (DBaaS) offering.

4. Refactoring: Refactoring involves making significant modifications to the application or workload during migration to leverage
the capabilities of the target environment fully. This may include rewriting parts of the application, using new cloud-native
services, or adopting a microservices architecture.

5. Repurchasing: Repurchasing involves migrating to a different commercial off-the-shelf (COTS) product or software-as-a-service
(SaaS) offering to replace an existing application or workload. This approach is often chosen when the existing solution is
outdated or does not meet the organization's needs.

6. Retiring: Retiring involves decommissioning or phasing out an application or workload that is no longer needed. This procedure
may involve archiving data, transferring users to alternative solutions, and shutting down the infrastructure associated with the
application.

7. Hybrid Migration: In a hybrid migration, organizations choose to migrate certain components of an application or workload
while keeping others in the existing environment. This allows for a gradual transition and helps mitigate risks associated with
large-scale migrations. Hybrid migrations are often used when organizations want to take advantage of the benefits of the cloud
while maintaining some on-premises infrastructure.

MIGRATION PROCEDURES
• Overview of how a lift-and-shift migration might work for an
eLearning platform that currently operates in a three-tier
architecture (Web, Application, and Database) in an on-premise
data center and wants to migrate to a cloud environment.



1. Cloud Vendor Evaluation:
To decide which cloud provider would be best, consider several
factors: Cost, Services offered, Security and compliance measures,
Data sovereignty regulations, Performance, Service-level agreements
(SLAs), Customer support, and Vendor lock-in.
For the sake of this scenario, let's assume AWS is chosen due to its
extensive global infrastructure, comprehensive service offerings,
flexible pricing options, strong security and compliance measures,
and extensive documentation and community support.



2. Preparation:
Start with an assessment of the current infrastructure. Identify all the
components of the eLearning platform, including servers, databases,
and storage, along with their configurations and interdependencies.
Plan the migration strategy for each component.



3. AWS Environment Setup:

1. Setup VPC:

First, you'll need to set up a Virtual Private Cloud (VPC) for each of the environments (Development, Pre-Production, Production).
1. Navigate to VPC Dashboard in AWS console.
2. Create a new VPC with a unique CIDR block.
3. For each VPC, create public and private subnets in multiple Availability Zones (AZs) to ensure high availability and fault tolerance.

2.Security Groups and Network ACLs:

You will need to set up Security Groups (SGs) and Network Access Control Lists (NACLs) for controlling inbound and outbound
traffic.
1. For each subnet, create SGs and NACLs according to the principle of least privilege.
2. SGs could be configured to allow HTTP/HTTPS access from anywhere for web servers, and database access only from the application
server's SG for database servers.

3. EC2 Instances:

Use AWS's Server Migration Service (SMS) to create AMIs of your on-premise servers. Then, for each environment:
1. Launch EC2 instances from these AMIs in the corresponding private subnet.
2. Select an appropriate instance type based on your server's CPU, memory, and network requirements.
3. Attach Elastic IP to the NAT gateway to allow instances in the private subnet to access the internet.
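
As a rough illustration of the VPC and security-group steps above using the AWS CLI (all IDs, names and CIDR ranges are placeholders; in practice this is usually codified with Terraform or CloudFormation):
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-0abc123 --cidr-block 10.0.1.0/24 --availability-zone us-east-1a
aws ec2 create-security-group --group-name web-sg --description "web tier" --vpc-id vpc-0abc123
aws ec2 authorize-security-group-ingress --group-id sg-0def456 --protocol tcp --port 443 --cidr 0.0.0.0/0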



4. Databases:
Use AWS Database Migration Service (DMS) to migrate your on-premise databases to the cloud.
1. For each environment, create a Relational Database Service (RDS) instance in the private subnet.
2. Choose a multi-AZ deployment for high availability.
3. Set up automated backups for disaster recovery.

5. Load Balancers and Auto Scaling:


Configure Elastic Load Balancers (ELB) and Auto Scaling Groups (ASG) for your EC2 instances.
1. For each tier in your application, create an ELB in the public subnet and register the corresponding EC2 instances with it.
2. Create an ASG for each tier, specify the minimum, maximum, and desired number of instances.
3. Attach the ASG to the corresponding ELB.

6. DNS and CDN:


1. Set up Route 53 for DNS management. Create a hosted zone and record sets to route traffic to your ELBs.
2. Set up a CloudFront distribution with your Route 53 domain name as the origin. Enable geo-location routing.



7. Security and IAM:
1. Create IAM users for your developers and administrators.
2. Assign them to groups with policies that provide the necessary permissions.
3. Enable MFA for additional security.
4. Use AWS WAF to protect your application from web attacks and Shield for
DDoS protection.
8. Monitoring:
1. Enable CloudWatch on all resources for monitoring and alarming. Consider
using CloudTrail for auditing.
2. Set up SNS topics and subscriptions to notify administrators of any alarms
or important events.



4. Migration:
1. Use AWS's Server Migration Service (SMS) to create AMIs of your on-premise servers. Launch
EC2 instances from these AMIs in the corresponding environment (Dev, Pre-Prod, Prod) in AWS.
2. Use AWS Database Migration Service (DMS) to migrate your databases to RDS instances in
AWS.
3. For data migration, consider using AWS Snowball or Direct Connect if the data size is very large.
5.Post-Migration Setup:
1. Configure Elastic Load Balancers and Auto Scaling Groups for your EC2 instances.
2. Set up Route 53 for DNS management.
3. Implement CloudFront with geolocation routing policy to deliver content to users from the
closest geographical location to reduce latency.



6.Security and Access Control:
1. Use AWS IAM to create users and roles for developers and administrators.
Grant permissions according to the principle of least privilege.
2. Use AWS WAF and Shield for web application firewall and DDoS protection.

3. Consider using AWS Macie for data loss prevention and AWS GuardDuty for
threat detection.
7.Monitoring and Optimization:
1. Use CloudWatch for monitoring and setting up alarms. AWS Trusted Advisor can
give recommendations for cost optimization, performance, security, and fault
tolerance.



1. Downtime: There might be some downtime during the cutover phase. It needs
to be planned and communicated to the users in advance.
2. Data Sync: If the data is changing continuously during the migration, you need
to keep the on-premise and cloud environments in sync.
3. Application Compatibility: Some applications might have issues running in the
cloud environment. These need to be identified during the assessment phase.
4. Security Concerns: Data breaches can occur if security is not properly
configured in the cloud environment. Ensure that all security measures are in
place and tested.
5. Cost Overruns: Without proper monitoring, cloud costs can increase
unexpectedly. Regularly monitor and optimize costs.

ISSUES DURING MIGRATION


1. Smoke Testing: Immediately after the migration, conduct a series of basic tests to confirm the application
is functioning correctly in the new environment. Check for things like successful login, accurate data
retrieval, and proper UI rendering.
2. Functional Testing: Conduct a deeper level of testing to verify that all features of the application are
working as expected. This should involve a comprehensive range of scenarios that cover all the
functionalities of the eLearning platform.
3. Performance Testing: Test the performance of the application under normal and peak loads to ensure that
the AWS infrastructure can handle the load. Tools like Apache JMeter or Gatling can be used for this.
4. Security Testing: Verify that all security measures are functioning correctly. This might involve penetration
testing, vulnerability scanning, and auditing of IAM roles and policies.
5. Disaster Recovery Testing: Check if the disaster recovery processes work as expected. For instance, try
recovering from an RDS snapshot to see if the process works correctly and how long it takes.
6. User Acceptance Testing (UAT): Have a group of end-users use the system in the new environment to
ensure the system works as expected from an end-user perspective.

TESTING PLAN
1. Snapshot and Backup: Before beginning the migration, ensure that you have recent backups of all servers
and databases. For AWS, create snapshots of EBS volumes and backups of RDS instances.
2. Maintain Infrastructure: Do not immediately decommission your on-premises infrastructure after the
migration. Maintain it for a period of time to ensure that if something goes wrong, you can redirect traffic
back to the on-premise environment.
3. DNS TTL: Keep the DNS Time-To-Live (TTL) low for the duration of the migration. This will allow you to
quickly redirect traffic back to the on-premise servers if necessary.
4. Data Sync: If data is continuously changing during the migration, you need a plan for syncing or merging
changes made in the AWS environment back to the on-premise servers.
5. Documentation: Document the exact steps required to revert back to the on-premise environment. This
should be as detailed as possible to avoid confusion during a stressful situation.
6. Testing: Just like you test the migration, also test the rollback plan to ensure it works as expected.

ROLLBACK PLAN
Detailed comparison of different migration models (Rehosting, Replatforming,
Refactoring, Re-architecting, and Retiring) along with examples, challenges,
recommended cloud, and the applicability of each model for cloud-to-cloud
and on-premise-to-cloud migrations.



1.Rehosting (lift and shift):
1. Description: Move applications as-is from the existing infrastructure to the
cloud without making any significant changes.
2. Example: Migrating virtual machines or containers from on-premise servers
to the cloud.
3. Challenges: Limited benefits in terms of cost savings, scalability, or
performance improvements. May require manual adjustments for
compatibility.
4. Recommended Cloud: AWS, Azure, GCP.
5. Applicability: Both cloud-to-cloud and on-premise-to-cloud migrations.



2. Replatforming (lift, tinker, and shift):
1. Description: Optimize applications during migration by making minor
modifications to take advantage of cloud-native services.
2. Example: Migrating an on-premise application to a managed database
service like AWS RDS or Azure SQL Database.
3. Challenges: Requires some modifications to adapt the application to the
target cloud platform. May need code refactoring or configuration changes.
4. Recommended Cloud: AWS, Azure, GCP.
5. Applicability: Both cloud-to-cloud and on-premise-to-cloud migrations.



3. Refactoring:
1. Description: Restructure and optimize the application's codebase to
leverage cloud-native services and take full advantage of the cloud.
2. Example: Rebuilding a monolithic application as a set of microservices using
containers and container orchestration platforms like Kubernetes.
3. Challenges: Requires significant development effort and expertise. Potential
impact on application behavior and functionality.
4. Recommended Cloud: AWS, Azure, GCP.
5. Applicability: Both cloud-to-cloud and on-premise-to-cloud migrations.



4. Re-architecting:
1. Description: Redesign the application architecture to fully utilize cloud-
native services, scalability, and elasticity.
2. Example: Transforming a traditional three-tier architecture into a serverless
architecture using AWS Lambda or Azure Functions.
3. Challenges: Requires substantial development effort and expertise. May
involve rewriting the application from scratch.
4. Recommended Cloud: AWS, Azure.
5. Applicability: Both cloud-to-cloud and on-premise-to-cloud migrations.



5. Retire:
1. Description: Decommission systems or applications that are no longer
needed or have been replaced by cloud-native alternatives.
2. Example: Shutting down on-premise servers that are no longer in use after
migrating their workloads to the cloud.
3. Challenges: Proper analysis and planning are required to identify systems
that can be retired without impacting the overall ecosystem.
4. Recommended Cloud: N/A.
5. Applicability: Mostly for on-premise-to-cloud migrations, but can also be
relevant for cloud-to-cloud migrations.



Bash Interview Questions
Q: Can you describe your experience with Bash scripting? What types of tasks have you used Bash
scripting to solve?
A: The answer depends on your past experience with Bash scripting and the nature of the
tasks handled. You should be able to mention tasks like automating repetitive jobs, system
health checks, data manipulation, and more.

Q: Could you describe the difference between a 'shell variable' and an 'environment variable'?
A: Shell variables are only accessible in the current shell. Environment variables, once
exported, are inherited by child processes, so any program started from that shell can see them.
For example, setting a shell variable:
VAR1="shell variable"
echo $VAR1

Setting an environment variable:


export VAR2="environment variable"
echo $VAR2

Bash Interview Questions


Q: Write a command to find all the .txt files in a directory and its subdirectories and count the
number of lines in each file. Could you walk me through the command you wrote?
A: The 'find' command can be used to find all .txt files in a directory and its subdirectories:
find /path/to/dir -name "*.txt" -exec wc -l {} \;
This command will print the number of lines in each .txt file.

Q: Can you describe a situation where you have used loops in Bash scripting? Could you
provide a code snippet for that scenario?
A: A general example of a loop in Bash scripting is processing files in a directory.
For example:
for file in /path/to/dir/*; do
echo "Processing $file"
# operation
done

Bash Interview Questions


Q: How would you read a file line by line and perform some operation on each line? What command would you
use for this?
A: The 'while' loop and 'read' command can be used to read a file line by line:
while IFS= read -r line; do
echo "Read line: $line"
# operation
done < /path/to/file

Q: Write a Bash script that accepts an arbitrary number of arguments and prints them out one at a time. Can
you discuss how it works and when it would be useful?
A: Here is an example of a bash script that accepts an arbitrary number of arguments:
#!/bin/bash
for arg in "$@"; do
echo "Argument: $arg"
done

Q: Can you explain what 'shebang' (#!/bin/bash) at the beginning of a script means?
A: The 'shebang' (#!/bin/bash) at the beginning of a script specifies the interpreter for the script. In this case, it
specifies that the script should be run using the bash shell.

Bash Interview Questions


Q: Write a script that checks if a file or directory exists, if it's a regular file or a directory, and prints a relevant message.
A: Here is an example script:
#!/bin/bash
if [[ -e /path/to/file ]]; then
if [[ -f /path/to/file ]]; then
echo "It is a regular file."
elif [[ -d /path/to/file ]]; then
echo "It is a directory."
else
echo "It is not a regular file or directory."
fi
else
echo "File or directory does not exist."
fi

Q: How would you make your Bash scripts interactive? Could you provide an example?
A: Interactive scripts can be made using 'read' command:
#!/bin/bash
echo -n "Enter your name: "
read name
echo "Hello, $name"

Bash Interview Questions


Q: What are some ways you can debug a Bash script? Have you used shell options or other tools to help debug a script?
A: Debugging a bash script can be done using 'set -x' to print each command before it's executed. Additionally, we can use the '-v' flag to print
shell input lines.
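
For example:
bash -n ./myscript.sh          # syntax check only, nothing is executed
bash -x ./myscript.sh          # trace every command as it runs
set -x                         # enable tracing from this point inside a script
set +x                         # turn tracing off again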

Q: How would you handle error messages in your bash scripts and ensure that they fail gracefully?
A: Error messages can be redirected to a log file, or we can use 'set -e' to stop the script if any command fails.
#!/bin/bash
set -e
command_that_may_fail >/dev/null 2>>error.log

Q: What does the command '2>&1' do in bash? How would you use it in scripting?
A: The command '2>&1' redirects the stderr (file descriptor 2) to stdout (file descriptor 1). For example, to redirect all output of a command to a
file:
command > file.txt 2>&1
Example:
[cloud_user@32e9b9fd011c ~]$ ls -lrt > file.txt 2>&1
[cloud_user@32e9b9fd011c ~]$ cat file.txt
total 0
-rw-r--r--. 1 cloud_user cloud_user 0 Jun 11 13:09 file.txt
[cloud_user@32e9b9fd011c ~]$ ls -lrt hnk > file.txt 2>&1
[cloud_user@32e9b9fd011c ~]$ cat file.txt
ls: cannot access 'hnk': No such file or directory

Bash Interview Questions


Q: How would you implement a logging mechanism in Bash scripts? What kind of information
would you log?
A: Logging can be implemented by redirecting messages to a file. For example:
#!/bin/bash
LOG_FILE="/path/to/logfile.log"
echo "This message will be logged" | tee -a "$LOG_FILE"

Q: What is the purpose of the 'trap' command in a bash script? Can you provide an example of
how you might use it?
A: The 'trap' command allows us to catch signals and execute a command when a signal is
received. For example:
#!/bin/bash
trap 'echo "Caught a signal"' SIGINT SIGTERM
while true; do
sleep 1
done

Bash Interview Questions


Q: How do you run a Bash script in the background? What are some
considerations when running scripts in the background versus in the
foreground?
A: You can run a bash script in the background by appending '&' at the end of
the command.
The main difference is that a foreground process is attached to the terminal and can read
keyboard input, while a background process is not, so it should not expect interactive input
and its output is usually redirected to a file. Use nohup or disown if the job must keep
running after the shell session ends.
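
A small sketch (the script name is a placeholder):
./long_task.sh &                            # run in the background; the terminal stays free
nohup ./long_task.sh > task.log 2>&1 &      # keep running after the session closes, output captured
jobs                                        # list background jobs
fg %1                                       # bring job 1 back to the foreground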

Q: How would you pass output of a script as an input to another script?


A: We can use a pipeline '|' to pass the output of a script as input to another
script. For example:
script1.sh | script2.sh

Bash Interview Questions


Q: Can you explain the difference between '==' and '=' in a conditional statement? Provide an example of each in a script.
A: Inside test brackets, = is the POSIX string-comparison operator and == is a bash extension that
does the same (and supports pattern matching inside [[ ]]), while inside (( )) == performs numeric
comparison. Outside of any test construct, = is the assignment operator, as in var="value".
In a script, you'd see them used as follows:
#!/bin/bash
var1="Hello"
var2="Hello"
if [ "$var1" = "$var2" ]; then
echo "Strings are equal."
fi

And == in the context of comparison:


#!/bin/bash
num1=10
num2=10
if (( num1 == num2 )); then
echo "Numbers are equal."
fi

Bash Interview Questions


Q: What is the difference between [ ] and [[ ]] for testing conditions? Can you provide examples where one is
preferred over the other?
A: [ ] and [[ ]] are used for testing conditions. [[ ]] is an improved version and is preferred due to its additional
features like pattern matching and more logical operations.
Here's a script illustrating this:
#!/bin/bash
var="Hello"
# Using [ ]
if [ "$var" = "Hello" ]; then
echo "Hello"
fi

# Using [[ ]]
if [[ "$var" == "Hello" ]]; then
echo "Hello"
fi

Bash Interview Questions


Q: What is the role of the 'set' command in bash scripting? Can you provide an
example where it's used?
A: The set command in bash is used to set and unset certain flags or settings within
the shell environment.
For example, set -x is used to debug a script, printing each command to the terminal
before it is executed.

Q: How would you write a bash script to create a backup of a directory, and then
automate it to run daily?
A: Here's a script to create a backup of a directory:
#!/bin/bash
tar -czf /path/to/backup.tar.gz /path/to/dir
This script can be automated to run daily by adding it as a cron job:
0 0 * * * /path/to/backup.sh

Bash Interview Questions


Q: How would you use arrays in a bash script? Provide an example.
A: Arrays in bash are zero-based and can be defined as follows:
#!/bin/bash
ARRAY=("firstelement" "secondelement" "thirdelement")
echo ${ARRAY[0]} # Output: firstelement
echo ${ARRAY[1]} # Output: secondelement

Q: How would you extract specific lines from a text file? For example, print lines 10 to
20 from a file.
A: Here's how you could extract lines 10 to 20 from a file:
sed -n '10,20p' filename
This uses the 'sed' command with the '-n' option to suppress automatic printing, and
'10,20p' to print lines 10 through 20.

Bash Interview Questions


Q: How can you ensure that your bash scripts are idempotent? Can you provide an example where this
principle is important?
A: Idempotency in the context of bash scripting means that no matter how many times you run the script, the
result will be the same.
An example where this is important is a script to create a directory.
An idempotent script should first check if the directory exists before attempting to create it:
#!/bin/bash
DIR="/path/to/dir"
if [[ ! -d "$DIR" ]]; then
mkdir "$DIR"
fi

Q: Can you describe how you'd use 'awk' or 'sed' in a Bash script to manipulate a complex data file?
A: Here is an example where 'awk' is used to extract and sum the sizes of all files in a directory:
ls -l | awk '{total += $5} END {print total}'
And an example using 'sed' to replace all occurrences of 'foo' with 'bar' in a file:
sed 's/foo/bar/g' filename

Advanced Bash Interview Questions


Q: How can you encrypt sensitive data in your Bash scripts (like API keys)?
A: Sensitive data in your bash scripts can be encrypted using a tool like gpg.
First, you'd encrypt the file:
gpg --encrypt --recipient 'Your Name' secretfile
This creates secretfile.gpg. Then, in your script, you can decrypt the file:
gpg --output secretfile --decrypt secretfile.gpg
The decrypted file is only available while the script is running and can be
deleted afterwards.

Bash Interview Questions


Q: Explain how you would write a bash script to parse JSON or XML data.
A: To parse JSON or XML data in a bash script, you'd generally want to use a
command line utility like jq for JSON or xmlstarlet for XML.
For example, to get the value of a key in a JSON file with jq:
jq -r '.key' file.json
And to get the value of an attribute in an XML file with xmlstarlet:
xmlstarlet sel -t -v "//element/@attribute" file.xml

Bash Interview Questions


Q: If you need to write a Bash script that interfaces with a database, how would you handle
connection parameters like username, password, hostname, etc.?
A: You can use a configuration file to handle connection parameters. For example, you could
create a file db.conf:
USERNAME='user'
PASSWORD='pass'
HOSTNAME='localhost'

Then, in your script, source this file:


#!/bin/bash
. db.conf
echo $USERNAME
echo $PASSWORD
echo $HOSTNAME

Bash Interview Questions


Q: How would you implement error handling and logging in a bash script that
interacts with an external API or service?
A: Implementing error handling and logging in a bash script can be
accomplished using the concept of 'exit status'. Every command returns an exit
status (0 for success, >0 for failure/errors). You can use this in your bash script
to detect a failure and handle the error appropriately. In the case of interacting
with an external API, you might use curl to send a request and then check the
exit status of curl.
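
A minimal sketch of that idea, assuming a hypothetical health endpoint and log path:
#!/bin/bash
LOG_FILE="/var/log/api_check.log"
response=$(curl -sS -o /dev/null -w "%{http_code}" https://api.example.com/health 2>>"$LOG_FILE")
status=$?
if [ "$status" -ne 0 ]; then
    echo "$(date) curl failed with exit status $status" >> "$LOG_FILE"
    exit 1
elif [ "$response" -ge 400 ]; then
    echo "$(date) API returned HTTP $response" >> "$LOG_FILE"
    exit 1
fi
echo "$(date) API healthy (HTTP $response)" >> "$LOG_FILE"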

Bash Interview Questions


Q: What is the purpose of the 'shift' command in a bash script? Can you
provide an example of a script where it is used?
A: The 'shift' command in bash is used to shift positional parameters to the
left. It's often used in scripts that require command-line arguments. Here's an
example:
#!/bin/bash
while [ "$1" != "" ]; do
echo "Parameter: $1"
shift
done
If you run this script with "./script.sh one two three", it will print each
argument on a separate line.

Bash Interview Questions


Q: Can you explain the use of '>/dev/null 2>&1' in bash scripting?
A: The construct '>/dev/null 2>&1' is used in bash to redirect both standard
output (file descriptor 1) and standard error (file descriptor 2) to /dev/null,
effectively silencing all output from the command.
'/dev/null' is a special file that discards all data written to it.

Q: How would you write a Bash script to monitor a log file in real-time and
send alerts based on specific patterns?
A: You can use the 'tail -f' command to monitor a log file in real time, and 'grep'
to match specific patterns. To send alerts, you might send an email or trigger a
notification depending on the system setup.
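
A minimal sketch, assuming a working local mail command and an illustrative log path and pattern:
#!/bin/bash
# follow the log by name, starting from the end, and alert on matching lines
tail -Fn0 /var/log/app.log | while read -r line; do
    if echo "$line" | grep -q "ERROR"; then
        echo "$line" | mail -s "Alert: error in app.log" ops@example.com
    fi
done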

Bash Interview Questions


Q: How would you implement concurrent or parallel processing in a Bash
script?
A: Parallel processing in a bash script can be implemented using background
jobs or utilities like 'xargs' or 'parallel'. Here's an example using xargs:
#!/bin/bash
echo -e 'job1\njob2\njob3' | xargs -n 1 -P 0 ./myscript.sh
This runs myscript.sh with each job as an argument, and the '-P 0' option tells
xargs to run as many processes as possible in parallel.

Q: Have you used 'getopts' or 'shift' for command-line argument processing in your bash scripts?
A: Yes. 'getopts' is convenient for parsing option flags, and 'shift' can be used to walk through
positional arguments.
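
A short getopts sketch (option letters and variable names are illustrative):
#!/bin/bash
while getopts "f:v" opt; do
    case $opt in
        f) FILE="$OPTARG" ;;                       # -f <file>
        v) VERBOSE=1 ;;                            # -v flag
        *) echo "Usage: $0 [-f file] [-v]" >&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))                              # drop parsed options, keep positional args
echo "file=${FILE:-none} verbose=${VERBOSE:-0} remaining: $*"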

Bash Interview Questions


Q: Can you provide an example of how you'd use a 'case' statement in a bash script?
A: A 'case' statement in a bash script is used for pattern matching. Here's an example:
#!/bin/bash
case $1 in
start)
echo "Starting..."
;;
stop)
echo "Stopping..."
;;
restart)
echo "Restarting..."
;;
*)
echo "Invalid command"
;;
esac
This script accepts a command as an argument and prints a message based on the command.

Bash Interview Questions


GIT and GITHUB Interview Questions
• Q: Can you explain the different branching strategies you have used in Git?
• A: Certainly! In my experience, there are several common branching strategies that teams can
adopt based on their needs. These include Git Flow and GitHub Flow among others.
• Git Flow is a highly structured method that works well for projects with scheduled release
cycles. It has two long-lived branches: master for production and develop for integrating
features. Each new feature has its own branch and is merged into develop when it's ready for
integration. For a release, we fork a release branch off of develop. Once the release is ready, it
is merged into master and develop. For hotfixes, we create a hotfix branch directly off of
master.
• GitHub Flow is simpler and ideal for continuous delivery environments. The master branch is
always deployable. For every new feature or bugfix, a new branch is created off master. Once
the changes are ready and tested, a pull request is opened. If it passes code review, the changes
are merged back into master and the feature branch is deleted. After merging, the master
branch is deployed to production.

GIT and GITHUB Interview Questions


• Q: How would you undo the most recent commit in Git?
• A: You can use the command git reset --soft HEAD~1 to undo the most recent
commit, keeping the changes in the staging area, or git reset --hard HEAD~1 to
discard the changes completely.

• Q: How do you deal with a merge conflict in Git?


• A: When a merge conflict happens, Git will indicate the conflicted files. You
need to open those files and look for the conflict markers (<<<<<<<, =======,
>>>>>>>). The changes from the current branch are above the =======, and the
changes from the branch being merged are below. Edit the file to resolve the
conflict, then add the file with git add, and commit it with git commit.

GIT and GITHUB Interview Questions


• Q: Explain the difference between git fetch and git pull.
• A: git fetch only downloads new data from a remote repository, but it doesn't integrate any of
this new data into your working files. On the other hand, git pull updates your current HEAD
branch with the latest changes from the remote server. This means git pull is equivalent to git
fetch followed by git merge.

• Q: Can you describe a typical Git workflow you would use in a collaborative project?
• A: A common workflow is the feature branch workflow. This involves creating a new branch
whenever you want to work on a new feature. The master branch should never contain broken
code, it is always production-ready. All development is done in branches and then merged into
the master branch once the feature is complete and tested.

GIT and GITHUB Interview Questions


• Q: What are the steps to perform a rebase in Git?
• A: A rebase can be performed using the following steps:
– Switch to the branch you want to rebase: git checkout feature-branch
– Start the rebase process: git rebase master
– If there are any conflicts, resolve them. After resolving conflicts, use git add . to stage the
resolved files, then continue the rebase with git rebase --continue.
– If you want to abort the process at any time, use git rebase --abort.

• Q: How do you go about resolving a binary file conflict?


• A: Binary files cannot be merged like text files, so you have to decide which
version to keep. You can do this with git checkout --ours filename or git
checkout --theirs filename and then commit the resolved file.

GIT and GITHUB Interview Questions


Git Rebase

• Git rebase is a command that can be used to integrate changes from one branch into another. It is an alternative to the better
known git merge. Both of these commands are designed to integrate changes from one branch into another.

• Rebase is a bit different because instead of merging the branches, it actually moves or combines the changes via a series of
patches. This makes for a more streamlined, linear project history.

• Scenario for Rebase:

• Let's assume you're working in a DevOps team. You have a development branch where all the development work happens.
Once the development is done, it's merged into the main branch, which is then deployed to production.

• However, multiple developers working on the development branch can cause it to quickly get out of sync with the main branch,
especially if other teams are also merging their changes into main.

• To keep development up-to-date with main, you can use the git rebase command. This replays the commits from development
on top of the latest main, so your work appears as if it was made after all the changes on main. Here's how you can do it:

• # switch to development branch


– $ git checkout development

• # rebase the development branch onto main


– $ git rebase main

REBASE EXAMPLE
• Q: What is the function of git cherry-pick?
• A: git cherry-pick is used to apply the changes from an existing commit to
the current branch. It basically generates a new commit with a different
commit hash.

• Q: Could you explain how to squash commits in Git?


• A: Squashing is the process of combining several commits into a single
one. This is usually done in the context of cleaning up a feature branch
before merging into main. This is typically performed through an
interactive rebase, for example, git rebase -i HEAD~n (where 'n' is the
number of commits to squash from the current HEAD).

GIT and GITHUB Interview Questions


• git cherry-pick is a command in Git that allows you to take a commit from one branch and apply it onto
another branch. It's like saying, "I want that specific change they made in that commit over there on my
branch here."
• Here is a simple breakdown:
• You've worked on a feature in a branch (let's call it "branch A") and made several commits to record your
changes.
• Meanwhile, you also worked on another branch (let's call it "branch B").
• You realize that one of the changes (commits) you made on branch A would be really helpful on branch B.
• Instead of manually recreating that change on branch B, you can use git cherry-pick <commit-hash> to
copy that specific commit from branch A and apply it onto branch B.
• It's important to remember that cherry-pick applies the changes made in specific commits and not the
entire set of changes in a branch. It's like picking just the cherry (commit) you want, instead of taking the
entire sundae (all commits on the branch).

Cherry pick
• "Squashing" in Git is a technique used to clean up your commit history.
• Here's a simple way to understand it:
• Imagine that you're working on a big project and you've made a bunch of changes. Each time you make a
set of related changes, you "commit" them, which is like saving your work in a special way that Git can
keep track of. After a few days, you might have made 10 commits.
• But maybe you realize that all these changes were really just about one thing, like "adding a new feature"
or "fixing a bug". It's a bit messy to have all these separate commits about the same thing. This is where
"squashing" comes in.
• "Squashing" is like taking a stack of papers, each with different parts of a story written on them, and
combining them all onto one page. With squashing in Git, you take all those different commits and
combine them into a single commit. This makes your project's history cleaner and easier to understand.
• In short, squashing in Git is all about tidying up your project's history by combining multiple commits into
one.

Squashing
Git Cherry-Pick
• Git cherry-pick allows a developer to select specific commits from one branch and apply them onto
another branch. It's a way of applying some commits from one branch onto another.
• Scenario for Cherry-Pick:
• Continuing with the DevOps scenario, let's say you have made several commits to your development
branch that fix a critical bug. However, this branch also contains a lot of new features that aren't ready for
production yet.
• Instead of merging all the changes from development into main, you can use git cherry-pick to apply only
the bug-fix commits to main. This allows you to fix the bug in production without deploying untested
features.
• First, you need to find the hash of the bug-fix commits using git log. Once you have the hashes, you can
checkout to the main branch and apply the commits using git cherry-pick:
• # switch to main branch
– $ git checkout main

• # cherry-pick the commit


– $ git cherry-pick commit_hash

CHERRY PICK EXAMPLE


• Q: What's your approach to secure a Git repository?
• A: This could involve measures such as using SSH keys for authentication, regularly
updating and auditing contributor access, implementing commit signing to verify the
authenticity of commits, using .gitignore to prevent committing sensitive data, and
using security tools to automatically scan your repositories for security vulnerabilities
or secrets.

• Q: Can you describe the different types of Git hooks?


• A: Git hooks are scripts that run automatically every time a particular event occurs in a
Git repository. They are used for automation of workflow tasks. Examples of hooks
include pre-commit (runs before commit), post-commit (runs after commit), pre-
receive (runs on the remote side before acknowledging a push), and others.
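For illustration, a minimal client-side pre-commit hook might look like the sketch below. It lives at .git/hooks/pre-commit, must be executable, and the test command is just a placeholder for whatever check your project actually runs.

#!/bin/sh
# Abort the commit if the checks fail (replace "npm test" with your own command).
if ! npm test; then
  echo "Checks failed; aborting commit."
  exit 1
fi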

GIT and GITHUB Interview Questions


• Q: Explain the difference between a Git fork and a Git clone.
• A: A fork is a remote, server-side copy of a repository, distinct from the original.
A clone is a local copy of some remote repository. When you clone, you are
making a copy of the complete repository history, but with a fork, you are
making a server-side copy which can be tracked separately.

• Q: How would you handle large files or large amounts of binary data in a Git
repository?
• A: Git is not designed to handle large files or large volumes of binary data. The
Git LFS (Large File Storage) extension can be used for versioning large files while
keeping them out of the actual Git repository.
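A typical Git LFS setup looks roughly like this (the *.psd pattern and file name are only examples of large binary content):

git lfs install                  # install the LFS hooks once per machine
git lfs track "*.psd"            # track large binaries via LFS
git add .gitattributes design.psd
git commit -m "Track design files with Git LFS"
git push origin main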

GIT and GITHUB Interview Questions


• Q: Explain the use and benefit of a Git submodule.
• A: Submodules allow you to keep another Git repository in a subdirectory
of your repository. This lets you clone another repository into your
project and keep your commits separate.

• Q: How do you enforce code quality standards and reviews in GitHub?


• A: This can be accomplished through a combination of tactics, including
pull request templates, code review procedures, continuous integration
checks, and using GitHub's built-in "protected branches" feature to
require certain checks pass before merging.

GIT and GITHUB Interview Questions


• Q: What are GitHub Actions and how have you used them in a project?
• A: GitHub Actions are a CI/CD and general automation feature of GitHub,
allowing you to run scripts (known as "workflows") in reaction to Git events like
push, pull request, and more. They can be used for a wide variety of purposes,
from testing code to deploying applications.

• Q: What are some strategies you might use to keep the history of a Git
repository clean?
• A: Strategies include interactive rebase to squash or fixup commits, avoiding
unnecessary merge commits with rebasing, using git commit --amend for fixing
recent mistakes, and using descriptive commit messages to clearly articulate
changes.
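For example, a quick sketch of the commands involved:

git commit --amend              # fix the message or contents of the most recent commit
git rebase -i HEAD~3            # interactively squash or fixup the last three commits
git pull --rebase origin main   # update your branch without creating a merge commit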

GIT and GITHUB Interview Questions


• Q: How have you used GitHub or other online Git platforms in a Continuous
Integration / Continuous Delivery (CI/CD) pipeline?
• A: The answer will depend on the individual's experience. They may discuss
using GitHub Actions to run tests and deploy code, setting up Jenkins to use Git
repositories, or integrating with other tools like Travis CI, CircleCI, etc.

• Q: What do you consider best practices for managing branches in Git?


• A: Always create a new branch for new features or fixes, regularly merge the
main branch into your branch to keep it up-to-date, delete branches after their
changes have been merged, and use meaningful branch names.

GIT and GITHUB Interview Questions


• Q: How can one revert a pull request in GitHub?
• A: To revert a pull request in GitHub, you can use the "Revert" button in
the GitHub interface. This creates a new pull request that undoes all
changes made in the original pull request.

• Q: Explain how you would set up a Git repository to run code checks
automatically when someone pushes a change.
• A: This can be done using Git hooks, specifically the pre-receive hook on
the server side, or the pre-push hook on the client side. You can also use
CI/CD pipelines with GitHub Actions, Jenkins, or other tools to run checks
automatically on push.
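As one possible setup, a minimal GitHub Actions workflow stored at .github/workflows/checks.yml could run checks on every push; the npm commands are placeholders for your project's own build and test steps.

name: checks
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test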

GIT and GITHUB Interview Questions


• Q: How do you synchronize a forked GitHub repository with the original
repository?
• A: This can be done by adding the original repository as a remote (git remote
add upstream <repo-url>), fetching the updates (git fetch upstream), and then
merging the updates into your fork (git merge upstream/main).

• Q: What are some advanced features of GitHub that you find particularly useful
and why?
• A: The answer to this will depend on the individual's experience, but may
include features like GitHub Actions, code scanning, security advisories, pull
request reviews, the dependency graph, and others.

GIT and GITHUB Interview Questions


GITLAB CICD INTERVIEW QUESTIONS
• Q: What is GitLab CI/CD and how does it compare with other CI/CD tools like Jenkins?
• A: GitLab CI/CD is a continuous integration / continuous delivery system that comes
out-of-the-box with GitLab. It's built into the GitLab system and uses a simple YAML file
for configuration. Compared to other tools, its seamless integration with GitLab's
source control makes it extremely easy to use and maintain.

• Q: Can you describe a CI/CD pipeline you've set up using GitLab?


• A: A pipeline I've set up involved multiple stages such as building, testing, and
deploying an application. Each stage consisted of different jobs that ran scripts relevant
to the stage. The pipeline was defined in a .gitlab-ci.yml file in the repository.

GITLAB CICD INTERVIEW QUESTIONS


stages:
  - build
  - test

build_job:
  stage: build
  script: echo "Building the app"

test_job:
  stage: test
  script: echo "Testing the app"

CI/CD Pipeline in GitLab:


Q: How does GitLab CI/CD handle parallel execution of jobs in the pipeline?
• A: GitLab CI/CD can run multiple jobs in parallel by defining them in the same stage of the pipeline. The
number of jobs that can run simultaneously depends on the number of available runners.
tests:
  script: rspec
  parallel: 10

Q: What are runners in the context of GitLab CI/CD?


• A: Runners in GitLab CI/CD are agents or servers that execute the jobs in the pipeline. They can be specific
to a project or be shared across several projects. They can run on various types of infrastructure, from
virtual machines to Kubernetes clusters.
Q: How can you configure GitLab CI/CD to only trigger certain jobs under specific conditions?
• A: GitLab CI/CD allows the use of rules or only/except keywords to define when jobs should be run. For
instance, you can set a job to only run when a tag is pushed or on changes to specific files.

GITLAB CICD INTERVIEW QUESTIONS


Triggering Jobs Under Specific Conditions:
You can use GitLab's rules or only/except keywords to specify when to run jobs.
only/except are simpler but less flexible, whereas rules allow more complex conditions.
Here's an example with rules:
deploy:
  script: echo "Deploying..."
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: always
    - when: never

Triggering Jobs Under Specific Conditions:


• Q: How do you handle secure variables or secrets in your GitLab CI/CD configuration?
• A: GitLab provides a feature to define secret variables that are securely passed to the runner, without
exposing them in the logs. These can be defined at the project or group level.

• Q: Can you describe how GitLab CI/CD integrates with Kubernetes for deployments?
• A: GitLab has a Kubernetes integration which makes it easy to deploy applications to a Kubernetes cluster.
It can also connect with a Kubernetes cluster to create a runner.

• Q: Explain the process of setting up test and production environments in GitLab.


• A: GitLab allows you to define environments in the CI/CD configuration file, you can then deploy to these
environments using jobs. Additionally, you can make jobs depend on successful deployment in previous
environments.
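A minimal sketch of an environment definition in .gitlab-ci.yml (the deploy script and URL are placeholders):

deploy_test:
  stage: deploy
  script: ./deploy.sh test
  environment:
    name: test
    url: https://test.example.com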

GITLAB CICD INTERVIEW QUESTIONS


• Q: What is "Auto DevOps" in GitLab and how have you used it, if at all?
• Auto DevOps is a pre-defined CI/CD configuration provided by GitLab. It automatically detects, builds,
tests, deploys, and monitors your applications. You can enable it under "Settings" -> "CI/CD" -> "Auto
DevOps".

• Q: How do you ensure code quality and enforce code testing in your GitLab CI/CD pipelines?
• A: Code quality and testing can be enforced by creating jobs in the pipeline to run code linters, unit tests,
integration tests, etc. Merge requests can be configured to be merged only if the pipeline passes.

• Q: How would you set up a multi-stage deployment process (dev, test, staging, production) in GitLab?
• A: In GitLab, a multi-stage deployment process can be set up using different environments. Each stage
would have its own jobs, and deployments to later stages could be set to manual to control when they're
deployed.
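One way this could look, assuming placeholder deploy scripts; the staging and production jobs are gated with when: manual:

stages:
  - deploy_dev
  - deploy_staging
  - deploy_production

deploy_dev:
  stage: deploy_dev
  script: ./deploy.sh dev
  environment:
    name: dev

deploy_staging:
  stage: deploy_staging
  script: ./deploy.sh staging
  environment:
    name: staging
  when: manual

deploy_production:
  stage: deploy_production
  script: ./deploy.sh production
  environment:
    name: production
  when: manual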

GITLAB CICD INTERVIEW QUESTIONS


GitLab CI/CD lets you define test jobs in your .gitlab-ci.yml. Additionally, it integrates with GitLab's Code Quality feature, which automatically detects code quality issues. You enable it by adding a code_quality job to your pipeline:

code_quality:
  image: docker:stable
  allow_failure: true
  script:
    - export SP_VERSION=$(echo "$CI_SERVER_VERSION" | sed 's/^\([0-9]*\)\.\([0-9]*\).*/\1-\2-stable/')
    - docker run
        --env SOURCE_CODE="$PWD"
        --volume "$PWD":/code
        --volume /var/run/docker.sock:/var/run/docker.sock
        "registry.gitlab.com/gitlab-org/security-products/codequality:$SP_VERSION" /code
  artifacts:
    reports:
      codequality: gl-code-quality-report.json

Ensuring Code Quality and Enforcing Code Testing:


• Q: What strategies would you use for optimizing the performance (e.g., speed, resource usage) of a GitLab
CI/CD pipeline?
• A: Performance optimization strategies include using pipeline caching to speed up jobs, running jobs in
parallel, optimizing the job scripts themselves, and using GitLab's autoscaling feature.

• Q: How do you handle rollbacks in a GitLab CI/CD pipeline if a deployment fails?


• A: Rollbacks can be handled by creating a job that will revert deployments if a later stage fails. GitLab's
environments also keep a history of deployments, so you can easily revert to a previous version manually
if needed.

• Q: How would you configure GitLab CI/CD to build and publish Docker images?
• A: A job can be set up in the pipeline to build a Docker image and then push it to a registry. GitLab's own
container registry can be used, or an external one.

GITLAB CICD INTERVIEW QUESTIONS


You can do this using the docker command in your CI scripts. You will need to log
in to the Docker registry, build your image, and then push it. Here's an example:
build:
  stage: build
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
    - docker build -t "$CI_REGISTRY_IMAGE" .
    - docker push "$CI_REGISTRY_IMAGE"

Building and Publishing Docker Images:


• Q: Have you used GitLab's Review Apps feature? If so, can you describe a use case?

• Review Apps are a feature in GitLab that allows you to create temporary environments to view changes in a merge request. For
instance, if you are working on a web application and have made some changes in a feature branch, you can use Review Apps to
spawn a temporary live instance of your app with your new changes.

review:
  image: ruby:2.5
  script: echo "Deploy to $CI_ENVIRONMENT_SLUG"
  environment:
    name: review/$CI_COMMIT_REF_NAME
    url: http://$CI_ENVIRONMENT_SLUG.example.com
  only:
    - branches
  except:
    - master

In this scenario, you can share the Review App URL with your team for feedback before merging the changes to the main branch. This
is especially useful when multiple people are working on the same project and want to review the changes before they go live.

GITLAB CICD INTERVIEW QUESTIONS


Q: How do you ensure the security of your GitLab CI/CD pipelines?
A: You can ensure the security of your GitLab CI/CD pipelines by:
• Using secret variables to hide sensitive data. You can define these under Settings -> CI/CD -> Variables.
• Limiting the permissions of the GitLab runner.
• Implementing code quality checks and security scanning stages in your pipeline.

Q: How would you approach troubleshooting a pipeline failure in GitLab CI/CD?


• A: Troubleshooting in GitLab CI/CD would typically involve:
– Checking the logs of the failed jobs.
– Debugging the scripts used in the failed jobs.
– Looking into the infrastructure where the pipeline is running, if necessary.
– You might also consider running the job in a local environment to reproduce the issue.
– CI Lint, accessible from CI/CD -> Pipeline Editor, can be used to check your .gitlab-ci.yml for syntax errors.

GITLAB CICD INTERVIEW QUESTIONS


• Q: How have you used GitLab's CI/CD pipelines to support a microservices architecture?
• A: For a microservices architecture, separate pipelines can be set up for each service. GitLab's pipeline
dependencies can be used to trigger other pipelines, such as integration tests, once a service is
successfully built and tested.

• Q: What experience do you have with Infrastructure as Code (IaC) in the context of GitLab CI/CD?
• A: Infrastructure as Code (IaC) can be used with GitLab CI/CD to manage the infrastructure needed for the
application. This can involve using tools like Terraform or Ansible in a job to set up or update
infrastructure.

• Q: Can you discuss a complex problem you solved with GitLab CI/CD?
• A: A complex problem I faced was optimizing a large, slow pipeline. This involved restructuring the
pipeline to run jobs in parallel where possible, optimizing the job scripts, using pipeline caching, and
setting up autoscaling runners.

GITLAB CICD INTERVIEW QUESTIONS


• Q: You've just set up a new pipeline in GitLab, but the pipeline fails immediately on start with no output.
What might be the issue, and how would you investigate it?
• A. This could be due to a few reasons, but the most common one is a syntax error or issue in your .gitlab-
ci.yml file. You can use GitLab's CI Lint tool (under CI/CD -> Pipeline Editor -> CI Lint) to validate the syntax
of your CI configuration. Also, ensure your pipeline meets any conditions or rules defined in the .gitlab-
ci.yml.
• Q: A job in your GitLab pipeline is failing intermittently with a network error when trying to access an
external service. How would you go about debugging this?
• A: Intermittent network errors could be due to a variety of issues. I would start by checking the status and
logs of the external service for any issues, then look at the network configuration of the GitLab runner. If
possible, I would also add error handling and retry logic to the job script to handle network issues
gracefully.
• Q: Your GitLab pipeline is running much slower than expected. How would you diagnose and address the
issue?
• A: If a pipeline is running slower than expected, I would first look at the performance of the individual
jobs. It could be that one or more jobs are taking longer than they should. I would also check the resource
usage of the runners and consider adding more runners or using parallel execution if necessary.
Additionally, caching and other optimization strategies could be looked into.
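For instance, dependency caching can be enabled with the cache keyword (node_modules is just an example path):

cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - node_modules/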

GITLAB CICD INTERVIEW QUESTIONS


• Q: A job in your pipeline that deploys to a Kubernetes cluster is failing, but the same commands work when run manually. What
might be the issue?

• A: If a deployment to Kubernetes is failing in the pipeline but not when run manually, it could be a problem with the
environment or credentials in the pipeline. I would check that the KUBECONFIG is set correctly and that the runner has the
necessary permissions. I would also look at the Kubernetes logs for any errors.

• Q: Your GitLab pipeline uses a caching mechanism to speed up builds, but the cache doesn't seem to be working. How would
you investigate this?

• A: If caching isn't working in a pipeline, I would first verify the cache configuration in the .gitlab-ci.yml file. If that looks correct, I
would check the runner logs for any cache-related errors. It could also be that the cache is being invalidated too often, in which
case I would adjust the cache key or paths.

• Q: You've just added a new runner to your GitLab instance, but jobs aren't being picked up by it. What could be going wrong?

• A: If a new runner isn't picking up jobs, I would check the runner's configuration and status in GitLab, and make sure it's not
paused and is assigned to the correct projects. If everything looks correct, I would check the runner logs for any errors or
warnings.

GITLAB CICD INTERVIEW QUESTIONS


• Q: A job that uses secret variables is failing because it's not receiving the correct values. What would be your steps to
troubleshoot this?

• A: If a job isn't receiving the correct secret variable values, I would first check the variable configuration in GitLab, making sure
they're defined at the correct level (project or group) and are protected or masked as necessary. I would also check the job
script to make sure it's accessing the variables correctly.

• Q: A specific job in your pipeline is failing only when run in the pipeline, not when the commands are run manually. How would
you approach this issue?

• A: If a job is failing in the pipeline but not manually, it's likely a problem with the pipeline environment. I would compare the
environment in the pipeline and the manual environment, looking at things like environment variables, the working directory,
and installed packages.

• Q: Your GitLab pipeline is supposed to trigger another pipeline upon completion, but the second pipeline isn't running. How
would you investigate this?

• A: If a pipeline isn't triggering another pipeline as expected, I would first check the configuration of the trigger in the .gitlab-
ci.yml file. If that looks correct, I would check the logs of the first pipeline for any errors related to the trigger, and also check
the settings of the second pipeline to ensure it's set up to be triggered.

GITLAB CICD INTERVIEW QUESTIONS


• Q: You're having trouble with jobs not starting correctly on a self-hosted GitLab runner. How would you go
about diagnosing and fixing the problem?
• A: If jobs aren't starting correctly on a self-hosted runner, I would first check the status and configuration
of the runner in GitLab. If everything looks correct there, I would check the logs of the runner for any
errors. It could also be an issue with the runner's environment, in which case I would verify the
installation and configuration of the runner on the host machine.
• Q: Explain how GitLab CI/CD integrates with Kubernetes.
• A: GitLab CI/CD can be integrated with Kubernetes in various ways. For instance, you can set up a GitLab
runner inside a Kubernetes cluster, deploy applications to Kubernetes directly from a GitLab pipeline, and
even manage Kubernetes resources using GitLab Managed Apps.
• Q: Can you explain what Auto DevOps is in GitLab? How does it work?
• A: Auto DevOps is a feature in GitLab that automatically configures your CI/CD pipeline with best
practices. It includes stages like build, test, code quality, security scanning, and deploy, among others. It
works by detecting the language and framework used in your project and automatically configuring the
appropriate build, test, and deployment scripts.

GITLAB CICD INTERVIEW QUESTIONS


• Q: How can you set up a multi-project pipeline in GitLab CI/CD?
• A: Multi-project pipelines in GitLab CI/CD can be set up using the trigger
keyword in the .gitlab-ci.yml file. This allows a pipeline in one project to
trigger a pipeline in another project.

• Q: How does GitLab CI/CD support cross-project pipelines?


• A: GitLab supports cross-project pipelines using the trigger keyword. This
allows a pipeline in one project to trigger a pipeline in another project,
with support for passing variables between the pipelines.
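A sketch of a cross-project trigger; the project path, branch, and variable are placeholders:

deploy_downstream:
  stage: deploy
  variables:
    ENVIRONMENT: staging          # passed to the downstream pipeline
  trigger:
    project: my-group/deployment-project
    branch: main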

GITLAB CICD INTERVIEW QUESTIONS


• Q: How would you implement a blue-green deployment using
GitLab CI/CD?
• A: Blue-green deployment in GitLab CI/CD can be achieved by
having two identical production environments: Blue and Green. The
pipeline deploys the new version of the app to the inactive
environment, and if the deployment and tests are successful, the
traffic is switched to this environment.

GITLAB CICD INTERVIEW QUESTIONS


• Q: Can you describe a complex GitLab CI/CD pipeline you've set up and how you handled challenges?
• A: Answers will vary but could include descriptions of multi-stage pipelines, pipelines with complex
dependencies, pipelines integrating with multiple external systems, etc. Overcoming challenges could
involve debugging pipeline failures, optimizing performance, or managing complexity.

• Q: What is directed acyclic graph (DAG) in the context of GitLab CI/CD?


• A: In GitLab CI/CD, a directed acyclic graph (DAG) refers to the structure of the pipeline. Each job in the
pipeline represents a node in the graph, and the dependencies between jobs represent edges. This
structure ensures that jobs are run in the correct order based on their dependencies, and allows for
parallel execution of independent jobs.

• Q: Explain the difference between a bridge job and a parent-child pipeline in GitLab CI/CD.
• A: A bridge job is used to trigger a downstream pipeline in another project or the same project, while a
parent-child pipeline refers to a pipeline structure where the parent pipeline triggers one or more child
pipelines within the same project. Child pipelines help in dividing complex configuration into multiple
simpler configuration files.
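A minimal parent-child sketch, assuming a child configuration file exists at ci/child-pipeline.yml in the same project:

trigger_child:
  trigger:
    include: ci/child-pipeline.yml
    strategy: depend              # the parent job mirrors the child pipeline's status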

GITLAB CICD INTERVIEW QUESTIONS


• Q: How would you ensure security in your GitLab CI/CD pipelines?
• A: Security in GitLab CI/CD pipelines can be ensured by using secret variables for sensitive data,
limiting access to pipelines, using protected branches and environments, enabling security
scanning features like SAST, DAST, Container scanning, and Dependency scanning.
• Q: How would you handle rollbacks in GitLab CI/CD?
• A: Rollbacks in GitLab CI/CD can be handled by using the environments feature and keeping
track of deployments. If a deployment needs to be rolled back, you can redeploy the previous
successful deployment to the environment.
• Q: How does GitLab CI/CD handle artifacts?
• A: GitLab CI/CD handles artifacts using the artifacts keyword in the .gitlab-ci.yml file. Artifacts
are files created by jobs that can be passed to subsequent jobs or stored for later use.
• Q: How do you use GitLab CI/CD variables?
• A: GitLab CI/CD variables can be defined in several ways: pre-defined variables, custom
variables, secure variables, etc. They can be used in the .gitlab-ci.yml file or the job scripts.
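A short sketch of variable usage; DEPLOY_ENV is a custom variable defined in the file, while API_TOKEN and the deploy script are assumptions (the token would be a masked variable defined under Settings -> CI/CD -> Variables):

variables:
  DEPLOY_ENV: "staging"

deploy:
  stage: deploy
  script:
    - echo "Deploying to $DEPLOY_ENV"
    - ./deploy.sh --token "$API_TOKEN"   # API_TOKEN comes from the project's CI/CD variables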

GITLAB CICD INTERVIEW QUESTIONS


• Q: How can you include external scripts or CI/CD configuration in your GitLab pipeline?
• A: External scripts or CI/CD configurations can be included in a GitLab pipeline using the include keyword
in the .gitlab-ci.yml file. The external resource can be a file in the same project, a file in a different project,
or a remote file.

• Q: How can you run manual jobs in GitLab CI/CD?


• A: Manual jobs in GitLab CI/CD can be set up using the when: manual keyword in the job definition in the
.gitlab-ci.yml file. Manual jobs are not run automatically but instead need to be manually started from the
GitLab UI.

• Q: How does GitLab CI/CD handle concurrent jobs and pipelines?


• A: GitLab CI/CD handles concurrent jobs by running them in separate runners. The number of concurrent
pipelines for a project can be limited in the project's CI/CD settings.

GITLAB CICD INTERVIEW QUESTIONS


• Q: Explain the concept of pipeline stages in GitLab CI/CD.
• A: In GitLab CI/CD, pipeline stages are used to group jobs. Jobs in the same stage are run in parallel, while
stages are run in order. This allows for a clear structure and order of execution in the pipeline.
• Q: How would you handle a situation where your GitLab CI/CD pipeline becomes too complex or slow?
• A: A complex or slow GitLab CI/CD pipeline can be optimized by breaking it down into multiple stages or
jobs, using parallel execution, caching, and optimizing job scripts. Additionally, child pipelines can be used
to separate parts of the pipeline into different configuration files.
• Q: How can GitLab CI/CD be used for Infrastructure as Code (IaC)?
• A: GitLab CI/CD can be used for Infrastructure as Code by running jobs that apply or update infrastructure
configurations. This could involve tools like Terraform, Ansible, or Kubernetes configuration files.
• Q: How would you handle sensitive data in GitLab CI/CD pipelines?
• A: Sensitive data in GitLab CI/CD pipelines should be handled using secret variables, which can be defined
at the project or group level. Secret variables are not revealed in logs or exposed to merge requests from
forks.

GITLAB CICD INTERVIEW QUESTIONS


• Q: What are GitLab CI/CD templates and how can they be used?
• A: GitLab CI/CD templates are predefined pipeline configuration files
provided by GitLab for various languages and frameworks. You can
use these templates as a starting point for your own pipeline
configuration.

GITLAB CICD INTERVIEW QUESTIONS


• Q: How can you ensure reliability and high availability of your GitLab CI/CD runners?
• A: Reliability and high availability of GitLab CI/CD runners can be achieved by running multiple runners,
monitoring runner status and performance, using autoscaling, and distributing runners across different
regions or zones.

• Q: How would you implement zero-downtime deployments using GitLab CI/CD?


• A: Zero-downtime deployments in GitLab CI/CD can be achieved using techniques like rolling updates,
blue-green deployments, or canary deployments. The specific implementation depends on the
deployment platform and the application's architecture.

• Q: How can you use GitLab CI/CD to automate database migrations?


• A: Database migrations can be automated in GitLab CI/CD by creating a job that runs the necessary
migration commands. This job can be run manually or as part of the deployment stage.
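A sketch of such a job; the migration command is a placeholder for whatever tool you use (Flyway, Alembic, rails db:migrate, and so on):

migrate_database:
  stage: deploy
  script:
    - ./run-migrations.sh        # placeholder for your migration command
  when: manual
  only:
    - main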

GITLAB CICD INTERVIEW QUESTIONS


• Q: How does GitLab CI/CD handle job artifacts from previous pipeline runs?
• A: GitLab CI/CD handles job artifacts from previous pipeline runs using the
artifacts keyword. You can configure how long artifacts are stored, and artifacts
can be passed between jobs and stages.

• Q: How can you integrate GitLab CI/CD with external systems like Jira, Slack, or
Docker registry?
• A: GitLab CI/CD can be integrated with external systems using webhooks, API
calls, or specific integration features. For instance, you can send notifications to
Slack using the Slack notifier, update Jira issues using the Jira integration, or
push images to a Docker registry as part of a job.

GITLAB CICD INTERVIEW QUESTIONS


ADVANCED SCENARIO BASED QUESTIONS

GITLAB CICD INTERVIEW QUESTIONS


• 1. Scenario: A job is failing with unclear error messages
• Solution: Each time a job runs, it generates logs that you can view in the GitLab UI. These logs contain
important debugging information. In the event of a failure, these logs often contain error messages that
can help identify the issue. You can navigate to the specific job in the "CI/CD" -> "Pipelines" section of
your project in GitLab to view the logs.
• 2. Scenario: A job takes a long time to run
• Solution: If a job takes longer than expected, it's possible that the job is not optimized. In this case, look
into the tasks that the job performs and see if there are ways to improve efficiency. You could look into
things like caching dependencies or artifacts, optimizing your code or scripts, or leveraging GitLab's
parallel job execution features.
• 3. Scenario: A job that used to pass is now failing, but no changes were made to the job
• Solution: If a job begins to fail unexpectedly, it might be due to external factors. If your job depends on
external resources (like a third-party API, a specific version of a dependency, or an external server),
changes or issues with these resources could cause the job to fail. Investigate these dependencies to
ensure they're functioning as expected.

GITLAB CICD “Jobs” INTERVIEW QUESTIONS


• 4. Scenario: A job is failing because it's not receiving the correct environment variables
• Solution: Environment variables can be set in the "Settings" -> "CI/CD" -> "Variables" section of your
project in GitLab. If your job depends on certain variables that aren't being received correctly, verify that
these variables are set and that they're being called correctly in your job.

• 5. Scenario: A job isn't being run, even though the pipeline is being executed
• Solution: This might happen if there are rules or only/except conditions in your .gitlab-ci.yml that aren't
being met. Check your job's conditions to make sure they're configured as expected.

• 6. Scenario: A job that is expected to run in parallel with others is not doing so
• Solution: Parallel jobs in GitLab are configured using the parallel keyword in your .gitlab-ci.yml. If your job
isn't running in parallel as expected, verify that this keyword is present and correctly configured for the
job.

GITLAB CICD “Jobs” INTERVIEW QUESTIONS


• Scenario : A stage is failing due to a job failure.
• In this case, you need to inspect the logs of the failed job to
determine why it failed. You can do this in the GitLab UI by going to
CI/CD -> Pipelines, clicking on the pipeline with the failed stage, and
then clicking on the failed job. This will bring up the logs for the job,
which should help identify why the job failed.
• Once the issue is fixed, you can rerun the pipeline. If the job is
successful, the stage will also succeed.

GITLAB CICD “stage” INTERVIEW QUESTIONS


• The needs keyword in GitLab CI/CD configuration is used to
implement Directed Acyclic Graph (DAG) pipelines. In simpler terms,
it allows jobs to begin as soon as their dependencies are finished,
rather than waiting for all jobs from prior stages to finish. This can
significantly speed up your pipelines.

“Needs”
stages:
  - build
  - test
  - deploy

build_job:
  stage: build
  script: echo "Building the app"

test_job:
  stage: test
  script: echo "Testing the app"
  needs: ["build_job"]

deploy_job:
  stage: deploy
  script: echo "Deploying the app"
  needs: ["test_job"]
• In GitLab, only and except are two keywords used to control when jobs are
created. They define arrays of references for which jobs should be created.
• The only keyword in a GitLab CI/CD configuration is used to specify that a job
should only run under certain conditions or events. These conditions can be
based on branch names, tags, changes in certain files, etc.
• Here is an example of only usage:
test:
  script: npm run test
  only:
    - master

Only and Except


• Scenario 1: Job is not running when expected.
• This could happen if the conditions specified in only are not met. For example, if you have a job
with only: master, and you're pushing to a feature branch, the job will not run. You should
adjust the only conditions to match the situations when you want the job to run.

• Scenario 2: Job is running when it shouldn't be.


• This could occur if the only conditions are too broad. For instance, if you have a job with only:
branches and you don't want it to run on the master branch, it will still run because master is a
branch. In this case, you could add an except: master to the job definition.
• In newer GitLab versions, only/except are being superseded by the rules keyword, which provides more
flexibility. However, only/except are still supported for backward compatibility.

Only and Except


• GitLab's workflow: rules directive allows for flexible control over whether a pipeline should be created at
all. This is particularly useful when you want to avoid creating a pipeline under certain conditions, such as
changes to specific files, particular variable values, or the nature of the pipeline trigger.
• For instance, consider this .gitlab-ci.yml configuration:
workflow:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
      when: always
    - if: '$CI_COMMIT_REF_NAME == "master"'
      when: always
    - when: never
In the above example, a pipeline is created if it's triggered by a scheduled pipeline or if the branch is
"master". In all other cases (when: never), the pipeline is not created.

workflow: rules
• Scenario 1: Unnecessary pipelines are created.
• Consider a situation where you don't want a pipeline to be created when changes are only
made in the README file. This might be resolved by using workflow: rules as follows:
workflow:
  rules:
    - changes:
        - "README.md"
      when: never
    - when: always
The above configuration ensures that a pipeline is not created when the only changes are in the
README.md file. In all other cases, the pipeline will be created.
• Scenario 2: Pipelines are not created when needed.
• Suppose you have a rule that only creates a pipeline when it's a scheduled pipeline. But later, you need a
pipeline to run on the master branch whenever there are any new commits. In this case, you might need
to update the workflow: rules:
workflow:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
      when: always
    - if: '$CI_COMMIT_REF_NAME == "master"'
      when: always
    - when: never
The updated configuration now also creates a pipeline for any new commits to the master branch.
When using workflow: rules, make sure the conditions are correctly defined for your use case.
Misconfiguration could lead to pipelines not being created when necessary or being created when they're
not needed, as illustrated by the above scenarios.
• extends is a keyword used in GitLab CI/CD configuration to allow one job to inherit the
parameters of another. This reduces repetition and allows for cleaner, more manageable CI/CD
configurations.
• Here's a simple example:
.default_script:
  script:
    - echo "Hello, World!"

job1:
  extends: .default_script

job2:
  extends: .default_script
  script:
    - echo "This is job2"
In this example, job1 and job2 both extend .default_script, so they inherit the script parameter
from it. job1 doesn't override script, so it simply runs echo "Hello, World!". job2 does override
script, so it runs echo "This is job2".
• Scenario 1: DRY (Don't Repeat Yourself) principle for CI/CD scripts
• In this scenario, extends can be used to avoid repeating common scripts in
different jobs. This can make your .gitlab-ci.yml configuration file cleaner and
easier to maintain.

• Potential Issue: Misunderstanding of how extends works


• A common mistake when using extends is assuming that parameters from the
base job will be merged with those in the extending job. In reality, parameters
from the base job are overwritten by those in the extending job if they exist in
both. In the above example, job2 doesn't run echo "Hello, World!" because the
script parameter is overwritten.

extends
• Q: How would you set up a GitLab Runner to have access to a private Docker registry?
• A: To allow the GitLab Runner to access a private Docker registry, you'd need to add a
Docker configuration file (.docker/config.json) with the authentication information to
the GitLab Runner's config directory. You could also supply Docker credentials as
environment variables in the runner's configuration.

• Q: Your GitLab Runner is consuming too much memory while running jobs. What
strategies can you employ to limit its resource usage?
• A: You can use Docker's memory limit options when registering the runner. For
example, you might use the --docker-memory option to limit the maximum amount of
memory that the Docker container can use. Alternatively, you might consider running
fewer concurrent jobs on the runner.

GITLAB CICD INTERVIEW QUESTIONS


• Q: How would you handle the situation where you have many projects but want
to avoid sharing the runners between all of them?
• A: You can achieve this by setting up specific runners for each project. During
the runner registration process, you'd provide the specific project's URL and
token. These runners are then only used by the project they're assigned to.

• Q: Your CI/CD pipeline requires a significant amount of compute resources and


is slowing down other jobs on the runner. How can you solve this problem?
• A: Consider setting up dedicated GitLab Runners for these resource-intensive
jobs. You can tag these runners and in the .gitlab-ci.yml file, specify these tags
for the relevant jobs. This ensures that the heavy jobs will only run on the
dedicated runners.
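For example, after registering the dedicated runner with a tag such as heavy (the tag name and script are placeholders), the resource-intensive job selects it like this:

load_tests:
  stage: test
  tags:
    - heavy
  script:
    - ./run-load-tests.sh        # placeholder for the resource-intensive work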

GITLAB CICD INTERVIEW QUESTIONS


• Q: You're using a shared runner provided by GitLab, but your CI/CD jobs often queue for a long
time. How can you improve this situation?
• A: To alleviate this issue, you could consider setting up your own GitLab Runner. This can be a
specific runner for your project or a group runner shared between projects of your group. By
managing your own runner, you have more control over its resources and availability.

• Q: You need to set up a GitLab Runner on a machine that will be disconnected from the internet
for security reasons. How would you ensure that the runner can still execute pipeline jobs?
• A: You can set up a GitLab Runner in an offline (air-gapped) environment by pre-loading the
necessary Docker images onto the machine. You'll also need to set up a local GitLab instance or
a mirror of your remote repository, which the runner can access.

GITLAB CICD INTERVIEW QUESTIONS


• Q: Your GitLab CI/CD pipeline builds a Docker image. However, you don't want to include the
Dockerfile and related files in the repository for security reasons. How can you handle this
situation?
• A: You could store the Dockerfile and related files in a secure private repository or an artifact
repository. In your GitLab CI/CD pipeline, you can fetch these files using secure access tokens
before running the docker build command.

• Q: You're working with a monorepo that contains multiple projects, each with its own
Dockerfile. How would you set up your GitLab CI/CD pipeline to only build Docker images for
the projects that have changed in a given commit?
• A: In the .gitlab-ci.yml file, you can use the changes keyword in conjunction with rules to specify
that a job should only run when certain files change. You'd set up a separate job for each
Dockerfile, and use changes to trigger that job only when the corresponding Dockerfile or the
related source files change.
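A sketch for one service in the monorepo (the service-a paths are placeholders for your layout):

build_service_a:
  stage: build
  script:
    - docker build -t "$CI_REGISTRY_IMAGE/service-a" service-a/
  rules:
    - changes:
        - service-a/**/*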

GITLAB CICD INTERVIEW QUESTIONS


• Q: You want to use Docker layer caching in your GitLab CI/CD pipeline to speed up
Docker image builds. How can you set this up?
• A: Docker layer caching can be set up using the cache keyword in your .gitlab-ci.yml
file. You would cache the directory where Docker stores its layers. However, this only
works when using the Docker executor with the GitLab Runner.

• Q: You're running GitLab Runner on a Kubernetes cluster and want to use the Docker-
in-Docker (DinD) service for your CI/CD jobs. However, you're concerned about the
security implications. How can you run Docker commands securely in this setup?
• A: Using Docker-in-Docker in a Kubernetes cluster requires running containers with
privileged security context, which is generally not recommended. As an alternative,
you can use Kaniko, a tool designed to build Docker images from a Dockerfile, inside an
unprivileged container in a Kubernetes cluster.
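A commonly used Kaniko job looks roughly like this; paths, tags, and the registry auth step may need adjusting for your environment:

build_image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context "$CI_PROJECT_DIR" --dockerfile "$CI_PROJECT_DIR/Dockerfile" --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"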

GITLAB CICD INTERVIEW QUESTIONS


• Q: You're using GitLab's Container Registry to store Docker images built by your CI/CD pipeline.
However, old and unused images are taking up a lot of space. How can you clean up these
images?
• A: GitLab provides an API for deleting Docker images in its Container Registry. You could create
a scheduled pipeline that uses this API to delete old and unused images. Make sure to
implement some safeguards to prevent deletion of images that are still in use.

• Q: Your GitLab CI/CD pipeline builds a Docker image and pushes it to GitLab's Container
Registry. How can you automate the deployment of this image to a Kubernetes cluster?
• A: You can set up a deployment job in your GitLab CI/CD pipeline that uses kubectl to apply a
Kubernetes manifest file. This manifest file would reference the Docker image from GitLab's
Container Registry. You'd need to update the image tag in the manifest file to match the CI
pipeline ID or another unique identifier for each pipeline run.

GITLAB CICD INTERVIEW QUESTIONS


• To automate the deployment of a Docker image from GitLab's Container Registry to a
Kubernetes cluster, you can make use of GitLab's Kubernetes integration and GitLab CI/CD.

• The workflow can be summarized as follows:

• Push code to your GitLab repository.


• GitLab CI/CD pipeline builds the Docker image.
• The Docker image is pushed to GitLab's Container Registry.
• The GitLab CI/CD pipeline deploys the image from the registry to the Kubernetes cluster.
• Here is an example of a .gitlab-ci.yml configuration file for this:

GITLAB CICD INTERVIEW QUESTIONS


stages:
  - build
  - deploy

variables:
  KUBE_NAMESPACE: <namespace>
  CI_REGISTRY_IMAGE: <registry>/<group>/<project>
  CI_APPLICATION_TAG: $CI_COMMIT_SHORT_SHA

GITLAB CICD INTERVIEW QUESTIONS


# Build stage, builds the Docker image and pushes it to GitLab's Container Registry
build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:$CI_APPLICATION_TAG .
    - docker push $CI_REGISTRY_IMAGE:$CI_APPLICATION_TAG

GITLAB CICD INTERVIEW QUESTIONS


# Deploy stage, deploys the Docker image to the Kubernetes cluster
deploy:
  stage: deploy
  image:
    name: alpine:latest
  script:
    - apk add --update ca-certificates openssl curl
    - curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
    - chmod +x ./kubectl
    - mv ./kubectl /usr/local/bin/kubectl
    - kubectl config set-cluster c1 --server=$KUBE_URL --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - kubectl config set-credentials gitlab --token=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    - kubectl config set-context c1 --cluster=c1 --user=gitlab --namespace=$KUBE_NAMESPACE
    - kubectl config use-context c1
    - kubectl set image deployment/<deployment-name> <container-name>=$CI_REGISTRY_IMAGE:$CI_APPLICATION_TAG -n $KUBE_NAMESPACE
  environment:
    name: production
    url: https://your-app-url
  only:
    - master
• build: Builds a Docker image and pushes it to GitLab's Container
Registry. The CI_REGISTRY_IMAGE variable represents the path to
the registry, and the CI_APPLICATION_TAG variable represents the
Git commit SHA which is used as a tag for the image.

• deploy: Deploys the Docker image to a Kubernetes cluster. The


kubectl command-line tool is downloaded and set up to interact
with the cluster. It updates the image of the Kubernetes
Deployment with the new Docker image.

GITLAB CICD Interview Questions


"An Integrated Terraform and GitLab CI/CD Pipeline for Deploying an
Azure Kubernetes Service (AKS) Cluster"

GITLAB RealWorld Example - Addon


• In the GitLab repository, let's define a CI/CD template and the main pipeline:
.gitlab/cicd-template.yml:

GITLAB RealWorld Example - Addon


# Template for Terraform jobs
.terraform:
  image:
    name: hashicorp/terraform:latest
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'

# Template for validate job
.validate_job:
  extends: .terraform
  script:
    - terraform init -backend=false
    - terraform validate

GITLAB RealWorld Example - Addon


# Template for plan job
.plan_job:
  extends: .terraform
  script:
    - terraform init
    - terraform plan -out "planfile"

# Template for apply job
.apply_job:
  extends: .terraform
  script:
    - terraform apply -auto-approve "planfile"

GITLAB RealWorld Example - Addon


.gitlab-ci.yml:

# Include CI/CD templates
include:
  - local: '.gitlab/cicd-template.yml'

# Azure and Terraform variables
variables:
  TF_ROOT: ${CI_PROJECT_DIR}
  ARM_SUBSCRIPTION_ID: <Azure Subscription ID>
  ARM_TENANT_ID: <Azure Tenant ID>
  ARM_CLIENT_ID: <Azure Client ID>
  ARM_CLIENT_SECRET: <Azure Client Secret>

GITLAB RealWorld Example - Addon


# Pipeline stages
stages:
  - validate
  - plan
  - apply

# Before script (the hashicorp/terraform image is Alpine-based, so use apk rather than apt-get)
before_script:
  - apk add --update git
  - git clone https://<username>:<password>@bitbucket.org/<username>/terraform-repo.git
  - cd terraform-repo

GITLAB RealWorld Example - Addon


# Define jobs
validate:
  stage: validate
  extends: .validate_job
  except:
    - schedules
    - web

plan:
  stage: plan
  extends: .plan_job
  only:
    refs:
      - branches
    changes:
      - "**/*.tf"

GITLAB RealWorld Example - Addon


apply:
  stage: apply
  extends: .apply_job
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      changes:
        - "**/*.tf"

# After script
after_script:
  - cd ..
  - rm -rf terraform-repo

GITLAB RealWorld Example - Addon


• .terraform: This is the base template for all the Terraform jobs. It sets the Docker image to the latest
Terraform version and sets the entrypoint, which is required for the Terraform Docker image.
• .validate_job, .plan_job, and .apply_job: These are templates for the individual jobs. Each template
extends the .terraform template and defines a script to run the corresponding Terraform command.
• variables: These are environment variables that hold the Azure credentials needed for Terraform to create
resources in Azure.
• before_script: This script runs before all jobs. It installs Git, clones the Terraform repository from
Bitbucket, and changes the working directory to the repository.
• validate, plan, and apply: These are the actual jobs in the pipeline. Each job extends the corresponding
job template and is set to run at a specific stage. The only, except, and rules keys control when each
job should run.
• after_script: This script runs after all jobs. It changes the working directory back to the original location
and removes the cloned Terraform repository to clean up.

GITLAB RealWorld Example - Addon


TERRAFORM

GITLAB RealWorld Example - Addon


terraform-repo
├── modules
│   └── aks
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
└── environments
    └── prod
        ├── main.tf
        ├── variables.tf
        └── terraform.tfvars

GITLAB RealWorld Example - Addon


• modules/aks
• main.tf

GITLAB RealWorld Example - Addon


resource "azurerm_resource_group" "rg" {

name = var.resource_group_name

location = var.location

resource "azurerm_kubernetes_cluster" "aks" {

name = var.cluster_name

location = azurerm_resource_group.rg.location

resource_group_name = azurerm_resource_group.rg.name

dns_prefix = var.dns_prefix

default_node_pool {

name = var.default_node_pool_name

node_count = var.node_count

vm_size = var.vm_size

identity {

type = "SystemAssigned"

}
• variables.tf
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "cluster_name" { type = string }
variable "dns_prefix" { type = string }
variable "default_node_pool_name" { type = string }
variable "node_count" { type = number }
variable "vm_size" { type = string }

GITLAB RealWorld Example - Addon


• outputs.tf
output "kube_config" {
value = azurerm_kubernetes_cluster.aks.kube_config_raw
}

GITLAB RealWorld Example - Addon


• environments/prod
• main.tf

GITLAB RealWorld Example - Addon


provider "azurerm" {

features {}

module "aks" {

source = "../../modules/aks"

resource_group_name = var.resource_group_name

location = var.location

cluster_name = var.cluster_name

dns_prefix = var.dns_prefix

default_node_pool_name = var.default_node_pool_name

node_count = var.node_count

vm_size = var.vm_size

GITLAB RealWorld Example - Addon


• variables.tf
variable "resource_group_name" { }
variable "location" { }
variable "cluster_name" { }
variable "dns_prefix" { }
variable "default_node_pool_name" { }
variable "node_count" { }
variable "vm_size" { }

GITLAB RealWorld Example - Addon


• terraform.tfvars
resource_group_name = "<resource group name>"
location = "<location>"
cluster_name = "<cluster name>"
dns_prefix = "<dns prefix>"
default_node_pool_name = "<default node pool name>"
node_count = <node count>
vm_size = "<vm size>"

GITLAB RealWorld Example - Addon


Azure Cloud Interview Questions
• Imagine we have an Azure Web App that is experiencing performance issues during peak traffic
times. What steps would you take to diagnose and address these issues?
To address performance issues with an Azure Web App, I would first use Azure Monitor and
Application Insights to diagnose the problem. These tools provide detailed performance and
diagnostic information that can help identify bottlenecks. Depending on the diagnosis, solutions
could include scaling up the app service plan, implementing auto-scaling, or optimizing the
application code.

• You are tasked with migrating a large legacy application from on-premises to Azure. How would
you approach this process? What factors would you consider, and what potential challenges do
you anticipate?
Migrating a large legacy application to Azure would be a multi-step process. I would start by
evaluating the application's architecture and dependencies. Azure Migrate is a good tool for this.
Then I would consider whether the application could be rehosted (lift-and-shift), refactored
(modified for cloud compatibility), or even fully redesigned for the cloud. The main challenges
usually lie in managing dependencies, data migration, and ensuring minimal downtime during the
migration process.

Azure Cloud Interview Questions


• Our organization is interested in implementing Azure's Infrastructure as Code (IaC) for resource
deployment. Could you please explain how you would implement this using Azure Resource
Manager (ARM) templates?
Azure Resource Manager (ARM) templates allow us to define Infrastructure as Code (IaC) for
Azure. By using JSON syntax, we can specify the infrastructure's configuration, saving it in source
control for versioning and repeatability. With IaC, we can ensure consistent setups, minimize
human error, and streamline the deployment process.

• Suppose we're implementing a microservices architecture in Azure. Which services would you
recommend we use, and why?
For a microservices architecture in Azure, I would recommend Azure Kubernetes Service (AKS) for
orchestrating containerized applications, Azure Functions for serverless workloads, and Azure
Logic Apps for workflows. These services offer scalability, flexibility, and simplify the management
of microservices.

Azure Cloud Interview Questions


• We have several Azure Virtual Machines that are underutilized outside of business
hours. What strategies could you implement to reduce costs?
Azure offers various ways to manage costs with VMs. You could use Azure DevTest Labs
to automatically shut down VMs during off-hours. Alternatively, you could resize the VMs
to a smaller size during non-peak hours, or even deallocate them entirely when not in
use.

• Let's say we've just experienced a major data breach. As a senior engineer, how would
you respond? What steps would you take to investigate the breach and prevent it from
happening in the future?
In a data breach scenario, I would start by isolating the compromised resources and then
using Azure Security Center and Azure Sentinel to investigate the breach. This could
involve analyzing logs, user activity, and network traffic. To prevent future breaches, I
would review and tighten security policies, implement Multi-Factor Authentication, and
consider additional monitoring and alerting.

Azure Cloud Interview Questions


• If you were tasked with setting up a DevOps pipeline using Azure DevOps, what steps
would you take? How would you ensure that the pipeline supports continuous
integration and continuous delivery?
For setting up a DevOps pipeline in Azure DevOps, I would start by defining the build
pipelines for continuous integration and then set up release pipelines for continuous
delivery. I would also ensure that the pipeline includes stages for testing and quality
checks and implement Infrastructure as Code (IaC) for resource deployment.

• Imagine a situation where you need to make the same VM configurations across
multiple Azure subscriptions. What Azure service would you utilize to ensure
consistency and reduce manual effort?
To make the same VM configurations across multiple Azure subscriptions, I would use
Azure Policy and Azure Blueprints. These services ensure compliance and consistency
across resources and subscriptions.

Azure Cloud Interview Questions


• Suppose you have an Azure Function that is not scaling as expected. What could
be the cause, and how would you troubleshoot this issue?
If an Azure Function is not scaling as expected, it could be due to a number of
factors, such as a misconfigured scaling policy or resource limitations in the App
Service plan. I would review the scale-out settings and consider moving to a
Premium plan if necessary.

• You need to design a disaster recovery plan for an Azure-based application.


What would be your approach and which Azure services would you consider in
your plan?
For a disaster recovery plan, I would consider using a combination of Azure
services like Azure Site Recovery for replicating workloads, Azure Backup for data
protection, and Azure Traffic Manager for DNS level failover. The choice of
services would depend on the application's requirements and the organization's
recovery objectives.
Azure Cloud Interview Questions
• Q: Our application hosted in Azure is experiencing performance issues during peak hours. How would you
implement auto-scaling in this scenario to maintain the performance?
• A: Azure provides an auto-scaling feature that can be configured based on rules such as CPU usage, memory
usage, or a schedule. I would configure the auto-scaling settings of the App Service plan or Virtual Machine
Scale Sets (VMSS) to scale out during peak hours and scale in during off-peak hours, maintaining application
performance while controlling costs.

• Q: We have a requirement to segregate network traffic between different departments in the company
within Azure. What approach would you use to implement this?
• A: Azure provides Network Security Groups (NSGs), which can be used to control inbound and outbound
network traffic to Azure resources. I would create separate NSGs for different departments with appropriate
inbound and outbound rules to segregate the network traffic.

• Q: We have been using GitLab for CI/CD and now want to shift to Azure DevOps. How would you migrate the
pipelines without disrupting the workflow?
• A: First, I would analyze the existing GitLab CI/CD pipeline to understand the build and deployment
processes. Then, I would replicate these processes in Azure DevOps by creating build and release pipelines.
Initially, I would run both pipelines in parallel for some time to ensure that the Azure DevOps pipeline works
as expected before decommissioning the GitLab pipeline.

Azure Cloud Interview Questions


• Q: We are planning to deploy a multi-region, highly available application in Azure using
Terraform. How would you approach this?
• A: I would write a Terraform script that creates resources like Azure Traffic Manager or Azure
Front Door for global load balancing, VM scale sets or Azure Kubernetes Service (AKS) in
multiple regions for high availability, and Azure SQL Database or Cosmos DB with geo-
replication for data redundancy. Then I would parameterize the Terraform script so that it can
be used to deploy the same infrastructure in multiple regions.

• Q: You're tasked with migrating an application to AKS. However, the application relies on a
stateful service that can't be containerized. How would you handle this while ensuring the
application remains functional?
• A: In Azure, stateful services that can't be containerized can be managed separately and
consumed by the applications running in AKS. Depending on the type of service, Azure offers a
range of managed services, such as Azure SQL Database, Azure Cosmos DB, or Azure File
Storage. I would migrate the stateful service to the appropriate managed service in Azure and
modify the application configuration in AKS to connect to this service.

Azure Cloud Interview Questions


• Q: Imagine you're working on a multi-tier application in Azure where each tier needs to
communicate with each other, but they should be isolated from external networks. What
architecture would you suggest?
• A: I would suggest using Azure Virtual Networks (VNet) with multiple subnets, one for each tier.
This way, each tier can communicate with each other while being isolated from the outside
world. To further isolate the tiers from each other, we can use Network Security Groups (NSGs)
to control traffic between the tiers.

• Q: You're tasked with setting up a Site-to-Site VPN between an on-premises datacenter and
Azure to allow secure, encrypted communication. How would you approach this?
• A: I would use Azure VPN Gateway to create a Site-to-Site VPN. This involves creating a Virtual
Network Gateway in Azure, setting up a local network gateway to represent the on-premises
VPN device and network, and then creating a connection between the two gateways.

Azure Cloud Interview Questions


• Q: We have a requirement to route certain traffic from our Azure VNet to a network
virtual appliance instead of taking the default route. How would you implement this?
• A: This can be achieved using User Defined Routes (UDR) in Azure. We can create a
route table, define the routes that point to the network virtual appliance, and then
associate the route table with the subnet.
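For example, a minimal sketch with the Azure CLI (the names and the appliance IP 10.0.2.4 are placeholders):

az network route-table create --resource-group myRG --name nva-routes

# Send all outbound traffic through the network virtual appliance
az network route-table route create --resource-group myRG --route-table-name nva-routes \
  --name default-via-nva --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.2.4

az network vnet subnet update --resource-group myRG --vnet-name corp-vnet \
  --name app-subnet --route-table nva-routes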

• Q: We have an AKS cluster where some pods need to connect to an on-premises database. However,
the database server should only receive traffic from a known IP address. How would you handle this?
• A: I would suggest implementing Azure Container Instances with a static IP that the on-
premises database can whitelist. Alternatively, Azure CNI plugin for AKS can be used to
assign an IP address per pod, but this approach consumes more IP addresses.

Azure Cloud Interview Questions


• Q: You need to prevent DDoS attacks on your Azure application. What Azure
services would you utilize?
• A: Azure provides a service called Azure DDoS Protection, which can be used to
protect applications against DDoS attacks. It automatically handles the scaling
required to absorb the attack and leverages AI to tune protection policies.

• Q: We have a Virtual Machine in Azure that requires a static public IP address. How would you
configure this?
• A: In Azure, we can assign a public IP address resource to a VM and set the
assignment method as "Static". This IP address will be retained across reboots
and redeployments.
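For example, with the Azure CLI (myRG, myVMNic, and ipconfig1 are placeholder names):

az network public-ip create --resource-group myRG --name myStaticIP \
  --sku Standard --allocation-method Static

az network nic ip-config update --resource-group myRG --nic-name myVMNic \
  --name ipconfig1 --public-ip-address myStaticIP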

Azure Cloud Interview Questions


• Q: We're planning to host a global application in Azure with high availability. How
would you ensure the application is always accessible to users from different regions
with minimal latency?
• A: I would suggest using Azure Traffic Manager or Azure Front Door. These services
route incoming traffic to the nearest or healthiest instance of the application based on
routing methods like latency, priority, or weighted round-robin.

• Q: An application running in Azure needs to connect securely to a third-party API over the internet.
The third party requires the source IP to be whitelisted. How would you achieve this?
• A: We can associate a static public IP address with the Azure resource that
communicates with the API, such as a VM or an App Service. Then, provide this static
IP to the third-party for whitelisting.

Azure Cloud Interview Questions


• Q: Our company has strict security requirements and we need to ensure that only
authorized users can manage Azure resources. How would you implement this?
• A: I would implement Role-Based Access Control (RBAC) in Azure. RBAC allows us to
grant only the necessary permissions to users based on their role in the organization.
For instance, a network engineer could be given permissions to manage only network
resources.
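For example, such a role assignment scoped to a single resource group could look like this with the Azure CLI (the user, subscription ID, and resource group are placeholders):

az role assignment create \
  --assignee netengineer@contoso.com \
  --role "Network Contributor" \
  --scope /subscriptions/<subscription-id>/resourceGroups/network-rg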

• Q: We have sensitive data stored in Azure. How can we protect this data and ensure
that even if a breach occurs, the data cannot be read by unauthorized users?
• A: To protect sensitive data in Azure, I would use Azure Disk Encryption for data at rest,
and Azure Key Vault to store and manage cryptographic keys and other secrets used by
cloud applications and services. For data in transit, I would ensure that all data
communication happens over a secure channel using SSL/TLS encryption.

Azure Cloud Interview Questions


• Q: Our company is required to comply with specific regulations. How can we ensure our Azure
resources are compliant and maintain this compliance over time?
• A: Azure Policy and Azure Blueprints can be used to ensure compliance with company or
industry standards. Azure Policy allows you to define policies for your Azure resources to ensure
they comply with your standards and the Azure Blueprint service lets you define a repeatable
set of Azure resources that adhere to your organization's standards, patterns, and
requirements.

• Q: A web application hosted on Azure App Service was recently the target of a SQL injection
attack. How can you protect the application from such attacks in the future?
• A: Azure provides a service called Azure Web Application Firewall (WAF) which can protect your
applications from common exploits and vulnerabilities like SQL injection and cross-site scripting
attacks. I would set up a WAF in front of the App Service and configure it to block such attacks.

Azure Cloud Interview Questions


• Q: How would you ensure that the virtual machines in Azure are secure and
regularly updated with the latest patches?
• A: I would use Azure Security Center and Azure Update Management. Azure
Security Center provides threat protection for your Azure resources and Azure
Update Management provides a set of tools to manage updates and patches for
your Azure VMs.

• Q: You need to set up a secure connection between two Azure Virtual Networks
located in different regions. How would you implement this?
• A: To create a secure connection between two Azure VNets, I would set up a
VNet-to-VNet VPN Gateway connection. This allows direct and secure
connectivity between the VNets over the Azure backbone network.

Azure Cloud Interview Questions


• Q: We have a requirement to track and monitor all activities in our Azure environment.
What services would you use to implement this?
• A: I would use Azure Monitor and Azure Activity Log. Azure Monitor can collect,
analyze, and act on telemetry data from your Azure resources, while Azure Activity Log
provides insight into subscription-level events.

• Q: We have a web application that experiences uneven traffic patterns, often spiking
during specific hours. How would you ensure the application remains available during
these peak times using Azure services?
• A: I would consider using the Azure Load Balancer or Azure Application Gateway (for
layer 7) to distribute the traffic among multiple instances of the web application.
Additionally, I would leverage Azure's autoscaling features to automatically adjust the
number of instances based on the load.

Azure Cloud Interview Questions


• Q: Our application hosted in multiple Azure regions experiences high latency when
accessed from different parts of the world. How could you improve the user
experience?
• A: I would recommend using Azure Traffic Manager or Azure Front Door. These services
route incoming traffic to the nearest or the most responsive instance of the
application, helping to reduce latency.

• Q: We have a stateful application hosted on multiple VMs in Azure that maintains a continuous
connection with the users. What type of load balancing solution would you suggest?
• A: For stateful applications, I would recommend using Azure Application Gateway with
session affinity (sticky sessions) enabled. This ensures that all requests from a user are
directed to the same backend VM where the user's session state is preserved.

Azure Cloud Interview Questions


• Q: You need to protect an Azure-hosted web application from DDoS attacks.
How would you ensure the application remains available during an attack?
• A: I would enable Azure DDoS Protection on the virtual network. This service
automatically handles the scaling required to absorb a DDoS attack.
Additionally, I would also use Azure Load Balancer or Azure Application
Gateway to distribute traffic evenly among instances.

• Q: We need to balance incoming traffic among a set of VMs based on the URL
path. How would you accomplish this in Azure?
• A: Azure Application Gateway allows URL-based routing rules. We can configure
it to route traffic to different backend pools based on the URL path.

Azure Cloud Interview Questions


• Q: We have an AKS cluster hosting a number of microservices. How would you
implement load balancing for incoming traffic to these microservices?
• A: For load balancing in AKS, I would use an Ingress controller like NGINX or Azure
Application Gateway Ingress Controller (AGIC). An Ingress controller is a Kubernetes
resource that manages external access to the services in a cluster, typically HTTP.

• Q: We have several applications in Azure that need to access sensitive information like
connection strings and passwords. How can you securely store and manage this
information?
• A: I would suggest using Azure Key Vault. It's a cloud service for securely storing and
accessing secrets. Application secrets can be moved to Key Vault and accessed securely
using its API, minimizing the exposure of sensitive information.
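A minimal sketch with the Azure CLI (the vault name, secret name, and connection string value are placeholders):

az keyvault create --resource-group myRG --name my-app-kv --location eastus

az keyvault secret set --vault-name my-app-kv --name DbConnectionString \
  --value "Server=myserver;Database=mydb;User Id=app;Password=<secret>"

# An identity with the right permissions can then read the secret back
az keyvault secret show --vault-name my-app-kv --name DbConnectionString --query value -o tsv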

Azure Cloud Interview Questions


• Q: You're tasked with creating a virtual machine that needs to be directly accessible
from the internet. What steps will you take to ensure it's securely accessible?
• A: To create a secure VM accessible from the internet, I would use Network Security
Groups (NSGs) to limit inbound and outbound traffic to necessary ports only. I'd also
restrict SSH/RDP access to certain IP ranges, use Azure Key Vault for any secrets, and
consider using Azure Disk Encryption to protect the data at rest.

• Q: We have an AKS cluster hosting a critical application. How would you ensure this
application remains highly available?
• A: I would design the application based on microservices architecture and deploy each
service as a separate deployment in AKS. This way, if a service fails, it can be
independently restarted. For the AKS cluster, I would distribute the nodes across
multiple Availability Zones (AZs) for better resiliency.
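For example, the node pool can be spread across availability zones at cluster creation time (names are placeholders; zone support depends on the region):

az aks create --resource-group myRG --name myAKS \
  --node-count 3 --zones 1 2 3 --generate-ssh-keys

# Verify how the nodes are spread across zones
kubectl get nodes -L topology.kubernetes.io/zone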

Azure Cloud Interview Questions


• Q: How can you ensure that the keys stored in Azure Key Vault are highly secure
and cannot be accessed even if someone gains access to our Azure account?
• A: To secure the keys in Azure Key Vault, I would enforce multi-factor
authentication and use RBAC to restrict access. I would also monitor the access
logs regularly for any unauthorized access.

• Q: Our application hosted on Azure VMs needs to scale based on CPU usage.
How would you implement this?
• A: I would use Azure VM Scale Sets with autoscaling configured based on CPU
usage. The scale set would automatically increase the number of VM instances
when CPU usage is high and decrease when it's low.

Azure Cloud Interview Questions


• Q: We have a microservices-based application that needs to be migrated to AKS. However, each
microservice has its own database. How would you handle this while ensuring minimum latency
between the services and their respective databases?
• A: One approach would be to create the databases as separate Azure Managed Database
instances and connect to them from the AKS pods over the internal Azure network. Another
approach, if the databases can be containerized, would be to run them as StatefulSets in the
same AKS cluster. This would minimize latency but might increase complexity.

• Q: How would you ensure that the secrets in Azure Key Vault are rotated regularly and the
process is automated?
• A: I would use Azure Key Vault's secret rotation feature along with Azure Logic Apps or Azure
Functions. Azure Logic Apps/Functions can be scheduled to run at regular intervals to rotate the
secrets in Key Vault.

Azure Cloud Interview Questions


• Q: You need to deploy a large number of identical VMs. How can you simplify this
process?
• A: I would leverage Azure Resource Manager (ARM) templates or use Azure VM Scale
Sets. ARM templates allow you to deploy and manage Azure resources consistently,
while VM Scale Sets lets you manage a group of load-balanced VMs.

• Q: Your team wants to implement a blue-green deployment strategy for a web application running
on AKS. How would you accomplish this?
• A: In AKS, I would accomplish this by having two separate deployments for the blue
and green environments. I would then use a Kubernetes service or ingress controller to
control the traffic routing. By changing the service selector or ingress routing rules, we
can switch traffic between the two deployments.
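As a rough sketch, assuming the Service selects pods by app and version labels, traffic can be flipped to the green deployment by patching the selector (myapp and the label values are placeholders):

kubectl patch service myapp -p '{"spec":{"selector":{"app":"myapp","version":"green"}}}'

# Roll back by pointing the selector at the blue deployment again
kubectl patch service myapp -p '{"spec":{"selector":{"app":"myapp","version":"blue"}}}'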

Azure Cloud Interview Questions


• Q: What would be your approach to backing up secrets stored in Azure Key
Vault?
• A: Azure Key Vault provides backup and restore options for secrets, keys, and
certificates. The backup operation can be performed through Azure portal,
PowerShell, CLI, or SDKs. Regular backups can be scheduled using Azure
Automation or Logic Apps.
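For example, an individual secret can be backed up and restored with the Azure CLI (the vault, secret, and file names are placeholders):

az keyvault secret backup --vault-name my-app-kv --name DbConnectionString \
  --file DbConnectionString.kvbackup

az keyvault secret restore --vault-name my-app-kv --file DbConnectionString.kvbackup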

• Q: We have an application running on an Azure VM that needs to send email notifications. How
would you implement this?
• A: I would suggest using the SendGrid email service, which is available through the Azure
Marketplace, for this requirement. SendGrid provides reliable transactional email delivery, scalability,
and real-time analytics, along with flexible APIs that make custom integration easy.

Azure Cloud Interview Questions


• Q: How would you isolate and secure AKS worker nodes within the Azure
network?
• A: I would put the AKS worker nodes in a private subnet within an Azure VNet.
To further isolate the worker nodes, we could use Network Policies to control
the traffic flow between pods in a cluster.

• Q: How would you manage the state of a stateful application in AKS?
• A: Stateful applications in AKS can be managed using StatefulSets and Persistent
Volumes. StatefulSets provide a way to manage the deployment and scaling of a
set of Pods, and maintain a sticky identity for these Pods. Persistent Volumes
can be used to provide a file system that can be mounted to the Pods, allowing
the application data to persist beyond the life of the Pods.

Azure Cloud Interview Questions


• Q: You need to reduce the costs of running a non-critical workload on Azure
VMs. What steps can you take?
• A: Azure offers multiple ways to reduce VM costs. Some options include using reserved instances
for predictable workloads, shutting down VMs during off-peak hours, using VM scale sets to
autoscale based on demand, or choosing a smaller VM size if fewer resources are needed.

• Q: You have a requirement to store large amounts of unstructured data alongside an AKS
application. What Azure storage solution would you use?
• A: For large amounts of unstructured data, I would recommend using Azure
Blob Storage. Azure Blob Storage can be mounted into AKS pods, allowing the
application to read and write data.

Azure Cloud Interview Questions


• Q: How would you share sensitive data between Azure VMs securely?
• A: I would store the sensitive data in Azure Key Vault and provide the
VMs with managed identities to access the Key Vault. Managed identities
provide an automatic and secure way to authenticate to Key Vault
without having to manage secrets.
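A minimal sketch with the Azure CLI (the VM, vault, and resource group names are placeholders, and the query path assumes a system-assigned identity):

# Enable a system-assigned managed identity on the VM and capture its principal ID
principalId=$(az vm identity assign --resource-group myRG --name myVM \
  --query systemAssignedIdentity -o tsv)

# Grant that identity read-only access to secrets in the vault
az keyvault set-policy --name my-app-kv --object-id "$principalId" \
  --secret-permissions get list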

• Q: You need to connect two Azure Virtual Networks (VNets) that reside in
two different regions. How would you accomplish this?
• A: I would use VNet peering. Azure VNet peering allows for seamless
connectivity between VNets in different regions through the Azure
backbone network. This type of peering is known as Global VNet peering.

Azure Cloud Interview Questions


• Q: Your company has an on-premise datacenter and an Azure environment. You are
tasked with setting up a secure connection between the two for regular data transfer.
What would you use?
• A: To securely connect the on-premise datacenter to Azure, I would set up a Site-to-
Site VPN using Azure VPN Gateway. This creates a secure, encrypted tunnel for data to
pass between the on-premise network and the Azure VNet.

• Q: We have multiple VNets in Azure and need to ensure that resources in these VNets
can communicate with each other. How can you ensure this communication is efficient
and secure?
• A: I would use VNet peering to connect the VNets. VNet peering allows for direct
network-to-network connectivity using Azure's network backbone. For security, I
would use Network Security Groups (NSGs) and Azure Firewall to control traffic flow.

Azure Cloud Interview Questions


• Q: Our organization wants to extend our on-premise Active Directory to Azure. We
want a secure network connection between our on-premise network and Azure. How
would you achieve this?
• A: I would recommend setting up a Site-to-Site VPN for a secure connection. If there's
a need for a more robust, high-speed, and reliable connection, consider Azure
ExpressRoute. Once the connection is established, you can set up an Azure AD Domain
Services instance and synchronize it with the on-premise AD.

• Q: How would you ensure redundancy and a high-availability setup for the VPN
connection between the on-premises network and Azure?
• A: I would set up redundant VPN gateways on both the on-premises and Azure sides.
Additionally, using Azure's VPN Gateway, you can configure active-active VPN gateways
for both VPN and ExpressRoute to achieve higher availability.

Azure Cloud Interview Questions


• Q: How can you secure traffic between subnets within the same Azure VNet?
• A: To secure traffic between subnets within the same VNet, you can use
Network Security Groups (NSGs) to control inbound and outbound traffic to a
subnet. You can also use Azure Firewall or Network Virtual Appliances for more
advanced filtering.

• Q: You have several VNets across different regions and you need to enable
communication between all VNets. What is the best way to achieve this?
• A: I would use Azure VNet peering and create a peering relationship between
each VNet. For a more scalable solution, you could consider using Azure Virtual
WAN which provides a way to connect VNets, office branches and other
resources in a hub and spoke model.

Azure Cloud Interview Questions


• Q: Let's say you have an Azure App Service and an Azure SQL Database. You've peered
the VNet containing the App Service with the VNet containing the SQL Database. Can
your App Service now access the SQL Database over the VNet?
• A: By default, no. Even though the VNets are peered, App Services can't access
resources over VNet peering. To allow the App Service to access the SQL Database, you
would need to use VNet Integration or Private Endpoint for the App Service.

• Q: You have two peered VNets: VNet A and VNet B. VNet A is also peered with VNet C.
Does this mean that VNet B and VNet C can communicate with each other?
• A: Not necessarily. VNet peering is not transitive. If you want VNet B and VNet C to
communicate, you would have to create a separate peering between them.

Azure Cloud Interview Questions


• Q: We have a VPN Gateway setup in active-active mode with the on-premises network.
We also have ExpressRoute setup for the same VNet. Which connection will the traffic
take by default?
• A: By default, if both a VPN and ExpressRoute connections exist for a VNet, the
outbound traffic will prefer ExpressRoute over VPN regardless of the route's prefix
length.

• Q: Your organization has a requirement to use a specific range of public IP addresses for the
traffic originating from your Azure resources. How would you accomplish this?
• A: Azure Public IP Prefix enables you to use a specific range of public IP addresses.
Once you obtain a public IP prefix, you can assign the IP addresses from this prefix to
your Azure resources.

Azure Cloud Interview Questions


• Q: You're working with a large enterprise that has multiple VNets across different Azure regions
and subscriptions. They require connectivity between all VNets with centralized control and
visibility. What solution would you recommend?
• A: I would recommend using Azure Virtual WAN. Virtual WAN provides a centralized, global
transit network architecture in Azure across multiple regions and subscriptions, and it
integrates with Azure Monitor and Azure Security Center for control and visibility.

• Q: Your organization requires an Azure-based solution that allows remote employees to connect
securely to the Azure VNet from their personal devices. How would you fulfill this requirement?
• A: I would suggest Azure Point-to-Site (P2S) VPN. It allows secure connections from individual
devices to the Azure VNet. For better security, Azure AD can be used for authentication, and
Conditional Access policies and Multi-Factor Authentication can be enforced.

Azure Cloud Interview Questions - Advanced


• Q: You're architecting a multi-tier web application on Azure. The different tiers (web,
application, and database) are hosted in different subnets within a VNet. How would
you restrict traffic between the tiers to only necessary communication?
• A: I would use Network Security Groups (NSGs) with appropriate rules to control traffic
between the subnets. Additionally, I could use Azure Firewall for more granular control
and logging. For restricting outbound internet access from specific tiers, I might use
Azure NAT Gateway or User-defined Routes (UDR).

• Q: You have multiple Azure Kubernetes Service (AKS) clusters in different regions. How
can you ensure private and secure communication between the clusters?
• A: I would implement a combination of Azure Private Link and inter-region VNet
peering. Azure Private Link enables private connectivity to AKS clusters, and VNet
peering allows for secure communication between VNets in different regions.

Azure Cloud Interview Questions - Advanced


• Q: Your company has an on-premise network and an Azure VNet. They're connected via Site-to-Site VPN.
The on-premise network has a lot of outbound traffic to the internet. How can you offload this traffic to
the Azure VNet to reduce costs?
• A: I would use Azure's VPN Gateway in conjunction with Azure's NAT Gateway. The on-premise network
can route its outbound internet traffic to Azure VNet via the Site-to-Site VPN. This traffic can then be sent
to the internet via Azure's NAT Gateway, potentially reducing the outgoing traffic costs.

• Q: You have to set up a highly available, global application on Azure. The application should be close to
end users and provide fast, secure content delivery with intelligent threat protection. How would you
design this?
• A: I would utilize Azure Front Door for global load balancing and site acceleration. I'd host the application
in multiple regions for high availability and closeness to end users. Azure Front Door provides SSL offload
and application acceleration at the edge close to end users, as well as integrated WAF for threat
protection.

Azure Cloud Interview Questions - Advanced


• Q: You're running a distributed, stateful application on Azure Kubernetes Service (AKS). To
ensure high availability, the application is replicated across multiple AKS clusters in different
Azure regions. How would you manage application state across these clusters?
• A: For managing the state across multiple clusters, Azure Cosmos DB could be used as it
provides multi-master support and global distribution. Application instances in different AKS
clusters would interact with Cosmos DB to maintain and retrieve state.

• Q: You're designing a hybrid cloud strategy for a large organization with stringent security and
compliance requirements. The organization plans to keep sensitive data on-premises but wants
to utilize cloud resources for compute. How would you design a secure solution for this
scenario?
• A: I would suggest a strategy that uses Azure Arc and Azure Private Link. Azure Arc can extend
Azure management capabilities to on-premises datacenters, and Azure Private Link can enable
secure and private access to Azure services, keeping data off the public internet.

Azure Cloud Interview Questions - Advanced


• Q: You are managing a global e-commerce application hosted in Azure. The application uses
Azure CDN for content delivery. You need to ensure that some content is not cached in specific
geographical regions due to legal restrictions. How would you accomplish this?
• A: You can control Azure CDN caching behavior and create a custom caching rule in the Azure
portal. You can then specify match conditions and actions to take when a match occurs, which
includes preventing caching for specific geographical regions.

• Q: Your company has an existing application hosted on VMs in an Azure VNet. You're planning
to migrate the application to Azure Kubernetes Service (AKS). The application needs to access
an Azure SQL Database using a private endpoint. How would you design the network
architecture in this scenario?
• A: The AKS cluster can be configured to use Azure CNI networking, which assigns an IP address
from the VNet subnet to each pod. This allows pods in the AKS cluster to communicate directly
with the Azure SQL Database via the private endpoint.

Azure Cloud Interview Questions - Advanced


• Q: You're running a multi-tier application on Azure. The web tier is hosted on Azure
App Service and the data tier uses Cosmos DB. You need to ensure only the web tier
can access Cosmos DB and all other access is blocked. How would you accomplish this?
• A: You can use Azure Private Link to expose Cosmos DB over a private endpoint in the
VNet hosting the App Service. Then you can configure the network access control on
Cosmos DB to block all other networks.

• Q: Your organization is migrating a high-throughput, low-latency application to Azure. The
application uses a message-based architecture with a publisher-subscriber pattern. Which Azure
service would be best suited for this use case?
• A: Azure Service Bus Premium tier would be a good choice for this use case. It provides
high-throughput, low-latency messaging with features like duplicate detection,
transaction support, and sessions which can help maintain message order.

Azure Cloud Interview Questions - Advanced


• Q: Your company uses Azure Key Vault to store secrets required by applications. The
application hosted on Azure Kubernetes Service (AKS) needs to access a secret from
Key Vault. How can you securely implement this?
• A: I would use Azure Key Vault Provider for Secrets Store CSI Driver on Azure
Kubernetes Service. This allows AKS to read the secrets directly from Azure Key Vault
into Kubernetes Secrets.
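For example, the managed add-on can be enabled with the Azure CLI, after which a SecretProviderClass object maps the required Key Vault secrets into the pods (the cluster and resource group names are placeholders):

az aks enable-addons --addons azure-keyvault-secrets-provider \
  --resource-group myRG --name myAKS

# Confirm the CSI driver and Azure provider pods are running
kubectl get pods -n kube-system -l 'app in (secrets-store-csi-driver,secrets-store-provider-azure)'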

• Q: How would you enforce least privilege access to the secrets stored in Azure Key
Vault?
• A: I would use Azure Role-Based Access Control (RBAC) to assign specific permissions
to users, groups, and applications. For example, an application may have only "get" and
"list" permissions for secrets, ensuring they cannot modify or delete secrets.

Azure Cloud Interview Questions - Advanced


• Q: You have Azure SQL databases that need to be encrypted at rest using your own key.
Where would you store this key and how would you implement encryption?
• A: I would store the key in Azure Key Vault and implement Transparent Data Encryption
with a customer-managed key. Azure SQL will automatically manage the encryption
and decryption process while ensuring the key is securely stored and managed in Key
Vault.

• Q: Your organization needs to audit and monitor all activities related to secrets, keys,
and certificates in Azure Key Vault. How would you implement this?
• A: I would enable Azure Key Vault's logging and integrate it with Azure Monitor and
Azure Log Analytics. This would allow us to create alerts based on specific activities and
have a centralized platform to monitor and analyze the logs.

Azure Cloud Interview Questions - Advanced


• Q: You're storing sensitive customer data in an Azure Storage
account. You're required to protect this data using a customer-
managed key. Where would you store this key and how would you
implement this?
• A: The customer-managed key should be stored in Azure Key Vault.
To implement, you would configure Azure Storage Service
Encryption to use the key stored in Key Vault for encryption of the
data.

Azure Cloud Interview Questions - Advanced


• Q: How would you ensure Azure Key Vault is secure and always available,
considering that it's a central component in your security strategy?
• A: I would enable Azure Private Link for Key Vault to ensure that it's not
accessible over the internet. For high availability, I would use Azure's geo-
redundancy feature to replicate Key Vault across Azure regions.

• Q: Your organization needs to rotate keys used in Azure services regularly. How
would you automate this process?
• A: Azure Key Vault supports key rotation and auditing. I would create an Azure
Function or a Logic App to automate key rotation in Key Vault. This automation
could be triggered on a schedule that aligns with the organization's key rotation
policy.

Azure Cloud Interview Questions - Advanced


• Q: How would you handle a scenario where you need to share a secret stored in Azure Key
Vault with a third party without giving them direct access to the Key Vault?
• A: I would create an Azure Function that retrieves the secret from Key Vault and shares it
securely with the third party. This way, the third party doesn't have direct access to the Key
Vault, and I can add additional security measures in the function, such as IP restrictions or
authentication.

• Q: You have an application running on Azure virtual machines that needs to authenticate with
other Azure services. How would you secure the credentials required for authentication?
• A: I would use Managed Identities for Azure resources. This feature provides an automatically
managed identity in Azure AD that the application can use to authenticate to any service that
supports Azure AD authentication, without needing any credentials in the application code.

Azure Cloud Interview Questions - Advanced


• Q: How would you mitigate the risk of an unauthorized person or service accessing the Azure
Key Vault and retrieving sensitive information if they somehow obtained sufficient privileges?
• A: I would implement Just In Time (JIT) access and Privileged Identity Management (PIM) to
ensure that users are only given the necessary privileges for the required duration when
needed. I would also enable Azure AD Conditional Access to restrict access based on user,
location, and device status. Finally, I would enable Azure Key Vault's Firewalls and Virtual
Networks service to limit access to known IPs or VNet.

• Q: Your company's security policy requires that secrets stored in Azure Key Vault be periodically
backed up and recoverable. How would you implement this?
• A: Azure Key Vault supports backup and restore of individual keys, secrets, and certificates. I
would create an Azure Logic App or Azure Function to automate the process of backing up keys,
secrets, and certificates from Key Vault to a secure storage account on a regular schedule.

Azure Cloud Interview Questions - Advanced


• Q: Your company is implementing a centralized logging solution that requires all logs
from Azure Key Vault to be sent to a third-party SIEM tool. How would you accomplish
this?
• A: I would integrate Azure Key Vault with Azure Monitor and Azure Event Hubs. Key
Vault logs can be streamed to an Event Hub, and the third-party SIEM tool can be
configured to ingest the logs from the Event Hub.

• Q: A third-party service needs to be able to verify the integrity of documents stored in Azure Blob
Storage. How would you use Azure Key Vault to accomplish this?
• A: I would use Key Vault's key signing feature to generate a digital signature for each
document when it's stored in the Blob Storage. The third-party service can then verify
the signature using the public key from the Key Vault, ensuring the document's
integrity.

Azure Cloud Interview Questions - Advanced


• Q: How would you enable users in your organization to sign in to Azure using their organization email?
• A: I would synchronize the organization's user directory with Azure Active Directory (Azure AD) using
Azure AD Connect. This would allow users to use their existing organization email credentials to sign in to
Azure.

• Q: Your organization has a security requirement that users must provide a second form of authentication
when logging in to Azure. How would you implement this?
• A: I would enable Multi-Factor Authentication (MFA) in Azure Active Directory. This would require users to
provide a second form of authentication, such as a phone call, text message, or mobile app notification, in
addition to their password.

• Q: Some users in your organization need to be able to create and manage resources in Azure, while others
should only have read access. How would you manage these access levels?
• A: I would use Azure Role-Based Access Control (RBAC) to assign the appropriate roles to users. Azure
RBAC has several built-in roles such as Owner, Contributor, and Reader, which can be assigned to users at
different scopes (e.g., subscription, resource group, or resource level) based on their responsibilities.

Azure Cloud Interview Questions - Advanced


• Q: You have a requirement to prevent users in your organization from logging in to Azure outside of office
hours. How would you implement this?
• A: I would create a Conditional Access policy in Azure Active Directory that blocks sign-in attempts outside
of office hours. Conditional Access allows you to create policies that enforce security requirements based
on conditions such as user, location, device, and sign-in risk.
• Q: Your organization needs to ensure that any unusual or risky sign-in attempts to Azure are detected and
addressed promptly. How would you achieve this?
• A: I would enable Azure AD Identity Protection, which uses machine learning algorithms to detect risky
sign-in behavior. When a risky sign-in is detected, the system can automatically respond based on policies
set by the administrator, such as blocking the sign-in or requiring multi-factor authentication.

• Q: Your organization uses Microsoft 365 and Azure. You need to ensure that a user's sign-in session to
Microsoft 365 is also valid for Azure to provide a seamless user experience. How would you implement
this?
• A: Since Microsoft 365 and Azure use the same identity platform, Azure Active Directory, a user's sign-in
session is already valid across both services. If needed, I would fine-tune the session lifetime settings in
Azure AD to meet the organization's requirements for user experience and security.

Azure Cloud Interview Questions - Advanced


• Q: Your organization uses Azure Active Directory (Azure AD) and has a requirement to enforce
password complexity rules that exceed the Azure AD default settings. How would you
implement this?
• A: I would use Azure AD Password Protection and create a custom banned password list that
includes commonly used passwords specific to the organization. Additionally, I'd enable
Password Protection on Windows Server Active Directory to enforce the same complexity rules
on-premises.

• Q: You've configured Multi-Factor Authentication (MFA) for your Azure users. However, some
users who travel frequently are unable to receive phone calls or text messages for the second
factor. How can you ensure they're still able to authenticate?
• A: I would recommend those users to use the Microsoft Authenticator app for MFA. This app
can generate an OATH verification code for the second factor, which doesn't require a phone
call or text message.

Azure Cloud Interview Questions - Advanced


• Q: You've enabled Azure AD Identity Protection to detect risky sign-in behavior.
However, some legitimate sign-in attempts are being incorrectly flagged as risky. How
would you address this issue?
• A: I would review and adjust the risk policies in Azure AD Identity Protection. I could
set the policies to "Allow access" but require multi-factor authentication and require
users to change their password, rather than block the sign-in attempts.

• Q: You're required to configure Single Sign-On (SSO) for a third-party SaaS application
that your organization uses. The SaaS application supports SAML 2.0. How would you
implement this in Azure AD?
• A: I would use Azure AD's non-gallery application template to configure SSO for the
SaaS application. This involves setting up a SAML-based Sign-On method, where I'd
provide the SAML details of the SaaS application and map the user attributes.

Azure Cloud Interview Questions - Advanced


• Q: Your organization is concerned about potential phishing attacks and wants to ensure that
users are adequately protected when they sign in to Azure. How would you implement this?
• A: I would enable Azure AD Conditional Access policies that apply multi-factor authentication,
especially for sign-ins from unfamiliar locations or devices. Also, I would use Azure AD's user
risk policy to challenge or block access if the sign-in is deemed risky.

• Q: Your organization needs to comply with data residency regulations, and you need to
configure Azure so that users' data is only stored in specific geographical locations. How would
you implement this?
• A: When creating resources in Azure, I would ensure that they are created in the regions that
comply with the data residency regulations. For existing resources, I might need to migrate
them to appropriate regions. Additionally, for Azure services that replicate data across regions
for redundancy, I would ensure that the paired regions also comply with the regulations.

Azure Cloud Interview Questions - Advanced


• Q: Your application running on AKS needs to store large files. What kind of storage would you use and why?

• A: I would use Azure Blob Storage as it is ideal for storing large amounts of unstructured and semi-structured data like files.

• Q: How would you securely manage image deployment in AKS, considering that your images are stored in Azure Container
Registry (ACR)?

• A: I would use Azure Active Directory (Azure AD) service principals and assign them the appropriate roles to pull images from
the ACR.

• Q: You have a requirement to run batch jobs in your AKS cluster. How would you handle this?

• A: I would use Kubernetes Jobs or CronJobs for running batch processing tasks.
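As a small sketch, a nightly batch job could be defined as a CronJob and applied with kubectl (the image, schedule, and names are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"   # run every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: report
            image: myregistry.azurecr.io/report-job:latest
EOF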

• Q: Your organization wants to implement blue-green deployment in AKS. How would you do this?

• A: I would create two separate environments (blue and green) within the same AKS cluster, each represented by a separate
Kubernetes namespace. Because a Kubernetes Service only selects pods within its own namespace, I'd switch traffic between
the environments at the ingress or gateway layer, for example by repointing the ingress configuration or an Azure
Application Gateway backend at the blue or green environment.

Azure Cloud Interview Questions - Advanced


• Q: You are asked to deploy a stateful application on AKS. What do you need to consider?

• A: I would use StatefulSets, which is a Kubernetes object that manages the deployment and scaling of a set of Pods and
provides guarantees about the ordering and uniqueness of these Pods.

• Q: Your organization has a multi-tenant AKS environment. How do you isolate network traffic for each tenant?

• A: I would use Kubernetes Network Policies to control the traffic flow at the IP address or port level (OSI layer 3 or 4).

• Q: How would you set up automatic scaling of AKS nodes based on custom metrics like queue length in a message queue
service like Azure Service Bus?

• A: I would use Kubernetes Event-Driven Autoscaling (KEDA) which can scale based on events on various services including Azure
Service Bus.

• Q: Your organization uses Prometheus and Grafana for monitoring other environments. How would you use these tools with
AKS?

• A: I would use the Prometheus metrics server and Grafana in AKS. These tools can be run as containers in the AKS cluster.

Azure Cloud Interview Questions - Advanced


• Q: How would you ensure that your AKS cluster is always running the latest OS security
patches?
• A: I would enable Azure's automatic OS upgrades or the AKS node image upgrade feature to
keep the nodes patched and up-to-date.

• Q: You need to implement a disaster recovery strategy for your AKS cluster. What would you
consider?
• A: I would consider taking regular snapshots of the volumes, backing up Kubernetes objects
using a tool like Velero, and replicating the cluster in another region.

• Q: You want to restrict egress traffic from your AKS cluster to only specific IP addresses. How
would you do it?
• A: I would implement egress controls using Azure Firewall or using Network Policies in
Kubernetes.

Azure Cloud Interview Questions - Advanced


• Q: How would you implement a service mesh in AKS?
• A: I would use a service mesh technology such as Istio, Linkerd, or Open Service Mesh (OSM). These
can be deployed on the AKS cluster.

• Q: You are asked to restrict who can deploy to your AKS cluster. How would you do this?
• A: I would use Kubernetes Role-Based Access Control (RBAC) to control who has permissions to
deploy to the cluster.

• Q: How would you handle sensitive information such as API keys or passwords for applications
running on AKS?
• A: I would store these sensitive details in Kubernetes Secrets or Azure Key Vault.

Azure Cloud Interview Questions - Advanced


• Q: You are managing multiple AKS clusters in your organization and have observed that keeping the
Kubernetes version consistent and up-to-date across all clusters is becoming a challenge. How would you
automate this task?
• A: Azure Kubernetes Service (AKS) supports Kubernetes version upgrade through AKS API. I could write a
script leveraging Azure CLI or SDK to check and update the Kubernetes version regularly. Additionally, I
would look into leveraging Azure Automation or Azure Logic Apps to run these scripts on a regular
schedule.

• Q: Your application running on AKS has a sudden spike in traffic. How would you ensure that your
application can handle the traffic and still deliver high performance?
• A: I would ensure that Kubernetes Horizontal Pod Autoscaler is configured correctly for the applications,
allowing the number of pods to scale up based on the CPU or memory usage. I could also consider using
the Kubernetes Event-driven Autoscaling (KEDA) for event-driven workloads. Moreover, the AKS cluster
itself should be set up with the Azure Cluster Autoscaler to increase or decrease the number of nodes as
needed.
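For example, the two layers of scaling could be sketched like this (the deployment, cluster names, and limits are placeholders):

# Pod level: keep average CPU around 70% with 2 to 10 replicas
kubectl autoscale deployment myapp --cpu-percent=70 --min=2 --max=10

# Node level: let AKS add or remove nodes as pod demand changes
az aks update --resource-group myRG --name myAKS \
  --enable-cluster-autoscaler --min-count 3 --max-count 10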

Azure Cloud Interview Questions – Advanced AKS


• Q: You are deploying a multi-tier application on AKS with database, backend, and
frontend services. Each of these services needs to communicate with each other
securely. How would you set this up?
• A: I would create Kubernetes namespaces for each tier of the application to provide a
scope for names and manage resources more efficiently. Kubernetes Network Policies
can then be applied to control the flow of traffic between these namespaces (tiers),
restricting access and providing a secure communication channel.

• Q: Your organization has a requirement to store application logs for at least one year
for auditing purposes. Your application is running on AKS. How would you implement
this?
• A: I would integrate AKS with Azure Monitor for containers and Azure Log Analytics. I
could then set up retention policies in Log Analytics to keep the logs for one year.
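For example, assuming the monitoring add-on is pointed at a dedicated Log Analytics workspace, the retention could be raised to one year with the Azure CLI (the names and workspace resource ID are placeholders):

az aks enable-addons --addons monitoring --resource-group myRG --name myAKS \
  --workspace-resource-id /subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.OperationalInsights/workspaces/my-logs

az monitor log-analytics workspace update --resource-group myRG \
  --workspace-name my-logs --retention-time 365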

Azure Cloud Interview Questions – Advanced AKS


• Q: You've noticed that your AKS clusters are consuming more resources than necessary during
off-peak hours, leading to increased costs. How would you address this issue?
• A: I would implement the Kubernetes Cluster Autoscaler in AKS. This will automatically adjust
the number of nodes in the cluster based on the current resource needs. Furthermore, I could
implement the Kubernetes Vertical Pod Autoscaler, which adjusts the CPU and memory
allocation of the pods based on their usage.

• Q: You are deploying a sensitive application on AKS, and the application needs to access secrets
stored in Azure Key Vault. How would you ensure that the secrets are securely accessed by your
application?
• A: I would use the Azure Key Vault Provider for Secrets Store CSI Driver. This allows AKS to read
the secrets directly from Azure Key Vault into Kubernetes Secrets, so that they can be securely
accessed by the application pods. I would also implement Azure AD Pod Identity to assign an
identity to the pods, which can be used to authenticate to Key Vault.

Azure Cloud Interview Questions – Advanced AKS


• Q: You are deploying a new AKS cluster. To follow the principle of
least privilege, you want to assign each developer in your team
access to a specific namespace in the cluster. How would you
configure this?
• A: I would use Azure AD and Kubernetes RBAC integration. First, I
would create an Azure AD group for each team of developers. Then,
in Kubernetes, I would create a namespace for each team and assign
the corresponding Azure AD group to a RoleBinding or
ClusterRoleBinding within that namespace. This will restrict the
developers to their assigned namespace.

Azure Cloud Interview Questions – Advanced AKS


• Q: Your AKS-based application communicates with an on-premises service behind a corporate firewall. You
are asked to ensure that the egress traffic from your AKS pods to the on-premises service originates from
a known IP address. How would you achieve this?
• A: I would create an Azure NAT Gateway and associate it with the subnet of the AKS cluster. This ensures
that all egress traffic from the AKS pods to the on-premises service originates from the IP address of the
NAT Gateway.
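A minimal sketch with the Azure CLI (the names and the AKS subnet are placeholders):

az network public-ip create --resource-group myRG --name nat-egress-ip \
  --sku Standard --allocation-method Static

az network nat gateway create --resource-group myRG --name aks-nat \
  --public-ip-addresses nat-egress-ip

# Associate the NAT gateway with the subnet that hosts the AKS nodes
az network vnet subnet update --resource-group myRG --vnet-name aks-vnet \
  --name aks-subnet --nat-gateway aks-nat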

• Q: You've been asked to enable end-to-end SSL encryption for an application running on AKS. The
application includes a frontend exposed through an ingress controller and a backend API. How would you
configure this?
• A: I would install a cert-manager in the AKS cluster to automate the management of SSL certificates. Then,
I would configure an Ingress resource to use a cert-manager issued certificate for the frontend. For the
backend API, I would make sure it's also configured to use SSL, perhaps by using a sidecar container for
SSL termination, or by enabling SSL in the application itself.

Azure Cloud Interview Questions – Advanced AKS


• Q: Your AKS cluster hosts multiple applications. You want to monitor each application
separately and set up alerts for abnormal behavior. How would you implement this?
• A: I would use Azure Monitor for containers and Log Analytics. In Azure Monitor, I would set up
separate workspaces for each application. I would then create custom dashboards and alerts
for each workspace based on the key performance indicators of the respective applications.

• Q: You've observed that your AKS nodes are running out of compute resources, and new pods
cannot be scheduled even though there's free space in other nodes. How would you prevent
this situation in the future?
• A: This could be due to pod and node affinity rules preventing scheduling of pods on other
nodes. I would review these rules and make necessary adjustments. Alternatively, it could be
due to fragmentation of resources in the nodes. In this case, I would consider overprovisioning
nodes or using the Descheduler addon to periodically rebalance the pods.

Azure Cloud Interview Questions – Advanced AKS


• Q: You have multiple AKS clusters running across different Azure regions for high availability.
How would you ensure a consistent configuration across all these clusters?
• A: I would leverage GitOps using tools like Flux or ArgoCD. The desired state of the clusters
would be defined in a Git repository, and the GitOps tools would ensure that the clusters match
the desired state. Any configuration changes would be made in the Git repository, providing a
single source of truth and automatic synchronization across all clusters.

• Q: Your AKS-based application is latency-sensitive and you want to deploy it close to your users
worldwide. How would you ensure that users are directed to the nearest AKS cluster?
• A: I would use Azure Traffic Manager or Azure Front Door, which offer global load balancing
capabilities. These services would direct user traffic to the nearest AKS cluster based on
network latency.

Azure Cloud Interview Questions – Advanced AKS


• Q: You are deploying a stateful application on AKS that requires persistent storage. How would
you ensure high availability of the application data?
• A: I would use Azure Disk or Azure Files as persistent volumes in AKS, which provide high
durability and availability. If the application supports it, I would consider using a StatefulSet or a
Kubernetes operator designed for the application to manage the persistent storage.

• Q: You want to implement a zero-trust network policy in your AKS cluster. How would you
restrict network traffic to the minimum required for your applications to function?
• A: I would use Kubernetes Network Policies to define rules for how pods communicate with
each other and with other network endpoints. I would start with a default deny-all policy and
then create allow policies for the necessary communication paths. I could also consider using
Azure Policy to enforce the presence of network policies in the AKS clusters.
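As a sketch, assuming the cluster was created with a network policy engine (Azure or Calico) enabled, the starting default-deny policy for a namespace could look like this (the namespace myapp is a placeholder):

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: myapp
spec:
  podSelector: {}        # applies to every pod in the namespace
  policyTypes:
  - Ingress
  - Egress
EOF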

Azure Cloud Interview Questions – Advanced AKS


• Q: You have noticed increased CPU usage in your AKS nodes but you're not sure which
pods are causing this. How would you identify the resource-hungry pods?
• A: I would use Kubernetes commands (like kubectl top pods) to get real-time CPU
usage of the pods. For a more comprehensive view, I would use Azure Monitor for
containers, which can provide insights on CPU usage over time, broken down by
namespaces, nodes, and pods.

• Q: You are deploying a microservices-based application on AKS. How would you manage and
observe the traffic flow between the microservices?
• A: I would use a service mesh, such as Istio or Linkerd, which provides traffic
management capabilities like load balancing, routing, and fault injection. For
observability, I could use the service mesh's built-in features or integrate with Azure
Monitor and Azure Application Insights to collect and analyze metrics, logs, and traces.

Azure Cloud Interview Questions – Advanced AKS


Terraform Interview Questions
• Q: How can you prevent Terraform from deleting a specific Azure resource even if the Terraform code has
been removed?
• A: The prevent_destroy lifecycle flag can be set to true to protect a resource from being destroyed.
However, this can be dangerous as Terraform will error out if an operation requires that resource to be
destroyed. Also note that prevent_destroy only applies while the resource block remains in the configuration;
if the block itself is removed, you would instead run terraform state rm so Terraform stops managing the
resource without deleting it in Azure.
• Q: If you have a large number of similar resources to manage in Azure, how would you avoid duplicating
code in Terraform?
• A: Terraform allows you to create modules, which are self-contained packages of Terraform configurations
that are managed as a group. Modules can be used over and over again, reducing code duplication.
• Q: You need to deploy a multi-tier application on Azure using Terraform, where the tiers must be deployed
in a specific order. How would you manage this?
• A: You can use the depends_on argument to specify that a certain resource or module depends on
another resource or module. Terraform will ensure that the dependent resources are created first.
• Q: How would you share outputs between different Terraform configurations in the same Azure
subscription?
• A: To share outputs between different configurations, you can use Terraform's remote state data source. It
allows you to access the output values of another Terraform configuration.

Terraform interview Questions – Part 01


• Q: What strategies would you use to manage state files when collaborating with a team on an Azure
project with Terraform?
• A: For team collaboration, you should use a remote backend for storing the state file. Backends like Azure
Blob Storage are a good option. The backend should support state locking and consistency to prevent
state conflicts.
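For example, assuming the configuration declares an empty backend "azurerm" {} block, the Azure Blob Storage backend details can be supplied at init time (the storage account, container, and key names are placeholders):

terraform init \
  -backend-config="resource_group_name=tfstate-rg" \
  -backend-config="storage_account_name=tfstateacct001" \
  -backend-config="container_name=tfstate" \
  -backend-config="key=prod.terraform.tfstate"

The Azure Blob Storage backend uses blob leases for state locking, which prevents two team members from writing the state at the same time.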
• Q: How do you handle breaking changes or major upgrades with Azure provider in Terraform?
• A: When facing breaking changes, it's recommended to read the provider's upgrade guide thoroughly. It is
often beneficial to perform such upgrades in a controlled environment, like a development or staging
environment, before moving to production.
• Q: How can you reuse a set of Azure resources across multiple environments (like Dev, Staging, and
Production) in Terraform?
• A: Terraform modules can be used to group resources and reuse them across different environments. You
can use input variables to customize module behavior per environment.
• Q: If an Azure resource fails to be created due to an intermittent error, how can you ensure Terraform
automatically retries the operation?
• A: Terraform doesn't have built-in support for automatic retries on a per-resource basis. However, you can
script around the terraform apply command to implement a retry logic in case of errors.

Terraform interview Questions – Part 01


• Q: When working with Azure and Terraform, how do you handle deployments to regions that have
differing service availability?
• A: You need to carefully plan and structure your Terraform files based on the services available in each
region. You can use conditionals within your Terraform configuration to check the region and then
provision resources based on service availability.

• Q: You've been asked to decrease the potential downtime when updating Azure resources managed with
Terraform. What steps would you take?
• A: Using the create_before_destroy lifecycle policy can help reduce downtime. When set to true,
Terraform will create the new resource before destroying the old one during an update operation.

• Q: How would you expose secure data, such as Azure service principal keys, to Terraform while keeping
the information secure?
• A: Sensitive data can be exposed to Terraform through secure environment variables or input variables. As
of Terraform 0.14, you can mark a variable as sensitive, which prevents the value from being displayed in
logs or console output.

Terraform interview Questions – Part 01


• Q: In Azure, you have multiple resource groups for various services. How can you manage these efficiently
using Terraform?
• A: You can use multiple Terraform modules, each for managing resources in a single resource group. This
way, you can isolate changes to each service and manage them independently.

• Q: How would you manage a situation where you want to selectively apply changes to certain Azure
resources managed by Terraform, but not to others?
• A: You can use the -target flag with terraform apply to selectively apply changes to specific resources.
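For example (the resource address below is a placeholder):

terraform plan -target=azurerm_network_security_group.app_nsg
terraform apply -target=azurerm_network_security_group.app_nsg

Note that HashiCorp recommends -target only for exceptional cases such as recovering from errors, not as part of a routine workflow.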

• Q: Suppose you're managing an Azure Kubernetes Service (AKS) cluster using Terraform. Now you want to
update the version of Kubernetes. How would you do this, and what would you look out for?
• A: You can update the Kubernetes version by changing the kubernetes_version attribute of the
azurerm_kubernetes_cluster resource in your Terraform configuration. It's important to ensure that your
applications are compatible with the new version before performing the update.

Terraform interview Questions – Part 01


• Q: How can you ensure that an Azure Storage Account managed by Terraform is highly available?

• A: You can configure the Azure Storage Account with the account_replication_type argument set to GRS (geo-redundant storage) or
RAGRS (read-access geo-redundant storage) for high availability.

• Q: You're managing Azure Virtual Machines with Terraform and want to ensure you're using the most cost-effective VM sizes.
What approaches could you take?

• A: You can use variables in Terraform to allow the VM size to be defined at runtime. This makes it easy to switch between
different VM sizes based on cost and performance needs.

• Q: How can you securely manage the state file when using Terraform with Azure?

• A: The state file can be stored in an Azure Storage Account, which supports features like encryption at rest and access control
with Azure Active Directory.

• Q: You're working with a large Azure environment managed by Terraform, and running terraform plan is slow. What can you do
to speed it up?

• A: One strategy to speed up terraform plan is to split your configuration into multiple smaller configurations, each with its own
state file. This reduces the amount of data that Terraform needs to fetch and compare during the plan phase.

Terraform interview Questions – Part 01


• Q: How can you manage Azure role assignments with Terraform?
• A: You can use the azurerm_role_assignment resource in your Terraform configuration
to manage Azure role assignments.

• Q: Suppose you want to use the same Terraform configuration to create similar
environments in different Azure subscriptions. How would you manage this?
• A: You can use variables in Terraform to make the Azure subscription ID configurable.
Then you can switch between subscriptions by changing the value of the variable.

• Q: How can you manage secrets like passwords when creating Azure resources with
Terraform?
• A: You should use a secure method for managing secrets, such as Azure Key Vault.
Terraform can retrieve secrets from Key Vault at runtime (for example via the azurerm_key_vault_secret
data source), so they are never hard-coded in your configuration. Keep in mind that values read this way
are still recorded in the state file, so the state itself must be stored in a secured remote backend.

Terraform interview Questions – Part 01


• Q: You've made changes directly to an Azure resource that's managed by Terraform. How will this affect your Terraform
operations?

• A: If you make changes directly to a resource, the actual state of the resource will differ from what's recorded in the Terraform
state file. The next time you run terraform plan, Terraform will propose to undo your changes to make the resource match its
configuration.

• Q: You're managing an Azure App Service with Terraform and need to enable "Always On". How can you do this?

• A: The "Always On" feature can be enabled in the site_config block of the azurerm_app_service resource in your Terraform
configuration.

• Q: How can you connect to an Azure SQL Database from Terraform to run some initialization scripts?

• A: Terraform itself should not be used to run scripts against a SQL database. It's not its intended use case and can lead to
complications. Instead, you could use a tool like Azure CLI or PowerShell in conjunction with Terraform.

• Q: Suppose you're creating an Azure Virtual Network with Terraform and want to ensure that only certain IP ranges can access
it. How can you do this?

• A: You can manage network access control using the azurerm_network_security_group and azurerm_network_security_rule
resources in your Terraform configuration.

Terraform Interview Questions – Part 01


• Q: How can you prevent a specific resource from being destroyed when running
terraform destroy in Azure?
• A: You can add a lifecycle block with prevent_destroy = true to the resource in your
Terraform configuration. This will prevent the resource from being destroyed when
terraform destroy is run.

• Q: What steps would you take to debug an issue with Terraform in Azure?
• A: You can use the TF_LOG environment variable to enable detailed logging. Terraform
will output detailed logs which can be useful for debugging. You can also use the
terraform console command to experiment with expressions and evaluate their values.
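• For example, on a Linux or macOS shell:
export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform-debug.log
terraform plan
• TF_LOG accepts levels such as TRACE, DEBUG, INFO, WARN, and ERROR; TF_LOG_PATH writes the log to a file instead of the console.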

• Q: How can you run a specific version of the Azure provider in Terraform?
• A: You can specify a version constraint for the Azure provider in the required_providers
block in your Terraform configuration. This allows you to ensure that you're using a
specific version of the provider.
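• Example (the version constraint shown is illustrative):
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}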

Terraform Interview Questions – Part 01


• Question: You've just joined a team that uses Terraform extensively, but you notice
there's no remote state management set up. What problems could this lead to, and
how would you recommend addressing it?
– Answer: The main issue without remote state management is that the state of infrastructure is
not easily shared or synchronized among team members. It becomes difficult to collaborate
because the state of resources might be different on different developers' machines. Moreover,
without remote state, it's also challenging to manage state versioning and history.
– I would suggest implementing remote state management using a backend storage such as
Terraform Cloud, AWS S3, or Google Cloud Storage. These solutions allow state files to be stored
and shared centrally, enabling versioning and easier collaboration among team members.

• Question: While working with a large infrastructure, how do you structure your
Terraform code to make it manageable and reusable?
– Answer: To keep Terraform code manageable and reusable, it's best to use a modular approach.
Infrastructure components such as virtual networks, load balancers, or compute instances
should each have their own module. This way, modules can be reused across different
environments (development, staging, production, etc.) and different projects.
– It's also helpful to leverage Terraform workspaces to manage multiple environments. This way,
we can use the same infrastructure code for different environments, just changing the inputs
(like instance size, count etc.) based on the environment.

Terraform Interview Questions - Scenarios


• Question: You've inherited a Terraform project and you're asked to implement a plan to handle
Terraform drift. What steps would you take?
– Answer: Terraform drift refers to the situation where the actual state of infrastructure differs from the state
recorded in the Terraform state file. It usually occurs when changes are made directly in the infrastructure
outside of Terraform, making the state file inaccurate.
– Firstly, I would use the terraform refresh command to update the state file according to the real resources.
Then, I'd use terraform plan to check the differences between the current state and the code. If the drift is
significant, then we need to understand why it happened and educate the team about the issues of
manually tweaking the infrastructure.
– To prevent future drift, I would recommend implementing policies or controls that limit direct infrastructure
changes and enforcing the use of Terraform for all infrastructure modifications. Also, automated tooling or
scripts that regularly run terraform plan can help detect drift earlier.

• Question: You need to upgrade Terraform from version 0.13 to 0.15 in a large project. What
would your approach be?
– Answer: Terraform upgrades need to be handled carefully, especially when jumping between versions. The
first step is to review the upgrade guides published by HashiCorp for both 0.14 and 0.15 versions to
understand any breaking changes, new features, and deprecations.
– It's advisable to test the upgrade in a non-production environment first. After a successful test, the upgrade
can be done in the production environment. Also, I'd make sure to upgrade any Terraform providers used in
the project at the same time.
– It's crucial to ensure that all team members are on the same version of Terraform after the upgrade, to
prevent inconsistencies and potential issues.

Terraform Interview Questions - Scenarios


• Question: How do you securely manage sensitive information like passwords
and API keys in your Terraform scripts?
– Answer: Sensitive data should never be hardcoded in Terraform scripts. Instead, Terraform
has several ways to manage sensitive data:
– Variables: You can declare sensitive data as variables and pass them in at runtime. This is
more secure than hardcoding, but these variables can still show up in logs or command line
history.
– Environment variables: Terraform will read any environment variable that starts with TF_VAR_. This keeps sensitive data out of your scripts (see the sketch at the end of this answer).
– Terraform Cloud: If you're using Terraform Cloud, you can set sensitive variables directly in
the workspace. These variables are stored securely and only provided to runs as needed.
– Vault: For a more sophisticated solution, you could use HashiCorp's Vault. Vault is a tool for
securely managing secrets, and it integrates with Terraform.
– In any of these methods, be careful not to output sensitive data in your scripts, as Terraform
will display this data in the console and save it in the state file.
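– A minimal sketch of the environment-variable approach (the variable name db_password is hypothetical):
export TF_VAR_db_password="example-secret"
terraform apply
– Keep in mind the value may still end up in the state file and in your shell history, so combine this with a secured remote backend.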

Terraform Interview Questions - Scenarios


• Question: You are given a task to automate the process of setting up and
tearing down development environments using Terraform. How would you
design this process?
– Answer: First, I would write reusable Terraform modules that encapsulate the creation of
the resources required for a development environment. These modules could be
parameterized to allow customization as needed, such as changing the instance size or
number of instances.
– Next, I would use Terraform workspaces to create separate environments for each
developer or team. This isolates the state of each environment and prevents conflicts
between them.
– I would then use a Continuous Integration/Continuous Deployment (CI/CD) pipeline to
handle the automation. The pipeline could be triggered manually when a developer needs a
new environment, or automatically at regular intervals or based on some event. It would
run terraform init to initialize the workspace, terraform plan to validate the changes, and
terraform apply to create or update the environment.
– Finally, another pipeline could be set up to destroy environments when they are no longer
needed. This could also be run manually or automatically, and would run terraform destroy
to remove all resources associated with the environment.

Terraform Interview Questions


• Question: You're working on a project with multiple cloud providers and
you need to share information between them, such as the ID of a
resource created in one provider that is used in another. How would you
handle this?
– Answer: When dealing with multiple cloud providers, output variables are a
powerful tool for sharing information between them. An output from one module
can be used as an input to another module, even if they use different providers.
– For example, if you create a VPC in AWS and need to use its ID in a Google Cloud
Platform resource, you can define an output variable in the AWS module for the
VPC ID. Then, in the GCP module, you can use a Terraform Remote State data
source to fetch the state of the AWS module and access the VPC ID output.
– Keep in mind that in order for this to work, you need to be using remote state and
the state needs to be accessible from wherever the GCP module is being run.

Terraform Interview Questions


• Question: Terraform's count and for_each can both be used to create
multiple instances of a resource, but they work a bit differently. Can you
give me a scenario where you might choose to use for_each instead of
count?
– Answer: While both count and for_each can be used to create multiple instances of
a resource, for_each has a key advantage in that it creates a mapping from a key to
a resource instance, rather than using a numerical index like count. This makes it
more stable when adding or removing instances from the middle of the list,
because it doesn't shift the indices of the other instances.
– For example, if I were creating a series of AWS S3 buckets and I wanted to be able
to manage each bucket individually without affecting the others, I would use
for_each with a map or set of bucket names. This would allow me to add or remove
buckets freely without worrying about the order they were created in.
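– A minimal sketch, assuming hypothetical bucket names and the AWS provider:
resource "aws_s3_bucket" "example" {
  for_each = toset(["logs", "artifacts", "backups"])
  bucket   = "myproject-${each.key}"
}
– Removing "artifacts" from the set destroys only that bucket; with count, removing a middle element would shift the indices of the remaining buckets.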

Terraform Interview Questions


• Question: In your Terraform project, how would you organize
multiple environments like dev, staging, and production for an Azure
application?
– Answer: One of the common ways to handle multiple environments in
Terraform is by using workspaces. Each workspace maintains a separate
Terraform state, allowing us to manage each environment independently.
– However, another recommended way is to structure the environments in
separate directories like dev, staging, and production, and have a main.tf file
in each. This allows for even more customization for each environment.

Terraform Interview Questions


• Question: Your team is growing, and the Terraform codebase is
becoming hard to manage due to multiple people working on
different parts of the infrastructure. How would you use modules
and directories to better organize the Terraform codebase?
– Answer: To make Terraform codebase manageable, I would segregate the
codebase into different Terraform modules. Each module would represent a
logical part of the infrastructure, like a database, network, compute
instances, and so on. This way, each module is self-contained and
responsible for one specific aspect of the infrastructure. This makes it easier
to develop, test, and maintain.
– In terms of directory structure, I would place each module in its own
directory

Terraform Interview Questions


• Example:
├── main.tf
├── modules
│   ├── database
│   ├── network
│   └── compute
├── variables.tf
├── outputs.tf
└── versions.tf
The main.tf file at the root level will call these modules and wire them together. Each module will
have its own main.tf, variables.tf, and outputs.tf to define resources, inputs, and outputs.

Terraform Interview Questions


• Question: You've inherited a Terraform codebase that is entirely contained
within a single main.tf file. You're asked to restructure this into a more
maintainable structure using modules. How would you approach this?
– Answer: The first step would be to identify distinct components of the infrastructure. This
could be different types of resources like compute instances, networking components,
databases, etc., or it could be logical groupings of resources that together serve a common
purpose.
– Once the components have been identified, I would create a new directory for each one
under a modules directory. Each of these directories would then get its own main.tf,
variables.tf, and outputs.tf file, containing the resources, variables, and outputs for that
component.
– Back in the root directory, the main.tf file would then be updated to call each of these
modules, passing in any necessary variables. This way, the main.tf file serves as an
orchestration layer that wires together the various modules, while the modules themselves
are responsible for creating the actual resources.
– It's worth noting that this can be a significant refactor and should be done incrementally,
with thorough testing at each stage to ensure that the infrastructure continues to function
correctly.

Terraform Interview Questions


• Question: If your team decides to make a module for a load balancer available
for different teams working on distinct projects, where would you put this
module and why?
– Answer: I would place the module in its own repository. By doing this, it's easier for
different teams to utilize it without the risk of inadvertently affecting each other's
configurations. Versioning becomes simpler, and changes can be managed using standard
version control practices.

• Question: How would you handle different versions of a module being used by
different environments or different teams?
– Answer: If different versions of a module are required, we can take advantage of the
versioning feature of the source control system (like Git). For example, we can tag different
versions of the module in the Git repository. When referencing the module in the Terraform
code, we can specify the version by using the ref argument and pointing to the appropriate
Git tag.
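– A minimal sketch (the repository URL and tag are placeholders):
module "load_balancer" {
  source = "git::https://example.com/terraform-modules/load-balancer.git?ref=v1.2.0"
  // module inputs...
}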

Terraform Interview Questions


• Question: You have multiple teams working on different services, each with its
own Terraform codebase, and they all use an Azure Kubernetes Service (AKS)
cluster. How would you create a module for the AKS cluster to avoid duplicating
the cluster creation code?
– Answer: I would identify the common code used to create the AKS cluster and abstract that
into a module. The module should accept input variables to handle any differences between
the teams' requirements for the cluster. This way, each team can use the module to create
their AKS cluster, passing in their own specific values.

• Question: What is the purpose of the variables.tf and outputs.tf files within a
Terraform module?
– Answer: The variables.tf file is where we define all the input variables that our module will
accept. The outputs.tf file is where we define any values that our module will output after it
has been run. These outputs are often used to pass information about the resources a
module has created to other parts of your Terraform configuration.

Terraform Interview Questions


• Question: How would you handle sensitive data like passwords and API keys in a
Terraform module?
– Answer: Sensitive data should be passed to a module through input variables, and these
values should ideally be fetched from a secure secrets management system. Additionally,
these variables should be marked as sensitive by setting the sensitive attribute to true,
which prevents the values from being shown in logs or console output.

• Question: How would you structure a Terraform project for an application that
includes a web server, a database, and a cache, each potentially running on
different cloud providers?
– Answer: I would structure this as separate modules for each component of the application.
Each module would be responsible for creating the resources on the specific cloud provider.
The root module would call these modules, passing any necessary variables between them.

Terraform Interview Questions


• Question: What are Terraform local values and how could you use them
in a module?
– Answer: Local values, or "locals", are named expressions in Terraform that can be
referenced throughout your module. They can be helpful to avoid repeating the
same values or expressions multiple times. You might use a local value to hold a
complex expression that is used in multiple places, or to provide a more descriptive
name for a value that is used in multiple places.

• Question: How would you use output variables in a Terraform module?


– Answer: Output variables in a Terraform module are used to expose information
about the resources created within the module. For example, if a module creates a
database server, it might output the connection string for that server. This
connection string could then be used in other parts of your Terraform configuration
that need to interact with the database.

Terraform Interview Questions


• Question: How can Terraform modules be shared between projects or
teams?
– Answer: Terraform modules can be shared between projects or teams by placing
them in their own version control repositories. Other projects can reference the
module directly from the repository. This allows the module to be versioned and
maintained independently of any individual project.

• Question: How would you test a Terraform module?


– Answer: Testing a Terraform module can be done using a tool like terratest, which is
a Go library that provides patterns for testing infrastructure. With terratest, you can
write tests that deploy your module, validate that it works as expected, and then
tear it down again. It's also good practice to validate your Terraform code using
terraform validate and terraform fmt to check for any syntax errors or deviations
from the standard formatting rules.

Terraform Interview Questions


• Question: What is the purpose of terraform fmt and how does it
help in managing Terraform code?
– Answer: The terraform fmt command is used to rewrite Terraform
configuration files in a canonical format and style. It helps in maintaining
consistent coding style, improving readability and making version control
diffs cleaner. This is particularly useful in a team environment where
multiple engineers are working on the same codebase.
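– Typical usage:
terraform fmt -recursive      # rewrite all .tf files in this directory and subdirectories
terraform fmt -check -diff    # CI-friendly: exit non-zero if files are unformatted and show the diff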

Terraform Interview Questions


Question: What is a Terraform provisioner and when might you use one?
Answer: A provisioner in Terraform is used to execute scripts on a local or remote machine as part
of resource creation or destruction. They are often used to perform configuration management
tasks, bootstrap the system, cleanup before resource destroy, etc.
However, provisioners should only be used as a last resort when native Terraform resources are
not available. They represent a procedural approach and have limitations in error handling and
robustness.

Question: What is the purpose of Terraform local values?


Answer: Local values in Terraform are named expressions that can simplify your code by providing
a way to avoid repeating the same values or expressions multiple times. They can also make your
code more readable by serving as descriptive names for complex expressions that are used in
multiple places.

Terraform Interview Questions


• Question: You have a Terraform configuration that creates an Azure virtual machine. How would you add
tags to this virtual machine using a tags block?
• Answer: Tags in Azure can be added to resources using the tags argument, which accepts a map of strings.
• Example
resource "azurerm_virtual_machine" "example" {
// other configuration...

tags = {
Environment = "Production"
Owner = "Operations"
}
}

Terraform Interview Questions


• Question: How would you create multiple identical virtual networks in
Azure using the count meta-argument in Terraform?
• Answer: The count meta-argument in Terraform can be used to create
multiple identical instances of a resource. Here's how you can use count
to create multiple Azure virtual networks:
resource "azurerm_virtual_network" "example" {
count =3
// other configuration...
}

Terraform Interview Questions


• Question: You have a list of Azure virtual machine names and you want to fetch information about these machines using the
azurerm_virtual_machine data source. How would you use the for_each meta-argument to achieve this?

Answer: You can use for_each with the azurerm_virtual_machine data source to fetch information about each VM.

variable "vm_names" {

description = "List of VM names"

type = list(string)

data "azurerm_virtual_machine" "example" {

for_each = toset(var.vm_names)

name = each.value

// other parameters such as resource_group_name...

Terraform Interview Questions


• Question: How would you create a dependent resource in Azure using Terraform? For
instance, a public IP that depends on a network interface?
• Answer: The depends_on meta-argument in Terraform can be used to specify explicit
dependency relationships between resources.
• Example
resource "azurerm_network_interface" "example" {
// configuration...
}
resource "azurerm_public_ip" "example" {
// configuration...
depends_on = [azurerm_network_interface.example]
}
Terraform Interview Questions
• Question: How would you pass the output of one resource as an input to another resource in
Terraform? For example, passing the ID of a virtual network to a subnet?
• Answer: You can pass the output of a resource as an input to another resource using the syntax
resource_type.resource_name.attribute.
Example:
resource "azurerm_virtual_network" "example" {
// configuration...
}
resource "azurerm_subnet" "example" {
virtual_network_name = azurerm_virtual_network.example.name
// other configuration...
}

Terraform Interview Questions


• Question: In your Terraform project, you have a module that creates a virtual network in Azure and another module that
creates a subnet. You want the subnet to be created inside the virtual network. How would you accomplish this?

• Answer: You can accomplish this by creating an output in the virtual network module that outputs the virtual network's name.
Then, you can pass the output of the virtual network module as a variable to the subnet module.

• Example:

module "virtual_network" {

source = "./modules/virtual_network"

// other variables...

module "subnet" {

source = "./modules/subnet"

virtual_network_name = module.virtual_network.name

// other variables...

Terraform Interview Questions


• Question: You have a map of Azure VM names and their sizes. You want to create a VM for each item in this map, with the VM
name and size as given. How would you use the for_each meta-argument to achieve this?

• Answer: You can use the for_each meta-argument to create a VM for each item in the map. Inside the resource block, you can
use each.key to access the map key and each.value to access the map value.

• Example:

variable "vms" {

description = "Map of VM names and sizes"

type = map(string)

resource "azurerm_virtual_machine" "example" {

for_each = var.vms

name = each.key

vm_size = each.value

// other configuration...

Terraform Interview Questions


• Question: You want to add a provisioner to your Terraform configuration that runs a PowerShell script on
an Azure VM after it's created. How would you accomplish this?
• Answer: You can add a provisioner to a resource using the provisioner block. Here's how you can add a
remote-exec provisioner that runs a PowerShell script:
Example:
resource "azurerm_virtual_machine" "example" {
// other configuration...
provisioner "remote-exec" {
inline = [
"powershell.exe Write-Host 'Hello, World!'"
]
}
}

Terraform Interview Questions


• Question: You want to create a Terraform configuration that creates an Azure SQL Server and then creates a database in that
server. The database should not be created until the server is created. How would you accomplish this?

• Answer: You can use the depends_on meta-argument to specify that the SQL database resource depends on the SQL server
resource.

• Example:

resource "azurerm_sql_server" "example" {

// configuration...

resource "azurerm_sql_database" "example" {

// configuration...

depends_on = [azurerm_sql_server.example]

Terraform Interview Questions


• Question: You want to create a Terraform configuration that creates a resource group, a storage account, and a storage container in Azure, in that
order. How would you manage the dependencies between these resources?

• Answer: In this case, you can use the depends_on meta-argument in the storage account resource to specify that it depends on the resource
group, and in the storage container resource to specify that it depends on the storage account.

• Example:

resource "azurerm_resource_group" "example" {

// configuration...

resource "azurerm_storage_account" "example" {

// configuration...

depends_on = [azurerm_resource_group.example]

resource "azurerm_storage_container" "example" {

// configuration...

depends_on = [azurerm_storage_account.example]

Terraform Interview Questions


• Question: Explain the lifecycle of a Terraform resource and how the
lifecycle configuration block can be used to manage it?
• Answer: The lifecycle of a Terraform resource involves the creation,
modification, and deletion of the resource. The lifecycle
configuration block allows you to control this lifecycle.
• The create_before_destroy argument can be used to ensure that a
new resource is created before the existing one is destroyed during
an update. The prevent_destroy argument can be used to prevent
the resource from being destroyed. The ignore_changes argument
can be used to ignore changes to certain attributes of the resource.
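• Example (illustrative resource; the ignored attribute is an assumption):
resource "azurerm_storage_account" "example" {
  // other configuration...

  lifecycle {
    create_before_destroy = true
    prevent_destroy       = false
    ignore_changes        = [tags]
  }
}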

Terraform Interview Questions


• Question: How would you make sure a resource (like an Azure VM) is recreated whenever its image_id
changes?
• Answer: ignore_changes does not have an "all except" form; it accepts either a list of attribute names to ignore or the keyword all. In most providers, a change to the image reference is already planned as a replacement, so no extra configuration is usually needed. If you need to force recreation explicitly, you can mark the resource for replacement from the CLI.
• Example (hypothetical resource address; -replace requires Terraform 0.15.2 or later, while terraform taint works on older versions):
terraform apply -replace=azurerm_virtual_machine.example

Terraform Interview Questions


• Question: Explain how you can prevent a resource (like an Azure storage account) from being accidentally
destroyed?
• Answer: You can use the prevent_destroy argument in the lifecycle block to prevent a resource from
being destroyed. If prevent_destroy is set to true, Terraform will refuse to destroy the resource.
• Example:
resource "azurerm_storage_account" "example" {
// configuration...

lifecycle {
prevent_destroy = true
}
}

Terraform Interview Questions


• Question: How would you manage a large number of similar resources (like Azure VMs)
without copying and pasting the same block of code multiple times?
• Answer: For a large number of similar resources, you can use the count or for_each
meta-arguments. With count, you can create a fixed number of instances of a resource.
With for_each, you can create instances for each item in a list or map.

• Question: What is the purpose of a data block in Terraform and how would you use it
in the context of Azure?
• Answer: A data block in Terraform fetches data from a provider, like Azure. This allows
you to use information defined outside of Terraform, or defined by another separate
Terraform configuration.

Terraform Interview Questions


• An example would be fetching the latest version of a platform VM image using the azurerm_platform_image data source (omitting version returns the latest available image):
data "azurerm_platform_image" "example" {
  location  = "location"
  publisher = "publisher_name"
  offer     = "offer"
  sku       = "sku"
}

Terraform Interview Questions


• Question: Explain how you would output the public IP address of an
Azure VM created with Terraform?
• Answer: To output the public IP address of an Azure VM, you need to
create an output value.
• Example:
output "vm_public_ip" {
value = azurerm_public_ip.example.ip_address
description = "The public IP address of the VM."
}

Terraform Interview Questions


• Question: In Terraform, what is the difference between a null value and an empty
string or a zero value?
• Answer: In Terraform, a null value represents the absence of a value, while an empty
string or a zero is a value itself. This difference becomes important when dealing with
optional attributes. If an optional attribute is assigned a null value, it is equivalent to
the attribute being omitted. But if it is assigned an empty string or zero, it is considered
a set value.

• Question: How do you handle errors in provisioners in Terraform?


• Answer: You can control how Terraform handles provisioner errors using the on_failure argument in a provisioner block. By default (on_failure = fail), a provisioner error halts the run and marks the resource as tainted so it is recreated on the next apply. Setting on_failure = continue tells Terraform to ignore the error and proceed with resource creation.
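• Example (hypothetical script path; a connection block is also required for remote-exec and is omitted here for brevity):
resource "azurerm_virtual_machine" "example" {
  // other configuration...

  provisioner "remote-exec" {
    inline     = ["/tmp/bootstrap.sh"]
    on_failure = continue
  }
}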

Terraform Interview Questions


• Question: What are meta-arguments in Terraform?
• Answer: Meta-arguments in Terraform are special arguments that can be used within resource, module, or
provider blocks to control the behavior and relationships of resources. They provide additional
configuration options and allow you to define dependencies, lifecycle behavior, and other advanced
settings.

• Question: Explain the purpose of the depends_on meta-argument in Terraform.


• Answer: The depends_on meta-argument is used to define explicit dependencies between resources. It
allows you to specify that one resource depends on another, ensuring that the dependent resource is
created or modified before the resource that depends on it.

• Question: What is the purpose of the count meta-argument in Terraform?


• Answer: The count meta-argument is used to create multiple instances of a resource based on a numeric
value or a dynamic expression. It allows you to create multiple, identical resources while providing
flexibility and configurability.

Terraform Interview Questions - meta-arguments


• Question: What is the purpose of the for_each meta-argument in Terraform?
• Answer: The for_each meta-argument is used to create multiple instances of a resource based on a map
or set of values. It allows you to dynamically create resources for each item in the collection, providing
more flexibility and granular control over resource creation.

• Question: Explain the lifecycle meta-argument and its usage in Terraform.


• Answer: The lifecycle meta-argument is used to define the behavior and lifecycle configuration of a
resource. It allows you to specify settings like preventing destruction, controlling updates, and defining
custom behavior during the lifecycle stages of a resource.

• Question: What is the purpose of the ignore_changes meta-argument in Terraform?


• Answer: The ignore_changes meta-argument is used within a lifecycle block to specify attributes of a
resource that Terraform should ignore when determining if a resource needs to be updated. It allows you
to exclude specific attributes from triggering updates, helping to manage configuration drift and maintain
resource stability.

Terraform Interview Questions - meta-arguments


• Question: Explain the purpose of the create_before_destroy meta-argument in
Terraform.
• Answer: The create_before_destroy meta-argument, when set to true, ensures
that a new resource is created before the existing resource is destroyed during
an update. This can be useful when replacing a resource to minimize downtime
and ensure a smooth transition.

• Question: What is the purpose of the provider meta-argument in Terraform?


• Answer: The provider meta-argument is used to associate a specific provider
with a resource or module. It allows you to specify the provider configuration
and version required for the resource to function correctly.

Terraform Interview Questions - meta-arguments


• Question: Explain the purpose of the provisioner meta-argument in Terraform.
• Answer: The provisioner meta-argument is used to define actions or scripts that
run on a resource during its creation or update. Provisioners allow you to
perform tasks such as installing software, executing configuration scripts, or
invoking external tools on the provisioned resource.

• Question: What is the significance of the alias meta-argument in Terraform?


• Answer: The alias meta-argument is used to assign a unique alias to a resource
block. It allows you to create multiple instances of the same resource type with
different configurations and references. The alias provides a way to
differentiate between similar resources and refer to them explicitly in other
parts of the Terraform configuration

Terraform Interview Questions - meta-arguments


• Question: You need to execute a script on an Azure virtual machine after it's provisioned. How would you accomplish this using provisioners?

• Answer: You can use a provisioner, such as the remote-exec provisioner, to run the script on the provisioned Azure virtual machine.

• Example:

resource "azurerm_virtual_machine" "example" {

# Configuration...

provisioner "remote-exec" {

inline = [

"echo 'Running post-provisioning script'",

"chmod +x /path/to/script.sh",

"/path/to/script.sh",

Terraform Interview Questions - Provisioners


• Question: You need to install specific software on an AWS EC2 instance after it's created. How would you achieve this using provisioners?

• Answer: You can use a provisioner, such as the remote-exec provisioner, to install the software on the created AWS EC2 instance.

• Example:

resource "aws_instance" "example" {

# Configuration...

provisioner "remote-exec" {

inline = [

"sudo apt-get update",

"sudo apt-get install -y software-package",

Terraform Interview Questions - Provisioners


• Question: Explain why it's recommended to avoid using provisioners in
Terraform unless absolutely necessary.
– Answer: Provisioners should be avoided unless there is no other suitable alternative. Here
are some reasons:
– Provisioners introduce external dependencies and can be prone to errors and failures,
leading to inconsistent or unreliable infrastructure provisioning.
– Provisioners make the Terraform configuration less declarative by introducing imperative
actions, which can hinder idempotency and make it harder to manage and reason about the
infrastructure.
– Provisioners do not benefit from Terraform's built-in lifecycle management, such as plan
execution, graph dependencies, and state management. This can lead to difficulties in
maintaining and modifying infrastructure over time.
– Provisioners may have limited error handling and retry mechanisms, which can make it
challenging to handle transient network issues or recover from failed provisioning steps.
– Provisioners may require direct access to instances, which can introduce security risks and
dependency on network connectivity.
– Provisioners tie infrastructure management to the provisioning tool itself, limiting
portability and hindering collaboration with other infrastructure management tools or
processes.

Terraform Interview Questions - Provisioners


• Question: You need to run a custom shell script on an Azure virtual
machine during its creation. What alternative approaches could you
consider instead of using provisioners?
– Answer: Instead of using provisioners, you can explore the following
alternative approaches:
– Utilize a custom VM image or a preconfigured image with the desired
software and configuration already installed. This can be achieved using
tools like Packer or Azure VM Image Builder.
– Use cloud-init or cloud-config to define the desired configuration and
software installation steps. These cloud-init configurations can be provided
during the VM creation process.
– Leverage a configuration management tool, such as Ansible, Chef, or Puppet,
to manage the desired state of the virtual machine and perform software
installation and configuration tasks.

Terraform Interview Questions - Provisioners


• Local Exec Provisioner: The local-exec provisioner executes commands on the machine running Terraform,
such as the developer's local machine. It is typically used for tasks that don't require interaction with the
resource being provisioned. For example, running a local script or initializing a local database.
• Remote Exec Provisioner: The remote-exec provisioner executes commands on the resource being
provisioned. It establishes an SSH or WinRM connection to the remote resource and runs the specified
commands. It is often used for initial configuration or software installations on remote instances.
However, it's worth noting that using remote exec provisioners can introduce security risks and tight
coupling between Terraform and the resource.
• File Provisioner: The file provisioner is used to copy files or directories from the machine running
Terraform to the resource being provisioned. It can be helpful for transferring configuration files, scripts,
or other required artifacts to the remote resource during provisioning.
• Remote State (terraform_remote_state): strictly speaking, terraform_remote_state is a data source rather than a provisioner, but it is often discussed alongside them. It allows one configuration to reference data from another Terraform state file, enabling data sharing between configurations. For example, you can reference an output value from one state file as input for a resource in a separate configuration.
• Null Resource Provisioner: The null_resource provisioner is used when you need to perform local actions
or run provisioners without creating any infrastructure resource. It is often used to orchestrate external
operations or to run additional actions triggered by changes in Terraform resources.

Terraform Interview Questions - Provisioners


• The major difference between these provisioners lies in the scope of execution and the
target of the commands:

• local-exec operates locally on the machine running Terraform.


• remote-exec executes commands on the remote resource being provisioned.
• file copies files from the local machine to the remote resource.
• terraform_remote_state (a data source rather than a true provisioner) allows referencing data from another Terraform state file.
• null_resource enables executing local actions or provisioners without creating
infrastructure resources.
• It's important to use provisioners judiciously and consider the potential impacts on
security, maintainability, and portability when incorporating them into your Terraform
configurations.

Terraform Interview Questions - Provisioners


• Q: What is the purpose of a Terraform state file and why is it important?
• A: Terraform uses state data to remember your managed infrastructure and related configuration, to keep
track of metadata, and to improve performance for large infrastructures. The state file maps the resources
in your configuration files to real-world resources which it manages, allowing Terraform to understand
your infrastructure as it exists.
• Q: Can you describe a scenario where you might need to manually edit a Terraform state file? What are
the risks?
• A: Manually editing a Terraform state file is highly discouraged. The state file contains crucial information
about the infrastructure managed by Terraform, and any discrepancies between the actual infrastructure
and the state file can cause significant problems. In cases where manual intervention is unavoidable, such
as when a resource has been manually deleted, terraform taint or terraform import are safer alternatives
to direct state manipulation.
• Q: How does Terraform handle state locking, and how does it prevent conflicts in a team environment?
• A: State locking is a feature of Terraform that locks the state file when an operation is being performed,
preventing others from running Terraform commands that could interfere. This prevents conflicts in the
state file and ensures consistency. State locking is available with certain backends that support it, such as
the S3 backend when used with DynamoDB.

Terraform Interview Questions – State Files


• Q: How would you handle sensitive data in the Terraform state file?
• A: As of Terraform 0.14, you can mark a variable as sensitive, preventing the value from being displayed in
logs or console output. However, this does not prevent the value from being stored in the state file. If the
state contains sensitive data, you should treat the state file as sensitive and use a secure remote backend
with encryption at rest.

• Q: If you are working on multiple separate Terraform projects, how can you ensure that the state files
don't get mixed up?
• A: To manage the state files for separate projects, you can use separate backends for each project or use
the workspace feature of Terraform. Workspaces allow you to have separate state files in the same
backend, effectively separating resources between environments or projects.

• Q: Describe a situation where you would use remote state in Terraform.


• A: Remote state is particularly useful when working as part of a team or when you need to keep the state
of your infrastructure secure yet accessible. Remote state allows the state data to be stored in a remote
data store, such as AWS S3, Azure Blob Storage, Google Cloud Storage, and others. This ensures everyone
on the team has access to the latest state of the infrastructure.

Terraform Interview Questions – State Files


• Q: What command would you use to pull the latest state file, and why might you need
to do this?
• A: You can use the terraform refresh command to update the local state file against the
real resources. This is useful to ensure the state file accurately represents the real-
world resources, especially when changes might have been made outside of Terraform.

• Q: How does Terraform use the state file during a terraform plan operation?
• A: During a terraform plan operation, Terraform uses the state file to determine what
actions are necessary to achieve the desired state defined in your configuration files.
By comparing the last known state of the resources from the state file to the current
configuration, Terraform can determine what changes need to be made.

Terraform Interview Questions – State Files


• Q: If a resource is deleted manually (not using Terraform), how does this affect the
state file and subsequent Terraform operations?
• A: If a resource is deleted manually, Terraform will not be aware of this until the next
plan or apply operation. The state file will still contain the resource, and Terraform will
attempt to reconcile the difference. It will see that a resource that should exist
(according to the state file) does not actually exist and will plan to recreate it.

• Q: What strategies can you use to mitigate the risk of state file corruption or loss?
• A: Using a remote backend that supports versioning can help prevent loss or corruption
of the state file. Versioning allows you to roll back to a previous version if something
goes wrong. Additionally, regular backups of the state file are also a good practice to
recover from accidental deletion or corruption.

Terraform Interview Questions – State Files


• Q: What is the terraform state command used for? Could you describe a scenario where you might need
to use it?
• A: The terraform state command is used for advanced state management. For instance, you might use
terraform state mv to move an item in the state file to a different address. This is useful when you're
refactoring Terraform configuration and want to avoid destroying and recreating resources.
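• Example (hypothetical addresses; this rewrites the state only, no infrastructure is changed):
terraform state mv azurerm_resource_group.main module.network.azurerm_resource_group.main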

• Q: Can you explain how you would use workspaces in Terraform and how they interact with state files?
• A: Terraform workspaces allow you to manage multiple environments, like staging and production, within
the same configuration. Each workspace has its own separate state file, allowing resources within each
workspace to be managed independently. You switch between workspaces using the terraform workspace
select command.
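• Typical commands:
terraform workspace new staging      # create a workspace with its own state
terraform workspace select staging   # switch to it
terraform workspace list             # show all workspaces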

• Q: How would you handle a situation where your local Terraform state is out of sync with the actual
infrastructure?
• A: If the local state is out of sync, you might use the terraform refresh command to update the local state
file with the actual infrastructure's state. This command queries the providers for each resource's actual
status and updates the state file accordingly.

Terraform Interview Questions – State Files


• Q: Explain a scenario where you might need to use terraform state rm and its implications.
• A: The terraform state rm command removes items from the Terraform state. It does not destroy the
actual infrastructure resource, just removes it from Terraform management. You might use this command
when a resource has been manually deleted, and you need to reconcile the state file to match the actual
infrastructure.

• Q: What would you do if you were working in a team and your Terraform operations were consistently
failing due to state lock errors?
• A: Consistent state lock errors imply that another operation is already in progress, or a previous operation
failed and left the state locked. You can use the terraform force-unlock command to manually unlock the
state. However, be careful, as forcing an unlock can potentially corrupt the state.
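• Example (the lock ID is printed in the lock error message; confirm no other run is in progress before forcing):
terraform force-unlock <LOCK_ID>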

• Q: How would you securely manage a Terraform state file in a production environment?
• A: For a production environment, the state file should be stored in a remote backend that supports
encryption at rest. The backend should also support locking to prevent concurrent state operations. The
state file contains sensitive data, so it should be treated with the same level of care as any sensitive data.

Terraform Interview Questions – State Files


• Q: What is the impact on the state file when you run terraform destroy?
• A: The terraform destroy command destroys all resources managed by
Terraform in the state file and then updates the state file to indicate that
no resources are managed. After running this command, the state file will
still exist, but it will not contain any managed resources.

• Q: In what scenario might you need to use terraform state list and what
information does it provide?
• A: The terraform state list command is used to list all resources in the
current state file. This can be helpful if you need to get an overview of all
resources managed by Terraform, especially in large deployments.

Terraform Interview Questions – State Files


• Q: If you accidentally deleted your Terraform state file, how would you recover?
• A: If versioning is enabled on the remote state backend, you can recover a
deleted state file by restoring the most recent version. If you're not using
versioning or a remote backend, it's significantly harder to recover unless you
have a backup of the state file. This highlights the importance of state file
backups and using versioned remote backends.

• Q: Can you explain how the terraform taint command affects the state file?
• A: The terraform taint command marks a resource as tainted within the state
file. A tainted resource will be destroyed and recreated during the next
terraform apply. This can be useful if you know a specific resource has an issue
and needs to be recreated.
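• Example (hypothetical address; on Terraform 0.15.2+ terraform apply -replace=ADDRESS is the newer equivalent):
terraform taint azurerm_virtual_machine.example
terraform apply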

Terraform Interview Questions – State Files


• Q: Suppose you have resources in your Azure subscription that were created outside of Terraform. Now
you want to manage these resources with Terraform. How would you handle this?
• A: You can use the terraform import command to bring existing resources under Terraform management.
For each resource, you'll need to add a corresponding resource block in your Terraform configuration, and
then run a command similar to terraform import azurerm_resource_group.example
/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/mygroup, replacing with the
appropriate resource type, resource name, and Azure resource ID.
• Q: After importing a resource into Terraform, what steps do you need to take before you can start
managing it with terraform apply?
• A: After using terraform import to import the resource, you need to write a resource configuration block
in your Terraform files that corresponds to the imported resource. The details in the configuration should
match the current state of the imported resource. After that, you can manage the resource using
terraform apply.
• Q: In your Terraform code, you need to use the same compound expression multiple times. What could
you do to make your code more DRY (Don't Repeat Yourself)?
• A: You can use local values to assign a name to an expression. Locals are similar to input variables but are defined and referenced only within the module where they are declared. They help make expressions more readable and avoid repeating the same values or expressions multiple times.

Terraform Interview Questions – import and Expressions


• Q: Suppose you have a list of strings representing virtual machine names, and you want to create an Azure
Virtual Machine for each. How would you do this in Terraform?
• A: You can use a for_each expression along with a toset() function to create a resource for each name in
the list. The for_each meta-argument allows creating multiple instances of a resource based on a given set
of items.

• Q: You are given an expression that results in a list of maps. You need to find a specific map in this list
based on a key-value pair. Which Terraform expression would you use?
• A: The lookup() function could be used to find a specific map based on a key-value pair. Alternatively, you
could use a combination of a for expression and the if keyword to filter the list of maps.

• Q: You have two different modules, one that deploys a network and another one that deploys virtual
machines. The VM module needs information about the network (like subnet IDs). How would you
provide this information?
• A: The output values of a module can be used to pass information from one module to another. In this
scenario, the network module should output the subnet IDs, and the VM module can then reference
these outputs as input variables.

Terraform Interview Questions – import and Expressions


• Q: How can you use count and count.index to create multiple resources in Terraform?
• A: The count parameter allows you to create multiple instances of a resource by
providing an integer value. count.index can be used within the resource block to
distinguish between these instances. For example, you could create multiple instances
of a VM and give each a unique name using count.index.
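• A minimal sketch (names are illustrative):
resource "azurerm_virtual_machine" "example" {
  count = 3
  name  = "vm-${count.index}"   // vm-0, vm-1, vm-2
  // other configuration...
}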

• Q: Suppose you need to create a conditional expression in your Terraform configuration that creates resources only when a certain condition is true. How would you do this?
• A: You can use the count parameter in combination with a conditional expression.
When the condition is true, count is set to 1, and when it's false, count is set to 0. This
way, the resource is only created when the condition is true.
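• A minimal sketch, assuming a hypothetical boolean variable create_public_ip:
resource "azurerm_public_ip" "example" {
  count = var.create_public_ip ? 1 : 0
  // other configuration...
}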

Terraform Interview Questions – import and Expressions


• Q: Imagine you have an existing Azure SQL Database that was not created by
Terraform, and you now want to manage it using Terraform. How would you import
this resource into your Terraform configuration?
• A: First, you would write a resource block in your Terraform configuration for the Azure
SQL Database with the appropriate settings. Then, you would use the terraform import
command to import the existing resource into your Terraform state, using the Azure
Resource Manager ID of the SQL database.

• Q: Suppose you need to use a complex for expression multiple times in your
configuration. To make your code more readable, what could you do?
• A: You can use Local Values in this scenario. A local value assigns a name to an
expression, so it can be reused within the module. You can define the for expression as
a local value and then reference the local value elsewhere in your configuration.

Terraform Interview Questions – import and Expressions


• Q: Describe a situation where you'd use a map or list variable in
Terraform and provide an example.
• A: Map and list variables are useful in Terraform when we need to
pass complex data structures as input to modules or to define
repeated similar resources. For instance, let's say we are creating
multiple Azure storage accounts with different names and locations.
We can use a map variable for this purpose.

Terraform Interview Questions – Variables and outputs


variable "storage_accounts" {

description = "Map of all storage accounts and their properties"

type = map(object({

name = string

location = string

}))

default = {

account1 = { name = "account1", location = "West Europe" }

account2 = { name = "account2", location = "North Europe" }

resource "azurerm_storage_account" "example" {

for_each = var.storage_accounts

name = each.value.name

location = each.value.location

resource_group_name = azurerm_resource_group.example.name

account_tier = "Standard"

account_replication_type = "GRS"

Terraform Interview Questions – Variables and outputs


• Q: Explain a scenario where you would use Terraform outputs and
how they can be used in other configurations or modules.
• A: Terraform outputs can be useful in scenarios where we want to
expose certain data from a module to be used in other
configurations, or just to display after the 'terraform apply'
completes. For instance, once you've created an Azure Virtual
Machine, you might want to output the public IP address so it can
be used to configure DNS or for other configuration.

Terraform Interview Questions – Variables and outputs


resource "azurerm_virtual_machine" "example" {
// configuration...
}

output "public_ip" {
value = azurerm_virtual_machine.example.public_ip_address
description = "The public IP address of the main server."
}

Terraform Interview Questions – Variables and outputs


• Q: Let's say you're working with a Terraform module for deploying an application on
Azure, and you need to pass different sets of variables for different environments
(prod, dev, staging). How would you manage this scenario?
• A: One way to manage this scenario is by using separate tfvars files for different
environments. We can create files like prod.tfvars, dev.tfvars, and staging.tfvars with
the respective configuration for each environment.
• For example, the dev.tfvars could look like this:
environment_name = "dev"
location = "West Europe"
instance_count = 1
Then, when running Terraform commands, you can specify the appropriate tfvars file for
each environment, like so: terraform apply -var-file=dev.tfvars.

Terraform Interview Questions – Variables and outputs


• Q: In what cases should you use local values in Terraform and how do they
differ from variables?
• A: Local values in Terraform are often used to simplify complex expressions and
make your configuration easier to read and maintain. Unlike input variables,
locals are internal to the module and not exposed to users of the module. They
are defined under locals block and can be referred anywhere within the
module.
• For instance, if you were frequently accessing the first resource group's name in
your configuration, you could simplify this by defining a local.
locals {
first_rg_name = azurerm_resource_group.example[0].name
}
You could then use local.first_rg_name to refer to that value.

Terraform Interview Questions – Variables and outputs


• Q: Suppose you are managing a Terraform configuration where sensitive data
(like passwords or API keys) are defined as variables. How would you handle
these to ensure that they are not exposed in any logs or console output?
• A: As of Terraform 0.14 and later, you can mark variables as sensitive, which will
prevent the value from showing in logs or console output.
variable "db_password" {
description = "The password for the database"
type = string
sensitive = true
}

Terraform Interview Questions – Variables and outputs


• Q: In a situation where you need to deploy the same infrastructure
components multiple times with minor differences, how could you
leverage variable lists or maps to simplify the process?

• A: In such a scenario, you can use variable lists or maps in combination with for_each or count to create multiple instances of a resource. For example, you could define a list of VM names and then use that list to create multiple VMs.

Terraform Interview Questions – Variables and outputs


• Q: Suppose your Terraform module creates a Kubernetes cluster in Azure (AKS). You
want to output the kubeconfig file content so it can be used by other resources or
saved locally for kubectl use. How would you do this?
• A: The Azure Kubernetes Service resource in Terraform has an attribute called
kube_config_raw that provides the kubeconfig file content. You can output this like so:

output "kube_config_raw" {
value = azurerm_kubernetes_cluster.example.kube_config_raw
sensitive = true
description = "Raw Kubernetes config to be used by kubectl and other compatible tools"
}

Terraform Interview Questions – Variables and outputs


• Q: Imagine you have different variable values for different stages (dev, test, prod) of your infrastructure.
How would you structure your Terraform configurations to make it easy to apply changes to a specific
stage?
• A: One common way to handle this is to use separate Terraform workspace for each environment and
have a corresponding variable file for each workspace. You can use workspace-specific variable files
(dev.tfvars, test.tfvars, etc.) and apply them with the -var-file flag.

• Q: You are managing a complex infrastructure with many interdependent resources. How would you use
output variables to expose necessary data (like IP addresses, DNS names, etc.) for use in other
configurations or modules?
• A: Output variables can be used to expose data from one module to another, or to the root module. By
using outputs, the necessary data will be displayed in the console after running terraform apply. If the
state is stored remotely, the output values can also be queried using terraform output command.
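For example (the module path and names are hypothetical), a child module can expose a value that the root configuration then consumes:

# modules/network/outputs.tf
output "subnet_id" {
  description = "ID of the subnet created by this module"
  value       = azurerm_subnet.example.id
}

# root module
module "network" {
  source = "./modules/network"
}

output "app_subnet_id" {
  value = module.network.subnet_id
}

Other resources in the root module can reference module.network.subnet_id directly, and terraform output app_subnet_id prints the value once the state is up to date.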

Terraform Interview Questions – Variables and outputs


Q: If you are creating a module that will be used by other teams in your organization, how would you
document the variables and outputs so that it's clear what inputs the module expects and what it
outputs?
• A: Good documentation is key for module usability. Each variable and output should have a description
field explaining its purpose. This is not only helpful for users of the module but also valuable for
maintaining the module over time.
variable "instance_count" {
description = "Number of instances to create"
type = number
default =1
}
output "instance_ids" {
description = "List of IDs of the instances created"
value = aws_instance.example[*].id
}

Terraform Interview Questions – Variables and outputs


• Q: How do you set a default value for a variable and what
considerations should you make when deciding whether to use a
default value?
• A: You can provide a default value for a variable using the default
argument in the variable block. When deciding whether to use a
default value, consider whether there is a sensible default that
would make sense in most use cases. It's important that the default
value doesn't lead to unexpected results for users of your module or
configuration.
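For instance (the variable and its default are illustrative), a default keeps an input optional while still allowing callers to override it:

variable "location" {
  description = "Azure region to deploy into"
  type        = string
  default     = "West Europe"
}

Callers can override it with -var 'location=North Europe' or a tfvars file; leave the default out entirely when there is no safe value (credentials, for example), so Terraform forces the caller to supply one.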

Terraform Interview Questions – Variables and outputs


• Q: If your Terraform output contains sensitive data (like passwords or private
keys), how can you prevent this information from being displayed in the
console?
• A: You can use the sensitive argument in an output block to prevent the value
from being displayed in the console

output "db_password" {
value = azurerm_mysql_database.example.password
sensitive = true
}

Terraform Interview Questions – Variables and outputs


• Q: In Terraform, how can you use a variable within a string?
• A: Terraform uses a technique called string interpolation. You can include a
variable within a string by wrapping it in ${}. For example, "Hello, ${var.name}!".

• Q: What is the difference between local values and input variables in Terraform? When might you use one over the other?
• A: Both local values and input variables in Terraform can be used to assign a
name to an expression, so it can be reused. The main difference is that input
variables are parameters for a module, while local values are only within the
module where they are defined. You might use local values for temporary
complex objects or when a value is repeated many times, to make the
configuration more DRY (Don't Repeat Yourself).
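A small sketch of the DRY point (the project, environment, and location variables are assumed to exist): a naming prefix computed once as a local can be interpolated wherever it is needed:

locals {
  name_prefix = "${var.project}-${var.environment}"
}

resource "azurerm_resource_group" "example" {
  name     = "${local.name_prefix}-rg"
  location = var.location
}

resource "azurerm_virtual_network" "example" {
  name                = "${local.name_prefix}-vnet"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  address_space       = ["10.0.0.0/16"]
}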

Terraform Interview Questions – Variables and outputs


• Q: You run terraform apply, and it fails with an error that the resource you're trying to create already exists. What could be the cause, and how would you solve it?
• A: This issue usually arises when the Terraform state file does not reflect the actual infrastructure. The resource might have been manually created or managed by a different Terraform state file. One way to resolve this is by importing the resource into the current Terraform state using terraform import.

• Q: After running terraform plan, you notice that Terraform intends to recreate a resource even though you haven't made any changes to its configuration. What might cause this, and how would you handle it?
• A: This could be due to some default values in the resource configuration that Terraform doesn't correctly identify as unchanged, or external changes that Terraform picked up during refresh. To understand why Terraform wants to recreate the resource, read the terraform plan output closely: attributes whose change forces recreation are marked "forces replacement", which tells you exactly which argument is driving the rebuild.

• Q: You have a large and complex Terraform configuration, and Terraform is failing with an error message that isn't clear. How would you approach debugging this?
• A: Terraform provides several mechanisms to help with debugging. First, you can set the TF_LOG environment variable to enable detailed logging. This often provides more context about the error. If the configuration involves many modules, you can comment out parts of it and run terraform plan to narrow down the source of the error.
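For example, detailed logs can be captured to a file while reproducing the failure (TF_LOG accepts levels such as TRACE, DEBUG, INFO, WARN, and ERROR):

export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform-debug.log
terraform plan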

Terraform Interview Questions – Troubleshooting


• Q: When working with an Azure resource, Terraform repeatedly times out or fails due to rate limiting. How would you mitigate
this?

• A: You can make use of the timeouts block in the Terraform resource to increase the amount of time Terraform will wait for certain operations to complete (see the sketch at the end of this slide). If the issue is due to rate limiting, you can also reduce parallelism (for example, terraform apply -parallelism=1) or, as a last resort, reach out to Azure support to request a rate-limit increase.

• Q: You run terraform apply and it fails due to insufficient permissions. What steps would you take to resolve this?

• A: First, you would need to determine exactly what permissions are missing. The error message from terraform apply will usually indicate what action Terraform is trying to take. Once you know what permissions are needed, you can update the Azure RBAC role assignment (or custom role) for the identity Terraform is using so that it includes these permissions.

• Q: Terraform is failing with an error related to a provider plugin. What steps would you take to resolve this?

• A: Provider-related errors can be caused by a variety of issues, such as an outdated or incompatible version of the provider,
network connectivity issues when downloading the provider, or bugs in the provider itself. Possible solutions include updating
the provider version, verifying network connectivity, or reporting the issue to the provider's maintainers if it appears to be a
bug.
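A minimal sketch of the timeouts block mentioned in the first answer above (the resource and durations are illustrative; the operations supported vary per resource):

resource "azurerm_kubernetes_cluster" "example" {
  # ... other configuration ...

  timeouts {
    create = "60m"
    update = "60m"
    delete = "30m"
  }
}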

Terraform Interview Questions – Troubleshooting


• Q: Terraform is not reflecting the changes you made in the Azure portal. How would
you troubleshoot this issue?
• A: This is likely because Terraform's state file does not automatically sync with changes made outside of Terraform. You can run terraform refresh (or terraform apply -refresh-only on newer Terraform versions) to update the state file with the current state of the resources as they exist in Azure. If necessary, you can then adjust your Terraform configuration to match the desired state.

• Q: Terraform operations are failing with a lock error on the state file. How would you
resolve this?
• A: This error usually occurs when two or more Terraform operations are trying to
modify the state file concurrently. If you're sure no other operations are ongoing, a
lock error might be due to a previous operation that didn't clean up its lock properly. In
this case, you would need to manually unlock the state using terraform force-unlock.

Terraform Interview Questions – Troubleshooting


• Q: You have updated the provider version in your Terraform configuration, and now you are receiving a number of errors related
to resources that were not appearing before the update. How would you address this situation?

• A: This could be due to breaking changes introduced in the new version of the provider. Review the changelog for the new
provider version to see if there are any breaking changes that could affect your configuration. You may need to adjust your
configuration to be compatible with the new version, or pin your provider version to the older version until you can
accommodate the changes.

• Q: You run terraform apply and receive an error message indicating a conflict with a resource that was not part of your most
recent changes. What could be the cause, and how would you resolve this?

• A: This could indicate a dependency issue, where the resource reporting the error depends on another resource that has been
changed. Review the dependencies of the problematic resource to see if any of them have been recently altered. You may need
to adjust the configuration of these dependencies, or the way they are referenced.

• Q: Your Terraform operation failed midway through applying changes. Now when you run terraform apply again, you're seeing
errors about resources that already exist. What is happening, and how can you resolve this?

• A: This could be because the state file was not updated with resources that were successfully created before the operation
failed. Use the terraform import command to import the existing resources into your state file, then run terraform apply again.

Terraform Interview Questions – Troubleshooting


• Q: When applying a Terraform configuration, you receive an error that a required attribute is missing, but
you see it defined in your configuration. What could be happening here?
• A: This might be due to the attribute being in the wrong block or being incorrectly named in your
configuration. Verify that the attribute is correctly named and placed according to the resource's
documentation. Also, check for syntax errors or incorrect variable references.

• Q: Terraform fails during planning with an error message "Error: Reference to undeclared resource". How
would you fix this?
• A: This error occurs when your configuration refers to a resource that Terraform doesn't know about. This
could be due to a typo in the resource name, or the resource not being declared in your configuration.
Ensure that the referred resource is correctly declared in your configuration.

• Q: You made changes to your Terraform configuration, but when you run terraform plan, it doesn't detect
any changes. What might be happening?
• A: This could be due to a syntax error in your configuration causing Terraform to not recognize your
changes, or your changes might not actually result in any changes to the infrastructure. Check your
configuration for syntax errors and verify that your changes should result in changes to the infrastructure.

Terraform Interview Questions – Troubleshooting


• Q: When working with multiple Terraform workspaces, you receive an error that a resource already exists. How would you
investigate this?

• A: Check to make sure you're in the correct workspace by running terraform workspace show. Resources can overlap between
workspaces if they're not differentiated in some way. Consider using the workspace name in your resource names to avoid
collisions between workspaces.

• Q: You've been using Terraform to manage an Azure Kubernetes Service (AKS) cluster. Suddenly, terraform apply fails with errors
related to the AKS API. How would you approach this problem?

• A: This might be due to changes or temporary issues with the Azure API, or due to incompatibilities with the Terraform Azure
provider. You should first verify the issue isn't a temporary Azure outage by checking Azure's status page. Next, look for
reported issues on the Terraform provider's GitHub page. Finally, if no information is available, you can use Terraform's debug
logs to get more information and possibly report a new issue.

• Q: You're using Terraform to manage a complex system of interconnected resources. A change to one resource is causing
unexpected modifications to others. How would you investigate this?

• A: This is likely due to implicit or explicit dependencies between your resources. Terraform's graph command can be helpful
here - it outputs a visual representation of your resources and their dependencies, which you can use to understand how a
change might propagate through your system.

Terraform Interview Questions – Troubleshooting


• Q: After refactoring your Terraform code to use modules, you're seeing errors related to missing resources
during terraform apply. What might be happening and how can you fix it?
• A: This could be caused by the state file not being updated to reflect the new module structure, leading
Terraform to believe that resources are missing. You can use the terraform state mv command to update
the state file with the new resource addresses.

• Q: You have a resource that occasionally fails during terraform apply, but succeeds after a retry. How could
you make your Terraform code more robust to this kind of transient failure?
• A: Because terraform apply is idempotent, the simplest mitigation is to re-run it (or add an automatic retry step in your pipeline); Terraform will pick up where the previous run left off. If the failures are timeouts, a timeouts block on the resource gives slow operations more room, and it is worth checking whether the provider exposes its own retry settings for the affected API.

• Q: A Terraform module you're using from the Terraform Registry is causing errors during terraform apply.
What steps can you take to investigate and resolve this issue?
• A: You can review the module's documentation and source code to try to understand what might be
causing the issue. Check the module's issue tracker for similar problems reported by others. If you can't
find a solution, consider raising a new issue on the tracker.

Terraform Interview Questions – Troubleshooting


• Q: You're running terraform apply in an automated CI/CD pipeline, and it fails with a timeout error while waiting for a resource
to be created. What steps could you take to address this issue?

• A: This could be addressed by adjusting the timeout settings in your Terraform configuration, using the timeouts block. You
could also investigate whether there are any issues with the cloud provider that's causing resource creation to take longer than
expected.

• Q: You're using a Terraform module which creates an Azure Storage Account. The module fails with an error stating that the
storage account name isn't valid. How would you troubleshoot this issue?

• A: Storage account names in Azure have certain restrictions (like length and characters). If you're passing a name to the module,
ensure it adheres to these restrictions. If the module generates a name automatically, it could be a bug in the module, and you
might need to raise an issue with the module's maintainers.

• Q: You've made some changes to your Terraform configuration, but when you run terraform plan, it's showing a much larger set
of changes than you expected. How would you determine what's causing these extra changes?

• A: It could be due to a number of factors like implicit dependencies, changes in default values, etc. You could examine the
output of terraform plan to see which resources it's intending to change and why. You might also find the terraform graph
command useful to visualize dependencies between resources.

Terraform Interview Questions – Troubleshooting


Ansible Interview Questions
• What is the role of Ansible Playbooks?
• Ansible Playbooks are sets of 'plays' or 'tasks' used to define automation jobs in Ansible, they
are written in YAML format.

• What is the Ansible Galaxy?


• Ansible Galaxy is a shared repository for Ansible roles. Users can use Galaxy to share roles, and
to use roles created by other users.

• What is idempotency in Ansible?


• Idempotency means that operations in Ansible can be run multiple times without changing the
result after the first successful run.

Ansible Interview Questions – Basic


• Explain the Ansible architecture.
• Ansible's architecture is straightforward: the ansible program installed on the control node connects directly to the managed nodes, by default over SSH (other connection plugins are available), and no agent is required on the managed nodes.

• What is ad hoc command in Ansible?


• Ad hoc commands are simple, standalone, one-liner ansible commands which are used for
quick tasks where writing a playbook could be an overkill.

• What is Ansible Tower?


• Ansible Tower is Ansible's enterprise-level product for centralized and controlled IT automation.

Ansible Interview Questions – Basic


• What is the use of Ansible's Inventory?
• The Inventory is a description of the nodes (can be a single machine, group of machines or even
remote systems) that can be accessed by Ansible.

• Can you name some Ansible modules?


• Some Ansible modules are command, shell, yum, apt, copy, template, file, service, and user.

• What is a 'Handler' in Ansible?


• Handlers are just like regular tasks but run only when notified by another task. They are used to
manage services like restarting a service when a config file changes.

Ansible Interview Questions – Basic


• What is a role in Ansible?
• A role is an independent block of tasks, files, templates, and variables, which can be used to
automatically load certain vars_files, tasks, and handlers based on a known file structure.

• How does Ansible use SSH?


• Ansible uses SSH to connect to remote hosts for execution of tasks. It does not require any
agent to be installed on the remote host.

• What is an Ansible task?


• A task is a block of code inside a playbook which calls an Ansible module.

Ansible Interview Questions – Basic


• What is Ansible Vault?
• Ansible Vault is a feature that allows users to encrypt values and data structures within Ansible
projects.

• How do you manage different environments in Ansible?


• You can manage different environments by using separate inventory files or separate
directories per environment.

• What is a check mode in Ansible?


• The check mode (also called dry run) is used for testing playbooks without making any changes
on the remote host.

Ansible Interview Questions – Basic


• What is 'gather_facts' in Ansible?
• 'gather_facts' is a method to retrieve system information from the remote hosts before
executing tasks.

• What is Jinja2 in Ansible?


• Jinja2 is a templating engine for Python, which is used in Ansible for manipulating variables.

• How can you access a variable of the first host in a group?


• You can access a variable using the 'hostvars' keyword: hostvars['hostname']['variablename']

Ansible Interview Questions – Basic


• What are the limitations of Ansible?
• Ansible can be slower due to its push-based architecture, it is not suitable for real-time data
processing, and the documentation might not cover all use-cases.

• How does Ansible handle error handling or failures?


• Ansible handles errors using directives such as "ignore_errors" or "failed_when". Ansible also
supports block/rescue/always structure for error handling.

• How do you organize playbooks for different environments?


• Different environments can be handled using separate inventory files or using group_vars and
host_vars directories.

Ansible Interview Questions – Basic


• How do you use Ansible for continuous deployment or continuous delivery?
• Ansible can be used for continuous deployment or delivery by automating tasks and using
Ansible Tower/AWX for orchestration.

• How do you debug an Ansible Playbook?


• Ansible playbook can be debugged using the debug module, verbosity flags like -v, -vv, -vvv, or
using the ansible-playbook --step and --list-tasks commands.

• How do you optimize the performance of Ansible in large environments?


• Performance can be optimized using strategies like "free" or "fastest", pipelining, fact caching,
and controlling the number of forks.

Ansible Interview Questions – Basic


• Explain the difference between 'include' and 'import' in Ansible.
• 'import' is static and is processed when the playbook is parsed; 'include' is dynamic and is processed at runtime.

• How do you manage secrets in Ansible?


• Secrets can be managed using Ansible Vault which encrypts sensitive data.

• Explain the concept of dynamic inventory in Ansible.


• Dynamic inventory is used when the targets are not known until the playbook run
time. It can be based on a script, API call, or cloud providers.

Ansible Interview Questions – Basic


• How do you create custom Ansible modules?
• Custom Ansible modules can be created in any language that can return JSON, though they are usually
written in Python.

• How do you use Ansible for network automation?


• Ansible can be used for network automation by using modules specifically designed for networking
devices. It can automate tasks like configuration management, test and validate the current network state,
etc.

• What are lookup plugins in Ansible?


• Lookup plugins are used to retrieve data from outside sources such as files, databases, or other systems.

Ansible Interview Questions – Basic


• How do you handle backends with varying capabilities in Ansible?
• You can handle backends with varying capabilities using conditional tasks based on the
gathered facts or variables.

• What is the 'async' mode in Ansible?


• 'async' mode allows a task to continue in the background while the playbook continues
executing the next tasks.

• What are some of the strategies in Ansible?


• Strategies control the order of task execution. Some of them are linear, debug, and free
strategies.

Ansible Interview Questions – Basic


• What is a fact in Ansible? Can you create a custom fact?
• Facts are information derived from speaking with remote systems. Yes, we can create custom
facts using the set_fact module or using local scripts.

• What is the purpose of the 'delegate_to' keyword in Ansible?


• 'delegate_to' allows you to perform a task on a host other than the current one in the
playbook.

• How do you test your Ansible roles?


• Ansible roles can be tested using tools such as Molecule, ansible-lint, or using CI/CD pipelines.

Ansible Interview Questions – Basic


• What is magic variable in Ansible?
• Magic variables in Ansible are special variables that have a predefined meaning in Ansible.
Some of them include hostvars, group_names, groups, and others.

• How do you manage rolling updates with Ansible?


• Ansible can manage rolling updates with the 'serial' keyword in playbooks, which allows you to
define how many hosts you want to manage at a time.

• Can Ansible use APIs for automation? How?


• Yes, Ansible can use APIs for automation. This can be achieved by using the URI module in
Ansible, which can interact with a lot of different APIs.
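A minimal sketch of calling a REST API with the uri module (the URL, token variable, and payload are hypothetical):

- name: Call an external API
  hosts: localhost
  tasks:
    - name: Trigger a deployment via a REST endpoint
      uri:
        url: "https://api.example.com/deployments"   # hypothetical endpoint
        method: POST
        headers:
          Authorization: "Bearer {{ api_token }}"    # hypothetical variable, e.g. loaded from Ansible Vault
        body_format: json
        body:
          service: myapp
          version: "1.2.3"
        status_code: 201
        return_content: yes
      register: api_response

    - name: Show the parsed JSON response
      debug:
        var: api_response.json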

Ansible Interview Questions – Basic


• Q: How would you handle a situation where you needed to manage a large number of hosts in Ansible?
What strategies might you use?
• A: With Ansible, managing a large number of hosts can be effectively done through a well-structured
inventory file, utilizing host groups, and using variables for configuration settings. This makes it easier to
manage configurations across all systems or across a group of systems.

• Q: Can you describe a scenario in which you would use Ansible handlers?
• A: Handlers are tasks that only run when notified by another task. They're useful in scenarios where you
want to minimize the number of actions performed. For instance, if you're updating a configuration file for
a service, you might use a handler to restart that service - but the handler will only run if the configuration
file actually changed.

• Q: If an Ansible playbook fails in the middle, what steps would you take to debug it?
• A: Ansible provides a --step and --start-at-task debugging facility that can be used to debug playbooks. You
can also increase the verbosity of the Ansible command using the -v options to understand what went
wrong.

Ansible Interview Questions - Scenarios


• Q: In what scenario would you choose to use a template in Ansible? Can you provide an example?
• A: Templates in Ansible are useful when you want to dynamically generate configuration files based on
variables. For example, if you are deploying a web server to different environments (like staging and
production), you might use a template for the server's configuration file, with variables for settings that
change between environments.

• Q: Suppose you have a playbook that deploys a multi-tier application. How could you structure this
playbook to ensure the database is set up before the application server?
• A: You can use different plays within the same playbook to orchestrate this kind of multi-tier setup. The
first play would target the database servers and contain tasks to set up the database. The second play
would target the application servers and contain tasks to deploy the application, and it would only start
once the first play has completed.

• Q: Explain a scenario where you might use Ansible Vault.


• A: Ansible Vault is used to keep sensitive data like passwords or keys in encrypted files, rather than as
plaintext in playbooks or roles. An example scenario is when you need to provide a sudo password or
deploy an SSL certificate.

Ansible Interview Questions


• Q: How would you test an Ansible playbook before running it on production servers?
• A: You could use tools like Vagrant or Docker to create a local environment that matches your production environment as closely as possible, and then run your playbooks against that. You might also use ansible-playbook --check to do a "dry run" of the playbook. Ansible Molecule is also a popular tool for testing Ansible roles.

• Q: Describe a scenario where using ansible-pull might be more appropriate than the usual ansible or
ansible-playbook commands.
• A: ansible-pull is useful in a scenario where the managed nodes are not always online or reachable. For
example, laptops that are part of a roaming workforce could use ansible-pull to apply configurations
whenever they're online, rather than having to be reachable by the control node at all times.

• Q: Suppose you've written a task that installs a package using the apt module, but it fails when you run
the playbook on a CentOS server. What's wrong, and how would you fix it?
• A: The apt module is specific to Debian-based systems, so it will fail on a CentOS server which is based on
Red Hat and uses yum or dnf. You could either use the package module which is generic, or add a
condition to your task to use apt or yum depending on the system.

Ansible Interview Questions


• Q: Can you describe a scenario where it might be helpful to define your own custom module in Ansible,
rather than using the pre-existing ones?
• A: Custom modules can be useful when there's no existing module that performs the function you need,
or when existing modules don't provide enough control or flexibility. For example, you might write a
custom module to interact with an internal or proprietary API in your organization.

• Q: How would you handle a situation where you needed to deploy an application that consists of multiple
microservices with Ansible?
• A: For deploying an application with multiple microservices, you can use Ansible roles. Each role
encapsulates a specific functionality or microservice, and then in your main playbook, you can call each
role. This makes the deployment more manageable and modular.

• Q: In what scenario would you choose to use a static inventory over a dynamic one in Ansible?
• A: If your infrastructure is relatively stable and doesn't change frequently, a static inventory can be simpler
and easier to manage. A dynamic inventory is more useful when you're working with cloud-based
infrastructure that can change frequently.

Ansible Interview Questions


• Q: How can Ansible be used to automate system updates and patches?
• A: You can write an Ansible playbook that uses the yum or apt module to update packages on
your servers. This playbook could be run manually when needed, or scheduled to run regularly
using cron or another job scheduler.

• Q: Can you describe a scenario where you would need to use loops in an Ansible playbook?
• A: Loops in Ansible can be used to perform the same action on multiple items. For example, you might use a loop to install a list of packages, create multiple users, or make multiple configuration changes (see the sketch at the end of this slide).

• Q: Suppose you're managing a web server with Ansible, and you want to ensure that the server
is restarted whenever its configuration file is changed. How would you accomplish this?
• A: You could use a handler for this. The task that updates the configuration file would have a
notify directive that triggers the handler, and the handler would contain the task to restart the
server.
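A minimal sketch of such a loop (the package list is illustrative):

- name: Install a list of packages
  hosts: webservers
  become: yes
  tasks:
    - name: Install required packages
      apt:
        name: "{{ item }}"
        state: present
      loop:
        - nginx
        - git
        - curl

With apt you could also pass the whole list to name in a single task; the loop form is shown to illustrate the general pattern.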

Ansible Interview Questions


• Q: How would you handle different system environments (development, staging, production)
with Ansible?
• A: Ansible provides a few ways to handle this. One way is to have a separate inventory file for
each environment. Another way is to use group variables to define environment-specific
settings.

• Q: Can you describe how to manage secrets in Ansible Playbooks?


• A: Secrets should never be hardcoded in Ansible playbooks. Ansible provides Ansible Vault to
encrypt sensitive data. This encrypted file can then be checked into source control and can be
decrypted on the fly when running a playbook using a vault password.

• Q: How would you configure a task to run only when a certain condition is met?
• A: Ansible uses conditional statements with the when keyword to control whether a task
should run. For example, a task might only run when a certain variable is set, or on certain
types of machines.

Ansible Interview Questions


• Q: How would you ensure idempotency in an Ansible Playbook?
• A: Idempotency can be achieved by using Ansible modules which are designed to be
idempotent, meaning they only make changes when necessary. When writing custom tasks, you
should use conditionals to ensure that actions are only taken when necessary.

• Q: How can you distribute the execution of a time-consuming task across multiple systems to
decrease the overall execution time?
• A: Ansible’s default behavior is to execute operations in parallel up to a certain level of
concurrency. You can control this level of concurrency with the forks configuration setting or
command-line option.

• Q: You're managing a set of servers that have slightly different requirements. How might you
handle this within a single Ansible role?
• A: You could use variables and conditionals to add flexibility to your role. For example, you
could have a variable for the required packages, and define this variable differently for each
group of servers.

Ansible Interview Questions


• Q: Suppose you want to execute a shell script on a remote server using Ansible. How
would you do that?
• A: You can use Ansible's script module to copy a script from your local machine to the
remote server and execute it. Alternatively, you could use the command or shell
module to run a script that already exists on the remote server.

• Q: What strategies can you use to optimize the performance of an Ansible playbook?
• A: There are several strategies to optimize Ansible performance such as:
• Limiting the hosts (using the --limit option or specifying a more specific host pattern).
• Using 'async' for tasks that can run in the background.
• Disabling facts gathering with gather_facts: no if it's not needed.
• Using pipelining = True in the Ansible configuration file.
• Reducing the forks number to limit parallelism if memory usage is a concern.
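For instance, a couple of the options above expressed at the play level (a sketch; the script path and values are illustrative):

- name: Performance-conscious play
  hosts: webservers
  strategy: free       # hosts proceed through tasks without waiting for each other
  gather_facts: no     # skip fact gathering when facts are not needed
  tasks:
    - name: Kick off a long-running job in the background
      command: /usr/local/bin/long_job.sh   # hypothetical script
      async: 600
      poll: 0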

Ansible Interview Questions


• Q: Suppose you need to apply a complex change to your infrastructure that can't be accomplished with a
single Ansible module. How might you handle this?
• A: This could be accomplished by creating a custom module or using the command or shell modules to
run arbitrary commands. In some cases, you might also use a series of existing modules to accomplish the
change in multiple steps.

• Q: How would you approach rolling updates with Ansible?


• A: Rolling updates can be accomplished with Ansible using the serial keyword. This allows you to specify
how many hosts should be managed at a time. For example, if you have a web service running on 10 hosts
and you can tolerate taking down 2 at a time, you could use serial: 2 in your playbook.

• Q: Suppose you are required to manage the network devices along with servers in your organization. Can
Ansible be used for network automation?
• A: Yes, Ansible can be used for network automation as well. It has specific network modules to manage
network devices from various vendors. We can use these modules to automate the configuration and
management of network devices.

Ansible Interview Questions


• Q: How do you handle failures in Ansible playbook?
• A: Failures in Ansible playbook can be handled using 'ignore_errors' directive. By
default, Ansible stops executing the remaining actions after a task fails. But with the
'ignore_errors' option set to 'yes', Ansible will continue executing the next tasks.

• Q: How can you deploy an application to an auto-scaling infrastructure with Ansible?


• A: Auto-scaling environments can be handled using dynamic inventories. This allows
Ansible to query your infrastructure every time it runs, so it always knows about all the
servers. You can also use Ansible to configure the base image or launch configurations
used by your auto-scaling group, ensuring all new servers are configured correctly.

• Q: Can you talk about a complex Ansible playbook that you have written and what
were the challenges?
• A: The answer to this question would be based on the individual's personal experience.

Ansible Interview Questions


• Q: How can you test an Ansible playbook to make sure it works before applying it to the production?
• A: Testing Ansible playbook can be done by creating a separate testing or staging environment similar to
the production environment. Moreover, Ansible has a 'check' mode (--check) which does a dry run and
shows what changes will be made without actually executing them.

• Q: Explain how you would use dynamic inventory in a cloud environment like AWS or Azure. What are the
benefits and what are the potential drawbacks?
• A: Dynamic inventory is exceptionally useful in cloud environments where the inventory can change often.
You can use a dynamic inventory script provided by Ansible or a third-party script to connect to the cloud
provider's API and get a list of instances based on their state, tags, etc. The main benefit is up-to-date
inventory; the drawback could be the additional time to get the inventory and the requirement for valid
API credentials during playbook execution.

• Q: How would you use Ansible to ensure that a certain security policy is applied across all servers in an
organization?
• A: You can define the security policy as a series of tasks or roles (such as installing certain packages,
ensuring certain services are running, or applying specific firewall rules), and then apply those roles to all
hosts in your inventory. You could also set up a scheduled job to regularly apply the playbook and ensure
continued compliance.
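A sketch of how such a policy playbook might look, assuming Ubuntu hosts and a hypothetical security_baseline role:

- name: Apply baseline security policy
  hosts: all
  become: yes
  roles:
    - security_baseline          # hypothetical role: hardened sshd_config, auditd, required packages
  tasks:
    - name: Ensure the ufw firewall service is running
      service:
        name: ufw
        state: started
        enabled: yes

Scheduling this playbook from cron, a CI job, or Ansible Tower/AWX keeps the hosts continuously compliant.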

Ansible Interview Questions


• Q: If you are required to handle a hybrid environment with Linux and Windows servers, how
would you structure your Ansible roles and playbooks?
• A: You can use conditionals based on the ansible_os_family variable to handle tasks differently
based on the OS type. Alternatively, you can separate out tasks into different roles based on the
OS, or even use separate playbooks for Windows and Linux hosts.

• Q: What strategy would you use to deploy a multi-tier application where the different tiers (like
the database, application server, and load balancer) have to be deployed in a certain order?
• A: This can be achieved by structuring your playbook into multiple plays, each targeting a
specific tier. Ansible runs plays in the order they're defined, so the order of deployment can be
controlled in that way.

• Q: How would you handle a scenario where an Ansible playbook needs to interact with APIs?
• A: Ansible has the uri module which can be used to interact with a REST API. You can use it to
make GET, POST, PUT, DELETE, or HEAD requests.

Ansible Interview Questions


• Q: How would you use Ansible in a CI/CD pipeline?
• A: Ansible can be used to automate deployment as part of a CI/CD pipeline. After the build
stage, you could trigger an Ansible playbook to deploy the built application. If you're using
Jenkins, for example, you could use the ansible-playbook command directly in your Jenkins
pipeline script, or use the Ansible plugin for Jenkins.

• Q: You're tasked with managing a legacy application with Ansible. This application has a specific
package that must be compiled from source with custom flags. How would you handle this with
Ansible?
• A: You can use the command or shell module to run the necessary commands to compile the package. Use the creates argument (or a similar guard) so the compile step is skipped once the built artifact exists, keeping the task idempotent (see the sketch at the end of this slide).

• Q: You are deploying a critical application using Ansible. How would you design a strategy for
zero-downtime deployments?
• A: Zero downtime deployment can be achieved using a rolling update strategy, where you
update a few hosts at a time, ensuring that the majority of your infrastructure remains
available. This can be done using the serial keyword in Ansible.
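A minimal sketch of the creates guard mentioned above (paths and flags are illustrative):

- name: Build a legacy package from source
  hosts: legacy
  become: yes
  tasks:
    - name: Compile with custom flags
      shell: ./configure --with-custom-flag && make && make install
      args:
        chdir: /usr/local/src/legacy-app      # hypothetical source directory
        creates: /usr/local/bin/legacy-app    # task is skipped once this file exists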

Ansible Interview Questions


• Q: If you need to execute a task that takes a long time to complete, how would
you manage that task in Ansible to prevent it from timing out?
• A: You can use the async and poll parameters with a task in Ansible to manage
long-running tasks. async launches the task asynchronously, and poll checks
back on the task status at a specified interval.

• Q: How would you handle a large inventory with thousands of hosts in Ansible?
• A: You can use inventory scripts to pull inventory from a data source like a
CMDB, cloud inventory, etc. In the Ansible configuration, you can also increase
the forks to execute more parallel tasks. Also, breaking the hosts into smaller,
logical groups in the inventory can help manage it better.

Ansible Interview Questions


• Q: Suppose you need to quickly check if all servers in your inventory are reachable, without
running a full playbook. How would you do this with an ad-hoc command?
• A: You can use the ping module in an ad-hoc command to check if all servers are reachable. The
command would be: ansible all -m ping.

• Q: You need to stop a service immediately on all web servers due to an ongoing security
incident. How would you use an ad-hoc command to do this?
• A: To stop a service immediately, you could use the service module in an ad-hoc command. For
example, if you needed to stop the apache2 service, you could run: ansible webservers -m
service -a "name=apache2 state=stopped" -b.

• Q: Can you give an example of using an ad-hoc command to gather information about the
servers in your inventory?
• A: You can use the setup module to gather information about the servers. For example, to
gather facts about all servers in your inventory, you could run: ansible all -m setup.

Ansible Interview Questions – Adhoc commands


• Q: How would you use an ad-hoc command to copy a file to a group of servers?
• A: To copy a file, you can use the copy module in an ad-hoc command. For example:
ansible webservers -m copy -a "src=/etc/hosts dest=/tmp/hosts" -b.

• Q: Imagine you need to quickly create a directory on all your servers. How would you
achieve this using an ad-hoc command in Ansible?
• A: You can use the file module with state set to directory to create a directory. The
command would be: ansible all -m file -a "path=/path/to/directory state=directory" -b.

• Q: How would you use an ad-hoc command to install a package on a group of servers?
• A: To install a package, you could use the yum or apt module depending on the Linux
distribution. For example, to install nginx on a group of Debian servers, you could run:
ansible webservers -m apt -a "name=nginx state=present" -b.

Ansible Interview Questions – Adhoc commands


• Q: How can you execute a shell command on all servers in a group using an ad-hoc
command?
• A: To execute a shell command, you can use the command or shell module in an ad-
hoc command. For example: ansible webservers -m shell -a "ls -l /var/www/html" -b.

• Q: How would you use an ad-hoc command to change file permissions on all servers in
a group?
• A: To change file permissions, you can use the file module in an ad-hoc command. For
example: ansible webservers -m file -a "path=/var/www/html/index.html mode=0644"
-b.

• Q: How would you use an ad-hoc command to add a user to a group of servers?
• A: To add a user, you can use the user module in an ad-hoc command. For example:
ansible webservers -m user -a "name=john state=present" -b.

Ansible Interview Questions – Adhoc commands


• Q: How would you use an ad-hoc command to start a service on all servers in a group?
• A: To start a service, you could use the service module in an ad-hoc command. For
example, to start the apache2 service, you could run: ansible webservers -m service -a
"name=apache2 state=started" -b.

• Q: How would you use an ad-hoc command to reboot all servers in your inventory?
• A: You can use the reboot module to reboot all servers. The command would be:
ansible all -m reboot -b.

• Q: You need to quickly check the disk usage on all servers. How can you achieve this
with an ad-hoc command?
• A: You can use the command module to execute the df command on all servers. The
command would be: ansible all -m command -a "df -h".

Ansible Interview Questions – Adhoc commands


• Q: If you want to fetch a log file from multiple servers for debugging an issue, how would you
use an ad-hoc command to do it?
• A: You can use the fetch module to fetch a file from remote servers. For example: ansible all -m
fetch -a "src=/var/log/syslog dest=/tmp/logs/" -b.

• Q: How can you change the owner of a file on all servers in a group using an ad-hoc command?
• A: To change file ownership, you can use the file module in an ad-hoc command. For example:
ansible webservers -m file -a "path=/var/www/html/index.html owner=www-data" -b.

• Q: Suppose you want to update all packages on your servers. How can you achieve this using an
ad-hoc command?
• A: To update all packages, you could use the yum or apt module depending on the Linux
distribution. For example, to update all packages on a group of Debian servers, you could run:
ansible webservers -m apt -a "upgrade=dist" -b.

Ansible Interview Questions – Adhoc commands


• Q: How would you remove a user from a group of servers using an ad-hoc command?
• A: To remove a user, you can use the user module in an ad-hoc command with state=absent.
For example: ansible webservers -m user -a "name=john state=absent" -b.

• Q: If you want to find all instances of a specific text in a directory on a remote server, how
would you use an ad-hoc command to do it?
• A: You can use the command module with grep to find a specific text. For example: ansible
webservers -m command -a "grep -r 'text_to_find' /path/to/directory".

• Q: Suppose you want to create a symlink on a group of servers. How can you achieve this using
an ad-hoc command?
• A: To create a symlink, you can use the file module in an ad-hoc command with state=link. For
example: ansible webservers -m file -a "src=/path/to/file dest=/path/to/symlink state=link" -b.

Ansible Interview Questions – Adhoc commands


• Q: How would you use an ad-hoc command to check the uptime of all servers in your
inventory?
• A: You can use the command module to execute the uptime command on all servers. The
command would be: ansible all -m command -a "uptime".

• Q: If you want to check the contents of a file on multiple servers, how would you use an ad-hoc
command to do it?
• A: You can use the command module with cat to check the contents of a file. For example:
ansible webservers -m command -a "cat /path/to/file".

• Q: How would you use an ad-hoc command to change the kernel parameters on your servers
without a reboot?
• A: You can use the sysctl module for this. For example, to change the value of
net.core.somaxconn, you would run: ansible all -m sysctl -a "name=net.core.somaxconn
value=1024 state=present reload=yes" -b.

Ansible Interview Questions – Adhoc commands


• Q: How would you handle a scenario where you need to add an entry to the hosts file on all
servers in your inventory using an ad-hoc command?
• A: You could use the lineinfile module. For example: ansible all -m lineinfile -a "path=/etc/hosts
line='192.168.1.10 myhost' state=present" -b.

• Q: Can you provide an example of using an ad-hoc command to create a cron job on all servers
in a group?
• A: You can use the cron module for this. For example, to create a cron job that runs a script
every day at 5 AM, you could run: ansible webservers -m cron -a "name='daily script' minute=0
hour=5 job='/path/to/script.sh' state=present" -b.

• Q: How would you use an ad-hoc command to enable a system service to start at boot on all
servers?
• A: To enable a service to start at boot, you could use the service module. For example: ansible
all -m service -a "name=nginx enabled=yes" -b.

Ansible Interview Questions – Adhoc commands


• Q: If you want to change the SELinux state on all servers in your inventory using an ad-hoc
command, how would you do it?
• A: You can use the selinux module to manage SELinux state. For example, to set SELinux to permissive mode, you could run: ansible all -m selinux -a "policy=targeted state=permissive" -b (the policy argument is required whenever the state is not disabled).

• Q: Suppose you need to add a repository on all your Ubuntu servers. How would you use an ad-
hoc command to do it?
• A: To add a repository, you can use the apt_repository module. For example: ansible
webservers -m apt_repository -a "repo='ppa:nginx/stable' state=present" -b.

• Q: How would you use an ad-hoc command to lock a user account on all servers in a group?
• A: To lock a user account, you can use the user module with password_lock=yes. For example:
ansible webservers -m user -a "name=john password_lock=yes" -b.

Ansible Interview Questions – Adhoc commands


• Q: If you need to quickly drain all connections from your load balancer for maintenance using
an ad-hoc command, how would you do it?
• A: This depends on the type of load balancer you're using, but assuming you're using an Nginx load balancer, you could use the shell module (the command module does not support shell operators such as &&) to move the upstream configuration file aside and reload the Nginx service: ansible loadbalancers -m shell -a "mv /etc/nginx/conf.d/upstream.conf /etc/nginx/conf.d/upstream.conf.bak && service nginx reload" -b.

• Q: How would you use an ad-hoc command to create a RAID array on your servers?
• A: Ansible core does not ship a dedicated RAID module, so the usual approach is to call mdadm through the command module. For example, to create a RAID1 array, you could run: ansible all -m command -a "mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb" -b. If a collection available in your environment provides an mdadm module, that would be the more declarative option.

• Q: If you want to unmount a filesystem on all servers in a group using an ad-hoc command, how
would you do it?
• A: You can use the mount module with state=unmounted. For example: ansible webservers -m
mount -a "path=/mnt/data state=unmounted" -b.

Ansible Interview Questions – Adhoc commands


• Q: How would you use an ad-hoc command to change the date and time on all servers in your
inventory?
• A: You can use the command module with date to change the date and time. For example:
ansible all -m command -a "date --set='2024-12-31 23:59'" -b.

• Q: Suppose you need to disable password authentication on all your servers. How would you
use an ad-hoc command to do it?
• A: You can use the lineinfile module to update the SSHD config file. For example: ansible all -m
lineinfile -a "path=/etc/ssh/sshd_config regexp='^PasswordAuthentication'
line='PasswordAuthentication no' state=present" -b and then restart the sshd service: ansible
all -m service -a "name=sshd state=restarted" -b.

• Q: How would you use an ad-hoc command to monitor real-time system performance on your
servers?
• A: You can use the command module with top or htop to monitor real-time system
performance. For example: ansible all -m command -a "top -b -n 1".

Ansible Interview Questions – Adhoc commands


• Q: If you want to find the process using the most memory on all servers
in a group using an ad-hoc command, how would you do it?
• A: You can use the shell module (the command module does not support pipes) with ps to find the process using the most memory. For example: ansible webservers -m shell -a "ps aux --sort=-%mem | head -n 2".

• Q: How would you use an ad-hoc command to verify that a specific TCP
port is open on all servers in your inventory?
• A: You can use the wait_for module to check if a TCP port is open. For
example, to check if port 80 is open, you could run: ansible all -m
wait_for -a "host=localhost port=80 timeout=1". This will attempt to
connect to port 80 on each server and report an error if it fails.

Ansible Interview Questions – Adhoc commands


• Q: You are deploying an application that requires a specific version of Python. How would you
ensure that Python version is installed using an Ansible playbook?
• A: You can use the apt module (or yum depending on your Linux distribution) to install the
specific Python version.
---
- name: Install specific version of Python
  hosts: webservers
  become: yes
  tasks:
    - name: Install Python 3.8
      apt:
        name: python3.8
        state: present

Ansible Interview Questions – PlayBooks


• Q: How would you use a playbook to change the permissions of a directory and its contents on your
servers?
• A: You can use the file module in a playbook to change permissions.
---
- name: Change directory permissions
  hosts: webservers
  become: yes
  tasks:
    - name: Change permissions of /var/www and its contents to 755
      file:
        path: /var/www
        state: directory
        mode: '0755'
        recurse: yes

Ansible Interview Questions – PlayBooks


• Q: Suppose you need to deploy a web server on your hosts and ensure it's running. How would you create a playbook for this task?

• A: You can use the apt module to install the web server, and the service module to ensure it's running. Here is an example playbook for deploying
an Nginx server:

---
- name: Deploy Nginx web server
  hosts: webservers
  become: yes
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
    - name: Ensure Nginx is running
      service:
        name: nginx
        state: started
        enabled: yes

Ansible Interview Questions – PlayBooks


• Q: How would you configure a playbook to only execute a task when a certain
condition is met?
• A: You can use the when keyword in a task to conditionally execute it.
---
- name: Conditional execution playbook
  hosts: all
  tasks:
    - name: Install Nginx on Debian systems
      apt:
        name: nginx
        state: present
      when: ansible_os_family == "Debian"

Ansible Interview Questions – PlayBooks


• Q: How can you manage errors in a playbook? For example, how can you ignore certain
errors and not others?
• A: You can manage errors in a playbook using the ignore_errors keyword or using
failed_when to define your own failure conditions. Here's an example of how to ignore
errors for a specific task:
---
- name: Playbook with error management
  hosts: all
  tasks:
    - name: This task might fail, but we'll ignore its errors
      command: /bin/false
      ignore_errors: true

Ansible Interview Questions – PlayBooks


• Q: Suppose you need to deploy a package on all your servers, but you want to do it one server at a time to
minimize impact on your service. How would you configure the playbook?
• A: You can use the serial keyword to limit the number of hosts that are managed at the same time.
---
- name: Serial execution playbook
  hosts: all
  serial: 1
  tasks:
    - name: Install a package
      apt:
        name: some-package
        state: present

Ansible Interview Questions – PlayBooks


• Q: How would you use variables in a playbook to make it more flexible?
• A: You can define variables at the beginning of your playbook using the vars keyword, and then use them
later in your tasks.
---
- name: Playbook with variables
  hosts: all
  vars:
    webserver_package: nginx
  tasks:
    - name: Install web server
      apt:
        name: "{{ webserver_package }}"
        state: present

Ansible Interview Questions – PlayBooks


• Q: How would you use a playbook to add a user to your servers and set up their SSH keys?

• A: You can use the user and authorized_key modules to accomplish this.

---
- name: Add user and setup SSH keys
  hosts: all
  become: yes
  tasks:
    - name: Add the user
      user:
        name: myuser
        state: present
    - name: Setup SSH keys
      authorized_key:
        user: myuser
        key: "{{ lookup('file', '/path/to/public_key') }}"

Ansible Interview Questions – PlayBooks


• Q: How would you configure a playbook to include tasks from another
playbook?
• A: You can use the import_tasks or include_tasks keywords to include
tasks from another playbook.
---
- name: Playbook with included tasks
  hosts: all
  tasks:
    - import_tasks: other_tasks.yml

Ansible Interview Questions – PlayBooks


• Q: How would you use a playbook to manage your servers' firewall rules?
• A: You can use the ufw module to manage firewall rules on your servers.
---
- name: Firewall management playbook
  hosts: all
  become: yes
  tasks:
    - name: Allow SSH connections
      ufw:
        rule: allow
        port: 22
        proto: tcp

Ansible Interview Questions – PlayBooks


• Q: How would you create a playbook to deploy a Docker container on your hosts?
• A: You can use the docker_container module in Ansible to manage Docker containers.
---
- name: Deploy Docker container
  hosts: webservers
  become: yes
  tasks:
    - name: Run a Docker container
      docker_container:
        name: mycontainer
        image: myimage
        state: started

Ansible Interview Questions – PlayBooks


• Q: How would you create a playbook to update all packages on your hosts without any interruptions (e.g.
without requiring a reboot)?
• A: You can use the apt module (or yum module for Red Hat-based systems) with the upgrade option set to
'dist’.
---
- name: Update all packages
  hosts: all
  become: yes
  tasks:
    - name: Upgrade all packages
      apt:
        upgrade: dist
        update_cache: yes

Ansible Interview Questions – PlayBooks


• Q: You need to deploy a web application that requires environment variables to be set. How would you use a playbook to set
these environment variables for the application?

• A: You can use the environment keyword in your playbook to set environment variables.

---
- name: Deploy web application
  hosts: webservers
  become: yes
  tasks:
    - name: Run web application with environment variables
      command: /path/to/web_application
      environment:
        DB_HOST: db.example.com
        DB_USER: dbuser
        DB_PASS: dbpass

Ansible Interview Questions – PlayBooks


• Q: How would you handle sensitive data, like passwords, in your Ansible playbooks?

• A: You can use Ansible Vault to encrypt sensitive data. You can then reference the encrypted data in your playbooks.

---
- name: Playbook with sensitive data
  hosts: all
  become: yes
  vars:
    db_password: "{{ vault_db_password }}"
  tasks:
    - name: Create database user
      mysql_user:
        name: dbuser
        password: "{{ db_password }}"
        priv: '*.*:ALL'
        state: present

In this playbook, vault_db_password is an encrypted variable stored in a separate file and decrypted at runtime.

Ansible Interview Questions – PlayBooks


Q: You have a web application that needs to retrieve data from a remote API. How could you use Ansible
to securely store the API key needed by the application?
• A: You could use Ansible Vault to encrypt the API key and store it securely.
---
- hosts: webservers
  vars:
    api_key: "{{ vault_api_key }}"
  tasks:
    - name: Create environment file
      template:
        src: /path/to/template.j2
        dest: /path/to/environment_file
In this playbook, vault_api_key is an encrypted variable stored in a separate file and decrypted at runtime.
The template file could look like this:
API_KEY={{ api_key }}

Ansible Interview Questions – PlayBooks


• Q: How would you test an Ansible playbook before running it on
production servers?
• A: You could use the --check and --diff flags to do a dry run of the
playbook and see what changes it would make:
ansible-playbook playbook.yml --check --diff
Additionally, you could use a testing tool like Molecule to create a
local test environment and run your playbook against that.

Ansible Interview Questions


• Q: You need to execute a command that takes a long time to complete. How would you use
Ansible to execute this command without waiting for it to finish?
• A: You can use the async and poll options to start a task asynchronously.
---
- hosts: all
  tasks:
    - name: Execute long-running command
      command: /path/to/long_running_command
      async: 3600
      poll: 0
In this playbook, the long_running_command will start and Ansible will move on to the next task
without waiting for it to complete.

Ansible Interview Questions – PlayBooks


• Q: How would you set up a playbook to ensure that all your servers
have the same sshd_config file, and that the SSH service is restarted
if the file changes?
• A: You can use the copy module to copy the sshd_config file and the
notify keyword to restart the service when the file changes.

Ansible Interview Questions – PlayBooks


---
- hosts: all
  become: yes
  tasks:
    - name: Ensure sshd_config is correct
      copy:
        src: /path/to/sshd_config
        dest: /etc/ssh/sshd_config
        owner: root
        group: root
        mode: 0644
      notify: Restart ssh
  handlers:
    - name: Restart ssh
      service:
        name: ssh
        state: restarted

In this playbook, if the sshd_config file changes, the "Restart ssh" handler will be triggered and the SSH service will be restarted.
• Q: Your playbook execution fails midway with a 'Permission Denied' error while trying to copy a
file to a remote machine. How would you investigate and solve this problem?
– A: This issue may occur due to inadequate permissions. To solve this, you can:
– Ensure that the playbook is running with necessary privileges. If you're copying a file to a directory that
requires elevated permissions, you might need to use become: yes to elevate privileges for that task.
– Check if the SSH user has the required permissions on the remote machine.
– If using SSH keys for authentication, ensure the key has been properly added to the remote machine.
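As a minimal sketch (file names and paths here are placeholders), the relevant task with elevated privileges might look like:
- name: Copy file to a privileged location
  become: yes
  copy:
    src: files/app.conf
    dest: /etc/app/app.conf
    owner: root
    group: root
    mode: 0644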

• Q: A task in your playbook is idempotent but still results in a change every time you run the
playbook. How would you identify the issue and what could be the possible reasons?
– A: The task could be working with dynamic data that changes with each run, or there might be an issue with
the task itself. Here are a few ways to debug:
– Add the -v flag (verbose mode) when running the playbook to get more information about what the task is
doing.
– If the task is using a module, check the documentation for that module to see if there are any known issues
with idempotency.
– Use the debug module to print out the values of variables and other information that might help identify
the issue.

Ansible Interview Questions – PlayBooks


• Q: While running an Ansible playbook, you encounter an error that states "Failed to connect to
the host via ssh." How would you troubleshoot this?
– A: The error indicates that Ansible is unable to establish an SSH connection to the host. Here's how you can
troubleshoot:
– Confirm that you can manually SSH into the server from the control node.
– Check that your inventory file is correct (i.e., it has the correct usernames, hostnames/IP addresses).
– Ensure that the correct SSH keys are being used, and that the necessary keys are added to the ssh-agent if
you're using one.
– If the target is a remote network, ensure there are no firewall rules or network issues preventing the
connection.

• Q: A playbook is running slower than expected. What steps can you take to identify the cause
and remedy the situation?
– A: Several things could cause a playbook to run slowly. Here's how you might troubleshoot:
– Use the ANSIBLE_DEBUG environment variable to enable debug mode and identify any tasks that are taking
a long time to execute.
– Use the profile_tasks callback plugin to get a detailed breakdown of how long each task takes.
– Check network latency between the control node and the hosts. High latency can slow down playbook
execution, especially if many tasks are being performed.
– If many tasks are being performed on a large number of hosts, consider using strategies like free or mitogen
for task execution to speed up the playbook.
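For the profile_tasks suggestion above, a minimal ansible.cfg sketch (callbacks_enabled applies to recent Ansible releases; older releases use callback_whitelist):

[defaults]
callbacks_enabled = ansible.builtin.profile_tasks

With this enabled, every playbook run ends with a per-task timing summary, which makes the slow tasks easy to spot.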

Ansible Interview Questions – PlayBooks Troubleshooting


• Q: You're trying to use a module in your playbook, but Ansible can't seem to find it.
What steps would you take to troubleshoot this issue?
– A: This could be due to a few reasons:
– The module might not be available in the version of Ansible you're running. Check the module
documentation to confirm which versions of Ansible support it.
– If it's a custom module, make sure it's located in a directory specified in the ANSIBLE_LIBRARY
environment variable, or in a library/ directory adjacent to your playbook.
– Make sure the module name in the playbook doesn't contain any typos or errors.
– If the module requires Python libraries that are not installed, it may not be found. Check the
module's documentation for any dependencies.
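As a sketch, assuming your custom modules live in a library/ directory next to the playbook, either of these makes them discoverable:

# Environment variable
export ANSIBLE_LIBRARY=./library

# Or in ansible.cfg
[defaults]
library = ./library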

• Q: Your playbook fails with an error saying that a Jinja2 variable is undefined. How
would you resolve this issue?
– A: An "undefined variable" error typically means that Ansible can't find a variable that you're
referencing in your playbook. Here's how you might troubleshoot:
– Check for typos in the variable name in your playbook and in any variable files.
– Make sure the variable is defined for the correct hosts or groups. For example, if the variable is
defined in a host_vars file for a particular host, it won't be available for other hosts.
– If the variable is defined in a role, make sure the role is included in your playbook.
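Where a variable may legitimately be absent, you can also guard the reference with the default filter (http_port here is just an illustrative variable name):
- debug:
    msg: "Port is {{ http_port | default(8080) }}"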

Ansible Interview Questions – PlayBooks Troubleshooting


• Q: A particular task in your playbook is failing, but you're not sure why.
How can you get more information about what's happening during the
task execution?
• A: If a task is failing and the output isn't clear, you can use the -vvv (or -
vvvv for even more verbosity) option when running the playbook to get
more detailed output:
ansible-playbook playbook.yml -vvv
This will show the exact commands that Ansible is running, along with the output from those
commands. If a module fails, the module_stdout and module_stderr keys in the task result can
also help pinpoint the problem.

Ansible Interview Questions – PlayBooks Troubleshooting


• Q: You're unable to retrieve the output of a command that's being run by the command module in your
playbook. How can you capture and view this output?
• A: You can use the register keyword to save the output of a command to a variable, and then use the
debug module to print the output.
---
- hosts: all
  tasks:
    - name: Run command
      command: /path/to/command
      register: command_output

    - name: Print command output
      debug:
        var: command_output.stdout

Ansible Interview Questions – PlayBooks Troubleshooting


• Q: Despite no changes in your playbook or environment, a playbook that
used to work is now failing. What could be the reason and how would
you troubleshoot this?
– A: Here's how you might troubleshoot:
– Check if there have been any changes on the hosts that the playbook is running
against. This could include software updates, configuration changes, or network
changes.
– If your playbook depends on external resources (like APIs, remote files, etc.), check
if there have been any changes or outages with these resources.
– If you're using roles or modules from Ansible Galaxy, check if there have been any
updates or changes that might affect your playbook.
– Try running the playbook with the -vvv option to get more detailed output which
might help identify the issue.

Ansible Interview Questions – PlayBooks Troubleshooting


• Q: Your playbook uses a role from Ansible Galaxy, but when you try to run
it, Ansible can't seem to find the role. What might be the problem and
how would you fix it?
– A: If Ansible can't find a role, it might be due to a few reasons:
– The role hasn't been downloaded. You can use the ansible-galaxy command to
download the role:
ansible-galaxy install username.rolename
– The role is not in a directory that Ansible is checking. By default, Ansible looks in roles/ and
/etc/ansible/roles. You can add other directories by updating the roles_path option in the
Ansible configuration file, or by setting the ANSIBLE_ROLES_PATH environment variable.
– There's a typo or mistake in the role name in the playbook. Make sure the role name matches
exactly the name of the role on Ansible Galaxy.
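A minimal ansible.cfg sketch for adding extra role directories (the paths are placeholders):

[defaults]
roles_path = ./roles:/etc/ansible/roles:~/shared_roles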

Ansible Interview Questions – PlayBooks Troubleshooting


• Can you explain what an Ansible variable is and provide an example of how to define and use a variable in a playbook?

• Answer: Ansible variables are used to deal with differences between systems. With Ansible, you can start tasks on remote hosts
and refer to a variable that contains relevant system-specific information. Here is an example of how to define and use a
variable in a playbook:

---
- hosts: all
  vars:
    my_var: "Hello, World!"
  tasks:
    - name: Print a variable
      debug:
        msg: "{{ my_var }}"

In this playbook, my_var is defined under the vars keyword and then it's used within a task using the {{ my_var }} syntax.

Ansible Interview Questions – Variables


• What is the difference between defining variables in a playbook and
in an inventory file?
– Answer: Variables defined in an inventory file are generally meant for setting
variables that relate to specific hosts or groups. For instance, you could set
the http_port variable to a different value for each host or group in the
inventory file.
– Variables defined in a playbook are typically used for values that are going to
be the same regardless of the host that the playbook is run against. These
could be things like the name of a package to install.
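A small illustration of the two styles (host names and values are placeholders), with a host-specific http_port in the inventory and a playbook-level package name:

# inventory
[webservers]
web1 http_port=8080
web2 http_port=9090

# playbook
- hosts: webservers
  vars:
    package_name: nginx
  tasks:
    - debug:
        msg: "Install {{ package_name }}, listen on port {{ http_port }}"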

Ansible Interview Questions – Variables


• Imagine you are given a playbook which uses a large number of
variables, but it seems like some variables are not being set
correctly leading to errors. What steps would you take to debug this
issue?
– Answer: You might start by using the debug module in the playbook to print
out the value of variables at various points to see what they are set to. You
could also check all places where a variable could be set to see if there are
any typos or oversights. This would include the playbook itself, the inventory
file, any group_vars/ or host_vars/ directories, and any role defaults or vars
files. Finally, you can also run the playbook with the -vvv option to get more
detailed output, which might help identify any issues with variables.

Ansible Interview Questions – Variables


• Can you explain how you might use a variable in a task loop and provide an example?

• Answer: A variable can be used in a loop in Ansible to perform a task multiple times, altering the task slightly each time. The
loop keyword is used to define the loop, and variables are used to provide the values to loop over.

---
- hosts: all
  vars:
    packages:
      - nginx
      - postgresql
  tasks:
    - name: Install multiple packages
      apt:
        name: "{{ item }}"
        state: present
      loop: "{{ packages }}"

In this playbook, the packages variable is a list of packages to install. The loop iterates over each item in the list, installing the
package on each iteration.
Q: Describe a scenario where you would use host-specific variables in Ansible. How would you define and use these variables?

A: You might use host-specific variables when different hosts need different values for the same variable. For instance, if you have
a web server and a database server, they might have different storage needs. To define host-specific variables, you can create a file
named after the host in the host_vars directory. For example, host_vars/dbserver could contain storage: 200GB, and
host_vars/webserver could contain storage: 50GB. In your playbook, you can then use the storage variable, and Ansible will
automatically use the correct value for each host.
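A minimal sketch of that layout (file names and values follow the example above):

# host_vars/dbserver.yml
storage: 200GB

# host_vars/webserver.yml
storage: 50GB

# playbook
- hosts: all
  tasks:
    - debug:
        msg: "This host should get {{ storage }} of storage"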

Q: How can you encrypt sensitive information like passwords when using Ansible variables?

A: Ansible Vault is a tool for securely storing sensitive data. You can create an encrypted file with ansible-vault create filename, and
then refer to that file in your playbooks or inventory file with vars_files or group_vars, respectively. When running the playbook,
you need to either provide the vault password with --ask-vault-pass or put the password in a file and reference it with --vault-
password-file.
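The typical commands look like this (secrets.yml and the password file path are placeholders):

ansible-vault create secrets.yml
ansible-vault edit secrets.yml
ansible-playbook site.yml --ask-vault-pass
ansible-playbook site.yml --vault-password-file ~/.vault_pass.txt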

Q: What is the precedence of variables in Ansible when the same variable is defined in multiple places?

A: Ansible has a specific order of precedence for variables defined in different places. A simplified order, from lowest to highest precedence,
is:
– role defaults
– inventory variables (inventory file vars, inventory group_vars and host_vars)
– playbook group_vars and host_vars
– host facts
– play vars, vars_prompt and vars_files
– role vars, block vars and task vars
– set_fact and registered variables
– role and include parameters
– extra vars passed on the command line with -e (these always win)

Ansible Interview Questions – Variables


Q: How can you use variables to define the hosts or groups a playbook runs against?
A: You can use variables in the hosts field of a playbook to dynamically define which
hosts or groups to target. For example, if you have a variable target_hosts that is set to
"webservers" or "dbservers" depending on some external condition, you could use that
variable in your playbook like this:
---
- hosts: "{{ target_hosts }}"
tasks:
...
You would then need to pass in the target_hosts variable when running the playbook,
either through the command line with -e or through an external variables file.
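For example:
ansible-playbook playbook.yml -e "target_hosts=webservers"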

Ansible Interview Questions – Variables


Q: How can you use a variable in the name field of a task in Ansible?
A: You can include a variable in the name field of a task by using the Jinja2 templating syntax.
---
- hosts: all
vars:
task_name: "my custom task"
tasks:
- name: "Running {{ task_name }}"
command: echo "Hello, world!"

In this example, the variable task_name is included in the task name, so the output when running
the playbook will be "Running my custom task".

Ansible Interview Questions – Variables


Let's assume you have a large number of servers where the Ansible control machine is
running, and you need to gather facts from all target hosts. However, the issue is that the
task is taking too long to complete, and this delays the completion of the entire
playbook.
What would be the possible reasons for the delay?
How can you optimize the fact gathering process in Ansible?
– Answer: The primary reason for the delay could be the fact gathering process itself. By default,
Ansible gathers all facts about the remote systems before executing any tasks. If you have a large
number of hosts, this could take a significant amount of time.
– You can optimize the fact gathering process in Ansible in the following ways:
– Disabling fact gathering: If you don't need to use facts in your playbook, you can disable fact
gathering by setting gather_facts: no at the playbook level.
– Gathering only required facts: If you only need certain facts, you can gather subsets of facts to
speed up the process. This can be done by specifying the subset in the gather_subset option in
ansible.cfg or at the playbook level.

Ansible Interview Questions – Variables - facts


- hosts: all
  gather_facts: yes
  gather_subset:
    - network
  tasks:
    - name: Gather network facts only
      debug:
        var: ansible_facts
In this playbook, only network-related facts are gathered which can speed up the
execution of the playbook.

Ansible Interview Questions


• You are working on an application that requires system-level
configuration changes to be made during the installation process. To
automate this, you're using Ansible. However, different
environments (dev, staging, prod) require slightly different
configurations. How would you manage these differences with
Ansible facts?
– Answer: You can use Ansible facts combined with conditional tasks to solve
this problem. Ansible facts can provide you the information about the
environment (hostname, IP address, OS family, etc). You can set up your
inventory file in a way that groups hosts based on their environment. Then,
using the group_names magic variable, which is a list of all the groups the
current host is a member of, you can apply different configurations.

Ansible Interview Questions – Variables - facts


• Inventory file:
[dev]
dev_host_1
dev_host_2

[staging]
staging_host_1
staging_host_2

[prod]
prod_host_1
prod_host_2

Ansible Interview Questions – Variables - facts


• Playbook:

- hosts: all
  tasks:
    - name: Apply dev configuration
      command: ./apply_dev_configuration.sh
      when: "'dev' in group_names"

    - name: Apply staging configuration
      command: ./apply_staging_configuration.sh
      when: "'staging' in group_names"

    - name: Apply prod configuration
      command: ./apply_prod_configuration.sh
      when: "'prod' in group_names"

In this playbook, different tasks will be executed based on which group the current host is in, allowing for different configurations
to be applied to dev, staging, and prod environments.

Ansible Interview Questions – Variables - facts


Suppose you're in a situation where Ansible is not able to connect to
a host and you're unsure whether the problem is in the SSH
connection or Ansible itself. How could you leverage Ansible facts to
debug this situation?
• Answer: To test if Ansible can connect to a host, we can use a
simple Ansible command that uses the setup module to gather
facts. For instance:
ansible -m setup hostname
If this command returns facts about the host, then Ansible can
connect. If not, it may return an error message that can help you
understand why the connection failed, such as an SSH connection
error.

Ansible Interview Questions – Variables - facts


• If you've a playbook which is taking an unexpectedly long time to execute. Suspecting
that the fact gathering could be the issue, what would be your next steps to confirm
this and optimize it?
– Answer: The first step would be to time the fact gathering process by setting gather_facts: yes
and then running the playbook. Next, disable fact gathering by setting gather_facts: no and
rerun the playbook. If there is a significant difference in execution time, then the problem likely
lies with fact gathering. Optimization can be done by gathering only necessary facts, disabling
fact gathering, or caching facts to avoid gathering facts for each playbook run.
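If caching is the route you take, a minimal ansible.cfg sketch using the jsonfile cache plugin (path and timeout are placeholders):

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_fact_cache
fact_caching_timeout = 86400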

• How would you manage situations where you have to execute tasks on remote
machines that have different operating systems? How can Ansible facts assist you in
managing such scenarios?
– Answer: Ansible facts include the operating system information of the remote machine in
ansible_os_family. You can use this information to write conditional tasks which execute based
on the operating system. For instance, the package installation task for Debian and Red Hat
systems would look different, and using the ansible_os_family fact, you can manage which task
should be executed.
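A minimal sketch of such OS-specific tasks:

- hosts: all
  tasks:
    - name: Install Apache on Debian-based systems
      apt:
        name: apache2
        state: present
      when: ansible_os_family == "Debian"

    - name: Install Apache on Red Hat-based systems
      yum:
        name: httpd
        state: present
      when: ansible_os_family == "RedHat"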

Ansible Interview Questions – Variables - facts


• If you've a fleet of servers and you want to check the disk space
usage across all servers using Ansible facts, how would you go about
it?
– Answer: Ansible facts include facts about the filesystem in the
ansible_mounts variable. You can write a playbook that uses this variable to
check and report the disk space usage on each server.
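A rough sketch of such a report using the ansible_mounts fact:

- hosts: all
  tasks:
    - name: Report disk usage per mount
      debug:
        msg: "{{ item.mount }}: {{ item.size_available }} of {{ item.size_total }} bytes free"
      loop: "{{ ansible_mounts }}"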

• How could you leverage Ansible facts for managing versions of software installed on servers?
– Answer: You can use the ansible_pkg_mgr fact to determine the package
manager and then use the appropriate Ansible module (like yum, apt, etc.)
to check or ensure that the desired version of the software is installed.

Ansible Interview Questions – Variables - facts


• How would you configure the Ansible setup module to gather only a specific
subset of system facts?
– Answer:You can use the gather_subset configuration in your playbook to specify what
subsets of system facts you want to gather.

• If you need to regularly access some specific facts about a host and you don't
want to gather these facts every time you run a playbook, what can you do?
– Answer: You can use fact caching in Ansible to store the facts and reuse them across
multiple playbook runs. The cache can be stored in various formats such as JSON, YAML,
Redis etc.

• How can you create a report showing all available facts for each machine in
your inventory?
– Answer: You can create a playbook that uses the setup module without any parameters,
then uses the debug module to print all facts, and finally use the log_plays callback plugin
to log all the output to a file.
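For the last point, a minimal sketch of the playbook part (logging to a file is then handled by the callback configuration):

- hosts: all
  tasks:
    - name: Gather all facts
      setup:

    - name: Show all facts for this host
      debug:
        var: ansible_facts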

Ansible Interview Questions – Variables - facts


• Could you explain the use of the register keyword in Ansible and
provide an example of how it could be used in a playbook?
– Answer: In Ansible, the register keyword is used to capture the output of a
task. The output can then be used in subsequent tasks.

Ansible Interview Questions – Variables – Register/Magic Variables

---
- hosts: all
  tasks:
    - name: Check if a file exists
      stat:
        path: /path/to/file
      register: file_status

    - name: Print a message if the file exists
      debug:
        msg: "File exists"
      when: file_status.stat.exists

In this playbook, the stat module checks if a file exists and the result is stored in the file_status variable using the register keyword.
The next task then uses the file_status variable to print a message only if the file exists.

Ansible Interview Questions – Variables – Register/Magic Variables

• How would you use a registered variable from one task as a
conditional for another task? Could you provide an example?
– Answer: A registered variable from one task can be used as a condition for
the execution of another task using the when keyword.

Ansible Interview Questions – Variables – Register/Magic Variables

---
- hosts: all
  tasks:
    - name: Check for a file
      stat:
        path: /path/to/file
      register: result

    - name: Do something if the file exists
      command: echo "The file exists"
      when: result.stat.exists
In this playbook, the stat module is used to check if a file exists, and the result is registered to the result
variable. The next task is only executed if the result.stat.exists condition is true, which indicates that the file
exists.

Ansible Interview Questions – Variables – Register/Magic Variables

• Explain what a Magic Variable in Ansible is and provide an example of its usage.
– Answer: Magic Variables in Ansible are special variables that contain information about the
current state of execution, like the current host's name or the current host's group names.
---
- hosts: all
  tasks:
    - name: Print current host's name
      debug:
        msg: "This playbook is running on {{ ansible_hostname }}"
In this playbook, ansible_hostname is a magic variable that holds the current
host's name.

Ansible Interview Questions – Variables – Register/Magic Variables

• You've been asked to write a playbook that must behave differently
based on the host's environment (prod, dev, staging). How would
you use Ansible's magic variables to determine the host's
environment and run conditionally based on it?
– Answer: You can use the group_names magic variable, which is a list of all
the groups the current host is a part of. This can be used to conditionally run
tasks based on the environment.

Ansible Interview Questions – Variables – Register/Magic Variables

---
- hosts: all
  tasks:
    - name: Execute only on prod servers
      command: /usr/bin/prod_script
      when: "'prod' in group_names"

    - name: Execute only on dev servers
      command: /usr/bin/dev_script
      when: "'dev' in group_names"
In this playbook, the first task will only run on hosts that are part of the 'prod' group, and the
second task will only run on hosts that are part of the 'dev' group.

Ansible Interview Questions – Variables – Register/Magic Variables

• How would you use a registered variable to handle error or failure
during the execution of a playbook?
– Answer: You can register the output of a task and use it to check whether a
task has failed or not. You can then use the ignore_errors option to allow the
playbook to continue running even if a task fails, and use a conditional task
to handle the error.

Ansible Interview Questions – Variables – Register/Magic Variables

---
- hosts: all
  tasks:
    - name: Attempt to do something
      command: /usr/bin/some_command
      register: result
      ignore_errors: true

    - name: Handle failure
      debug:
        msg: "The previous task failed"
      when: result is failed

In this playbook, if the command /usr/bin/some_command fails, the output is captured in the result variable. The ignore_errors
option prevents the playbook from stopping. The second task then checks whether result indicates a failure, and if so, it prints a
message.

Ansible Interview Questions – Variables – Register/Magic Variables

• Q: Suppose you have a web application running on a server, and
you've written an Ansible playbook to update the application. How
would you use a handler to restart the service only when the
application has been updated?
– A: In the Ansible task that updates the application, you'd use the "notify"
directive to call a handler. The handler would contain the command to
restart the service. If the task sees changes (i.e., the application is updated),
it notifies the handler, and the service gets restarted. Otherwise, if no
change is detected, the handler does not run, and the service doesn't
restart.

Ansible Interview Questions – Handlers


tasks:
- name: Update nginx configuration
copy:
src: /path/to/local/nginx.conf
dest: /etc/nginx/nginx.conf
notify: Restart nginx

handlers:
- name: Restart nginx
service:
name: nginx
state: restarted

Ansible Interview Questions – Handlers


• Q: Can you provide an example where you would use the 'listen'
directive in a handler?
– A: One case for using the 'listen' directive is when multiple tasks may cause
the same change, necessitating the same handler action. For instance,
suppose there are two tasks in a playbook that can both modify a web
server's configuration file. Both tasks could "notify" a common term that the
handler "listens" for, leading the handler to restart the service.

Ansible Interview Questions – Handlers


tasks:
  - name: task one
    command: echo "This is task one"
    notify: "Restart Services"

  - name: task two
    command: echo "This is task two"
    notify: "Restart Services"

handlers:
  - name: Restart nginx
    service:
      name: nginx
      state: restarted
    listen: "Restart Services"

  - name: Restart apache
    service:
      name: apache2
      state: restarted
    listen: "Restart Services"


• Q: How can you ensure that your handler runs immediately after its
corresponding task has run?
– A: You can use the 'flush_handlers' task immediately after the task that
notifies the handler. This will force all handlers that have been notified up to
this point to run immediately, instead of waiting till the end of the playbook
execution.

Ansible Interview Questions – Handlers


tasks:
  - name: Update nginx configuration
    copy:
      src: /path/to/local/nginx.conf
      dest: /etc/nginx/nginx.conf
    notify: Restart nginx

  - name: Run handlers if notified
    meta: flush_handlers

handlers:
  - name: Restart nginx
    service:
      name: nginx
      state: restarted
• Q: Imagine a scenario where you have several related tasks and
each one should notify a specific handler. What is the best way to
organize these handlers in your playbook?
– A: Handlers can be kept together at the end of the playbook for better
organization, or they can be placed in a separate file and included when
needed. The latter approach can be beneficial when these handlers are used
across multiple playbooks.

Ansible Interview Questions – Handlers


You can place handlers in a separate file and include them from the play's handlers section when needed:

tasks:
  - name: Update nginx configuration
    copy:
      src: /path/to/local/nginx.conf
      dest: /etc/nginx/nginx.conf
    notify: Restart nginx

  - name: Update apache configuration
    copy:
      src: /path/to/local/apache2.conf
      dest: /etc/apache2/apache2.conf
    notify: Restart apache

handlers:
  - import_tasks: handlers.yml

Ansible Interview Questions – Handlers


• And in the handlers.yml file (a plain list of handler tasks, without a handlers: key):
- name: Restart nginx
  service:
    name: nginx
    state: restarted

- name: Restart apache
  service:
    name: apache2
    state: restarted
• Q: Can you provide an example of a scenario where you would use the
'failed_when' condition in a handler?
– A: A scenario might involve a handler that tries to restart a service. You could use the
'failed_when' condition to catch errors during restart. For example, if the service doesn't
restart within a certain time frame, you can use 'failed_when' to mark the task as failed.
• Example:
handlers:
- name: Restart nginx
command: systemctl restart nginx
register: result
failed_when: "'Failed' in result.stdout"

Ansible Interview Questions – Handlers


• Q: What approach would you take to debug handlers in Ansible,
especially considering that they run at the end of playbook
execution?
– A: Debugging handlers can be done through verbose output ('-vvv') and
strategically placing 'debug' tasks in the playbook. Alternatively, you can
manually force a handler to run using the 'meta: flush_handlers' task at
specific points in your playbook.

Ansible Interview Questions – Handlers


tasks:
  - name: Update nginx configuration
    copy:
      src: /path/to/local/nginx.conf
      dest: /etc/nginx/nginx.conf
    notify: Restart nginx

  - debug:
      msg: "Nginx configuration updated, should restart"

handlers:
  - name: Restart nginx
    service:
      name: nginx
      state: restarted

Ansible Interview Questions – Handlers


• Q: Imagine a scenario where the same handler needs to be called
with different data in different tasks. How would you handle this in
Ansible?
– A: In Ansible, if a handler is notified multiple times in the same play, it will
run only once, after all tasks are completed, and will use the values from the
last notification. If different data is needed, it might be better to use
separate handlers or to refactor the tasks and handlers so that they can
operate with the same data.

Ansible Interview Questions – Handlers


tasks:
  - name: Update nginx configuration
    copy:
      src: /path/to/local/nginx.conf
      dest: /etc/nginx/nginx.conf
    notify:
      - "restart nginx"

  - name: Update apache configuration
    copy:
      src: /path/to/local/apache2.conf
      dest: /etc/apache2/apache2.conf
    notify:
      - "restart apache"

handlers:
  - name: "restart nginx"
    service:
      name: nginx
      state: restarted

  - name: "restart apache"
    service:
      name: apache2
      state: restarted
• Q: How would you test handlers in a large and complex playbook?
– A: To test handlers, you could run the playbook in check mode with the '--
check' flag to see if the tasks report changes as expected. Additionally, you
could use a testing framework like Molecule, which allows for more complex
testing scenarios, including assertions about the desired state after the
playbook runs.
• ansible-playbook playbook.yml --check

Ansible Interview Questions – Handlers


• Q: If you have a handler that is used in multiple playbooks, how
would you structure your Ansible projects to avoid duplicating the
handler code?
– A: To avoid duplicating handler code, you could put the handler in a separate
file and include it in the playbooks that use it, using 'import_tasks' or
'include_tasks'.

Ansible Interview Questions – Handlers


• In handlers.yml (a plain list of handler tasks):
- name: Restart nginx
  service:
    name: nginx
    state: restarted
Then, in the playbooks that need it:
tasks:
  - name: Update nginx configuration
    copy:
      src: /path/to/local/nginx.conf
      dest: /etc/nginx/nginx.conf
    notify: Restart nginx
handlers:
  - import_tasks: handlers.yml

Ansible Interview Questions – Handlers


• Q: What is the purpose of the 'register' directive in Ansible?
• A: The 'register' directive in Ansible is used to capture the output of a task. The
registered value can be a simple string message, JSON, or other data returned
by a task. It can be used in subsequent tasks for conditional execution, error
handling, or debugging.
• Example:

- shell: /usr/bin/make_database.sh
  register: result

- debug:
    var: result

Ansible Interview Questions - Conditionals


• Q: How can you use the 'when' clause in an Ansible task?
• A: The 'when' clause in Ansible is used to conditionally run a task.
It's generally used in conjunction with registered variables or facts
about the system.
Example:
- shell: echo "This is a Unix system"
register: result
when: ansible_os_family == "Debian"

Ansible Interview Questions - Conditionals


• Q: Can you use 'when' with a loop (with_items)? Provide an
example.
• A: Yes, 'when' can be used with 'loop' (or the older 'with_items') to
conditionally run a task on each item in a list.

Ansible Interview Questions - Conditionals


vars:
  users:
    - name: john
      active: true
    - name: jane
      active: false

tasks:
  - name: Create active users
    user:
      name: "{{ item.name }}"
      state: present
    loop: "{{ users }}"
    when: item.active

Ansible Interview Questions - Conditionals


• Q: How can you handle errors or failures in an Ansible task?
• A: You can handle errors using the 'ignore_errors' directive or the 'failed_when'
condition. 'ignore_errors' will allow the playbook to continue even if a task fails,
while 'failed_when' allows you to define what constitutes failure.
• Example:
- shell: /usr/bin/some_command.sh
  ignore_errors: true

- shell: /usr/bin/some_other_command.sh
  register: result
  failed_when: "'FAILED' in result.stdout"

Ansible Interview Questions - Conditionals


• Q: How can you use handlers with 'register' and 'when' to
conditionally restart a service?
• A: Handlers, together with 'register' and 'when', can be used to
conditionally restart a service when a configuration change has
occurred.

Ansible Interview Questions - Conditionals


- name: Update nginx configuration
  copy:
    src: /path/to/local/nginx.conf
    dest: /etc/nginx/nginx.conf
  register: nginx_config

- name: Restart nginx
  service:
    name: nginx
    state: restarted
  when: nginx_config.changed

Ansible Interview Questions - Conditionals


• Q: Imagine you need to update multiple configuration files and
restart the corresponding services only if the files were changed.
How would you achieve this?
• A: You can use a loop with a list of services and configuration files,
and use a handler to restart the services when changes are
detected.

Ansible Interview Questions - Conditionals


vars:
services:
- { name: 'nginx', src: '/path/to/nginx.conf', dest: '/etc/nginx/nginx.conf' }
- { name: 'apache2', src: '/path/to/apache2.conf', dest: '/etc/apache2/apache2.conf' }

tasks:
- name: Update configuration files
copy:
src: "{{ item.src }}"
dest: "{{ item.dest }}"
loop: "{{ services }}"
notify: "restart {{ item.name }}"

Ansible Interview Questions - Conditionals


handlers:
  - name: "restart nginx"
    service:
      name: nginx
      state: restarted

  - name: "restart apache2"
    service:
      name: apache2
      state: restarted

Ansible Interview Questions - Conditionals


• The term find and ansible.builtin.find refer to the same Ansible module.
The difference in naming is due to the namespacing introduced in more
recent versions of Ansible.
– find is the shorthand and traditional way of referring to the find module.
– ansible.builtin.find refers to the same find module, but it's the fully qualified
collection name (FQCN) introduced in Ansible 2.10.

– Using ansible.builtin.find helps ensure you're calling the specific find module that
comes built into Ansible, rather than a module with the same name that might exist
in another collection. However, in many contexts, especially when not using third-
party collections, find and ansible.builtin.find will function the same way.

FQCN
• Q: How would you use the find module to search for all '.txt' files in
a specific directory?
• A: You can specify the directory to search in using the paths
parameter, and use the patterns parameter to specify a pattern to
match file names against.

Ansible Interview Questions – Managing Files


- name: Find all .txt files
ansible.builtin.find:
paths: /path/to/directory
patterns: '*.txt'
register: result

- debug:
var: result

Ansible Interview Questions


• Q: How would you find and delete all '.log' files that are older than a
week?
• A: You can use the age parameter to specify the age of files to
match, and then use the file module with state: absent to delete the
files.

Ansible Interview Questions


- name: Find all .log files older than a week
  ansible.builtin.find:
    paths: /var/log
    patterns: '*.log'
    age: '1w'
  register: old_logs

- name: Delete old logs
  file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ old_logs.files }}"

Ansible Interview Questions


• Q: How would you find all empty directories within a specific directory?
• A: You can use the size parameter to find files (or directories) of a specific size.
- name: Find all empty directories
ansible.builtin.find:
paths: /path/to/directory
file_type: directory
size: 0
register: empty_dirs

- debug:
var: empty_dirs

Ansible Interview Questions


• Q: How would you use the find module to find files that were modified within the last hour?
• A: The age parameter, given a negative value to select files newer than the specified age, can be used in conjunction with age_stamp set to mtime.

- name: Find recently modified files
  ansible.builtin.find:
    paths: /path/to/directory
    age: '-1h'
    age_stamp: mtime
  register: recent_files

- debug:
    var: recent_files

Ansible Interview Questions


• Q: How would you find all files within a directory tree, ignoring specific directories?
• A: You can use the excludes parameter, which matches item basenames, to specify directories to ignore (combined with recurse for a directory tree).

- name: Find files, ignoring certain directories
  ansible.builtin.find:
    paths: /path/to/directory
    recurse: yes
    excludes: 'excluded_dir'
  register: result

- debug:
    var: result

Ansible Interview Questions


• Q: How would you use the find module to get the total size of all
'.mp3' files within a directory?
• A: You can use the patterns parameter to match '.mp3' files, and
then loop over the result to calculate the total size.

Ansible Interview Questions


- name: Find all .mp3 files
  ansible.builtin.find:
    paths: /path/to/directory
    patterns: '*.mp3'
  register: mp3_files

- name: Calculate total size
  set_fact:
    total_size: "{{ mp3_files.files | map(attribute='size') | list | sum }}"

- debug:
    var: total_size

Ansible Interview Questions


• Q: How would you recursively search for a specific file within a directory tree?
• A: The recurse parameter can be used to enable recursive search.

- name: Find file recursively
  ansible.builtin.find:
    paths: /path/to/directory
    patterns: specific_file
    recurse: yes
  register: found_files

- debug:
    var: found_files

Ansible Interview Questions


• Q: How would you use the find module to find symbolic links within a directory?
• A: You can use the file_type parameter set to link to find symbolic links.

- name: Find symbolic links


ansible.builtin.find:
paths: /path/to/directory
file_type: link
register: found_links

- debug:
var: found_links

Ansible Interview Questions


• Q: How do you create a Jinja2 template to manage an Apache configuration file?

• A: An Apache configuration file can be managed by creating a Jinja2 template file with placeholders for the variables. For
instance, the template file might look like this:

ServerName {{ apache_server_name }}
DocumentRoot {{ apache_document_root }}
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined

Then, the Ansible playbook can look like this:

- name: Configure Apache
  template:
    src: /path/to/template.j2
    dest: /etc/apache2/sites-available/000-default.conf
  vars:
    apache_server_name: www.example.com
    apache_document_root: /var/www/html

Ansible Interview Questions - Templates


• Q: How would you use a loop in a Jinja2 template?
• A: Loops in Jinja2 can be used with the {% for %} syntax. Here's an example of a
template for an Apache virtual host configuration file that creates a virtual host for
each item in a list:
{% for vhost in apache_vhosts %}
<VirtualHost *:80>
ServerName {{ vhost.server_name }}
DocumentRoot {{ vhost.document_root }}
</VirtualHost>
{% endfor %}

Ansible Interview Questions - Templates


• Then, the Ansible playbook can define the list of virtual hosts:
- name: Configure Apache virtual hosts
template:
src: /path/to/template.j2
dest: /etc/apache2/sites-available/000-default.conf
vars:
apache_vhosts:
- server_name: www.example.com
document_root: /var/www/html
- server_name: www.example.org
document_root: /var/www/otherhtml

Ansible Interview Questions - Templates


• Q: How do you handle conditional statements in a Jinja2 template?
• A: Jinja2 supports conditional statements with the {% if %} syntax. For example:

ServerName {{ apache_server_name }}
DocumentRoot {{ apache_document_root }}
{% if apache_log_dir is defined %}
ErrorLog {{ apache_log_dir }}/error.log
CustomLog {{ apache_log_dir }}/access.log combined
{% endif %}

Ansible Interview Questions - Templates


• Q: How can you define a variable within a Jinja2 template?
• A: Jinja2 allows setting variables within the template using the {%
set %} tag. For example:

{% set server_name = 'www.example.com' %}
ServerName {{ server_name }}

Ansible Interview Questions - Templates


• Q: How can you use a Jinja2 template to generate a JSON file?

• A: You can use the to_json filter in a Jinja2 template to generate a JSON file. For example:

{{ my_variable | to_json }}

Playbook:

- name: Generate JSON file
  template:
    src: /path/to/template.j2
    dest: /path/to/file.json
  vars:
    my_variable:
      key1: value1
      key2: value2

This will generate a JSON file with the content {"key1": "value1", "key2": "value2"}.

Ansible Interview Questions - Templates


• Q: What is an Ansible role and when would you use one?
• A: An Ansible role is a way of organizing tasks and related files into a
cohesive unit that can be reused and shared. Roles are used when
you have tasks that are used frequently and need to be run in
different playbooks, across different hosts, or both.

Ansible Interview Questions – Roles and Collections


• Q: How do you create a role using the ansible-galaxy command?
• A: You can use the ansible-galaxy command with the init option to
create a new role. Here's an example:
ansible-galaxy init server_setup
This creates a directory named server_setup which contains
subdirectories for tasks, handlers, files, templates, etc., all according
to the standard Ansible role directory structure.

Ansible Interview Questions – Roles and Collections


• Q: Can you provide an example of a simple Ansible role?

• A: Sure, here's a simple Ansible role that installs and starts Apache on a server:

• The tasks/main.yml file for the role could look like this:

---
- name: Install Apache
  apt:
    name: apache2
    state: present
  become: yes

- name: Start Apache
  service:
    name: apache2
    state: started
  become: yes

Ansible Interview Questions – Roles and Collections


• This role can be included in a playbook like this:
---
- hosts: webservers
  roles:
    - server_setup

Ansible Interview Questions – Roles and Collections


• Q: What is an Ansible collection and how does it differ from a role?
A: An Ansible collection is a packaging format for distributing
Ansible content. It can include playbooks, roles, modules, and
plugins. While a role is a way to bundle automation tasks, a
collection provides a way to bundle roles and other Ansible content
together. It allows for easier sharing and distribution of this content.

Ansible Interview Questions – Roles and Collections


• Q: How do you install a collection from Ansible Galaxy?
• A: You can use the ansible-galaxy collection install command to
install a collection from Ansible Galaxy. For instance, to install the
community.general collection, you would run:

ansible-galaxy collection install community.general

Ansible Interview Questions – Roles and Collections


• Q: How do you create an Ansible collection?
• A: You can use the ansible-galaxy collection init command to create
a new collection. This command takes the name of the collection in
the format namespace.collection. Here's an example:
ansible-galaxy collection init mynamespace.mycollection

This will create a new directory mynamespace/mycollection with the structure for the collection.

Ansible Interview Questions – Roles and Collections


• Q: How do you distribute a collection?
• A: You can distribute a collection by building it with the ansible-
galaxy collection build command and then publishing it to Ansible
Galaxy with the ansible-galaxy collection publish command. Here's
an example:
ansible-galaxy collection build mynamespace/mycollection
ansible-galaxy collection publish mynamespace-mycollection-1.0.0.tar.gz

Ansible Interview Questions – Roles and Collections


• Q: What is the standard directory structure of an Ansible role?
• A: The standard directory structure of an Ansible role includes
directories for tasks, handlers, files, templates, variables, defaults,
and meta information:

Ansible Interview Questions – Roles and Collections


rolename/
├── defaults
│   └── main.yml
├── files
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── README.md
├── tasks
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml
• Q: How does variable precedence work in Ansible roles?
• A: Variable precedence in Ansible is complex and depends on several factors, including where the variable
is defined and whether it's a fact or a regular variable. Within the context of roles, variables defined in the
playbook that uses the role will override the default variables defined in the role. Variables defined in the
vars directory of a role have a higher precedence than those defined in the defaults directory.

• Q: How would you use a role in multiple environments (like staging and production) with different variable
values?
• A: You can define different variable values for different environments by using group variables or by
including an additional variables file in your playbook. For instance, you might have a
group_vars/staging.yml file with variable values for the staging environment, and a
group_vars/production.yml file for the production environment. In your playbook, you would use the role
like you normally would, and Ansible would use the correct variable values based on the inventory used.
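A small sketch of that layout (group names and values are placeholders):

# inventory
[staging]
staging-web1

[production]
prod-web1

# group_vars/staging.yml
apache_document_root: /var/www/staging

# group_vars/production.yml
apache_document_root: /var/www/html

# playbook
- hosts: all
  roles:
    - server_setup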

Ansible Interview Questions – Roles and Collections


• Q: How can you share a role with others?
• A: You can share an Ansible role with others by publishing it to Ansible Galaxy. You
would first need to create a galaxy_info section in your role's meta/main.yml file with
information about your role, then use the ansible-galaxy command to create a .tar.gz
archive of your role, and finally, upload it to Ansible Galaxy.
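A rough sketch of the galaxy_info section in meta/main.yml (all values are placeholders):

galaxy_info:
  author: Your Name
  description: Installs and configures Apache
  license: MIT
  min_ansible_version: "2.9"
  platforms:
    - name: Ubuntu
      versions:
        - focal
  galaxy_tags:
    - web
dependencies: []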

• Q: How can you handle sensitive data in a role?


• A: Sensitive data like passwords and API keys can be handled in Ansible roles by using
Ansible Vault. You can encrypt files or variables with Vault and then use them in your
role like you would use normal variables or files. The data will be decrypted on the fly
when the playbook is run, provided the correct Vault password is supplied.

Ansible Interview Questions – Roles and Collections


• Q: How can you pass variables to a role?
• A: You can pass variables to a role by defining them in the vars section when the role is
used in a playbook. For instance:

- hosts: webservers
roles:
- role: server_setup
vars:
apache_server_name: www.example.com
apache_document_root: /var/www/html

Ansible Interview Questions – Roles and Collections


• Step-by-step guide on creating an Ansible collection from scratch:
• Step 1: Create a Collection Directory Structure. Create a directory with the name of
your collection. Inside the collection directory, create the necessary subdirectories and
files. The minimal required directory structure for an Ansible collection is as follows:

mycollection/
├── README.md
├── galaxy.yml
├── plugins
│   └── modules
└── roles
    └── myrole
        ├── tasks
        └── README.md

Creating an Ansible collection from scratch


• Step 2: Define Collection Metadata

• In the root directory of your collection, create a galaxy.yml file. This file defines metadata about your collection, such as its
name, version, and dependencies. Here's an example of a galaxy.yml file:

namespace: mynamespace
name: mycollection
version: 1.0.0
readme: README.md
description: My custom Ansible collection
license:
  - MIT
authors:
  - Your Name <your@email.com>
dependencies: {}

Creating an Ansible collection from scratch


• Step 3: Create Roles (Optional)
• If your collection includes roles, create them in the roles directory. Each role should have its
own directory with the role name and contain the necessary tasks, files, templates, etc. For
example, you can create a role named myrole with the following structure:

mycollection/
└── roles
└── myrole
├── tasks
│ └── main.yml
└── README.md

Creating an Ansible collection from scratch


• Step 4: Create Custom Modules or Plugins (Optional)
• If your collection includes custom modules or plugins, create them in the
plugins directory. Modules should be placed in the plugins/modules
subdirectory. Create any necessary files or scripts for your custom
modules or plugins.

• Step 5: Create Documentation (Optional)


• Create a README.md file in the root directory of your collection to
provide documentation and usage instructions for your collection. You
can also create additional README files in subdirectories if needed.

Creating an Ansible collection from scratch


• Step 6: Build the Collection
• To package your collection for distribution, use the ansible-galaxy
command-line tool. Open a terminal, navigate to the root directory
of your collection, and run the following command:
ansible-galaxy collection build
This will create a .tar.gz file containing your collection, named
mynamespace-mycollection-1.0.0.tar.gz (based on the metadata
specified in galaxy.yml).

Creating an Ansible collection from scratch


• Step 7: Publish the Collection (Optional)
• If you want to distribute your collection through Ansible Galaxy, you can
publish it. Before publishing, ensure you have an account on Ansible
Galaxy and have the ansible-galaxy CLI tool installed. Run the following
command in the directory where your collection .tar.gz file is located:
ansible-galaxy collection publish mynamespace-mycollection-1.0.0.tar.gz
• This will publish your collection to Ansible Galaxy for others to discover
and use.

Creating an Ansible collection from scratch


• Q: How can you optimize Ansible playbook execution by running tasks in parallel?
• A: Ansible provides a mechanism called serial that allows you to control the number of hosts that are
acted upon at once. By setting serial to a value greater than 1, you can run tasks in parallel. Here's an
example:
---
- hosts: mygroup
serial: 3
tasks:
- name: Task 1
# Task 1 details

- name: Task 2
# Task 2 details
# Other tasks
In this example, Ansible will execute tasks on up to 3 hosts at a time.

Ansible Interview Questions


• Q: How can you optimize Ansible playbooks by reducing the number of
SSH connections?
• A: By default, Ansible opens a new SSH connection for each task, which
can be inefficient. You can enable pipelining and control the SSH
connection reuse behavior in the ansible.cfg file. Add the following lines
to the file:

[ssh_connection]
pipelining = True
control_path = %(directory)s/ansible-%%h-%%p-%%r

This enables pipelining and uses a control path to reuse SSH connections.

Ansible Interview Questions


• Q: How can you optimize Ansible playbooks by using dynamic
inventory sources?
• A: Ansible supports dynamic inventory sources that can retrieve
inventory information from external systems such as cloud providers
or configuration management databases. By using dynamic
inventory, you can automatically update inventory information and
eliminate the need for manual maintenance. An example of using
the AWS dynamic inventory plugin:

Ansible Interview Questions


---
- hosts: tag_Name_myinstance
gather_facts: false
tasks:
- name: Install package
# Task details

In this example, the tag_Name_myinstance is an AWS EC2 instance tag used to dynamically retrieve the inventory.

Ansible Interview Questions


• Q: How can you optimize Ansible playbooks by leveraging built-in
modules for idempotence?
• A: Ansible modules are designed to be idempotent, meaning that
running a task multiple times produces the same result as running it
once. Utilizing idempotent modules reduces unnecessary changes
and can improve playbook performance. For example, the yum
module in Red Hat-based systems automatically handles package
state checks and updates only if necessary:

Ansible Interview Questions


---
- hosts: webservers
tasks:
- name: Install Apache
yum:
name: httpd
state: present

In this example, the yum module ensures that the httpd package is present on the target hosts
without unnecessary reinstallation.

Ansible Interview Questions


• Q: How can you optimize Ansible playbooks by using task when
conditionals?
• A: Task when conditionals allow you to control when a task should
be executed based on certain conditions. This optimization
technique helps skip unnecessary tasks and reduces playbook
execution time. Here's an example:

Ansible Interview Questions


---
- hosts: webservers
  tasks:
    - name: Install Apache on CentOS
      yum:
        name: httpd
        state: present
      when: ansible_facts['distribution'] == 'CentOS'

In this example, the task to install Apache will only be executed if the target host is running CentOS.

Ansible Interview Questions


• Q: How can you optimize Ansible playbooks by using block and
rescue keywords for error handling?
• A: The block and rescue keywords allow you to group multiple tasks
together and handle errors gracefully. By using error handling, you
can prevent playbook failures due to transient issues and continue
with subsequent tasks. Here's an example:

Ansible Interview Questions


---
- hosts: webservers
  tasks:
    - name: Set file permissions
      block:
        - name: Change ownership
          file:
            path: /path/to/file
            owner: myuser
            group: mygroup

        - name: Set permissions
          file:
            path: /path/to/file
            mode: 0644
      rescue:
        - name: Handle error
          debug:
            msg: "Failed to set file permissions"

In this example, if any task within the block fails, Ansible will execute the tasks within the rescue section, allowing for error handling.
• Q: How can you use filters to manipulate strings in Ansible?

• A: Ansible provides a wide range of filters that can be used to manipulate strings. For example, the regex_replace filter allows
you to replace parts of a string using regular expressions. Here's an example:

---
- hosts: localhost
  vars:
    my_variable: "abcdef"
  tasks:
    - name: Manipulate string using filters
      debug:
        msg: "{{ my_variable | regex_replace('^abc', 'xyz') }}"

In this example, the output will be xyzdef as the regex_replace filter replaces the string abc with xyz.

Ansible Interview Questions


• Q: How can you use filters to transform data structures in Ansible?

• A: Ansible filters can be used to transform data structures such as lists and dictionaries. The map filter allows you to apply a
filter to each element of a list. Here's an example:

---
- hosts: localhost
  vars:
    my_list:
      - item1
      - item2
  tasks:
    - name: Transform list using filters
      debug:
        msg: "{{ my_list | map('upper') | list }}"

In this example, the map('upper') filter is applied to each element of my_list, converting them to uppercase. The output will be
["ITEM1", "ITEM2"].

Ansible Interview Questions


• Q: How can you use filters to manipulate dates and times in Ansible?
• A: Ansible provides filters to work with dates and times. The strftime filter allows you
to format a date or time value. Here's an example:
---
- hosts: localhost
  tasks:
    - name: Manipulate date using filters
      debug:
        msg: "{{ '%Y-%m-%d' | strftime(ansible_date_time.epoch) }}"
In this example, the epoch time fact is passed to the strftime filter, which formats the current
time in YYYY-MM-DD format.

Ansible Interview Questions


• Q: How can you use filters to perform calculations in Ansible?
• A: Ansible provides math filters, such as pow, root and log, to perform calculations. Here's an
example:
---
- hosts: localhost
  tasks:
    - name: Perform calculations using filters
      debug:
        msg: "{{ 5 | pow(2) }}"
In this example, the pow(2) filter calculates 5 raised to the power of 2, resulting in 25.

Ansible Interview Questions


• Q: How can you use filters to sort and filter lists in Ansible?

• A: Filters can be used to sort and filter lists in Ansible. The sort filter can sort a list in ascending or descending order. Here's an example:

---
- hosts: localhost
  vars:
    my_list:
      - c
      - a
      - b
  tasks:
    - name: Sort list using filters
      debug:
        msg: "{{ my_list | sort }}"

In this example, the output will be ['a', 'b', 'c'] as the sort filter sorts the list in ascending order.

Ansible Interview Questions


• Q: How can you create a custom module plugin in Ansible?
– A: To create a custom module plugin, you need to create a Python module that follows Ansible's module
development guidelines. Here's an example:
– Create a directory named library within your playbook or role directory.
– Inside the library directory, create a Python module file with a .py extension, e.g., my_module.py.
– Implement the necessary functions and logic within the Python module file, following Ansible's module
development guidelines.
– The custom module can then be used in playbooks by referencing its name. Ansible will automatically
discover and use the custom module based on the module name specified in the playbook.

• Q: How can you create a custom inventory plugin in Ansible?


– A: Custom inventory plugins allow you to fetch inventory information from external sources. To create a
custom inventory plugin, follow these steps:
– Create a directory named inventory_plugins within your playbook or role directory.
– Inside the inventory_plugins directory, create a Python module file with a .py extension, e.g.,
my_inventory.py.
– Implement the necessary functions and logic within the Python module file, following Ansible's inventory
plugin development guidelines.
– Once the custom inventory plugin is created, you can specify it in your ansible.cfg file or provide it as a
command-line argument to Ansible.

Ansible Interview Questions


• Q: How can you create a custom callback plugin in Ansible?
– A: Custom callback plugins allow you to customize Ansible's output and
behavior. To create a custom callback plugin, follow these steps:
– Create a directory named callback_plugins within your playbook or role
directory.
– Inside the callback_plugins directory, create a Python module file with a .py
extension, e.g., my_callback.py.
– Implement the necessary callback functions within the Python module file,
following Ansible's callback plugin development guidelines.
– Once the custom callback plugin is created, you can enable it in your
ansible.cfg file or provide it as a command-line argument to Ansible.

Ansible Interview Questions


• Q: How can you configure Ansible Tower to integrate with version control systems like Git?
– A: Ansible Tower supports integration with version control systems like Git to fetch playbooks and other files directly from
the repository. To configure Git integration in Ansible Tower, follow these steps:
– Navigate to the Ansible Tower web interface and go to the "Projects" section.
– Click on "Add" to create a new project.
– Provide the necessary details, including the Git repository URL, credentials, and branch to use.
– Save the project, and Ansible Tower will automatically sync with the Git repository and fetch the playbooks.
– This enables teams to version control their playbooks and leverage the benefits of collaboration and change tracking.

• Q: How can you configure Ansible Tower to use dynamic inventories?


– A: Dynamic inventories allow Ansible Tower to fetch inventory information from external sources, such as cloud providers
or configuration management databases. To configure dynamic inventories in Ansible Tower, follow these steps:
– Navigate to the Ansible Tower web interface and go to the "Inventories" section.
– Click on the inventory you want to configure.
– Under the "Sources" tab, select "Manage Sources" and configure the source type (e.g., AWS, Azure, GCP, etc.).
– Provide the necessary details, such as credentials and filters, to fetch the dynamic inventory.
– Save the configuration, and Ansible Tower will fetch the inventory information from the configured source.
– This enables dynamic and automated management of inventory information, reducing the need for manual updates.

Ansible Interview Questions


• Q: How can you schedule jobs in Ansible Tower to run at specific times or
intervals?
– A: Ansible Tower provides the ability to schedule jobs, allowing you to automate
playbook execution at specific times or intervals. To schedule jobs in Ansible Tower,
follow these steps:
– Navigate to the Ansible Tower web interface and go to the "Templates" section.
– Select the playbook template you want to schedule.
– Under the "Schedules" tab, click on "Add Schedule" and provide the necessary
details, such as the desired frequency, start time, and days of the week.
– Save the schedule, and Ansible Tower will automatically trigger the job according to
the defined schedule.
– This enables teams to automate routine tasks and perform regular maintenance
operations with ease.

Ansible Interview Questions


• Q: How can you configure Ansible Tower to use role-based access control (RBAC)?
– A: Role-based access control (RBAC) allows you to define fine-grained access control policies within Ansible
Tower. To configure RBAC in Ansible Tower, follow these steps:
– Navigate to the Ansible Tower web interface and go to the "Settings" section.
– Under the "Authentication" tab, configure the authentication method (e.g., LDAP, Active Directory, etc.) to
integrate with your existing identity provider.
– Under the "Organizations" section, create organizations and assign users or teams to those organizations.
– Configure permissions and roles for each organization, defining what actions users or teams can perform.
– This provides a secure and controlled environment for managing Ansible automation, ensuring that users
have appropriate access levels.

• Q: How can you use Ansible Tower's REST API to interact with the system programmatically?
– A: Ansible Tower provides a REST API that allows you to programmatically interact with the system and
perform various operations. To use Ansible Tower's REST API, you can make HTTP requests to the relevant
API endpoints, providing the necessary headers and authentication details. For example, to launch a job
template, you can use a POST request to the /api/v2/job_templates/{id}/launch/ endpoint.
– Additionally, Ansible Tower provides an API documentation endpoint (/api/v2/apidocs/) where you can
explore and test the available API endpoints and their parameters.
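As a rough sketch of calling that endpoint from the command line (the Tower URL, token, and job template ID below are placeholders, and token-based authentication is assumed):

# Launch job template 42 with an OAuth2/personal access token (values are examples only)
curl -X POST \
  -H "Authorization: Bearer <your_token>" \
  -H "Content-Type: application/json" \
  https://tower.example.com/api/v2/job_templates/42/launch/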

Ansible Interview Questions


Docker Interview Questions
1. Docker Architecture

2. Docker Images and Containers

3. Dockerfile

4. Docker Networking

5. Docker Volumes and Data Persistence

6. Docker Compose

7. Docker Registry and Image Repositories

8. Docker Security

9. Docker Multi-Stage Builds

Docker Interview Questions


Question: You have a Docker container running in production that suddenly starts
behaving unusually. How would you debug this container without affecting its service?
– Answer: You can use the docker logs command to view the logs of the running container. If you
need to inspect the running container, docker exec -it [container-id] bash can be used to get an
interactive shell to the running container.

Question: How can you monitor the resource usage of Docker containers?
– Answer: Docker provides a command docker stats which can provide CPU, Memory, Network I/O,
Disk I/O usage statistics for running containers.

Question: How would you handle sensitive data (passwords, API keys, etc.) in Docker?
– Answer: Sensitive data should be managed using Docker Secrets or environment variables.
Secrets are encrypted and only available to containers on a need-to-know basis, thereby
increasing the security.
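A minimal sketch of the Docker Secrets workflow mentioned above, assuming a swarm is already initialised (the secret, service, and image names are examples):

# Create a secret from standard input
echo "SuperSecretPassword" | docker secret create db_password -
# Attach the secret to a service; it appears inside the container at /run/secrets/db_password
docker service create --name myapp --secret db_password myimage:latest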

Docker Interview Questions


• Question: Your Docker container is running out of disk space, how would you increase
it?
– Answer: Containers store their writable layers, images, and volumes on the host machine's filesystem (under /var/lib/docker by default on Linux). To give containers more disk space, free up space on the host, for example by removing unused images, stopped containers, and dangling volumes, or by expanding the storage that backs Docker's data directory.

• Question: How would you go about setting up a CI/CD pipeline with Docker?
– Answer: Docker integrates well with most of the CI/CD tools like Jenkins, GitLab CI, Travis CI, etc.
You can create a Docker image of your application, push it to Docker Hub or a private registry as
part of your build stage, and pull and run the image in the deployment stage.

• Question: What if a Docker container is not able to communicate with another service
running on the same host?
– Answer: Docker containers are isolated and they have their own network interface by default.
You need to ensure proper network configuration is done for inter-service communication.
Docker networking options like bridge, host, or overlay networks can be utilized for this.
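For example, a user-defined bridge network lets two containers on the same host reach each other by name (the container, network, and image names are examples):

# Create a user-defined bridge network
docker network create app-net
# Run both containers on that network; they can resolve each other by container name
docker run -d --name api --network app-net myapi:latest
docker run -d --name web --network app-net -p 8080:80 myweb:latest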

Docker Interview Questions


• Question: How can you handle persistence in Docker?
– Answer: Docker volumes can be used to persist data generated by and used by Docker
containers. Docker volumes are managed by Docker and a directory is set up within the
Docker host which is then linked to the directory in the container.

• Question: What would you do if Docker starts to consume a lot of CPU?


– Answer: Docker provides ways to limit the CPU usage by setting the CPU shares, CPU sets,
or CPU quota at the time of running the container using docker run.

• Question: How can you share data among Docker containers?


– Answer: Docker volumes can be used to share data between containers. Create a volume
using docker volume create, and then mount it into containers at run time.
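A short sketch of sharing a named volume between two containers (names are examples):

# Create a named volume
docker volume create shared-data
# Mount the same volume into two containers; both see the same /data directory
docker run -d --name writer -v shared-data:/data myimage:latest
docker run -d --name reader -v shared-data:/data myotherimage:latest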

Docker Interview Questions


• Question: If you wanted to run multiple services (like an app server and a database server) on a
single host with Docker, how would you manage it?
– Answer: Docker Compose is a great tool for this purpose. It allows you to define and run multi-container
Docker applications. You can create a docker-compose.yml file which defines your services, and then bring
up your entire app with just one command docker-compose up.

• Question: How would you ensure that a group of inter-dependent containers always run on the
same Docker host?
– Answer: Docker Swarm or Kubernetes can be used to orchestrate a group of containers across multiple
hosts. Docker Swarm has a concept of "services" which ensures that the defined set of containers are co-
located on the same host.

• Question: Your team has multiple environments (e.g. dev, staging, production). How would you
manage different configurations for these environments using Docker?
– Answer: Environment-specific configurations can be externalized from Docker images and provided at
runtime using environment variables.
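One possible way to externalise environment-specific settings at runtime (the variable names, file names, and image are illustrative):

# Pass individual variables per environment
docker run -d -e APP_ENV=staging -e DB_HOST=staging-db myapp:latest
# Or keep the settings in one env file per environment
docker run -d --env-file ./production.env myapp:latest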

Docker Interview Questions


• Question: How would you automate the deployment of a multi-container
application?
– Answer: Docker Compose or orchestration tools like Docker Swarm or Kubernetes can be
used to automate the deployment of multi-container applications.

• Question: If an application inside a Docker container is behaving erratically, how can you check its logs?
– Answer: The docker logs [container-id] command can be used to view the logs of a
container.

• Question: What steps will you follow to troubleshoot a Docker container that
has stopped unexpectedly?
– Answer: To troubleshoot, start by checking the logs using docker logs [container-id]. If it's a
crash due to the application inside the container, the logs may contain the trace of it. You
can also use docker inspect [container-id] to view the container's metadata.

Docker Interview Questions


• Question: Your Docker container is running an older version of an application, and you want to
update it to a new version without downtime. How would you achieve this?
– Answer: You can use Docker's built-in rolling update feature if you're using Docker Swarm, or Kubernetes
rolling updates if you're using Kubernetes. This will ensure zero-downtime deployments.

• Question: How would you go about managing a Docker application that needs to scale based on
load?
– Answer: Docker Swarm or Kubernetes can be used to manage such applications. These tools have the
capability to auto-scale the application based on CPU usage or other metrics.

• Question: You are tasked with reducing the size of your Docker images. What are some
strategies you might use?
– Answer: Some strategies could include using alpine based images which are much smaller in size, reducing
the number of layers by minimizing the number of commands in Dockerfile, removing unnecessary tools
from the image, and cleaning up the cache after installing packages.
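A hypothetical Dockerfile sketch combining those strategies (alpine base, a single combined RUN layer, no package cache); the package and file names are illustrative:

# Small base image
FROM node:18-alpine
WORKDIR /app
# Install OS packages in one layer without keeping the apk cache
RUN apk add --no-cache curl
# Install only production dependencies before copying the rest of the source
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the application code (pair this with a .dockerignore file)
COPY . .
CMD ["node", "server.js"]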

Docker Interview Questions


• Question: How would you deploy a new version of an image to a running Docker
container?
– Answer: You would need to pull the new image, stop and remove the current container, and then
start a new container with the new image.

• Question: How do you ensure that containers restart automatically if they exit?
– Answer: When running the container, Docker provides a restart policy which can be set to "no",
"on-failure", "unless-stopped", or "always" to determine when to restart the container.

• Question: You have an application that consists of five different services. How would
you deploy it using Docker?
– Answer: Docker Compose or Docker Swarm can be used to manage multi-service applications.
These services would be defined in a docker-compose.yml file or a Docker Stack file.
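A minimal docker-compose.yml sketch for such an application (service names, images, and ports are examples); note that depends_on also controls startup order:

version: "3.8"
services:
  web:
    image: myapp-web:latest
    ports:
      - "8080:80"
    depends_on:
      - api
  api:
    image: myapp-api:latest
    environment:
      - DB_HOST=db
    depends_on:
      - db
  db:
    image: postgres:15
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data: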

Docker Interview Questions


• Question: You are running a containerized database, and it seems to be responding slower than
usual. How would you investigate this?
– Answer: You can use docker stats to monitor the resource usage of your Docker container. If the database is
consuming too many resources, you might need to allocate more resources to the container or optimize
your database.

• Question: You've noticed that your Docker image takes a long time to build. How can you speed
up the build process?
– Answer: Use Docker's build cache effectively. If certain steps of your Dockerfile take a long time to execute
and do not change often, make sure they are run before the steps that change frequently. This will ensure
that Docker can cache the results of the slow steps and re-use them in future builds.

• Question: How would you handle a situation where a Docker container fails to start due to a
problem with a Dockerfile instruction?
– Answer: Docker build will give a log of what it is doing. The logs should give you a hint about which
instruction in the Dockerfile caused the failure. Once you've identified the problematic instruction, you can
modify it and retry building the image.

Docker Interview Questions


• Question: What would you do if a Docker image fails to push to a registry?
– Answer: This can happen due to several reasons. You may not be authenticated correctly, or the image may
not exist, or there may be a network problem. First, make sure you are logged in to the registry and the
image name is correct. If the problem persists, check your network connection and the status of the Docker
registry.

• Question: You have to run an application that requires specific kernel parameters to be tuned
on the host machine. How would you handle this while running the application in Docker?
– Answer: Docker supports the --sysctl flag, which allows setting namespaced kernel parameters for a container. However, only namespaced parameters can be set this way; because containers share the host kernel, non-namespaced parameters must be tuned on the host machine itself.

• Question: How would you isolate the network for Docker containers to avoid them being
accessible from outside?
– Answer: Docker provides network isolation features. You can create a user-defined bridge network and run
your containers in this isolated network. This network is isolated from the outside world unless you
specifically map ports from the containers to the host machine.
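For even stricter isolation, Docker also supports internal networks with no outbound connectivity at all (the network and container names are examples):

# An --internal network has no route to the outside world
docker network create --internal backend-net
# Containers on this network can talk to each other but not to external hosts
docker run -d --name db --network backend-net postgres:15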

Docker Interview Questions


• Question: How do you run a Docker container with a specific memory and CPU limit?
– Answer: Docker run command provides flags -m or --memory to set the maximum amount of memory that
the container can use, and --cpus to specify the number of CPUs.

• Question: Your Docker container is stuck and not responding to any commands. How do you
force stop and remove it?
– Answer: You can force-stop a Docker container with docker kill <container-id> (docker stop has no force flag; it sends SIGTERM and, after a timeout, SIGKILL). Once the container is stopped, remove it with docker rm <container-id>, or combine both steps with docker rm -f <container-id>.

• Question: You have an application which when run in Docker, fails due to permissions issues on
a specific file. How would you debug and solve this issue?
– Answer: Use the docker cp command to copy the file from the container to the host machine and check its
permissions. Depending on the application's requirements, you can then change the file's permissions in the
Dockerfile using the RUN chmod or RUN chown command and rebuild the Docker image.

Docker Interview Questions


• Question: How would you troubleshoot a Docker container that starts but exits
immediately?
– Answer: Use docker logs [container-id] and docker inspect [container-id] to investigate why the
container is exiting. The issue could be with the application inside the container or with the
container's configuration itself.

• Question: You suspect that a memory leak in one of your applications is causing a
container to be killed. How would you confirm this?
– Answer: Use docker stats [container-id] to monitor the memory usage of the container. If the
memory usage is constantly growing over time, there may be a memory leak.

• Question: How would you ensure that your Docker images are free from any
vulnerabilities?
– Answer: You can use Docker Security Scanning or other third-party tools like Clair, Anchore, etc.
to scan your Docker images for any known vulnerabilities.
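As an illustration, assuming an external scanner such as Trivy is installed (the image name is an example; check the flag names against your Trivy version):

# Scan a local image for known vulnerabilities
trivy image myapp:1.0
# Fail a CI job when HIGH or CRITICAL findings exist
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:1.0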

Docker Interview Questions


• Question: How do you handle rolling updates and rollbacks in a Docker Swarm?
– Answer: Docker service command provides --update-parallelism and --update-delay flags
for rolling updates and --rollback flag for rollback in Docker Swarm.

• Question: How can you connect Docker containers across multiple hosts?
– Answer: Docker Swarm or Kubernetes can be used to create a cluster of hosts and manage
networking between containers across these hosts. For Docker Swarm, an overlay network
can be created to facilitate this.

• Question: How would you troubleshoot a Docker daemon that is not starting?
– Answer: Check the Docker daemon logs, usually located at /var/log/docker.log on Linux. The
logs can provide information about why the daemon is failing to start.
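On systemd-based distributions the daemon usually logs to the journal rather than /var/log/docker.log, so a sketch of the equivalent checks would be:

# Check the daemon's status and recent log output
sudo systemctl status docker
sudo journalctl -u docker.service --no-pager -n 100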

Docker Interview Questions


• Question: What are some methods to secure Docker containers and images?
– Answer: Some methods include using trusted base images, scanning images for vulnerabilities,
using Docker Secrets to handle sensitive data, and minimizing the use of root privileges.

• Question: Your Docker container is crashing at startup and you suspect it's due to a
command in your Dockerfile's ENTRYPOINT instruction. How would you confirm this?
– Answer: Overwrite the ENTRYPOINT when running the container using docker run --entrypoint
and see if the container starts up correctly.

• Question: How would you go about decreasing the startup time of a Docker container?
– Answer: The startup time could be reduced by minimizing the number of instructions in your
Dockerfile that need to be run at container startup. Having your application ready to start
immediately upon container start can also help.

Docker Interview Questions


• Question: How do you share a Docker network between two different Docker Compose
projects?
– Answer: You can create an external network using the docker network create command and then
specify this network under the networks section in both Docker Compose files.

• Question: You want to temporarily override a command in a Docker container for debugging purposes. How would you do it?
– Answer: You can override the default command by specifying a new one at the end of the docker
run command.

• Question: How would you deal with large Docker logs consuming too much disk space?
– Answer: Docker provides a --log-opt option where you can specify max-size and max-file to limit
log size and number of log files.
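For example, limiting the default json-file logging driver per container (the sizes and image name are illustrative):

# Keep at most three 10 MB log files for this container
docker run -d \
  --log-driver json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  myapp:latest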

Docker Interview Questions


• Question: What steps would you take if a Docker container becomes unresponsive or hangs?
– Answer: First, I would use docker stats to check the resource usage of the container. If necessary, I would
then use docker exec to enter the container and check the processes running in the container. If it's still not
responding, I would check the Docker logs for any error messages.

• Question: How would you go about optimizing Dockerfile for faster build times?
– Answer: Some strategies for optimizing Dockerfile build times include leveraging build cache effectively,
reducing the number of layers by combining instructions, removing unnecessary components, and avoiding
the inclusion of unnecessary files with .dockerignore.

• Question: You suspect a Docker network is causing problems with container connectivity. How
would you diagnose and resolve the issue?
– Answer: Use docker network inspect to check the details of the network. Make sure the subnet and gateway
are correctly configured and there are no IP conflicts. Also, ensure the containers are correctly connected to
the network.

Docker Interview Questions


• Question: You have a legacy application that maintains state in local files. How would
you containerize this application without losing data?
– Answer: Docker volumes can be used to persist data. Create a volume and mount it to the
necessary directory in the container. The data in this directory will be stored in the volume and
will not be lost when the container stops.

• Question: How would you prevent a specific Docker container from consuming too
many resources on the host machine?
– Answer: When running the container, you can specify the amount of CPU and memory the
container is allowed to use with docker run's --cpus and -m options.

• Question: How can you isolate Docker containers in a multi-tenant environment?


– Answer: Docker's built-in isolation features like namespaces, cgroups, and user namespaces can
be used. Additionally, network isolation can be achieved using user-defined bridge networks or
overlay networks in Swarm.

Docker Interview Questions


• Question: How would you go about replicating a Docker environment issue from production in
a local development machine?
– Answer: Use the same Docker images and configuration (networking, volumes, environment variables, etc.)
that are being used in production. Docker's declarative nature makes it easy to recreate environments.

• Question: How would you ensure Docker containers always restart unless they are explicitly
stopped?
– Answer: Use the --restart unless-stopped option with docker run. This will ensure that the Docker container
always restarts unless it has been explicitly stopped by the user.

• Question: How do you troubleshoot a Docker container that is consuming more CPU resources
than expected?
– Answer: You can use docker stats to monitor CPU usage. If an application is consuming more CPU than
expected, it may be due to an infinite loop in the code, excessive thread usage, or some other issue in the
application code itself.

Docker Interview Questions


• Question: How do you perform health checks on Docker containers?
– Answer: Docker provides a HEALTHCHECK instruction in the Dockerfile that can be used to perform health checks. The health check command can be any command that exits with 0 when the container is healthy and 1 when it is not, and Docker executes it at regular intervals to track the container's health status (a Dockerfile sketch follows this list).

• Question: You are facing an issue where a Docker container is not able to communicate
with another container. How would you diagnose and fix the issue?
– Answer: You can diagnose this issue by checking the networking configuration of the containers.
Use docker network inspect to check if both containers are in the same network and have the
correct IP addresses. Also, make sure the necessary ports are open and listening.

• Question: If a Docker container is terminated, how do you ensure that the data is not
lost?
– Answer: You can use Docker volumes or bind mounts to persist data. Even if the container is
terminated, the data in these volumes or bind mounts will not be lost.
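The Dockerfile sketch referenced in the health-check answer above; the base image, probe command, and timings are assumptions for illustration:

FROM nginx:alpine
# Probe the web server every 30s; mark the container unhealthy after 3 failed attempts
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget -q --spider http://localhost/ || exit 1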

Docker Interview Questions


• Question: You have a Dockerfile that has a RUN instruction which fails intermittently causing
the image build to fail. How would you handle this situation?
– Answer: The intermittent failure could be due to network issues or issues with the command itself. You can
add retry logic in the command to handle network failures. If the issue is with the command itself, you
might need to debug and fix the command.

• Question: You want to ensure that a specific Docker container always starts last in a multi-
container application. How would you achieve this?
– Answer: Docker Compose supports the depends_on option which can be used to control the startup order
of containers. You can make the specific container depend on all other containers to ensure it starts last.

• Question: You are seeing an error "Cannot connect to the Docker daemon. Is the docker
daemon running on this host?" How would you troubleshoot this error?
– Answer: This error typically means that the Docker daemon is not running. You can start the Docker daemon
using the command systemctl start docker. If it's already running, you might not have the necessary
permissions to communicate with the Docker daemon. You can either use sudo or add your user to the
docker group.
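The commands involved, for reference (a re-login is needed after the group change):

# Start the daemon on systemd-based systems
sudo systemctl start docker
# Allow your user to talk to the Docker socket without sudo
sudo usermod -aG docker $USER
# Log out and back in (or run `newgrp docker`) for the group change to take effect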

Docker Interview Questions


• Question: A specific Docker command is taking longer to execute than expected. How
would you find out what's causing the delay?
– Answer: Docker provides a --debug flag which can be used to get detailed debugging
information. Use this flag with the slow command to see what's happening during its execution.

• Question: How would you share a Docker volume between multiple containers?
– Answer: When running the containers, you can use the -v option to mount the volume in the
containers. The same volume name can be used in multiple containers to share the volume
between them.

• Question: You want to clean up unused Docker resources like images, containers, and
networks. How would you do it?
– Answer: Docker provides a system prune command that can be used to remove all unused
Docker resources. Be careful while using this command as it will remove all unused resources,
not just the ones related to a specific application.

Docker Interview Questions


• Question: You've been tasked with migrating a monolithic application to a microservices
architecture. The application currently runs on a single server. How would you use Docker to
facilitate this migration?
– Answer: Docker provides a way to containerize each component or service of the application, which can
then be managed independently. You would start by identifying the individual components of the
monolithic application and creating a Dockerfile for each component. Each Dockerfile would contain the
necessary instructions to build that component. These Docker containers could then be orchestrated using a
tool like Docker Compose or Kubernetes, depending on the complexity and scale of the application.

• Question: You're in a situation where a containerized application works perfectly fine on your
local machine but fails when deployed to a production server. How would you go about
troubleshooting this issue?
– Answer: The key to resolving such an issue lies in ensuring that the environment of the Docker container in
production matches that of the local machine. Tools like Docker Compose help in this regard, as they allow
you to declare your environment in a YAML file and ensure it's the same across different deployments. If the
issue persists, you'd want to look at the logs of the Docker container in the production environment using
docker logs <container_id> to identify any errors or issues. You could also inspect the container for further
clues using docker inspect <container_id>.

• Question: Your company has a policy of keeping Docker images for production use in a private
registry. However, your team wants to use an image from Docker Hub. What would be your
approach in this situation?
– Answer: The best approach would be to pull the image from Docker Hub, test it thoroughly to make sure it
meets your company's standards and then push it to your company's private registry. From there, it can be
used in production. This way, you're following the company's policy while still being able to use the image
that your team prefers.
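A sketch of that pull-test-push flow (the registry host, repository path, and tags are placeholders):

# Pull and test the public image
docker pull nginx:1.25
# Re-tag it for the private registry and push it there
docker tag nginx:1.25 registry.example.com/approved/nginx:1.25
docker login registry.example.com
docker push registry.example.com/approved/nginx:1.25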

Docker Interview Questions


• Question: Your application requires a specific version of a software library. However, the base
Docker image you are using comes with a different version of that library. How would you
handle this situation?
– Answer: In such a case, you can create a Dockerfile with the base image and add an instruction to update
the specific library to the version you need. Docker allows you to run commands to install or update
software libraries in the Dockerfile, giving you the flexibility to customize the Docker image according to
your application's requirements.

• Question: You need to deploy a multi-container application where each container needs to
communicate with others. The application also needs to scale easily based on the load. How
would you design this application using Docker?
– Answer: Docker Compose allows you to define a multi-container application in a YAML file, where you can
specify the different services (containers), their configuration, and how they are linked. Docker Compose
also supports scalability by allowing you to scale specific services. For larger deployments, you may want to
consider using Docker Swarm or Kubernetes, which provides more robust orchestration, scalability, and
management features for multi-container applications.

• Question: Can you explain Docker-in-Docker (DinD) and provide a use case where it might be
necessary?
– Answer: Docker-in-Docker (DinD) is a scenario where a Docker container runs a Docker daemon inside it.
This is different from Docker outside of Docker (DooD), where a container communicates with the Docker
daemon of the host system. DinD might be useful in continuous integration (CI) pipelines where a build
process requires creating Docker images or running other Docker containers.

Docker Interview Questions


• Question: What are some potential security concerns with Docker-in-Docker and how
would you mitigate them?
– Answer: One potential security concern with DinD is that it requires running the Docker daemon
in privileged mode, which gives it almost unrestricted host access and could lead to a container
breakout. To mitigate this risk, consider using Docker-outside-of-Docker (DooD) where possible,
as it provides better isolation. If DinD is necessary, ensure that only trusted, secure images are
run in the DinD environment.
• Question: You are setting up a continuous integration (CI) pipeline and are considering
using Docker-in-Docker. What might be some potential drawbacks of this approach?
– Answer: Docker-in-Docker (DinD) requires running a Docker daemon inside your Docker
container, which introduces overhead and may impact performance. Furthermore, DinD can
result in complex and tricky cleanup scenarios since a second Docker daemon has its own
volumes and networks. Also, DinD requires privileged mode, which can create security risks.
• Question: In a Docker-in-Docker scenario, how would you handle data persistence?
– Answer: Data persistence in a DinD scenario can be tricky because each Docker daemon has its
own set of volumes. Data stored in a DinD volume will be lost when the container running the
inner Docker daemon is removed. To ensure data persistence, consider mounting a volume from
the host into the DinD container, and then mount a subdirectory of that volume into the inner
Docker containers.

Docker Interview Questions


• Question: What steps would you take to improve the security of Docker
containers in production?
– Answer: There are several best practices to improve Docker security. These include:
Running containers with a non-root user when possible; Regularly updating Docker and the host OS; Regularly scanning images for vulnerabilities with tools like Clair or Trivy and auditing the host and daemon configuration with Docker Bench for Security; Limiting the resources that a container can use; Using Docker's built-in security features like
seccomp profiles, AppArmor, and Capabilities; Using user namespaces to isolate container's
user ID and group ID from the host.

• Question: What is Docker multi-stage build and why is it useful?


– Answer: Docker multi-stage build is a method that allows you to use multiple FROM
instructions in your Dockerfile. Each FROM instruction can use a different base image and
starts a new stage of the build. You can copy artifacts from one stage to another, leaving
behind everything you don't need in the final image. This helps to create smaller Docker
images, reduce build time and manage build dependencies more efficiently.
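A hypothetical multi-stage Dockerfile illustrating the idea (a small Go program is assumed purely as an example):

# Build stage: contains the compiler and build dependencies
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: only the compiled binary is copied over
FROM alpine:3.19
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]

Individual stages can also be built on their own with docker build --target build ., which is useful when troubleshooting a failing stage.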

Docker Interview Questions


• Question: How would you troubleshoot a Docker networking issue where two containers are
unable to communicate with each other?
– Answer: Start by inspecting the network configuration of the containers using docker network inspect.
Verify that both containers are on the same network, and that their IP addresses and ports are correctly
configured. If the containers are on separate networks, you might need to connect them to the same
network or enable network communication between the two networks.

• Question: How would you secure a Docker registry?


– Answer: You can secure a Docker registry by implementing: Authentication - use basic auth or integrate with
an existing authentication service like LDAP or Active Directory; Authorization - control what users can do
after they've authenticated; Encryption - use HTTPS to encrypt the communication between the Docker
client and the registry; Vulnerability scanning - regularly scan images in the registry for known
vulnerabilities; Implement content trust - use Docker Content Trust (DCT) to verify the integrity of images in
the registry.

• Question: You're designing a Docker networking solution for a multi-tier application. The
frontend should be accessible from the internet, but the backend should be isolated. How
would you design this?
– Answer: Docker supports several networking options. In this case, you could use a bridge network for the
backend services to isolate them. For the frontend, you could either use a host network to expose the
service directly on the host's IP, or use a bridge network and publish the necessary ports to the host.

Docker Interview Questions


• Question: You have a Dockerfile that builds an application in one stage and packages it
in another. The build stage is failing, but the error message is not helpful. How would
you troubleshoot this?
– Answer: You can modify your Dockerfile to stop at the build stage by removing or commenting
out the later stages. Then build the image and run it interactively with a shell so you can inspect
the container, rerun the build, and see more detailed error messages.
• Question: Your company has a policy of scanning all Docker images for vulnerabilities
before they are pushed to the registry. How would you implement this?
– Answer: There are several tools available for scanning Docker images for vulnerabilities, such as Clair, Trivy, and Anchore. You can integrate these tools into your CI/CD pipeline so that
every time a new image is built, it gets scanned before being pushed to the registry. If the scan
finds any vulnerabilities, the pipeline should fail and prevent the image from being pushed.
• Question: How would you limit the system resources (like CPU and memory) that a
Docker container can use?
– Answer: Docker provides options to limit the system resources a container can use. For example,
you can use the --cpus flag when running a container to limit the CPU usage, and the -m or --
memory flag to limit the memory usage.

Docker Interview Questions


• Question: In Docker, what's the difference between the COPY and ADD commands in a
Dockerfile and when should you use one over the other?
– Answer: Both COPY and ADD instructions in Dockerfile copy files from the host machine to the Docker
image. COPY is a straightforward instruction that copies files or directories into the image. ADD has
additional capabilities like local-only tar extraction and remote URL support. In a Docker multi-stage build,
COPY is generally preferred because of its simplicity and because the additional features of ADD are rarely
required.

• Question: How would you ensure that Docker containers only communicate with each other
through defined points of interaction?
– Answer: Docker's networking features can be used to control how containers communicate with each other. By default, all containers on the same network can reach each other. To restrict this, create separate user-defined bridge networks and attach each container only to the networks it needs, so that only containers sharing a network can communicate. (The legacy --link option also exists but is deprecated in favour of user-defined networks.)

• Question: How would you prevent an image with known vulnerabilities from being pushed to a
Docker registry?
– Answer: Implement a vulnerability scanning step in your CI/CD pipeline. There are tools available, like Clair, Trivy, or Anchore, which can scan Docker images for known vulnerabilities. If the scan step detects
vulnerabilities, the pipeline should fail and stop the image from being pushed to the registry.

Docker Interview Questions


• Question: You've noticed that your Docker images are considerably large, resulting in
longer deployment times. How would you optimize your Docker images to reduce their
size?
– Answer: There are several ways to reduce the size of Docker images. One is to use smaller base
images, like Alpine Linux. Another is to use multi-stage builds, where build-time dependencies
are kept in separate stages and only the necessary artifacts are copied to the final image. Also,
clean up unnecessary files and packages at the end of each layer in the Dockerfile.
• Question: How can you prevent unauthorized access to a Docker registry?
– Answer: Docker Registry supports several methods of authentication including basic
(username/password), token, and OAuth2. Implementing one of these, along with TLS
encryption for data in transit, can help prevent unauthorized access. Additionally, consider
setting up a firewall or other network-level access controls to restrict which IP addresses can
access the registry.
• Question: Your Docker containers are having network connectivity issues in a specific
subnet. How would you troubleshoot this?
– Answer: You can use the docker network inspect command to check the network configuration
of the containers and see if they are correctly configured for the subnet. Also, check the subnet
configuration and routing on the host and any firewalls or security groups that may be affecting
network connectivity.

Docker Interview Questions


• Question: How would you securely manage secrets needed by a Docker container at
runtime?
– Answer: Docker has a built-in secrets management solution which allows you to securely store
and manage any sensitive data needed at runtime. Secrets are encrypted during transit and at
rest in a Docker swarm, and can be securely shared between services in the swarm.
• Question: A Docker container that's supposed to use only a limited amount of memory
is causing the host to run out of memory. How would you troubleshoot this?
– Answer: You can inspect the container using the docker stats command to check its real-time
resource usage. If it's using more memory than it should, it's possible the memory limit was not
set correctly when the container was started, or the container process has a memory leak. You
may need to adjust the memory limit or investigate the process running inside the container.
• Question: A Docker multi-stage build is failing, and you're not sure which stage is
causing the issue. How would you find out?
– Answer: To troubleshoot a failing multi-stage Docker build, you can build each stage separately
using the --target option with the docker build command. This will help isolate the stage that's
causing the build to fail.

Docker Interview Questions


• Question: You have two Docker containers on the same network that are supposed to
communicate with each other, but they can't. How would you troubleshoot this?
– Answer: Check the network configuration of the containers using the docker network inspect command to
make sure they're on the same network. If they are, check their IP addresses and ports. You can also try
pinging one container from the other to see if there's any network connectivity. If there isn't, check the
network configuration on the host and any firewall rules that may be blocking communication.

• Question: You're trying to push an image to a Docker registry, but the push is failing with an
authorization error. How would you troubleshoot this?
– Answer: Check that you're authenticated with the registry using the correct credentials. You can use the
docker login command to authenticate. If you're already authenticated, check that your user has the
necessary permissions to push images to the registry. You may need to contact the registry administrator to
resolve permission issues.

• Question: Your Docker images are larger than expected, even after using a multi-stage build.
How would you find out what's causing the large image size?
– Answer: You can inspect the layers of your Docker image using the docker history command, which shows
the size of each layer. This can help identify which layers are adding significant size to the image. Once
you've identified the large layers, review the corresponding Dockerfile instructions and see if there are ways
to reduce the size, such as removing unnecessary files or packages.

Docker Interview Questions


• Question: You're trying to pull an image from a Docker registry, but the connection is failing.
How would you troubleshoot this?
– Answer: First, check your network connection and make sure you can reach the registry by pinging its URL or
IP address. If your network connection is fine, check that you're authenticated with the registry and have
the necessary permissions to pull images. If you're still unable to pull the image, there might be an issue
with the registry itself, in which case you would need to contact the registry administrator.

• Question: A Docker container is having intermittent network connectivity issues. How would
you troubleshoot this?
– Answer: Intermittent network issues can be challenging to troubleshoot. You can start by checking the
Docker container's logs for any error messages. You can also try to ping other devices on the network from
the container when the issue occurs to check network connectivity. If the issue persists, check the network
configuration on the Docker host and any other devices on the network.

• Question: A secret provided to a Docker container is incorrect, causing the container to fail.
How would you troubleshoot this?
– Answer: Start by inspecting the secret in the Docker swarm using the docker secret inspect command to
check its details. Be careful not to expose the secret in logs or output. If the secret is indeed incorrect, you'll
need to update it. Be aware that you can't directly update a Docker secret; you must remove and recreate it.
Also, ensure the correct secret is mounted to the container.

Docker Interview Questions


• What is the purpose of ENTRYPOINT in a Dockerfile?
– ENTRYPOINT is used to configure the default executable command for a Docker
container. It specifies the command that will be executed when the container
starts.
• What is the difference between ENTRYPOINT and CMD in a Dockerfile?
– ENTRYPOINT sets the command and parameters that run when the container starts; arguments passed to docker run are appended to it rather than replacing it, and it can only be replaced explicitly with the --entrypoint flag. CMD, on the other hand, sets the default command and parameters, which are overridden by any command-line arguments provided when running the container.
• When would you use ENTRYPOINT over CMD, and vice versa?
– ENTRYPOINT is typically used when you want to define a container as an
executable, such as a specific service or application, and you want to ensure that
specific command is always run. CMD, on the other hand, is used to provide default
command and arguments that can be overridden, allowing more flexibility.
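A small sketch showing how the two interact (the image name and commands are examples):

FROM alpine:3.19
ENTRYPOINT ["ping", "-c", "3"]
CMD ["localhost"]

Built as, say, docker build -t myping ., running docker run myping pings localhost, docker run myping 8.8.8.8 keeps the ENTRYPOINT and only replaces the CMD argument, and docker run --entrypoint sh -it myping replaces the ENTRYPOINT itself.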

Docker Interview Questions


• Scenario: You have a Node.js application that needs to be containerized
using Docker. Write a Dockerfile to build and run the application.
• Question: Can you explain the different instructions you would include in
the Dockerfile and their purposes?
– Answer: In the Dockerfile for a Node.js application, you would typically include the
following instructions:
– FROM to specify the base image, such as node:14, which provides the Node.js
runtime.
– WORKDIR to set the working directory inside the container.
– COPY or ADD to copy the application source code into the container.
– RUN to install dependencies using a package manager like npm or yarn.
– EXPOSE to specify the port on which the application listens.
– CMD or ENTRYPOINT to define the command to run the application.
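Putting those instructions together, a minimal Dockerfile sketch for such an application (the file names and port are assumptions about the app):

FROM node:14
WORKDIR /usr/src/app
# Install dependencies first so this layer is cached between code changes
COPY package*.json ./
RUN npm install
# Copy the application source and document the listening port
COPY . .
EXPOSE 3000
CMD ["node", "app.js"]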

Docker Interview Questions


• Scenario: You want to create a Docker image for a Python application that
depends on specific Python packages. How would you handle the package
dependencies in the Dockerfile?
• Question: What instructions or techniques would you use in the Dockerfile to
ensure the required Python packages are installed in the image?
– Answer: In the Dockerfile for a Python application, you would include the following
instructions:
– FROM to specify the base image, such as python:3.9, which provides the Python runtime.
– WORKDIR to set the working directory inside the container.
– COPY or ADD to copy the application source code into the container.
– RUN to run pip install or another package manager command to install the required Python
packages specified in a requirements.txt file or directly in the Dockerfile.

Docker Interview Questions


• Scenario: You want to improve the build speed of your Docker image by leveraging Docker's layer caching
mechanism. How would you structure your Dockerfile to maximize layer reusability?
• Question: Can you explain the concept of layer caching in Docker and provide some best practices to
optimize layer reusability in a Dockerfile?
– Answer: Docker uses layer caching to optimize the image build process. Each instruction in the Dockerfile creates a new
layer, and Docker reuses previously built layers if the instructions and context remain unchanged. To maximize layer
reusability, it is recommended to:
– Order the instructions from least to most frequently changing. For example, copy source code or dependencies at the end,
after installing system-level dependencies, to prevent rebuilding those layers unnecessarily.
– Use multi-stage builds to separate build-time dependencies from runtime dependencies, reducing the size of the final
image.
– Leverage build-time caching mechanisms like --mount=type=cache to cache dependencies or intermediate build artifacts
for faster subsequent builds.

• Scenario: You need to run some initialization tasks or commands when the container starts. How would
you include these initialization steps in the Dockerfile?
– Question: Which Dockerfile instruction would you use to run initialization tasks or commands, and what considerations
would you take into account when adding them?
– Answer: In the Dockerfile, you can use the CMD or ENTRYPOINT instructions to define the command(s) that run when the
container starts. For example, you can use CMD ["node", "app.js"] to run a Node.js application as the default command.
Considerations when adding initialization steps:
– Use CMD to specify the default command, which can be overridden when running the container.
– Use ENTRYPOINT to define the executable that always runs, with CMD providing default arguments.
– Remember that CMD can be overwritten at runtime by passing additional arguments to the docker run command.

Docker Interview Questions


• Scenario: You want to ensure that the Docker image you build is as small as
possible to minimize its footprint. What techniques or strategies would you
employ in the Dockerfile to achieve this?
• Question: Can you provide some examples of how you would optimize the
Docker image size, including specific Dockerfile instructions or practices you
would follow?
– Answer: To optimize Docker image size, you can employ the following techniques:
– Use a minimal base image, such as alpine or scratch, for smaller footprints.
– Remove unnecessary files or dependencies after the installation step in the Dockerfile.
– Use .dockerignore to exclude files or directories that are not needed in the image.
– Combine multiple RUN instructions into a single instruction to reduce the number of layers.
– Use multi-stage builds to separate build-time dependencies from the final runtime image.
– Minimize the number of installed packages and libraries to only include what is necessary
for the application.
– Compress or optimize assets, such as JavaScript or CSS files, before copying them into the
image.

Docker Interview Questions


docker run: Run a container based on an image.

Example: docker run -d -p 8080:80 nginx

docker pull: Download an image from a registry.

Example: docker pull ubuntu

docker build: Build a Docker image from a Dockerfile.

Example: docker build -t myapp:1.0 .

docker images: List available Docker images.

Example: docker images

docker ps: List running containers.

Example: docker ps

Docker Commands For Reference


• docker stop: Stop a running container.

• Example: docker stop mycontainer

• docker rm: Remove a container.

• Example: docker rm mycontainer

• docker rmi: Remove an image.

• Example: docker rmi myimage

• docker exec: Execute a command in a running container.

• Example: docker exec -it mycontainer bash

• docker logs: View the logs of a container.

• Example: docker logs mycontainer

Docker Commands For Reference


• docker network: Manage Docker networks.

• Example: docker network create mynetwork

• docker volume: Manage Docker volumes.

• Example: docker volume create myvolume

• docker cp: Copy files between a container and the host.

• Example: docker cp myfile.txt mycontainer:/path/to/file

• docker commit: Create a new image from a container's changes.

• Example: docker commit mycontainer myimage:1.1

• docker tag: Add a tag to an image.

• Example: docker tag myimage:1.0 myrepo/myimage:latest

Docker Commands For Reference


• docker push: Push an image to a registry.

• Example: docker push myrepo/myimage:latest

• docker login: Log in to a Docker registry.

• Example: docker login myregistry.com

• docker logout: Log out from a Docker registry.

• Example: docker logout myregistry.com

• docker inspect: Display detailed information about a container, image, or network.

• Example: docker inspect mycontainer

• docker stats: Display live resource usage statistics of running containers.

• Example: docker stats

Docker Commands For Reference


• docker-compose up: Start containers defined in a Docker Compose file.

• Example: docker-compose up -d

• docker-compose down: Stop and remove containers defined in a Docker Compose file.

• Example: docker-compose down

• docker-compose build: Build or rebuild services defined in a Docker Compose file.

• Example: docker-compose build

• docker-compose logs: View the logs of containers defined in a Docker Compose file.

• Example: docker-compose logs myservice

• docker-compose exec: Execute a command in a running container defined in a Docker Compose file.

• Example: docker-compose exec myservice bash

Docker Commands For Reference


• docker-compose pull: Pull updated images for services defined in a Docker Compose file.

• Example: docker-compose pull

• docker-compose run: Run a one-time command in a new container defined in a Docker Compose file.

• Example: docker-compose run myservice python script.py

• docker-compose restart: Restart containers defined in a Docker Compose file.

• Example: docker-compose restart myservice

• docker-compose stop: Stop containers defined in a Docker Compose file.

• Example: docker-compose stop

• docker-compose ps: List containers defined in a Docker Compose file.

• Example: docker-compose ps

Docker Commands For Reference


• docker swarm init: Initialize a swarm and create a manager node.

• Example: docker swarm init

• docker swarm join: Join a swarm as a worker or manager node.

• Example: docker swarm join --token SWMTKN-1-0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef mymanager:2377

• docker service create: Create a new service in a swarm.

• Example: docker service create --name myservice --replicas 3 myimage

• docker service scale: Scale the number of replicas for a service in a swarm.

• Example: docker service scale myservice=5

• docker service ls: List services in a swarm.

• Example: docker service ls

Docker Commands For Reference


• docker service inspect: Display detailed information about a service in a swarm.

• Example: docker service inspect myservice

• docker node ls: List nodes in a swarm.

• Example: docker node ls

• docker node inspect: Display detailed information about a node in a swarm.

• Example: docker node inspect mynode

• docker system df: Show Docker disk usage.

• Example: docker system df

• docker system prune: Remove unused Docker data (containers, images, networks, etc.) to free up disk space.

• Example: docker system prune

Docker Commands For Reference


• docker history: View the history of an image, including its layers and metadata.

• Example: docker history myimage

• docker save: Save an image to a tar archive.

• Example: docker save -o myimage.tar myimage

• docker load: Load an image from a tar archive.

• Example: docker load -i myimage.tar

• docker attach: Attach to a running container and interact with its console.

• Example: docker attach mycontainer

• docker export: Export the filesystem of a container as a tar archive.

• Example: docker export mycontainer > mycontainer.tar

Docker Commands For Reference


• docker import: Import the contents of a tar archive as a new Docker image.

• Example: docker import mycontainer.tar myimage

• docker network create: Create a new Docker network.

• Example: docker network create mynetwork

• docker network ls: List Docker networks.

• Example: docker network ls

• docker network inspect: Display detailed information about a Docker network.

• Example: docker network inspect mynetwork

• docker network connect: Connect a container to a Docker network.

• Example: docker network connect mynetwork mycontainer

Docker Commands For Reference


• docker network disconnect: Disconnect a container from a Docker network.

• Example: docker network disconnect mynetwork mycontainer

• docker volume create: Create a new Docker volume.

• Example: docker volume create myvolume

• docker volume ls: List Docker volumes.

• Example: docker volume ls

• docker volume inspect: Display detailed information about a Docker volume.

• Example: docker volume inspect myvolume

• docker volume prune: Remove unused Docker volumes.

• Example: docker volume prune

Docker Commands For Reference


• docker system events: Stream real-time events from the Docker server.

• Example: docker system events

• docker stats: Display live resource usage statistics of running containers.

• Example: docker stats

• docker top: Display the running processes of a container.

• Example: docker top mycontainer

• docker version: Show Docker version information.

• Example: docker version

• docker info: Display Docker system-wide information.

• Example: docker info

Docker Commands For Reference


• docker events: Display real-time events from the Docker server.

• Example: docker events

• docker pause: Pause processes within a running container.

• Example: docker pause mycontainer

• docker unpause: Unpause processes within a paused container.

• Example: docker unpause mycontainer

• docker kill: Send a signal to stop a running container.

• Example: docker kill mycontainer

• docker restart: Restart a container.

• Example: docker restart mycontainer

Docker Commands For Reference


• docker update: Update configuration of a running container.

• Example: docker update --cpus 2 --memory 512m mycontainer

• docker port: List port mappings of a container.

• Example: docker port mycontainer

• docker inspect: Display detailed information about a container, image, network, or volume.

• Example: docker inspect mycontainer

• docker diff: Show changes to files in a container's filesystem.

• Example: docker diff mycontainer

• docker logs: Fetch the logs of a container.

• Example: docker logs mycontainer

Docker Commands For Reference


• docker attach: Attach to a running container's console.

• Example: docker attach mycontainer

• docker wait: Block until a container stops, then print the exit code.

• Example: docker wait mycontainer

• docker cp: Copy files/folders between the container and the host.

• Example: docker cp myfile.txt mycontainer:/path/to/file

• docker rename: Rename a container.

• Example: docker rename mycontainer newcontainername

• docker system prune: Remove unused containers, networks, and images.

• Example: docker system prune

Docker Commands For Reference


• docker history: Show the history of an image.

• Example: docker history myimage

• docker search: Search Docker Hub for images.

• Example: docker search ubuntu

• docker login: Log in to a Docker registry.

• Example: docker login myregistry.com

• docker logout: Log out from a Docker registry.

• Example: docker logout myregistry.com

Docker Commands For Reference


Kubernetes Interview Questions
1. Kubernetes Architecture: Understanding of the master/worker nodes model, etcd, kubelet, API Server, Controller Manager,
Scheduler, and how they interact with each other.

2. Pods: As the smallest and simplest unit in the Kubernetes object model, understanding pods is fundamental. This includes
topics like pod lifecycle, multi-container pods, pod health checks, and more.

3. Controllers: Understanding the different types of controllers (Deployment, StatefulSet, DaemonSet, Jobs, CronJobs) and their
specific use cases is essential.

4. Services & Networking: Knowledge about ClusterIP, NodePort, LoadBalancer, Ingress controllers, network policies, service
discovery, CNI, etc., is crucial.

5. Volumes & Data: Persistent volumes, persistent volume claims, storage classes, stateful applications handling, etc.

6. Configuration & Secrets Management: ConfigMaps, Secrets, and managing sensitive data.

7. RBAC & Security: Understanding of Role-Based Access Control, Security Contexts, Network Policies, and overall Kubernetes
cluster security.

8. Resource Management: Understanding of requests and limits, Quality of Service (QoS) Classes, Resource Quota, Limit Ranges.

9. Observability: Experience with logging (using tools like Fluentd), monitoring (with tools like Prometheus), tracing, and
debugging in a Kubernetes environment.

10. Maintenance & Troubleshooting: Node maintenance, Cluster upgrades, debugging techniques, and tools, kube-apiserver, and
kubelet logs, etc.

Kubernetes Interview Questions


11. CI/CD in Kubernetes: Understanding of how to implement CI/CD in a Kubernetes environment using
tools like Jenkins, GitLab, Spinnaker, etc.
12. Helm: The usage of Helm for package management in Kubernetes.
13. Service Mesh: Knowledge about service meshes (like Istio, Linkerd) and their role in a Kubernetes
environment.
14. Kubernetes Operators: What are Operators, and how do they work?
15. Custom Resource Definitions (CRDs): How to extend Kubernetes API using CRDs.
16. Kubernetes Autoscaling: Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster
Autoscaler.
17. Namespaces: Using namespaces for isolation and organizing cluster resources.
18. Cloud Provider Integrations: Knowledge about how Kubernetes interacts with cloud providers (like GCP,
AWS, Azure) for features like load balancers, node groups, etc.

Kubernetes Interview Questions


19. Kubernetes Security: This includes aspects such as:
– Authentication and Authorization: Understanding of how Kubernetes handles user authentication (including service accounts), as well as role-based access control (RBAC) for determining what authenticated users can do.
– Admission Controllers: Knowledge of what admission controllers are and how they contribute to the security of a Kubernetes cluster.
– Security Contexts: Understanding of how to use security contexts to control access to resources.
– Network Policies: Knowledge of how to implement network policies to control network access into and out of your pods.
– Secrets Management: Knowledge of how to manage sensitive data using Kubernetes Secrets and external tools like Vault.
– Image Security: Techniques for ensuring the security of container images, such as using trusted registries and image scanning tools.
– Audit Logging: Understanding of how to use Kubernetes audit logs for keeping track of what is happening in your cluster.
– Securing the Kubernetes API Server: Techniques for ensuring the API Server, which is the main gateway to your Kubernetes cluster, is secure.
– Kubernetes Hardening: Best practices for hardening a Kubernetes cluster, such as minimizing attack surfaces, limiting direct access to nodes, etc.
– TLS and Certificate Management: Handling of TLS certificates within Kubernetes for secure communication.
– Kubernetes Threat Modeling: Understanding of potential attacks, weaknesses, and how to mitigate them.

Kubernetes Interview Questions


• Kubernetes Architecture: Question: You've noticed that one of your worker nodes in
Kubernetes is no longer scheduling any new pods. What could be the reason and how would
you troubleshoot this?
– Answer: This could be due to various reasons - the node could be marked as NotReady, disk pressure or
other node conditions could be preventing scheduling, or the kubelet on the node might not be responding.
Use kubectl describe node <node-name> to check node conditions, and look at the kubelet logs on the node
for any errors.

• Kubernetes Architecture: Question: Your API server is failing to connect to the etcd cluster.
What steps would you take to troubleshoot this?
– Answer: I'd check the logs for the API server to look for any error messages. If the API server and etcd are
running in pods, kubectl logs can be used. If they are installed directly on the nodes, I'd SSH into the node
and manually check the logs there. It might also be useful to verify the network connectivity between the
API server and etcd.

• Question: How would you resolve an issue where the kubelet isn't registering nodes with the
Kubernetes API server?
– Answer: First, I would check the kubelet logs on the affected node for any error messages. It could be an
issue with the kubelet configuration or its connection to the API server. I'd also verify that the API server is
accessible from the node and that the correct certificates are being used for authentication.

• Question: The Kubernetes API server is running out of resources and becoming unresponsive.
How would you handle this scenario?
– Answer: One approach could be to scale the API server if it's set up in a High Availability (HA) configuration.
Otherwise, consider increasing the resources allocated to the API server. I would also investigate the cause
of the increased resource usage—it might be due to excessive requests from a certain source, in which case
rate limiting might be appropriate.

Kubernetes Interview Questions - Architecture


• Question: How would you troubleshoot an issue where etcd is consuming a lot of CPU
resources?
– Answer: I would investigate the source of the CPU usage, which could be due to a high number of requests
to etcd. This might be caused by the control plane components, operators, or user workloads. If the CPU
usage is too high, consider scaling the etcd cluster horizontally or vertically, or optimizing the workloads that
are using etcd.

• Question: How would you approach a scenario where the controller manager is continuously
restarting?
– Answer: I would first look at the logs for the controller manager to identify any error messages. I might need
to adjust the controller manager's configuration or resources, or resolve any issues with the API server or
etcd that could be causing the controller manager to restart.

• Question: The scheduler is not placing pods on certain nodes, despite the nodes having
available resources. How would you troubleshoot this?
– Answer: I would start by checking the events of the unscheduled pods with kubectl describe pod <pod-
name>. This could reveal issues like taints on the nodes, insufficient resources, or node affinity/anti-affinity
rules. I'd also check the scheduler logs for any errors.

• Question: How would you troubleshoot a scenario where kube-proxy is not correctly setting up
network rules, causing service discovery to fail?
– Answer: I would first describe the service and endpoints to verify that the service is correctly configured.
Then, I would check the kube-proxy logs for any errors. It could be an issue with the kube-proxy
configuration or the network plugin that's being used. If kube-proxy is running as a DaemonSet, I might also
check the status of the kube-proxy pods on the affected nodes.
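As a rough illustration of that troubleshooting flow (the service name mysvc, namespace myns, and the k8s-app=kube-proxy label are placeholders and may differ in your cluster):
kubectl -n myns describe service mysvc
kubectl -n myns get endpoints mysvc
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
kubectl -n kube-system logs <kube-proxy-pod-name>
If the endpoints list is empty, the selector probably doesn't match any ready pods; if endpoints exist but traffic still fails, the kube-proxy logs and the CNI plugin are the next places to look.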

Kubernetes Interview Questions - Architecture


• Question: Imagine a scenario where your Kubernetes master node becomes unresponsive. How would
you troubleshoot this issue?
– Answer: In this scenario, you would start by checking the logs of the master node's components (kube-apiserver, kube-
controller-manager, and kube-scheduler). Look for any error messages or indications of failures. Check if the master node
has enough resources (CPU, memory) to handle the workload. If the issue persists, you may need to restart the master
node or investigate potential networking or configuration issues.

• Question: Suppose you have a Kubernetes cluster with a large number of nodes, and you're experiencing
intermittent connectivity issues between some nodes. How would you troubleshoot and resolve this
issue?
– Answer: First, check the network configurations of the affected nodes and ensure they have proper network connectivity.
Use tools like ping and traceroute to identify potential network bottlenecks or misconfigurations. If the issue is not
resolved, examine the network infrastructure between the nodes, such as firewalls or network policies, and ensure that
the necessary ports are open for communication. Additionally, review any recent changes or updates that might have
affected the cluster's networking.

• Question: In a Kubernetes cluster, you notice that the kubelet on some worker nodes is failing to register
with the master. What could be the possible causes, and how would you troubleshoot this issue?
– Answer: Potential causes could include network connectivity issues, misconfiguration of the kubelet, or a failure of the
kubelet service itself. To troubleshoot this, start by checking the kubelet logs on the affected nodes (journalctl -u kubelet
or docker logs kubelet). Look for error messages indicating why the registration is failing. Verify that the kubelet's
configuration matches the cluster's specifications. Check network connectivity between the worker nodes and the master,
ensuring that necessary ports are open. If necessary, restart the kubelet service and monitor the logs for any recurring
errors.

• Question: You have a Kubernetes cluster where some pods are frequently evicted or failing to start due to
insufficient resources. How would you troubleshoot this issue and adjust the resource allocation?
– Answer: Start by checking the resource requests and limits specified in the pod specifications (kubectl describe pod <pod-
name>). Ensure that the requested resources are within the available capacity of the worker nodes. Use the kubectl top
command to monitor the resource usage of nodes and pods. If the resources are consistently exceeding the limits,
consider adjusting the resource requests and limits to better match the application's needs. Alternatively, you may need to
add more nodes or upgrade the existing nodes to increase the cluster's resource capacity.
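For reference, a typical sequence of commands for this kind of capacity check might look like the following (pod and node names are placeholders; kubectl top requires the metrics-server add-on to be installed):
kubectl describe pod <pod-name>
kubectl top pods --all-namespaces
kubectl top nodes
kubectl describe node <node-name>
kubectl describe node shows the node's Allocatable capacity and the total requests and limits already committed, which makes it easy to spot overcommitted nodes.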

Kubernetes Interview Questions - Architecture


• Question: In a Kubernetes cluster, you notice that the kube-apiserver is experiencing high CPU usage and
becomes unresponsive at times. How would you troubleshoot and resolve this issue?
– Answer: Begin by checking the kube-apiserver logs for any error messages or indications of high load. Identify any recent
changes or increases in traffic that might have caused the high CPU usage. Analyze the system's resource usage using tools
like top or monitoring solutions to identify potential resource constraints. Ensure that the kube-apiserver's configuration
matches the cluster's requirements. If the issue persists, consider horizontally scaling the kube-apiserver by adding more
replicas or upgrading the hardware to handle the increased load.

• Question: Suppose you have a Kubernetes cluster where the kube-scheduler is consistently failing to
assign pods to nodes, resulting in pod scheduling delays. How would you troubleshoot and address this
issue?
– Answer: First, check the kube-scheduler logs for any error messages or indications of failures. Ensure that the kube-
scheduler's configuration is correct and aligned with the cluster's specifications. Verify that the worker nodes have
sufficient resources to accommodate the pods' requested resources. If the kube-scheduler is overwhelmed, consider
scaling it by adding more replicas. You can also monitor the cluster's resource usage using tools like Prometheus and
Grafana to identify any resource constraints impacting the scheduling process.

• Question: You have a Kubernetes cluster where the etcd cluster, which serves as the cluster's data store, is
experiencing performance degradation and high latency. How would you troubleshoot this issue?
– Answer: Start by checking the etcd cluster's logs for any error messages or indications of performance issues. Verify that
the etcd cluster has enough resources (CPU, memory, storage) to handle the workload. Use tools like etcdctl to inspect the
cluster's health and performance metrics. Consider monitoring the I/O and network usage of the etcd nodes. If necessary,
scale up the etcd cluster by adding more nodes or upgrading the hardware to improve performance.

• Question: In a Kubernetes cluster, you observe that some pods are repeatedly crashing and restarting.
How would you troubleshoot and identify the root cause of this issue?
– Answer: Begin by examining the logs of the crashing pods using the kubectl logs command. Look for error messages or
stack traces that might indicate the cause of the crashes. Check if the pods are running out of resources, such as memory
or CPU, by inspecting the resource requests and limits. Ensure that the container images and configurations are
compatible with the cluster environment. Consider enabling additional logging or debug flags to gather more information.
If necessary, run the problematic container locally outside the cluster for further investigation.
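A minimal set of commands for this investigation (the pod name is a placeholder):
kubectl logs <pod-name>
kubectl logs <pod-name> --previous
kubectl describe pod <pod-name>
kubectl get pod <pod-name> -o yaml
The --previous flag is particularly useful here because it shows the logs of the last terminated container, which usually contains the error that caused the crash, and kubectl describe reveals reasons such as OOMKilled or CrashLoopBackOff in the pod's events.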

Kubernetes Interview Questions - Architecture


• Question: You notice that a pod in your Kubernetes cluster is constantly
restarting. How would you diagnose and resolve this issue?
– Answer: First, I would examine the logs of the pod using kubectl logs <pod_name>.
If the issue wasn't clear from the logs, I would use kubectl describe pod
<pod_name> to see events associated with the pod. If it seems like a crash loop, it
might be an issue with the application inside the pod. If it's an issue like
"ImagePullBackOff", it could be a problem with the image or the image registry.
• Question: What will happen when a pod reaches its memory or CPU
limit?
– Answer: If a container exceeds its CPU limit, it is throttled and won't be allowed to
use more CPU than its limit. However, if a container tries to use more memory than its
limit, it is OOMKilled (terminated with an out-of-memory error) and restarted
according to the pod's restartPolicy.
• Question: What steps would you take to connect to a running pod and
execute commands inside the container?
– Answer: You can use the kubectl exec command to run commands in a container.
For example, kubectl exec -it <pod_name> -- /bin/bash will start a bash session in
the specified pod.

Kubernetes Interview Questions - Pods


• Question: How can you copy files to or from a Kubernetes pod?
– Answer: You can use the kubectl cp command to copy files between a pod and your local
system. For example, kubectl cp <pod_name>:/path/to/remote/file /path/to/local/file.
• Question: What would you do if a pod is in a Pending state?
– Answer: If a pod is in a Pending state, it means it has been accepted by the Kubernetes
system, but one or more of the container images has not been created. Reasons could
include insufficient resources on the node, or some issue pulling the image. I'd start by
looking at the pod's events with kubectl describe pod <pod_name>.

• Question: How can you ensure a group of pods can communicate with each
other and other objects can't interfere?
– Answer: Network Policies can be used to control network access into and out of your pods.
A network policy is a specification of how groups of pods are allowed to communicate with
each other and other network endpoints.
• Question: How would you share storage between pods?
– Answer: Sharing storage between pods can be achieved using a Persistent Volume (PV) and
Persistent Volume Claims (PVCs). The PV corresponds to the actual storage resource, while
the PVC is a user's request for storage. Pods can then mount the storage using the PVC.
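As a sketch, a PersistentVolumeClaim and the corresponding pod volume might look like this (the names, size, and access mode are assumptions; ReadWriteMany requires a volume type that supports it, such as NFS):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 5Gi
Then, in each pod spec that needs the storage:
volumes:
- name: data
  persistentVolumeClaim:
    claimName: shared-data
containers:
- name: app
  image: myapp:latest
  volumeMounts:
  - name: data
    mountPath: /data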

Kubernetes Interview Questions - Pods


• Question: A pod failed to start due to an error "ImagePullBackOff". What
does this mean and how would you fix it?
– Answer: The "ImagePullBackOff" error indicates that Kubernetes wasn't able to pull
the container image for the pod. This could be due to a number of reasons like the
image doesn't exist, the wrong image name or tag was provided, or there are
access issues with the Docker registry. To fix this, I would verify the image name
and tag, and check the imagePullSecrets for the pod or service account.
• Question: Can you scale a specific pod in Kubernetes? If not, how do you
scale in Kubernetes?
– Answer: In Kubernetes, you don't scale pods directly. Instead, you would scale a
controller that manages pods, like a Deployment. You can scale these controllers
using the kubectl scale command.
• Question: How would you limit the amount of memory or CPU that a pod
can use?
– Answer: You can specify resource limits for a pod or container in the pod
specification. This can include limits for CPU and memory.
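A minimal sketch of such limits in a container spec (the values shown are placeholders):
containers:
- name: app
  image: myapp:latest
  resources:
    requests:
      cpu: "250m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
Requests are used for scheduling decisions, while limits are enforced at runtime: CPU beyond the limit is throttled, and memory beyond the limit leads to the container being OOMKilled.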

Kubernetes Interview Questions - Pods


• Question: What is a "taint" in Kubernetes, and how does it affect pods?
– Answer: Taints are a property of nodes, they allow a node to repel a set of pods. Tolerations
are applied to pods and allow (but do not require) the pods to schedule onto nodes with
matching taints.
• Question: How would you ensure certain pods only run on certain nodes?
– Answer: You can use NodeSelector, node affinity, and taints and tolerations to constrain
pods to run on particular nodes in a Kubernetes cluster.
• Question: What is the "kube-proxy" in Kubernetes and how does it affect
communication between pods?
– Answer: kube-proxy is a network proxy that runs on each node in the cluster. It maintains
network rules that allow network communication to your Pods from network sessions
inside or outside of your cluster.
• Question: How can you update the image of a running pod?
– Answer: In Kubernetes, you don't update a pod directly. Instead, you would update a
Deployment that manages the pod. If you update the image in the Deployment, it will
create a new ReplicaSet and scale it up, while scaling down the ReplicaSet of the old
version.
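For example, assuming a Deployment and container both named myapp (placeholder names), the rolling image update could be triggered and observed with:
kubectl set image deployment/myapp myapp=myregistry.com/myapp:v2
kubectl rollout status deployment/myapp
kubectl rollout undo deployment/myapp
The last command rolls the Deployment back to the previous revision if the new image misbehaves.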

Kubernetes Interview Questions - Pods


• Question: What is the lifecycle of a Pod in Kubernetes?
– Answer: The lifecycle of a Pod in Kubernetes goes through several phases: Pending,
Running, Succeeded, Failed, Unknown.
• Question: How can you store sensitive information (like passwords) and make it
available to your pods?
– Answer: Sensitive information can be stored in Kubernetes using Secrets. The data in
Secrets is base64 encoded (not encrypted by default, so encryption at rest should be enabled
for the cluster), access to it is governed by role-based access control (RBAC), and it can be
exposed to pods as mounted volumes or environment variables.
• Question: What are Init Containers and how are they different from regular
containers in a Pod?
– Answer: Init Containers are specialized containers that run before app containers and can
contain utilities or setup scripts not present in an app image.
• Question: How do Kubernetes probes work, and how would you use them to
ensure your pods are healthy?
– Answer: Kubernetes provides liveness, readiness, and startup probes that are used to check
the health of your pods. Liveness probes let Kubernetes know if your app is alive or dead. If
your app is dead, Kubernetes removes the Pod and starts a new one to replace it. Readiness
probes let Kubernetes know if your app is ready to serve traffic. Startup probes indicate
whether the application within the container is started.
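A hedged example of liveness and readiness probes in a container spec (the /healthz and /ready endpoints and port 8080 are assumptions about the application):
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5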

Kubernetes Interview Questions - Pods


• Question: Can you describe what a sidecar pattern is and give a real-world example of when
you would use one?
– Answer: A sidecar pattern is a single-node pattern that consists of two containers. The first is the application
container, and the second, the sidecar, aims to enhance or extend the functionality of the first. A classic
example of a sidecar is a logging or monitoring agent running alongside an application.

• Question: How do you configure two containers in a pod to communicate with each other?
– Answer: Containers within the same pod share the same network namespace, meaning they can
communicate with each other using 'localhost'. They can also communicate using inter-process
communication (IPC), as they share the same IPC namespace.

• Question: Suppose your application writes logs to stdout. You need to send these logs to a
remote server using a tool that expects logs to be in a file. How would you solve this?
– Answer: This is a classic use case for a sidecar container. The application can continue to write logs to
stdout, and a sidecar container can collect these logs from the Docker log driver and write them to a file,
which can then be processed and sent to the remote server.

• Question: If the sidecar container fails, what happens to the main application container?
– Answer: By default, if a sidecar container fails, the main application container continues to run. However, it
might not function correctly if it depends on the sidecar. To ensure that both the main container and the
sidecar container are treated as a single unit, we can use a feature called Pod Lifecycle to control the startup
and shutdown behavior.

Kubernetes Interview Questions - Pods


• Question: A specific node in your cluster is underperforming, and you
suspect it's because of a particular pod. How would you confirm this and
solve the problem?
– Answer: I would first list the pods running on that node with kubectl get pods
--all-namespaces --field-selector spec.nodeName=<node-name>, and then check their CPU and
memory usage with kubectl top pod (kubectl describe node also shows the requests and limits
already committed on the node). If a particular pod is consuming too many resources, I would
either adjust the resource requests and limits for that pod or consider moving it to a
different node if the node's overall capacity is the issue.
• Question: You have a pod that needs to be scheduled on a specific type of
node (e.g., GPU-enabled). How would you ensure this happens?
– Answer: I can use NodeSelectors, Node Affinity/Anti-Affinity, or Taints and
Tolerations to influence pod scheduling. NodeSelectors are the simplest way to
constrain pods to nodes with specific labels. For more complex requirements, Node
Affinity/Anti-Affinity and Taints and Tolerations can be used.
• Question: How would you drain a node for maintenance while minimizing
disruption to running applications?
– Answer: You can use the kubectl drain command, which safely evicts all pods from
the node while respecting the PodDisruptionBudget. This ensures that the services
provided by the pods remain available during the maintenance.
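A typical maintenance sequence might look like this (the node name is a placeholder; --delete-emptydir-data was called --delete-local-data in older kubectl versions):
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# perform the maintenance, then
kubectl uncordon <node-name>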

Kubernetes Interview Questions - Pods


• Question: Your applications need to connect to a legacy system that uses IP whitelisting for
security. How can you ensure that traffic from your pods goes through a specific set of IPs?
– Answer: Kubernetes supports egress traffic control using Egress Network Policies or NAT gateways provided
by cloud providers. You can create an Egress Network Policy that allows traffic only to the legacy system, or
use a NAT gateway with a static IP address, and then whitelist that IP in the legacy system.

• Question: What can you do if your pods are frequently getting OOMKilled?
– Answer: If pods are frequently getting OOMKilled, it means they're trying to consume more memory than
their limit. To resolve this issue, I would first use kubectl describe pod to get more information about the
pod's resource usage. If the pod is indeed exceeding its memory limit, I would either increase the memory
limit (if feasible) or optimize the application to use less memory.

• Question: How would you configure a pod so that it automatically restarts if it exits due to an
error?
– Answer: The restart policy for a pod is controlled by the restartPolicy field in the pod specification. By
setting restartPolicy to Always, the pod will automatically restart if it exits with an error.

• Question: Your application needs to read a large dataset at startup, and this is causing long
startup times. How could you use an Init Container to solve this problem?
– Answer: An Init Container could be used to download the dataset and perform any necessary preprocessing.
The data could be stored in a volume that's shared with the application container. This way, by the time the
application container starts, the data is already prepared and ready to use, reducing startup time.
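A minimal sketch of this pattern, assuming a hypothetical dataset URL and using an emptyDir volume shared between the init container and the application container (the images, URL, and paths are placeholders):
spec:
  initContainers:
  - name: fetch-dataset
    image: curlimages/curl
    command: ["sh", "-c", "curl -fsSL -o /data/dataset.csv https://example.com/dataset.csv"]
    volumeMounts:
    - name: data
      mountPath: /data
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}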

Kubernetes Interview Questions - Pods


• Question: You have a stateful application that needs to persist data
between pod restarts. How would you accomplish this?
– Answer: To persist data across pod restarts, I would use a PersistentVolume (PV)
and PersistentVolumeClaim (PVC). The PVC would be used in the pod specification
to mount the PersistentVolume to the appropriate path in the container.
• Question: How would you prevent a pod from being scheduled on a
master node?
– Answer: Master nodes are tainted to prevent pods from being scheduled on them
by default. However, if needed, I could manually add a taint to the master nodes
using the kubectl taint command and then ensure the pods don't have a toleration
for this taint.
• Question: Your application needs to communicate with an external
service that uses a self-signed certificate. How would you configure your
pods to trust this certificate?
– Answer: I would create a Kubernetes Secret containing the certificate, and then
mount this secret as a volume in the pod. The application would then configure its
truststore to include this certificate.
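For illustration, assuming the certificate is in a local file named ca.crt and the secret and mount path names are placeholders:
kubectl create secret generic external-ca --from-file=ca.crt=./ca.crt
Then, in the pod spec:
volumes:
- name: external-ca
  secret:
    secretName: external-ca
containers:
- name: app
  image: myapp:latest
  volumeMounts:
  - name: external-ca
    mountPath: /etc/ssl/external
    readOnly: true
The application's truststore configuration would then be pointed at /etc/ssl/external/ca.crt as an additional trusted CA.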

Kubernetes Interview Questions - Pods


• Question: What is the main difference between a Deployment and a StatefulSet, and
when would you prefer one over the other?
– Answer: Deployments are great for stateless applications, where each replica is identical and
independent, whereas StatefulSets are used for stateful applications where each replica has a
unique and persistent identity and a stable network hostname. In scenarios where data
persistence and order of scaling and termination is crucial, we use StatefulSets. On the other
hand, Deployments are more suited for stateless services where scaling and rolling updates are
important.
• Question: How does a DaemonSet ensure that some or all nodes run a copy of a pod?
– Answer: The DaemonSet controller creates one pod for every node that matches the DaemonSet's
node selector and affinity rules, and each pod is pinned to its target node (the default scheduler
places it there using node affinity). When a node is added to the cluster, a new pod gets scheduled onto it,
and when a node is removed, the pod is garbage collected.
• Question: How can you achieve a run-to-completion scenario for a task in Kubernetes?
– Answer: A Job in Kubernetes would be suitable for a run-to-completion scenario. A Job creates
one or more pods and ensures that a specified number of them successfully terminate. When a
specified number of successful completions is reached, the Job is complete.
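A minimal Job sketch (the name, image, and command are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  completions: 1
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: myapp:latest
        command: ["./run-migration.sh"]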

Kubernetes Interview Questions - Deployment, StatefulSet, DaemonSet, Jobs, CronJobs

• Question: How do you execute a task at a specific time or periodically on the
Kubernetes cluster?
– Answer: A CronJob manages time-based Jobs in Kubernetes, specifically, Jobs that run at
predetermined times or intervals. This would be the ideal choice for scheduling tasks to run
at a specific time or periodically.
• Question: Can you explain how rolling updates work with Deployments?
– Answer: When a Deployment is updated, it creates a new ReplicaSet and gradually
increases the number of replicas in the new ReplicaSet as it decreases the number in the
old ReplicaSet. This achieves a rolling update, minimizing the impact on availability and load
handling capacity.
• Question: Suppose you have a multi-node Kubernetes cluster and you want to
ensure that an instance of a specific pod is running on each node, including
when new nodes are added to the cluster. How can you achieve this?
– Answer: In this case, you can use a DaemonSet. A DaemonSet ensures that all (or some)
nodes run a copy of a pod. When nodes are added to the cluster, the pods are added to
them. When nodes are removed from the cluster, those pods are garbage collected.

Kubernetes Interview Questions - Deployment, StatefulSet, DaemonSet, Jobs, CronJobs

• Question: How would you perform a rollback of a Deployment in Kubernetes?
– Answer: You can perform a rollback of a Deployment using the kubectl rollout undo
command. This will revert the Deployment to its previous state.

• Question: If a Job fails, how does Kubernetes handle it? Can you configure this
behavior?
– Answer: If a Job's pod fails, the Job controller will create a new pod to retry the task. You
can customize this behavior by adjusting the backOffLimit and activeDeadlineSeconds
parameters in the Job configuration.

• Question: How would you create a time-based job that removes temporary files
from your application's persistent volume every night at midnight?
– Answer: This is a classic use-case for a CronJob in Kubernetes. A CronJob creates Jobs on a
time-based schedule, and can be used to create a Job that runs a pod every night at
midnight to remove the temporary files.
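A hedged sketch of such a CronJob (the image, command, and PVC name are assumptions; the schedule "0 0 * * *" means every day at midnight):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-temp-files
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: busybox
            command: ["sh", "-c", "rm -rf /data/tmp/*"]
            volumeMounts:
            - name: app-data
              mountPath: /data
          volumes:
          - name: app-data
            persistentVolumeClaim:
              claimName: app-data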

Kubernetes Interview Questions - Deployment, StatefulSet, DaemonSet, Jobs, CronJobs

• Question: You have a stateful application that needs to maintain its state even when
rescheduled. How would you manage this application in Kubernetes?
– Answer: For stateful applications, it's typically best to use a StatefulSet rather than a Deployment.
StatefulSets maintain a sticky identity for each of their pods, which ensures that if a pod is rescheduled, it
can continue to access its persistent data and maintain the same network identity.

• Question: Imagine you need to deploy a stateful application, such as a database, on your
Kubernetes cluster. However, you're concerned about the possibility of losing data during an
update. How would you manage updates to ensure data integrity?
– Answer: When dealing with stateful applications, it's essential to ensure that updates do not lead to data
loss. One way to manage this is to use StatefulSets with a persistent volume for data storage. Before
updating, ensure you have a backup strategy in place. During the update, Kubernetes will update each pod
one at a time in a reverse order. This way, if there are issues with the update, you can halt the process and
minimize the impact.

• Question: In your application, you have a long-running job that can't be interrupted. However,
Kubernetes evicts it because it exceeds its memory limit. How would you prevent this from
happening in the future?
– Answer: You should consider setting both resource requests and limits in the pod specification. The request
should be the amount of memory the job needs to run under normal conditions, and the limit should be the
maximum amount of memory that the job can use. If the job requires more memory, you may need to
optimize it, increase its memory limit, or run it on nodes with more memory.

Kubernetes Interview Questions - Deployment, StatefulSet, DaemonSet, Jobs, CronJobs

• Question: You need to deploy a DaemonSet to help with monitoring, but you don't want it to
run on your GPU nodes as those are exclusively for model training jobs. How would you
configure this?
– Answer: You can use taints and tolerations for this. You could add a specific taint to your GPU nodes, like
kubectl taint nodes gpu-node key=value:NoSchedule. Then, you would not include a toleration for that taint
in your DaemonSet specification.

• Question: You need to perform a major upgrade to a stateful application, and you anticipate
that the new version might have compatibility issues with the old data. How would you manage
this upgrade?
– Answer: I would approach this cautiously by first backing up the data. Then, I would start by updating a
single instance (pod) of the application and check for compatibility issues. If there are problems, I would
revert that instance to the old version and work on data migration strategies.

• Question: You have a CronJob that's supposed to run every night, but you've noticed that it
doesn't always run successfully. You want to make sure that if the job fails, it is retried. How
would you accomplish this?
– Answer: You can configure the spec.backoffLimit field in the Job template of the CronJob. This field
represents the number of retries before marking the job as failed. Also, you can use
spec.activeDeadlineSeconds to specify the duration the job can stay active.

Kubernetes Interview Questions - Deployment, StatefulSet, DaemonSet, Jobs, CronJobs

• Question: You're running a cluster in a cloud environment, and you want to
make sure that a specific Deployment only runs on instances with SSD storage.
How can you ensure this?
– Answer: I would label the nodes that have SSD storage, like kubectl label nodes <node-
name> disktype=ssd. Then, in the Deployment specification, I would use a nodeSelector to
ensure that the pods are only scheduled on nodes with the disktype=ssd label.

• Question: You need to deploy a new version of a StatefulSet. However, the new
version includes a change to the volumeClaimTemplates. Kubernetes doesn't let
you update this field, so how can you deploy this change?
– Answer: To change the volumeClaimTemplates field, you would need to delete and recreate
the StatefulSet. However, you have to be careful not to delete the PersistentVolumeClaims
(PVCs) when deleting the StatefulSet, or you will lose your data. After recreating the
StatefulSet with the new volumeClaimTemplates, the existing pods will continue to use the
old PVCs, and new pods will use the new PVCs.

Kubernetes Interview Questions - Deployment, StatefulSet, DaemonSet, Jobs, CronJobs

• Question: How would you expose a service running inside your cluster to external traffic?
– Answer: We can use a Service of type LoadBalancer, NodePort, or an Ingress Controller. LoadBalancer type
creates an external load balancer and assigns a fixed, external IP to the service. NodePort exposes the
service on a static port on the node's IP. Ingress, however, can provide load balancing, SSL termination, and
name-based virtual hosting.

• Question: How do ClusterIP and NodePort services differ?


– Answer: ClusterIP exposes the service on a cluster-internal IP, making the service only reachable from within
the cluster. NodePort, on the other hand, exposes the service on each Node’s IP at a static port.

• Question: Your application is trying to communicate with a service in another namespace, but
the requests are not getting through. What could be causing this, and how would you resolve
it?
– Answer: This might be due to a Network Policy that restricts traffic to the service. You can inspect and
update the NetworkPolicy objects in the namespace of the service. Alternatively, the service may not be
configured correctly. You can use kubectl describe to check its endpoint and selectors.

Kubernetes Interview Questions - Services & Networking


• Question: What is the role of a CNI plugin in a Kubernetes cluster, and can you
name a few popular ones?
– Answer: CNI (Container Network Interface) plugins are responsible for setting up network
interfaces and configuring the network stack for containers. Popular CNI plugins include
Flannel, Calico, Cilium, and Weave.

• Question: How do you implement SSL/TLS for services in a Kubernetes cluster?


– Answer: You can use an Ingress controller that supports SSL termination. The SSL certificate
can be stored in a Secret, which the Ingress controller references.

• Question: How would you restrict certain pods from communicating with each
other in a cluster?
– Answer: This can be accomplished by using Network Policies. You can define egress and
ingress rules to control the flow of traffic to and from specific pods.
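As an illustration, a NetworkPolicy that only allows pods labelled app=frontend to reach pods labelled app=backend on port 8080 (the labels and port are placeholders; enforcement requires a CNI plugin that supports NetworkPolicies):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend-only
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080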

Kubernetes Interview Questions - Services & Networking


• Question: Suppose your service is under a DDoS attack. How can you protect it?
– Answer: I would use a combination of an Ingress controller and a cloud-based DDoS
protection service (or a WAF/CDN in front of the cluster). I could also limit the rate of
requests at the Ingress controller itself, for example using the rate-limiting features of the
NGINX Ingress Controller.

• Question: You have an application with services that need to discover each
other dynamically. How would you enable this?
– Answer: Services in Kubernetes are discoverable by other services in the same Kubernetes
cluster by default. This is accomplished using DNS. For example, a service named "my-
service" in the "my-namespace" namespace would be accessible at "my-service.my-
namespace".

• Question: You have a single replica of a service that you want to expose to the
internet. How would you do it and why?
– Answer: I would use a LoadBalancer service. This will create a cloud provider's load
balancer that automatically routes traffic from the external IP to the service's ClusterIP.

Kubernetes Interview Questions - Services & Networking


• Question: You are running a multi-tenant cluster where each team has their
own namespace. How would you isolate network traffic at the namespace
level?
– Answer: I would use Network Policies to isolate traffic at the namespace level. I could
define a default deny all ingress/egress traffic NetworkPolicy in each namespace, and then
create additional NetworkPolicies to allow specific traffic.

• Question: How would you load balance traffic between pods of a service in a
Kubernetes cluster?
– Answer: Kubernetes Services automatically load balance traffic between the pods that
match their selector. This works for both TCP and UDP traffic.

• Question: How would you restrict internet access for pods in a Kubernetes
cluster?
– Answer: I would use a NetworkPolicy to deny all egress traffic by default, and then define
additional NetworkPolicies to allow specific outbound traffic as necessary.

Kubernetes Interview Questions - Services & Networking


• Question: How do you manage DNS resolution for services within a Kubernetes
cluster?
– Answer: Kubernetes includes a DNS server (CoreDNS by default) for internal service
discovery. Services are automatically assigned a DNS name that follows the pattern: service-
name.namespace-name.svc.cluster.local.

• Question: What would you do if a NodePort service is not accessible from outside the cluster?
– Answer: I would first check if the nodes' firewall rules allow traffic on the NodePort. I would
also check if the correct node IP and NodePort are being used, and if the service has
endpoints and the pods are running.

• Question: How would you route HTTP traffic to two versions of an application
based on the path in the URL?
– Answer: I would use an Ingress with path-based routing rules. Each path would be
associated with a different backend service, which routes to pods of each version of the
application.
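A hedged Ingress sketch routing /v1 and /v2 to two different backend services (the host, service names, and ports are assumptions; TLS and controller-specific annotations are omitted):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: app-v1
            port:
              number: 80
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: app-v2
            port:
              number: 80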

Kubernetes Interview Questions - Services & Networking


• Question: You're seeing intermittent connectivity issues between your application and
a database service within your cluster. How would you troubleshoot this?
– Answer: I would first describe the service and the pods to check their status and events. I would
also check the service's endpoints. I could then look at the application and kube-proxy logs on
the nodes where the application and database pods are running.

• Question: You want to make sure your web application is not accessible via HTTP. How
would you enforce this policy?
– Answer: I would set up an Ingress that only accepts HTTPS traffic and redirects all HTTP traffic to
HTTPS.

• Question: Your application is deployed across multiple clusters, and you want to make
sure a user always connects to the closest cluster. How would you accomplish this?
– Answer: This can be accomplished using a Global Load Balancer provided by cloud providers or
DNS-based geographic routing.

Kubernetes Interview Questions - Services & Networking


• Question: How can you ensure that network traffic from your application to an
external service is secure?
– Answer: I would use a service mesh like Istio or Linkerd that supports mutual TLS for
service-to-service communication. This would encrypt the traffic between the application
and the external service.

• Question: How would you expose a legacy application running on a VM to services within your Kubernetes cluster?
– Answer: I would use a Service without selectors and manually create Endpoints that point
to the IP of the VM.

• Question: You've set up an Ingress with a wildcard host, but you're not able to
access your application using arbitrary subdomains. What could be the issue?
– Answer: This could be a DNS configuration issue. I would check if a wildcard DNS record has
been set up that resolves to the Ingress controller's external IP.

Kubernetes Interview Questions - Services & Networking


• Question: How would you ensure that only trusted traffic can reach a service in
your cluster?
– Answer: I would use Network Policies to restrict ingress traffic to the service, allowing only
from certain IP ranges or other services.

• Question: How would you configure your application to use an external database securely?
– Answer: I would use Kubernetes Secrets to store the database credentials. These secrets
can be mounted into the pods at runtime, keeping the credentials out of the application's
code and configuration.

• Question: How would you enable client source IP preservation in a LoadBalancer service?
– Answer: Set spec.externalTrafficPolicy: Local on the Service. This routes external traffic only
to node-local endpoints and preserves the client source IP (very old clusters used the
service.beta.kubernetes.io/external-traffic: OnlyLocal annotation for the same purpose). The exact
behaviour still depends on the cloud provider's load balancer implementation.

Kubernetes Interview Questions - Services & Networking


• Question: You want to migrate an application from a VM to a pod. The application
needs a specific IP, and you want to use the same IP in the pod. How would you do it?
– Answer: This is generally not possible in Kubernetes as pods have their own IP space. However,
some CNI plugins or cloud providers might support this use case. Alternatively, you can expose
the pod on the VM's IP using a NodePort service and bind the service to the VM's network
interface.
• Question: What are the potential downsides of using NodePort services?
– Answer: NodePort services expose the service on a high port (30000-32767) on all nodes, which
could be a security risk. They also require the client to be able to reach every node in the cluster.
• Question: How do you ensure that all incoming and outgoing traffic to a service in your
cluster goes through a network firewall?
– Answer: This can be accomplished using a service mesh like Istio or Linkerd that supports egress
and ingress gateway configurations.
• Question: You have two services that need to communicate with each other over a
protocol other than TCP or UDP. How do you configure this?
– Answer: By default, Services in Kubernetes support TCP and UDP. For other protocols, you may
need to use a CNI plugin that supports that protocol or use an application-level protocol proxy.

Kubernetes Interview Questions - Services & Networking


• Question: How can you ensure that services running in a development namespace cannot
communicate with services in a production namespace?
– Answer: I would use Network Policies to deny all traffic between the two namespaces by default and then
create additional NetworkPolicies to allow specific traffic as necessary.

• Question: How can you minimize the downtime during a rolling update of a service in
Kubernetes?
– Answer: I would use the readinessProbe and livenessProbe in the Pod specification to control the traffic to
the pods during the update. This way, new pods will not receive traffic until they are ready, and failed pods
will be restarted.

• Question: You are designing a service which needs to be accessible from both within the cluster
and from the internet, but you want to enforce different rules for internal and external traffic.
How would you do it?
– Answer: I would expose the service internally using a ClusterIP and externally using a LoadBalancer or
Ingress. This way, I can use Network Policies to control the intra-cluster traffic and the LoadBalancer or
Ingress controller's features to control the external traffic. Depending on the cloud provider and Ingress
controller, I might also be able to use different services or paths in the Ingress for the same set of pods, each
with different rules.

Kubernetes Interview Questions - Services & Networking


• Question: How would you prevent IP spoofing in your Kubernetes cluster?
– Answer: There are a few strategies that I could implement. At the node level, I could enable reverse path
filtering. Some CNI plugins and network policies can also help prevent IP spoofing. Using a service mesh or
enabling mutual TLS for service-to-service communication can also provide additional security.

• Question: You are running a latency-sensitive application. How would you minimize network
latency between your microservices?
– Answer: One way to do this would be to schedule pods that communicate with each other frequently on the
same node or at least in the same zone, using node/pod affinity and anti-affinity rules. I would also ensure
that the cluster's network is well optimized, and consider using a service mesh with features that help
reduce latency.

• Question: Your company follows strict data residency regulations. You need to ensure that a
service only communicates with a database in the same country. How do you enforce this?
– Answer: I would use Network Policies to restrict the egress traffic from the service to the IP range of the
database service in the same country. If the database is exposed as a service in the cluster, I could use a
policy based on namespaces or labels.

Kubernetes Interview Questions - Services & Networking


• Question: You need to implement an application-level gateway that performs complex routing,
transformation, and protocol translation. How would you do it in Kubernetes?
– Answer: I would consider using a service mesh, which can provide advanced traffic routing and
transformation features. Istio, for example, supports routing rules, retries, failovers, and fault injection. For
protocol translation, I would use an Envoy filter or a similar mechanism.

• Question: You're seeing packet loss between pods in your cluster. How would you investigate
and solve this issue?
– Answer: Packet loss could be caused by many factors. I would first use kubectl describe nodes to check the
status of the nodes. I could then use tools like ping, traceroute, or mtr to test the connectivity between
nodes and pods. I would also check the network policies and the CNI plugin's logs and metrics.

• Question: How would you ensure that a service in your Kubernetes cluster can only be accessed
from a specific country?
– Answer: Enforcing geographic restrictions at the Kubernetes level is not straightforward. I would typically
handle this at the edge of my network, before traffic reaches the cluster. This could be done using a cloud
provider's load balancer, a CDN service with geo-blocking features, or a firewall with geo-IP filtering
capabilities.

Kubernetes Interview Questions - Services & Networking


• Question: You're running a stateful application that requires sticky sessions. How would
you ensure that a client always connects to the same pod?
– Answer: I would use a Service with sessionAffinity set to "ClientIP". This will make the kube-
proxy route the traffic from a particular client IP to the same pod, as long as the pod is running.

• Question: How can you route traffic to pods based on HTTP headers or cookies?
– Answer: This can be done using an Ingress controller that supports this feature, such as the
NGINX Ingress Controller or Traefik, or a service mesh like Istio or Linkerd.

• Question: You have a multi-region cluster and you want to ensure that a service only
communicates with a database in the same region. How do you enforce this?
– Answer: If the database is running in a pod, I would use pod affinity rules (with a topology key
such as topology.kubernetes.io/region) to schedule the service's pods in the same region as the
database's pods. I could also use a NetworkPolicy to restrict traffic based on labels or namespaces.
If the database is external, I could use an egress gateway in a service mesh to control the
destination of outbound traffic.

Kubernetes Interview Questions - Services & Networking


• Question: You're managing a stateful application on Kubernetes that requires data persistence.
During a pod rescheduling event, you notice that the new pod can't access the data of the old
pod. What could be going wrong and how would you address it?
– Answer: It sounds like the application might not be using a PersistentVolume (PV) for data storage. A PV
would ensure that data is not lost when a pod is rescheduled. I would modify the application configuration
to use a PersistentVolumeClaim (PVC) to claim a PV for storage. This would allow the data to persist across
pod restarts or rescheduling.

• Question: You're given a scenario where you have an application that needs to store large
amounts of data but the reads and writes are intermittent. What type of storage class would
you choose in a cloud environment like AWS and why?
– Answer: For large amounts of data with intermittent access, object storage such as Amazon S3 is often
the most cost-effective choice, accessed directly by the application (or through a CSI driver) rather than
as a conventional block volume. If the data must be mounted as block storage, a StorageClass backed by the
'sc1' or 'st1' EBS volume types would be appropriate, as they are designed for infrequent, throughput-oriented
access.

• Question: You have a cluster running with various stateful and stateless applications. How do
you manage and orchestrate data backup and recovery for your stateful applications?
– Answer: I would use Persistent Volumes (PV) with Persistent Volume Claims (PVC) for each stateful
application to ensure data persistence. For data backup, I'd consider a cloud-native solution or third-party
tool like Velero, which can backup and restore Kubernetes resources and persistent volumes.

Kubernetes Interview Questions - Kubernetes Volumes & Data:
Persistent volumes, persistent volume claims, storage classes, stateful applications handling

• Question: You are running a multi-tenant Kubernetes cluster where each tenant should
only be able to access a certain amount of storage. How would you enforce this?
– Answer: Kubernetes has built-in support for Resource Quotas, which can be used to limit the
total amount of storage a namespace (tenant) can use. I would configure ResourceQuotas in
each tenant's namespace to limit the amount of storage they can request.
• Question: You are running a stateful application that requires a certain IOPS for
performance reasons. However, your cluster is running on a cloud provider where IOPS
is tied to the size of the disk. How do you manage this?
– Answer: I would create a PersistentVolume with a specific size to meet the IOPS requirements of
the stateful application. For example, in AWS, the number of provisioned IOPS is tied to the size
of the disk. Therefore, if a certain IOPS is required, you would have to provision a disk of an
appropriate size to meet that requirement.
• Question: How do you manage sensitive data, such as database passwords, that your
applications need to access?
– Answer: Sensitive data like passwords and API keys should be stored in Kubernetes Secrets.
Secrets are similar to ConfigMaps, but are designed to store sensitive information. This data can
then be mounted as a volume or exposed to a pod as an environment variable in a secure way.

Kubernetes Interview Questions - Kubernetes Volumes & Data:
Persistent volumes, persistent volume claims, storage classes, stateful applications handling

• Question: You have a stateful application that requires a certain layout on the
filesystem. How can you ensure this layout is prepared before the application starts?
– Answer: I would use an Init Container for this. The Init Container can run a script to prepare the
filesystem as required by the application. This might include creating directories, setting
permissions, or even downloading files. Once the Init Container has completed, the application
container starts and can make use of the prepared filesystem.
• Question: You have a stateful application that needs to process a huge data file.
However, you noticed that the processing starts from scratch when a pod gets
restarted. How would you solve this issue?
– Answer: This issue can be solved using a PersistentVolume (PV) with a PersistentVolumeClaim
(PVC). This allows the pod to mount the volume and continue the processing from where it left
off even after a restart.
• Question: How can you share a PersistentVolume across multiple pods in ReadWrite
mode?
– Answer: Most volume types do not support multiple pods mounting a volume in ReadWrite
mode. However, we can use a NFS (Network File System) or a cloud-based shared filesystem (like
AWS's EFS or GCP's Filestore) to achieve this.

Kubernetes Interview Questions - Kubernetes Volumes & Data:
Persistent volumes, persistent volume claims, storage classes, stateful applications handling

• Question: Your application needs to read configuration data at startup. This
data must not be stored in the container image for security reasons. How would
you provide this data to your application?
– Answer: I would use a Kubernetes Secret to store the configuration data. The Secret can be
mounted as a volume and read by the application at startup.
• Question: You need to set up a stateful, distributed database that requires each
node to have a unique, consistent identity. What Kubernetes resource would
you use?
– Answer: I would use a StatefulSet for this. A StatefulSet provides each pod with a unique,
consistent identifier that is based on its index, which makes it suitable for stateful,
distributed systems.
• Question: Your stateful application needs to access an existing NFS share. How
would you set up the Kubernetes resources to allow this?
– Answer: I would create a PersistentVolume with NFS as the volume type, and specify the
NFS server and path. Then, I would create a PersistentVolumeClaim for the application to
use, which would allow the pod to mount the NFS share.
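A minimal sketch, assuming an NFS server at 10.0.0.10 exporting /exports/app (the server address, path, names, and size are placeholders):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: 10.0.0.10
    path: /exports/app
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-nfs-pvc
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi
Setting storageClassName to an empty string prevents dynamic provisioning from kicking in, so the claim binds to the pre-created NFS PersistentVolume.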

Kubernetes Interview Questions - Kubernetes Volumes & Data:
Persistent volumes, persistent volume claims, storage classes, stateful applications handling

• Question: You need to dynamically provision storage for your pods. However,
your cluster is running in an on-premises data center, not in the cloud. How
would you achieve this?
– Answer: Dynamic provisioning requires a StorageClass. I would create a StorageClass that
uses a volume plugin that supports dynamic provisioning in an on-premises environment,
such as NFS, iSCSI, or Fibre Channel.
• Question: You are migrating an application to Kubernetes. The application
currently writes logs to a file, and you need to retain these logs for compliance
reasons. How would you handle this in Kubernetes?
– Answer: I would use a sidecar container that runs a logging agent in each pod. The
application would write logs to a shared volume, and the sidecar container would read
these logs and forward them to a log aggregation service.
• Question: You have a stateful application running in a StatefulSet. However, the
application does not handle SIGTERM gracefully and needs a specific command
to initiate shutdown. How would you handle this?
– Answer: I would use a preStop lifecycle hook to run the shutdown command when the pod
is going to be terminated. This gives the application the chance to shut down gracefully
before Kubernetes sends the SIGKILL signal.

Kubernetes Interview Questions - Kubernetes Volumes & Data:
Persistent volumes, persistent volume claims, storage classes, stateful applications handling

• Question: Your stateful application requires manual intervention when
scaling down. How can you control the scale-down process?
– Answer: A StatefulSet already scales down in a controlled way: pods are removed one at a time,
in reverse ordinal order, and the controller waits for each pod to terminate before removing the
next. For manual intervention, I would scale down one replica at a time (kubectl scale statefulset
<name> --replicas=<n>) and use a preStop lifecycle hook to run any cleanup that must happen before
a pod is removed.
• Question: How would you make a sensitive piece of information (like a
password or a token) available to your application?
– Answer: I would store the sensitive information in a Secret, and then mount that
Secret as a volume in the pod. The application could then read the sensitive data
from the volume.
• Question: Your application writes temporary data to an ephemeral
volume. However, this data is lost when a pod restarts. How can you
ensure the data survives a pod restart?
– Answer: I would use a PersistentVolumeClaim to request a PersistentVolume for
storing the temporary data. This would ensure the data survives a pod restart.

Kubernetes Interview Questions - kubernetes Volumes & Data:


Persistent volumes, persistent volume claims, storage classes,
stateful applications handling
• Question: You need to migrate data from an old PersistentVolume to
a new one. However, the data must be available to the application
at all times. How would you handle this?
– Answer: I would use a tool that can copy data between volumes while the
data is in use, such as rsync. First, I would start the rsync process to copy the
data to the new volume. Then, I would set up a periodic job to rsync the
changes until the new volume is up to date. At this point, I would schedule a
brief maintenance window to switch the application to the new volume.

Kubernetes Interview Questions - kubernetes Volumes & Data:


Persistent volumes, persistent volume claims, storage classes,
stateful applications handling
• Question: How would you prevent a pod from being evicted due to low
disk space on the node?
– Answer: I would monitor the node's disk usage and ensure there is enough capacity
for all pods. If a pod uses more storage than expected, I could set a resource limit
on the pod's storage usage to prevent it from using all the available disk space.
• Question: You need to expose a ConfigMap as a volume to a pod, but you
only want to expose a subset of the ConfigMap's data. How would you do
this?
– Answer: When defining the volume in the pod spec, I can use the items field to
specify which keys in the ConfigMap to expose.
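For instance, a pod-spec volume fragment (the ConfigMap name and key are placeholders) that exposes a single key might look like:

volumes:
- name: app-config
  configMap:
    name: my-config            # hypothetical ConfigMap
    items:
    - key: app.properties      # only this key is exposed
      path: app.properties     # file name under the mount path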
• Question: How would you provide an initialization script to a database
container at startup?
– Answer: I would create a ConfigMap with the initialization script and mount it as a
volume in the container. The database software should be configured to execute
any scripts it finds in the initialization directory.
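A sketch using the official postgres image, which executes scripts found in /docker-entrypoint-initdb.d on first initialization (the SQL content and password are placeholders; in practice the password would come from a Secret):

apiVersion: v1
kind: ConfigMap
metadata:
  name: db-init
data:
  init.sql: |
    CREATE DATABASE appdb;
---
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - name: postgres
    image: postgres:15
    env:
    - name: POSTGRES_PASSWORD
      value: example                              # demo only; use a Secret in practice
    volumeMounts:
    - name: init-scripts
      mountPath: /docker-entrypoint-initdb.d      # directory scanned by the image on first start
  volumes:
  - name: init-scripts
    configMap:
      name: db-init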

Kubernetes Interview Questions - kubernetes Volumes & Data:


Persistent volumes, persistent volume claims, storage classes,
stateful applications handling
• Question: How would you clean up a PersistentVolumeClaim and its
associated data when a pod is deleted?
– Answer: Deleting a pod never deletes its PersistentVolumeClaim; the claim and its data have to be removed explicitly (for example with kubectl delete pvc). If I also want the underlying storage released when the claim is deleted, I would set the persistentVolumeReclaimPolicy of the associated PersistentVolume (or the reclaimPolicy of its StorageClass) to Delete.

Kubernetes Interview Questions - kubernetes Volumes & Data:


Persistent volumes, persistent volume claims, storage classes,
stateful applications handling
• Question: You have a microservices architecture with multiple pods that
require the same configuration. How would you ensure consistent
configuration across all pods?
– Answer: I would use a ConfigMap to store the common configuration and mount it
as a volume or set environment variables in the pods. This way, all pods can access
the same configuration from the ConfigMap.

• Question: You have a configuration file that needs to be updated for a running application without restarting the pod. How can you achieve this in Kubernetes?
– Answer: I would store the file in a ConfigMap and mount it as a volume (not via subPath or environment variables). Updating the ConfigMap with kubectl apply or kubectl edit then propagates to the mounted files in the running pods after a short kubelet sync delay, with no restart required. The application does need to re-read the file, for example by watching it or reacting to a reload signal.

Kubernetes Interview Questions - kubernetes Configuration &


Secrets Management: ConfigMaps, Secrets, and managing
sensitive data.
• Question: How can you ensure that a Secret is encrypted at rest and in transit?
– Answer: Out of the box, Secrets are only base64-encoded and stored unencrypted in etcd, so encryption at rest must be enabled explicitly by configuring an EncryptionConfiguration on the kube-apiserver (or by using a managed cluster that enables it for you). For encryption in transit, configure the cluster components to communicate over secure channels such as TLS, which kubeadm-based and most managed clusters do by default.
• Question: You want to use an external database for your application running in
Kubernetes, but you don't want to expose the database credentials in your pod
specifications or configuration files. How can you manage this?
– Answer: I would store the database credentials in a Secret and then mount the Secret as a
volume or set environment variables in the pods. This way, the database credentials are
securely managed and not exposed directly in the configuration files.
• Question: Your application needs access to an API key or token for integration
with external services. How would you securely provide this information to the
application running in a pod?
– Answer: I would store the API key or token in a Secret and then mount the Secret as a
volume or set environment variables in the pods. This ensures that the sensitive
information is securely managed and easily accessible to the application.
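A minimal sketch of the environment-variable approach, with a hypothetical Secret named external-api and a placeholder token value:

apiVersion: v1
kind: Secret
metadata:
  name: external-api
type: Opaque
stringData:
  API_TOKEN: replace-me            # placeholder token value
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: my-app:latest
    env:
    - name: API_TOKEN
      valueFrom:
        secretKeyRef:
          name: external-api
          key: API_TOKEN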

Kubernetes Interview Questions - kubernetes Configuration &


Secrets Management: ConfigMaps, Secrets, and managing
sensitive data.
• Question: You have a third-party library that requires a configuration file to be present in a
specific location inside the pod. How would you provide this configuration file securely?
– Answer: I would create a ConfigMap with the required configuration file and mount it as a volume in the
pod, ensuring that the file is available at the expected location. This way, the configuration file can be
securely managed and accessed by the application.

• Question: How can you update the data in a ConfigMap or Secret without restarting the pods?
– Answer: Updating the data in a ConfigMap or Secret doesn't automatically trigger a rolling update of the
pods. However, you can use the kubectl rollout restart command to manually trigger a rolling restart of the
pods, which will ensure that the updated data is used.

• Question: You have a multi-tenant environment in Kubernetes, where each tenant has different
configuration requirements. How can you manage this effectively?
– Answer: I would use namespaces to separate the tenants and create ConfigMaps or Secrets specific to each
tenant. By applying proper RBAC (Role-Based Access Control), each tenant can access only their respective
ConfigMaps or Secrets, ensuring proper isolation and management of their specific configurations.

Kubernetes Interview Questions - kubernetes Configuration &


Secrets Management: ConfigMaps, Secrets, and managing
sensitive data.
• Question: You want to store sensitive data in a Secret, but you also
need to share it with another namespace. How can you achieve this
securely?
– Answer: Secrets are namespaced and cannot be referenced across namespaces, so I would export the Secret from the source namespace, strip the namespace-specific metadata, and apply it in the target namespace (or use a replication operator that keeps the copies in sync), as sketched below. This shares the data without exposing it outside the cluster.
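One way to do a one-off copy (Secret and namespace names are placeholders; this assumes jq is available on the workstation):

kubectl get secret shared-secret -n team-a -o json \
  | jq 'del(.metadata.namespace, .metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp)' \
  | kubectl apply -n team-b -f -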

Kubernetes Interview Questions - kubernetes Configuration &


Secrets Management: ConfigMaps, Secrets, and managing
sensitive data.
• Question: You have a scenario where the Secret data needs to be
updated frequently. How would you handle this situation without
causing downtime for the pods?
– Answer: I would regenerate the manifest with kubectl create secret ... --dry-run=client -o yaml and apply it with kubectl apply. Updating a Secret does not restart pods by itself: Secrets mounted as volumes are refreshed in place after a short delay, while Secrets consumed as environment variables require a kubectl rollout restart of the workload, which replaces pods gradually and avoids downtime.

Kubernetes Interview Questions - kubernetes Configuration &


Secrets Management: ConfigMaps, Secrets, and managing
sensitive data.
• Question: You have a scenario where Secrets need to be rotated
periodically for security compliance. How would you handle this in
Kubernetes?
– Answer: I would implement a process or automation that periodically generates
new Secrets with updated data. The new Secrets can be created alongside the
existing ones, and then a rolling update of the pods can be triggered to use the new
Secrets without any downtime.
• Question: Your application needs to access Secrets stored in an external
key management system (KMS). How can you integrate this securely with
Kubernetes?
– Answer: I would create a custom controller or operator that interfaces with the
external KMS and retrieves the Secrets as needed. The controller can then populate
the Secrets dynamically in Kubernetes, ensuring secure access to the external KMS
without exposing sensitive data in Kubernetes itself.

Kubernetes Interview Questions - kubernetes Configuration &


Secrets Management: ConfigMaps, Secrets, and managing
sensitive data.
• Question: You want to enforce fine-grained access control to Secrets based on roles
and permissions. How can you achieve this in Kubernetes?
– Answer: I would use Kubernetes RBAC (Role-Based Access Control) to define roles and
permissions for accessing Secrets. By creating appropriate Role and RoleBinding or ClusterRole
and ClusterRoleBinding configurations, access to Secrets can be restricted based on the specific
roles assigned to users or service accounts.
• Question: You have multiple applications that share a common Secret. However, you
want to restrict access to specific applications only. How would you handle this
situation?
– Answer: I would create separate namespaces for each application and associate the appropriate
ServiceAccounts with each application. Then, I would configure RBAC policies to grant access to
the specific Secrets only for the corresponding ServiceAccounts and applications.
• Question: Your organization has compliance requirements that mandate the auditing of
Secret access and modifications. How would you implement auditing for Secrets in
Kubernetes?
– Answer: I would enable auditing in the Kubernetes cluster and configure the audit policy to
include Secrets-related operations. This way, all access and modification of Secrets will be logged
and auditable for compliance purposes.
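As a sketch, an audit policy that records metadata (who, when, which Secret) for all Secret operations without logging the secret values could look like the following; it would be referenced via the kube-apiserver --audit-policy-file flag, with --audit-log-path pointing at the log destination:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log metadata for every Secret operation, but never request/response bodies,
# so secret values do not end up in the audit log.
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
# Ignore everything else in this minimal example.
- level: None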

Kubernetes Interview Questions - kubernetes Configuration &


Secrets Management: ConfigMaps, Secrets, and managing
sensitive data.
• Question: You need to ensure that Secrets are securely replicated
and available across multiple Kubernetes clusters in different
regions or availability zones. How would you implement this?
– Answer: I would consider using Kubernetes federation or a multi-cluster
management solution to manage the replication and availability of Secrets
across multiple clusters. These solutions provide mechanisms to synchronize
Secrets across clusters securely.
• Question: Your application needs to access multiple Secrets, but you
want to avoid hard-coding Secret names or keys in your code. How
can you dynamically discover and use Secrets in Kubernetes?
– Answer: I would use the Kubernetes API to dynamically discover and retrieve
Secrets based on certain criteria, such as labels or annotations. This allows
for more flexible and dynamic handling of Secrets within the application
code.
Kubernetes Interview Questions - kubernetes Configuration &
Secrets Management: ConfigMaps, Secrets, and managing
sensitive data.
• Q: What is the primary purpose of Kubernetes RBAC?
– A: Kubernetes Role-Based Access Control (RBAC) is used to control who can access the
Kubernetes API and what permissions they have. It is used to restrict system access to authorized
users and helps in maintaining the security of your Kubernetes environment.

• Q: What is a Role in Kubernetes RBAC and how does it differ from a ClusterRole?
– A: In Kubernetes RBAC, a Role is used to grant access rights to resources within a specific
namespace, whereas a ClusterRole is a non-namespaced resource that grants access at the
cluster level across all namespaces.

• Q: How do you bind a user to a Role or ClusterRole in Kubernetes?


– A: To bind a user to a Role or ClusterRole in Kubernetes, you need to create a RoleBinding or
ClusterRoleBinding, respectively. These binding resources associate the Role or ClusterRole with
one or more users, groups, or service accounts.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: What is a NetworkPolicy in Kubernetes?
– A: NetworkPolicy is a specification of how groups of pods are allowed to communicate with each
other and other network endpoints. It defines the rules for ingress (incoming) and egress
(outgoing) traffic for a set of pods.

• Q: What is a SecurityContext at the Pod level in Kubernetes?


– A: A SecurityContext defines privilege and access control settings for a Pod or Container. When
defined at the Pod level, it applies to all containers in the Pod.

• Q: How do you define a security context for a specific container in a Pod in Kubernetes?
– A: To define a security context for a specific container in a Pod, you include the securityContext
field in the container's definition within the Pod's configuration file.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: How do you enforce network policies in Kubernetes?
– A: Network policies are enforced in Kubernetes using a network plugin that
understands the NetworkPolicy resource, such as Calico or Weave. If no
network plugin is enabled, NetworkPolicy resources have no effect.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: In Kubernetes RBAC, what's the difference between a RoleBinding and
ClusterRoleBinding?
– A: A RoleBinding grants the permissions defined in a role to a user within a certain namespace,
whereas a ClusterRoleBinding grants the permissions defined in a ClusterRole across the entire
cluster, irrespective of the namespace.

• Q: What are some of the security risks that can be mitigated using Kubernetes RBAC?
– A: Some security risks mitigated by RBAC include unauthorized access to the Kubernetes API,
unauthorized actions on resources (like pods, services), and restriction of system access to
authorized users only.

• Q: How would you restrict a user's access to only view Pods within a specific
namespace using Kubernetes RBAC?
– A: Create a Role with get, list, and watch permissions on pods, and then bind that role to the
user using a RoleBinding within the specific namespace.
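A minimal sketch of such a Role and RoleBinding (namespace and user name are placeholders):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-viewer
  namespace: dev                 # the namespace to restrict access to
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-viewer-binding
  namespace: dev
subjects:
- kind: User
  name: jane                     # placeholder user name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-viewer
  apiGroup: rbac.authorization.k8s.io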

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: What steps would you take to secure sensitive data, like passwords or keys, in
Kubernetes?
– A: Use Kubernetes Secrets or integrate with a secure vault system to store sensitive data. Secrets
can be volume mounted into pods for applications to consume.

• Q: If a Pod needs to run with root privileges, how would you define this using Security
Contexts?
– A: You can define this in the securityContext at the container level in the Pod specification by
setting the runAsUser field to 0.

• Q: What purpose does setting the readOnlyRootFilesystem field in a SecurityContext serve?
– A: Setting readOnlyRootFilesystem to true in a SecurityContext is a good practice to prevent
modifications to the container's filesystem, thus limiting the impact of potential attacks like
installing malicious software.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: If a network policy is not defined in a namespace, what is the default network traffic
behavior for Pods?
– A: If a network policy is not defined in a namespace, the default behavior is to allow all ingress and egress
traffic to and from Pods in that namespace.

• Q: How would you prevent Pods from different namespaces from communicating with each
other?
– A: This can be achieved by creating NetworkPolicies that deny all non-namespace traffic by default and only
allow traffic from the same namespace.

• Q: How would you ensure that a set of Pods can only communicate with a specific service?
– A: NetworkPolicies select pods rather than Services, so I would apply an egress NetworkPolicy to the set of Pods that only allows traffic to the pods behind that Service (matched with the same labels the Service's selector uses) on the Service's target port, as sketched below.
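A sketch under the assumption that the restricted pods are labelled app=frontend and the Service selects app=backend on port 8080 (all names and ports are placeholders):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress-to-backend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: frontend              # the set of pods being restricted
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: backend           # the pods behind the target Service
    ports:
    - protocol: TCP
      port: 8080                 # the Service's target port

In practice you would usually also allow egress to the cluster DNS (port 53) so the pods can still resolve the service name.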

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: What is the purpose of Kubernetes Secrets, and how are they
different from ConfigMaps?
– A: Kubernetes Secrets are intended to hold sensitive information, such as
passwords, OAuth tokens, and ssh keys, while ConfigMaps hold non-
confidential data like configuration files and environment-specific settings.
Secrets provide more security for sensitive information, as they can be
encrypted at rest and in transit.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: How can you limit the system resources (CPU, memory) that a container can use in
Kubernetes?
– A: Kubernetes allows you to specify resource limits and requests for containers using the resources field in
the container specification. This helps to avoid resource starvation and ensures fair resource allocation
among all Pods in the cluster.

• Q: In Kubernetes, how would you enforce that containers don't run using the root user?
– A: You can define this in the securityContext at the Pod or container level by setting the runAsNonRoot field
to true.

• Q: In the context of Kubernetes RBAC, what is impersonation and when might you use it?
– A: Impersonation, or user impersonation, allows users to act as other users. This is helpful for admins who
need to debug authorization policies. In Kubernetes, impersonation can be achieved using the --as flag in
kubectl commands.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: If a specific service account needs permissions to create pods in any namespace, how would
you implement it using Kubernetes RBAC?
– A: You would create a ClusterRole with permissions to create pods, then bind that ClusterRole to the service
account using a ClusterRoleBinding.
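A minimal sketch, assuming a hypothetical service account ci-runner in the ci namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-creator
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-creator-binding
subjects:
- kind: ServiceAccount
  name: ci-runner                # placeholder service account
  namespace: ci                  # namespace where the service account lives
roleRef:
  kind: ClusterRole
  name: pod-creator
  apiGroup: rbac.authorization.k8s.io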

• Q: How do Kubernetes NetworkPolicies interact with other firewall policies implemented in the
cluster?
– A: Kubernetes NetworkPolicies define how pods communicate with each other and other network
endpoints within the Kubernetes cluster. If other firewall policies are implemented, they should be
coordinated with the NetworkPolicies to ensure they do not contradict and override each other.

• Q: What is a privileged container in Kubernetes, and what security risks does it pose?
– A: A privileged container in Kubernetes is one that is given essentially all the same privileges as a process
running directly on the host machine. This poses significant security risks, as such a container can potentially
gain full control of the host machine, escape the container, or disrupt other containers.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: How would you apply the principle of least privilege when configuring RBAC in a Kubernetes
cluster?
– A: When configuring RBAC, the principle of least privilege can be applied by only granting the permissions
necessary for a user, group, or service account to perform their intended tasks. This can be done by creating
fine-grained roles and assigning them using role bindings as needed.

• Q: How can you prevent a Kubernetes service account from accessing the Kubernetes API?
– A: By default, service accounts have no permissions unless explicitly assigned with RBAC. If the service
account has been granted permissions and you want to prevent it from accessing the API, you would need
to modify or delete the corresponding RoleBinding or ClusterRoleBinding.

• Q: How can you configure a Pod to use a specific service account?


– A: In the Pod specification, set the serviceAccountName field to the name of the service account you want
the Pod to use.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: In Kubernetes RBAC, can a user have multiple roles?
– A: Yes, a user can have multiple roles. This is achieved by creating multiple RoleBindings or
ClusterRoleBindings for the user, each associated with a different role.

• Q: What are Pod Disruption Budgets (PDBs) in Kubernetes and how do they relate to
Kubernetes security?
– A: Pod Disruption Budgets (PDBs) are a Kubernetes feature that allows you to specify the number or
percentage of concurrent disruptions a Pod can tolerate. While not directly a security feature, they can help
maintain the availability of your applications during voluntary disruptions, which contributes to the overall
robustness of your system.

• Q: What are taints and tolerations in Kubernetes, and how can they be used to improve cluster
security?
– A: Taints and tolerations are a Kubernetes feature that allows you to constrain which nodes a Pod can be
scheduled on. By using taints and tolerations, you can ensure that certain Pods only run on trusted nodes,
improving your cluster's security.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: What is the Kubernetes Audit feature and how does it contribute
to the security of a cluster?
– A: The Kubernetes Audit feature records all requests made to the
Kubernetes API server. The audit logs can be used for troubleshooting,
monitoring suspicious activity, and investigating potential security breaches.

• Q: How can you rotate the certificates used by Kubernetes components for secure communication?
– A: The kubelet supports automatic rotation of its client (and optionally serving) certificates as they approach expiry. For the control-plane certificates in kubeadm-based clusters, certificates are renewed automatically during cluster upgrades and can also be renewed manually with kubeadm certs renew.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: In the context of Kubernetes RBAC, what are aggregated roles?
– A: Aggregated roles allow a ClusterRole to be assembled from multiple ClusterRoles. When a ClusterRole has
the aggregationRule field set, the RBAC controller creates or updates the role with any permissions from
other ClusterRoles that match the provided label selector.

• Q: How can you use RBAC to control access to the Kubernetes Dashboard?
– A: You can create a Role or ClusterRole with the necessary permissions, and then bind that role to the
Dashboard's service account using a RoleBinding or ClusterRoleBinding.

• Q: What are admission controllers in Kubernetes, and how do they contribute to the security of
a cluster?
– A: Admission controllers are part of the kube-apiserver that intercept requests to the Kubernetes API server
prior to persistence of the object, but after the request is authenticated and authorized. They can be used
to enforce security policies, limit resource usage, and implement custom logic.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: What would you do if you need to create an RBAC Role that doesn't
map directly to the API resources in Kubernetes?
– A: For such a case, you would need to use Non-Resource URLs to specify the non-
resource request paths as a part of your RBAC Role.

• Q: How would you allow a user to drain a node in Kubernetes using RBAC?
– A: Draining a node requires a variety of permissions. The user must have 'list', 'get',
'create', 'delete' permissions for pods and 'update' permission for nodes. You can
create a custom ClusterRole with these permissions and bind it to the user with a
ClusterRoleBinding.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: How can you use Security Context to prevent a container from
making changes to its filesystem?
– A: By setting readOnlyRootFilesystem: true in the container's
SecurityContext, the container will have its filesystem mounted as read-only
and cannot write to its filesystem.

• Q: How would you enforce that network egress from a namespace only goes to specific IP addresses?
– A: You can create a NetworkPolicy that selects all pods in the namespace and defines Egress rules whose to field uses ipBlock entries listing the allowed IP addresses or CIDR ranges, as sketched below. Remember to also allow DNS egress if the workloads need to resolve names.
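A sketch of such a policy (namespace name and CIDR are placeholders):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-to-allowed-ips
  namespace: restricted           # the namespace being locked down
spec:
  podSelector: {}                 # applies to all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24      # placeholder allowed range
  - to:                           # keep DNS working
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53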

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: How can you rotate a service account token in Kubernetes?
– A: To rotate a service account token in Kubernetes, delete the Secret containing the token.
Kubernetes will automatically create a new token.

• Q: How can you prevent certain Pods from being scheduled on a specific node?
– A: You can use taints and tolerations, or you can use Node Affinity/Anti-Affinity rules.

• Q: How can you ensure that all images deployed in your cluster are from a
trusted registry?
– A: You can implement an ImagePolicyWebhook admission controller that enforces that all
images are pulled from a trusted registry.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: How can you prevent containers in your cluster from running as root, while allowing specific
containers to do so if necessary?
– A: Set the runAsNonRoot: true option in the PodSecurityContext, and override this setting in the
SecurityContext for specific containers where necessary.

• Q: If a container needs to use a hostPath volume, how can you ensure that it can't read or write
any other files on the node's filesystem?
– A: You can set the readOnly: true option in the volumeMounts section of the container specification.
However, the use of hostPath volumes is generally discouraged due to the potential security risks.

• Q: How can RBAC rules be tested and validated to ensure they're functioning as expected?
– A: You can use the kubectl auth can-i command to test whether a user has a specific permission.
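For example (namespace, user, and service account names are placeholders):

# Can the current user list pods in the dev namespace?
kubectl auth can-i list pods -n dev

# Can a specific service account create deployments? (uses impersonation)
kubectl auth can-i create deployments -n ci \
  --as=system:serviceaccount:ci:ci-runner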

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• Q: How can you restrict a Pod's network access to only its own namespace?
– A: You can define a NetworkPolicy that restricts ingress and egress to only the same namespace.

• Q: How can you use RBAC to allow a user to perform operations (like get, list, watch)
on any "pods/log" in a specific namespace?
– A: You can define a Role that allows get, list, and watch on pods/log in the specific namespace,
and then bind the user to this Role using a RoleBinding.

• Q: What would happen if a Pod has both a PodSecurityContext and a SecurityContext set? Which one takes precedence?
– A: Both are applied. For any field set in both places, the container-level SecurityContext takes precedence over the pod-level PodSecurityContext; pod-level settings still apply to fields the container does not override.

Kubernetes Interview Questions - kubernetes RBAC & Security:


Understanding of Role-Based Access Control, Security Contexts,
Network Policies, and overall Kubernetes cluster security.
• What is the difference between Requests and Limits in Kubernetes?
– Requests are what the container is guaranteed to get: the scheduler only places the pod on a node that can provide the requested resources. Limits, on the other hand, are the maximum a container can use: a container that exceeds its memory limit is OOM-killed, while CPU usage above the limit is throttled rather than terminated.
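A minimal sketch of requests and limits on a container (image and values are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "250m"        # reserved for scheduling; guaranteed to the container
        memory: "256Mi"
      limits:
        cpu: "500m"        # CPU above this is throttled
        memory: "512Mi"    # exceeding this gets the container OOM-killed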

• Can a pod function without specifying resource requests and limits?


– Yes, a pod can function without specifying resource requests and limits. However, it's not
recommended for production environments since this could lead to resource starvation or
overutilization of resources.

• Explain how Kubernetes handles resource allocation if you don't specify Requests and
Limits.
– If you don't specify requests and limits, the scheduler reserves nothing for the pod and its containers may consume whatever is free on the node. The pod falls into the BestEffort QoS class, so it is the first to be throttled or evicted under resource pressure, and it can also starve other workloads.

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• What is a Resource Quota in Kubernetes?
– Resource Quotas are a feature in Kubernetes that allows administrators to limit the amount
of resources a namespace can use.
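A sketch of a ResourceQuota (namespace name and values are placeholders):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"        # total CPU requests allowed in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"               # object-count limit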

• Can you have more than one Resource Quota in a namespace?


– Yes, a namespace can have multiple Resource Quotas, and every request is evaluated against each quota that covers it. In effect the most restrictive quota applies; the quotas do not add up to a combined total.

• Explain how a Limit Range works in Kubernetes.


– A Limit Range sets minimum and maximum compute resource usage per Pod or Container
in a namespace. If a resource (like a pod or a container) is created or updated without any
resource specification, the Limit Range policy can automatically set default resource
requests/limits.
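A sketch of a LimitRange providing defaults and bounds per container (namespace and values are placeholders):

apiVersion: v1
kind: LimitRange
metadata:
  name: per-container-limits
  namespace: dev
spec:
  limits:
  - type: Container
    default:               # limits applied when a container specifies none
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:        # requests applied when a container specifies none
      cpu: "250m"
      memory: "256Mi"
    max:
      cpu: "1"
      memory: "1Gi"
    min:
      cpu: "100m"
      memory: "128Mi"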

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• What happens if a Pod exceeds its specified Limits?
– If a Pod tries to use more resources than its limit, it will be terminated and will be subject to
restarting depending on its restartPolicy.

• How does Kubernetes ensure resource isolation between different pods?


– Kubernetes uses cgroups (a Linux kernel feature) to isolate resources among different pods.

• What would happen if you set a Resource Quota that's less than the sum of
Requests of all pods in the namespace?
– Kubernetes does not prevent you from creating such a Resource Quota; it is not validated against existing usage. Running pods keep running, but the namespace is then over quota, so new pods (and other resources covered by the quota) are rejected until usage drops back below the limit.

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• How does Kubernetes handle memory management for Pods and containers?
– Kubernetes allows administrators to set both Requests and Limits for memory. If a container
tries to use more than its memory limit, it will be terminated. If it uses more than its
request, it might be evicted depending on overall cluster memory usage.

• What are the default Requests and Limits for a Pod if not explicitly specified?
– If Requests and Limits are not specified, Kubernetes does not limit the resources a Pod can
use. The Pod's QoS class is "BestEffort" in this case.

• Can a Pod have different resource requests/limits for each of its containers?
– Yes, each container in a Pod can specify its own resource requests and limits.

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• How can you view the resource usage of a Pod?
– You can view the resource usage of a Pod using the kubectl top pod <pod-name>
command.

• How does setting Requests and Limits impact Pod scheduling?


– When scheduling a Pod, Kubernetes ensures that the total resource requests of all
containers in the Pod can be met by a single Node.

• How does Kubernetes handle Pods that consume too much CPU?
– If a Pod uses more CPU than its limit, it will not be terminated but will have its CPU
usage throttled.

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• What happens if a Pod tries to exceed its resource quota?
– If a Pod tries to exceed its resource quota, the API server will not allow it to be created.

• What types of resources can be limited using a Limit Range?


– A Limit Range can be used to limit CPU, memory, and storage requests and limits per Pod or
Container.

• What happens if you create a Pod that exceeds the Limit Range for its
namespace?
– If a Pod exceeds the Limit Range for its namespace, the API server will not allow it to be
created.

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• How does a Resource Quota work with a Limit Range in the same namespace?
– A Resource Quota sets aggregate limits for the namespace, whereas a Limit Range controls
the minimum and maximum resource usage per Pod or Container.

• What is the difference between a Resource Quota and a Limit Range?


– A Resource Quota is used to limit the total amount of resources that can be used in a
namespace, while a Limit Range sets minimum and maximum compute resource usage per
Pod or Container.

• Can a namespace have multiple Limit Ranges?


– Yes, a namespace can have multiple Limit Ranges, but they cannot conflict with each other.

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• How does Kubernetes prioritize Pods when resources are scarce?
– Kubernetes uses Quality of Service (QoS) classes to prioritize pods. Pods with
"Guaranteed" QoS class have the highest priority, followed by "Burstable"
and "BestEffort".

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• What resources does a Resource Quota limit?
– A Resource Quota can limit compute resources like CPU and memory, storage
resources, and object count like Pods, Services, PersistentVolumeClaims, etc.

• How can you determine the resource consumption of a namespace?


– You can use the kubectl describe namespace <namespace> command to see the
Resource Quota and usage of a namespace.

• Can you change the Requests and Limits of a running Pod?


– No, you cannot change the Requests and Limits of a running Pod. You need to
create a new Pod with the updated values.

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• What units are used for CPU Requests and Limits?
– CPU resources are measured in milliCPU units. 1 CPU is equivalent to 1000m.

• Can you specify different requests and limits for different containers
within the same Pod?
– Yes, each container in a Pod can have its own requests and limits.

Kubernetes Interview Questions - kubernetes Resource


Management: Understanding of requests and limits, Resource
Quota, Limit Ranges
• How does the kubelet handle OOM (Out of Memory) situations?
– When a container exceeds its memory limit, the Linux kernel's OOM killer terminates the offending process and the container restarts according to its restartPolicy. When the node itself runs low on memory, the kubelet additionally performs node-pressure eviction, evicting BestEffort and Burstable pods first.

• How does the kube-reserved and system-reserved flags affect resource allocation?
– The kube-reserved and system-reserved flags allow you to reserve a portion of the node's resources for the
Kubernetes system processes and the rest of the system processes, respectively. This ensures that these
processes always have sufficient resources to run.

• How do resource requests and limits affect QoS (Quality of Service) classes?
– Pods that have both CPU and memory requests and limits set to the same values are assigned a QoS class of
"Guaranteed". Pods with any of those not set or set to different values are assigned a QoS class of
"Burstable". Pods that don't have requests and limits set are assigned a QoS class of "BestEffort".

Kubernetes Interview Questions - Kubernetes Resource Management:
Understanding of requests and limits, Resource Quota, Limit Ranges
• What happens when a Node runs out of allocatable resources?
– If a Node runs out of allocatable resources, new Pods cannot be scheduled on it. If
a Pod is already running on the node and tries to use more resources than
available, it may be evicted or its execution may be throttled.
• How can you restrict certain types of resources using a Resource Quota?
– A Resource Quota can be configured to restrict the quantity of various types of
resources in a namespace, such as the number of Pods, Services,
PersistentVolumeClaims, etc. You can also restrict the amount of CPU, memory, and
storage resources used in the namespace.
• Can you apply a Resource Quota to multiple namespaces?
– Resource Quotas are applied per namespace. If you want to enforce similar quotas
across multiple namespaces, you'd have to define a Resource Quota for each
namespace.

Kubernetes Interview Questions - Kubernetes Resource Management:
Understanding of requests and limits, Resource Quota, Limit Ranges
• Question: You have an application running on a Kubernetes cluster, but
the app is not responding as expected. How can you view the logs for a
specific pod to troubleshoot the issue?
– Answer: You can use the kubectl logs command to view the logs of a pod. For
example, if your pod's name is my-app-pod, the command would be kubectl logs
my-app-pod.
• Question: One of your worker nodes has been marked as 'NotReady'.
How can you identify what's wrong?
– Answer: You can use kubectl describe node <node-name> to view detailed
information about the node and identify any issues.
• Question: How would you drain a node for maintenance?
– Answer: You can use the command kubectl drain <node-name>. This evicts or
deletes all pods on the node and marks the node as unschedulable.
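For example (the node name is a placeholder; flag names can vary slightly between kubectl versions):

# Cordon and drain the node, evicting workloads safely:
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# After maintenance, make the node schedulable again:
kubectl uncordon node-1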

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: What is the process for upgrading a Kubernetes cluster using
kubeadm?
– Answer: The general steps involve first upgrading kubeadm on your control plane,
then upgrading the control plane components, and finally upgrading the worker
nodes.
• Question: How would you access the kube-apiserver logs for debugging?
– Answer: The method depends on how your Kubernetes cluster is set up. If you're
using a system with systemd, you can use journalctl -u kube-apiserver. If your kube-
apiserver runs as a container, you can use docker logs or kubectl logs depending on
your setup.
• Question: A pod in your Kubernetes cluster is not reachable from the
outside world. How can you troubleshoot the issue?
– Answer: You could check the service that exposes the pod to ensure it's correctly
configured and its endpoints are correctly associated. You could also check network
policies and routing within your cluster.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: How would you view the events related to a specific pod in
Kubernetes?
– Answer: You can use the command kubectl describe pod <pod-name> to see the
events related to a specific pod.
• Question: What are the steps to debug a pod that is continually
restarting?
– Answer: First, view the logs of the pod with kubectl logs <pod-name>. Then,
describe the pod using kubectl describe pod <pod-name> to view events and
additional details.
• Question: How can you view resource utilization in your Kubernetes
cluster?
– Answer: Use the kubectl top command to view resource utilization. For example,
kubectl top nodes to view node resource usage or kubectl top pods to view pod
resource usage.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: How would you debug a pod that is failing to schedule?
– Answer: Use the kubectl describe pod <pod-name> command to view the events
and error messages associated with the pod scheduling attempt.

• Question: How can you check if a specific service is correctly routing


traffic to its pods?
– Answer: You can use the kubectl describe svc <service-name> command to view the
Endpoints section which lists the pods the service is routing traffic to.

• Question: How can you debug a pod that is in a 'Pending' state?


– Answer: Use kubectl describe pod <pod-name> to check the events and error
messages. The issue could be due to insufficient resources on the nodes, node
taints, or persistent volume claims not being fulfilled.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: What could cause a node to be marked as 'NotReady' in
Kubernetes?
– Answer: A node could be marked 'NotReady' due to several reasons, such as a
kubelet problem, network connectivity issue, or if the node is running out of
resources.
• Question: How would you enable verbose logging for the kubelet for
debugging?
– Answer: You can adjust the verbosity of kubelet logging by setting the -v or --v
command-line flag. For instance, kubelet -v=2 would set verbosity to level 2.
• Question: How can you determine if a specific Kubernetes service is
exposing the correct ports?
– Answer: You can use the kubectl get svc <service-name> command to view the
ports exposed by the service.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: You suspect a node in your Kubernetes cluster might be experiencing
high disk I/O, which is impacting application performance. How can you confirm
this?
– Answer: You can use the iostat tool on the node itself to monitor disk I/O.
• Question: How can you check the version of your Kubernetes cluster and its
components?
– Answer: Use kubectl version to view the version of the client and the server. For
component-specific versions, you can access the /version endpoint on the component's
HTTP(S) server, e.g., [master-node-ip]:6443/version.
• Question: What is the role of the kube-apiserver in a Kubernetes cluster, and
how would you diagnose issues with it?
– Answer: The kube-apiserver is the front-end for the Kubernetes control plane and exposes
the Kubernetes API. If you suspect issues with the kube-apiserver, you can check its logs or
use the kubectl get componentstatuses command.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: What can cause a service in Kubernetes to be inaccessible from outside the cluster?
– Answer: This could be due to various reasons including but not limited to misconfiguration of the service
type (e.g., it should be a LoadBalancer or NodePort for external access), issues with the Ingress controller, or
network policies blocking access.

• Question: You are not able to deploy a new application due to insufficient CPU resources. How
would you solve this?
– Answer: You can solve this by scaling your cluster by adding more nodes or upgrading your nodes to ones
with more resources. Alternatively, you could also optimize resource requests/limits for your existing
workloads.

• Question: A pod stays in a 'ContainerCreating' status for a long time. How would you debug
this?
– Answer: This often indicates an issue with pulling the container image. You can use kubectl describe pod
<pod-name> to check the events and get more information.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: Your Kubernetes cluster is running low on memory. How can you
identify which pods are consuming the most memory?
– Answer: Use the kubectl top pods command, which will show the CPU and memory usage
of each pod. You can sort and filter this list to identify the biggest consumers.
• Question: You have performed a cluster upgrade and some applications are
now behaving unexpectedly. How can you roll back the cluster upgrade?
– Answer: If you used kubeadm to upgrade, you can also use it to downgrade your cluster. You
would need to downgrade each control plane node and then each worker node individually.
Always ensure you have a good backup strategy in place in case of such scenarios.
• Question: How can you ensure that a specific pod is always scheduled on a
specific node?
– Answer: You can use nodeSelector, node affinity, or taints and tolerations to ensure a pod is
scheduled on a specific node.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: How can you diagnose issues with persistent volume claims in
Kubernetes?
– Answer: You can use kubectl describe pvc <pvc-name> to get more information
about the PVC, such as its status and events.
• Question: If a node becomes unresponsive, how would you remove it
from the cluster?
– Answer: First, you would drain the node with kubectl drain <node-name>. After
that, you can remove it with kubectl delete node <node-name>.
• Question: What is the best way to monitor the health of a Kubernetes
cluster?
– Answer: You can use monitoring tools like Prometheus and Grafana to collect and
visualize metrics from your cluster. You can also use logging solutions like Fluentd
and ELK (Elasticsearch, Logstash, Kibana) to centralize your logs.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: How can you debug a pod that is failing readiness checks?
– Answer: You can use kubectl describe pod <pod-name> to view the pod's events and identify
why the readiness probe is failing. The issue could be in the readiness probe configuration or in
the application itself.

• Question: How can you check the kubelet logs on a specific node?
– Answer: This depends on your setup. If kubelet runs as a systemd service, you can use journalctl
-u kubelet. If it's running in a container, you can use the container runtime's logs command.

• Question: What could cause the kubectl get nodes command to fail?
– Answer: This could be due to issues with the kube-apiserver, network issues, or a
misconfiguration of your kubeconfig file.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: How would you diagnose DNS issues in a Kubernetes
cluster?
– Answer: You can debug DNS issues by execing into a pod and using DNS
utilities like nslookup or dig. If a service's FQDN is not resolving, you could
also check the kube-dns or coredns pod logs and configuration.
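A few commands that can be used for this (the service FQDN and image tag are placeholders; the k8s-app=kube-dns label is the one used by standard CoreDNS deployments):

# Run a DNS lookup from inside the cluster:
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup my-service.my-namespace.svc.cluster.local

# Check the cluster DNS pods and their logs:
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns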
• Question: How can you monitor the requests and responses to the
kube-apiserver?
– Answer: You can use the audit logging feature in Kubernetes, which logs all
requests made to the kube-apiserver, along with source IP, user, timestamp,
and response.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: Your cluster has become sluggish and unresponsive. How
can you check if the etcd cluster is healthy?
– Answer: You can check member health with etcdctl endpoint health and etcdctl endpoint status (or the legacy etcdctl cluster-health on v2 tooling) against the etcd members. High latency or failed members degrade etcd performance and, as a result, the overall responsiveness of the Kubernetes cluster.
• Question: You need to conduct an audit of the security of your
Kubernetes cluster. What methods and tools can you use to analyze
the cluster's security posture?
– Answer: Kubernetes provides an audit logging feature that can help with
this. For a more in-depth analysis, tools like kube-bench or kube-hunter from
Aqua Security can be used to conduct security assessments based on the CIS
Kubernetes Benchmark and to simulate potential attacks respectively.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: If a node fails in your cluster and workloads are moved to another
node, but those workloads perform poorly on the new node, what could be
some potential reasons?
– Answer: This could be due to resource contention if the new node is overcommitted,
network issues if the new node is in a different zone, or storage performance differences if
persistent volumes are node-specific.
• Question: How would you diagnose performance issues in a Kubernetes cluster,
such as high latency or slow response times?
– Answer: You can use monitoring tools like Prometheus to track performance metrics of your
workloads and nodes over time. Additionally, use kubectl top to see resource usage. For
network-related issues, tools like traceroute and ping can be helpful.
• Question: Your cluster has lost quorum and etcd is not working. How would you
recover it?
– Answer: You would need to restore etcd from a backup on a sufficient number of nodes to
regain quorum. This process will depend on your specific etcd and Kubernetes setup.

Kubernetes Interview Questions - kubernetes Maintenance &


Troubleshooting: Node maintenance, Cluster upgrades, debugging
techniques, and tools, kube-apiserver, and kubelet logs
• Question: Explain the difference between Horizontal Pod Autoscaling and
Vertical Pod Autoscaling.
– Answer: Horizontal Pod Autoscaler (HPA) scales the number of pod replicas. This is achieved
by increasing or decreasing the number of pod replicas in a replication controller,
deployment, replica set, or stateful set based on observed CPU utilization. On the other
hand, Vertical Pod Autoscaler (VPA) adjusts the amount of CPU and memory allocated to a
pod. This is achieved by changing the CPU and memory requests of the containers in a pod.
• Question: When would you use Horizontal Pod Autoscaler instead of Vertical
Pod Autoscaler?
– Answer: HPA is used when you need to handle more traffic by adding more pods (scale out),
i.e., when your application is stateless and supports multiple concurrent instances. VPA is
used when you need more resources for existing pods (scale up), i.e., when your application
is stateful or doesn't support running multiple instances.
• Question: How is the Cluster Autoscaler different from the HPA and VPA?
– Answer: The Cluster Autoscaler scales the number of nodes in a cluster, not the pods. It will
add a node when there are pods that failed to schedule on any existing node due to
insufficient resources, and remove a node if it has been underutilized for a period of time
and its pods can be easily moved to other existing nodes.

Kubernetes Interview Questions - Kubernetes Autoscaling:


Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and
Cluster Autoscaler
• Question: How does the HPA determine when to scale?
– Answer: HPA uses a control loop that fetches metrics from a series of
aggregated APIs (e.g., metrics.k8s.io, custom.metrics.k8s.io, and
external.metrics.k8s.io). It then determines whether to scale up or down
based on current resource utilization against predefined target utilization.
• Question: What metrics can be used by HPA for autoscaling?
– Answer: HPA can use a variety of metrics for autoscaling, including CPU
utilization, memory utilization, and custom metrics.

Kubernetes Interview Questions - Kubernetes Autoscaling:


Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and
Cluster Autoscaler
• Question: What do you mean by "cool down" period in the context of
autoscaling?
– Answer: The "cool down" period refers to a duration during which the autoscaler should not
make additional scale-up or scale-down actions. This is to ensure system stability and
prevent rapid fluctuations in the number of pods or nodes.

• Question: Can you change the HPA configuration without restarting the pod?
– Answer: Yes, you can edit the HPA configuration and apply it using kubectl apply. The
changes are picked up without needing to restart the pod.
• Question: How do you set up a Vertical Pod Autoscaler?
– Answer: VPA is set up by creating a VerticalPodAutoscaler resource. You define the target
(the pods to scale), the update policy, and the resource policy. Once you apply this
configuration, the VPA recommender starts providing recommendations for the target pods,
and the updater acts based on the policy and recommendations.

Kubernetes Interview Questions - Kubernetes Autoscaling:


Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and
Cluster Autoscaler
• Question: How do you configure custom metrics for HPA?
– Answer: To configure custom metrics for HPA, you would first need to have a metrics server
running that can provide these custom metrics. Then, in your HPA configuration, you can
specify the custom metrics under the metrics field, specifying type: Pods or type: Object
and defining the metric name, target type (Value, AverageValue), and target.
• Question: What is the use of the minReplicas and maxReplicas parameters in
the HPA configuration?
– Answer: minReplicas and maxReplicas set the lower and upper limit for the number of
replicas that the HPA can scale to. The HPA won't scale the number of replicas beyond these
boundaries.
• Question: What is the downside of setting a low minReplicas value in HPA?
– Answer: A potential downside of setting a low minReplicas value is that your application
might not have enough pods to handle incoming requests during peak traffic times,
resulting in slow response times or even downtime.

Kubernetes Interview Questions - Kubernetes Autoscaling:


Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and
Cluster Autoscaler
• Question: What are some factors to consider when deciding
between HPA and VPA?
– Answer: Some factors to consider include:
– If the application is stateless and can handle requests concurrently, HPA
might be a better choice.
– If the application is single-threaded and can't handle requests concurrently,
VPA might be more suitable.
– The latency of scaling. Scaling pods horizontally can often be faster than
scaling vertically because vertical scaling requires restarting the pod.
– The potential waste of resources. If there is a wide discrepancy between the
requests and limits of your pods, VPA can help make better use of resources.

Kubernetes Interview Questions - Kubernetes Autoscaling:


Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and
Cluster Autoscaler
• Question: Can you provide an example of how to configure HPA to
scale based on custom metrics?
– Answer: Certainly! Here's an example YAML configuration for HPA that scales
based on a custom metric called custom_metric:
• In this example, the HPA is targeting a deployment named my-
deployment. It sets the minimum replicas to 1 and maximum
replicas to 10. The HPA is configured to scale based on the custom
metric custom_metric, with a target average value of 50.

Kubernetes Interview Questions - Kubernetes Autoscaling:


Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and
Cluster Autoscaler
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: custom_metric
      target:
        type: AverageValue
        averageValue: "50"
How can namespaces in Kubernetes be used to isolate different environments, such as
development, staging, and production?
– Namespaces allow you to create logical partitions within a Kubernetes cluster. You can use
namespaces to isolate different environments by creating separate namespaces for each
environment. For example, you can create a "development" namespace, a "staging" namespace,
and a "production" namespace. Each namespace can have its own set of resources, such as pods,
services, and deployments, that are specific to that environment. This ensures that resources in
one environment do not interfere with resources in another environment.
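• For example, the three environments described above could be created with (the names are illustrative):
kubectl create namespace development
kubectl create namespace staging
kubectl create namespace production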
What are the benefits of using namespaces in Kubernetes?
– Some benefits of using namespaces in Kubernetes include:
• Resource isolation: Namespaces provide a way to segregate resources and prevent conflicts
between different applications or environments.
• Access control: You can assign different access controls and permissions to different namespaces,
allowing fine-grained control over who can access and manipulate resources within each
namespace.
• Organization: Namespaces help in organizing resources and provide a logical structure within a
cluster, making it easier to manage and maintain.
• Resource quotas: You can set resource quotas on a per-namespace basis, limiting the amount of
CPU, memory, and other resources that can be consumed by the resources within a namespace.
• Namespace-specific configurations: Namespaces allow you to apply specific configurations, such
as network policies or storage classes, that are applicable only to a particular namespace.

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
• How can you create a namespace in Kubernetes?
• To create a namespace in Kubernetes, you can use the kubectl
command-line tool. The following command creates a namespace
named "development":
• kubectl create namespace development
• You can replace "development" with the desired name of your
namespace.

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
How can you list all the namespaces in a Kubernetes cluster?
• To list all the namespaces in a Kubernetes cluster, you can use the
following command:
• kubectl get namespaces
• This command will display a list of all the namespaces along with
their status, age, and other details.

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
• How can you specify the namespace for a resource in Kubernetes?
• When creating or managing resources in Kubernetes, you can specify the
namespace using the --namespace flag with the kubectl command. For
example, to create a deployment named "myapp" in the "development"
namespace, you can use the following command:
• kubectl create deployment myapp --namespace=development --
image=myapp:latest
• This ensures that the deployment is created in the specified namespace.

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
How can you switch the default namespace in Kubernetes?
• To switch the default namespace in Kubernetes, you can use the
following command:
• kubectl config set-context --current --namespace=development
• Replace "development" with the desired namespace name. This
command updates the current context in your kubeconfig file to set
the specified namespace as the default for subsequent commands.
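• To confirm which namespace the current context now defaults to, one option is:
kubectl config view --minify | grep namespace: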

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
• How can you view resources from a specific namespace in
Kubernetes?
• To view resources from a specific namespace in Kubernetes, you can
use the --namespace flag with the kubectl command. For example,
to list all the pods in the "staging" namespace, you can use the
following command:
• kubectl get pods --namespace=staging
• This command will display a list of pods specific to the specified
namespace.

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
How can you delete a namespace in Kubernetes?
• To delete a namespace in Kubernetes, you can use the following
command:
• kubectl delete namespace <namespace-name>
• Replace <namespace-name> with the name of the namespace you
want to delete. This command will delete all the resources
associated with the namespace, including pods, services,
deployments, and more.

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
How can you set resource quotas for a namespace in Kubernetes?

To set resource quotas for a namespace in Kubernetes, you can create a YAML file defining the quota specifications and then apply it with kubectl. Here's an example YAML file that sets CPU and memory limits for a namespace named "development":

apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-quota
spec:
  hard:
    limits.cpu: "4"
    limits.memory: "8Gi"

Save the above content in a file named quota.yaml, and then apply it using the following command:

kubectl apply -f quota.yaml --namespace=development

This caps the total CPU limits at 4 CPU units and the total memory limits at 8 GiB across all pods in the "development" namespace.
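• To verify the quota and see current usage against it:
kubectl get resourcequota my-quota --namespace=development
kubectl describe resourcequota my-quota --namespace=development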

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
How can you enforce resource quotas across multiple namespaces in Kubernetes?
– A ResourceQuota object is namespace-scoped, so a single quota cannot span namespaces. The scopeSelector field of a ResourceQuota only narrows which pods inside that same namespace are counted (for example, by priority class); it does not select namespaces.
– To enforce the same limits in several namespaces, you typically apply an identical ResourceQuota manifest to each namespace, usually automated with a script, Helm, or a GitOps tool (see the loop example below). For quotas that span a group of namespaces, you would rely on add-ons or platform-specific resources such as OpenShift's ClusterResourceQuota.
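• A simple way to apply one quota manifest to several namespaces (reusing the quota.yaml from the previous slide; the namespace names are illustrative):
for ns in development staging production; do
  kubectl apply -f quota.yaml --namespace="$ns"
done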

How can you limit the number of nodes available for pods within a namespace in Kubernetes?
– Restricting pods to a subset of nodes is done with node labels combined with nodeSelector or node affinity on the pods, optionally reinforced with taints and tolerations. (Pod affinity/anti-affinity governs placement relative to other pods, not relative to a fixed set of nodes.)
– To enforce this per namespace without editing every pod spec, the PodNodeSelector admission plugin can be enabled: it reads a node-selector annotation on the namespace and automatically adds that selector to every pod created in it (a hedged example follows below).
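• Example using the PodNodeSelector admission plugin (the plugin must be enabled on the API server; the annotation key below is the commonly documented alpha key, and the node name node-1 and label env=dev are illustrative):
kubectl label node node-1 env=dev
kubectl annotate namespace development scheduler.alpha.kubernetes.io/node-selector=env=dev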

How can you create a shared storage volume accessible across multiple namespaces in
Kubernetes?
– PersistentVolumes (PVs) are cluster-wide resources, but each PV can be bound by only one PersistentVolumeClaim (PVC), and PVCs are namespace-scoped, so multiple namespaces cannot bind the same PV directly.
– To share data across namespaces, use a storage backend that supports the ReadWriteMany access mode (for example NFS or CephFS) and create one PV per consuming namespace, each pointing at the same underlying export or share (or let a dynamic provisioner for that backend do so). Each namespace then creates its own PVC, and all pods end up reading and writing the same underlying storage (a minimal sketch follows below).
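• A minimal sketch of the PV/PVC approach, assuming an NFS server at the hypothetical address nfs.example.com exporting /shared (a PV like this would be created per consuming namespace, each pointing at the same export):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data-dev
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: shared-nfs
  nfs:
    server: nfs.example.com
    path: /shared
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: development
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: shared-nfs
  resources:
    requests:
      storage: 10Gi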

Kubernetes Interview Questions - Kubernetes Namespaces: Using


namespaces for isolation and organizing cluster resources
• Kubernetes Architecture:
• Get the list of nodes: kubectl get nodes
• Describe a node: kubectl describe node <node_name>
• Get pods running on a specific node: kubectl get pods --field-selector
spec.nodeName=<node_name>
• Get the list of control plane components: kubectl get componentstatuses
• View the API server logs: kubectl logs -n kube-system kube-apiserver-<pod_suffix>
• Get cluster roles: kubectl get clusterroles
• Check the kubelet on a node (the kubelet runs as a systemd service on the node, not as a regular Kubernetes Service in most clusters): systemctl status kubelet
• Check the status of etcd: kubectl get pod -n kube-system -l component=etcd
• View the scheduler logs: kubectl logs -n kube-system kube-scheduler-<pod_suffix>
• Check the status of the kube-proxy: kubectl get daemonsets -n kube-system kube-proxy

Kubernetes Command for Reference


• Pods:
• Get all pods in all namespaces: kubectl get pods --all-namespaces
• Describe a pod: kubectl describe pod <pod_name>
• Delete a pod: kubectl delete pod <pod_name>
• Exec into a pod: kubectl exec -it <pod_name> -- /bin/bash
• Get pod logs: kubectl logs <pod_name>
• Port-forward to a pod: kubectl port-forward <pod_name> <local_port>:<pod_port>
• Get pod events: kubectl get events --field-selector involvedObject.name=<pod_name>
• Create a multi-container pod: kubectl create -f multi-container-pod.yaml
• Get pod health status: kubectl get pods --field-selector=status.phase=Running
• Set resource limits on a pod: kubectl set resources pod <pod_name> --limits=<resource_limits>

Kubernetes Command for Reference


• Controllers:
• Get deployments: kubectl get deployments
• Scale a deployment: kubectl scale deployment <deployment_name>
--replicas=<replica_count>
• Get statefulsets: kubectl get statefulsets
• Get daemonsets: kubectl get daemonsets
• Get jobs: kubectl get jobs

Kubernetes Command for Reference


• Services & Networking:
• Get services: kubectl get services
• Expose a deployment: kubectl expose deployment
<deployment_name> --type=ClusterIP --port=<port>
• Get ingress resources: kubectl get ingress
• Apply a network policy: kubectl apply -f network-policy.yaml
• Get node pod-network details: kubectl describe nodes | grep -i podcidr (the CNI plugin configuration itself typically lives on each node under /etc/cni/net.d)

Kubernetes Command for Reference


• Volumes & Data:
• Get persistent volumes: kubectl get persistentvolumes
• Get persistent volume claims: kubectl get persistentvolumeclaims
• Create a persistent volume claim: kubectl create -f pvc.yaml
• Get storage classes: kubectl get storageclasses
• Attach to a running container in a pod: kubectl attach <pod_name> -c <container_name> -i -t (volumes are attached by declaring them in the pod spec, not via a kubectl command)
• Create a secret from a file: kubectl create secret generic <secret_name> --from-file=<path_to_file>
• Get config maps: kubectl get configmaps
• Create a config map: kubectl create configmap <configmap_name> --from-literal=<key>=<value>

Kubernetes Command for Reference


• Configuration & Secrets Management:
• Get secrets: kubectl get secrets
• Create a secret from a literal value: kubectl create secret generic
<secret_name> --from-literal=<key>=<value>
• Get service accounts: kubectl get serviceaccounts
• Create a service account: kubectl create serviceaccount
<service_account_name>
• Get roles: kubectl get roles
• Create a role: kubectl create role <role_name> --verb=<verb> --
resource=<resource>

Kubernetes Command for Reference


• RBAC & Security:
• Get role bindings: kubectl get rolebindings
• Create a role binding: kubectl create rolebinding <rolebinding_name> --
role=<role_name> --user=<user_name>
• Get pod security policies: kubectl get podsecuritypolicies
• Describe a network policy: kubectl describe networkpolicy
<network_policy_name>
• Enable RBAC authorization: Modify the API server configuration and
restart the control plane components.

Kubernetes Command for Reference


• Resource Management:

• Get resource quotas: kubectl get resourcequotas

• Get limit ranges: kubectl get limitranges

• Get horizontal pod autoscalers: kubectl get hpa

• Get vertical pod autoscalers (requires the VPA CRDs to be installed): kubectl get verticalpodautoscalers

• Check the Cluster Autoscaler (it usually runs as a Deployment rather than as a custom resource; the name depends on how it was installed): kubectl get deployment -n kube-system cluster-autoscaler

• Maintenance & Troubleshooting:

• Drain a node: kubectl drain <node_name>

• Uncordon a node: kubectl uncordon <node_name>

• Get kubelet logs (the kubelet is not a pod; on a systemd-based node, view its logs with): journalctl -u kubelet

• View cluster events: kubectl get events

• Get cluster info: kubectl cluster-info

Kubernetes Command for Reference


• Helm:
• Search for charts: helm search repo <keyword>
• Install a chart: helm install <release_name> <chart>
• Upgrade a release: helm upgrade <release_name> <chart>
• Uninstall a release: helm uninstall <release_name>

• Custom Resource Definitions (CRDs):


• Get custom resources: kubectl get <custom_resource>
• Create a custom resource: kubectl create -f cr.yaml
• Delete a custom resource: kubectl delete <custom_resource> <resource_name>

Kubernetes Command for Reference


• Kubernetes Autoscaling:
• Enable Horizontal Pod Autoscaler (HPA): kubectl autoscale deployment <deployment_name> --
min=<min_replicas> --max=<max_replicas> --cpu-percent=<cpu_threshold>
• Enable Vertical Pod Autoscaler (VPA): Follow the VPA installation and configuration guide.
• Enable Cluster Autoscaler: Follow the Cluster Autoscaler installation and configuration guide.

• Namespaces:
• Get namespaces: kubectl get namespaces
• Create a namespace: kubectl create namespace <namespace_name>

Kubernetes Command for Reference


• Kubernetes Security:

• Get pod security policies: kubectl get podsecuritypolicies

• Get network policies: kubectl get networkpolicies

• Enable admission controllers: Configure the admission controllers in the API server configuration.

• TLS and Certificate Management:

• Generate a self-signed certificate: openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes

• Create a secret from a TLS certificate: kubectl create secret tls <secret_name> --key key.pem --cert cert.pem

• Use the TLS secret in an Ingress: Modify the Ingress resource definition to reference the TLS secret.

• Kubernetes Threat Modeling:

• Perform a security audit of the cluster: Use tools like kube-bench and kube-hunter to identify vulnerabilities.

• Review Kubernetes configuration files: Use kube-score to assess the security of your configuration.

Kubernetes Command for Reference


• I'm a DevOps Engineer with extensive hands-on experience in automating
and optimizing workflows. My expertise lies in using Ansible for
configuration management, Kubernetes for container orchestration, and
Terraform for infrastructure as code. I also have solid experience with Bash
scripting.
• In addition, I have used GitLab CI/CD to automate build, test,
and deployment processes, and Jira for issue tracking and workflow
management, enhancing team productivity and reducing issue resolution
time.
• I believe my technical skills, adaptability, and proactive communication
make me an ideal candidate for your team. I'm excited about the
possibility of contributing to and learning from your esteemed
organization.

About Yourself


• I am a DevOps Engineer with "x" years of experience in the field. I work for a leading product-based
company that focuses on automating infrastructure and streamlining software development processes.
I have hands-on expertise with a variety of tools and technologies for building, deploying, and maintaining
applications effectively.
• My technical skills include:
• Terraform: in my current project we use it to build modular, reusable, and scalable Azure cloud
infrastructure as code.
• GitLab CI/CD: used for continuous integration and continuous deployment pipelines, enabling faster and
more reliable software releases.
• Kubernetes: we manage containerized applications with Kubernetes, integrated with GitLab for
seamless application rollouts.
• Ansible: experienced in automating application configuration management with Ansible, ensuring
consistency and stability across environments.
• Bash scripting: well-versed in writing Bash scripts to automate routine tasks and trigger pipelines,
improving overall efficiency and reducing manual intervention.
• Jira: proficient in using Jira for issue tracking, task management, and Agile project management,
ensuring effective communication and collaboration within the team.

About Yourself


• Role: DevOps Engineer as a Bridge Between Development and
Operations
• As a DevOps Engineer, I facilitate seamless collaboration between
the development and operations teams. My role ensures faster,
more efficient delivery of high-quality software by eliminating
traditional silos and fostering a culture of shared responsibility.
• Importance: The Need for DevOps in Streamlining Workflows,
Reducing Errors, and Increasing Speed of Delivery
• DevOps is crucial in today's fast-paced digital world. It streamlines
workflows through automation, significantly reduces errors with
early detection mechanisms, and accelerates delivery speed,
ensuring rapid deployment of robust software solutions.

Daily Tasks – Introduction


• Daily Tasks:
• Each day, I begin by checking Jira, our project management tool. This
allows me to quickly identify and prioritize the day's tasks, ensuring that I
focus my efforts where they are most needed. I evaluate both our sprint
tasks, focusing on our goals, and any immediate support tasks that
require attention.
• Planning:
• Using Jira effectively allows me to organize and manage tasks across
different DevOps tools such as GitLab CI/CD, Terraform, Kubernetes, and
Ansible. This leads to a structured work plan, ensuring smooth
deployment and operation, while reducing any potential risks or delays.

Daily Tasks – Day Starts with Jira


• Daily Routine:
• A routine check of test pipelines is a crucial part of a DevOps Engineer's day.
By identifying and resolving potential issues early, you can ensure
code stability and keep delivery timelines intact.
• Stability:
• Constant monitoring of our CI/CD pipelines is pivotal to maintaining
system reliability. It helps to avoid disruptions, guarantee seamless
user experience, and ultimately drive user satisfaction.

Daily Routine
• Support:
• An integral part of a DevOps Engineer's role is providing prompt
and effective support to customers facing application issues,
ensuring minimal disruption to their operations.
• Troubleshooting:
• Leveraging technical expertise, a DevOps Engineer follows a systematic
troubleshooting approach: identify the problem, analyze its root
cause, and implement a suitable solution. This approach ensures
rapid resolution and maintains high application reliability.

CICD Support
• Automation:
• DevOps work involves designing robust CI/CD pipelines using GitLab or
any other CI/CD tool. This allows for automated testing, building, and packaging
of applications, resulting in a streamlined, efficient workflow.
• Problem Solving:
• As an example of problem-solving, consider deploying an
application or platform update. A DevOps Engineer uses Terraform to
set up the infrastructure, Ansible to ensure correct configurations,
and Kubernetes for smooth deployment and scaling. This integrated
approach simplifies deployments, guarantees consistency
across environments, and enhances application reliability.

DevOps Engineer Priority Job: Automation & Deployment


• Unplanned Tasks:
• In a DevOps Engineer's dynamic role, handling unexpected situations such as
ad-hoc meetings or system emergencies is the norm. You should be able
to respond effectively to these unforeseen challenges, ensuring minimal
disruption to ongoing work.
• Prioritization:
• In a fast-paced environment, being able to prioritize tasks efficiently is
key. A DevOps Engineer should be adept at juggling multiple tasks and rapidly
adjusting priorities based on business needs, ensuring that critical
issues are addressed promptly and effectively.

Unplanned Tasks 
• Review:
• At the end of the day, a DevOps Engineer should reflect on the
tasks accomplished. This includes updating progress in Jira, which not
only provides transparency about current status but also maintains a
clear record of the workflow.
• Future Planning:
• Looking ahead is as important as reflecting on the past. A DevOps Engineer
should engage in end-of-day planning, preparing for the upcoming day's
tasks and aligning their focus with the team's objectives. Teams often hold
meetings to discuss the day's progress and upcoming challenges and to devise
strategies to tackle them, staying prepared and proactive.

End The Day with Review


• Summary:
• As a DevOps engineer, your role is diverse and ever-evolving, mirroring the dynamic nature of
technology itself. With a foundation in adaptability and problem-solving, you navigate
complex workflows, enhance system efficiency, and ensure seamless
deployments.
• Commitment:
• Remain steadfast in your commitment to continuous learning, staying abreast of
emerging technologies and mastering new tools. This, along with your dedication to teamwork,
enables you to contribute significantly to a project's success.
• If you're beginning your journey in DevOps, remember that it's about embracing change,
fostering collaboration, and leveraging automation to deliver quality at speed. It's a challenging
yet rewarding field that sits at the heart of modern digital transformation. Every challenge you
encounter is an opportunity to learn and grow, making you an integral part of any software
development process.

Thank You 
