
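1- What is virtual memory?
Virtual memory gives each process its own address space, backed by RAM and, when RAM runs low, by swap space. Swap space is an area on storage (hard disk) that is used when RAM is low.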
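As a rough illustration of how swap usage can be checked, the sketch below subtracts SwapFree from SwapTotal as reported in /proc/meminfo. The function name swap_used_kb and the sample figures are assumptions made up for this example, not part of the original answers.

```shell
#!/bin/sh
# Print swap usage in kB from /proc/meminfo-style input on stdin.
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END { print total - free }'
}

# Illustrative sample; the numbers are invented for the example.
sample_meminfo='MemTotal:        2054140 kB
MemFree:           11096 kB
SwapTotal:       4192956 kB
SwapFree:        2557700 kB'

printf '%s\n' "$sample_meminfo" | swap_used_kb

# On a live Linux system, read the real file instead.
if [ -r /proc/meminfo ]; then
    swap_used_kb < /proc/meminfo
fi
```

The same figures are what free and vmstat summarise; reading the file directly is only convenient when scripting.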

Swap space is sometimes called paging space.

1.1- What is memory pressure?
The Linux VM code tries to use spare memory for cache, so there is normally little free memory on a running system. The intent is to use memory as efficiently as possible, and that cache should be easily recoverable when needed. When the system needs a page and too few are available, it triggers reclaim: it starts identifying and releasing currently in-use pages (kswapd and direct reclaim).

1.2- Is there a command that would show the system to be in such a state?
The vmstat or free commands can be used to monitor system virtual memory activity. The raw data is in /proc/meminfo, /proc/stat, /proc/*/stat and /proc/sys/vm/*. vmstat shows the amount of space allocated on the swap device, the amount of unused RAM, the sizes of the buffer and page cache, and the number of blocks swapped in and out over a given period of time.

1.3- What is a page fault?
A page fault occurs when a process accesses a page that is not currently mapped in physical memory. When the kernel must access the disk or swap device to satisfy the page request it is a major fault; when the page is already in memory it is a minor fault.

1.4- What is the oom-killer? When is it triggered?
The OOM killer is a kernel mechanism that is invoked when the system runs out of memory, that is, when neither RAM nor paging space can satisfy an allocation. When the system runs out of memory, you will typically see messages either from the OOM killer or about page allocation failures.
These are typically symptoms that either your workload is too high for the machine or something is wrong.
If the workload does not fit into RAM + swap, you are going to run out of memory. If it does not fit into RAM alone, it will probably perform badly, but should still work.

1.5- Can the sysadmin force the OOM killer to start?
Yes, by sending SysRq commands: "m" dumps current memory info to the console and "f" calls oom_kill to kill a memory hog process.
# echo "f" > /proc/sysrq-trigger

1.6- Can the kernel swap to a ramdisk?
Yes.
# mkswap /dev/ram0
# swapoff -a
# swapon /dev/ram0

1.7- How can a 32-bit system use more than 4GB of RAM?
It is possible if you install a PAE (Physical Address Extension) kernel.

2- How does one list the PCI peripherals connected to the system?
# lspci

2.1- For one particular PCI peripheral, how can one know if it is supported by any of the modules provided in Linux?
# lsmod
# cat /lib/modules/2.6.*.el5/modules.pcimap

2.2- What other methods/commands can be used to find out which devices are connected to the computer (including disks and other peripherals in general)?
# dmesg

# dmidecode
# lsusb
# lshal
# ethtool
# cat /proc/cpuinfo
# cat /proc/meminfo
# fdisk -l
# parted
# ls -l /dev/
# hdparm

What is the main binary executable format used in Linux? Can you describe the most important components of this format?
ELF (Executable and Linkable Format). It is used for regular binary executables, shared libraries, object code (.o files) and core dumps, because their internal structure is quite similar. An ELF file starts with an ELF header, followed by the program header table (which describes the memory image) and the actual data. The data is divided into sections, a concept familiar to any Assembly language programmer.

What are the reasons why some applications insist on accessing raw devices?
Because they need to bypass the system buffer cache.

What is a filesystem?
It is a system for storing information on hard drives, with a hierarchy of directories, files, etc.

What are the main structures used in a filesystem?
FHS, inodes, and the fact that in Linux everything is presented as a file.

Do all filesystems implement all the file and directory access system calls? When making a directory for instance, do filesystems share the same code in the syscall?
Different filesystems have different features and restrictions, so not every filesystem implements every call. The system call entry code is shared: it goes through the common VFS layer, which then calls the filesystem-specific implementation. The original answer notes that this question is not entirely clear.

Can you use 8k block ext3 filesystems on x86 machines?
No. On x86 the filesystem block size is 4k by default and can never be larger than the size of a memory page. 8k blocks are possible on architectures with larger pages, Intel Itanium for example.

What type of filesystem would be most suitable for a character device?
Raw filesystem, because character devices are read and written without buffering.

What is an access control list? How is it used in Linux?
An access control list (ACL) is a table that defines the user privileges policy of the operating system. In Linux, permissions on files and directories look like drwxrw-r-x, where:
r or 4 - read permission
w or 2 - write permission
x or 1 - execute permission
d - directory
b - block device
plus the suid, sgid and sticky bit flags.
Strictly speaking these are the traditional permission bits; POSIX ACLs extend them with per-user and per-group entries managed with getfacl and setfacl.

Is it possible to have a hard link spanning two files in different filesystems? What is the reason?
No. A hard link works at the inode level. Since inodes are unique to each filesystem, you cannot reference an inode on a different disk or even a different filesystem. A hard link can only be created within one filesystem. Otherwise you should use a soft (symbolic) link.
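The difference between hard and symbolic links can be seen from inode numbers. The following is a minimal sketch, assuming a scratch directory from mktemp and hypothetical file names (original, hardlink, symlink). It does not demonstrate the cross-filesystem case, where GNU ln is expected to fail with an "Invalid cross-device link" error.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing else is touched.
demo_dir=$(mktemp -d)
echo "hello" > "$demo_dir/original"
ln "$demo_dir/original" "$demo_dir/hardlink"   # second name for the same inode
ln -s original "$demo_dir/symlink"             # separate inode that stores a path

# ls -i prints the inode number first; -d keeps ls from following the symlink.
inode_of() { ls -id "$1" | awk '{ print $1 }'; }
orig_inode=$(inode_of "$demo_dir/original")
hard_inode=$(inode_of "$demo_dir/hardlink")
sym_inode=$(inode_of "$demo_dir/symlink")
echo "original=$orig_inode hardlink=$hard_inode symlink=$sym_inode"

# Removing the original name leaves the data reachable through the hard link,
# while the symbolic link now points at a name that no longer exists.
rm "$demo_dir/original"
hard_content=$(cat "$demo_dir/hardlink")
if [ -e "$demo_dir/symlink" ]; then sym_state=valid; else sym_state=dangling; fi
echo "hardlink content: $hard_content, symlink: $sym_state"

rm -rf "$demo_dir"
```

The hard link and the original report the same inode number, which is why such a link cannot point into another filesystem with its own inode table.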

What is linking?
Linking is the process of combining various pieces of code and data together to form a single executable that can be loaded in memory.

What is a shared library?
A shared library is a library that lets executables dynamically access external functionality at run time, reducing their overall memory footprint by bringing functionality in when it is needed.

How can one see all the shared libraries which a binary is linked to?
ldd -v "path to file", e.g.
# ldd -v /bin/bash

5.4- A sysadmin removed by accident all the links in the /lib directory, leaving the binaries; now many applications do not load. What is the best way of finding the links through which a library should be accessed?
Copy the files from another system by hand, or make links from /usr/lib.

How can one recover from the situation in (5.4)?
# ldconfig /lib

Is it possible to override the functions defined in a shared library when running an application?
Yes. Current Linux shared libraries are flexible enough to let us override specific functions in a library when executing a particular program (typically by preloading another library with LD_PRELOAD). It can be done without touching the library source code and without root permissions to install a patched version of the library, for debugging purposes or transparent extensions.

What kind of memory protection is needed in order for the operating system to correctly implement shared libraries?
Shared libraries are designed with a technique for placing library functions into a single unit that can be shared by multiple processes at run time. This technique saves both disk space and RAM, and it makes sense to do that. But I am not sure about this, as I have had little chance to deal with shared libraries in my experience.

6- A sysadmin is operating a Linux system with 6 network cards, of which only eth0 is plugged in and working. The sysadmin runs a few commands:
# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:11:44:43:63:E1
          inet addr:192.168.1.104  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2231 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2522 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:780859 (762.5 KiB)  TX bytes:373261 (364.5 KiB)
          Base address:0x8000 Memory:c0220000-c0240000
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
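One way to read the default gateway out of route -n output is to look for the row whose destination is 0.0.0.0 and whose flags include G. The sketch below does that with awk. The function name default_gateway is an assumption for illustration, and the sample text is a reconstruction of the routing table in this section, not a fresh capture.

```shell
#!/bin/sh
# Read `route -n` output on stdin and print the gateway of the default route.
# The default route has destination 0.0.0.0 and the G (gateway) flag set.
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}

# Sample mirroring the routing table of this exercise (reconstructed).
sample_routes='Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0'

printf '%s\n' "$sample_routes" | default_gateway
```

On a live system the same function could be fed with "route -n | default_gateway", assuming the net-tools route command is installed.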

Questions:

What is the IP address of this system and the router it is configured to use?
The IP address of this system is 192.168.1.104. The default gateway (router) is 192.168.1.1.

How can one know the PCI card that is currently eth0 and the driver it is currently using?
# lspci | grep -i net
# cat /etc/modprobe.conf
# lsmod | grep e1000
or
# ethtool -i eth0
driver: e1000
version: [garbled in the source]
firmware-version: N/A
bus-info: 0000:02:00.0

How can one know the speed of the interface eth0?
# mii-tool
# ethtool eth0 | grep -i speed

The sysadmin left a tcpdump running as follows on one terminal, while issuing commands on another terminal:
# tcpdump -n -nn -i eth0
With the following output from tcpdump, can you tell what commands were run, and if they succeeded or failed?

[The tcpdump capture is too garbled in the source to reproduce line by line. What can be recovered, in time order:
- 13:43:35 to 13:43:36: ICMP echo requests from 192.168.1.104 to 192.168.1.1 (id 10254, seq 1 and 2, length 64), each followed by an echo reply.
- 14:04:09: UDP packets of length 40 from 192.168.1.104 to ports 33434 through 33442, answered with ICMP "udp port unreachable" messages of length 76 from 192.168.1.1.
- 14:51:43 to 14:51:48: a TCP SYN from 192.168.1.104 port 42034 to port 23, answered with a reset (R 0:0(0) ack 2650503082 win 0), and an ARP who-has/reply exchange (is-at 00:1e:35:3e:1c:41).
- 14:51:55: a complete TCP session from 192.168.1.104 port 33796 to 192.168.1.80 port 80: SYN, SYN/ACK, a 173-byte request, a 343-byte response, and FIN/ACK in both directions.]

Answer, grouping the capture by command:
# ping -c2 192.168.1.1
SUCCEEDED
# traceroute 192.168.1.1
SUCCEEDED
# telnet to port 23 (the target address is garbled in the source)
FAILED (the SYN was answered with a reset)
# wget 192.168.1.80, or telnet 192.168.1.80 80 followed by an HTTP HEAD / request
SUCCEEDED

Still with the tcpdump running, the sysadmin runs the command:
# ping 192.168.1.104 -c3
PING 192.168.1.104 (192.168.1.104) 56(84) bytes of data.
[three replies "64 bytes from 192.168.1.104", ttl=64, round-trip times below 0.1 ms]
--- 192.168.1.104 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.057/0.067/0.082/0.014 ms
However the tcpdump does not capture any of the traffic. The sysadmin runs the command again in flood mode and still nothing is captured. How can one tell if this is a problem or not, and what files or commands could be able to prove the previous statement?

We should listen for network traffic on the loopback interface, not on eth0 (our case): the kernel delivers traffic addressed to the machine's own IP locally, so it never crosses eth0.
# tcpdump -nni lo

7- The directory /data is mounted via NFS:
192.168.1.105:/data on /data type nfs (rw,soft,intr,addr=192.168.1.105)

Even though permissions are set correctly in the local system, root cannot create files on it. What are the probable causes of this happening?
Perhaps the export is configured with root_squash in /etc/exports, and we should switch it to no_root_squash.

The /data mount actually comes from a cluster. The node that was serving /data had a sudden power failure. The virtual IP 192.168.1.105 migrated to another node, but the mount /data is now showing wrong data. When doing an `ls /data' one lists another directory exported by the second cluster node. What could be happening in this case?

7.2- What are the main differences between NFSv2, NFSv3 and NFSv4?
NFSv2: originally operated entirely over UDP. It is stateless. It does not support 64-bit file sizes and offsets, so clients cannot access more than 2GB of file data. It does not support safe asynchronous writes (it writes synchronously) and has poorer error handling than NFSv3.
NFSv3: uses both UDP and TCP. It is stateless, supports asynchronous writes on the server, and depends on the portmapper. An auxiliary protocol is required to support locking of NFS-mounted files, so an additional Network Lock Manager (NLM) protocol is used. NLM is stateful, in that the server keeps track of locks.
NFSv4: uses only TCP. It is stateful and designed for internet use. Locking operations (open/read/write/lock/locku/close) are part of the protocol proper. NLM is not used by NFSv4.

What daemons are necessary to be running on an NFS server if a client wants to mount an NFSv3 share?
rpcbind (RHEL 6; portmap on earlier releases), rpc.mountd, lockd and rpc.statd.
# chkconfig portmap on
# chkconfig nfs on
# chkconfig nfslock on
# chkconfig netfs on

To mount nfs4 shares it is not necessary to have all the daemons necessary for NFSv3, why is that?
See the answer to 7.2.

By looking at the port numbers for nfsd, one can see that nfs2, nfs3 and nfs4 all share the same port. How does one differentiate what version of the protocol will be mounted? Can you provide a packet dump showing exactly this request?

How can one show the statistics of NFS calls made to a server?
# nfsstat -s

How can one list what port is being used by each rpc daemon on a system?
# rpcinfo -p
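The rpcinfo -p listing has one row per program version, so it helps to collapse it to one line per service, protocol and port. The sketch below does that with awk and also makes it visible that NFS versions 2, 3 and 4 share one port. The function name rpc_ports and the sample rows are assumptions for illustration; real port numbers for mountd and nlockmgr vary between systems.

```shell
#!/bin/sh
# Summarise `rpcinfo -p` output (stdin): one line per service/protocol/port,
# followed by the program versions registered on that port.
rpc_ports() {
    awk 'NR > 1 {
             key = $5 " " $3 " " $4
             if (key in vers) {
                 vers[key] = vers[key] "," $2
             } else {
                 order[++n] = key
                 vers[key] = $2
             }
         }
         END { for (i = 1; i <= n; i++) print order[i], "versions", vers[order[i]] }'
}

# Illustrative sample in the usual rpcinfo -p layout; values are made up.
sample_rpcinfo='   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    3   tcp    892  mountd
    100021    4   tcp  32803  nlockmgr'

printf '%s\n' "$sample_rpcinfo" | rpc_ports
```

On a server the function could be used as "rpcinfo -p | rpc_ports". Rows without a service name in the last column would need extra handling that this sketch leaves out.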

An NFS server is very loaded, running 8 nfsd threads. How can one prove to the sysadmin that it is necessary to increase the number of NFS threads?

8- From the output of this vmstat 1, what can you say about this system while vmstat 1 was running?
[The vmstat table is garbled in the source. Its columns were: procs (r b), memory (swpd free buff cache), swap (si so), io (bi bo), system (in cs), cpu (us sy id wa st). One row survives intact: swpd 1635256, free 11096, buff 124, cache 8088, si 584, so 176100, bi 1064, bo 176104, in 1257, cs 1923, us 21, sy 79, id 0. In the remaining rows swpd keeps growing (1711616, 1764100, 1835916, 1939504, 1981832, 2058792), free memory stays between roughly 8 and 14 MB, so ranges from 40608 to 102796, sy stays between 65 and 80, and id stays between 0 and 12.]

9- A user runs a bash script and gets a segfault:
$ ./script
Segmentation fault
The user notices that changing some of the values inside a loop in the bash script prevents the segfault from happening.

Can the logic of a script lead it to a segfault?
No.

Can you create a very small bash script which segfaults?

What is the best way to find out what happened in the segfault?
Use gdb to track the exact source of the problem.

What is the best way of instrumenting a script so that it doesn't segfault?

10- User is running a process, but it seems to be stuck. When attaching strace to the pid, strace shows an unfinished system call.

What is the meaning of an unfinished syscall?
When a system call is being executed and meanwhile another one is called from a different thread/process, strace tries to preserve the order of those events and marks the ongoing call as unfinished.

What is the best approach to understand what is happening in this process?
# strace -f

How can one tell exactly where the process is stuck, or how to debug the problem further?
A process is said to be stuck when it cannot proceed because it is waiting for an event that cannot, or does not, occur. So, if we want to find where the process is stuck, we should run it in debug mode and set breakpoints.

We might want to consider running the process from a debugger, instead of trying to attach to it at runtime.

11- User is running a process, but it seems to be stuck. When attaching strace to the pid, strace shows nothing at all.

Can one understand what is happening and the status of the process?
We can test it by using the links command:
# links http://google.com
# ps aux | grep links
kripton 15918 0.0 0.0 4976 1928 pts/1 S+ 12:22 0:00 links http://google.com
# strace -f -e poll,connect -p 15918

What can one do to debug this case even further?

12- Superuser runs 'sync' on a Linux system, but this command never returns. Doing 'ps auxw | grep sync', the sysadmin notices that it is in 'D' state. Can the sysadmin kill this process? The sysadmin tried to strace the process, which only showed the unfinished sync() syscall. Supposing that a support call will be opened to find the root cause of this issue, what kind of information would you try to obtain, and why?
A process with flag D is in uninterruptible sleep and basically cannot be killed by users or admins. Uninterruptible means that the process is performing a so-called critical task: signals do not stop the process or alter its behavior, and the process is holding a semaphore or a critical system resource. The only way to kill a process in state D is to reboot the machine.

13- By accident a sysadmin removed one of the disks belonging to an active VG. When rebooting the system they notice that the system does not boot anymore. The disk that was removed was part of the VG but contained no data. What is the procedure to bring the machine back to operation and the VG back with its LVs?
Boot into rescue mode, then:
# lvm vgscan
# lvm vgreduce --removemissing vgname
# lvm vgextend --restoremissing vgname pvname (if the removed disk was re-added)

14- Write a shell command that lists all the RPMs installed in a system sorted by package size, largest package first (use dpkg and related commands if you're not familiar with RPM and related commands).
# rpm -qa --queryformat '%10{SIZE}\t%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | sort -rn

15- A sysadmin just added a new TAPE drive to his system. The system already had a drive called /dev/st0, which works fine. The sysadmin added the tape drive and created a /dev/st1 for it:
# ls -l /dev/st1
brw-r--r--. 1 root root [major, minor garbled in the source] Mar 2 23:16 /dev/st1
The sysadmin loads a tape into the drive and runs a dd to write his 3G file onto it, which should fit as the tape supports 200Gb.

# dd if=/tmp/backup.tar of=/dev/st1 bs=32k
However the dd finishes with an error saying that the device is full at 250Mb.

Can you explain why this is not working as expected?

Is it a problem caused by the block size of the dd? Are there any situations where the block size used can change the behaviour of a drive?

The sysadmin rebooted the system, and now the boot loader is not working properly, and GRUB complains about a problem at stage 1.5. What should one do?
A GRUB failure at stage 1.5 is one of the most common problems: GRUB has lost its configuration, and there are several ways to restore it. From the GRUB prompt:
grub> find /boot/grub/stage1
grub> root (hd0,1)
grub> kernel /vmlinuz root=/dev/sda2 ro
grub> initrd /initrd
grub> boot
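The GRUB commands above pair root (hd0,1) with root=/dev/sda2 because GRUB legacy counts disks and partitions from 0, while Linux numbers partitions from 1. A small sketch of that arithmetic follows. The function name grub_to_linux is hypothetical, and the mapping of BIOS disk N to /dev/sd plus the N-th letter is an assumption that holds only on simple single-controller systems.

```shell
#!/bin/sh
# Translate a GRUB legacy device such as "(hd0,1)" into a Linux device name.
# Assumption (not from the original text): BIOS disk N is /dev/sd<N-th letter>.
grub_to_linux() {
    spec=$1
    # Accept only "(hdN,M)" with plain decimal N and M (no leading zeros).
    if ! printf '%s\n' "$spec" | grep -Eq '^\(hd(0|[1-9][0-9]*),(0|[1-9][0-9]*)\)$'; then
        echo "unrecognized GRUB device: $spec" >&2
        return 1
    fi
    inner=${spec#"(hd"}      # strip the leading "(hd"
    inner=${inner%")"}       # strip the trailing ")"
    disk=${inner%,*}
    part=${inner#*,}
    if [ "$disk" -gt 25 ]; then
        echo "disk number too large for this sketch: $disk" >&2
        return 1
    fi
    letter=$(printf 'abcdefghijklmnopqrstuvwxyz\n' | cut -c "$((disk + 1))")
    # GRUB legacy counts partitions from 0, Linux from 1.
    echo "/dev/sd${letter}$((part + 1))"
}

grub_to_linux "(hd0,1)"
```

If the device reported by find /boot/grub/stage1 differs from (hd0,1), the root= argument on the kernel line has to change accordingly.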