Introduction to Operating Systems: A Hands-On Approach Using the OpenSolaris Project

Student Guide

Sun Microsystems, Inc.

Part No: 819–5580–10 March, 2006

Copyright 2006 Sun Microsystems, Inc.

All rights reserved.

Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more U.S. patents or pending patent applications in the U.S. and in other countries. U.S. Government Rights – Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements. This distribution may include materials developed by third parties. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, the Solaris logo, the Java Coffee Cup logo, docs.sun.com, Java, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements. Products covered by and information contained in this publication are controlled by U.S. Export Control laws and may be subject to the export or import laws in other countries. 
Nuclear, missile, chemical or biological weapons or nuclear maritime end uses or end users, whether direct or indirect, are strictly prohibited. Export or reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not limited to, the denied persons and specially designated nationals lists is strictly prohibited. DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Contents

1  What is the OpenSolaris Project?
      Web Resources for OpenSolaris
      Discussions
      Communities
      Projects
      OpenGrok

2  Planning the OpenSolaris Environment
      Development Environment Configuration
      Networking

3  OpenSolaris Policies
      Development Process and Coding Style

4  Features of the OpenSolaris Project
      Overview
      Security Technology: Least Privilege
      Predictive Self-Healing
      Zones
      Branded Zones (BrandZ)
      Zettabyte Filesystem (ZFS)
      Dynamic Tracing (DTrace)
      Modular Debugger (MDB)

5  Programming Concepts
      Process and System Management
      Threaded Programming
      CPU Scheduling
      Kernel Overview
      Process Debugging

6  Getting Started With DTrace
      Enabling Simple DTrace Probes
      Listing Traceable Probes
      Programming in D

7  Debugging Applications With DTrace
      Enabling User Mode Probes
      DTracing Applications

8  Debugging C++ Applications With DTrace
      Using DTrace to Profile and Debug A C++ Program

9  Managing Memory with DTrace and MDB
      Software Memory Management
      Using DTrace and MDB to Examine Virtual Memory

10 Observing Processes in Zones With DTrace
      Global and Non-Global Zones
      DTracing a Process Running in a Zone

11 Configuring Filesystems With ZFS
      Creating Pools With Mounted Filesystems
      Creating Mirrored Storage Pools
      Creating a Filesystem and /home Directories
      Configuring RAID-Z

12 Writing a Template Character Device Driver
      Overview of the Template Driver Example
      Writing the Template Driver
      Writing the Loadable Module Configuration Entry Points
      Writing the Autoconfiguration Entry Points
      Writing the User Context Entry Points
      Writing the Driver Data Structures
      Writing the Device Configuration File
      Building and Installing the Template Driver
      Testing the Template Driver
      Adding the Template Driver
      Reading and Writing the Device
      Removing the Template Driver
      Dummy Driver Source

13 Debugging Drivers With DTrace
      Porting the smbfs Driver from Linux to the Solaris OS

MODULE 1

What is the OpenSolaris Project?

Objectives

The objective of this course is to learn about operating system computing by using the Solaris™ Operating System source code that is freely available through the OpenSolaris project.

We’ll start by showing you where to go to access the code, communities, discussions, projects, and source browser for the OpenSolaris project. Then, we’ll briefly describe how the features and documentation enable straightforward configuration of a development environment and initiation into the development process.

Finally, we’ll work through the following labs which are designed to demonstrate typical operating system issues by using OpenSolaris:

- Process Debugging
    - Enabling Simple DTrace Probes
    - Listing Traceable Probes
    - Programming in D
    - Enabling User Mode DTrace Probes
- Application Debugging
    - DTracing Applications
    - Using DTrace to Profile and Debug a C++ Program
- Memory Management
    - Using DTrace and MDB to Examine Virtual Memory
- Observing Processes
    - DTracing a Process Running in a Zone
- Configuring Filesystems
    - Creating Mirrored ZFS Storage Pools
    - Creating a Filesystem and /home Directories
    - Configuring RAID-Z
- Device Drivers
    - Writing a Template Character Device Driver
    - Debugging a Device Driver with DTrace

Relevance

The OpenSolaris project was launched on June 14, 2005 to create a community development effort using the Solaris OS code as a starting point. It is a nexus for a community development effort where contributors from Sun and elsewhere can collaborate on developing and improving operating system technology. The OpenSolaris source code will find a variety of uses, including being the basis for future versions of the Solaris OS product, other operating system projects, third-party products and distributions of interest to the community. The OpenSolaris project is currently sponsored by Sun Microsystems, Inc.

In the first eight months, over 12,000 participants have become registered members. The engineering community is continually growing and changing to meet the needs of developers, system administrators, and end users of the Solaris Operating System.

Teaching with the OpenSolaris project provides the following advantages over instructional operating systems:

- Access to code for the revolutionary technologies in the Solaris 10 operating system
- Access to code for a commercial OS that is used in many environments and that scales to large systems
- Hardware platform support including SPARC, x86 and AMD x64 architectures
- Leadership on 64-bit computing
- $0.00 for infinite right-to-use
- Free, exciting, innovative, complete, seamless, and rock-solid code base
- Availability under the OSI-approved Common Development and Distribution License (CDDL) allows royalty-free use, modification, and derived works

Web Resources for OpenSolaris

You can download the OpenSolaris source, view the license terms and access instructions for building source and installing the pre-built archives at: http://www.opensolaris.org/os/downloads.

The icons in the upper-right of the OpenSolaris web pages link you to discussions, communities, projects, downloads, and source browser resources as shown in the following graphic. In addition, the OpenSolaris web site provides search across all of the site content and aggregated blogs, as shown in the upper-left of the graphic.

Discussions

Discussions provide you with access to the experts who are working on new open source technologies. Discussions also provide an archive of previous conversations that you can reference for answers to your questions. See http://www.opensolaris.org/os/discussions for the complete list of forums to which you can subscribe.

Communities

Communities provide connections to other participants with similar interests in the OpenSolaris project. Communities form around interest groups, technologies, support, tools, and user groups, for example:

Academic and Research    http://www.opensolaris.org/os/community/edu
DTrace                   http://www.opensolaris.org/os/community/dtrace
ZFS                      http://www.opensolaris.org/os/community/zfs
Zones                    http://www.opensolaris.org/os/community/zones
Documentation            http://www.opensolaris.org/os/community/documentation
Device Drivers           http://www.opensolaris.org/os/community/device_drivers
Tools                    http://www.opensolaris.org/os/community/tools
User Groups              http://www.opensolaris.org/os/community/os_user_groups

These are only a few of over 30 communities actively working on OpenSolaris. See http://opensolaris.org/os/communities for the complete list.

Projects

Projects hosted on the opensolaris.org web site are collaborative efforts that produce objects such as code changes, documents, graphics, or joint-authored products. Projects have code repositories and committers and may live within a community or independently.

New projects are initiated by participants by request on the discussions. Projects that are submitted and accepted by at least one other interested participant are given space on the projects page to get started. See http://www.opensolaris.org/os/projects for the current list of new projects.

OpenGrok

OpenGrok™ is the fast and usable source code search and cross reference engine used in OpenSolaris. See http://cvs.opensolaris.org/source to try it out!

The first project to be hosted on opensolaris.org was OpenGrok. See http://www.opensolaris.org/os/project/opengrok to find out about the ongoing development project.

Take an online tour of the source and you’ll discover cleanly written, extensively commented code that reads like a book. If you’re interested in working on an OpenSolaris project, you can download the complete codebase. If you just need to know how some features work in the Solaris OS, the source code browser provides a convenient alternative.

OpenGrok understands various program file formats and version control histories like SCCS, RCS, and CVS, so that you can better understand the open source.

The following graphic shows the results of an OpenGrok file path search on fbt.

MODULE 2

Planning the OpenSolaris Environment

Objectives

The objective of this module is to understand the system requirements, support information and documentation available for the OpenSolaris project installation and configuration.

Additional Resources

- Solaris 10 Installation Guide: Basic Installations. Sun Microsystems, Inc., 2005.
- Sun Studio 11: C User’s Guide. Sun Microsystems, Inc., 2005. Click Sun Studio 11 Collection to see Sun Studio books about dbx, dmake, Performance Analyzer, and other software development topics.
- Resources for Running Solaris OS on a Laptop: http://www.sun.com/bigadmin/features/articles/laptop_resources.html

Development Environment Configuration

There is no substitute for hands-on experience with operating system code and direct access to kernel modules. The unique challenges of kernel development and access to root privileges for a system are made simpler by the tools, forums, and documentation provided for the OpenSolaris project.

Consider the following features of OpenSolaris as you plan your development environment:

TABLE 2–1 Configurable Lab Component Support

Configurable Component: Support From the OpenSolaris Project

Hardware
    OpenSolaris supports systems that use the SPARC® and x86 families of processor architectures: UltraSPARC®, SPARC64, AMD64, Pentium, and Xeon EM64T. For supported systems, see the Solaris OS Hardware Compatibility List at http://www.sun.com/bigadmin/hcl.

Source files
    See http://www.opensolaris.org/os/downloads for detailed instructions about how to build from source.

Install images
    Pre-built OpenSolaris distributions are limited to the Solaris Express: Community Release [DVD Version], Build 32 or newer. For the OpenSolaris kernel with the GNU user environment, try http://www.gnusolaris.org/gswiki/Download-form.

BFU archives
    The on-bfu-DATE.PLATFORM.tar.bz2 file is provided if you are installing from pre-built archives.

Build tools
    The SUNWonbld-DATE.PLATFORM.tar.bz2 file is provided if you build from source.

Compilers and tools
    Sun Studio 10 compilers and tools are freely available for use by OpenSolaris developers. See http://www.opensolaris.org/os/community/tools/sun_studio_tools/ for instructions about how to download and install the latest versions. Also, refer to http://www.opensolaris.org/os/community/tools/gcc for the gcc community.

Memory/Disk Requirements
    - Memory requirement: 256M minimum, 1GB recommended
    - Disk space requirement: 350M bytes

Virtual OS environments
    Zones and Branded Zones in OpenSolaris provide protected and virtualized operating system environments within an instance of Solaris, allowing one or more processes to run in isolation from other activity on the system.
    OpenSolaris supports Xen, an open-source virtual machine monitor developed by the Xen team at the University of Cambridge Computer Laboratory. See http://www.opensolaris.org/os/community/xen/ for details and links to the Xen project.
    OpenSolaris is also a VMWare™ guest; see http://opensolaris.org/os/project/content/articles/vmware for a draft version of a recent article describing how to get started.

Refer to Module 4 for more information about how Zones and Branded Zones enable kernel and user mode development of Solaris and Linux applications without impacting developers in separate zones.

Networking

The OpenSolaris project meets future networking challenges by radically improving your network performance without requiring changes to your existing applications.

- Speeds application performance by about 50 percent by using an enhanced TCP/IP stack
- Supports many of the latest networking technologies, such as 10 Gigabit Ethernet, wireless networking, and hardware offloading
- Accommodates high-availability, streaming, and Voice over IP (VoIP) networking features through extended routing and protocol support
- Supports current IPv6 specifications

Find out more about ongoing networking developments in the OpenSolaris project here: http://opensolaris.org/os/community/networking/.

Your lab environment becomes self-sustaining when hosted on OpenSolaris because you are always running the latest and greatest environment, empowered to update it yourself. Participation in the OpenSolaris project can improve overall performance across your network with the latest technologies.

MODULE 3

OpenSolaris Policies

Objectives

The objective of this module is to understand at a high level the development process steps and the coding style that is used in the OpenSolaris project.

Additional Resources

- OpenSolaris Development Process: http://www.opensolaris.org/os/community/onnv/os_dev_process/
- C Style and Coding Standards for SunOS: http://www.opensolaris.org/os/community/documentation/getting_started_docs/

Development Process and Coding Style

The development process for the OpenSolaris project follows the following high-level steps:

1. Idea
   First, someone has an idea for an enhancement or has a gripe about a defect. Search for an existing bug or file a new bug or request for enhancement (RFE) by using the http://bugs.opensolaris.org web page. Next, announce it to other developers on the appropriate E-mail list. The announcement has the following benefits:
   - Precipitate discussion of the change or enhancement
   - Determine the complexity of the proposed change(s)
   - Gauge community interest
   - Identify potential team members

2. Design
   The Design phase determines whether or not a formal design review is even needed. If a formal review is needed, complete the following next steps:
   - Identify design and architectural reviewers
   - Write a design document
   - Write a test plan
   - Conduct design reviews and get the appropriate approvals

3. Implementation
   The Implementation phase consists of the following:
   - Writing of the actual code in accordance with policies and standards. Download C Style and Coding Standards for SunOS here: http://www.opensolaris.org/os/community/documentation/getting_started_docs/
   - Writing the test suites
   - Passing various unit and pre-integration tests
   - Writing or updating the user documentation, if needed
   - Identifying code reviewers in preparation for integration

4. Integration
   Integration happens after all reviews have been completed and permission to integrate has been granted. The Integration phase is to make sure everything that was supposed to be done has in fact been done, which means conducting reviews for code, documentation, and completeness.

The formal process document for OpenSolaris describes the previous steps in greater detail, with flow charts that illustrate the development phases. That document also details the following design principles and core values that are to be applied to source code development for the OpenSolaris project:

- Reliability – OpenSolaris must perform correctly, providing accurate results with no data loss or corruption.
- Availability – Services must be designed to be restartable in the event of an application failure and OpenSolaris itself must be able to recover from non-fatal hardware failures.
- Serviceability – It must be possible to diagnose both fatal and transient issues and wherever possible, automate the diagnosis.
- Security – OpenSolaris security must be designed into the operating system, with mechanisms in place in order to audit changes done to the system and by whom.
- Performance – The performance of OpenSolaris must be second to none when compared to other operating systems running on identical environments.
- Manageability – It must allow for the management of individual components, software or hardware, in a consistent and straightforward manner.
- Compatibility – New subsystems and interfaces must be extensible and versioned in order to allow for future enhancements and changes without sacrificing compatibility.
- Maintainability – OpenSolaris must be architected so that common subroutines are combined into libraries or kernel modules that can be used by an arbitrary number of consumers.
- Platform Neutrality – OpenSolaris must continue to be platform neutral and lower level abstractions must be designed with multiple and future platforms in mind.

Refer to http://www.opensolaris.org/os/community/onnv/os_dev_process/ for more detailed information about the process that is used for collaborative development of OpenSolaris code.

Like many projects, OpenSolaris enforces a coding style on contributed code, regardless of its source. This style is described in detail at http://opensolaris.org/os/community/onnv/. Two tools for checking many elements of the coding style are available as part of the OpenSolaris distribution. These tools are cstyle(1) for verifying compliance of C code with most style guidelines, and hdrchk(1) for checking the style of C and C++ headers.

MODULE 4

Features of the OpenSolaris Project

Objectives

The objective of this module is to describe the major features of the OpenSolaris project and how the features have fundamentally changed operating system computing.

Overview

Now that you have considered the components, processes, and guidelines for OpenSolaris development, let’s briefly talk about the following features of the operating environment:

• Security Technology: Least Privilege
• Services Management Facility (SMF)
• Zones
• Branded Zones (BrandZ)
• Zettabyte File System (ZFS)
• Dynamic Tracing Facility (DTrace)
• Modular Debugger (MDB)

Security Technology: Least Privilege

UNIX® has historically had an all-or-nothing privilege model that imposes the following restrictions:

• No way to limit root user privileges
• No way for non-root users to perform privileged operations
• Applications needing only a few privileged operations must run as root
• Very few are trusted with root privileges and virtually no students are so trusted

In the Solaris OS we’ve developed fine-grained privileges. Fine-grained privileges allow applications and users to run with just the privileges they need. Least privilege allows students to be granted the privileges that they need to complete their course work, participate in research, and maintain a portion of the campus or department infrastructure.

Predictive Self-Healing

Predictive self-healing was implemented in two ways in the Solaris 10 OS. This section describes the new Fault Management Architecture and Services Management Facility that make up the self-healing technology.

Fault Management Architecture (FMA)

The Solaris OS provides a new architecture, FMA, for building resilient error handlers, error telemetry, automated diagnosis software, response agents, and a consistent model of system failures for a management stack. Many parts of Solaris are already participating in FMA, including the CPU and Memory error handling for UltraSPARC III and IV, the UltraSPARC PCI HBAs, and more. A variety of projects are underway, including full support for CPU, Memory, and I/O faults on Opteron, conversion of key device drivers, and integration with various management stacks. Opteron support is scheduled for build 34.

When a subsystem is converted to participate in Fault Management, error handling is made resilient so that the system can continue to operate despite some underlying failure, and telemetry events are produced that drive automated diagnosis and response. The Fault Management tools and architecture enable development of self-healing content for software and hardware failures, for both microscopic and macroscopic system resources, all with a unified, simple view for administrators and system management software.

See http://opensolaris.org/os/community/fm for information about how to participate in the Fault Management community or to download the Fault Management MIB that is currently in development.

Services Management Facility (SMF)

SMF creates a supported, unified model for management of an enormous number of services, such as email delivery, ftp requests, and remote command execution in the OpenSolaris project. The smf(5) framework replaces (in a compatible manner) the existing init.d(4) startup mechanism and includes an enhanced inetd(1M), promoting the service to a first-class operating system object. Beyond consistent error handling, SMF gives developers the following:

• Automated restart of services in dependency order due to administrative errors, software bugs, or uncorrectable hardware errors
• A single API for service management, configuration, and observation
• Access to service-based resource management
• Simplified boot-process debugging

See http://opensolaris.org/os/community/smf/scfdot to see a graph of the SMF services and their dependencies on an x86 system freshly installed with the Solaris OS Nevada build 24.

In addition to service-level management improvements, the OpenSolaris project provides application-level features and functionality to create separate and protected run-time environments. The sophisticated resource management facilities of zones address the unique challenges of application development and testing in shared environments.

Zones

A zone is a virtual operating system abstraction that provides a protected environment in which applications run. The applications are protected from each other to provide software fault isolation. To ease the labor of managing multiple applications and their environments, they co-exist within one operating system instance, and are usually managed as one entity.

Zones can be combined with the resource management facilities which are present in OpenSolaris to provide more complete, isolated environments. While the zone supplies the security, name space and fault isolation, the resource management facilities can be used to prevent processes in one zone from using too much of a system resource or to guarantee them a certain service level. Together, zones and resource management are often referred to as containers.

Refer to http://opensolaris.org/os/community/zones/faq/ for answers to a large number of common questions about zones and links to the latest administration documentation.

The OpenSolaris project addresses the unique challenges of operating system development and testing for application performance using features like zones. Zones provide protected environments for Solaris applications. Using BrandZ, the OpenSolaris project takes zones a step further and provides separate and protected run-time environments for Linux applications, for example.

Branded Zones (BrandZ)

BrandZ is a framework that extends the zones infrastructure to create Branded Zones, which are zones that contain non-native operating environments. A branded zone may be as simple as an environment where the standard Solaris utilities are replaced by their GNU equivalents, or as complex as a complete Linux user space.

The lx brand enables Linux binary applications to run unmodified on Solaris, within zones that are running a complete Linux user space. The lx brand enables user-level Linux software to run on a machine with an OpenSolaris kernel, and includes the tools necessary to install a CentOS or Red Hat Enterprise Linux distribution inside a zone on a Solaris system.

The lx brand will run on x86/x64 systems booted with either a 32-bit or 64-bit kernel. Regardless of the underlying kernel, only 32-bit Linux applications are able to run. This feature is only available for x86 and AMD x64 architectures at this time. However, porting to SPARC might be an interesting community project because BrandZ lx is still very much a work in progress.

See http://opensolaris.org/os/community/brandz/install/ for the installation requirements and instructions.

Additionally, filesystem partitioning for kernel development is simplified by the ZFS code in the OpenSolaris project.

Zettabyte Filesystem (ZFS)

ZFS filesystems are not constrained to specific devices, so they can be created easily and quickly like directories. They grow automatically within the space allocated to the storage pool.

ZFS presents a pooled storage model that eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth, and stranded storage. Each storage pool consists of one or more virtual devices, which describe the layout of physical storage and its fault characteristics. The combined I/O bandwidth of all devices in the pool is available to all filesystems at all times.

In addition to pooled storage, ZFS provides RAID-Z data redundancy configuration. RAID-Z is a virtual device that stores data and parity on multiple disks, similar to RAID-5. In RAID-Z, ZFS uses variable-width RAID stripes so that all writes are full-stripe writes. This is only possible because ZFS integrates filesystem and device management in such a way that the filesystem’s metadata has enough information about the underlying data replication model to handle variable-width RAID stripes. RAID-Z is the world’s first software-only solution to the RAID-5 write hole.

See http://www.opensolaris.org/os/community/zfs/demos/basics/ for 100 Mirrored Filesystems in 5 Minutes, a demonstration of administering mirrored pools with ZFS.

In addition to enhanced configuration and administration features that simplify and support developer requirements, the code made available in the OpenSolaris project provides a sophisticated dynamic tracing facility (DTrace) for debugging kernel and application behavior.

Dynamic Tracing (DTrace)

DTrace provides a powerful infrastructure to permit administrators, developers, and service personnel to concisely answer arbitrary questions about the behavior of the operating system and user programs. DTrace enables you to do the following:

• Dynamically enable and manage thousands of probes
• Dynamically associate predicates and actions with probes
• Dynamically manage trace buffers and probe overhead
• Examine trace data from a live system or from a system crash dump
• Implement new trace data providers that plug into DTrace
• Implement trace data consumers that provide data display

• Implement tools that configure DTrace probes

Find the DTrace community pages here: http://www.opensolaris.org/os/community/dtrace.

In addition to DTrace, the OpenSolaris project provides debugging facilities for low-level types of development, for example, device driver development.

Modular Debugger (MDB)

MDB is a debugger designed to facilitate analysis of problems that require low-level debugging facilities, examination of core files, and knowledge of assembly language to diagnose and correct. Generally, kernel and device developers rely on mdb to determine why and where their code went wrong. MDB is available as two commands that share common features: mdb and kmdb. You can use the mdb command interactively or in scripts to debug live user processes, user process core files, kernel crash dumps, the live operating system, object files, and other files. You can use the kmdb command to debug the live operating system kernel and device drivers when you also need to control and halt the execution of the kernel.

There is an active community for MDB, where you can ask the experts or review previous conversations and common questions. See http://www.opensolaris.org/os/community/mdb

MODULE 5

Programming Concepts

Objectives

This module provides a high-level description of the fundamental concepts of the OpenSolaris programming environment, as follows:

• Threaded Programming
• Kernel Overview
• CPU Scheduling
• Process Debugging

Additional Resources

• Solaris Internals (2nd Edition), Prentice Hall PTR (May 12, 2006) by Jim Mauro and Richard McDougall
• Solaris Systems Programming, Prentice Hall PTR (August 19, 2004), by Rich Teer
• Multithreaded Programming Guide. Sun Microsystems, Inc., 2005.
• STREAMS Programming Guide. Sun Microsystems, Inc., 2005.
• Solaris 64-bit Developer’s Guide. Sun Microsystems, Inc., 2005.

Process and System Management

The basic unit of workload is the process. Process IDs (PIDs) are numbered sequentially throughout the system. By default, each user is assigned by the system administrator to a project, which is a network-wide administrative identifier. Each successful login to a project creates a new task, which is a grouping mechanism for processes. A task contains the login process as well as subsequent child processes.

The resource pools facility brings together process-bindable resources into a common abstraction called a pool. Processor sets and other entities are configured, grouped, and labelled such that workload components are associated with a subset of a system’s total resources. When the pools facility is disabled, all processes belong to the same pool, pool_default, and processor sets are managed through the pset() system call. When the pools facility is enabled, processor sets must be managed by using the pools facility. New pools can be created and associated with processor sets. Processes may be bound to pools that have non-empty resource sets.

If we search OpenGrok for pool.c, we find that the code comments provide a graphical representation of these relationships:

 * The operation that binds tasks and projects to pools is atomic. That is,
 * either all processes in a given task or a project will be bound to a
 * new pool, or (in case of an error) they will be all left bound to the
 * old pool. Processes in a given task or a given project can only be bound
 * to different pools if they were rebound individually one by one as
 * single processes. Threads or LWPs of the same process do not have pool
 * bindings, and are bound to the same resource sets associated with the
 * resource pool of that process.
 *
 * The following picture shows one possible pool configuration with three
 * pools and three processor sets. Note that processor set "foo" is not
 * associated with any pools and therefore cannot have any processes
 * bound to it. Two pools (default and foo) are associated with the
 * same processor set (default). Also, note that processes in Task 2
 * are bound to different pools.
 *
 *                                                                 Processor Sets
 *                                                                   +---------+
 *                    +----------------+============================>| default |
 *                   a|                |                             +---------+
 *                   s|                |                                 ||
 *                   s|                |                             +---------+
 *                   o|                |                             |   foo   |
 *                   c|                |                             +---------+
 *                   i|                |                                 ||
 *                   a|                |                             +---------+
 *                   t|                |                +----------->|   bar   |
 *                   e|                |                |            +---------+
 *                   d|                |                |
 *                    |                |                |
 *               +---------+      +---------+      +---------+
 * Pools         | default |======|   foo   |======|   bar   |
 *               +---------+      +---------+      +---------+
 *                  @   @              @             @  @   @
 *                 b|   |              |             |  |   |
 *                 o|   |              |             |  |   |
 *                 u|   +-----+        |     +-------+  |   +---+
 *                 n|         |        |     |          |       |
 *             ....d|.........|........|.....|..........|.......|.....
 *             :    |    ::   |        |     |    ::    |       |    :
 *             :  +---+  :: +---+    +---+ +---+  ::  +---+   +---+  :
 * Processes   :  | p |  :: | p |    | p | | p |  ::  | p |...| p |  :
 *             :  +---+  :: +---+    +---+ +---+  ::  +---+   +---+  :
 *             :.........::.......................::.................:
 *               Task 1             Task 2               Task N
 *                  |                 |                     |
 *                  |                 |                     |
 *                  |  +-----------+  |               +-----------+
 *                  +--| Project 1 |--+               | Project N |
 *                     +-----------+                  +-----------+
 *
 * This is just an illustration of relationships between processes, tasks,
 * projects, pools, and processor sets. New types of resource sets will be
 * added in the future.

Processes can optionally be run inside a zone. Zones are set up by system administrators, often for security purposes, in order to isolate groups of users or processes from one another. A zone can be thought of as a container in which one or more applications run isolated from all other applications on the system.

Most software that runs on OpenSolaris will run unmodified in a zone. Since zones do not change the OpenSolaris Application Programming Interfaces (APIs) or Application Binary Interface (ABI), recompiling an application is not necessary in order to run it inside a zone.

A small number of applications which are normally run as root or with certain privileges may not run inside a zone if they rely on being able to access or change some global resource. An example might be the ability to change the system’s time-of-day clock. The few applications which fall into this category may need modifications to run properly inside a zone or, in some cases, should continue to be used within the global zone. Here are some guidelines:

• An application which accesses the network and files, and performs no other I/O, should work correctly.
• Applications which require direct access to certain devices, for example, a disk partition, will usually work if the zone is configured correctly. However, in some cases this may increase security risks.
• Applications which require direct access to these devices may need to be modified to work correctly: for example, /dev/kmem, or a network device. Applications should instead use one of the many IP services.

BrandZ extends the Zones infrastructure in user space in the following ways:

• A brand is an attribute of a zone, set at zone configuration time.
• Each brand provides its own installation routine, which allows us to install an arbitrary collection of software in the branded zone.
• Each brand may provide pre-boot and post-boot scripts that allow us to do any final boot-time setup or configuration.
• The zonecfg and zoneadm tools can set and report a zone’s brand type.

BrandZ provides a set of interposition points in the kernel:

• These points are found in the syscall path, process loading path, thread creation path, etc.
• At each of these points, a brand may choose to supplement or replace the standard behavior of the Solaris OS.
• These interposition points are only applied to processes in a branded zone.
• Fundamentally different brands may require new interposition points.

Threaded Programming

Now that we’ve learned about processes in the context of tasks, projects, resource pools, zones, and branded zones, let’s discuss processes in the context of threads.

Traditional UNIX already supports the concept of threads. Each process contains a single thread, so programming with multiple processes is programming with multiple threads. But, a process is also an address space, and creating a process involves creating a new address space. Communication between the threads of one process is simple because the threads share everything, including a common address space and open file descriptors. So, data produced by one thread is immediately available to all the other threads.

The libraries are libpthread for POSIX threads, and libthread for OpenSolaris threads. Multithreading provides flexibility by decoupling kernel-level and user-level resources. In OpenSolaris, multithreading support for both sets of interfaces is provided by the standard C library.

Use pthread_create(3C) to add a new thread of control to the current process.

    int pthread_create(pthread_t *tid, const pthread_attr_t *tattr,
        void*(*start_routine)(void *), void *arg);

The pthread_create() function is called with tattr that has the necessary state behavior. start_routine is the function with which the new thread begins execution. When start_routine returns, the thread exits with the exit status set to the value returned by start_routine. pthread_create() returns zero when the call completes successfully. Any other return value indicates that an error occurred. Go to /on/usr/src/lib/libc/spec/threads.spec in OpenGrok for the complete list of pthread functions and declarations.

Thread synchronization enables you to control program flow and access to shared data for concurrently executing threads. The four synchronization objects are mutex locks, read/write locks, condition variables, and semaphores.

• Mutex locks allow only one thread at a time to execute a specific section of code, or to access specific data.
• Read/write locks permit concurrent reads and exclusive writes to a protected shared resource. To modify a resource, a thread must first acquire the exclusive write lock. An exclusive write lock is not permitted until all read locks have been released.
• Condition variables block threads until a particular condition is true.
• Counting semaphores typically coordinate access to resources. The count is the limit on how many threads can have access to a semaphore. When the count is reached, the thread that is trying to access the resource blocks.

Synchronization

Synchronization objects are variables in memory that you access just like data. Threads in different processes can communicate with each other through synchronization objects that are placed in threads-controlled shared memory. The threads can communicate with each other even though the threads in different processes are generally invisible to each other. Synchronization objects can also be placed in files. The synchronization objects can have lifetimes beyond the life of the creating process.

Code comments in the mutex.c file reveal the following:

 * Implementation of all threads interfaces between ld.so.1 and libthread.
 *
 * In a non-threaded environment all thread interfaces are vectored to noops.
 * When called via _ld_concurrency() from libthread these vectors are reassigned
 * to real threads interfaces. Two models are supported:
 *
 * TI_VERSION == 1
 *      Under this model libthread provides rw_rwlock/rw_unlock, through which
 *      we vector all rt_mutex_lock/rt_mutex_unlock calls.
 *      Under lib/libthread these interfaces provided _sigon/_sigoff (unlike
 *      lwp/libthread that provided signal blocking via bind_guard/bind_clear.
 *
 * TI_VERSION == 2
 *      Under this model only libthreads bind_guard/bind_clear and thr_self
 *      interfaces are used. Both libthreads block signals under the
 *      bind_guard/bind_clear interfaces. Lower level locking is derived
 *      from internally bound _lwp_ interfaces. This removes recursive
 *      problems encountered when obtaining locking interfaces from libthread.
 *      The use of mutexes over reader/writer locks also enables the use of
 *      condition variables for controlling thread concurrency (allows access
 *      to objects only after their .init has completed).
 ...


OpenGrok results for a full search on POSIX reveal the POSIX.pod file that includes the module, as described in the following comments:
POSIX - Perl interface to IEEE Std 1003.1

=head1 SYNOPSIS

    use POSIX;
    use POSIX qw(setsid);
    use POSIX qw(:errno_h :fcntl_h);

    printf "EINTR is %d\n", EINTR;

    $sess_id = POSIX::setsid();

    $fd = POSIX::open($path, O_CREAT|O_EXCL|O_WRONLY, 0644);
        # note: that’s a filedescriptor, *NOT* a filehandle

=head1 DESCRIPTION

The POSIX module permits you to access all (or nearly all) the standard
POSIX 1003.1 identifiers. Many of these identifiers have been given Perl-ish
interfaces. Things which are C<#defines> in C, like EINTR or O_NDELAY, are
automatically exported into your namespace. All functions are only exported
if you ask for them explicitly. Most likely people will prefer to use the
fully-qualified function names.

This document gives a condensed list of the features available in the POSIX
module.
...

Now that you understand a bit about how synchronization objects are defined in multi-threaded programming, let’s learn how these objects are managed by using scheduling classes.

CPU Scheduling
Processes run in a scheduling class with a separate scheduling policy applied to each class, as follows:
• Realtime (RT) – The highest-priority scheduling class provides a policy for those processes that require fast response and absolute user or application control of scheduling priorities. RT scheduling can be applied to a whole process or to one or more lightweight processes (LWPs) in a process. You must have the proc_priocntl privilege to use the Realtime class. See the privileges(5) man page for details.
• System (SYS) – The middle-priority scheduling class. The system class cannot be applied to a user process.
• Timeshare (TS) – The lowest-priority scheduling class, which is also the default class. The TS policy distributes the processing resource fairly among processes with varying CPU consumption characteristics. Other parts of the kernel can monopolize the processor for short intervals without degrading the response time seen by the user.
• Inter-Active (IA) – The IA policy distributes the processing resource fairly among processes with varying CPU consumption characteristics, while also providing good responsiveness for user interaction.
• Fair Share (FSS) – The FSS policy distributes the processing resource fairly among projects, independent of the number of processes they own, by specifying shares to control the process entitlement to CPU resources. Resource usage is remembered over time, so that entitlement is reduced for heavy usage and increased for light usage with respect to other projects.
• Fixed-Priority (FX) – The FX policy provides a fixed-priority preemptive scheduling policy for those processes requiring that the scheduling priorities do not get dynamically adjusted by the system and that the user or application have control of the scheduling priorities. This class is a useful starting point for affecting CPU allocation policies.

A scheduling class is maintained for each lightweight process (LWP). Threads have the scheduling class and priority of their underlying LWPs. Each LWP in a process can have a unique scheduling class and priority that are visible to the kernel. Thread priorities regulate contention for synchronization objects. The RT and TS scheduling classes both call priocntl(2) to set the priority level of processes or LWPs within a process. Using OpenGrok to search the code base for priocntl, we find the variables that are used in the RT and TS scheduling classes in the rtsched.c file as follows:
#pragma ident "@(#)rtsched.c 1.10 05/06/08 SMI"

#include "lint.h"
#include "thr_uberdata.h"
#include <sched.h>
#include <sys/priocntl.h>
#include <sys/rtpriocntl.h>
#include <sys/tspriocntl.h>
#include <sys/rt.h>
#include <sys/ts.h>

/*
 * The following variables are used for caching information
 * for priocntl TS and RT scheduling classs.
 */
struct pcclass ts_class, rt_class;

static rtdpent_t *rt_dptbl;     /* RT class parameter table */
static int rt_rrmin;
static int rt_rrmax;
static int rt_fifomin;
static int rt_fifomax;
static int rt_othermin;
static int rt_othermax;
...

Typing the man priocntl command in a terminal window shows the details of each scheduling class and describes attributes and usage. For example:
% man priocntl
Reformatting page. Please Wait... done

User Commands                                        priocntl(1)

NAME
     priocntl - display or set scheduling parameters of
     specified process(es)

SYNOPSIS
     priocntl -l

     priocntl -d [-i idtype] [idlist]

     priocntl -s [-c class] [class-specific options]
     [-i idtype] [idlist]

     priocntl -e [-c class] [class-specific options] command
     [argument(s)]

DESCRIPTION
     The priocntl command displays or sets scheduling parameters
     of the specified process(es). It can also be used to
     display the current configuration information for the
     system’s process scheduler or execute a command with
     specified scheduling parameters.

     Processes fall into distinct classes with a separate
     scheduling policy applied to each class. The process
     classes currently supported are the real-time class,
     time-sharing class, interactive class, fair-share class,
     and the fixed

     priority class. The characteristics of these classes and
     the class-specific options they accept are described below
     in the USAGE section under the headings Real-Time Class,
     Time-Sharing Class, Inter-Active Class, Fair-Share Class,
     and Fixed-Priority Class. With appropriate permissions, the
--More--(4%)

Kernel Overview

Now that you have a high-level understanding of processes, threads, and scheduling, let’s discuss the kernel and how kernel modules are different from user programs. The Solaris kernel does the following:

• Manages the system resources, including file systems, processes, and physical devices.
• Provides applications with system services such as I/O management, virtual memory, and scheduling.
• Coordinates interactions of all user processes and system resources.
• Assigns priorities, services resource requests, and services hardware interrupts and exceptions.
• Schedules and switches threads, pages memory, and swaps processes.

The following section discusses several important differences between kernel modules and user programs.

Execution Differences Between Kernel Modules and User Programs

The following characteristics of kernel modules highlight important differences between the execution of kernel modules and the execution of user programs:

• Kernel modules have separate address space. A module runs in kernel space. An application runs in user space. System software is protected from user programs. Kernel space and user space have their own memory address spaces.
• Kernel modules have higher execution privilege. Code that runs in kernel space has greater privilege than code that runs in user space.
• Kernel modules do not execute sequentially. A user program typically executes sequentially and performs a single task from beginning to end. A kernel module does not execute sequentially. A kernel module registers itself in order to serve future requests.

I I I I 42 Introduction to Operating Systems: A Hands-On Approach Using the OpenSolaris Project • March. Kernel modules use different header files. have no main() routine. In a symmetric multiprocessor (SMP) system. something which OpenSolaris has for some of the more recent x86/x64 and UltraSPARC platforms. otherwise customized code can be written for both kernel and user/libraries. I I Structural Differences Between Kernel Modules and User Programs The following characteristics of kernel modules highlight important differences between the structure of kernel modules and the structure of user programs: I Kernel modules do not define a main program. As much as possible. 2006 . Kernel modules are linked only to the kernel. So. Kernel modules can dedicate process registers to specific roles. Kernel code can be optimized for a specific processor. Kernel modules can be customized for hardware. More than one process can request your driver at the same time. your driver could be executing concurrently on more than one CPU. Kernel modules must be preemptable. Different threads of an application program need not share data. the data structures and routines that constitute a driver are shared by all threads that use the driver. Kernel modules can share data. For example. When you must use global symbols. Kernel modules do not link in the same libraries that user programs link in. including device drivers. while the kernel can dedicate certain registers to certain roles. declare symbols as static. give them a prefix that is unique within the kernel. You cannot assume that your driver code is safe just because your driver code does not block. Your driver must be able to handle contention issues that result from multiple requests. a kernel module is a collection of subroutines and data. Design your driver data structures carefully to keep multiple threads of execution separate.Process and System Management I Kernel modules can be interrupted. 
Kernel modules should avoid global variables. The only functions a kernel module can call are functions that are exported by the kernel. Avoiding global variables in kernel modules is even more important than avoiding global variables in user programs. The required header files are listed in the man page for each function. an interrupt handler can request your driver at the same time that your driver is serving a system call. Design your driver assuming your driver might be preempted. Kernel modules require a different set of header files than user programs require. Kernel modules. Instead. Kernel modules can include header files that are shared by user programs if the user and kernel interfaces within such shared header files are defined conditionally using the _KERNEL macro. By contrast. Using this prefix for private symbols within the module also is a good practice. You can also have customized libraries as well.

Kernel modules can be loaded and unloaded on demand. The collection of subroutines and data that constitute a device driver can be compiled into a single loadable module of object code. This loadable module can then be statically or dynamically linked into the kernel and unlinked from the kernel. You can add functionality to the kernel while the system is up and running. You can test new versions of your driver without rebooting your system.

Process Debugging

Debugging processes at all levels of the development stack is a key part of writing kernel modules. A full search for libthread in OpenGrok reveals the following code comments in the mdb_tdb.c file that describe the connection between multi-threaded debugging and how mdb works:

    #pragma ident "@(#)mdb_tdb.c 1.4 05/06/08 SMI"

    /*
     * libthread_db (tdb) cache
     *
     * In order to properly debug multi-threaded programs, the proc target must be
     * able to query and modify information such as a thread's register set using
     * either the native LWP services provided by libproc (if the process is not
     * linked with libthread), or using the services provided by libthread_db (if
     * the process is linked with libthread). Additionally, a process may begin
     * life as a single-threaded process and then later dlopen() libthread, so we
     * must be prepared to switch modes on-the-fly. There are also two possible
     * libthread implementations (one in /usr/lib and one in /usr/lib/lwp) so we
     * cannot link mdb against libthread_db directly; instead, we must dlopen the
     * appropriate libthread_db on-the-fly based on which libthread.so the victim
     * process has open. Finally, mdb is designed so that multiple targets can be
     * active simultaneously, so we could even have *both* libthread_db's open at
     * the same time. This might happen if you were looking at two multi-threaded
     * user processes inside of a crash dump, one using /usr/lib/libthread.so and
     * the other using /usr/lib/lwp/libthread.so. To meet these requirements, we
     * implement a libthread_db "cache" in this file. The proc target calls
     * mdb_tdb_load() with the pathname of a libthread_db to load, and if it is
     * not already open, we dlopen() it, look up the symbols we need to reference,
     * and fill in an ops vector which we return to the caller. Once an object is
     * loaded, we don't bother unloading it unless the entire cache is explicitly
     * flushed. This mechanism also has the nice property that we don't bother
     * loading libthread_db until we need it, so the debugger starts up faster.
     */
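The caching behavior that the mdb_tdb.c comment describes (load on first use, reuse afterwards, unload only on an explicit flush) can be sketched as below. This is not the mdb implementation: TdbOps, TdbCache and the injected loader are hypothetical stand-ins, with the loader taking the place of dlopen() plus the symbol lookups.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the ops vector that is filled in after
// the needed libthread_db symbols have been looked up.
struct TdbOps {
    std::string path;   // library this ops vector was built from
};

class TdbCache {
public:
    // The loader plays the role of dlopen() plus symbol lookup.
    using Loader = std::function<TdbOps(const std::string&)>;

    explicit TdbCache(Loader loader) : loader_(std::move(loader)) {}

    // Return the cached ops for path, invoking the loader on first use only.
    const TdbOps& load(const std::string& path) {
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            it = cache_.emplace(path, loader_(path)).first;
        }
        return it->second;
    }

    // Entries are dropped only when the whole cache is flushed.
    void flush() { cache_.clear(); }

    std::size_t size() const { return cache_.size(); }

private:
    Loader loader_;
    std::map<std::string, TdbOps> cache_;
};
```

Because entries are keyed by pathname, two different libthread_db libraries can be held open at the same time, which is the situation the comment describes for two targets inside a crash dump.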

The following mdb commands can be used to access the LWPs of a multi-threaded program:

- $l Prints the LWP ID of the representative thread if the target is a user process.
- $L Prints the LWP IDs of each LWP in the target if the target is a user process.
- pid::attach Attaches to process by using the pid, or process ID.
- ::release Releases the previously attached process or core file. The process can subsequently be continued by prun(1) or it can be resumed by applying MDB or another debugger.
- address::context Context switch to the specified process.

These commands to set conditional breakpoints are often useful.

- [ addr ] ::bp [+/-dDestT] [-c cmd] [-n count] sym ... Set a breakpoint at the specified locations.
- addr ::delete [id | all] Delete the event specifiers with the given ID number.

DTrace probes are constructed in a manner similar to MDB queries. We'll start the hands-on lab exercises with DTrace and then add MDB when the debugging becomes more complex.

Module 6

Getting Started With DTrace

Objectives

The objective of this lab is to introduce you to DTrace by using a probe script for a system call.

Additional Resources

- Solaris Dynamic Tracing Guide. Sun Microsystems, Inc., 2005.

Enabling Simple DTrace Probes

Completion of the lab exercise will result in basic understanding of DTrace probes.

We're going to start learning DTrace by building some very simple requests using the probe named BEGIN, which fires once each time you start a new tracing request. You can use the dtrace(1M) utility's -n option to enable a probe using its string name.

1. Open a terminal window.

2. Enable the probe:

    # dtrace -n BEGIN

   After a brief pause, you will see dtrace tell you that one probe was enabled and you will see a line of output indicating that the BEGIN probe fired. Once you see this output, dtrace remains paused waiting for other probes to fire. Since you haven't enabled any other probes and BEGIN only fires once, press Control-C in your shell to exit dtrace and return to your shell prompt.

3. Return to your shell prompt by pressing Control-C:

    # dtrace -n BEGIN
    dtrace: description 'BEGIN' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0      1                  :BEGIN
    ^C
    #

   The output tells you that the probe named BEGIN fired once and both its name and integer ID, 1, are printed. Notice that by default, the integer name of the CPU on which this probe fired is displayed. In this example, the CPU column indicates that the dtrace command was executing on CPU 0 when the probe fired.

You can construct DTrace requests using arbitrary numbers of probes and actions. Let's create a simple request using two probes by adding the END probe to the previous example command. The END probe fires once when tracing is completed.

4. Add the END probe:

    # dtrace -n BEGIN -n END
    dtrace: description 'BEGIN' matched 1 probe
    dtrace: description 'END' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0      1                  :BEGIN
    ^C
      0      2                  :END
    #

   As you can see, pressing Control-C to exit DTrace triggers the END probe. DTrace reports this probe firing before exiting.

Listing Traceable Probes

The objective of this lab is to explore probes in more detail and to show you how to list the probes on a system.

In the preceding examples, you learned to use two simple probes named BEGIN and END. But where did these probes come from? DTrace probes come from a set of kernel modules called providers, each of which performs a particular kind of instrumentation to create probes. For example, the syscall provider provides probes in every system call and the fbt provider provides probes into every function in the kernel.

When you use DTrace, each provider is given an opportunity to publish the probes it can provide to the DTrace framework. You can then enable and bind your tracing actions to any of the probes that have been published.

1. Open a terminal window.

2. Type the following command:

    # dtrace

   The dtrace command options are printed to the output.

3. Type the dtrace command with the -l option:

    # dtrace -l | more
      ID   PROVIDER     MODULE      FUNCTION   NAME
       1     dtrace                            BEGIN
       2     dtrace                            END
       3     dtrace                            ERROR
       4   lockstat    genunix   mutex_enter   adaptive-acquire
       5   lockstat    genunix   mutex_enter   adaptive-block
       6   lockstat    genunix   mutex_enter   adaptive-spin
       7   lockstat    genunix    mutex_exit   adaptive-release
    --More--

   The probes that are available on your system are listed with the following five pieces of data:

   - ID - Internal ID of the probe listed.
   - Provider - Name of the provider. Providers are used to classify the probes. This is also the method of instrumentation.
   - Module - The name of the Unix module or application library of the probe.
   - Function - The name of the function in which the probe exists.
   - Name - The name of the probe.

4. Pipe the previous command to wc to find the total number of probes in your system:

    # dtrace -l | wc -l
    30122

   The number of probes that your system is currently aware of is listed in the output. The number will vary depending on your system type.

5. Add one of the following options to filter the list:

   - -P for provider
   - -m for module
   - -f for function
   - -n for name

   Consider the following examples:

    # dtrace -l -P lockstat
      ID   PROVIDER     MODULE      FUNCTION   NAME
       4   lockstat    genunix   mutex_enter   adaptive-acquire
       5   lockstat    genunix   mutex_enter   adaptive-block
       6   lockstat    genunix   mutex_enter   adaptive-spin
       7   lockstat    genunix    mutex_exit   adaptive-release

   Only the probes that are available in the lockstat provider are listed in the output.

    # dtrace -l -m ufs
      ID   PROVIDER     MODULE            FUNCTION   NAME
      15    sysinfo        ufs       ufs_idle_free   ufsinopage
      16    sysinfo        ufs   ufs_iget_internal   ufsiget
     356        fbt        ufs              allocg   entry

   Only the probes that are in the UFS module are listed in the output.

    # dtrace -l -f open
      ID   PROVIDER     MODULE   FUNCTION   NAME
       4    syscall                  open   entry
       5    syscall                  open   return
     116        fbt    genunix       open   entry
     117        fbt    genunix       open   return

   Only the probes with the function name open are listed.

    # dtrace -l -n start
      ID   PROVIDER     MODULE          FUNCTION   NAME
     506       proc       unix   lwp_rtt_initial   start
    2766         io    genunix    default_physio   start
    2768         io    genunix           aphysio   start
    5909         io        nfs          nfs4_bio   start

   The above command lists all the probes that have the probe name start.

Programming in D

Now that you understand a little bit about naming, enabling, and listing probes, you're ready to write the DTrace version of everyone's first program, "Hello, World."

This lab demonstrates that, in addition to constructing DTrace experiments on the command line, you can also write them in text files using the D programming language.

1. Open a terminal window.

2. In a text editor, create a new file called hello.d.

3. Type in your first D program:

    BEGIN
    {
            trace("hello, world");
            exit(0);
    }

4. Save the hello.d file.

5. Run the program by using the dtrace -s option:

    # dtrace -s hello.d
    dtrace: script 'hello.d' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0      1                  :BEGIN   hello, world
    #

As you can see, dtrace printed the same output as before followed by the text "hello, world". Unlike the previous example, you did not have to wait and press Control-C, either. These changes were the result of the actions you specified for your BEGIN probe in hello.d.

Let's explore the structure of your D program in more detail in order to understand what happened. Each D program consists of a series of clauses, each clause describing one or more probes to enable, and an optional set of actions to perform when the probe fires. The actions are listed as a series of statements enclosed in braces { } following the probe name. Each statement ends with a semicolon (;).

Your first statement uses the function trace() to indicate that DTrace should record the specified argument, the string "hello, world", when the BEGIN probe fires, and then print it out. The second statement uses the function exit() to indicate that DTrace should cease tracing and exit the dtrace command.

DTrace provides a set of useful functions like trace() and exit() for you to call in your D programs. To call a function, you specify its name followed by a parenthesized list of arguments. The complete set of D functions is described in Solaris Dynamic Tracing Guide.

By now, if you're familiar with the C programming language, you've probably realized from the name and our examples that DTrace's D programming language is very similar to C and awk(1). Indeed, D is derived from a large subset of C combined with a special set of functions and variables to help make tracing easy. If you've written a C program before, you will be able to immediately transfer most of your knowledge to building tracing programs in D. If you've never written a C program before, learning D is still very easy. But first, let's take a step back from language rules and learn more about how DTrace works, and then we'll return to learning how to build more interesting D programs.


Module 7

Debugging Applications With DTrace

Objectives

The objective of this module is to use DTrace to monitor application events.

Additional Resources

- Application Packaging Developer's Guide. Sun Microsystems, Inc., 2005.

Enabling User Mode Probes

DTrace allows you to dynamically add probes into user level functions. The user code does not need any recompilation, special flags, or even a restart. DTrace probes can be turned on just by calling the provider.

A probe description has the following syntax:

    pid:mod:function:name

- pid: format pidprocessid (for example pid5234)
- mod: name of the library or a.out (executable)
- function: name of the function
- name: entry for function entry, return for function return
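To make the four-field layout concrete, the following sketch splits a pid provider probe description into its fields. It is an illustration only and not how dtrace itself parses descriptions: ProbeDesc, parse_probe() and provider_pid() are hypothetical names, and the sketch assumes all three colons are present, with an empty field standing for "match anything" as in pid$1:::entry.

```cpp
#include <array>
#include <cstddef>
#include <string>

struct ProbeDesc {
    std::string provider;  // for example "pid5234"
    std::string module;    // library or a.out; empty matches any module
    std::string function;  // empty matches any function
    std::string name;      // "entry", "return", or empty
};

// Split "provider:module:function:name" into its four fields.
// Returns false (leaving out untouched) unless exactly three ':' are present.
bool parse_probe(const std::string& text, ProbeDesc& out) {
    std::array<std::string, 4> fields;
    std::size_t index = 0;
    for (char c : text) {
        if (c == ':') {
            if (++index >= fields.size()) return false;  // too many fields
        } else {
            fields[index] += c;
        }
    }
    if (index != 3) return false;                        // too few fields
    out = ProbeDesc{fields[0], fields[1], fields[2], fields[3]};
    return true;
}

// Extract the numeric process ID from a "pidNNNN" provider name, or -1.
long provider_pid(const ProbeDesc& d) {
    if (d.provider.size() <= 3 || d.provider.compare(0, 3, "pid") != 0) return -1;
    long pid = 0;
    for (std::size_t i = 3; i < d.provider.size(); ++i) {
        char c = d.provider[i];
        if (c < '0' || c > '9') return -1;
        pid = pid * 10 + (c - '0');
    }
    return pid;
}
```

For example, parsing pid5234:libc:malloc:entry yields the provider pid5234 (process ID 5234), the module libc, the function malloc, and the probe name entry.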

DTracing Applications

In this exercise we will learn to use DTrace on user applications. This lab builds on the use of a process ID in the probe description to trace the associated application. The steps increase in complexity to the end of the exercise, increasing the amount and depth of information about the application behavior that is output.

1. From the Application or Program menu, start the calculator.

2. Find the process ID of the process you just started:

    # pgrep gcalctool
    8198

   This number is the process ID of the calculator process. We will call it procid.

3. Follow the steps below to create a D-script that counts the number of times any function in the gcalctool is called.

   a. In a text editor, create a new file called proc_func.d.

   b. Use pid$1:::entry as the probe-description. $1 is the first argument that you will send to your script. Leave the predicate part empty.

   c. In the action section, add an aggregate to count the number of times the function is called using the aggregate statement @[probefunc]=count().

        pid$1:::entry
        {
                @[probefunc]=count();
        }

   d. Run the script that you just wrote.

        # dtrace -qs proc_func.d procid

      Replace procid with the process ID of your gcalctool.

   e. Perform a calculation on the calculator.

   f. Press Control+C in the window where you ran the D-script.

   Note – The DTrace script collects data and waits for you to stop the collection by pressing Control+C. You do not need to print the aggregation you collected; DTrace will print it for you.

4. Now, modify the script to only count functions from the libc library.

   a. Copy the proc_func.d to proc_libc.d.

   b. Modify the probe description in the proc_libc.d file to the following:

        pid$1:libc::entry

   c. Your new script should look like the following:

        pid$1:libc::entry
        {
                @[probefunc]=count();
        }

5. Now run the script.

    # dtrace -qs proc_libc.d procid

   Replace procid with the process ID of your gcalctool.

   a. Perform a calculation on the calculator.

   b. Press Control+C in the window where you ran the D-script to see the output.

6. Finally, modify the script to find how much time is spent in each function.

   a. Create a file and name it func_time.d. We will use two probe descriptions in func_time.d.

   b. Write the first probe as follows:

        pid$1:::entry

   c. Write the second probe as follows:

        pid$1:::return

   d. In the action section of the first probe, save timestamp in variable ts. Timestamp is a DTrace built-in that counts the number of nanoseconds from a point in the past.

   e. In the action section of the second probe, calculate the nanoseconds that have passed using the following aggregation:

        @[probefunc]=sum(timestamp - ts)

   f. The new func_time.d script should match the following:

        pid$1:::entry
        {
                ts = timestamp;
        }

        pid$1:::return
        /ts/
        {
                @[probefunc]=sum(timestamp - ts);
        }

7. Run the new func_time.d script:

    # dtrace -qs func_time.d procid

   Replace procid with the process ID of your gcalctool.

   a. Perform a calculation on the calculator.

   b. Press Control+C in the window where you ran the D-script to see the output.

        ^C
          gdk_xid__equal                2468
          _XSetLastRequestRead          2998
          _XDeq                         3092
        ...

   The left column shows you the name of the function and the right column shows you the amount of wall clock time that was spent in that function. The time is in nanoseconds.
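The bookkeeping that func_time.d asks DTrace to perform can be imitated in ordinary C++ to see what the sum() aggregation accumulates. This is a simulation under stated assumptions, not DTrace code: Event and sum_elapsed() are hypothetical names, the events are hand-written stand-ins for probe firings, and ts is modeled as a single global variable exactly as the script declares it.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One simulated probe firing: the function name, whether the probe is an
// entry or a return, and a nanosecond timestamp.
struct Event {
    std::string func;
    bool is_entry;
    std::uint64_t timestamp;
};

// Mimics func_time.d: every entry stores the timestamp in one global ts,
// and every return adds (timestamp - ts) to the sum keyed by function name.
std::map<std::string, std::uint64_t> sum_elapsed(const std::vector<Event>& events) {
    std::map<std::string, std::uint64_t> agg;
    std::uint64_t ts = 0;
    for (const Event& e : events) {
        if (e.is_entry) {
            ts = e.timestamp;
        } else if (ts != 0) {              // the /ts/ predicate
            agg[e.func] += e.timestamp - ts;
        }
    }
    return agg;
}
```

Running the simulation on a nested call (outer enters at 1000, inner enters at 1100 and returns at 1150, outer returns at 1300) attributes 50 to inner and 200 to outer, because the single ts was overwritten by the inner entry. Times reported by the script for functions that call other traced functions should be read with that in mind.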

Module 8

Debugging C++ Applications With DTrace

Objectives

The examples in this module demonstrate the use of DTrace to diagnose C++ application errors. These examples are also used to compare DTrace with other application debugging tools, including Sun Studio 10 software and mdb.

Using DTrace to Profile and Debug A C++ Program

A sample program CCtest was created to demonstrate an error common to C++ applications: the memory leak. In many cases, a memory leak occurs when an object is created, but never destroyed, and such is the case with the program contained in this module.

When debugging a C++ program, you may notice that your compiler converts some C++ names into mangled, semi-intelligible strings of characters and digits. This name mangling is an implementation detail required for support of C++ function overloading, to provide valid external names for C++ function names that include special characters, and to distinguish instances of the same name declared in different namespaces and classes. For example, using nm to extract the symbol table from a sample program named CCtest produces the following output:

    # /usr/ccs/bin/nm CCtest
    ...
    [61]    | 134549248|      53|FUNC |GLOB |0    |9      |__1cJTestClass2T5B6M_v_
    [85]    | 134549301|      47|FUNC |GLOB |0    |9      |__1cJTestClass2T6M_v_
    [76]    | 134549136|      37|FUNC |GLOB |0    |9      |__1cJTestClass2t5B6M_v_
    [62]    | 134549173|      71|FUNC |GLOB |0    |9      |__1cJTestClass2t5B6Mpc_v_
    [64]    | 134549136|      37|FUNC |GLOB |0    |9      |__1cJTestClass2t6M_v_
    [89]    | 134549173|      71|FUNC |GLOB |0    |9      |__1cJTestClass2t6Mpc_v_
    [80]    | 134616000|      16|OBJT |GLOB |0    |18     |__1cJTestClassG__vtbl_
    [91]    | 134549348|      16|FUNC |GLOB |0    |9      |__1cJTestClassJClassName6kM_pc_
    ...

Note – Source code and makefile for CCtest are included at the end of this module.

From this output, you may correctly assume that a number of these mangled symbols are associated with a class named TestClass, but you cannot readily determine whether these symbols are associated with constructors, destructors, or class functions.

The Sun Studio compiler includes the following three utilities that can be used to translate the mangled symbols to their C++ counterparts: nm -C, dem, and c++filt.

Note – Sun Studio 10 software is used here, but the examples were tested with both Sun Studio 9 and 10.

If your C++ application was compiled with gcc/g++, you have an additional choice for demangling your application. In addition to c++filt, which recognizes both Sun Studio and GNU mangled names, the open source gc++filt found in /usr/sfw/bin can be used to demangle the symbols contained in your g++ application.

Examples:

Sun Studio symbols without c++filt:

    # nm CCtest | grep TestClass
    [65]    | 134549280|      37|FUNC |GLOB |0    |9      |__1cJTestClass2t6M_v_
    [56]    | 134549352|      54|FUNC |GLOB |0    |9      |__1cJTestClass2t6Mi_v_
    [92]    | 134549317|      35|FUNC |GLOB |0    |9      |__1cJTestClass2t6Mpc_v_
    ...

Sun Studio symbols with c++filt:

    # nm CCtest | grep TestClass | c++filt
    [65]    | 134549280|      37|FUNC |GLOB |0    |9      |TestClass::TestClass()
    [56]    | 134549352|      54|FUNC |GLOB |0    |9      |TestClass::TestClass(int)
    [92]    | 134549317|      35|FUNC |GLOB |0    |9      |TestClass::TestClass(char*)
    ...

g++ symbols without gc++filt:

    [86]    | 134550070|      41|FUNC |GLOB |0    |12     |_ZN9TestClassC1EPc
    [110]   | 134550180|      68|FUNC |GLOB |0    |12     |_ZN9TestClassC1Ei
    [114]   | 134549984|      43|FUNC |GLOB |0    |12     |_ZN9TestClassC1Ev
    ...

g++ symbols with gc++filt:

    # nm gCCtest | grep TestClass | gc++filt
    [86]    | 134550070|      41|FUNC |GLOB |0    |12     |TestClass::TestClass(char*)
    [110]   | 134550180|      68|FUNC |GLOB |0    |12     |TestClass::TestClass(int)
    [114]   | 134549984|      43|FUNC |GLOB |0    |12     |TestClass::TestClass()
    ...

And finally, displaying symbols with nm -C:

    [64]    | 134549344|      71|FUNC |GLOB |0    |9      |TestClass::TestClass()
                                                           [__1cJTestClass2t6M_v_]
    [87]    | 134549424|      70|FUNC |GLOB |0    |9      |TestClass::TestClass(const char*)
                                                           [__1cJTestClass2t6Mpkc_v_]
    [57]    | 134549504|      95|FUNC |GLOB |0    |9      |TestClass::TestClass(int)
                                                           [__1cJTestClass2t6Mi_v_]

Let's use this information to create a DTrace script to perform an aggregation on the object calls associated with our test program. We can use the DTrace pid provider to enable probes associated with our mangled C++ symbols.

To test our constructor/destructor theory, let's start by counting the following:

- The number of objects created: calls to new()
- The number of objects destroyed: calls to delete()

Use the following script to extract the symbols corresponding to the new() and delete() functions from the CCtest program:

    # dem `nm CCtest | awk -F\| '{ print $NF; }'` | egrep "new|delete"
    __1c2k6Fpv_v_ == void operator delete(void*)
    __1c2n6FI_pv_ == void*operator new(unsigned)

The corresponding DTrace script is used to enable probes on new() and delete() (saved as CCagg.d):

    #!/usr/sbin/dtrace -s

    /* count calls to new() */
    pid$1::__1c2n6FI_pv_:entry
    {
            @n[probefunc] = count();
    }

    /* count calls to delete() */
    pid$1::__1c2k6Fpv_v_:entry
    {
            @d[probefunc] = count();
    }

    END
    {
            printa(@n);
            printa(@d);
    }

Start the CCtest program in one window, then execute the script we just created in another window as follows:

    # dtrace -s ./CCagg.d `pgrep CCtest` | c++filt

The DTrace output is piped through c++filt to demangle the C++ symbols, with the following caution.

Caution – You can't exit the DTrace script with a ^C as you would do normally because c++filt will be killed along with DTrace and you're left with no output. To display the output of this command, go to another window on your system and type:

    # pkill dtrace

Use this sequence of steps for the rest of the exercises:

Window 1:

    # ./CCtest

Window 2:

    # dtrace -s scriptname | c++filt

Window 3:

    # pkill dtrace

The output of our aggregation script in window 2 should look like this:

    void*operator new(unsigned)                 12
    void operator delete(void*)                  8

So, we may be on the right track with the theory that we are creating more objects than we are deleting. Let's check the memory addresses of our objects and attempt to match the instances of new() and delete(). The DTrace argument variables are used to display the addresses associated with our objects. Since a pointer to the object is contained in the return value of new(), we should see the same pointer value as arg0 in the call to delete().

With a slight modification to our initial script, we now have the following script, named CCaddr.d:

    #!/usr/sbin/dtrace -s

    #pragma D option quiet

    /*
    __1c2k6Fpv_v_ == void operator delete(void*)
    __1c2n6FI_pv_ == void*operator new(unsigned)
    */

    /* return from new() */
    pid$1::__1c2n6FI_pv_:return
    {
            printf("%s: %x\n", probefunc, arg1);
    }

    /* call to delete() */
    pid$1::__1c2k6Fpv_v_:entry
    {
            printf("%s: %x\n", probefunc, arg0);
    }

Execute this script:

    # dtrace -s ./CCaddr.d `pgrep CCtest` | c++filt

d in Window 2. renamed CCstack. It seems that the first new() of the repeating pattern does not have a corresponding call to delete(). arg0). probefunc.d: #!/usr/sbin/dtrace -s #pragma D option quiet /* __1c2k6Fpv_v_ == void operator delete(void*) __1c2n6FI_pv_ == void*operator new(unsigned) */ pid$1::__1c2n6FI_pv_:entry { ustack(). Here’s the modification to our previous script. } pid$1::__1c2n6FI_pv_:return { printf("%s: %x\n". } Execute CCstack. arg1). Including a call to ustack() on entry to new() provides a hint.Using DTrace to Profile and Debug A C++ Program Wait for a bit. then type this in window 3: # pkill dtrace Our output looks like a repeating pattern of three calls to new() and two calls to delete(): void*operator void*operator void*operator void operator void operator new(unsigned): new(unsigned): new(unsigned): delete(void*): delete(void*): 809e480 8068a70 809e4a0 8068a70 809e4a0 As you inspect the repeating output. } pid$1::__1c2k6Fpv_v_:entry { printf("%s: %x\n". then type pkill dtrace in Window 3 to print the following output: 66 Introduction to Operating Systems: A Hands-On Approach Using the OpenSolaris Project • March. 2006 . a pattern emerges. We still do not know what type of class is associated with the object created at address 809e480. At this point we have identified the source of the memory leak! Let’s continue with DTrace and see what else we can learn from this information. probefunc.

    # dtrace -s ./CCstack.d `pgrep CCtest` | c++filt

                  libCrun.so.1`void*operator new(unsigned)
                  CCtest`main+0x19
                  CCtest`0x8050cda
    void*operator new(unsigned): 80a2bd0

                  libCrun.so.1`void*operator new(unsigned)
                  CCtest`main+0x57
                  CCtest`0x8050cda
    void*operator new(unsigned): 8068a70

                  libCrun.so.1`void*operator new(unsigned)
                  CCtest`main+0x9a
                  CCtest`0x8050cda
    void*operator new(unsigned): 80a2bf0
    void operator delete(void*): 8068a70
    void operator delete(void*): 80a2bf0

The ustack() data tells us that new() is called from main+0x19, main+0x57, and main+0x9a. We're interested in the object associated with the first call to new(), at main+0x19.

To determine the type of constructor called at main+0x19, we can use mdb as follows:

    # gcore `pgrep CCtest`
    gcore: core.1478 dumped
    # mdb core.1478
    Loading modules: [ libc.so.1 ld.so.1 ]
    > main::dis
    main:           pushl  %ebp
    main+1:         movl   %esp,%ebp
    main+3:         subl   $0x38,%esp
    main+6:         movl   %esp,-0x2c(%ebp)
    main+9:         movl   %ebx,-0x30(%ebp)
    main+0xc:       movl   %esi,-0x34(%ebp)
    main+0xf:       movl   %edi,-0x38(%ebp)
    main+0x12:      pushl  $0x8
    main+0x14:      call   -0x2e4 <PLT=libCrun.so.1`__1c2n6FI_pv_>
    main+0x19:      addl   $0x4,%esp
    main+0x1c:      movl   %eax,-0x10(%ebp)
    main+0x1f:      movl   -0x10(%ebp),%eax
    main+0x22:      pushl  %eax
    main+0x23:      call   +0x1d5 <__1cJTestClass2t5B6M_v_>
    ...
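As an aside, the by-eye pairing of new() return values with delete() arguments in the trace output can also be done mechanically. The sketch below is a hypothetical post-processing helper, not part of DTrace or the lab: AllocEvent and unmatched_news() are invented names, and the trace is assumed to have already been reduced to (operation, address) pairs.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// One line of trace output reduced to its essentials.
struct AllocEvent {
    bool is_new;            // true for operator new, false for operator delete
    std::uint64_t address;  // return value of new() or arg0 of delete()
};

// Return the addresses that new() handed out but delete() never received.
std::vector<std::uint64_t> unmatched_news(const std::vector<AllocEvent>& trace) {
    std::set<std::uint64_t> live;
    for (const AllocEvent& e : trace) {
        if (e.is_new) {
            live.insert(e.address);
        } else {
            live.erase(e.address);
        }
    }
    return std::vector<std::uint64_t>(live.begin(), live.end());
}
```

Fed the three new() addresses and two delete() addresses from the CCstack.d output above, the helper reports 80a2bd0 as the only address without a matching delete(), which is the object allocated from main+0x19.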

Our constructor is called after the call to new, at offset main+0x23. So, we have identified a call to the constructor __1cJTestClass2t5B6M_v_ that is never destroyed. Using dem to demangle this symbol produces:

    # dem __1cJTestClass2t5B6M_v_
    __1cJTestClass2t5B6M_v_ == TestClass::TestClass #Nvariant 1()

Thus, a call to new TestClass() at main+0x19 is the cause of the memory leak. Examining the CCtest.cc source file reveals:

    ...
    t = new TestClass();
    cout << t->ClassName();

    t = new TestClass((const char *)"Hello.");
    cout << t->ClassName();

    tt = new TestClass((const char *)"Goodbye.");
    cout << tt->ClassName();

    delete(t);
    delete(tt);
    ...

It's clear that the first use of the variable t = new TestClass(); is overwritten by the second use: t = new TestClass((const char *)"Hello.");. The memory leak has been identified and a fix can be implemented.

The DTrace pid provider allows you to enable a probe at any instruction associated with a process that is being examined. This example is intended to model the DTrace approach to interactive process debugging. DTrace features used in this example include: aggregations, displaying function arguments and return values, and viewing the user call stack. The dem and c++filt commands in Sun Studio software and the gc++filt in gcc were used to extract the function probes from the program symbol table and display the DTrace output in a source-compatible format.

Source files created for this example:

EXAMPLE 8–1 TestClass.h

    class TestClass
    {
    public:
            TestClass();
            TestClass(const char *name);
            TestClass(int i);
            virtual ~TestClass();
            virtual char *ClassName() const;
    private:
            char *str;
    };

TestClass.cc:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include "TestClass.h"

    TestClass::TestClass()
    {
            str=strdup("empty.");
    }

    TestClass::TestClass(const char *name)
    {
            str=strdup(name);
    }

    TestClass::TestClass(int i)
    {
            str=(char *)malloc(128);
            sprintf(str, "Integer = %d", i);
    }

    TestClass::~TestClass()
    {
            if ( str )
                    free(str);
    }

    char *TestClass::ClassName() const
    {
            return str;
    }

EXAMPLE 8–2 CCtest.cc

    #include <iostream.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include "TestClass.h"

    int main(int argc, char **argv)
    {
            TestClass *t;
            TestClass *tt;

            while (1) {
                    t = new TestClass();
                    cout << t->ClassName();

                    t = new TestClass((const char *)"Hello.");
                    cout << t->ClassName();

                    tt = new TestClass((const char *)"Goodbye.");
                    cout << tt->ClassName();

                    delete(t);
                    delete(tt);
                    sleep(1);
            }
    }

EXAMPLE 8–3 Makefile

    OBJS=CCtest.o TestClass.o
    PROGS=CCtest
    CC=CC

    all: $(PROGS)
    	echo "Done."

    clean:
    	rm $(OBJS) $(PROGS)

    CCtest: $(OBJS)
    	$(CC) -o CCtest $(OBJS)

    .cc.o:
    	$(CC) $(CFLAGS) -c $<
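One possible fix for the leak identified in this module is to delete the first object before the pointer t is reused. The sketch below shows the idea with a simplified stand-in class rather than the original sources: Tracked, its live counter, leaky_iteration() and fixed_iteration() are hypothetical names introduced only so that the effect can be observed without DTrace.

```cpp
#include <cstdlib>
#include <cstring>

// Simplified stand-in for TestClass that also counts live objects.
class Tracked {
public:
    explicit Tracked(const char* name) : str_(dup(name)) { ++live; }
    ~Tracked() { std::free(str_); --live; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;

    const char* name() const { return str_; }

    static int live;   // objects constructed and not yet destroyed

private:
    // Copy a C string into malloc()ed storage.
    static char* dup(const char* s) {
        std::size_t n = std::strlen(s) + 1;
        char* p = static_cast<char*>(std::malloc(n));
        if (p != nullptr) std::memcpy(p, s, n);
        return p;
    }
    char* str_;
};

int Tracked::live = 0;

// One loop iteration as written in CCtest.cc: the first object is
// overwritten before it is deleted, so it is never destroyed.
void leaky_iteration() {
    Tracked* t = new Tracked("empty.");
    t = new Tracked("Hello.");
    Tracked* tt = new Tracked("Goodbye.");
    delete t;
    delete tt;
}

// The same iteration with the first object deleted before t is reused.
void fixed_iteration() {
    Tracked* t = new Tracked("empty.");
    delete t;
    t = new Tracked("Hello.");
    Tracked* tt = new Tracked("Goodbye.");
    delete t;
    delete tt;
}
```

After fixed_iteration() the live counter returns to its starting value, while each call to leaky_iteration() leaves it one higher, mirroring the three new() calls and two delete() calls per loop seen in the DTrace output.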

Module 9

Managing Memory with DTrace and MDB

Objectives

This module will build on what we've learned about using DTrace to observe processes by examining a page fault. Then, we'll incorporate low-level debugging with MDB to find the problem in the code.

Additional Resources

- Solaris Modular Debugger Guide. Sun Microsystems, Inc., 2005.

Software Memory Management

OpenSolaris memory management uses software constructs called segments to manage virtual memory of processes as well as the kernel itself. Most of the data structures involved in the software side of memory management are defined in /usr/include/vm/*.h. In this module, we'll examine the code and data structures used to handle page faults.

Using DTrace and MDB to Examine Virtual Memory

The objective of this lab is to examine a page fault using DTrace and MDB. We'll start with a DTrace script to trace the actions of a single page fault for a given process. The script prints the user virtual address that caused the fault, and then traces every function that is called from the time of the fault until the page fault handler returns. We'll use the output of the script to determine what source code needs to be examined for more detail.

Note – In this module, we've added text to the extensive code output to guide the exercise. Look for the <-- symbol to find associated text in the output.

1. Open a terminal window.

2. Create a file called pagefault.d with the following script:

    #!/usr/sbin/dtrace -s

    #pragma D option flowindent

    pagefault:entry
    /execname == $$1/
    {
            printf("fault occurred on address = %p\n", args[0]);
            self->in = 1;
    }

    pagefault:return
    /self->in == 1/
    {
            self->in = 0;
            exit(0);
    }

    entry
    /self->in == 1/
    {
    }

    return
    /self->in == 1/
    {
    }

3. Run the script on Mozilla.

   Note – You need to specify mozilla-bin as the executable name, as mozilla is not an exact match with the name. Also, assertions are turned on, so you'll see various calls to mutex_owner(), for instance, which is only used with ASSERT(). Assertions are turned on only for debug kernels.

    # ./pagefault.d mozilla-bin
    dtrace: script './pagefault.d' matched 42626 probes
    CPU FUNCTION
      0  -> pagefault
    fault occurred on address = fb985ea2
      0   | pagefault:entry  <-- i86pc/vm/vm_machdep.c or sun4/vm/vm_dep.c
      0    -> as_fault  <-- generic address space fault common/vm/vm_as.c
      0      -> as_segat  <-- search segments for segment
                          <-- containing fault address
      0        -> avl_find  <-- segments are in AVL tree
      0          -> as_segcompar  <-- common/vm/vm_as.c
      0          <- as_segcompar
      0          -> as_segcompar
      0          <- as_segcompar
      0          -> as_segcompar
      0          <- as_segcompar
      0          -> as_segcompar
      0          <- as_segcompar
      0          -> as_segcompar
      0          <- as_segcompar
      0          -> as_segcompar
      0          <- as_segcompar
      0          -> as_segcompar
      0          <- as_segcompar
      0          -> as_segcompar
      0          <- as_segcompar
      0        <- avl_find
      0      <- as_segat  <-- segment containing fault is found, (not SEGV)
      0      -> segvn_fault  <-- common/vm/seg_vn.c
      0        -> hat_probe  <-- look for page table entry for page
                             <-- i86pc/vm/hat_i86.c or sfmmu/vm/hat_sfmmu.c
      0          -> htable_getpage  <-- page tables are hashed on x86
      0            -> htable_getpte  <-- i86pc/vm/htable.c
      0              -> htable_lookup
      0              <- htable_lookup
      0              -> htable_va2entry
      0              <- htable_va2entry

read some pages (common/vm/vm_pvn.check for page already in memory -> page_lookup_create <-.common/vm/vm_page.logged ufs read -> bdev_strategy <-.get block number of page from inode -> bread_common -> getblk_common <.page_lookup -> ufs_getpage_miss <-. 2006 .x86pte_access_pagetable -> x86pte_release_pagetable <.pvn_read_kluster -> pageio_setup <-.driver sets up dma and starts page in <.c) <-.getblk_common <.setup page(s) for io common/os/bio.htable_getpage -> htable_release <.hat_kpm_pfn2va <.htable_getpte <.file operation to retrieve page(s) -> ufs_getpage <-.c <.read block device (disk) common/os/driver.direct attached disk (dad(7D)) <-.dadk_strategy <.c -> cmdkstrategy <-.c) -> page_create_va &lt-.page wasn’t in memory -> bmap_read <-.htable_release <.bmap_read -> pvn_read_kluster &lt-.file is in ufs fs (common/fs/ufs/ufs_vnops.create some pages <.hat_probe -> fop_getpage <-.c <.common disk driver (cmdk(7D)) <-.segvn_kluster <.x86pte_release_pagetable <.page_create_va -> segvn_kluster <.Software Memory Management 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -> x86pte_get <-.cmdkstrategy 76 Introduction to Operating Systems: A Hands-On Approach Using the OpenSolaris Project • March.return a page table entry -> x86pte_access_pagetable -> hat_kpm_pfn2va <.bmap_has_holes -> page_lookup <-.c -> dadk_strategy <-.page_lookup_create <-.bread_common <.create page if needed <.common/io/dktp/disk/cmdk.c) -> bmap_has_holes <-.check for sparse file <.pageio_setup -> lufs_read_strategy <-.used for ide disks (common/io/dktp/dcdev/dadk.x86pte_get <.

      0                <- bdev_strategy
      0              -> biowait                      <-- wait for pagein to complete, common/os/bio.c
      0                -> sema_p                     <-- wakeup via sema_v from completion interrupt
      0                  -> swtch                    <-- let someone else run (common/disp/disp.c)
      0                    -> disp                   <-- dispatch to next thread to run
      0                    <- disp
      0                    -> resume                 <-- actual switching occurs here, intel/ia32/ml/swtch.s or sun4/ml/swtch.s
      0                      -> savectx              <-- save old context
      0                      <- savectx
                             ...someone else is running here...
      0                      -> restorectx           <-- restore context (we've been awakened)
      0                      <- restorectx
      0                    <- resume
      0                  <- swtch
      0                <- sema_p
      0              <- biowait
      0              -> pageio_done                  <-- undo pageio_setup
      0              <- pageio_done
      0              -> pvn_plist_init
      0              <- pvn_plist_init
      0            <- ufs_getpage_miss               <-- page is in memory
      0          <- ufs_getpage
      0        <- fop_getpage
      0        -> segvn_faultpage                    <-- call hat to load pte(s) for page(s)
      0          -> hat_memload
      0            -> page_pptonum                   <-- get page frame number
      0            <- page_pptonum
      0            -> hati_mkpte                     <-- build page table entry
      0            <- hati_mkpte
      0            -> hati_pte_map                   <-- locate entry in page table
      0              -> x86_hm_enter
      0              <- x86_hm_enter
      0              -> hment_prepare
      0              <- hment_prepare
      0              -> x86pte_set                   <-- fill in pte into page table
      0                -> x86pte_access_pagetable
      0                  -> hat_kpm_pfn2va
      0                  <- hat_kpm_pfn2va
      0                <- x86pte_access_pagetable
      0                -> x86pte_release_pagetable
      0                <- x86pte_release_pagetable
      0              <- x86pte_set
      0              -> hment_assign
      0              <- hment_assign

      0              -> x86_hm_exit
      0              <- x86_hm_exit
      0            <- hati_pte_map
      0          <- hat_memload
      0        <- segvn_faultpage
      0      <- segvn_fault
      0    <- as_fault
      0  <- pagefault
    #

Remember that the above output has been shortened. At a high level, the following has happened on the page fault:

- The pagefault() routine is called to handle page faults.
- The pagefault() routine calls as_fault() to handle faults on a given address space.
- as_fault() walks an AVL tree of seg structures looking for a segment containing the faulting address. If no such segment is found, the process is sent a SIGSEGV (segmentation violation) signal.
- If the segment is found, a segment-specific fault handler is called. For most segments, this is segvn_fault().
- segvn_fault() looks for the faulting page already in memory. If the page already exists (but has been freed), it is "reclaimed" off the free list. If the page does not already exist, we need to page it in.
- Here, the page is not already in memory, so we call ufs_getpage().
- ufs_getpage() finds the block number(s) of the page(s) within the file system by calling bmap_read().
- Then we call a device driver strategy routine; see strategy(9E) for an overview of what the strategy routine is supposed to do.
- While the page is being read, the thread causing the page fault blocks (i.e., switches out) via a call to swtch(). At this point, other threads will run.
- When the paging I/O has completed, the disk driver interrupt handler wakes up the blocked mozilla-bin thread.
- The disk driver returns through the file system code out to segvn_fault().
- segvn_fault() then calls segvn_faultpage().
- segvn_faultpage() calls the HAT (Hardware Address Translation) layer to load the page table entries (PTEs) for the page.
- At this point, the virtual address that caused the page fault should now be mapped to a valid physical page. When pagefault() returns, the instruction causing the page fault will be retried and should now complete successfully.
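The segment search that as_fault() performs can be pictured with a small user-space sketch. This is not the kernel code: seg_sketch_t, seg_compare(), seg_find(), and the sample table are hypothetical names invented for illustration, and a binary search over a sorted array stands in for the kernel's AVL tree. What the sketch does share with the trace is the comparison rule: an address belongs to a segment when it is greater than or equal to the base and less than base plus size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct seg. */
typedef struct {
    uint64_t base;   /* first address covered by the segment */
    uint64_t size;   /* length of the segment in bytes */
} seg_sketch_t;

/*
 * Sample table sorted by base address. Only the middle entry
 * (base 0xfb800000, size 0x561000) comes from the lab output;
 * the other two entries are made up for the example.
 */
static const seg_sketch_t sample_segs[] = {
    { 0x08050000ULL, 0x1000ULL },
    { 0xfb800000ULL, 0x561000ULL },
    { 0xfe000000ULL, 0x2000ULL },
};

/* Return -1 if addr is below the segment, 1 if above it, 0 if inside it. */
static int
seg_compare(uint64_t addr, const seg_sketch_t *seg)
{
    if (addr < seg->base)
        return (-1);
    if (addr - seg->base >= seg->size)
        return (1);
    return (0);
}

/*
 * Binary search over segments sorted by base address. Returns NULL when
 * no segment contains addr, which corresponds to the SIGSEGV case.
 */
static const seg_sketch_t *
seg_find(const seg_sketch_t *segs, size_t nsegs, uint64_t addr)
{
    size_t lo = 0, hi = nsegs;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = seg_compare(addr, &segs[mid]);

        if (c == 0)
            return (&segs[mid]);
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return (NULL);
}
```

Calling seg_find(sample_segs, 3, 0xfb985ea2ULL) returns the middle entry, while an address that falls between segments returns NULL. Each probe discards half of the remaining candidates, which is why a sorted structure needs only a handful of comparisons even when a process has hundreds of segments.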

4  Use mdb to examine the kernel data structures and locate the page of physical memory that corresponds to the fault as follows:

   a. Open a terminal window.

   b. Find the number of segments used by mozilla by using pmap as follows:

          # pmap -x `pgrep mozilla-bin` | wc
              368    2730   23105
          #

      The output shows that there are approximately 368 segments.

      Note – The search for the segment containing the fault address found the correct segment after 8 segments. See calls to as_segcompar in the DTrace output above. Using an AVL tree shortens the search!

   c. Use mdb to locate the segment containing the fault address.

      Note – If you want to follow along, you may want to use ::log /tmp/logfile in mdb and then !vi /tmp/logfile to search. Or, you can just run mdb within an editor buffer.

          # mdb -k
          Loading modules: [ unix krtld genunix specfs dtrace ufs ip sctp usba random fctl s1394 nca lofs crypto nfs audiosup sppp cpc fcip ptm ipc ]
          > ::ps !grep mozilla-bin                        <-- find the mozilla-bin process
          R 933 919 887 885 100 0x42014000 ffffffff81d6a040 mozilla-bin
          > ffffffff81d6a040::print proc_t p_as | ::walk seg | ::print struct seg
          <-- Lots of output has been omitted. -->
          {
              s_base = 0xfb800000       <-- this is the seg we want, fault addr (fb985ea2)
              s_size = 0x561000         <-- greater/equal to base and < base+size
              s_szc = 0
              s_flags = 0
              s_as = 0xffffffff828b61d0
              s_tree = {
                  avl_child = [ 0xffffffff82fa7920, 0xffffffff82fa7c80 ]
                  avl_pcb = 0xffffffff82fa796d
              }
              s_ops = segvn_ops
              s_data = 0xffffffff82d85070
          }

          <-- and lots more output omitted -->
          > ffffffff82d85070::print segvn_data_t          <-- from s_data
          {
              lock = {
                  _opaque = [ 0 ]
              }
              segp_slock = {
                  _opaque = [ 0 ]
              }
              pageprot = 0x1
              prot = 0xd
              maxprot = 0xf
              type = 0x2
              offset = 0
              vp = 0xffffffff82f9e480   <-- points to a vnode_t
              anon_index = 0
              amp = 0                   <-- we'll look at anonymous space later
              vpage = 0xffffffff82552000
              cred = 0xffffffff81f95018
              swresv = 0
              advice = 0
              pageadvice = 0x1
              flags = 0x490
              softlockcnt = 0
              policy_info = {
                  mem_policy = 0x1
                  mem_reserved = 0
              }
          }
          > ffffffff82f9e480::print vnode_t v_path
          v_path = 0xffffffff82f71090 "/usr/sfw/lib/mozilla/components/libgklayout.so"
          > fb985ea2-fb800000=K                           <-- offset within segment
                          185ea2    <-- rounding down to page boundary gives 185000 (4 KB page size)
          > ffffffff82f9e480::walk page !wc               <-- walk list of pages on vnode_t
             1236    1236   21012   <-- 1236 pages (not all are necessarily valid)
          > ffffffff82f9e480::walk page | ::print page_t  <-- walk page list on vnode
          <-- lots of pages omitted in output -->
          {
              p_offset = 0x185000       <-- here is matching page
              p_vnode = 0xffffffff82f9e480

              p_selock = 0
              p_selockpad = 0
              p_hash = 0xfffffffffae21c00
              p_vpnext = 0xfffffffffaca9760
              p_vpprev = 0xfffffffffb3467f8
              p_next = 0xfffffffffad8f800
              p_prev = 0xfffffffffad8f800
              p_lckcnt = 0
              p_cowcnt = 0
              p_cv = {
                  _opaque = 0
              }
              p_io_cv = {
                  _opaque = 0
              }
              p_iolock_state = 0
              p_szc = 0
              p_fsdata = 0
              p_state = 0
              p_nrm = 0x2
              p_embed = 0x1
              p_index = 0
              p_toxic = 0
              p_mapping = 0xffffffff82d265f0
              p_pagenum = 0xbd62        <-- the page frame number of page
              p_share = 0
              p_sharepad = 0
              p_msresv_1 = 0
              p_mlentry = 0x185
              p_msresv_2 = 0
          }
          <-- and lots more output omitted -->
          > bd62*1000=K             <-- multiply page frame number by page size (hex)
                          bd62000   <-- here is physical address of page
          > bd62000+ea2,10/K        <-- dump 16 64-bit hex values at physical address
          0xbd62ea2:      2ccec81ec8b55     e8575653f0e48300
                          32c3815b00000000  5d89d46589003ea7
                          840ff6850c758be0  e445c7000007df
                          1216e8000000      dbe850e4458d5650
                          7d830cc483ffeeea  791840f00e4
                          c085e8458904468b  500c498b088b2474
                          8b17eb04c483d1ff  e8458de05d8bd465
                          c483ffeeeac8e850  458b0000074ce904

          > bd62000+ea2,10/ai       <-- data looks like code, let's try dumping as code
          0xbd62ea2:
          0xbd62ea2:      pushq  %rbp
          0xbd62ea3:      movl   %esp,%ebp
          0xbd62ea5:      subl   $0x2cc,%esp
          0xbd62eab:      andl   $0xfffffff0,%esp
          0xbd62eae:      pushq  %rbx
          0xbd62eaf:      pushq  %rsi
          0xbd62eb0:      pushq  %rdi
          0xbd62eb1:      call   +0x5 <0xbd62eb6>
          0xbd62eb6:      popq   %rbx
          0xbd62eb7:      addl   $0x3ea732,%ebx
          0xbd62ebd:      movl   %esp,-0x2c(%rbp)
          0xbd62ec0:      movl   %ebx,-0x20(%rbp)
          0xbd62ec3:      movl   0xc(%rbp),%esi
          0xbd62ec6:      testl  %esi,%esi
          0xbd62ec8:      je     +0x7e5 <0xbd636ad>
          0xbd62ece:      movl   $0x0,-0x1c(%rbp)
          > ffffffff81d6a040::context   <-- change context from kernel to mozilla-bin
          debugger context set to proc ffffffff81d6a040, the address of the process
          > fb985ea2,10/ai          <-- and dump from faulting virtual address
          0xfb985ea2:
          0xfb985ea2:     pushq  %rbp   <-- looks like a match
          0xfb985ea3:     movl   %esp,%ebp
          0xfb985ea5:     subl   $0x2cc,%esp
          0xfb985eab:     andl   $0xfffffff0,%esp
          0xfb985eae:     pushq  %rbx
          0xfb985eaf:     pushq  %rsi
          0xfb985eb0:     pushq  %rdi
          0xfb985eb1:     call   +0x5 <0xfb985eb6>
          0xfb985eb6:     popq   %rbx
          0xfb985eb7:     addl   $0x3ea732,%ebx
          0xfb985ebd:     movl   %esp,-0x2c(%rbp)
          0xfb985ec0:     movl   %ebx,-0x20(%rbp)
          0xfb985ec3:     movl   0xc(%rbp),%esi
          0xfb985ec6:     testl  %esi,%esi
          0xfb985ec8:     je     +0x7e5 <0xfb9866ad>
          0xfb985ece:     movl   $0x0,-0x1c(%rbp)
          > 0::context
          debugger context set to kernel
          > ffffffff81d6a040::print proc_t p_as   <-- get as for mozilla-bin

          p_as = 0xffffffff828b61d0
          > fb985ea2::vtop -a ffffffff828b61d0    <-- check our work
          virtual fb985ea2 mapped to physical bd62ea2   <-- physical address matches

Once the segment is found, we print the segvn_data structure. In this segment, a vnode_t maps the segment data. The vnode_t contains a list of pages that "belong to" the vnode_t. We locate the page corresponding to the offset within the segment. Once the page_t is located, we have the page frame number. We then convert the page frame number to a physical address and examine some of the data at the address. It turns out this data is code. We then check the physical address by using the vtop (virtual-to-physical) mdb command.
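The address arithmetic carried out by hand in the mdb session can be restated as a short C sketch. The function and macro names here (seg_offset, page_align, pfn_to_paddr, PAGE_SHIFT_SKETCH) are hypothetical and exist only for this illustration; the sketch assumes the 4 KB page size used in the session and a segment whose segvn offset is 0, as the segvn_data_t output shows.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH  12                          /* 4 KB pages, as in the mdb session */
#define PAGE_SIZE_SKETCH   (1ULL << PAGE_SHIFT_SKETCH)
#define PAGE_MASK_SKETCH   (PAGE_SIZE_SKETCH - 1)

/* Offset of a virtual address within its segment (vaddr - s_base). */
static uint64_t
seg_offset(uint64_t vaddr, uint64_t seg_base)
{
    return (vaddr - seg_base);
}

/* Round an offset down to the start of its page (matches p_offset). */
static uint64_t
page_align(uint64_t offset)
{
    return (offset & ~PAGE_MASK_SKETCH);
}

/*
 * Combine a page frame number (p_pagenum) with the byte offset of the
 * virtual address inside its page to form the physical address.
 */
static uint64_t
pfn_to_paddr(uint64_t pfn, uint64_t vaddr)
{
    return ((pfn << PAGE_SHIFT_SKETCH) | (vaddr & PAGE_MASK_SKETCH));
}
```

With the values from the session, seg_offset(0xfb985ea2, 0xfb800000) gives 0x185ea2, page_align() of that gives 0x185000, and pfn_to_paddr(0xbd62, 0xfb985ea2) gives 0xbd62ea2, the same physical address that ::vtop reports.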


MODULE 10

Observing Processes in Zones With DTrace

Objectives

The objective of this module is to build on knowledge of DTrace to observe processes that run inside a zone.

Additional Resources

- System Administration Guide: Solaris Containers-Resource Management and Solaris Zones

Global and Non-Global Zones

Now that we have some knowledge of debugging applications, let's work on debugging applications that run in zones.

Every OpenSolaris system contains a global zone. The global zone has a dual function. The global zone is both the default zone for the system and the zone used for system-wide administrative control.

The global administrator uses the zonecfg command to configure a zone by specifying various parameters for the zone's virtual platform and application environment. The zone is then installed by the global administrator, who uses the zone administration command zoneadm to install software at the package level into the file system hierarchy established for the zone. The zoneadm command is then used to boot the zone. The global administrator can log in to the installed zone by using the zlogin command. At first login, the internal configuration for the zone is completed.

There are two types of non-global zone root file system models: sparse and whole root. The sparse root zone model optimizes the sharing of objects. The whole root zone model provides the maximum file system configurability.

The scheduling class for a non-global zone is set to the scheduling class for the system. You can also set the scheduling class for a zone through the dynamic resource pools facility. If the zone is associated with a pool that has its pool.scheduler property set to a valid scheduling class, then processes running in the zone run in that scheduling class by default.

By default, all zones including the global zone have one (1) fair share scheduler share assigned to them. The percentage of the CPU that a zone is entitled to is the ratio of its shares to the total number of shares for all zones bound to a particular resource pool. Multiple zones can share a resource pool or, in order to meet service guarantees, a single zone can be bound to a specific pool.
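The share ratio described in this section is simple arithmetic, and a small sketch may make it concrete. The function zone_cpu_percent() and the sample share table are hypothetical, invented for this example; the sketch only computes the nominal entitlement of one zone relative to all zones bound to the same pool, and it says nothing about how the scheduler behaves when some zones are idle.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical share assignments for three zones bound to one pool:
 * two zones keep the default of one share, one zone has two shares.
 */
static const unsigned sample_shares[] = { 1, 1, 2 };

/*
 * Nominal CPU entitlement, in percent, of a zone holding zone_shares
 * shares when the zones bound to its pool hold pool_shares[0..nzones-1].
 * Returns 0.0 if the pool has no shares at all.
 */
static double
zone_cpu_percent(unsigned zone_shares, const unsigned *pool_shares,
    size_t nzones)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < nzones; i++)
        total += pool_shares[i];
    if (total == 0)
        return (0.0);
    return (100.0 * (double)zone_shares / (double)total);
}
```

In this example the zone with two shares is entitled to 50 percent of the pool's CPU and each one-share zone to 25 percent. With the default of one share per zone, every zone in a pool receives an equal entitlement.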

DTracing a Process Running in a Zone

This lab will focus on observing processes running in a zone. DTrace may be used from the global zone and supports a zonename variable and the pr_zoneid field in psinfo_t for use with the proc provider. From the global zone, process tools like prstat(1M), ps(1) and truss(1) can be used to observe processes in other zones.

1  Open a terminal window.

2  Log into the global zone:

       % zlogin
       password:
       #

3  Count the number of I/O operations per zone:

       # dtrace -n 'io:::start { @[zonename] = count(); }'

MODULE 11

Configuring Filesystems With ZFS

Objectives

The objective of this lesson is to provide an introduction to ZFS by showing you how to create a simple ZFS pool with a mirrored filesystem.

Additional Resources

ZFS Administration Guide and man pages: http://opensolaris.org/os/community/zfs/docs/

Creating Pools With Mounted Filesystems

Each storage pool consists of one or more virtual devices, which describe the layout of physical storage and its fault characteristics.

In this module, we'll start by learning about mirrored storage pool configuration. Then we'll show you how to configure RAID-Z.

In traditional storage configurations which use partitions or volumes, the storage is fragmented across disks. ZFS uses pooled storage to eliminate the management problems associated with volumes and to enable all storage to be shared. The value of shared storage is the ability to repair damaged data.

The most basic building block for a storage pool is a piece of physical storage. This can be any block device of at least 128 Mbytes in size. Typically, this is a hard drive that is visible to the system in the /dev/dsk directory. A storage device can be a whole disk (c0t0d0) or an individual slice (c0t0d0s7). The recommended mode of operation is to use an entire disk, in which case the disk does not need to be specially formatted. ZFS formats the disk using an EFI label to contain a single, large slice.

Creating Mirrored Storage Pools

The objective of this lab exercise is to create and list a mirrored storage pool using the zpool command.

ZFS is easy, so let's get on with it! It's time to create your first pool:

1  Open a terminal window.

2  Create a single-disk storage pool named tank:

       # zpool create tank c1t2d0

   You now have a single-disk storage pool named tank, with a single filesystem mounted at /tank.

3  Validate that the pool was created:

       # zpool list
       NAME   SIZE    USED    AVAIL   CAP   HEALTH   ALTROOT
       tank   80.0G   22.3G   47.7G   28%   ONLINE   -

4  Create a mirror of tank by attaching a second disk to the existing one:

       # zpool attach tank c1t2d0 c2t2d0

   The storage pool is mirrored on c2t2d0.

Creating a Filesystem and /home Directories

The objective of this lab exercise is to learn how to set up a filesystem with several /home directories. In this lab, we'll use the zfs command to create a filesystem and set its mountpoint.

1  Open a terminal window.

2  Create the /var/mail filesystem:

       # zfs create tank/mail

3  Set the mount point for the /var/mail filesystem:

       # zfs set mountpoint=/var/mail tank/mail

4  Create the home directory:

       # zfs create tank/home

5  Then, set the mount point for the home directory:

       # zfs set mountpoint=/export/home tank/home

6  Finally, create home directories for all of your developers:

       # zfs create tank/home/developer1
       # zfs create tank/home/developer2
       # zfs create tank/home/developer3
       # zfs create tank/home/developer4

   The mountpoint property is inherited as a pathname prefix. That is, tank/home/developer1 is automatically mounted at /export/home/developer1 because tank/home is mounted at /export/home.
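The "inherited as a pathname prefix" rule can be illustrated with a few lines of string handling. This is only a sketch of the naming rule as this lab describes it, not how ZFS itself is implemented; inherited_mountpoint() is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Derive the mount point a child dataset inherits from its parent:
 * the parent's mount point followed by the part of the child's name
 * that comes after the parent's name.
 *
 * Returns 0 on success, or -1 if child_ds is not a descendant of
 * parent_ds or if the result does not fit in out.
 */
static int
inherited_mountpoint(const char *parent_ds, const char *parent_mnt,
    const char *child_ds, char *out, size_t outlen)
{
    size_t plen = strlen(parent_ds);
    int n;

    /* The child name must start with "<parent>/". */
    if (strncmp(child_ds, parent_ds, plen) != 0 || child_ds[plen] != '/')
        return (-1);

    n = snprintf(out, outlen, "%s%s", parent_mnt, child_ds + plen);
    if (n < 0 || (size_t)n >= outlen)
        return (-1);
    return (0);
}
```

Given the parent tank/home mounted at /export/home, the child tank/home/developer1 yields /export/home/developer1, matching the behavior in step 6. A dataset outside the parent, such as tank/mail, is rejected because it does not inherit from tank/home.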

Configuring RAID-Z

The objective of this lab exercise is to introduce you to the RAID-Z configuration.

You might want to configure RAID-Z instead of mirrored pools for greater redundancy. You need at least two disks for a RAID-Z configuration. Other than that, no special hardware is required to create a RAID-Z configuration. Creating a RAID-Z pool is identical to creating a mirrored pool, except that the raidz keyword is used instead of mirror.

1  Open a terminal window.

2  Create a pool with a single RAID-Z device consisting of 5 disk slices:

       # zpool create tank raidz c0t0d0s0 c0t0d1s0 c0t0d2s0 c0t0d3s0 c0t0d4s0

   In the above example, the disk must have been pre-formatted to have an appropriately sized slice zero.

   Disks can be specified using their full path. /dev/dsk/c0t0d4s0 is identical to c0t0d4s0 by itself.

   Note that there is no requirement to use disk slices in a RAID-Z configuration. The above command is just an example of using disk slices in a storage pool.

MODULE 12

Writing a Template Character Device Driver

Objectives

This module shows you how to develop a very simple, working driver. This module explains how to write the driver and configuration file, compile the driver, load the driver, and test the driver.

The driver that is shown in this module is a pseudo device driver that merely writes a message to a system log every time an entry point is entered. This driver demonstrates the minimum functionality that any character driver must implement. You can use this driver as a template for building a complex driver.

Additional Resources

- Writing Device Drivers, Sun Microsystems, Inc., 2005.
- Solaris Modular Debugger Guide, Sun Microsystems, Inc., 2005.

Overview of the Template Driver Example

This example guides you through the following steps:

1. Create a directory where you can develop your driver and open a new text file named dummy.c.
2. Write the entry points for loadable module configuration: _init(9E), _info(9E), and _fini(9E).
3. Write the entry points for autoconfiguration: attach(9E), detach(9E), getinfo(9E), and prop_op(9E).
4. Write the entry points for user context: open(9E), close(9E), read(9E), and write(9E).
5. Define the data structures: the character and block operations structure cb_ops(9S), the device operations structure dev_ops(9S), and the module linkage structures modldrv(9S) and modlinkage(9S).
6. Create the driver configuration file dummy.conf.
7. Build and install the driver.
8. Test the driver by loading the driver, reading from and writing to the device node, and unloading the driver.

Writing the Template Driver

This section describes the entry points and data structures that are included in this driver and shows you how to define them. All of these data structures and almost all of these entry points are required for any character device driver. This section describes the following entry points and data structures:

- Loadable module configuration entry points
- Autoconfiguration entry points
- User context entry points
- Character and block operations structure
- Device operations structure
- Module linkage structures

First, create a directory where you can develop your driver. This driver is named dummy because this driver does not do any real work. Next, open a new text file named dummy.c.

Writing the Loadable Module Configuration Entry Points

Every kernel module of any type must define at least the following three loadable module configuration entry points:

- The _init(9E) routine initializes a loadable module. The _init(9E) routine must at least call the mod_install(9F) function and return the success or failure value that is returned by mod_install(9F).
- The _info(9E) routine returns information about a loadable module. The _info(9E) routine must at least call the mod_info(9F) function and return the value that is returned by mod_info(9F).
- The _fini(9E) routine prepares a loadable module for unloading. The _fini(9E) routine must at least call the mod_remove(9F) function and return the success or failure value that is returned by mod_remove(9F). When mod_remove(9F) is successful, the _fini(9E) routine must undo everything that the _init(9E) routine did.

The mod_install(9F), mod_info(9F), and mod_remove(9F) functions are used in exactly the same way in every driver, regardless of the functionality of the driver. You do not need to investigate what the values of the arguments of these functions should be. You can copy these function calls from this example and paste them into every driver you write.

In this section, the following code is added to the dummy.c source file:

    /* Loadable module configuration entry points */
    int
    _init(void)
    {
        cmn_err(CE_NOTE, "Inside _init");
        return(mod_install(&ml));
    }

    int
    _info(struct modinfo *modinfop)
    {
        cmn_err(CE_NOTE, "Inside _info");
        return(mod_info(&ml, modinfop));
    }

    int
    _fini(void)
    {
        cmn_err(CE_NOTE, "Inside _fini");
        return(mod_remove(&ml));
    }

Declaring the Loadable Module Configuration Entry Points

The _init(9E), _info(9E), and _fini(9E) routine names are not unique to any particular kernel module. You customize the behavior of these routines when you define them in your module, but the names of these routines are not unique. These three routines are declared in the modctl.h header file. You need to include the modctl.h header file in your dummy.c file. Do not declare these three routines in dummy.c.

Defining the Module Initialization Entry Point

The _init(9E) routine returns type int and takes no arguments. The _init(9E) routine must call the mod_install(9F) function and return the success or failure value that is returned by mod_install(9F).

The mod_install(9F) function takes an argument that is a modlinkage(9S) structure.

This driver is supposed to write a message each time an entry point is entered. Use the cmn_err(9F) function to write a message to a system log. The cmn_err(9F) function usually is used to report an error condition. The cmn_err(9F) function also is useful for debugging in the same way that you might use print statements in a user program.

The cmn_err(9F) function requires you to include the cmn_err.h header file, the ddi.h header file, and the sunddi.h header file. The cmn_err(9F) function takes two arguments. The first argument is a constant that indicates the severity of the error message. The message written by this driver is not an error message but is simply a test message. Use CE_NOTE for the value of this severity constant. The second argument the cmn_err(9F) function takes is a string message.

The following code is the _init(9E) routine that you should enter into your dummy.c file. The ml structure is the modlinkage(9S) structure.

    int
    _init(void)
    {
        cmn_err(CE_NOTE, "Inside _init");
        return(mod_install(&ml));
    }

Defining the Module Information Entry Point

The _info(9E) routine returns type int and takes an argument that is a pointer to an opaque modinfo structure. The _info(9E) routine must return the value that is returned by the mod_info(9F) function.

The mod_info(9F) function takes two arguments. The first argument to mod_info(9F) is a modlinkage(9S) structure. The second argument to mod_info(9F) is the same modinfo structure pointer that is the argument to the _info(9E) routine. The mod_info(9F) function returns the module information or returns zero if an error occurs.

Use the cmn_err(9F) function to write a message to the system log in the same way that you used the cmn_err(9F) function in your _init(9E) entry point.

The following code is the _info(9E) routine that you should enter into your dummy.c file. The modinfop argument is a pointer to an opaque structure that the system uses to pass module information.

    int
    _info(struct modinfo *modinfop)
    {
        cmn_err(CE_NOTE, "Inside _info");
        return(mod_info(&ml, modinfop));
    }

Defining the Module Unload Entry Point

The _fini(9E) routine returns type int and takes no arguments. The _fini(9E) routine must call the mod_remove(9F) function and return the success or failure value that is returned by mod_remove(9F).

When mod_remove(9F) is successful, the _fini(9E) routine must undo everything that the _init(9E) routine did. The _fini(9E) routine must call mod_remove(9F) because the _init(9E) routine called mod_install(9F). The _fini(9E) routine must deallocate anything that was allocated, close anything that was opened, and destroy anything that was created in the _init(9E) routine.

The _fini(9E) routine can be called at any time when a module is loaded. In normal operation, the _fini(9E) routine often fails. This behavior is normal because the kernel allows the module to determine whether the module can be unloaded. If mod_remove(9F) is successful, the module determines that devices were detached, and the module can be unloaded. If mod_remove(9F) fails, the module determines that devices were not detached, and the module cannot be unloaded.

The following actions take place when mod_remove(9F) is called:

- The kernel checks whether this driver is busy. This driver is busy if one of the following conditions is true:
    - A device node that is managed by this driver is open.
    - Another module that depends on this driver is open. A module depends on this driver if the module was linked using the -N option with this driver named as the argument to that -N option. See the ld(1) man page for more information.
- If the driver is busy, then mod_remove(9F) fails and _fini(9E) fails.
- If the driver is not busy, then the kernel calls the detach(9E) entry point of the driver.
    - If detach(9E) fails, then mod_remove(9F) fails and _fini(9E) fails.
    - If detach(9E) succeeds, then mod_remove(9F) succeeds, and _fini(9E) continues its cleanup work.

The mod_remove(9F) function takes an argument that is a modlinkage(9S) structure.

Use the cmn_err(9F) function to write a message to the system log in the same way that you used the cmn_err(9F) function in your _init(9E) entry point.

The following code is the _fini(9E) routine that you should enter into your dummy.c file.

    int
    _fini(void)
    {
        cmn_err(CE_NOTE, "Inside _fini");
        return(mod_remove(&ml));
    }
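The sequence of checks that this section describes for mod_remove(9F) can be summarized as a small decision function. The following is a user-space model for study only, not kernel code: model_mod_remove(), unload_result_t, and the three boolean inputs are hypothetical names that stand in for state the kernel tracks internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for the model; not kernel return values. */
typedef enum {
    UNLOAD_OK = 0,          /* mod_remove would succeed */
    UNLOAD_BUSY,            /* driver is busy, unload refused */
    UNLOAD_DETACH_FAILED    /* detach(9E) refused, unload refused */
} unload_result_t;

/*
 * Model of the unload decision described in the text:
 *   node_open        a device node managed by the driver is open
 *   dependent_open   a module that depends on the driver is open
 *   detach_succeeds  what the driver's detach(9E) routine would report
 */
static unload_result_t
model_mod_remove(bool node_open, bool dependent_open, bool detach_succeeds)
{
    /* Step 1: a busy driver is never detached. */
    if (node_open || dependent_open)
        return (UNLOAD_BUSY);

    /* Step 2: the driver is idle, so its detach routine is consulted. */
    if (!detach_succeeds)
        return (UNLOAD_DETACH_FAILED);

    /* Step 3: detach succeeded, so the module can be unloaded. */
    return (UNLOAD_OK);
}
```

Only the combination of an idle driver and a successful detach leads to UNLOAD_OK, which corresponds to the one case in which _fini(9E) goes on to undo the work of _init(9E). In the model, as in the description above, the detach routine is not consulted at all while the driver is busy.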

Including Loadable Module Configuration Header Files

The _init(9E), _info(9E), _fini(9E), and mod_install(9F) functions require you to include the modctl.h header file. The cmn_err(9F) function requires you to include the cmn_err.h header file, the ddi.h header file, and the sunddi.h header file.

The following header files are required by the three loadable module configuration routines that you have written in this section. Include this code near the top of your dummy.c file.

    #include <sys/modctl.h>   /* used by _init, _info, _fini */
    #include <sys/cmn_err.h>  /* used by all entry points for this driver */
    #include <sys/ddi.h>      /* used by all entry points for this driver */
    #include <sys/sunddi.h>   /* used by all entry points for this driver */

Writing the Autoconfiguration Entry Points

Every character driver must define at least the following autoconfiguration entry points. The kernel calls these routines when the device driver is loaded.

- The attach(9E) routine must call ddi_create_minor_node(9F). The ddi_create_minor_node(9F) function provides the information the system needs to create the device files.
- The detach(9E) routine must call ddi_remove_minor_node(9F) to deallocate everything that was allocated by ddi_create_minor_node(9F). The detach(9E) routine must undo everything that the attach(9E) routine did.
- The getinfo(9E) routine returns requested device driver information through one of its arguments.
- The prop_op(9E) routine returns requested device driver property information through a pointer. You can call the ddi_prop_op(9F) function instead of writing your own prop_op(9E) entry point. Use the prop_op(9E) entry point to customize the behavior of the ddi_prop_op(9F) function.

In this section, the following code is added:

    /* Device autoconfiguration entry points */
    static int
    dummy_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
    {
        cmn_err(CE_NOTE, "Inside dummy_attach");
        switch(cmd) {
        case DDI_ATTACH:
            dummy_dip = dip;
            if (ddi_create_minor_node(dip, "0", S_IFCHR,
                ddi_get_instance(dip), DDI_PSEUDO, 0)
                != DDI_SUCCESS) {
                cmn_err(CE_NOTE,
                    "%s%d: attach: could not add character node.",
                    "dummy", 0);
                return(DDI_FAILURE);
            } else
                return DDI_SUCCESS;
        default:
            return DDI_FAILURE;
        }
    }

    static int
    dummy_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
    {
        cmn_err(CE_NOTE, "Inside dummy_detach");
        switch(cmd) {
        case DDI_DETACH:
            dummy_dip = 0;
            ddi_remove_minor_node(dip, NULL);
            return DDI_SUCCESS;
        default:
            return DDI_FAILURE;
        }
    }

    static int
    dummy_getinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg,
        void **resultp)
    {
        cmn_err(CE_NOTE, "Inside dummy_getinfo");
        switch(cmd) {
        case DDI_INFO_DEVT2DEVINFO:
            *resultp = dummy_dip;
            return DDI_SUCCESS;
        case DDI_INFO_DEVT2INSTANCE:
            *resultp = 0;
            return DDI_SUCCESS;
        default:
            return DDI_FAILURE;
        }
    }

    static int
    dummy_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
        int flags, char *name, caddr_t valuep, int *lengthp)
    {
        cmn_err(CE_NOTE, "Inside dummy_prop_op");
        return(ddi_prop_op(dev, dip, prop_op, flags, name, valuep, lengthp));
    }

Declaring the Autoconfiguration Entry Points

The attach(9E), detach(9E), getinfo(9E), and prop_op(9E) entry point routines need to be uniquely named for this driver. Choose a prefix to use with each entry point routine.

Note – By convention, the prefix used for function and data names that are unique to this driver is either the name of this driver or an abbreviation of the name of this driver. Use the same prefix throughout the driver. This practice makes debugging much easier.

In the example shown in this module, dummy_ is used for the prefix to each function and data name that is unique to this example.

The following declarations are the autoconfiguration entry point declarations you should have in your dummy.c file. Note that each of these functions is declared static.

    static int dummy_attach(dev_info_t *dip, ddi_attach_cmd_t cmd);
    static int dummy_detach(dev_info_t *dip, ddi_detach_cmd_t cmd);
    static int dummy_getinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg,
        void **resultp);
    static int dummy_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
        int flags, char *name, caddr_t valuep, int *lengthp);

Defining the Device Attach Entry Point

The attach(9E) routine returns type int. The attach(9E) routine must return either DDI_SUCCESS or DDI_FAILURE. These two constants are defined in sunddi.h. All of the autoconfiguration entry point routines except for prop_op(9E) return either DDI_SUCCESS or DDI_FAILURE.

The attach(9E) routine takes two arguments. The first argument is a pointer to the dev_info structure for this driver. All of the autoconfiguration entry point routines take a dev_info argument. The second argument is a constant that specifies the attach type. The value that is passed through this second argument is either DDI_ATTACH or DDI_RESUME. Every attach(9E) routine must define behavior for at least DDI_ATTACH.

The DDI_ATTACH code must initialize a device instance. In a realistic driver, you define and manage multiple instances of the driver by using a state structure and the ddi_soft_state(9F) functions. Each instance of the driver has its own copy of the state structure.

DDI_PSEUDO. Each device instance file is pointed to by a separate device instance pointer. } } First.". This driver still must declare a device instance pointer and initialize the pointer value in the attach(9E) routine. first assign the device instance pointer from the dummy_attach() argument to the dummy_dip variable that you declared above. ddi_attach_cmd_t cmd) { cmn_err(CE_NOTE. "dummy". /* keep track of one instance */ The following code is the dummy_attach() routine that you should enter into your dummy. Enter the following code near the beginning of dummy.c to declare a device instance pointer for this driver: dev_info_t *dummy_dip. switch(cmd) { case DDI_ATTACH: dummy_dip = dip. the device instance pointer is used by the ddi_get_instance(9F) function to return the instance number. this driver does not use a state structure. In this dummy_attach() routine. The device instance pointer and the instance number both are used by ddi_create_minor_node(9F) to create a new device node. Then provide DDI_ATTACH behavior. static int dummy_attach(dev_info_t *dip. You need to save this pointer value in the global variable so that you can use this pointer to get information about this instance from dummy_getinfo() and detach this instance in dummy_detach(). Each instance of the device driver is represented by a separate device file in /devices. if (ddi_create_minor_node(dip. as you did in your _init(9E) entry point. } else return DDI_SUCCESS. "%s%d: attach: could not add character node. Because this driver allows only one instance. S_IFCHR. One of the pieces of data that is specific to each instance is the device instance pointer. ddi_get_instance(dip). "Inside dummy_attach"). Module 12 • Writing a Template Character Device Driver 105 .0) != DDI_SUCCESS) { cmn_err(CE_NOTE.c file. This dummy driver allows only one instance. return(DDI_FAILURE). use cmn_err(9F) to write a message to the system log. Within the DDI_ATTACH code. 
default: return DDI_FAILURE.Writing the Template Driver structure that holds data specific to that instance. "0". 0).
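The way dummy_attach() saves the device instance pointer, and the way dummy_detach() is expected to clear it again, can be modeled outside the kernel. The following is a minimal user-space sketch and not driver code. The dev_info_t type, the command enumerations, and the DDI_SUCCESS and DDI_FAILURE values are stand-ins defined locally as assumptions (the real definitions come from the DDI headers), and every model_ name is hypothetical.

```c
#include <stddef.h>

/*
 * Stand-in definitions so that this sketch builds outside the kernel.
 * These are assumptions for illustration; a driver gets the real ones
 * from the DDI header files.
 */
typedef struct dev_info { int instance; } dev_info_t;
typedef enum { DDI_ATTACH, DDI_RESUME } ddi_attach_cmd_t;
typedef enum { DDI_DETACH, DDI_SUSPEND } ddi_detach_cmd_t;
#define DDI_SUCCESS 0
#define DDI_FAILURE (-1)

/* Plays the role of dummy_dip: remembers the single attached instance. */
static dev_info_t *model_dip;

/* Only DDI_ATTACH is handled; any other command is rejected. */
static int
model_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    switch (cmd) {
    case DDI_ATTACH:
        model_dip = dip;        /* save the pointer for later use */
        return DDI_SUCCESS;
    default:
        return DDI_FAILURE;
    }
}

/* Only DDI_DETACH is handled; it undoes what model_attach() did. */
static int
model_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
    (void)dip;                  /* unused in this one-instance model */
    switch (cmd) {
    case DDI_DETACH:
        model_dip = NULL;       /* forget the saved pointer */
        return DDI_SUCCESS;
    default:
        return DDI_FAILURE;
    }
}
```

The model leaves out the minor node creation and removal, which only make sense inside the kernel. It keeps the part that matters for reasoning about state: an unsupported command must leave the saved pointer untouched.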

A realistic driver probably would use the ddi_soft_state(9F) functions to create and manage a device node. This dummy driver uses the ddi_create_minor_node(9F) function to create a device node. The ddi_create_minor_node(9F) function takes six arguments. The first argument to the ddi_create_minor_node(9F) function is the device instance pointer that points to the dev_info structure of this device. The second argument is the name of this minor node. The third argument is S_IFCHR if this device is a character minor device or is S_IFBLK if this device is a block minor device. This dummy driver is a character driver.

The fourth argument to the ddi_create_minor_node(9F) function is the minor number of this minor device. This number is also called the instance number. The ddi_get_instance(9F) function returns this instance number. The fifth argument to the ddi_create_minor_node(9F) function is the node type. The ddi_create_minor_node(9F) man page lists the possible node types. The DDI_PSEUDO node type is for pseudo devices. The sixth argument to the ddi_create_minor_node(9F) function specifies whether this is a clone device. This is not a clone device, so set this argument value to 0.

If the ddi_create_minor_node(9F) call is not successful, write a message to the system log and return DDI_FAILURE. If the ddi_create_minor_node(9F) call is successful, return DDI_SUCCESS. If this dummy_attach() routine receives any cmd other than DDI_ATTACH, return DDI_FAILURE.

Defining the Device Detach Entry Point

The detach(9E) routine takes two arguments. The first argument is a pointer to the dev_info structure for this driver. The second argument is a constant that specifies the detach type. The value that is passed through this second argument is either DDI_DETACH or DDI_SUSPEND. Every detach(9E) routine must define behavior for at least DDI_DETACH.

The DDI_DETACH code must undo everything that the DDI_ATTACH code did. In the DDI_ATTACH code in your attach(9E) routine, you saved the address of a new dev_info structure and you called the ddi_create_minor_node(9F) function to create a new node. In the DDI_DETACH code in this detach(9E) routine, you need to reset the variable that pointed to the dev_info structure for this node. You also need to call the ddi_remove_minor_node(9F) function to remove this node. The detach(9E) routine must deallocate anything that was allocated, close anything that was opened, and destroy anything that was created in the attach(9E) routine.

The following code is the dummy_detach() routine that you should enter into your dummy.c file.

    static int
    dummy_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
    {
        cmn_err(CE_NOTE, "Inside dummy_detach");
        switch(cmd) {
        case DDI_DETACH:
            dummy_dip = 0;
            ddi_remove_minor_node(dip, NULL);
            return DDI_SUCCESS;
        default:
            return DDI_FAILURE;
        }
    }

First, use cmn_err(9F) to write a message to the system log, as you did in your _init(9E) entry point. Then provide DDI_DETACH behavior. Within the DDI_DETACH code, first reset the dummy_dip variable that you set in dummy_attach() above. You cannot reset this device instance pointer unless you remove all instances of the device. This dummy driver supports only one instance.

Next, call the ddi_remove_minor_node(9F) function to remove this device node. The ddi_remove_minor_node(9F) function takes two arguments. The first argument is the device instance pointer that points to the dev_info structure of this device. The second argument is the name of the minor node you want to remove. If the value of the minor node argument is NULL, then ddi_remove_minor_node(9F) removes all instances of this device. Because the DDI_DETACH code of this driver always removes all instances, this dummy driver supports only one instance.

If the value of the cmd argument to this dummy_detach() routine is DDI_DETACH, remove all instances of this device and return DDI_SUCCESS. If this dummy_detach() routine receives any cmd other than DDI_DETACH, return DDI_FAILURE.

Defining the Get Driver Information Entry Point

The getinfo(9E) routine takes a pointer to a device number and returns a pointer to a device information structure or returns a device instance number. The return value of the getinfo(9E) routine is DDI_SUCCESS or DDI_FAILURE. The pointer or instance number requested from the getinfo(9E) routine is returned through a pointer argument.

The getinfo(9E) routine takes four arguments. The first argument is a pointer to the dev_info structure for this driver. This dev_info structure argument is obsolete and is no longer used by the getinfo(9E) routine.

The second argument to the getinfo(9E) routine is a constant that specifies what information the getinfo(9E) routine must return. The value of this second argument is either DDI_INFO_DEVT2DEVINFO or DDI_INFO_DEVT2INSTANCE. The third argument to the getinfo(9E) routine is a pointer to a device number. The fourth argument is a pointer to the place where the getinfo(9E) routine must store the requested information. The information stored at this location depends on the value you passed in the second argument to the getinfo(9E) routine.

The following table describes the relationship between the second and fourth arguments to the getinfo(9E) routine.

TABLE 12–1 Get Driver Information Entry Point Arguments

    cmd                       arg             resultp
    DDI_INFO_DEVT2DEVINFO     Device number   Device information structure pointer
    DDI_INFO_DEVT2INSTANCE    Device number   Device instance number

The following code is the dummy_getinfo() routine that you should enter into your dummy.c file.

    static int
    dummy_getinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg,
        void **resultp)
    {
        cmn_err(CE_NOTE, "Inside dummy_getinfo");
        switch(cmd) {
        case DDI_INFO_DEVT2DEVINFO:
            *resultp = dummy_dip;
            return DDI_SUCCESS;
        case DDI_INFO_DEVT2INSTANCE:
            *resultp = 0;
            return DDI_SUCCESS;
        default:
            return DDI_FAILURE;
        }
    }

First, use cmn_err(9F) to write a message to the system log, as you did in your _init(9E) entry point. Then provide DDI_INFO_DEVT2DEVINFO behavior. A realistic driver would use arg to get the instance number of this device node. A realistic driver would then call the ddi_get_soft_state(9F) function and return the device information structure pointer from that state structure. This dummy driver supports only one instance and does not use a state structure. In the DDI_INFO_DEVT2DEVINFO code of this dummy_getinfo() routine, simply return the one device information structure pointer that the dummy_attach() routine saved.

Next, provide DDI_INFO_DEVT2INSTANCE behavior. Within the DDI_INFO_DEVT2INSTANCE code, simply return 0. This dummy driver supports only one instance. The instance number of that one instance is 0.
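The relationship between the cmd argument and what is stored through resultp can be tried out in a user-space model. The sketch below is only an illustration under stated assumptions: the dev_info_t type, the ddi_info_cmd_t enumeration, and the return values are local stand-ins rather than the real DDI definitions, and the model_ and the_node names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins (assumptions) for the kernel definitions. */
typedef struct dev_info { int unused; } dev_info_t;
typedef enum {
    DDI_INFO_DEVT2DEVINFO,
    DDI_INFO_DEVT2INSTANCE
} ddi_info_cmd_t;
#define DDI_SUCCESS 0
#define DDI_FAILURE (-1)

/* The one instance this model knows about, as saved by an attach step. */
static dev_info_t the_node;
static dev_info_t *model_dip = &the_node;

/*
 * The same result slot carries either a structure pointer or an
 * instance number, depending on cmd.
 */
static int
model_getinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg,
    void **resultp)
{
    (void)dip;      /* obsolete argument, ignored */
    (void)arg;      /* a realistic driver would decode the device number */
    switch (cmd) {
    case DDI_INFO_DEVT2DEVINFO:
        *resultp = model_dip;               /* pointer result */
        return DDI_SUCCESS;
    case DDI_INFO_DEVT2INSTANCE:
        *resultp = (void *)(uintptr_t)0;    /* instance number 0 */
        return DDI_SUCCESS;
    default:
        return DDI_FAILURE;
    }
}
```

The explicit cast in the DDI_INFO_DEVT2INSTANCE case makes visible that an integer instance number is being passed back through a pointer-sized slot, which the plain assignment of 0 in the dummy driver hides.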

Defining the Report Driver Property Information Entry Point

The prop_op(9E) entry point is required for every driver. If your driver does not need to customize the behavior of the prop_op(9E) entry point, then your driver can use the ddi_prop_op(9F) function for the prop_op(9E) entry point. Drivers that create and manage their own properties need a custom prop_op(9E) routine. This dummy driver uses a prop_op(9E) routine to call cmn_err(9F) before calling the ddi_prop_op(9F) function.

The prop_op(9E) entry point and the ddi_prop_op(9F) function both require that you include the types.h header file. The prop_op(9E) entry point and the ddi_prop_op(9F) function both take the same seven arguments. These arguments are not discussed here because this dummy driver does not create and manage its own properties. See the prop_op(9E) man page to learn about the prop_op(9E) arguments.

The following code is the dummy_prop_op() routine that you should enter into your dummy.c file.

    static int
    dummy_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
        int flags, char *name, caddr_t valuep, int *lengthp)
    {
        cmn_err(CE_NOTE, "Inside dummy_prop_op");
        return(ddi_prop_op(dev,dip,prop_op,flags,name,valuep,lengthp));
    }

First, use cmn_err(9F) to write a message to the system log, as you did in your _init(9E) entry point. Then call the ddi_prop_op(9F) function with exactly the same arguments as the dummy_prop_op() function.

Including Autoconfiguration Header Files

All of the autoconfiguration entry point routines and all of the user context entry point routines require that you include the ddi.h and sunddi.h header files. You already included these two header files for the cmn_err(9F) function.

The ddi_create_minor_node(9F) function requires the stat.h header file. The dummy_attach() routine calls the ddi_create_minor_node(9F) function. The prop_op(9E) and the ddi_prop_op(9F) functions require the types.h header file.

The following code is the list of header files that you now should have included in your dummy.c file for the four autoconfiguration routines you have written in this section and the three loadable module configuration routines you wrote in the previous section.

    #include <sys/modctl.h>  /* used by _init, _info, _fini */
    #include <sys/types.h>   /* used by prop_op, ddi_prop_op */
    #include <sys/stat.h>    /* defines S_IFCHR used by ddi_create_minor_node */
    #include <sys/cmn_err.h> /* used by all entry points for this driver */
    #include <sys/ddi.h>     /* used by all entry points for this driver */
                             /* also used by ddi_get_instance, ddi_prop_op */
    #include <sys/sunddi.h>  /* used by all entry points for this driver */
                             /* also used by ddi_create_minor_node, */
                             /* ddi_get_instance, and ddi_prop_op */

Writing the User Context Entry Points

User context entry points correspond closely to system calls. When a system call opens a device file, then the open(9E) routine in the driver for that device is called.

All character and block drivers must define the open(9E) user context entry point. However, the open(9E) routine can be nulldev(9F). The close(9E), read(9E), and write(9E) user context routines are optional.

■ The open(9E) routine gains access to the device.
■ The close(9E) routine relinquishes access to the device. The close(9E) routine must undo everything that the open(9E) routine did.
■ The read(9E) routine reads data from the device node.
■ The write(9E) routine writes data to the device node.

In this section, the following code is added:

    /* User context entry points */
    static int
    dummy_open(dev_t *devp, int flag, int otyp, cred_t *cred)
    {
        cmn_err(CE_NOTE, "Inside dummy_open");
        return DDI_SUCCESS;
    }

    static int
    dummy_close(dev_t dev, int flag, int otyp, cred_t *cred)
    {
        cmn_err(CE_NOTE, "Inside dummy_close");
        return DDI_SUCCESS;
    }

    static int
    dummy_read(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        cmn_err(CE_NOTE, "Inside dummy_read");
        return DDI_SUCCESS;
    }

    static int
    dummy_write(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        cmn_err(CE_NOTE, "Inside dummy_write");
        return DDI_SUCCESS;
    }

Declaring the User Context Entry Points

The user context entry point routines need to be uniquely named for this driver. Use the same prefix for each of the user context entry points that you used for each of the autoconfiguration entry point routines. The following declarations are the entry point declarations you should have in your dummy.c file:

    static int dummy_attach(dev_info_t *dip, ddi_attach_cmd_t cmd);
    static int dummy_detach(dev_info_t *dip, ddi_detach_cmd_t cmd);
    static int dummy_getinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg,
        void **resultp);
    static int dummy_prop_op(dev_t dev, dev_info_t *dip,
        ddi_prop_op_t prop_op, int flags, char *name, caddr_t valuep,
        int *lengthp);
    static int dummy_open(dev_t *devp, int flag, int otyp, cred_t *cred);
    static int dummy_close(dev_t dev, int flag, int otyp, cred_t *cred);
    static int dummy_read(dev_t dev, struct uio *uiop, cred_t *credp);
    static int dummy_write(dev_t dev, struct uio *uiop, cred_t *credp);

Defining the Open Device Entry Point

The open(9E) routine returns type int. The open(9E) routine should return either DDI_SUCCESS or the appropriate error number. The open(9E) routine takes four arguments. This dummy driver is so simple that this dummy_open() routine does not use any of the open(9E) arguments. The following code is the dummy_open() routine that you should enter into your dummy.c file. Write a message to the system log and return success.

    static int
    dummy_open(dev_t *devp, int flag, int otyp, cred_t *cred)
    {
        cmn_err(CE_NOTE, "Inside dummy_open");
        return DDI_SUCCESS;
    }

Defining the Close Device Entry Point

The close(9E) routine returns type int. The close(9E) routine should return either DDI_SUCCESS or the appropriate error number. The close(9E) routine takes four arguments. This dummy driver is so simple that this dummy_close() routine does not use any of the close(9E) arguments.

The close(9E) routine must undo everything that the open(9E) routine did. The close(9E) routine must deallocate anything that was allocated, close anything that was opened, and destroy anything that was created in the open(9E) routine. In this dummy driver, the open(9E) routine is so simple that nothing needs to be reclaimed or undone in the close(9E) routine.

The following code is the dummy_close() routine that you should enter into your dummy.c file. Write a message to the system log and return success.

    static int
    dummy_close(dev_t dev, int flag, int otyp, cred_t *cred)
    {
        cmn_err(CE_NOTE, "Inside dummy_close");
        return DDI_SUCCESS;
    }

Defining the Read Device Entry Point

The read(9E) routine returns type int. The read(9E) routine should return either DDI_SUCCESS or the appropriate error number. The read(9E) routine takes three arguments. This dummy driver is so simple that this dummy_read() routine does not use any of the read(9E) arguments. The following code is the dummy_read() routine that you should enter into your dummy.c file. Write a message to the system log and return success.

    static int
    dummy_read(dev_t dev, struct uio *uiop, cred_t *credp)
    {
        cmn_err(CE_NOTE, "Inside dummy_read");
        return DDI_SUCCESS;
    }

Defining the Write Device Entry Point

The write(9E) routine returns type int. The write(9E) routine should return either DDI_SUCCESS or the appropriate error number. The write(9E) routine takes three arguments. This dummy driver is so simple that this dummy_write() routine does not use any of the write(9E) arguments.

close */ used by open. and the sunddi. _info. modldrv.h header file. close.c file. return DDI_SUCCESS. */ and ddi_prop_op */ used by open.h. You need to include the file.h> /* #include <sys/ddi.h> /* /* /* used by modlinkage. read. close. } Including User Context Header Files The four user context entry point routines require your module to include several header files. write */ used by open. close.h header file. prop_op. cred.h> /* #include <sys/open. you must define the cb_ops(9S) structure first. read */ used by read */ defines S_IFCHR used by ddi_create_minor_node */ used by all entry points for this driver */ used by all entry points for this driver */ also used by ddi_get_instance and */ ddi_prop_op */ used by all entry points for this driver */ also used by ddi_create_minor_node. read. and ddi_prop_op */ Writing the Driver Data Structures All of the data structures described in this section are required for every device driver. open.h. The modldrv(9S) linkage structure for loadable Module 12 • Writing a Template Character Device Driver 113 . */ ddi_get_instance. Because the dev_ops(9S) structure includes a pointer to the cb_ops(9S) character and block operations structure. write */ used by open.h> /* /* /* #include <sys/sunddi. All drivers must define a dev_ops(9S) device operations structure. "Inside dummy_write").h> /* #include <sys/stat. Write a message to the system log and return success.c file for all the entry points you have written in this section and the previous two sections: #include <sys/modctl.h. write.h> /* #include <sys/cmn_err. You already have included the types. The following code is the list of header files that you now should have included in your dummy. close. read.h> /* #include <sys/cred.h header files. cred_t *credp) { cmn_err(CE_NOTE.h> /* /* #include <sys/types.h> /* #include <sys/uio.h> /* /* #include <sys/file. */ and _fini */ used by open.h header file.h> /* #include <sys/errno. struct uio *uiop. 
errno.Writing the Template Driver The following code is the dummy_write() routine that you should enter into your dummy. static int dummy_write(dev_t dev. _init. and uio.h. the ddi.

dummy_write. Some optional entry points and other related data also are initialized in these data structures. The loadable module configuration entry points are not initialized in driver data structures. In this section. /* no segmap */ nochpoll. /* compatibility flags: see conf. 2006 . /* no identify . Except for the loadable module configuration entry points. all above */ /* fields are ignored */ D_NEW | D_MP. /* cb_ops revision number */ nodev.h */ CB_REV. /* returns ENXIO for non-pollable devices */ dummy_prop_op. /* no devmap */ nodev. nodev. /* no dump */ dummy_read. 0.Writing the Template Driver drivers includes a pointer to the dev_ops(9S) structure. _info(9E). The _init(9E). Initializing the entry points in these data structures enables the driver to be dynamically loaded. if not NULL. The modlinkage(9S) module linkage structure includes a pointer to the modldrv(9S) structure. /* streamtab struct. /* no print */ nodev. and _fini(9E) entry points are required for all kernel modules and are not specific to device driver modules. the following code is added: /* cb_ops structure */ static struct cb_ops dummy_cb_ops = { dummy_open. /* no ioctl */ nodev. 114 Introduction to Operating Systems: A Hands-On Approach Using the OpenSolaris Project • March. /* reference count */ dummy_getinfo. nulldev.nodev returns ENXIO */ nodev. NULL. /* dev_ops structure */ static struct dev_ops dummy_dev_ops = { DEVO_REV. dummy_close. all of the required entry points for a driver are initialized in the character and block operations structure or in the device operations structure. /* no aread */ nodev /* no awrite */ }. /* no probe */ dummy_attach.nulldev returns 0 */ nulldev. nodev. /* no strategy . /* no mmap */ nodev.

*/ "dummy driver". This dummy driver does not use all of the elements in the cb_ops(9S) structure. (struct bus_ops *)NULL. /* no strategy .c file: static struct cb_ops dummy_cb_ops = { dummy_open. Prepend the static type modifier to the declaration. /* no print */ nodev.nodev returns ENXIO */ nodev. /* modlinkage structure */ static struct modlinkage ml = { MODREV_1. NULL }. &md. The following code is the cb_ops(9S) structure that you should enter into your dummy. /* modldrv structure */ static struct modldrv md = { &mod_driverops. /* no reset . This is a driver. nodev. /* keep track of one instance */ Defining the Character and Block Operations Structure The cb_ops(9S) structure initializes standard character and block interfaces. /* Type of module. nodev. See the cb_ops(9S) man page to learn what each element is and what the value of each element should be. /* Name of the module.Writing the Template Driver dummy_detach. /* no dump */ dummy_read. */ &dummy_dev_ops }. See the description that follows the code sample. use the same dummy_ prefix that you used for the names of the autoconfiguration routines and the names of the user context routines. When you name this structure. Module 12 • Writing a Template Character Device Driver 115 . /* dev_info structure */ dev_info_t *dummy_dip. dummy_write. dummy_close. nodev /* no power */ }.nodev returns ENXIO */ &dummy_cb_ops.

NULL. The strategy(9E). Enter the names of the read(9E) and write(9E) entry points for this driver as the values of the sixth and seventh elements of this structure. nodev. See the conf. 116 Introduction to Operating Systems: A Hands-On Approach Using the OpenSolaris Project • March. nodev. Specify NULL for the streamtab(9S) STREAMS entity declaration structure because this driver is not a STREAMS driver.Writing the Template Driver nodev. nochpoll. and dump(9E) routines are for block drivers only. This driver does not define devmap(9E). mmap(9E). print(9E). nodev }. and must specify this D_MP flag. The D_64BIT flag means this driver supports 64-bit offsets and block numbers. nodev. The nodev(9F) function returns the ENXIO error code. Enter the name of the prop_op(9E) entry point for this driver as the value of the thirteenth element in this structure. if not NULL. The compatibility flags are defined in the conf. D_NEW | D_MP. or segmap(9E) entry points because this driver does not support memory mapping. /* /* /* /* /* /* /* /* /* /* /* no ioctl */ no devmap */ no mmap */ no segmap */ returns ENXIO for non-pollable devices */ streamtab struct. all above */ fields are ignored */ compatibility flags: see conf.h */ cb_ops revision number */ no aread */ no awrite */ Enter the names of the open(9E) and close(9E) entry points for this driver as the values of the first two elements of this structure. This driver does not define an ioctl(9E) entry point because this driver does not use I/O control commands.h header file for more compatibility flags. The D_MP flag means this driver safely allows multiple threads of execution. Specify the nochpoll(9F) function for the chpoll(9E) element of the cb_ops(9S) structure because this driver is not for a pollable device. This dummy driver does not define these three routines because this driver is a character driver. Initialize all of these unused function elements to nodev(9F). CB_REV. 
This driver does not does not define aread(9E) or awrite(9E) entry points because this driver does not perform any asynchronous reads or writes.h header file. nodev. CB_REV is defined in the devops. dummy_prop_op. All drivers must be multithreaded-safe. The CB_REV element of the cb_ops(9S) structure is the cb_ops(9S) revision number. 2006 . The D_NEW flag means this driver is a new-style driver.h header file.
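The idea of filling every unused slot in an entry point table with a function that reports ENXIO can be illustrated with a much-reduced user-space model. This is a sketch under stated assumptions, not the real cb_ops(9S) layout: the table below has only three slots, all entry points share one simplified signature, and the model_ names are hypothetical. Only ENXIO comes from a real header, errno.h.

```c
#include <errno.h>

/* Simplified entry point signature used only by this model. */
typedef int (*entry_fn)(void);

/* A three-slot stand-in for an operations table. */
struct model_cb_ops {
    entry_fn cb_open;
    entry_fn cb_close;
    entry_fn cb_ioctl;
};

/* Plays the role of nodev(9F): the operation is not supported. */
static int model_nodev(void) { return ENXIO; }

/* Entry points the model driver does provide. */
static int model_open(void)  { return 0; }
static int model_close(void) { return 0; }

/*
 * Every slot holds a callable function, so a caller never has to test
 * for a missing entry point before dispatching through the table.
 */
static const struct model_cb_ops model_ops = {
    model_open,
    model_close,
    model_nodev     /* no ioctl */
};
```

Calling model_ops.cb_ioctl() in this model returns ENXIO, while the open and close slots return 0. The design point is that an unsupported operation fails in a defined way instead of leaving a null function pointer in the table.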

Defining the Device Operations Structure

The dev_ops(9S) structure initializes interfaces that are used for operations such as attaching and detaching the driver. See the dev_ops(9S) man page to learn what each element is and what the value of each element should be. This dummy driver does not use all of the elements in the dev_ops(9S) structure. See the description that follows the code sample.

When you name this structure, use the same dummy_ prefix that you used for the names of the autoconfiguration routines and the names of the user context routines. Prepend the static type modifier to the declaration.

The following code is the dev_ops(9S) structure that you should enter into your dummy.c file:

    static struct dev_ops dummy_dev_ops = {
        DEVO_REV,
        0,                          /* reference count */
        dummy_getinfo,
        nulldev,                    /* no identify - nulldev returns 0 */
        nulldev,                    /* no probe */
        dummy_attach,
        dummy_detach,
        nodev,                      /* no reset - nodev returns ENXIO */
        &dummy_cb_ops,
        (struct bus_ops *)NULL,
        nodev                       /* no power */
    };

The DEVO_REV element of the dev_ops(9S) structure is the driver build version. DEVO_REV is defined in the devops.h header file. The second element in this structure is the driver reference count. Initialize this value to zero. The driver reference count is the number of instances of this driver that are currently open. The driver cannot be unloaded if any instances of the driver are still open.

The next six elements of the dev_ops(9S) structure are the names of the getinfo(9E), identify(9E), probe(9E), attach(9E), detach(9E), and reset() functions for this particular driver. The identify(9E) function is obsolete. Initialize this structure element to nulldev(9F). The probe(9E) function determines whether the corresponding device exists and is valid. This dummy driver does not define a probe(9E) function. Initialize this structure element to nulldev. The nulldev(9F) function returns success. The reset() function is obsolete. Initialize the reset() function to nodev(9F).

The next element of the dev_ops(9S) structure is a pointer to the cb_ops(9S) structure for this driver. Enter &dummy_cb_ops for the value of the pointer to the cb_ops(9S) structure.

The next element of the dev_ops(9S) structure is a pointer to the bus operations structure. Only nexus drivers have bus operations structures. This dummy driver is not a nexus driver. Set this value to NULL because this driver is a leaf driver.

The last element of the dev_ops(9S) structure is the name of the power(9E) routine for this driver. The power(9E) routine operates on a hardware device. This driver does not drive a hardware device. Set the value of this structure element to nodev.

Defining the Module Linkage Structures

Two other module loading structures are required for every driver. The modlinkage(9S) module linkage structure is used by the _init(9E), _info(9E), and _fini(9E) routines to install, remove, and retrieve information from a module. The modldrv(9S) linkage structure for loadable drivers exports driver-specific information to the kernel. See the man pages for each structure to learn what each element is and what the value of each element should be.

The following code defines the modldrv(9S) and modlinkage(9S) structures for the driver shown in this module:

    static struct modldrv md = {
        &mod_driverops,     /* Type of module. This is a driver. */
        "dummy driver",     /* Name of the module. */
        &dummy_dev_ops
    };

    static struct modlinkage ml = {
        MODREV_1,
        &md,
        NULL
    };

The first element in the modldrv(9S) structure is a pointer to a structure that tells the kernel what kind of module this is. Set this value to the address of the mod_driverops structure. The mod_driverops structure tells the kernel that the dummy.c module is a loadable driver module. The mod_driverops structure is declared in the modctl.h header file. You already included the modctl.h header file in your dummy.c file, so do not declare the mod_driverops structure in dummy.c. The mod_driverops structure is defined in the modctl.c source file.

The second element in the modldrv(9S) structure is a string that describes this module. Usually this string contains the name of this module and the version number of this module.

The last element of the modldrv(9S) structure is a pointer to the dev_ops(9S) structure for this driver.

The first element in the modlinkage(9S) structure is the revision number of the loadable modules system. Set this value to MODREV_1.

The next element of the modlinkage(9S) structure is the address of a null-terminated array of pointers to linkage structures. Driver modules have only one linkage structure. Enter the address of the md structure for the value of this element of the modlinkage(9S) structure. Enter the value NULL to terminate this list of linkage structures.
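The null-terminated list of linkage pointers can be pictured with a small user-space model. The following sketch rests on assumptions: the two structures are simplified stand-ins and not the definitions from modctl.h, the array size of 4 and the revision value of 1 are chosen only for this illustration, and all model_ and MODEL_ names are hypothetical.

```c
#include <stddef.h>

#define MODEL_MAX_LINKAGE 4     /* array size chosen for this sketch */
#define MODEL_MODREV_1    1     /* stand-in revision value */

/* Stand-in for the per-driver linkage structure. */
struct model_modldrv {
    const char *drv_linkinfo;   /* string that describes the module */
};

/* Stand-in for the module linkage structure. */
struct model_modlinkage {
    int   ml_rev;
    void *ml_linkage[MODEL_MAX_LINKAGE];    /* NULL-terminated list */
};

static struct model_modldrv model_md = { "dummy driver" };

/* One linkage structure, followed by the terminating NULL. */
static struct model_modlinkage model_ml = {
    MODEL_MODREV_1,
    { &model_md, NULL }
};

/* Count entries up to, but not including, the terminating NULL. */
static int
count_linkage(const struct model_modlinkage *ml)
{
    int n = 0;

    while (n < MODEL_MAX_LINKAGE && ml->ml_linkage[n] != NULL)
        n++;
    return n;
}
```

A consumer that walks the list this way stops at the NULL entry, which is why the terminator must be present even when a driver module has only one linkage structure.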

Including Data Structures Header Files

The cb_ops(9S) and dev_ops(9S) structures require you to include the conf.h and devops.h header files. The modlinkage(9S) and modldrv(9S) structures require you to include the modctl.h header file. You already included the modctl.h header file for the loadable module configuration entry points.

The following code is the complete list of header files that you now should have included in your dummy.c file:

    #include <sys/devops.h>  /* used by dev_ops */
    #include <sys/conf.h>    /* used by dev_ops and cb_ops */
    #include <sys/modctl.h>  /* used by modlinkage, modldrv, _init, _info, */
                             /* and _fini */
    #include <sys/types.h>   /* used by open, close, read, write, prop_op, */
                             /* and ddi_prop_op */
    #include <sys/file.h>    /* used by open, close */
    #include <sys/errno.h>   /* used by open, close, read, write */
    #include <sys/open.h>    /* used by open, close, read, write */
    #include <sys/cred.h>    /* used by open, close, read */
    #include <sys/uio.h>     /* used by read */
    #include <sys/stat.h>    /* defines S_IFCHR used by ddi_create_minor_node */
    #include <sys/cmn_err.h> /* used by all entry points for this driver */
    #include <sys/ddi.h>     /* used by all entry points for this driver */
                             /* also used by cb_ops, ddi_get_instance, and */
                             /* ddi_prop_op */
    #include <sys/sunddi.h>  /* used by all entry points for this driver */
                             /* also used by cb_ops, ddi_create_minor_node, */
                             /* ddi_get_instance, and ddi_prop_op */

Writing the Device Configuration File
This driver requires a configuration file. The minimum information that a configuration file must contain is the name of the device node and the name or type of the device’s parent. In this simple example, the node name of the device is the same as the file name of the driver. Create a file named dummy.conf in your working directory. Put the following single line of information into dummy.conf:
name="dummy" parent="pseudo";


Building and Installing the Template Driver
This section shows you how to build and install the driver for a 32-bit platform. Compile and link the driver. Use the -D_KERNEL option to indicate that this code defines a kernel module. The following example shows compiling and linking for a 32-bit architecture using the Sun Studio C compiler:
% cc -D_KERNEL -c dummy.c
% ld -r -o dummy dummy.o

Make sure you are user root when you install the driver. Install drivers in the /tmp directory until you are finished modifying and testing the _info(), _init(), and attach() routines. Copy the driver binary to the /tmp directory. Link to the driver from the kernel driver directory.
# cp dummy /tmp

Link to the following directory for a 32-bit architecture:
# ln -s /tmp/dummy /usr/kernel/drv/dummy

Copy the configuration file to the kernel driver area of the system.
# cp dummy.conf /usr/kernel/drv


Testing the Template Driver
This dummy driver merely writes a message to a system log each time an entry point routine is entered. To test this driver, watch for these messages to confirm that each entry point routine is successfully entered. The cmn_err(9F) function writes low priority messages such as the messages defined in this dummy driver to /dev/log. The syslogd(1M) daemon reads messages from /dev/log and writes low priority messages to /var/adm/messages. In a separate window, enter the following command and monitor the output as you perform the tests described in the remainder of this section:
% tail -f /var/adm/messages

Adding the Template Driver
Make sure you are user root when you add the driver. Use the add_drv(1M) command to add the driver:
# add_drv dummy

You should see the following messages in the window where you are viewing /var/adm/messages:
date time machine dummy: [ID 513080 kern.notice] NOTICE: Inside _info
date time machine dummy: [ID 874762 kern.notice] NOTICE: Inside _init
date time machine dummy: [ID 678704 kern.notice] NOTICE: Inside dummy_attach

The _info(9E), _init(9E), and attach(9E) entry points are called in that order when you add a driver. The dummy driver has been added to the /devices directory:
% ls -l /devices/pseudo | grep dummy
drwxr-xr-x   2 root     sys          512 date time dummy@0
crw-------   1 root     sys       92,  0 date time dummy@0:0

The dummy driver also is the most recent module listed by modinfo(1M):
% modinfo
 Id Loadaddr   Size Info Rev Module Name
180 ed192b70    544  92   1  dummy (dummy driver)

The module name, dummy driver, is the value you entered for the second member of the modldrv(9S) structure. The value 92 is the major number of this module.

% grep dummy /etc/name_to_major
dummy 92
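The lookup that grep(1) performs here can also be sketched in C. This is only an illustration: lookup_major() is a hypothetical helper, not part of any Solaris library, and it assumes that each line of the file has the "name number" form shown in the grep output above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a stream of "name number" lines and return the number paired
 * with the given driver name, or -1 if the name is not listed.
 * Hypothetical helper: the line format is assumed from the grep
 * output shown in the text.
 */
int
lookup_major(FILE *fp, const char *driver)
{
	char line[256];
	char name[128];
	int num;

	while (fgets(line, sizeof (line), fp) != NULL) {
		/* Lines that do not parse as "name number" are skipped. */
		if (sscanf(line, "%127s %d", name, &num) == 2 &&
		    strcmp(name, driver) == 0)
			return (num);
	}
	return (-1);
}
```

Called on a stream opened from /etc/name_to_major with the name dummy, this helper would be expected to return the same major number that grep(1) printed. A result of -1 would suggest that the driver has not been added.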

The Loadaddr address of ed192b70 is the address of the first instruction in the dummy driver. This address might be useful, for example, in debugging.
% mdb -k
> dummy`_init $m
    BASE    LIMIT     SIZE NAME
ed192b70 ed192ff0      480 dummy
> $q

The dummy driver also is the most recent module listed by prtconf(1M) in the pseudo device section:
% prtconf -P
    pseudo, instance #0
        dummy, instance #0 (driver not attached)

A driver is automatically loaded when a device that the driver manages is accessed. A driver might be automatically unloaded when the driver is not in use. If your driver is in the /devices directory but modinfo(1M) does not list your driver, you can use either of the following methods to load your driver:
- Use the modload(1M) command.
- Access the device. The driver is loaded automatically when a device that the driver manages is accessed. The following section describes how to access the dummy device.

Reading and Writing the Device
Make sure you are user root when you perform the tests described in this section. If you are not user root, you will receive “Permission denied” error messages when you try to access the /devices/pseudo/dummy@0:0 special file. Test reading from the device. Your dummy device probably is named /devices/pseudo/dummy@0:0. The following command reads from your dummy device even if it has a slightly different name:
# cat /devices/pseudo/dummy*

You should see the following messages in the window where you are viewing /var/adm/messages:
date time machine dummy: [ID 136952 kern.notice] NOTICE: Inside dummy_open
date time machine dummy: [ID 623947 kern.notice] NOTICE: Inside dummy_getinfo
date time machine dummy: [ID 891851 kern.notice] NOTICE: Inside dummy_prop_op
date time machine dummy: [ID 623947 kern.notice] NOTICE: Inside dummy_getinfo
date time machine dummy: [ID 891851 kern.notice] NOTICE: Inside dummy_prop_op
date time machine dummy: [ID 623947 kern.notice] NOTICE: Inside dummy_getinfo
date time machine dummy: [ID 709590 kern.notice] NOTICE: Inside dummy_read
date time machine dummy: [ID 550206 kern.notice] NOTICE: Inside dummy_close

Test writing to the device:
# echo hello > `ls /devices/pseudo/dummy*`

You should see the following messages in the window where you are viewing /var/adm/messages:
date time machine dummy: [ID 136952 kern.notice] NOTICE: Inside dummy_open
date time machine dummy: [ID 623947 kern.notice] NOTICE: Inside dummy_getinfo
date time machine dummy: [ID 891851 kern.notice] NOTICE: Inside dummy_prop_op
date time machine dummy: [ID 623947 kern.notice] NOTICE: Inside dummy_getinfo
date time machine dummy: [ID 891851 kern.notice] NOTICE: Inside dummy_prop_op
date time machine dummy: [ID 623947 kern.notice] NOTICE: Inside dummy_getinfo
date time machine dummy: [ID 672780 kern.notice] NOTICE: Inside dummy_write
date time machine dummy: [ID 550206 kern.notice] NOTICE: Inside dummy_close

As you can see, this output from the write test is almost identical to the output you saw from the read test. The only difference is in the seventh line of the output. Using the cat(1) command causes the kernel to access the read(9E) entry point of the driver. Using the echo(1) command causes the kernel to access the write(9E) entry point of the driver. The text argument that you give to echo(1) is ignored because this driver does not do anything with that data.

Removing the Template Driver
Make sure you are user root when you unload the driver. Use the rem_drv(1M) command to unload the driver and remove the device from the /devices directory:
# rem_drv dummy

You should see the following messages in the window where you are viewing /var/adm/messages:
date time machine dummy: [ID 513080 kern.notice] NOTICE: Inside _info
date time machine dummy: [ID 617648 kern.notice] NOTICE: Inside dummy_detach
date time machine dummy: [ID 812373 kern.notice] NOTICE: Inside _fini

The dummy device is no longer in the /devices directory:
# ls /devices/pseudo/dummy*
/devices/pseudo/dummy*: No such file or directory
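The read and write tests in this section can also be driven from a small C program instead of cat(1) and echo(1), as long as the driver is currently added. The following is a minimal sketch, not part of the original exercise: exercise_device() is a hypothetical helper that uses only the standard open(2), read(2), write(2), and close(2) calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Open a character device, attempt one read and one write, and close it.
 * Returns 0 if every system call succeeded, or the errno value of the
 * first call that failed. Hypothetical helper for illustration only.
 */
int
exercise_device(const char *path)
{
	static const char msg[] = "hello\n";
	char buf[64];
	ssize_t n;
	int err;
	int fd;

	fd = open(path, O_RDWR);		/* reaches open(9E) */
	if (fd == -1)
		return (errno);

	n = read(fd, buf, sizeof (buf));	/* reaches read(9E) */
	if (n == -1) {
		err = errno;
		(void) close(fd);
		return (err);
	}
	(void) printf("read returned %ld bytes\n", (long)n);

	n = write(fd, msg, strlen(msg));	/* reaches write(9E) */
	if (n == -1) {
		err = errno;
		(void) close(fd);
		return (err);
	}
	(void) printf("write returned %ld bytes\n", (long)n);

	if (close(fd) == -1)			/* reaches close(9E) */
		return (errno);
	return (0);
}
```

Run as root against the special file (probably /devices/pseudo/dummy@0:0, as noted above), a call to this helper should cause entry point messages to appear in /var/adm/messages in much the same way as the two shell tests. If you are not root, expect the helper to return EACCES, which corresponds to the "Permission denied" message.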

The next time you want to read from or write to the dummy device, you must load the driver again using add_drv(1M). You can use the modunload(1M) command to unload the driver but not remove the device from /devices. Then the next time you read from or write to the dummy device, the driver is automatically loaded. Press Control-C to stop tailing the /var/adm/messages messages.

Dummy Driver Source
The following code is the complete source for the dummy driver described in this module:
/*
 * Minimalist pseudo-device.
 * Writes a message whenever a routine is entered.
 *
 * Build the driver:
 *      cc -D_KERNEL -c dummy.c
 *      ld -r -o dummy dummy.o
 * Copy the driver and the configuration file to /usr/kernel/drv:
 *      cp dummy.conf /usr/kernel/drv
 *      cp dummy /tmp
 *      ln -s /tmp/dummy /usr/kernel/drv/dummy
 * Add the driver:
 *      add_drv dummy
 * Test (1) read from driver (2) write to driver:
 *      cat /devices/pseudo/dummy@*
 *      echo hello > `ls /devices/pseudo/dummy@*`
 * Verify the tests in another window:
 *      tail -f /var/adm/messages
 * Remove the driver:
 *      rem_drv dummy
 */

#include <sys/devops.h>  /* used by dev_ops */
#include <sys/conf.h>    /* used by dev_ops and cb_ops */
#include <sys/modctl.h>  /* used by modlinkage, modldrv, _init, _info, */
                         /* and _fini */
#include <sys/types.h>   /* used by open, close, read, write, prop_op, */
                         /* and ddi_prop_op */
#include <sys/file.h>    /* used by open, close */
#include <sys/errno.h>   /* used by open, close, read, write */
#include <sys/open.h>    /* used by open, close, read, write */
#include <sys/cred.h>    /* used by open, close, read */
#include <sys/uio.h>     /* used by read */
#include <sys/stat.h>    /* defines S_IFCHR used by ddi_create_minor_node */
#include <sys/cmn_err.h> /* used by all entry points for this driver */
#include <sys/ddi.h>     /* used by all entry points for this driver */
                         /* also used by cb_ops, ddi_get_instance, and */
                         /* ddi_prop_op */
#include <sys/sunddi.h>  /* used by all entry points for this driver */
                         /* also used by cb_ops, ddi_create_minor_node, */
                         /* ddi_get_instance, and ddi_prop_op */

static int dummy_attach(dev_info_t *dip, ddi_attach_cmd_t cmd);
static int dummy_detach(dev_info_t *dip, ddi_detach_cmd_t cmd);
static int dummy_getinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg,
    void **resultp);
static int dummy_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
    int flags, char *name, caddr_t valuep, int *lengthp);
static int dummy_open(dev_t *devp, int flag, int otyp, cred_t *cred);
static int dummy_close(dev_t dev, int flag, int otyp, cred_t *cred);
static int dummy_read(dev_t dev, struct uio *uiop, cred_t *credp);
static int dummy_write(dev_t dev, struct uio *uiop, cred_t *credp);

/* cb_ops structure */
static struct cb_ops dummy_cb_ops = {
    dummy_open,
    dummy_close,
    nodev,              /* no strategy - nodev returns ENXIO */
    nodev,              /* no print */
    nodev,              /* no dump */
    dummy_read,
    dummy_write,
    nodev,              /* no ioctl */
    nodev,              /* no devmap */
    nodev,              /* no mmap */
    nodev,              /* no segmap */
    nochpoll,           /* returns ENXIO for non-pollable devices */
    dummy_prop_op,
    NULL,               /* streamtab struct; if not NULL, all above */
                        /* fields are ignored */
    D_NEW | D_MP,       /* compatibility flags: see conf.h */
    CB_REV,             /* cb_ops revision number */
    nodev,              /* no aread */
    nodev               /* no awrite */
};

/* dev_ops structure */
static struct dev_ops dummy_dev_ops = {
    DEVO_REV,
    0,                  /* reference count */
    dummy_getinfo,
    nulldev,            /* no identify - nulldev returns 0 */
    nulldev,            /* no probe */
    dummy_attach,
    dummy_detach,
    nodev,              /* no reset - nodev returns ENXIO */
    &dummy_cb_ops,
    (struct bus_ops *)NULL,
    nodev               /* no power */
};

/* modldrv structure */
static struct modldrv md = {
    &mod_driverops,     /* Type of module. This is a driver. */
    "dummy driver",     /* Name of the module. */
    &dummy_dev_ops
};

/* modlinkage structure */
static struct modlinkage ml = {
    MODREV_1,
    &md,
    NULL
};

/* dev_info structure */
dev_info_t *dummy_dip;  /* keep track of one instance */

/* Loadable module configuration entry points */
int
_init(void)
{
    cmn_err(CE_NOTE, "Inside _init");
    return(mod_install(&ml));
}

int
_info(struct modinfo *modinfop)
{
    cmn_err(CE_NOTE, "Inside _info");
    return(mod_info(&ml, modinfop));
}

int
_fini(void)
{
    cmn_err(CE_NOTE, "Inside _fini");
    return(mod_remove(&ml));
}

/* Device configuration entry points */
static int
dummy_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    cmn_err(CE_NOTE, "Inside dummy_attach");
    switch(cmd) {
    case DDI_ATTACH:
        dummy_dip = dip;
        if (ddi_create_minor_node(dip, "0", S_IFCHR,
            ddi_get_instance(dip), DDI_PSEUDO, 0)
            != DDI_SUCCESS) {
            cmn_err(CE_NOTE,
                "%s%d: attach: could not add character node.",
                "dummy", 0);
            return(DDI_FAILURE);
        } else
            return DDI_SUCCESS;
    default:
        return DDI_FAILURE;
    }
}

static int
dummy_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
    cmn_err(CE_NOTE, "Inside dummy_detach");
    switch(cmd) {
    case DDI_DETACH:
        dummy_dip = 0;
        ddi_remove_minor_node(dip, NULL);
        return DDI_SUCCESS;
    default:
        return DDI_FAILURE;
    }
}

static int
dummy_getinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg,
    void **resultp)
{
    cmn_err(CE_NOTE, "Inside dummy_getinfo");
    switch(cmd) {
    case DDI_INFO_DEVT2DEVINFO:
        *resultp = dummy_dip;
        return DDI_SUCCESS;
    case DDI_INFO_DEVT2INSTANCE:
        *resultp = 0;
        return DDI_SUCCESS;
    default:
        return DDI_FAILURE;
    }
}

/* Main entry points */
static int
dummy_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
    int flags, char *name, caddr_t valuep, int *lengthp)
{
    cmn_err(CE_NOTE, "Inside dummy_prop_op");
    return(ddi_prop_op(dev, dip, prop_op, flags, name, valuep, lengthp));
}

static int
dummy_open(dev_t *devp, int flag, int otyp, cred_t *cred)
{
    cmn_err(CE_NOTE, "Inside dummy_open");
    return DDI_SUCCESS;
}

static int
dummy_close(dev_t dev, int flag, int otyp, cred_t *cred)
{
    cmn_err(CE_NOTE, "Inside dummy_close");
    return DDI_SUCCESS;
}

static int
dummy_read(dev_t dev, struct uio *uiop, cred_t *credp)
{
    cmn_err(CE_NOTE, "Inside dummy_read");
    return DDI_SUCCESS;
}

static int
dummy_write(dev_t dev, struct uio *uiop, cred_t *credp)
{
    cmn_err(CE_NOTE, "Inside dummy_write");
    return DDI_SUCCESS;
}

Module 13

Debugging Drivers With DTrace

Objectives
The objective of this module is to learn how you can use DTrace to debug your driver development projects by reviewing a case study.

Porting the smbfs Driver from Linux to the Solaris OS
This case study focuses on leveraging the DTrace capability for device driver development. Historically, debugging a device driver required that a developer use function calls like cmn_err() to log diagnostic information to the /var/adm/messages file. This cumbersome process requires guesswork, re-compilation, and system reboots to uncover software coding errors. Developers with a talent for assembly language can use adb and create custom modules in C for mdb to diagnose software errors. However, historical approaches to kernel development and debugging are quite time-consuming. DTrace provides a diagnostic short-cut. Instead of sifting through the /var/adm/messages file or pages of truss output, DTrace can be used to capture information on only the events that you as a developer wish to view. The magnitude of the benefit provided by DTrace can best be shown through a few simple examples.

First, create an smbfs driver template based on Sun's nfs driver. After the driver compiles successfully, test that the driver can be loaded and unloaded successfully. First copy the prototype driver to /usr/kernel/fs and attempt to modload it by hand:
# modload /usr/kernel/fs/smbfs
can't load module: Out of memory or no room in system tables

And the /var/adm/messages file contains:
genunix: [ID 104096 kern.warning] WARNING: system call missing from bind file

Searching the source for the "system call missing" message reveals that it is issued in the function mod_getsysent() in the file modconf.c, on a failed call to mod_getsysnum(). Instead of manually searching the flow of mod_getsysnum() from source file to source file, here's a simple DTrace script to enable all entry and return events in the fbt (Function Boundary Tracing) provider once mod_getsysnum() is entered.
#!/usr/sbin/dtrace -s
#pragma D option flowindent
fbt::mod_getsysnum:entry
/execname == "modload"/
{
    self->follow = 1;
}
fbt::mod_getsysnum:return
{
    self->follow = 0;
    trace(arg1);
}
fbt:::entry
/self->follow/
{
}
fbt:::return
/self->follow/
{
    trace(arg1);
}

Note – trace(arg1) displays the function's return value.

Executing this script and running the modload command in another window produces the following output:
# ./mod_getsysnum.d
dtrace: script './mod_getsysnum.d' matched 35750 probes
CPU FUNCTION
  0  -> mod_getsysnum
  0    -> find_mbind
  0      -> nm_hash
  0      <- nm_hash                 41
  0      -> strcmp
  0      <- strcmp                  4294967295
  0      -> strcmp
  0      <- strcmp                  7
  0    <- find_mbind                0
  0  <- mod_getsysnum               4294967295

Thus either find_mbind() returning '0', or nm_hash() returning '41' is the culprit. A quick look at find_mbind() reveals that a return value of 0 indicates an error state. Viewing the source to find_mbind() in /usr/src/uts/common/os/modsubr.c reveals that we're searching for a char string in a hash table. Let's use DTrace to display the contents of the search string and hash table. To view the contents of the search string we add a strcmp() trace to our previous mod_getsysnum.d script:

fbt::strcmp:entry
{
    printf("name:%s, hash:%s", stringof(arg0), stringof(arg1));
}

Here are the results of our next attempt to load our driver:
# ./mod_getsysnum.d
dtrace: script './mod_getsysnum.d' matched 35751 probes
CPU FUNCTION
  0  -> mod_getsysnum
  0    -> find_mbind
  0      -> nm_hash
  0      <- nm_hash                 41
  0      -> strcmp
  0       | strcmp:entry            name:smbfs, hash:timer_getoverrun
  0      <- strcmp                  4294967295
  0      -> strcmp
  0       | strcmp:entry            name:smbfs, hash:lwp_sema_post
  0      <- strcmp                  7
  0    <- find_mbind                0
  0  <- mod_getsysnum               4294967295

So we're looking for smbfs in a hash table, and it's not present. How does smbfs get into this hash table? Let's return to find_mbind() and observe that the hash table variable sb_hashtab is passed to the failing nm_hash() function. A quick search of the source code reveals that sb_hashtab is initialized with a call to read_binding_file(), which takes as its arguments a config file, the hash table, and a function pointer. A few more clicks on our source code browser reveal the config file to be defined as /etc/name_to_sysnum in the file /usr/src/uts/common/os/modctl.c. It looks like we forgot to include a configuration entry for our driver. Add the following line to the /etc/name_to_sysnum file and reboot:
smbfs 177

(The binding file is read only once, at boot time, by read_binding_file().) After rebooting, the driver can be loaded successfully:
# modload /usr/kernel/fs/smbfs

Verify that the driver is loaded with the modinfo command:

# modinfo | grep smbfs
160 feb21a58  351ac 177   1  smbfs (SMBFS syscall, client, and comm)
160 feb21a58  351ac  24   1  smbfs (network filesystem)
160 feb21a58  351ac  25   1  smbfs (network filesystem version 2)
160 feb21a58  351ac  26   1  smbfs (network filesystem version 3)

Note – Remember that this driver was based on an nfs template, which explains this output.

Let's make sure we can also unload the module:
# modunload -i 160
can't unload the module: Device busy

This is most likely due to an EBUSY errno return value. But now, since the smbfs driver is a loaded module, we have access to all of the smbfs functions:
# dtrace -l fbt:smbfs:: | wc -l
1002

This is amazing! Without any special coding, we now have access to 1002 entry and return events contained in the driver. These 1002 function handles allow us to debug our work without a special 'instrumented code' version of the driver! Let's monitor all smbfs calls when modunload is called, using this simple DTrace script:
#!/usr/sbin/dtrace -s
#pragma D option flowindent
fbt:smbfs::entry
{
}
fbt:smbfs::return
{
    trace(arg1);
}

It seems that the smbfs code is not being accessed by modunload. So, let's use DTrace to look at modunload with this script:
#!/usr/sbin/dtrace -s
#pragma D option flowindent
fbt::modunload:entry
{
    self->follow = 1;
    trace(execname);
    trace(arg0);
}
fbt::modunload:return
{
    self->follow = 0;
    trace(arg1);
}
fbt:::entry
/self->follow/
{
}
fbt:::return
/self->follow/
{
    trace(arg1);
}

Here's the output of this script:
# ./modunload.d
dtrace: script './modunload.d' matched 36695 probes
CPU FUNCTION
  0  -> modunload
  0   | modunload:entry             modunload   160
  0    -> mod_hold_by_id
  0      -> mod_circdep
  0      <- mod_circdep             0
  0      -> mod_hold_by_modctl
  0      <- mod_hold_by_modctl      0
  0    <- mod_hold_by_id            3602566648
  0    -> moduninstall
  0    <- moduninstall              16
  0    -> mod_release_mod
  0      -> mod_release
  0      <- mod_release             3602566648
  0    <- mod_release_mod           3602566648
  0  <- modunload                   16

3. "_fini") == NULL ) 4. ((struct modctl *)arg0)->mod_prim). printf("mod_prim:%d\n". ((struct modctl *)arg0)->mod_nenabled). } fbt::moduninstall:return { self->follow = 0. A failed call to smbfs _fini() routine We can’t directly access all of these possibilities. but let’s approach them from a process of elimination. } fbt::kobj_lookup:entry /self->follow/ { } fbt::kobj_lookup:return /self->follow/ { trace(arg1). 2. ((struct modctl *)arg0)->mod_loadflags). We’ll use the following script to display the contents of the various structures and return values in moduninstall: #!/usr/sbin/dtrace -s #pragma D option flowindent fbt::moduninstall:entry { self->follow = 1. printf("mod_ref:%d\n". if (mp->mod_prim || mp->mod_ref || mp->mod_nenabled != 0) return (EBUSY). printf("mod_loadflags:%d\n". so let’s look at the following possibilities: 1. Module 13 • Debugging Drivers With DTrace 137 . printf("mod_nenabled:%d\n".Porting the smbfs Driver from Linux to the Solaris OS Observe that the EBUSY return value ’16’ is coming from moduninstall. ((struct modctl *)arg0)->mod_ref). if ( detach_driver(mp->mod_modname) != 0 ) return (EBUSY). if ( kobj_lookup(mp->mod_mp. moduninstall returns EBUSY in a few locations. trace(arg1). Let’s take a look at the source code for moduninstall.

2006 .d dtrace: script ’.kobj_lookup 0 <.moduninstall 0 4273103456 16 Comparing this output to the code tells us that the failure is not due to the mp structure values or the return values from detach_driver() of kobj_lookup(). by a process of elimination.Porting the smbfs Driver from Linux to the Solaris OS } fbt::detach_driver:entry /self->follow/ { } fbt::detach_driver:return /self->follow/ { trace(arg1). it must be the status returned via the status = (*func)(). We’ve used the Function Boundary Tracing provider exclusively in these examples./moduninstall. call. } This script produces the following output: # . which calls the smbfs _fini() routine. Thus. 138 Introduction to Operating Systems: A Hands-On Approach Using the OpenSolaris Project • March. Note that fbt is only one of DTrace’s many providers.d’ matched 6 probes CPU FUNCTION 0 -> moduninstall mod_prim:0 mod_ref:0 mod_nenabled:0 mod_loadflags:1 0 -> detach_driver 0 <. } Changing the return value to ’0’ and recompiling the code results in a driver that we can now load and unload. And here’s what the smbfs _fini() routine contains: int _fini(void) { /* don’t allow module to be unloaded */ return (EBUSY).detach_driver 0 -> kobj_lookup 0 <. thus we have completed the objectives of this exercise./moduninstall.
